[03:18] morning.
[06:19] Morning
[06:21] howdy
[06:23] hey.
[06:29] davecheney, Aram, TheMue, morning
[06:39] fwereade: morning
[06:39] fwereade: looking forward to your testing change getting merged, we have a lot of overlap
[06:39] davecheney, just saw your mail
[06:40] davecheney, I'm feeling very conflicted over it
[06:40] davecheney, part of me is saying "just break it into 6 separate CLs"
[06:40] fwereade: i don't think that will reduce the wall time of merging it
[06:41] davecheney, but a bigger part is saying "this is really all one change: splitting it is actually going to make it harder and take longer"
[06:41] davecheney, cool, glad it seems that way to you too :)
[06:41] fwereade: if there are improvements to be made, they can be done after this change is merged
[06:42] davecheney, let's hope niemeyer sees it that way :)
[06:42] aye, there's the rub
[06:50] Hmm, morning once again, my connection seems to be broken.
[07:01] TheMue, heyhey
[07:04] fwereade: How has the weekend been?
[07:12] TheMue, very nice thanks
[07:13] TheMue, went to a charmingly low-rent charity event on saturday -- bunch of different foods and presentations from various nationalities
[07:16] have a nice evening folks, i'll be online later
[07:24] fwereade: We've been outside a lot, at friends' on Saturday and in our own garden and a park on Sunday.
[07:24] TheMue, lovely :)
[07:24] fwereade: Enjoyed it a lot.
[07:25] TheMue, I bet, sounds pleasingly idyllic
[07:26] fwereade: Yep. We've got a park here near our home town which was founded for a larger exhibition more than 10 years ago. Since then we've been there several times each year.
[07:27] fwereade: https://plus.google.com/photos/107694490695522997974/albums/5760290760673390033 shows some pics from yesterday
[07:28] TheMue, awesome!
[07:29] TheMue, Malta is a bit lacking in green spaces, they're probably what I miss most
[07:29] fwereade: Hehe, yes, I can imagine. That change can't have been small. Where in England did you live before?
[07:31] TheMue, london, which is pretty green really all things considered
[07:31] TheMue, I grew up in the countryside in gloucestershite though so that's really what I feel is a "correct" environment, if you know what I mean
[07:32] er, gloucestershi*r*e
[07:32] fwereade: *rofl*
[07:32] TheMue, it's a nice place, a little village in the Cotswolds called Eastleach
[07:33] fwereade: I sadly haven't had the chance to visit Britain yet, but there are so many places I want to see.
[07:33] fwereade: From crowded London to the Outer Hebrides.
[07:33] TheMue, never been on the outlying islands
[07:34] TheMue, been camping by the sea pretty far north in scotland
[07:34] fwereade: So far I've only been in Heathrow for transit. *lol*
[07:34] TheMue, haha
[07:34] TheMue, the lake district is gorgeous
[07:35] fwereade: We've seen so many fantastic locations on TV. I think we'll take a longer time and a Land Rover Defender and then cruise all over the UK.
[07:37] TheMue, I can think of worse ways to spend a month :)
[07:38] fwereade: I would also take another great British car, but that would be a bit expensive: an Aston Martin.
[07:39] TheMue, haha
[07:41] fwereade: I've got a share of a single malt cask on the Isle of Arran. We bought it a few years ago with some friends.
[07:42] TheMue, I remember talking about it in Budapest :)
[07:43] fwereade: I can't hide that I've got a passion for British culture, even though I've not been there. Funny, isn't it?
[07:47] TheMue, it has good bits and bad bits, but the good bits somehow seem to have generated a lot of good PR ;)
[07:48] fwereade: You definitely have better insights than me. OTOH I'm only a tourist. :D
[08:14] fwereade: I'm off for about an hour, routine visit to the dentist.
[09:00] morning folks, tool question, does lbox submit -adopt notice local modifications you've had to make to someone else's branch?
[09:32] it does
[10:42] I hate it when I fix bugs but I don't understand my own fix.
[10:44] heh.
[10:44] got it.
[10:44] wow, this one was subtle.
[11:13] fwereade, are there any docs extant for the various cli changes in gojuju?
[11:13] actually even auto-generated go docs at a public site would be nice
[11:15] hazmat, heyhey
[11:16] hazmat, --help will tell you what exists, but it is not otherwise explicitly available anywhere; that is a sensible idea
[11:21] * hazmat works through compiling the tree
[11:21] oh.. install is now get
[11:22] bson in charm urls?
[11:24] has install always been get? it's been a while i guess
[11:24] oh.. the bson is the incremental serialization work
[11:24] for mongo
[11:25] i thought most of that was in mstate
[11:29] hazmat, sorry, I am missing a little bit of context
[11:29] fwereade, ./cloudinit_test.go:210: undefined: Commentf ? do i need a more recent version of go? or am i missing a lib
[11:29] hazmat, that's in gocheck
[11:29] fwereade, shouldn't i have gotten an import error then?
[11:30] hazmat, probably out-of-date
[11:31] hazmat, `go get -u launchpad.net/juju-core/...`, I think (but make sure you don't have local changes you want to keep if you do this)
[11:31] hazmat, also worth checking what version of go you have
[11:31] 1.0.1
[11:32] fwereade, re go get juju-core what's the ... ?
[11:32] literal ...
[11:32] sure.. just curious what it meant
[11:32] hazmat, and everything underneath it
[11:32] ah
[11:32] fwereade, thanks
[11:32] hazmat, (or imported by it, but that's go get's work not the ...'s, if you see what I mean)
[11:33] hazmat, not sure whether 1.0.1 will pick up all the latest library versions, if you still have trouble consider updating that
[11:59] hazmat, offhand, do you have any idea of the approximate range of ratios between zookeeper time and wall clock time at the client end of a specific connection?
[12:02] hazmat, s/zookeeper time and/the apparent rates of progression of zookeeper time and of/
[12:04] fwereade, not sure what you mean.. notifications from zk are delivered in order
[12:05] but the delay in delivery is subject to the quality of the network connection
[12:05] hazmat, they are delivered in order but 2 conns do not necessarily have the same idea of "now"
[12:05] fwereade, right
[12:05] hazmat, not just that, I'm pretty sure the docs state that two conns can be out of sync by on the order of 10s of seconds
[12:05] fwereade, each conn is an independent view of the ordered stream
[12:06] hazmat, quite so; this indicates that a client may see two events that are 100ms apart in ZK time arrive only 50ms apart in wall clock time
[12:06] hazmat, or possibly 1ms apart
[12:06] fwereade, that sounds odd, but given a conn on a separate server of the zk quorum that's a little out of date, perhaps.. the conn can request that the server catch up via sync
[12:07] fwereade, in what context is that an issue?
[12:07] hazmat, yeah, I know about sync, it's not in gozk AFAICS
[12:07] are you trying to have multiple conns in a single server?
[12:07] er. process
[12:07] hazmat, testing of the presence nodes using two separate connections
[12:08] fwereade, the possibility of long deltas in independent views of time is greatly diminished if they're connected to the same server with the same network quality
[12:08] fwereade, there are numerous tests in txzk that i've looped tens of thousands of times that exercise multiple conns fwiw
[12:09] fwereade, is this a theoretical concern or something you've seen in practice?
[12:09] hazmat, ok; but I have directly observed an alternate connection, running in test X, to have a conception of ZK "now" that corresponds to a state that was current during a previous test
[12:10] fwereade, are the tests building state across tests?
[12:10] hazmat, this happens very unpredictably
[12:10] hazmat, no
[12:10] hazmat, the main connection nukes everything between test cases
[12:10] so it sounds like a test framework issue with cleanup then
[12:11] fwereade, or perhaps a bug in gozk delivering events on closed conns
[12:11] fwereade, does the conn get closed/opened for teardown/setup?
[12:11] hazmat, no; I'll try doing that
[12:12] fwereade, if it doesn't then yes.. it's quite possible you're getting event delivery on old state
[12:12] hazmat, but based on what I've seen that will only add more uncertainty...
[12:12] keep in mind you've only got one execution thread
[12:12] hazmat, new conns get events from older state
[12:13] fwereade, was the old conn closed?
[12:13] hazmat, no, the old conn is still being used to perform new operations, which I expect the alternate conn to respond to
[12:13] hazmat, or vice versa in some tests
[12:13] fwereade, that sounds like an event dispatch issue
[12:14] i'd try instrumenting gozk
[12:14] hazmat, is it not behaviour consistent with the somewhat loose guarantees that ZK makes?
[12:14] and printing out the handle info
[12:14] hazmat, yeah, I will look into that, it's perfectly plausible
[12:14] fwereade, old events on new conns? no
[12:15] that's not part of the guarantee, the event only happens on state observation
[12:15] and temporally, if that state is new, it should never see an old event
[12:15] hazmat, potentially old view of history implies potentially old state changes, doesn't it?
[12:15] ie. seeing an old event on new state would be a violation of the guarantee zk makes
[12:15] hazmat, does a new conn guarantee an up-to-date view of history?
[12:16] fwereade, temporally it should be up to date
[12:16] hazmat, it explicitly does not AIUI
[12:16] fwereade, connected to the same server, yes it does
[12:16] fwereade, the only exception is if you're running a quorum of servers
[12:16] and connecting to a not-quite-up-to-date server with the new conn
[12:17] hazmat, hold on though: it guarantees that a single client will only see a single view of history, and that that view is independent of the server it connects to
[12:17] it's not eventually consistent
[12:17] hazmat, therefore, surely, it is possible that two clients connected to the same server may have different views of history
[12:17] fwereade, feel free to verify, but it sounds like a code bug not a zk bug
[12:18] fwereade, the state is in memory on the zk server and modified by each op
[12:18] fwereade, and flushed to disk; a new client will see current state
[12:18] as i said there are exceptions, but not for a single-server setup
[12:19] hazmat, Single System Image
[12:19] A client will see the same view of the service regardless of the server that it connects to.
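
(For reference, the two-connection scenario being debated here looks roughly like the sketch below. It assumes a local standalone server and gozk-style Dial/Create/Delete/Exists calls; the exact signatures, flags, and constants are assumptions based on the bindings of the time, not a quote of them.)

    package main

    import (
        "fmt"
        "log"

        zk "launchpad.net/gozk/zookeeper"
    )

    func main() {
        // Two independent sessions against the same standalone server.
        connA, _, err := zk.Dial("localhost:2181", 5e9)
        if err != nil {
            log.Fatal(err)
        }
        defer connA.Close()
        connB, _, err := zk.Dial("localhost:2181", 5e9)
        if err != nil {
            log.Fatal(err)
        }
        defer connB.Close()

        // Create and then delete a node through connection A; the synchronous
        // calls returning without error mean the server has applied the changes.
        if _, err := connA.Create("/probe", "x", 0, zk.WorldACL(zk.PERM_ALL)); err != nil {
            log.Fatal(err)
        }
        if err := connA.Delete("/probe", -1); err != nil {
            log.Fatal(err)
        }

        // The point of contention above: is connection B now guaranteed to see
        // the node gone? hazmat's argument is that against a single standalone
        // server it must; across a quorum, B's server may lag until it catches
        // up (ZooKeeper's sync() covers that case, and was not exposed by gozk
        // at the time).
        stat, err := connB.Exists("/probe")
        fmt.Printf("stat=%v err=%v\n", stat, err) // expect a nil stat on a single server
    }
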
[12:19] and even then the limiting factor is the overall speed of the quorum to propagate changes
[12:19] fwereade, history doesn't exist from a client perspective, there is only present state and future observation
[12:20] hazmat, yes: but for it to have the single system image, history surely *must* exist at the server level?
[12:20] fwereade, yes.. but you're asking about the delta between multiple clients
[12:20] fwereade, it does but it's not exposed
[12:20] fwereade, and it's only the delta on disk, not in mem
[12:24] hazmat, will look into it further, but I'm still unconvinced that the fact that the server is standalone guarantees consistency across client connections
[12:24] fwereade, the watch notifications for the client are in mem and are queued up
[12:24] again the notification carries no state
[12:25] only the change info; observation is required to capture state
[12:25] fwereade, feel free to verify, but it sounds like a code bug not a zk bug
[12:26] fwereade, the zk mailing lists are pretty helpful
[12:26] hazmat, sure, that's the plan -- but I'm not actually claiming a ZK bug, I'm just claiming that this surprising behaviour is not actually inconsistent with the guarantees made by the docs
[12:27] fwereade, that a new client sees non-current state
[12:27] against a single server
[12:28] i don't think so
[12:28] hazmat, I agree that the explanation for this bit:
[12:28] Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:
[12:28] Simultaneously Consistent Cross-Client Views
[12:28] hazmat, *does* always mention multiple servers
[12:30] fwereade, like i said.. on a single server.. that's not possible.. we have many tests in python to verify that
[12:31] but i'm just repeating myself at this point...
[12:31] fwereade, and the form of consistency i'm referencing is weaker than what's in there
[12:32] hazmat, ok, I misunderstood your statement that you had verified this precise behaviour, and I accept your diagnosis of the likely cause; that is why I'm looking into it ;)
[12:32] hazmat, the actual original question I asked though is different, and does potentially involve multiple servers
[12:32] hazmat, and client connections from separate machines
[12:33] fwereade, perhaps you should back up and explain what the goal is?
[12:36] hazmat, the goal is to understand what could be causing unpredictable test failures in which two separate zk connections on the client are respectively seeing snapshots of state that appear to indicate that deleting a node on conn A does not guarantee that the next request for state made on conn B will see that the node has been deleted
[12:36] fwereade, has the delete node op completed before B makes a request?
[12:37] fwereade, and you don't want to restrict to a single server?
[12:37] hazmat, the calls are performed synchronously; my understanding is that the call completing without error indicates that the operation has completed successfully
[12:38] hazmat, in the general case of the problem, assuming multiple zookeeper servers, my initial question about relative rates of time progression may be relevant, but just for now we can worry about single servers
[12:38] fwereade.. well it's still likely async.. there are two results to check
[12:39] the api call, and the result call
[12:40] given that you've got a single server (ignoring multi-server for a moment) for your tests, it would appear to be a bug in gozk
[12:40] the unpredictable nature of the failures reinforces that guess, namely that something isn't properly waiting on results
[12:41] hazmat, ok, so, by "still likely async", do you mean "a line of code immediately following `err := conn.Delete("/some/path"); c.Assert(err, IsNil)` is not guaranteed to see the change"
[12:41] fwereade, you need to go deeper
[12:41] fwereade, conn.Delete is what?
[12:41] a gozk binding to libzk
[12:41] underneath the hood it's doing what?
[12:41] hazmat, that is what I intend to do
[12:41] i'd guess adelete
[12:41] which is async
[12:42] * hazmat takes a look
[12:43] interesting
[12:43] hazmat, zoo_delete, which I presume is not adelete
[12:43] Delete does zoo_delete, which is synchronous.
[12:43] gozk is sync.
[12:43] wow
[12:43] ok..
[12:44] fwereade, so i'd suggest instrumenting delete and the subsequent client op with some prints
[12:44] I suspect you simply check against the wrong version.
[12:45] that's why you don't see the change.
[12:45] * Aram didn't read much of the backlog.
[12:45] so gozk is sync, and gojuju runs with a single thread ?
[12:45] Aram, interesting idea
[12:48] Aram, versions are only passed for modifications
[12:48] in this case it's an observation that shows old state
[12:49] hazmat, sorry, lunch
[12:49] * hazmat moves on to openstack provider review
[13:10] fwereade, could you pastebin the test in question?
[13:18] hazmat, the clearest situation is in line 10 of http://paste.ubuntu.com/1071241/ -- when it occurs, other connections are known to have been going around creating the node we're watching in the past
[13:20] fwereade, that looks like the same connection?
[13:20] hazmat, yes, it is an alternate connection in a previous test about which I am concerned
[13:21] Gooooood morning!
[13:21] hi niemeyer.
[13:21] how was your trip?
[13:22] niemeyer, g'morning, how was i/o?
[13:22] and how was SF?
[13:22] hazmat: Superb
[13:22] Aram: Superb too :)
[13:23] Aram: Great to meet all the folks
[13:25] niemeyer: Heya
[13:25] niemeyer: yeah, that must have been great.
[13:25] fwereade, what happens to the conn in setup/teardown?
[13:27] is there any way to get go test to be verbose about test cases being run?
[13:27] hazmat, in TearDownTest, recursively delete everything and (IIRC, checking) panic on error except nonode
[13:27] http://arethegovideosupyet.com/ < This is great :)
[13:27] fwereade, so it is the same open connection for multiple tests?
[13:27] hazmat, -test.v -gocheck.vv should give you plenty
[13:27] hazmat, yes
[13:28] niemeyer: Yep, funny idea. Waiting for the next ones to be online.
[13:29] How're things going in juju-dev land?
[13:32] niemeyer: At first a hurdle, but now moving forward.
[14:42] fwereade: So, hazmat tells me we're not testing our code properly because we have a bug. What's up there?
[14:42] niemeyer, I am still trying to characterize it properly
[14:43] niemeyer, the stars appear to be aligned such that I can repro more often than not
[14:43] fwereade: Ah, this is perfect
[14:43] niemeyer, but I am still trying to coax an "aha" out of the data
[14:43] fwereade: http://paste.ubuntu.com/1071241/ is this the test?
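
(The single-connection-for-the-suite fixture and TearDownTest cleanup described above correspond roughly to a gocheck suite like the sketch below: everything is recursively deleted between test cases, "no node" is tolerated, anything else fails loudly. The gozk Children/Delete signatures and the IsError/ZNONODE helper are assumptions about the bindings of the time, and dialing the connection in SetUpSuite is omitted.)

    package state_test

    import (
        "testing"

        . "launchpad.net/gocheck"
        zk "launchpad.net/gozk/zookeeper"
    )

    // Hook gocheck into go test; -test.v -gocheck.vv makes it verbose per case.
    func Test(t *testing.T) { TestingT(t) }

    type PresenceSuite struct {
        conn *zk.Conn // one long-lived connection shared by every test case
    }

    var _ = Suite(&PresenceSuite{})

    // TearDownTest wipes whatever a test left behind so the next case starts
    // from a clean tree.
    func (s *PresenceSuite) TearDownTest(c *C) {
        s.removeAll(c, "/")
    }

    func (s *PresenceSuite) removeAll(c *C, path string) {
        children, _, err := s.conn.Children(path)
        if err != nil {
            c.Fatalf("cannot list %q: %v", path, err)
        }
        for _, child := range children {
            if path == "/" {
                if child == "zookeeper" {
                    continue // leave ZooKeeper's own bookkeeping alone
                }
                s.removeAll(c, "/"+child)
            } else {
                s.removeAll(c, path+"/"+child)
            }
        }
        if path == "/" {
            return
        }
        // Tolerate "no node" (a pinger may already have cleaned up after
        // itself); fail on anything else.
        if err := s.conn.Delete(path, -1); err != nil && !zk.IsError(err, zk.ZNONODE) {
            c.Fatalf("cannot delete %q: %v", path, err)
        }
    }
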
[14:43] niemeyer, that is one of the many that *can* exhibit anomalous behaviour
[14:44] niemeyer, but the vast majority of the presence suite has been *slightly* flaky for a while, and I now appear to be close to pinning down the problem
[14:44] fwereade: Ah, that's awesome
[14:47] fwereade: Does it fail if you run just presence in isolation?
[14:47] niemeyer, very very very rarely
[14:47] fwereade: Your more-often-than-not is achieved with a few packages, or just with the whole suite?
[14:48] niemeyer, just state
[14:48] Nice
[14:49] fwereade: What's the liveness timing being used by the tests?
[14:49] niemeyer, 50ms, and there is certainly something tricksy there which I think I am close to accounting for
[14:50] niemeyer, ie sometimes we get mtimes more than 100ms apart when we do that
[14:50] fwereade: This may well be the issue
[14:50] fwereade: The GC may be stopping to collect stuff
[14:50] niemeyer, this is surely *part* of the issue
[14:50] niemeyer, I am trying to eliminate it and see whether I can goose the weirder one into existence
[14:55] fwereade: Does it change the situation if you set GOMAXPROCS=4?
[15:03] niemeyer, hmm, quite possibly; but that's another dimension of phase space that I don't need right now I think
[15:04] fwereade: Well, perhaps without the parallel collection that went into tip it wouldn't make much of a difference anyway
[15:10] niemeyer, well, we need timing tweaks, but the really important ones will come in real usage I think
[15:11] niemeyer, if we make them noticeably more generous than whatever I find to be rock-solid in test usage we will hopefully be ok
[15:11] fwereade: Agreed
[15:12] fwereade: Is there anything more unusual than the timing bits that you'd like a second pair of eyes over?
[15:15] niemeyer, if I can't figure it out today I will cry uncle, but I feel like I'm converging on something
[15:16] fwereade: Superb, you have me excited on the other side meanwhile ;-D
[15:16] niemeyer, cool :)
=== niemeyer_ is now known as niemeyer
=== Aram2 is now known as Aram
=== Aram2 is now known as Aram
[18:22] fwereade: Any luck there?
[18:33] niemeyer, broke for supper at an opportune point; it looks like there was a case we'd missed in the code, subtly distinct from the normal failures due to gc pauses/whatever
[18:34] niemeyer, state seems to be solid, given a fix for that
[18:34] niemeyer, trying a few full runs
[18:34] fwereade: Oh, so happy that you found it.. even if I don't know what the issue really is yet :-)
[18:35] niemeyer, there's another bit of the issue which is trivial and embarrassing and kinda contributed to some of the confusion
[18:35] niemeyer, not all the tests were nicely cleaning up pingers on failure
[18:36] niemeyer, so I plan to do a pass for that too
[18:36] fwereade: Aha, that was my initial guess at the problem
[18:36] fwereade: Not to be embarrassed, though.. it's kind of easy to miss teardowns in any testing
[18:37] niemeyer, the thing is, I knew about that -- the first failure often causes a cascade -- but there was always a particular mode of initial failure that didn't seem to match reality
[18:38] fwereade: Well, I appreciate you going after the root cause.. a lot of people just ignore the obvious hints and go for the trivial solutions
[18:38] niemeyer, has to be done really
[18:38] niemeyer, fwiw there's a very occasional store failure that I never remember to capture
[18:38] niemeyer, primarily because the sheer weight of mongo logs is intimidating
[18:39] niemeyer, I promise that next time I see it I will make a proper bug
[18:39] ;)
[18:39] fwereade: If it's the one I'm thinking of, it's just timing
[18:40] niemeyer, sounds very plausible
[18:40] fwereade: I feel bad for it.. I've been kind of postponing increasing the timing to see if it will force me to get the test to run faster rather than going for the easy solution
[18:40] fwereade: I can tell it's not working so far
[18:40] niemeyer, I can sympathise
=== philipballew_ is now known as philipballew
[19:14] brb
=== robbiew is now known as robbiew-afk
[21:59] fwereade: Dude
[21:59] fwereade: There?
=== robbiew-afk is now known as robbiew
[22:37] davecheney: Morning!
[22:38] niemeyer: howdy!
[22:39] davecheney: Good to see you around from the usual time zone :)
[22:39] davecheney: Less overlap, but at least it's easy to actually talk :)
[22:39] indeed
[22:39] how's things?
[22:39] Pretty good. Just having a look at William's monster branch
[22:39] Looks very nice
[22:40] niemeyer: it would be awesome if that got a green light
[22:40] i need his refactorings of zksuite etc for the local ec2 tests
[22:40] davecheney: Thanks for reviewing it too, btw.. really appreciate having more eyes
[22:40] niemeyer: anytime
[22:40] davecheney: It will surely get a green light
[22:40] it is big, and a lot of the changes are one line per file
[22:40] davecheney: huge improvement overall
[22:41] indeed
[22:41] davecheney: The concerns I have so far are lateral.. e.g., mstate needs to be included
[22:46] niemeyer: in related news, I have a fix for the location constraint issue in goamz
[22:46] but am unsure how to write tests for it
[22:57] davecheney: Hmm
[22:58] davecheney: Can you please open a CL with the fix?
[22:58] davecheney: I can have a look and suggest something
[22:58] niemeyer: twosecs
[22:59] niemeyer: https://codereview.appspot.com/6344050
[23:00] davecheney: Checking
[23:01] niemeyer: i can add support for LocationConstraint parsing into the s3 test if you like
[23:03] davecheney: Okay, so
[23:03] davecheney: The testing seems to be easy to do inside s3_test.go
[23:04] davecheney: We mock the server, and can easily compare the result against something we own.
[23:04] davecheney: We don't even need to parse it
[23:04] davecheney: Check out.. hmmm..
[23:05] davecheney: Well, we actually don't have an example yet
[23:05] davecheney: But the req we get out of WaitRequest is a normal http.Request
[23:05] davecheney: With a Body and all
[23:05] niemeyer: yeah, I can address the TODO in the s3_test server
[23:06] davecheney: A second detail: it looks like this info is well suited for a Name field
[23:06] davecheney: Region.name
[23:06] davecheney: Region.Name
[23:07] niemeyer: yes, there are a number of places where we want to convert from the Region type back to its canonical name
[23:39] niemeyer: https://codereview.appspot.com/6347059 << adds Region.Name
[23:40] davecheney: The map is a nice touch, thanks
[23:40] davecheney: LGTM
[23:40] niemeyer: that was a TODO from environs/ec2
[23:41] davecheney: Ah, I didn't recall.. still looks like a good idea then! ;-)
[23:42] niemeyer: i'll address the todo in juju after I commit the location constraint fix
[23:42] so that people get a hint to go update goamz
[23:42] Super
[23:43] * niemeyer => dinner.. back soon
[23:47] niemeyer: would you mind committing William's lp:~fwereade/juju-core/vast-zookeeper-tests-cleanup ?
[23:57] davecheney: Hmm.. there are a few trivial details there to be sorted out.. I think I'd prefer to let him consider these details, including the mstate stuff, to see how to push it forward before getting it in
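
(The goamz change discussed at the end, CL 6347059, has roughly the shape sketched below: a canonical Name on the Region type plus a map for turning names back into Regions, which is the round trip environs/ec2 wanted. The field names, region values, and endpoints here are illustrative assumptions based on the conversation, not a copy of the actual CL.)

    package aws

    // Region describes an AWS region and the endpoints clients need from it.
    type Region struct {
        Name        string // canonical name, e.g. "us-east-1"
        EC2Endpoint string
        S3Endpoint  string
        // S3LocationConstraint reports whether CreateBucket requests for this
        // region must carry a LocationConstraint element in their body.
        S3LocationConstraint bool
    }

    var (
        USEast = Region{
            Name:        "us-east-1",
            EC2Endpoint: "https://ec2.us-east-1.amazonaws.com",
            S3Endpoint:  "https://s3.amazonaws.com",
        }
        EUWest = Region{
            Name:                 "eu-west-1",
            EC2Endpoint:          "https://ec2.eu-west-1.amazonaws.com",
            S3Endpoint:           "https://s3-eu-west-1.amazonaws.com",
            S3LocationConstraint: true,
        }
    )

    // Regions maps canonical names back to their Region values, so callers can
    // convert from a Region to its name and back again.
    var Regions = map[string]Region{
        USEast.Name: USEast,
        EUWest.Name: EUWest,
    }
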