/srv/irclogs.ubuntu.com/2012/07/02/#juju-dev.txt

Arammorning.03:18
TheMueMorning06:19
davecheneyhowdy06:21
Aramhey.06:23
fwereadedavecheney, Aram, TheMue, morning06:29
davecheneyfwereade: morning06:39
davecheneyfwereade: looking forward to your testing change getting merged, we have a lot of overlap06:39
fwereadedavecheney, just saw your mail06:39
fwereadedavecheney, I'm feeling very conflicted over it06:40
fwereadedavecheney, part of me is saying "just break it into 6 separate CLs"06:40
davecheneyfwereade: i don't think that will reduce the wall time of merging it06:40
fwereadedavecheney, but a bigger part is saying "this is really all one change: it's actually going to make it harder and take longer"06:41
fwereadedavecheney, cool, glad it seems that way to you too :)06:41
davecheneyfwereade: if there are improvements to be made, they can be done after this change is merged06:41
fwereadedavecheney, let's hope niemeyer sees it that way :)06:42
davecheneyaye, there's the rub06:42
TheMueHmm, once again morning, connection seems to be broken.06:50
fwereadeTheMue, heyhey07:01
TheMuefwereade: How has the weekend been?07:04
fwereadeTheMue, very nice thanks07:12
fwereadeTheMue, went to a charmingly low-rent charity event on saturday -- bunch of different foods and presentations from various nationalities07:13
davecheneyhave a nice evening folks, i'll be online later07:16
TheMuefwereade: We've been outside a lot, at friends on Saturday and in our own garden and a park on Sunday.07:24
fwereadeTheMue, lovely :)07:24
TheMuefwereade: Enjoyed it a lot.07:24
fwereadeTheMue, I bet, sounds pleasingly idyllic07:25
TheMuefwereade: Yep. We've got a park here near our home town which was founded for a larger exhibition more than 10 years ago. And since then we've been there several times each year.07:26
TheMuefwereade: https://plus.google.com/photos/107694490695522997974/albums/5760290760673390033 shows some pics from yesterday07:27
fwereadeTheMue, awesome!07:28
fwereadeTheMue, Malta is a bit lacking in green spaces, they're probably what I miss most07:29
TheMuefwereade: Hehe, yes, can imagine. Your change hasn't been small. Where in England did you live before?07:29
fwereadeTheMue, london, which is pretty green really all things considered07:31
fwereadeTheMue, I grew up in the countryside in gloucestershite though so that's really what I feel is a "correct" environment, if you know what I mean07:31
fwereadeer, gloucestershi*r*e07:32
TheMuefwereade: *rofl*07:32
fwereadeTheMue, it's a nice place, a little village in the cotswolds called eastleach07:32
TheMuefwereade: I sadly hadn't the chance to visit Britain yet, but there are so many places I want to see.07:33
TheMuefwereade: From the crowded London to the Outer Hebrides.07:33
fwereadeTheMue, never been on the outlying islands07:33
fwereadeTheMue, been camping by the sea pretty far north in scotland07:34
TheMuefwereade: So far I've only been in Heathrow for transit. *lol*07:34
fwereadeTheMue, haha07:34
fwereadeTheMue, the lake district is gorgeous07:34
TheMuefwereade: We've seen so many fantastic locations in TV. I think we'll take a longer time and a Land Rover Defender and then cruise all over UK.07:35
fwereadeTheMue, I can think of worse ways to spend a month :)07:37
TheMuefwereade: I would also take an other great Britain car, but that would be a bit expensive: An Aston Martin.07:38
fwereadeTheMue, haha07:39
TheMuefwereade: I've got a part of a single malt cask on the Isle of Arran. We bought it a few years ago with some friends.07:41
fwereadeTheMue, I remember talking about it in budapest :)07:42
TheMuefwereade: I can't hide that I've got a passion for the British culture, even that I've not been there. Funny, isn't it?07:43
fwereadeTheMue, it has good bits and bad bits, but the good bits somehow seem to have generated a lot of good PR ;)07:47
TheMuefwereade: You definitely have better insides than me.  OTOH I'm only a tourist. :D07:48
TheMuefwereade: I'm off for about an hour, routine visit at the dentist.08:14
hazmatmorning folks, tool question, does lbox submit -adopt notice local modifications you've had to make to someone else's branch?09:00
hazmatit does09:32
AramI hate it when I fix bugs but I don't understand my own fix.10:42
Aramheh.10:44
Aramgot it.10:44
Aramwow, this one was subtle.10:44
hazmatfwereade, is there any docs extant for various cli changes in gojuju ?11:13
hazmatactually even an auto generated go docs at a public site would be nice11:13
fwereadehazmat, heyhey11:15
fwereadehazmat, --help will tell you what exists, but it is not otherwise explicitly available anywhere; that is a sensible idea11:16
* hazmat works through compiling the tree11:21
hazmatoh.. install is now get11:21
hazmatbson in charm urls?11:22
hazmathas install always been get. its been a while i guess11:24
hazmatoh.. the bson is the incremental serialization work11:24
hazmatfor mongo11:24
hazmati thought most of that was in mstate11:25
fwereadehazmat, sorry, I am missing a little bit of context11:29
hazmatfwereade, ./cloudinit_test.go:210: undefined: Commentf ? do i need a more recent version of go? or am i missing a lib11:29
fwereadehazmat, that's in gocheck11:29
hazmatfwereade, shouldn't i have gotten an import error then?11:29
fwereadehazmat, probably out-of-date11:30
fwereadehazmat, `go get -u launchpad.net/juju-core/...`, I think (but make sure you don't have local changes you want to keep if you do this)11:31
fwereadehazmat, also worth checking what version of go you have11:31
hazmat1.0.111:31
hazmatfwereade, re go get juju-core what's the ... ?11:32
Aramliteral ...11:32
hazmatsure.. just curious what it meant11:32
fwereadehazmat, and everything underneath it11:32
hazmatah11:32
hazmatfwereade, thanks11:32
fwereadehazmat, (or imported by it, but that's go get's work not the ...'s, if you see what I mean)11:32
fwereadehazmat, not sure whether 1.0.1 will pick all the latest library versions, if you still have trouble consider updating that11:33
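The `...` wildcard explained above can be sketched in a few lines of Go. This is a simplified model, not the go tool's real matcher: the function name `matchEllipsis` is hypothetical, only the trailing `/...` form is handled, and the example paths other than the juju-core root are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// matchEllipsis reports whether path is selected by pattern. Only a trailing
// "/..." is understood: it selects the root itself and everything underneath.
// Hypothetical helper for illustration; the go tool's matcher is more general.
func matchEllipsis(pattern, path string) bool {
	if !strings.HasSuffix(pattern, "/...") {
		return pattern == path
	}
	root := strings.TrimSuffix(pattern, "/...")
	return path == root || strings.HasPrefix(path, root+"/")
}

func main() {
	pattern := "launchpad.net/juju-core/..."
	paths := []string{
		"launchpad.net/juju-core",                // the root itself
		"launchpad.net/juju-core/state/presence", // illustrative sub-package
		"launchpad.net/gocheck",                  // outside the tree
	}
	for _, p := range paths {
		fmt.Printf("%-42s %v\n", p, matchEllipsis(pattern, p))
	}
}
```

Packages outside the tree, such as imported libraries, are not matched by the pattern; fetching those is the separate dependency step fwereade mentions.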
fwereadehazmat, offhand, do you have any idea of the approximate range of ratios between zookeeper time and wall clock time at the client end of a specific connection?11:59
fwereadehazmat, s/zookeeper time and/the apparent rates of progression of zookeeper time and of/12:02
hazmatfwereade, not sure what you mean.. notifications from zk are delivered in order12:04
hazmatbut the delay in delivery is subject to the quality of the network connection12:05
fwereadehazmat, they are delivered in order but 2 conns do not necessarily have the same idea of "now"12:05
hazmatfwereade, right12:05
fwereadehazmat, not just that, I'm pretty sure the docs state that two conns can be out of sync by order-of 10s of seconds12:05
hazmatfwereade, each conn is an independent view of the ordered stream12:05
fwereadehazmat, quite so; this indicates that a client may see two events that are 100ms apart in ZK time arrive only 50ms apart on wall clock time12:06
fwereadehazmat, or possibly 1ms apart12:06
hazmatfwereade, that sounds odd, but given a conn on a separate server of the zk quorum that's a little out of date perhaps.. the conn can request the server catch up via sync12:06
hazmatfwereade, in what context is that an issue?12:07
fwereadehazmat, yeah, I know about sync, it's not in gozk AFAICS12:07
hazmatare you trying to have multiple conns in a single server?12:07
hazmater. process12:07
fwereadehazmat, testing of the presence nodes using two separate connections12:07
hazmatfwereade, the possibility is greatly diminished of long deltas in independent views of time if they're connected to the same server and the same network quality12:08
hazmatfwereade, there are numerous tests in txzk that i've looped 10s of k that exercise multiple conns fwiw12:08
hazmatfwereade, is this a theoretical concern or something you've seen in practice?12:09
fwereadehazmat, ok; but I have directly observed an alternate connection, running in test X, to have a conception of ZK "now" that corresponds to a state that was current during a previous test12:09
hazmatfwereade, are the tests building state across tests?12:10
fwereadehazmat, this happens very unpredictably12:10
fwereadehazmat, no12:10
fwereadehazmat, the main connection nukes everything between test cases12:10
hazmatso it sounds like a test framework issue then with cleanup12:10
hazmatfwereade, or perhaps a bug in gozk delivering events on closed conns12:11
hazmatfwereade, does the conn get closed/open for teardown/setup?12:11
fwereadehazmat, no; I'll try doing that12:11
hazmatfwereade, if it doesn't then yes.. it's quite possible you're getting event delivery on old state12:12
fwereadehazmat, but based on what I've seen that will only add more uncertainty...12:12
hazmatkeep in mind you've only got one execution thread12:12
fwereadehazmat, new conns get events from older state12:12
hazmatfwereade, was the old conn closed?12:13
fwereadehazmat, no, the old conn is still being used to perform new operations, which I expect the alternate conn to respond to12:13
fwereadehazmat, or vice versa in some tests12:13
hazmatfwereade, that sounds like an event dispatch issue with events12:13
hazmati'd try instrumenting gozk12:14
fwereadehazmat, is it not behaviour consistent with the somewhat loose guarantees that ZK makes?12:14
hazmatand printing out with handle info12:14
fwereadehazmat, yeah, I will look into that, it's perfectly plausible12:14
hazmatfwereade, old events on new conns? no12:14
hazmatthat's not part of the guarantee, the event only happens on state observation12:15
hazmatand temporally if that state is new, it should never see an old event12:15
fwereadehazmat, potentially old view of history implies potentially old state changes, doesn't it?12:15
hazmatie. seeing old event on new state, would be a violation of the guarantee zk makes12:15
fwereadehazmat, does a new conn guarantee up-to-date view of history?12:15
hazmatfwereade, temporally it should be up to date12:16
fwereadehazmat, it explicitly does not AIUI12:16
hazmatfwereade, connected to the same server yes it does12:16
hazmatfwereade, the only exception is if you're running a quorum of servers12:16
hazmatand connecting to a not quite up to date server with the new conn12:16
fwereadehazmat, hold on though: it guarantees that a single client will only see a single view of history, and that that view is independent of the server it connects to12:17
hazmatits not eventually consistent12:17
fwereadehazmat, therefore, surely, it is possible that two clients connected to the same server may have an alternate view of history12:17
hazmatfwereade, feel free to verify, but it sounds like a code bug not a zk bug12:17
hazmatfwereade, the state is in memory on the zk server and modified by each op12:18
hazmatfwereade, and flushed to disk, a new client, will see current state12:18
hazmatas i said there are exceptions, but not to a single server setup12:18
fwereadehazmat, Single System Image12:19
fwereade    A client will see the same view of the service regardless of the server that it connects to.12:19
hazmatand even then the limiting factor is the overall speed of the quorum to propagate changes12:19
hazmatfwereade, history doesn't exist from a client perspective, there is only present state and future observation12:19
fwereadehazmat, yes: but for it to have the single system image, history surely *must* exist at the server level?12:20
hazmatfwereade, yes.. but you're asking about the delta between multiple clients12:20
hazmatfwereade, it does but its not exposed12:20
hazmatfwereade, and its only the delta on disk, not in mem12:20
fwereadehazmat, will look into it further, but still unconvinced that the fact the server is standalone guarantees consistency of client connections12:24
hazmatfwereade, the watch notifications for the client are in mem and are queued up12:24
hazmatagain the notification carries no state12:24
hazmatonly the change info, observation is required to capture state12:25
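The point that a watch notification carries only change info, so state must be observed separately, can be illustrated with a toy model. Everything here (`Store`, `Event`, `Watch`, `Set`, `Get`) is a hypothetical in-memory stand-in written for this sketch; it is not gozk or ZooKeeper, and it models a single server with one-shot watches only.

```go
package main

import "fmt"

// Event is a toy notification: it names the path and the kind of change,
// but deliberately carries no node data.
type Event struct {
	Path string
	Type string // "created" or "changed"
}

// Store is a toy single-server state holder with one-shot watches.
type Store struct {
	nodes   map[string]string
	watches map[string][]chan Event
}

func NewStore() *Store {
	return &Store{nodes: map[string]string{}, watches: map[string][]chan Event{}}
}

// Watch registers a one-shot watch on path.
func (s *Store) Watch(path string) <-chan Event {
	ch := make(chan Event, 1) // buffered: each watch receives at most one event
	s.watches[path] = append(s.watches[path], ch)
	return ch
}

// Set writes data and fires (then clears) any watches on path.
func (s *Store) Set(path, data string) {
	typ := "changed"
	if _, ok := s.nodes[path]; !ok {
		typ = "created"
	}
	s.nodes[path] = data
	for _, ch := range s.watches[path] {
		ch <- Event{Path: path, Type: typ}
	}
	delete(s.watches, path)
}

// Get observes the current state of path.
func (s *Store) Get(path string) (string, bool) {
	data, ok := s.nodes[path]
	return data, ok
}

func main() {
	s := NewStore()
	w := s.Watch("/presence")
	s.Set("/presence", "first")
	s.Set("/presence", "second") // no watch is registered any more, so no event
	ev := <-w
	data, _ := s.Get("/presence")
	fmt.Println(ev.Type, data) // created second
}
```

In this model the observation made after the event returns the current state, which may already be newer than the change that triggered the notification; it is never older.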
hazmatfwereade, feel free to verify, but it sounds like a code bug not a zk bug12:25
hazmatfwereade, the zk lists are pretty helpful12:26
fwereadehazmat, sure, that's the plan -- but I'm not actually claiming a ZK bug, I'm just claiming that this surprising behaviour is not actually inconsistent with the guarantees made by the docs12:26
hazmatfwereade, that a new client sees non current state12:27
hazmatagainst a single server12:27
hazmati don't think so12:28
fwereadehazmat, I agree that the explanation for this bit:12:28
fwereadeSometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:12:28
fwereadeSimultaneously Consistent Cross-Client Views12:28
fwereadehazmat, *does* always mention multiple servers12:28
hazmatfwereade, like i said.. on a single server.. that's not possible.. we have many tests in python to verify that12:30
hazmatbut i'm just repeating myself at this point...12:31
hazmatfwereade, and the form of consistency i'm referencing is weaker than whats in there12:31
fwereadehazmat, ok, I misunderstood your statement that you had verified this precise behaviour, and I accept your diagnosis of the likely cause; that is why I'm looking into it ;)12:32
fwereadehazmat, the actual original question I asked though is different, and does potentially involve multiple servers12:32
fwereadehazmat, and client connections from separate machines12:32
hazmatfwereade, perhaps you should backup and explain what the goal is?12:33
fwereadehazmat, the goal is to understand what could be causing unpredictable test failures in which two separate zk connections on the client are respectively seeing snapshots of state that appear to indicate that deleting a node on conn A does not guarantee that the next request for state made on conn B will see that the node has been deleted12:36
hazmatfwereade, has the delete node op completed before B makes a request?12:36
hazmatfwereade, and you don't want to restrict to single server?12:37
fwereadehazmat, the calls are performed synchronously; my understanding is that the call completing without error indicates that the operation has completed successfully12:37
fwereadehazmat, in the general case of the problem, assuming multiple zookeeper, my initial question about relative rates of time progression may be relevant, but just for now we can worry about single servers12:38
hazmatfwereade.. well its still likely async.. there are two results to check12:38
hazmatthe api call, and the result call12:39
hazmatgiven that you've got a single server (*ignoring multi server for a moment) for your tests it would appear to be a bug in gozk12:40
hazmatthe unpredictable nature of the failures reinforces that guess, namely that something isn't properly waiting on results12:40
fwereadehazmat, ok, so, by "still likely async", do you mean "a line of code immediately following `err := conn.Delete("/some/path"); c.Assert(err, IsNil)` is not guaranteed to see the change"12:41
hazmatfwereade, you need to go deeper12:41
hazmatfwereade, conn.Delete is what?12:41
hazmata gozk binding to the libzk12:41
hazmatunderneath the hood its doing what12:41
fwereadehazmat, that is what I intend to do12:41
hazmati'd guess adelete12:41
hazmatwhich is async12:41
* hazmat takes a look12:42
hazmatinteresting12:43
fwereadehazmat, zoo_delete which I presume is not adelete12:43
AramDelete does zoo_delete which is synchronous.12:43
Aramgozk is sync.12:43
hazmatwow12:43
hazmatok..12:43
hazmatfwereade, so i'd suggest instrumenting delete and the subsequent client op with some prints12:44
AramI suspect you simply check against the wrong version.12:44
Aramthat's why you don't see the change.12:45
* Aram didn't read much of the backlog.12:45
hazmatso gozk is sync, and gojuju runs with a single thread ?12:45
hazmatAram, interesting idea12:45
hazmatAram, versions are only passed for modifications12:48
hazmatin this case its an observation that shows old state12:48
fwereadehazmat, sorry, lunch12:49
* hazmat moves onto openstack provider review12:49
hazmatfwereade, could pastebin the test in question?13:10
fwereadehazmat, the clearest situation is in line 10 of http://paste.ubuntu.com/1071241/ -- when it occurs, other connections are known to have been going around creating the node we're watching in the past13:18
hazmatfwereade, that looks like the same connection?13:20
fwereadehazmat, yes, it is an alternate connection in a previous test about which I am concerned13:20
niemeyerGooooood morning!13:21
Aramhi niemeyer.13:21
Aramhow was your trip?13:21
hazmatniemeyer, g'morning, how was i/o?13:22
Aramand how was SF?13:22
niemeyerhazmat: Superb13:22
niemeyerAram: Superb too :)13:22
niemeyerAram: Great to meet all the folks13:23
TheMueniemeyer: Heya13:25
Aramniemeyer: yeah, that must have been great.13:25
hazmatfwereade, what happens to the conn in setup/teardown?13:25
hazmatis there anyway to get the go test to be verbose about test cases being run?13:27
fwereadehazmat, in TearDownTest, recursively delete everything and (IIRC, checking) panic on error except nonode13:27
niemeyerhttp://arethegovideosupyet.com/ < This is great :)13:27
hazmatfwereade, so it is the same open connection for multiple tests?13:27
fwereadehazmat, -test.v -gocheck.vv should give you plenty13:27
fwereadehazmat, yes13:27
TheMueniemeyer: Yep, funny idea. Waiting for the next ones to be online.13:28
niemeyerHow're things going in juju-dev land?13:29
TheMueniemeyer: First a hurdle but now moving forward.13:32
niemeyerfwereade: So, hazmat tells me we're not testing our code properly because we have a bug. What's up there?14:42
fwereadeniemeyer, I am still trying to characterize it properly14:42
fwereadeniemeyer, the stars appear to be aligned such that I can repro more often than not14:43
niemeyerfwereade: Ah, this is perfect14:43
fwereadeniemeyer, but I am still trying to coax an "aha" out of the data14:43
niemeyerfwereade: http://paste.ubuntu.com/1071241/ is this the test?14:43
fwereadeniemeyer, that is one of the many that *can* exhibit anomalous behaviour14:43
fwereadeniemeyer, but the vast majority of the presence suite has been *slightly* flaky for a while, and I now appear to be close to pinning down the problem14:44
niemeyerfwereade: Ah, that's awesome14:44
niemeyerfwereade: Does it fail if you run just presence in isolation?14:47
fwereadeniemeyer, very very very rarely14:47
niemeyerfwereade: Your more-often-than not is achieved with a few packages, or just with the whole suite?14:47
fwereadeniemeyer, just state14:48
niemeyerNice14:48
niemeyerfwereade: What's the liveness timing being used by the tests?14:49
fwereadeniemeyer, 50ms, and there is certainly something tricksy there which I think I am close to accounting for14:49
fwereadeniemeyer, ie sometimes we get mtimes more than 100ms apart when we do that14:50
niemeyerfwereade: This may well be the issue14:50
niemeyerfwereade: The GC may be stopping to collect stuff14:50
fwereadeniemeyer, this is surely *part* of the issue14:50
fwereadeniemeyer, I am trying to eliminate it and see whether I can goose the weirder one into existence14:50
niemeyerfwereade: Does it change the situation if you set GOMAXPROCS=4?14:55
fwereadeniemeyer, hmm, quite possibly; but that's another dimension of phase space that I don't need right now I think15:03
niemeyerfwereade: Well, perhaps without the parallel collection that went into tip it wouldn't make much of a difference anyway15:04
fwereadeniemeyer, well, we need timing tweaks, but the real important ones will come in real usage I think15:10
fwereadeniemeyer, if we make them noticeably more generous than whatever I find to be rock-solid in test usage we will hopefully be ok15:11
niemeyerfwereade: Agreed15:11
niemeyerfwereade: Is there anything more unusual than the timing bits, that you'd like a second pair of eyes over?15:12
fwereadeniemeyer, if I can't figure it out today I will cry uncle, but I feel like I'm converging on something15:15
niemeyerfwereade: Superb, you have me excited on the other side meanwhile ;-D15:16
fwereadeniemeyer, cool :)15:16
=== niemeyer_ is now known as niemeyer
=== Aram2 is now known as Aram
niemeyerfwereade: Any luck there?18:22
fwereadeniemeyer, broke for supper at an opportune point; it looks like there was a case we'd missed in the code, subtly distinct from the normal failures due to GC pauses/whatever18:33
fwereadeniemeyer, state seems to be solid, given a fix for that18:34
fwereadeniemeyer, trying a few full runs18:34
niemeyerfwereade: Oh, so happy that you found it.. even if I don't know what the issue really is yet :-)18:34
fwereadeniemeyer, there's another bit of the issue which is trivial and embarrassing and kinda contributed to some of the confusion18:35
fwereadeniemeyer, not all the tests were nicely cleaning up pingers on failure18:35
fwereadeniemeyer, so I plan to do a pass for that too18:36
niemeyerfwereade: Aha, that was my initial guess at the problem18:36
niemeyerfwereade: Not to be embarrassed, though.. it's kind of easy to miss tear downs in any testing18:36
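One common way to make sure a pinger is stopped even when a test fails is to register the cleanup with `defer` as soon as the pinger exists. The sketch below uses a hypothetical `fakePinger` and `runCase` helper and models a failed assertion as a panic; it is not the presence package or gocheck.

```go
package main

import "fmt"

// fakePinger is a hypothetical stand-in for a presence pinger.
type fakePinger struct{ stopped bool }

func (p *fakePinger) Stop() { p.stopped = true }

// runCase runs body with a fresh pinger and guarantees Stop is called,
// whether body returns normally or fails by panicking.
func runCase(body func(p *fakePinger)) (p *fakePinger, err error) {
	p = &fakePinger{}
	defer func() {
		p.Stop() // cleanup always runs, so later cases start from a clean slate
		if r := recover(); r != nil {
			err = fmt.Errorf("test failed: %v", r)
		}
	}()
	body(p)
	return p, nil
}

func main() {
	p, err := runCase(func(*fakePinger) { panic("assertion failed") })
	fmt.Println(p.stopped, err) // true test failed: assertion failed
}
```

Without the deferred `Stop`, the pinger from a failed case keeps running into the next one, which is the kind of cascade described above.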
fwereadeniemeyer, the thing is, I knew about that -- the first failure often causes a cascade -- but there was always a particular mode of initial failure that didn't seem to match reality18:37
niemeyerfwereade: Well, I appreciate you going after root cause.. a lot of people just ignore the obvious hints and go for the trivial solutions18:38
fwereadeniemeyer, has to be done really18:38
fwereadeniemeyer, fwiw there's a very occasional store failure that I never remember to capture18:38
fwereadeniemeyer, primarily because the sheer weight of mongo logs is intimidating18:38
fwereadeniemeyer, I promise that next time I see it I will make a proper bug18:39
fwereade;)18:39
niemeyerfwereade: If it's the one I'm thinking of, it's just timing18:39
fwereadeniemeyer, sounds very plausible18:40
niemeyerfwereade: I feel bad for it.. I've been kind of postponing increasing the timing to see if it will force myself to get the test to run faster rather than going for the easy solution18:40
niemeyerfwereade: I can tell it's not working so far18:40
fwereadeniemeyer, I can sympathise18:40
=== philipballew_ is now known as philipballew
niemeyerbrb19:14
=== robbiew is now known as robbiew-afk
niemeyerfwereade: Dude21:59
niemeyerfwereade: There?21:59
=== robbiew-afk is now known as robbiew
niemeyerdavecheney: Morning!22:37
davecheneyniemeyer: howdy!22:38
niemeyerdavecheney: Good to see you around from the usual time zone :)22:39
niemeyerdavecheney: Less overlap, but at least it's easy to actually talk :)22:39
davecheneyindeed22:39
davecheneyhows things ?22:39
niemeyerPretty good. Just having a look at William's monster branch22:39
niemeyerLooks very nice22:39
davecheneyniemeyer: it would be awesome if that got a green light22:40
davecheneyi need his refactorings of zksuite etc for the local ec2 tests22:40
niemeyerdavecheney: Thanks for reviewing it too, btw.. really appreciate having more eyes22:40
davecheneyniemeyer: anytime22:40
niemeyerdavecheney: It will surely get a green light22:40
davecheneyit is big, and a lot of the changes are one line per file22:40
niemeyerdavecheney: huge improvement overall22:40
davecheneyindeed22:41
niemeyerdavecheney: The concerns I have so far are lateral.. e.g., mstate needs to be included22:41
davecheneyniemeyer: in related news, I have a fix for the location constraint issue in goamz22:46
davecheneybut am unsure how to write tests for it22:46
niemeyerdavecheney: Hmm22:57
niemeyerdavecheney: Can you please open a CL with the fix?22:58
niemeyerdavecheney: I can have a look and suggest something22:58
davecheneyniemeyer: twosecs22:58
davecheneyniemeyer: https://codereview.appspot.com/634405022:59
niemeyerdavecheney: Checking23:00
davecheneyniemeyer: i can add support for LocationConstraint parsing into s3 test if you like23:01
niemeyerdavecheney: Okay, so23:03
niemeyerdavecheney: The testing seems to be easy to do inside s3_test.go23:03
niemeyerdavecheney: We mock the server, and can easily compare the result against something we own.23:04
niemeyerdavecheney: We don't even need to parse it23:04
niemeyerdavecheney: Check out.. hmmm..23:04
niemeyerdavecheney: Well, we actually don't have an example yet23:05
niemeyerdavecheney: But the req we get out of WaitRequest is a normal http.Request23:05
niemeyerdavecheney: With a Body and all23:05
davecheneyniemeyer: yeah, I can address the TODO in the s3_test server23:05
niemeyerdavecheney: A second detail: it looks like this info is well suited for a Name field23:06
niemeyerdavecheney: Region.name23:06
niemeyerdavecheney: Region.Name23:06
davecheneyniemeyer: yes, there are a number of places where we want to convert from the Region type back to its canonical name23:07
davecheneyniemeyer: https://codereview.appspot.com/6347059 << adds Region.Name23:39
niemeyerdavecheney: The map is a nice touch, thanks23:40
niemeyerdavecheney: LGTM23:40
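A `Name` field plus a map keyed by that name gives conversion in both directions between a `Region` value and its canonical name. The sketch below is a cut-down, hypothetical version written for illustration; the field set, the `lookup` helper, and the endpoint values are assumptions and are not copied from goamz.

```go
package main

import "fmt"

// Region is a cut-down, hypothetical region description.
type Region struct {
	Name       string // canonical name, e.g. "eu-west-1"
	S3Endpoint string // illustrative value only
}

var (
	USEast = Region{"us-east-1", "https://s3.amazonaws.com"}
	EUWest = Region{"eu-west-1", "https://s3-eu-west-1.amazonaws.com"}
)

// Regions maps canonical names back to Region values.
var Regions = map[string]Region{
	USEast.Name: USEast,
	EUWest.Name: EUWest,
}

// lookup returns the region with the given canonical name.
func lookup(name string) (Region, error) {
	r, ok := Regions[name]
	if !ok {
		return Region{}, fmt.Errorf("unknown region %q", name)
	}
	return r, nil
}

func main() {
	r, err := lookup("eu-west-1")
	fmt.Println(r.Name, err) // eu-west-1 <nil>
	_, err = lookup("mars-1")
	fmt.Println(err) // unknown region "mars-1"
}
```

Keying the map by `Name` keeps the two directions consistent: the name stored in the value is the key used to find it.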
davecheneyniemeyer: that was a TODO from environs/ec223:40
niemeyerdavecheney: Ah, I didn't recall.. still looks like a good idea then! ;-)23:41
davecheneyniemeyer: i'll address the todo in juju after I commit the location constraint fix23:42
davecheneyso that people get a hint to go update goamz23:42
niemeyerSuper23:42
* niemeyer => dinner.. back soon23:43
davecheneyniemeyer: would you mind committing William's lp:~fwereade/juju-core/vast-zookeeper-tests-cleanup ?23:47
niemeyerdavecheney: Hmm.. there are a few trivial details there to be sorted out.. I think I'd prefer to let him consider these details, including the mstate stuff, to see how to push it forward before getting it in23:57
