/srv/irclogs.ubuntu.com/2013/01/17/#juju-dev.txt

TheMueMorning.07:45
rogpeppemornin' all07:53
fwereade_rogpeppe, TheMue: heyhey07:56
rogpeppefwereade_: hiya07:56
TheMuefwereade_, rogpeppe: Heyhey.08:01
rogpeppeTheMue: yo!08:01
TheMueBtw, TestAttemptTiming in trivial seems to have a failure. The final test loop traverses 'want' and then compares the wanted values with the wanted values. ;)08:12
TheMueFound it yesterday when adding interval behaviours (static, linear, exponential).08:13
rogpeppeTheMue: oops08:24
rogpeppeTheMue: i'll have a look08:24
TheMuerogpeppe: It'll be fixed with my CL.08:26
rogpeppeTheMue: cool. i should've broken the test first!08:27
TheMuerogpeppe: But the test itself is cool, I use the same pattern for the behaviour tests.08:29
rogpeppeTheMue: yes, when it's fixed, the test still passes :-)08:30
TheMuerogpeppe: +108:30
TheMuerogpeppe: Only change is that I now work with x * time.Millisecond etc. instead of simple fixed values.08:31
rogpeppeTheMue: seems reasonable. athough i actually found the 0.5e9 idiom quite readable - ignoring the e9, it's just seconds.08:32
TheMuerogpeppe: Indeed, keeping the time base in mind.08:33
TheMuerogpeppe, fwereade_: Anyone got a "301 response missing Location header" when doing juju status --debug ?09:17
rogpeppeTheMue: i haven't used juju status --debug, i don't think.09:18
rogpeppeTheMue: i'll give it a go09:18
TheMuerogpeppe: Thx09:18
fwereade_TheMue, don't think I've ever run it with --debug :)09:18
TheMuerogpeppe: Last time it worked. I'm trying it on ap-southeast-109:18
TheMuefwereade_: Dave showed it in his bug, it logs the reconnection tries.09:19
rogpeppeTheMue: you're trying to reproduce dave's bug?09:20
TheMuerogpeppe: Yes, I once had a solution where he said he would like a different one. Now I'm trying it and get this error by EC2.09:21
TheMuerogpeppe: It's not Daves bug, he only entered it: https://bugs.launchpad.net/juju-core/+bug/108486709:21
_mup_Bug #1084867: conn: conn connections should pause between retry attempts <juju-core:In Progress by themue> < https://launchpad.net/bugs/1084867 >09:21
fwereade_TheMue, rogpeppe: btw, https://codereview.appspot.com/7095059/ has a couple of changes and some discussion on death-and-destruction.txt; I would be most grateful for your thoughts09:22
TheMuefwereade_: Will take a look.09:22
fwereade_TheMue, rogpeppe: also: https://codereview.appspot.com/7095049/ is a 50-line text document with one LGTM already, if anyone fancies taking a quick look at that09:23
* TheMue detects some kind of review beggin'. ;)09:23
rogpeppefwereade_: i think that using string consts rather than ints "because someone may want to log something sometime" is perhaps misguided. if i see a string, i generally assume it's because of some external constraint (e.g. it's going into a database or something). iota consts work well.09:29
rogpeppefwereade_: and they involve less code (runtime and bytes of source)09:29
fwereade_rogpeppe, heh, when I see int constants I assume that overwhelming efficiency considerations demand the use of opaque identifiers ;p09:29
rogpeppefwereade_: naah, they're just the natural thing to use in Go09:30
fwereade_TheMue, I'm trying not to overdo it ;p09:30
rogpeppefwereade_: they're only opaque if you print 'em (and even then, it's trivial to work out which one is which)09:31
TheMuefwereade_: NP09:34
TheMuerogpeppe: IMHO numbers are for calculations :)09:34
rogpeppeTheMue: what's an if statement if not a calculation? :-)09:35
rogpeppefwereade_: when i queries "maintain the service" i thought it was a typo. doesn't endpointRemoveOps return the operations necessary to remove an endpoint?09:35
rogpeppefwereade_: "the operations that are needed to maintain the service" doesn't mean much to me09:36
TheMuerogpeppe: It's a kind of consideration, like human beings do.09:36
fwereade_rogpeppe, it's the operations that apply ot the given endpoint in the course of removing the relation09:36
fwereade_rogpeppe, to the endpoint's service, rather09:36
rogpeppefwereade_: so do you mean: "returns the operations that are needed to maintain the service identified by the supplied endpoint when removing a relation" ?09:38
rogpeppefwereade_: i look at that comment, and it doesn't seem to tell me anything about what the returned operations are for09:38
rogpeppefwereade_: i can try to guess from the function name, but i'm not sure i'm guessing right09:38
fwereade_rogpeppe, yeah, the "when removing..." will help, thanks09:39
fwereade_rogpeppe, still not convinced that iota gains us anything beyond a few bytes, or that the gain is worth the cost in clarity09:45
rogpeppefwereade_: i don't see any clarity gain. only that you can print out a slightly nicer debug message if you're debugging this code.09:46
fwereade_rogpeppe, that is what I am referring to09:46
rogpeppefwereade_: how is that "clarity"?09:46
rogpeppefwereade_: the code becomes slightly simpler09:47
rogpeppefwereade_: the executable slightly smaller. and if you're debugging that code, seeing 0, 1 or 2 in that context will probably be just fine.09:47
fwereade_rogpeppe, it is clarity for the maintainer logging wtf goes on there, and not making him have to look up what 0, 1, or 2 mean09:47
rogpeppefwereade_: if i'm adding a debug print there, i know i'm printing a removeMode and it's utterly trivial to find the definition.09:49
fwereade_rogpeppe, ISTM that you are repeatedly explaining why strings are superior09:49
fwereade_rogpeppe, are you seriously concerned that "destroy-service" is unclear, or that the executable size will balloon unacceptably?09:51
fwereade_rogpeppe, those are the negative consequences of using strings, and I don't really feel they carry much weight in our situation09:51
fwereade_rogpeppe, the negative consequence of using an iota is that it becomes slightly harder to debug the code09:51
rogpeppefwereade_: i think it's a visceral reaction to unnecessary executable size bloat09:52
rogpeppefwereade_: plus the source becomes a little smaller too09:52
TheMuerogpeppe: Do we have a source size problem? ;)09:55
rogpeppefwereade_: ok, i do realise that the gains are minute (42 bytes on executable size and 80 in the source), y'know, death by a thousand cuts. hmm, i'm not really sure why i get such a strong reaction.09:57
rogpeppeTheMue: actually we do, but it won't be solved in this kinda way09:57
rogpeppeTheMue: not source size, sorry, binary size, yeah09:57
TheMuerogpeppe: Oh, we have a binary size problem? Good to know. In which way?09:58
rogpeppeTheMue: it takes ages to upload the binaries.09:58
fwereade_rogpeppe, I do understand the feeling but in my own mind this battle played out a long while ago and I went with "always use strings for constants unless there is an overriding reason not to"09:58
* TheMue comes from a world running fat binary 24/709:58
TheMuerogpeppe: OK, sounds reasonable.09:59
rogpeppeTheMue: we can easily cut the executable size by at least a third, by combining jujuc and jujud09:59
rogpeppeTheMue: it's on my todo list on the kanban...10:00
TheMuerogpeppe: Fine, so it is covered.10:00
TheMuerogpeppe: It's a different kind of problem than in my former jobs, where software is only deployed once per release over a local network.10:01
rogpeppefwereade_: in general i'd use int constants, and possibly add a String method if i needed to debug 'em.10:01
rogpeppefwereade_: i've been meaning to hack up a quick automatic String method generator for constants actually10:01
rogpeppefwereade_: then if you need to debug, you just run it to generate the necessary source code, and remove when done.10:02
fwereade_rogpeppe, yeah, if something like that was easy to add I'd be fine with it -- but doing it by hand pushes me over to "just use strings" ;)10:02
fwereade_rogpeppe, but I generally prefer to keep my source code modifications to an absolute minimum when debugging things10:03
fwereade_rogpeppe, so I think it'd only really work for me if it were a code-generation step10:03
fwereade_rogpeppe, that was always applied10:03
rogpeppefwereade_: i think it's probably my experience with the go core. there, you never get symbolic string constants - constants of that nature are always iota ints. and that works pretty well.10:08
fwereade_rogpeppe, "always code as though the maintainer will be an angry lunatic who knows where you live" ;)10:14
rogpeppe:-)10:15
fwereade_rogpeppe, TheMue: addressed https://codereview.appspot.com/7137044 I think10:17
rogpeppefwereade_: LGTM10:31
rogpeppefwereade_: and on doc-entity-creation too10:38
fwereade_rogpeppe, awesome, tyvm10:39
TheMueAh! It's working. One should use the EC2 console right.11:04
TheMuerogpeppe, fwereade: https://codereview.appspot.com/6949044/ is in for review again.11:24
TheMuefwereade: Now I'll continue with your reviews.11:24
fwereadeTheMue, thanks11:27
niemeyerMorning all11:28
fwereadeTheMue, https://codereview.appspot.com/7137044/ should be quick and easy if you haven't started the others yet11:28
fwereadeniemeyer, heyhey11:28
TheMueniemeyer: Hiya.11:28
fwereadeniemeyer, thanks for the comments on death-and-destruction, I think your points are now addressed or justified11:29
TheMuefwereade: Started with the ...5059, but will now do this one first.11:29
fwereadeniemeyer, btw, you know all the stateful table-based watcher tests we have?11:30
niemeyerfwereade: Cool, thanks, I'll have a look11:30
niemeyerfwereade: Yeah11:30
fwereadeniemeyer, I think I've convinced myself that they'e just entirely Bad, and should be replaced with literate inline tests (broken out into separate scenarios where possible)11:31
fwereadeniemeyer, and also that many of them are just testing way two much -- "add two services", "add three services", etc11:31
fwereadeniemeyer, general agreement?11:32
niemeyerfwereade: Yeah, definitely11:32
fwereadeniemeyer, I don't want to spend too much time on it but I have to take an axe to machine_test.go to finally get rid of AddUnitSubordinateTo, so I think that at least will just get a full rewrite11:33
mgzfwereade: was dimitern having more internet woes?11:34
fwereademgz, not that I know of11:34
niemeyerfwereade: Ah, makes sense11:34
TheMuefwereade: LGTM (and lunchtime ;) ).11:35
fwereadeTheMue, <311:35
fwereademgz, although now I think of it he did mention having an engineer come round; shall I text him?11:35
mgzfwereade: not urgent, but hopefully he gets online again :)11:36
fwereademgz, I'll ping him but will be off for lunch in a mo myself11:36
mgzwallyworld: looks like bug 109166911:39
_mup_Bug #1091669: GET Public-readable container's object with another user credentials is raising 403(Forbidden) exception <OpenStack Object Storage (swift):New> < https://launchpad.net/bugs/1091669 >11:39
mgzwhich looks like a dupe of bug 102072211:40
_mup_Bug #1020722: swift_auth middleware disallows access to public Swift URLs <OpenStack Object Storage (swift):In Progress by dan-prince> < https://launchpad.net/bugs/1020722 >11:40
mgzjam: want to hop on mumble?11:43
rogpeppefwereade, TheMue, niemeyer: a small CL: https://codereview.appspot.com/712804711:45
fwereaderogpeppe, gtg lunch I'm afraid, will be slightly extended today11:46
rogpeppefwereade: np11:46
rogpeppe niemeyer: trivial? https://codereview.appspot.com/713704812:22
niemeyerrogpeppe: LGTM12:25
rogpeppeniemeyer: thanks12:25
wallyworldfwereade: g'day, i've pushed some changes to the mp we have been discussing, are you able to take another look?12:30
niemeyerfwereade: Provided another partial review on https://codereview.appspot.com/7095059/13:06
niemeyerfwereade: Hopefully not too off track..13:22
rogpeppeniemeyer: there's a discussion about the juju-core API about to happen, if you fancy joining in13:30
rogpeppehazmat: is it happening?13:31
hazmatrogpeppe, its happening13:32
hazmatmramm, rogpeppe https://plus.google.com/hangouts/_/7eba46242c127e6b2d38ec9e3db2a8a5898dece913:33
rogpeppehazmat: ok, joining now13:33
fwereadeniemeyer, responded14:18
niemeyerfwereade: Cool, will see after the meeting, thank you14:18
fwereadeniemeyer, cheers14:18
=== slank_away is now known as slank
=== TheRealMue is now known as TheMue
niemeyerfwereade: ping14:31
fwereadeniemeyer, pong14:32
niemeyerfwereade: Hey14:32
niemeyerfwereade: Just going over your review now14:32
TheMuefwereade: Thx for the LGTM.14:32
fwereadeniemeyer, cool, thanks14:32
fwereadeTheMue, yw :)14:32
niemeyerfwereade: Some of it is mainly doc it seems14:32
fwereadeniemeyer, yeah, clearly I've failed to explain endpointRemoveOps well14:33
niemeyerfwereade: Some of it seems pretty subjective, requiring knowledge of the whole plan to extract the true meaning14:33
TheMueSh**, build a KVM VM for MAAS failed.14:33
fwereadeniemeyer, that has been on my mind; the trouble is I'm not sure how to do it without smearing multiple copies of the same information through half a dozen places14:33
niemeyerfwereade: Imagine I say something like this: "Hey William, would you mind to implement a method that returns the operations that are needed to maintain global consistency with respect to the service identified by the supplied endpoint during the relation's removal?14:34
niemeyer"14:34
niemeyerfwereade: :)14:34
fwereadeniemeyer, of *course* I would -- what magnificent wordsmith created that wonderful prose!?14:34
* fwereade kids14:34
fwereadeniemeyer, it may be that I've picked the wrong structure for the code14:35
fwereadeniemeyer, hmm, a thought14:35
niemeyerfwereade: I think it'd simplify matters in that specific case to drop the logic that checks mode and service name from within the method14:36
niemeyerfwereade: Have you considered the use of (leavingScope bool, destroyService string) instead of mode?14:37
niemeyerOr destroyingService, rather14:37
fwereadeniemeyer, destroyingService is misnamed in that case14:37
niemeyerfwereade: You actually already opted for the latter in one case14:37
fwereadeniemeyer, when that var means "this is the service that's being destroyed", that is indeed what I named it14:38
fwereadeniemeyer, in endpointRemoveOps it means "the service that is relevant to the supplied mode"14:38
fwereadeniemeyer, which, in the case of modeLeaveScope, is the service of the departing unit (ie the service that does not need to be removed)14:38
fwereadeniemeyer, would it maybe help if mode and serviceName were packed into a strcut with documentation right there?14:39
niemeyerfwereade: Gosh.. it's worse than I thought then :-(14:39
niemeyerfwereade: serviceName is a wildcard variable that refers to *a* service name.14:40
fwereadeniemeyer, I was not explicit that mode was modified by serviceName, but I feel that was expressed a touch robustly ;)14:40
fwereadeniemeyer, yes: a service name whose actual meaning is determined by mode14:40
niemeyerfwereade: Yeah.. this is super confusing14:41
fwereadeniemeyer, the obvious answer is to duplicate the common operations in the various places we need to remove relations14:41
fwereadeniemeyer, I'm not quite sure that's an actual improvement though14:41
niemeyerfwereade: I'm not suggesting duplication.. I'm just looking for ways to clarify what is actually going on there14:42
niemeyerfwereade: I obviously got a few things wrong in an honest attempt to understand what was written14:42
fwereadeniemeyer, what do you think of the struct idea?14:42
niemeyerfwereade: Yeah, that seems like an improvement for sure14:43
fwereadeniemeyer, I wondered about tying mode and serviceName together but worried that it would be dismissed as premature fancification :)14:43
fwereadeniemeyer, clearly it's not though14:43
niemeyerfwereade: One of the core points is that it shouldn't be "serviceName"14:43
niemeyerfwereade: It doesn't make sense to have leaveScope.serviceName, for example14:43
fwereadeniemeyer, even if it's always the name of a service?14:43
niemeyerfwereade: and that's one of the issues14:43
fwereadeniemeyer, it identifies the service that cannot be removed as a result of the change14:44
niemeyerfwereade: Yes, and what the heck that means again? :-)14:44
niemeyerWhat's a service that "cannot be removed" as a result of which change?14:44
fwereadeniemeyer, I had hoped that death-and-destruction.txt made this stuff clearer than it managed to14:45
fwereadeniemeyer, when removing a relation during a destroy operation, we know that both services are Alive14:45
fwereadeniemeyer, when leaving scope, we don't14:46
fwereadeniemeyer, but when leaving scope we know that a unit is leaving scope; and hence we know its service has at least one unit; and hence we know we can't ever need to remove it14:46
fwereadeniemeyer, we have no such guarantees re the service on the other side of the relation, so we need to check its state to see whether it's Dying and only referenced by the relation being removed14:47
fwereadeniemeyer, this is not the biggest deal ever -- we could just check both services just the same when leaving scope -- but it's an extra roundtrip that seemed like a bad idea14:48
niemeyerfwereade: Thanks for the explanation14:49
niemeyerfwereade: I don't think I made myself clear regarding in the previous point, sorry14:49
niemeyerfwereade: The point is that there's a specific meaning for that service name14:49
niemeyerfwereade: In the context of the caller14:49
rogpeppelunch14:49
niemeyerfwereade: It doesn't make sense to call that leaveScope.theServiceThatCantBeRemove14:49
niemeyerd14:50
niemeyerfwereade: You see what I mean?14:50
niemeyerfwereade: THere's a lot of implicit knowledge that must be distilled into something tangible when we reach the code14:50
niemeyerfwereade: The death-and-destruction document is great as reasoning for what and how things must be put in place14:50
niemeyerfwereade: We just need to be careful to implement it in a way we'll still understand when we come back in two weeks14:51
fwereadeniemeyer, hmm: I thought we were talking about something like `type relationRemoval struct {mode removeMode; info string}`14:52
niemeyerfwereade: Stuff like lack of docs, unnamed results, magic variables that mean completely different things based on caller context, etc, will make this process very painful14:52
niemeyerfwereade: Hmm.. I don't think that helps much14:53
niemeyerfwereade: You're just putting the exact same parameters in a struct14:53
niemeyerfwereade: And what's "info string"?14:54
fwereadeniemeyer, or "payload", or whatever14:54
niemeyerfwereade: What's that?14:54
niemeyerfwereade: What's a payload of a relationRemoval?14:54
fwereadeniemeyer, its meaning depends on the value of the mode -- are you implacably opposed to this technique, or are you just pointing out that the names I have so far come up with are rubbish?14:55
niemeyerfwereade: I think you're moving in good direction, but the underlying structure still needs some clarification.. I don't think the change is big, btw14:55
fwereadeniemeyer, sure -- I am as keen as you are to find ways to simplify this14:56
niemeyerfwereade: Cool, just a sec14:56
fwereadeniemeyer, would you expand on your leaveScope.something idea?14:57
niemeyerfwereade: Yeah, I'm just writing down some sample14:58
fwereadeniemeyer, (incidentally -- the unnamed result is sometimes useful but not always, and so it's left unnamed in the cases where it's not used; more than oneperson has flagged this, and I'm somewhat baffled by it)14:59
niemeyerfwereade: Okay, I had another pass at the code, and here is another proposal15:01
* fwereade listens15:02
niemeyerfwereade: As far as the local logic is concerned, and as long as I'm not missing something else (possibly I am), there are only two factors that distinguish what these methods are doing:15:02
niemeyerfwereade: 1) A service is being destroyed15:02
niemeyerfwereade: 2) A service is necessarily alive15:02
niemeyerfwereade: Is that correct?15:03
fwereadeniemeyer, I think that is a reasonable way of looking at it, yes15:03
niemeyerfwereade: Can we express this as f(survivingService, destroyingService string)?15:04
fwereadeniemeyer, I don't think it quite works15:05
niemeyerfwereade: Okay, let's see then15:05
fwereadeniemeyer, I think the problem is in (2)15:06
fwereadeniemeyer, trying to express the cases in relevant terminology15:06
fwereadeniemeyer, actually no15:06
fwereadeniemeyer, thinking, don't wait for me to type ;)15:07
niemeyerfwereade: Hehe :)15:07
=== meetingology` is now known as meetingology
=== _mup__ is now known as _mup_
fwereadeniemeyer, there are sometimes services we never want to modify; all other services always need to be decreffed; sometimes a decreffed service may need to be removed15:09
fwereadeniemeyer, so actually we *can*, but we need to fix the names15:09
fwereadeniemeyer, ignoreService, maybeDestroyService?15:09
fwereadeniemeyer, or perhaps destroyingService, leavingScopeUnit (ha, eww) is cleanest15:10
fwereadeniemeyer, departingUnit15:11
fwereade(btw are we doing the kanban review?)15:11
fwereadeniemeyer, I worry that those won;t make the call sites very clear though15:12
niemeyerfwereade: Hmm, that looks good15:12
fwereadeniemeyer, definitely worthy of further investigation though15:12
niemeyerfwereade: So what would departingUnit be/15:12
niemeyer?15:12
fwereadeniemeyer, the unit whose departure from scope caused the removal of the relation15:12
niemeyerfwereade: The actual *Unit15:13
niemeyer?15:13
fwereadeniemeyer, I was expecting a name15:13
niemeyerfwereade: Can we have just that?15:13
fwereadeniemeyer, all I need from it is the service name15:13
fwereadeniemeyer, maybe cleaner to pass in the unit15:13
niemeyerfwereade: So departingUnit is a service name? That's confusing.15:14
fwereadeniemeyer, no... it's a unit name, from which I will extract a service name15:14
niemeyerfwereade: Might as well be the *Unit, if we have that in the call sites15:14
niemeyerfwereade: (that need to provide it)15:14
fwereadeniemeyer, yeah, it's available15:15
niemeyerfwereade: Cool, that'll be a lot clearer15:15
fwereadeniemeyer, ok, fantastic -- might even let me write removeOps such that I don't need endpointRemoveOps15:16
fwereademaybe15:16
niemeyerfwereade: Woot15:16
niemeyerfwereade: Okay, a parenthesis just for a second now that we're good on that front:15:16
niemeyerfwereade: <fwereade> niemeyer, (incidentally -- the unnamed result is sometimes useful but not always, and so it's left unnamed in the cases where it's not used; more than oneperson has flagged this, and I'm somewhat baffled by it)15:16
fwereadeniemeyer, ah yes15:17
niemeyerfwereade: I *think* I was talking about something else, since hopefully it was uncontroversial15:17
niemeyerfwereade: By unnamed result I meant in the method signature15:17
fwereadeniemeyer, ahhh ok, sorry, I completely misunderstood15:17
niemeyerfwereade: func f() (A, B, bool, error) isn't great15:18
niemeyerfwereade: Since the bool is completely unreadable15:18
niemeyerfwereade: All one can tell is that there's something being returned that is true or false :)15:18
fwereadeniemeyer, I write doc comments with the expectation that people will read them ;p15:18
niemeyerfwereade: Funny enough, the parameter was *also* undocumented in that case :-)15:19
niemeyerfwereade: Well, or maybe it was and I just can't read15:19
niemeyerfwereade: Yeah, "and whether"15:20
fwereadeniemeyer, I think we're talking about the bool documented as "whether those operations will lead to the relation's removal"15:20
niemeyerfwereade: Cool, my bad15:20
fwereadeniemeyer, np15:20
niemeyerfwereade: Still, the name would be useful for folks like me that can't read ;-)15:20
fwereadeniemeyer, good idea, chers15:21
niemeyerfwereade: I hope you're okay with the piecemeal reviews I've been providing15:21
fwereadeniemeyer, absolutely15:21
niemeyerfwereade: The logic is so dense that I find tricky to sustain attention for too long15:21
fwereadeniemeyer, I am not unfamiliar with that feeling myself ;)15:22
fwereadeniemeyer, and it's useful besides15:22
fwereadeniemeyer, so np at all15:22
niemeyerfwereade: Neat, thanks15:22
niemeyerfwereade: I'll provide more pieces today15:22
fwereadeniemeyer, lovely, thanks15:22
niemeyerSome food first, though!15:23
niemeyerbiab15:23
fwereadeniemeyer, enojy15:23
rogpeppeback15:45
rogpeppeoops, did i miss the kanban meeting? or didn't it happe?15:50
rogpeppen15:50
TheMuerogpeppe: it didn't happen, right now niemeyer is at lunch15:51
rogpeppeTheMue: cool. i went for a lunchtime walk and totally forgot about it.15:52
* TheMue usus this time to shrink his vmware image, gone too large during testing for maas15:52
rogpeppeTheMue: have you actually verified that your retry changes fix the problem?16:02
rogpeppeTheMue: that is, have you tried running live tests with your branch?16:03
TheMuerogpeppe: yep, on southeast and east with --debug, looks good16:03
rogpeppeTheMue: did you run it with -gocheck.vv and see the attempts get more infrequent?16:04
rogpeppeTheMue: i have a suspicion that the changes you've made won't change the usual redial case16:04
TheMuerogpeppe: not that way, but i can do it once my image is up again16:04
rogpeppeTheMue: the place you've changed (in juju.NewConn) is very rarely hit16:05
TheMuerogpeppe: i have been able to see it like in the lp bug before, that is now gone16:05
TheMuerogpeppe: Dave suggested it16:05
rogpeppeTheMue: i'm pretty certain that the place that needs an exponential backoff is actually the mgo package16:05
TheMuerogpeppe: there I've placed it before (but linear). you've seen Daves comments?16:06
rogpeppeTheMue: looking16:07
rogpeppeTheMue: i don't believe dave has it right16:08
rogpeppeTheMue: the problem is that mgo.DialWithInfo continually retries with no delay16:09
TheMuerogpeppe: yes, that's why I first placed it there16:10
rogpeppeTheMue: i didn't see a CL on the mgo package16:10
rogpeppeTheMue: can you point me to it?16:10
TheMuerogpeppe: earlier changes in the same cl16:10
rogpeppeTheMue: the correct fix would be in the mgo package itself16:11
TheMuerogpeppe: so you say we have three nested retry loops?16:11
rogpeppeTheMue: i think we've only got two, and one in the usual case.16:11
mrammrogpeppe: I had a doctors appointment so I had to miss the kanban meeting16:12
rogpeppemramm: ah, nice coincidence!16:12
rogpeppeTheMue: mgo has a redial loop and juju.Conn has a redial loop, but only in the ErrUnauthorized case, which only happens for as long as juju bootstrap takes to run, which is about 0.5s usually16:13
rogpeppeTheMue: 60s is vast overkill there16:13
rogpeppeTheMue: but it's there to cater for VM scheduling variance16:14
rogpeppeTheMue: BTW it's nice to see an exponential backoff AttemptStrategy, but given that the usual point of exponential backoff is to avoid thundering herds, i'd be tempted to add a random element too.16:15
TheMuerogpeppe: as own behaviour?16:16
rogpeppeTheMue: ?16:16
TheMuerogpeppe: the random element. do you think of a random time behaviour or a random factor inside the exponential behaviour? and if so than in which way?16:17
rogpeppeTheMue: i'd start with a random delay, then multiply that by a constant factor, adding or subtracting a smallish random delta each time16:18
rogpeppeTheMue: you might want to google for best practice in this area though - the above is just a guess16:19
TheMuerogpeppe: what's the motivation behind it?16:19
rogpeppeTheMue: if a state server goes down and we've got 10000 clients, each client is going to see the server go down at the same time, then initiate the same redial loop.16:20
rogpeppeTheMue: so we'll get a pattern of huge surges, when we want the redials spread out over time16:20
rogpeppeTheMue: but as i said, juju isn't doing the redial in this case, so the right fix is in mgo, and you'll have to ask niemeyer what his preferred method is there.16:21
TheMuerogpeppe: ok, randomize the initial delay and then also a random base number to get a more shallow or more steep exponential behavior, sounds good16:23
rogpeppeTheMue: no, i think the exponent should probably be constant. but if we perturb the numbers each time, the different clients will tend to diverge (although i'd have to think hard if i wanted to show that mathemetically :-])16:25
rogpeppeTheMue: but for the time being, i think this CL isn't helping anything. i'd leave the exponential backoff alone for now, we don't need it.16:26
TheMuerogpeppe: all with the same random seed based on the same utc. ;)16:26
rogpeppeTheMue: i wouldn't seed based on the utc16:26
TheMuerogpeppe: /me too, just a joke16:27
TheMuerogpeppe: could you please enter your thoughts into the cl as a comment so that we can update Dave?16:31
rogpeppeTheMue: will do16:31
TheMuerogpeppe: great, tyvm16:31
rogpeppeTheMue: here's a fun way of generating a seed without using time or importing crypto/rand: http://play.golang.org/p/Bd2cIFoJ-L16:33
niemeyerfwereade: I didn't know "white lie" was an international term, btw :)16:38
fwereadeniemeyer, heh, nor did I16:39
niemeyerfwereade: ping16:58
fwereadeniemeyer, pong16:58
niemeyerfwereade: Yo16:59
fwereadeniemeyer, how goes?16:59
niemeyerfwereade: Trying to understand the logic in Service.destroyOps that builds the relation removal ops16:59
* fwereade looks at it16:59
niemeyerfwereade: It looks like the way in which the ops are prepared is only guaranteed by a count17:00
niemeyerfwereade: But there's apparently no guarantee that the count, being the same, refers to the right relations17:01
fwereadeniemeyer,     asserts := D{{"life", Alive}, {"txn-revno", s.doc.TxnRevno}}17:01
fwereadeniemeyer, ah, no, that's not necessarily enough is it...17:01
* fwereade thinks17:02
niemeyerfwereade: It might be17:02
niemeyerfwereade: Since an inc or dec must touch it17:02
fwereadeniemeyer, yeah, I'm trying to remember ;)17:02
niemeyerfwereade: But then what's the role of the relation count there?17:02
fwereadeniemeyer, not sure what the question is exactly, but let me try:17:03
dimiternfwereade: can you take a look at this: https://codereview.appspot.com/7133043/ - if you think it's fine to land this it'll simplify my branch on bootstrapping17:03
fwereadeniemeyer, if getting the service's relations gets us the "wrong" number, we should definitely refresh and retry17:04
fwereadedimitern, soon :)17:04
dimiternfwereade: 10x17:04
fwereadeniemeyer, if we have the right number, that's enough reason to speculatively go ahead17:04
fwereadeniemeyer, but we can only determine whether it's ok to remove the service by counting the relations we need to remove and seeing whether it's the same as the relations we know we have17:05
fwereadenumber of relations we know we have^^17:05
fwereadeniemeyer, if it isn't, then at least one relation is left Dying, and we can't remove the service until the relation is removed17:06
fwereadeniemeyer, not sure if I've covered what you're asking...17:06
niemeyerfwereade: Yeah, I think so17:07
fwereadeniemeyer, still not sure about whether there's a hole, though, need to think about that separately17:07
niemeyerfwereade: I'll recommend a comment in that location17:08
niemeyerfwereade: and a second one on removeCount == RelationCount17:08
fwereadeniemeyer, ok, sgtm17:08
niemeyerfwereade: Explaining how we can reach that state17:08
niemeyerfwereade: This is the first one:17:09
niemeyer/ This is just an early bail out. The relations obtained may still be wrong,17:09
niemeyer/ and that'll be asserted by the txn-revno comparison below.17:09
niemeyerActually, I'll reword slightly, but that's the idea17:10
fwereadeniemeyer, (I have thus far been unable to construct a sequence of events that screws up the relations without also hitting txn-revno)17:10
fwereadeniemeyer, yep, sounds sane17:10
rogpeppedimitern: looking17:12
fwereade(ty rogpeppe, sorry dimitern, still a bit distracted by this conversation)17:13
dimiternfwereade: np :)17:16
dimiternrogpeppe: 10x, and the other one closely related is https://codereview.appspot.com/7086055/ - I really'd like to have these landed soon17:17
niemeyerfwereade:17:18
niemeyer/ Relations that aren't Alive won't have been removed because17:18
niemeyer/ they're already on their way to death by other means.17:18
niemeyer/ If that's the case, do not remove the service either,17:18
niemeyer/ and let it be dealt with together with the dying relations.17:18
niemeyerfwereade: Is that it17:18
niemeyer?17:18
fwereadeniemeyer, yeah, well put17:19
niemeyerfwereade: Supa17:19
fwereadeniemeyer, just reproposed with nicer Relation.removeOps17:20
fwereadeniemeyer, endpointRemoveOps did disappear :)17:21
niemeyerfwereade: Woohay!17:22
fwereadeniemeyer, btw, I think we can drop the txn-revno check on service destroy, it's an awfully heavy stick to use17:27
niemeyerfwereade: Oh17:29
fwereadeniemeyer, we can just add a D{{"life", Dying}} assert for each relation that isn't removed17:29
niemeyerfwereade: and how do we tell that there are no new relations?17:29
fwereadeniemeyer, in that case, for the relation count to match but the relations to be wrong, the missing relation is guaranteed to fail a txn17:29
rogpeppedimitern: both reviewed17:30
dimiternrogpeppe: tyvm17:30
fwereadeniemeyer, (we can't hit that situation without an equal number of adds and removes)17:30
fwereadeniemeyer, (we also need to assert unitcount in that case ofc)17:30
niemeyerfwereade: Well, yes.. and?17:30
niemeyerfwereade: I mean, that's a real situation17:30
fwereadeniemeyer, right, and if we have an assert for the state of every initially-known relation we're fine, because any of those that are removed will fail an assert and we'll retry17:31
fwereadeniemeyer, right?17:32
niemeyerfwereade: Sounds sane.. thinking17:32
fwereadeniemeyer, I'd really like to drop the txn-revno check, it changes at the drop of a hat ;p17:32
niemeyerfwereade: Yeah, that should be sane.. assert on obtained relation count == service relation count, plus asserts on all individual relations not removed17:34
fwereadeniemeyer, cool17:35
niemeyerfwereade: The only thing that could blow that scenario up is  a relation added without being perceived, and that can't happen in that case17:35
fwereadeniemeyer, yeah, I think so17:35
fwereadeniemeyer, (I'm *really* coming to appreciate mgo/txn :))17:36
niemeyerfwereade: Neat, that's a relief :)17:36
niemeyerfwereade: Btw, I've just sent the notes on service.go17:36
fwereadeniemeyer, lovely, thanks17:37
fwereadeniemeyer, let me know what you think of the relation.go changes :)17:37
niemeyerfwereade: state/service.go:422 is the only thing that is perhaps worth of a sync up17:38
niemeyerfwereade: Cool, will refresh and look again17:38
fwereadeniemeyer, just saw that one, thinking a mo17:38
fwereadeniemeyer, heh, let me hunt around a mo, I think that's addressed in the followup...17:40
fwereadeniemeyer, yeah, you're absolutely right, I forgot to backport that one17:40
fwereadeniemeyer, although actually I think it should maybe assert D{{"life", Dying}, {"unitcount", 1}, {"relationcount", 0}}17:42
fwereadeniemeyer, just because txn-revno is less stable than those17:43
niemeyerfwereade: Which unit is that 1?17:43
niemeyerfwereade: Well, I suppose there will be asserts on the unit itself on the same transaction17:43
* niemeyer looks again17:43
fwereadeniemeyer, that's the unit being removed17:43
fwereadeniemeyer, it's in removeUnitOps17:44
niemeyerfwereade: Yeah, DocExists, ool17:44
niemeyercool17:44
fwereade(heh, laura is asking cath for a "darky-light black pen")17:47
fwereade(apparently, what she meant was a "brown pencil")17:48
niemeyerfwereade: ROTFL17:49
niemeyerfwereade: darky-light is a much nicer name than brown, FWIW17:50
niemeyer:-)17:50
niemeyerfwereade: The new removeOps is great17:50
fwereadeniemeyer, yeah, thanks for pressing me on that one17:51
niemeyerfwereade: np17:51
niemeyerfwereade: Okay, those were the big chunks it seems.. I'll do a pass on everything else now17:52
fwereadeniemeyer, sweet17:52
niemeyerfwereade: The test changes makes it pretty obvious that the changes performed are an API improvement18:09
niemeyerfwereade: Besides making things right18:10
fwereadeniemeyer, cool, glad you like it :)18:10
fwereadeniemeyer, bah, supper is on the table, I should stop18:10
fwereadeniemeyer, I'll pop back on later for a final check/propose if I can18:11
niemeyerfwereade: Beautiful.. you'll probably have a LGTM with minor comments if any18:11
=== mthaddon` is now known as mthaddon
rogpeppei have to go now.18:13
=== negronjl` is now known as negronjl
niemeyerrogpeppe: Have a good one18:13
rogpeppeniemeyer: i think i've finally arrived at a good idea of how the API architecture will work18:14
rogpeppeniemeyer: it'd be good to have a discussion about it to see if i'm on crack or not.18:14
rogpeppeniemeyer: tomorrow, perhaps18:14
niemeyerrogpeppe: Ah, great18:14
niemeyerrogpeppe: Certainly up for the talk18:14
niemeyerfwereade: Okay, reviewed everything18:15
rogpeppeniemeyer: in particular, i think i finally understand how things can work with multiple state servers (both mongo and API)18:15
rogpeppeniemeyer: so we have something to aim towards, even if we don't go all the way immediately18:15
niemeyerfwereade: All good.. will just wait for the service.go repropose18:16
rogpeppeg'night all18:16
niemeyerrogpeppe: Cool, night!18:16
niemeyerfwereade: There's one trivial on a test too, optional18:17
fwereadeniemeyer, good point re test, proposed, bbiab18:46
niemeyerfwereade: Please ping when you're back18:59
niemeyerfwereade: Just want to run a quick idea by you before we're over with this18:59
fwereadeniemeyer, back19:13
niemeyerfwereade: I was wondering if we actually need the Dying relations at all19:14
fwereadeniemeyer, oh, yes, go on?19:14
niemeyerI mean, the assert on the Dying relations19:14
niemeyerfwereade: The alive case has an assert on {"relationcount", D{{"$gt", removeCount}}, which means it must necessarily be alive anyway19:15
niemeyerfwereade: Because it has relations to justify it, whether it was the relations we've seen or not19:16
niemeyerfwereade: Oh, wait, nope.. that's not it19:16
fwereadeniemeyer, no -- the relation count only goes down when the relation is removed19:16
niemeyerfwereade: If there are other relations, we must check those19:16
fwereadeniemeyer, yes, I think so19:16
niemeyerfwereade: So the relationcount should be the one we've seen, sholudn't it?19:16
niemeyerfwereade: Exactly, that is19:17
fwereadeniemeyer, yes, you're completely right19:17
fwereadeniemeyer, curse that crack19:17
fwereadeniemeyer, units can have $gt, relations must be exact19:18
niemeyerfwereade: Cool19:18
niemeyerfwereade: It's slightly suboptimal that there's a hidden requirement in that logic19:21
niemeyerfwereade: destroyOps must necessarily have an assertion on the existence of the relation document for it to be reliable19:22
niemeyerfwereade: If len(ops) > 019:22
fwereadeniemeyer, I wondered about adding an assertDying bool param there19:23
fwereadeniemeyer, to get the relation to generate the assert itself19:23
fwereadeniemeyer, seemed slightly less good, especially since we're already using len(ops) == 0 as a test for already-dying elsewhere19:24
fwereadeniemeyer, I'd be open to arguments that errAlreadyDying would work just as well and maybe be clearer19:24
niemeyerfwereade: Yeah, that'd be much better19:24
fwereadeniemeyer, ok, cool19:25
niemeyerfwereade: The point is that we won't foolishly change destroyOps to not return that error without seeing the consequences19:25
niemeyerfwereade: But it's easy to change what len(ops) == 0 means19:25
niemeyerfwereade: Unintendedly19:25
fwereadeniemeyer, yep, consider me convinced19:25
fwereadeniemeyer, proposed again :)19:42
niemeyerfwereade: Awesome, looking19:42
niemeyerfwereade: There's a len(ops) == 0 in Service.Destroy.. does that have to change?19:44
niemeyerfwereade: Or perhaps not change, but a new case added?19:44
fwereadeniemeyer, I didn't think of that, but you're quite right again19:44
niemeyerfwereade: hmm.. I guess not19:44
niemeyerfwereade: That's service.destroyOps, though19:44
fwereadeniemeyer, yeah, but the same arguments apply19:45
fwereadeniemeyer, I should just be explicit19:45
niemeyerfwereade: Cool19:45
fwereadeniemeyer, I'll just make the errAlreadyDying message non-entity-specific19:45
niemeyerfwereade: Sounds good19:45
niemeyerfwereade: a pre LGTM on this19:45
niemeyerfwereade: The branch seems ready, thanks a lot for all the work19:46
fwereadeniemeyer, sweet :D19:46
fwereadeniemeyer, tyvm19:46
davecheneymramm: ping22:35
mrammdavecheney: pong22:35
mrammI'm available in skype, and am in the google hangout22:35
davecheneymramm: that google hangout is fine22:38
davecheneybut you still haven't invited me22:38
davecheneyjust some other david cheney22:38

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!