[00:22] EMPTY REVIEW QUEUE [00:23] * niemeyer steps out to celebrate [04:04] afternoon [04:05] anyone in the chan ? [04:16] davecheney: hi [04:20] howdy [04:21] can I ask, what happens if you run juju status on a freshly bootstrapped environ ? [04:21] looking at the code, the result should be yamls {} [04:23] juju status returns a yaml with one machine (bootstrap machine) [04:26] ahh yes, of course [04:27] * davecheney goes back to mocking [04:31] davecheney: http://bazaar.launchpad.net/~juju/juju/trunk/view/head:/juju/control/tests/sample_cluster.yaml [04:31] andrewsmedina: ta [04:46] davecheney: sleep time [04:46] davecheney: see you tomorrow [04:47] andrewsmedina: np, thanks for your help [04:52] davecheney: :D [06:41] davecheney: hiya [06:44] morning [06:45] rog, davecheney, heyhey [06:45] fwereade_: yo! [06:54] rog, does this look in any way familiar to you? [06:54] ../state/state.go:8: inconsistent definition for type goyaml._Ctype_struct___9 during import\n\tstruct { start *goyaml._Ctype_yaml_tag_directive_t; end *goyaml._Ctype_yaml_tag_directive_t; top *goyaml._Ctype_yaml_tag_directive_t }\n\tstruct { start *goyaml._Ctype_yaml_simple_key_t; end *goyaml._Ctype_yaml_simple_key_t; top *goyaml._Ctype_yaml_simple_key_t }\n [06:54] rog, goyaml builds just fine normally, but falls over like that in TestPutTools [06:55] fwereade_: it's a cgo error, but i don't think i've seen it before [06:55] fwereade_: have you tried go build launchpad.net/juju-core/... ? [06:56] fwereade_: because it looks to me like TestPutTools is failing because it can't build the commands. [06:56] rog, yes indeed, that is the weirdness [06:56] rog, it only fails when I clear out pkg [06:57] fwereade_: hmm, i wonder if it's something to do with GOPATH. [06:57] fwereade_: what happens if you do GOPATH="" go build launchpad.net/juju-core/cmd/... ? [06:58] hmm, it just should fail with "pkg not found" i guess [06:58] rog, in PutTools (or whatever that calls) I presume? 
[06:58] rog, I'll see :) [06:58] fwereade_: yeah. [06:59] fwereade_: so you cleaned out pkg by doing rm -r $GOPATH/pkg ? [06:59] rog, yeah [06:59] fwereade_: i'll see if i can reproduce. were you testing all packages, or just one? [06:59] rog, all packages [07:02] fwereade_: if you clean out pkg, then do "go install launchpad.net/juju-core/...", *then* start the tests, does it then succeed? [07:02] rog: fwereade_ have either of you installed gustavo's patch to the go tool ? [07:02] davecheney: nope. [07:03] davecheney, nope [07:03] davecheney: i don't have a huge $GOPATH/src dir, so it's not so important for me [07:03] cool, i have been playing with it, and had some weirdness [07:03] rog, I expect so, I will check [07:04] rog, empty GOPATH just won't build as expected [07:06] fwereade_: hmm, i've got a test that's hanging again :-( [07:06] rog, installed packages work just fine [07:06] rog, bah :( [07:06] rog, which one? [07:06] fwereade_: unrelated [07:06] fwereade_: weird [07:06] fwereade_: i'm not sure - it only prints the test name when the test has timed out! [07:06] the machine: section in juju status isn't actually state.Machines is it [07:07] it's environ.Instances [07:07] fwereade_: i'm gonna kill it and see what's happening [07:08] the machines: section is reporting details about the environ.AllInstances [07:08] fwereade_: it's the juju package that's hanging [07:09] rog, huh, weird... won't -gocheck.vv tell you though?
[07:09] rog, not so helpful if it's unreliable [07:09] davecheney, sorry, I'm not quite following you [07:09] davecheney, I'm pretty sure status should be telling us about machines, not instances [07:09] the two keys that are reported as 'machines' in juju status [07:09] fwereade_: yeah, but i'd already run it without gocheck.vv, and i suspect it's a sporadic failure [07:10] fwereade_: I think they are what we call environ.Instances [07:10] because the first field in the yaml is dns-name [07:10] a property which is only available on the environ.Instance [07:10] fwereade_: it's actually hanging trying to dial zookeeper [07:10] davecheney, I'm suspicious of that assertion, but juju/control/status.py is a rats' nest so I'll have to check ;) [07:11] fwereade_: pm = yield self.provider.get_machine(instance_id) [07:11] fwereade_: i think that dial should have a timeout actually. it's wrong that zk tries to redial indefinitely. [07:11] i think we have a cleaner separation of machine, the virtual, and instance, the concrete, in the go impl [07:11] rog: zk never times out [07:11] it sucks [07:11] davecheney, STM it's only doing that with the machines it already knows about from state [07:12] davecheney: +1 [07:12] davecheney, look 17 lines before that [07:12] fwereade_: as you say, it is an uncomfortable union of both concepts [07:13] davecheney, I dunno, I think status info from both is relevant [07:13] davecheney, I think it's ok to smoosh them together there [07:13] fwereade_: thanks for the clarification [07:13] I was hoping to have more to show [07:13] fwereade_: apart from that one test hangup, all the tests worked ok for me after removing pkg.
[07:13] davecheney, instances are really an internal detail, but their properties apply to the machines the user cares about [07:13] but all I have tonight is a harness [07:14] davecheney, status is not going to be all that small I'm afraid [07:14] davecheney, probably smaller than the python, but still [07:14] fwereade_: indeed, which is why I put some time into setting up a table driven harness [07:14] davecheney, +1 [07:14] imma signing off in a few minutes [07:14] they are closing the coworking space [07:15] davecheney, enjoy your evening :) [07:15] but I look forward to your comments [07:15] davecheney: have fun [07:15] davecheney: when do you leave for lisbon? [07:16] rog: sunday [07:16] but I don't get there til monday 14:25 [07:16] davecheney: i saw that [07:16] I have NFI how to get from the airport to the hotel [07:16] davecheney: there's some info on the sprint wiki page [07:16] need to get some euros as well [07:17] rog: I could have got there a day earlier for an extra 800 euro, and a detour via CDG [07:17] davecheney: well, i might catch you tomorrow morning then... [07:17] davecheney: marvellous [07:17] davecheney: well, i'm glad you're making it regardless! [07:17] for small values of yay [07:18] anyway, i'll catch you around [07:18] davecheney: i usually order currency from travelex to be picked up at the airport. that way you get the convenience of just picking it up, but a much better rate. [07:19] fwereade_: does malta use euros? [07:19] rog, yeah [07:19] fwereade_: well aren't you lucky then?! [07:19] rog, luckyish :) [07:20] :-) [07:20] rog, my bank is still shit at getting me more money when I need it overseas [07:20] fwereade_: not good [07:20] rog, I usually have to bite down the swears and hit up my gradually dwindling UK funds [07:21] right-o [07:21] have a good day gentlemen [07:21] davecheney: and you. enjoy the evening! [07:21] have a good night davecheney :) [07:21] fwereade_: is that because they don't like you using your card overseas?
[07:22] rog, I have a credit card that *works* but is such an aggressively shitty deal that I don't really use it [07:22] rog, they're essentially charging me to borrow the money from myself [07:22] fwereade_: just get cash in advance... [07:23] rog, and I have no way of telling what my "balance" is on it [07:23] rog, I have another one which works only in totally arbitrary circumstances [07:23] fwereade_: another possibility is to get a "foreign currency" card, which you can charge up as required. [07:23] rog, twas sold to me as "this will let you get cash from HSBC machines in the UK without us charging you" [07:23] fwereade_: you get the best rate that way. [07:24] fwereade_: ha [07:24] rog, in practice it *sometimes* works everywhere *except* HSBC machines in the UK, where it never works [07:24] fwereade_: i have a possible idea for why your build might be failing [07:24] rog, I keep meaning to go and shout at them for a host of reasons but I have better things to do with my time ;) [07:24] rog, oh yes? [07:25] fwereade_: i think it must be something in your environment. i'm trying to remember what the gcc heuristics are for finding include files. [07:26] rog, ahhh, yes, that does make perfect sense [07:26] fwereade_: i'm wondering if it's right that bundleTools runs in a clean environment with just GOPATH, GOBIN and PATH set [07:26] rog, I suspect it might be better to just use whatever env the developer is using in the first place [07:26] fwereade_: what happens if you unset everything in your environment except those vars? [07:27] fwereade_: that's what i'm thinking [07:27] fwereade_: ... and then run go install launchpad.net/juju-core/cmd/... [07:28] fwereade_: i'd be interested to find out what var it was, but i think you're right - the fix is to mutate the env, not come up with a new one [07:28] fwereade_: (assuming i've got the right diagnosis, of course!) 
[07:31] rog, I'm peering at my env and can't see what could be the issue :/ [07:31] fwereade_: i suspect it might be $HOME [07:32] fwereade_: try HOME="" go install launchpad.net/juju-core/cmd/... [07:32] rog, sorry, just a mo [07:39] rog, empty HOME works fine :( [07:42] fwereade_: try applying lp:~rogpeppe/juju-core/go-build-env and see if it works then [07:42] fwereade_: (your go test, that is) [07:42] fwereade_: (the thing that was failing before) [07:43] rog, will do in a sec, thanks [07:45] rog, I have a side issue to ponder: in https://codereview.appspot.com/6402048/ niemeyer suggests a RelationHandler type [07:45] rog, and I'm not actually sure whether *that* has a benefit over putting the methods directly onto Relation [07:45] fwereade_: yeah, i saw that. i haven't thought it over yet though. [07:46] fwereade_: yeah, what fields are in RelationHandler other than the Relation itself? [07:47] fwereade_: oh, hold on, it's got a particular set of relation end points in it [07:47] rog, that's the only way we have to identify/get a Relation in the first place isn't it? [07:47] rog, the endpoint list is equivalent to the relation's name [07:48] * rog does a godoc to remind himself of the API [07:48] fwereade_: yeah that does sound right [07:49] rog, there is *something* up there [07:49] rog, possibly we ought to be doing it with endpoint identifiers rather than actual RelationEndpoints [07:49] rog, and doing the translation internally [07:50] fwereade_: why's that? [07:50] rog, but I'm not sure that entirely fits expected usage [07:50] rog, mainly because that will be a lot more convenient for add-relation and remove-relation [07:51] rog, which is the only time I'm aware of when we do stuff with relations *other than* via something that already exists within state [07:51] rog, which can identify them how it likes internally [07:51] rog, same as the name/key distinction [07:51] rog, btw, go-build-env works great [07:52] fwereade_: ah good. 
so *something* in your environment is the culprit! [07:53] fwereade_: i'm not sure i get it. RelationEndpoint is part of the public API; why not use it? [07:53] fwereade_: or are you suggesting removing it from the public API? [07:55] rog, I'm not quite sure :) [07:55] rog, possibly it is the right approach, and we just need a State.Endpoint(name string) method [07:56] rog, except, hmm, that's actually State.Endpoints... [07:56] fwereade_: what would it return? [07:56] rog, meh, it'll become clearer as I do more with it [07:57] rog, look, you need a state to construct a meaningful RelationEndpoint, right? [07:57] fwereade_: BTW i'm all for the idea of reducing the number of types in state [07:57] fwereade_: do you? [07:58] rog, yeah, it's all information that comes from the charm [07:58] fwereade_: well, i suppose you hav... yeah [07:59] fwereade_: a relation endpoint is always associated with exactly one unit, right? [07:59] rog, no, with a service [08:00] rog, the user expresses endpoints as service_name[:relation_name] [08:01] rog, and we need to do magic to figure out what endpoints are actually referred to by those strings [08:01] fwereade_: and for a scope=local relation, there's still only one endpoint for the service? [08:02] rog, and possibly barf if they're ambiguous and ask the user to specify the [:relation] bit [08:02] rog, yeah, the endpoints are purely about the connections between services [08:02] fwereade_: so the RelationEndpoint type is about isolating that magic, so that you're dealing with unambiguous things? [08:03] rog, the details of which units actually talk to one another is handled completely separately, although the choice of how to do so is ofc governed by the endpoint's scope [08:03] rog, or rather the narrowest scope present in the endpoints of the relation as a whole [08:03] fwereade_: so you do some magic to produce an endpoint, then use that in the API?
[08:03] rog, I don't do anything, we don't yet have add-relation ;p [08:03] fwereade_: will do, then :-) [08:04] rog, but, yeah, that's what we'll have to do [08:04] rog, I'm saying we should have AddRelation(epnames ...string) [08:04] rog, ...probably, anyway [08:04] rog, I'm not 100% sure [08:05] fwereade_: where epnames are potentially ambiguous? [08:05] rog, quite possibly we should actually have State.RelationEndpoints(epnames ...string) ([]RelationEndpoint, error) [08:05] fwereade_: that's Service.AddRelation, presumably? [08:05] rog, yes [08:06] rog, no, State.AddRelation [08:06] fwereade_: ah. [08:08] fwereade_: i'm not sure if that magic needs to be in state [08:08] fwereade_: it seems to me like it's something that can be built using the primitive already available. [08:08] primitives [08:09] rog, yeah, reasonable perspective [08:10] fwereade_: it seems to me like that's the meat of the logic of add-relation [08:10] fwereade_: and it could sit happily in the juju package [08:10] rog, ok, I think I'm convinced, thanks :) [08:10] fwereade_: then the state API is still very general [08:11] fwereade_: cool [08:11] rog, (also the meat of remove-relation) [08:11] fwereade_: but... i'm not sure we've answered the original question [08:11] rog, (although I suspect we should have a slightly different algorithm there) [08:11] fwereade_: should RelationHandler just be Relation? [08:12] fwereade_: ah, no, it can't be [08:12] rog, I'm leaning towards "yes" [08:12] rog, really? bother [08:12] fwereade_: because it's more than a relation - it encapsulates a set of units cooperating in a relation [08:13] rog, all that is implicit in the *Unit params to the methods, innit? [08:13] fwereade_: so for locally scoped relations, you'll have a RelationHandler for each machine running the service [08:14] rog, no you won't [08:14] fwereade_: no? [08:14] rog, you just do rh.Join(localUnit); rh.Watch(localUnit), surely? [08:14] rog, ok, the name of Watch is sl. 
misleading [08:15] rog, WatchAsThoughYouWere(localUnit) ;) [08:15] rog, the rh itself is the "same" object regardless of what UA is running it [08:15] fwereade_: but if all machines are also doing rh.Join, won't we see changes from all machines? [08:16] rog, well, in a global relation, yes; in a locally scoped relation, no [08:16] oh i think i see [08:16] rog, it figures out what scope to watch from the unit we pass in [08:16] rog, ie it finds all the units in the same scope as the one suggested [08:16] fwereade_: you're relying on the fact that every time we get a relation, it's freshly made [08:16] rog, expand please [08:18] fwereade_: are you saying that the Join method doesn't actually affect the state at all? [08:18] hmm, i think i'm confused [08:19] rog, no, I'm not, but I can't connect that with what you're asking [08:19] rog, that would not surprise me, I *think* I now get relations, but it has been a somewhat arduous path [08:19] fwereade_: i'll try and recap; tell me where i go wrong. [08:20] rog, cool [08:22] fwereade_: someone does add-relation svc1:relation1 svc2:relation2; we look up the endpoints for both of those, and call State.AddRelation with those. [08:22] rog, yes [08:23] fwereade_: the unit agent on each machine sees the new relation and calls Join on it. [08:24] rog, yes [08:24] fwereade_: the unit agent runs the relation1-relation-joined hook. [08:24] rog, no [08:25] ah, no i see [08:25] rog, a *different* unit agent runs that hook when it detects that the first UA called Join [08:25] of course [08:25] fwereade_: ok, the unit agent sees *another* machine that has called Join on the new relation. [08:25] rog, yep [08:25] fwereade_: so it runs the relation1-relation-joined hook. [08:26] rog, (ofc it only sees *that* because it's watching the relation from its own perspective, by having called Watch(itsOwnUnit) [08:26] rog, yes [08:26] fwereade_: but how does this work with locally scoped relations?
[08:27] rog, the state.unitScopePath is at the heart of it [08:27] rog, are you familiar with the contents of the /relations/relation-XXX node? [08:27] fwereade_: remind me :-) [08:28] rog, if it's a globally scoped relation, it has a settings subnode, and a subnode for each role in the relation [08:28] rog, if it's container-scoped, it contains a subnode for each container that holds some unit participating in the relation, and each of those container nodes contains the role and settings nodes [08:29] rog, when watching, or joining, on behalf of a given unit, we figure out what the role-and-settings-parent node is for that unit, based on the relation's scope and (maybe) the unit's container [08:29] rog, and then we just watch/write to the contents of that tree [08:30] rog, (note: the top-level relation node of a container-scoped relation does *not* have role or settings nodes) [08:30] rog, make sense? [08:31] rog, (similarly, the Settings(*Unit) method does the same thing) [08:32] fwereade_: ah, so the Join method does the magic based on the unit [08:32] rog, yes; really, *every* Relation(Handler)? operation depends on a specific unit to get the appropriate perspective [08:32] fwereade_: BTW unitScopePath doesn't actually seem to be used anywhere [08:33] rog, update your tree :) [08:33] fwereade_: ah ha! [08:33] rog, and then for examples of actually constructing a useful one, take a look at the relation-unit branch itself [08:34] rog, in Unit.AgentJoin [08:36] rog, it's very small but I'm rather pleased with it [08:36] rog, the python seems to do path-hackery all over the place :) [08:36] fwereade_: ok, so i *think* i understand now, and it seems reasonable then that the methods should be directly on the Relation. [08:36] rog, cool [08:37] rog, I think I'll talk about it with niemeyer too, but it's good to know I'm not *obviously* all underpants-on-head about it ;) [08:38] fwereade_: how would Relation.Join(unit) differ from Unit.AgentJoin(relation) BTW? 
[08:39] rog, it's only responsible for the Pinger [08:39] rog, AgentJoin does both the pinger *and* the watching *and* gives access to the settings node [08:40] rog, I *suspect* we'll still end up with something similar in effect to the RelationUnit but I can't see far enough into the future to be sure [08:41] rog, the precise types we need to sling around will become clearer as I work on the queuing and tie it into cmd/jujuc/server.ClientContext [08:42] fwereade_: if RelationHandler is actually Relation, then it seems like we've managed to lose a type, which sounds like a good thing. [08:42] fwereade_: well, lose a public type anyway [08:42] rog, yeah, I think so [08:43] fwereade_: i'm definitely +1 on that [08:43] rog, cool :) [08:43] fwereade_: i think Join, Settings and Watch on Relation is an easier thing to understand [08:43] fwereade_: i hadn't quite grasped what a RelationUnit actually *was* :-) [08:44] rog, and we don't have ServiceRelation any more, I never really understood what that was either ;) [08:44] fwereade_: cool :-) [08:45] * rog likes it when an API comes together [09:05] fwereade_: https://codereview.appspot.com/6421045 [09:06] rog, LGTM [09:07] * fwereade_ tuts in niemeyer's general direction [09:07] (config gofmt ;)) [09:08] * fwereade_ suddenly grasps the full horror of how the tests will look if he tries: [09:08] var hookQueueTests = []struct { [09:08] adds []state.RelationUnitsChange [09:08] gets []relationer.HookInfo [09:08] }{ [09:09] ...given that: [09:09] // RelationUnitsChange holds settings information for newly-added and -changed [09:09] // units, and the names of those newly departed from the relation. [09:09] type RelationUnitsChange struct { [09:09] Changed map[string]UnitSettings [09:09] Departed []string [09:09] } [09:09] // UnitSettings holds information about a service unit's settings within a [09:09] // relation.
[09:09] type UnitSettings struct { [09:09] Version int [09:09] Settings map[string]interface{} [09:09] } [09:09] ...and: [09:09] // RelationUnitsChange holds settings information for newly-added and -changed [09:09] // units, and the names of those newly departed from the relation. [09:09] type RelationUnitsChange struct { [09:10] Changed map[string]UnitSettings [09:10] Departed []string [09:10] } [09:10] // UnitSettings holds information about a service unit's settings within a [09:10] // relation. [09:10] type UnitSettings struct { [09:10] Version int [09:10] Settings map[string]interface{} [09:10] } [09:10] * fwereade_ ponders grumpily [09:34] morning [10:23] rog: Btw, your log-tracking-testing will soon fly into the trunk. Just doing the final steps. [10:24] TheMue: cool. i saw gustavo's remarks, and breathed a sigh of relief that he thought it was ok :-) [10:24] TheMue: morning, BTW! [10:25] rog: morning and yes, it's indeed a nice idea. didn't think so in the beginning, but seeing it in real life feels good. [10:26] TheMue: it went through a couple of more complex iterations before it arrived at the paste i gave you. i'm quite happy with how it turned out actually. [10:28] fwereade_: erm, you pasted the same thing twice [10:28] fwereade_: did you mean to post the HookInfo type definition? [10:28] rog, oh, blast, yeah [10:28] type HookInfo struct { [10:28] Name string [10:28] RemoteUnit string [10:28] Members map[string]map[string]interface{} [10:28] } [10:28] rog, but not to worry, it's readable now [10:29] rog, there are rather a lot of cases to deal with, though ;) [10:39] fwereade_: what are the names in RelationUnitsChange.Departed, BTW? unit names? [10:39] And once again heavy rain. Hey, weather god, it's enough! [10:42] TheMue: you'll be too hot next week! [10:42] rog: Yes, I've seen, and I promised my family to bring some good weather with me like from Oakland.
[10:43] fwereade_: also, it seems odd that RelationUnitsChange gives no way of telling *which* unit changed. is that because we don't need to know? [10:43] rog: Tomorrow the school holidays begin, so it *has* to change. ;) [10:50] ec2 ssd instances! [10:50] http://aws.typepad.com/aws/2012/07/new-high-io-ec2-instance-type-hi14xlarge.html [10:52] TheMue: we haven't really had a summer here. it's been continual rain. and when i got back, found that there'd been some serious rain while we were away: http://www.youtube.com/watch?v=huCnxI8qDb0 http://www.youtube.com/watch?v=lpJVTCLHKUE [10:53] hazmat: sounds expensive :-) [10:54] rog, depends on if you need it.. netflix reduced their costs by half for the same throughput on a benchmark.. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html [10:55] hazmat: interesting [10:55] rog: Hard storms, yes. We had some of those in July too. May/June have been some wonderful warm days where I worked on the veranda. [10:58] Hmm, Gavin Harrison really motivates me to realize an old dream: learn to play drums. [10:58] He's the drummer of Porcupine Tree. [11:01] fwereade_: i don't think you'd need an intention log per relation [11:01] fwereade_: i *think* that one log would suffice [11:03] TheMue: one forecast i saw had it at 37 degrees in lisbon on one day [11:09] rog: ouch [11:09] TheMue: but mostly around 30, which is bearablew [11:09] s/w// [11:16] rog: Yes, just took a look too. 30 is fine [11:24] rog, sorry, lunch :) [11:24] fwereade_: np! 
[11:25] rog, RelationUnitsChange has Changed map[unitname]info, Departed []unitname [11:26] rog, you're expected to know which relation changed because you started watching a particular relation [11:26] fwereade_: ah, i think the documentation should mention that the key of Changed is the unit name :-) [11:26] rog, sensible, cheers [11:27] fwereade_: i glanced at it and my brain said "attribute name" [11:27] rog, I thought it kinda went without explicitly saying when I wrote it but it could certainly be clearer [11:27] fwereade_: some of us are stupider than you might think :-) [11:28] rog, I'm interested to hear about your further thoughts on intention logs though [11:29] rog, (sorry I had to dash yesterday) [11:29] fwereade_: np [11:30] fwereade_: i'm thinking that when we get an error executing an intention, we can write "broken" to the intention log. [11:30] fwereade_: that doesn't stop the updates coming through the queue for that relation, of course. [11:30] fwereade_: but i'm reconsidering if having the queue in a separate goroutine is the right approach after all [11:33] rog, sorry, I'm not coming up with any responses [11:34] fwereade_: that's fine, it's all vapour anyway :-) [11:34] fwereade_: just my vague thoughts on the matter. you're much closer to the problem. [11:35] rog, I am definitely having difficulty visualising an intention log with the unit lifecycle state and all the relation lifecycle states and brokens and dones and collapses and resolve-handling :) [11:35] rog, collapses and resolves, in-memory, for a single relation, is occupying most of my brain ATM :) [11:37] rog, I think I have something that will work and be easy enough to extend to handle process bounces, but we'll see [11:44] fwereade_: this kind of API, perhaps? http://paste.ubuntu.com/1099903/ [11:46] rog, ATM I have http://paste.ubuntu.com/1099905/ [11:48] fwereade_: i'm not sure just RelationUnitsChange is enough, as it doesn't hold the name of the relation that changed.
[11:48] fwereade_: of course, it could get that [11:49] rog, I'm assuming one of these per relation [11:49] rog, and that the client should know what relation they're dealing with [11:50] fwereade_: that sounds good actually [11:50] rog, it's about all I can fit in my head ATM :) [11:50] :-) [12:28] niemeyer: yo! [12:29] Morning! [12:30] niemeyer: Heya [13:10] niemeyer: what if there's already a setting for GOBIN in the environment? are later entries guaranteed to override earlier ones? [13:12] niemeyer, heyhey! [13:13] niemeyer, rog, I have an in-memory HookQueue for your perusal: https://codereview.appspot.com/6422049 [13:14] fwereade_: cool [13:15] fwereade_: (not entirely sure about "relationer" as a name, even though we already have the dubious "machiner") [13:16] rog: I guess we shouldn't trust it indeed [13:16] fwereade_: how about just "hook" [13:16] rog: It works, but who knows [13:16] rog, because this is only one of many components of a system that will deal purely with unit agents' participation in relations? [13:16] niemeyer: yeah, it seems impl dependent to me [13:17] fwereade_: i'd really expect to see "uniter" :-) [13:17] rog, uniter will come [13:17] fwereade_: under worker/, anyway [13:17] rog, that has a whole bunch of other responsibilities, including kicking off stuff in relationer [13:18] fwereade_: maybe worker/uniter/relation ? [13:18] rog, like how we have worker/provisioner/firewall? :p [13:19] fwereade_: in a way, firewaller is an independent worker in its own right. 
[13:19] fwereade_: i'm not *sure* that can be said for relationer [13:19] fwereade_: the fact that the firewaller happens to run in the PA is historical accident IMO [13:21] rog, I guess I could live with worker/uniter/relation/hookqueue.go, but the unit stuff and the relation stuff are really pretty distinct [13:22] fwereade_: "hooks" seem fairly closely tied to the unit agent to me [13:22] fwereade_: i'd be happy for all this to live inside the unit agent, tbh [13:23] fwereade_: not entirely sure what we gain by having a separate package [13:24] rog, mainly what we gain is an opportunity to avoid the ickiness of python, in which we have relation stuff and unit stuff smeared across the same package together despite the connections between the two being very very tenuous [13:24] fwereade_: i'd've thought that we could make that distinction very clear by the types we use [13:25] rog, the unit stuff starts and stops the relation stuff, it is true, but the only other point of contact that I can recall is a shared hook executor, which feels to me much like the environ shared between the provisioner and the firewaller [13:26] rog, uniter.UnitWorkflow/RelationWorkflow, uniter.UnitLifecycle/RelationLifecycle feels nasty to me [13:26] fwereade_: that's ok, i think. we already have fairly self-contained modules within the same package. [13:26] rog, compared to uniter.Workflow/Lifecycle, relationer.Workflow/Lifecycle [13:26] fwereade_: does any of this stuff need exporting? [13:27] fwereade_: i think of all this as internal details of the unit agent/worker. [13:27] rog, certainly the workflow stuff [13:27] rog, or, hmm, maybe we can get away without that [13:27] fwereade_: by workflow stuff, you mean? 
[13:28] rog, ZK storage of unit/relation state, used by status, but that will probably want to be a separate package in itself [13:28] fwereade_: feels to me like that would live in state [13:29] rog, also disk storage, I think, although we may come up with some nice way around that [13:30] rog, anyway, I think we have now discussed this enough to make the necessary move half a dozen times if it becomes apparent that it is misplaced ;) [13:30] fwereade_: tbh, i'm happy having the relation hook queue stuff in a separate package if it's really self-contained, but "relationer" isn't the right name. it's not an "er" :-) [13:30] rog, I am impressed by your precognitive ability :/ [13:30] fwereade_: ok, i may be wrong! [13:31] fwereade_: as always... [13:31] rog, not to worry, but the code in there is about 1000 times more interesting than the name of the package IMO [13:32] fwereade_: indeed, i'm looking through it. just thought i'd convey my naming discomfort... [13:33] rog, no worries :) [13:34] fwereade_: any particular reason you need to go through the changed units in alphabetical order, BTW? [13:34] rog, consistent tests [13:34] * rog just noticed the comment, duh [13:34] :) [13:35] * fwereade_ thinks he's spotted a bug [13:35] * fwereade_ writes a test [13:36] * fwereade_ stops and tries to figure out how to [13:39] * fwereade_ thinks it's not a bug [13:39] niemeyer, ping [13:40] fwereade_: Yo [13:40] niemeyer, I was wondering if there was any particular reason to have RelationHandler separate from Relation? [13:40] * niemeyer looks at what Relation is today and how it's used [13:42] niemeyer, and I also think that Watch() should really return a RelationUnitsWatcher, which would then be an exported type, and could be tested in a more pleasing fashion [13:43] fwereade_: Both suggestions look sane [13:43] niemeyer, sweet, thanks [13:47] fwereade_: the hook queue code looks reasonable [13:47] fwereade_: one question: do we actually need a queue there?
[13:47] rog, sorry, where? [13:47] fwereade_: could we just keep a bag of stuff waiting to be done? [13:47] fwereade_: in the hook queue [13:48] fwereade_: does it matter which order the relation hooks are executed (between relations, that is)? [13:48] rog, across different relations, no it doesn't [13:48] rog, within one relation yes it does [13:49] fwereade_: i was wondering about a data structure like this: http://paste.ubuntu.com/1100094/ [13:50] rog, needs versions but that's by the by [13:50] fwereade_: yeah [13:51] rog, the prospect of entirely dropping ordering seems odd to me [13:51] fwereade_: it's probably crack, mind [13:51] fwereade_: well, we're not *entirely* dropping it :-) [13:51] fwereade_: and if we restart, the order is going to be arbitrary anyway... [13:51] rog, it feels like we'd need to add a bunch of icky heuristic stuff to ensure we didn't just, say, not bother to handle a departed for 20 minutes because there were lots of interesting changes going on [13:52] fwereade_: statistically that would be very unlikely :-) [13:55] fwereade_: BTW shouldn't Done call remove() ? [13:55] rog, remove() is dead [13:56] rog, proposal is updated [13:56] rog, and the queue is updated in Next [14:01] rog, anyway I'm not *sure* but it feels like the unordered solution would be more churny with a heavily loaded queue [14:01] rog, and I think there's something to be said for predictability [14:01] fwereade_: here's a more predictable solution in the same vein: http://paste.ubuntu.com/1100115/ [14:01] fwereade_: we have a simple linked list of units [14:02] rog, and I think ensuring joined-then-immediately-changed might also be kinda icky [14:03] fwereade_: i know it's probably utterly irrelevant, but i'd like to see the queue operations be O(1), and i don't think it's hard. 
[14:04] rog, that's pretty nice, I'm still not sure it does anything other than push the complexity around, but I'll look into it some more [14:04] fwereade_: i'm pretty sure it makes it more efficient. but simpler... i dunno. [14:04] fwereade_: i'm hoping so. [14:06] rog, maaaaybe :) [14:07] rog, it's missing membership and settings information [14:08] rog, and I'm not sure it's entirely trivial to add them cleanly [14:08] rog, (also, list needs to be doubly linked, but that is also by the by) [14:09] fwereade_: i thought a single link would probably be ok [14:09] rog, how do we reorder in O(1)? [14:09] fwereade_: we do we reorder? [14:09] s/we/why/ [14:11] fwereade_: members looks like it's applied only by Next, and probably wouldn't change that much. [14:11] fwereade_: settings is in unitStatus.settings [14:11] rog, because some earlier events are replaced by later events, eg depart on top of change, or change on top of depart [14:11] rog, yeah, but you're missing all the units that aren't in the queue [14:11] fwereade_: no reordering required there, i think. [14:13] fwereade_: and i *think* that the unitStatus map can contain all units, not just ones in the queue [14:13] rog, hmm, the python did need it, but I think you're right; this implementation doesn't [14:13] fwereade_: (ok, i realise that's definitely a change from where i started :-]) [14:14] rog, sure, that's the nature of the beast [14:15] rog, serializing that with all those pointers will be icky, but serialization and reconciliation is going to be interesting regardless I think [14:16] fwereade_: serialisation would be easy i think, but i'm still not convinced we want to serialise this. 
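[The paste.ubuntu.com links above have since expired, but the structure under discussion — at most one pending event per unit, with O(1) queue operations via a doubly linked list plus a map — might look roughly like this hypothetical sketch. The names (unitQueue, hookKind) and the exact collapsing rules are assumptions for illustration, not juju's actual hook queue code.]

```go
package main

import (
	"container/list"
	"fmt"
)

// hookKind is one of the relation hook kinds discussed above.
type hookKind string

const (
	joined   hookKind = "joined"
	changed  hookKind = "changed"
	departed hookKind = "departed"
)

type entry struct {
	unit string
	kind hookKind
}

// unitQueue holds at most one pending hook per unit. The doubly
// linked list preserves arrival order while the map gives O(1)
// lookup, so collapsing an earlier event under a later one for the
// same unit needs no scan.
type unitQueue struct {
	order *list.List               // of *entry, oldest first
	elems map[string]*list.Element // unit name -> element in order
}

func newUnitQueue() *unitQueue {
	return &unitQueue{order: list.New(), elems: make(map[string]*list.Element)}
}

// add records an event, collapsing it with any pending one for the
// same unit: a departed cancels an unexecuted joined outright, and a
// changed folds into a pending joined (which already implies one).
func (q *unitQueue) add(unit string, kind hookKind) {
	if el, ok := q.elems[unit]; ok {
		pending := el.Value.(*entry)
		switch {
		case pending.kind == joined && kind == departed:
			q.order.Remove(el) // joined never ran: drop both events
			delete(q.elems, unit)
		case pending.kind == joined && kind == changed:
			// keep the pending joined; it subsumes the changed
		default:
			pending.kind = kind // e.g. departed replaces changed
		}
		return
	}
	q.elems[unit] = q.order.PushBack(&entry{unit, kind})
}

// next pops the oldest pending hook, or returns ok=false when empty.
func (q *unitQueue) next() (unit string, kind hookKind, ok bool) {
	el := q.order.Front()
	if el == nil {
		return "", "", false
	}
	q.order.Remove(el)
	e := el.Value.(*entry)
	delete(q.elems, e.unit)
	return e.unit, e.kind, true
}

func main() {
	q := newUnitQueue()
	q.add("wordpress/0", joined)
	q.add("mysql/0", joined)
	q.add("wordpress/0", departed) // cancels the unseen joined
	for {
		unit, kind, ok := q.next()
		if !ok {
			break
		}
		fmt.Println(unit, kind) // prints "mysql/0 joined"
	}
}
```

Since every operation is a map lookup plus a list splice, both add and next stay O(1), which is the property rog is after.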
[14:16] fwereade_: interesting, yes [14:18] rog, this has all the state we need to serialize wrt the hook queue, I think [14:19] fwereade_: i'm still thinking that incremental might be the way to go, but that may be entirely crack [14:31] fwereade_: http://paste.ubuntu.com/1100165/ [14:32] fwereade_: i'll get back to what i'm supposed to be doing now, erk! [14:51] fwereade_: when you run a departed hook, does it have access to the old settings of the remote unit? [14:51] rog, no [14:52] fwereade_: hmm, that seems a pity in a way. i can see that it might be useful. [14:52] rog, I think it's just *gone*, too bad, is the thinking :) [14:53] fwereade_: it means that if you had any resources associated with some attribute exposed by the remote unit, you'd have to keep track of it yourself. but actually, i can see stronger arguments for the settings not being available - if you restart, you won't have access to them. [14:55] Aargh! What's that? My proposal ended with a "can't upload base of .lbox: ERROR: Checksum mismatch" and the original issue is messed up. I thought I did it right with the prerequisite this time. *hmpf* [15:01] * TheMue softly switches into panic mode [15:14] *phew* [15:14] Wrote down a recipe once and it worked. [15:26] * niemeyer => lunch [15:41] rog, fwiw, there is definite ugliness related to restarting when there's an inflight hook in which some members no longer exist [15:42] rog, haven't quite figured out what to do about that yet [15:43] fwereade: hmm, how is that awkward? [15:44] rog, we don't really want to just remove members from the relation without executing departed hooks [15:44] rog, we can't sanely execute the departed hooks while we're in a screwed state from the previous errored hook [15:45] fwereade: ah! you mean resolving, not restarting? [15:45] rog, I mean restarting, I was using inflight to denote resolviness [15:45] fwereade: ah, so we're in an error state when we restart? 
[15:46] rog, yeah [15:46] rog, or, well, we might be [15:46] rog, the hook queue doesn't have to worry about *that* at least [15:48] fwereade: hmm, maybe the --retry could be interpreted as "retry that hook eventually" :-) [15:48] fwereade: so you'd run the departed hooks first [15:48] rog, and what if it's just in limbo? [15:49] fwereade: sorry, what if _what_ is just in limbo? [15:50] rog, the queue... sitting there for an arbitrary amount of time, waiting for resolution [15:51] fwereade: well, we know that the queue is waiting for resolution, right? i'm not sure i see the problem. [15:52] rog, the relation should surely still contain its original members until we have executed departed hooks? [15:52] rog, but we don't have access to those members, because they don't exist any more [15:53] fwereade: what can access those members? we're not running any hooks for that relation, no? [15:53] rog, doesn't the hook that errored (or just was not completed) want them? [15:53] rog, forget resolved entirely [15:54] fwereade: if we forget resolved, then we have to assume that the hooks all complete successfully, no? [15:54] rog, ha ha :) [15:54] rog, if we haven't said Done(), we can't assume that, can we? [15:55] rog, it is *important* that we don't skip hooks ;) [15:55] fwereade: i think that if the unit agent goes down while executing a hook, we could treat that as an error and require resolution [15:55] fwereade: after all, we won't *know* if the hook failed or not [15:56] rog, could do, might make life easier for us [15:56] rog, but there *should* be nothing wrong with re-executing a hook [15:58] fwereade: here's another little question: what happens if a relation gets departed while its join hook is executing. are we still required to execute the change hook? [15:58] rog, no, tested [15:59] fwereade: so we can execute joined then immediately departed [15:59] rog, yeah, the python knowingly does that [15:59] fwereade: cool. i thought that *should* be the case, but... 
[15:59] fwereade: so there *is* something wrong with re-executing a hook, in general. [16:00] rog, sorry, why is that so? [16:00] fwereade: if we crashed while a change hook is executing, and the relation is departed while we're down, we don't want to re-execute the change hook, but the departed hook, i think. [16:02] rog, you know what? all these are specific cases of the general "we sometimes need access to relation unit settings after the units are gone" problem [16:03] rog, the python addresses this by saying LALALA I CAN'T HEAR YOU when you point this out [16:03] fwereade: i'm not sure [16:03] rog, and trusting that nobody ever actually deletes relation unit settings [16:08] fwereade: i'm not sure i see any places above where we need access to relation unit settings after the units are gone. [16:09] rog, that happens *all* the time [16:09] rog, whenever two units depart at the same time? [16:09] rog, we execute 2 departed hooks [16:09] fwereade: in fact... we always provide the "latest" version of a unit's settings. what happens if that unit has departed already but we haven't seen the departed hook yet [16:09] ? [16:09] rog, the second unit is still apparently present during the first depart [16:10] FYI, http://www.oscon.com/oscon2012/public/content/video sabdfl keynoting about juju [16:10] rog, that should not be such a problem, we just use the ones we already have [16:10] fwereade: ah, but not if we've restarted, right? [16:10] rog, indeed [16:10] fwereade: ah, and that's what you meant by "nobody ever actually deletes relation unit settings" [16:10] fwereade: ... from zk, yes? 
[16:13] * TheMue watches Mark [16:13] * rog too [16:14] rog, sorry, the last thing I said was "rog, but that does kinda offend my sensibilities a little" [16:15] fwereade_: last thing i saw was "indeed" [16:15] rog, unless we do what we do in python and just hope they exist, at least [16:15] rog, which they will, as long as we don't GC them until the whole relation is gone [16:15] rog, but that does kinda offend my sensibilities a little [16:15] fwereade_: yeah, me too [16:15] the *easy* solution, of which niemeyer is not fond for entirely sensible reasons, is to store all the settings as well [16:16] rog, blast, gtg :( [16:16] fwereade_: k [16:16] rog, I am interested to hear any brainwaves you may have [16:17] fwereade_: i'll continue thinking on it [16:17] rog, cheers [16:17] fwereade_: i'm around late this evening BTW [16:17] fwereade_: though i should pack! [16:17] rog, I *might* be, my later plans remain unclear [16:19] TheMue: crossed fingers the demo works :-) [16:19] rog: Yes ;) Always the hard part. [16:21] essentially [16:21] rog: That has been nice. [16:21] rog: Ha, here he smiles. [16:35] TheMue: why bother checking that machineUnitsChanges is closed at all? we never close it - there's no need to check closed on *every* channel receive, i think. [16:40] rog: Which line? [16:40] TheMue: line 64 [16:41] TheMue: case change, ok := <-fw.machineUnitsChanges: [16:41] TheMue: why not just: case change := <-fw.machineUnitsChanges ? [16:42] TheMue: it's trivially verifiable that we don't close the channel [16:42] rog: just a habit, add a note [16:58] afk [17:44] fwereade__: ping [20:13] I'll step out for a coffee break, back soon [20:58] Back [21:00] niemeyer, heyhey, I'm around for a bit [21:00] fwereade__: yo [21:00] niemeyer, what can I do for you? 
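[The comma-ok receive rog questions in the machineUnitsChanges discussion above only matters when a channel can actually be closed; a minimal, self-contained sketch of the difference:]

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 1)
	ch <- 42
	close(ch)

	// The comma-ok form distinguishes a real value from the zero
	// value that a closed, drained channel delivers.
	v, ok := <-ch
	fmt.Println(v, ok) // 42 true: the buffered value is still delivered

	v, ok = <-ch
	fmt.Println(v, ok) // 0 false: channel is closed and drained

	// Without comma-ok, a receive on a closed channel silently yields
	// the zero value, so a select loop would spin on it forever.
	// Dropping the check is only safe when, as rog says, it is
	// trivially verifiable that the channel is never closed.
}
```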
[21:01] fwereade__: I just wanted to run a seed concept by you [21:01] niemeyer, cool [21:01] fwereade__: I haven't reviewed your branch yet, but hopefully will do that today still (just working a bit on config) [21:01] fwereade__: But, [21:01] fwereade__: One thing that we talked about at some point in the past [21:01] fwereade__: and that seems curious in the context of the "hook queue" [21:02] fwereade__: Is that, in fact, we may not really need a *queue* per se [21:02] fwereade__: The only thing we need to know is what to run next [21:03] fwereade__: That is, we don't have to take action on events coming from ZooKeeper immediately, as we're doing something else [21:03] fwereade__: Because we have to handle the possibility of losing events in case of errors anyway [21:04] * fwereade__ continues to listen with interest [21:04] fwereade__: So, in a way, this means we can be more comfy and simply consume the next event until we have a decision to do something at hand [21:05] fwereade__: and then leave the next queue of events completely unhandled while we do that [21:05] fwereade__: Does that ring any bells? [21:06] niemeyer, so essentially we just snapshot the state whenever we think it might be a good time to run some hooks, and at that point figure out the diff? [21:06] fwereade__: Not even snapshot [21:07] fwereade__: We consume events from the queue until we have a decision on what to do [21:08] fwereade__: But the "queue" is the network itself, and the zookeeper library.. we don't have to be processing everything in advance [21:09] fwereade__: Of course, we still need the logic for keeping track of where we stand, etc [21:09] niemeyer, I am having a little bit of trouble seeing the distinction [21:09] fwereade__: and recording that for error recovery [21:10] fwereade__: Maybe there's none.. 
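[niemeyer's point — that the "queue" can simply be the watcher channel, the zookeeper library, and ultimately the network — might be sketched like this. RelationUnitsChange and hooksFor are assumed names for illustration, not juju's real types.]

```go
package main

import "fmt"

// RelationUnitsChange is an assumed shape for one batch of watcher
// events; the real type lives in juju's state package.
type RelationUnitsChange struct {
	Changed  []string
	Departed []string
}

// hooksFor computes the hook invocations implied by a single event.
func hooksFor(c RelationUnitsChange) []string {
	var hooks []string
	for _, u := range c.Changed {
		hooks = append(hooks, "changed "+u)
	}
	for _, u := range c.Departed {
		hooks = append(hooks, "departed "+u)
	}
	return hooks
}

func main() {
	// The channel itself is the queue: events delivered while a hook
	// runs simply wait here (and, upstream, in the zookeeper client
	// and the network), so nothing needs to be buffered in advance.
	changes := make(chan RelationUnitsChange, 2)
	changes <- RelationUnitsChange{Changed: []string{"mysql/0"}}
	changes <- RelationUnitsChange{Departed: []string{"mysql/0"}}
	close(changes)

	// Consume the next event only when ready to act on it.
	for change := range changes {
		for _, h := range hooksFor(change) {
			fmt.Println("run hook:", h)
		}
	}
}
```

The design point is that nothing is read from the channel until the previous decision is complete, so there is no in-process event buffer to maintain or recover.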
[21:10] niemeyer, rog had some interesting ideas about it not really actually needing to be ordered, which I thought were worthy of a bit of exploration [21:11] fwereade__: But if nothing else, I see a small conceptual difference in trying to maintain a queue of upcoming events, and simply computing what's the *next* event [21:11] fwereade__: Uh.. that's not what I mean [21:11] fwereade__: Order is important, I suspect [21:12] niemeyer, that is my *instinct* but I can't quite figure out *why*, apart from it being an awful lot easier to get my head around [21:12] niemeyer, be that as it may, I don't want to derail [21:12] fwereade__: That'd be good enough of a reason to keep it ordered, but a unit departing and then joining sounds very different from joining and then departing :) [21:12] niemeyer, what the hook queue *does*, though, both here and in python, is just keep track of the next event per unit [21:13] fwereade__: Next event, or next eventS [21:13] niemeyer, and I'm not sure we can reduce that to just the next event without dropping considerations of order [21:13] ? [21:13] niemeyer, next event singular [21:13] niemeyer, it collapses operations aggressively [21:15] niemeyer, if there's more than one queued event per unit we've done it wrong ;) [21:15] fwereade__: Is there ever a case where we have more than one pending operation? [21:15] niemeyer, joined/changed but that's handled internally [21:15] fwereade__: Okay, cool [21:15] fwereade__: So maybe it's already optimal [21:16] niemeyer, well, it's surely not *optimal*, I definitely intend to look into a couple of rog's less surprising ideas [21:16] fwereade__: That's really it. 
I wasn't sure if there was anything to adapt, but I had this thing in my head that I pondered if it could be useful or not [21:16] niemeyer, I'm pretty sure it's a little nicer than the python [21:17] niemeyer, but I actually recommend you not worry about that review until tomorrow [21:17] niemeyer, because it *might* get much better [21:17] fwereade__: Ohh, exciting :-) [21:17] niemeyer, and even if it doesn't, a day's delay won't hurt :) [21:18] fwereade__: True [21:18] fwereade__: Have plenty to do on the config side [21:18] fwereade__: I'm hoping to push the whole thing for review today still [21:18] niemeyer, I confess to being a little relieved that you haven't done the whole config thing in 5 minutes flat ;) [21:19] fwereade__: Haha :) [21:19] fwereade__: I'm taking the chance to clean up a few other edges too, such as bring into life that old StringMap type [21:19] niemeyer, I do have one worry about the relation hooks, though [21:20] niemeyer, nice [21:20] fwereade__: I'll also twist Schema so it handles defaults by itself [21:20] niemeyer, +1 [21:20] fwereade__: Oh, what's that? [21:20] niemeyer, you remember we discussed sending settings through that whole pipeline rather than just versions? [21:21] fwereade__: Right [21:21] niemeyer, I'm not sure we actually really gain very much from that [21:21] fwereade__: Hmm [21:21] fwereade__: Why's that? 
[21:21] niemeyer, because I can't see any way around us *sometimes* needing to access settings from departed units [21:22] fwereade__: I'm missing both ends of the problem I think [21:22] niemeyer, ok, simplest possible case [21:22] niemeyer, two units depart at once [21:22] fwereade__: Ok [21:24] niemeyer, we run hooks for the first one, claiming the second one is still part of the relation, and so it's perfectly legitimate for the hook to go through the membership and do something based on the settings for the departed unit that it doesn't yet know is departed [21:24] niemeyer, with settings cached in memory this is nice and easy [21:24] fwereade__: I actually think we should allow units to query settings for departed units [21:25] fwereade__: But not sure if that's relevant for the problem you're explaining [21:25] niemeyer, I think it's very relevant [21:26] niemeyer, the idea that we then can't really GC any relation settings node until the whole relation is down bothers me a little, because it feels somehow untidy [21:26] niemeyer, *but* I don't think there's any way past that [21:27] niemeyer, except to cache the whole damn relation state to disk all the time, which is just too much to deal with [21:27] fwereade__: It would not solve either way.. the unit could have changed its settings before departure [21:27] fwereade__: I think it's fine to leave the garbage uncollected for a while [21:27] niemeyer, true [21:28] fwereade__: We can GC it eventually in the corrective agent hinted at over the lifecycle conversation [21:28] niemeyer, yeah, I'm happy enough not worrying about that really, it hasn't seemed to hurt in python too much [21:29] fwereade__: Okay, but what's that issue that makes sending settings with the version non-useful? 
[21:29] niemeyer, ok, so, we end up with a situation where any relation hook might relation-get some departed unit (whether or not it is known to be departed according to the "current" membership) [21:30] niemeyer, so we need both the get-arbitrary-settings code that we currently have in python *and* the use-cached-settings code we're working towards here [21:31] fwereade__: Okay, and you feel the get-arbitrary-settings always would be simpler [21:31] niemeyer, and I'm not really sure that the somewhat reduced network access is really worth the extra code plus memory budget [21:31] niemeyer, I don't feel sure either way [21:32] fwereade__: I'm happy to go with whatever you feel would result in less code and less mental trickiness [21:32] niemeyer, but using sheer projected simplicity of code as a heuristic I tend *slightly* towards the always-get-arbitrary [21:32] niemeyer, I don't think I want to toss the caching *yet* [21:32] niemeyer, but consider this early warning that I *might* [21:33] fwereade__: Thanks for the hinting, simplicity rocks :) [21:33] niemeyer, and I'm pleased to hear you're ok exploring the alternatives if it does start to bother me seriously ;) [21:34] niemeyer, anyway, I think I'm going to try to rework the RelationUnit tests in terms of Join, Watch and Settings [21:35] fwereade__: Superb [21:35] fwereade__: Btw, [21:35] fwereade__: While you're there, [21:35] niemeyer, if you happen to see a propose of that stuff land while you're still around, it would be higher-value than the hook queue for now :) [21:36] fwereade__: When looking at the tests I was vaguely considering how tricky it would be to break those tests down into independent tests, or even a table-styled test that used "scripting" rather than that long sequence of contiguous operations [21:37] niemeyer, I think that they will get a *lot* cleaner now that the functionality isn't all hidden in one magic type with freaky action-at-a-distance [21:37] fwereade__: It wouldn't be especially 
bad if we had to merge them in that long chain, because they're quite well documented (thanks) and sensible [21:38] fwereade__: Having them independent would just be a nice bonus [21:38] niemeyer, cheers :) [21:39] niemeyer, (well, action at a distance is reasonable and expected, it's the bidirectional action at a distance that makes it a little tricky to follow I think) [21:39] fwereade__: Yeah [21:40] niemeyer, btw, re the principalKey change [21:40] fwereade__: Ah, yeah? [21:41] niemeyer, the reason to have an empty principal denoting unit-is-principal in the *topology* is because topology units don't know their own keys [21:42] niemeyer, so, with *just* a topology unit, we can't immediately tell whether its principal differs from itself [21:42] fwereade__: Interestingly, it's not the only reason [21:43] fwereade__: We've debated this very recently in the context of mstate [21:43] niemeyer, ah, ok, that is the reason that made me feel it was more trouble than it was worth to change it [21:43] niemeyer, go on [21:43] fwereade__: myKey == principalKey requires a full scan [21:43] fwereade__: principalKey == "" is indexable [21:43] niemeyer, ah, yes [21:44] fwereade__: Aram had suggested the same change you suggested in the database side, and we ended up rolling back because of that [21:45] niemeyer, I looked at mstate briefly, and it didn't seem to need any changes to work nicely as is [21:45] fwereade__: For that reason, I suggested keeping principalKey in the *Unit as "" too [21:45] niemeyer, ok, that SGTM [21:45] fwereade__: Simply because it's easier to keep in mind what to expect in the principalKey field [21:45] niemeyer, yeah, very sensible [21:45] niemeyer, ok, I'll do that now :) [21:46] fwereade__: Cheers! 
[21:55] morning davecheney [21:56] ok, this one is starting to piss me off: [21:56] ---------------------------------------------------------------------- [21:56] FAIL: watcher_test.go:44: WatcherSuite.TestContentWatcher [21:56] test 0 [21:56] watcher_test.go:55: [21:56] c.Fatalf("didn't get change: %#v", test.want) [21:56] ... Error: didn't get change: watcher.ContentChange{Exists:false, Version:0, Content:""} [21:56] OOPS: 3 passed, 1 FAILED [21:56] --- FAIL: TestPackage (6.43 seconds) [21:56] FAIL [21:57] I've seen it 3 times today, and it disappears like mist in the sun when I try to look for it :/ [21:58] * niemeyer looks at the test [21:59] fwereade__: 200ms seems on the short side [22:00] niemeyer, yeah, I suspect that's it, but it is one of those ones that bothers me because it only seems to show up in a full test run [22:00] niemeyer, on its own it seems 100% reliable [22:00] fwereade__: Full test run == JVM on ZK with data, Go runtime with data, etc [22:00] fwereade__: Sum up two GCs, plus the stars in the right location.. [22:00] niemeyer, yeah, indeed [22:01] fwereade__: It's the first test, interestingly [22:01] niemeyer, I just like to imagine that if I run it alone often enough the stars should eventually align right [22:01] fwereade__: I'll have a run through the code just in case [22:02] niemeyer, before you get too deep, I'm just proposing unit-principal-key again [22:03] niemeyer, https://codereview.appspot.com/6421049 [22:03] fwereade__: Looking [22:03] niemeyer, and if you decide the ContentWatcher test just needs a timeout bump, please assume a pre-emptive LGTM on that :) [22:03] fwereade__: LGTM! 
[22:03] niemeyer, cheers [22:03] fwereade__: Thanks :) [22:05] fwereade__: We should be printing the watcher Err() with those messages [22:05] niemeyer, good point [22:11] fwereade__: Looks quite straightforwardly correct on the ContentWatch side at least [22:11] fwereade__: I was just pondering about something too [22:11] fwereade__: Do you have an SSD? [22:11] niemeyer, yeah, it seemed sane to me [22:11] niemeyer, nope :) [22:12] fwereade__: Yeah [22:12] fwereade__: That may be it [22:12] fwereade__: We'll get the "All good!" from ZK before it's actually ready to take action [22:12] niemeyer, ahhhh, I did not know that [22:12] thanks zookeeper :/ [22:12] fwereade__: The fact it's the first test seems to point in that direction [22:13] niemeyer, I have never seen it on any test other than the first [22:13] fwereade__: If zk was on the way to attending your request, the After() ticker would already be running [22:13] fwereade__: It also explains why repeating rarely catches the bug [22:14] fwereade__: At this point it's all cached [22:14] niemeyer, yeah, sounds pretty plausible to me [22:14] niemeyer, timeout it is then :) [22:14] fwereade__: I bet that if you flush disk and memory buffers, you'll be able to repeat it easily [22:16] Try this: sync; echo 3 | sudo tee /proc/sys/vm/drop_caches [22:16] fwereade__: and then run the tests again [22:19] fwereade__: Btw, do run sync first :) [22:19] fwereade__: This isn't entirely safe [22:20] niemeyer, I was reading around it and decided I could probably live with the risks [22:20] fwereade__: LOL [22:22] niemeyer, not reproed yet, but it does seem to goose the occasional ssh and store tests into failure pretty well [22:23] niemeyer, I'm comfortable bumping the timeout and seeing whether it happens again :) [22:23] fwereade__: Sounds good :) [22:30] All those "ok" in go test ./... feel so great :-) [22:30] fwereade__: Are you up for a quick review? 
[22:30] niemeyer, sure [22:30] fwereade__: You already reviewed most of it before, actually.. I'm just reviving it [22:31] niemeyer, even better ;) [22:32] lbox propose churning [22:33] Aaaaaaand [22:34] fwereade__: https://codereview.appspot.com/6423062 [22:34] * fwereade__ looks [22:45] niemeyer, LGTM [22:45] fwereade__: Cheers! [22:45] fwereade__: Will add default handling now [22:47] niemeyer: thanks for your review [22:47] davecheney: np! [22:47] davecheney: Morning! [22:48] i will have to repropose it as i've back ported from my next branch [22:48] (as it addressed some of the things you raised) [22:48] but the change is quite large [22:48] well, more than a LGTM will cover [22:53] davecheney: I'm happy for this branch to go in as-is and these changes be done in the follow up [22:54] niemeyer: i'd like to spend a little more time on it [22:54] obtained string = "error: json: unsupported type: map[int]interface {}\n" [22:54] to make sure it's a good skeleton to build upon [22:54] davecheney: Aha, ok [23:12] niemeyer: json doesn't support map[int]interface{}, keys must be strings ... :( [23:12] davecheney: Indeed [23:12] davecheney: Do we trust that? [23:12] s/trust/depend on/ [23:12] niemeyer: possibly [23:12] davecheney: We shouldn't [23:13] davecheney: The json spec itself forbids anything else but strings [23:13] davecheney: So if we're doing that, it's fine to fix it [23:13] wonderful [23:14] as long as yaml formats a string key without quotes, this will be fine [23:20] niemeyer: would you recommend I do with structs to describe the status data, or nested maps ? 
[23:21] /s/do/go [23:21] davecheney: I'd personally prefer structs, unless the dynamism of fields being present or not turns out to make it more messy than anticipated [23:22] i'm hoping ,omitempty will get us most of the way [23:22] davecheney: The struct gives a very nice descriptive view in one location of what the map is composed of [23:22] also, for test stability, structs are probably necessary [23:23] however, the names like Relation, Result are already declared in the cmd pkg's namespace [23:25] davecheney: Indeed. Might be worth having a nice prefix on all of hem [23:25] them [23:25] Dinner.. biab [23:25] kk [23:39] niemeyer: structs won't work, the yaml output expects the output of each machine to be a map [23:39] so machines => map[int]map[string]string [23:44] davecheney: Oops [23:45] davecheney: Ok [23:45] it's not the end of the world [23:46] niemeyer: some things can be structs, like Service, Unit, Relation, but Machine must be a map to match the python output [23:47] davecheney: Hmm [23:48] davecheney: It can be a map like map[int]Foo, though [23:48] niemeyer: yes [23:52] fwereade__: ping