[00:35] thumper: re last night's discussion of packaging [00:36] can you outline the concerns of the packagers? [00:36] primarily it seemed to be that we weren't following company standards [00:36] of that I have no doubt [00:37] but that isn't very actionable [00:38] i'm guessing the lack of build reproducibility is a red flag [00:38] I'm going to email people concerned to see if we can get help from people who do know the standards [00:38] awesome [00:38] one guy is even in sydney, so pairing might be good if you are interested [00:39] sure [00:39] I'll keep you in the email loop [02:14] davecheney: can I get you to merge that packaging branch of mine? then I can move the card to done [02:22] thumper: how was your foray into packaging? [02:22] bigjools: brief [02:25] thumper: i have some questions about that change [02:26] davecheney: shoot [02:26] see review [02:27] kk [02:39] davecheney: replied [02:39] * thumper is EODing early due to overly long meeting last night [02:41] * thumper will look back in later [02:41] kk [02:42] thumper: fair enough, that makes sense [03:25] Anyone else getting this build error? go/src/launchpad.net/juju-core/cmd/juju/constraints.go:66: undefined: results [03:25] Looks like a :=/= typo. [03:25] * thumper tries [03:25] jtv: which revno ? [03:26] The latest. [03:26] I'm also getting this one, which I guess is good news: [03:26] constraints_test.go:283: [03:26] // A failure here indicates that goyaml bug lp:1132537 is fixed; please [03:26] // delete this test and uncomment the flagged constraintsRoundtripTests. [03:26] +1 for being unhelpful :) [03:26] +1 for getting that [03:26] davecheney: I get it with tip [03:26] jtv: yes [03:26] confirmed, # launchpad.net/juju-core/cmd/juju [03:26] cmd/juju/constraints.go:66: undefined: results [03:26] cmd/juju/constraints.go:66: cannot assign to results [03:26] cmd/juju/constraints.go:67: undefined: results [03:26] will submit a fix [03:26] Ah thanks. [03:26] plz fix [03:26] what the fuck people [03:27] we just talked about this last night [03:27] it worked on my commit [03:27] yep, last commit added that [03:27] * davecheney sends a shittyram [03:28] gram? [03:28] jtv: perhaps he is sending it to mramm [03:28] or he is sending a pooey male sheep [03:28] thumper: that's why I asked :-P [03:29] Confucius say, man-sheep eat from curry tree, he be shitty ram for a day [03:29] sorry, type [03:29] typo [03:29] i meant shittyHAM [03:29] Oh! [03:29] Why didn't you SAY so [03:29] it's my accent [03:29] davecheney: I think the whole = / := thing is horrible [03:29] and needs a rethink [03:29] so many problems come about from it [03:30] FWIW I guess you _can_ create this kind of bug just from concurrent commits. [03:30] The shadowing is really horrible though. Makes you wonder what's so horrible about a mere unused variable. [03:30] foo, err := bar() [03:30] if err != nil { panic(err) } [03:31] if foo { [03:31] sklurg, err := szot() [03:31] Oops now I've created a second err. [03:31] jtv: I don't think you have [03:32] That's what I thought. [03:32] It'll shadow quietly. [03:32] it isn't shadowed [03:32] it is using the same err [03:32] a new sklurg is created though [03:33] hmm, inside the if though, new scope and all... [03:33] I've not tested... [03:33] I don't actually know what is happening there [03:33] perhaps davecheney does [03:33] davecheney: knows what ? [03:34] thumper: I do know _this_ happens: http://play.golang.org/p/uV__rKVATq [03:35] Woo! Test suite completed with just 1 compile error and 1 test failure.
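
(An aside on the two Go gotchas in the exchange above: the "=" vs ":=" slip that produced the "undefined: results" errors in constraints.go, and the quiet shadowing jtv probes with sklurg/err. The sketch below uses invented names (bar, szot), not the real juju-core code, but it compiles and shows both behaviours: the commented-out line is the compile error, and the inner ":=" creates a new err that shadows the outer one, which is what the later test confirms.)

    package main

    import (
        "errors"
        "fmt"
    )

    func bar() (bool, error) { return true, nil }
    func szot() (int, error) { return 0, errors.New("inner failure") }

    func main() {
        // The constraints.go class of bug: "=" where ":=" was needed.
        // With no prior declaration the compiler reports
        //   undefined: results
        //   cannot assign to results
        // results, err = bar()

        foo, err := bar()
        if err != nil {
            panic(err)
        }
        if foo {
            // In this inner scope ":=" declares a brand new err (and sklurg)
            // that shadows the outer err; nothing is assigned to the outer one.
            sklurg, err := szot()
            fmt.Println(sklurg, err) // 0 inner failure
        }
        fmt.Println(err) // <nil> -- the inner error never escaped
    }
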
[03:35] davecheney: you have a local packaging branch...? [03:35] davecheney: https://code.launchpad.net/~thumper/juju-core/package/+merge/157020 has instructions at the top [03:35] davecheney: bzr merge ... [03:35] davecheney: bzr commit... bzr push [03:35] davecheney: done [03:35] thumper: oh, ok, it's that simple [03:35] * davecheney wonders what lbox does ... [03:35] davecheney: previously was referring to jtv's comment above about errors [03:36] davecheney: lbox does that too, but slowly [03:36] and with extra checking [03:36] and many round trips [03:38] thumper: are you talking about shadowing ? [03:38] davecheney: yeah, is the err a shadow or using the one from the previous scope? [03:38] it is what it is [03:38] it can't be changed now, it's in the spec [03:41] thumper: just tried the sklurg/err thing and yes, it creates a new sklurg *and* a new err. [03:47] looks like others feel the same, https://twitter.com/GolangSucks/status/319999909866651648 [03:48] jtv: is that you @GolangSucks ? [03:59] thumper: nope [03:59] Bit strong for me... You know what I'm like. I'm trying to see the good things. :) [04:01] But I do notice a surprising similarity. "for item := range x counts x. Unless it's a channel in which case it returns values." Pretty much a literal match with something I wrote. [04:01] The unnecessary choices are something I mentioned as well. Lots of make-work. [05:00] Getting a compile error from trunk: http://paste.ubuntu.com/5678748/ [05:01] bigjools: probably got version skew with openstack [05:01] davecheney: oh I assumed a go get -u would update that? [05:01] try go get -u launchpad.net/goose/... [05:01] it might [05:02] as in, I did a get -u on juju-core [05:02] ohhh, i wouldn't do that [05:02] it'll totally screw your bzr checkout [05:02] wha? [05:03] go get -u doesn't understand cobzr branches [05:03] I don't use cobzr [05:03] fair enough [05:03] that's the least of its problems anyway, I aliased my branch command so it doesn't use trees [05:04] which breaks go get [05:04] bigjools: probably easier to skip it and just do a bzr pull [05:04] yup! [05:04] anyway - still got that compile error [05:04] after goose updated [05:05] I am on raring if that makes any difference [05:07] bigjools: pls hold [05:07] davecheney: hlding :) [05:07] and holding [05:08] bigjools: fastest way to a fix [05:09] rm -rf $GOPATH/src/launchpad.net/goose [05:09] go get launchpad.net/goose/... [05:09] or whatever method you want to use to manage your source [05:09] ok [05:09] oh did it move? [05:10] it did not move [05:10] actually [05:10] that isn't strictly true [05:10] quickest way to fix that is: [05:10] bzr pull --remember lp:goose [05:11] aaaand done. Loads of updates, crikey. [05:11] someone fucked a lot of things up when they kicked us all out of the ~gophers group [05:11] ok new errors [05:11] canonical/GO/src/launchpad.net/juju-core/cmd/juju/constraints.go:66: undefined: results [05:12] fix proposed [05:12] ah is this what you were talking about jtv? [05:13] https://codereview.appspot.com/8370047/ [05:13] yup [05:15] davecheney: given that it's blocking everyone, are you going to land it right away or do you have to wait for 2 pairs of eyes? [05:16] bigjools: that is all the encouragement I need :) [05:16] davecheney: that was the intention :) [05:16] *cough* [05:20] committed [05:21] davecheney: compiles \o/ [05:52] and the raring mongo seems to have ssl support \o/ [05:54] bigjools: where is the lp project for mongo ?
[05:54] ~mongodb looks stale [05:54] davecheney: not sure there is one - I am just using the latest package, I think jamespage packaged it [05:54] now, I am getting loads of test failures :/ [05:55] [LOG] 92.72463 ERROR api: error receiving request: read tcp 127.0.0.1:38922: use of closed network connection [05:55] all over [05:55] bigjools: id suggest that ssl doesn't work [05:56] are you on raring? [05:57] nope [06:21] davecheney: I think raring's mongo (2.2.4?) is incompatible with the current juju [06:22] it definitely has ssl support [06:24] thumper: you're on raring? [06:31] hmm I have it working on a fresh raring VM. I wonder what's up with my local env... [06:45] mornin' all! [06:46] evening [06:47] morning [07:17] error: Failed to load data for branch bzr+ssh://bazaar.launchpad.net/~dave-cheney/juju-core/111-add-go-build-check-to-lbox/: Server returned 502 and body: [07:17] [07:17] thanks for coming in today, launchpad [07:33] * fwereade_ -> shops, bbiab [09:07] rogpeppe: ping [09:07] dimitern: pong [09:07] * fwereade_ slept badly, and is more than usually stupid today, and is going to lie down for a bit [09:08] rogpeppe: hey, let me pass an idea by [09:08] dimitern: ok [09:08] rogpeppe: how about adding Id() to AgentState, so the jujud/agent can find out which entity id it's associated with and check if it's sane, and then set the password? [09:10] * rogpeppe goes to look [09:11] dimitern: i'm not sure i understand the motivation. [09:11] rogpeppe: it seems all implementations, except the upgrader are backed by a state object - machine or unit [09:11] dimitern: the code that sets the password first gets the entity [09:11] dimitern: there's already a method, Entity, to do that [09:11] rogpeppe: ah! [09:12] dimitern: if we make the Entity code return an error if something about it is bogus, then you've got what you want, no? [09:12] dimitern: that's what i was suggesting last night [09:12] rogpeppe: exactly [09:13] rogpeppe: for example how is a machine taken as an entity? [09:13] dimitern: i'm not sure i understand the question [09:14] rogpeppe: trying to find the Entity interface [09:14] dimitern: there's no Entity interface [09:14] dimitern: Entity returns an AgentState interface value [09:14] rogpeppe: ok, but where's the Id()? [09:14] rogpeppe: it's not in the AgentState [09:15] dimitern: Entity is implemented by the agent in question. [09:15] rogpeppe: and all I get back is an AgentState - should I cast it to something to get its Id? [09:15] dimitern: so we have MachineAgent.Entity and UnitAgent.Entity [09:15] dimitern: each of those knows the id [09:15] rogpeppe: ok, got you - you mean only return the entity if it's sane [09:15] dimitern: for instance, look at the implementation of MachineAgent.Entity [09:16] dimitern: yes [09:16] dimitern: there's already an error return [09:16] rogpeppe: that should work, cheers [09:16] dimitern: cool [09:26] dimitern: you might find this quite cool: i've been trying to isolate a go runtime bug that occurs when running the uniter tests against go tip. i wanted to eliminate the possibility that it was a problem with our use of unsafe and cgo (goyaml uses cgo), so i mocked up goyaml to make rpc calls instead. here's the "mock" goyaml implementation: lp:~rogpeppe/+junk/yamlserver [09:26] dimitern: it worked - i got uniter tests passing when using it, and i'm still seeing the runtime crashes. 
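
(A note on the yamlserver trick rogpeppe describes above: the suspect library is moved behind an RPC boundary, so its cgo/unsafe code runs in a separate process while callers keep much the same shape. The toy sketch below uses net/rpc with an invented service and method, and keeps client and server in one process for brevity, unlike the real lp:~rogpeppe/+junk/yamlserver; it only illustrates the mechanism.)

    package main

    import (
        "fmt"
        "net"
        "net/rpc"
    )

    // YAML is the RPC service; in the real mock its methods would call goyaml
    // in the server process, keeping cgo out of the test process entirely.
    type YAML struct{}

    func (YAML) Marshal(in map[string]string, out *string) error {
        // Stand-in for goyaml.Marshal: produce something YAML-ish.
        s := ""
        for k, v := range in {
            s += fmt.Sprintf("%s: %s\n", k, v)
        }
        *out = s
        return nil
    }

    func main() {
        srv := rpc.NewServer()
        srv.Register(YAML{})
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            panic(err)
        }
        go srv.Accept(ln)

        client, err := rpc.Dial("tcp", ln.Addr().String())
        if err != nil {
            panic(err)
        }
        var out string
        if err := client.Call("YAML.Marshal", map[string]string{"series": "precise"}, &out); err != nil {
            panic(err)
        }
        fmt.Print(out)
    }
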
[09:27] rogpeppe: cool, I'll take a look [09:28] dimitern: i think it's nice that i can swap out an entire implementation like that and almost nothing needed to change in any of the other code (i just needed to change the uniter tests to register the expected types) [09:28] rogpeppe: so it uses goyaml on the rpc server [09:29] dimitern: yeah [09:29] rogpeppe: and the crash in uniter tests is no longer there? [09:29] dimitern: no, it still crashes [09:29] dimitern: which is what i was hoping [09:29] rogpeppe: so it's not our code then [09:29] dimitern: because it means it's (almost certainly) not a problem with our code [09:29] dimitern: yeah [09:30] rogpeppe: good to hear, but any idea if it's in goyaml or a go-tip related bug? [09:30] dimitern: it's a go-tip related bug. probably the new garbage collector. [09:31] dimitern: but perhaps the new scheduler. [09:31] dimitern: http://code.google.com/p/go/issues/detail?id=5193 [09:31] rogpeppe: hmm.. it seems the closer we get to the 1.1 release, the more nasty bugs appear [09:31] dimitern: they've shoved a lot of stuff in quite recently. [09:31] dimitern: i'm very concerned too. [09:34] rogpeppe: it seems at least they're looking into it, it's not just collecting dust in the issues list [09:34] dimitern: yeah, they're concerned too [09:40] rogpeppe: well, there are a lot of smart people there, so there's hope of fixing it [09:40] dimitern: yeah. it would be nice if i could replicate the problem with less than x00,000 lines of source code :-) [09:41] rogpeppe: definitely :) [09:42] dimitern: unfortunately the problem is so intermittent, that's not easy [09:47] rogpeppe: how often can you reproduce it? [09:47] dimitern: about once in every 3 or 4 runs [09:48] dimitern: it dies in quite a few different ways [09:49] rogpeppe: bugger [09:50] dimitern: see the collection of panics in my last message on the above issue [09:50] rogpeppe: in random places of the code or at least these are often the same? [09:50] rogpeppe: ah, ok [09:54] rogpeppe: nasty.. they're in at least 3 different places [09:55] dimitern: yeah. i'm pretty sure they're all symptomatic of something else (for instance a race or other bug in the garbage collector) [09:55] dimitern: i see other problems too which aren't panics, but are symptomatic of memory corruption [09:56] rogpeppe: so you nailed it down to r16008 [09:56] (the link is broken there btw) [09:57] dimitern: i'm not sure. [09:57] dimitern: that might just be the change that made the garbage collector bug show up [09:57] dimitern: because 16008 is when the new scheduler was introduced [09:59] rogpeppe: it can be in either the new scheduler or the GC, both nasty internals [09:59] rogpeppe: oh well, we'll see how it goes [09:59] dimitern: yeah [10:08] rogpeppe: so if a machine doesn't have instid, is it correct to call EnsureDead() before returning it from the Entity()? That way it'll be detected in openState and the worker will die properly [10:09] dimitern, hell no :) [10:09] fwereade_: why?
[10:09] dimitern, we don't want the machine to die -- there'll be another agent along soon to do the right thing [10:09] dimitern, we just want the bad agent to go away and not come back :) [10:10] fwereade_: exactly [10:10] fwereade_: so we want the instance dead, not the machine [10:10] dimitern, yeah, exactly [10:10] dimitern: and the agent will die immediately anyway [10:10] rogpeppe: not really [10:10] dimitern, sorry about the terminology, it is annoying ;) [10:10] rogpeppe: it'll die (now) iff the machine is missing or dead [10:10] dimitern: because the run loop will terminate [10:11] dimitern: ah, you're right - it'll keep on trying [10:11] dimitern, rogpeppe: how would you feel about just writing a flag somewhere local saying "stopped" or something [10:11] rogpeppe: I'll have to introduce a specific ErrUnprovisioned for example, and use that to amend the cases in which the worker dies [10:11] fwereade_: i really don't care about this case [10:12] fwereade_: it's a ghost machine [10:12] fwereade_: i don't care if it carries on trying [10:12] a ghost *instance*, rather [10:12] rogpeppe: but it'll be nicer to finish off the rogue MA as early as possible in this case, isn't it? [10:12] rogpeppe, so long as we return nil from Run it'll only retry once per restart anyway I guess [10:12] rogpeppe, but wait [10:13] dimitern: i don't want to introduce more logic for this case which is actually vanishingly unlikely [10:13] rogpeppe, if we *don't* mark it stopped, then a badly-timed restart of its instance can fuck up the other machine, I think [10:13] fwereade_: how so? [10:14] fwereade_, rogpeppe: how about that specific error (ErrUnprovisioned) or something? [10:14] (returned from MA.Entity) [10:15] or maybe (nastier) return NotFoundf(), although it's clearly a lie, but it'll only affect the agent [10:15] rogpeppe, agreed vanishingly unlikely, but: (1) bad machine logs in with old password (2) provisioner sets password for good machine and starts an instance (3) bad machine changes password and starts running [10:15] rogpeppe, (4) provisioner takes down bad machine (5) good machine can't run [10:16] fwereade_: why is this at all so unlikely? [10:16] dimitern, it requires pathologically bad timing [10:16] fwereade_: given infinite time and scenarios, everything is possible :) [10:17] dimitern, exactly so [10:17] dimitern, that is why I would prefer a little tweak to the agent running [10:18] dimitern, first thing an agent should do is check its conf for a stopped flag; last thing it should do if it's returning nil from Run is to set the stopped flag [10:18] fwereade_: why would the bad machine ever get as far as changing the password? [10:18] rogpeppe: this happens first thing after running the MA, right? [10:18] rogpeppe, because the provisioner set one at the worst possible time... say during a 5-second t1.micro sleep on the bad machine [10:19] fwereade_: the bad machine doesn't care - it looks at the instance id and sees that it doesn't match its own instance [10:19] fwereade_: i don't see that there's any way it can get past that barrier [10:19] rogpeppe, how's it meant to find that out? we're planning to drop InstanceId from EnvironProvider [10:19] rogpeppe: it cannot know its own instance id, only the machine id [10:19] rogpeppe: it can check if it's "" or not [10:20] fwereade_: orly? [10:20] rogpeppe, yesrly, it's caused problems for both maas and openstack [10:20] fwereade_: so how do we deal with the "provisioner unprovisions itself" problem?
[10:20] rogpeppe, and there's no reason for it except to prevent the provisioner from screwing itself [10:21] rogpeppe, so, we special-case in the provisioner instead [10:21] fwereade_: I think the MAAS guys figured it out, and in openstack it can be done through the storage [10:21] dimitern, I know it can be done [10:21] fwereade_: how does that work? [10:21] dimitern, but there is a different shitty way of doing it in each case [10:22] rogpeppe, if there's only one machine in state, grab the instance id from provider state... done. right? [10:23] dimitern, note that the put-it-in-environ-state works for the bootstrap machine only -- it won't let us get an InstanceId on an arbitrary machine [10:23] * dimitern keeps waiting for an answer on errUnprovisioned (locally in jujud only) [10:23] fwereade_: assuming the machine doesn't come up really fast, i suppose :-) [10:24] dimitern: we're getting there :-) [10:24] :) [10:24] rogpeppe, even if it does, we fail out and get restarted in 5s [10:24] fwereade_: no, if it does, we unprovision ourselves [10:24] fwereade_: or... no, i see [10:25] rogpeppe, this would happen before we do anything else, so I think it's safe [10:25] fwereade_: i'm not sure you can guarantee it happens before we do anything else [10:26] fwereade_: the provisioner is just another API client, right? [10:26] rogpeppe, what's affected by this other than the provisioner? [10:26] fwereade_: someone external could jump in before the provisioner and add another machine [10:26] rogpeppe, ah! [10:26] rogpeppe, ok, so we special-case it to be machine 0 then ;p [10:27] fwereade_: how about HA state then? [10:27] dimitern: HA state is fine - we only have this problem at bootstrap [10:28] dimitern, HA state is orthogonal, I still plan to bootstrap one machine and add HA later [10:28] ah, ok then [10:28] dimitern, even if it's part of jujud bootstrap itself to add N more machines to state directly [10:28] fwereade_: ah, i've got an idea [10:29] fwereade_: the bootstrap init code marks machine 0 with an instance-id "pending" [10:29] fwereade_: when the provisioner comes up, if it sees machine 0 with "pending" instance id, it fetches it from the provider [10:30] fwereade_: otherwise i don't see how the machine-0 special casing works [10:31] rogpeppe, not sure about a magic value in that field, I'd kinda rather it be separate [10:32] * dimitern still doesn't quite get why we need to special-case machine-0 [10:32] dimitern: because machine-0 is the *only* place that hasn't got the state to write the new instance id to [10:32] dimitern, we can't always know machine 0's instance id before we provision it [10:32] dimitern, and we can't write it to state just after we provisioned it, because there's no state yet [10:33] fwereade_: yeah, an extra field in Machine would work as well [10:33] oh, I see now [10:33] rogpeppe, I'm wondering whether it's actually a property of the environment [10:34] fwereade_: oh, doh! [10:34] rogpeppe, Environment.BootstrapComplete() (bool, error) ?
[10:34] fwereade_: it's something that can be done by bootstrap-init [10:34] fwereade_: the provisioner doesn't need to know anything about it [10:35] rogpeppe, hmm [10:35] fwereade_: i mean bootstrap-state, of course [10:35] rogpeppe, yeah, that's right [10:35] rogpeppe, oh no wait [10:36] rogpeppe, we need the user to have connected once before we have the keys that let us find out the instance id [10:36] * dimitern goes for a run until the argument settles, bbi30m [10:36] rogpeppe, that doesn't matter now [10:36] :) [10:37] fwereade_: hmm, i suppose so. unless the provider can make it available to anyone [10:37] rogpeppe, but might it if the CLI were doing an API connection? [10:38] rogpeppe, I think it's ok to put it in the provisioner anyway -- the locality of reference would help, I think [10:38] fwereade_: the code would be much simpler if it could go in bootstrap-state, but i'd forgotten about the provisioner keys issue [10:39] rogpeppe, I know it would :( [10:39] rogpeppe, but I honestly think it's not too bad [10:39] rogpeppe, it exists to fix the provisioner [10:39] fwereade_: yeah, but the muck spreads into quite a few places [10:40] fwereade_: perhaps we change Machine.InstanceId to return, instead of a bool, one of InstanceIdNotSet, InstanceIdOk or InstanceIdPending, rather than add another bool. [10:41] fwereade_: then if the provisioner finds machine 0 with InstanceIdPending, it knows where to look [10:41] fwereade_: and it can then fill in the instance id and carry on as normal [10:42] rogpeppe, it feels quite localized to me anyway though [10:42] fwereade_: well, it affects the provisioners and the state as well as the provisioner, so not *very* localized :-) [10:43] s/provisioners/providers/ [10:43] rogpeppe, I don't think it even has to affect the providers [10:43] fwereade_: don't they have to be able to return the instance id for machine 0 ? [10:44] rogpeppe, they all have to list instances on demand, and those instances have an InstanceId method [10:44] fwereade_, rogpeppe sorry to interrupt, I'm still having problems connecting to the api https://pastebin.canonical.com/88504 [10:44] fwereade_: that doesn't help [10:44] rogpeppe, meaning that we only have to worry about a single mechanism for determining instance id [10:45] fwereade_: we need to know which of those instances corresponds to machine 0's instance [10:45] mattyw: OpenID discovery error: HTTP Response status from identity URL host is not 200. Got status 503 [10:45] rogpeppe, how are those other instances going to get started without a provisioner? :) [10:45] fwereade_: bootstrap [10:45] fwereade_: and possibly by a previous environment [10:47] fwereade_: if we could assume that we'd only ever get one instance after bootstrap, the problem would be much easier [10:47] rogpeppe, do you mean a previous environment that has not been properly shut down, or just an eventual-consistency phantom? [10:47] fwereade_: the former. [10:47] rogpeppe, screw that, user error [10:47] rogpeppe, if destroy-environment failed, run it again until it succeeds [10:48] fwereade_: but eventual consistency could make a properly shut down environment show the same symptom [10:48] aaaaaargh (facepalm) [10:50] found the failure, only had to move three loc a bit. didn't see which value differs otherwise in all that debugging output.
(hmpf) [10:52] fwereade_: the scenario that concerns me is when, somehow, the token that signifies that an environment has been bootstrapped has gone (perhaps someone removed all their buckets), so we bootstrap and find all the previous instances. [10:52] rogpeppe, to be utterly explicit, this is when the env reports one terminated machine to be running and one running machine not to exist, right? [10:53] fwereade_: no, it's when the env reports more than one running instance, only one of which is the bootstrapped instance. [10:53] fwereade_, rogpeppe https://pastebin.canonical.com/88504/ should be working again now - it's trouble connecting to the api [10:53] rogpeppe, meh, fail out when len(instances) != 1 [10:54] fwereade_: that is a possibility, yeah. we could mark the bootstrap machine with an error status in that case. [10:55] rogpeppe, not even that [10:55] rogpeppe, just try again later [10:56] fwereade_: the user needs at least some feedback, otherwise they'd have no idea why "juju deploy" isn't working. [10:56] fwereade_: or... just don't unprovision any machines in that case, i guess [10:57] rogpeppe, but wait I'm being stupid [10:57] rogpeppe, we *always* set the instance id in the provider storage anyway [10:57] fwereade_: yes [10:57] rogpeppe, provisioner, as soon as it has keys, calls ensureBootstrapComplete() [10:57] fwereade_: well, in the existing providers [10:58] fwereade_: we don't make it available though [10:58] rogpeppe, agreed that without storage we will have a hard time [10:58] fwereade_: we had plans to move away from that storage in ec2 [10:58] fwereade_: and use instance tags instead [10:58] fwereade_: ah! [10:58] rogpeppe, well, that would be an alternative solution [10:59] fwereade_: this is a good reason why generating the env UUID on the client side would be good [10:59] rogpeppe, tagging instances with uuid? yeah [10:59] fwereade_: yeah [10:59] rogpeppe, it feels to me like storage is the tool we have available now though [11:00] fwereade_: that's true. but i'd like our solution to work with tags too. [11:00] mattyw: looking [11:01] rogpeppe, something funny about `invalid entity name ""` given the `info.Tag = "user-admin"` [11:01] fwereade_: that's what i'm thinking [11:01] mattyw: is this on go trunk? [11:02] mattyw: i mean juju-core tip! [11:03] rogpeppe, no, it's 2 or 3 days old [11:03] mattyw: what revid? [11:03] rogpeppe, 1060 I think *double checks* [11:04] rogpeppe, just to make things worse, it's actually of our own branch https://code.launchpad.net/~cloud-green/juju-core/trunk so we had a stationary target to work on [11:05] mattyw: that's ok - i'll branch that and have a look [11:07] rogpeppe, anyway, for maximum paranoia we can wait until there is a running instance with an id that matches the one in storage [11:08] fwereade_: i think just st.Machine("0").SetInstanceId(environ.InstanceIdOfBootstrapMachine()) should be fine [11:09] rogpeppe, ok, sgtm :) [11:11] mattyw: ok, looks like you're on 1091. have you made any of your own changes to that branch? [11:11] mattyw: (if not, i'm not quite sure why you merged trunk rather than just pulling it) [11:12] rogpeppe, btw did the terminated-flag-in-conf idea seem to hold water? [11:12] fwereade_: oh yes, i was meaning to get back to that [11:12] fwereade_: i'm not sure it eliminates the race entirely [11:16] rogpeppe, hmm. 
"if there's an instance ID but an independent attempt to log in with my original credentials fails, it's not my instance id, so bomb out with errTerminating; otherwise go ahead and change it"? [11:17] fwereade_: sounds plausible [11:18] fwereade_: although a bit more heavyweight than i'd like [11:19] fwereade_: a simple nonce might be better [11:20] rogpeppe, chosen by provisioner, set on machine, passed through StartInstance? [11:20] fwereade_: yup [11:20] rogpeppe, nice [11:21] fwereade_: perhaps have it returned by Machine.SetInstanceId [11:21] fwereade_: which would increment it [11:21] rogpeppe, we need to pick a nonce before we start the machine [11:21] fwereade_: doh! [11:23] fwereade_: just use a random number. [11:25] fwereade_: if we get the provisioner to pick a random number n when it starts, then n.seqno actually provides useful information about which provisioner (in an HA world) provisioned the machine. [11:27] fwereade_: and it's a nice consistent link between new machine and the state. i *think* i like it. [11:27] s/new machine/new instance/ of course [11:28] rogpeppe, I don't follow the "seqno" bit, but if you mean $(provisioner-id)-$(random-number) I'm +1 [11:28] fwereade_: hmm, mind you, the initial password is a pretty good link too [11:29] rogpeppe, a prefix on the random password? that feels a bit off to me [11:29] fwereade_: i mean $(random-provisioner-id)-$(sequentially-incrementing-number) [11:29] fwereade_: no, i mean that if you manage to log in with the initial password, you've a pretty good idea that you're talking to the right state. [11:32] rogpeppe, damn lunch [11:32] fwereade_: enjoy [11:33] * dimitern is back [11:33] rogpeppe, no changes, the only reason I haven't pulled is my inexperience with bzr [11:34] mattyw: sorry, i'll take another look in a mo [11:36] dimitern: there's already a way of telling the run loop to terminate [11:36] dimitern: see isFatal in agent.go [11:37] dimitern: if you return errUnprovisioned and add errUnprovisioned to the isFatal cases, it should all just work. [11:37] dimitern: ha! [11:38] dimitern: in fact, just return &fatalError{"machine was not provisioned"} [11:38] * rogpeppe feels a little bit smug [11:39] mattyw: are you bootstrapping with the same version as you're compiling your client against? [11:39] just finished reading through the log [11:40] rogpeppe: sgtm, nicer [11:41] rogpeppe: unfortunately it doesn't work [11:42] dimitern: oh? [11:42] rogpeppe: isFatal is checked after we're in runLoop [11:43] dimitern: and? [11:43] rogpeppe: so it won't solve the issue about messing up the password by a bad machine [11:43] dimitern: the password is changed inside runOnce [11:43] dimitern: erm... i think :-) [11:44] dimitern: yeah [11:44] rogpeppe: istm it's in openState:SetMongoPassword [11:44] dimitern: yup [11:44] dimitern: and what's openState called by? [11:44] rogpeppe: by RunLoop's callback in runagentloop [11:45] dimitern: which is called? [11:45] rogpeppe: so it's too late then [11:45] dimitern: i don't think so [11:45] dimitern: runOnce calls openState [11:45] rogpeppe: why not just have Entity return a fatalError instead and add it to the conditions? [11:46] dimitern: what conditions? [11:46] if state.IsNotFound(err) || err == nil && entity.Life() == state.Dead { err = worker.ErrDead } [11:47] in openState [11:47] make that || isFatal(err) || ... 
and all done [11:47] dimitern: there's no point in doing that [11:48] dimitern: if the err is not nil, it returns in [11:48] s/in/it [11:48] dimitern: on the next line [11:48] rogpeppe: that's after [11:48] rogpeppe: yeah, the first if is what i'm aiming for [11:48] dimitern: you want openState to return ErrDead? [11:49] rogpeppe: it already does, doesn't it? [11:49] dimitern: yeah, or a fatalError if Entity returns that [11:49] rogpeppe: ok, it seems I'm not explaining myself well enough [11:50] rogpeppe: err = worker.ErrDead is what I want to be the case, just like the other conditions, where the machine is missing or dead [11:50] rogpeppe: this causes the next if to return ErrDead [11:50] rogpeppe: and kills the worker before setting the password or anything bad happens [11:50] dimitern: even if you return a fatalError, the worker will be killed before that [11:51] dimitern: but... [11:51] dimitern: there's something to be said for returning ErrDead, because then the agent exits and never restarts [11:51] rogpeppe: yeah, actually.. but the test fails [11:52] rogpeppe: look at this http://paste.ubuntu.com/5679464/ [11:52] dimitern: are you saying you're returning a *fatalError from Entity and the agent isn't exiting? [11:52] rogpeppe: this test succeeds when I used errUnprovisioned and checked for it in the first if, with fatalError it fails, saying it's still running [11:52] dimitern: could you paste your Entity implementation, please? [11:54] rogpeppe: http://paste.ubuntu.com/5679472/ [11:55] rogpeppe: and now I'm getting "timed out waiting for agent to finish; stop error: " in the assert after runAgentTimeout [11:55] rogpeppe, no, I'm not bootstrapping using the same version [11:55] dimitern: hmm, that's weird. [11:56] dimitern: it *should* work. could you push your branch? [11:56] rogpeppe: ok, just a sec [11:56] mattyw: are you bootstrapping with a newer version, by any chance? [11:56] mattyw: the API has recently changed in a backwardly incompatible way [11:57] rogpeppe, I don't think so. My juju command is 1.9.12-precise-amd64 [11:57] mattyw: in general, while we're developing stuff fast, you need to run exactly the same version of client and server. [11:57] rogpeppe, ok. I'll try that [11:57] mattyw: you can use juju bootstrap --upload-tools [11:58] mattyw: but currently any charms you deploy have to be from the same series as your local machine [11:58] rogpeppe, I forget, I've had a lot of problems trying to bootstrap at all recently [11:58] mattyw: every time you see a problem, add a bug! [11:58] rogpeppe, they're the same series [11:58] rogpeppe, I do :) [11:59] rogpeppe: lp:~dimitern/+junk/machine-agent-status [11:59] mattyw: (or add to an existing bug if it seems the same) [11:59] dimitern: ta [12:00] rogpeppe, is there a version of the repo you'd recommend running against at the moment? [12:02] mattyw: i'd try and keep in sync with tip, though it's a pain [12:03] rogpeppe: hmm.. for some reason the old change got pushed, not the one with fatalError [12:03] dimitern: perhaps you didn't commit [12:03] rogpeppe: ouch :( haven't saved it in emacs [12:03] rogpeppe: sorry, let me try again [12:04] dimitern: ah, that might have something to do with it! [12:05] rogpeppe: now it works with fatalError, sorry for the noise [12:05] dimitern: np. i was a bit confused :-) [12:05] dimitern: but... [12:05] rogpeppe: the only change is I needed to change the assert to check for the error after runAgentTimeout [12:05] dimitern: having said that...
[12:06] dimitern: i'm wondering if you should return ErrDead anyway [12:06] rogpeppe: maybe upstart will mess it up if it's not ErrDead and restart it? [12:06] dimitern: exactly [12:07] rogpeppe, mattyw: FWIW I landed --fake-series yesterday [12:07] rogpeppe, mattyw: it seemed to work for me [12:07] dimitern: in fact, that's precisely the behaviour we *want* when upgrading [12:07] fwereade_: \o/ [12:07] rogpeppe: ok, ErrDead it is then - directly from the Entity() [12:07] rogpeppe, mattyw: `juju bootstrap --upload-tools --fake-series precise,quantal` [12:07] rogpeppe: or fatalError + isFatal() in the if -> ErrDead? [12:08] dimitern: i think so. although it's kind of a lie. [12:08] rogpeppe, I'm -1 on reusing ErrDead [12:08] dimitern: nononono [12:08] rogpeppe, it means something specific [12:08] rogpeppe, ok thanks, what does --upload-tools actually do? [12:08] fwereade_: yes [12:08] so we're back on errUnprovisioned :) [12:09] fwereade_: i think we're just using ErrDead as a convenience [12:09] fwereade_: i think we can change all instances of ErrDead to ErrDeadOrInvalid [12:09] rogpeppe, dimitern: what's wrong with errTerminating? the top-level Run can set the conf to terminated and return nil; other conditions that indicate we should stop, like ErrDead, can do the same [12:10] rogpeppe, call it ErrTerminateAgent and I'm fine [12:10] fwereade_: what do you mean by "set the conf to terminated"? [12:10] * dimitern and there goes another 20m for 2 lines of code ;) [12:11] rogpeppe, I mean, set a flag meaning "don't even try" that we can check on startup [12:11] fwereade_: why bother? [12:11] I'm fine to call it errTerminated instead of errUnprovisioned, although the former seems not quite right - do you intend to reuse it for something else? [12:12] dimitern: i think i'd have type terminalError {err string} [12:12] dimitern: then the message printed by the agent when exiting can be accurate [12:12] rogpeppe: lethalCase{} ? :D [12:12] dimitern, I'm saying that Run at the top level should have a single error that it treats as "return nil, causing upstart to stop bothering" [12:13] fwereade_: agreed. currently that's just ErrDead. [12:13] fwereade_: we don't even have to get to run [12:13] rogpeppe, maybe we don't, let me reload state [12:13] fwereade_: but we'd change that to check for *terminalError [12:14] fwereade_: ah! it does need to check for ErrDead too [12:15] fwereade_: because that's what the workers return if the machine is dead. [12:15] rogpeppe, yeah, exactly [12:15] fwereade_: but tbh why not just reuse it? [12:15] fwereade_: we're adding more code for a marginal case [12:15] rogpeppe, I'm fine having the workers who need to return ErrTerminateAgent, I'm not so fine having an unprovisioned machine return ErrDead [12:16] fwereade_: the machine might as well be dead as far as the agent is concerned [12:16] fwereade_: ok, i'd be happy with that [12:16] * rogpeppe has lunch [12:16] rogpeppe, cool, cheers [12:18] fwereade_: how about having an isDeadOrTerminated(err) checking for both ErrDead and ErrTerminated, and using it in place of err == ErrDead checks?
[12:20] dimitern, I'm just saying s/ErrDead/ErrTerminateAgent/ throughout [12:21] fwereade_: actually, I can just add ErrTerminateAgent and return in from Entity, make it work like ErrDead wrt upstart and draw the line there, changing all the workers can be a follow-up [12:21] dimitern, forget ErrDead [12:21] dimitern, a single magic value meaning "stop and don't come back" is all I'm really thinking about here [12:21] dimitern, ErrDead exists, and is perfect, except it has the wrong name [12:21] fwereade_: i'm not comfortable changing something without knowing the repercussions [12:22] dimitern, fix the name and we're fine :) [12:22] fwereade_: what is ErrDead used for - in all cases? [12:22] (for workers) [12:22] dimitern, it is used, in all cases, to indicate that the running agent should exit 0 [12:23] fwereade_: well, if it's only that, the renaming makes sens [12:23] sense [12:24] dimitern, cool [12:24] sorry, I definitely didn't communicate correctly [12:24] fwereade_: I'll do it as a prereq though, since it's trivial [12:25] dimitern, sgtm -- but just to sync up, can we have a quick hangout? [12:28] rogpeppe, that seems to be working now thanks :) [12:29] mattyw: cool. sorry about that - it's been a problem for loads of people, because we're changing things in a backwardly incompatible way all the time without changing the major version number [12:29] mattyw: but that's dev versions for ya! [12:36] rogpeppe, one quick odd question [12:36] mattyw: ok [12:37] when a unit gets removed we get a unitinfo back with all the status about that unit? Is the plan to stay like that? The reason I ask is that in the pyjuju 'api' when a unit is removed we only get told the unit name of the removed unit [12:38] mattyw: yeah, it should stay like that. [12:39] mattyw: although it's kind of an unintended consequence of the way things are implemented [12:39] rogpeppe, ok thanks, some of my favourite consequences are the unintended ones [12:40] rogpeppe, once the api ships I guess any change we'll get told about? [12:40] mattyw: yup [12:40] rogpeppe, that's fine for me. When vds and I have a few more tests It would be good if you could take a look, will probably be next week though === jam1 is now known as jam [12:42] mattyw: sure === BradCrittenden is now known as bac [13:44] fwereade_: do you know the PPA/name of the SSL-capable mongod package, by any chance? [13:45] rogpeppe, sorry, I'm afraid not -- I always just grabbed it from the bucket and stuck in on $PATH [13:46] fwereade_: yeah, i did too. 
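
(Before the ErrDead -> ErrTerminateAgent branch below, a rough sketch of the pattern the discussion converges on: one sentinel error that the agent's run loop turns into a clean exit, exit status 0, so upstart stops respawning the agent, while any other error stays retryable. The names errTerminateAgent, isFatal and runLoop are illustrative only, not the actual jujud code.)

    package main

    import (
        "errors"
        "fmt"
    )

    // errTerminateAgent means "exit cleanly and never come back" -- the one
    // sentinel the run loop treats as fatal, whichever worker returned it.
    var errTerminateAgent = errors.New("agent should be terminated")

    func isFatal(err error) bool { return err == errTerminateAgent }

    // runLoop retries transient failures, but a fatal error becomes a nil
    // return, so the process exits 0 and the init system leaves it alone.
    func runLoop(work func() error) error {
        for {
            err := work()
            if err == nil {
                return nil
            }
            if isFatal(err) {
                fmt.Println("terminating agent:", err)
                return nil
            }
            fmt.Println("transient error, restarting worker:", err)
        }
    }

    func main() {
        attempts := 0
        runLoop(func() error {
            attempts++
            if attempts < 3 {
                return errors.New("machine not yet provisioned")
            }
            return errTerminateAgent
        })
    }
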
i'm trying to put some instructions together so that people can try to reproduce the go-runtime errors i've been seeing [13:46] fwereade_: and i seemed to remember someone had packaged mongod [13:46] fwereade_: i guess i'll just say "grab the bucket contents" [13:46] niemeyer: hiya [13:49] fwereade_, rogpeppe: ErrDead -> ErrTerminateAgent https://codereview.appspot.com/8416043 [14:00] dimitern, LGTM trivial [14:01] fwereade_: cheers [14:07] niemeyer: https://codereview.appspot.com/8367044 (not that it helped, in fact, but worth doing anyway) [14:40] rogpeppe, you're possibly looking for https://launchpad.net/~julian-edwards/+archive/mongodb/+packages [14:40] teknico: thanks [14:40] teknico: though i've just gone with directing people to the public bucket for now [14:40] teknico: (with an accompanying sha1 sum) [14:55] rogpeppe: Rob spent a good while last night looking at the issue [14:55] rogpeppe: I've helped them setting up a juju environment so they could reproduce the issue [14:56] rogpeppe: He wrote down notes by the end of his day, and I suppose Dmitry moved it on [15:03] niemeyer: thanks for that [15:05] rogpeppe: np, let's hope good stuff comes out of it [15:05] * rogpeppe crosses his fingers [15:07] cloudinit tests are a proper pain to change [15:07] dimitern: if you can think of a better way... [15:07] dimitern: they're better than they used to be [15:08] dimitern: a little script could probably help actually [15:09] :) i'm not sticking my finger in there any more than I need to for now [15:16] mgz: hey mr on-call :-) a quick trivial branch for you (code move only) https://codereview.appspot.com/8422043 [15:17] looking [15:18] mega over usage :) [15:18] are file-level comments a thing in go? [15:18] mgz: well, yeah [15:18] mgz: yes, we can do that though we tend not to [15:18] like, triple quote at top of module saying roughly what it's about? [15:19] or is the pre-function /// [15:19] mgz: usually the type declaration at the top gives a good idea [15:19] // funcName [15:19] the only special magic that the doc generation picks up on? [15:19] mgz: file-level comments are ignored by the doc tools [15:19] thanks [15:20] mgz: that and comments just before the package decl [15:20] mgz: the doc tool picks up comments attached directly to top level declarations, essentially [15:21] lgtm'd [15:22] mgz: ta! [15:22] dimitern: ^if you have a moment and if rog wants two reviews [15:23] mgz: i'm not sure that i can say much in a file-level comment that the type decl and doc comment at the top don't already say [15:24] mgz: i think i'll leave it as is if that's ok [15:24] that's a fair enough answer for one-struct files I think [15:25] which this isn't quite, but nearly :) [15:26] mgz: oh, the reviewer showed up ;) [15:27] I was expecting pokes [15:27] let me read through the log [15:27] browsing randomly for stuff to do is painful with rietveld... [15:27] dimitern: just click the codereview link, it's pretty clear [15:28] mgz: it is, but fortunately fwereade is keeping +activereviews neat now [15:29] looking, but it's submitted already [15:29] that answers clause #2 then :) [15:29] what's clause #2? [15:30] if rog wanted another review [15:31] mgz, dimitern: here's another fairly trivial one: https://codereview.appspot.com/8294044/ [15:32] rogpeppe: just opened it while you're writing :) [15:32] gaba? [15:32] leading underscore? [15:33] mgz: python-style :-) [15:33] so, as a hint internal to the package past the caps-vs-nocaps exposure? 
[15:33] mgz: yeah [15:34] I guess that seems reasonable [15:34] fwereade: what do you think? [15:34] rogpeppe: why do you need this? [15:34] rogpeppe, that sounds sensible to me [15:35] dimitern, because package-level-only encapsulation is scary as hell once you're past... say... 1 file per package ;) [15:35] dimitern: because i'm essentially exposing the allInfo type as an API to the backing type. some methods are not supposed to be called by the backing. [15:35] rogpeppe: sure, but what will this solve? [15:36] dimitern: it just makes it easy to verify that the backing is only calling expected methods [15:37] dimitern: and if you call a _method from outside the type itself, you'll probably think twice [15:37] rogpeppe: how about putting really internal stuff in a subpackage? [15:39] dimitern: i'm not sure. it seems very intertwined with the allWatcher implementation [15:39] dimitern: and that needs to be internal to state, i *think*. [15:40] rogpeppe: this *could* indicate code smell, but otherwise I'm fine with it [15:44] dimitern: actually, i think you're perhaps right - the allWatcher can trivially be moved into its own package. [15:44] dimitern: while the backing implementation remains inside state [15:44] rogpeppe: sgtm [16:04] fwereade, rogpeppe: machine nonce introduced https://codereview.appspot.com/8247045/ [16:04] dimitern: cool; will look soon. [16:04] dimitern: actually, i'm not sure, peer reviews need doing by the end of day [16:05] dimitern: and i haven't done any yet [16:05] rogpeppe: what do you mean? [16:05] dimitern: the allhands stuff [16:06] rogpeppe: ah, yeah - I've done mine on the day [16:06] dimitern: v sensible! [16:06] rogpeppe: I know myself how forgetful I can get, so I try to cheat and do it right away, so I can forget it comfortably :) [16:07] dimitern: that's not cheating, that's the right thing to do! [16:08] rogpeppe: yeah, i meant cheating the laziness :) === deryck is now known as deryck[lunch] [16:53] TheMue, reviewed [16:58] dimitern, why is MachineNonce required in Conf? [16:58] fwereade: that's what I asked earlier - should it be or not [16:59] fwereade: but I think it's better to be required and add tests for it [16:59] next one coming - state support for nonce provisioning - https://codereview.appspot.com/8429044/ [16:59] dimitern, sorry, I thought you meant in state [17:00] dimitern, but it shouldn't be required for agent.Conf -- that has to work for unit agents too [17:00] fwereade: oops, I mean *environs* [17:00] dimitern, cool [17:00] fwereade: so no tests then? only in cloudinit perhaps? [17:01] dimitern, in cloudinit sounds good [17:01] fwereade: this will simplify things a lot [17:01] dimitern, and you should at least check that a Conf can be read/written both with and without a MachineNonce [17:01] fwereade: wasn't quite sure it'll be enough, but it's better to cut than to add ;) [17:02] fwereade: you mean add a no-error test with machinenonce in? [17:02] fwereade: and one without [17:03] dimitern, yeah, so really leave the rest and just add a single test that includes it I think [17:04] fwereade: deal [17:04] fwereade: i'm starting on state now [17:08] dimitern: allwatcher now factored into its own package. the tests weren't entirely trivial, but altogether fairly painless [17:09] rogpeppe: great to hear! [17:11] rogpeppe: You mentioned yesterday that you reproduced the bug on socket.Acquire even after the fix was made, right?
[17:12] niemeyer: yes [17:12] rogpeppe: Okay, cool [17:12] rogpeppe: Just want to make sure I'm not on crac [17:12] niemeyer: all the panics in my last post on that issue were with that fix made [17:12] k [17:12] niemeyer: yeah, i've spent a fair amount of time trying to convince myself of the same thing [17:12] rogpeppe: Oh, might be worth mentioning it in that thread [17:15] rogpeppe: Were the tracebacks made with the rpc trick too? [17:16] niemeyer: yes [17:16] niemeyer: (i did mention that actually) [17:16] rogpeppe: Yeah, it was Rob that seemed unaware in the thread [17:21] dimitern, reviewed [17:26] fwereade: tyvm [17:37] fwereade: FYI, juju publish will be up for review in a few [17:37] niemeyer, awesome, sorry I've been lagging a bit on your reviews [17:37] fwereade: It's fine.. I have all reviewed right now [17:37] fwereade: Just have to submit [17:37] fwereade: And the main piece is coming up in a moment [17:37] niemeyer, ah, fantastic :) [17:38] fwereade: I still want to add a few features, but I want to get the core in before that [17:38] niemeyer, +1 :) [17:39] fwereade: why should CheckProvisioned(nonce) return an err ever? m.doc.Nonce == nonce should be enough, I think [17:39] dimitern, ha, yes, I'm on crack [17:39] fwereade: and because it can't ever be changed, it'll either be that or empty [17:40] fwereade: so, perhaps m.doc.Nonce == nonce && nonce != "" [17:40] dimitern, but do check that InstanceId != "" as well, or make sure there's no way for it not to be (did we agree to panic on SetProvisioned with "" id?) [17:40] fwereade: yeah, already taken care of in SetProvisioned [17:41] dimitern, but you're absolutely right that we don't need to hit the network for that, cheers [17:41] fwereade: notTheSame := D{{"instanceid", D{{"$nin", []InstanceId{"", id}}}}} [17:41] fwereade: that's the extra assert in SetProvisioned [17:42] fwereade: and it has a corresponding bunch of errAborted handling (i'm proud :) ) [17:42] dimitern, I look forward to seeing it :) [17:42] although, everybody, I'm off for a bit now [17:42] I'll probably be back at some stage, I'm up to my elbows in environs again, but I'm out of go-juice ATM [17:43] happy weekends to those I miss [17:43] fwereade: you too! === deryck[lunch] is now known as deryck [17:47] it's that time of day. happy weekend to all and sundry [17:48] fwereade, rogpeppe: have a nice weekend [17:48] fwereade: will propose the latest changes later [17:48] rogpeppe, TheMue: both of you too ;) [17:49] dimitern: enjoy the spare time to recreate [17:58] TheMue: oh, i do, especially lately [19:28] so, final propose is in, time for the weekend. bye
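
(For reference, the CheckProvisioned exchange near the end boils down to comparing the stored nonce the provisioner chose when starting the instance, with the empty nonce meaning "never provisioned", and no network round trip. A minimal sketch with invented types and an invented nonce format follows; the real state.Machine is mongo-backed and also asserts on the instance id, per the $nin snippet above.)

    package main

    import "fmt"

    type machineDoc struct {
        InstanceId string
        Nonce      string
    }

    type Machine struct {
        doc machineDoc
    }

    // CheckProvisioned reports whether the machine was provisioned with the
    // given nonce; an empty nonce never matches.
    func (m *Machine) CheckProvisioned(nonce string) bool {
        return nonce != "" && m.doc.Nonce == nonce
    }

    func main() {
        m := &Machine{doc: machineDoc{InstanceId: "i-0123", Nonce: "machine-0:cafe"}}
        fmt.Println(m.CheckProvisioned("machine-0:cafe")) // true
        fmt.Println(m.CheckProvisioned(""))               // false
    }
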