[00:35] thumper: re last night's discussion of packaging [00:36] can you outline the concerns of the packagers? [00:36] primarily it seemed to be that we weren't following company standards [00:36] of that I have no doubt [00:37] but that isn't very actionable [00:38] i'm guessing the lack of build reproducibility is a red flag [00:38] I'm going to email people concerned to see if we can get help from people who do know the standards [00:38] awesome [00:38] one guy is even in sydney, so pairing might be good if you are interested [00:39] sure [00:39] I'll keep you in the email loop [02:14] davecheney: can I get you to merge that packaging branch of mine? then I can move the card to done [02:22] thumper: how was your foray into packaging? [02:22] bigjools: brief [02:25] thumper: i have some questions about that change [02:26] davecheney: shoot [02:26] see review [02:27] kk [02:39] davecheney: replied [02:39] * thumper is EODing early due to overly long meeting last night [02:41] * thumper will look back in later [02:41] kk [02:42] thumper: fair enough, that makes sense [03:25] Anyone else getting this build error? go/src/launchpad.net/juju-core/cmd/juju/constraints.go:66: undefined: results [03:25] Looks like a :=/= typo. [03:25] * thumper tries [03:25] jtv: which revno ? [03:26] The latest. [03:26] I'm also getting this one, which I guess is good news: [03:26] constraints_test.go:283: [03:26] // A failure here indicates that goyaml bug lp:1132537 is fixed; please [03:26] // delete this test and uncomment the flagged constraintsRoundtripTests. [03:26] +1 for being unhelpful :) [03:26] +1 for getting that [03:26] davecheney: I get it with tip [03:26] jtv: yes [03:26] confirmed, # launchpad.net/juju-core/cmd/juju [03:26] cmd/juju/constraints.go:66: undefined: results [03:26] cmd/juju/constraints.go:66: cannot assign to results [03:26] cmd/juju/constraints.go:67: undefined: results [03:26] will submit a fix [03:26] Ah thanks. [03:26] plz fix [03:26] what the fuck people [03:27] we just talked about this last night [03:27] it worked on my commit [03:27] yep, last commit added that [03:27] * davecheney sends a shittyram [03:28] gram? [03:28] jtv: perhaps he is sending it to mramm [03:28] or he is sending a pooey male sheep [03:28] thumper: that's why I asked :-P [03:29] Confucius say, man-sheep eat from curry tree, he be shitty ram for a day [03:29] sorry, type [03:29] typo [03:29] i meant shittyHAM [03:29] Oh! [03:29] Why didn't you SAY so [03:29] it's my accent [03:29] davecheney: I think the whole = / := thing is horrible [03:29] and needs a rethink [03:29] so many problems come about from it [03:30] FWIW I guess you _can_ create this kind of bug just from concurrent commits. [03:30] The shadowing is really horrible though. Makes you wonder what's so horrible about a mere unused variable. [03:30] foo, err := bar() [03:30] if err != nil { panic(err) } [03:31] if foo { [03:31] sklurg, err := szot() [03:31] Oops now I've created a second err. [03:31] jtv: I don't think you have [03:32] That's what I thought. [03:32] It'll shadow quietly. [03:32] it isn't shadowed [03:32] it is using the same err [03:32] a new sklurg is created though [03:33] hmm, inside the if though, new scope and all... [03:33] I've not tested... [03:33] I don't actually know what is happening there [03:33] perhaps davecheney does [03:33] davecheney: knows what ? [03:34] thumper: I do know _this_ happens: http://play.golang.org/p/uV__rKVATq [03:35] Woo! Test suite completed with just 1 compile error and 1 test failure.
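
(An aside on the two Go gotchas in the exchange above: the "=" vs ":=" slip that produced the "undefined: results" errors in constraints.go, and the quiet shadowing jtv probes with sklurg/err. The sketch below uses invented names (bar, szot), not the real juju-core code, but it compiles and shows both behaviours: the commented-out line is the compile error, and the inner ":=" creates a new err that shadows the outer one, which is what the later test confirms.)

    package main

    import (
        "errors"
        "fmt"
    )

    func bar() (bool, error) { return true, nil }
    func szot() (int, error) { return 0, errors.New("inner failure") }

    func main() {
        // The constraints.go class of bug: "=" where ":=" was needed.
        // With no prior declaration the compiler reports
        //   undefined: results
        //   cannot assign to results
        // results, err = bar()

        foo, err := bar()
        if err != nil {
            panic(err)
        }
        if foo {
            // In this inner scope ":=" declares a brand new err (and sklurg)
            // that shadows the outer err; nothing is assigned to the outer one.
            sklurg, err := szot()
            fmt.Println(sklurg, err) // 0 inner failure
        }
        fmt.Println(err) // <nil> -- the inner error never escaped
    }
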
[03:35] davecheney: you have a local packaging branch...? [03:35] davecheney: https://code.launchpad.net/~thumper/juju-core/package/+merge/157020 has instructions at the top [03:35] davecheney: bzr merge ... [03:35] davecheney: bzr commit... bzr push [03:35] davecheney: done [03:35] thumper: oh, ok, it's that simple [03:35] * davecheney wonders what lbox does ... [03:35] davecheney: previously was referring to jtv's comment above about errors [03:36] davecheney: lbox does that too, but slowly [03:36] and with extra checking [03:36] and many round trips [03:38] thumper: are you talking about shadowing ? [03:38] davecheney: yeah, is the err a shadow or using the one from the previous scope? [03:38] it is what it is [03:38] it can't be changed now, it's in the spec [03:41] thumper: just tried the sklurg/err thing and yes, it creates a new sklurg *and* a new err. [03:47] looks like others feel the same, https://twitter.com/GolangSucks/status/319999909866651648 [03:48] jtv: is that you @GolangSucks ? [03:59] thumper: nope [03:59] Bit strong for me... You know what I'm like. I'm trying to see the good things. :) [04:01] But I do notice a surprising similarity. "for item := range x counts x. Unless it's a channel in which case it returns values." Pretty much a literal match with something I wrote. [04:01] The unnecessary choices are something I mentioned as well. Lots of make-work. [05:00] Getting a compile error from trunk: http://paste.ubuntu.com/5678748/ [05:01] bigjools: probably got version skew with openstack [05:01] davecheney: oh I assumed a go get -u would update that? [05:01] try go get -u launchpad.net/goose/... [05:01] it might [05:02] as in, I did a get -u on juju-core [05:02] ohhh, i wouldn't do that [05:02] it'll totally screw your bzr checkout [05:02] wha? [05:03] go get -u doesn't understand cobzr branches [05:03] I don't use cobzr [05:03] fair enough [05:03] that's the least of its problems anyway, I aliased my branch command so it doesn't use trees [05:04] which breaks go get [05:04] bigjools: probably easier to skip it and just do a bzr pull [05:04] yup! [05:04] anyway - still got that compile error [05:04] after goose updated [05:05] I am on raring if that makes any difference [05:07] bigjools: pls hold [05:07] davecheney: hlding :) [05:07] and holding [05:08] bigjools: fastest way to a fix [05:09] rm -rf $GOPATH/src/launchpad.net/goose [05:09] go get launchpad.net/goose/... [05:09] or whatever method you want to use to manage your source [05:09] ok [05:09] oh did it move? [05:10] it did not move [05:10] actually [05:10] that isn't strictly true [05:10] quickest way to fix that is: [05:10] bzr pull --remember lp:goose [05:11] aaaand done. Loads of updates, crikey. [05:11] someone fucked a lot of things up when they kicked us all out of the ~gophers group [05:11] ok new errors [05:11] canonical/GO/src/launchpad.net/juju-core/cmd/juju/constraints.go:66: undefined: results [05:12] fix proposed [05:12] ah is this what you were talking about jtv? [05:13] https://codereview.appspot.com/8370047/ [05:13] yup [05:15] davecheney: given that it's blocking everyone, are you going to land it right away or do you have to wait for 2 pairs of eyes? [05:16] bigjools: that is all the encouragement I need :) [05:16] davecheney: that was the intention :) [05:16] *cough* [05:20] committed [05:21] davecheney: compiles \o/ [05:52] and the raring mongo seems to have ssl support \o/ [05:54] bigjools: where is the lp project for mongo ?
[05:54] ~mongodb looks stale [05:54] davecheney: not sure there is one - I am just using the latest package, I think jamespage packaged it [05:54] now, I am getting loads of test failures :/ [05:55] [LOG] 92.72463 ERROR api: error receiving request: read tcp 127.0.0.1:38922: use of closed network connection [05:55] all over [05:55] bigjools: id suggest that ssl doesn't work [05:56] are you on raring? [05:57] nope [06:21] davecheney: I think raring's mongo (2.2.4?) is incompatible with the current juju [06:22] it definitely has ssl support [06:24] thumper: you're on raring? [06:31] hmm I have it working on a fresh raring VM. I wonder what's up with my local env... [06:45] mornin' all! [06:46] evening [06:47] morning [07:17] error: Failed to load data for branch bzr+ssh://bazaar.launchpad.net/~dave-cheney/juju-core/111-add-go-build-check-to-lbox/: Server returned 502 and body: [07:17] [07:17] thanks for coming in today, launchpad [07:33] * fwereade_ -> shops, bbiab [09:07] rogpeppe: ping [09:07] dimitern: pong [09:07] * fwereade_ slept badly, and is more than usually stupid today, and is going to lie down for a bit [09:08] rogpeppe: hey, let me pass an idea by [09:08] dimitern: ok [09:08] rogpeppe: how about adding Id() to AgentState, so the jujud/agent can find out which entity id it's associated with and check if it's sane, and then set the password? [09:10] * rogpeppe goes to look [09:11] dimitern: i'm not sure i understand the motivation. [09:11] rogpeppe: it seems all implementations, except the upgrader are backed by a state object - machine or unit [09:11] dimitern: the code that sets the password first gets the entity [09:11] dimitern: there's already a method, Entity, to do that [09:11] rogpeppe: ah! [09:12] dimitern: if we make the Entity code return an error if something about it is bogus, then you've got what you want, no? [09:12] dimitern: that's what i was suggesting last night [09:12] rogpeppe: exactly [09:13] rogpeppe: for example how is a machine taken as an entity? [09:13] dimitern: i'm not sure i understand the question [09:14] rogpeppe: trying to find the Entity interface [09:14] dimitern: there's no Entity interface [09:14] dimitern: Entity returns an AgentState interface value [09:14] rogpeppe: ok, but where's the Id()? [09:14] rogpeppe: it's not in the AgentState [09:15] dimitern: Entity is implemented by the agent in question. [09:15] rogpeppe: and all I get back is an AgentState - should I cast it to something to get its Id? [09:15] dimitern: so we have MachineAgent.Entity and UnitAgent.Entity [09:15] dimitern: each of those knows the id [09:15] rogpeppe: ok, got you - you mean only return the entity if it's sane [09:15] dimitern: for instance, look at the implementation of MachineAgent.Entity [09:16] dimitern: yes [09:16] dimitern: there's already an error return [09:16] rogpeppe: that should work, cheers [09:16] dimitern: cool [09:26] dimitern: you might find this quite cool: i've been trying to isolate a go runtime bug that occurs when running the uniter tests against go tip. i wanted to eliminate the possibility that it was a problem with our use of unsafe and cgo (goyaml uses cgo), so i mocked up goyaml to make rpc calls instead. here's the "mock" goyaml implementation: lp:~rogpeppe/+junk/yamlserver [09:26] dimitern: it worked - i got uniter tests passing when using it, and i'm still seeing the runtime crashes. 
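
(A note on the yamlserver trick rogpeppe describes above: the suspect library is moved behind an RPC boundary, so its cgo/unsafe code runs in a separate process while callers keep much the same shape. The toy sketch below uses net/rpc with an invented service and method, and keeps client and server in one process for brevity, unlike the real lp:~rogpeppe/+junk/yamlserver; it only illustrates the mechanism.)

    package main

    import (
        "fmt"
        "net"
        "net/rpc"
    )

    // YAML is the RPC service; in the real mock its methods would call goyaml
    // in the server process, keeping cgo out of the test process entirely.
    type YAML struct{}

    func (YAML) Marshal(in map[string]string, out *string) error {
        // Stand-in for goyaml.Marshal: produce something YAML-ish.
        s := ""
        for k, v := range in {
            s += fmt.Sprintf("%s: %s\n", k, v)
        }
        *out = s
        return nil
    }

    func main() {
        srv := rpc.NewServer()
        srv.Register(YAML{})
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            panic(err)
        }
        go srv.Accept(ln)

        client, err := rpc.Dial("tcp", ln.Addr().String())
        if err != nil {
            panic(err)
        }
        var out string
        if err := client.Call("YAML.Marshal", map[string]string{"series": "precise"}, &out); err != nil {
            panic(err)
        }
        fmt.Print(out)
    }
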
[09:27] rogpeppe: cool, I'll take a look [09:28] dimitern: i think it's nice that i can swap out an entire implementation like that and almost nothing needed to change in any of the other code (i just needed to change the uniter tests to register the expected types) [09:28] rogpeppe: so it uses goyaml on the rpc server [09:29] dimitern: yeah [09:29] rogpeppe: and the crash in uniter tests is no longer there? [09:29] dimitern: no, it still crashes [09:29] dimitern: which is what i was hoping [09:29] rogpeppe: so it's not our code then [09:29] dimitern: because it means it's (almost certainly) not a problem with our code [09:29] dimitern: yeah [09:30] rogpeppe: good to hear, but any idea if it's in goyaml or a go-tip related bug? [09:30] dimitern: it's a go-tip related bug. probably the new garbage collector. [09:31] dimitern: but perhaps the new scheduler. [09:31] dimitern: http://code.google.com/p/go/issues/detail?id=5193 [09:31] rogpeppe: hmm.. it seems the closer we get to the 1.1 release, the more nasty bugs appear [09:31] dimitern: they've shoved a lot of stuff in quite recently. [09:31] dimitern: i'm very concerned too. [09:34] rogpeppe: it seems at least they're looking into it, it's not just collecting dust in the issues list [09:34] dimitern: yeah, they're concerned too [09:40] rogpeppe: well, there are a lot of smart people there, so there's hope of fixing it [09:40] dimitern: yeah. it would be nice if i could replicate the problem with less than x00,000 lines of source code :-) [09:41] rogpeppe: definitely :) [09:42] dimitern: unfortunately the problem is so intermittent, that's not easy [09:47] rogpeppe: how often can you reproduce it? [09:47] dimitern: about once in every 3 or 4 runs [09:48] dimitern: it dies in quite a few different ways [09:49] rogpeppe: bugger [09:50] dimitern: see the collection of panics in my last message on the above issue [09:50] rogpeppe: in random places of the code or at least these are often the same? [09:50] rogpeppe: ah, ok [09:54] rogpeppe: nasty.. they're in at least 3 different places [09:55] dimitern: yeah. i'm pretty sure they're all symptomatic of something else (for instance a race or other bug in the garbage collector) [09:55] dimitern: i see other problems too which aren't panics, but are symptomatic of memory corruption [09:56] rogpeppe: so you nailed it down to r16008 [09:56] (the link is broken there btw) [09:57] dimitern: i'm not sure. [09:57] dimitern: that might just be the change that made the garbage collector bug show up [09:57] dimitern: because 16008 is when the new scheduler was introduced [09:59] rogpeppe: it can be in either the new scheduler or the GC, both nasty internals [09:59] rogpeppe: oh well, we'll see how it goes [09:59] dimitern: yeah [10:08] rogpeppe: so if a machine doesn't have instid, is it correct to call EnsureDead() before returning it from the Entity()? That way it'll be detected in openState and the worker will die properly [10:09] dimitern, hell no :) [10:09] fwereade_: why?
[10:09] dimitern, we don't want the machine to die -- there'll be another agent along soon to do the right thing [10:09] dimitern, we just want the bad agent to go away and not come back :) [10:10] fwereade_: exactly [10:10] fwereade_: so we want the instance dead, not the machine [10:10] dimitern, yeah, exactly [10:10] dimitern: and the agent will die immediately anyway [10:10] rogpeppe: not really [10:10] dimitern, sorry about the terminology, it is annoying ;) [10:10] rogpeppe: it'll die (now) iff the machine is missing or dead [10:10] dimitern: because the run loop will terminate [10:11] dimitern: ah, you're right - it'll keep on trying [10:11] dimitern, rogpeppe: how would you feel about just writing a flag somewhere local saying "stopped" or something [10:11] rogpeppe: I'll have to introduce a specific ErrUnprovisioned for example, and use that to amend the cases in which the worker dies [10:11] fwereade_: i really don't care about this case [10:12] fwereade_: it's a ghost machine [10:12] fwereade_: i don't care if it carries on trying [10:12] a ghost *instance*, rather [10:12] rogpeppe: but it'll be nicer to finish off the rogue MA as early as possible in this case, isn't it? [10:12] rogpeppe, so long as we return nil from Run it'll only retry once per restart anyway I guess [10:12] rogpeppe, but wait [10:13] dimitern: i don't want to introduce more logic for this case which is actually vanishingly unlikely [10:13] rogpeppe, if we *don't* mark it stopped, then a badly-timed restart of its instance can fuck up the other machine, I think [10:13] fwereade_: how so? [10:14] fwereade_, rogpeppe: how about that specific error (ErrUnprovisioned) or something? [10:14] (returned from MA.Entity) [10:15] or maybe (nastier) return NotFoundf(), although it's clearly a lie, but it'll only affect the agent [10:15] rogpeppe, agreed vanishingly unlikely, but: (1) bad machine logs in with old password (2) provisioner sets password for good machine and starts an instance (3) bad machine changes password and starts running [10:15] rogpeppe, (4) provisioner takes down bad machine (5) good machine can't run [10:16] fwereade_: why is this at all so unlikely? [10:16] dimitern, it requires pathologically bad timing [10:16] fwereade_: given infinite time and scenarios, everything is possible :) [10:17] dimitern, exactly so [10:17] dimitern, that is why I would prefer a little tweak to the agent running [10:18] dimitern, first thing an agent should do is check its conf for a stopped flag; last thing it should do if it's returning nil from Run is to set the stopped flag [10:18] fwereade_: why would the bad machine ever get as far as changing the password? [10:18] rogpeppe: this happens first thing after running the MA, right? [10:18] rogpeppe, because the provisioner set one at the worst possible time... say during a 5-second t1.micro sleep on the bad machine [10:19] fwereade_: the bad machine doesn't care - it looks at the instance id and sees that it doesn't match its own instance [10:19] fwereade_: i don't see that there's any way it can get past that barrier [10:19] rogpeppe, how's it meant to find that out? we're planning to drop InstanceId from EnvironProvider [10:19] rogpeppe: it cannot know its own instance id, only the machine id [10:19] rogpeppe: it can check if it's "" or not [10:20] fwereade_: orly? [10:20] rogpeppe, yesrly, it's caused problems for both maas and openstack [10:20] fwereade_: so how do we deal with the "provisioner unprovisions itself" problem?
[10:20] rogpeppe, and there's no reason for it except to prevent the provisioner from screwing itself [10:21] rogpeppe, so, we special-case in the provisioner instead [10:21] fwereade_: I think the MAAS guys figured it out, and in openstack it can be done through the storage [10:21] dimitern, I know it can be done [10:21] fwereade_: how does that work? [10:21] dimitern, but there is a different shitty way of doing it in each case [10:22] rogpeppe, if there's only one machine in state, grab the instance id from provider state... done. right? [10:23] dimitern, note that the put-it-in-environ-state works for the bootstrap machine only -- it won't let us get an InstanceId on an arbitrary machine [10:23] * dimitern keeps waiting for an answer on errUnprovisioned (locally in jujud only) [10:23] fwereade_: assuming the machine doesn't come up really fast, i suppose :-) [10:24] dimitern: we're getting there :-) [10:24] :) [10:24] rogpeppe, even if it does, we fail out and get restarted in 5s [10:24] fwereade_: no, if it does, we unprovision ourselves [10:24] fwereade_: or... no, i see [10:25] rogpeppe, this would happen before we do anything else, so I think it's safe [10:25] fwereade_: i'm not sure you can guarantee it happens before we do anything else [10:26] fwereade_: the provisioner is just another API client, right? [10:26] rogpeppe, what's affected by this other than the provisioner? [10:26] fwereade_: someone external could jump in before the provisioner and add another machine [10:26] rogpeppe, ah! [10:26] rogpeppe, ok, so we special-case it to be machine 0 then ;p [10:27] fwereade_: how about HA state then? [10:27] dimitern: HA state is fine - we only have this problem at bootstrap [10:28] dimitern, HA state is orthogonal, I still plan to bootstrap one machine and add HA later [10:28] ah, ok then [10:28] dimitern, even if it's part of jujud bootstrap itself to add N more machines to state directly [10:28] fwereade_: ah, i've got an idea [10:29] fwereade_: the bootstrap init code marks machine 0 with an instance-id "pending" [10:29] fwereade_: when the provisioner comes up, if it sees machine 0 with "pending" instance id, it fetches it from the provider [10:30] fwereade_: otherwise i don't see how the machine-0 special casing works [10:31] rogpeppe, not sure about a magic value in that field, I'd kinda rather it be separate [10:32] * dimitern still doesn't quite get why we need to special-case machine-0 [10:32] dimitern: because machine-0 is the *only* place that hasn't got the state to write the new instance id to [10:32] dimitern, we can't always know machine 0's instance id before we provision it [10:32] dimitern, and we can't write it to state just after we provisioned it, because there's no state yet [10:33] fwereade_: yeah, an extra field in Machine would work as well [10:33] oh, I see now [10:33] rogpeppe, I'm wondering whether it's actually a property of the environment [10:34] fwereade_: oh, doh! [10:34] rogpeppe, Environment.BootstrapComplete() (bool, error) ?
[10:34] fwereade_: it's something that can be done by bootstrap-init [10:34] fwereade_: the provisioner doesn't need to know anything about it [10:35] rogpeppe, hmm [10:35] fwereade_: i mean bootstrap-state, of course [10:35] rogpeppe, yeah, that's right [10:35] rogpeppe, oh no wait [10:36] rogpeppe, we need the user to have connected once before we have the keys that let us find out the instance id [10:36] * dimitern goes for a run until the argument settles, bbi30m [10:36] rogpeppe, that doesn't matter now [10:36] :) [10:37] fwereade_: hmm, i suppose so. unless the provider can make it available to anyone [10:37] rogpeppe, but might it if the CLI were doing an API connection? [10:38] rogpeppe, I think it's ok to put it in the provisioner anyway -- the locality of reference would help, I think [10:38] fwereade_: the code would be much simpler if it could go in bootstrap-state, but i'd forgotten about the provisioner keys issue [10:39] rogpeppe, I know it would :( [10:39] rogpeppe, but I honestly think it's not too bad [10:39] rogpeppe, it exists to fix the provisioner [10:39] fwereade_: yeah, but the muck spreads into quite a few places [10:40] fwereade_: perhaps we change Machine.InstanceId to return, instead of a bool, one of InstanceIdNotSet, InstanceIdOk or InstanceIdPending, rather than add another bool. [10:41] fwereade_: then if the provisioner finds machine 0 with InstanceIdPending, it knows where to look [10:41] fwereade_: and it can then fill in the instance id and carry on as normal [10:42] rogpeppe, it feels quite localized to me anyway though [10:42] fwereade_: well, it affects the provisioners and the state as well as the provisioner, so not *very* localized :-) [10:43] s/provisioners/providers/ [10:43] rogpeppe, I don't think it even has to affect the providers [10:43] fwereade_: don't they have to be able to return the instance id for machine 0 ? [10:44] rogpeppe, they all have to list instances on demand, and those instances have an InstanceId method [10:44] fwereade_, rogpeppe sorry to interrupt, I'm still having problems connecting to the api https://pastebin.canonical.com/88504 [10:44] fwereade_: that doesn't help [10:44] rogpeppe, meaning that we only have to worry about a single mechanism for determining instance id [10:45] fwereade_: we need to know which of those instances corresponds to machine 0's instance [10:45] mattyw: OpenID discovery error: HTTP Response status from identity URL host is not 200. Got status 503 [10:45] rogpeppe, how are those other instances going to get started without a provisioner? :) [10:45] fwereade_: bootstrap [10:45] fwereade_: and possibly by a previous environment [10:47] fwereade_: if we could assume that we'd only ever get one instance after bootstrap, the problem would be much easier [10:47] rogpeppe, do you mean a previous environment that has not been properly shut down, or just an eventual-consistency phantom? [10:47] fwereade_: the former. [10:47] rogpeppe, screw that, user error [10:47] rogpeppe, if destroy-environment failed, run it again until it succeeds [10:48] fwereade_: but eventual consistency could make a properly shut down environment show the same symptom [10:48] aaaaaargh (facepalm) [10:50] found the failure, only had to move three loc a bit. didn't see which value differs otherwise in all that debugging output.
(hmpf) [10:52] fwereade_: the scenario that concerns me is when, somehow, the token that signifies that an environment has been bootstrapped has gone (perhaps someone removed all their buckets), so we bootstrap and find all the previous instances. [10:52] rogpeppe, to be utterly explicit, this is when the env reports one terminated machine to be running and one running machine not to exist, right? [10:53] fwereade_: no, it's when the env reports more than one running instance, only one of which is the bootstrapped instance. [10:53] fwereade_, rogpeppe https://pastebin.canonical.com/88504/ should be working again now - it's trouble connecting to the api [10:53] rogpeppe, meh, fail out when len(instances) != 1 [10:54] fwereade_: that is a possibility, yeah. we could mark the bootstrap machine with an error status in that case. [10:55] rogpeppe, not even that [10:55] rogpeppe, just try again later [10:56] fwereade_: the user needs at least some feedback, otherwise they'd have no idea why "juju deploy" isn't working. [10:56] fwereade_: or... just don't unprovision any machines in that case, i guess [10:57] rogpeppe, but wait I'm being stupid [10:57] rogpeppe, we *always* set the instance id in the provider storage anyway [10:57] fwereade_: yes [10:57] rogpeppe, provisioner, as soon as it has keys, calls ensureBootstrapComplete() [10:57] fwereade_: well, in the existing providers [10:58] fwereade_: we don't make it available though [10:58] rogpeppe, agreed that without storage we will have a hard time [10:58] fwereade_: we had plans to move away from that storage in ec2 [10:58] fwereade_: and use instance tags instead [10:58] fwereade_: ah! [10:58] rogpeppe, well, that would be an alternative solution [10:59] fwereade_: this is a good reason why generating the env UUID on the client side would be good [10:59] rogpeppe, tagging instances with uuid? yeah [10:59] fwereade_: yeah [10:59] rogpeppe, it feels to me like storage is the tool we have available now though [11:00] fwereade_: that's true. but i'd like our solution to work with tags too. [11:00] mattyw: looking [11:01] rogpeppe, something funny about `invalid entity name ""` given the `info.Tag = "user-admin"` [11:01] fwereade_: that's what i'm thinking [11:01] mattyw: is this on go trunk? [11:02] mattyw: i mean juju-core tip! [11:03] rogpeppe, no, it's 2 or 3 days old [11:03] mattyw: what revid? [11:03] rogpeppe, 1060 I think *double checks* [11:04] rogpeppe, just to make things worse, it's actually of our own branch https://code.launchpad.net/~cloud-green/juju-core/trunk so we had a stationary target to work on [11:05] mattyw: that's ok - i'll branch that and have a look [11:07] rogpeppe, anyway, for maximum paranoia we can wait until there is a running instance with an id that matches the one in storage [11:08] fwereade_: i think just st.Machine("0").SetInstanceId(environ.InstanceIdOfBootstrapMachine()) should be fine [11:09] rogpeppe, ok, sgtm :) [11:11] mattyw: ok, looks like you're on 1091. have you made any of your own changes to that branch? [11:11] mattyw: (if not, i'm not quite sure why you merged trunk rather than just pulling it) [11:12] rogpeppe, btw did the terminated-flag-in-conf idea seem to hold water? [11:12] fwereade_: oh yes, i was meaning to get back to that [11:12] fwereade_: i'm not sure it eliminates the race entirely [11:16] rogpeppe, hmm. 
"if there's an instance ID but an independent attempt to log in with my original credentials fails, it's not my instance id, so bomb out with errTerminating; otherwise go ahead and change it"? [11:17] fwereade_: sounds plausible [11:18] fwereade_: although a bit more heavyweight than i'd like [11:19] fwereade_: a simple nonce might be better [11:20] rogpeppe, chosen by provisioner, set on machine, passed through StartInstance? [11:20] fwereade_: yup [11:20] rogpeppe, nice [11:21] fwereade_: perhaps have it returned by Machine.SetInstanceId [11:21] fwereade_: which would increment it [11:21] rogpeppe, we need to pick a nonce before we start the machine [11:21] fwereade_: doh! [11:23] fwereade_: just use a random number. [11:25] fwereade_: if we get the provisioner to pick a random number n when it starts, then n.seqno actually provides useful information about which provisioner (in an HA world) provisioned the machine. [11:27] fwereade_: and it's a nice consistent link between new machine and the state. i *think* i like it. [11:27] s/new machine/new instance/ of course [11:28] rogpeppe, I don't follow the "seqno" bit, but if you mean $(provisioner-id)-$(random-number) I'm +1 [11:28] fwereade_: hmm, mind you, the initial password is a pretty good link too [11:29] rogpeppe, a prefix on the random password? that feels a bit off to me [11:29] fwereade_: i mean $(random-provisioner-id)-$(sequentially-incrementing-number) [11:29] fwereade_: no, i mean that if you manage to log in with the initial password, you've a pretty good idea that you're talking to the right state. [11:32] rogpeppe, damn lunch [11:32] fwereade_: enjoy [11:33] * dimitern is back [11:33] rogpeppe, no changes, the only reason I haven't pulled is my inexperience with bzr [11:34] mattyw: sorry, i'll take another look in a mo [11:36] dimitern: there's already a way of telling the run loop to terminate [11:36] dimitern: see isFatal in agent.go [11:37] dimitern: if you return errUnprovisioned and add errUnprovisioned to the isFatal cases, it should all just work. [11:37] dimitern: ha! [11:38] dimitern: in fact, just return &fatalError{"machine was not provisioned"} [11:38] * rogpeppe feels a little bit smug [11:39] mattyw: are you bootstrapping with the same version as you're compiling your client against? [11:39] just finished reading through the log [11:40] rogpeppe: sgtm, nicer [11:41] rogpeppe: unfortunately it doesn't work [11:42] dimitern: oh? [11:42] rogpeppe: isFatal is checked after we're in runLoop [11:43] dimitern: and? [11:43] rogpeppe: so it won't solve the issue about messing up the password by a bad machine [11:43] dimitern: the password is changed inside runOnce [11:43] dimitern: erm... i think :-) [11:44] dimitern: yeah [11:44] rogpeppe: istm it's in openState:SetMongoPassword [11:44] dimitern: yup [11:44] dimitern: and what's openState called by? [11:44] rogpeppe: by RunLoop's callback in runagentloop [11:45] dimitern: which is called? [11:45] rogpeppe: so it's too late then [11:45] dimitern: i don't think so [11:45] dimitern: runOnce calls openState [11:45] rogpeppe: why not just have Entity return a fatalError instead and add it to the conditions? [11:46] dimitern: what conditions? [11:46] if state.IsNotFound(err) || err == nil && entity.Life() == state.Dead { err = worker.ErrDead } [11:47] in openState [11:47] make that || isFatal(err) || ... 
and all done [11:47] dimitern: there's no point in doing that [11:48] dimitern: if the err is not nil, it returns in [11:48] s/in/it [11:48] dimitern: on the next line [11:48] rogpeppe: that's after [11:48] rogpeppe: yeah, the first if is what i'm aiming for [11:48] dimitern: you want openState to return ErrDead? [11:49] rogpeppe: it already does, doesn't it? [11:49] dimitern: yeah, or a fatalError if Entity returns that [11:49] rogpeppe: ok, it seems I'm not explaining myself well enough [11:50] rogpeppe: err = worker.ErrDead is what I want to be the case, just like the other conditions, where the machine is missing or dead [11:50] rogpeppe: this causes the next if to return ErrDead [11:50] rogpeppe: and kills the worker before setting the password or anything bad happens [11:50] dimitern: even if you return a fatalError, the worker will be killed before that [11:51] dimitern: but... [11:51] dimitern: there's something to be said for returning ErrDead, because then the agent exits and never restarts [11:51] rogpeppe: yeah, actually.. but the test fails [11:52] rogpeppe: look at this http://paste.ubuntu.com/5679464/ [11:52] dimitern: are you saying you're returning a *fatalError from Entity and the agent isn't exiting? [11:52] rogpeppe: this test succeeds when I used errUnprovisioned and checked for it in the first if, with fatalError it fails, saying it's still running [11:52] dimitern: could you paste your Entity implementation, please? [11:54] rogpeppe: http://paste.ubuntu.com/5679472/ [11:55] rogpeppe: and now I'm getting "timed out waiting for agent to finish; stop error: " in the assert after runAgentTimeout [11:55] rogpeppe, no, I'm not bootstrapping using the same version [11:55] dimitern: hmm, that's weird. [11:56] dimitern: it *should* work. could you push your branch? [11:56] rogpeppe: ok, just a sec [11:56] mattyw: are you bootstrapping with a newer version, by any chance? [11:56] mattyw: the API has recently changed in a backwardly incompatible way [11:57] rogpeppe, I don't think so. My juju command is 1.9.12-precise-amd64 [11:57] mattyw: in general, while we're developing stuff fast, you need to run exactly the same version of client and server. [11:57] rogpeppe, ok. I'll try that [11:57] mattyw: you can use juju bootstrap --upload-tools [11:58] mattyw: but currently any charms you deploy have to be from the same series as your local machine [11:58] rogpeppe, I forget, I've had a lot of problems trying to bootstrap at all recently [11:58] mattyw: every time you see a problem, add a bug! [11:58] rogpeppe, they're the same series [11:58] rogpeppe, I do :) [11:59] rogpeppe: lp:~dimitern/+junk/machine-agent-status [11:59] mattyw: (or add to an existing bug if it seems the same) [11:59] dimitern: ta [12:00] rogpeppe, is there a version of the repo you'd recommend running against at the moment? [12:02] mattyw: i'd try and keep in sync with tip, though it's a pain [12:03] rogpeppe: hmm.. for some reason the old change got pushed, not the one with fatalError [12:03] dimitern: perhaps you didn't commit [12:03] rogpeppe: ouch :( haven't saved it in emacs [12:03] rogpeppe: sorry, let me try again [12:04] dimitern: ah, that might have something to do with it! [12:05] rogpeppe: now it works with fatalError, sorry for the noise [12:05] dimitern: np. i was a bit confused :-) [12:05] dimitern: but... [12:05] rogpeppe: the only change is I needed to change the assert to check for the error after runAgentTimeout [12:05] dimitern: having said that...
[12:06] dimitern: i'm wondering if you should return ErrDead anyway [12:06] rogpeppe: maybe upstart will mess it up if it's not ErrDead and restart it? [12:06] dimitern: exactly [12:07] rogpeppe, mattyw: FWIW I landed --fake-series yesterday [12:07] rogpeppe, mattyw: it seemed to work for me [12:07] dimitern: in fact, that's precisely the behaviour we *want* when upgrading [12:07] fwereade_: \o/ [12:07] rogpeppe: ok, ErrDead it is then - directly from the Entity() [12:07] rogpeppe, mattyw: `juju bootstrap --upload-tools --fake-series precise,quantal` [12:07] rogpeppe: or fatalError + isFatal() in the if -> ErrDead? [12:08] dimitern: i think so. although it's kind of a lie. [12:08] rogpeppe, I'm -1 on reusing ErrDead [12:08] dimitern: nononono [12:08] rogpeppe, it means something specific [12:08] rogpeppe, ok thanks, what does --upload-tools actually do? [12:08] fwereade_: yes [12:08] so we're back on errUnprovisioned :) [12:09] fwereade_: i think we're just using ErrDead as a convenience [12:09] fwereade_: i think we can change all instances of ErrDead to ErrDeadOrInvalid [12:09] rogpeppe, dimitern: what's wrong with errTerminating? the top-level Run can set the conf to terminated and return nil; other conditions that indicate we should stop, like ErrDead, can do the same [12:10] rogpeppe, call it ErrTerminateAgent and I'm fine [12:10] fwereade_: what do you mean by "set the conf to terminated"? [12:10] * dimitern and there goes another 20m for 2 lines of code ;) [12:11] rogpeppe, I mean, set a flag meaning "don't even try" that we can check on startup [12:11] fwereade_: why bother? [12:11] I'm fine to call it errTerminated instead of errUnprovisioned, although the former seems not quite right - do you intend to reuse it for something else? [12:12] dimitern: i think i'd have type terminalError {err string} [12:12] dimitern: then the message printed by the agent when exiting can be accurate [12:12] rogpeppe: lethalCase{} ? :D [12:12] dimitern, I'm saying that Run at the top level should have a single error that it treats as "return nil, causing upstart to stop bothering" [12:13] fwereade_: agreed. currently that's just ErrDead. [12:13] fwereade_: we don't even have to get to run [12:13] rogpeppe, maybe we don't, let me reload state [12:13] fwereade_: but we'd change that to check for *terminalError [12:14] fwereade_: ah! it does need to check for ErrDead too [12:15] fwereade_: because that's what the workers return if the machine is dead. [12:15] rogpeppe, yeah, exactly [12:15] fwereade_: but tbh why not just reuse it? [12:15] fwereade_: we're adding more code for a marginal case [12:15] rogpeppe, I'm fine having the workers who need to return ErrTerminateAgent, I'm not so fine having an unprovisioned machine return ErrDead [12:16] fwereade_: the machine might as well be dead as far as the agent is concerned [12:16] fwereade_: ok, i'd be happy with that [12:16] * rogpeppe has lunch [12:16] rogpeppe, cool, cheers [12:18] fwereade_: how about having an isDeadOrTerminated(err) checking for both ErrDead and ErrTerminated, and using it in place of err == ErrDead checks?
[12:20] dimitern, I'm just saying s/ErrDead/ErrTerminateAgent/ throughout [12:21] fwereade_: actually, I can just add ErrTerminateAgent and return in from Entity, make it work like ErrDead wrt upstart and draw the line there, changing all the workers can be a follow-up [12:21] dimitern, forget ErrDead [12:21] dimitern, a single magic value meaning "stop and don't come back" is all I'm really thinking about here [12:21] dimitern, ErrDead exists, and is perfect, except it has the wrong name [12:21] fwereade_: i'm not comfortable changing something without knowing the repercussions [12:22] dimitern, fix the name and we're fine :) [12:22] fwereade_: what is ErrDead used for - in all cases? [12:22] (for workers) [12:22] dimitern, it is used, in all cases, to indicate that the running agent should exit 0 [12:23] fwereade_: well, if it's only that, the renaming makes sens [12:23] sense [12:24] dimitern, cool [12:24] sorry, I definitely didn't communicate correctly [12:24] fwereade_: I'll do it as a prereq though, since it's trivial [12:25] dimitern, sgtm -- but just to sync up, can we have a quick hangout? [12:28] rogpeppe, that seems to be working now thanks :) [12:29] mattyw: cool. sorry about that - it's been a problem for loads of people, because we're changing things in a backwardly incompatible way all the time without changing the major version number [12:29] mattyw: but that's dev versions for ya! [12:36] rogpeppe, one quick odd question [12:36] mattyw: ok [12:37] when a unit gets removed we get a unitinfo back with all the status about that unit? Is the plan to stay like that? The reason I ask is that in the pyjuju 'api' when a unit is removed we only get told the unit name of the removed unit [12:38] mattyw: yeah, it should stay like that. [12:39] mattyw: although it's kind of an unintended consequence of the way things are implemented [12:39] rogpeppe, ok thanks, some of my favourite consequences are the unintended ones [12:40] rogpeppe, once the api ships I guess any change we'll get told about? [12:40] mattyw: yup [12:40] rogpeppe, that's fine for me. When vds and I have a few more tests It would be good if you could take a look, will probably be next week though === jam1 is now known as jam [12:42] mattyw: sure === BradCrittenden is now known as bac [13:44] fwereade_: do you know the PPA/name of the SSL-capable mongod package, by any chance? [13:45] rogpeppe, sorry, I'm afraid not -- I always just grabbed it from the bucket and stuck in on $PATH [13:46] fwereade_: yeah, i did too. 
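
(Before the ErrDead -> ErrTerminateAgent branch below, a rough sketch of the pattern the discussion converges on: one sentinel error that the agent's run loop turns into a clean exit, exit status 0, so upstart stops respawning the agent, while any other error stays retryable. The names errTerminateAgent, isFatal and runLoop are illustrative only, not the actual jujud code.)

    package main

    import (
        "errors"
        "fmt"
    )

    // errTerminateAgent means "exit cleanly and never come back" -- the one
    // sentinel the run loop treats as fatal, whichever worker returned it.
    var errTerminateAgent = errors.New("agent should be terminated")

    func isFatal(err error) bool { return err == errTerminateAgent }

    // runLoop retries transient failures, but a fatal error becomes a nil
    // return, so the process exits 0 and the init system leaves it alone.
    func runLoop(work func() error) error {
        for {
            err := work()
            if err == nil {
                return nil
            }
            if isFatal(err) {
                fmt.Println("terminating agent:", err)
                return nil
            }
            fmt.Println("transient error, restarting worker:", err)
        }
    }

    func main() {
        attempts := 0
        runLoop(func() error {
            attempts++
            if attempts < 3 {
                return errors.New("machine not yet provisioned")
            }
            return errTerminateAgent
        })
    }
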
i'm trying to put some instructions together so that people can try to reproduce the go-runtime errors i've been seeing [13:46] fwereade_: and i seemed to remember someone had packaged mongod [13:46] fwereade_: i guess i'll just say "grab the bucket contents" [13:46] niemeyer: hiya [13:49] fwereade_, rogpeppe: ErrDead -> ErrTerminateAgent https://codereview.appspot.com/8416043 [14:00] dimitern, LGTM trivial [14:01] fwereade_: cheers [14:07] niemeyer: https://codereview.appspot.com/8367044 (not that it helped, in fact, but worth doing anyway) [14:40] rogpeppe, you're possibly looking for https://launchpad.net/~julian-edwards/+archive/mongodb/+packages [14:40] teknico: thanks [14:40] teknico: though i've just gone with directing people to the public bucket for now [14:40] teknico: (with an accompanying sha1 sum) [14:55] rogpeppe: Rob spent a good while last night looking at the issue [14:55] rogpeppe: I've helped them setting up a juju environment so they could reproduce the issue [14:56] rogpeppe: He wrote down notes by the end of his day, and I suppose Dmitry moved it on [15:03] niemeyer: thanks for that [15:05] rogpeppe: np, let's hope good stuff comes out of it [15:05] * rogpeppe crosses his fingers [15:07] cloudinit tests are a proper pain to change [15:07] dimitern: if you can think of a better way... [15:07] dimitern: they're better than they used to be [15:08] dimitern: a little script could probably help actually [15:09] :) i'm not sticking my finger in there any more than I need to for now [15:16] mgz: hey mr on-call :-) a quick trivial branch for you (code move only) https://codereview.appspot.com/8422043 [15:17] looking [15:18] mega over usage :) [15:18] are file-level comments a thing in go? [15:18] mgz: well, yeah [15:18] mgz: yes, we can do that though we tend not to [15:18] like, triple quote at top of module saying roughly what it's about? [15:19] or is the pre-function /// [15:19] mgz: usually the type declaration at the top gives a good idea [15:19] // funcName [15:19] the only special magic that the doc generation picks up on? [15:19] mgz: file-level comments are ignored by the doc tools [15:19] thanks [15:20] mgz: that and comments just before the package decl [15:20] mgz: the doc tool picks up comments attached directly to top level declarations, essentially [15:21] lgtm'd [15:22] mgz: ta! [15:22] dimitern: ^if you have a moment and if rog wants two reviews [15:23] mgz: i'm not sure that i can say much in a file-level comment that the type decl and doc comment at the top don't already say [15:24] mgz: i think i'll leave it as is if that's ok [15:24] that's a fair enough answer for one-struct files I think [15:25] which this isn't quite, but nearly :) [15:26] mgz: oh, the reviewer showed up ;) [15:27] I was expecting pokes [15:27] let me read through the log [15:27] browsing randomly for stuff to do is painful with rietveld... [15:27] dimitern: just click the codereview link, it's pretty clear [15:28] mgz: it is, but fortunately fwereade is keeping +activereviews neat now [15:29] looking, but it's submitted already [15:29] that answers clause #2 then :) [15:29] what's clause #2? [15:30] if rog wanted another review [15:31] mgz, dimitern: here's another fairly trivial one: https://codereview.appspot.com/8294044/ [15:32] rogpeppe: just opened it while you're writing :) [15:32] gaba? [15:32] leading underscore? [15:33] mgz: python-style :-) [15:33] so, as a hint internal to the package past the caps-vs-nocaps exposure? 
[15:33] mgz: yeah [15:34] I guess that seems reasonable [15:34] fwereade: what do you think? [15:34] rogpeppe: why do you need this? [15:34] rogpeppe, that sounds sensible to me [15:35] dimitern, because package-level-only encapsulation is scary as hell once you're past... say... 1 file per package ;) [15:35] dimitern: because i'm essentially exposing the allInfo type as an API to the backing type. some methods are not supposed to be called by the backing. [15:35] rogpeppe: sure, but what will this solve? [15:36] dimitern: it just makes it easy to verify that the backing is only calling expected methods [15:37] dimitern: and if you call a _method from outside the type itself, you'll probably think twice [15:37] rogpeppe: how about putting really internal stuff in a subpackage? [15:39] dimitern: i'm not sure. it seems very intertwined with the allWatcher implementation [15:39] dimitern: and that needs to be internal to state, i *think*. [15:40] rogpeppe: this *could* indicate code smell, but otherwise I'm fine with it [15:44] dimitern: actually, i think you're perhaps right - the allWatcher can trivially be moved into its own package. [15:44] dimitern: while the backing implementation remains inside state [15:44] rogpeppe: sgtm [16:04] fwereade, rogpeppe: machine nonce introduced https://codereview.appspot.com/8247045/ [16:04] dimitern: cool; will look soon. [16:04] dimitern: actually, i'm not sure, peer reviews need doing by the end of day [16:05] dimitern: and i haven't done any yet [16:05] rogpeppe: what do you mean? [16:05] dimitern: the allhands stuff [16:06] rogpeppe: ah, yeah - I've done mine on the day [16:06] dimitern: v sensible! [16:06] rogpeppe: I know myself how forgetful I can get, so I try to cheat and do it right away, so I can forget it comfortably :) [16:07] dimitern: that's not cheating, that's the right thing to do! [16:08] rogpeppe: yeah, i meant cheating the laziness :) === deryck is now known as deryck[lunch] [16:53] TheMue, reviewed [16:58] dimitern, why is MachineNonce required in Conf? [16:58] fwereade: that's what I asked earlier - should it be or not [16:59] fwereade: but I think it's better to be required and add tests for it [16:59] next one coming - state support for nonce provisioning - https://codereview.appspot.com/8429044/ [16:59] dimitern, sorry, I thought you meant in state [17:00] dimitern, but it shouldn't be required for agent.Conf -- that has to work for unit agents too [17:00] fwereade: oops, I mean *environs* [17:00] dimitern, cool [17:00] fwereade: so no tests then? only in cloudinit perhaps? [17:01] dimitern, in cloudinit sounds good [17:01] fwereade: this will simplify things a lot [17:01] dimitern, and you should at least check that a Conf can be read/written both with and without a MachineNonce [17:01] fwereade: wasn't quite sure it'll be enough, but it's better to cut than to add ;) [17:02] fwereade: you mean add a no-error test with machinenonce in? [17:02] fwereade: and one without [17:03] dimitern, yeah, so really leave the rest and just add a single test that includes it I think [17:04] fwereade: deal [17:04] fwereade: i'm starting on state now [17:08] dimitern: allwatcher now factored into its own package. the tests weren't entirely trivial, but altogether fairly painless [17:09] rogpeppe: great to hear! [17:11] rogpeppe: You mentioned yesterday that you reproduced the bug on socket.Acquire even after the fix was made, right?
[17:12] niemeyer: yes [17:12] rogpeppe: Okay, cool [17:12] rogpeppe: Just want to make sure I'm not on crac [17:12] niemeyer: all the panics in my last post on that issue were with that fix made [17:12] k [17:12] niemeyer: yeah, i've spent a fair amount of time trying to convince myself of the same thing [17:12] rogpeppe: Oh, might be worth mentioning it in that thread [17:15] rogpeppe: Were the tracebacks made with the rpc trick too? [17:16] niemeyer: yes [17:16] niemeyer: (i did mention that actually) [17:16] rogpeppe: Yeah, it was Rob that seemed unaware in the thread [17:21] dimitern, reviewed [17:26] fwereade: tyvm [17:37] fwereade: FYI, juju publish will be up for review in a few [17:37] niemeyer, awesome, sorry I've been lagging a bit on your reviews [17:37] fwereade: It's fine.. I have all reviewed right now [17:37] fwereade: Just have to submit [17:37] fwereade: And the main piece is coming up in a moment [17:37] niemeyer, ah, fantastic :) [17:38] fwereade: I still want to add a few features, but I want to get the core in before that [17:38] niemeyer, +1 :) [17:39] fwereade: why should CheckProvisioned(nonce) return an err ever? m.doc.Nonce == nonce should be enough, I think [17:39] dimitern, ha, yes, I'm on crack [17:39] fwereade: and because it can't ever be changed, it'll either be that or empty [17:40] fwereade: so, perhaps m.doc.Nonce == nonce && nonce != "" [17:40] dimitern, but do check that InstanceId != "" as well, or make sure there's no way for it not to be (did we agree to panic on SetProvisioned with "" id?) [17:40] fwereade: yeah, already taken care of in SetProvisioned [17:41] dimitern, but you're absolutely right that we don't need to hit the network for that, cheers [17:41] fwereade: notTheSame := D{{"instanceid", D{{"$nin", []InstanceId{"", id}}}}} [17:41] fwereade: that's the extra assert in SetProvisioned [17:42] fwereade: and it has a corresponding bunch of errAborted handling (i'm proud :) ) [17:42] dimitern, I look forward to seeing it :) [17:42] although, everybody, I'm off for a bit now [17:42] I'll probably be back at some stage, I'm up to my elbows in environs again, but I'm out of go-juice ATM [17:43] happy weekends to those I miss [17:43] fwereade: you too! === deryck[lunch] is now known as deryck [17:47] it's that time of day. happy weekend to all and sundry [17:48] fwereade, rogpeppe: have a nice weekend [17:48] fwereade: will propose the latest changes later [17:48] rogpeppe, TheMue: both of you too ;) [17:49] dimitern: enjoy the spare time to recreate [17:58] TheMue: oh, i do, especially lately [19:28] so, final propose is in, time for the weekend. bye
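
(For reference, the CheckProvisioned exchange near the end boils down to comparing the stored nonce the provisioner chose when starting the instance, with the empty nonce meaning "never provisioned", and no network round trip. A minimal sketch with invented types and an invented nonce format follows; the real state.Machine is mongo-backed and also asserts on the instance id, per the $nin snippet above.)

    package main

    import "fmt"

    type machineDoc struct {
        InstanceId string
        Nonce      string
    }

    type Machine struct {
        doc machineDoc
    }

    // CheckProvisioned reports whether the machine was provisioned with the
    // given nonce; an empty nonce never matches.
    func (m *Machine) CheckProvisioned(nonce string) bool {
        return nonce != "" && m.doc.Nonce == nonce
    }

    func main() {
        m := &Machine{doc: machineDoc{InstanceId: "i-0123", Nonce: "machine-0:cafe"}}
        fmt.Println(m.CheckProvisioned("machine-0:cafe")) // true
        fmt.Println(m.CheckProvisioned(""))               // false
    }
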