=== thumper-gym is now known as thumper [00:19] * thumper is back to go through the review [01:09] davecheney: https://codereview.appspot.com/8348043 has been updated following fwereade__'s review [01:09] more looks would be good though [01:18] * davecheney looks [01:23] * thumper needs food, goes hunting in the kitchen [01:24] * bigjools imagines thumper spearing a bear roaming in his kitchen [01:25] http://creativeroots.org/2011/08/kiwi-anatomy-t-shirt/ [01:27] haha [01:54] davecheney: how many people do we have in this TZ that can approve my branch? [01:54] * thumper wonders when jam starts [01:56] bigjools: not a bear, but I did get some salami and cheese :) [01:56] lucky me, they didn't put up much resistence [01:57] thumper: I found a hot cross bun that was toasted [01:57] bigjools: did you have to light the fire to toast it? [01:57] or was it left by a passer-by [01:57] ? [01:57] I could use Ian's dog's breath [01:58] it's enough to curl hair [01:58] heh [01:58] * thumper needs to collect a young child from school [01:59] bbs [04:27] davecheney: ping [04:30] davecheney: https://code.launchpad.net/~thumper/juju-core/package/+merge/157020 [04:31] * thumper off until the meeting [06:01] jtv1: just trying to sort out what branch you want reviewed === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: dimitern | Bugs: 2 Critical, 53 High - https://bugs.launchpad.net/juju-core/ [06:13] morning all [06:15] hi dimitern [06:16] Hi jam, hi dimitern [06:16] jam: I was about to bring it up... Had some trouble with lbox, but had to go with the normal way instead: [06:17] https://code.launchpad.net/~maas-maintainers/juju-core/maas-provider-skeleton/+merge/157025 [06:17] I'm just fixing the conflicts. [06:29] Oh dear. I'm getting build errors, but I get the same ones in trunk. [06:30] Nonexistent functions in "log," and more. [06:31] jtv1: which revision are you on? [06:31] 1104 [06:33] jtv1: me too, just pulled trunk and running the tests; can you paste the errors? [06:34] Coming up. [06:34] I get the exact same ones for trunk and our branch, so I guess that's a good sign w.r.t. ours. [06:35] dimitern: http://paste.ubuntu.com/5675740/ [06:35] all passes here [06:36] jtv1: these are build errors, not test failures - something is not updated I think [06:38] I'll run a "go get -u juju-core" then I guess [06:38] jtv1: well, it'll possibly screw up your working dir [06:39] jtv1: if you have branches, etc. - maybe try changing $GOPATH to a new dir and get it afresh? [06:39] Ouch. [06:39] OK, I'll try that, thanks [06:43] morning all! [06:44] Hi rogpeppe [06:44] jtv1: hiya [06:44] rogpeppe: hey [06:44] * davecheney waves [06:44] davecheney: hi :) [06:45] davecheney: good evening; dimitern: yo! [06:47] fwereade__: I have a branch for you https://codereview.appspot.com/8330043/ [06:50] dimitern, hey, just reviewed [06:52] fwereade__: cheers! btw the pending/starting is just the first step, the MA will set the correct status in a follow-up I think [06:52] dimitern, I think I would prefer to set nothing than to set something that's only mostly correct [06:53] dimitern, there's an argument to be made that "provisoned" or "starting" or whatever can be inferred perfectly well from InstanceId [06:53] fwereade__: ok, i'll think about it some more [06:53] dimitern, cheers [06:56] fwereade__: why do you think breakJuju should panic in the provisioner test? [07:01] is the meeting now or in 1h ? [07:02] mramm2: ping [07:03] * davecheney wonders if he has time to order a pizza before the meeting [07:03] so it's not now [07:04] nope, still 6pm in AU [07:04] meeting is at 7pm [07:05] so when does AU switch to DST? [07:05] soon [07:06] dimitern: 7th Apr [07:06] I see [07:06] but not Queensland [07:06] they never adopted daylight savings because they were worried it would confuse the Cows [07:06] so you're last to go - we already switched last sunday, and US - before 3 weeks [07:07] :D cool [07:07] dimitern: even inside australia the 7 states shift at different times [07:07] /s/times/dates [07:07] because when something is hard, it helps to make it even harder by not coordinating [07:07] * davecheney says fuckit, and orders a pizza' [07:09] interesting :) and opinionated [07:26] dimitern, sorry [07:26] dimitern, the provisioner test should not have a JujuHome [07:27] fwereade__: it's not using it really [07:27] dimitern, it runs in the cloud and doesn't have an environments.yaml or any of the associated crap [07:27] dimitern, yes it is [07:27] fwereade__: it has environ config though [07:27] dimitern, it's getting the environment from environments.yaml [07:28] dimitern, which invokes a whole bunch of crap that is *wrong* for this context [07:28] dimitern, I see why it works though [07:28] fwereade__: well, it's changing that, not reading it, but I see your point [07:28] dimitern, it's because JujuConnSuite is broken [07:28] dimitern, but fixing that is out of scope [07:28] dimitern, it's changing it and then reading it, right? [07:29] fwereade__: I'll split the "make broken env.yaml" from "make broken environ config" for a given dummy method [07:29] fwereade__: not [07:29] dimitern, then why's it writing it? :) [07:29] fwereade__: reading it - just changing it and returning config.Config [07:29] :) [07:30] fwereade__: because breakJuju seemed like a good candidate for that, already used in cmd tests [07:30] fwereade__: i changed it to reuse it, but it diverged too much from the original and now it's doing 2 unrelated things [07:30] dimitern, sure, but the dummy provider already has exactly the capability you need -- you just set one field in the existing config [07:31] dimitern, I think that just doing that in the provisioner test is neatest [07:31] fwereade__: wasn't sure, but now I see it better (after having done it the wrong way first) :) [07:31] dimitern, don't worry, this stuff is nasty to begin with and only made worse by the essentially-random contexts in which we run tests [07:32] fwereade__: how about adding an exported call to the dummy provider to make breaking methods easier? [07:32] dimitern, honestly its not that much work to change a config [07:33] dimitern, cfg, err := environ.Config().Apply(map[string]interface{}{"broken": "StartInstance"}); c.Assert(err, IsNil); err = environs.SetConfig(cfg); c.Assert(err, IsNil) [07:33] fwereade__: it's not obvious and it's definitely useful to have a shortcut I think, for future use as well [07:34] fwereade__: (and documented properly, rather than having to read through the source to figure out how) [07:34] dimitern, isn't that a method that'll be used in exactly one place? [07:35] fwereade__: in any place that needs to break the dummy provider's methods without needing environments.yaml [07:35] fwereade__: i'm sure there'll be others [07:36] dimitern, I suspect that `UpdateConfig(env Environ, attrs map[string]interface{}) error` would actually be useful [07:36] fwereade__: anyway, too early for me to argue properly :) I'll just do it in the provider tests as a helper, which can be extracted if needed [07:37] dimitern, provider or provisioner? :) [07:37] fwereade__: yeah, provisioner [07:37] dimitern, cool, thanks [07:38] * dimitern dives into the review queue [07:41] dimitern: about those test failures on trunk... I run "go test ./..." in a fresh Go environment, against trunk, and I get further now but it's still pretty bad: http://paste.ubuntu.com/5675840/ [07:42] Is that what you're getting? [07:42] * thumper is back [07:43] jtv1: you need mongod installed for the state tests [07:43] Ah damn, forgot about that. Will it work on i386 now? [07:43] jtv1: i think now, unless you build it + ssl [07:44] s/now/not/ [07:44] OK, I'll try it on the cloud. Thanks. [07:45] jtv1: but it seems mongo is not the only problem - some things are still missing from the checkout [07:45] rogpeppe: can you help jtv1 with these issues? ^^ [07:46] jtv1: looks like you might not have mongod installed [07:47] I don't — Dimiter already spotted that one. [07:47] I'm trying it on the cloud now. [07:47] rogpeppe: that's not the only problem though - see the undefined stuff [07:47] It was in a fresh Go environment. [07:48] jtv1: what's your current directory and what's your GOPATH set to? [07:49] My GOPATH is /tmp/go (where I built a fresh environment) [07:49] I softlinked $GOPATH/src/launchpad.net/juju-core to my own checkout of trunk, and was cd'ed into that. [07:50] jtv1: ah, that might be your problem [07:52] The softlink, the working directory, or the combination? [07:52] jtv1: the fact that the working directory isn't directly below $GOPATH [07:53] jtv1: what happens if you do "go test launchpad.net/juju-core/..." ? [07:53] jtv1: you can't softlink GOPATH [07:54] sorry [07:54] the go tool is too smart for symlinks [07:54] Too smart for its own good then :) [07:54] jtv1: it tries to work out the package path by looking at the current working directory [07:55] jtv1: unfortunately there's not enough information for it to do so [07:55] That other command line says "launchpad.net/juju-core/..." matched no packages [07:55] jtv1: i'm surprised it didn't give you an error actually [07:55] jtv1: you can softlink stuff inside $GOPATH though - I have ~/work/go/src/launchpad.net/juju-core -> ~/work/juju-core and it works [07:55] Well I'm getting plenty :) [07:55] That's what I did too. [07:56] dimitern: i'm surprised that works actually [07:57] (actually it's the other direction, ~/work/juju-core is a symlink to $GOPATH/src/launchpad.net/juju-core) [07:57] dimitern: ah, that makes more sense [07:57] t [07:57] davecheney: I didn't softlink GOPATH... just $GOPATH/src/launchpad.net/juju-core [07:58] sorry, symlinks inside GOPATH are a no no [07:58] jtv1: the problem is that when the go tool gets the current working directory, it sees ~/mycheckout, not $GOPATH/src/launchpad.net/juju-core [07:58] jtv1: so it can't work out that you're building launchpad.net/juju-core [08:03] rogpeppe: but it'll work as long as I use the launchpad.net/juju-core/... path? [08:03] jtv1: no; i thought it would, but it doesn't [08:03] :( [08:03] jtv1: (i'm not sure why) [08:03] But then how does that softlink work for you!? [08:03] jtv1: it doesn't [08:03] :( [08:03] jtv1: it works for dimiter because his softlink points *inside* GOPATH [08:04] I wonder if maybe it isn't better to have one Go environment per branch... === mthaddon` is now known as mthaddon [08:06] jtv1: i use cobzr, but you can also use bzr colocation [08:07] mgz: we miss you [08:08] takes too long to sign in to g+ these days... [08:13] mramm2: so you don't feel unloved. PING :) [08:41] davecheney: would you be happy if I made environs.config.DefaultSeries a const string? [08:41] I could make the change and merge [08:41] thumper: +1 [08:50] davecheney: just proposing those changes [08:52] thumper: where does version.CurrentNumber come from ? [08:52] was it added recently ? [08:52] yes, added in that branch [08:52] fwereade__: thanks for the notes on our feature branch! We were just looking at how big the divergence was, and I think this'll be helpful. [08:53] thumper: i can't find it ? [08:53] am i an idiot ? [08:53] davecheney: to of version/version.go [08:53] davecheney: first change in that file [08:53] not an idiot, just blind :) [08:53] ahh, yes, v sorts afrer e [08:53] so, i was right first time :) [09:03] :( [09:03] for some reason merging in trunk breaks it [09:03] * thumper pokes [09:06] jam: your sync-tools changes have broken my branch [09:06] * thumper tries to work out the test [09:06] yup, that was the plugin crashing... [09:07] I won't rejoin, signin takes too long [09:08] thumper: the mongo branch or ? [09:08] jam: yeah, my refactoring... [09:08] which hits some fake tools somehow [09:09] thumper: I set up a dummy env, and then copy tools out of it [09:09] I'm getting some extra tools there... [09:09] but you can't set up a dummy env [09:09] without fake tools [09:09] so I delete them [09:09] so the test has known state [09:09] ah, but not all of them... [09:09] now you need to delete 2 of them, I guess [09:09] right/ [09:09] I'll poke [09:09] isn't there a delete all? [09:09] thumper: no, but you can list, iterate, and delete. [09:10] Storage, has Put, Get, List, Remove [09:11] thumper: line 67 is where yoou can change the "lookup Current" into "list all" [09:11] sync_tools_test.go [09:11] * thumper nods [09:11] technically you should be matching against tool patterns, etc. But in practice, I think you can just nuke everything in the dummy env. [09:11] (may get us in trouble in the future :) [09:12] Say, does anyone know why the tests for launchpad.net/juju-core/cmd/juju might be timing out? [09:12] jtv1: if you are timing out, most likely you aren't running a mongo that is new enough to support ssl [09:12] that was the most common occurance I've encountered. [09:12] jtv1: this is a 600s timeout, right? [09:12] Yes. [09:13] During TestPackage. [09:13] So maybe I should try Raring? [09:13] jtv1: if you have the wrong mongod in your PATH, it doesn't start [09:13] jtv1: you have to run from the PPA or use the tarball [09:13] I just installed the quantal mongod. [09:13] jtv1: the one from Julian? [09:13] No, just the standard one. What PPA should I use? [09:14] jtv1: https://launchpad.net/~juju/+archive/experimental [09:14] that only has raring [09:14] just a sec [09:14] https://launchpad.net/~julian-edwards/+archive/mongodb [09:14] has quantal [09:14] Thanks. [09:15] it looks like raring might come with a default one that is new enough: https://launchpad.net/ubuntu/raring/+source/mongodb/1:2.2.4~rc0-0ubuntu1 [09:15] but I'm not positive [09:15] because the SSL support is contentious because of GPL licensing vs OpenSSL, etc. [09:16] OK, I'll try the PPA first then. I'm also keeping notes so hopefully people can re-use them. Thanks. [09:16] thumper: "constant wouldn't be a string": "const foo string = "stuff" [09:16] jam: yeah, that is what I have done [09:21] jam: the target bucket also needs fixing :( [09:24] thumper: I couldn't find exactly when the bucket gets filled, so I just allowed it to have one extra [09:24] nm, I'm fixing [09:24] It looked like putFakeTools was done on first *connect* [09:24] rather than on creation. [09:25] thumper: the other option is to have a dummy config items that lets you disable auto tools. [09:25] jam: sure... but not now :) [09:32] rogpeppe, it looks like Status has been removed from unitinfo recently? [09:32] mattyw: yes, unfortunately. i'm working on putting it back right now. [09:33] rogpeppe, ok no problem === andrewdeane is now known as awd [09:35] jam: managed to fix all the issues during the call :) [09:35] :) [09:35] just doing a final test run prior to submitting [09:35] jtv1, you're welcome :) [09:35] jtv1, there will certainly be more but I need to look at it in bits :) [09:35] I have a feeling tomorrow will be doing all the peer reviews [09:35] for me at least [09:35] also [09:35] yay for checking the tests prior to merging [09:36] with trunk merged in [09:37] fwereade__: I'm trying to get trunk to pass tests right now, and after that I can get a bit of an overview of the divergence. rvba and I are planning to do the fixing for this one. [09:37] jtv1, cool [09:41] * thumper has merged the branch, and is leaving the office [09:41] ciao [10:12] fwereade__: the provisioner currently aborts if any machine failed to start [10:13] fwereade__: I didn't change that, it was already like this [10:20] dimitern: just FYI, revision 1101 broke our tests against go tip for me. [10:20] rogpeppe: can I see a paste? [10:20] dimitern: it's probably the go runtime's fault, not yours [10:21] dimitern: http://paste.ubuntu.com/5676161/ [10:21] dimitern: that's running TestUniterRelations only [10:22] rogpeppe: I have no idea why [10:23] rogpeppe: could it be a bug in mgo? [10:25] dimitern: i really don't know. i'm trying to narrow down the problem first. [10:27] dimitern: it's not helped by the fact that from go revisions 15910 to 16052 our stuff does not build due to a bug in the go compiler. [10:28] rogpeppe: ha.. interesting [10:35] dimitern, yeah, but making that sane is part of the job, I think [10:35] dimitern, sorry, bbiab [10:44] fwereade__: my point is, when it fails it exits the loop, like the uniter does, and gets restarted [10:53] updated https://codereview.appspot.com/8330043/ [11:04] dimitern, sorry, still half-having lunch, but: the uniter sets a hook error state and keeps going, because it's meant to handle that failure [11:04] dimitern, the situation is analogous [11:05] dimitern, besides, letting the pro die will fuck the firewaller at the same time, and that has even more complex state to set up every time [11:05] fwereade__: so this means general refactoring of the provisioner, not just setting an error status, which I think is outside of this CL [11:06] fwereade__: it'll die exactly the same way now, no regression introduced [11:07] fwereade__: i'm happy to do the refactoring though, in a follow-up - will get me a chance to know it a whole lot better [11:24] 'AaaaaAaa,e. [11:26] jam: very expressive encoding :) [11:26] dimitern: 5-year old keyboard typing. And me hitting instead of [11:29] dimitern, mgz: standup? [11:35] dimitern, I think there has been a communication error [11:36] fwereade__: oh, what was it? [11:37] dimitern, meh, just what I thought I meant and what you thought I meant re the story [11:37] dimitern, it is ofc my fault :) [11:38] fwereade__: sorry, but it's still not clear for me [11:38] dimitern, in my mind, making the provisioner react sensibly to predictable errors -- as does the uniter -- is absolutely part of this story [11:39] dimitern, think through step by step what will happen when we try to provision a machine with bad constraints [11:41] dimitern, the provisioner starts up, sees an unprovisioned machine, tries to provision it... falls over [11:42] dimitern, gets restarted, sees an upprovisioned machine, tries to provision it... falls over [11:42] dimitern, and may well continue doing this forever even if there are other machines that *could* be provisioned [11:44] dimitern, now we have a channel by which errors can escape to something that'll handle them -- however badly -- we should (1) send the errors down that channel and (2) not send them elsewhere [11:44] dimitern, in my mind, the two pieces of work are two sides of the same coin [11:46] fwereade__: wanna have a quick g+? [11:46] dimitern, sure [12:21] TheMue, ping [12:40] fwereade__: pong [12:41] fwereade__: just proposing the changes ;) [12:41] TheMue, heyhey, just wanted to say I'm around if you want to chat about the uuid stuff [12:41] fwereade__: found your feedback clear (at least i hope i got it right ;) ) [12:41] TheMue, cool :) [12:42] fwereade__: so, it's in again [12:42] TheMue, what are the plans for env doc key vs globalKey then? ah ok cool I'll read it :) [12:43] fwereade__: i liked your idea, it's then a real global key. [12:44] TheMue, sorry, I think I failed to be clear here [12:45] fwereade__: i'm listening [12:45] TheMue, the globalKey for the environment was until recently "e" [12:45] TheMue, I'm not sure why that changed on the Environment entity in the first place [12:45] fwereade__: "e#<>" [12:47] TheMue, http://paste.ubuntu.com/5676494/ [12:47] TheMue, the global key for the environment is still "e" [12:47] TheMue, that globalKey() method is I think objectively wrong [12:48] fwereade__: ah, yes, i meant the the method globalKey() returned [12:48] fwereade__: the "e", like for the settings, has been the reason i've used "e" in the first run as document id too [12:48] frankban, ping [12:49] TheMue, yes, exactly [12:49] fwereade__: pong [12:49] frankban, Environment.globalKey() [12:49] frankban, it doesn't match the usual environment global key as used elsewhere; is this because the existing one ("e") is crazy, or because you didn't know about it? [12:50] TheMue, so, whatever we pick for globalKey is irrelevant [12:51] fwereade__: in order for the PA to pick up machines in error state _at all_ is to watch for something other than Life [12:51] fwereade__: OIC, you mean "e#[name]" vs just "e"? if so, the latter. [12:51] fwereade__: I mean machines that were in error status but was resolved [12:51] frankban, cool, thanks, just checking [12:52] dimitern, ah-ha! [12:52] dimitern, ok that makes things more interesting/annoying in the future [12:52] fwereade__: PA uses WatchMachines() which is a lifecycle watcher, not on status or anything else [12:52] dimitern, but for now remember we don't expose any way to resolve [12:52] fwereade__: so globalKey() should only return "e" and that's also ok as document id for the environment doc in its collection? [12:52] dimitern, so it doesn't matter yet [12:52] TheMue, (1) probably (2) maybe [12:53] fwereade__: hehe [12:53] fwereade__: so I cannot test the machine is skipped - it'll always be skipped once created and is in error state [12:53] TheMue, in both cases it's worth a discussion if you're not familiar with the context [12:53] fwereade__: even after restarting the PA the machine still won't be picked up [12:54] dimitern, and that is correct behaviour [12:54] dimitern, verify that it happens and you're good, I think [12:54] fwereade__: ok, but seems incomplete [12:54] fwereade__: as the doc is the only one in it's collection the id only has to be static, so i initially thought "e" like the one for the environment conf or its constraints would be fine. [12:54] fwereade__: but I guess when we have to deal with resolve it'll come back to this [12:55] dimitern, yeah -- I think it's a problem we can safely solve then [12:55] fwereade__: for the globalKey() i have to look more, indeed. interestingly the tests pass all. [12:55] Good mornings [12:55] dimitern: Fixed: https://codereview.appspot.com/8355044 [12:56] TheMue, I think I'm fine with "e" all round, actually, but please make sure it's a const somewhere and not used in magic string form [12:56] niemeyer: cool, thanks I'll take a look [12:57] fwereade__: so shall i change all other places where today simply "e" is used too? [12:57] TheMue, yes please [12:57] fwereade__: will do so [12:57] TheMue, although, wait a mo [12:58] fwereade__: yes, until your review is in [12:58] TheMue, I'm suspicious of having the two keys be the same [12:58] TheMue, any particular reason not to use the uuid? [12:59] TheMue, it *is* a unique identifier for the environment, after all :) [12:59] TheMue, (we should not use the uuid for the global key, just for the entity key) [12:59] fwereade__: when doing the FindId() i have to know the id. otherwise i could also read the first document of the collection. would work too. [13:00] TheMue, just-pick-one is OK for now [13:01] TheMue, if we end up sharing that collection when multitenanting it'll break, but we'd always have to have the env uuid available in the *State, so it'd be a trivial fix [13:03] fwereade__: ok, then will do it this way. so i'll use the uuid field as _id. [13:07] niemeyer: LGTM [13:13] fwereade__: i think this should be about all of it: https://codereview.appspot.com/8330043/ [13:15] dimitern, ack [13:15] * dimitern lunch === sidnei` is now known as sidnei [13:54] I have a small branch up for review, it's pretty straight-forward: https://codereview.appspot.com/8366043 [13:56] * rogpeppe is in a sea of *almost* reproducible bugs and uncertain dependencies [14:03] benji: I'm on it shortly [14:04] thanks dimitern [14:05] mgz: do you know of a way of getting bzr to print the revision id it's currently on (not just the latest revision id it knows about)? [14:05] --tree [14:06] mgz: ah, i missed that, thanks [14:06] mgz: i'm quite surprised it's not the default [14:06] fwereade__: Hi… I just wanted to let you know that I was able to implement the missing method InstanceId() in the MAAS provider. I've followed your advice and used cloudinit to populate a file containing the instanceId and then the method InstanceId() reads from that file. I've just tested my code in the lab with real machines and a real MAAS deployment and it seems to work ok. [14:07] rvba`, thanks, that's great news [14:07] it is a little surprising sometimes === rvba` is now known as rvba [14:08] not an easy default to flip though, revno and revision-info commands mostly used from scripts that would potentially break if the semantics changed [14:09] rvba, I hope it wasn't too much hassle... (there is a path towards completely dropping provider.InstanceId, but it's really nice not to have to do it right now ;)) [14:09] mramm2, rogpeppe, TheMue, hey, should we be meeting? [14:09] fwereade__: already done :D [14:09] fwereade__: ah yes [14:10] TheMue, sorry [14:10] fwereade__, TheMue, mramm2: i'm there [14:10] rogpeppe, fwereade__: we've seen that after this morning and with the current state on the board we don't need the meeting today [14:10] rogpeppe, funny, I think I'm there too [14:10] rogpeppe, moot anyway :) [14:10] fwereade__: it was actually pretty straightforward apart from one trick: I added a method in cloudinit.go to be able to add a runcmd at the *beginning* of the list of commands. [14:11] Because jujud is started by cloudinit and we need the file containing the instanceId to be present before jujud is started. [14:11] rvba, I've been kinda expecting us to need something like that for about a year, so np there :) [14:12] All right :) [14:13] fwereade__, rvba: another possibility would be to change environs/cloudinit so you pass in the cloudinit.Config instead of returning it [14:14] rogpeppe, hmm, that's rather nice I think, and seems to fit the style of cloudinit-composition already used in maas -- rvba, comment? [14:16] fwereade__: which way do you prefer the passing of the UUID to NewHookContext to be tested? it's done implicit by compared what is passed to what later is retrieved as env variable. [14:16] fwereade__: we might rename cloudinit.New to cloudinit.Config.Configure if we did that, perhaps [14:16] fwereade__: rogpeppe sounds like a good idea… [14:17] fwereade__: the field is only used to set the env variable, there's no other accessor [14:17] rvba, then maybe go with that if that's ok? [14:17] TheMue, so, you need to check it in an actual hook [14:17] fwereade__: ok, I'll give it a try. [14:17] rvba: thanks [14:17] TheMue, ATM you don't test that the env var actually has the correct value [14:17] rvba, rogpeppe: tyvm [14:18] fwereade__: i do, with the latest CL [14:18] TheMue, you do not [14:18] fwereade__: it's generated once, passed to NewHookContext and compared with the env variable in AssertEnv() [14:18] TheMue, you test HookContext [14:19] TheMue, you do not ever test the correct value of the environment uuid, for which you need to actually run a uniter [14:20] TheMue, you do test, perfectly well, that a hook context created with a given UUID will put that UUID into the env [14:20] fwereade__: then i got your comment wrong. thought you meant the passing to the HookContext [14:20] TheMue, that's exactly what I mean [14:20] fwereade__: hmm [14:20] TheMue, if I changed runHook() to pass in trivial.NewUUID(), what test would fail? [14:21] fwereade__: yeah, i know what you mean, but i would have expected a different comment. ;) [14:21] benji: reviewed [14:21] thanks [14:21] TheMue, sorry :) [14:22] fwereade__: will change (or add) the test [14:25] nice, uniter_test has "only" 1632 loc [14:27] TheMue: but getting to know all of them is incredibly rewarding in the long term ;) [14:29] dimitern: thankfully i know all chars, numbers and signs. will sort it and check it in again, then it's more simple. 1859 x "a", 198 x "b", 1329 x "c" ... :D [14:33] dimitern: thanks for the review, I made the changes you suggest. Now for a second review. Any takers? [14:35] benji: cheers [14:37] TheMue, you can do it all in the context type [14:39] niemeyer: ping [14:39] rogpeppe: pongus [14:39] niemeyer: heya [14:39] niemeyer: i'm seeing strange failures when running the uniter tests against go tip, and i'm trying to narrow down the issue [14:40] rogpeppe: Ok [14:40] niemeyer: one symptom in particular i'm seeing is that mgo is returning an error like: Invalid ns [@\x10!] [14:40] niemeyer: the "Invalid ns" comes from within mongod [14:40] niemeyer: the ns itself varies - it looks like garbage [14:41] rogpeppe: Ok [14:41] rogpeppe: ns is the collection name [14:41] rogpeppe: plus the database name usually [14:41] niemeyer: i'm also sometimes seeing a panic in mongoSocket.Acquire [14:42] niemeyer: do you have an idea of places that it might become corrupt? [14:42] TheMue, just (1) drop context id, add env uuid instead (2) s/UniterSuite-%d/%s/ in goodHook, badHook and hookPattern (and use ctx.uuid instead of ctx.id) [14:42] rogpeppe: No.. mgo doesn't use anything unsafe [14:42] niemeyer: yeah, i didn't think so. [14:43] niemeyer: the tests pass ok under the race detector [14:43] rogpeppe: I've seen one of your tracebacks.. it looks like corruption indeed [14:43] rogpeppe: Like, bad corruption, not just a race [14:43] niemeyer: yeah [14:43] rogpeppe: The thing that it said was nil in the traceback was actually a field that was not a pointer [14:44] niemeyer: yeah, and i added a nil check before the place that panicked, and i still got a panic there at some point [14:44] niemeyer: something is quite wrong [14:44] rogpeppe: Might be worth raising that with the core debs [14:44] devs [14:45] niemeyer: i bisected the problem to the scheduler change, but that might be just exposing some underlying GC race [14:45] niemeyer: they're following the issue already, i think [14:45] rogpeppe: It sounds like 1.1 is pretty close to release.. it'd be sad if something like that went out [14:46] niemeyer: i totally agree [14:46] rogpeppe: Oh, you filed something upstream about it already? [14:47] niemeyer: http://code.google.com/p/go/issues/detail?id=5193 [14:47] niemeyer: it's a pity we can't *entirely* rule out our own stuff, as we do use cgo, but tbh the yaml stuff has been stable for ages. [14:48] rogpeppe: Yeah [14:49] rogpeppe: I may actually end up porting goyaml pretty soon as a non-Canonical task [14:49] rogpeppe: This will help us ruling that stuff out as well [14:49] niemeyer: that would be marvellous, yes. [14:52] rogpeppe, sorry to interrupt, I'm getting a "juju home hasn't been intialized" when trying to call the api, But I've got an envionment running - there aren't any units but the state server is there [14:57] mattyw: you'll need to import environs/config and copy this function from cmd/juju: http://paste.ubuntu.com/5676897/ [14:57] mattyw: this is an unfortunate consequence of recent changes [14:57] rogpeppe: Here is a crazy idea: would the goyaml package work over rpc? [14:57] fwereade__: ^ [14:58] niemeyer: hmm, depends on the C API i think [14:58] rogpeppe: Not really [14:58] rogpeppe: What'd go over the wire has nothing to do with C [14:58] mattyw, is this a "juju home..." error coming back from the API server? [14:59] rogpeppe: The point is to quickly build a mock version of goyaml that is type safe [14:59] niemeyer: i'm thinking of the reflection stuff, but perhaps all that is done Go side [14:59] rogpeppe: It doesn't make a difference [14:59] mattyw, wait, that error is a panic, right? [14:59] rogpeppe: The real goyaml would sit on the other side of the wire [15:00] niemeyer: to unmarshal, you'd presumably need to send the other side a description of the kind of type you want to unmarshal into, right? [15:00] rogpeppe, API and yaml might not mix so wonderfully... AIUI yaml is a hassle in javascript [15:00] rogpeppe, but maybe I mistake you [15:00] rogpeppe: I don't know.. haven't used the rpc package at all, so I don't know what it is sending [15:00] fwereade__, yes it is [15:00] rogpeppe: I know it does send information about the real types, though [15:00] niemeyer: it sends gob by default [15:01] fwereade__: I'm not getting your point here: https://codereview.appspot.com/8330043/diff/11001/worker/provisioner/provisioner.go#newcode273 [15:01] rogpeppe: I imagined for the purposes of juju-core, we only marshal a very limited set of types [15:01] fwereade__: processMachines already behaves this way, due to the change in pendingOrDead [15:01] rogpeppe: Which means it'd be easy to custom-code it [15:01] fwereade__: the problem is we've got a real client here, and if you want to use environs.New and have the normal default thing happening, you need to call SetJujuHome [15:01] mattyw, ok, basically an incomplete environ config has somehow made its way into state, and something is throwing a fit because the user environment that usually fills that stuff in is not present [15:02] rogpeppe: and tell the other side what it should unmarshal onto, and send the full value back over the rpc [15:02] rogpeppe, mattyw: sorry, I'm not 100% clear on the exact context [15:02] * rogpeppe goes to look at goyaml [15:02] mattyw, but based on what rogpeppe said I said at lest one untrue thing above [15:03] dimitern, processMachines starts new machines before trashing unknown ones [15:03] rogpeppe: I don't think there's anything interesting to look there [15:03] rogpeppe: The point is to use the real goyaml [15:04] dimitern, if it continues to do this, we could end up with multiple instances of individual agents, and that would be bad [15:04] rogpeppe: and to request unmarshalling over a socket instead of locally [15:04] fwereade__, does it sound like my env hasn't bootstrapped properly? [15:04] fwereade__: so you suggest moving the trashing before starting? [15:04] dimitern, yes please [15:04] niemeyer: i don't think you can - you can't pass pointers over the wire [15:04] mattyw, so this is a real live environment that's doing this? [15:05] rogpeppe: You surely can pass structs containing pointers that are set, right? [15:05] rogpeppe: GiveMeAFoo(data []byte) *Foo [15:05] fwereade__: the point is that, i think, this (previously valid) code will now fail: http://paste.ubuntu.com/5676928/ [15:05] fwereade__, yes, on aws, I've been having trouble with it recently ( "state entity not found" on the juju mailing list) [15:06] fwereade__, but this bootstrap seemed to work [15:06] niemeyer: the problem is that Unmarshal needs to know what it's unmarshalling into [15:06] rogpeppe: Like, GiveMeAFoo? [15:06] fwereade__: sure thing [15:07] niemeyer: if you can enumerate all the possible types for Foo, that'll work. but i'm not sure you can do that easily. [15:07] rogpeppe: I imagined for the purposes of juju-core, we only marshal a very limited set of types [15:07] rogpeppe, I don't consider that code to be valid... the amount of magic we put in the environment config was a fuckup of the first order, and this is IMO just the latest toll it is extracting [15:08] fwereade__: so you don't want us to be able to use juju stuff from a normal Go program? [15:08] rogpeppe, I don't want us to be able to magically infer stuff, no [15:08] fwereade__: it's not about magically inferring stuff. it's about actually using the juju libraries to interact with juju. [15:09] niemeyer: i'm not sure. maybe we actually never unmarshal into a struct. [15:09] niemeyer: but we do [15:09] rogpeppe, the whole problem is the magic inference [15:10] rogpeppe: Hmm.. can't parse that.. :) [15:10] fwereade__: what magic inference? [15:10] fwereade__: thx, needed a bit to follow but i think i've got it [15:10] niemeyer: here's an example of a type that we unmarshal into: http://paste.ubuntu.com/5676945/ [15:10] rogpeppe, everything in environs/config that assumes the existence of $HOME or $JUJU_HOME, or any of the environs themselves that pull stuff out of the env [15:11] fwereade__: the UniterSuite-%d has the comment "prevents cross-pollution". so shouldn't it stay? [15:11] fwereade__: so user code should just duplicate the checkJujuHome function from cmd/juju and be done with it? [15:12] rogpeppe, as far as I'm concerned, if you want to interact programmatically you can supply a valid config [15:12] fwereade__: the config comes from the user's home directory [15:13] fwereade__: i think it's perfectly reasonable that if you write a juju-interacting program that it can understand $HOME/.juju/environsments.yaml just like the normal juju command [15:14] rogpeppe: yep? [15:15] niemeyer: so would we want to hard-code that type as "one of the limited set of types that we use with yaml"? or would we need to break it apart and make an unmarshal RPC for each component piece of it? [15:15] rogpeppe: $JUJU_HOME/environments.yaml [15:16] TheMue: either, right? [15:16] rogpeppe: The former [15:16] rogpeppe: yes [15:16] niemeyer: so every time we change that type, we'd need to update goyaml in sync with it? [15:16] rogpeppe, I think that asking that programmer to be explicit about where he wants his data to come from is pretty reasonable [15:16] rogpeppe: man.. [15:16] rogpeppe: I'm suggesting a *test case* [15:17] niemeyer: ah! [15:17] rogpeppe, if there were lots of people doing that, it might even make sense to export that function [15:17] rogpeppe: To see if things break when we don't have any cgo in the system [15:17] niemeyer: that's well doable, yeah [15:17] rogpeppe: I'm not suggesting you do any such awful hacks permanently, or even to commit something as bad as that [15:17] niemeyer: cool, sorry, i didn't realise it was a totally temporary thing you were suggesting [15:18] rogpeppe: np [15:18] fwereade__: that's what i would suggest [15:18] mattyw, (sorry, btw, I am still thinking... but popping out to get ciggies, brb) [15:18] fwereade__, ok thanks [15:19] mattyw, would you paste me the panic please? [15:19] niemeyer: i'd start by instrumenting goyaml to see what types are actually used (and in fact i think it might just be interface{}) [15:20] rogpeppe: Probably not.. but it's a reduced set either way [15:20] niemeyer: in which case i'm not entirely sure we can make gob work. json might be sufficient though. [15:20] rogpeppe: You can make it work as suggeste [15:20] d [15:20] rogpeppe: By asking the other side to unmarshal onto a specific type, and send back the result [15:20] niemeyer: gob doesn't allow you to unmarshal onto *interface{} by default. [15:21] rogpeppe: Why does that matter? [15:21] niemeyer: because (i think) you need to be able to tell the other side what specific type you want to unmarshal onto [15:21] rogpeppe: As I just mentioned, the wire is seeing concrete types [15:21] niemeyer: and you don't have that information [15:22] rogpeppe: I'm not following.. [15:22] rogpeppe: "you need to be able to tell the other side".. so just do! [15:23] niemeyer: i'm not entirely sure that you can marshal and unmarshal a map[interface{}]interface{} in gob.... but if you register all the possible concrete types, it may work. [15:24] """ [15:24] Interface types are not checked for compatibility; all interface types are treated, for transmission, as members of a single "interface" type, analogous to int or []byte - in effect they're all treated as interface{}. Interface values are transmitted as a string identifying the concrete type being sent (a name that must be pre-defined by calling Register), followed by a byte count of the length of the following data (so the value can be skipped if i [15:24] t cannot be stored), followed by the usual encoding of concrete (dynamic) value stored in the interface value. (A nil interface value is identified by the empty string and transmits no value.) Upon receipt, the decoder verifies that the unpacked concrete item satisfies the interface of the receiving variable. [15:24] """ [15:25] niemeyer: yeah, it may work, but i haven't tried it yet [15:25] fwereade__, http://pastebin.canonical.com/88419 [15:25] rogpeppe: Well, either it works, or there's a bug in the code, or in the documentation [15:26] niemeyer: it's, let's say, a less explored area :-) [15:26] rogpeppe: Ugh.. I see you got another crash, inside the gc this time [15:26] rogpeppe: and Russ is out.. sucs [15:26] ks [15:27] niemeyer: yeah. [15:27] niemeyer: and i have other stuff that i need to be doing too. but... this is important stuff to sort out, i think. [15:28] rogpeppe: If 1.1 is shipped and stuff breaks randomly, yeah, it'll be pretty bad [15:32] niemeyer: yeah, the gob thing works ok: http://play.golang.org/p/XGWLfQgPzr [15:33] rogpeppe: Cool [15:36] niemeyer: BTW here's little go hack i'd not thought of before to give you arbitrary variable-length keys to maps: http://play.golang.org/p/cs0HTON9nj [15:36] niemeyer: i don't think i'd ever actually want to *use* it, but it's kinda cool [15:39] rogpeppe: That's very cool.. hadn't thought about it either (but haven't needed it so far as well) [15:39] fwereade__: I disagree with some of your suggestions, not the important ones though - would you take a look again please? https://codereview.appspot.com/8330043 [15:39] mattyw, would you explain a little bit more about your use case here? you're trying to open an API connection given an env config? [15:39] rogpeppe: It reminds of erlang [15:39] dimitern, ack [15:39] niemeyer: binary trees as keys, yay! [15:40] rogpeppe: I'd call {a, b} as head, tali [15:40] tail [15:40] niemeyer: yeah [15:40] rogpeppe: There are tons of algorithms around this logic.. erlang lists are fully based on them, given the immutability of the language [15:40] rogpeppe: Good one, thanks [15:45] fwereade__, at the moment the use case is just, spin up some units, connect to the api and start adding and removing units to see what I get back from the api [15:46] fwereade__, (the adding and removing of units would be done using the juju command - not the api) [15:46] dimitern, did you propose your changes? [15:48] mattyw, ok, I do see the problem; and rogpeppe, I take your point, but the fact remains that if we'd done it properly in the first place we wouldn't have to deal with any of this absurdity :( [15:49] rogpeppe, config.SetCLIMode()? :( [15:49] fwereade__: i'm not sure. for the user-level logic, it makes sense to fall back to stuff in $HOME [15:49] fwereade__: i'd prefer the logic the other way around actually [15:49] fwereade__: config.SetDaemonMode [15:49] fwereade__: so that user-level client code can proceed with as little friction as possible [15:50] rogpeppe, if stuff is going to fail stupidly I want it to fail on the client where we can see it happening [15:50] fwereade__: haven't we just set stuff up so we'll get a panic on the server not the client? [15:51] fwereade__: yeah, I proposed them [15:51] rogpeppe, what I *thought* we were doing was setting stuff up so we'd get a panic in the *tests* [15:51] rogpeppe, but it turns out that JujuConnSuite shares certain characteristics with the honey badger [15:52] :D [15:52] rogpeppe, and renders all that useless [15:52] the honey badger is the best [15:52] * rogpeppe doesn't know anything about honey badgers [15:53] the honey badger just doesn't give a shit basically [15:53] rogpeppe: http://en.wikipedia.org/wiki/The_Crazy_Nastyass_Honey_Badger [15:54] dimitern: 58m views + 1 :-) [15:56] niemeyer: sum total of ways we use goyaml: http://paste.ubuntu.com/5677075/ [15:57] rogpeppe: Hah :) [15:57] niemeyer: well, from the uniter tests anyway [15:57] rogpeppe: Sure.. you've managed to break things wiht it alone, right? [15:58] niemeyer: yup [15:58] rogpeppe: Superb [15:58] rogpeppe: piece of cake then [15:58] * rogpeppe is on it === deryck is now known as deryck[lunch] [16:04] mattyw, ok, I have not yet come up with the Right answer [16:05] mattyw, the expedient one, though, is as rogpeppe anti-suggested -- copy checkJujuHome from cmd/juju/main.go and call it before you mess with environments locally [16:09] * fwereade__ remains convinced that the CLI is the special case, and that privileging that case to the point where it's inseparable from environ config was a seriously Bad Move [16:10] fwereade__: i'm not sure about the former, but i think you're probably right about the latter. [16:10] does the charm and tools cache also get moved with juju_home? [16:10] hazmat, yeah [16:10] hazmat: i think so, yes [16:10] hazmat, authorized-keys is the wrinkle [16:13] fwereade__, ic. i ended up with authorized-keys-path lookup relative to juju home in addition to absolute lookups. authorized-keys by themselves are content [16:14] hazmat, hell, that is a very good point, would you file a bug please? [16:16] fwereade__, sure [16:16] rogpeppe: I need one more review of the provisioner stuff https://codereview.appspot.com/8330043/ - it should be quick and simple [16:16] hazmat, that is strange, because the code seems to all be in terms of $HOME [16:17] fwereade__, trunk is yes, i needed it for some easier multi-env management so i just added it [16:17] i need to reinstall 64bit i think to get going with juju-core.. doing it in vms isn't quite as nice. [16:18] or amenable to hacking [16:18] on a branch re adding [16:19] fwereade__, bug is here.. i think that's what your referencing. bug 1164601 [16:19] <_mup_> Bug #1164601: JUJU_HOME should support relative lookup paths for authorized-keys-path < https://launchpad.net/bugs/1164601 > [16:21] dimitern: i'm just wondering what is the motivation for the change to processMachines [16:22] rogpeppe: so we can remove unknown/not matching machines before trying to start matching unprovisoned ones [16:22] dimitern: is it just to make sure that we stop unknown instances even when startMachines continually fails? [16:23] rogpeppe: it won't fail more than once now - the machine will be in error state and will be ignored [16:24] rogpeppe: on the other hand, if a machine was started but SetInstanceId failed we'll remove it next time before [16:24] rogpeppe: before trying to start new ones [16:25] dimitern: i'm not sure i see why the order matters [16:25] dimitern: won't it all work ok in the original order? [16:26] dimitern: my reason for asking is that findUnknownInstances and stopInstances may be quite slow, and it might be better if we did the start instance ASAP [16:28] rogpeppe: I should refer you to fwereade__ for better explanation then, sorry [16:29] rogpeppe, if we stop unknown ones before starting new ones, we should be able to avoid ever having 2 instances of the same agent running at the same time [16:29] fwereade__: ah! [16:30] * dimitern still doesn't get how this could happen though.. [16:30] dimitern, SetInstanceId fails [16:31] fwereade__: assuming ec2 eventual consistency does actually tell you the names of all recently started machines of course :-) [16:31] rogpeppe, yeah, there's that [16:31] rogpeppe, call it a best effort that should hit a number of cases in which we would otherwise be encouraging that sort of inconsistency [16:31] fwereade__: so should an agent check that its instance id is set before carrying on to do stuff? [16:32] fwereade__: because otherwise it will be killed ad-hoc [16:32] rogpeppe, probably, that would be sensible [16:32] rogpeppe, and that would eliminate the case I was trying not to think about, too [16:32] if SetInstanceId fails, how come we'll have 2 instances having the same agent? [16:33] dimitern: we've started the instance [16:33] dimitern: but we haven't managed to set the instance id of the machine [16:33] dimitern: so the new machine comes up [16:33] dimitern: and starts running an agent [16:33] rogpeppe: but the MA is local to the instance, right? each instance will have its own MA [16:33] dimitern: but in the meantime, the provisioner sees that there's a machine without an instance id [16:33] rogpeppe, the only one left over is the initial password-setting... I think we could possibly get an ugly race there, in which we have 2 machines running the agent but neither able to log in [16:34] dimitern: and starts another instance [16:34] rogpeppe, but it would be *very* hard to trigger, I think [16:34] fwereade__: it could be mitigated if the agent checked the instance id *before* setting the password [16:34] rogpeppe, oh, maybe not [16:34] rogpeppe, yep [16:34] rogpeppe: yeah, a third one, while the other two (one with instid, other without) are still running [16:35] rogpeppe, but wait... ok, I need to go chop vegetables and think about this [16:35] rogpeppe: and we'll have 2 valid running machines and one stray [16:35] rogpeppe: how does this translate to having the same agent on 2 machines? [16:36] dimitern, +100 to rog's suggestion of making the UA resilient to this [16:36] dimitern, two machines are both configured to run the agent for machine 3 [16:36] dimitern, both of them start running wordpress/0 [16:36] dimitern, everything goes to shit [16:36] fwereade__: ah! the machineid is the same, but one doesn't have instanceid! [16:36] dimitern, yeah [16:37] got it finally :) cheers [16:37] fwereade__, dimitern: ha, it's easy to check the InstanceId before the password's set [16:37] rogpeppe: where? [16:37] dimitern: in MachineAgent.Entity [16:37] rogpeppe: shouldn't this be a separate CL? [16:38] dimitern: definitely [16:38] dimitern, yes [16:38] dimitern: but i'd like a comment [16:38] dimitern, sorry, I should have been clear about that [16:38] dimitern: in processMachines [16:38] dimitern: because this is subtle stuff [16:38] dimitern, yeah, maybe best if you rewrite the explanation according to your understanding [16:38] rogpeppe: cool, write a review comment and I'll add a card to do it in a follow-up [16:39] dimitern: ok [16:39] fwereade__: sgtm, will just add the "same machine ID" stuff to the comment and it should be clear enough [16:39] fwereade__: of course, we are assuming that StopInstance is synchronous here [16:40] fwereade__: the machine agent should probably check that the instance id of the machine matches the instance id of the machine it's actually running on [16:42] fwereade__: in fact, with that check we *could* start new machines before stopping unknown ones, i think [16:46] rogpeppe: ok, noted down 2 changes for a follow-up: MA should ensure the instance id of the machine in state matches the one it's running on, and then set the password to connect to state [16:47] dimitern: sounds good [16:47] rogpeppe: sounds good? [16:47] dimitern: you've got a review BTW [16:47] :) great, I'll add a card [16:47] rogpeppe: tyvm [16:49] fwereade__: you're also fine with the changes I did? [17:02] rogpeppe: where's that MachineAgent.Entity you mentioned? [17:02] dimitern: in jujud/machine.go [17:04] rogpeppe: and when's the password being set - I can't see [17:04] dimitern: in RunAgentLoop (by openState) [17:05] rogpeppe: ok, thanks [17:06] fwereade__: ping [17:08] ok i'm off, good night all === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: - | Bugs: 2 Critical, 53 High - https://bugs.launchpad.net/juju-core/ === deryck[lunch] is now known as deryck === BradCrittenden is now known as bac === BradCrittenden is now known as bac [18:50] niemeyer: i did the goyaml-as-rpc thing and it still crashes (earlier actually, but in a similar way) [18:50] rogpeppe: Oh, "sweet" [18:50] lol [18:51] niemeyer: here's the rpc server BTW: http://paste.ubuntu.com/5677572/ [18:51] niemeyer: i got a fair few of the standard goyaml tests passing but not all [18:51] rogpeppe: Thta's neat! [18:52] rogpeppe: Well, that's not necessary [18:52] rogpeppe: We just need enough of it to get the juju tests passing [18:52] niemeyer: indeed. but it was useful to get a reasonable assurance that it was doing something right [18:52] rogpeppe: +1 [18:52] rogpeppe: So there we go [18:52] niemeyer: yup [18:52] rogpeppe: We should really raise a harder flag on that one [18:53] rogpeppe: Did you increment that data on the issue yet? [18:53] niemeyer: it's consistently dying here now: http://paste.ubuntu.com/5677577/ [18:53] niemeyer: just about to [18:55] rogpeppe: Some weird addresses there [18:59] niemeyer: if put an "if nil" check and the problem moved elsewhere. darn heisenbugs! [18:59] s/if/i/ [19:01] rogpeppe: Okay, is that nil value still that in that field that is actually a by-value field? [19:01] rogpeppe: If so, it's of course impossible for it to be nil without a GC error, so this might be a good case to raise [19:02] niemeyer: i'm checking for socket==nil in mongoSocket.Acquire [19:02] niemeyer: it looks like that can be nil [19:02] rogpeppe: Okay, I'm thinking of that traceback on server.go [19:03] niemeyer: except i'd expect the first arg to Acquire to be 0x0 if that was the case [19:03] rogpeppe: If it can be reproduced, that'd be an easier one [19:03] niemeyer: i just want to catch it in the act [19:03] rogpeppe: Because it was claiming there was a nil value that was impossible to be nil without the GC being on crack [19:04] niemeyer: really? i'm not sure i see that [19:04] (or an unsafe error, which we can't have anymor) [19:04] rogpeppe: Do you have the old traceback around? [19:04] niemeyer: how old? :-) [19:05] rogpeppe: The first one I saw you mentioning on that regard [19:05] rogpeppe: So pretty old I suspect [19:06] niemeyer: this one is the first one i saw, i think: http://paste.ubuntu.com/5677624/ [19:06] niemeyer: (as reported in the issue) [19:06] niemeyer: it looks fairly similar [19:06] rogpeppe: Okay, it wasn't that one [19:09] rogpeppe: Hmm [19:09] rogpeppe: There's a race there [19:10] rogpeppe: It may be an actual issue in mgo [19:10] niemeyer: FWIW i ran the tests with the race detector enabled ok [19:10] rogpeppe: I don't know about that, but there's a logic error in the acquireSocket function [19:11] rogpeppe: Open session.go in 2898 [19:11] rogpeppe: Invert the two Acquire calls so that RUnlock is done after they're the Acquire [19:11] niemeyer: s.masterSocket.Acquire() ? [19:11] rogpeppe: Both slave and master [19:12] rogpeppe: They both have the same mistake [19:12] niemeyer: ok, will do [19:12] rogpeppe: It's entirely possible that in a concurrent scenario the sockets are niled out before the Acquire is done, because they're outside of the lock [19:12] niemeyer: yes [19:16] rogpeppe: Hah.. on a unrelated note, check out this weird bug that was fixed in tip: http://play.golang.org/p/FceemIemVw [19:17] niemeyer: ah yes, i caught that one going through [19:17] niemeyer: apparently some people were relying on it [19:17] rogpeppe: Indeed.. I got an error report about that :) [19:18] rogpeppe: How's the test going? [19:18] niemeyer: just crashed that second [19:18] rogpeppe: Where? [19:18] niemeyer: same place [19:19] rogpeppe: Can I see the new traceback? [19:19] niemeyer: http://paste.ubuntu.com/5677661/ [19:20] niemeyer: the line number's slightly different because the start of Acquire does this check: [19:20] niemeyer: http://paste.ubuntu.com/5677666/ [19:20] niemeyer: so it really does look like corruption [19:21] niemeyer: otherwise the first panic would have been triggered [19:21] rogpeppe: Wow.. yeah [19:22] rogpeppe: What's the Go issue again? It scrolled out of view [19:22] niemeyer: 5193 [19:22] rogpeppe: Thanks [19:22] niemeyer: (what, no search?) [19:23] rogpeppe: I'm lazy.. but if I have to debate that with you it doesn't work anymore.. :-) [19:23] lol [19:25] rogpeppe: The fact mgo puts every socket on their own goroutine is probably helping to trigger the bug [19:25] niemeyer: yeah. [19:25] rogpeppe: Or.. the reading side of the socket, anyway [19:36] rogpeppe: I sent a private ping to Rob about the issue, just to make sure it won't be left behind [19:36] niemeyer: thanks [19:36] rogpeppe: np.. usually I wouldn't be worried since rsc is CCd, but given he's away, I'm actually a bit concerned [19:36] niemeyer: it is marked as Priority-ASAP [19:38] rogpeppe: It can easily be unmarked as well :) [20:10] right, that's me utterly done for the day [20:10] niemeyer: thanks for the assistance [20:10] hi rogpeppe [20:10] night rogpeppe [20:10] thumper: hiya [20:10] thumper: how's tricks? [20:11] rogpeppe: good, was just thinking about code, then realised that today is the last day for peer reviews [20:11] thumper: anything you wanna chat about while we're online together? :-) [20:11] so I have those to do instead [20:11] not really... [20:11] thumper: oh frick [20:11] reviews suck the life out of me [20:11] I know they shouldn't [20:11] but they do [20:11] at least I'm not doing manager ones [20:11] thumper: code reviews or peer reviews... or both? [20:11] I would have had eight extra to do otherwise [20:12] just peer / manager / report reviews [20:12] code reviews are fine [20:14] * rogpeppe grabs a beer and settles down to do some reviews [20:15] thumper: where should i have found out about the peer review deadline, other than by word of mouth? [20:15] rogpeppe: in the all hands email [20:15] about reviews [20:15] quite a few weeks back [20:15] gave the timeline in it [20:17] thumper: ah, found it. i wonder if "by 5th April" is by end of day or end of previous day. [20:18] it normally means by the end of the 5th in the west-most time zone [20:18] :) [20:20] thumper: cool. tomorrow, then... [20:21] thumper: have a good day [20:21] rogpeppe: keep the beer for it though [20:21] rogpeppe: it helps [20:21] thumper: i certainly will [20:21] rogpeppe: have a good night [20:57] * thumper has put off reviews long enough, and goes to log into the system [23:52] hi davecheney [23:53] davecheney: https://code.launchpad.net/~thumper/juju-core/package/+merge/157020 [23:53] davecheney: and https://code.launchpad.net/~thumper/+recipe/juju-core-daily [23:53] davecheney: you can see in the built packages that the files are there and the man page too