[02:53] davecheney: hi there, i'd love a review if you have a chance sometime https://codereview.appspot.com/6940069/
[02:57] also maybe https://codereview.appspot.com/6923056/
[03:00] wallyworld_: no probs
[03:00] awesome thanks
=== davecheney is now known as geddan
[07:26] wallyworld___: heading off, are there any more reviews to do tonight ?
[08:44] breakfast break, bbiab
[08:57] mornin' all
[09:15] morning
[09:15] I'm running into a test suite hang, where I end up with a "mongod " process, and the test suite ends up being killed after waiting 5 minutes.
[09:15] Anyone seen that and can help me fix it?
[09:16] (very reproducible, it happens with about 10 of the juju-core packages)
[09:16] rogpeppe, fwereade: ^^ ?
[09:16] jam: do you know which test is hanging?
[09:16] rogpeppe: START: /home/jameinel/dev/go/src/launchpad.net/juju-core/cmd/juju/addrelation.go:0: AddRelationSuite.SetUpTest
[09:17] TestAddRelation
[09:17] but similar behavior is in a bunch of packages, so there may be a smaller one we could track down.
[09:17] jam: have you got a stack dump out of it, so that we can see what line it's hanging on?
[09:19] rogpeppe: well, if I try to connect with gdb, I get a permission denied, if I try as root, it dumps a memory map, perhaps there is a back trace.
[09:20] jam: i usually do it by killing the process with SIGQUITE
[09:20] QUIT
[09:20] rogpeppe: http://paste.ubuntu.com/1444879/
[09:20] rogpeppe, jam, heyhey
[09:20] fwereade: yo!
[09:20] jam, sorry, I'm not familiar with that problem
[09:20] jam: hmm, that's not very helpful :-)
[09:20] rogpeppe: SIGQUIT output: http://paste.ubuntu.com/1444882/
[09:21] formatting seems hard to read
[09:21] jam: i agree, but i lost that argument a while ago (in go core)
[09:21] It looks like the test is hanging in a cleanup section, possibly as a bad side effect of something failing.
[09:22] jam: it looks to me as if it's hanging trying to connect to mongo
[09:22] jam: ah, i see where your bad formatting is coming from
[09:22] jam: you probably just typed ^\
[09:22] rogpeppe: correct
[09:23] jam: which is actually ok now on go tip, i believe, but in earlier versions, go test doesn't catch the signal
[09:23] jam: so you get two stack dumps interleaved
[09:23] rogpeppe: ah, interesting. new dump: http://paste.ubuntu.com/1444889/
[09:23] jam: the easiest way around it is to compile the binary with go test -c, then run that
[09:23] using "kill -SIGQUIT"
[09:24] jam: is mongod actually running?
[09:24] rogpeppe: I've tried it with it alive and dead
[09:24] that one could have been stopped
[09:24] one more with it "start/running"
[09:25] http://paste.ubuntu.com/1444893/
[09:25] rogpeppe, offhand, does a `func (*State)Info() *Info` method sound like a sensible thing to you?
[09:26] fwereade: i've been wondering about that, off and on
[09:26] fwereade: i *think* so. it could just return the info it was opened with
[09:27] rogpeppe, well, it would be very convenient for the deployer, which needs to tell its deployees how to connect
[09:28] fwereade: the problem is that it's not always appropriate to use
[09:28] fwereade: for instance, we might have opened the state with a localhost address, which might not be appropriate to use in a container
[09:29] jam: could you try it against go tip on amd64, just to see if it's a go issue?
[09:29] rogpeppe, well, we're always going to be deploying things which use a modified variant of the deployer's state info anyway, so having another field to swap out isn't the end of the world I think
[09:30] rogpeppe: is there a simple way to do so?
[09:30] davecheney, morning
[09:30] rogpeppe: dimitern says he sees the same thing
[09:30] jam: hg clone the go repo, cd src; all.bash
[09:31] jam: then set $GOROOT=your-go-repo, and rebuild
[09:31] jam: (i actually have a separate juju tree that i compile against 1.0.2 rather than tip)
[09:33] jam: if you just test cmd/juju on its own, do you see the same hangup?
[09:33] rogpeppe, but there is a minor wrinkle... so, if I expose an Info method and use that to create the deployer manager, I no longer need to pass in a separate (deploying) entity name, because that is already in the state info
[09:33] rogpeppe, and this mostly feels like a win, *so long as* we always want to run an agent against a state opened on that agent's behalf
[09:34] rogpeppe: yes
[09:34] this is doing "cd cmd/juju; go test -gocheck.v -gocheck.vv"
[09:35] rogpeppe, that seems reasonable to me, but it will complicate the tests a bit, because the test State is not connected with an EntityName matching the arbitrary unit/machine I want to use for testing
[09:35] fwereade: the entity name issue makes me feel like this might not be such a great idea
[09:36] fwereade: because the info isn't just about the state, it's also about who is connecting to the state
[09:36] rogpeppe, well, yes, this is the property of the Info that I want to use
[09:37] rogpeppe, what I'm trying to figure out is whether it's ok for me to do so or whether it's icky and wrong
[09:37] fwereade: it means we can't hand a state off to some code without giving them our password, which seems... i dunno, maybe it's not an issue, it just seems potentially so
[09:38] rogpeppe, ok, alternative: how would you feel about Addrs() and CACert() methods?
[09:39] fwereade: that feels less controversial
[09:39] rogpeppe, ok, thanks :)
[09:39] fwereade: although there's still the localhost vs non-localhost address issue
[09:39] rogpeppe, well, we always need CACert and we sometimes need Addrs
[09:40] rogpeppe, and always have to supply our own EntityName and Password
[09:41] fwereade: shouldn't you be somewhere else, drinking tall drinks with umbrellas in them ?
[09:42] davecheney, nah, I took a holiday on fri to synergise with the public holiday on thu, working again until EOD weds
[09:42] davecheney, that said, when did a breakfast cocktail ever fail to improve the day?
[09:42] * fwereade falls over
[09:43] * rogpeppe wouldn't mind a bloody mary
[09:44] mmmm, rum
[09:44] lads, i am very happy
[09:44] juju now works properly in ap-southeast-2
[09:44] davecheney: good work!
[09:45] also, if anyone sees frank
[09:45] please send my apologies
[09:45] i had to roll back his change because it broke the initial auth phase
[09:45] i don't know how
[09:45] and I wasn't going to get into it trying to cut a release
[09:48] speaking of that does cmd/jujud import environs/openstack
[09:48] i noticed that I didn't have to update the deps when doing the deb build
[09:53] davecheney: probably not yet
[09:56] jam: that is fine
[09:56] it was only something i noted doing the build
[10:04] davecheney, was it maybe just that the jujud tests were expecting the FW to act differently somehow (ie just hadn't been run)? or something weirder?
[10:04] nah, the auth code is quite touchy
[10:04] rogpeppe: would you say that is a fair statement ?
[10:21] jam: the standup is at 10:30 or at 11 utc?
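Editor's note on the 09:25 to 09:40 exchange: the alternative fwereade proposes (Addrs() and CACert() accessors instead of a full Info() method) can be sketched roughly as below. This is only an illustration under stated assumptions, not the juju-core implementation: the `Info` and `State` types, the `Open` constructor and the `deployeeInfo` helper are hypothetical stand-ins, and only the field names Addrs, CACert, EntityName and Password come from the conversation.

```go
package main

import "fmt"

// Info is a hypothetical stand-in for the connection info discussed in the
// log: where the state lives, plus who is connecting to it.
type Info struct {
	Addrs      []string
	CACert     []byte
	EntityName string
	Password   string
}

// State is a hypothetical stand-in that only remembers the info it was
// opened with.
type State struct {
	info Info
}

// Open is a hypothetical constructor; a real one would dial the database.
func Open(info Info) *State {
	return &State{info: info}
}

// Addrs returns a copy of the addresses the state was opened with.
func (st *State) Addrs() []string {
	return append([]string(nil), st.info.Addrs...)
}

// CACert returns a copy of the CA certificate the state was opened with.
func (st *State) CACert() []byte {
	return append([]byte(nil), st.info.CACert...)
}

// deployeeInfo builds connection info for a deployed agent. The caller has to
// supply the deployee's own entity name and password, so the deployer's
// credentials are never exposed through the State.
func deployeeInfo(st *State, entityName, password string) Info {
	return Info{
		Addrs:      st.Addrs(),
		CACert:     st.CACert(),
		EntityName: entityName,
		Password:   password,
	}
}

func main() {
	st := Open(Info{
		Addrs:      []string{"localhost:37017"},
		CACert:     []byte("fake-cert"),
		EntityName: "machine-0",
		Password:   "deployer-secret",
	})
	info := deployeeInfo(st, "unit-wordpress-0", "unit-secret")
	fmt.Println(info.EntityName, info.Addrs)
}
```

The sketch does nothing about the localhost vs non-localhost concern rogpeppe raises; a caller would still have to swap out the addresses where they are not appropriate.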
[10:27] davecheney: any auth code is touchy :-)
[10:27] davecheney: i'm interested to know how it failed BTW
[10:28] davecheney: here's an interesting little problem i found when trying to build juju for 386 (to try to replicate jam's problem)
[10:28] rogpeppe: revert your working copy to r780 and see
[10:29] davecheney: if you do, GOARCH=386 GOBIN=/tmp/foo go install launchpad.net/juju-core/cmd/juju
[10:29] davecheney: the binary doesn't end up in $GOBIN
[10:29] davecheney: but in $GOBIN/linux_386
[10:30] davecheney: which means our install script in environs/tools.go fails to work
[10:30] rogpeppe: yup
[10:30] cross compile will do that
[10:30] GOHOSTARCH=386 ./all.bash
[10:30] might help
[10:30] is jam using 386 ?
[10:30] ewww
[10:30] davecheney: yes, as far as i could make out
[10:31] davecheney: see http://paste.ubuntu.com/1444893/
[10:31] davecheney: note, for instance: /build/buildd/golang-1.0.2/src/pkg/runtime/zsema_386.c:146 +0x29
[10:31] davecheney: yes to 386 I believe
[10:31] davecheney: i'll try the GOHOSTARCH thing
[10:31] those pointers are rather small :)
[10:32] dimitern: 11
[10:32] jam: ok, wasn't sure
[10:32] hello.
[10:32] * davecheney waves at Aram
[10:33] dimitern: I run juju and launchpad in a VM to avoid having Mongo exposed to the world
[10:33] and I tend to run 386 vms
[10:33] Aram: hiya
[10:33] jam: and it worked that way?
[10:34] jam: honestly, juju probably won't work compiled on 386
[10:34] we've never considered that case
[10:35] it'll probably produce a stillborn toolset using juju bootstrap --upload-tools
[10:35] davecheney: i dunno, why should it be a problem?
[10:35] rogpeppe: i'm worried about a default-series kind of 'oh, we never tested that' situation
[10:36] davecheney, jam: cmd/juju tests pass for me, at any rate
[10:44] I'm pretty sure juju tests worked for me at one point. Though perhaps not on this VM. More oddly, why would it be hanging trying to talk to mgo based on CPU word size?
[10:44] jam: i doubt that is the problem
[10:45] but i am concerned there are assumptions baked into the tests that assume amd64
[10:45] i'm pretty sure environs/ec2 image tests assume amd64
[10:45] jam: just tried with latest juju trunk against i386 and all tests pass for me
[10:45] jam: haven't tried live tests yet
[10:45] rogpeppe: my concern is tests != juju bootstrap --upload-tools
[10:46] davecheney: exactly, i'm just gonna try live tests
[10:46] unrelated, juju tests are sometimes flaky in a VM, especially with -gocheck.vv
[10:46] there's some timing issues exposed only in VMs.
[10:47] davecheney: but jam's test failures were local ones, and that's what i was trying (and failing) to reproduce
[10:48] Aram: well, it fails with and without -gocheck.vv
[10:49] I just used that to see what was actually hanging, though SIGQUIT worked out better for that
[10:52] ah, here's a problem:
[10:52] http://juju-dist.s3.amazonaws.com/tools/mongo-2.2.0-precise-i386.tgz:
[10:52] 2012-12-17 10:50:22 ERROR 404: Not Found.
[10:53] other than that, it all seems to have worked ok
[11:02] rogpeppe: the 386 thing is probably a red herring. dimitern is hanging as well, but he's on a amd64
[11:02] jam: ah.
[11:03] jam: i just wanted to make sure that i could run tests ok in a similar env to yours
[11:03] sure.
[11:03] jam: (and it's an interesting experiment too...)
[11:03] btw, do you know the difference between "apt-get install mongodb" and "apt-get install mongodb-server" ?
[11:03] (I used to have nothing installed, installed mongodb-server, and it stopped the 'could not find binary' failures. I'll try installing the former and see if that changes anything)
[11:05] same failure (in Semacquire during (*mongoCluster).AcquireSocket
[11:05] dimitern is also running on bare metal, vs a VM
[11:09] rogpeppe: rev 780 seems to fail in the same place
[11:10] jam: hmm.
[11:25] jam: what version of mgo are you using?
[11:25] jam: (i.e. what revision number are you on?)
[11:29] rogpeppe, jam: lots of nice deletions in https://codereview.appspot.com/6938072
[11:32] dimitern, and you, if you feel like reviewing ^^
[11:32] mongodb-2.0.6-1ubuntu4, mgo: v2/mgo/ revno 183
[11:32] fwereade: 10x, I'll look at it
[11:32] Aram, or you, sorry, I didn't see you come on -- morning :)
[11:33] yo
[11:34] fwereade: looking
[11:34] rogpeppe: ^^
[11:34] (rev 183 of mgo)
[11:34] jam: darn, same as me
[11:41] jam: thanks for the update on dependencies, reading through it now.
[12:21] fwereade: FWIW you have LGTM from me
[12:22] dimitern, cool, thanks
[13:51] rogpeppe, dimitern: a followup: https://codereview.appspot.com/6944058
[13:54] fwereade: why conflate provisioner and firewaller? is it never reasonable to have one without the other?
[13:56] fwereade: also, perhaps HostPrincipalUnits would be better as HostUnits - you can't have subordinates without principals.
[13:56] * dimitern looking
[13:56] fwereade: (and principals imply subordinates too)
[14:03] rogpeppe, at the moment, we always want to run the FW and the P together, and the jobs are explicitly not meant to be 1:1 with workers
[14:03] rogpeppe, things may migrate in future
[14:04] rogpeppe, re HPA, the MA's job is to host the principal units -- those units' agents may then host subordinates, but the MA doesn't know or care about that
[14:07] fwereade: from the point of view of AddMachine, i think HostUnits makes sense - we're asking that machine to host any units, and we don't care if it's the machine agent interpreting those flags, or something else.
[14:11] rogpeppe, I'm fine with that too :)
[14:11] rogpeppe, HostUnits then has an uncomfortable similarity to WatchUnits though
[14:12] rogpeppe, HostPrincipalUnits/WatchPrincipalUnits is clearer
[14:12] fwereade: well, if you think about it, WatchPrincipalUnits is only used for the top level, but that's just an implementation detail - we want the machine to host any kind of unit in the end
[14:13] fwereade: done
[14:14] fwereade: another thing i'm not entirely sure about: there was a nice correlation between worker constant names and the Workers function, but Host* doesn't relate strongly to "job"
[14:14] rogpeppe, both the jobs are currently about hosting other things
[14:15] rogpeppe, JobHostBlah felt over-wordy
[14:15] fwereade: so if we had more constants, they might not all have a common element in the names?
[14:16] rogpeppe, that might be the case -- I presume you are advocating for Job*
[14:16] fwereade: i'm not sure. i'm possibly advocating changing AgentJobs to HostJobs or something along that kind of line
[14:16] rogpeppe, am definitely fine with that
[14:16] rogpeppe, ah not so much with that, sorry
[14:16] rogpeppe, well, let me think ;)
[14:18] rogpeppe, I think I would maybe like HostPrincipalUnits/RunEnvironController, probably with a Job prefix
[14:19] fwereade: those names seem, erm, a bit verbose
[14:20] fwereade: i'm thinking that from a top-down pov, we're declaring what kinds of things a given machine will run
[14:21] fwereade: so maybe RunUnits, RunEnvironController?
[14:21] fwereade: i definitely think we should lose the "Principal" - we want a machine to run all units, not just principals
[14:24] rogpeppe, I still feel the other way re Principal, but not enough to fight it :) -- Run SGTM
[14:25] fwereade: actually Task might work
[14:25] fwereade: UnitsTask, EnvironControllerTask
[14:25] rogpeppe, uncomfortable collision with jujud.task
[14:26] fwereade: that should probably be named worker
[14:26] fwereade: because it corresponds to the interface implemented by workers
[14:26] rogpeppe, well, niemeyer is adamant that there is a distinction between a task and a worker, and an upgrader is not a worker
[14:27] fwereade: hrmph
[14:27] rogpeppe, or that is my understanding, which is probably flawed
[14:27] rogpeppe, I don't think there's a meaningful distinction
[14:27] fwereade: i *think* that was due to the old state Worker constants
[14:27] rogpeppe, ha, ok
[14:27] fwereade: which were the high level description of what to do
[14:28] fwereade: and i'm proposing to rename those to tasks, to make the distinction clear.
[14:29] fwereade: that starts to make quite a lot of sense to me, actually.
[14:29] fwereade: the machine agent uses some number of workers in order to fulfil its allocated tasks.
[14:30] rogpeppe, task feels a bit too singular for the intent here, I think -- in my mind I'm not thinking of workers at all, I'm thinking of tasks (ie runTasks(...task)), and I'm thinking of a Job as something that is made up of some number of tasks
[14:30] fwereade: for me, "task" is a description of something to do, and "worker" is a something that accomplishes that
[14:31] fwereade: "job" works ok as well as "task", but seems more like something with a defined beginning and end.
[14:31] fwereade: whereas the tasks we're talking about are indefinite in extent
[14:32] rogpeppe, IMO "job" is a pretty good name for a group of ongoing tasks
[14:33] fwereade: "job" feels like something that can be finished.
then again, maybe "task" does too...
[14:33] fwereade: "we task this machine with the job of running the environment controller" :-)
[14:33] rogpeppe, haha
[14:34] rogpeppe, (IMO task feels >= job re finishitude)
[14:35] fwereade: ok. UnitsJob, EnvironControllerJob ?
[14:35] rogpeppe, I still want the common prefix for some reason
[14:36] rogpeppe, what's the argument the other way?
[14:36] fwereade: we had Worker as a suffix before; it seems to read more nicely to me, at any rate
[14:36] rogpeppe, heh, it always seemed icky to me
[14:37] fwereade: JobUnits seems... non-obvious. JobRunUnits perhaps
[14:37] afk, one mo
[14:38] rogpeppe, hmm: JobHostUnits/JobManageEnviron?
[14:39] fwereade: sgtm
[14:39] rogpeppe, cool
[14:44] fwereade: and i think the task interface *should* probably be renamed to worker, but maybe in another cl
[14:45] rogpeppe, I'd prefer the opposite change, I will let it stew in my mind for a bit
[14:45] fwereade: renaming worker to task is a much bigger change
[14:45] fwereade: we'd have to rename the whole worker package hierarchy
[14:46] fwereade: (and i quite like the "worker" name currently - it seems to represent what's going on quite well)
[14:46] rogpeppe, (I have worse thoughts -- "task/uniter", uniter.NewTask(), etc, which would then let me get a Deployer naming I'm happy with)
[14:47] rogpeppe, but, anyway :)
[14:48] fwereade: i'm not quite sure what you're thinking of there
[14:48] rogpeppe, it's really not important -- as I say I will ponder worker/task some more, and perhaps attain enlightenment
[16:35] rogpeppe, fwereade: PTAL https://codereview.appspot.com/6940073
[16:35] * fwereade looks
[16:43] brb
[16:53] back
[17:26] fwereade: you've got a review
[17:27] rogpeppe, awesome, ty
[17:29] rogpeppe, I dunno -- at the moment it seems that an agent with no jobs is just not a useful thing to have
[17:29] rogpeppe: would you ?
^^
[17:29] fwereade: it is for tests :-)
[17:29] dimitern: will do
[17:30] rogpeppe, ha, well, yeah -- it always was, though, and that didn't stop us then ;p
[17:30] fwereade: although a test for no jobs in AddMachine seems reasonable too
[17:30] rogpeppe, there is one, isn't there?
[17:31] fwereade: quite likely, i didn't check
[17:31] rogpeppe, I think I wrote one :0
[17:32] rogpeppe, +1 on MachineJob, I think
[17:32] fwereade: cool
[17:32] rogpeppe, then I guess Jobs() rather than AgentJobs()
[17:32] fwereade: didn't i suggest that?
[17:32] rogpeppe, ha, hadn't read far enough
[17:34] rogpeppe, not sure about the bitmask though -- feels like going to extra effort to restrict the number of jobs we have and make it fiddlier to add or remove them (when we eventually want to do that)
[17:34] fwereade: i don't think it makes it any fiddlier to add or remove jobs
[17:34] fwereade: and i think most logic gets simpler
[17:34] rogpeppe, more complicated txn retries anyway, surely
[17:35] fwereade: (no need to check for dupes for example)
[17:35] fwereade: why is the txn retry more complicated?
[17:35] fwereade: we're just setting an integer, no?
[17:35] rogpeppe, $addToSet?
[17:36] rogpeppe, we can $addToSet and $pull jobs out of an []int without having to mess around retrying because we're trying to store bitmask X which was derived from Y which has become Z in the meantime
[17:36] fwereade: we don't want to use that for jobs, do we?
[17:37] rogpeppe, won't we?
[17:37] rogpeppe, so the bootstrap machine will be the single bootstrap machine forever?
[17:37] rogpeppe, (forgive the terminology)
[17:40] fwereade: there's nothing to say we can't change the jobs for a machine; it just means we can't have an AddJob/RemoveJob API rather than a SetJobs API.
[17:40] fwereade: ... which is quite possibly a sufficient reason
[17:41] rogpeppe, I *think* it is...
or at least it looks close enough from here
[17:41] fwereade: but even if we store stuff in the database as []int, we could still have a bitmask as the externally visible API
[17:42] rogpeppe, is there some advantage of a bitmask I'm not seeing? it seems strictly worse to me -- by the time we have enough flags we care about the extra space of an [], we have enough flags that we're worried about space in a bitmask
[17:42] fwereade: it's not about space
[17:42] rogpeppe, not looping through it?
[17:42] fwereade: it's about the ease of checking for membership and the automatic duplicate deletion
[17:44] rogpeppe, how common are these operations, and how much code do we waste each time we do them?
[17:44] fwereade: not that common - most code never manipulates a set of jobs. maybe 8 or so lines in the places we do.
[17:45] fwereade: to me, a bitmask represents like a natural representation of a set
[17:45] s/represents/seems/
[17:46] rogpeppe, to me, it's something I use when there's a compelling reason to pack stuff into a small space and never besides :)
[17:46] fwereade: ha
[17:46] fwereade: C vs Python background i guess :-)
[17:46] bitmasks are awesome.
[17:47] rogpeppe, it's not like I *didn't* spend time up to my elbows in bits, it's just that that time was not generally otherwise representative of sensible development practices ;)
[17:48] fwereade: using bitmasks as sets actually works pretty well, and isn't unsafe
[17:50] rogpeppe, still feels like an [] is the more natural mongo representation, and I'm still not sure the other benefits really outweigh that
[17:51] fwereade: i think i agree with that.
seems like a pity that mongodb doesn't have bitwise ops, but there we go
[17:53] rogpeppe, cool, cheers
[17:57] dimitern, I'm afraid cath has just gone out without laura, so I won't finish your review any time soon: probably tomorrow :(
[17:58] fwereade: sure, no probs
[17:58] rogpeppe, dimitern, Aram: but I do have another CL for a horrifying moronic uniter bug: https://codereview.appspot.com/6946071
[17:58] I would very much appreciate reviews of that
[18:00] fwereade: a surprisingly large change for such a small bug :-)
[18:13] dimitern: sorry, run out of time on your review. FWIW, i got stuck on getId. i couldn't work out whether it's the right function with a dubious doc comment, or if it should be refactored somehow
[18:17] * rogpeppe is off for the evening. g'night all.
=== flaviamissi is now known as flaviamissi_
=== flaviamissi_ is now known as flaviamissi
=== carif_ is now known as carif
[23:35] fwereade: ping
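
Editor's note on the 17:34 to 17:51 exchange: the two representations of a machine's job set that were being weighed (a bitmask versus a plain slice) might look roughly like this. It is a minimal sketch under assumptions, not the code that landed: the names MachineJob, JobHostUnits and JobManageEnviron are the ones agreed in the log, while `JobSet`, `With`, `Has`, `hasJob` and `addJob` are hypothetical helpers invented for the comparison.

```go
package main

import "fmt"

// MachineJob names something a machine is expected to do. The type name and
// the two constants follow the naming agreed in the log; the numeric values
// are an assumption.
type MachineJob uint

const (
	JobHostUnits MachineJob = iota
	JobManageEnviron
)

// JobSet is the bitmask representation: one bit per job.
type JobSet uint

// With returns the set with j added. Adding a job twice is a no-op, which is
// the "automatic duplicate deletion" mentioned in the log.
func (s JobSet) With(j MachineJob) JobSet {
	return s | JobSet(1)<<j
}

// Has reports whether j is in the set, using a single bit test.
func (s JobSet) Has(j MachineJob) bool {
	return s&(JobSet(1)<<j) != 0
}

// hasJob is the slice equivalent of Has: a linear scan.
func hasJob(jobs []MachineJob, j MachineJob) bool {
	for _, have := range jobs {
		if have == j {
			return true
		}
	}
	return false
}

// addJob is the slice equivalent of With; the duplicate check is explicit.
func addJob(jobs []MachineJob, j MachineJob) []MachineJob {
	if hasJob(jobs, j) {
		return jobs
	}
	return append(jobs, j)
}

func main() {
	mask := JobSet(0).With(JobHostUnits).With(JobHostUnits)
	fmt.Println(mask.Has(JobHostUnits), mask.Has(JobManageEnviron))

	jobs := addJob(addJob(nil, JobHostUnits), JobHostUnits)
	fmt.Println(len(jobs), hasJob(jobs, JobManageEnviron))
}
```

The slice form costs a few more lines per operation, which matches rogpeppe's estimate of "maybe 8 or so lines", but as fwereade points out it maps directly onto MongoDB's $addToSet and $pull, so jobs can be added or removed without a read-modify-write retry on a stored integer.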