[01:24] davecheney: Yo! [01:24] davecheney: Are you staying around for a bit? [01:25] niemeyer: yy [01:29] davecheney: I guess the answer is no.. :) [01:29] yeah, i'm arround [01:29] davecheney: Ah, hey :) [01:30] davecheney: I'm about to get live tests working, and have a set of changes in the pipeline.. if you'll be around for a bit, we can quickly interact on those [01:30] niemeyer: excellent [01:30] davecheney: It's all pretty agreeable stuff [01:30] let me know what you need me too do [01:30] davecheney: Btw, I have a connected RPi turning a led on and off here :-) [01:31] nice [01:31] niemeyer: http://dave.cheney.net/2012/09/25/installing-go-on-the-raspberry-pi [01:31] Boards with GPIO <3 [01:31] finally wrote it up last night [01:31] davecheney: Oh, sweet [01:33] davecheney: Very cool [01:33] davecheney: I got Go working yesterday too [01:33] odessa(~/devel/src/launchpad.net/juju-core/testing) % go test ../cmd/juju [01:33] --- FAIL: TestPackage (0.00 seconds) [01:33] mgo.go:65: exec: "mongod": executable file not found in $PATH FAIL [01:33] FAIL launchpad.net/juju-core/cmd/juju 0.116s [01:33] ^ with your review comments applied [01:33] ? [01:34] davecheney: I don't understand.. I didn't suggest any changes to the path or whatever? [01:34] niemeyer: nah, this is just responding to your comments in ,https://codereview.appspot.com/6560045 [01:34] davecheney: Ah, ok, just showing how it looks like.. super, thanks [01:34] davecheney: Phew.. I thought it was all broken :) [01:35] nah, just testing that it does work as expected if you don't have mgo installed [01:36] davecheney: Looks good [01:38] ty [01:55] Amazon is taking ages to allocate a machine apparently.. and S3 is failing.. not a good night to test it live [01:55] Anyway, I'll break the CLs down [02:00] https://codereview.appspot.com/6572050 [02:02] niemeyer: Proposal: https://code.launchpad.net/~dave-cheney/goetveld/002-add-darwin-termios-support/+merge/126362 [02:02] for some reason goetveld doesn't create CL's, only LP reviews [02:03] niemeyer: try another zone ? [02:03] i ofter use southeast-1 [02:04] https://codereview.appspot.com/6573050 [02:09] https://codereview.appspot.com/6579043 [02:25] error: Failed to update merge proposal log: EOF [02:25] Maybe it's my connection that is in a bad state [02:25] https://codereview.appspot.com/6567048 [02:25] davecheney: Last for the day ^ [02:26] niemeyer: i haen't seen that one for a while, the EOF [02:26] reviewing last CL now [02:37] Shower.. will pass by in a bit.. [02:37] davecheney: Thanks for the reviews! [02:39] kk [02:39] niemeyer: no worries [02:50] davecheney: Okay, I guess I'll merge this stuff [02:50] davecheney: So people can make progress tomorrow on top of it [02:50] davecheney: How's config-get going there? [02:51] niemeyer: dunno, i should pull that branch and see if it compiles [02:51] davecheney: I mean the stuff that you were pushing forward since the sprint [02:51] davecheney: Or what have you been up to? [02:53] i've been on swap days since the sprint [02:53] Today as well? [02:54] davecheney: ? [02:56] niemeyer: getting yelled at from downstairs, back in a while === davecheney is now known as davecheney-a6faf [05:56] morning [05:57] morning === davecheney-a6faf is now known as davecheney-bb1ad [06:57] mornings [07:01] fwereade: http://paste.ubuntu.com/1228005/ [07:01] did I just break transactions ? [07:02] davecheney-bb1ad, huh, not seen that before [07:02] davecheney-bb1ad, er, maybe? :) [07:02] fwereade: shitter [07:02] davecheney-bb1ad, but I actually have no idea [07:02] davecheney-bb1ad, what did you do? :) [07:03] just what I typed [07:03] that is trunk [07:03] fwereade: morning [07:03] TheMue, heyhey === davecheney-bb1ad is now known as davecheney [07:03] davecheney, bah, I'll try myself [07:05] https://bugs.launchpad.net/juju-core/+bug/1056642 [07:05] for posterity [07:21] mroning rogpeppe [07:22] fwereade: hiya [07:22] rogpeppe: morning [07:24] fwereade: running those commands in a loop [07:52] fwereade: another failure [07:52] davecheney, oh dear :( same one? [07:52] different [07:52] http://paste.ubuntu.com/1228059/ [07:53] davecheney, hmm, is that just ec2 taking too long? [07:54] possibly, but we should be able to cope with that [07:55] lucky(~/src/launchpad.net/juju-core/cmd/juju) % juju destroy-environment [07:55] lucky(~/src/launchpad.net/juju-core/cmd/juju) % bash -x stress.bash [07:55] + set -e [07:55] + true [07:55] + juju bootstrap --upload-tools [07:55] + juju deploy mongodb [07:55] error: no instances found [07:55] that happened in 2 minutes [07:55] less [07:55] our timeout is 10 minutes [08:01] lucky(~/src/launchpad.net/juju-core/cmd/juju) % bash -x stress.bash [08:01] + set -e [08:01] + true [08:01] + juju bootstrap --upload-tools [08:01] + juju deploy mongodb [08:01] + juju status [08:01] error: instance i-bbd62fc6 for machine 1 not found [08:02] https://bugs.launchpad.net/juju-core/+bug/1056679 [08:08] bizarre :( [08:12] ha! [08:12] catched the service bug in firewaller, phew [09:07] hmm, still sometimes have a race *sigh* [09:49] rogpeppe: http://paste.ubuntu.com/1228172/ [09:49] ^ evil, or very evil ? [09:50] moin [09:50] davecheney: why? [09:51] the py tests fro juju ssh were using this weird mocking thing [09:51] I needed a way to catch the ssh arguments, without running the command [09:52] davecheney: the way i was doing that before was by changing $PATH [09:52] davecheney: is that not reasonable? (it doesn't involve changing the actual code at all) [09:53] i guess I want to test that the code is generating the correct invovation [09:53] davecheney: i realise it's more work though [09:53] so I need the args [09:53] Aram: hi [09:53] davecheney: you can get the args. just make the ssh into a shell script that saves the args. [09:54] rogpeppe: sure, that would work, but then I have to write that stuff to disk, blah blah [09:54] davecheney: i thought we were doing something like this anyway [09:55] davecheney: we do, in TestSSHErrors [09:55] fair point [09:56] davecheney: i don't think it's much more hassle to get the script to save its args. [09:56] davecheney: or just use a different script template [09:56] davecheney: i'm not that keen on mocking Command. [09:57] fair enough' [09:59] davecheney: just change the script to do something like this: for i in "$@"; do echo $i >> {{.Dir}}/args; done [10:34] fwereade: do you find the "JUJU:DEBUG watcher: loading new events from changelog collection..." messages useful? [10:34] fwereade: i'm inclined to delete them - they're so noisy. [10:34] rogpeppe, sometimes; I would be quite keen on a package-specific debug flag that defaults to false though [10:35] fwereade: how would you turn it on? [10:35] rogpeppe: good catch about nil != empty on golang-dev. I was like "wtf is this" but didn't say anything because I assumed I missed some subtlety and I was wrong. [10:35] rogpeppe, watcher.SetDebug(true) in the setups of tests that I thought it would help in debugging [10:35] :) [10:36] yes, delete the damn verbosity. [10:37] fwereade: or even just watcher.Debug = true :-) [10:37] rogpeppe, indeed :) [10:41] Aram, btw, what's the priority on the confignode read-missing change? [10:42] fwereade: ? [10:42] I'm missing something :) [10:42] what's wrong [10:42] Aram, sorry, I thought I overheard you discussing it with niemeyer with a vague view to doing it soon [10:43] Aram, I mean changing CondfigNode such that an attempt to read an empty one returns an error [10:43] Aram, sorry, a *missing* one [10:43] Aram, empty is fine [10:43] hmm. [10:44] I'll take a look. [10:44] the behavior used to be like you described. [10:44] Aram, honestly I'm easy either way, I think it would be a good change but not an overwhelmingly awesome one [10:44] Aram, I just wanted to know the plan because I'm about to fiddle around with one of the clients [10:45] the behavior should be like you described, if it is not, I'll do it after I do the machine units watcher. [10:46] which I'll start after lunch. [10:46] which is now [10:46] :L) [10:59] fwereade: what's the current status of the uniter w.r.t. lifecycle? does it watch for Dying? [10:59] rogpeppe, nope, no handling at all [10:59] rogpeppe, that would be a good thing to add, actually :) [11:00] lunchtime (did i already say i hate testing for races *grmpf*) [11:00] fwereade: where would you fit the watcher in? Uniter.loop is sooo neat currently... [11:01] fwereade: hmm, i suppose it could just be a separate thing that kills the watcher, perhaps with a known error. [11:01] rogpeppe, I was expecting all the steady state modes to watch for it [11:01] rogpeppe, maybe I can have occasional checks at transition time too [11:02] * fwereade shrugs [11:03] fwereade: is there anything mode-specific that needs to be done at shutdown time? [11:03] rogpeppe, yes, all sorts of watcher stops, generally [11:03] fwereade: those will happen anyway when the uniter tomb is killed, no? [11:05] rogpeppe, s/anyway/assuming the existence of a new entity whose responsibility it is to do that on unit death/ [11:05] fwereade: ah, of course, the watchers are independent of the modes [11:05] rogpeppe, sorry I think I am miscommunicating [11:06] rogpeppe, yes, when we return from a mode, its watchers will be cleaned up; there are no watchers outside modes at the moment [11:06] fwereade: that's what i thought originally [11:06] fwereade: so... [11:06] rogpeppe, but 2/3 of the steady state modes already watch the unit [11:07] fwereade: why do we need a new entity to stop watchers on unit death? [11:07] rogpeppe, we don't [11:07] rogpeppe, we just need to use unit watchers in the modes [11:07] rogpeppe, no call at all to mess around with uniter.loop IMO [11:07] fwereade: i'm wondering if there's any reason to clutter up every single mode with dying logic [11:08] rogpeppe, well, yes [11:08] rogpeppe, usually Dying is of no concern at all [11:08] fwereade: when we can have an entirely separate entity that just kills the uniter when life changes to state.Dying [11:08] rogpeppe, whoa how do yo propose to do that? [11:08] rogpeppe, we don;t want to *kill* the uniter [11:08] fwereade: no? [11:08] rogpeppe, we enter an extended and detailed shutdown sequence in which lots of things happen [11:09] rogpeppe, when we hit Dead is a different matter [11:09] fwereade: ah, ok. that's what i meant by "is there anything mode-specific that needs to be done at shutdown time?" [11:09] rogpeppe, but if you're in a hook error state it matters not one whit whether you're alive or dying [11:09] rogpeppe, you wait for that user resolution, like it or not [11:09] fwereade: does that mean we can't remove units that are in an error state? [11:10] rogpeppe, there will be mode-specific subtleties though [11:10] rogpeppe, yes [11:10] rogpeppe, unless you -force [11:10] rogpeppe, eg, a hook error state should probably just ignore charm upgrades while dying [11:10] fwereade: ok, this all makes sense. not a trivial change then :-) [11:11] rogpeppe, afraid not :) [11:12] rogpeppe, I am slightly worried that all the lifecycle logic will rather dirty up the nice clean uniter [11:13] fwereade: i *think* that it'll just result in another couple of modes [11:13] fwereade: and some logic in the existing modes to make that transition [11:13] rogpeppe, I'm talking more about responses to life checks in the various watchers within the modes [11:14] rogpeppe, I worry that they will obscure the rest ;) [11:14] fwereade: i guess we'll find out... [11:15] rogpeppe, yeah :) [11:31] fwereade: do you think it's reasonable that Refresh return a *NotFoundError when the entity has been removed? [11:32] rogpeppe, hmm, yeah, I think so [11:32] rogpeppe, we'll see how it is in practice [12:47] fwereade: rogpeppe: since now we are only returning Ids, what do you say about the idea of only returning two kinds of changes, one for ints and one for strings, instead of a custom change type for each watcher. [12:47] * fwereade kinda has a sad about this [12:48] generally speaking, when I get a change I really *like* being able to do something useful with it [12:48] so you don't like that we are only returning ids? [12:49] Aram, I guess I need more context; which watchers? [12:49] all of them. [12:49] :) [12:49] look at how machines watcher is now. [12:50] it returns [12:50] type MachinesChange struct { [12:50] Alive []int [12:50] Dead []int [12:50] } [12:50] all the others will return struct { Alive, Dead []string } [12:50] Aram, yeah, I do appreciate the new structure... but, hmm, nothing needs Dying? [12:51] other watchers sure need Dying. [12:51] unsure about this. [12:51] niemeyer wrote it [12:51] Aram, I thought we agreed a while back that we'd be supplying a single list of those entities whose life status had changed? [12:52] Aram, I'm not 100% sure that's right, but I'm sketchy on just having Alive/Dead fields [12:52] yeah, I remember we agreed on that long ago, but I also remember we agreed on this more recently. [12:53] "agreed" [12:53] Aram, blast, I missed that conversation [12:53] Aram, (sure, I am aware that agreement is a fluid that changes its nature as it flows within a group) [12:53] if it were my choice, I'd just return map[Life]Id, because we might add more life states in the future [12:54] map[Life][]Id [12:54] so they are grouped by life [12:54] and you can have as many life states as you want [12:54] Aram, that sounds good to me [12:54] but it's not my choice :). [12:54] Aram, ...although... [12:55] Aram, I kinda feel that reporting by status is somewhat against the spirit of the ids-only change [12:55] Aram, really just `Changed []Id` seems best [12:56] I agree in principle. [12:56] Aram, ok back to concrete :) [12:56] Aram, I want to watch a service's relations, and know when they are dying [12:57] Aram, am I now expected to set up an individual Dying watch, per relation that I detect alive? [12:57] no [12:57] Aram, how can I avoid this? [12:58] you'll get a ServiceRelationWatcher which will return struct { Alive, Dying []string } [12:59] Aram, ok, so each change type is tailored to its clients, and the Alive/Dead thing is not a brook-no-exceptions Agreement [12:59] Aram, that SGTM then :) [13:00] but struct { Alive, Dying []string } is hardly tailored to its client, almost all watchers will use it. [13:01] Aram, I still don't like the amount of reloading I'll be doing on doc watches... [13:02] yeah, I agree in principle. simpler watcher are good, but when they put a burden on the client the fact that they are simple means nothing. [13:02] Aram, on the basis that the ids change will be really ugly and inconvenient for the doc watchers' only clients... as it will be I think for ConfigNode watches... would you hold off on doing them until you've changed the other ones? [13:02] sure. [13:02] Aram, I will try to remember to bring it up with niemeyer this pm [13:03] Aram, cheers [13:03] Aram, and for the relations, I should be thinking in terms of lists of relation IDs only? [13:03] Aram, (grouped as Alive/Dying ofc) [13:04] whatever you want, but the primary key is the string formed from the endpoints and that has the advantage that you can extract information from it without having to load the document. [13:05] Aram, hmm; do we have a Relation accessor on state that accepts that key? [13:06] Aram, I don't think we do... [13:07] surprisingly no, though State.Relation is close in spirit [13:07] since the key is the stringified endpoints [13:07] Aram, I think we have some thinking do do [13:08] Aram, *lossily* stringified endpoints, I think [13:08] Aram, we can't reconstruct the originals without hitting state [13:08] yeah, we can't ATM [13:08] Aram, and that will involve figuring stuff out by getting information from the charms [13:09] Aram, which I'm pretty sure we don't want to do because it will break all our tests [13:10] Aram, (because most tests just slap a dummy charm into state, which doesn't declare any relations) [13:11] Hello all! [13:11] hi [13:11] niemeyer, heyhey [13:13] niemeyer: hiya [13:15] How's the day looking? [13:15] GOod stuff? [13:16] > db.mycoll.insert({_id: 1}) [13:16] duplicate key insert for unique index of capped collection [13:16] > db.foo.insert({_id: 1}) [13:16] E11000 duplicate key error index: test.foo.$_id_ dup key: { : 1.0 } [13:17] It's unfortunate that these are different errors [13:18] * niemeyer reports upstream [13:30] * TheMue still fights with a race condition [13:40] niemeyer: do we return dying machines in the initial event of the machine units watcher? [13:40] or do we now. [13:40] s/now/not/ [13:40] returning dying machines makes it consistent with Machine.Units [13:40] Aram: Dying, yes [13:41] Aram: The only thing that cares about an Alive => Dying transition is the entity agent itself [13:41] Aram: For all other purposes, the thing is still alive [13:42] niemeyer: so you're fine with UnitsChange being [13:42] type UnitsChange struct { [13:42] Alive []string [13:42] Dying []string [13:42] } [13:42] ? [13:42] Aram: s/Dying/Dead/? [13:42] Aram, surely that's an Alive/Dead one [13:42] Yeah [13:43] Aram, ServiceRelations is Alive/Dying, because while the watching entity is not istelf a relation, it *is* responsible for responding to the relation's lifecycle changes [13:44] Aram, sorry that was not a helpful sentence [13:44] how does one get Alive -> Dying if we don't deliver this event here? one watcher per each unit? [13:44] Aram, I think the question here is entirely situational, and dependent on what the client needs to use [13:45] Aram, the MPW only needs to about Alive (to deploy a container) and Dead (to destroy it) [13:45] ok, that seems sensible. [13:46] Aram, the Uniter is responsible for watching the unit for Dying, and then shutting itself down in an orderly fashion before making itself Dead [13:46] Weird.. that was an abrupt "disconnection by peer" [13:46] Aram, the Uniter is also responsible for handling everything about relations, and itself sets Dead, so it should only need Alive/Dying [13:47] Aram, from SRW [13:47] fwereade: That said, hmm [13:47] niemeyer_, I personally would prefer just []ChangedLife [13:48] niemeyer_, and handle it appropriately at the various call sites [13:48] fwereade: Wouldn't it be weird.. let's say.. for a machine watcher to report a unit as alive when it's dying.. [13:48] niemeyer_, which will I suspect contain a number of subtle differences [13:48] fwereade: just so it starts up, and dies [13:48] it seems kind of wrong to return Dying units inside the Alive field of UnitsChange though. [13:49] niemeyer_, so a unit that is initiall seen to be Dying should be ignored, and should not generate a Dead [13:49] fwereade: Kind of.. [13:49] fwereade: That's even more incorrect [13:49] fwereade: imagine the same situation, but the unit is actually still running [13:50] niemeyer_, ha [13:50] niemeyer_, fwereade: fairly trivial: https://codereview.appspot.com/6564054 [13:50] rogpeppe: Thanks [13:50] niemeyer_, wait, surely the machine agent *knows* what is running anyway? [13:50] fwereade: I think something along your ChangedLife idea might be plausible [13:50] fwereade: Forces acknowledgement of the possibilities [13:50] niemeyer_, but regardless, yeah, I feel that's the way to go for now [13:51] fwereade: Or even Alive/Dying/Dead fields [13:51] fwereade: Which makes the need for handling even more explicit [13:51] niemeyer_, I'd prefer to keep it just as "changed!", rather than risk a lie (when the id is loaded and revealed to be in a different state) [13:52] fwereade: Good point [13:52] niemeyer_, on a related note: ids from watchers [13:52] fwereade: ok [13:53] so Changed is the consesnsus? [13:53] niemeyer_, I like it in principle, and (I think) in practice for the collection watchers; I feel that it's going to be unhelpful when applied to the document watchers [13:53] Aram: It may be just a slice of ids, I guess, but let's wait until the end of the conversation [13:53] niemeyer_, as the only client so far, every time I get a changed service or unit, I want it refreshed [13:54] fwereade: There are a few different details that have been going unperceived [13:54] niemeyer_, can I handwave single document watchers to be "different enough" as to be sent as objects? [13:54] niemeyer_, go on [13:55] fwereade: 1) We're faking the data for the entity; since the entity might not be there by the time we try to fetch it, we change its field to Dead and send a cached version [13:56] fwereade: Which is quite wrong.. the unit may have died in an entirely different state [13:56] fwereade: and such a Dead + arbitrary state may never have happened.. the justification for the death is in the unit that we didn't see (unit being just an example here) [13:57] fwereade: 2) If we get 100 changes between the last time we've observed the unit, and the next time we're able to handle the change because e.g. the hook returned, [13:57] fwereade: we're reloading the unit 100 times within the watcher, for absolutely no reason [13:58] niemeyer_, re (1), I dunno -- for what clients is the distinction important? [13:58] fwereade: I don't know.. we'll likely find out when something explodes in our face [13:58] niemeyer_, re (2), hmm, true [13:58] fwereade: 3) The cost of loading the unit is minimal in all cases I've ported [13:59] niemeyer_, I mean, I've been thinking about it, and I can't see any -- doesn't Dead mean "the only safe way to interact with this document is to destroy it"? [13:59] niemeyer_, yeah, it is only a few lines of code [13:59] fwereade: and, interesting, in some cases it cleaned up.. [13:59] fwereade: The firewaller, for example, didn't really care much about that machine object [13:59] fwereade: It ended up not using it at all in most cases [13:59] fwereade: It was using it just because it was being handed off anyway [13:59] niemeyer_, I'm 100% behind it on the collections [14:00] niemeyer_, and I think I'm now convinced on the documents side [14:00] fwereade: 4) It makes the watchers simple (!) :-) [14:00] niemeyer_, ;) [14:02] fwereade, Aram: So, slice of ids? [14:02] niemeyer_, +1 [14:02] Changed []string? [14:02] Aram: []string [14:02] * rogpeppe wishes that machine ids were strings too [14:02] hmm [14:03] Aram: <-chan []string [14:03] rogpeppe++ [14:03] niemeyer_: ok, that seems fine [14:05] rogpeppe: We should talk about that someday when we're not in the middle of a big change [14:05] niemeyer_: yeah [14:10] rogpeppe: Reviewed === niemeyer_ is now known as niemeyer [14:10] niemeyer: thanks [14:11] niemeyer: i chose FitsTypeOf because then if the test fails it'll say what the failing error actually is [14:11] rogpeppe: If the test fails it's trivial to find out what's wrong [14:11] niemeyer: ok [14:11] rogpeppe: That's how it's being done in the rest of the code already [14:12] rogpeppe: We have a test for IsNotFound that verifies it actually works as intended, and the rest is relying on it workng [14:12] niemeyer: that's fine. i generally to try to make test failures as informative as i can, but i'm happy to use IsNotFound too [14:14] rogpeppe: That's a great approach, but there's value in testing the semantics we want.. we'll generally be using IsNotFound in client code, and it should work [14:14] niemeyer: ok, but i thought that's what the IsNotFound test is for. [14:14] niemeyer: anyway, i've changed it already [14:15] rogpeppe: Nevermind. Thank you! [14:15] and one other occurrence too [14:23] have to leave due to an emergency, bbl [14:27] TheMue: Ouch.. hope it's all good there [15:06] niemeyer, hey! available for a chat about versions of mongodb in quantal? I understand 2.2.0 is desired.... [15:06] niemeyer, whoops, I need to leave early today... I might have a CL for you later, but nothing yet I'm afraid [15:07] jamespage: Yo, here [15:07] fwereade: np, have a pleasant time there [15:08] niemeyer, Aram: ooo, one important thought: to make the ids change work with RelationUnitsChange et al, we will need a map[unit-name]txn-revno [15:08] fwereade: Hmm [15:08] Aram, niemeyer, actually, I don;t think we do [15:08] niemeyer, I had a request to up the version of mongodb in quantal to 2.2.0 to support go-juju [15:09] fwereade: That's good I guess, because I have no idea about what you have in mind yet :D [15:09] niemeyer, we just clear the settings out when we get a change [15:09] niemeyer, currently we will be shipping 2.0.6 (maybe 2.0.7 if I get time) [15:09] jamespage: Col [15:09] jamespage: Cool [15:09] jamespage: 2.2.0 would be good indeed [15:09] niemeyer, I was thinking we'd need to keep track of revnos at the watcher level because they're used by the relation context level [15:09] jamespage: Very good, in fact [15:09] niemeyer, at the moment I'm pushing back - its very late in the cycle [15:09] niemeyer, in persistent relation state storage [15:10] niemeyer, I'm assuming that this will be required to support usage on precise as well [15:10] fwereade: Right, I think that's sensible [15:10] jamespage: In principle, no [15:10] niemeyer, ?? [15:10] jamespage: Well, it depends a bit on what you mean by that [15:11] jamespage: Support usage in which sense? [15:11] fwereade: Let's have a chat once you have some more time [15:11] niemeyer, say if someone want to use go-juju on 12.04 [15:11] niemeyer, but we don't: we can treat change notifications as clear-your-settings; and if we return a txnRevno from ReadSettings, we can persist *that* [15:11] fwereade: I'd appreciate the insight there [15:11] rather than 12.10 ++ [15:11] niemeyer, although it may require unhelpful extra plumbing [15:11] niemeyer, I need to thing it through a bit more [15:12] anyway gtg, take care & have fun [15:12] fwereade: Sounds sensible I think [15:12] fwereade: Cool, ttyl [15:12] jamespage: Hm [15:12] jamespage: SO, [15:12] jamespage: So, [15:13] jamespage: There are tons of very attractive changes on 2.2.0 [15:13] jamespage: Things like improved concurrency support, an aggregation framework that enables queries that are impossible otherwise, speed ups, etc etc [15:13] jamespage: We want to benefit from those [15:14] jamespage: That said, [15:14] jamespage: We can benefit from those either way, in the server side [15:14] jamespage: and we have other reasons to be working around the stock packaging in those cases [15:15] niemeyer, sounds like you have issues with the distro packaging then? [15:15] other than the version... [15:15] jamespage: No, it's really the version [15:16] jamespage: The problem is that we have to support N releases of Ubuntu with the same code [15:16] jamespage: Back and forth [15:16] niemeyer, my suggestion is that we take the same approach as you guys did with zookeeper before [15:16] i.e. ship it in PPA alongside go-juju [15:16] jamespage: not only that, but we need to make sure that upgrades don't happen without our explicit approval [15:17] jamespage: Since an upgrade of the core MongoDB behind juju's back might kill the whole deployment [15:18] jamespage: Well, sure.. if we have to workaround it, we can do that by several means [15:18] jamespage: The point, though, is that we'd benefit from having the new version on the distro [15:18] jamespage: So that people can more easily deploy juju *locally* [15:19] jamespage: For development, testing, etc [15:19] jamespage: Without having to go around the distro packages [15:19] niemeyer, I would agree with that IF go-juju was going to land in the distro for quantal [15:19] jamespage: If it's not doable, I'll understand, and will start thinking of plan-b [15:20] niemeyer, I'm happy to push time at upgrading mongodb to 2.2.0 next cycle [15:20] jamespage: If it's not in Quantal, it'll be there a bit afterwards [15:20] yes [15:20] jamespage: It'll still come to life during Quantal's lifetime [15:20] jamespage: That said, we don't have local support yet, so perhaps you're right and it doesn't matter for this cycle [15:21] niemeyer, I just think the overall risk this late in cycle is to great for a number of reasons [15:21] 1) none of the server team have much experience with mongodb or its packaging (it being in universe) [15:21] jamespage: Understood, sounds good.. I'd probably be causing you pain for no good reason [15:22] 2) limited time with charm sprints and ODS in the next two weeks... [15:22] niemeyer, OK - I'll keep you notified of any work that happens [15:22] and will make sure its back-portable to precise/quantal [15:22] jamespage: Thanks, and sorry for the half-false alarm [15:23] niemeyer, no problemo [15:23] jamespage: Can we put this on the agenda for the next release? [15:23] niemeyer, absolutely - if mongodb is going to support juju going forward that it need to be MIR'ed and more owned in ubuntu anyway [15:24] owned/loved.... [15:24] jamespage: +1 [15:24] jamespage: Meanwhile, FYI we've built a version of 2.2.0 and are shipping alongside juju in a bare-bones fashion (bins only) [15:25] jamespage: This means we're able to deploy in all releases of Ubuntu we want to support [15:25] coolio [15:25] jamespage: Precise, Quantal, Quantal+1, etc [15:25] niemeyer, all archs as well? [15:26] jamespage: Yep, if/when needed [15:26] niemeyer, good-oh [15:26] jamespage: tools/mongo-2.2.0-precise-amd64.tgz [15:26] I know quite a bit of testing has been happening using juju/maas on arm for example [15:27] jamespage: I don't have any experience with mongo on arm, though [15:38] "juju-core has no active code reviews." [15:38] It's been a while since I've seen that message.. [15:40] Btw, we've topped almost 6k lines in during the sprint [15:40] and 11k lines *out* [15:41] (thanks to zk's departure) [15:48] Didn't take long, though: https://codereview.appspot.com/6574049 [15:53] fwereade: " [15:53] niemeyer, Aram: ooo, one important thought: to make the ids change work with RelationUnitsChange et al, we will need a map[unit-name]txn-revno" [15:53] do we need this? [15:53] Aram: I don't know.. have to talk to William later about details [15:53] Aram: Does it make a difference for you now? [15:54] I think it would be easier to adapt later if we need to. [15:54] Aram: Are you working on RelationUnitsChange? [15:55] no. [15:55] I was thinking in general. [15:55] Aram: Okay, so that's not needed. RelationUnitsChange is very special in that regard [15:55] about this. [15:55] ok. [15:55] suits me. [15:55] Aram: :-) [15:59] rogpeppe: Have you evolved/tried live tests today? [16:00] niemeyer: yeah, i've got a branch that is stubbornly refusing to work live. each time it's something a little bit different. [16:00] rogpeppe: :( [16:01] rogpeppe: We need to polish those live tests.. the organization right now is pretty confusing, there are bits in ec2 that should be in jujutest, unrelated tests are being done together, etc [16:01] niemeyer: i agree [16:01] niemeyer: which bits are you thinking can move from ec2 to jujutest, BTW? [16:05] niemeyer: i'm starting to think that my policy of always returning a non-nil Tools is not a good one [16:06] niemeyer: i think that if the tools haven't been set, AgentTools should return nil and NotFoundError [16:06] niemeyer: this is an error i get when calling juju status on the current test environment: [16:07] 2012/09/26 16:59:33 JUJU:DEBUG juju status command failed: cannot get all machines: invalid binary version "" [16:07] error: cannot get all machines: invalid binary version "" [16:07] rogpeppe: That's what I'd expect too [16:07] rogpeppe: (as a client of that API) [16:07] rogpeppe: Hmm.. are you missing an update from trunk? [16:08] rogpeppe: I'm running an env here.. status works fie [16:08] fine [16:08] niemeyer: i don't *think* so, unless it's a very recent one [16:08] rogpeppe: Last evening [16:08] rogpeppe: I've fixed the marshalling of tools [16:09] niemeyer: no, i've merged trunk [16:09] rogpeppe: Ok, so worth investigating.. [16:10] rogpeppe: Hmm.. wouldn't that happen if the machine was retrieved before it got an agent set? [16:10] niemeyer: that's what i'm thinking [16:10] rogpeppe: the machine sets its own agent on startup [16:10] niemeyer: yup [16:10] rogpeppe: So that must be it [16:10] niemeyer: that's why i thought moving to using a nil tools might help [16:11] niemeyer: but maybe it'd call SetBSON on the toolsDoc.Version anyway? [16:11] rogpeppe: What does "using a nil tools" mean? [16:13] niemeyer: hmm, now i look, i'm not sure :-) [16:13] niemeyer: it seems to me that the tools *will* be nil in a newly created machine doc. [16:13] rogpeppe: I think Tools is just missing the SetZero return if the value is bson's null [16:14] niemeyer: ah, you're probably right. [16:14] rogpeppe: I'm always probably right.. I'm often probably wrong too.. :-) [16:14] rogpeppe: I thought you were referring to this logic earlier: [16:15] if m.doc.Tools == nil { [16:15] return &Tools{}, nil [16:15] } [16:15] This is bogus [16:15] It should return nil, notFound("tools for machine %v", m.doc.Id) [16:15] niemeyer: yes, that's what i was saying above [16:15] rogpeppe: Cool, so two bugs [16:16] Btw, I found out why it's taking a bit to init mongo.. it's preallocing the journal [16:16] I'll find a way to work aroudn it [16:17] niemeyer: that would be great [16:18] niemeyer: how do you find if a bson.Raw is a null? [16:18] rogpeppe: Compare raw.Kind against a constant.. I don [16:18] rogpeppe: 't recall which, but I think we're already doing this elsewhere.. [16:18] rogpeppe: Otherwise, [16:18] rogpeppe: bsonspec.org [16:18] Lunch! [16:18] niemeyer: enjoy! [16:33] niemeyer: ah! i see why the state tests passed now... [16:33] c.Skip("Marshalling of agent tools is currently broken") [16:34] hmm, no, it passes [16:44] rogpeppe: Try to Marshal, Unmarhsal, re-Marshal, and re-Unmarshal [16:44] rogpeppe: I suspect the third will fail [16:45] rogpeppe: Sorry, the fourth oe [16:45] rogpeppe: The third will save the bogus empty tools [16:45] rogpeppe: Fourth will blow up [16:47] niemeyer: how does that sequence happen in our current code? i'm not sure we ever Marshal the whole machine doc [16:47] niemeyer: i'm trying to write a test that currently fails [16:48] rogpeppe: The suggested sequence will fail, I suspect [16:48] niemeyer: Marshal isn't available externally to state [16:48] rogpeppe: It marshals when it writes to the database [16:49] rogpeppe: Ah, it's probably GetBSON that is broken too [16:49] niemeyer: i don't *think* it marshals when it writes to the db [16:49] Hmm, or maybe not.. [16:49] rogpeppe: Huh? [16:50] rogpeppe: There's no way for mgo to send anything to the database without marshaling [16:50] niemeyer: well, it marshals the agent tools when they're set, but given that they're set, it should all work ok [16:51] rogpeppe: That's where the sequence above comes from.. due to the lack of SetZero on SetBSON, it will marshal empty tools [16:51] niemeyer: even if the tools aren't empty? [16:51] rogpeppe: It's the same thing we just talked above [16:51] rogpeppe: I think Tools is just missing the SetZero return if the value is bson's null [16:52] niemeyer: i can see how we get an invalid tools in the machineDoc, but i'm not sure how we can get that marshalled again. [16:53] niemeyer: because AFAICS we never marshal from the machine doc's Tools [16:54] niemeyer: obviously you're right in one sense, but i don't quite see yet exactly how it happens. i'm probably being stupid, sorry. [16:55] rogpeppe: You're not.. I don't know either [16:55] rogpeppe: let's see the code that failed [16:56] rogpeppe: Well, actually, can you push the WIP branch? [16:56] rogpeppe: Otherwise I have no idea about what *actually* failed [16:56] niemeyer: ok, i'll do that [16:58] niemeyer: https://codereview.appspot.com/6566058 [16:58] Cheers [16:58] s [16:59] I'm preparing some roast pork, but I'll be back after all the stuff is in the oven [17:03] niemeyer: i'm running the live tests again to check if i get the same symptom [17:04] Aram: my mouth waters [17:04] rogpeppe: If you get the error, see if you can look at the logs [17:04] niemeyer: i looked at the logs before and saw nothing unusual [17:05] rogpeppe: That test environment, are you sure you bootstrapped it with up-to-date tools? [17:05] niemeyer: i'm doing go test -amazon in environs/ec2 [17:05] niemeyer: that should be sufficient [17:06] rogpeppe: Because that's very similar to the bug I fixed yesterday [17:07] niemeyer: i'm pretty sure i'm using the up to date tools [17:07] niemeyer: well, the tools from the source in the CL above [17:07] Ok [17:13] rogpeppe: Ahh, I think the mystery is solved [17:13] * rogpeppe is all ears, having just reproduced the problem [17:13] rogpeppe: txn has to re-marshal the doc on insertion to inject the key [17:13] rogpeppe: Well, actually.. hmm [17:13] rogpeppe: Nevermind.. makes no sense [17:14] darn [17:14] rogpeppe: The remarshalling will not use the type [17:14] rogpeppe: Do you have a live env with it? [17:14] niemeyer: yes [17:14] rogpeppe: Can I get access? [17:14] niemeyer: yeah, one mo [17:15] rogpeppe: There's a handy tool to import ssh keys from lp [17:15] niemeyer: don't i need to give you an ssh key? [17:15] rogpeppe: ssh-import-id [17:15] rogpeppe: I have one :) [17:16] niemeyer: yeah, but the environment won't be authorized for your key [17:16] rogpeppe: ssh-import-id [17:17] niemeyer: or... can i add keys dynamically to a machine in the environment? [17:17] niemeyer: is that what you're suggesting? [17:17] rogpeppe: Yes, you can log into the machine, and run ssh-import-id [17:17] rogpeppe: ssh-import-id niemeyer, more precisely [17:17] niemeyer: ok, i'll try that [17:19] niemeyer: try now: ec2-23-20-216-248.compute-1.amazonaws.com [17:19] I'm in [17:20] rogpeppe: Tools versions in the env look fine [17:20] rogpeppe: Is a "juju status" against it breaking? [17:20] niemeyer: yeah [17:20] rogpeppe: Hah, nice.. that's going to be easy [17:21] niemeyer: but all juju status does is call AllMachines [17:21] rogpeppe: How do you call juju status? [17:22] niemeyer: juju status --debug -e sample-b765066ea6aab3bb [17:22] rogpeppe: COol [17:23] niemeyer: (it helps that i log the environment public attrs in the test now) [17:25] niemeyer: out of interest, what mongo command did you use to inspect the db? [17:25] rogpeppe: "mongo" [17:26] rogpeppe: "mongo localhost:37017/juju" [17:26] rogpeppe: More precisely [17:26] niemeyer: i got that far. [17:28] niemeyer: ah, got it. it's a literal "db". [17:28] rogpeppe: db.machines.find() [17:28] niemeyer: i thought i had to type juju.machines.find(0 [17:28] ) [17:29] rogpeppe: I can't reproduce the error without some further hacking [17:29] rogpeppe: Can you please enable the mgo logs, and run juju status with them on [17:29] rogpeppe: mgo.SetDebug(true); mgo.SetLogger(log.Target) [17:31] niemeyer: ok [17:33] rogpeppe: Sent a few comments on https://codereview.appspot.com/6566058/ [17:34] niemeyer: ah! you caught the bug i was actually trying to fix. thanks! [17:34] niemeyer: i'm so f*!#ing blind [17:35] niemeyer: i'm glad we found this in the process though [17:35] rogpeppe: +1, and don't blame yourself too much.. it's a single char.. very easy to miss [17:36] niemeyer: ok, so i'm an idiot as suspected [17:37] niemeyer: and you were exactly right. i was conflating two problems and assuming they were linked [17:37] niemeyer: the "juju" i was running to run status was not the current juju. doh! [17:37] niemeyer: and v sorry to waste your time [17:37] niemeyer: it all works [17:38] rogpeppe: Woohay! [17:38] pwd [17:38] /home/rog [17:38] rm -rf * [17:39] rm: command not found [17:39] give up in disgust [17:39] rogpeppe: :-) [17:42] rogpeppe: Those debug sessions are nice because we get some intimacy with the whole process [17:42] niemeyer: that's a nice slant to put on it :-) [17:42] niemeyer: right, i'm running the live tests again, and i hope and trust they will work this time [17:43] niemeyer: i've got to go and cook keema muttar curry :-) [17:43] rogpeppe: Have good fun there [17:44] niemeyer: the "does nothing" channel range loop BTW is so that in a subsequent branch i can just slot in a Unit and watch it in the same way [17:44] niemeyer: that was the reason for my "i wish machine ids were strings" remark earlier [17:44] rogpeppe: Ok, thanks [17:45] niemeyer: i'll pop back in 10 minutes or so and see if the test succeeded [17:45] rogpeppe: Super, fingers crossed [17:46] rogpeppe: Btw, I've put your review fixes here: https://codereview.appspot.com/6574049 [18:07] so, back, will join full in a few moments [18:20] niemeyer: [18:20] 08.43.171 PASS [18:20] 08.43.176 ok launchpad.net/juju-core/environs/ec2 521.154s [18:20] !!! [18:20] :-) [18:20] * niemeyer dances the funky chicken [18:21] rogpeppe: https://codereview.appspot.com/6569060 [18:21] rogpeppe: This will speed the next one up [19:15] * niemeyer loves "watcher was stopped cleanly" panics [19:15] So many background activity in tests caught up on it [19:38] ok launchpad.net/juju-core/cmd/jujud 13.860s [19:49] * fwereade cheers at niemeyer [19:49] fwereade: Hey! Good timing.. [19:49] niemeyer, heyhey [19:49] fwereade: With this change, the only tests that seem broken here are the uniter ones [19:49] fwereade: But I thought you had fixed those.. is the change pending? [19:49] niemeyer, oh? which ones? [19:50] niemeyer, would you paste them to me, it should all work... [19:50] fwereade: Cool [19:50] fwereade: http://paste.ubuntu.com/1229166/ [19:51] fwereade: I haven't investigated, so could be something trivial [19:51] niemeyer, that is odd, it looks kinda familiar, but I forget what was involved [19:51] niemeyer, I'll grab a fresh trunk and try to repro [19:52] fwereade: Not sure if it matters, but this was part of a full round run [20:04] 2012/09/26 16:56:23 JUJU Unit status is "started" [20:04] 2012/09/26 16:56:23 JUJU Built files published at http://ec2-177-71-232-16.sa-east-1.compute.amazonaws.com [20:04] 2012/09/26 16:56:23 JUJU Remember to destroy the environment when you're done... [20:12] error: Failed to update merge proposal log: EOF [20:12] Why do you hate me LP [20:13] One more change up: https://codereview.appspot.com/6565052 [20:19] Another one: https://codereview.appspot.com/6567054 [20:20] I guess I'll just bundle them up in a mail later.. the interactive reviewing isn't working today :) [20:28] niemeyer, https://codereview.appspot.com/6565052/ LGTM [20:32] niemeyer, and the other too [20:34] fwereade: Danke! [20:34] niemeyer, bitte! [20:37] niemeyer, btw, what's your current thinking on the necessity of a corrective agent? [20:38] niemeyer, I'm feeling like I want one :) [20:38] fwereade: Seems unnecessary [20:38] fwereade: What are you missing? [20:38] niemeyer, ok, the situation that makes me want it is as follows [20:38] niemeyer, the Uniter has detected a Dying unit and discharged all its responsibilities [20:38] niemeyer, it calls EnsureDead() on the unit [20:39] niemeyer, at this point, all bets are off [20:39] fwereade: In which sense? [20:39] niemeyer, in an ideal world, the uniter would smoothly delete the unit, and also the service (if it's dying and this was the last unit) [20:40] niemeyer, in the pessimistic case, it will be half way through doing that when the machine agent nukes the container and the unit agent with it [20:40] niemeyer, I think *something* needs to be capable of cleaning up after that [20:41] fwereade: The uniter should never delete a unit [20:41] fwereade: In my view so far, at least [20:41] niemeyer, ah, ok then, that's for the machiner to do once the unit is dead? [20:41] fwereade: It's work ends at EnsureDead [20:41] niemeyer, that makes sense to me [20:41] fwereade: Yes, once the machiner gets rid of the unit container, it deletes the unit [20:42] niemeyer, and once the machiner has cleaned up the container, it completely trashes the unit and possibly the srvice [20:42] niemeyer, cool [20:42] fwereade: As we've talked a few times, Dead is always handled by the "lifecycle manager" so to speak [20:42] niemeyer, what happens when the machiner is AWOL? [20:42] fwereade: The machiner<>provisioner relationship is the same [20:43] fwereade: The same as usual [20:43] niemeyer, that sounds sensible [20:43] fwereade: When the machiner handles the dead unit, it may also handle the service transparently [20:44] fwereade: We should probably put that logic within RemoveUnit itself [20:44] niemeyer, +1, but worrying about races [20:45] niemeyer, I expect we can make it work, we just need to be careful [20:45] fwereade: I think the idea we discussed at the sprint, with a refcount, works well [20:45] niemeyer, (and, to clarify: when a machiner is down it never removes the unit, and we sit and wait until someone fixes it?) [20:45] fwereade: yep [20:46] niemeyer, ok, cool [20:46] fwereade: Things should work in expected ways during shutdown.. juju will not kill resources up without user consent [20:47] niemeyer, permanently blocking a terminate-machine because of a dead unit (that's not being picked up due to a screwed machiner) seems a touch user-hostile if they don;t have the tools to resolve it [20:47] niemeyer, a Dying unit is one thing, they can resolve that [20:49] fwereade: We're not blocking because of a dead unit.. we're blocking because of a non-responding machiner [20:50] fwereade: I want to know why the machiner is not responding, rather than swiping that kind of error under the covers [20:50] niemeyer, yeah, good justification [20:51] * fwereade is happy he doesn't have to worry directly about cleaning up services too :) [20:51] not right now, anyway ;) [20:51] fwereade: yeah :) [20:53] Review queue is owned today: https://code.launchpad.net/juju-core/+activereviews [20:54] fwereade: I'll have to step out for an errand.. back later, but hopefully you'll be resting by then.. have a good time there [20:54] niemeyer, I'll almost certainly have a WIP branch I'd appreciate a look at... I'll mail you if I manage it [20:54] niemeyer, cheers [21:39] niemeyer, https://codereview.appspot.com/6575053 is (I think) reasonably complete, and does still pass all existing tests [21:41] niemeyer, still WIP, but I intend to write actual tests tomorrow morning and submit something pretty close, so preliminary comments would be much appreciated [23:01] fwereade: Awesome! [23:01] fwereade: Will have a look