[07:36] davecheney: hiya
[07:36] rogpeppe: howdy
[07:38] rogpeppe: what's shaking ?
[07:38] davecheney: am sitting on platform waiting for train...
[07:38] davecheney: to go down to london and meet with Aram and niemeyer
[07:38] nice
[07:39] i saw niemeyer online from a french IP this morning
[07:39] what's up with that
[07:39] ?
[07:39] * rogpeppe is always a little bit amazed when gatewaying through a mobile phone actually works
[07:39] davecheney: he's in the uk, so that's a little bit odd
[07:39] could just be the owner of the IP space
[07:39] davecheney: probably
[07:40] whereabouts in london are you going ?
[07:41] davecheney: arrive kings cross. then to millbank tower where canonical lives (for the next week - they're moving out, so i'm glad i'll see it before they do; it's supposed to be a spectacular location)
[07:41] are they moving somewhere more salubrious?
[07:42] davecheney: somewhere a little larger i think
[07:42] davecheney, rogpeppe: the new place is just next to the tate modern I think, which sounds pretty cool :)
[07:42] fwereade: cool. next to the river again then.
[07:43] rogpeppe, nearby, at least :)
[07:43] fwereade: morning BTW!
[07:43] very nice
[07:44] davecheney, fwereade: was wondering about how we're going to handle upgrades
[07:44] rogpeppe: that is a small question for a big topic
[07:45] i wondered if we have a "version" field in zk. clients can watch that and if they see it's changed, they'll look for a new version and replace themselves with it
[07:45] does it have to be any more complex than that?
[07:46] the zk tree will have to be backwards compatible anyway, i think.
[07:46] rogpeppe, at some stage we'll want to sync up all upgrades so we can, eg, change how something's stored in ZK
[07:46] rogpeppe: will the agents run under some kind of process manager, like upstart ?
[07:46] davecheney: yes
[07:46] rogpeppe, but, yes, that sounds like a good start
[07:47] fwereade: i think that would be relatively easy too
[07:47] fwereade: set the version to "pending" and wait for the various agents to acknowledge
[07:47] rogpeppe: fwereade so each agent is responsible for a symlink (or something) that points to the current version of their binary
[07:48] rogpeppe, yeah, my concern with this story is entirely in making the upgrade work even on machines that have a hyperactive toddler playing with their reset button
[07:48] davecheney: i think i'd just create a new upstart script when changing versions
[07:48] and if they notice the value of version in zk, they look for a binary that matches it, change the symlink, then commit seppuku and upstart restarts them ?
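A minimal sketch of the agent-side half of the "watch the version and commit seppuku" idea discussed above. The names here (watchVersion, currentVersion) and the exit convention are invented for illustration, not juju's actual API: the agent simply waits until the published version differs from the one it was built with, then exits so upstart can restart it from the newly installed binary.

```go
package main

import (
	"log"
	"os"
)

// currentVersion is the version this binary was built as; a real agent would
// get this from a version constant set at build time.
const currentVersion = "1.2.3"

// watchVersion stands in for a ZooKeeper watch on a /version node: it would
// send the published version string each time the node changes.
func watchVersion() <-chan string {
	ch := make(chan string)
	// ... establish the ZK watch and feed ch from it ...
	return ch
}

func main() {
	for v := range watchVersion() {
		if v == currentVersion {
			continue
		}
		// A different version has been published: exit and let upstart
		// respawn us once the new binary is in place.
		log.Printf("version changed to %q; exiting for restart", v)
		os.Exit(0)
	}
}
```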
[07:49] davecheney: i'm not sure the symlink is necessary
[07:49] rogpeppe, davecheney's approach sounds like it may be more reliable in an unhelpful environment
[07:49] rogpeppe, davecheney: but I don't think it covers the case where the args/env need to change
[07:49] fwereade: i'd like to see a mixed approach, ie, the agents quit if they don't match the version
[07:50] and dpkg handles installing the right version
[07:50] but I don't think I understand how binaries get onto the machines
[07:50] davecheney: the cloudinit script downloads them initially
[07:50] ok, so no package manager
[07:50] davecheney: yeah
[07:51] davecheney: because they can come from a private s3 bucket
[07:51] davecheney: we've already got the logic for choosing versions
[07:52] fwereade: things are a little harder for the unit agent, because there may be commands running
[07:52] fwereade: i suppose it can wait until all commands have completed before upgrading itself
[07:53] davecheney: but i think you're right, i think there should be one thing responsible for actually downloading and restarting the s/w
[07:53] rogpeppe, yeah, I think so; telling the jujuc server to close and waiting for it should handle that case
[07:53] davecheney: and i think it should probably be the machine agent
[07:53] rogpeppe, +1
[07:54] fwereade: so i think from the machine agent's point of view, it might go like this:
[07:56] fwereade: see version change; download new s/w; wait for all local agents to shut down; replace upstart scripts; restart everything (including self); exit
[07:56] fwereade, davecheney: train arriving; signal might get dodgy
[07:56] rogpeppe, yeah, SGTM
[07:57] rogpeppe: sounds good, and possibly could be reused across all agents
[07:57] rogpeppe, davecheney: if the MA knows what agents should be running on that machine (including the PA) I'd really prefer it if it were all handled by the MA
[07:59] fwereade: sure, what i meant to say was, the 'watch version, download binary' might be reusable across all agents
[08:02] davecheney: i don't think anything other than the MA will need to do any downloading
[08:02] davecheney: all the binaries are bundled together
[08:03] davecheney: i *think* that all the other agents will need to do is wait for version change and exit (and possibly indicate that they're exiting)
[08:05] i'm not entirely sure of the best way to stage the shutdown though
[08:06] is anyone still seeing this?
[08:06] rogpeppe: that would be even better, much better separation
[08:07] i haven't looked into upstart much. is it possible to wait for something to exit without automatically restarting it?
[08:08] hmm, perhaps it could rewrite the upstart script
[08:09] rogpeppe: I think, but I'm no expert
[08:09] davecheney: i'm pretty sure that upstart monitors the scripts in /etc
[08:09] rogpeppe: yup, and it will reload _its_ representation of them
[08:10] rogpeppe, you can add a job-name.override containing "manual" to /etc/init
[08:10] rogpeppe, I think you need to explicitly stop it though
[08:11] * davecheney reads http://upstart.ubuntu.com/cookbook/
[08:11] * rogpeppe starts to download it
[08:11] * fwereade feels for davecheney (not that it's bad, I'm really happy it exists, but I've always found figuring upstart stuff out harder than it should be)
[08:13] morning niemeyer
[08:29] fwereade: Heya!
[08:29] Morning all!
[08:31] niemeyer: yo!
[08:31] rog: Heya
[08:31] rog: Where are you? :)
[08:32] niemeyer: we were just having a chat about upgrading
[08:32] niemeyer: good. on the train, so intermittent connectivity
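A rough sketch of the machine-agent upgrade sequence described above ("see version change; download new s/w; wait for all local agents to shut down; replace upstart scripts; restart everything; exit"), including the /etc/init/<job>.override "manual" trick mentioned at 08:10. Every helper here (downloadTools, localAgents, writeUpstartJob) is invented for the example; juju's real code is organised differently.

```go
// Package agent: an illustrative sketch only.
package agent

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// Stubs standing in for the real work.
func downloadTools(version string) error        { return nil }
func localAgents() []string                     { return []string{"juju-unit-agent-0"} }
func writeUpstartJob(job, version string) error { return nil }

// upgradeTo walks the sequence sketched above: fetch the new tools, stop the
// local agents without letting upstart respawn them, rewrite their jobs for
// the new version, and start everything again. The caller (the machine
// agent) then exits so its own upstart job restarts it on the new version.
func upgradeTo(version string) error {
	if err := downloadTools(version); err != nil {
		return err
	}
	for _, job := range localAgents() {
		// An /etc/init/<job>.override containing "manual" stops upstart from
		// restarting the job automatically when it exits.
		override := filepath.Join("/etc/init", job+".override")
		if err := os.WriteFile(override, []byte("manual\n"), 0644); err != nil {
			return err
		}
		if err := exec.Command("stop", job).Run(); err != nil {
			return fmt.Errorf("stopping %s: %v", job, err)
		}
	}
	for _, job := range localAgents() {
		if err := writeUpstartJob(job, version); err != nil {
			return err
		}
		os.Remove(filepath.Join("/etc/init", job+".override"))
		if err := exec.Command("start", job).Run(); err != nil {
			return err
		}
	}
	return nil
}
```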
[08:32] rog: That's awesome :)
[08:33] niemeyer: i think we may have something approaching a plan for upgrades
[08:33] rog: Sounds great, let's talk this afternoon
[08:34] niemeyer: yeah
[08:44] TheMue, I just had a thought which would have been a lot more helpful 2 weeks ago
[08:45] night all
[08:45] TheMue, niemeyer: I'm wondering whether we currently have any reason at all to explicitly allow addition of peer relations from outside state
[08:46] TheMue, niemeyer: because services with peer relations should *always* have their peer relations set up, and that should perhaps be rolled into AddService
[08:48] TheMue, niemeyer: (rather than being tacked onto the deploy command, which always felt a little off)
[08:50] fwereade: What would that look like?
[08:51] TheMue, AddRelation would always take 2 args; AddService would contain more code for setting up the relation (and I guess defer the topology change to include both the service and the relation as the last step)
[08:52] TheMue, AddService has access to the charm already, I think
[08:53] fwereade: Have to take a deeper look.
[08:53] fwereade: AddService() then has to check if it's a peer or not. So more complexity there.
[08:55] TheMue, just a thought
[08:55] fwereade: Maybe a good one, don't want to break it. ;)
[08:56] fwereade: Just have to understand it more.
[08:56] fwereade: That's a fantastic question, actually, and I want to ponder about it this afternoon too
[08:56] fwereade: Because we're breaking an assumption that was made in the original code, that I'm not sure makes sense
[08:57] fwereade: The original code did allow for multiple peer relations
[08:57] fwereade: the new one does not
[08:57] fwereade: I'm not sure we want to do that
[08:57] fwereade: I'm a bit concerned right now, actually, because the model change in the topology will make it painful to bring that back
[08:58] Which means we'll have to rethink the original thinking again
[08:58] As breaking compatibility to introduce that would be bad
[08:58] TheMue: ^^^
[08:58] niemeyer: Yep, seen.
[08:58] TheMue: We may have to redo the topoRelation stuff once more
[08:58] TheMue: But let's get this branch to the end of the line anyway
[08:58] niemeyer: No problem.
[08:58] TheMue: (with the current logic)
[08:59] TheMue: It'll be easier to refactor back to the original model, if we have to, than to keep that huge change flying for much longer
[08:59] fwereade: Thanks for bringing that up
[08:59] niemeyer, a pleasure, hope it proves fruitful :)
[08:59] fwereade: Already has!
[09:00] fwereade: At least we'll know what we're doing, rather than blindly finding out down the road that we made a mistake on the transition
[09:00] niemeyer, yeah :)
[09:04] gents: I spent much of last night getting friendly with mosquitoes, so I'm taking a walk in the sun to remind my body it's daytime; bbs
[09:04] just proposed https://codereview.appspot.com/6245075 if anyone's of a mind
[09:57] fwereade: Enjoy the walk
[09:57] niemeyer, I did :)
[09:57] fwereade: Oh, hey, it's been a while :-)
[09:57] niemeyer, did I miss anything? looked empty...
[09:58] fwereade: Hm?
[09:58] niemeyer, don't worry, I think I misunderstood what you said
[09:59] fwereade: I was alluding to my complete lack of sensibility related to the timing of your previous comment
[09:59] niemeyer, yeah, I get that now :)
[10:19] fwereade: Review delivered
[10:19] niemeyer, cheers
[10:21] niemeyer: Thx too
[10:23] niemeyer: And ok, will split. ;)
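A sketch of the idea floated at 08:46 above, rolling peer-relation creation into AddService and committing the topology change last. All of the types and helpers below are simplified stand-ins invented for the example; the real state package differs in detail.

```go
package state

// Relation describes one relation endpoint declared by a charm.
type Relation struct{ Interface string }

// charmMeta mirrors the relevant slice of charm metadata.
type charmMeta struct {
	Peers map[string]Relation // relation name -> details
}

type Charm struct{ meta charmMeta }

func (c *Charm) Meta() charmMeta { return c.meta }

type Service struct{ Name string }

type State struct{}

// Stubs standing in for the real node creation and topology handling.
func (s *State) addServiceNode(name string, ch *Charm) (*Service, error)        { return &Service{name}, nil }
func (s *State) addPeerRelation(svc *Service, name string, rel Relation) error { return nil }
func (s *State) commitTopology() error                                         { return nil }

// AddService creates the service and, because a service's peer relations
// should always exist, creates those as well, committing the topology change
// last so the service and its relations become visible together.
func (s *State) AddService(name string, ch *Charm) (*Service, error) {
	svc, err := s.addServiceNode(name, ch)
	if err != nil {
		return nil, err
	}
	for relName, rel := range ch.Meta().Peers {
		if err := s.addPeerRelation(svc, relName, rel); err != nil {
			return nil, err
		}
	}
	if err := s.commitTopology(); err != nil {
		return nil, err
	}
	return svc, nil
}
```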
[10:23] TheMue: Sorry, see last comment
[10:23] TheMue: Splitting is misleading
[10:24] TheMue: RemoveServiceRelation would really remove the *relation*, not the service relation
[10:24] TheMue: Can we have a ServiceRelation.Relation method that returns the *Relation, which can be removed?
[10:25] niemeyer: Yes, is possible.
[10:26] niemeyer: So one would RemoveRelation(relation) or RemoveRelation(serviceRelation.Relation())?
[10:26] TheMue: Yeah, that looks very clear
[10:26] niemeyer: OK, H5.
[10:26] TheMue: Thanks
[11:32] niemeyer: hey
[11:32] I'm here
[11:32] (almost).
[11:32] Aram: Heya
[11:32] Aram: Where? :)
[11:32] Aram: Can't see you
[11:32] :)
[11:32] I'm there in half an hour or so.
[11:34] Aram: Woohay
[11:48] hey again wrtp
[11:48] yo!
[12:21] niemeyer: The comments and the logic in my addRelation() (yes, will get a better name) are from/inspired by relation.py line 105 ff. It looks like an explanation why container scoped relations are handled elsewhere.
[12:22] niemeyer: Sadly I don't know if there's a more elegant way to handle container scoped relations in the same context.
[12:39] TheMue, is there any way to find out what units are assigned to what machines without topology access?
[12:43] fwereade: As far as I see not. The assignment only modifies the topology.
[12:55] TheMue, ok; this makes me fret slightly about the test, but I'll see how I go
[12:56] fwereade: In a different case (not yet in trunk) I put a helper in export_test.go
[12:57] TheMue, and I just found AssignedMachineId anyway, which I think gives me everything I need.. not sure how I missed that
[12:57] TheMue, thanks :)
[12:59] fwereade: np
[13:58] niemeyer, do we want to retain the placement config setting for ec2?
[14:29] fwereade: How do you mean?
[14:30] niemeyer, we kept it in for 12.04 somewhat reluctantly as I recall
[14:30] niemeyer, (allowing setting placement in ec2 config)
[14:30] fwereade: Hmm.. oh, you mean in the yaml?
[14:30] niemeyer, yeah
[14:31] fwereade: I'm happy to delay it at this point
[14:31] fwereade: But we may have to add it depending what people have been doing with it
[14:31] niemeyer, sure, that shouldn't be too hard
[14:32] niemeyer, once we have one environment setting others should be relatively simple to add
[14:33] niemeyer, hmm, I'm feeling a strange reluctance to test a method that just returns a constant
[14:34] niemeyer, in python I'd probably do it without thinking
[14:34] niemeyer, am I being lazy or pragmatic
[14:34] ?
[14:47] fwereade: Hmm.. good question.. I'm tempted to suggest the test in this case
[14:47] niemeyer, yeah, I decided to play it safe :)
[14:47] fwereade: Mainly because it avoids the silly typo scenario
[14:48] niemeyer, indeed
[15:00] fwereade: 1:1 time?
[15:25] robbiew, heyhey, sorry... cath's in bed and I had to pop out and get some stuff for her
[15:25] robbiew, have I missed my slot? :(
[15:26] fwereade: yes...but I can reslot you
[15:26] ;)
[15:27] robbiew, cool, when works for you?
[15:53] fwereade: does tomorrow at the same time work for you? I can also do in a little over an hour, but realize that could be a bit late for you
[15:56] robbiew, tomorrow same time would probably be better if that's ok
[15:56] sounds good
[15:56] robbiew, 23h from now, right?
[15:56] fwereade: yep
[15:57] robbiew, great, thanks
[22:46] wrtp: bit late for you mate
[22:46] davecheney: yo!
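A tiny sketch of the accessor agreed at 10:24-10:26 above: a ServiceRelation exposes the *Relation it belongs to, so callers remove the relation itself with RemoveRelation(serviceRelation.Relation()). The types here are stand-ins for the discussion, not juju's actual definitions.

```go
package state

type Relation struct{ key string }

type ServiceRelation struct {
	relation *Relation
	// plus the service-specific side: role, relation name, and so on.
}

// Relation returns the relation this service endpoint belongs to.
func (sr *ServiceRelation) Relation() *Relation { return sr.relation }

type State struct{}

// RemoveRelation removes the whole relation, not just one service's side of it.
func (s *State) RemoveRelation(rel *Relation) error {
	// ... remove the relation node and update the topology ...
	return nil
}
```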
[22:46] davecheney: am in london
[22:46] davecheney: i can go to bed when i want to
[22:46] wrtp: that sounds like me in SF
[22:46] davecheney: how's tricks?
[22:47] wrtp: good, just polishing up my branches then I was going to go to the cafe for some breakfast
[22:47] wrtp: had a good night with the lads ?
[22:47] davecheney: yeah, and had some good discussions about the unit agent, upgrading &c
[22:48] solid
[22:50] wow...wrtp burning the midnight oil
[22:51] assume you met with niemeyer and Aram?
[22:51] robbiew: it's not midnight yet
[22:51] almost though ;)
[22:51] robbiew: yeah. aram is here in the room.
[22:51] lol
[22:51] tell him I said "hi"
[22:51] robbiew: just did
[22:51] ;)
[22:51] hi
[22:51] Aram: yo!
[22:51] lol
[22:52] Aram: don't take wrtp's lack of a life as the "norm"....it's okay not to work at 11pm ;)
[22:52] heh,
[22:52] indeed, only I am permitted to be up this late
[22:53] well...and I usually lurk...and often get back on after "Dad Duties" ;)
[22:53] btw...Mark Ramm officially starts tomorrow ;)
[22:53] robbiew: cool.
[22:53] * davecheney applauds
[22:54] davecheney: here's a brief sketch of how i think upgrading should work: http://paste.ubuntu.com/1017166/
[22:55] wrtp: lgtm, extra points for not making the restarting the agents job
[22:55] davecheney: oh yeah, forgot that the machine agent needs to restart itself too: http://paste.ubuntu.com/1017167/
[22:56] seeing as we're approaching a quorum here
[22:56] what does everyone think about adding a method like state.IsValid() ?
[22:57] davecheney: current thought is that perhaps the machine agent should be responsible for starting the provisioning agent.
[22:57] wrtp: SGTM, it wasn't clear if machine/0 had a MA
[23:02] davecheney: just lost my connection. bizarrely Aram still saw everything, so i've now seen what you said...
[23:02] davecheney: last i saw was "davecheney applauds"
[23:02] wrtp: http://paste.ubuntu.com/1017173/
[23:02] davecheney: not sure what you mean by the isValid thing.
[23:04] my version of that: http://paste.ubuntu.com/1017177/. irc (or is it just tcp??) is bizarre.
[23:05] davecheney: machine 0 does have an MA, i think, but it doesn't do much currently.
[23:06] davecheney: by making the MA responsible for starting the PA, we can move towards a place where the PA is eventually just a unit
[23:12] wrtp: i like where this discussion is going
[23:12] would that imply it has a UA as well ?
[23:12] davecheney: probably
[23:12] mmm, deliciously self referential
[23:13] davecheney: mmm, i thought so too
[23:13] davecheney: bootstrap it up and it eats its own tail
[23:13] that is when you know it's working right, when there are no special cases
[23:13] davecheney: i like the idea of the PA as a subordinate charm to the MA
[23:14] davecheney: why not replicate the PA on every machine, as niemeyer suggested earlier?
[23:14] no reason not to
[23:14] hmm
[23:15] might need a little bit of work in the state
[23:15] currently the PA marks a machine as claimed by looking at the value of the 'provider-machine-id' key
[23:16] but it's not unsolvable to support multiple PAs
[23:16] davecheney: i think it's worth thinking about when implementing the PA
[23:16] davecheney: because we need high availability even if we don't implement it as a service
[23:17] making it a service would make it trivial to do, service PA add unit ; add unit ; add unit
[23:17] then you have three PA's running
[23:17] davecheney: exactly
[23:17] which is a decent number
[23:18] ok, i'll ponder the implications of that
[23:18] davecheney: if it was subordinate to the machine service, then you'd have one PA on every machine. i think that may be overkill, but maybe it's just fine.
[23:18] wrtp: that is one for the customers or product leads, to decide how to play that
[23:19] the machine service doesn't run units and isn't a service, thus the concept of subordinate isn't needed. A normal service unit running in the machine managed outside the ones invoked by the admin would work
[23:20] the main problem with n PA's is storing the instance Id would have to become atomic
[23:20] bcsaller: the machine provider doesn't run units?
[23:20] which means the topology
[23:20] bcsaller: i thought that's more-or-less all it did
[23:20] it runs unit agents which run units
[23:20] but it has no services in the sense you're talking about
[23:21] bcsaller: ok, i was thinking of a unit agent as a "unit", but i see
[23:21] also I think you need to scale back the number of units running zookeeper, at 1000 machines, each with a zookeeper PA the inter-cluster traffic would be too high I suspect
[23:21] bcsaller: i definitely wouldn't want each PA machine to be running zk too
[23:21] unit agents have a principal service, that can spawn another unit agent running subordinate to it
[23:23] bcsaller: the kind of thought we were kicking around today is that maybe that actually maps quite well to the relationship between the machine agent and the provider agent. probably totally crackful :-)
[23:24] it's not crack, running the PA stuff as services and units makes sense and is something we've wanted to do for a long time, both for HA and scale out
[23:25] but it would be a unit of zookeeper and maybe a unit of some local storage service and a unit of the admin backend and so on running on some cross section of machines
[23:25] where I think there is more than one juju internal service and they might scale differently
[23:26] in that world there isn't a single PA I suspect
[23:28] bcsaller: so for today my takeaway "good idea" was that the MA is primary and that we can allocate units to a machine and some might be containerised and some might be in the same container (e.g. subsidiary units)
[23:28] bcsaller: and that that categorisation actually includes more-or-less everything other than the machine agent itself.
[23:29] bcsaller: ... maybe. definitely a bit of late night hand waving going on. need to think more.
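A sketch of making the machine "claim" atomic so several PAs could coexist, per the concern raised at 23:20. The zkConn interface, the error value and the path layout are all stand-ins invented here for whatever ZooKeeper client and tree layout are actually in use: each PA tries to create the machine's provider-machine-id node, and only the one whose create succeeds goes on to start an instance.

```go
package provisioner

import (
	"errors"
	"fmt"
)

// ErrNodeExists stands in for the client library's "node already exists" error.
var ErrNodeExists = errors.New("node already exists")

type zkConn interface {
	// Create makes a node, failing with ErrNodeExists if it is already there.
	Create(path, value string) error
}

// claimMachine reports whether this PA won the race to provision the machine:
// the winner creates the provider-machine-id node (initially empty), launches
// the instance, and only then fills the node in with the real instance id.
func claimMachine(zk zkConn, machineKey string) (bool, error) {
	path := fmt.Sprintf("/machines/%s/provider-machine-id", machineKey)
	switch err := zk.Create(path, ""); {
	case errors.Is(err, ErrNodeExists):
		return false, nil // another PA got there first
	case err != nil:
		return false, err
	}
	return true, nil
}
```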
[23:29] I think I see how you're thinking about it though
[23:31] bcsaller: the missing link currently is that we have no way of specifying that several units should run containerised on the same machine (of course we need to solve the network issue first)
[23:32] also missing is that we need something that does what the MA does or we can't build out units and hence services, making it hard to treat the MA as a service itself with things running subordinate to it
[23:32] I'd rather promote the idea of container to a 1st class object in the system
[23:33] "build out units and hence services" ?
[23:33] bcsaller: ^
[23:33] sounded like you want to call the MA a service
[23:33] which is very cyclic as it's the thing that puts service units onto machines
[23:34] so modeling it as a service is not a clean fit today
[23:34] bcsaller: that's true. but it doesn't put MAs onto machines
[23:34] the PA?
[23:34] bcsaller: yeah
[23:36] bcsaller: (crack approaching fast!) so PA finds a new "machine unit", spawns a new machine to run the MA for that unit, which also looks for units allocated inside that unit.
[23:36] bcsaller: containers now being first class, at least within the admin structure, of course. :-)
[23:37] bcsaller: so in that sense, the MA becomes a glorified unit agent.
[23:37] so it would look for containers assigned to that machine and set those up, which in some future world could be unpacking them from frozen lxc states (how's that for crack)
[23:38] bcsaller: sounds like my kind of crack
[23:38] heh
[23:40] bcsaller: so... maybe there's actually no need for a machine agent at all. we can actually write the machine agent as a regular charm...
[23:41] an interesting thought experiment anyway
[23:42] if the PA can bring units up running enough code to deploy services, but that last part is what the MA does and that's the loop
[23:42] so possible, but it means the images coming up might not be "clean images"
[23:43] bcsaller: yup
[23:43] bcsaller: they aren't clean right now
[23:43] bcsaller: they've already got our shit running on 'em
[23:44] we start from a clean image though, we install things as part of their cloud init, but it could be similarly done. Still, that running bit is what we call the MA.
[23:45] I sound like a broken record
[23:45] and I just realized how dated that expression is
[23:45] and now I feel old, thanks ;)
[23:46] bcsaller: :-)
[23:47] bcsaller: "their" cloud init? isn't it *our* cloud init?
[23:47] that too
[23:50] bcsaller: one interesting thought is the idea that we could have PAs that run on different providers. so we could have a genuinely cross-provider juju worm...
[23:53] wrtp: you'd have to select container handling code specific to the arch as well, if for example some types of virtualization or isolation were not available. It might be that we allow additional charm metadata (similar to constraints) that say what features from a provider we depend on
[23:54] that could apply to juju internal service charms initially but things like EC2 services in the charm metadata would be useful to users as well
[23:54] bcsaller: makes sense. in fact if we have containers as first class, then what we can embed in a container comes from the kind of that container (you can't put LXC inside LXC for example)
[23:55] they've worked hard to make that specific case mostly work, but yeah, I hear ya
[23:55] bcsaller: oh, really, cool. well anyway, it's probably not so useful to allow it
[23:56] but things like network isolation would apply at the container level as well (but possibly requires cross container machine modifications as well)
[23:56] bcsaller: but i'm thinking we can model container placement as part of constraints perhaps
[23:57] bcsaller: network connectivity is an interesting issue altogether
[23:59] bcsaller, davecheney: i should probably stop now. i hear snores coming from the other side of the room... :-)
[23:59] ha, ok, nite