/srv/irclogs.ubuntu.com/2012/05/31/#juju-dev.txt

[07:36] <rogpeppe> davecheney: hiya
[07:36] <davecheney> rogpeppe: howdy
[07:38] <davecheney> rogpeppe: what's shaking?
[07:38] <rogpeppe> davecheney: am sitting on platform waiting for train...
[07:38] <rogpeppe> davecheney: to go down to london and meet with Aram and niemeyer
[07:38] <davecheney> nice
[07:39] <davecheney> i saw niemeyer online from a french IP this morning
[07:39] <davecheney> what's up with that
[07:39] <davecheney> ?
[07:39] * rogpeppe is always a little bit amazed when gatewaying through a mobile phone actually works
[07:39] <rogpeppe> davecheney: he's in the uk, so that's a little bit odd
[07:39] <davecheney> could just be the owner of the IP space
[07:39] <rogpeppe> davecheney: probably
[07:40] <davecheney> whereabouts in london are you going?
[07:41] <rogpeppe> davecheney: arrive kings cross. then to millbank tower where canonical lives (for the next week - they're moving out, so i'm glad i'll see it before they do; it's supposed to be a spectacular location)
[07:41] <davecheney> are they moving somewhere more salubrious?
[07:42] <rogpeppe> davecheney: somewhere a little larger i think
[07:42] <fwereade> davecheney, rogpeppe: the new place is just next to the tate modern I think, which sounds pretty cool :)
[07:42] <rogpeppe> fwereade: cool. next to the river again then.
[07:43] <fwereade> rogpeppe, nearby, at least :)
[07:43] <rogpeppe> fwereade: morning BTW!
[07:43] <davecheney> very nice
[07:44] <rogpeppe> davecheney, fwereade: was wondering about how we're going to handle upgrades
[07:44] <davecheney> rogpeppe: that is a small question for a big topic
[07:45] <rogpeppe> i wondered if we have a "version" field in zk. clients can watch that and if they see it's changed, they'll look for a new version and replace themselves with it
[07:45] <rogpeppe> does it have to be any more complex than that?
[07:46] <rogpeppe> the zk tree will have to be backwards compatible anyway, i think.
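A minimal sketch of the watch scheme rogpeppe describes, assuming the gozk API of the era (Conn.GetW returning the node data plus a one-shot watch channel); the "/version" path and the function name are illustrative, not real juju code:

    package upgrader

    import "launchpad.net/gozk/zookeeper"

    // watchVersion blocks until the version node's content differs from
    // current, then returns the new value.
    func watchVersion(zk *zookeeper.Conn, current string) (string, error) {
        for {
            data, _, watch, err := zk.GetW("/version")
            if err != nil {
                return "", err
            }
            if data != current {
                return data, nil
            }
            <-watch // wait for the node to change, then re-read
        }
    }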
[07:46] <fwereade> rogpeppe, at some stage we'll want to sync up all upgrades so we can, eg, change how something's stored in ZK
[07:46] <davecheney> rogpeppe: will the agents run under some kind of process manager, like upstart?
[07:46] <rogpeppe> davecheney: yes
[07:46] <fwereade> rogpeppe, but, yes, that sounds like a good start
[07:47] <rogpeppe> fwereade: i think that would be relatively easy too
[07:47] <rogpeppe> fwereade: set the version to "pending" and wait for the various agents to acknowledge
[07:47] <davecheney> rogpeppe: fwereade so each agent is responsible for a symlink (or something) that points to the current version of their binary
[07:48] <fwereade> rogpeppe, yeah, my concern with this story is entirely in making the upgrade work even on machines that have a hyperactive toddler playing with their reset button
[07:48] <rogpeppe> davecheney: i think i'd just create a new upstart script when changing versions
[07:48] <davecheney> and if they notice the value of version in zk, they look for a binary that matches it, change the symlink, then commit seppuku and upstart restarts them?
[07:49] <rogpeppe> davecheney: i'm not sure the symlink is necessary
[07:49] <fwereade> rogpeppe, davecheney's approach sounds like it may be more reliable in an unhelpful environment
[07:49] <fwereade> rogpeppe, davecheney: but I don't think it covers the case where the args/env need to change
[07:49] <davecheney> fwereade: i'd like to see a mixed approach, ie, the agents quit if they don't match the version
[07:50] <davecheney> and dpkg handles installing the right version
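davecheney's symlink idea can be made safe against fwereade's reset-button toddler; a sketch using only the standard library, with invented paths:

    package upgrader

    import (
        "os"
        "path/filepath"
    )

    // switchVersion repoints a "current" symlink at the given version
    // directory. The final rename is atomic on POSIX filesystems, so a
    // reset mid-upgrade never leaves a dangling link.
    func switchVersion(dir, version string) error {
        tmp := filepath.Join(dir, "current.new")
        os.Remove(tmp) // ignore error: tmp may simply not exist yet
        if err := os.Symlink(filepath.Join(dir, version), tmp); err != nil {
            return err
        }
        return os.Rename(tmp, filepath.Join(dir, "current"))
    }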
[07:50] <davecheney> but I don't think I understand how binaries get onto the machines
[07:50] <rogpeppe> davecheney: the cloudinit script downloads them initially
[07:50] <davecheney> ok, so no package manager
[07:50] <rogpeppe> davecheney: yeah
[07:51] <rogpeppe> davecheney: because they can come from a private s3 bucket
[07:51] <rogpeppe> davecheney: we've already got the logic for choosing versions
[07:52] <rogpeppe> fwereade: things are a little harder for the unit agent, because there may be commands running
[07:52] <rogpeppe> fwereade: i suppose it can wait until all commands have completed before upgrading itself
[07:53] <rogpeppe> davecheney: but i think you're right, i think there should be one thing responsible for actually downloading and restarting the s/w
[07:53] <fwereade> rogpeppe, yeah, I think so; telling the jujuc server to close and waiting for it should handle that case
[07:53] <rogpeppe> davecheney: and i think it should probably be the machine agent
[07:53] <fwereade> rogpeppe, +1
[07:54] <rogpeppe> fwereade: so i think from the machine agent's point of view, it might go like this:
[07:56] <rogpeppe> fwereade: see version change; download new s/w; wait for all local agents to shut down; replace upstart scripts; restart everything (including self); exit
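The same sequence as a code shape; every helper named here (downloadTools, stopLocalAgents, writeUpstartScripts, restartAll) is hypothetical, and only the ordering comes from the discussion:

    package upgrader

    // upgradeTo runs the machine agent's side of an upgrade.
    func upgradeTo(version string) error {
        if err := downloadTools(version); err != nil { // fetch the bundled binaries
            return err
        }
        if err := stopLocalAgents(); err != nil { // wait for local agents to exit
            return err
        }
        if err := writeUpstartScripts(version); err != nil { // point jobs at new binaries
            return err
        }
        return restartAll() // restart everything, including the machine agent itself
    }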
[07:56] <rogpeppe> fwereade, davecheney: train arriving; signal might get dodgy
[07:56] <fwereade> rogpeppe, yeah, SGTM
[07:57] <davecheney> rogpeppe: sounds good, and possibly could be reused across all agents
[07:57] <fwereade> rogpeppe, davecheney: if the MA knows what agents should be running on that machine (including the PA) I'd really prefer it if it were all handled by the MA
[07:59] <davecheney> fwereade: sure, what i meant to say was, the 'watch version, download binary' might be reusable across all agents
[08:02] <rogpeppe> davecheney: i don't think anything other than the MA will need to do any downloading
[08:02] <rogpeppe> davecheney: all the binaries are bundled together
[08:03] <rogpeppe> davecheney: i *think* that all the other agents will need to do is wait for version change and exit (and possibly indicate that they're exiting)
[08:05] <rogpeppe> i'm not entirely sure of the best way to stage the shutdown though
[08:06] <rogpeppe> is anyone still seeing this?
[08:06] <davecheney> rogpeppe: that would be even better, much better separation
[08:07] <rogpeppe> i haven't looked into upstart much. is it possible to wait for something to exit without automatically restarting it?
[08:08] <rogpeppe> hmm, perhaps it could rewrite the upstart script
[08:09] <davecheney> rogpeppe: I think so, but I'm no expert
[08:09] <rogpeppe> davecheney: i'm pretty sure that upstart monitors the scripts in /etc
[08:09] <davecheney> rogpeppe: yup, and it will reload _its_ representation of them
[08:10] <fwereade> rogpeppe, you can add a job-name.override containing "manual" to /etc/init
[08:10] <fwereade> rogpeppe, I think you need to explicitly stop it though
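fwereade's override trick, sketched in Go; assumes an upstart new enough to support .override files, and error handling is minimal:

    package upgrader

    import (
        "io/ioutil"
        "os/exec"
        "path/filepath"
    )

    // holdJob writes a job-name.override containing "manual", so upstart
    // will not respawn the job, then stops it explicitly.
    func holdJob(name string) error {
        override := filepath.Join("/etc/init", name+".override")
        if err := ioutil.WriteFile(override, []byte("manual\n"), 0644); err != nil {
            return err
        }
        return exec.Command("stop", name).Run() // upstart's "stop <job>"
    }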
[08:11] * davecheney reads http://upstart.ubuntu.com/cookbook/
[08:11] * rogpeppe starts to download it
[08:11] * fwereade feels for davecheney (not that it's bad, I'm really happy it exists, but I've always found figuring upstart stuff out harder than it should be)
[08:13] <fwereade> morning niemeyer
[08:29] <niemeyer> fwereade: Heya!
[08:29] <niemeyer> Morning all!
[08:31] <rog> niemeyer: yo!
[08:31] <niemeyer> rog: Heya
[08:31] <niemeyer> rog: Where are you? :)
[08:32] <rog> niemeyer: we were just having a chat about upgrading
[08:32] <rog> niemeyer: good. on the train, so intermittent connectivity
[08:32] <niemeyer> rog: That's awesome :)
[08:33] <rog> niemeyer: i think we may have something approaching a plan for upgrades
[08:33] <niemeyer> rog: Sounds great, let's talk this afternoon
[08:34] <rog> niemeyer: yeah
[08:44] <fwereade> TheMue, I just had a thought which would have been a lot more helpful 2 weeks ago
[08:45] <davecheney> night all
[08:45] <fwereade> TheMue, niemeyer: I'm wondering whether we currently have any reason at all to explicitly allow addition of peer relations from outside state
[08:46] <fwereade> TheMue, niemeyer: because services with peer relations should *always* have their peer relations set up, and that should perhaps be rolled into AddService
[08:48] <fwereade> TheMue, niemeyer: (rather than being tacked onto the deploy command, which always felt a little off)
[08:50] <TheMue> fwereade: What would that look like?
[08:51] <fwereade> TheMue, AddRelation would always take 2 args; AddService would contain more code for setting up the relation (and I guess defer the topology change to include both the service and the relation as the last step)
[08:52] <fwereade> TheMue, AddService has access to the charm already, I think
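What rolling peer relations into AddService might look like; addServiceNode and addPeerRelation are invented helpers, and the State/Charm/Service shapes are simplified:

    package state

    // AddService creates the service and, because a service that declares
    // peer relations should always have them, creates those relations too.
    func (s *State) AddService(name string, ch *Charm) (*Service, error) {
        svc, err := s.addServiceNode(name, ch)
        if err != nil {
            return nil, err
        }
        for relName, rel := range ch.Meta().Peers {
            // a peer relation has a single endpoint: the service itself
            if _, err := s.addPeerRelation(svc, relName, rel); err != nil {
                return nil, err
            }
        }
        return svc, nil
    }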
[08:53] <TheMue> fwereade: Have to take a deeper look.
[08:53] <TheMue> fwereade: AddService() then has to check if it's a peer or not. So more complexity there.
[08:55] <fwereade> TheMue, just a thought
[08:55] <TheMue> fwereade: Maybe a good one, don't want to break it. ;)
[08:56] <TheMue> fwereade: Just have to understand it more.
[08:56] <niemeyer> fwereade: That's a fantastic question, actually, and I want to ponder it this afternoon too
[08:56] <niemeyer> fwereade: Because we're breaking an assumption that was made in the original code, that I'm not sure makes sense
[08:57] <niemeyer> fwereade: The original code did allow for multiple peer relations
[08:57] <niemeyer> fwereade: the new one does not
[08:57] <niemeyer> fwereade: I'm not sure we want to do that
[08:57] <niemeyer> fwereade: I'm a bit concerned right now, actually, because the model change in the topology will make it painful to bring that back
[08:58] <niemeyer> Which means we'll have to rethink the original thinking again
[08:58] <niemeyer> As breaking compatibility to introduce that would be bad
[08:58] <niemeyer> TheMue: ^^^
[08:58] <TheMue> niemeyer: Yep, seen.
[08:58] <niemeyer> TheMue: We may have to redo the topoRelation stuff once more
[08:58] <niemeyer> TheMue: But let's get this branch to the end of the line anyway
[08:58] <TheMue> niemeyer: No problem.
[08:58] <niemeyer> TheMue: (with the current logic)
[08:59] <niemeyer> TheMue: It'll be easier to refactor back to the original model, if we have to, than to keep that huge change flying for much longer
[08:59] <niemeyer> fwereade: Thanks for bringing that up
[08:59] <fwereade> niemeyer, a pleasure, hope it proves fruitful :)
[08:59] <niemeyer> fwereade: Already has!
[09:00] <niemeyer> fwereade: At least we'll know what we're doing, rather than blindly finding out down the road that we made a mistake in the transition
[09:00] <fwereade> niemeyer, yeah :)
[09:04] <fwereade> gents: I spent much of last night getting friendly with mosquitoes, so I'm taking a walk in the sun to remind my body it's daytime; bbs
[09:04] <fwereade> just proposed https://codereview.appspot.com/6245075 if anyone's of a mind
[09:57] <niemeyer> fwereade: Enjoy the walk
[09:57] <fwereade> niemeyer, I did :)
[09:57] <niemeyer> fwereade: Oh, hey, it's been a while :-)
[09:57] <fwereade> niemeyer, did I miss anything? looked empty...
[09:58] <niemeyer> fwereade: Hm?
[09:58] <fwereade> niemeyer, don't worry, I think I misunderstood what you said
[09:59] <niemeyer> fwereade: I was alluding to my complete lack of sensitivity about the timing of your previous comment
[09:59] <fwereade> niemeyer, yeah, I get that now :)
[10:19] <niemeyer> fwereade: Review delivered
[10:19] <fwereade> niemeyer, cheers
[10:21] <TheMue> niemeyer: Thx too
[10:23] <TheMue> niemeyer: And ok, will split. ;)
[10:23] <niemeyer> TheMue: Sorry, see last comment
[10:23] <niemeyer> TheMue: Splitting is misleading
[10:24] <niemeyer> TheMue: RemoveServiceRelation would really remove the *relation*, not the service relation
[10:24] <niemeyer> TheMue: Can we have a ServiceRelation.Relation method that returns the *Relation, which can be removed?
[10:25] <TheMue> niemeyer: Yes, that's possible.
[10:26] <TheMue> niemeyer: So one would call RemoveRelation(relation) or RemoveRelation(serviceRelation.Relation())?
[10:26] <niemeyer> TheMue: Yeah, that looks very clear
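The accessor niemeyer asks for, sketched with invented field names and assuming a Relation type in the same package:

    package state

    type ServiceRelation struct {
        relation *Relation
        // service-side details (role, relation name, ...) elided
    }

    // Relation returns the underlying *Relation, which can then be
    // passed to RemoveRelation to remove the relation itself.
    func (sr *ServiceRelation) Relation() *Relation {
        return sr.relation
    }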
[10:26] <TheMue> niemeyer: OK, H5.
[10:26] <niemeyer> TheMue: Thanks
[11:32] <Aram> niemeyer: hey
[11:32] <Aram> I'm here
[11:32] <Aram> (almost).
[11:32] <niemeyer> Aram: Heya
[11:32] <niemeyer> Aram: Where? :)
[11:32] <niemeyer> Aram: Can't see you :)
[11:32] <Aram> I'm there in half an hour or so.
[11:34] <niemeyer> Aram: Woohay
[11:48] <fwereade> hey again wrtp
[11:48] <wrtp> yo!
[12:21] <TheMue> niemeyer: The comments and the logic in my addRelation() (yes, it will get a better name) are from/inspired by relation.py line 105 ff. It looks like an explanation of why container scoped relations are handled elsewhere.
[12:22] <TheMue> niemeyer: Sadly I don't know if there's a more elegant way to handle container scoped relations in the same context.
[12:39] <fwereade> TheMue, is there any way to find out what units are assigned to what machines without topology access?
[12:43] <TheMue> fwereade: As far as I can see, no. The assignment only modifies the topology.
[12:55] <fwereade> TheMue, ok; this makes me fret slightly about the test, but I'll see how I go
[12:56] <TheMue> fwereade: In a different case (not yet in trunk) I put a helper in export_test.go
[12:57] <fwereade> TheMue, and I just found AssignedMachineId anyway, which I think gives me everything I need... not sure how I missed that
[12:57] <fwereade> TheMue, thanks :)
[12:59] <TheMue> fwereade: np
[13:58] <fwereade> niemeyer, do we want to retain the placement config setting for ec2?
[14:29] <niemeyer> fwereade: How do you mean?
[14:30] <fwereade> niemeyer, we kept it in for 12.04 somewhat reluctantly as I recall
[14:30] <fwereade> niemeyer, (allowing setting placement in ec2 config)
[14:30] <niemeyer> fwereade: Hmm.. oh, you mean in the yaml?
[14:30] <fwereade> niemeyer, yeah
[14:31] <niemeyer> fwereade: I'm happy to delay it at this point
[14:31] <niemeyer> fwereade: But we may have to add it depending on what people have been doing with it
[14:31] <fwereade> niemeyer, sure, that shouldn't be too hard
[14:32] <fwereade> niemeyer, once we have one environment setting others should be relatively simple to add
[14:33] <fwereade> niemeyer, hmm, I'm feeling a strange reluctance to test a method that just returns a constant
[14:34] <fwereade> niemeyer, in python I'd probably do it without thinking
[14:34] <fwereade> niemeyer, am I being lazy or pragmatic
[14:34] <fwereade> ?
[14:47] <niemeyer> fwereade: Hmm.. good question.. I'm tempted to suggest the test in this case
[14:47] <fwereade> niemeyer, yeah, I decided to play it safe :)
[14:47] <niemeyer> fwereade: Mainly because it avoids the silly typo scenario
[14:48] <fwereade> niemeyer, indeed
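The kind of one-line test under discussion, in the gocheck style juju used at the time; the suite, fixture helper, and region constant are all invented for the example:

    package ec2_test

    import . "launchpad.net/gocheck"

    func (s *configSuite) TestDefaultRegion(c *C) {
        env := newTestEnviron(c) // hypothetical fixture
        c.Assert(env.Region(), Equals, "us-east-1")
    }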
[15:00] <robbiew> fwereade: 1:1 time?
[15:25] <fwereade> robbiew, heyhey, sorry... cath's in bed and I had to pop out and get some stuff for her
[15:25] <fwereade> robbiew, have I missed my slot? :(
[15:26] <robbiew> fwereade: yes... but I can reslot you
[15:26] <robbiew> ;)
[15:27] <fwereade> robbiew, cool, when works for you?
[15:53] <robbiew> fwereade: does tomorrow at the same time work for you? I can also do it in a little over an hour, but realize that could be a bit late for you
[15:56] <fwereade> robbiew, tomorrow same time would probably be better if that's ok
[15:56] <robbiew> sounds good
[15:56] <fwereade> robbiew, 23h from now, right?
[15:56] <robbiew> fwereade: yep
[15:57] <fwereade> robbiew, great, thanks
[22:46] <davecheney> wrtp: bit late for you mate
[22:46] <wrtp> davecheney: yo!
[22:46] <wrtp> davecheney: am in london
[22:46] <wrtp> davecheney: i can go to bed when i want to
[22:46] <davecheney> wrtp: that sounds like me in SF
[22:46] <wrtp> davecheney: how's tricks?
[22:47] <davecheney> wrtp: good, just polishing up my branches, then I was going to go to the cafe for some breakfast
[22:47] <davecheney> wrtp: had a good night with the lads?
[22:47] <wrtp> davecheney: yeah, and had some good discussions about the unit agent, upgrading &c
[22:48] <davecheney> solid
[22:50] <robbiew> wow... wrtp burning the midnight oil
[22:51] <robbiew> assume you met with niemeyer and Aram?
[22:51] <wrtp> robbiew: it's not midnight yet
[22:51] <robbiew> almost though ;)
[22:51] <wrtp> robbiew: yeah. aram is here in the room.
[22:51] <robbiew> lol
[22:51] <robbiew> tell him I said "hi"
[22:51] <wrtp> robbiew: just did
[22:51] <robbiew> ;)
[22:51] <Aram> hi
[22:51] <wrtp> Aram: yo!
[22:51] <wrtp> lol
[22:52] <robbiew> Aram: don't take wrtp's lack of a life as the "norm"... it's okay not to work at 11pm ;)
[22:52] <Aram> heh,
[22:52] <davecheney> indeed, only I am permitted to be up this late
[22:53] <robbiew> well... and I usually lurk... and often get back on after "Dad Duties" ;)
[22:53] <robbiew> btw... Mark Ramm officially starts tomorrow ;)
[22:53] <wrtp> robbiew: cool.
[22:53] * davecheney applauds
[22:54] <wrtp> davecheney: here's a brief sketch of how i think upgrading should work: http://paste.ubuntu.com/1017166/
[22:55] <davecheney> wrtp: lgtm, extra points for not making restarting the agents' own job
[22:55] <wrtp> davecheney: oh yeah, forgot that the machine agent needs to restart itself too: http://paste.ubuntu.com/1017167/
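The pastes themselves are gone; purely as an illustration (not their content), one way an agent can restart itself on Linux is to re-exec the freshly installed binary:

    package main

    import (
        "os"
        "syscall"
    )

    // restartSelf replaces the running process with newBinary, keeping
    // the current arguments and environment.
    func restartSelf(newBinary string) error {
        return syscall.Exec(newBinary, os.Args, os.Environ())
    }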
[22:56] <davecheney> seeing as we're approaching a quorum here
[22:56] <davecheney> what does everyone think about adding a method like state.IsValid()?
[22:57] <wrtp> davecheney: current thought is that perhaps the machine agent should be responsible for starting the provisioning agent.
[22:57] <davecheney> wrtp: SGTM, it wasn't clear if machine/0 had an MA
[23:02] <wrtp> davecheney: just lost my connection. bizarrely Aram still saw everything, so i've now seen what you said...
[23:02] <wrtp> davecheney: last i saw was "davecheney applauds"
[23:02] <Aram> wrtp: http://paste.ubuntu.com/1017173/
[23:02] <wrtp> davecheney: not sure what you mean by the isValid thing.
[23:04] <wrtp> my version of that: http://paste.ubuntu.com/1017177/. irc (or is it just tcp??) is bizarre.
[23:05] <wrtp> davecheney: machine 0 does have an MA, i think, but it doesn't do much currently.
[23:06] <wrtp> davecheney: by making the MA responsible for starting the PA, we can move towards a place where the PA is eventually just a unit
[23:12] <davecheney> wrtp: i like where this discussion is going
[23:12] <davecheney> would that imply it has a UA as well?
[23:12] <wrtp> davecheney: probably
[23:12] <davecheney> mmm, deliciously self referential
[23:13] <wrtp> davecheney: mmm, i thought so too
[23:13] <wrtp> davecheney: bootstrap it up and it eats its own tail
[23:13] <davecheney> that is when you know it's working right, when there are no special cases
[23:13] <wrtp> davecheney: i like the idea of the PA as a subordinate charm to the MA
[23:14] <wrtp> davecheney: why not replicate the PA on every machine, as niemeyer suggested earlier?
[23:14] <davecheney> no reason not to
[23:14] <davecheney> hmm
[23:15] <davecheney> might need a little bit of work in the state
[23:15] <davecheney> currently the PA marks a machine as started by looking at the value of the 'provider-machine-id' key
[23:15] <davecheney> well, s/started/claimed/g
[23:16] <davecheney> but it's not unsolvable to support multiple PAs
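With several PAs racing, the claim has to be atomic, and ZooKeeper's Create gives exactly that: it fails if the node already exists. A sketch assuming gozk-era names (WorldACL, IsError, ZNODEEXISTS) and a node path mirroring the key davecheney mentions:

    package provisioner

    import (
        "fmt"

        "launchpad.net/gozk/zookeeper"
    )

    // claimMachine tries to claim a machine for this PA; with several PAs
    // racing, exactly one Create succeeds.
    func claimMachine(zk *zookeeper.Conn, machine, instanceId string) (bool, error) {
        path := fmt.Sprintf("/machines/%s/provider-machine-id", machine)
        _, err := zk.Create(path, instanceId, 0, zookeeper.WorldACL(zookeeper.PERM_ALL))
        if err == nil {
            return true, nil // we claimed it
        }
        if zookeeper.IsError(err, zookeeper.ZNODEEXISTS) {
            return false, nil // another PA got there first
        }
        return false, err
    }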
[23:16] <wrtp> davecheney: i think it's worth thinking about when implementing the PA
[23:16] <wrtp> davecheney: because we need high availability even if we don't implement it as a service
[23:17] <davecheney> making it a service would make it trivial to do: service PA add unit; add unit; add unit
[23:17] <davecheney> then you have three PAs running
[23:17] <wrtp> davecheney: exactly
[23:17] <davecheney> which is a decent number
[23:18] <davecheney> ok, i'll ponder the implications of that
[23:18] <wrtp> davecheney: if it was subordinate to the machine service, then you'd have one PA on every machine. i think that may be overkill, but maybe it's just fine.
[23:18] <davecheney> wrtp: that is one for the customers or product leads, to decide how to play that
[23:19] <bcsaller> the machine service doesn't run units and isn't a service, thus the concept of subordinate isn't needed. A normal service unit running on the machine, managed outside the ones invoked by the admin, would work
[23:20] <davecheney> the main problem with n PAs is that storing the instance id would have to become atomic
[23:20] <wrtp> bcsaller: the machine provider doesn't run units?
[23:20] <davecheney> which means the topology
[23:20] <wrtp> bcsaller: i thought that's more-or-less all it did
[23:20] <bcsaller> it runs unit agents which run units
[23:20] <bcsaller> but it has no services in the sense you're talking about
[23:21] <wrtp> bcsaller: ok, i was thinking of a unit agent as a "unit", but i see
[23:21] <bcsaller> also I think you need to scale back the number of units running zookeeper; at 1000 machines, each with a zookeeper PA, the inter-cluster traffic would be too high I suspect
[23:21] <wrtp> bcsaller: i definitely wouldn't want each PA machine to be running zk too
[23:21] <bcsaller> unit agents have a principal service, that can spawn another unit agent running subordinate to it
[23:23] <wrtp> bcsaller: the kind of thought we were kicking around today is that maybe that actually maps quite well to the relationship between the machine agent and the provider agent. probably totally crackful :-)
[23:24] <bcsaller> it's not crack, running the PA stuff as services and units makes sense and is something we've wanted to do for a long time, both for HA and scale out
[23:25] <bcsaller> but it would be a unit of zookeeper and maybe a unit of some local storage service and a unit of the admin backend and so on running on some cross section of machines
[23:25] <bcsaller> where I think there is more than one juju internal service and they might scale differently
[23:26] <bcsaller> in that world there isn't a single PA I suspect
[23:28] <wrtp> bcsaller: so for today my takeaway "good idea" was that the MA is primary and that we can allocate units to a machine and some might be containerised and some might be in the same container (e.g. subsidiary units)
[23:28] <wrtp> bcsaller: and that that categorisation actually includes more-or-less everything other than the machine agent itself.
[23:29] <wrtp> bcsaller: ... maybe. definitely a bit of late night hand waving going on. need to think more.
[23:29] <bcsaller> I think I see how you're thinking about it though
[23:31] <wrtp> bcsaller: the missing link currently is that we have no way of specifying that several units should run containerised on the same machine (of course we need to solve the network issue first)
[23:32] <bcsaller> also missing is that we need something that does what the MA does or we can't build out units and hence services, making it hard to treat the MA as a service itself with things running subordinate to it
[23:32] <bcsaller> I'd rather promote the idea of container to a 1st class object in the system
[23:33] <wrtp> "build out units and hence services"?
[23:33] <wrtp> bcsaller: ^
[23:33] <bcsaller> sounded like you want to call the MA a service
[23:33] <bcsaller> which is very cyclic as it's the thing that puts service units onto machines
[23:34] <bcsaller> so modeling it as a service is not a clean fit today
[23:34] <wrtp> bcsaller: that's true. but it doesn't put MAs onto machines
[23:34] <bcsaller> the PA?
[23:34] <wrtp> bcsaller: yeah
[23:36] <wrtp> bcsaller: (crack approaching fast!) so the PA finds a new "machine unit", spawns a new machine to run the MA for that unit, which also looks for units allocated inside that unit.
[23:36] <wrtp> bcsaller: containers now being first class, at least within the admin structure, of course. :-)
[23:37] <wrtp> bcsaller: so in that sense, the MA becomes a glorified unit agent.
[23:37] <bcsaller> so it would look for containers assigned to that machine and set those up, which in some future world could be unpacking them from frozen lxc states (how's that for crack)
[23:38] <wrtp> bcsaller: sounds like my kind of crack
[23:38] <bcsaller> heh
[23:40] <wrtp> bcsaller: so... maybe there's actually no need for a machine agent at all. we can actually write the machine agent as a regular charm...
[23:41] <wrtp> an interesting thought experiment anyway
[23:42] <bcsaller> if the PA can bring units up running enough code to deploy services... but that last part is what the MA does, and that's the loop
[23:42] <bcsaller> so possible, but it means the images coming up might not be "clean images"
[23:43] <wrtp> bcsaller: yup
[23:43] <wrtp> bcsaller: they aren't clean right now
[23:43] <wrtp> bcsaller: they've already got our shit running on 'em
[23:44] <bcsaller> we start from a clean image though, we install things as part of their cloud init, but it could be similarly done. Still, that running bit is what we call the MA.
[23:45] <bcsaller> I sound like a broken record
[23:45] <bcsaller> and I just realized how dated that expression is
[23:45] <bcsaller> and now I feel old, thanks ;)
[23:46] <wrtp> bcsaller: :-)
[23:47] <wrtp> bcsaller: "their" cloud init? isn't it *our* cloud init?
[23:47] <bcsaller> that too
[23:50] <wrtp> bcsaller: one interesting thought is the idea that we could have PAs that run on different providers. so we could have a genuinely cross-provider juju worm...
[23:53] <bcsaller> wrtp: you'd have to select container handling code specific to the arch as well, if for example some types of virtualization or isolation were not available. It might be that we allow additional charm metadata (similar to constraints) that says what features from a provider we depend on
[23:54] <bcsaller> that could apply to juju internal service charms initially, but things like EC2 services in the charm metadata would be useful to users as well
[23:54] <wrtp> bcsaller: makes sense. in fact if we have containers as first class, then what we can embed in a container depends on the kind of that container (you can't put LXC inside LXC for example)
[23:55] <bcsaller> they've worked hard to make that specific case mostly work, but yeah, I hear ya
[23:55] <wrtp> bcsaller: oh, really, cool. well anyway, it's probably not so useful to allow it
[23:56] <bcsaller> but things like network isolation would apply at the container level as well (but possibly requires cross container machine modifications as well)
[23:56] <wrtp> bcsaller: but i'm thinking we can model container placement as part of constraints perhaps
[23:57] <wrtp> bcsaller: network connectivity is an interesting issue altogether
[23:59] <wrtp> bcsaller, davecheney: i should probably stop now. i hear snores coming from the other side of the room... :-)
[23:59] <bcsaller> ha, ok, nite

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!