/srv/irclogs.ubuntu.com/2012/03/19/#juju-dev.txt

bigjoolsfwereade__: around?06:19
fwereade__bigjools, heyhey07:25
bigjoolsfwereade__: hello!07:25
fwereade__bigjools, how goes it?07:25
bigjoolsfwereade__: desperately need your help, I am trying to fix that system_id thing07:26
bigjoolstotally failing so far07:26
bigjoolsdo you have some time?07:26
fwereade__bigjools, yeah, I'm not sure I explained myself very clearly07:26
fwereade__bigjools, ofc07:26
bigjoolsthank you, ok let me explain where I got to07:26
fwereade__bigjools, cool07:26
bigjoolsfirstly - I am unclear on how to test this in a QA environment since juju checks code out of Launchpad (!)07:27
bigjoolsthat's utterly bizarre07:27
fwereade__bigjools, yeah, that underlying bug has lead be to break every-juju-in-the-world twice now :/07:28
bigjoolswhen LP is down, by any chance?07:28
bigjoolsor broken trunk07:28
fwereade__bigjools, nah, not even *broken*... just a significant-enough change in trunk can be deadly07:29
bigjoolsanyway, I can't work out how to get my branch's code on there to test07:29
fwereade__bigjools, you should be able to use the origin field in environments.yaml07:29
fwereade__    juju-origin: lp:~fwereade/juju/set-service-constraints07:30
fwereade__bigjools, which it, well, god enough for testing07:30
bigjoolsoh boy, ok :)07:30
bigjoolsnext question, what do you know about cloud-init? :)07:30
fwereade__bigjools, um, embarrassingly little :(07:31
bigjoolsfair enough, I am trying to work out why it's crashing when I boot my node :(07:31
fwereade__bigjools, my go-to technique is "vdiff with a known-good one and see if anything jumps out at me"07:31
bigjoolsoh at which level does that config go BTW?07:31
fwereade__bigjools, that's inside a given environment07:32
bigjoolsok07:32
rogpeppemornin' campers07:32
fwereade__heya rogpeppe07:32
bigjoolsso at the same level as admin-secret et al?07:32
fwereade__bigjools, do you know exactly what is crashing on boot?07:32
bigjoolsI don't, the logs are useless unfortunately07:33
fwereade__bigjools, is that the traceback you sent in the mail or something else?07:33
bigjoolsit just says it exited with status 107:33
bigjoolsno traceback07:33
fwereade__bigjools, sorry... what exited with status 1?07:33
bigjoolsno this is cloud-init crashing now, not juju07:33
fwereade__bigjools, ah-ha07:33
bigjoolswhich is a pre-requisite to getting as far as juju :)_07:33
fwereade__bigjools, quite so07:33
bigjoolsI suspect I need Daviey07:34
fwereade__bigjools, would you pastebin me the cloud-init file, just in case?07:34
bigjoolsfwereade__: you want the user-data it's using?07:35
fwereade__bigjools, just in case anything leaps out at me07:35
fwereade__bigjools, btw, are you using system_id as instance id throughout now?07:35
bigjoolswell I changed it in launch.py but as I said, not even getting close to testing that ATM07:36
bigjoolstoo many other cloud-init changes have broken things I think07:36
fwereade__bigjools, if that's all you changed and it's now killing cloud-init, it sounds interesting07:36
fwereade__bigjools, ah, sorry, what else has changed?07:37
bigjoolsnot entirely sure tbh, the server guys have been busy!07:37
fwereade__bigjools, ha, ok:)07:37
bigjoolsbtw why on earth is it branching code on the master node anyway? can a tarball not be pushed through?  even bzr serve on the client end would be better!07:38
fwereade__bigjools, we absolutely need a sane use-the-same-code-everywhere-in-an-env story07:39
bigjoolsno kidding :)07:39
fwereade__bigjools, thinking out loud, I presume you don't know where cloud-init crashes?07:43
bigjoolsfwereade__: I don't07:44
bigjoolsstuff flashes up on the guest's console ... AHA07:44
fwereade__bigjools, can you look on the instance and make inferences based on what's installed so far though?07:44
rogpeppefwereade__: hiya07:44
bigjoolsvt7 has a traceback07:44
fwereade__bigjools, cool07:45
bigjoolsImportError: No module named DataSourceMAAS.  It's the freaking s/MaaS/MAAS/ that happened recently.07:45
* bigjools takes it to the right channel ... :)07:45
rogpeppefwereade__: i just got that transient testing error too. i think i'll just choose a port at random.07:45
fwereade__rogpeppe, cool, sounds good07:46
fwereade__bigjools, ouch :(07:46
fwereade__bigjools, still, progress :)07:46
bigjoolsfwereade__: yeah, I can attack this now, I'll be back with you later!  Although having said that when I did a check-seed on the node, the user-data still had env JUJU_MACHINE_ID="0"07:47
fwereade__bigjools, that should be there -- that's what it uses to poke the "machine 0 is already provisioned" data in07:47
fwereade__bigjools, but it needs an instance id as well07:47
bigjoolsoh it's not the system_id then?07:47
bigjoolswhere is instance_id conveyed?07:48
fwereade__bigjools, nah, sorry: we have machine ids which are basically just ints, and instance ids which are provider-dependent07:48
fwereade__bigjools, instance_id is sent in through set_instance_id_accessor and I *think* it's only used in the `juju-admin initialize` script07:49
bigjoolsoh from zk07:49
bigjoolsI see it now, it's set07:49
fwereade__bigjools, cool, and it's a system_id?07:49
bigjoolsok let me fix cloud-init and then I can test this07:49
bigjoolsit is :)07:49
fwereade__bigjools, sweet07:50
fwereade__bigjools, it's a public holiday for me today but I'm working the first half so I'll be around for a few hours more07:50
fwereade__bigjools, just ping me if you need anything07:50
bigjoolsfwereade__: ah ok thanks, very much appreciated07:50
fwereade__bigjools, a pleasure :)07:50
fwereade__rogpeppe, btw, I had a thought over the weekend: one of the big problems with the hook package is its name08:04
fwereade__rogpeppe, because hooks themselves are really only very tangentially related to what it's doing08:04
* rogpeppe always likes a good name change08:04
fwereade__rogpeppe, I'm starting to think that the best place for this code is cmd/server08:05
fwereade__rogpeppe, but there's probably an even better place I haven't though of yet08:05
rogpeppefwereade__: does this code actually need its own package in fact?08:05
rogpeppefwereade__: couldn't it just go into the unit agent package08:06
fwereade__rogpeppe, we don't have a unit agent package: you didn't want one :p08:06
rogpeppefwereade__: lol08:06
rogpeppefwereade__: well, then in the place that has that08:06
fwereade__rogpeppe, that's in cmd/jujud and I don't think that's the right place08:07
rogpeppefwereade__: no?08:07
fwereade__rogpeppe, I have a forthcoming cmd/server which is only connected to jujud in that a process invoked by jujud will happen to run the server08:07
fwereade__rogpeppe, and it's starting to seem that the server, the tool execution context, and the tool implementations themselves should probably all go in there08:08
rogpeppefwereade__: sorry, i think i lost the implication: cmd/server is a command?08:08
fwereade__rogpeppe, it may be that we want a main package/fun in cmd/jujuc, and then to stick it in cmd/jujuc/server08:09
fwereade__rogpeppe, it's not really, no, but it is a "command server" and it "serves" cmd/Commands08:09
fwereade__rogpeppe, ...but they're purely for use by jujuc, so jujuc/server may be clearer08:10
rogpeppefwereade__: i think that for our own sanity the subdirectories under cmd should all be main packages08:10
rogpeppebut cmd/jujuc/server might work08:10
fwereade__rogpeppe, it seemed that if I tried to add cmd/jujuc/server when there wasn't any code in cmd/jujuc, go just ignored it08:10
fwereade__rogpeppe, is that expected or did I do something wrong?08:10
rogpeppefwereade__: ignored it when you did what?08:11
fwereade__rogpeppe, go didn't run the tests in cmd/jujuc/server when I put the server code in there with jujuc otherwise empty08:11
rogpeppefwereade__: if it does that it, it's a bug08:11
fwereade__rogpeppe, hm, that was when running go test .../cmd/...08:11
rogpeppefwereade__: it should still work. let me check.08:12
fwereade__rogpeppe, or more likely that I did something wrong :p08:12
fwereade__rogpeppe, but I was expecting at least a "you're stupid, I'm not doing that" message08:12
rogpeppefwereade__: it works for me.08:13
fwereade__rogpeppe, then I guess I did something stupid, cmd/jujuc/server it shall be (if that makes sense to you?)08:13
rogpeppefwereade__: yeah, that makes sense. it's the server side of the jujuc commands.08:14
fwereade__rogpeppe, and so that'll have Server, Context, and a whole bunch of things like LogCommand and RelationSetCommand08:15
rogpeppefwereade__: the only hesitation i have is you might want it to depend on stuff internal to jujud.08:15
fwereade__rogpeppe, go on... such as?08:16
rogpeppefwereade__: just a hunch08:16
rogpeppefwereade__: depends how closely the callback commands interact with the stuff in the unit agent.08:17
rogpeppefwereade__: i guess anything we need can go in an interface, and that should be fine.08:18
fwereade__rogpeppe, I'm not seeing it yet; I think that when I figure out precisely what is responsible for the socket things may be rearranged slightly08:18
fwereade__rogpeppe, but the only things I expect them to hit are log and state08:18
fwereade__rogpeppe, I'm not sure quite how the state will get there yet but I think that's a future consideration08:19
rogpeppefwereade__: if that's the case, it's a nice clean separation and +108:19
fwereade__rogpeppe, it's my intent anyway :)08:19
rogpeppefwereade__: well the server stuff is invoked by jujud, right?08:19
rogpeppefwereade__: (well, the unit agent within jujud)08:19
fwereade__rogpeppe, it will be but I don't yet know exactly how08:19
rogpeppefwereade__: ok08:19
fwereade__rogpeppe, and my own hunch says that we will start to want a unit agent package around the time it all starts to get hooked up08:20
fwereade__rogpeppe, incidentally, one nice effect of go I'm coming to appreciate:08:20
rogpeppefwereade__: maybe. i'm still thinking the agents are small enough they can live inside jujud. but we'll see.08:21
fwereade__rogpeppe, no-unused-imports means that just by opening a file and seeing a bunch of unrelated imports you detect a smell08:21
rogpeppefwereade__: yeah08:21
fwereade__rogpeppe, the unit agent is I think big enough that it'll feel wrong08:21
fwereade__rogpeppe, all the lifecycle and workflow and scheduler  stuff basically08:22
rogpeppefwereade__: yeah, maybe you're right.08:22
fwereade__rogpeppe, the MA and the PA are probably compact enough they wouldn't feel bad really08:22
* rogpeppe goes to see how many lines of code the python version is08:22
fwereade__rogpeppe, I could very well be wrong -- ATM the code run directly by the UA is smeared across juju.hook and juju.unit (in addition to all the state stuff etc)08:23
fwereade__rogpeppe, but perhaps it isn't actually *big* enough to warrant its own package and I'm just responding to the unclear factoring08:23
rogpeppe fwereade__: yeah, that's quite a lot of code actually.08:24
rogpeppefwereade__: i'm wondering that with server and jujuc factored out the actual core unit agent code might be reasonably compact.08:24
rogpeppefwereade__: i.e. the core lifecycle, workflow and scheduler stuff.08:25
* fwereade__ is cautiously optimistic08:25
rogpeppefwereade__: it *feels* compact in my head, but that's probably because i'm not familiar with it :-)08:25
fwereade__rogpeppe, it's fiddlier than it looks08:25
fwereade__rogpeppe, as I discovered when I thought "yeah, I'll pick up agent upstartification, how hard can it be?"08:26
rogpeppefwereade__: yeah. it's probably the fiddliest bit of the whole system, right?08:26
fwereade__rogpeppe, yeah, I think so08:26
rogpeppefwereade__: but i guess it's that bit which is really what makes juju juju.08:27
fwereade__rogpeppe, but *even then* I think it's that the unit agent itself is intrinsically fiddly, and so a jujud/unit subpackage might be just the ticket08:27
fwereade__rogpeppe, yeah, it's all about the agents :)08:28
rogpeppefwereade__: i was thinking its all about mapping juju state transitions to shell scripts...08:28
fwereade__rogpeppe, there are indeed many valid perspectives :)08:29
rogpeppefwereade__: a review for you, if you choose to accept it: https://codereview.appspot.com/5853048/08:35
fwereade__:)08:35
fwereade__rogpeppe, I have a few from the other day08:35
rogpeppefwereade__: unfortunately it breaks the environs/ec2 amazon tests. but i think fixing that is for another review.08:36
fwereade__rogpeppe, I hope you like how hook/context turned out after discussing with niemeyer for a while08:36
rogpeppefwereade__: oh yeah, from friday. i'll have a look - i've been pointedly avoiding looking at my email this morning...08:36
rogpeppefwereade__: oh, i did see that you'd made some changes that i wasn't expecting08:37
fwereade__rogpeppe, as long as we don't end up *merging* broken stuff I'm fine with that :)08:37
rogpeppefwereade__: ExecInfo went away - i'm happy to see it, but i didn't see any discussion about it.08:37
fwereade__rogpeppe, the crucial insight is that this really is only very slightly related to hooks in the first place08:37
rogpeppefwereade__: was that your G+ conversation with gustavo?08:37
fwereade__rogpeppe, but it took me a while longer to think "maybe this shouldn't be in the "hook" package at all08:37
fwereade__rogpeppe, that was what crystallised it, yeah08:38
rogpeppefwereade__: cool. i was like "i thought i didn't manage to convince you, but you've gone and done it anyway... how did *that* happen?!"08:38
fwereade__rogpeppe, and it now makes me think that Context.ExecHook is what we'll need in the end but until it has a client I'm comfortable as it is08:38
rogpeppefwereade__: yeah, i'm happy how it looks now.08:39
fwereade__rogpeppe, the leap was too great for me to see while I was still thinking it was about hooks08:39
fwereade__rogpeppe, once you forget about hooks the rightness of your approach is clear08:40
rogpeppefwereade__: i still quite liked Exec and vars being methods on Context.08:40
fwereade__rogpeppe, if you're OK with that I'll gladly put them back on08:40
rogpeppefwereade__: yeah, i'm very happy with that.08:40
rogpeppefwereade__: they're tied closely enough to Context that i think they work well as methods on it.08:41
rogpeppefwereade__: and it's trivial to factor them out later if we want.08:42
rogpeppes/want/need/08:42
fwereade__rogpeppe, I'm thinking that if I do that I will move them into cmd/jujuc/server as well, may as well start as I mean to go on08:44
fwereade__rogpeppe, at which point I think the methods actually become ExecHook and hookVars08:45
rogpeppefwereade__: doesn't Context move into jujuc/server too?08:46
fwereade__rogpeppe, yes, exactly08:46
rogpeppefwereade__: so they can still be Context.Exec and Context.vars if you like08:47
fwereade__rogpeppe, I'm not sure, I think they become an "alien" concept once it's under jujuc08:47
rogpeppefwereade__: hmm, i dunno. if they were appropriate as methods on Context before, i don't really see why that's changed when Context has moved.08:48
fwereade__rogpeppe, sorry: they're still context methods, but they should change their names to make it clear that they're about hooks (not the jujuc tools themselves, which will only be called as side effects if you like)08:49
rogpeppefwereade__: ok, that makes sense.08:50
fwereade__rogpeppe, cool08:50
rogpeppefwereade__: one thought: maybe "RunHook" rather than "ExecHook"08:50
fwereade__rogpeppe, perfect08:50
bigjoolsfwereade__: http://pastebin.ubuntu.com/890372/08:57
bigjoolsfun!08:57
fwereade__bigjools, we're making progress though09:04
bigjoolsslow!09:04
fwereade__bigjools, I think that just means that the resource-uri/system-id confusion ran deeper09:04
fwereade__bigjools, would a resource-uri be unique and immutable in the same way as system-id is?09:05
fwereade__bigjools, if so it is probably a more convenient representation and would allow you to forget about system-id entirely?09:06
bigjoolsyes, resource_uri is just a URL with the system_id in there somewhere09:07
fwereade__bigjools, ok: that makes it sound like you can drop the notion of system-id entirely and just use resource-uri as instance_id throughout09:07
fwereade__bigjools, sorry poor advice before09:08
bigjoolsI am seriously confused09:08
fwereade__bigjools, sorry, let me step back a mo09:08
fwereade__bigjools, a juju machine id is really entirely abstract -- it's a predictable way for us to refer to specific machines internally, regardless of whether or not they're actually provisioned09:09
fwereade__bigjools, so it's basically just an int09:09
fwereade__bigjools, we maintain a mapping between machine ids and provider-specific instance ids (I forget exactly how it's stored)09:10
fwereade__bigjools, and the provisioning agent keeps an eye on that mapping09:10
bigjoolsok so far09:11
fwereade__bigjools, and provisions new instances in response to seeing machine states which *aren't* yet associated to an instance09:11
fwereade__bigjools, once it's provisioned an instance for a juju machine, it sticks it in the mapping09:11
bigjoolsok09:11
fwereade__bigjools, I am not aware of any restrictions on the format of instance-id -- I don't think we ever try to parse them09:12
fwereade__bigjools, so the only relevant property of instance-id is that it affords a convenient way to talk to the provider about a specific instance09:12
fwereade__bigjools, system-id was that (or near enough) in the orchestra provider, which is why I suggested that it should be the case here09:13
bigjoolsoh hmmm09:13
bigjoolsnot sure the checkout worked ok from cloud-init09:14
fwereade__bigjools, if you have enough information to construct a resource-uri given (1) a system-id and (2) the maas provider details09:14
bigjoolsbzr: ERROR: A control directory already exists: "file:///usr/lib/juju/juju/".09:14
fwereade__bigjools, huh, not seen that, maybe it's just reacting to droppings from a previous attempt?09:15
bigjoolsyes09:15
bigjoolsI neglected to wipe properly09:15
fwereade__bigjools, anyway , if you *can* construct the uri given system-id then it might make sense to keep system-id, but I don't have a firm handle on whether or not that's actually a good idea09:16
bigjoolsfwereade__: well this is how it was originally, right?09:16
bigjoolsI was setting the machine_id as the resource_uri09:16
fwereade__bigjools, yeah but if it's not the best fit for the problem it should change09:17
bigjoolsstill confused tbh since I don't know what's going on in the depths09:17
fwereade__bigjools, the problem is that the MaaSMachine thinks system_id is the instance id, while other parts of the code think that resource_uri is09:17
bigjoolswhat is it doing with the machine_id later?09:17
bigjoolse_toomanyids09:18
fwereade__bigjools, machine id is I think a red herring here09:18
fwereade__haha09:18
bigjoolsso what is: cloud_init.set_instance_id_accessor() doing? I thought it set machine_id?09:18
fwereade__bigjools, nope: instance_id09:19
bigjoolsso its name has a clue :)09:19
fwereade__bigjools, the clue's in the name :p09:19
bigjoolswhen instance_id is looked up later, how is it used?09:19
fwereade__bigjools, give me a mo, double-checking09:20
fwereade__bigjools, it's only actually used by the provisioning agent AFAICT09:21
fwereade__bigjools, the only reason it intrudes on your consciousness at all is because we need to fake up initial state on bootstrap, to say "machine id 0 is already provisioned on instance id WHATEVER", and prevent the PA from trying to provision itself09:22
fwereade__bigjools, that is done by `juju-admin initialize` -- grep for that and you should see how set_instance_id_accessor is relevant09:24
bigjoolsfwereade__: sorry, total PC lockup :/09:26
fwereade__bigjools, np09:26
bigjoolsI have a call in 4 minutes09:26
fwereade__<fwereade__> bigjools, give me a mo, double-checking09:26
fwereade__ bigjools, it's only actually used by the provisioning agent AFAICT09:26
fwereade__ bigjools, the only reason it intrudes on your consciousness at all is because we need to fake up initial state on bootstrap, to say "machine id 0 is already provisioned on instance id WHATEVER", and prevent the PA from trying to provision itself09:26
fwereade__<-- bigjools has quit (Read error: Connection reset by peer)09:26
fwereade__<fwereade__> bigjools, that is done by `juju-admin initialize` -- grep for that and you should see how set_instance_id_accessor is relevant09:26
fwereade__bigjools, I should still be around afterwards unless it's *really* long09:27
bigjools20 mins09:27
fwereade__bigjools, just grab me when you're free then :)09:27
bigjoolsok thanks09:27
fwereade__rog, thinking about your review09:28
rogfwereade__: cool, thanks09:28
fwereade__rog, there are quite a lot of tests that start by Initializing a State09:29
fwereade__rog, and the required data for initialization will become more complicated09:29
fwereade__rog, so we will at some stage want a testing.InitializeState(addrs string) function, but maybe it's not justified yet09:30
fwereade__rog, OTOH when we do need it, if it already exists, it'll be just one place to change09:30
rogfwereade__: i think i'd leave that until we need it09:30
rogfwereade__: it's trivial to find occurrences and to add09:30
rogfwereade__: in fact, won't Initialize need to take an addrs method?09:30
rogs/method/argument/09:31
fwereade__rog, it already does (implicitly, in the Info), I think09:31
rogfwereade__: ah, so what would testing.InitializeState give us?09:31
fwereade__rog, but the eventual reuiqred args to Initialize will be more complicated than to Open09:31
rogfwereade__: ok. what other stuff will it have?09:32
fwereade__rog, at the very least we need the instance id, to set up the state I've been talking to bigjools about09:32
fwereade__rog, and I'm 99% sure that we'll end up passing in the environment settings too, imminently09:32
fwereade__rog, like must-be-done-for-12.04-imminently09:32
fwereade__rog, maybe that's not too much to duplicate09:33
fwereade__rog, after all, dummy provider env settings are going to be basically empty09:33
rogInitialize is only called in three places AFAICS. when the duplication becomes a burden we can factor it out.09:34
rogfwereade__: for now, let's not add stuff that we don't need.09:34
TheMuefwereade__: just one question after reading niemeyers comment to my last proposal: when i've got two pingers pinging the same node and i say one to kill its work, the second one will recreate the node, doesn't it?09:34
fwereade__rog, so it will probably be `(info *Info, instanceId, providerType string)`09:34
fwereade__TheMue, it should do, but 2 pingers on the same node is Doing It Wrong09:35
rogTheMue: there should never be two pingers pinging the same node :-)09:35
fwereade__TheMue, what are you trying to accomplish?09:35
fwereade__rog, (yes indeed, it's not called for, ty for discussing :))09:37
fwereade__rog, why a 3 minute timeout?09:40
TheMuefwereade__: niemeyer found a problem with retrieving an instance of Agent() twice09:40
fwereade__TheMue, go on09:41
TheMuefwereade__: you get two diffent instances then today09:41
TheMuefwereade__: which is, when keeping a pinger inside, indeed isn't good09:41
fwereade__TheMue, yeah, makes sense; I thought you were taking the pinger out anyway?09:42
TheMuefwereade__: on the other hand he suggested an api change to return a pinger with agent.StartPinger()09:42
rogfwereade__: because it takes about 2 minutes to boot, and 3 minutes seemed long enough for the zk node to be inited after boot (maybe it's not and that's why my test is failing). the test harness fails after 6 minutes.09:42
TheMuefwereade__: but here the problem stays the same09:42
rogTheMue: there's no problem if the agent doesn't cache the pinger09:43
rogTheMue: i think09:43
TheMuerog: it's exact the same problem09:43
rogTheMue: what's the problem?09:43
TheMuerog: in both cases it's an illegal usage of the api09:43
fwereade__rog, I *think* that we have 2 interesting cases: on the instance, if any code is running before initialize is complete we Have A Problem09:44
TheMuerog: if i create two agent instances or two pinger instance, both is wrong09:44
rogTheMue: i don't think you can stop that. it's a distributed system.09:44
fwereade__rog, and if we're connecting from outside I think we want to wait forever and let the user interrupt us09:44
TheMuerog: Pinger has the method Kill()09:44
fwereade__TheMue, why would you ever create 2 pingers for the same node anyway?09:44
rogTheMue: that's fine. that's to kill that particular pinger09:45
TheMuefwereade__: ask niemeyer why one would create two agent for the same unit anyway09:45
rogfwereade__: ok. i added the timeout as an afterthought because my test was timing out after 6 minutes. but maybe that was correct, and i should just up the test harness timeout time.09:46
fwereade__TheMue, why is agent different to any other state class? you can have N state.Units referring to the same ZK state and that shouldn't be a problem09:46
TheMuefwereade__: i only say that, if the one way is an error, that error won't move away by returning the pinger09:46
TheMuefwereade__: so it should be with pinger too09:46
fwereade__rog, I'm not *sure* that my analysis is correct, give it a bit of a mental kicking09:46
fwereade__TheMue, it's always possible to write code that does the wrong thing09:46
rogTheMue: returning the pinger seems good to me. it means that the Agent doesn't need to keep track of that state - it's less code and no less correct IMHO09:47
fwereade__TheMue, in practice the unit agent process will call StartPinger once and only once, and that's it09:47
fwereade__TheMue, and the agent process itself will decide when it needs a Stop/Kill09:47
rogfwereade__: how long does the bootstrap node take to come up and be usable, usually?09:47
TheMuefwereade__: so why return the pinger?09:47
fwereade__rog, I've never actually measured it09:48
rog[09:47] <rog> TheMue: returning the pinger seems good to me. it means that the Agent doesn't need to keep track of that state - it's less code and no less correct IMHO09:48
rogfwereade__: maybe i'll take the timeout out again.09:48
fwereade__rog, it may be there's some case I missed09:49
fwereade__TheMue, what rog said :)09:49
rogfwereade__: no, i think you're right. i guess i thought that three minutes waiting after zk connect *should* be fine. surely we don't take that long to start up the juju init command after starting zookeeperd?09:50
fwereade__TheMue, it may be we have some disconnect on how we expect state.Agent to be used?09:50
TheMuerog: i only have the poor maintainer, new to the code, in 2 years in my eyes. asking state to give a unit, asking unit to give an agent, asking agent to start a pinger (why a pinger, i'm only interested to signal that the agent is alive, so what does a pinger has to do with it?) and then keep the pinger09:50
rogTheMue: that's what a pinger *does*.09:51
fwereade__rog, that sounds right09:51
rogTheMue: (i wasn't happy with the name "Pinger" (i preferred "Occupy" and "Occupied" but gustavo's choice)09:51
rog)09:52
TheMuerog: it's a technological description how it works. but when i drive a car i'm not interested on how the motor works, i wonna drive a car from a to b09:52
rogTheMue: i know that09:52
rogTheMue: but that's a debate to have about Pinger, i think.09:52
TheMuerog: my intention is to hide HOW we do something but to tell WHY we do it09:52
fwereade__TheMue, yeah, I liked Occupy too09:52
rogTheMue: if StartPinger was called "Occupy", would you be happier?09:53
TheMuerog: the pinger is a fine tool, i only have the opinion that i have to keep the too inside to provide a clean api regarding agent (and later anything else) for the user of this api09:54
rogTheMue: the pinger is *the* tool for detecting and signalling agent occupation09:54
fwereade__rog, TheMue: `RegisterPresence() (*presence.Pinger, error)`?09:54
TheMuerog: yes, this way it makes more sense09:54
rogfwereade__: yeah, that would be fine for me.09:55
TheMuefwereade__: i still wouldn't return the pinger. i would hide it.09:55
rogTheMue: why hide it?09:55
rog[09:54] <rog> TheMue: the pinger is *the* tool for detecting and signalling agent occupation09:55
bigjoolsfwereade__: ok so I'm free now09:56
rogTheMue: we've built this abstraction, why not use it as is?09:56
TheMuerog: in this case the name isn't optimal09:56
rogTheMue: otherwise perhaps we should build it slightly differently, so we *can* use it as is.09:56
fwereade__TheMue, the trouble is that it ends up making state.Agent unique among state.FOOs in that it's not something you can reconstruct safely from a fresh state with nothing but keys09:56
fwereade__bigjools, where were we? was I making sense? ;)09:57
bigjoolsfwereade__: unfortunately not :)09:57
rogTheMue: i don't think we should get hung up on the name.09:57
TheMuefwereade__: that's why i wanted to embed it. btw, now the pinger (or better the AgentOccupier) is special too.09:57
bigjoolsbut I need to re-establish my test env, so I'll be a few mins09:57
fwereade__bigjools, heh, ok: did I ever start making sense, or was there a specific point where I started babbling crackfully?09:58
fwereade__TheMue, pinger is not just for agents09:58
rogTheMue: i don't see that hiding it gains anything.09:58
bigjoolsfwereade__: it's not you, more that I don't really understand what's going on inside juju when it deploys stuff09:58
fwereade__rog, I think that hiding it keeps the name out of the way, and the name exposes the implementation too much for comfort09:59
TheMuefwereade__: that's ok, so i understood it first. that's why i wanted to encapsulate it for agent, so that the agent api is clear09:59
fwereade__bigjools, ultra-high-level sketch:10:00
rogfwereade__: because it's called "Pinger" rather than "Occupier"?10:00
fwereade__bigjools, the user makes changes to an "ideal" state stored in ZK and the PA starts/kills machines in response to changes in the ideal state10:01
fwereade__rog, exactly (or some other name, whatever ;))10:01
rogi do think that "Pinger" is an unfortunate name because it implies polling, and we might use some other technique in the future. but...10:01
fwereade__bigjools, that's the steady state and it's pretty simple really (devil in details ofc)10:01
rogi think that that package is exactly the right place for the thing returned from an Agent.10:01
fwereade__bigjools, the ugliness comes at bootstrap time10:02
fwereade__rog, agreed10:02
rogTheMue, fwereade__: if we're writing more code just to hide a name that we've only just invented, let's just change the name!10:02
rogTheMue: but you can be the one to persuade gustavo :-)10:03
fwereade__bigjools, the PA is responsible for making sure that the machines which should exist do exist; and machine 0 is just another part of the environment, we don't want to have to treat it specially10:03
TheMuerog: simply changing the name if it's still a multipurpose tool isn't it10:04
TheMuerog: i'm talking about encapsulation and api design10:04
fwereade__bigjools, so before we let the PA look at state, we prime the state such that it sees "machine 0 is meant to be provisioned... and, hey, it already is"10:04
rogTheMue: it's a multipurpose tool that is designed for signalling presence on whatever underlying storage system we're using. that's *exactly* what the agent presence stuff is about.10:05
fwereade__bigjools, doing so involves storing the instance id and the machine id together10:05
fwereade__bigjools, hence the requirement for instance_id at bootstrap time10:05
rogTheMue: so it seems perfect that it's that that's returned from Agent.10:05
fwereade__bigjools, instance id should in all other circumstances purely be an internal detail10:05
rogTheMue: we're adding more layers of abstraction "just in case", but YAGNI!10:05
fwereade__rog, TheMue: strongly agree that it's not up to state.Agent to stop the pinger10:06
fwereade__bigjools, but sadly you need to deal with it at bootstrap time10:06
TheMuefwereade__: indeed not, it's up to the user of agent (he has got his pinger from agent) to also use it to signal "hey, it's me, the agent, i'm stopping".10:07
bigjoolsfwereade__: the traceback is from a "deploy" though10:07
fwereade__bigjools: huh, sorry, let me reread10:08
rogTheMue: that sounds right to me10:08
fwereade__bigjools, right, sorry: the trouble is that you're still using 2 different notions of instance_id10:08
rogTheMue: pinger := unit.Agent().StartPinger(); .... pinger.Kill()10:08
fwereade__bigjools, either always use system_id, or always use resource_uri10:08
bigjoolsfwereade__: where am I using those?10:09
TheMuerog: so to me an api like agentAPI.SignalWork() and agentAPI.SignalEndOfWork() sounds more natural10:09
bigjoolsjust launch.py10:09
bigjools?10:09
rogTheMue: that makes the agentAPI stateful, which it doesn't need to be10:09
fwereade__bigjools, (1) MaaSMachine turns system_id into MaaSMachine.instance_id10:09
rogTheMue: the state can live in the pinger.10:09
fwereade__bigjools, (2) the provider takes instance_ids in some methods10:09
TheMuerog: the pinger IS stateful, and the pinger IS part of the state api today10:10
fwereade__bigjools, (3) you also need to set one at bootstrap time10:10
fwereade__bigjools, I think that's it10:10
rogTheMue: there are two places that are stateful: the underlying zk tree, and the local pinger state. you'd be adding a third.10:10
fwereade__bigjools, (4) it'll be *stored* using a juju.state.machine.MachineState but you should be 100% insulated from that detail10:10
TheMuerog: i don't add one, i only hide the already existing one to the user f the agent api10:11
bigjoolsfwereade__: is this provider api documented anywhere?  are there instructions on adding providers?10:11
rogTheMue: you do add one. you store whether the pinger has already started so agentAPI.SignalWork can return an error if so.10:11
TheMuerog: instead of keeping an instance of the pinger he kann also keen an instance of the agent, it's the same10:11
bigjoolsbecause so far I scratched around10:12
bigjoolsand clearly my understanding of things isn't right10:12
TheMuerog: so where's the difference?10:12
fwereade__bigjools, only the stuff in juju.providers.common.base; I'm sorry if it's unhelpful :(10:12
fwereade__TheMue, but why make state.Agent special in this way?10:12
bigjoolsfwereade__: not entirely unhelpful, but a bit lacking :)10:13
rogTheMue: it's not needed. and there's nothing stopping you (possibly on another machine) creating another State and another Agent and (erroneously) starting another pinger on that.10:13
fwereade__TheMue, every other state.Foo is safe to grab a fresh instance of at any time10:13
rogTheMue: just checking local usage isn't sufficient.10:13
fwereade__TheMue, making state.Agent special even in this one circumstance feels like a bad move10:13
bigjoolsfwereade__: I'm really sorry, I am still struggling to get my head around all this ID stuff.  It's massively confusing :(10:14
fwereade__bigjools, I definitely remember it was a pain for a while :(10:14
bigjoolsnot sure where to go from here10:14
TheMuerog: so nothing stops you to create a new unit, get a new agent, create a second pinger. it's the same mistake, the same wrong usage. as you told yourself above.10:14
rogTheMue: sure.10:15
TheMuefwereade__: pinger is alsready special10:15
fwereade__TheMue, then why allow the specialness to affect Agent?10:15
fwereade__bigjools, ok, mechanical solution10:15
fwereade__bigjools, grep the maas provider for uses of instance_id10:16
bigjoolsok10:16
fwereade__bigjools, ensure that they all come from consistent sources -- they should all either be coming from a system_id, or from a resource_uri10:16
fwereade__bigjools, it seems most of the provider was written to expect resource_uri10:17
TheMuefwereade__: as told above, only to keep together what belongs together. and not to tell a "start this" but later b "stop this". instead i wonna have it bundled a clean way a "start this" and later a "stop this"10:17
fwereade__TheMue, that's exactly what Pinger offers10:18
TheMuefwereade__: no, the design after the last review shall be Agent.Start and Pinger.Stop10:19
TheMuefwereade__: that's pain to a maintainer later who is not today invoked in the design process10:19
fwereade__TheMue, it's pinger := agent.StartPinger() and pinger.Stop(), which doesn't apear to me to be unclear10:19
fwereade__TheMue, were it agent.Start and pinger.Stop I'd agree10:20
fwereade__bigjools, does that help at all?10:20
fwereade__bigjools, it *should* just be s/system_id/resource_uri/ in maas.machine, and maas.launch10:21
TheMuefwereade__: it's StartPinger(), indeed, i only shortened it10:21
bigjoolsfwereade__: ok let me digest that10:21
fwereade__TheMue, that may be the problem, it's not actually starting the agent at all10:22
TheMuefwereade__: but where is the problem with agent.Occupy() and agent.Release()?10:22
fwereade__TheMue, I have *no* problem with it, that was proposed10:22
fwereade__TheMue, niemeyer didn't like it10:22
fwereade__TheMue, I think he may have suspected me of political activism10:22
fwereade__TheMue, OCCUPY JUJU10:22
TheMuefwereade__: hehe10:23
fwereade__TheMue, ah wait sorry10:23
TheMuefwereade__: but i also proposed different names.10:23
fwereade__TheMue, I *do* have a problem with agent.occupy/release10:23
fwereade__TheMue, but I've tried to explain it -- the mismatch with the other state types -- and I think I'm not communicating it well10:24
bigjoolsfwereade__: I don't understand what you mean by " ensure that they all come from consistent sources"10:24
fwereade__TheMue, or possibly you disagree?10:24
TheMuefwereade__: who will start the pinger and what is his intention behind it?10:24
fwereade__bigjools, 1 sec10:24
rogfwereade__, TheMue: for me, it's just less code and easier to get correct if you return the pinger.10:24
rogwhich is usually swings the argument for me.10:25
fwereade__TheMue, the agent process will start it, and the agent process will know how it means to stop and if/how it should kill the pinger10:25
fwereade__TheMue, I don;t think stop/kill is a sensible or meaningful distinction on the *agent* itself10:25
TheMuerog: and once we chage to a different method every code which isn't interested in a pinger but in a working agent functionality has to be changed?10:26
fwereade__TheMue, I'd even say that the only reaosn to put StartPinger on there is so that we can keep the knowledge of the actual agent presence path entirely internal10:26
TheMuefwereade__: but WHY does the agent start the pinger? what's his intention?10:26
rogTheMue: what do you mean by "a different method"?10:26
fwereade__TheMue, to signal presence... is that a trick question? ;)10:27
TheMuefwereade__: ah, so a better name than Occupy() may be SignalPresence()?10:27
fwereade__TheMue, I suggested something like that above10:27
TheMuefwereade__: exactly ;) just a rhetoric way10:27
fwereade__TheMue, RegisterPresence maybe?10:27
rog[09:54] <fwereade__> rog, TheMue: `RegisterPresence() (*presence.Pinger, error)`?10:27
rog:-)10:27
fwereade__TheMue, I still feel that returning a PresenceRegisterer (eww) is the right thing10:27
fwereade__TheMue, for now we have just one and we call it a Pinger10:28
rogfwereade__: yeah.10:28
rogfwereade__: seems ok to me.10:28
fwereade__TheMue, it exposes an implementation detail but not gratuitously10:28
TheMuerog: yep, only wanted to come back after Occupy()10:28
fwereade__TheMue, we could handle it as a KillerStopper interface if we had any reason to worry about implementation changes10:29
rogfwereade__: it's a pity it's called Pinger, but if you half-close your eyes and imagine it's called something less gratuitously implementation-specific, i think it works.10:29
TheMuefwereade__: i'm no friend of exposing implementation details10:29
fwereade__TheMue, but for now, an extra interface is gratuitous10:29
TheMuefwereade__: i prefer a symetric handling where i can also tell agent to DeregisterPresence() regardless of the used underlying technology10:30
rogwe've designed an interface (presence) that is specifically about agent occupation. if we feel we have to hide it because it's "too implementation-specific" then we've got it wrong.10:30
fwereade__TheMue, I may have to bow out and suggest you propose a name change to niemeyer; you're in a better position to advocate for it than I am, you're an actual client10:30
fwereade__bigjools, sorry :)10:30
TheMuerog: above you told that it's not only for agent10:31
fwereade__bigjools, I *think* there are only 2 places where instance_ids enter the system10:31
rogTheMue: that was fwereade__ i think10:31
bigjoolsfwereade__: I think the basic problem is that it's unclear as to what exactly each provider's API is in terms of what data goes in and out10:32
fwereade__bigjools, one of them is in MaaSMachine, where it passes d["system_id"] up to ProviderMachine.__init__10:32
rogTheMue: for now, why don't we go with the KISS approach? we're using a statically typed language - this stuff can easily be changed in the future.10:32
bigjoolsso your s/system_id/resource_uri/ still doesn't make sense to me. :(10:33
fwereade__bigjools, the other one is CloudInit.set_instance_id_accessor10:33
rogTheMue: (if we feel it's necessary)10:33
rogTheMue: it's not like we'll have lots of clients using the Agent type.10:33
fwereade__bigjools, when the PA is looking at machines it looks them up by machine id and gets back data that includes the instance id10:33
fwereade__bigjools, to interact with the provider, it uses the instance id10:34
fwereade__bigjools, the only places that information enters the system are (1) set_instance_id_accessor and (2) MaaSMachine.__init__10:34
fwereade__bigjools, and that information is passed back into provider methods that take instance ids (or possibly MaaSMachines, which themselves have instance_ids set)10:35
fwereade__bigjools, so whatever you set in .machine and .launch will get passed back into the provider as instance_id10:35
fwereade__bigjools, oh, hold on, I need to look more closely at the maas provider10:36
TheMuerog: maybe you're right. but i only have my experiences in maintaining legacy code, also in static languages. and also juju once will be legacy.10:36
bigjoolsfwereade__: holding10:37
fwereade__bigjools, ha: juju/providers/maas/tests/test_maas.py|64| # TODO: Add test for get_nodes with a system_id parameter.10:37
bigjools:/10:37
TheMuerog: but indeed, maybe it isn't woth it, maybe Unit and Machine should use pinger directly instead of an extra Agent. the old mixin approach leaded into a wrong direction.10:38
rogTheMue: that was my thought originally, which is why i suggested you put this branch on hold for a while. but still, i think it does add some useful encapsulation, so it's still useful.10:39
fwereade__bigjools, ok, I was confused (er, and may still be)10:40
fwereade__bigjools, correct me if I'm wrong10:40
fwereade__bigjools, scratch that10:41
fwereade__bigjools, look at MaaSClient10:41
fwereade__bigjools, get_nodes expects system ids10:41
rog[08:37] <fwereade__> rogpeppe, as long as we don't end up *merging* broken stuff I'm fine with that :)10:41
fwereade__bigjools, start, stop, release expect resource uris10:41
rogfwereade__: in this case, the broken stuff is already merged...10:41
fwereade__bigjools, juju is passing its idea of "instance_id" in in both those cases10:42
rogfwereade__: (i just had a test time out after 20 minutes - i'm sure the zk should have been initialised within that time scale)10:42
fwereade__rog, yeah, my instinct says that the Right Thing to do if not inited is context dependent10:43
fwereade__rog, agents should just throw a hissy fit10:43
rogfwereade__: yeah, i think you're right.10:43
bigjoolsfwereade__: hummm10:43
rogfwereade__: hmm, actually maybe that means that agents shouldn't wait at all10:43
fwereade__rog, command line tools may want to time out sensibly, or just wait for ^C10:44
rogfwereade__: maybe that means we should have Open and WaitOpen10:44
fwereade__rog, I think that may be the case10:44
fwereade__rog, +0.5, thinking10:44
rogfwereade__: and perhaps WaitOpen could have a timeout duration argument10:44
fwereade__rog, for now I'd leave it; not hard to add once  we need it10:45
rogfwereade__: k10:45
fwereade__rog, re broken stuff, I think I missed the import10:45
rogfwereade__: ?10:46
fwereade__rog, if it makes things *worse* we should merge it at the same time as some other branch that also fixes it10:46
fwereade__rog, if it is a step towards making a broken thing less broken it's fine10:46
rogfwereade__: it *exposes* some existing brokenness.10:47
fwereade__rog, (er, "import" in the sense of "importance")10:47
roglol10:47
rogfwereade__: i.e. it seems that the existing bootstrap stuff isn't actually succeeding in getting juju init to be called.10:47
fwereade__rog, in that case I would prefer that it not be merged without another branch that either hides or fixes the brokenness10:48
rogfwereade__: which i wasn't testing for before10:48
fwereade__rog, Initialize doesn't work itself, so this is all somewhat moot10:48
rogfwereade__: given that amazon tests weren't working until one merge ago, i don't think it's too bad to regress for a small while.10:48
rogfwereade__: (i doubt anyone besides myself has actually run that test :-])10:49
rogfwereade__: why moot?10:49
rogfwereade__: we're interacting with the python backend for the time being.10:49
fwereade__rog, ha true, forget I said anything10:50
fwereade__rog, I'll leave it to your judgment then :)10:50
fwereade__bigjools, how painful will it be to fix that?10:52
bigjoolsfwereade__: I can't deal with this. I am tired and frustrated and about to pull all my hair out.  I am going to punt, but I don't know where10:52
bigjoolsfwereade__: I can see there might be a mismatch of ID types but none of them make any sense to me still10:53
fwereade__bigjools, you *should* only have to think about one at a time10:53
bigjoolsfwereade__: this is one situation where I need to be in a room with someone10:53
fwereade__bigjools, if it's not instance_id, you can ignore it10:53
fwereade__bigjools, if it *is* instance_id, it needs either to *always* mean system_id, or *always* mean resource_uri10:54
rogfwereade__, bigjools: maybe this is a time that G+ hangouts with extras (i.e. screen sharing) could be useful?10:54
fwereade__bigjools, I misled you by asking you to fiddle with the input data (which did need to be done) while neglecting to notice the deeper internal mismatch in MaaSClient10:55
bigjoolsquite possibly, but it's almost 9pm here and I am out of brain juice :(10:55
fwereade__bigjools, let me think a mo10:56
fwereade__bigjools, is the maas api documented at all?10:56
bigjoolsyes, I need to find a URL for you10:57
bigjoolsfwereade__: http://people.canonical.com/~gavin/docs/lp:maas/api.html10:58
bigjoolsit needs regenerating, slightly out of date10:58
fwereade__bigjools, if you get me that and point me to the tree you're working from I will take a quick look at it and see if I can either tell you what you need to do (or do a sketch of it myself if it's trivial enough)10:59
bigjoolsfwereade__: I've not made any significant changes yet, still trying to work things out10:59
fwereade__bigjools, without an actual maas provider to test against it will be just a sketch though I think10:59
bigjoolsfwereade__: the api you can infer from the calls it's making in maas.py.  It's restful.11:00
fwereade__bigjools, cool, I will see what I can do :)11:01
bigjoolswhich is why we need to use a combination of resource_uri and system_id :/11:01
bigjools(e.g. get_nodes takes system_ids because there's no resource_uri as such)11:01
bigjoolsso I can see where that needs fixing up11:01
fwereade__bigjools, hr'm, I see the problem11:01
fwereade__bigjools, it sounds like there's not much that can be done on the juju side until that's in place11:02
fwereade__bigjools, bah11:02
bigjoolsindeed11:02
=== TheMue_ is now known as TheMue
fwereade__blast, I put some toast on 10 mins ago, just a mo11:02
bigjoolsfwereade__: I need to rest, migraine coming.  I'll hand this over to Gavin as he's working with me here11:03
fwereade__bigjools, sorry I wasn't more help :(11:06
fwereade__bigjools, tell him to ping me if he needs11:06
rogfwereade__: i decided to add WaitOpen, since you mentioned it in your review :-)11:18
fwereade__rog, cool11:48
fwereade__rog, https://codereview.appspot.com/583204511:48
fwereade__rog, TheMue, that's me for the day (public holiday )11:59
rogfwereade__: LGTM11:59
rogfwereade__: cool, have fun!11:59
TheMuefwereade__: enjoy12:09
niemeyerGood mornings!12:49
rogniemeyer: yo!12:59
TheMueniemeyer: moin13:03
rogTheMue: there's a review for you, BTW: https://codereview.appspot.com/5853048/13:10
TheMue*click*13:10
roglc13:12
TheMuerog: lgtm, only one minor note13:17
rogTheMue: hmm, i don't know about the readability - i read '2e9' as 2 seconds. but i can't do 0.2 * time.Second, unfortunately.13:18
rogTheMue: but i'll change it anyway13:19
TheMuerog: yeah, it sadly only wants ints. maybe time.Second/5 ;)13:20
rog:-)13:21
niemeyerrog: We went over that before, IIRC.. I don't think we need Open and WaitOpen(..., timeout)13:31
rogniemeyer: i had a discussion with william this morning13:32
rogniemeyer: he thought we did...13:32
rogniemeyer: so i changed it to be like that13:32
niemeyerrog: Why?13:32
* rog goes to copy the code review comment he's making13:32
rogthat's what i had until this morning's discussion with william. we decided that the length of time to wait was quite consumer-dependent. in particular agents shouldn't wait at all, because they should never be started before zk is initialised. on the other hand there's not necessarily any right answer to how long to wait after bootup (i had it at 3 minutes, but william queried it, rightly i think).13:32
rogniemeyer: our discussion is here (http://irclogs.ubuntu.com/2012/03/19/%23juju-dev.html) from 094013:33
niemeyerrog: Ok, I don't think that's the case..13:34
rogniemeyer: ok, i can easily revert.13:34
niemeyerrog: We can start agents in the same machine, or even in distributed machines, and those should wait13:34
niemeyerrog: The initialization mechanism is there precisely to avoid silly races13:34
rogniemeyer: when would we ever start an agent before calling juju-init ?13:35
niemeyerrog: Between starting ZooKeeper and it having its data initialized13:35
rogniemeyer: why would we do that?13:35
niemeyerrog: ZooKeeper is started by init13:35
niemeyerrog: At package installation time13:35
rogniemeyer: but juju-init isn't called then, is it?13:35
niemeyerrog: Maybe not, but still, there's no point13:35
rogniemeyer: i think it's trivial to assure that juju-init is *always* called before any agent is started13:36
rogniemeyer: and then the timeout serves no purpose13:36
niemeyerrog: So why are you adding it?13:36
rogniemeyer: for the client code.13:36
niemeyerrog: Yeah, so it is useful.. what you're saying is that there are many cases where we won't have a race13:37
rogniemeyer: yes.13:37
niemeyerrog: what I'm saying is that it's pointless to consciously remove a race protection because you know that in some cases you don' t have it13:37
niemeyerrog: It's like dropping a mutex because you know that in some cases you're calling the function serially13:37
rogniemeyer: we're still protected - the function will return with an error.13:38
niemeyerrog: Ok.. what's the benefit of removing the race protection?13:38
niemeyerrog: That's not protection.. that's a crash13:38
rogniemeyer: it means that if we *do* fail to juju-init first, we'll fail early13:38
niemeyerrog: Sure, so let's not fail it!13:39
rogniemeyer: it'll fail anyway, just 2 or 3 or 10 minutes later.13:39
niemeyerrog: I'll repeat: there's no point in preventing a mechanism that avoids a race.13:40
niemeyerrog: If juju-init doesn't run, it's a serious bug, and we'll know about it 3 minutes later.13:40
rogniemeyer: ok. what's a good timeout, BTW.13:40
rog?13:40
niemeyerrog: I'd rather have a single function, that is protected by default.13:41
niemeyerrog: 3 minutes is fine for me.13:41
rogniemeyer: ok. as i said, that's what i had before - i was just trying to represent fwereade__'s side of the argument...13:41
niemeyerrog: There's no point in us thinking "Oh, do I have a race here? Do I use Open or WaitOpen?  How long do I use on WaitOpen?" on *every single call* of Open.13:41
rogniemeyer: i had the same thought about DNSName...13:42
niemeyerrog: It's the exact opposite case.13:42
niemeyerrog: Open should generally complete in seconds under normally working circumstances.13:43
niemeyerrog: Because whoever is starting zookeeper should initialize it shortly afterwards.13:43
niemeyerrog: We're just closing that window.13:43
rogniemeyer: ok.13:43
rogniemeyer: PTAL13:48
niemeyerrog: Checking13:52
niemeyerrog: Review sent14:04
rogniemeyer: what do you think about time.Sleep(0.2e9) vs time.Sleep(200 * time.Millisecond) ?14:11
niemeyerrog: Hmm.. good question14:12
niemeyerrog: The latter might be friendlier to newcomers. I grew used to the 0.2e9 or 2e8 notation, so I can quickly see what it means, but maybe it'd be wise to make it more readable.. do you have an opinion?14:13
rogniemeyer: i also find it easy to read, but TheMue suggested less terse notation was more readable.14:14
rogniemeyer: when it's used with time.Sleep i think the raw number is fine14:14
rogniemeyer: with other calls, it's probably best to use explicit duration constants14:15
niemeyerrog: That'd be a strange line to draw..14:15
rogniemeyer: it's just a pity you can't use 0.2 * time.Second :-)14:15
rogniemeyer: you're probably right14:15
niemeyerrog: Yeah, it's a bit unfortunate indeed.. it should work as long as the division results in an integer14:16
rogniemeyer: i don't think it does, because a floating point constant can't multiply a non-ideal constant14:17
rogniemeyer: (perhaps that should be allowed in fact)14:18
niemeyerrog: It really should14:18
TheMuerog: tried time.Second/5?14:20
rogTheMue: i'd prefer not to do that.14:20
rogTheMue: 200 * time.Milliseconds is better14:21
TheMuerog: especially it reads even worth.14:21
TheMuerog: yep, definitely14:21
rogniemeyer: done14:27
niemeyerrog: LGTM14:28
rogniemeyer: thanks14:28
niemeyerrog: I hope you like the result as well14:29
rogniemeyer: yeah, i was a little bit unhappy about adding WaitOpen earlier, but i was persuaded it was a good idea. you persuaded me back :-)14:29
niemeyerrog: Hehe :)14:29
rogniemeyer: and i was thinking about putting the initialize code in open anyway; it was a coin toss.14:30
rogniemeyer: BTW this has exposed a bug in the environs/ec2 code - the juju init code never gets run, it seems. so the amazon test is currently broken.14:31
niemeyerrog: Hmm14:31
niemeyerrog: I suppose that's due to the lack of running the initialization code via cloud-init?14:31
rogniemeyer: i'm guessing so14:31
niemeyerrog: Cool, that should be easy to fix14:32
rogniemeyer: my next branch hooks juju logging up to gocheck.C14:32
rogniemeyer: then i'll get into the init code.14:32
niemeyerAh, cool14:32
TheMueniemeyer: after some discussion about the Agent, its usage ony in Unit and Machine and the role of presence and the Pinger i today would say there's no need for the Agent type anymore. instead Unit and Machine should use presence directly.14:36
niemeyerTheMue: Hmm, that might be a nice improvement indeed.. have you tried it out to see how it looks yet?14:40
TheMueniemeyer: i wrote some comments after your review and discussed them with rog and fwereade__14:41
TheMueniemeyer: i'm still not really happy with "StartPinger()". to me it doesn't really says something about the semantics behind it.14:43
TheMueniemeyer: it's more a description on "how" something is done (by activily pinging a node).14:43
niemeyerTheMue: The semantics is that it "starts a pinger" for the respective agent.14:43
niemeyerTheMue: This feels a lot more clear in that regard than "Connect"14:43
TheMueniemeyer: i would prefer something like "SignalPresence()", because it signals the presence of the agent14:44
TheMueniemeyer: yeah, Connect() has been a bad choice, indeed14:44
* rog is a bit wistful for "Occupy" :-)14:45
niemeyerTheMue: we already have presence.StartPinger.. let's please stick to the same terminology.14:45
rogniemeyer: a review for you. https://codereview.appspot.com/584106714:46
rogniemeyer: as i say, i'm not entirely convinced it's right.14:46
rogniemeyer: i'm off for lunch.14:46
niemeyerrog: Enjoy14:46
TheMueniemeyer: IMHO it's states task to build a higher abstraction on the state model. so StartPinger() in the context of presence ("Is node X present?") ok, but from states view I woild expect a different naming.14:49
niemeyerTheMue: If I see pinger, err := unit.StartPinger(), I know what that means.14:50
niemeyerTheMue: Why introducing another term for the same concept?14:51
TheMueniemeyer: you, yes, because you're currently invoked in the implementation. how about the poor maintainer, new to the team in two years from now? asking himself whiy he has to call "StartPinger()" to signal, that the agent is now alive?14:52
niemeyerTheMue: Yeah, maybe..14:56
robbiewTheMue: rog: was there any progress made on you being able to access allhands.canonical.com?  Or should I nag HR ;)14:57
niemeyerTheMue: StartAlivePinger?14:58
TheMuerobbiew: Esther already talked to me. She passed it to IS.14:58
robbiewTheMue: ack, thx14:58
TheMueniemeyer: Yeah, this way it gets more clear.14:59
niemeyerTheMue: SetAgentAlive might be even better14:59
TheMueniemeyer: So we wout have "AgentAlive()", "WaitAgentAlive()" and "SetAgentAlive()" directly on "Unit" and "Machine", "Set…" is returning a "Pinger".15:03
TheMueniemeyer: Sounds like a plan. ;)15:03
niemeyerTheMue: Cool, +115:04
niemeyerLunch, biab16:05
rogrobbiew: just tried it and it worked this time!16:13
robbiewrog: awesome16:13
robbiewlet me know when you've submitted, so I can sign..then you can countersign and start the review ;)16:14
rogrobbiew: done16:24
robbiewrog: ..and right back at 'cha ;)16:25
robbiewthnx16:25
niemeyerhazmat: Let's continue the charm URL discussion after that call17:13
hazmatniemeyer, sounds good17:13
rogniemeyer: any hints for debugging a machine that i can't ssh to? (i suspect i'm that's a symptom of the problem)18:14
rogniemeyer: my userdata looks like this: http://paste.ubuntu.com/891013/18:14
niemeyerrog: EC2?18:14
rogniemeyer: yeah18:14
niemeyerrog: ec2-get-console-output may help18:15
rogniemeyer: ah, didn't know about that18:15
niemeyerrog: Note it's not synchronous.. it make take a moment for the output to be visible18:15
niemeyerrog: Are those broken lines supported by cloud-inti?18:16
rogniemeyer: i suspected those - they're valid YAML, and i'm not sure how i can tell yaml.Marshal not to produce them.18:17
rogniemeyer: i didn't know you could do that actually18:17
TheMueniemeyer: so, next agent try plus some cleanup with https://codereview.appspot.com/578205318:18
niemeyerrog: it may well be fine.. I recall broken lines on Python too18:18
rogniemeyer: hmm, the console output is useful, but i can't see any mention of cloudinit in there unfortunately: http://paste.ubuntu.com/891026/18:24
rogniemeyer: oh, cloud-init18:24
rogniemeyer: none of those key fingerprints matches the one sent, BTW18:25
rogTheMue: i preferred it when the code wasn't duplicated. unit.Agent().Alive() reads well to me.18:29
TheMuerog: see discussion above with niemeyer18:30
rogTheMue: yeah, i saw that, but i didn't quite realise the implications. i think that the code duplications in unnecessary.18:30
rogs/duplications in/duplication is/18:30
niemeyerTheMue: Review delivered18:32
niemeyerrog: The duplication is certainly unnecessary, but it's trivial to avoid it without introducing a new type18:32
TheMueniemeyer: yup, notification just received18:32
* TheMue feels reminded for the first proposal, with three helper functions ...18:33
rogniemeyer: what method would you use?18:33
rogniemeyer: just define a helper function?18:34
niemeyerrog: Yeah.. it's just a simple function used twice18:34
TheMueniemeyer: LOL18:34
rogniemeyer: AFAICS a new Agent type (with three methods, Alive, SetAlive and WaitAlive) would fit well here.18:34
TheMueniemeyer: that has been the first proposal18:34
rogniemeyer: then Unit and Machine both get Agent() *Agent methods18:35
niemeyerrog: It would, but TheMue seems to prefer this approach, and I have no reason to disagree with him18:35
rogniemeyer: ok. less code, but fair enough.18:35
niemeyerrog: Disagree.. the amount of code in his implementation is fairly minimal.. adding a type wouldn't save much, if anything at all18:36
niemeyerTheMue: Sorry about the back and forth.. I don't know about the whole history, and either way would be fine for me. Having two 30-line functions with the exact same implementation sounds unnecessary, though18:37
rogniemeyer: 16 lines eventually and three less methods. well, i know it's not much, but every little helps :-)18:39
niemeyerrog: There are at least three methods for the implementation of Agent, the declaration of Agent, and the two methods for returning the agent18:40
niemeyerBesides another file, and imports18:40
niemeyerrog: It's not less code, and even if it was, the current implementation is on the trivial side.18:41
rogniemeyer: util.go :-)18:41
niemeyerrog: No. Thanks.18:41
niemeyerTheMue: It's cool to move on with your approach, just remove the dups please18:42
TheMueniemeyer: so back to the first approach with three funcs? used then in the methods of unit and machine?18:42
niemeyerTheMue: No..18:42
niemeyerTheMue: WaitAgentAlive is the only function that has logic duplication we just need a single function with that logic for reuse18:43
niemeyers/duplication we/duplication. We/18:43
TheMueniemeyer: in an own file named agent.go or in util.go?18:43
TheMueniemeyer: and just to remove the dup the reader later has to look where he finds this logic while the other func are implemented directly using presence.18:44
niemeyerTheMue: A file for a single function would be too much.. it's fine to put it below one of the implementations, and hook it on the next one18:44
TheMuefor those few locs18:45
rogniemeyer: i'm off for the evening. will be probably be mostly out of contact tomorrow afternoon as i travel down to the london Go meetup (i'll be working on the train, but the reception is patchy)18:47
TheMuei'll do so, just to stop the discussions.18:47
TheMuerog: i would have liked to participate at that meeting.18:48
TheMuerobut the trip is too expensive.18:48
TheMueeh, rog, but ...18:48
rogTheMue: it's quite expensive for me to go down but i decided that i couldn't miss it18:48
TheMuerog: now i'm giving a talk about go at the gtug in bremen in april18:48
niemeyerrog: Cool, we'll certainly talk before that, but have there no matter what18:48
niemeyerhave fun there18:49
rogniemeyer: hopefully it'll be interesting.18:49
niemeyerrog: I bet it will!18:49
TheMuerog: i think so, there are some interesting people18:49
rogright, i smell food!18:49
niemeyerrog: Mmmm.. that' dbe nice :)18:50
TheMueniemeyer: one wish: presence gets a function "WaitAlive(zkConn, path, timeout) error" and that can be used in both agent methods (and anywhere else)18:56
TheMueniemeyer: so it's a one-liner there too18:57
niemeyerTheMue: Sounds good.. let's mutate the current function in that fashion then, since that's precisely the use case we need it for18:57
niemeyerTheMue: Just push that in a different branch, please18:58
TheMueniemeyer: so a new branch with the func and when it's submitted merge it into my agent branch and use it there?18:59
niemeyerTheMue: Right.. you can just merge from trunk once it's in19:00
niemeyerTheMue: Hopefully we can do the whole thing with a very quick turnaround19:00
TheMueniemeyer: ok, will do so tomorrow, its late here. should be easy.19:00
niemeyerTheMue: Sounds good, thanks19:00
niemeyerTheMue: The idea of tweaking presence is a good one19:00
TheMueniemeyer: thx19:01
TheMueniemeyer: i'm off, bye19:01
niemeyerTheMue: Have a good evening19:01
niemeyerfwereade__: So, you've decided to drop it all for now?19:07
niemeyerfwereade_: So, you've decided to drop it all for now?19:16
niemeyerhazmat: ping20:27
niemeyerhazmat: I'm up for that conversation about charm URLs, when you're ready20:38
hazmatniemeyer, cool20:49
hazmatniemeyer, i'm ready just picked up ethan from day care20:49
niemeyerhazmat: Ok.. we can even go here I guess20:50
hazmatniemeyer, sounds good20:50
hazmatso wrt to it urls.. their of the form scheme:~user/series/name20:50
niemeyerhazmat: The charm URL needs to work for deployments.. it's already supported in the command line, but not taken into account for the actual deployment with the backend20:50
niemeyerhazmat: What's the issue you were mentioning about that?20:50
hazmatniemeyer, my understanding is that with the merge of ~fwereade/juju/pa-start-machine-constraints that we can enforce series as a constraint20:53
hazmatlooking over deploy, it doesn't do it by itself20:53
niemeyerhazmat: Series must not be a constraint20:53
hazmatniemeyer, its not a user constraint20:53
niemeyerhazmat: Not a user visible one, at least20:53
hazmatits a system constraint20:53
niemeyerhazmat: Ok20:53
hazmatniemeyer, this is going to explode the question of consistent version of juju across distros per SpamapS's email earlier on the same20:54
niemeyerhazmat: I don't know what the question is20:54
niemeyerhazmat: Or how the version of juju is involved in that?20:54
hazmatniemeyer, different versions of juju released in different distro releases20:54
hazmatall in the same env20:55
niemeyerhazmat: Yeah, it's an interesting problem20:55
niemeyerhazmat: Versions have to be compatible, basically20:56
hazmatniemeyer, its  a simple problem, if we didn't have to use packages20:56
niemeyerhazmat: Right20:56
hazmatits the juju upgrade itself problem.. but with env version pinning added20:56
hazmatniemeyer, okay.. i've looked over the charm deploy code, this should be straight forward20:57
niemeyerhazmat: Hmmm.. kind of20:57
hazmatto add the series as system constraint from charm20:57
niemeyerhazmat: juju upgrade has its own set of problems20:57
hazmatniemeyer, agreed, it needs some configurable policy choice to enable cluster workflows20:57
hazmatbut focusing on what we can do atm.. i'm wondering if this something important enough to try and get it done for 12.04, ie. get a spec out on it today20:58
hazmatniemeyer, did you see SpamapS's email on the topic https://lists.ubuntu.com/archives/juju/2012-March/001337.html20:58
hazmatie. being able to upgrade an environment seems pretty important for prod like usage20:59
niemeyerhazmat: Definitely21:01
niemeyerhazmat: I have a personal embargo to comment on that issue ;-)21:01
hazmatniemeyer, you mean shipping the go binaries into provider storage won't fly ;-)21:01
hazmatseems perfectly reasonable21:02
niemeyerhazmat: I think it's reasonable too21:02
niemeyerhazmat: It's probably the real long term solution21:02
niemeyerhazmat: It's also boringly easy21:02
niemeyerhazmat: Since it consists into a single binary being uploaded/downloaded21:03
niemeyerWell, maybe two21:03
niemeyerhazmat: But that won't solve the juju-py issue21:03
niemeyerhazmat: I suppose the short term solution is to upgrade Oneiric21:03
hazmatniemeyer, well its possibly N your uploading a binary tagged to a version, and instructing the cluster to use it by ref/version21:04
hazmatniemeyer, well i'm debating if we should just do the same for juju py21:04
niemeyerhazmat: Where the same means..?21:04
hazmatallow for code repo + rev or bundle w/ version21:04
hazmatfor consistent cluster deploys21:04
hazmati'd rather not. cause it will tie me up for the cycle21:05
SpamapShazmat: since this is merged: https://code.launchpad.net/~fwereade/juju/apply-machine-constraints .. does that mean that https://code.launchpad.net/~fwereade/juju/pa-start-machine-constraints/+merge/86451 can land, and then constraints will be released?21:05
hazmatbut if needs tog et done..21:05
hazmatSpamapS, yes, but only for ec221:05
niemeyerhazmat: It's not so simple, since there are deps involved too21:05
niemeyerhazmat: It'd need to deal with downloading packages + deps, and uploading packages, if I see what you mean21:05
hazmatniemeyer, i'm less concerned with the deps outside of txzk their all stable21:05
hazmatniemeyer, yes.. for a bundle, exactly that21:06
SpamapShazmat: thats fine.. OMG Ubuntu needs it. :)21:06
niemeyerhazmat: Right..21:06
niemeyerhazmat: All the deps that need to :)21:06
hazmatniemeyer, not nesc. dpkg binaries though, i was just going to grab push py eggs21:06
niemeyerhazmat: It's also not readily available locally21:06
niemeyerhazmat: Ugh..21:06
hazmatniemeyer, :-)21:06
hazmatniemeyer, the dpkg is avail local21:06
hazmatunless its source21:06
niemeyerhazmat: It's not avaliable locally21:07
hazmatin which case use branch and revision21:07
hazmatniemeyer, its not in the dpkg cache?21:07
niemeyerhazmat: No, not necessarily.. it's a cache after all21:07
hazmatSpamapS, considering it was a client skew there.. not sure this would help21:08
SpamapShazmat: this is orthogonal to that problem21:08
hazmatagreed21:08
SpamapShazmat: they need xlarges for the webserver, and m1.small for everything else.21:09
SpamapSwebserverS I should say21:09
hazmatSpamapS, oh.. constraints would help them yes..21:09
niemeyerhazmat: As far as OMG-what-to-do-for-12.04 goes, I'd make sure that Oneiric and Precise match each other, and beg SpamapS for help21:09
hazmatniemeyer, the omg refs are for the site omgubuntu site.. although it does apply to the other ;-)21:09
niemeyerhazmat: Yeah, I was consciously not referring to that part of the problem :)21:10
hazmatjimbaker, just a heads up.. per our conversation friday, i'm implementing the status changes per the spec21:10
jimbakerhazmat, yes, i've just updated the spec per the discussion21:11
jimbakerover that spec21:11
hazmatjimbaker, awesome21:11
niemeyerhazmat, SpamapS: Long term, having raw binaries in storage feels like the best in terms of portability and stability of environments21:11
jimbakeri certainly welcome your impl help! thanks21:11
hazmatniemeyer, so the problem is how to get the binary for py-juju.. and then spec the rest of it21:11
niemeyerhazmat: So you mean you're also going to do that for 12.04? :-)21:12
hazmatniemeyer, if needs to get done, i'll have to find something else to drop21:13
hazmatniemeyer, but it also eases the take machine/upgrade for go-juju21:13
hazmats/take/takeover21:14
niemeyerhazmat: Kind of.. we can always do that later, and I'm not optimistic that we can do something that will be used as-is for upgrading Go21:14
niemeyerUpgrading to Go will likely be a breaking change for the environment itself21:14
niemeyerIt won't break charms, though21:14
niemeyerhazmat: So we have to evaluate what's the pluses and minuses..21:15
hazmathuh.. i thought we where playing approved any state changes for the purpose of transparent upgrade21:15
niemeyerhazmat: If you get some ideas for how to do it in Python, I'd be happy to talk about themLet me know how you'd do it in Python..21:15
hazmatniemeyer, speaking of which there is a state change notice for a trivial change to upgrade flag storage on the list21:15
niemeyerI need to get some quick food now, though, as Ale is waiting..21:15
hazmatniemeyer, okay.. i can spec it out21:15
hazmatniemeyer, i guess what i really need is guidance from you and SpamapS  that this is critical for 12.0421:16
niemeyerhazmat: WIll go over it.. there are a couple of entries in the TODO about answering emails21:16
hazmatif its not then... i'll push on other things21:16
niemeyerhazmat: My vague view on it is still the same since the last milestone21:16
hazmatif it is i can at least get a spec for discussion21:16
niemeyerhazmat: The focus is on making in *stable*21:16
niemeyerhazmat: A lot has been happening, and significant changes are still going in right now21:17
niemeyerAnyway.. biab21:17
niemeyerhazmat: Back21:31
niemeyerhazmat: The spec sounds good, though21:32
niemeyerhazmat: I expect it will be fairly uncontroversial, at least in the part that does the actual upgrade21:32
niemeyerhazmat: I don't know what you have in mind for the harvesting of dependencies, though21:33
hazmatniemeyer, sounds good, i'll put on my list for today, first some impl and reviews21:34
niemeyerhazmat: Thanks a lot21:38
hazmatniemeyer, np21:42
niemeyerjimbaker:21:53
niemeyer 38 »        relations-error:21:53
niemeyer 39            db: [blog3, blog4]21:53
niemeyer 40 »        relations-pending:21:53
niemeyer 41            db: [blog2]21:53
niemeyerjimbaker: I don't think this reflects the agreement. Please see hazmat's email.21:53
niemeyerIn fact.. do we even need that spec?21:54
niemeyerIt was good to bootstrap the discussion, but hazmat's email to the list looks great21:54
andrewsmedinarog: pin21:58
andrewsmedinarog: ping21:58
hazmatniemeyer, for spec review there's also http://codereview.appspot.com/5847053/ i'm making some changes per fwereade_'s review, but their minor21:59
hazmatjimbaker, there is no pending.. only relations, relations-error22:00
niemeyerandrewsmedina: Heya22:03
niemeyerandrewsmedina: It's pretty late for rog now22:03
niemeyerhazmat: Looking22:03
andrewsmedinaniemeyer: ty22:03
niemeyerhazmat: Hmmm.. will wait for the next round if that's ok22:04
hazmatniemeyer, sure22:05
niemeyerhazmat: Do we need {force: true} on the yaml state?22:09
niemeyerhazmat: for upgrade-charm?22:10
niemeyerhazmat: I thought we were just not setting the upgrade depending on the state22:10
hazmatniemeyer, its to support the --force upgrade to allow for upgrades from any state, i wanted to distinguish it from a normal upgrade22:10
niemeyerhazmat: Just trying to understand why.. I thought the non-forcing behavior was client-side only22:11
hazmatniemeyer, the additional behavior is only enabled by the --force flag, else normal upgrade checks apply22:11
niemeyerhazmat: Right, but aren't the checks client side?22:11
* hazmat checks something22:12
niemeyerhazmat: In fact.. upgrade-charms --force seems to go into the void22:12
hazmatniemeyer, there's also agent side checks22:12
niemeyerhazmat: Cool, +1 on the concept then.. but there seems to be a hole in the implementation22:14
hazmatniemeyer, i'm all ears22:15
niemeyerhttps://codereview.appspot.com/5752069/diff/6001/juju/control/upgrade_charm.py#newcode5622:16
niemeyerhazmat: ^22:16
hazmatniemeyer, indeed22:19
hazmatmissing a test clearly22:19
niemeyerhazmat: It might be good to have a run with it in practice too, to check that it's glued up end-to-end22:21
jimbakerhazmat, that makes sense re pending22:55
hazmatniemeyer, re reviews, i would appreciate if you could have a look at jimbaker's relation specs all ones you  reviewed have feedback incorporated, and are ready for another look.23:13
jimbakerhazmat, niemeyer, sounds good23:13
hazmatwe need to start them... now effectively, to get them implemented23:15
jimbakerhazmat, agreed23:25

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!