[06:19] fwereade__: around? [07:25] bigjools, heyhey [07:25] fwereade__: hello! [07:25] bigjools, how goes it? [07:26] fwereade__: desperately need your help, I am trying to fix that system_id thing [07:26] totally failing so far [07:26] do you have some time? [07:26] bigjools, yeah, I'm not sure I explained myself very clearly [07:26] bigjools, ofc [07:26] thank you, ok let me explain where I got to [07:26] bigjools, cool [07:27] firstly - I am unclear on how to test this in a QA environment since juju checks code out of Launchpad (!) [07:27] that's utterly bizarre [07:28] bigjools, yeah, that underlying bug has lead be to break every-juju-in-the-world twice now :/ [07:28] when LP is down, by any chance? [07:28] or broken trunk [07:29] bigjools, nah, not even *broken*... just a significant-enough change in trunk can be deadly [07:29] anyway, I can't work out how to get my branch's code on there to test [07:29] bigjools, you should be able to use the origin field in environments.yaml [07:30] juju-origin: lp:~fwereade/juju/set-service-constraints [07:30] bigjools, which it, well, god enough for testing [07:30] oh boy, ok :) [07:30] next question, what do you know about cloud-init? :) [07:31] bigjools, um, embarrassingly little :( [07:31] fair enough, I am trying to work out why it's crashing when I boot my node :( [07:31] bigjools, my go-to technique is "vdiff with a known-good one and see if anything jumps out at me" [07:31] oh at which level does that config go BTW? [07:32] bigjools, that's inside a given environment [07:32] ok [07:32] mornin' campers [07:32] heya rogpeppe [07:32] so at the same level as admin-secret et al? [07:32] bigjools, do you know exactly what is crashing on boot? [07:33] I don't, the logs are useless unfortunately [07:33] bigjools, is that the traceback you sent in the mail or something else? [07:33] it just says it exited with status 1 [07:33] no traceback [07:33] bigjools, sorry... what exited with status 1? [07:33] no this is cloud-init crashing now, not juju [07:33] bigjools, ah-ha [07:33] which is a pre-requisite to getting as far as juju :)_ [07:33] bigjools, quite so [07:34] I suspect I need Daviey [07:34] bigjools, would you pastebin me the cloud-init file, just in case? [07:35] fwereade__: you want the user-data it's using? [07:35] bigjools, just in case anything leaps out at me [07:35] bigjools, btw, are you using system_id as instance id throughout now? [07:36] well I changed it in launch.py but as I said, not even getting close to testing that ATM [07:36] too many other cloud-init changes have broken things I think [07:36] bigjools, if that's all you changed and it's now killing cloud-init, it sounds interesting [07:37] bigjools, ah, sorry, what else has changed? [07:37] not entirely sure tbh, the server guys have been busy! [07:37] bigjools, ha, ok:) [07:38] btw why on earth is it branching code on the master node anyway? can a tarball not be pushed through? even bzr serve on the client end would be better! [07:39] bigjools, we absolutely need a sane use-the-same-code-everywhere-in-an-env story [07:39] no kidding :) [07:43] bigjools, thinking out loud, I presume you don't know where cloud-init crashes? [07:44] fwereade__: I don't [07:44] stuff flashes up on the guest's console ... AHA [07:44] bigjools, can you look on the instance and make inferences based on what's installed so far though? [07:44] fwereade__: hiya [07:44] vt7 has a traceback [07:45] bigjools, cool [07:45] ImportError: No module named DataSourceMAAS. It's the freaking s/MaaS/MAAS/ that happened recently. [07:45] * bigjools takes it to the right channel ... :) [07:45] fwereade__: i just got that transient testing error too. i think i'll just choose a port at random. [07:46] rogpeppe, cool, sounds good [07:46] bigjools, ouch :( [07:46] bigjools, still, progress :) [07:47] fwereade__: yeah, I can attack this now, I'll be back with you later! Although having said that when I did a check-seed on the node, the user-data still had env JUJU_MACHINE_ID="0" [07:47] bigjools, that should be there -- that's what it uses to poke the "machine 0 is already provisioned" data in [07:47] bigjools, but it needs an instance id as well [07:47] oh it's not the system_id then? [07:48] where is instance_id conveyed? [07:48] bigjools, nah, sorry: we have machine ids which are basically just ints, and instance ids which are provider-dependent [07:49] bigjools, instance_id is sent in through set_instance_id_accessor and I *think* it's only used in the `juju-admin initialize` script [07:49] oh from zk [07:49] I see it now, it's set [07:49] bigjools, cool, and it's a system_id? [07:49] ok let me fix cloud-init and then I can test this [07:49] it is :) [07:50] bigjools, sweet [07:50] bigjools, it's a public holiday for me today but I'm working the first half so I'll be around for a few hours more [07:50] bigjools, just ping me if you need anything [07:50] fwereade__: ah ok thanks, very much appreciated [07:50] bigjools, a pleasure :) [08:04] rogpeppe, btw, I had a thought over the weekend: one of the big problems with the hook package is its name [08:04] rogpeppe, because hooks themselves are really only very tangentially related to what it's doing [08:04] * rogpeppe always likes a good name change [08:05] rogpeppe, I'm starting to think that the best place for this code is cmd/server [08:05] rogpeppe, but there's probably an even better place I haven't though of yet [08:05] fwereade__: does this code actually need its own package in fact? [08:06] fwereade__: couldn't it just go into the unit agent package [08:06] rogpeppe, we don't have a unit agent package: you didn't want one :p [08:06] fwereade__: lol [08:06] fwereade__: well, then in the place that has that [08:07] rogpeppe, that's in cmd/jujud and I don't think that's the right place [08:07] fwereade__: no? [08:07] rogpeppe, I have a forthcoming cmd/server which is only connected to jujud in that a process invoked by jujud will happen to run the server [08:08] rogpeppe, and it's starting to seem that the server, the tool execution context, and the tool implementations themselves should probably all go in there [08:08] fwereade__: sorry, i think i lost the implication: cmd/server is a command? [08:09] rogpeppe, it may be that we want a main package/fun in cmd/jujuc, and then to stick it in cmd/jujuc/server [08:09] rogpeppe, it's not really, no, but it is a "command server" and it "serves" cmd/Commands [08:10] rogpeppe, ...but they're purely for use by jujuc, so jujuc/server may be clearer [08:10] fwereade__: i think that for our own sanity the subdirectories under cmd should all be main packages [08:10] but cmd/jujuc/server might work [08:10] rogpeppe, it seemed that if I tried to add cmd/jujuc/server when there wasn't any code in cmd/jujuc, go just ignored it [08:10] rogpeppe, is that expected or did I do something wrong? [08:11] fwereade__: ignored it when you did what? [08:11] rogpeppe, go didn't run the tests in cmd/jujuc/server when I put the server code in there with jujuc otherwise empty [08:11] fwereade__: if it does that it, it's a bug [08:11] rogpeppe, hm, that was when running go test .../cmd/... [08:12] fwereade__: it should still work. let me check. [08:12] rogpeppe, or more likely that I did something wrong :p [08:12] rogpeppe, but I was expecting at least a "you're stupid, I'm not doing that" message [08:13] fwereade__: it works for me. [08:13] rogpeppe, then I guess I did something stupid, cmd/jujuc/server it shall be (if that makes sense to you?) [08:14] fwereade__: yeah, that makes sense. it's the server side of the jujuc commands. [08:15] rogpeppe, and so that'll have Server, Context, and a whole bunch of things like LogCommand and RelationSetCommand [08:15] fwereade__: the only hesitation i have is you might want it to depend on stuff internal to jujud. [08:16] rogpeppe, go on... such as? [08:16] fwereade__: just a hunch [08:17] fwereade__: depends how closely the callback commands interact with the stuff in the unit agent. [08:18] fwereade__: i guess anything we need can go in an interface, and that should be fine. [08:18] rogpeppe, I'm not seeing it yet; I think that when I figure out precisely what is responsible for the socket things may be rearranged slightly [08:18] rogpeppe, but the only things I expect them to hit are log and state [08:19] rogpeppe, I'm not sure quite how the state will get there yet but I think that's a future consideration [08:19] fwereade__: if that's the case, it's a nice clean separation and +1 [08:19] rogpeppe, it's my intent anyway :) [08:19] fwereade__: well the server stuff is invoked by jujud, right? [08:19] fwereade__: (well, the unit agent within jujud) [08:19] rogpeppe, it will be but I don't yet know exactly how [08:19] fwereade__: ok [08:20] rogpeppe, and my own hunch says that we will start to want a unit agent package around the time it all starts to get hooked up [08:20] rogpeppe, incidentally, one nice effect of go I'm coming to appreciate: [08:21] fwereade__: maybe. i'm still thinking the agents are small enough they can live inside jujud. but we'll see. [08:21] rogpeppe, no-unused-imports means that just by opening a file and seeing a bunch of unrelated imports you detect a smell [08:21] fwereade__: yeah [08:21] rogpeppe, the unit agent is I think big enough that it'll feel wrong [08:22] rogpeppe, all the lifecycle and workflow and scheduler stuff basically [08:22] fwereade__: yeah, maybe you're right. [08:22] rogpeppe, the MA and the PA are probably compact enough they wouldn't feel bad really [08:22] * rogpeppe goes to see how many lines of code the python version is [08:23] rogpeppe, I could very well be wrong -- ATM the code run directly by the UA is smeared across juju.hook and juju.unit (in addition to all the state stuff etc) [08:23] rogpeppe, but perhaps it isn't actually *big* enough to warrant its own package and I'm just responding to the unclear factoring [08:24] fwereade__: yeah, that's quite a lot of code actually. [08:24] fwereade__: i'm wondering that with server and jujuc factored out the actual core unit agent code might be reasonably compact. [08:25] fwereade__: i.e. the core lifecycle, workflow and scheduler stuff. [08:25] * fwereade__ is cautiously optimistic [08:25] fwereade__: it *feels* compact in my head, but that's probably because i'm not familiar with it :-) [08:25] rogpeppe, it's fiddlier than it looks [08:26] rogpeppe, as I discovered when I thought "yeah, I'll pick up agent upstartification, how hard can it be?" [08:26] fwereade__: yeah. it's probably the fiddliest bit of the whole system, right? [08:26] rogpeppe, yeah, I think so [08:27] fwereade__: but i guess it's that bit which is really what makes juju juju. [08:27] rogpeppe, but *even then* I think it's that the unit agent itself is intrinsically fiddly, and so a jujud/unit subpackage might be just the ticket [08:28] rogpeppe, yeah, it's all about the agents :) [08:28] fwereade__: i was thinking its all about mapping juju state transitions to shell scripts... [08:29] rogpeppe, there are indeed many valid perspectives :) [08:35] fwereade__: a review for you, if you choose to accept it: https://codereview.appspot.com/5853048/ [08:35] :) [08:35] rogpeppe, I have a few from the other day [08:36] fwereade__: unfortunately it breaks the environs/ec2 amazon tests. but i think fixing that is for another review. [08:36] rogpeppe, I hope you like how hook/context turned out after discussing with niemeyer for a while [08:36] fwereade__: oh yeah, from friday. i'll have a look - i've been pointedly avoiding looking at my email this morning... [08:37] fwereade__: oh, i did see that you'd made some changes that i wasn't expecting [08:37] rogpeppe, as long as we don't end up *merging* broken stuff I'm fine with that :) [08:37] fwereade__: ExecInfo went away - i'm happy to see it, but i didn't see any discussion about it. [08:37] rogpeppe, the crucial insight is that this really is only very slightly related to hooks in the first place [08:37] fwereade__: was that your G+ conversation with gustavo? [08:37] rogpeppe, but it took me a while longer to think "maybe this shouldn't be in the "hook" package at all [08:38] rogpeppe, that was what crystallised it, yeah [08:38] fwereade__: cool. i was like "i thought i didn't manage to convince you, but you've gone and done it anyway... how did *that* happen?!" [08:38] rogpeppe, and it now makes me think that Context.ExecHook is what we'll need in the end but until it has a client I'm comfortable as it is [08:39] fwereade__: yeah, i'm happy how it looks now. [08:39] rogpeppe, the leap was too great for me to see while I was still thinking it was about hooks [08:40] rogpeppe, once you forget about hooks the rightness of your approach is clear [08:40] fwereade__: i still quite liked Exec and vars being methods on Context. [08:40] rogpeppe, if you're OK with that I'll gladly put them back on [08:40] fwereade__: yeah, i'm very happy with that. [08:41] fwereade__: they're tied closely enough to Context that i think they work well as methods on it. [08:42] fwereade__: and it's trivial to factor them out later if we want. [08:42] s/want/need/ [08:44] rogpeppe, I'm thinking that if I do that I will move them into cmd/jujuc/server as well, may as well start as I mean to go on [08:45] rogpeppe, at which point I think the methods actually become ExecHook and hookVars [08:46] fwereade__: doesn't Context move into jujuc/server too? [08:46] rogpeppe, yes, exactly [08:47] fwereade__: so they can still be Context.Exec and Context.vars if you like [08:47] rogpeppe, I'm not sure, I think they become an "alien" concept once it's under jujuc [08:48] fwereade__: hmm, i dunno. if they were appropriate as methods on Context before, i don't really see why that's changed when Context has moved. [08:49] rogpeppe, sorry: they're still context methods, but they should change their names to make it clear that they're about hooks (not the jujuc tools themselves, which will only be called as side effects if you like) [08:50] fwereade__: ok, that makes sense. [08:50] rogpeppe, cool [08:50] fwereade__: one thought: maybe "RunHook" rather than "ExecHook" [08:50] rogpeppe, perfect [08:57] fwereade__: http://pastebin.ubuntu.com/890372/ [08:57] fun! [09:04] bigjools, we're making progress though [09:04] slow! [09:04] bigjools, I think that just means that the resource-uri/system-id confusion ran deeper [09:05] bigjools, would a resource-uri be unique and immutable in the same way as system-id is? [09:06] bigjools, if so it is probably a more convenient representation and would allow you to forget about system-id entirely? [09:07] yes, resource_uri is just a URL with the system_id in there somewhere [09:07] bigjools, ok: that makes it sound like you can drop the notion of system-id entirely and just use resource-uri as instance_id throughout [09:08] bigjools, sorry poor advice before [09:08] I am seriously confused [09:08] bigjools, sorry, let me step back a mo [09:09] bigjools, a juju machine id is really entirely abstract -- it's a predictable way for us to refer to specific machines internally, regardless of whether or not they're actually provisioned [09:09] bigjools, so it's basically just an int [09:10] bigjools, we maintain a mapping between machine ids and provider-specific instance ids (I forget exactly how it's stored) [09:10] bigjools, and the provisioning agent keeps an eye on that mapping [09:11] ok so far [09:11] bigjools, and provisions new instances in response to seeing machine states which *aren't* yet associated to an instance [09:11] bigjools, once it's provisioned an instance for a juju machine, it sticks it in the mapping [09:11] ok [09:12] bigjools, I am not aware of any restrictions on the format of instance-id -- I don't think we ever try to parse them [09:12] bigjools, so the only relevant property of instance-id is that it affords a convenient way to talk to the provider about a specific instance [09:13] bigjools, system-id was that (or near enough) in the orchestra provider, which is why I suggested that it should be the case here [09:13] oh hmmm [09:14] not sure the checkout worked ok from cloud-init [09:14] bigjools, if you have enough information to construct a resource-uri given (1) a system-id and (2) the maas provider details [09:14] bzr: ERROR: A control directory already exists: "file:///usr/lib/juju/juju/". [09:15] bigjools, huh, not seen that, maybe it's just reacting to droppings from a previous attempt? [09:15] yes [09:15] I neglected to wipe properly [09:16] bigjools, anyway , if you *can* construct the uri given system-id then it might make sense to keep system-id, but I don't have a firm handle on whether or not that's actually a good idea [09:16] fwereade__: well this is how it was originally, right? [09:16] I was setting the machine_id as the resource_uri [09:17] bigjools, yeah but if it's not the best fit for the problem it should change [09:17] still confused tbh since I don't know what's going on in the depths [09:17] bigjools, the problem is that the MaaSMachine thinks system_id is the instance id, while other parts of the code think that resource_uri is [09:17] what is it doing with the machine_id later? [09:18] e_toomanyids [09:18] bigjools, machine id is I think a red herring here [09:18] haha [09:18] so what is: cloud_init.set_instance_id_accessor() doing? I thought it set machine_id? [09:19] bigjools, nope: instance_id [09:19] so its name has a clue :) [09:19] bigjools, the clue's in the name :p [09:19] when instance_id is looked up later, how is it used? [09:20] bigjools, give me a mo, double-checking [09:21] bigjools, it's only actually used by the provisioning agent AFAICT [09:22] bigjools, the only reason it intrudes on your consciousness at all is because we need to fake up initial state on bootstrap, to say "machine id 0 is already provisioned on instance id WHATEVER", and prevent the PA from trying to provision itself [09:24] bigjools, that is done by `juju-admin initialize` -- grep for that and you should see how set_instance_id_accessor is relevant [09:26] fwereade__: sorry, total PC lockup :/ [09:26] bigjools, np [09:26] I have a call in 4 minutes [09:26] bigjools, give me a mo, double-checking [09:26] bigjools, it's only actually used by the provisioning agent AFAICT [09:26] bigjools, the only reason it intrudes on your consciousness at all is because we need to fake up initial state on bootstrap, to say "machine id 0 is already provisioned on instance id WHATEVER", and prevent the PA from trying to provision itself [09:26] <-- bigjools has quit (Read error: Connection reset by peer) [09:26] bigjools, that is done by `juju-admin initialize` -- grep for that and you should see how set_instance_id_accessor is relevant [09:27] bigjools, I should still be around afterwards unless it's *really* long [09:27] 20 mins [09:27] bigjools, just grab me when you're free then :) [09:27] ok thanks [09:28] rog, thinking about your review [09:28] fwereade__: cool, thanks [09:29] rog, there are quite a lot of tests that start by Initializing a State [09:29] rog, and the required data for initialization will become more complicated [09:30] rog, so we will at some stage want a testing.InitializeState(addrs string) function, but maybe it's not justified yet [09:30] rog, OTOH when we do need it, if it already exists, it'll be just one place to change [09:30] fwereade__: i think i'd leave that until we need it [09:30] fwereade__: it's trivial to find occurrences and to add [09:30] fwereade__: in fact, won't Initialize need to take an addrs method? [09:31] s/method/argument/ [09:31] rog, it already does (implicitly, in the Info), I think [09:31] fwereade__: ah, so what would testing.InitializeState give us? [09:31] rog, but the eventual reuiqred args to Initialize will be more complicated than to Open [09:32] fwereade__: ok. what other stuff will it have? [09:32] rog, at the very least we need the instance id, to set up the state I've been talking to bigjools about [09:32] rog, and I'm 99% sure that we'll end up passing in the environment settings too, imminently [09:32] rog, like must-be-done-for-12.04-imminently [09:33] rog, maybe that's not too much to duplicate [09:33] rog, after all, dummy provider env settings are going to be basically empty [09:34] Initialize is only called in three places AFAICS. when the duplication becomes a burden we can factor it out. [09:34] fwereade__: for now, let's not add stuff that we don't need. [09:34] fwereade__: just one question after reading niemeyers comment to my last proposal: when i've got two pingers pinging the same node and i say one to kill its work, the second one will recreate the node, doesn't it? [09:34] rog, so it will probably be `(info *Info, instanceId, providerType string)` [09:35] TheMue, it should do, but 2 pingers on the same node is Doing It Wrong [09:35] TheMue: there should never be two pingers pinging the same node :-) [09:35] TheMue, what are you trying to accomplish? [09:37] rog, (yes indeed, it's not called for, ty for discussing :)) [09:40] rog, why a 3 minute timeout? [09:40] fwereade__: niemeyer found a problem with retrieving an instance of Agent() twice [09:41] TheMue, go on [09:41] fwereade__: you get two diffent instances then today [09:41] fwereade__: which is, when keeping a pinger inside, indeed isn't good [09:42] TheMue, yeah, makes sense; I thought you were taking the pinger out anyway? [09:42] fwereade__: on the other hand he suggested an api change to return a pinger with agent.StartPinger() [09:42] fwereade__: because it takes about 2 minutes to boot, and 3 minutes seemed long enough for the zk node to be inited after boot (maybe it's not and that's why my test is failing). the test harness fails after 6 minutes. [09:42] fwereade__: but here the problem stays the same [09:43] TheMue: there's no problem if the agent doesn't cache the pinger [09:43] TheMue: i think [09:43] rog: it's exact the same problem [09:43] TheMue: what's the problem? [09:43] rog: in both cases it's an illegal usage of the api [09:44] rog, I *think* that we have 2 interesting cases: on the instance, if any code is running before initialize is complete we Have A Problem [09:44] rog: if i create two agent instances or two pinger instance, both is wrong [09:44] TheMue: i don't think you can stop that. it's a distributed system. [09:44] rog, and if we're connecting from outside I think we want to wait forever and let the user interrupt us [09:44] rog: Pinger has the method Kill() [09:44] TheMue, why would you ever create 2 pingers for the same node anyway? [09:45] TheMue: that's fine. that's to kill that particular pinger [09:45] fwereade__: ask niemeyer why one would create two agent for the same unit anyway [09:46] fwereade__: ok. i added the timeout as an afterthought because my test was timing out after 6 minutes. but maybe that was correct, and i should just up the test harness timeout time. [09:46] TheMue, why is agent different to any other state class? you can have N state.Units referring to the same ZK state and that shouldn't be a problem [09:46] fwereade__: i only say that, if the one way is an error, that error won't move away by returning the pinger [09:46] fwereade__: so it should be with pinger too [09:46] rog, I'm not *sure* that my analysis is correct, give it a bit of a mental kicking [09:46] TheMue, it's always possible to write code that does the wrong thing [09:47] TheMue: returning the pinger seems good to me. it means that the Agent doesn't need to keep track of that state - it's less code and no less correct IMHO [09:47] TheMue, in practice the unit agent process will call StartPinger once and only once, and that's it [09:47] TheMue, and the agent process itself will decide when it needs a Stop/Kill [09:47] fwereade__: how long does the bootstrap node take to come up and be usable, usually? [09:47] fwereade__: so why return the pinger? [09:48] rog, I've never actually measured it [09:48] [09:47] TheMue: returning the pinger seems good to me. it means that the Agent doesn't need to keep track of that state - it's less code and no less correct IMHO [09:48] fwereade__: maybe i'll take the timeout out again. [09:49] rog, it may be there's some case I missed [09:49] TheMue, what rog said :) [09:50] fwereade__: no, i think you're right. i guess i thought that three minutes waiting after zk connect *should* be fine. surely we don't take that long to start up the juju init command after starting zookeeperd? [09:50] TheMue, it may be we have some disconnect on how we expect state.Agent to be used? [09:50] rog: i only have the poor maintainer, new to the code, in 2 years in my eyes. asking state to give a unit, asking unit to give an agent, asking agent to start a pinger (why a pinger, i'm only interested to signal that the agent is alive, so what does a pinger has to do with it?) and then keep the pinger [09:51] TheMue: that's what a pinger *does*. [09:51] rog, that sounds right [09:51] TheMue: (i wasn't happy with the name "Pinger" (i preferred "Occupy" and "Occupied" but gustavo's choice) [09:52] ) [09:52] rog: it's a technological description how it works. but when i drive a car i'm not interested on how the motor works, i wonna drive a car from a to b [09:52] TheMue: i know that [09:52] TheMue: but that's a debate to have about Pinger, i think. [09:52] rog: my intention is to hide HOW we do something but to tell WHY we do it [09:52] TheMue, yeah, I liked Occupy too [09:53] TheMue: if StartPinger was called "Occupy", would you be happier? [09:54] rog: the pinger is a fine tool, i only have the opinion that i have to keep the too inside to provide a clean api regarding agent (and later anything else) for the user of this api [09:54] TheMue: the pinger is *the* tool for detecting and signalling agent occupation [09:54] rog, TheMue: `RegisterPresence() (*presence.Pinger, error)`? [09:54] rog: yes, this way it makes more sense [09:55] fwereade__: yeah, that would be fine for me. [09:55] fwereade__: i still wouldn't return the pinger. i would hide it. [09:55] TheMue: why hide it? [09:55] [09:54] TheMue: the pinger is *the* tool for detecting and signalling agent occupation [09:56] fwereade__: ok so I'm free now [09:56] TheMue: we've built this abstraction, why not use it as is? [09:56] rog: in this case the name isn't optimal [09:56] TheMue: otherwise perhaps we should build it slightly differently, so we *can* use it as is. [09:56] TheMue, the trouble is that it ends up making state.Agent unique among state.FOOs in that it's not something you can reconstruct safely from a fresh state with nothing but keys [09:57] bigjools, where were we? was I making sense? ;) [09:57] fwereade__: unfortunately not :) [09:57] TheMue: i don't think we should get hung up on the name. [09:57] fwereade__: that's why i wanted to embed it. btw, now the pinger (or better the AgentOccupier) is special too. [09:57] but I need to re-establish my test env, so I'll be a few mins [09:58] bigjools, heh, ok: did I ever start making sense, or was there a specific point where I started babbling crackfully? [09:58] TheMue, pinger is not just for agents [09:58] TheMue: i don't see that hiding it gains anything. [09:58] fwereade__: it's not you, more that I don't really understand what's going on inside juju when it deploys stuff [09:59] rog, I think that hiding it keeps the name out of the way, and the name exposes the implementation too much for comfort [09:59] fwereade__: that's ok, so i understood it first. that's why i wanted to encapsulate it for agent, so that the agent api is clear [10:00] bigjools, ultra-high-level sketch: [10:00] fwereade__: because it's called "Pinger" rather than "Occupier"? [10:01] bigjools, the user makes changes to an "ideal" state stored in ZK and the PA starts/kills machines in response to changes in the ideal state [10:01] rog, exactly (or some other name, whatever ;)) [10:01] i do think that "Pinger" is an unfortunate name because it implies polling, and we might use some other technique in the future. but... [10:01] bigjools, that's the steady state and it's pretty simple really (devil in details ofc) [10:01] i think that that package is exactly the right place for the thing returned from an Agent. [10:02] bigjools, the ugliness comes at bootstrap time [10:02] rog, agreed [10:02] TheMue, fwereade__: if we're writing more code just to hide a name that we've only just invented, let's just change the name! [10:03] TheMue: but you can be the one to persuade gustavo :-) [10:03] bigjools, the PA is responsible for making sure that the machines which should exist do exist; and machine 0 is just another part of the environment, we don't want to have to treat it specially [10:04] rog: simply changing the name if it's still a multipurpose tool isn't it [10:04] rog: i'm talking about encapsulation and api design [10:04] bigjools, so before we let the PA look at state, we prime the state such that it sees "machine 0 is meant to be provisioned... and, hey, it already is" [10:05] TheMue: it's a multipurpose tool that is designed for signalling presence on whatever underlying storage system we're using. that's *exactly* what the agent presence stuff is about. [10:05] bigjools, doing so involves storing the instance id and the machine id together [10:05] bigjools, hence the requirement for instance_id at bootstrap time [10:05] TheMue: so it seems perfect that it's that that's returned from Agent. [10:05] bigjools, instance id should in all other circumstances purely be an internal detail [10:05] TheMue: we're adding more layers of abstraction "just in case", but YAGNI! [10:06] rog, TheMue: strongly agree that it's not up to state.Agent to stop the pinger [10:06] bigjools, but sadly you need to deal with it at bootstrap time [10:07] fwereade__: indeed not, it's up to the user of agent (he has got his pinger from agent) to also use it to signal "hey, it's me, the agent, i'm stopping". [10:07] fwereade__: the traceback is from a "deploy" though [10:08] bigjools: huh, sorry, let me reread [10:08] TheMue: that sounds right to me [10:08] bigjools, right, sorry: the trouble is that you're still using 2 different notions of instance_id [10:08] TheMue: pinger := unit.Agent().StartPinger(); .... pinger.Kill() [10:08] bigjools, either always use system_id, or always use resource_uri [10:09] fwereade__: where am I using those? [10:09] rog: so to me an api like agentAPI.SignalWork() and agentAPI.SignalEndOfWork() sounds more natural [10:09] just launch.py [10:09] ? [10:09] TheMue: that makes the agentAPI stateful, which it doesn't need to be [10:09] bigjools, (1) MaaSMachine turns system_id into MaaSMachine.instance_id [10:09] TheMue: the state can live in the pinger. [10:09] bigjools, (2) the provider takes instance_ids in some methods [10:10] rog: the pinger IS stateful, and the pinger IS part of the state api today [10:10] bigjools, (3) you also need to set one at bootstrap time [10:10] bigjools, I think that's it [10:10] TheMue: there are two places that are stateful: the underlying zk tree, and the local pinger state. you'd be adding a third. [10:10] bigjools, (4) it'll be *stored* using a juju.state.machine.MachineState but you should be 100% insulated from that detail [10:11] rog: i don't add one, i only hide the already existing one to the user f the agent api [10:11] fwereade__: is this provider api documented anywhere? are there instructions on adding providers? [10:11] TheMue: you do add one. you store whether the pinger has already started so agentAPI.SignalWork can return an error if so. [10:11] rog: instead of keeping an instance of the pinger he kann also keen an instance of the agent, it's the same [10:12] because so far I scratched around [10:12] and clearly my understanding of things isn't right [10:12] rog: so where's the difference? [10:12] bigjools, only the stuff in juju.providers.common.base; I'm sorry if it's unhelpful :( [10:12] TheMue, but why make state.Agent special in this way? [10:13] fwereade__: not entirely unhelpful, but a bit lacking :) [10:13] TheMue: it's not needed. and there's nothing stopping you (possibly on another machine) creating another State and another Agent and (erroneously) starting another pinger on that. [10:13] TheMue, every other state.Foo is safe to grab a fresh instance of at any time [10:13] TheMue: just checking local usage isn't sufficient. [10:13] TheMue, making state.Agent special even in this one circumstance feels like a bad move [10:14] fwereade__: I'm really sorry, I am still struggling to get my head around all this ID stuff. It's massively confusing :( [10:14] bigjools, I definitely remember it was a pain for a while :( [10:14] not sure where to go from here [10:14] rog: so nothing stops you to create a new unit, get a new agent, create a second pinger. it's the same mistake, the same wrong usage. as you told yourself above. [10:15] TheMue: sure. [10:15] fwereade__: pinger is alsready special [10:15] TheMue, then why allow the specialness to affect Agent? [10:15] bigjools, ok, mechanical solution [10:16] bigjools, grep the maas provider for uses of instance_id [10:16] ok [10:16] bigjools, ensure that they all come from consistent sources -- they should all either be coming from a system_id, or from a resource_uri [10:17] bigjools, it seems most of the provider was written to expect resource_uri [10:17] fwereade__: as told above, only to keep together what belongs together. and not to tell a "start this" but later b "stop this". instead i wonna have it bundled a clean way a "start this" and later a "stop this" [10:18] TheMue, that's exactly what Pinger offers [10:19] fwereade__: no, the design after the last review shall be Agent.Start and Pinger.Stop [10:19] fwereade__: that's pain to a maintainer later who is not today invoked in the design process [10:19] TheMue, it's pinger := agent.StartPinger() and pinger.Stop(), which doesn't apear to me to be unclear [10:20] TheMue, were it agent.Start and pinger.Stop I'd agree [10:20] bigjools, does that help at all? [10:21] bigjools, it *should* just be s/system_id/resource_uri/ in maas.machine, and maas.launch [10:21] fwereade__: it's StartPinger(), indeed, i only shortened it [10:21] fwereade__: ok let me digest that [10:22] TheMue, that may be the problem, it's not actually starting the agent at all [10:22] fwereade__: but where is the problem with agent.Occupy() and agent.Release()? [10:22] TheMue, I have *no* problem with it, that was proposed [10:22] TheMue, niemeyer didn't like it [10:22] TheMue, I think he may have suspected me of political activism [10:22] TheMue, OCCUPY JUJU [10:23] fwereade__: hehe [10:23] TheMue, ah wait sorry [10:23] fwereade__: but i also proposed different names. [10:23] TheMue, I *do* have a problem with agent.occupy/release [10:24] TheMue, but I've tried to explain it -- the mismatch with the other state types -- and I think I'm not communicating it well [10:24] fwereade__: I don't understand what you mean by " ensure that they all come from consistent sources" [10:24] TheMue, or possibly you disagree? [10:24] fwereade__: who will start the pinger and what is his intention behind it? [10:24] bigjools, 1 sec [10:24] fwereade__, TheMue: for me, it's just less code and easier to get correct if you return the pinger. [10:25] which is usually swings the argument for me. [10:25] TheMue, the agent process will start it, and the agent process will know how it means to stop and if/how it should kill the pinger [10:25] TheMue, I don;t think stop/kill is a sensible or meaningful distinction on the *agent* itself [10:26] rog: and once we chage to a different method every code which isn't interested in a pinger but in a working agent functionality has to be changed? [10:26] TheMue, I'd even say that the only reaosn to put StartPinger on there is so that we can keep the knowledge of the actual agent presence path entirely internal [10:26] fwereade__: but WHY does the agent start the pinger? what's his intention? [10:26] TheMue: what do you mean by "a different method"? [10:27] TheMue, to signal presence... is that a trick question? ;) [10:27] fwereade__: ah, so a better name than Occupy() may be SignalPresence()? [10:27] TheMue, I suggested something like that above [10:27] fwereade__: exactly ;) just a rhetoric way [10:27] TheMue, RegisterPresence maybe? [10:27] [09:54] rog, TheMue: `RegisterPresence() (*presence.Pinger, error)`? [10:27] :-) [10:27] TheMue, I still feel that returning a PresenceRegisterer (eww) is the right thing [10:28] TheMue, for now we have just one and we call it a Pinger [10:28] fwereade__: yeah. [10:28] fwereade__: seems ok to me. [10:28] TheMue, it exposes an implementation detail but not gratuitously [10:28] rog: yep, only wanted to come back after Occupy() [10:29] TheMue, we could handle it as a KillerStopper interface if we had any reason to worry about implementation changes [10:29] fwereade__: it's a pity it's called Pinger, but if you half-close your eyes and imagine it's called something less gratuitously implementation-specific, i think it works. [10:29] fwereade__: i'm no friend of exposing implementation details [10:29] TheMue, but for now, an extra interface is gratuitous [10:30] fwereade__: i prefer a symetric handling where i can also tell agent to DeregisterPresence() regardless of the used underlying technology [10:30] we've designed an interface (presence) that is specifically about agent occupation. if we feel we have to hide it because it's "too implementation-specific" then we've got it wrong. [10:30] TheMue, I may have to bow out and suggest you propose a name change to niemeyer; you're in a better position to advocate for it than I am, you're an actual client [10:30] bigjools, sorry :) [10:31] rog: above you told that it's not only for agent [10:31] bigjools, I *think* there are only 2 places where instance_ids enter the system [10:31] TheMue: that was fwereade__ i think [10:32] fwereade__: I think the basic problem is that it's unclear as to what exactly each provider's API is in terms of what data goes in and out [10:32] bigjools, one of them is in MaaSMachine, where it passes d["system_id"] up to ProviderMachine.__init__ [10:32] TheMue: for now, why don't we go with the KISS approach? we're using a statically typed language - this stuff can easily be changed in the future. [10:33] so your s/system_id/resource_uri/ still doesn't make sense to me. :( [10:33] bigjools, the other one is CloudInit.set_instance_id_accessor [10:33] TheMue: (if we feel it's necessary) [10:33] TheMue: it's not like we'll have lots of clients using the Agent type. [10:33] bigjools, when the PA is looking at machines it looks them up by machine id and gets back data that includes the instance id [10:34] bigjools, to interact with the provider, it uses the instance id [10:34] bigjools, the only places that information enters the system are (1) set_instance_id_accessor and (2) MaaSMachine.__init__ [10:35] bigjools, and that information is passed back into provider methods that take instance ids (or possibly MaaSMachines, which themselves have instance_ids set) [10:35] bigjools, so whatever you set in .machine and .launch will get passed back into the provider as instance_id [10:36] bigjools, oh, hold on, I need to look more closely at the maas provider [10:36] rog: maybe you're right. but i only have my experiences in maintaining legacy code, also in static languages. and also juju once will be legacy. [10:37] fwereade__: holding [10:37] bigjools, ha: juju/providers/maas/tests/test_maas.py|64| # TODO: Add test for get_nodes with a system_id parameter. [10:37] :/ [10:38] rog: but indeed, maybe it isn't woth it, maybe Unit and Machine should use pinger directly instead of an extra Agent. the old mixin approach leaded into a wrong direction. [10:39] TheMue: that was my thought originally, which is why i suggested you put this branch on hold for a while. but still, i think it does add some useful encapsulation, so it's still useful. [10:40] bigjools, ok, I was confused (er, and may still be) [10:40] bigjools, correct me if I'm wrong [10:41] bigjools, scratch that [10:41] bigjools, look at MaaSClient [10:41] bigjools, get_nodes expects system ids [10:41] [08:37] rogpeppe, as long as we don't end up *merging* broken stuff I'm fine with that :) [10:41] bigjools, start, stop, release expect resource uris [10:41] fwereade__: in this case, the broken stuff is already merged... [10:42] bigjools, juju is passing its idea of "instance_id" in in both those cases [10:42] fwereade__: (i just had a test time out after 20 minutes - i'm sure the zk should have been initialised within that time scale) [10:43] rog, yeah, my instinct says that the Right Thing to do if not inited is context dependent [10:43] rog, agents should just throw a hissy fit [10:43] fwereade__: yeah, i think you're right. [10:43] fwereade__: hummm [10:43] fwereade__: hmm, actually maybe that means that agents shouldn't wait at all [10:44] rog, command line tools may want to time out sensibly, or just wait for ^C [10:44] fwereade__: maybe that means we should have Open and WaitOpen [10:44] rog, I think that may be the case [10:44] rog, +0.5, thinking [10:44] fwereade__: and perhaps WaitOpen could have a timeout duration argument [10:45] rog, for now I'd leave it; not hard to add once we need it [10:45] fwereade__: k [10:45] rog, re broken stuff, I think I missed the import [10:46] fwereade__: ? [10:46] rog, if it makes things *worse* we should merge it at the same time as some other branch that also fixes it [10:46] rog, if it is a step towards making a broken thing less broken it's fine [10:47] fwereade__: it *exposes* some existing brokenness. [10:47] rog, (er, "import" in the sense of "importance") [10:47] lol [10:47] fwereade__: i.e. it seems that the existing bootstrap stuff isn't actually succeeding in getting juju init to be called. [10:48] rog, in that case I would prefer that it not be merged without another branch that either hides or fixes the brokenness [10:48] fwereade__: which i wasn't testing for before [10:48] rog, Initialize doesn't work itself, so this is all somewhat moot [10:48] fwereade__: given that amazon tests weren't working until one merge ago, i don't think it's too bad to regress for a small while. [10:49] fwereade__: (i doubt anyone besides myself has actually run that test :-]) [10:49] fwereade__: why moot? [10:49] fwereade__: we're interacting with the python backend for the time being. [10:50] rog, ha true, forget I said anything [10:50] rog, I'll leave it to your judgment then :) [10:52] bigjools, how painful will it be to fix that? [10:52] fwereade__: I can't deal with this. I am tired and frustrated and about to pull all my hair out. I am going to punt, but I don't know where [10:53] fwereade__: I can see there might be a mismatch of ID types but none of them make any sense to me still [10:53] bigjools, you *should* only have to think about one at a time [10:53] fwereade__: this is one situation where I need to be in a room with someone [10:53] bigjools, if it's not instance_id, you can ignore it [10:54] bigjools, if it *is* instance_id, it needs either to *always* mean system_id, or *always* mean resource_uri [10:54] fwereade__, bigjools: maybe this is a time that G+ hangouts with extras (i.e. screen sharing) could be useful? [10:55] bigjools, I misled you by asking you to fiddle with the input data (which did need to be done) while neglecting to notice the deeper internal mismatch in MaaSClient [10:55] quite possibly, but it's almost 9pm here and I am out of brain juice :( [10:56] bigjools, let me think a mo [10:56] bigjools, is the maas api documented at all? [10:57] yes, I need to find a URL for you [10:58] fwereade__: http://people.canonical.com/~gavin/docs/lp:maas/api.html [10:58] it needs regenerating, slightly out of date [10:59] bigjools, if you get me that and point me to the tree you're working from I will take a quick look at it and see if I can either tell you what you need to do (or do a sketch of it myself if it's trivial enough) [10:59] fwereade__: I've not made any significant changes yet, still trying to work things out [10:59] bigjools, without an actual maas provider to test against it will be just a sketch though I think [11:00] fwereade__: the api you can infer from the calls it's making in maas.py. It's restful. [11:01] bigjools, cool, I will see what I can do :) [11:01] which is why we need to use a combination of resource_uri and system_id :/ [11:01] (e.g. get_nodes takes system_ids because there's no resource_uri as such) [11:01] so I can see where that needs fixing up [11:01] bigjools, hr'm, I see the problem [11:02] bigjools, it sounds like there's not much that can be done on the juju side until that's in place [11:02] bigjools, bah [11:02] indeed === TheMue_ is now known as TheMue [11:02] blast, I put some toast on 10 mins ago, just a mo [11:03] fwereade__: I need to rest, migraine coming. I'll hand this over to Gavin as he's working with me here [11:06] bigjools, sorry I wasn't more help :( [11:06] bigjools, tell him to ping me if he needs [11:18] fwereade__: i decided to add WaitOpen, since you mentioned it in your review :-) [11:48] rog, cool [11:48] rog, https://codereview.appspot.com/5832045 [11:59] rog, TheMue, that's me for the day (public holiday ) [11:59] fwereade__: LGTM [11:59] fwereade__: cool, have fun! [12:09] fwereade__: enjoy [12:49] Good mornings! [12:59] niemeyer: yo! [13:03] niemeyer: moin [13:10] TheMue: there's a review for you, BTW: https://codereview.appspot.com/5853048/ [13:10] *click* [13:12] lc [13:17] rog: lgtm, only one minor note [13:18] TheMue: hmm, i don't know about the readability - i read '2e9' as 2 seconds. but i can't do 0.2 * time.Second, unfortunately. [13:19] TheMue: but i'll change it anyway [13:20] rog: yeah, it sadly only wants ints. maybe time.Second/5 ;) [13:21] :-) [13:31] rog: We went over that before, IIRC.. I don't think we need Open and WaitOpen(..., timeout) [13:32] niemeyer: i had a discussion with william this morning [13:32] niemeyer: he thought we did... [13:32] niemeyer: so i changed it to be like that [13:32] rog: Why? [13:32] * rog goes to copy the code review comment he's making [13:32] that's what i had until this morning's discussion with william. we decided that the length of time to wait was quite consumer-dependent. in particular agents shouldn't wait at all, because they should never be started before zk is initialised. on the other hand there's not necessarily any right answer to how long to wait after bootup (i had it at 3 minutes, but william queried it, rightly i think). [13:33] niemeyer: our discussion is here (http://irclogs.ubuntu.com/2012/03/19/%23juju-dev.html) from 0940 [13:34] rog: Ok, I don't think that's the case.. [13:34] niemeyer: ok, i can easily revert. [13:34] rog: We can start agents in the same machine, or even in distributed machines, and those should wait [13:34] rog: The initialization mechanism is there precisely to avoid silly races [13:35] niemeyer: when would we ever start an agent before calling juju-init ? [13:35] rog: Between starting ZooKeeper and it having its data initialized [13:35] niemeyer: why would we do that? [13:35] rog: ZooKeeper is started by init [13:35] rog: At package installation time [13:35] niemeyer: but juju-init isn't called then, is it? [13:35] rog: Maybe not, but still, there's no point [13:36] niemeyer: i think it's trivial to assure that juju-init is *always* called before any agent is started [13:36] niemeyer: and then the timeout serves no purpose [13:36] rog: So why are you adding it? [13:36] niemeyer: for the client code. [13:37] rog: Yeah, so it is useful.. what you're saying is that there are many cases where we won't have a race [13:37] niemeyer: yes. [13:37] rog: what I'm saying is that it's pointless to consciously remove a race protection because you know that in some cases you don' t have it [13:37] rog: It's like dropping a mutex because you know that in some cases you're calling the function serially [13:38] niemeyer: we're still protected - the function will return with an error. [13:38] rog: Ok.. what's the benefit of removing the race protection? [13:38] rog: That's not protection.. that's a crash [13:38] niemeyer: it means that if we *do* fail to juju-init first, we'll fail early [13:39] rog: Sure, so let's not fail it! [13:39] niemeyer: it'll fail anyway, just 2 or 3 or 10 minutes later. [13:40] rog: I'll repeat: there's no point in preventing a mechanism that avoids a race. [13:40] rog: If juju-init doesn't run, it's a serious bug, and we'll know about it 3 minutes later. [13:40] niemeyer: ok. what's a good timeout, BTW. [13:40] ? [13:41] rog: I'd rather have a single function, that is protected by default. [13:41] rog: 3 minutes is fine for me. [13:41] niemeyer: ok. as i said, that's what i had before - i was just trying to represent fwereade__'s side of the argument... [13:41] rog: There's no point in us thinking "Oh, do I have a race here? Do I use Open or WaitOpen? How long do I use on WaitOpen?" on *every single call* of Open. [13:42] niemeyer: i had the same thought about DNSName... [13:42] rog: It's the exact opposite case. [13:43] rog: Open should generally complete in seconds under normally working circumstances. [13:43] rog: Because whoever is starting zookeeper should initialize it shortly afterwards. [13:43] rog: We're just closing that window. [13:43] niemeyer: ok. [13:48] niemeyer: PTAL [13:52] rog: Checking [14:04] rog: Review sent [14:11] niemeyer: what do you think about time.Sleep(0.2e9) vs time.Sleep(200 * time.Millisecond) ? [14:12] rog: Hmm.. good question [14:13] rog: The latter might be friendlier to newcomers. I grew used to the 0.2e9 or 2e8 notation, so I can quickly see what it means, but maybe it'd be wise to make it more readable.. do you have an opinion? [14:14] niemeyer: i also find it easy to read, but TheMue suggested less terse notation was more readable. [14:14] niemeyer: when it's used with time.Sleep i think the raw number is fine [14:15] niemeyer: with other calls, it's probably best to use explicit duration constants [14:15] rog: That'd be a strange line to draw.. [14:15] niemeyer: it's just a pity you can't use 0.2 * time.Second :-) [14:15] niemeyer: you're probably right [14:16] rog: Yeah, it's a bit unfortunate indeed.. it should work as long as the division results in an integer [14:17] niemeyer: i don't think it does, because a floating point constant can't multiply a non-ideal constant [14:18] niemeyer: (perhaps that should be allowed in fact) [14:18] rog: It really should [14:20] rog: tried time.Second/5? [14:20] TheMue: i'd prefer not to do that. [14:21] TheMue: 200 * time.Milliseconds is better [14:21] rog: especially it reads even worth. [14:21] rog: yep, definitely [14:27] niemeyer: done [14:28] rog: LGTM [14:28] niemeyer: thanks [14:29] rog: I hope you like the result as well [14:29] niemeyer: yeah, i was a little bit unhappy about adding WaitOpen earlier, but i was persuaded it was a good idea. you persuaded me back :-) [14:29] rog: Hehe :) [14:30] niemeyer: and i was thinking about putting the initialize code in open anyway; it was a coin toss. [14:31] niemeyer: BTW this has exposed a bug in the environs/ec2 code - the juju init code never gets run, it seems. so the amazon test is currently broken. [14:31] rog: Hmm [14:31] rog: I suppose that's due to the lack of running the initialization code via cloud-init? [14:31] niemeyer: i'm guessing so [14:32] rog: Cool, that should be easy to fix [14:32] niemeyer: my next branch hooks juju logging up to gocheck.C [14:32] niemeyer: then i'll get into the init code. [14:32] Ah, cool [14:36] niemeyer: after some discussion about the Agent, its usage ony in Unit and Machine and the role of presence and the Pinger i today would say there's no need for the Agent type anymore. instead Unit and Machine should use presence directly. [14:40] TheMue: Hmm, that might be a nice improvement indeed.. have you tried it out to see how it looks yet? [14:41] niemeyer: i wrote some comments after your review and discussed them with rog and fwereade__ [14:43] niemeyer: i'm still not really happy with "StartPinger()". to me it doesn't really says something about the semantics behind it. [14:43] niemeyer: it's more a description on "how" something is done (by activily pinging a node). [14:43] TheMue: The semantics is that it "starts a pinger" for the respective agent. [14:43] TheMue: This feels a lot more clear in that regard than "Connect" [14:44] niemeyer: i would prefer something like "SignalPresence()", because it signals the presence of the agent [14:44] niemeyer: yeah, Connect() has been a bad choice, indeed [14:45] * rog is a bit wistful for "Occupy" :-) [14:45] TheMue: we already have presence.StartPinger.. let's please stick to the same terminology. [14:46] niemeyer: a review for you. https://codereview.appspot.com/5841067 [14:46] niemeyer: as i say, i'm not entirely convinced it's right. [14:46] niemeyer: i'm off for lunch. [14:46] rog: Enjoy [14:49] niemeyer: IMHO it's states task to build a higher abstraction on the state model. so StartPinger() in the context of presence ("Is node X present?") ok, but from states view I woild expect a different naming. [14:50] TheMue: If I see pinger, err := unit.StartPinger(), I know what that means. [14:51] TheMue: Why introducing another term for the same concept? [14:52] niemeyer: you, yes, because you're currently invoked in the implementation. how about the poor maintainer, new to the team in two years from now? asking himself whiy he has to call "StartPinger()" to signal, that the agent is now alive? [14:56] TheMue: Yeah, maybe.. [14:57] TheMue: rog: was there any progress made on you being able to access allhands.canonical.com? Or should I nag HR ;) [14:58] TheMue: StartAlivePinger? [14:58] robbiew: Esther already talked to me. She passed it to IS. [14:58] TheMue: ack, thx [14:59] niemeyer: Yeah, this way it gets more clear. [14:59] TheMue: SetAgentAlive might be even better [15:03] niemeyer: So we wout have "AgentAlive()", "WaitAgentAlive()" and "SetAgentAlive()" directly on "Unit" and "Machine", "Set…" is returning a "Pinger". [15:03] niemeyer: Sounds like a plan. ;) [15:04] TheMue: Cool, +1 [16:05] Lunch, biab [16:13] robbiew: just tried it and it worked this time! [16:13] rog: awesome [16:14] let me know when you've submitted, so I can sign..then you can countersign and start the review ;) [16:24] robbiew: done [16:25] rog: ..and right back at 'cha ;) [16:25] thnx [17:13] hazmat: Let's continue the charm URL discussion after that call [17:13] niemeyer, sounds good [18:14] niemeyer: any hints for debugging a machine that i can't ssh to? (i suspect i'm that's a symptom of the problem) [18:14] niemeyer: my userdata looks like this: http://paste.ubuntu.com/891013/ [18:14] rog: EC2? [18:14] niemeyer: yeah [18:15] rog: ec2-get-console-output may help [18:15] niemeyer: ah, didn't know about that [18:15] rog: Note it's not synchronous.. it make take a moment for the output to be visible [18:16] rog: Are those broken lines supported by cloud-inti? [18:17] niemeyer: i suspected those - they're valid YAML, and i'm not sure how i can tell yaml.Marshal not to produce them. [18:17] niemeyer: i didn't know you could do that actually [18:18] niemeyer: so, next agent try plus some cleanup with https://codereview.appspot.com/5782053 [18:18] rog: it may well be fine.. I recall broken lines on Python too [18:24] niemeyer: hmm, the console output is useful, but i can't see any mention of cloudinit in there unfortunately: http://paste.ubuntu.com/891026/ [18:24] niemeyer: oh, cloud-init [18:25] niemeyer: none of those key fingerprints matches the one sent, BTW [18:29] TheMue: i preferred it when the code wasn't duplicated. unit.Agent().Alive() reads well to me. [18:30] rog: see discussion above with niemeyer [18:30] TheMue: yeah, i saw that, but i didn't quite realise the implications. i think that the code duplications in unnecessary. [18:30] s/duplications in/duplication is/ [18:32] TheMue: Review delivered [18:32] rog: The duplication is certainly unnecessary, but it's trivial to avoid it without introducing a new type [18:32] niemeyer: yup, notification just received [18:33] * TheMue feels reminded for the first proposal, with three helper functions ... [18:33] niemeyer: what method would you use? [18:34] niemeyer: just define a helper function? [18:34] rog: Yeah.. it's just a simple function used twice [18:34] niemeyer: LOL [18:34] niemeyer: AFAICS a new Agent type (with three methods, Alive, SetAlive and WaitAlive) would fit well here. [18:34] niemeyer: that has been the first proposal [18:35] niemeyer: then Unit and Machine both get Agent() *Agent methods [18:35] rog: It would, but TheMue seems to prefer this approach, and I have no reason to disagree with him [18:35] niemeyer: ok. less code, but fair enough. [18:36] rog: Disagree.. the amount of code in his implementation is fairly minimal.. adding a type wouldn't save much, if anything at all [18:37] TheMue: Sorry about the back and forth.. I don't know about the whole history, and either way would be fine for me. Having two 30-line functions with the exact same implementation sounds unnecessary, though [18:39] niemeyer: 16 lines eventually and three less methods. well, i know it's not much, but every little helps :-) [18:40] rog: There are at least three methods for the implementation of Agent, the declaration of Agent, and the two methods for returning the agent [18:40] Besides another file, and imports [18:41] rog: It's not less code, and even if it was, the current implementation is on the trivial side. [18:41] niemeyer: util.go :-) [18:41] rog: No. Thanks. [18:42] TheMue: It's cool to move on with your approach, just remove the dups please [18:42] niemeyer: so back to the first approach with three funcs? used then in the methods of unit and machine? [18:42] TheMue: No.. [18:43] TheMue: WaitAgentAlive is the only function that has logic duplication we just need a single function with that logic for reuse [18:43] s/duplication we/duplication. We/ [18:43] niemeyer: in an own file named agent.go or in util.go? [18:44] niemeyer: and just to remove the dup the reader later has to look where he finds this logic while the other func are implemented directly using presence. [18:44] TheMue: A file for a single function would be too much.. it's fine to put it below one of the implementations, and hook it on the next one [18:45] for those few locs [18:47] niemeyer: i'm off for the evening. will be probably be mostly out of contact tomorrow afternoon as i travel down to the london Go meetup (i'll be working on the train, but the reception is patchy) [18:47] i'll do so, just to stop the discussions. [18:48] rog: i would have liked to participate at that meeting. [18:48] robut the trip is too expensive. [18:48] eh, rog, but ... [18:48] TheMue: it's quite expensive for me to go down but i decided that i couldn't miss it [18:48] rog: now i'm giving a talk about go at the gtug in bremen in april [18:48] rog: Cool, we'll certainly talk before that, but have there no matter what [18:49] have fun there [18:49] niemeyer: hopefully it'll be interesting. [18:49] rog: I bet it will! [18:49] rog: i think so, there are some interesting people [18:49] right, i smell food! [18:50] rog: Mmmm.. that' dbe nice :) [18:56] niemeyer: one wish: presence gets a function "WaitAlive(zkConn, path, timeout) error" and that can be used in both agent methods (and anywhere else) [18:57] niemeyer: so it's a one-liner there too [18:57] TheMue: Sounds good.. let's mutate the current function in that fashion then, since that's precisely the use case we need it for [18:58] TheMue: Just push that in a different branch, please [18:59] niemeyer: so a new branch with the func and when it's submitted merge it into my agent branch and use it there? [19:00] TheMue: Right.. you can just merge from trunk once it's in [19:00] TheMue: Hopefully we can do the whole thing with a very quick turnaround [19:00] niemeyer: ok, will do so tomorrow, its late here. should be easy. [19:00] TheMue: Sounds good, thanks [19:00] TheMue: The idea of tweaking presence is a good one [19:01] niemeyer: thx [19:01] niemeyer: i'm off, bye [19:01] TheMue: Have a good evening [19:07] fwereade__: So, you've decided to drop it all for now? [19:16] fwereade_: So, you've decided to drop it all for now? [20:27] hazmat: ping [20:38] hazmat: I'm up for that conversation about charm URLs, when you're ready [20:49] niemeyer, cool [20:49] niemeyer, i'm ready just picked up ethan from day care [20:50] hazmat: Ok.. we can even go here I guess [20:50] niemeyer, sounds good [20:50] so wrt to it urls.. their of the form scheme:~user/series/name [20:50] hazmat: The charm URL needs to work for deployments.. it's already supported in the command line, but not taken into account for the actual deployment with the backend [20:50] hazmat: What's the issue you were mentioning about that? [20:53] niemeyer, my understanding is that with the merge of ~fwereade/juju/pa-start-machine-constraints that we can enforce series as a constraint [20:53] looking over deploy, it doesn't do it by itself [20:53] hazmat: Series must not be a constraint [20:53] niemeyer, its not a user constraint [20:53] hazmat: Not a user visible one, at least [20:53] its a system constraint [20:53] hazmat: Ok [20:54] niemeyer, this is going to explode the question of consistent version of juju across distros per SpamapS's email earlier on the same [20:54] hazmat: I don't know what the question is [20:54] hazmat: Or how the version of juju is involved in that? [20:54] niemeyer, different versions of juju released in different distro releases [20:55] all in the same env [20:55] hazmat: Yeah, it's an interesting problem [20:56] hazmat: Versions have to be compatible, basically [20:56] niemeyer, its a simple problem, if we didn't have to use packages [20:56] hazmat: Right [20:56] its the juju upgrade itself problem.. but with env version pinning added [20:57] niemeyer, okay.. i've looked over the charm deploy code, this should be straight forward [20:57] hazmat: Hmmm.. kind of [20:57] to add the series as system constraint from charm [20:57] hazmat: juju upgrade has its own set of problems [20:57] niemeyer, agreed, it needs some configurable policy choice to enable cluster workflows [20:58] but focusing on what we can do atm.. i'm wondering if this something important enough to try and get it done for 12.04, ie. get a spec out on it today [20:58] niemeyer, did you see SpamapS's email on the topic https://lists.ubuntu.com/archives/juju/2012-March/001337.html [20:59] ie. being able to upgrade an environment seems pretty important for prod like usage [21:01] hazmat: Definitely [21:01] hazmat: I have a personal embargo to comment on that issue ;-) [21:01] niemeyer, you mean shipping the go binaries into provider storage won't fly ;-) [21:02] seems perfectly reasonable [21:02] hazmat: I think it's reasonable too [21:02] hazmat: It's probably the real long term solution [21:02] hazmat: It's also boringly easy [21:03] hazmat: Since it consists into a single binary being uploaded/downloaded [21:03] Well, maybe two [21:03] hazmat: But that won't solve the juju-py issue [21:03] hazmat: I suppose the short term solution is to upgrade Oneiric [21:04] niemeyer, well its possibly N your uploading a binary tagged to a version, and instructing the cluster to use it by ref/version [21:04] niemeyer, well i'm debating if we should just do the same for juju py [21:04] hazmat: Where the same means..? [21:04] allow for code repo + rev or bundle w/ version [21:04] for consistent cluster deploys [21:05] i'd rather not. cause it will tie me up for the cycle [21:05] hazmat: since this is merged: https://code.launchpad.net/~fwereade/juju/apply-machine-constraints .. does that mean that https://code.launchpad.net/~fwereade/juju/pa-start-machine-constraints/+merge/86451 can land, and then constraints will be released? [21:05] but if needs tog et done.. [21:05] SpamapS, yes, but only for ec2 [21:05] hazmat: It's not so simple, since there are deps involved too [21:05] hazmat: It'd need to deal with downloading packages + deps, and uploading packages, if I see what you mean [21:05] niemeyer, i'm less concerned with the deps outside of txzk their all stable [21:06] niemeyer, yes.. for a bundle, exactly that [21:06] hazmat: thats fine.. OMG Ubuntu needs it. :) [21:06] hazmat: Right.. [21:06] hazmat: All the deps that need to :) [21:06] niemeyer, not nesc. dpkg binaries though, i was just going to grab push py eggs [21:06] hazmat: It's also not readily available locally [21:06] hazmat: Ugh.. [21:06] niemeyer, :-) [21:06] niemeyer, the dpkg is avail local [21:06] unless its source [21:07] hazmat: It's not avaliable locally [21:07] in which case use branch and revision [21:07] niemeyer, its not in the dpkg cache? [21:07] hazmat: No, not necessarily.. it's a cache after all [21:08] SpamapS, considering it was a client skew there.. not sure this would help [21:08] hazmat: this is orthogonal to that problem [21:08] agreed [21:09] hazmat: they need xlarges for the webserver, and m1.small for everything else. [21:09] webserverS I should say [21:09] SpamapS, oh.. constraints would help them yes.. [21:09] hazmat: As far as OMG-what-to-do-for-12.04 goes, I'd make sure that Oneiric and Precise match each other, and beg SpamapS for help [21:09] niemeyer, the omg refs are for the site omgubuntu site.. although it does apply to the other ;-) [21:10] hazmat: Yeah, I was consciously not referring to that part of the problem :) [21:10] jimbaker, just a heads up.. per our conversation friday, i'm implementing the status changes per the spec [21:11] hazmat, yes, i've just updated the spec per the discussion [21:11] over that spec [21:11] jimbaker, awesome [21:11] hazmat, SpamapS: Long term, having raw binaries in storage feels like the best in terms of portability and stability of environments [21:11] i certainly welcome your impl help! thanks [21:11] niemeyer, so the problem is how to get the binary for py-juju.. and then spec the rest of it [21:12] hazmat: So you mean you're also going to do that for 12.04? :-) [21:13] niemeyer, if needs to get done, i'll have to find something else to drop [21:13] niemeyer, but it also eases the take machine/upgrade for go-juju [21:14] s/take/takeover [21:14] hazmat: Kind of.. we can always do that later, and I'm not optimistic that we can do something that will be used as-is for upgrading Go [21:14] Upgrading to Go will likely be a breaking change for the environment itself [21:14] It won't break charms, though [21:15] hazmat: So we have to evaluate what's the pluses and minuses.. [21:15] huh.. i thought we where playing approved any state changes for the purpose of transparent upgrade [21:15] hazmat: If you get some ideas for how to do it in Python, I'd be happy to talk about themLet me know how you'd do it in Python.. [21:15] niemeyer, speaking of which there is a state change notice for a trivial change to upgrade flag storage on the list [21:15] I need to get some quick food now, though, as Ale is waiting.. [21:15] niemeyer, okay.. i can spec it out [21:16] niemeyer, i guess what i really need is guidance from you and SpamapS that this is critical for 12.04 [21:16] hazmat: WIll go over it.. there are a couple of entries in the TODO about answering emails [21:16] if its not then... i'll push on other things [21:16] hazmat: My vague view on it is still the same since the last milestone [21:16] if it is i can at least get a spec for discussion [21:16] hazmat: The focus is on making in *stable* [21:17] hazmat: A lot has been happening, and significant changes are still going in right now [21:17] Anyway.. biab [21:31] hazmat: Back [21:32] hazmat: The spec sounds good, though [21:32] hazmat: I expect it will be fairly uncontroversial, at least in the part that does the actual upgrade [21:33] hazmat: I don't know what you have in mind for the harvesting of dependencies, though [21:34] niemeyer, sounds good, i'll put on my list for today, first some impl and reviews [21:38] hazmat: Thanks a lot [21:42] niemeyer, np [21:53] jimbaker: [21:53] 38 » relations-error: [21:53] 39 db: [blog3, blog4] [21:53] 40 » relations-pending: [21:53] 41 db: [blog2] [21:53] jimbaker: I don't think this reflects the agreement. Please see hazmat's email. [21:54] In fact.. do we even need that spec? [21:54] It was good to bootstrap the discussion, but hazmat's email to the list looks great [21:58] rog: pin [21:58] rog: ping [21:59] niemeyer, for spec review there's also http://codereview.appspot.com/5847053/ i'm making some changes per fwereade_'s review, but their minor [22:00] jimbaker, there is no pending.. only relations, relations-error [22:03] andrewsmedina: Heya [22:03] andrewsmedina: It's pretty late for rog now [22:03] hazmat: Looking [22:03] niemeyer: ty [22:04] hazmat: Hmmm.. will wait for the next round if that's ok [22:05] niemeyer, sure [22:09] hazmat: Do we need {force: true} on the yaml state? [22:10] hazmat: for upgrade-charm? [22:10] hazmat: I thought we were just not setting the upgrade depending on the state [22:10] niemeyer, its to support the --force upgrade to allow for upgrades from any state, i wanted to distinguish it from a normal upgrade [22:11] hazmat: Just trying to understand why.. I thought the non-forcing behavior was client-side only [22:11] niemeyer, the additional behavior is only enabled by the --force flag, else normal upgrade checks apply [22:11] hazmat: Right, but aren't the checks client side? [22:12] * hazmat checks something [22:12] hazmat: In fact.. upgrade-charms --force seems to go into the void [22:12] niemeyer, there's also agent side checks [22:14] hazmat: Cool, +1 on the concept then.. but there seems to be a hole in the implementation [22:15] niemeyer, i'm all ears [22:16] https://codereview.appspot.com/5752069/diff/6001/juju/control/upgrade_charm.py#newcode56 [22:16] hazmat: ^ [22:19] niemeyer, indeed [22:19] missing a test clearly [22:21] hazmat: It might be good to have a run with it in practice too, to check that it's glued up end-to-end [22:55] hazmat, that makes sense re pending [23:13] niemeyer, re reviews, i would appreciate if you could have a look at jimbaker's relation specs all ones you reviewed have feedback incorporated, and are ready for another look. [23:13] hazmat, niemeyer, sounds good [23:15] we need to start them... now effectively, to get them implemented [23:25] hazmat, agreed