[06:19] fwereade__: around?
[07:25] bigjools, heyhey
[07:25] fwereade__: hello!
[07:25] bigjools, how goes it?
[07:26] fwereade__: desperately need your help, I am trying to fix that system_id thing
[07:26] totally failing so far
[07:26] do you have some time?
[07:26] bigjools, yeah, I'm not sure I explained myself very clearly
[07:26] bigjools, ofc
[07:26] thank you, ok let me explain where I got to
[07:26] bigjools, cool
[07:27] firstly - I am unclear on how to test this in a QA environment since juju checks code out of Launchpad (!)
[07:27] that's utterly bizarre
[07:28] bigjools, yeah, that underlying bug has led me to break every-juju-in-the-world twice now :/
[07:28] when LP is down, by any chance?
[07:28] or broken trunk
[07:29] bigjools, nah, not even *broken*... just a significant-enough change in trunk can be deadly
[07:29] anyway, I can't work out how to get my branch's code on there to test
[07:29] bigjools, you should be able to use the origin field in environments.yaml
[07:30] juju-origin: lp:~fwereade/juju/set-service-constraints
[07:30] bigjools, which is, well, good enough for testing
[07:30] oh boy, ok :)
[07:30] next question, what do you know about cloud-init? :)
[07:31] bigjools, um, embarrassingly little :(
[07:31] fair enough, I am trying to work out why it's crashing when I boot my node :(
[07:31] bigjools, my go-to technique is "vdiff with a known-good one and see if anything jumps out at me"
[07:31] oh, at which level does that config go BTW?
[07:32] bigjools, that's inside a given environment
[07:32] ok
[07:32] mornin' campers
[07:32] heya rogpeppe
[07:32] so at the same level as admin-secret et al?
[07:32] bigjools, do you know exactly what is crashing on boot?
[07:33] I don't, the logs are useless unfortunately
[07:33] bigjools, is that the traceback you sent in the mail or something else?
[07:33] it just says it exited with status 1
[07:33] no traceback
[07:33] bigjools, sorry... what exited with status 1?
[07:33] no, this is cloud-init crashing now, not juju
[07:33] bigjools, ah-ha
[07:33] which is a prerequisite to getting as far as juju :)
[07:33] bigjools, quite so
[07:34] I suspect I need Daviey
[07:34] bigjools, would you pastebin me the cloud-init file, just in case?
[07:35] fwereade__: you want the user-data it's using?
[07:35] bigjools, just in case anything leaps out at me
[07:35] bigjools, btw, are you using system_id as instance id throughout now?
[07:36] well I changed it in launch.py but as I said, not even getting close to testing that ATM
[07:36] too many other cloud-init changes have broken things I think
[07:36] bigjools, if that's all you changed and it's now killing cloud-init, it sounds interesting
[07:37] bigjools, ah, sorry, what else has changed?
[07:37] not entirely sure tbh, the server guys have been busy!
[07:37] bigjools, ha, ok :)
[07:38] btw why on earth is it branching code on the master node anyway? can a tarball not be pushed through? even bzr serve on the client end would be better!
[07:39] bigjools, we absolutely need a sane use-the-same-code-everywhere-in-an-env story
[07:39] no kidding :)
[07:43] bigjools, thinking out loud, I presume you don't know where cloud-init crashes?
[07:44] fwereade__: I don't
[07:44] stuff flashes up on the guest's console ... AHA
[07:44] bigjools, can you look on the instance and make inferences based on what's installed so far though?
[07:44] fwereade__: hiya
[07:44] vt7 has a traceback
[07:45] bigjools, cool
[07:45] ImportError: No module named DataSourceMAAS. It's the freaking s/MaaS/MAAS/ that happened recently.
[07:45] * bigjools takes it to the right channel ... :)
[07:45] fwereade__: i just got that transient testing error too. i think i'll just choose a port at random.
[07:46] rogpeppe, cool, sounds good
[07:46] bigjools, ouch :(
[07:46] bigjools, still, progress :)
[07:47] fwereade__: yeah, I can attack this now, I'll be back with you later!
Although, having said that, when I did a check-seed on the node, the user-data still had env JUJU_MACHINE_ID="0"
[07:47] bigjools, that should be there -- that's what it uses to poke the "machine 0 is already provisioned" data in
[07:47] bigjools, but it needs an instance id as well
[07:47] oh, it's not the system_id then?
[07:48] where is instance_id conveyed?
[07:48] bigjools, nah, sorry: we have machine ids which are basically just ints, and instance ids which are provider-dependent
[07:49] bigjools, instance_id is sent in through set_instance_id_accessor and I *think* it's only used in the `juju-admin initialize` script
[07:49] oh, from zk
[07:49] I see it now, it's set
[07:49] bigjools, cool, and it's a system_id?
[07:49] ok let me fix cloud-init and then I can test this
[07:49] it is :)
[07:50] bigjools, sweet
[07:50] bigjools, it's a public holiday for me today but I'm working the first half so I'll be around for a few hours more
[07:50] bigjools, just ping me if you need anything
[07:50] fwereade__: ah ok thanks, very much appreciated
[07:50] bigjools, a pleasure :)
[08:04] rogpeppe, btw, I had a thought over the weekend: one of the big problems with the hook package is its name
[08:04] rogpeppe, because hooks themselves are really only very tangentially related to what it's doing
[08:04] * rogpeppe always likes a good name change
[08:05] rogpeppe, I'm starting to think that the best place for this code is cmd/server
[08:05] rogpeppe, but there's probably an even better place I haven't thought of yet
[08:05] fwereade__: does this code actually need its own package in fact?
[08:06] fwereade__: couldn't it just go into the unit agent package?
[08:06] rogpeppe, we don't have a unit agent package: you didn't want one :p
[08:06] fwereade__: lol
[08:06] fwereade__: well, then in the place that has that
[08:07] rogpeppe, that's in cmd/jujud and I don't think that's the right place
[08:07] fwereade__: no?
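On the transient test failure at 07:45: picking a random port still leaves a chance of collision. A common alternative (not what was proposed in the chat) is to ask the kernel for a free port by listening on port 0. The minimal sketch below assumes a local TCP test server; `freePort` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused TCP port by listening on port 0,
// then closes the listener so a test server can bind that port itself.
// Another process could grab the port in between, so this narrows the
// window for collisions rather than closing it completely.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("test server can use port", port)
}
```

If the code under test can accept a `net.Listener` directly, handing it the already-open listener removes the race entirely.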
[08:07] rogpeppe, I have a forthcoming cmd/server which is only connected to jujud in that a process invoked by jujud will happen to run the server
[08:08] rogpeppe, and it's starting to seem that the server, the tool execution context, and the tool implementations themselves should probably all go in there
[08:08] fwereade__: sorry, i think i lost the implication: cmd/server is a command?
[08:09] rogpeppe, it may be that we want a main package/func in cmd/jujuc, and then to stick it in cmd/jujuc/server
[08:09] rogpeppe, it's not really, no, but it is a "command server" and it "serves" cmd/Commands
[08:10] rogpeppe, ...but they're purely for use by jujuc, so jujuc/server may be clearer
[08:10] fwereade__: i think that for our own sanity the subdirectories under cmd should all be main packages
[08:10] but cmd/jujuc/server might work
[08:10] rogpeppe, it seemed that if I tried to add cmd/jujuc/server when there wasn't any code in cmd/jujuc, go just ignored it
[08:10] rogpeppe, is that expected or did I do something wrong?
[08:11] fwereade__: ignored it when you did what?
[08:11] rogpeppe, go didn't run the tests in cmd/jujuc/server when I put the server code in there with jujuc otherwise empty
[08:11] fwereade__: if it does that, it's a bug
[08:11] rogpeppe, hm, that was when running go test .../cmd/...
[08:12] fwereade__: it should still work. let me check.
[08:12] rogpeppe, or more likely that I did something wrong :p
[08:12] rogpeppe, but I was expecting at least a "you're stupid, I'm not doing that" message
[08:13] fwereade__: it works for me.
[08:13] rogpeppe, then I guess I did something stupid, cmd/jujuc/server it shall be (if that makes sense to you?)
[08:14] fwereade__: yeah, that makes sense. it's the server side of the jujuc commands.
[08:15] rogpeppe, and so that'll have Server, Context, and a whole bunch of things like LogCommand and RelationSetCommand
[08:15] fwereade__: the only hesitation i have is you might want it to depend on stuff internal to jujud.
[08:16] rogpeppe, go on... such as?
[08:16] fwereade__: just a hunch
[08:17] fwereade__: depends how closely the callback commands interact with the stuff in the unit agent.
[08:18] fwereade__: i guess anything we need can go in an interface, and that should be fine.
[08:18] rogpeppe, I'm not seeing it yet; I think that when I figure out precisely what is responsible for the socket, things may be rearranged slightly
[08:18] rogpeppe, but the only things I expect them to hit are log and state
[08:19] rogpeppe, I'm not sure quite how the state will get there yet but I think that's a future consideration
[08:19] fwereade__: if that's the case, it's a nice clean separation and +1
[08:19] rogpeppe, it's my intent anyway :)
[08:19] fwereade__: well the server stuff is invoked by jujud, right?
[08:19] fwereade__: (well, the unit agent within jujud)
[08:19] rogpeppe, it will be but I don't yet know exactly how
[08:19] fwereade__: ok
[08:20] rogpeppe, and my own hunch says that we will start to want a unit agent package around the time it all starts to get hooked up
[08:20] rogpeppe, incidentally, one nice effect of go I'm coming to appreciate:
[08:21] fwereade__: maybe. i'm still thinking the agents are small enough they can live inside jujud. but we'll see.
[08:21] rogpeppe, no-unused-imports means that just by opening a file and seeing a bunch of unrelated imports you detect a smell
[08:21] fwereade__: yeah
[08:21] rogpeppe, the unit agent is I think big enough that it'll feel wrong
[08:22] rogpeppe, all the lifecycle and workflow and scheduler stuff basically
[08:22] fwereade__: yeah, maybe you're right.
[08:22] rogpeppe, the MA and the PA are probably compact enough they wouldn't feel bad really
[08:22] * rogpeppe goes to see how many lines of code the python version is
[08:23] rogpeppe, I could very well be wrong -- ATM the code run directly by the UA is smeared across juju.hook and juju.unit (in addition to all the state stuff etc)
[08:23] rogpeppe, but perhaps it isn't actually *big* enough to warrant its own package and I'm just responding to the unclear factoring
[08:24] fwereade__: yeah, that's quite a lot of code actually.
[08:24] fwereade__: i'm wondering whether, with server and jujuc factored out, the actual core unit agent code might be reasonably compact.
[08:25] fwereade__: i.e. the core lifecycle, workflow and scheduler stuff.
[08:25] * fwereade__ is cautiously optimistic
[08:25] fwereade__: it *feels* compact in my head, but that's probably because i'm not familiar with it :-)
[08:25] rogpeppe, it's fiddlier than it looks
[08:26] rogpeppe, as I discovered when I thought "yeah, I'll pick up agent upstartification, how hard can it be?"
[08:26] fwereade__: yeah. it's probably the fiddliest bit of the whole system, right?
[08:26] rogpeppe, yeah, I think so
[08:27] fwereade__: but i guess it's that bit which is really what makes juju juju.
[08:27] rogpeppe, but *even then* I think it's that the unit agent itself is intrinsically fiddly, and so a jujud/unit subpackage might be just the ticket
[08:28] rogpeppe, yeah, it's all about the agents :)
[08:28] fwereade__: i was thinking it's all about mapping juju state transitions to shell scripts...
[08:29] rogpeppe, there are indeed many valid perspectives :)
[08:35] fwereade__: a review for you, if you choose to accept it: https://codereview.appspot.com/5853048/
[08:35] :)
[08:35] rogpeppe, I have a few from the other day
[08:36] fwereade__: unfortunately it breaks the environs/ec2 amazon tests. but i think fixing that is for another review.
[08:36] rogpeppe, I hope you like how hook/context turned out after discussing with niemeyer for a while
[08:36] fwereade__: oh yeah, from friday. i'll have a look - i've been pointedly avoiding looking at my email this morning...
[08:37] fwereade__: oh, i did see that you'd made some changes that i wasn't expecting
[08:37] rogpeppe, as long as we don't end up *merging* broken stuff I'm fine with that :)
[08:37] fwereade__: ExecInfo went away - i'm happy to see that, but i didn't see any discussion about it.
[08:37] rogpeppe, the crucial insight is that this really is only very slightly related to hooks in the first place
[08:37] fwereade__: was that your G+ conversation with gustavo?
[08:37] rogpeppe, but it took me a while longer to think "maybe this shouldn't be in the hook package at all"
[08:38] rogpeppe, that was what crystallised it, yeah
[08:38] fwereade__: cool. i was like "i thought i didn't manage to convince you, but you've gone and done it anyway... how did *that* happen?!"
[08:38] rogpeppe, and it now makes me think that Context.ExecHook is what we'll need in the end but until it has a client I'm comfortable as it is
[08:39] fwereade__: yeah, i'm happy with how it looks now.
[08:39] rogpeppe, the leap was too great for me to see while I was still thinking it was about hooks
[08:40] rogpeppe, once you forget about hooks the rightness of your approach is clear
[08:40] fwereade__: i still quite liked Exec and vars being methods on Context.
[08:40] rogpeppe, if you're OK with that I'll gladly put them back on
[08:40] fwereade__: yeah, i'm very happy with that.
[08:41] fwereade__: they're tied closely enough to Context that i think they work well as methods on it.
[08:42] fwereade__: and it's trivial to factor them out later if we want.
[08:42] s/want/need/
[08:44] rogpeppe, I'm thinking that if I do that I will move them into cmd/jujuc/server as well, may as well start as I mean to go on
[08:45] rogpeppe, at which point I think the methods actually become ExecHook and hookVars
[08:46] fwereade__: doesn't Context move into jujuc/server too?
[08:46] rogpeppe, yes, exactly
[08:47] fwereade__: so they can still be Context.Exec and Context.vars if you like
[08:47] rogpeppe, I'm not sure, I think they become an "alien" concept once it's under jujuc
[08:48] fwereade__: hmm, i dunno. if they were appropriate as methods on Context before, i don't really see why that's changed when Context has moved.
[08:49] rogpeppe, sorry: they're still context methods, but they should change their names to make it clear that they're about hooks (not the jujuc tools themselves, which will only be called as side effects if you like)
[08:50] fwereade__: ok, that makes sense.
[08:50] rogpeppe, cool
[08:50] fwereade__: one thought: maybe "RunHook" rather than "ExecHook"
[08:50] rogpeppe, perfect
[08:57] fwereade__: http://pastebin.ubuntu.com/890372/
[08:57] fun!
[09:04] bigjools, we're making progress though
[09:04] slow!
[09:04] bigjools, I think that just means that the resource-uri/system-id confusion ran deeper
[09:05] bigjools, would a resource-uri be unique and immutable in the same way as system-id is?
[09:06] bigjools, if so it is probably a more convenient representation and would allow you to forget about system-id entirely?
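To make the RunHook/hookVars naming concrete, here is a minimal sketch of what hook-specific methods on a Context could look like. The method names come from the discussion; the `Context` fields, the environment variable names, and the "missing hook is a no-op" behaviour are all assumptions of this sketch, not the real jujuc/server code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// Context is a hypothetical hook execution context; its fields are
// illustrative only.
type Context struct {
	UnitName   string
	SocketPath string
	CharmDir   string
}

// hookVars returns the environment a hook process would see. The variable
// names are assumptions for illustration.
func (c *Context) hookVars() []string {
	return []string{
		"PATH=" + os.Getenv("PATH"),
		"CHARM_DIR=" + c.CharmDir,
		"JUJU_UNIT_NAME=" + c.UnitName,
		"JUJU_AGENT_SOCKET=" + c.SocketPath,
	}
}

// RunHook executes hooks/<name> from the charm directory with the hook
// environment. Treating a missing hook as a no-op is an assumption here.
func (c *Context) RunHook(name string) error {
	path := filepath.Join(c.CharmDir, "hooks", name)
	if _, err := os.Stat(path); os.IsNotExist(err) {
		return nil
	}
	cmd := exec.Command(path)
	cmd.Dir = c.CharmDir
	cmd.Env = c.hookVars()
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	ctx := &Context{
		UnitName:   "wordpress/0",
		SocketPath: "/tmp/jujuc.sock",
		CharmDir:   filepath.Join(os.TempDir(), "no-such-charm"),
	}
	for _, v := range ctx.hookVars() {
		fmt.Println(v)
	}
	fmt.Println("install hook result:", ctx.RunHook("install"))
}
```

Keeping the word "hook" in both method names is what signals that they concern hook execution rather than the jujuc tools, which is the distinction drawn at 08:49.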
[09:07] yes, resource_uri is just a URL with the system_id in there somewhere
[09:07] bigjools, ok: that makes it sound like you can drop the notion of system-id entirely and just use resource-uri as instance_id throughout
[09:08] bigjools, sorry, poor advice before
[09:08] I am seriously confused
[09:08] bigjools, sorry, let me step back a mo
[09:09] bigjools, a juju machine id is really entirely abstract -- it's a predictable way for us to refer to specific machines internally, regardless of whether or not they're actually provisioned
[09:09] bigjools, so it's basically just an int
[09:10] bigjools, we maintain a mapping between machine ids and provider-specific instance ids (I forget exactly how it's stored)
[09:10] bigjools, and the provisioning agent keeps an eye on that mapping
[09:11] ok so far
[09:11] bigjools, and provisions new instances in response to seeing machine states which *aren't* yet associated with an instance
[09:11] bigjools, once it's provisioned an instance for a juju machine, it sticks it in the mapping
[09:11] ok
[09:12] bigjools, I am not aware of any restrictions on the format of instance-id -- I don't think we ever try to parse them
[09:12] bigjools, so the only relevant property of instance-id is that it affords a convenient way to talk to the provider about a specific instance
[09:13] bigjools, system-id was that (or near enough) in the orchestra provider, which is why I suggested that it should be the case here
[09:13] oh hmmm
[09:14] not sure the checkout worked ok from cloud-init
[09:14] bigjools, if you have enough information to construct a resource-uri given (1) a system-id and (2) the maas provider details
[09:14] bzr: ERROR: A control directory already exists: "file:///usr/lib/juju/juju/".
[09:15] bigjools, huh, not seen that, maybe it's just reacting to droppings from a previous attempt?
[09:15] yes
[09:15] I neglected to wipe properly
[09:16] bigjools, anyway, if you *can* construct the uri given system-id then it might make sense to keep system-id, but I don't have a firm handle on whether or not that's actually a good idea
[09:16] fwereade__: well this is how it was originally, right?
[09:16] I was setting the machine_id as the resource_uri
[09:17] bigjools, yeah but if it's not the best fit for the problem it should change
[09:17] still confused tbh since I don't know what's going on in the depths
[09:17] bigjools, the problem is that the MaaSMachine thinks system_id is the instance id, while other parts of the code think that resource_uri is
[09:17] what is it doing with the machine_id later?
[09:18] e_toomanyids
[09:18] bigjools, machine id is I think a red herring here
[09:18] haha
[09:18] so what is cloud_init.set_instance_id_accessor() doing? I thought it set machine_id?
[09:19] bigjools, nope: instance_id
[09:19] so its name has a clue :)
[09:19] bigjools, the clue's in the name :p
[09:19] when instance_id is looked up later, how is it used?
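The fix being circled here is to pick one representation of instance id and use it everywhere. A minimal sketch of the "keep system-id" option, assuming the provider can derive a resource URI from a system_id whenever it talks to MAAS: `maasProvider`, `resourceURI`, `instanceURIs`, the base URL and the `nodes/<id>/` path layout are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// InstanceId is opaque to juju: it is stored and handed back to the
// provider, never parsed.
type InstanceId string

// maasProvider is a hypothetical provider that uses the MAAS system_id as
// its instance id everywhere and only builds a resource URI at the point
// where it talks to the MAAS API.
type maasProvider struct {
	baseURL string // assumed API root, e.g. "http://maas.example.com/api/1.0/"
}

// resourceURI derives a node URI from a system_id. The "nodes/<id>/"
// layout is an assumption for this sketch.
func (p maasProvider) resourceURI(id InstanceId) string {
	return strings.TrimSuffix(p.baseURL, "/") + "/nodes/" + string(id) + "/"
}

// instanceURIs stands in for any provider method that takes instance ids
// (shutdown, lookup, ...): it accepts system_ids and converts them at the
// edge, so no other code ever sees a resource URI.
func (p maasProvider) instanceURIs(ids []InstanceId) []string {
	uris := make([]string, 0, len(ids))
	for _, id := range ids {
		uris = append(uris, p.resourceURI(id))
	}
	return uris
}

func main() {
	p := maasProvider{baseURL: "http://maas.example.com/api/1.0/"}
	// The same value would be recorded at bootstrap and by the provisioning agent.
	id := InstanceId("node-9f1c")
	fmt.Println(p.resourceURI(id))
	fmt.Println(p.instanceURIs([]InstanceId{id, "node-77aa"}))
}
```

The mirror-image choice (store resource_uri everywhere) would work equally well; what matters is that MaaSMachine, the provider methods and bootstrap all agree.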
[09:20] bigjools, give me a mo, double-checking
[09:21] bigjools, it's only actually used by the provisioning agent AFAICT
[09:22] bigjools, the only reason it intrudes on your consciousness at all is because we need to fake up initial state on bootstrap, to say "machine id 0 is already provisioned on instance id WHATEVER", and prevent the PA from trying to provision itself
[09:24] bigjools, that is done by `juju-admin initialize` -- grep for that and you should see how set_instance_id_accessor is relevant
[09:26] fwereade__: sorry, total PC lockup :/
[09:26] bigjools, np
[09:26] I have a call in 4 minutes
[09:26] bigjools, give me a mo, double-checking
[09:26] bigjools, it's only actually used by the provisioning agent AFAICT
[09:26] bigjools, the only reason it intrudes on your consciousness at all is because we need to fake up initial state on bootstrap, to say "machine id 0 is already provisioned on instance id WHATEVER", and prevent the PA from trying to provision itself
[09:26] <-- bigjools has quit (Read error: Connection reset by peer)
[09:26] bigjools, that is done by `juju-admin initialize` -- grep for that and you should see how set_instance_id_accessor is relevant
[09:27] bigjools, I should still be around afterwards unless it's *really* long
[09:27] 20 mins
[09:27] bigjools, just grab me when you're free then :)
[09:27] ok thanks
[09:28] rog, thinking about your review
[09:28] fwereade__: cool, thanks
[09:29] rog, there are quite a lot of tests that start by Initializing a State
[09:29] rog, and the required data for initialization will become more complicated
[09:30] rog, so we will at some stage want a testing.InitializeState(addrs string) function, but maybe it's not justified yet
[09:30] rog, OTOH when we do need it, if it already exists, it'll be just one place to change
[09:30] fwereade__: i think i'd leave that until we need it
[09:30] fwereade__: it's trivial to find occurrences and to add
[09:30] fwereade__: in fact, won't Initialize need to take an addrs method?
[09:31] s/method/argument/
[09:31] rog, it already does (implicitly, in the Info), I think
[09:31] fwereade__: ah, so what would testing.InitializeState give us?
[09:31] rog, but the eventual required args to Initialize will be more complicated than to Open
[09:32] fwereade__: ok. what other stuff will it have?
[09:32] rog, at the very least we need the instance id, to set up the state I've been talking to bigjools about
[09:32] rog, and I'm 99% sure that we'll end up passing in the environment settings too, imminently
[09:32] rog, like must-be-done-for-12.04-imminently
[09:33] rog, maybe that's not too much to duplicate
[09:33] rog, after all, dummy provider env settings are going to be basically empty
[09:34] Initialize is only called in three places AFAICS. when the duplication becomes a burden we can factor it out.
[09:34] fwereade__: for now, let's not add stuff that we don't need.
[09:34] fwereade__: just one question after reading niemeyer's comment on my last proposal: when i've got two pingers pinging the same node and i tell one to kill its work, the second one will recreate the node, won't it?
[09:34] rog, so it will probably be `(info *Info, instanceId, providerType string)`
[09:35] TheMue, it should do, but 2 pingers on the same node is Doing It Wrong
[09:35] TheMue: there should never be two pingers pinging the same node :-)
[09:35] TheMue, what are you trying to accomplish?
[09:37] rog, (yes indeed, it's not called for, ty for discussing :))
[09:40] rog, why a 3 minute timeout?
[09:40] fwereade__: niemeyer found a problem with retrieving an instance of Agent() twice
[09:41] TheMue, go on
[09:41] fwereade__: today you get two different instances then
[09:41] fwereade__: which, when keeping a pinger inside, indeed isn't good
[09:42] TheMue, yeah, makes sense; I thought you were taking the pinger out anyway?
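The signature floated at 09:34, `(info *Info, instanceId, providerType string)`, ties together the bootstrap discussion with bigjools: Initialize needs the instance id so it can record machine 0 as already provisioned. A minimal sketch under stated assumptions: `Info` and `State` here are toy in-memory stand-ins for the ZooKeeper-backed types, and the `(*State, error)` return type is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Info is a stand-in for the state connection info (addresses etc.).
type Info struct {
	Addrs []string
}

// State is a toy, in-memory stand-in for the ZooKeeper-backed state.
type State struct {
	providerType string
	instances    map[int]string // juju machine id -> provider instance id
}

// Initialize follows the signature floated in the discussion. It primes a
// fresh state so that machine 0 is already recorded as provisioned on the
// bootstrap instance, which keeps the provisioning agent from trying to
// start it again.
func Initialize(info *Info, instanceId, providerType string) (*State, error) {
	if info == nil || len(info.Addrs) == 0 {
		return nil, errors.New("no state server addresses")
	}
	if instanceId == "" {
		return nil, errors.New("bootstrap instance id is required")
	}
	return &State{
		providerType: providerType,
		instances:    map[int]string{0: instanceId},
	}, nil
}

func main() {
	st, err := Initialize(&Info{Addrs: []string{"localhost:2181"}}, "node-9f1c", "maas")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(st.providerType, st.instances[0])
}
```

With only three callers, as noted at 09:34, passing these arguments directly is cheap; a shared test helper can wait until the argument list actually grows.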
[09:42] fwereade__: on the other hand he suggested an api change to return a pinger with agent.StartPinger()
[09:42] fwereade__: because it takes about 2 minutes to boot, and 3 minutes seemed long enough for the zk node to be inited after boot (maybe it's not and that's why my test is failing). the test harness fails after 6 minutes.
[09:42] fwereade__: but here the problem stays the same
[09:43] TheMue: there's no problem if the agent doesn't cache the pinger
[09:43] TheMue: i think
[09:43] rog: it's exactly the same problem
[09:43] TheMue: what's the problem?
[09:43] rog: in both cases it's an illegal usage of the api
[09:44] rog, I *think* that we have 2 interesting cases: on the instance, if any code is running before initialize is complete we Have A Problem
[09:44] rog: if i create two agent instances or two pinger instances, both are wrong
[09:44] TheMue: i don't think you can stop that. it's a distributed system.
[09:44] rog, and if we're connecting from outside I think we want to wait forever and let the user interrupt us
[09:44] rog: Pinger has the method Kill()
[09:44] TheMue, why would you ever create 2 pingers for the same node anyway?
[09:45] TheMue: that's fine. that's to kill that particular pinger
[09:45] fwereade__: ask niemeyer why one would create two agents for the same unit anyway
[09:46] fwereade__: ok. i added the timeout as an afterthought because my test was timing out after 6 minutes. but maybe that was correct, and i should just up the test harness timeout time.
[09:46] TheMue, why is agent different to any other state class?
you can have N state.Units referring to the same ZK state and that shouldn't be a problem
[09:46] fwereade__: i'm only saying that if the one way is an error, that error won't go away by returning the pinger
[09:46] fwereade__: so it should be the same with the pinger too
[09:46] rog, I'm not *sure* that my analysis is correct, give it a bit of a mental kicking
[09:46] TheMue, it's always possible to write code that does the wrong thing
[09:47] TheMue: returning the pinger seems good to me. it means that the Agent doesn't need to keep track of that state - it's less code and no less correct IMHO
[09:47] TheMue, in practice the unit agent process will call StartPinger once and only once, and that's it
[09:47] TheMue, and the agent process itself will decide when it needs a Stop/Kill
[09:47] fwereade__: how long does the bootstrap node take to come up and be usable, usually?
[09:47] fwereade__: so why return the pinger?
[09:48] rog, I've never actually measured it
[09:48] [09:47] TheMue: returning the pinger seems good to me. it means that the Agent doesn't need to keep track of that state - it's less code and no less correct IMHO
[09:48] fwereade__: maybe i'll take the timeout out again.
[09:49] rog, it may be there's some case I missed
[09:49] TheMue, what rog said :)
[09:50] fwereade__: no, i think you're right. i guess i thought that three minutes waiting after zk connect *should* be fine. surely we don't take that long to start up the juju init command after starting zookeeperd?
[09:50] TheMue, it may be we have some disconnect on how we expect state.Agent to be used?
[09:50] rog: i'm only thinking of the poor maintainer, new to the code, two years from now: asking state for a unit, asking the unit for an agent, asking the agent to start a pinger (why a pinger? i'm only interested in signalling that the agent is alive, so what does a pinger have to do with it?) and then keeping the pinger
[09:51] TheMue: that's what a pinger *does*.
[09:51] rog, that sounds right
[09:51] TheMue: (i wasn't happy with the name "Pinger" (i preferred "Occupy" and "Occupied"), but it was gustavo's choice
[09:52] )
[09:52] rog: it's a technical description of how it works. but when i drive a car i'm not interested in how the engine works, i wanna drive from a to b
[09:52] TheMue: i know that
[09:52] TheMue: but that's a debate to have about Pinger, i think.
[09:52] rog: my intention is to hide HOW we do something but to tell WHY we do it
[09:52] TheMue, yeah, I liked Occupy too
[09:53] TheMue: if StartPinger was called "Occupy", would you be happier?
[09:54] rog: the pinger is a fine tool, i just think i have to keep the tool inside to provide a clean api for agent (and later anything else) for the user of this api
[09:54] TheMue: the pinger is *the* tool for detecting and signalling agent occupation
[09:54] rog, TheMue: `RegisterPresence() (*presence.Pinger, error)`?
[09:54] rog: yes, this way it makes more sense
[09:55] fwereade__: yeah, that would be fine for me.
[09:55] fwereade__: i still wouldn't return the pinger. i would hide it.
[09:55] TheMue: why hide it?
[09:55] [09:54] TheMue: the pinger is *the* tool for detecting and signalling agent occupation
[09:56] fwereade__: ok so I'm free now
[09:56] TheMue: we've built this abstraction, why not use it as is?
[09:56] rog: in this case the name isn't optimal
[09:56] TheMue: otherwise perhaps we should build it slightly differently, so we *can* use it as is.
[09:56] TheMue, the trouble is that it ends up making state.Agent unique among state.FOOs in that it's not something you can reconstruct safely from a fresh state with nothing but keys
[09:57] bigjools, where were we? was I making sense? ;)
[09:57] fwereade__: unfortunately not :)
[09:57] TheMue: i don't think we should get hung up on the name.
[09:57] fwereade__: that's why i wanted to embed it. btw, now the pinger (or better, the AgentOccupier) is special too.
[09:57] but I need to re-establish my test env, so I'll be a few mins
[09:58] bigjools, heh, ok: did I ever start making sense, or was there a specific point where I started babbling crackfully?
[09:58] TheMue, pinger is not just for agents
[09:58] TheMue: i don't see that hiding it gains anything.
[09:58] fwereade__: it's not you, more that I don't really understand what's going on inside juju when it deploys stuff
[09:59] rog, I think that hiding it keeps the name out of the way, and the name exposes the implementation too much for comfort
[09:59] fwereade__: that's ok, that's how i understood it at first. that's why i wanted to encapsulate it for agent, so that the agent api is clear
[10:00] bigjools, ultra-high-level sketch:
[10:00] fwereade__: because it's called "Pinger" rather than "Occupier"?
[10:01] bigjools, the user makes changes to an "ideal" state stored in ZK and the PA starts/kills machines in response to changes in the ideal state
[10:01] rog, exactly (or some other name, whatever ;))
[10:01] i do think that "Pinger" is an unfortunate name because it implies polling, and we might use some other technique in the future. but...
[10:01] bigjools, that's the steady state and it's pretty simple really (devil in details ofc)
[10:01] i think that that package is exactly the right place for the thing returned from an Agent.
[10:02] bigjools, the ugliness comes at bootstrap time
[10:02] rog, agreed
[10:02] TheMue, fwereade__: if we're writing more code just to hide a name that we've only just invented, let's just change the name!
[10:03] TheMue: but you can be the one to persuade gustavo :-)
[10:03] bigjools, the PA is responsible for making sure that the machines which should exist do exist; and machine 0 is just another part of the environment, we don't want to have to treat it specially
[10:04] rog: simply changing the name isn't the answer if it's still a multipurpose tool
[10:04] rog: i'm talking about encapsulation and api design
[10:04] bigjools, so before we let the PA look at state, we prime the state such that it sees "machine 0 is meant to be provisioned... and, hey, it already is"
[10:05] TheMue: it's a multipurpose tool that is designed for signalling presence on whatever underlying storage system we're using. that's *exactly* what the agent presence stuff is about.
[10:05] bigjools, doing so involves storing the instance id and the machine id together
[10:05] bigjools, hence the requirement for instance_id at bootstrap time
[10:05] TheMue: so it seems perfect that it's that that's returned from Agent.
[10:05] bigjools, instance id should in all other circumstances purely be an internal detail
[10:05] TheMue: we're adding more layers of abstraction "just in case", but YAGNI!
[10:06] rog, TheMue: strongly agree that it's not up to state.Agent to stop the pinger
[10:06] bigjools, but sadly you need to deal with it at bootstrap time
[10:07] fwereade__: indeed not, it's up to the user of agent (who got the pinger from agent) to also use it to signal "hey, it's me, the agent, i'm stopping".
[10:07] fwereade__: the traceback is from a "deploy" though
[10:08] bigjools: huh, sorry, let me reread
[10:08] TheMue: that sounds right to me
[10:08] bigjools, right, sorry: the trouble is that you're still using 2 different notions of instance_id
[10:08] TheMue: pinger := unit.Agent().StartPinger(); .... pinger.Kill()
[10:08] bigjools, either always use system_id, or always use resource_uri
[10:09] fwereade__: where am I using those?
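The provisioning agent behaviour described above (compare the ideal state with the machine-to-instance mapping, start what is missing, stop what is no longer wanted, and leave a bootstrap-primed machine 0 alone) can be sketched as a single reconcile pass. `Provider`, `reconcile` and `fakeProvider` are hypothetical; the real PA works off ZooKeeper watches rather than plain maps.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider is a hypothetical minimal provider interface.
type Provider interface {
	StartInstance(machineId int) (string, error)
	StopInstance(instanceId string) error
}

// reconcile starts instances for wanted machines that have no instance id
// yet and stops instances whose machines are no longer wanted. Machine 0,
// primed at bootstrap, already has an instance id and is left alone.
func reconcile(p Provider, wanted map[int]bool, instances map[int]string) error {
	ids := make([]int, 0, len(wanted))
	for id := range wanted {
		ids = append(ids, id)
	}
	sort.Ints(ids) // deterministic start order
	for _, id := range ids {
		if _, ok := instances[id]; ok {
			continue
		}
		inst, err := p.StartInstance(id)
		if err != nil {
			return err
		}
		instances[id] = inst
	}
	for id, inst := range instances {
		if !wanted[id] {
			if err := p.StopInstance(inst); err != nil {
				return err
			}
			delete(instances, id) // deleting during range is safe in Go
		}
	}
	return nil
}

// fakeProvider records calls instead of touching real machines.
type fakeProvider struct {
	started []int
	stopped []string
}

func (f *fakeProvider) StartInstance(id int) (string, error) {
	f.started = append(f.started, id)
	return fmt.Sprintf("i-%d", id), nil
}

func (f *fakeProvider) StopInstance(inst string) error {
	f.stopped = append(f.stopped, inst)
	return nil
}

func main() {
	p := &fakeProvider{}
	// Bootstrap primed machine 0 with the bootstrap node's instance id.
	instances := map[int]string{0: "bootstrap-node", 3: "i-3"}
	wanted := map[int]bool{0: true, 1: true, 2: true}
	if err := reconcile(p, wanted, instances); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.started, p.stopped, instances)
}
```

Priming machine 0 is what lets the same loop run unchanged from the very first pass; without it, the PA would try to provision the machine it is running on.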
[10:09] rog: so to me an api like agentAPI.SignalWork() and agentAPI.SignalEndOfWork() sounds more natural
[10:09] just launch.py
[10:09] ?
[10:09] TheMue: that makes the agentAPI stateful, which it doesn't need to be
[10:09] bigjools, (1) MaaSMachine turns system_id into MaaSMachine.instance_id
[10:09] TheMue: the state can live in the pinger.
[10:09] bigjools, (2) the provider takes instance_ids in some methods
[10:10] rog: the pinger IS stateful, and the pinger IS part of the state api today
[10:10] bigjools, (3) you also need to set one at bootstrap time
[10:10] bigjools, I think that's it
[10:10] TheMue: there are two places that are stateful: the underlying zk tree, and the local pinger state. you'd be adding a third.
[10:10]