[02:41] <niemeyer> davecheney: Proposal in the list
[06:33] <rog> davecheney: mornin'
[06:35] <davecheney> rog: howdy
[07:20] <fwereade_> rog, heyhey
[07:22] <rog> fwereade_: mornin' guv
[07:22] <fwereade_> rog, how's it going?
[07:22] <rog> fwereade_: am finally looking at relation-units-watcher BTW...
[07:22] <fwereade_> rog, cool, if you see any obvious simplifications I will be most pleased
[07:23] <fwereade_> rog, it's an awful lot better now it has watch-presence-children to depend on
[07:23] <rog> fwereade_: the main simplifications would be gained, i believe (although i haven't properly verified), by applying the CL i proposed yesterday
[07:25] <rog> fwereade_: i may be wrong, but all that tomb checking seems unnecessary.
[07:25] <fwereade_> rog, so how do I avoid sending changes after departs?
[07:27] <rog> fwereade_: instead of keeping two maps, one with tombs and one with names, i'd keep a single map with a locally defined relationUnit type.
[07:27] <fwereade_> rog, maintaining state on the loop goroutine feels kinda hacky to me, like saying "we can't make this do the right thing so patch it up later"
[07:27] <rog> fwereade_: then it would be trivial to mark that so that you don't issue any updates after a depart
[07:27] <rog> fwereade_: you're already maintaining state with the tombs
[07:27] <fwereade_> rog, (state to filter the changes, I mean)
[07:28] <rog> fwereade_: the unitLoop code would be much simpler, for one
[07:28] <rog> fwereade_: you'd have a couple of extra lines to maintain the state, but about 30 less (total guess!) lines overall
[07:30] <fwereade_> rog, and, hmm, I don't think I need names at all anyway, do I?
[07:31] <fwereade_> rog, oh bother, yes I do, for removes
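The single-map shape rog is proposing (one map of `relationUnit` values instead of separate name and tomb maps, with a departed mark to filter late updates) might look roughly like this. This is a sketch with hypothetical names, not the actual CL:

```go
// Sketch of the suggestion: one map of relationUnit values, where a
// departed flag guarantees no change is ever sent after a depart.
package main

import "fmt"

type relationUnit struct {
	version  int
	departed bool
}

type unitsWatcher struct {
	units map[string]*relationUnit
}

// update records a settings change; it reports whether the change
// should be delivered (never after the unit has departed).
func (w *unitsWatcher) update(name string, version int) bool {
	u, ok := w.units[name]
	if !ok {
		u = &relationUnit{}
		w.units[name] = u
	}
	if u.departed {
		return false // filter: no updates after a depart
	}
	u.version = version
	return true
}

// depart marks the unit so that later updates are dropped.
func (w *unitsWatcher) depart(name string) {
	if u, ok := w.units[name]; ok {
		u.departed = true
	}
}

func main() {
	w := &unitsWatcher{units: make(map[string]*relationUnit)}
	fmt.Println(w.update("wordpress/0", 1)) // true: change delivered
	w.depart("wordpress/0")
	fmt.Println(w.update("wordpress/0", 2)) // false: filtered after depart
}
```

The departed flag is the "couple of extra lines to maintain the state" rog mentions; the names map survives because removes still need it.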
[07:31] <rog> fwereade_: and the same applies to TheMue's firewaller code. i saw your suggestion to add tombs and i thought, yes you're right, the way things are, we do. but... it's already much bigger than it needs to be, and unnecessary if we just changed our invariants a little
[07:34] <rog> fwereade_: all the code looks very reasonable given our current rules BTW. i'm just thinking that we could make things simpler if we wanted to.
[07:34] <fwereade_> rog, I think making those guarantees makes our lives simpler overall
[07:34] <fwereade_> rog, your mileage clearly varies on this ;)
[07:34] <rog> fwereade_: clearly :-)
[07:35] <rog> fwereade_: how does it make our lives simpler?
[07:35] <rog> fwereade_: given that it's adding lots of code (IMHO obviously)
[07:35] <fwereade_> rog, it's easier to reason about the components when we use them
[07:35] <fwereade_> rog, a Stop that doesn't really stop feels kinda weak to me
[07:36] <rog> fwereade_: if you want a Stop that really stops, it's trivial to do.
[07:36] <rog> fwereade_: you just stop and wait for eof on the channel.
[07:36] <rog> fwereade_: think of it as flushing the pipeline
[07:37] <rog> fwereade_: where the pipeline consists of components like unitLoop
[07:38] <fwereade_> rog, indeed, we can't do it when we have components like unitLoop; we have to add that deadness state
[07:39] <rog> fwereade_: actually, i think we could do it without the deadness state
[07:39] <rog> fwereade_: you *already* do t.Kill then t.Wait
[07:39] <rog> fwereade_: which is the moral equivalent
[07:40] <fwereade_> rog, I thought you were saying the tombs were unnecessary?
[07:40] <rog> fwereade_: they are
[07:40] <fwereade_> rog, and to make that work I need the selects that you don't like
[07:40] <rog> fwereade_: i'm saying that you could close the unit's stop channel, then wait for eof on its update channel.
[07:41] <fwereade_> rog, we have one updates channel for all unit loops
[07:41] <rog> fwereade_: oh yeah, good point :-)
[07:41] <rog> fwereade_: i knew there was a reason
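The stop-and-flush pattern being argued over (a Stop that really stops, which rog calls the moral equivalent of `t.Kill` then `t.Wait`) can be hand-rolled with plain channels. A minimal sketch, assuming a single worker goroutine, not the juju code itself:

```go
// A worker whose Stop really stops: close a stop channel to request
// shutdown, then wait on a done channel until the loop has fully exited.
package main

import "fmt"

type worker struct {
	stop chan struct{} // closed by Stop to request shutdown
	done chan struct{} // closed by the loop once it has really finished
}

func startWorker() *worker {
	w := &worker{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(w.done) // "flushing the pipeline": nothing more will be sent
		<-w.stop            // block until shutdown is requested
	}()
	return w
}

// Stop is the moral equivalent of tomb.Kill followed by tomb.Wait.
func (w *worker) Stop() {
	close(w.stop)
	<-w.done // wait until the loop has actually exited
}

func main() {
	w := startWorker()
	w.Stop()
	fmt.Println("stopped")
}
```

fwereade_'s objection still applies: with one shared updates channel for all unit loops, waiting for EOF per unit is not available, which is why the per-unit deadness state came up.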
[07:42] <fwereade_> rog, as I said I'm really +-0 on the change -- it feels to me like the things it makes easier are balanced by the things it makes harder
[07:43] <fwereade_> rog, either would be a reasonable way to start out but I don't feel like a switch in midstream really gains us much
[07:43] <rog> fwereade_: from my point of view, we are just starting out on the real uses of watchers.
[07:43] <fwereade_> rog, I think I'm just less bothered by all the selects than you are ;)
[07:43] <rog> fwereade_: which is why i'm suggesting the change now
[07:45] <fwereade_> rog, in the end it's niemeyer's call and I don't think he's convinced... but maybe he will be by looking at relationUnitsWatcher :)
[07:45] <fwereade_> rog, I don't feel it's a win in this context but I am probably too close to the current implementation to be entirely objective
[07:46] <rog> fwereade_: i'll try and rephrase relation units watcher to try and show the advantage. i'll probably fail :-)
[07:47] <fwereade_> rog, either way one of us will learn something :)
[07:47] <rog> fwereade_: i suppose it comes from my experience of programming with channels: working with one-way pipelines is much more elegant than two-way comms.
[07:49] <fwereade_> rog, makes sense; I personally feel that a one-shot 1-bit pathway in the second direction gives us some explicit clarity that I find helpful
[07:50] <fwereade_> rog, I may well come to change this view in time :)
[07:54] <rog> fwereade_: here's how i might expect unitLoop to look, BTW: http://paste.ubuntu.com/1096214/
[07:56]  * fwereade_ remains unconvinced :)
[07:57]  * fwereade_ is also going to pop out for some breakfast, he was up late reviewing last night
[07:58] <rog> fwereade_: enjoy. i haven't broken my fast yet...
[07:58] <fwereade_> rog, cheers
[08:18] <Aram> moin.
[09:00] <fwereade_> rog, do you recall there was a discussion somewhere about journalling for workflow state transitions in the UA?
[09:00] <rog> fwereade_: yeah
[09:00] <fwereade_> rog, I was thinking of doing something like that and thought I should probably catch up with the consensus
[09:01] <rog> fwereade_: i made some notes, which ended up in one of the google docs
[09:01] <rog> fwereade_: one mo, i'll find 'em
[09:01] <fwereade_> rog, awesome, tyvm
[09:04] <rog> fwereade_: they'd been removed from that doc, but i found them anyway... http://paste.ubuntu.com/1096304/
[09:04] <fwereade_> rog, cheers
[09:05] <rog> fwereade_: i'd had a few more thoughts further along those lines; i'll see if i can find some more notes
[09:06] <fwereade_> rog, hmm, ok, this is for hook executions specifically?
[09:07] <rog> fwereade_: yes
[09:07] <rog> fwereade_: well, anything that takes any time really
[09:07] <rog> fwereade_: so that we don't miss updates
[09:07]  * fwereade_ ponders
[09:08] <Aram> is the meeting in one hour or two hours?
[09:08]  * Aram is confused about TZ again.
[09:08] <fwereade_> rog, it kinda feels like it should be for state transitions rather than hook executions
[09:08] <fwereade_> Aram, I *think* it's 2h
[09:08] <rog> Aram: 2 hours, i think
[09:09] <Aram> thanks.
[09:09]  * Aram heads for a snack.
[09:09] <rog> fwereade_: perhaps
[09:10] <rog> fwereade_: i wanted to avoid writing all state data to the log file
[09:10] <fwereade_> rog, if you mean the settings of every unit, I agree
[09:11] <fwereade_> rog, but I think in general we do need to keep track of known membership per-relation and latest settings version per-related-unit
[09:13] <rog> fwereade_: my idea was that each thing that can trigger a hook corresponds to a single event from some watched thing, and each watched thing has a version.
[09:13] <fwereade_> rog, not true
[09:13] <fwereade_> rog, install/start don't
[09:13] <fwereade_> rog, config-changed does
[09:14] <rog> fwereade_: install/start i think can be treated as special cases.
[09:14] <fwereade_> rog, more critically, relation unit changes are not 1:1 with hook execution
[09:14] <rog> fwereade_: agreed.
[09:14] <rog> fwereade_: it's 1:N, right?
[09:15] <fwereade_> rog, I think the only quibble is that we always need to do a -changed after a -joined
[09:15] <fwereade_> rog, so in practice IMO that actually means we should always precede a -changed of a hitherto unknown unit with a -joined
[09:15] <rog> fwereade_: that should work ok
[09:15] <fwereade_> rog, yeah, the code seems to want it :)
[09:16] <rog> fwereade_: that was the idea behind "for all outstanding intentions"
[09:16] <fwereade_> rog, ahhh, got you
[09:16] <rog> fwereade_: because you might execute a joined but not its associated changed
[09:17] <fwereade_> rog, except, hmm, consider -broken
[09:17]  * rog considers broken.
[09:17] <fwereade_> rog, that overrides *everything*
[09:17] <fwereade_> rog, clear the queue, just break the relation
[09:17] <rog> fwereade_: remind me of the semantics of broken
[09:17] <fwereade_> rog, called when the unit itself leaves the relation
[09:18] <fwereade_> rog, doesn't have access to any useful state really
[09:18] <fwereade_> rog, similarly, actually, not sure how queue reduction fits in
[09:18] <rog> fwereade_: sorry, i don't remember at all how -broken works.
[09:19] <fwereade_> rog, it's just the hook for "this specific unit is no longer part of the relation"
[09:19] <rog> fwereade_: queue reduction is easy i think
[09:19] <rog> fwereade_: assuming i understand what you mean by that
[09:20] <fwereade_> rog, stuff like "a -changed followed by a -departed" should just be a -departed
[09:20] <rog> fwereade_: the idea is that we only add to intentions when we are just about to execute the hook
[09:20] <rog> fwereade_: so the queue reduction can happen before the intentions make it to disk
[09:21] <fwereade_> rog, ok, cool
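The queue-reduction rule being discussed ("a -changed followed by a -departed should just be a -departed", applied before intentions hit the disk) could be sketched as below. `reduce` and the bare hook names are a hypothetical simplification of the real scheduler's rules:

```go
// Sketch of hook queue reduction for a single relation unit: when a
// departed arrives, any still-pending joined/changed for that unit is
// dropped, since only the departed needs to run.
package main

import "fmt"

func reduce(queue []string) []string {
	var out []string
	for _, hook := range queue {
		if hook == "departed" {
			// Drop pending joined/changed; they are superseded.
			for len(out) > 0 && (out[len(out)-1] == "changed" || out[len(out)-1] == "joined") {
				out = out[:len(out)-1]
			}
		}
		out = append(out, hook)
	}
	return out
}

func main() {
	fmt.Println(reduce([]string{"joined", "changed", "changed", "departed"}))
}
```

Because intentions are only journalled at the point of execution, this reduction can run entirely in memory, which is the point rog makes about it happening "before the intentions make it to disk".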
[09:22] <fwereade_> rog, so it's just for what in python is the HookScheduler, rather than the workflow stuff
[09:22] <rog> fwereade_: the workflow stuff?
[09:22] <rog> fwereade_: the stuff that works out what hooks to execute given what state changes?
[09:23] <fwereade_> rog, keeping track of the states of the unit and its relations
[09:23] <fwereade_> rog, more or less yeah
[09:25] <rog> fwereade_: i'm imagining that the workflow could be a goroutine pipeline component
[09:25] <rog> fwereade_: which takes in any changes and spits out intentions
[09:26] <fwereade_> rog, yeah, I'm having leanings in that direction but not quite sure how it will all fit together
[09:26] <rog> fwereade_: but i'm not entirely sure how that would work, it's just an initial gut feeling
[09:26] <fwereade_> rog, it needs to store its own state in ZK for sure
[09:26] <rog> fwereade_: what state does it need to store?
[09:27] <fwereade_> rog, er, its state -- charm_upgrade_error, or installed, or running, or whatever
[09:27] <fwereade_> rog, so status can see it
[09:28] <fwereade_> rog, I think state maintenance and actual hook execution are distinct problems
[09:28] <rog> fwereade_: i'm wondering if it would be better that the central loop did that
[09:28] <rog> fwereade_: i see hook execution and state maintenance as two sides of the same coin
[09:29] <fwereade_> rog, no question that they're intimately connected
[09:29] <fwereade_> TheMue, morning
[09:29] <rog> TheMue: hiya
[09:29] <TheMue> rog, fwereade_ : Morning
[09:30] <rog> fwereade_: thing is, some of the state maintenance transitions come from the results of hook executions, i think
[09:31] <fwereade_> rog, some but not all
[09:31] <rog> fwereade_: which seems to argue in favour of having the central hook-execution loop responsible for all of them, that way there's only one thing in charge of the state.
[09:32] <fwereade_> rog, I don't think that's right... hook execution is a single and relatively simple responsibility
[09:33] <fwereade_> rog, having the single hook executor goroutine responsible for keeping the separate workflows of the unit and all its relations up to date, especially when not all workflow transitions necessarily correspond to hook executions, sounds like a nightmare
[09:34] <fwereade_> rog, *some* external state changes imply workflow state transitions, *some* of which imply hook executions
[09:34] <rog> fwereade_: hmm, it seems like there's a two way flow, which i hadn't appreciated before. i.e. a different sequence of hooks will be executed depending on the results of previous hooks
[09:35] <fwereade_> rog, yeah, for example if we're in an error state we won't be worrying about any other transitions that we otherwise might be expected to pay attention to
[09:35] <rog> i'd been imagining: {external state changes -> workflow state transitions -> hook executions} as a one-way pipeline flow
[09:36] <fwereade_> rog, I *think* that still applies, what did I miss?
[09:36] <rog> fwereade_: but perhaps an error state just implies throwing all workflow state transitions away, which would be easy
[09:37] <fwereade_> rog, I *think* it means treating that workflow as though the UA was not executing at all
[09:37] <rog> fwereade_: right
[09:37] <fwereade_> rog, and then when it's active again doing a big-bang diff against the last known state when the workflow was sane
[09:37] <fwereade_> rog, and executing whatever changes come from that
[09:38] <rog> fwereade_: "having the single hook executor goroutine responsible for keeping ...". that wasn't my intention
[09:38] <fwereade_> rog, ah, sorry, I guess I'm just not clear what the intentions correspond to if not hook executions
[09:38] <rog> fwereade_: i'd thought {external state changes -> workflow state transitions -> (hook executions + state maintenance)}
[09:41] <fwereade_> rog, I just can't yet figure out how many places we actively need journalling, and of what sort
[09:41] <fwereade_> rog, I'm not sure the hook *executor* needs it at all but I haven't yet even convinced myself
[09:42] <rog> fwereade_: the question is: if we crash half way through executing a hook, how do we know that we need to execute it again when restarting?
[09:44] <rog> fwereade_: actually perhaps the pipe line could be extended: {external state changes -> workflow state transitions -> hook executions -> state maintenance}
[09:48] <fwereade_> rog, I need to refresh my memory with the python it seems :)
[09:49] <rog> fwereade_: because AFAICS all state maintenance is done as a result of hook executing (and charm upgrade, presumably, which i think could be considered similar)
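The one-way flow rog keeps coming back to, {external state changes -> workflow state transitions -> hook executions}, can be sketched as chained goroutine stages. Stage names and the string event type are hypothetical, a sketch of the shape rather than a design:

```go
// A one-way goroutine pipeline: each stage takes in events on one
// channel and spits out derived events on another, with no back-channel.
package main

import "fmt"

// transitions turns external state changes into workflow transitions.
func transitions(changes <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // closing propagates shutdown downstream
		for c := range changes {
			out <- "transition(" + c + ")"
		}
	}()
	return out
}

// hooks turns workflow transitions into hook executions.
func hooks(trans <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for t := range trans {
			out <- "run-hook for " + t
		}
	}()
	return out
}

func main() {
	changes := make(chan string, 1)
	changes <- "config-changed"
	close(changes)
	for h := range hooks(transitions(changes)) {
		fmt.Println(h)
	}
}
```

The unresolved question in the conversation is exactly where this breaks down: hook errors feed back into which transitions happen next, so the flow is not purely one-way unless an error is treated as tearing the whole pipeline down.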
[09:53] <fwereade_> rog, ok, a given workflow transition is associated with N lifecycle operations which may or may not execute hooks, any of which may in general cause an error and thereby a different workflow transition and set of lifecycle operations
[09:54] <rog> fwereade_: does an error ever in fact cause anything more than the cessation of lifecycle operations?
[09:54] <fwereade_> rog, it usually leads to a new state transition IIRC
[09:54] <fwereade_> workflow transition
[09:55] <rog> fwereade_: i'm not sure that answers my question
[09:56] <fwereade_> rog, well, it leads to a different workflow transition than the one we expected
[09:56] <fwereade_> rog, but in general all the workflow is responsible for is telling the lifecycle to do stuff and writing new state when it has done so
[09:57] <fwereade_> rog, so the answer kinda depends on your perspective
[09:57] <rog> fwereade_: that's the way the python code is done, yeah, but i'm wondering if we need that generality
[09:58] <rog> fwereade_: because that generality is the thing that means that it's not a one-way pipeline
[09:59] <fwereade_> rog, I think it's reasonable to suggest that the way the workflow and lifecycle can affect one another is confusing and could probably be done better
[09:59] <rog> fwereade_: and i'm wondering if the error state is sufficiently special that we don't mind hard-coding that
[09:59] <fwereade_> rog, we have a bunch of error states specific to what transition failed, and how we recover from them differs
[10:00] <fwereade_> rog, I'm not certain this is an essential property of the system though
[10:00] <rog> fwereade_: interesting
[10:00] <rog> fwereade_: what sort of recovery are we talking about here?
[10:01] <rog> fwereade_: i've got an idea
[10:01] <fwereade_> rog, what hook to (maybe, depending on resolved mode) run again
[10:01]  * fwereade_ listens
[10:02] <rog> fwereade_: when we get an error, we treat that as a break in the pipeline, so we tear down the workflow goroutines, do whatever we need to get the show back on the road, then start it up again
[10:03] <fwereade_> rog, yeah, I think so
[10:03] <rog> fwereade_: that means that we've still got a nice clean one way flow, apart from termination, which is an abnormal condition, and therefore requires exceptional handling
[10:04] <fwereade_> rog, from a relation unit's POV, recovering from an error state should be exactly the same as dealing with process restart
[10:04] <rog> fwereade_: interesting
[10:05] <fwereade_> rog, so we never watch anything except resolved when we're in an error state
[10:05] <rog> fwereade_: and then restart everything
[10:05] <fwereade_> rog, (these are tentative statements of opinion rather than fact)
[10:05] <fwereade_> rog, but it feels like it might be a good way to go
[10:05] <rog> fwereade_: i'd taken that for granted :-)
[10:07] <fwereade_> rog, I suspect if anything the critical thing is to break up the lifecycle so that bits-that-affect-workflow are separate from bits-affected-by-workflow
[10:12] <davecheney> evening lads
[10:14] <rog> fwereade_: definitely
[10:14] <rog> davecheney: hiya
[10:14] <fwereade_> davecheney, heyhey
[10:15] <davecheney> miss anything good ?
[10:15] <rog> davecheney: a wee discussion about the unit agent stuff
[10:15] <davecheney> rog: this channel is logged somewhere right ?
[10:15] <davecheney> anyone know the url ?
[10:15] <rog> davecheney: one mo
[10:16] <rog> davecheney: i just google for "ubuntu irc logs"
[10:16] <rog> davecheney: http://irclogs.ubuntu.com/2012/07/17/%23juju-dev.txt
[10:16] <davecheney> lmgtfy
[10:16] <davecheney> noice
[10:16] <rog> davecheney: sadly it's updated very slowly
[10:17] <davecheney> probably monitoring a tonne of channels
[10:17] <TheMue> davecheney: Heya
[10:17] <rog> davecheney: in this case it's only about 12 minutes out of date
[10:18] <TheMue> davecheney: I added http://irclogs.ubuntu.com/ to my bookmarks, so the access is easy.
[10:18] <rog> fwereade_: i'm sure this won't convince you, but for the record: https://codereview.appspot.com/6399052/diff/2001/state/watcher.go
[10:19] <rog> TheMue: yeah, but you still have to manually navigate to the right date and the right channel, right? :-)
[10:20] <fwereade_> rog, err and stop and stopped kinda feel like an ad-hoc tomb to me ;)
[10:20] <TheMue> rog: To the one I want to have, yes. Multiple access paths (e.g. by channel) and search would be nice.
[10:21] <TheMue> rog, fwereade_: Btw, thx for your reviews.
[10:21] <fwereade_> TheMue, yw, hope they're useful
[10:21] <rog> fwereade_: no more than: var err error; ... if err != nil { return } is an ad hoc tomb, IMHO...
[10:22] <TheMue> rog, fwereade_ : I'll move all watchers into the loop and maybe used a shared environ from the provisioner.
[10:22] <rog> fwereade_: what do you mean by "move all watchers into the loop"?
[10:22] <fwereade_> rog, aren't they pretty much giving us all that tomb does, which just returning doesn't?
[10:23] <fwereade_> TheMue, and I'm not sure the provisioner should actually be responsible for the environ anyway
[10:23] <fwereade_> TheMue, feels to me like the agent process should get a State and an Environ, and be looking after the changes to the Environ, and provisioner/firewaller should both just be dumbly using the environ
[10:24] <rog> fwereade_: +1, i think
[10:24] <fwereade_> davecheney, thoughts? ^^
[10:25] <rog> fwereade_: because we can just call SetConfig (or whatever it's called) on the environ
[10:25] <TheMue> fwereade_: Sounds fine too
[10:25] <fwereade_> rog, that's the idea
[10:25] <fwereade_> rog, I *think* it should be safe
[10:25] <fwereade_> rog, AIUI it's meant to be ;)
[10:25] <rog> fwereade_: i think it was designed to be
[10:25] <davecheney> fwereade_: yes, the provisioniner needs to be responsible for the environ
[10:25] <davecheney> that was an explicit request from gustavo
[10:25] <fwereade_> davecheney, hmm, I'm saying it shouldn't
[10:26] <davecheney> fwereade_: i think you have a good case
[10:26] <fwereade_> davecheney, or rather, I'm saying the agent should be but the worker should not
[10:26] <davecheney> but that was an extended and painful review process
[10:26] <davecheney> I would be loathe to change it without guidence
[10:26] <fwereade_> davecheney, heh :)
[10:26] <fwereade_> davecheney, I know the feeling :)
[10:28] <davecheney> fwereade_: the reason the firewaller, in my proposals, maintains an independent connection to the environ and the state
[10:28] <davecheney> was based on a conversation at UDS-q
[10:28] <davecheney> where it was mentioned that the firewaller service may not live with the provisioner
[10:28] <davecheney> it only cohabits currently for convenient access to the secrets
[10:28] <fwereade_> davecheney, hmm, interesting
[10:29] <fwereade_> davecheney, not sure that's a reason for them to cohabit, is it?
[10:29] <fwereade_> davecheney, it's not like *anything* is restricted from looking at the secrets ;)
[10:30] <davecheney> fwereade_: indeed
[10:31] <davecheney> so with that in mind the provisioner, machiner et al, all operate independently
[10:33] <fwereade_> davecheney, STM like the agent processes should be responsible for setting up the bits their workers need, and then firing off the workers, but I guess we could make it work either way without duplication
[10:33] <fwereade_> davecheney, it's the duplication in firewaller/provisioner that currently bugs me
[10:34] <fwereade_> davecheney, and the fact that watching the environs complicates each of their main loops
[10:34] <TheMue> fwereade_: Yep, see your point, sounds reasonable to me.
[10:35] <rog> fwereade_: i think i agree. i don't see why the environment setting needs to be in the provisioner's main loop.
[10:35] <TheMue> *: So in general, whoever uses a worker is responsible to pass an environment.
[10:35] <fwereade_> rog, TheMue: cool, cheers
[10:36] <fwereade_> TheMue, yeah, I think so, and is also responsible for keeping the Environ up to date, because any worker ought to be able to just deal with it
[10:36] <davecheney> fwereade_: i'm surprised you're more worried about the environ than the state
[10:36] <rog> fwereade_: it means, i think, that when the PA starts, it would do three things: start the environment settings agent and wait for an environ; start the provisioner and the firewaller with the environment thus obtained
[10:36] <davecheney> as in, duplicates in process
[10:36] <fwereade_> davecheney, I'm worried about both :)
[10:36] <fwereade_> davecheney, I have comments on separate reviews whining about each of those things ;)
[10:36] <davecheney> fwereade_: good, just checking
[10:37] <fwereade_> rog, yeah, think so
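rog's three-step PA start-up (start the environment settings watcher, wait for a valid environ, then hand it to both workers) could be sketched like this; all names here are hypothetical stand-ins for the real agent code:

```go
// Sketch of the PA start-up order: obtain the Environ once, centrally,
// then pass it to the provisioner and firewaller, which just use it.
package main

import "fmt"

type Environ struct{ name string }

// waitForEnviron stands in for the settings watcher that blocks until
// a valid environment config has been seen.
func waitForEnviron() *Environ { return &Environ{"sample"} }

func startProvisioner(e *Environ) { fmt.Println("provisioner using", e.name) }
func startFirewaller(e *Environ)  { fmt.Println("firewaller using", e.name) }

func main() {
	env := waitForEnviron() // 1. wait for a usable environ
	startProvisioner(env)   // 2. workers share it, dumbly
	startFirewaller(env)
}
```

The agent then stays responsible for calling SetConfig on that shared environ as env settings change, so neither worker's main loop has to watch the environment itself.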
[10:56] <davecheney> so, who has the hangout invite ?
[11:03] <niemeyer> mramm: I sent an invite a few minutes ago already
[11:03] <mramm> https://plus.google.com/hangouts/_/79b690ed61262ab9e0bbaa673cd718b151275733
[11:03] <mramm> ahh
[11:03] <mramm> sorry
[11:03] <Aram> you did?
[11:03] <mramm> I'll join that one
[11:04] <niemeyer> Aram: Yep
[11:04] <Aram> damn flash
[11:07] <Aram> meh
[11:07] <Aram> wtf
[11:07] <Aram> A connection error occurred while loading this page. Please try refreshing the page.
[11:14] <niemeyer> https://bugs.launchpad.net/juju-core/+bug/1022954
[11:18] <niemeyer> "un-in-progressing"
[11:18] <niemeyer> fwereade's new verb
[11:45] <Aram> milestones++
[11:59] <fwereade_> davecheney, were my reviews clearish?
[12:00]  * fwereade_ tries to remember what they were, all I can currently remember is that there were things I wasn't sure I'd said effectively
[12:01] <fwereade_> davecheney, niemeyer: ah, yeah, that was it
[12:01] <fwereade_> davecheney, niemeyer: doing authorized-keys "properly", ie responding to env-sets, feels to me like it's the MA's job
[12:01] <fwereade_> davecheney, niemeyer: sane?
[12:02] <niemeyer> fwereade_: Does sound sane to me
[12:02] <fwereade_> davecheney, niemeyer: this is something we don't handle at all in the python but feels like it's necessary for env-set authorized-keys=BLAH to be useful
[12:03] <davecheney> fwereade_: sorry, i'm unclear what you are talking about ?
[12:03] <fwereade_> davecheney, https://codereview.appspot.com/6405046/diff/2001/environs/ec2/cloudinit.go#oldcode116
[12:03] <niemeyer> fwereade_: Agreed
[12:03] <niemeyer> fwereade_: But it sounds like a bug we can file and postpone
[12:03] <davecheney> fwereade_: right, i'm with you now
[12:04] <niemeyer> fwereade_: After all, it's not handled at all in Python :-)
[12:04] <davecheney> as per my reply, it's important, but those comments should go somewhere else
[12:04] <fwereade_> niemeyer, yeah, I'm not suggesting it's *high* priority, but I think it's something without which we can't reasonably call env-set "done"
[12:04] <niemeyer> fwereade_: +1
[12:04] <fwereade_> davecheney, sorry, I saw no replies
[12:04] <davecheney> fwereade_: with your permission i'll copy that text into a bug for the backlog
[12:04] <davecheney> is that ok ?
[12:04]  * fwereade_ goes and pokes at his email client
[12:05] <fwereade_> davecheney, that's great, thank you
[12:05] <fwereade_> davecheney, it's a point I was raising as I thought of it rather than a request for immediate fixing
[12:05] <davecheney> fwereade_: i have a very broad view of how bug trackers work
[12:05] <davecheney> to me, they are simply places where you put things you don't want to lose
[12:05] <davecheney> the rest is just semantics
[12:06] <fwereade_> davecheney, yeah, makes sense
[12:07] <davecheney> OMG, i just found a bug in LP; if you alter the field on the show bugs screen
[12:07] <davecheney> the report bug link stops working
[12:08] <davecheney> fwereade_: done, bug raised
[12:09] <davecheney> fwereade_: if I can also draw your attention to http://codereview.appspot.com/6408047/#msg5, which may have been lost in the ether
[12:09] <davecheney> and http://codereview.appspot.com/6408047/#msg4
[12:09] <fwereade_> davecheney, was just coming to that
[12:10] <fwereade_> davecheney, I'm still -1
[12:10] <fwereade_> davecheney, I really think that a half-initialized machine is  a Bad Thing
[12:10] <niemeyer> fwereade_: I've replied to all your comments on the config branch.. can you please have a look and see what is sensible and what is not?
[12:10] <niemeyer> fwereade_: I'll step out for breakfast and will come back to act on it
[12:11] <fwereade_> davecheney, if initzk falls over half way through but everybody else keeps going we will get confused
[12:11] <niemeyer> fwereade_: Will ping you before starting in case you'd like to discuss
[12:11] <davecheney> fwereade_: yup, cloud init is not checking the return status of any of the commands
[12:11] <fwereade_> davecheney, if we have a machine but the instance-id is not set, the PA will go ahead and prvision it
[12:11] <davecheney> witness the hilarity of the ec2 apt mirror snafu a few weeks ago
[12:11] <fwereade_> davecheney, so I really think it should go inside Initialize
[12:12] <fwereade_> davecheney, I *also*think Initialize should not *require* it
[12:12] <fwereade_> davecheney, otherwise all our tests will break, because there will be 1 machine where there were once 0
[12:12] <davecheney> fwereade_: I have to call it a night, i'm already in the doghouse
[12:12] <fwereade_> davecheney, sorry :)
[12:12] <davecheney> could you please raise this as a bug
[12:12] <fwereade_> I still think it's a blocker on the CL
[12:13] <davecheney> yup, assign it to me
[12:13] <fwereade_> but I'll raise the other as a bug
[12:13] <davecheney> i'll do it tomorrow
[12:13] <fwereade_> niemeyer, looking now
[12:13] <fwereade_> davecheney, cheers, awesome
[12:13] <davecheney> night all
[12:52] <niemeyer> fwereade_: if path != "" || keys == "" { ... what happens when both path and keys are empty?
[12:53] <niemeyer> fwereade_: So, tell me.. what happens? :-)
[12:55] <rog> lunch
[12:56] <fwereade_> niemeyer, doh :(
[12:56] <fwereade_> niemeyer, but shouldn't there be a "whoa, you specified both" error path?
[12:57] <niemeyer> fwereade_: I'm happy to define that when both are provided, the path takes over.. or to concatenate them
[12:57] <niemeyer> fwereade_: This is addressing the point you brought up yesterday related to set-env
[12:57] <fwereade_> niemeyer, I'm easy tbh
[12:58] <fwereade_> niemeyer, I think I have overthought that particular functionality :/
[12:58] <fwereade_> niemeyer, my perspective is as likely to be warped as it is to be correct
[12:59] <niemeyer> fwereade_: I understand.. IMO it's fine to have a well defined behavior that makes the implementation simpler
[13:00] <fwereade_> niemeyer, SVGTM
[13:00] <niemeyer> fwereade_: I suspect concatenating them is the least-surprising behavior
[13:01] <fwereade_> niemeyer, agreed in the small, feels maybe tricky to do the Right Thing on env-set
[13:01] <niemeyer> fwereade_: Ah, yes
[13:01] <niemeyer> fwereade_: The current behavior handles that properly I think
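The behavior settled on here (concatenate when both are set, and avoid the original condition's silent fall-through when both are empty) could take a shape like the following. `authorizedKeys` and its `readFile` parameter are hypothetical, not the actual environs config code:

```go
// Sketch of authorized-keys resolution: literal keys and a key file
// path may both be given; concatenate them, and treat neither-set as
// an explicit error rather than reading an empty path.
package main

import "fmt"

func authorizedKeys(path, keys string, readFile func(string) (string, error)) (string, error) {
	if path == "" && keys == "" {
		return "", fmt.Errorf("no authorized-keys or authorized-keys-path set")
	}
	if path != "" {
		fromFile, err := readFile(path)
		if err != nil {
			return "", err
		}
		if keys != "" {
			keys += "\n"
		}
		keys += fromFile // least-surprising behavior: concatenate
	}
	return keys, nil
}

func main() {
	read := func(p string) (string, error) { return "ssh-rsa AAA... (" + p + ")", nil }
	k, err := authorizedKeys("id_rsa.pub", "ssh-rsa BBB...", read)
	fmt.Println(k, err)
}
```

With this shape, `env-set authorized-keys=...` later only has to deal with the already-resolved concatenated value.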
[13:48] <TheMue> fwereade_: Do you have a quick link into your relation-unit branches? I would like to see how you do it with tombs in the sub-goroutines.
[13:49] <fwereade_> TheMue, https://codereview.appspot.com/6405044/
[13:49] <TheMue> fwereade_: Thx
[13:49] <fwereade_> TheMue, given that you're using little types for the sub-goroutines, the tombs would probably be better placed on the types
[13:50] <fwereade_> TheMue, but I *think* the general idea is sound
[13:50] <TheMue> fwereade_: I'll place them in those types, yes.
[13:52] <TheMue> fwereade_: Your deferred finish() is neat.
[13:53] <fwereade_> TheMue, cheers :)
[14:07] <niemeyer> rog: LGTM on https://codereview.appspot.com/6344113/
[14:09] <rog> niemeyer: thanks a lot
[14:09] <rog> niemeyer: we could return ports sorted, but i don't really see the point, as only the testing code is concerned about that, i think.
[14:09] <niemeyer> rog: Thank you!
[14:09] <niemeyer> rog: I'd be glad to see ports sorted whenever looking at that information, personally
[14:12] <rog> niemeyer: if we do that, we'll want to implement something like state.PortSlice, otherwise every provider will need to do its own sort.Interface implementation for ports.
[14:12] <rog> niemeyer: or state.SortPorts
[14:12] <rog> i suppose
[14:12] <rog> niemeyer: i'd be ok doing that if you'd like
[14:13] <rog> niemeyer: state.SortPorts, that is
[14:13] <niemeyer> rog: +1
[14:13] <niemeyer> rog: We already have the implementation anyway
[14:13] <rog> niemeyer: sure. will do.
[14:13] <niemeyer> rog: Thanks!
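A `state.SortPorts` helper of the kind agreed on here, so every provider need not write its own `sort.Interface` implementation, could look like this hypothetical sketch (using the modern `sort.Slice` convenience):

```go
// Sketch of a SortPorts helper: order ports by protocol, then number,
// so callers and tests see a deterministic ordering.
package main

import (
	"fmt"
	"sort"
)

type Port struct {
	Protocol string
	Number   int
}

func SortPorts(ports []Port) {
	sort.Slice(ports, func(i, j int) bool {
		p, q := ports[i], ports[j]
		if p.Protocol != q.Protocol {
			return p.Protocol < q.Protocol
		}
		return p.Number < q.Number
	})
}

func main() {
	ports := []Port{{"udp", 53}, {"tcp", 80}, {"tcp", 22}}
	SortPorts(ports)
	fmt.Println(ports) // [{tcp 22} {tcp 80} {udp 53}]
}
```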
[14:14] <niemeyer> fwereade_: https://codereview.appspot.com/6354045/ is still deleting .lbox
[14:16] <fwereade_> niemeyer, gaah, I guess I forgot that :/
[14:19] <niemeyer> fwereade_: Would you have a moment to talk about the logic in there?
[14:19] <fwereade_> niemeyer, ofc
[14:19] <niemeyer> fwereade_: Looking at line 509 on presence.go
[14:19] <fwereade_> niemeyer, here?
[14:19] <fwereade_> niemeyer, yep
[14:19] <niemeyer> fwereade_: It strikes me as odd that we're firing another watch before we even cared to observe what the previous watch said
[14:20] <niemeyer> fwereade_: I may just be missing the underlying logic, though
[14:20] <fwereade_> niemeyer, the underlying idea is that AliveW returns a bool and a chan that sends the other bool
[14:21] <fwereade_> niemeyer, if the bool we get out of the new AliveW is not the same as what was fired by the original watch, state has changed again since the original watch fired
[14:21] <niemeyer> fwereade_: Well, if you look a third time, it may have changed again.. we may do that ad infinitum
[14:22] <fwereade_> niemeyer, but I looked and got a watch which should be guaranteed to fire next time it changes
[14:22] <fwereade_> niemeyer, the point is that if it differs, the latest known state is no different from the last one we notified of
[14:23] <fwereade_> niemeyer, and therefore we should not send a spurious "hey dude, still alive (or dead)" event, and should just start again with the new watch
[14:23] <niemeyer> fwereade_: Sorry, I still don't get it
[14:23] <niemeyer> fwereade_: Why are we not doing that a third time, just in case?
[14:23] <niemeyer> fwereade_: (it's not a tricky question.. it would help me understand)
[14:24] <fwereade_> niemeyer, because the watch is guaranteed to fire when the value we get out changes
[14:24] <niemeyer> fwereade_: Right.. that's true for the first watch too, right?
[14:24] <fwereade_> niemeyer, the first watch has fired
[14:24] <fwereade_> niemeyer, we know something has changed
[14:24] <niemeyer> fwereade_: Exactly
[14:24] <fwereade_> niemeyer, but it's just like using ZK normally
[14:25] <niemeyer> fwereade_: We were just told something has changed.. why are we asking again?
[14:25] <fwereade_> niemeyer, because it might have changed any number of times again between our being notified that something changed and our starting a new watch
[14:25] <fwereade_> niemeyer, we need a new watch, right?
[14:25] <niemeyer> fwereade_: Yes, and it may have changed any number of times again after the second watch
[14:26] <fwereade_> niemeyer, the point of the watch is that if it has we'll immediately get notified next time through the loop
[14:26] <niemeyer> fwereade_: Ah, I see
[14:26] <niemeyer> fwereade_: So your point is that we need this watch anyway, so we may as well do it ahead of time and verify the second result already
[14:27] <niemeyer> fwereade_: Right?
[14:28] <fwereade_> niemeyer, yes, we need the watch; and the important current value is the one that comes out when we get the watch; and *if* that value differs from the one we got out of the first watch, the state has changed an odd number of times in the interim
[14:28] <fwereade_> niemeyer, and therefore the latest state is the same as the last one we notified the client of
[14:28] <fwereade_> niemeyer, and therefore we should not notify them
[14:28] <niemeyer> fwereade_: Understood, makes sense, thanks for explaining
[14:28] <fwereade_> niemeyer, a pleasure
[14:28] <fwereade_> niemeyer, I imagine it could use an extra comment or two ;)
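The filtering logic discussed above can be sketched as below. Names here are stand-ins, not the real presence API: `latest` models re-calling `AliveW` at the moment a fresh watch is taken, and `in` carries the raw watch firings. The key property is the one fwereade_ describes: if the state flipped and flipped back between a firing and the re-check, the client already holds the latest value, so no event is sent.

```go
package main

import "fmt"

// filterAlive forwards only genuine liveness changes. in carries raw
// watch firings; latest stands in for re-calling AliveW, returning
// the value current at the moment the fresh watch is taken. An event
// reaches out only when that value differs from the last one sent.
func filterAlive(initial bool, in <-chan bool, latest func() bool, out chan<- bool) {
	alive := initial
	out <- alive
	for range in {
		// The old watch fired; in the real code a fresh watch is
		// taken here, and l is the value returned alongside it.
		l := latest()
		if l != alive {
			alive = l
			out <- alive
		}
	}
	close(out)
}

func main() {
	// Scripted re-check values: the first firing finds the node dead;
	// the second firing reported it alive, but it died again before
	// the re-check, so no event should be emitted for it.
	rechecks := []bool{false, false}
	i := 0
	latest := func() bool { v := rechecks[i]; i++; return v }

	in := make(chan bool, 2)
	in <- false
	in <- true
	close(in)

	out := make(chan bool, 4)
	filterAlive(true, in, latest, out)
	for v := range out {
		fmt.Println(v)
	}
	// Output:
	// true
	// false
}
```

Because liveness is a boolean, an odd number of flips between the firing and the re-check nets out to "no change", which is exactly what the comparison against the last-notified value captures.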
[14:29] <niemeyer> fwereade_: Yeah, I'll suggest that
[14:43] <fwereade_> niemeyer, I'm pretty sure I readded the lbox as it should have been but I'm getting "error: Failed to send patch set to codereview: can't upload base of .lbox: ERROR: Checksum mismatch." -- once I have your review I think I will just clone it onto a fresh branch and link it to the original for continuity of discussion
[14:44] <niemeyer> fwereade_: I'd prefer to fix it instead, if we manage to
[14:44] <niemeyer> fwereade_: Every time we recreate a CL we lose all the context for the previous comments
[14:44] <niemeyer> fwereade_: and I can't review the delta anymore
[14:45] <niemeyer> fwereade_: There's probably a bug in lbox
[14:45] <niemeyer> fwereade_: If needed, let's just remove and readd it
[14:47] <fwereade_> niemeyer, that's what I thought I'd done
[14:47] <fwereade_> niemeyer, codereview now seems to think that there are two .lbox files, one of which was deleted and another of which was added
[14:49] <niemeyer> fwereade_: Can you uncommit the re-add?
[14:49] <niemeyer> fwereade_: Or is it not tip anymore?
[14:50] <fwereade_> niemeyer, done
[14:50] <niemeyer> fwereade_: Ok, you'll probably have to force-push now
[14:50] <niemeyer> fwereade_: Since the revision is already up
[14:50] <niemeyer> fwereade_: Try "bzr push" just to confirm
[14:51] <niemeyer> fwereade_: If that doesn't work, just make sure the branch URL is right (and not trunk!) and then do bzr push --overwrite
[14:53] <fwereade_> niemeyer, ok, yep, needed to --overwrite
[15:12] <niemeyer> fwereade_: You have a review
[15:12] <niemeyer> fwereade_: It's really just cosmetic stuff, except for the fact the pinger has changed since the last review, and it's not clear why
[15:14] <niemeyer> Lunch time.. biab
[15:40] <rog> niemeyer: quick once-over before i submit? https://codereview.appspot.com/6344113
[15:41] <niemeyer> rog: Awesome, thank you!
[15:41] <rog> niemeyer: i take it that LGTY
[15:42] <niemeyer> rog: Definitely, thanks
[15:43] <TheMue> rog, fwereade_ : the firewaller's first part is at https://codereview.appspot.com/6374069 in again, will continue this way with the second part.
[15:43] <TheMue> rog, fwereade_ : looks simpler now, and uses tombs for each machine.
[15:48] <rog> TheMue: what happens to a machine units watcher when the machine it's watching is removed?
[15:50] <rog> TheMue: i'm concerned that we might receive an error from that watcher before we see the machine removal, and thus cause the whole thing to fall over
[15:54] <TheMue> rog: that's in the next branch. I think I'll add there a tomb and a finish() for all tombs too. but I'm not yet exactly sure.
[15:55] <rog> TheMue: i'm concerned that something that i'd envisaged as about 6 lines of code has expanded to 50 lines. i don't think it should have to be this complex, but i don't quite see how to avoid it with our current conventions.
[15:55] <rog> lc
[15:58] <TheMue> rog: yeah, it's getting bigger again. *sigh*
[15:58] <TheMue> rog: I would like to only have one range loop for machine and co too.
[15:59] <TheMue> rog: but we also have to react on the firewall, on depending goroutines etc.
[15:59] <rog> TheMue: i don't see a problem with doing that
[16:00] <TheMue> rog: I'm listening
[16:01] <TheMue> rog: btw, the code is now smaller than the one you lgtm'ed yesterday
[16:02] <TheMue> rog: only the tomb is new, so we really know when the machine instance has ended working
[16:03] <rog> TheMue: yesterday's code was 144 lines. today's is 165. how is that smaller?
[16:05] <rog> TheMue: we already know when the machine instance has been removed - we can just flag it as dead and ignore it if it sends any events after we've killed it.
[16:05] <rog> TheMue: that would only take about 4 lines of code.
[16:05] <rog> TheMue: but i know fwereade_ has opinions about this too :-)
[16:06] <TheMue> rog: the additional lines are two helpers, a constructor, a forgotten statement and comments for types and methods.
[16:07] <fwereade_> rog, I'm still looking :)
[16:08] <TheMue> rog: could you show me your 4 lines? last time they contained much pseudo code and error testing had to be added
[16:09] <rog> TheMue: http://paste.ubuntu.com/1096832/
[16:09] <rog> TheMue: no error checking is needed in that loop, i think.
[16:10] <rog> TheMue: (it can be done by the central loop when it receives the nil change notification)
[16:10] <fwereade_> rog, I think I'm happier with more lines of code but clearer boundaries
[16:11] <rog> fwereade_: personally, i find the logic with the myriad of tombs and watchers quite hard to follow
[16:12] <rog> fwereade_: i like to keep concurrent code as minimal as possible
[16:12] <TheMue> rog: and when the firewaller dies or stops, or a machineUnitsWatcher?
[16:12] <rog> fwereade_: as it's quite easy to get wrong and hard to test fully
[16:13] <rog> TheMue: if the firewaller stops, it kills the machineUnitsWatcher which gives EOF on its changes channel, which propagates through the machine loop and back to the firewaller which exits when all its sub-watchers have terminated.
[16:14] <niemeyer> TheMue: Sent comments
[16:14] <rog> TheMue: if a machineUnitsWatcher dies, it gives EOF as above.
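The propagation rog describes — a closed changes channel as "EOF", a final nil forwarded to the central loop, and dead machines flagged so late events are ignored — can be sketched as below. The `change` type and channel shapes are illustrative assumptions, not the real firewaller types.

```go
package main

import "fmt"

// change is a stand-in for a watcher event; forwarding a nil *change
// to the central loop is the "EOF" marker meaning this machine's
// watcher has terminated.
type change struct{ machine string }

// machineLoop forwards one machine watcher's events to the central
// loop. When the watcher's channel closes it sends a final nil, so
// any error handling (interrogating or stopping the watcher) can live
// in the central loop rather than in this goroutine.
func machineLoop(watcherChanges <-chan *change, central chan<- *change) {
	for c := range watcherChanges {
		central <- c
	}
	central <- nil
}

func main() {
	w := make(chan *change, 2)
	central := make(chan *change, 3)
	w <- &change{"machine-0"}
	w <- &change{"machine-0"}
	close(w) // the watcher stopped or died

	machineLoop(w, central)

	dead := map[string]bool{}
	for i := 0; i < 3; i++ {
		c := <-central
		switch {
		case c == nil:
			fmt.Println("watcher terminated")
		case dead[c.machine]:
			// Late event from a machine already flagged dead: ignore.
		default:
			fmt.Println("change on", c.machine)
		}
	}
}
```

As niemeyer notes, the cost of this shape is that watcher errors are not carried by the nil itself; either the central loop interrogates and stops each watcher, or an error field is added to the change type.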
[16:14] <TheMue> niemeyer: thx
[16:15] <niemeyer> rog: This looks sensible.. the only disadvantage is that any errors on the watcher are lost
[16:15] <niemeyer> rog: It could be fixed by adding an err field to the Change though
[16:15] <niemeyer> rog: Or someone else has to Stop() the watcher
[16:16] <rog> niemeyer: you're talking about the above branch, presumably?
[16:16] <niemeyer> rog: I'm talking about the above paste
[16:17] <rog> niemeyer: ah, thanks
[16:17] <niemeyer> rog: I think the logic in there works correctly, though
[16:17] <niemeyer> rog: It'd probably be wise to move forward with it and let that kind of simplification for a follow up
[16:17] <rog> niemeyer: the errors can later be retrieved because the central loop can interrogate the watcher itself.
[16:17] <rog> niemeyer: or, as you say, add an err to the change type
[16:17] <niemeyer> rog: It can, it has to interrogate all the watchers, and Stop them
[16:18] <niemeyer> rog: Also must be careful not to close machinesChanges
[16:18] <rog> niemeyer: definitely.
[16:18] <fwereade_> TheMue, I'm starting to wonder what we get from the machine type that couldn't be got from (say) a (*Firewaller)machineLoop(*Machine, *tomb.Tomb)
[16:18] <rog> niemeyer: but i don't think that's hard to ensure.
[16:18] <niemeyer> It's not hard.. it's just another approach that TheMue will have to get right
[16:19] <niemeyer> He seems to have gotten that one approach almost correctly
[16:19] <niemeyer> So I'm tempted to suggest moving forward with it, and simplifying in a follow up
[16:19] <rog> niemeyer: seems ok.
[16:19] <fwereade_> TheMue, but then Firewaller has machines and machines has ports and neither of those seem to be used a great deal, and I think my issue is that I haven't closely followed the original discussion so I'm not quite sure of the plan
[16:20] <TheMue> fwereade_: and then go that loop per machine? would be possible, but I think more complex in the end (there will be more types later). take a look at the Py code, all in one type with callbacks.
[16:20] <rog> fwereade_: did you see my original sketch?
[16:21] <fwereade_> rog, not closely enough that it stuck in the mind, I'm afraid
[16:21] <rog> fwereade_: http://paste.ubuntu.com/1096860/
[16:22] <TheMue> niemeyer: the small types at the end of the file are left from rog's paste. they grow with each new branch. the follow-up (already in) has a working unit (will rename it to unitTracker).
[16:24] <fwereade_> rog, ok, I do remember that; not fully seeing the path from here to something-like-there at the moment
[16:24] <niemeyer> TheMue: Okay
[16:24] <fwereade_> rog, but, honestly, I has an EOD sleepy on and I think I should take a rest
[16:24] <rog> fwereade_: TheMue's branch is the beginning of that
[16:25] <niemeyer> TheMue: I suggest keeping the machineTracker with the current mechanism for now
[16:25] <rog> fwereade_: it implements the "start machine units watcher for m; add it to machines" piece
[16:25] <niemeyer> TheMue: To avoid delaying it much further with new logic
[16:26] <fwereade_> rog, there's something I can't quite put my finger on about having ports on both machine and unit
[16:27] <rog> fwereade_: i played around with a few configurations, but that one seemed to work best
[16:27] <TheMue> niemeyer: sounds reasonable
[16:28] <rog> fwereade_: because they're distinct things
[16:28] <rog> fwereade_: the machine ports are a union of the unit ports on that machine
[16:30] <fwereade_> rog, fair enough
[16:30] <rog> fwereade_: and we need to keep track of the machine ports so that we know which ports to close when a unit's ports change.
[16:31] <fwereade_> rog, I guess we don't have anything stopping two units on the same machine from attempting to mess with the same ports?
[16:31] <rog> fwereade_: i discussed this with niemeyer
[16:31] <TheMue> So, have to step out, Carmen calls for dinner. But will return later.
[16:32] <rog> fwereade_: we thought it was reasonable if each port was considered "owned" by a given unit
[16:32] <rog> fwereade_: then the open-port command should give an error if the port is owned by another unit
[16:33] <rog> fwereade_: note that machine.ports in my sketch is a map from state.Port to *unit
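The ownership scheme rog sketches — `machine.ports` as a map from `state.Port` to `*unit`, with open-port rejecting a port owned by another unit — could look like this. The `Port`, `unit`, and `machine` types here are illustrative stand-ins, not the real state or firewaller types.

```go
package main

import "fmt"

// Port stands in for state.Port (assumed shape: protocol + number).
type Port struct {
	Proto  string
	Number int
}

type unit struct{ name string }

// machine keeps ports as a map from Port to the owning *unit, as in
// the sketch: the machine's open ports are the union of its units'
// ports, and remembering the owner lets open-port reject a port
// already claimed by a different unit on the same machine.
type machine struct {
	ports map[Port]*unit
}

func (m *machine) openPort(u *unit, p Port) error {
	if owner, ok := m.ports[p]; ok && owner != u {
		return fmt.Errorf("port %d/%s already owned by unit %q", p.Number, p.Proto, owner.name)
	}
	m.ports[p] = u
	return nil
}

func main() {
	m := &machine{ports: make(map[Port]*unit)}
	a, b := &unit{"wordpress/0"}, &unit{"mysql/0"}

	fmt.Println(m.openPort(a, Port{"tcp", 80})) // <nil>
	fmt.Println(m.openPort(a, Port{"tcp", 80})) // <nil> (re-open by the owner is fine)
	fmt.Println(m.openPort(b, Port{"tcp", 80})) // error: already owned by "wordpress/0"
}
```

Keeping the owner in the map also answers fwereade_'s question above: two units on one machine cannot silently fight over the same port, because the second open-port fails.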
[16:33] <fwereade_> rog, yeah, that sounds sensible... and, ah-ha; yeah, I'm sleepy :(
[16:33] <rog> fwereade_: ok, happy snoozes!
[16:33] <fwereade_> rog, cheers
[16:34] <fwereade_> might be on a bit later, not sure
[16:35] <niemeyer> fwereade_: Have a good one
[16:50] <niemeyer> Doc appointment.. back in ~40mins
[17:17]  * rog is off for the night. have fun, see y'all tomorrow
[19:48] <niemeyer> Break time. biab
[19:50] <mramm> niemeyer: have fun!
[20:36] <fwereade__> niemeyer, ping
[20:43] <fwereade__> niemeyer, so, I did a bit of archaeology and figured out what happened with those Pinger changes: long story short, they were part of the testing change that I thought I'd proposed among the other changes (and were therefore ok, because they were *changed* code but not *moved* and changed code), and it turns out they're not anywhere in the changesets in https://codereview.appspot.com/6348053/ ...so, I thought I'd proposed them, but I actually hadn't, and committed them anyway, which is a pretty monumental screwup :(
[20:44] <fwereade__> niemeyer, I am very much aware that this is Not Ok, and I guess it must have happened via a bedtime `lbox propose` whose results I didn't check
[20:45] <fwereade__> niemeyer, but that's an explanation not an excuse
[20:46] <fwereade__> niemeyer, and I guess we're just lucky that, actually, it *does* seem to be more stable now and by sheer luck (or maybe judgment, who knows) I didn't screw it up really badly
[20:46] <fwereade__> niemeyer, I think
[20:46] <fwereade__> niemeyer, um, anyway, flagellation over, but if you want to take over for a bit I wouldn't feel it was unjustified
[20:48] <fwereade__> niemeyer, I guess it's good that presence.ChildrenWatcher has such a venerable history, nobody would have noticed if I'd reproposed from scratch as I was tempted to do :/
[21:25] <davecheney> fwereade_: thank you for raising that issue, i'll take a swipe at it today
[21:26] <fwereade_> davecheney, cool, cheers
[21:27] <fwereade_> davecheney, I suspect it will be best to start by changing Initialize and leaving bootstrap alone for a bit, given that we expect some churn in environ config
[21:27] <davecheney> i'm concerned that this change will be delayed by the other churn in that area
[21:27] <davecheney> which was my motivation for proposing that it be addressed later
[21:29] <fwereade_> davecheney, your instincts are probably better than mine: I was only thinking about that code because we need default-series for deploy, and I think you're more deeply involved than I am
[21:29] <davecheney> fwereade_: i wouldn't say deeply :) i'm motivated primarily by the goal of having deploying something
[21:30] <fwereade_> davecheney, but because I wanted the default-series to be a quick fix before going back to the UA, I have been somewhat scattered about it
[21:30] <davecheney> err, having something to deploy
[21:30] <fwereade_> davecheney, oh, me too ;)
[21:30] <davecheney> anyway, i'll take a look now
[21:30] <fwereade_> davecheney, but implementing deploy ended up feeling like more trouble than I had hoped :/
[21:30] <davecheney> btw, juju status, the output, is that actual yaml, or just formatted to look similar ?
[21:31] <fwereade_> davecheney, should be yaml
[21:32] <fwereade_> davecheney, takes --format, defaults to yaml, also allows json
[21:32] <davecheney> fwereade_: right, that is what i thought, but I was confused by the doc strings that appeared to dictate a format
[21:32] <davecheney> or a layout, to be more exact
[21:33] <fwereade_> davecheney, hmm, also dot, svg, png, it seems
[21:33] <davecheney> map's gonna map
[21:34] <fwereade_> haha :)
[21:35] <fwereade_> davecheney, hmm, I just wanted to create a Hooker type :/
[21:36] <fwereade_> davecheney, I can probably think of a better name
[21:40] <davecheney> first bug logged against launchpad, do I get a badge or achievement award ?
[21:49] <davecheney> fwereade_: forgive my thickness
[21:49] <davecheney> but reading the bug you raised
[21:49] <davecheney> i don't see how that is (directly) related to the problem at hand
[21:50] <davecheney> which I thought was the population of machine/0 into the state after we had returned from initialise
[22:11] <niemeyer> fwereade_: So what's the history there?
[22:11] <niemeyer> fwereade_: Sorry, that's not the right question
[22:11] <niemeyer> fwereade_: What do we want? :-)
[22:15] <davecheney> niemeyer: two secs
[22:16] <davecheney> niemeyer: http://codereview.appspot.com/6408047/#msg3, which became, https://bugs.launchpad.net/juju-core/+bug/1025656
[22:22] <niemeyer> davecheney: Sorry, I'm a bit out of context
[22:22] <niemeyer> davecheney: I was asking about the stuff fwereade_ said earlier
[22:22] <davecheney> ok, ignore me then :)
[22:22] <niemeyer> davecheney: But that's interesting too :-)
[22:25] <niemeyer> davecheney: What is potentially racy, more precisely?
[22:27] <niemeyer> davecheney: When you have a few minutes, I'd like to understand better what's going on there
[22:28] <davecheney> niemeyer: racy because we are setting /initialised, then doing some more fudging of the state before the command exits
[22:28] <niemeyer> davecheney: That part is totally fine
[22:29] <niemeyer> davecheney: that node handles the lack of atomicity in zk
[22:29] <davecheney> yes, i think so too, because the only consumers of those machine entries, are started by cloud init after that process has exited
[22:29] <niemeyer> davecheney: It prevents code from trying to e.g. create other nodes before the fundamental parents even exist
[22:29] <niemeyer> davecheney: That's irrelevant, IMO
[22:29] <davecheney> what is irrelevant
[22:29] <niemeyer> davecheney: AddMachine in State should work
[22:30] <niemeyer> davecheney: That's a standalone assumption
[22:30] <niemeyer> davecheney: Do I misunderstand?
[22:30] <davecheney> no, we are in agreement
[22:30] <davecheney> so, can I drop that comment block ?
[22:31] <niemeyer> davecheney: Yeah, the comment seems to assume /initialized is something it's not
[22:31] <davecheney> niemeyer: will do
[22:32] <niemeyer> davecheney: Now, for the second part of the question: why are we adding a machine by hand like this rather than doing it the way we did in Python?
[22:33]  * davecheney checks the python
[22:34] <niemeyer> davecheney: I can describe, sorry.. I thought it was a conscious decision
[22:34] <niemeyer> davecheney: This isn't hard-coded
[22:35] <niemeyer>     sub_parser.add_argument(
[22:35] <niemeyer>         "--instance-id", required=True,
[22:35] <niemeyer>         help="Provider instance id for the bootstrap node")
[22:35] <niemeyer> davecheney: Hmmm.. nevermind
[22:36] <niemeyer> davecheney: Clearly my memories are failing me
[22:36] <davecheney> niemeyer: yup, but the current state.Initialize provides no way to pass that value in
[22:36] <davecheney> maybe it should
[22:36] <niemeyer> davecheney: Don't worry.. I'm on crack
[22:36] <niemeyer> davecheney: i was incorrectly complaining about something else
[22:36] <davecheney> which is the bug I thought that william was going to raise for me last night
[22:36] <davecheney> niemeyer: np
[22:37] <niemeyer> davecheney: I'm surprised to see machine/0 hardcoded around initialize, but it *is* hardcoded in Python too
[22:37] <davecheney> niemeyer: :)
[22:37] <niemeyer> davecheney: Which clearly means we shouldn't worry about that now
[22:37] <davecheney> niemeyer: which was my point
[22:37] <niemeyer> davecheney: I can see why William was worried too, because in Python /initialize was guarding the machine creation
[22:37] <davecheney> it could be better, but at the moment I think a todo in the code and bug in the issue tracker for the backlog should suffice
[22:38] <niemeyer> davecheney: But that's not its reason for existence really.. I do recall that part :-)
[22:38] <davecheney> niemeyer: hence my comment about a possible race
[22:38] <niemeyer> davecheney: The problem was that it was extremely boring to guard every single action against the lack of existence of the critical parents
[22:38] <davecheney> but given the consumer of that piece of information in the state is not started till jujud initzk returns
[22:38] <niemeyer> Like /charms, /services, etc
[22:38] <davecheney> yeah, that sounds dull
[22:38] <niemeyer> davecheney: So /initialized came into play
[22:39] <niemeyer> davecheney: But the machine creation is clearly outside of that need
[22:39] <niemeyer> davecheney: So +1 on dropping the comment, and +1 on getting this in
[22:39] <niemeyer> davecheney: The TODO should also not mention Initialize.. seems fine where it is
[22:40] <niemeyer> davecheney: Some day we do want that to move into Bootstrap, though.. in the not upcoming future.. :-)
[22:40] <davecheney> i'll raise a bug for the backlog
[22:42] <niemeyer> davecheney: Thanks!
[23:05] <davecheney> niemeyer: thanks for the review
[23:05] <niemeyer> davecheney: np
[23:06] <niemeyer> I'm also adding a comment about the stuff we just talked about