davecheney | niemeyer: scp & scp reproposed, some nice reuse in there | 00:14 |
---|---|---|
niemeyer | davecheney: Sweet, will have a quick look | 00:49 |
niemeyer | davecheney: http://play.golang.org/ | 00:52 |
niemeyer | Erm | 00:52 |
niemeyer | davecheney: http://play.golang.org/p/cPscJ6RuoX | 00:52 |
davecheney | niemeyer: yes, i had to check that as well | 00:53 |
niemeyer | davecheney: Sorry, I don't get it? | 00:53 |
niemeyer | davecheney: "This will panic if len(c.Args) == 1. I've redone the logic to be less crackful." | 00:53 |
davecheney | conshttp://play.golang.org/p/okbxoy-UI2 | 00:53 |
niemeyer | davecheney: c.Target, c.Args = c.Args[0], c.Args[1:] | 00:53 |
davecheney | niemeyer: http://play.golang.org/p/TObeRIa8wL | 00:53 |
davecheney | yes, you are right | 00:54 |
davecheney | but I dont understand why http://play.golang.org/p/Fd_jNi6mVe | 00:54 |
niemeyer | davecheney: Because that's out of bounds | 00:57 |
niemeyer | davecheney: s[1:] works when len(s) == 1 for the same reason that s[:1] works. | 00:58 |
davecheney | i dont' think that is correct | 00:59 |
davecheney | but this isn't the right place to argue about it | 01:00 |
davecheney | i'll fix my code | 01:00 |
davecheney | ok, someone explained it too me in the channel | 01:02 |
niemeyer | davecheney: Stepping out for the night.. have a good day! | 03:27 |
davecheney | enjoy | 03:54 |
rogpeppe | davecheney, fwereade_: mornin' | 06:05 |
davecheney | rogpeppe: howdy | 06:05 |
fwereade_ | davecheney, rogpeppe: heyhey | 06:06 |
rogpeppe | fwereade_: it'd be great if you could have a glance at the uniter upgrade branch, if you have a moment sometime this morning: https://codereview.appspot.com/6561063/ | 06:06 |
fwereade_ | rogpeppe, sure | 06:07 |
fwereade_ | rogpeppe, (still thinking) | 06:14 |
rogpeppe | fwereade_: np | 06:15 |
fwereade_ | rogpeppe, ISTM that it would be simpler (not to mention less conflicty, and mildly kinder to the network) to get the Unit only when you actually need to run an upgrade | 06:18 |
fwereade_ | rogpeppe, doing that removes the need to change runOnce and Uniter | 06:19 |
fwereade_ | rogpeppe, is there a deeper motivation in play than "I need a unit"? | 06:19 |
* rogpeppe has a look to remind himself | 06:20 | |
rogpeppe | fwereade_: we need to unit immediately | 06:20 |
rogpeppe | fwereade_: because we use it to announce the current agent version | 06:20 |
rogpeppe | fwereade_: but that's not the reason we change runOnce and Uniter | 06:21 |
rogpeppe | fwereade_: the reason for that is that we want to upgrader to be as independent as possible of uniter bugs | 06:21 |
rogpeppe | fwereade_: so we do the absolute minimum necessary before starting the upgrader | 06:22 |
rogpeppe | fwereade_: hence it's important that the uniter factory method doesn't return an error - early errors should not take down the upgrader. | 06:22 |
rogpeppe | s/need to unit/need the unit/ | 06:23 |
rogpeppe | fwereade_: does that make sense? | 06:23 |
fwereade_ | rogpeppe, hmm, apart from the ones which should, but I see it's tricky | 06:23 |
rogpeppe | fwereade_: which errors *should* take down the upgrader? | 06:24 |
rogpeppe | fwereade: last thing i was was [07:23:38] <fwereade_> rogpeppe, hmm, apart from the ones which should, but I see it's tricky | 06:25 |
rogpeppe | s/was/saw/ | 06:26 |
fwereade | rogpeppe, cheers | 06:26 |
fwereade | rogpeppe, I'm not quite convinced that deferring the error helps you much though, 1 mo | 06:26 |
fwereade | rogpeppe, surely an early error return from the Uniter will hose the upgrader just as badly, because runTasks will terminate it | 06:27 |
rogpeppe | fwereade: no, because we've got some special case logic in the upgrader for just such an eventuality | 06:28 |
rogpeppe | fwereade: if it's killed early on, it waits until it has at least had a squizz at the proposed version | 06:28 |
rogpeppe | fwereade: and if that's changed, it doesn't exit until it has actually downloaded the upgrade | 06:28 |
rogpeppe | fwereade: (well, with some timeout too) | 06:29 |
rogpeppe | fwereade: it's given 5 minutes | 06:29 |
* fwereade continues to think | 06:30 | |
fwereade | rogpeppe, something about all that does make me a little queasy... I feel that if I ask an upgrader to stop, it should jolly well stop, by criminy | 06:37 |
rogpeppe | fwereade: this was discussed at the time | 06:37 |
rogpeppe | fwereade: i think the upgrader is special | 06:37 |
rogpeppe | fwereade: because it's the only way we can escape bad s/w | 06:37 |
fwereade | rogpeppe, I agree, but IMO that means it should have control of the tasks rather than being antisocial :) | 06:38 |
rogpeppe | fwereade: it *will* stop... when it's made good and sure that someone is not trying to upgrade us | 06:38 |
rogpeppe | fwereade: well, as you know, that's how i started and that was deemed incorrect | 06:39 |
rogpeppe | fwereade: so this is what we're doing. and it doesn't seem bad to me. | 06:39 |
fwereade | rogpeppe, I do feel your pain there... but I think I need to talk to niemeyer about this | 06:40 |
rogpeppe | fwereade: ok | 06:40 |
fwereade | rogpeppe, sorry :( | 06:40 |
rogpeppe | fwereade: if we change this now, BTW, it knocks everything off | 06:41 |
rogpeppe | fwereade: because everything is built in this way currently. | 06:41 |
rogpeppe | fwereade: the upgrader is "just" another task | 06:41 |
rogpeppe | fwereade: so, given that we're not swimming in free time, and this architecture will work for the time being, perhaps we could move forward as we are and consider a change later? | 06:42 |
fwereade | rogpeppe, the collision with my own uniter changes is not entirely trivial | 06:43 |
fwereade | rogpeppe, and there is a value I return from NewUniter that really should block upgrades | 06:43 |
rogpeppe | fwereade: which is? | 06:44 |
fwereade | ErrUnitDead | 06:44 |
rogpeppe | fwereade: i don't think it matters too much tbh. it's an edge case - it doesn't matter much if we do upgrade in that case, it's just an extra 5s delay | 06:45 |
TheMue | morning | 06:46 |
rogpeppe | fwereade: FWIW i think i'd probably structure it in a similar way even if the upgrader was in control - we want the upgrader to be independent right up until the moment it decides that now is the time to upgrade. | 06:47 |
rogpeppe | TheMue: hiya | 06:47 |
TheMue | rogpeppe: hiya | 06:48 |
rogpeppe | fwereade: if you like, i'll merge your most recent uniter branch and include it as a prerequisite | 06:48 |
fwereade | rogpeppe, surely the right thing to do there is to have the upgrader in change of runTasks? the wait & retry business could be handled internally, surely? | 06:48 |
fwereade | rogpeppe, it's not the code conflict that bothers me so much as the lack of clarity | 06:49 |
rogpeppe | fwereade: it's not *that* straightforward. you want to start the upgrader independently, regardless of what the other tasks are doing. then the wait and retry isn't logic that's in the outer loop (because it's dependendent on what gets downloaded) but in a separate goroutine. | 06:50 |
rogpeppe | fwereade: so the structure starts to look similar to what we've currently got | 06:50 |
rogpeppe | fwereade: i think the notion of a task being able to delay shutdown until it's done what it needs to do is a reasonable one, and one of the reasons we use tombs like we do. | 06:52 |
fwereade | rogpeppe, yeah, I see that | 06:53 |
rogpeppe | fwereade: last seen: [07:53:09] <fwereade> rogpeppe, yeah, I see that | 06:54 |
fwereade | rogpeppe, I *think* I'm convinced, although I still don't quite entirely like something about it | 06:54 |
rogpeppe | fwereade: i understand. | 06:56 |
fwereade | rogpeppe, that said, I think I am coming around to your perspective that something outside the uniter should handle death-watching | 06:56 |
rogpeppe | fwereade: wasn't there a load of mode-specific stuff that you needed to do when killed? | 06:57 |
fwereade | rogpeppe, the uniter's Dying response is not interesting from your perspective, I think (although there's little point upgrading a dying unit) | 06:57 |
fwereade | rogpeppe, I think you probably *should* be watching for Dead and crash-stopping though | 06:58 |
rogpeppe | fwereade: "me" being...? | 06:58 |
rogpeppe | fwereade: ah, i see | 06:58 |
rogpeppe | fwereade: you mean i should crash-stop when something returns ErrDead | 06:59 |
fwereade | rogpeppe, almost certainly yes | 06:59 |
rogpeppe | fwereade: maybe. although i think it's a very rare edge case tbh | 06:59 |
fwereade | rogpeppe, but what I was trying to say is that the upgrader itself should know not to bother watching a unit once it's dying | 07:00 |
rogpeppe | fwereade: the upgrader doesn't watch a unit | 07:00 |
fwereade | rogpeppe, I submit that it should, so that binary upgrades work within the same framework as anything else -- I don't think they transcend entity lifetime ;) | 07:01 |
rogpeppe | fwereade: there's always a delay between entity being killed and entity actually dying. this just makes it a little longer in some edge cases. | 07:01 |
rogpeppe | fwereade: if you upgrade a system, there's a possibility that some dying units might linger for a few seconds more as they download a new version. i don't think that's too bad a price. | 07:03 |
fwereade | rogpeppe, well, it means that it continues to act alive -- in some, but not all, respects -- for arbitrarily longer | 07:03 |
rogpeppe | fwereade: does it? | 07:04 |
fwereade | rogpeppe, it may be that this is actually correct behaviour | 07:04 |
rogpeppe | fwereade: what is the upgrader keeping alive? | 07:04 |
rogpeppe | fwereade: after all, the Unit is marked dead | 07:04 |
fwereade | rogpeppe, wait, I'm talking about bad behaviour on Dying more than ugliness on Dead here | 07:05 |
fwereade | rogpeppe, I don't like the ugliness on Dead but I could live with it | 07:05 |
rogpeppe | fwereade: i'm not sure it changes the Dying behaviour at all, does it? | 07:05 |
fwereade | rogpeppe, the question that currently exercises me is "should Dying entities upgrade their code?" | 07:05 |
fwereade | rogpeppe, maybe, actually, they should | 07:06 |
rogpeppe | fwereade: i think so | 07:06 |
fwereade | rogpeppe, yeah, I can imagine a bug blocking clean shutdown that can be resolved by an upgrade | 07:06 |
rogpeppe | fwereade: i think it's nice to have upgrading totally divorced from any of the other logic | 07:06 |
fwereade | rogpeppe, the fact that they stop watching for charm upgrades on Dying then becomes potentially problematic | 07:07 |
rogpeppe | fwereade: i think charm upgrades are a different thing - they're at a higher level. | 07:07 |
rogpeppe | fwereade: and they really do relate directly to the unit | 07:07 |
rogpeppe | fwereade: so it makes sense not to upgrade a charm when the unit is dying | 07:08 |
fwereade | rogpeppe, hm, the bug-blocking-clean-shutdown I guess does *not*apply because the user can already use juju resolved | 07:10 |
rogpeppe | fwereade: for charm upgrade, yeah - we're providing the always-available layer on top of a charm. | 07:11 |
fwereade | rogpeppe, ok, all sounds reasonable, I'll finish the review :) | 07:11 |
rogpeppe | fwereade: tyvm | 07:11 |
rogpeppe | fwereade: there's also this fairly trivial: https://codereview.appspot.com/6564063/ | 07:12 |
rogpeppe | everything in my window system has just start breaking. i'm gonna reboot | 07:28 |
rogpeppe | started | 07:28 |
rog | anyone here use multiple monitors under ubuntu? | 07:33 |
fwereade | rog, ok, I have another thought | 07:33 |
fwereade | rog, sorry no | 07:33 |
rog | fwereade: ok, listening | 07:33 |
fwereade | rog, I wil be comfortable with this is there is some very basic life handling at the top level | 07:33 |
fwereade | rog, I think we need: | 07:34 |
fwereade | rog, 1) drop the unit return from runOnce, because it's potentially panic-inducing anyway | 07:34 |
fwereade | rog, 2) when we get the unit in runOnce, return a special error on NotFound, and that same special error if the unit exists but is Dead | 07:35 |
fwereade | rog, 3) if we see that error in Run, return nil | 07:36 |
fwereade | rog, 4) return the UpgradedError when we're doing an upgrade restart | 07:36 |
fwereade | rog, 5) add a "normal exit 0" stanza to the upstart conf | 07:36 |
fwereade | rog, and I think that's it | 07:36 |
fwereade | rog, that I think conveys my intent as best I can | 07:37 |
fwereade | rog, the precise details of implementation are ofc approximate | 07:37 |
davecheney | the disparity between *ConfigNode.Map() and Charm.Config().Option is a pain in the balls | 07:38 |
fwereade | rog, does that sound sane | 07:38 |
fwereade | davecheney, ouch, I bet | 07:38 |
davecheney | map[string]string vs map[string]interface{} | 07:38 |
davecheney | mix in some json or yaml and it's a royal pain | 07:39 |
rog | fwereade: i don't know what you mean about dropping the unit return from runOnce - i don't think it can induce panic | 07:39 |
rog | fwereade: if we can't get the unit in runOnce, we'll never start the upgrader | 07:39 |
fwereade | rog, bah, true | 07:39 |
rog | fwereade: we could also check for dead if you like. | 07:39 |
rog | fwereade: that would be trivial | 07:39 |
* rog wishes we used map[string]string throughout | 07:41 | |
rog | bbs | 07:42 |
fwereade | rog, leaving unit stuff aside for a moment, what is your opinion of the "normal exit 0" thing? | 07:42 |
fwereade | rog, (when you return ofc0 | 07:42 |
rog | fwereade: maybe the agent should just remove its own upstart conf | 07:44 |
fwereade | rog, that feels icky to me | 07:45 |
rog | fwereade: who else is going to remove it? | 07:45 |
fwereade | rog, the machine agent? | 07:45 |
rog | fwereade: ... or the principal unit agent, right? | 07:46 |
fwereade | rog, and I think it's fine to have a unit agent run on startup and exit immediately without error to indicate that it's done all it has to do | 07:46 |
fwereade | rog, yeah, whoever deployed it | 07:46 |
fwereade | rog, or a machine agent for that matter | 07:46 |
rog | fwereade: yeah, i think you're right | 07:47 |
rog | fwereade: presumably it *is* possible to get upstart to never start something again after it's exited ok | 07:47 |
rog | fwereade: i'm slightly dubious though - if we reboot, surely it'll start again anyway | 07:47 |
fwereade | rog, AIUI the stanza "normal exit 0" should be sufficient | 07:48 |
rog | fwereade: ok, sounds fine. | 07:48 |
fwereade | rog, it will; and the UA will cleanly observe its deadness, exit without error, and never trouble the machine again that run | 07:48 |
rog | fwereade: interesting point: currently if one of the workers dies without an error, runTasks will just let it die, but continue on. i wonder if actually it should kill everything as usual in that case, so the tasks are always tied together. | 07:49 |
fwereade | rog, hmmmmmmmmmm | 07:50 |
rog | fwereade: then the unit agent can exit with a nil error when the unit is dead | 07:50 |
rog | fwereade: and everything gets shut down cleanly. | 07:50 |
fwereade | rog, I think that in code I would prefer an explicit error return | 07:51 |
rog | fwereade: a nil error seems a fine way of saying "i'm shutting down with no error" to me | 07:51 |
fwereade | rog, an an error seems to me to be a fine way to signal "what you asked me to do ain't gonna happen" | 07:52 |
rog | fwereade: it's done *exactly* what we asked it to do, surely? | 07:52 |
rog | fwereade: but... given that it wants other things to die along with it, perhaps ErrDead is good. | 07:53 |
fwereade | rog, gaah, sorry | 07:54 |
rog | fwereade: last thing i saw was: [08:52:16] <fwereade> rog, an an error seems to me to be a fine way to signal "what you asked me to do ain't gonna happen" | 07:55 |
fwereade | rog, well, we get to choose how we define it -- I see the condition of lacking a unit as fundamentally an error condition for the uniter, because it means it can't do anything | 07:55 |
fwereade | rog, the client code may be in a position to handle that specific error in a different way | 07:55 |
rog | fwereade: zero is a ok value :-) | 07:55 |
rog | fwereade: last thing i said before you went: | 07:56 |
rog | [08:53:45] <rog> fwereade: but... given that it wants other things to die along with it, perhaps ErrDead is good. | 07:56 |
fwereade | rog, a Uniter *needs* a viable Unit to do its job -- if it cannot get the unit, or the unit is dead, that is an error :) | 07:56 |
fwereade | rog, ah, I missed that, sorry | 07:56 |
fwereade | rog, +1 on ErrDead causing nil return from Run | 07:56 |
rog | fwereade: ok, will do | 07:57 |
fwereade | rog, btw, would you put it somewhere easily accessible, like state, so I can also return it from the Uniter please? | 08:00 |
rog | fwereade: it could even go in the Uniter if we wanted | 08:01 |
rog | fwereade: i mean, in worker/uniter | 08:01 |
fwereade | rog, +1 | 08:01 |
fwereade | rog, ok, sorry to keep banging on about the unit return, but it really doesn't feel right | 08:06 |
fwereade | rog, and it's a little goroutine-icky, even if not technically unsafe | 08:06 |
rog | fwereade: it's never used two goroutines simultaneously | 08:06 |
fwereade | rog, but it crosses my mind that the *upgrader* already has the unit, and can surely send down its PathKey in the UpgradedError itself? | 08:06 |
fwereade | rog, that's why I said not unsafe | 08:07 |
rog | fwereade: i'm not sure i see the issue | 08:07 |
rog | fwereade: runOnce only returns when none of its sub-tasks are running | 08:07 |
rog | fwereade: therefore there can be no problem with goroutine ickiness | 08:07 |
rog | fwereade: i may be on crack | 08:08 |
fwereade | rog, I guess I just don't like a return value that is present-but-useless when no error, or most errors; present-and-useful on one specific error, and not-present on some other errors | 08:09 |
rog | fwereade: it's always present :-) | 08:10 |
rog | oh no it's not | 08:10 |
fwereade | rog, so you somehow return a unit when a unit is not found? | 08:10 |
rog | indeed | 08:10 |
fwereade | rog, ISTM that the PathKey is the important thing, not the context, and that that would be a fine thing to send down with the error | 08:10 |
fwereade | rog, sorry, "not the context" is kinda meaningless | 08:12 |
rog | fwereade: that does assume that the agent is *always* going to be named after the PathKey. | 08:12 |
rog | fwereade: otherwise the upgrader wouldn't be able to send it | 08:12 |
rog | fwereade: i know what you mean about the ickiness though | 08:13 |
fwereade | rog, hmm, AIUI that was the only point of PathKey in the first place | 08:13 |
fwereade | rog, does Machine have one too? | 08:13 |
rog | fwereade: yeah i know - that's why i wanted to call it "AgentName" ... | 08:13 |
rog | fwereade: it does | 08:13 |
fwereade | rog, feels to me like the way to go | 08:14 |
rog | fwereade: the ironic thing is that we already have the means to make the agent name, without involving unit or machine | 08:14 |
fwereade | rog, ha! | 08:14 |
fwereade | rog, but then, meh, we (currently) always need the unit anyway, we may as well ask it what it thinks | 08:14 |
fwereade | rog, +1 on AgentName | 08:15 |
rog | fwereade: been there, it was wrong | 08:15 |
fwereade | rog, understood, but it may just have been an idea before its time :) | 08:15 |
rog | fwereade: maybe. perhaps i'll start with PathKey and see whether the incongruence might change niemeyer's mind | 08:18 |
fwereade | rog, SGTM | 08:21 |
rog | fwereade: the other thing that occurred to me is the UpgradedError could contain a closure which would do the upgrade (i.e. call ChangeAgentTools) | 08:22 |
rog | fwereade: but that's probably a bit icky | 08:23 |
fwereade | <fwereade> rog, the only thing that stopped me proposing that was a feeling that UpgradedError was weird enough already | 08:24 |
fwereade | rog, but I think it would probably actually be the cleanest solution | 08:24 |
rog | [09:23:16] <rog> fwereade: but that's probably a bit icky | 08:25 |
fwereade | rog, not sure -- it's already not quite right, because it's not an upgrade*d* error | 08:26 |
fwereade | rog, it's actually an UpgradeReadyError | 08:26 |
rog | fwereade: i dunno - it's "someone has upgraded me" | 08:26 |
fwereade | rog, and if you think of it like that, a RunUpgrade method mades sense | 08:26 |
fwereade | rog, the actual upgrade of the agent does not take place until the agent calls ChangeAgentTools though | 08:27 |
rog | fwereade: UpgradeReadyError doesn't sound much like an error | 08:27 |
fwereade | rog, in what way is it worse than UpgradedError> | 08:27 |
rog | fwereade: i agree | 08:27 |
rog | fwereade: you're probably right, it's just about the same | 08:27 |
fwereade | rog, indeed -- the whole error idea is what still feels mildly abusive | 08:28 |
fwereade | rog, but I think at least it's worth a try | 08:28 |
fwereade | rog, https://codereview.appspot.com/6561063/ reviewed anyway | 08:32 |
fwereade | rog, https://codereview.appspot.com/6564063/ LGTM | 08:37 |
rog | fwereade: thanks | 08:37 |
rog | fwereade: dammit, the upgrader can't do the actual upgrade easily | 08:49 |
rog | fwereade: it doesn't have agent.Conf.DataDir | 08:49 |
fwereade | rog, really? blast | 08:49 |
rog | fwereade: and i'm slightly reluctant to pass that in just for this | 08:49 |
fwereade | rog, doesn't it use DataDir somehow to figure out the place to put the real tools dir in the first place? | 08:50 |
fwereade | rog, I *thought* the upgrader put the tools, and then just told the client that they can now symlink to the tools' known lcation | 08:50 |
rog | fwereade: yes, it does indeed have dataDir around, you're right | 08:50 |
fwereade | rog, cool | 08:51 |
rog | fwereade: ok, np then | 08:51 |
Aram | hello. | 09:18 |
rog | Aram: mornin' | 09:20 |
TheMue | Aram: moin | 09:29 |
rog | TheMue: hiya | 09:34 |
TheMue | rog: didn't we already seen 3h ago? ;) | 09:35 |
rog | TheMue: i thought maybe i hadn't :-) | 09:35 |
TheMue | rog: it's ok, we both are 40+. so i know that. :P | 09:36 |
rog | fwereade: do you think *all* upstart scripts we produce should have the "normal exit 0" stanza? | 09:36 |
rog | TheMue: :-) | 09:36 |
fwereade | rog, hmm, it demands a similar change in the machine agent, but -- for our purposes at least -- that might be a sensible way to go | 09:37 |
fwereade | rog, it will not be unbearably hard to make it configurable later if we want to | 09:37 |
rog | fwereade: i've already made the change to the machine agent | 09:38 |
fwereade | rog, well then go for it :) | 09:39 |
TheMue | release meeting time | 11:02 |
* davecheney waits | 11:02 | |
TheMue | i ping the boss | 11:03 |
Aram | aah, meeting. | 11:04 |
Aram | I forgot about this. | 11:04 |
TheMue | hmm, no answer | 11:04 |
rog | fwereade: this is still WIP, but hopefully is closer to what you'd like: https://codereview.appspot.com/6561063 | 11:07 |
mramm2 | pong | 11:14 |
TheMue | ah, the master | 11:14 |
mramm2 | hahah | 11:14 |
TheMue | just wanted to collect the state of all to send it to you, but now you're here. fine. | 11:15 |
rog | davecheney, fwereade: invites are out | 11:18 |
Aram | davecheney: fwereade: https://plus.google.com/hangouts/_/9dde8fe32d77795910d89838149fce4ccddea664?authuser=1&hl=en | 11:18 |
rog | i lost conn | 11:27 |
rog | now "we're having trouble connecting with the plugin" | 11:28 |
rog | fwereade: presumably your net connection isn't good enough for a hangout this morning? | 11:36 |
fwereade | rog, oh *hell* I completely forgot and was eating lunch | 11:36 |
fwereade | rog, joining | 11:36 |
rog | fwereade: https://plus.google.com/hangouts/_/9dde8fe32d77795910d89838149fce4ccddea664?authuser=1&hl=en | 11:37 |
davecheney | all: http://paste.ubuntu.com/1247363/ | 11:44 |
davecheney | ^ hook, sadness | 11:44 |
TheMue | fwereade: is the current ServiceRelationWatcher what you expected with issue #1032539? | 11:48 |
fwereade | TheMue, just a mo, let me take a look; if it has Added/Removed, no | 11:49 |
rog | davecheney: have you looked at the log output? | 11:49 |
rog | davecheney: that's where the hook output will go, which should be diagnostic | 11:49 |
rog | bbs | 11:51 |
rog | fwereade: done, i think: https://codereview.appspot.com/6561063 | 12:06 |
rog | (live tests pass) | 12:07 |
fwereade | rog, awesome, I'll take a look in a sec | 12:07 |
rog | fwereade: thanks | 12:07 |
rog | fwereade: trivial? https://codereview.appspot.com/6568064 | 12:32 |
rog | lunch | 12:32 |
fwereade | rog, LGTM on UpgradeReadyError... oh wait there was something I meant to check | 12:41 |
fwereade | rog, what's the deal with changing the arge to the machine agent? | 12:41 |
rog | fwereade: hmm, let me check | 12:41 |
rog | fwereade: i'm not sure what you mean. which argument? | 12:42 |
fwereade | rog, I can't find it, I may be remembering from an older version | 12:43 |
fwereade | rog, yeah, it doesn't exist -- sorry | 12:44 |
fwereade | rog, https://codereview.appspot.com/6568064/ looks trivial to me | 12:46 |
rog | fwereade: cool, will submit | 12:46 |
rog | fwereade: (thanks!) | 12:47 |
niemeyer | Hello all! | 13:04 |
niemeyer | Sorry, my phone warned me about the early meeting a bit late.. did we have one? | 13:06 |
Aram | we did | 13:06 |
niemeyer | Aram: Nice | 13:10 |
niemeyer | Aram: Have you seen the machines watcher I've pushed? | 13:10 |
Aram | niemeyer: still wip, found a bug and making final adjustments now. | 13:11 |
Aram | niemeyer: no, I'll take a look. | 13:11 |
niemeyer | Aram: https://codereview.appspot.com/6566066/ | 13:11 |
niemeyer | Aram: Thanks | 13:11 |
niemeyer | Aram: Since we're both working on that stuff, it's good to cross-review | 13:11 |
niemeyer | Aram: Btw, I've talked about a similar pattern to the watcher you're pushing with William in the sprint, and now it occurred to me that you were not around in the conversation | 13:12 |
niemeyer | Aram: is it using a single goroutine, or a goroutine per unit? | 13:12 |
Aram | a goroutine for the machine plus one goroutine for each principal unit (not for subordinates). | 13:13 |
niemeyer | Aram: I don't think we need more than a single goroutine | 13:15 |
niemeyer | Aram: The logic ends up simpler rather than more complex with a single goroutine | 13:16 |
niemeyer | Aram: Please have a look at RelationUnitsWatcher | 13:16 |
niemeyer | Aram: It has a similar problem in which it has to decide what to watch as it goes | 13:16 |
Aram | niemeyer: yeah, it can be done with only one goroutine. I'll change it after I fix this one bug. I did it this way because it matched the way I thought of the problem, but it's easy to change now. | 13:17 |
niemeyer | Aram: It started with the same design of one-per-subcontext, and then after a conversation it got refactored to be a single goroutine | 13:17 |
niemeyer | Aram: Yeah, I totally understand that.. it feels "right", which is why I figured I should ask | 13:18 |
rog | niemeyer: yo! | 13:27 |
niemeyer | rog: Yo! | 13:28 |
rog | niemeyer: after discussion with william this morning, i made quite a few changes to the uniter upgrade branch | 13:28 |
rog | niemeyer: i hope you find them ok... | 13:28 |
rog | niemeyer: https://codereview.appspot.com/6561063/ | 13:28 |
rog | niemeyer: am just running the first live test on the machine agent with provisioner and firewaller in-built | 13:29 |
rog | niemeyer: if that works, we're basically there for upgrading | 13:29 |
niemeyer | rog: It looks like you've made changes to the upgrader itself? | 13:30 |
rog | niemeyer: as requested, yes | 13:30 |
niemeyer | rog: Why did it affect the branch, specifically? | 13:30 |
rog | niemeyer: because fwereade was not happy about the fact that runOnce returned the unit | 13:31 |
rog | niemeyer: because it was only valid sometimes | 13:31 |
rog | niemeyer: so the fix was to make the upgrader responsible for doing the actual upgrade itself. | 13:32 |
niemeyer | rog: Yeah, I'm not against it for sure.. sounds like a great idea | 13:32 |
niemeyer | rog: I just wish this was done by itself | 13:32 |
niemeyer | rog: Rather than on top of another big branch | 13:32 |
rog | niemeyer: yes, perhaps i should have done that. it seemed like i'd have got into a twisty mess, but it may have been ok | 13:33 |
niemeyer | rog: I think that's where I am right now :) | 13:33 |
rog | niemeyer: ok, i'll back out the changes and create another branch | 13:34 |
niemeyer | rog: It's fine, I'm already on it | 13:34 |
niemeyer | rog: if err == uniter.ErrDead {? | 13:36 |
niemeyer | rog: Was this merged with William's change? | 13:36 |
rog | niemeyer: no, this was another thing that william suggested | 13:36 |
rog | niemeyer: perhaps i should have left that for later too | 13:36 |
niemeyer | rog: How's that related to upgrading? | 13:37 |
niemeyer | 82 if state.IsNotFound(err) || err == nil && unit.Life() == state.Dead { | 13:37 |
niemeyer | ? | 13:37 |
niemeyer | rog: Gosh man.. | 13:37 |
rog | niemeyer: it's related to the logic around upgrading. | 13:37 |
niemeyer | rog: That's related to unit lifecycle | 13:38 |
rog | niemeyer: william asked me to change it, so i did. i'll change it back, np | 13:38 |
niemeyer | rog: Which is not handled now, and as far as I understand is completely unrelated to upgrading | 13:38 |
niemeyer | rog: Okay, can we please go back to applying the upgrading pattern that was already in place to the unit? | 13:39 |
niemeyer | rog: Or, optionally, refactor the upgrading out *without* adding it to the unit | 13:39 |
niemeyer | rog: and then add it to the unit in a separate step | 13:39 |
niemeyer | rog: The branch as it is seems somewhat out of scope in whichever angle we look | 13:39 |
rog | niemeyer: if i just take out the lifecycle changes, would that be enough? | 13:40 |
niemeyer | rog: I have no idea.. right now I have a branch that has at least: 1) Add upgrade support to unit; 2) Refactor the upgrade machanism; 3) Add lifecycle support to the unit | 13:40 |
niemeyer | rog: I'd do them as (2), (1), and keep (3) out entirely.. William is working on it | 13:41 |
rog | niemeyer: ok | 13:41 |
niemeyer | rog: I'd also be fine with (1) + (2), though, as that's the order you did and might be easier to get back onto that state | 13:42 |
rog | niemeyer: that would indeed be much easier, as i did the lifecycle thing last | 13:43 |
niemeyer | rog: Right | 13:43 |
rog | niemeyer: thanks | 13:43 |
rog | niemeyer: i should know better when to stop what i'm doing and make a new independent branch | 13:43 |
niemeyer | rog: I have a hard time understanding it, to be honest | 13:44 |
niemeyer | rog: It's so pleasing to get small branches in.. | 13:44 |
niemeyer | rog: Fast, painless | 13:44 |
niemeyer | rog: Easy for people to look at and "Hah, of course we want that" | 13:44 |
rog | niemeyer: yeah, but in this case, the incentive was "if i just make these few small changes, then william will be happy" | 13:44 |
niemeyer | rog: You can always make William happy in another branch :-) | 13:45 |
rog | niemeyer: yeah | 13:45 |
niemeyer | rog: Even more when the change is *already* huge | 13:45 |
TheMue | so, late lunchtime today, afterwards an appointment outside | 13:50 |
Aram | niemeyer: you got a review. | 13:51 |
TheMue | niemeyer: maybe i come back to you in the evening regarding the security groups, but most is already clear | 13:51 |
niemeyer | Aram: Thanks! | 13:51 |
Aram | niemeyer: I'll mark my branch as WIP and do some more changes. | 13:51 |
niemeyer | TheMue: Super, that branch Aram reviewed has your test, btw | 13:51 |
niemeyer | Aram: Cool | 13:51 |
TheMue | niemeyer: yes, i've revied it too and like it | 13:52 |
TheMue | reviewed | 13:52 |
niemeyer | Aram: Can you clarify this bit: "to check that when we get an event, the lifecycle of the entity is really what we expect it to be." | 13:53 |
Aram | niemeyer: we get []int{2,3,5}, it would be nice if we could check that 5 is alive because it was just added, 2 is dead and 3 is dying. | 13:54 |
niemeyer | Aram: Hmm.. what are we testing? | 13:54 |
niemeyer | Aram: This feels like testing the test itself? I mean, it's the test itself that is putting the unit to Dead | 13:55 |
Aram | the test puts the unit to dead, but perhaps the watcher misfired for whatever wrong reason, stil delivering the correct event. by checking that the unit is dead we make sure that the watcher fired for the correct reason. | 13:56 |
niemeyer | Aram: Sorry, I don't think that's the case. The test is saying "u.EnsureDead()".. it doesn't make sense for this specific test to ask "Is u actually dead?".. | 13:57 |
niemeyer | Aram: We have tests for EnsureDead elsewhere | 13:57 |
niemeyer | Aram: Regarding this: "I pondered about this. If we use Select anyway, why don't we use the real document, machineDoc in this case, and ignore the fields we don't care about?" | 13:58 |
niemeyer | Aram: I was on the fence.. | 13:58 |
niemeyer | Aram: I think you're right.. we should just use the machine doc | 13:58 |
rog | niemeyer: i hope this is more digestible: https://codereview.appspot.com/6561063 | 14:20 |
niemeyer | rog: Thanks very much, looking | 14:20 |
rog | fwereade: i've taken out the life-cycle-related changes - they'll go in another CL | 14:20 |
fwereade | rog, ok, SGTM, so long as they come soon I won't fret :) | 14:20 |
niemeyer | rog: At these times I <3 Rietveld | 14:22 |
rog | niemeyer: yeah, being able to diff against the different stages is invaluable | 14:23 |
niemeyer | rog: Yeah, and diffing across the back and forth produces clean diffs | 14:23 |
rog | niemeyer: ah yes | 14:23 |
Aram | at previous job we had the worst review tool ever. | 14:23 |
Aram | custom made in house, probably 15 years ago. | 14:24 |
rog | fwereade, niemeyer: factored-out branch: https://codereview.appspot.com/6567067/ | 14:33 |
niemeyer | rog: Reviewed | 14:39 |
niemeyer | rog: The first one, that is | 14:39 |
rog | niemeyer: tyvm | 14:39 |
niemeyer | Aram: Nasty. I think I never had anything close to Rietveld either, to be honest | 14:40 |
niemeyer | So much lifetime wasted :-) | 14:40 |
rog | niemeyer: the cloudinit change | 14:42 |
rog | is necessary because the upgrader uses the machine's PathKey for the agent | 14:42 |
rog | name. I could roll that back and explicitly pass in an agent name to NewUpgrader | 14:42 |
rog | instead, if you like. | 14:42 |
Aram | dy = -dx/(x + c1) | 14:44 |
Aram | y = -I[dx/(x + c1)] | 14:44 |
Aram | y = -ln|x + c1| + c2' = -ln|x + c1| + ln(exp(c2')) = -ln(exp(c2')|(x + c1)|) = -ln|c2(x + c1)| | 14:44 |
Aram | meh, sorry, not here. | 14:44 |
Aram | damn paste. | 14:44 |
niemeyer | rog: No, sounds sensible then, thank you | 14:44 |
rog | niemeyer: cool. | 14:44 |
niemeyer | Aram: Curious | 14:45 |
rog | niemeyer: BTW for the nil error, i think maybe it would be best as: if err == nil {err = fmt.Errorf("tasks finished with no error") } or something | 14:45 |
niemeyer | rog: "uniter error: %v" feels very terse and clear | 14:46 |
rog | niemeyer: rather than "uniter error: nil" which is still not great | 14:46 |
rog | niemeyer: ok, will do | 14:46 |
rog | niemeyer: seems funny saying "error" when there's no error, that's all | 14:46 |
Aram | niemeyer: I found an awesome hacky way of solving the not so complex differential equation x * y' + 1 = exp(y). | 14:47 |
niemeyer | Aram: What does the equation mean? | 14:48 |
Aram | can't remember where I first encountered it, particle physics for sure. A new solution just sprang (?) in my mind randomly and I had to write it down. | 14:52 |
niemeyer | rog: Fair enough regarding the error message, you're right | 14:52 |
niemeyer | Aram: That's how it generally goes :-) | 14:53 |
rog | niemeyer: ok, i'll go with my suggestion above, thanks | 14:53 |
niemeyer | rog: Thank you | 14:53 |
rog | ah, this will be why my combined machine and provsioning agent isn't working! | 14:53 |
rog | 2012/09/28 14:47:51 JUJU loaded invalid environment configuration: no registered provider for "ec2" | 14:53 |
rog | not too hard to fix :-) | 14:54 |
Aram | I must specify that I've probably encountered this equation more than 7 years ago though, heh. | 14:54 |
Aram | at a particular physics olympiad. | 14:54 |
niemeyer | rog: Follow up reviewed | 14:54 |
rog | niemeyer: thanks! | 14:55 |
rog | niemeyer: i wondered about worker.ErrDead but considered that it might be useful for a client to be able to distinguish which worker had given the ErrDead. but maybe that's overthinking it. | 14:56 |
niemeyer | Aram: Differential equations are surprisingly useful.. I wish I could keep the background math in my head for longer.. but haven't really ever used them in anger, so it remains just as a spare tool | 14:57 |
=== hazmat` is now known as hazmat | ||
niemeyer | rog: The only agent that can be dead is the one that is running | 14:58 |
rog | niemeyer: ok. | 14:59 |
niemeyer | rog: e.g. we can't have a dead provisioning worker without a dead machine | 14:59 |
Aram | I have to step out early today guys, but I'll finish work in the weekend on the watcher. | 15:00 |
rog | Aram: have fun! | 15:00 |
niemeyer | Aram: Have a pleasant EOD | 15:00 |
Aram | thanks. | 15:00 |
rog | machiner/provisioner/firewaller upgrade worked live, yay! | 15:15 |
niemeyer | rog: WOAH | 15:17 |
niemeyer | I guess we're closing September in pretty good shape | 15:18 |
rog | niemeyer: in worker: var ErrDead = errors.New("agent object is dead") ? | 15:20 |
rog | pwd | 15:20 |
rog | niemeyer: better than we were a week ago, no question :-) | 15:21 |
niemeyer | rog: s/object/entity/, otherwise LGTM | 15:22 |
rog | niemeyer: cool | 15:22 |
rog | niemeyer: https://codereview.appspot.com/6570063/ | 15:51 |
rog | niemeyer: the final piece of the puzzle | 15:51 |
niemeyer | rog: Looking | 15:51 |
rog | fwereade: https://codereview.appspot.com/6570063/ | 15:51 |
fwereade | rog, ack | 15:52 |
fwereade | rog, that is awesome | 15:54 |
fwereade | rog, LGTM | 15:55 |
rog | fwereade: cheers! | 15:55 |
rog | fwereade: i need to do another branch to pass the state into the Machiner directly, i think. | 15:56 |
rog | fwereade: or maybe niemeyer will call that out in this branch | 15:56 |
rog | fwereade: actually, nope, it's not a problem, cool. | 15:59 |
niemeyer | rog: Done | 15:59 |
rog | niemeyer: thanks! | 15:59 |
niemeyer | rog: Very cool | 15:59 |
rog | niemeyer: it was very easy and *almost* worked first time... | 16:00 |
niemeyer | rog: Great stuff | 16:00 |
niemeyer | Lunch is calling | 16:01 |
niemeyer | biab | 16:01 |
rog | right upgrades are GO! | 16:11 |
rog | mramm2: ^ | 16:11 |
mramm2 | rog: awesome | 16:20 |
rog | mramm2: the only thing we don't have currently is the --bump-version functionality. | 16:21 |
rog | niemeyer: stage 1 of --bump-version: https://codereview.appspot.com/6560066 | 16:43 |
niemeyer | rog: Looking | 16:43 |
niemeyer | rog: Done | 16:47 |
rog | niemeyer: thanks! | 16:48 |
niemeyer | rog: My pleasure | 16:48 |
rog | niemeyer: BTW, about --bump-version: | 16:48 |
rog | niemeyer: i'm wondering when we *don't* want it enabled | 16:48 |
rog | niemeyer: if we're uploading tools to private storage | 16:48 |
niemeyer | rog: Any time checking out a release and uploading it | 16:49 |
rog | niemeyer: why would we want to do that without bumping the build version? nothing will see the new release. | 16:50 |
niemeyer | rog: Bumping the version is a hack.. uploading tools to private storage is not | 16:51 |
niemeyer | rog: People can upload tools to private storage in their own cloud, with a real juju release | 16:51 |
rog | niemeyer: ok. should we make it an error if the tools already exist in the storage then? | 16:51 |
niemeyer | rog: How's that connected to the above? | 16:52 |
rog | niemeyer: because otherwise it's likely to be a mistake, and people might be surprised when nothing happens. | 16:52 |
niemeyer | rog: When you copy a file to a location that already contains the same file, nothing happens other than the same file being in the same place | 16:52 |
niemeyer | rog: That's not surprising | 16:53 |
rog | niemeyer: i'm not sure. in this case, it may well feel like we're asking juju to use a particular version. | 16:53 |
rog | niemeyer: which it will be, just not the version we've just uploaded | 16:54 |
niemeyer | rog: It will be the version we just uploaded if that's the version that is being used | 16:54 |
niemeyer | rog: I don't see the problem, really | 16:54 |
niemeyer | rog: --upload-tools uploads tools.. that's all | 16:54 |
rog | niemeyer: well, the problem there is we've got two "versions" of the s/w with the same version number | 16:54 |
niemeyer | rog: If people expect more than this, they'll be wrong | 16:55 |
rog | niemeyer: it's actually "upgrade-juju --upload-tools" and the problem i see is in the first word there. | 16:55 |
niemeyer | rog: That means you've changed the release | 16:55 |
rog | niemeyer: yeah - we've changed the release but the version number hasn't changed. that feels a bit like a mistake to me. | 16:56 |
rog | niemeyer: i'm trying to imagine a situation where it's useful to overwrite the existing tools for a given version | 16:57 |
niemeyer | rog: If you checkout 2.2.0, that's 2.2.0 | 16:57 |
niemeyer | rog: Not 2.2.0.100, not 2.2.0.1001 | 16:57 |
rog | niemeyer: ok, that's fine. | 16:58 |
niemeyer | rog: We're doing development, and we have tools to do what we want when we have to | 16:58 |
niemeyer | rog: Thus --bump-version | 16:58 |
rog | niemeyer: if someone has already checked out and uploaded 2.2.0, is it useful to them to be able to overwrite the existing 2.2.0 in storage? | 16:58 |
niemeyer | rog: Don't know.. doesn't look like a problem we should worry about right now | 16:59 |
rog | niemeyer: (i'm not suggesting automatic --bump-version BTW) | 16:59 |
rog | niemeyer: ok | 16:59 |
niemeyer | rog: If people ask to upload, and it uploads, sounds fine | 16:59 |
niemeyer | rog: We can fine tune that behavior over time too.. anything we think right now will likely be wrong in the real world | 17:00 |
rog | niemeyer: ok, seems reasonable | 17:00 |
rog | niemeyer: submitted. ... and with that, it's time for me to call it a week. oh, i might sign off a few bugs first :-) | 17:07 |
niemeyer | rog: Super, thanks for the great stuff this week | 17:07 |
rog | niemeyer: got there in the end :-) | 17:08 |
niemeyer | rog: Yeah | 17:08 |
rog | fwereade, Aram, niemeyer, everyone else: have a great weekend! | 17:14 |
fwereade | rog, and yourself :) | 17:14 |
niemeyer | rog: Thanks, you too! | 17:14 |
fwereade | niemeyer, btw, I just put a couple of trivials in the queue | 17:14 |
niemeyer | fwereade: I'll look right now | 17:14 |
niemeyer | fwereade: Done | 17:19 |
niemeyer | fwereade: Done | 17:21 |
fwereade | niemeyer, lovely, thanks | 17:21 |
niemeyer | fwereade: My pleasure.. have a couple of questions on the second one | 17:22 |
fwereade | niemeyer, the bits you're asking about are basically the same | 17:22 |
fwereade | niemeyer, if you do a plain waitHooks{}, it means, "Iexpect no further hooks to have been run" | 17:22 |
fwereade | niemeyer, the initial StartSync is to poke the uniter into responding to events that might cause spurious events to happen | 17:23 |
fwereade | niemeyer, the subsequent ones are to make sure that it responds to state changes in a timely way | 17:23 |
fwereade | niemeyer, sound sensible? | 17:24 |
niemeyer | fwereade: Ah, yeah, thanks.. I suggest dropping it down to, say, 100ms then, since this is the happy case | 17:25 |
fwereade | niemeyer, SGTM | 17:25 |
niemeyer | fwereade: We're getting a second every 5 of those, otherwise | 17:25 |
fwereade | niemeyer, indeed, dropping it will not hurt at all | 17:25 |
fwereade | niemeyer, ok to submit with that change? | 17:26 |
niemeyer | fwereade: LGTM | 17:26 |
fwereade | niemeyer, cheers | 17:26 |
fwereade | hmm, got to go out in a mo, actually, but they should be in soon enough :) | 17:27 |
niemeyer | fwereade: Have a good weekend, and thanks for the hard work too.. feels like we're on the runway for a fully working implementation | 17:30 |
* niemeyer steps out for a while | 19:41 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!