[00:27] davecheney, niemeyer: you'll be glad to know that the recent cloudinit changes and traffic auth changes mean that we can actually run a uniter live. [00:27] wrtp: WOAH [00:27] wrtp: No kidding! [00:27] * niemeyer grabs some port wine to celebrate :) [00:27] davecheney: in a few minutes, i *should* see the first live uniter upgrade [00:27] niemeyer: ^ [00:28] niemeyer: the first time failed because i'd forgotten to add the UpgradedError check... [00:28] * wrtp does the same [00:37] ah, interesting case. it *would* work, but it doesn't get as far as starting the upgrader because the uniter is returning an error when initialised. [00:38] hmm, not easily fixed either. [00:48] niemeyer: this is an interesting case that i hadn't anticipated; i'm not quite sure what to do [01:08] wrtp: Hmm [01:08] wrtp: What's the error? [01:08] wrtp: and at which point? [01:08] niemeyer: it actually doesn't really matter so much what the error is in fact [01:09] niemeyer: although it's actually ModeInit: bad http response 404 Not Found [01:10] niemeyer: the problem is that if one of the workers consistently dies early, the task exits, and all the other tasks (including the upgrader) get killed off [01:10] niemeyer: so the upgrader never has a chance to upgrade [01:10] niemeyer: i've worked out what i think is a decent solution though [01:10] wrtp: If the uniter is consistently dying early, sounds like something is wrong? [01:11] niemeyer: indeed it is, but our code should be able to cope with that, even if it is wrong. [01:11] niemeyer: otherwise we have a situation where we're running some bad code and we can't upgrade out of that, even though nothing has crashed [01:12] wrtp: If it's dying consistently before even starting, I don't see how we'd be able to do anything? [01:12] wrtp: We can't go "Hey, I think I'll upgrade anyway, just in case!" [01:12] niemeyer: it was, but my first thought was: put the checks in *after* we've started the uniter [01:13] niemeyer: so we're running the upgrader concurrently. [01:13] niemeyer: but then you run into the case above. [01:13] niemeyer: but as i said, i think i have a nice solution. [01:16] wrtp: Oh/ [01:16] ? [01:17] niemeyer: 1) we change the runTasks function so that it gives UpgradedError priority over other errors [01:17] niemeyer: 2) we change the upgrader so that if it's stopped while downloading, it waits a while for the download to complete before actually dying [01:17] wrtp: Hmm [01:28] wrtp: Feels easy to fall onto races [01:28] wrtp: I wish we had a more deterministic way to say that [01:28] niemeyer: i think it's not too difficult [01:29] niemeyer: obviously we can't wait forever because then a download that hung up would mean that an upgrader would never be killable, but i reckon a minute would be fine [01:29] wrtp: Feels like guess work [01:30] wrtp: It could also take 10 in a slow network, or 20 [01:30] niemeyer: point of reference: in ec2 it takes 4s [01:31] niemeyer: we could add something to the downloader that signals that some progress is being made [01:31] wrtp: So S3 to EC2 is your reference of "slow network"!? :-) [01:31] niemeyer: it'd still be guesswork, but not quite so wild [01:31] wrtp: That's probably as good as it gets [01:31] niemeyer: fair enough [01:31] niemeyer: we could make the timeout 1 day if we liked [01:31] wrtp: I don't like any of that.. 
:( [01:32] wrtp: We need a more deterministic way to define whether we want to stop *now* or whether we think we should check for an upgrade [01:32] niemeyer: when do we ever want to stop *now* ? [01:33] wrtp: When we say "stop".. when there's an unrecoverable error in the state connection, .. [01:33] niemeyer: yes, we could add code to enable stopping immediately, but that doesn't solve our problem [01:34] niemeyer: ah yes, if we could know when we had such an error, that would be good [01:34] niemeyer: we'd have to lose lots of errorContextfs though [01:35] niemeyer: or actually, maybe errorContextf could be changed to deal with it [01:36] wrtp: Hm? [01:36] niemeyer: ignore me [01:36] niemeyer: i'm on crack [01:36] niemeyer: we just need a way of asking whether the state connection is still ok [01:38] wrtp: Hmm [01:39] wrtp: Did you get to the point of seeing the upgrader behavior when the issue was happening? [01:39] wrtp: Was it bailing out before ever checking for any upgrade? [01:40] niemeyer: i'm not sure. it doesn't currently print a log message when a download starts [01:41] wrtp: I was just having a look at it, a twist on the first idea you had sounds like an interesting direction [01:42] niemeyer: i've actually finished writing the code for the first idea already BTW [01:43] wrtp: It's not just about "if a download is in progress", though [01:43] niemeyer: it's also "have we seen an initial environ config" [01:43] wrtp: Right [01:43] niemeyer: i've already got that logic too [01:44] wrtp: We have to ignore Dying and, if we have a valid environ, *go check for an upgrade*.. only then, put the Dying() consideration back on the table [01:44] wrtp: Oh, you rock :) [01:44] niemeyer: http://paste.ubuntu.com/1201690/ [01:46] dying = time.After(1 * time.Minute) [01:46] !? [01:46] niemeyer: oops, well obviously we can't do *that* :-) [01:46] niemeyer: but something that sends on dying after a minute would do the job [01:48] wrtp: I'd say we can ignore dying on the first round, and go try an upgrade (facilitated by factoring out some things into a function). [01:49] wrtp: If we have a timeout, that timeout should be on the downloader itself, not on that loop [01:49] wrtp: It's the downloader that should be responsible for complaining that things are stuck [01:50] wrtp: if we don't do that, we have problems either way (what if a charm download stops, what if ...) [01:50] niemeyer: what about waiting for the initial config? [01:51] wrtp: Expand please? [01:52] niemeyer: if we haven't yet received the initial environ config, there's no download to complain about [01:52] niemeyer: so we would need a top-level timeout anyway [01:53] niemeyer: but that's not hard to arrange [01:53] niemeyer: we can use both [01:53] wrtp: We can ask the state for an environment upfront, without using WaitForEnviron [01:54] wrtp: and run one attempt to upgrade [01:54] wrtp: By calling a function explicitly that contains the block that is within the select today [01:54] niemeyer: we actually don't need to use WaitForEnviron at all. we don't use the environ. [01:55] erm, maybe we do [01:55] tools, err := environs.FindTools(environ, binary, flags) [01:55] We need it [01:56] niemeyer: ah yes, good point. [01:57] wrtp: Either way, your plan is a great quick solution [01:57] wrtp: dying = u.tomb.Dying + 5 min delay [01:58] wrtp: Once we know there are no downloads available, reset it to pure Dying [01:58] niemeyer: once we know there are no downloads available, we can just return [01:59] wrtp: Uh? 
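
[Reader's note: the paste links in this log have long since expired. The plan floated at 01:17 has two halves; a minimal sketch of the first half (letting the upgrade error win when runTasks collects worker errors) could look like the Go below. The Task interface and UpgradedError type are illustrative stand-ins, not the juju-core API of the time.]

    // Task is a hypothetical stand-in for a worker run by the agent.
    type Task interface {
        Stop() error // Stop kills the task and waits for it to exit.
        Wait() error // Wait blocks until the task exits.
    }

    // UpgradedError is a hypothetical stand-in for the error the
    // upgrader returns once new tools are installed.
    type UpgradedError struct{ Binary string }

    func (e *UpgradedError) Error() string {
        return "agent must restart: upgraded to " + e.Binary
    }

    // runTasks waits for every task; the first failure stops the rest.
    // An UpgradedError takes priority over any other error, so a worker
    // that consistently dies early can never mask a pending upgrade.
    func runTasks(tasks ...Task) error {
        done := make(chan error, len(tasks))
        for _, t := range tasks {
            t := t
            go func() { done <- t.Wait() }()
        }
        var first error
        for range tasks {
            if err := <-done; err != nil {
                for _, t := range tasks {
                    t.Stop() // any failure stops the remaining tasks
                }
                if _, ok := err.(*UpgradedError); ok || first == nil {
                    first = err
                }
            }
        }
        return first
    }
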
[01:59] i think [01:59] wrtp: What if there is a new upgrade? [01:59] wrtp: It should still work as usual [02:00] wrtp: The loop waits for environ changes [02:00] niemeyer: this is *after* we've already received a dying signal, right? [02:00] wrtp: No, I was talking about upgrader behavior at all times [02:00] niemeyer: ok, i'm not sure what you mean by "dying = u.tomb.Dying + 5 min delay" then [02:00] wrtp: On entrance, a dying channel is built that will fire if u.tomb.Dying fires, but with a 5 mins delay [02:01] niemeyer: ahhh [02:01] wrtp: Everything works as usual then.. we get into the loop, Changes fires with the first env config, [02:01] wrtp: Then, when we get the first download-not-found, we reset dying to take off that delay [02:02] wrtp: From then on it's fine for it to die at any point [02:02] niemeyer: we still want to try to complete a download if we're killed, no? [02:02] wrtp: Then, when we get the first download-not-found, we reset dying to take off that delay [02:03] wrtp: By definition, to get a download-not-found, we've checked if there was a download, and noticed there wasn't any [02:03] wrtp: Before that, dying is Dying + 5 mins delay [02:03] wrtp: Which means we have the 5 mins to complete the download [02:03] niemeyer: yes, that sounds good [02:04] niemeyer: because it's only the first download opportunity we really care about [02:04] wrtp: We can tweak as we go, but it *sounds* like this is a pretty trivial change on top of what we have today [02:04] wrtp: Right [02:04] wrtp: If we break in the middle later, we'll loop and get into that again in either case [02:04] niemeyer: yeah, that sounds great actually [02:05] wrtp: If there's no upgrade to do we reset it as well, of course [02:06] niemeyer: ah, i thought that's what you meant by download-not-found actually [02:06] niemeyer: but it's all good [02:06] wrtp: Yeah, both that and actual failures in download (we just break right now) [02:08] god i love how easy it is to reason about this stuff with channels [02:09] wrtp: +1! [02:09] niemeyer: e.g. http://paste.ubuntu.com/1201718/ [02:12] niemeyer: a little better: http://paste.ubuntu.com/1201720/ [02:17] wrtp: Yeah, that's nice [02:18] wrtp: A bit unfortunate that it hangs a goroutine forever for the 5 mins delay [02:18] wrtp: But I guess that's minor [02:18] niemeyer: well, it can be 5 minutes only if the channel is buffered [02:19] wrtp: Hm? [02:19] niemeyer: oh no, i'm talking rubbish [02:19] niemeyer: it hangs a goroutine for 5 minutes, as you say, minor. [02:19] wrtp: No, it hangs forever [02:20] niemeyer: oh yeah, i see what you mean [02:21] niemeyer: i can't get very worked up about it :-) [02:21] wrtp: If we use a local var t *Tomb assigning to the u.tomb, we could use something like this: http://paste.ubuntu.com/1201731/ [02:22] wrtp: So we Kill the delayed tomb after we have the first round acknowledged [02:22] wrtp: and set t back to the real tomb [02:22] niemeyer: i see what you're doing, but i think it's overkill [02:22] niemeyer: we really couldn't care less about one goroutine [02:23] wrtp: Up to you.. the two functions have the same size, are ready, and one of them doesn't leak memory [02:25] Alright, it's time for me to take some shower [02:25] niemeyer: "leaking memory" is a little strong. it cleans up when the upgrader quits [02:25] wrtp: Dude, and it's super late for you too [02:26] niemeyer: indeed [02:26] wrtp: Yes, when the process dies, it will release the memory :-) [02:26] niemeyer: or 5 minutes after the upgrader returns. 
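
[Reader's note: pastes 1201690, 1201718, 1201720, and 1201731 are gone, so here is a minimal reconstruction of the delayed-dying idea under discussion, assuming only a tomb-style dying channel. The upgrader selects on the returned channel until the first download-not-found, then swaps back to the real dying channel. The goroutine parked on <-dying is the one the conversation above worries about leaking.]

    import "time"

    // delayedDying returns a channel that fires five minutes after
    // dying fires, giving an in-flight tools download time to finish
    // before the upgrader gives up.
    func delayedDying(dying <-chan struct{}) <-chan struct{} {
        ch := make(chan struct{})
        go func() {
            <-dying // parked here for the upgrader's whole lifetime
            time.Sleep(5 * time.Minute)
            close(ch)
        }()
        return ch
    }
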
[02:27] wrtp: Which is done when the process dies!? [02:27] wrtp: Otherwise we'll always have an upgrader running [02:27] wrtp: Either way.. I'm not keen on trashing memory like that when it's trivial to avoid, but I'm not keen on bikeshedding on it either.. have a great sleep there [02:28] niemeyer: will do, thanks [02:28] niemeyer: enjoy your shower... [07:03] hey, anyone who's around, cath is not well and I'm popping out to get some medicine... and I may in general be a bit sketchier than usual today [07:05] right o [08:00] fwereade: mornin' [08:06] wrtp, heyhey [08:07] wrtp, looks like you got a lot done last night -- nice :D [08:07] fwereade: you'll be glad to know i got the uniter running live last night [08:07] fwereade: it broke immediately, but that's not the point :-) [08:07] fwereade: it was started and connected to the state etc [08:07] wrtp, oh, bugger, I didn't read everything I missed [08:07] wrtp, what fell over? [08:07] fwereade: it failed to download the charm [08:08] fwereade: it's probably something i'm doing wrong in the test [08:08] wrtp, huh, weird [08:08] fwereade: one mo, i'll show you its log [08:09] fwereade: it repeats forever doing this: http://paste.ubuntu.com/1202116/ [08:09] fwereade: (it would be nice if we could see the url that's failed, don't you think?) [08:09] fwereade: it's actually quite good that it failed because it exposed me to an upgrader issue that i hadn't considered before [08:10] wrtp, huh, very odd [08:10] wrtp, but ModeInit is I think before install time, isn't it? [08:10] wrtp, that should be ModeInstalling [08:11] wrtp, most likely something to do with getting the unit address [08:14] fwereade: this is a classic example of why errorcontextf is not always sufficient BTW [08:15] wrtp, I'm not denying your general point about ErrorContextf, but I think that had it been used in the provider methods it would have been fine... surely? [08:16] wrtp, your point that it makes it inconvenient to return specific error values holds far more water IMO :) [08:16] fwereade: well, yeah, it must be used in every single function [08:17] wrtp, it must be used mindfully, at any rate :) [08:17] wrtp, IMO it comes down to a question of who has responsibility for producing sane errors -- the thing that fails, or its client [08:18] wrtp, despite the drawbacks, of which I am aware, I still come down on the thing-that-failed side [08:18] fwereade: it depends what you mean by a sane error i think [08:18] wrtp, probably, yeah :) [08:19] fwereade: with the current scheme, we're always passing the buck - when we see a function like the above, we say it's fine because it adds error context. but it's only fine if all the things it calls do the same. [08:20] fwereade: and in many of those functions it doesn't really look like a context is necessary. [08:20] fwereade: so nobody calls it out in review [08:20] wrtp, hmm, I think that statement is flawed in exactly the same way as "if it's up to the client, it's only ok if every single error everywhere is annotated" would be [08:20] wrtp, in either case it's essentially advocating tracebacks [08:21] fwereade: i think some kind of (fairly minimal) traceback is exactly what we need for diagnosing problems like we've just encountered [08:22] fwereade: we need just enough to walk us through the tree of possible error paths. [08:23] fwereade: what i'm going to have to do now is find out which of the functions that ModeInit is calling might be able to generate the error we saw. that's slow and error prone in itself. 
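
[Reader's note: for context, the ErrorContextf being debated was essentially the helper below, a deferred annotation of a named return error. The sketch is illustrative rather than the exact juju-core code of the time.]

    import "fmt"

    // ErrorContextf prefixes *err with context if it is non-nil.
    // It is designed to be deferred against a named return value.
    func ErrorContextf(err *error, format string, args ...interface{}) {
        if *err != nil {
            *err = fmt.Errorf(format+": %v", append(args, *err)...)
        }
    }

    // Typical call site: every fallible step inside deployCharm now
    // reports as "cannot deploy charm <url>: <cause>".
    func deployCharm(curl string) (err error) {
        defer ErrorContextf(&err, "cannot deploy charm %q", curl)
        // ...
        return nil
    }
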
[08:23] wrtp, AFAICS there are only two things that could have failed there, and the actual problem is the same in either case, right? [08:24] wrtp, one is PrivateAddress, the other is PublicAddress, and (almost certainly) they'd each need to be fixed in the same way [08:24] wrtp, I totally understand there are cases where it could be much less pleasant [08:24] fwereade: ah yeah, that's true. i'd forgotten they accessed an http link [08:25] wrtp, and such a case is certainly evidence for inadequate annotation [08:25] wrtp, but I *think* I'm comfortable with the tradeoff [08:25] fwereade: i'll change their error messages to be a little better for a start [08:25] wrtp, the number of cases where I've actually been baffled by an error is pretty small [08:26] fwereade: just wait until we've encountered more of these kinds of errors reported by people in the wild :-) [08:27] wrtp, fair point -- but all the same, I'm uncomfortable with the idea that the way to handle errors right is to hand-hack traceback functionality everywhere [08:27] fwereade: i don't think of it as traceback functionality. i think of it as describing the error :-) [08:38] wrtp, it's the "everywhere" that bugs me more than the precise term we use for the mechanism by which we describe the error (but, yes, good descriptions will be better than tracebacks) [08:39] fwereade: i really feel there's no shortcut. but i'm rolling with it for now. if we decide to change the policy in the future, it won't be too hard. [08:40] wrtp, yeah :) [08:52] wrtp, btw, are your current branches messing with PATH in upstart? [08:53] fwereade: no [08:53] wrtp, awesome :) [08:53] wrtp, but I do have to fix something now then [08:53] fwereade: at least... there might be a dreg in the tests. i'll check. [08:54] wrtp, yeah, the problem is that the uniter tests mess with PATH [08:54] wrtp, it's not too bad though, I can do a trivial and not-so-nice fix quickly, and then sort out the long-running-hook-server thing which will let me do it nicely [08:54] fwereade: i've found the metadata problem BTW [08:54] wrtp, oh yes? [08:55] fwereade: it's using the version 1.0, but public-hostname didn't exist in that version [08:55] wrtp, ha :) [08:55] fwereade: tbh i think it should probably use "latest". [08:55] wrtp, mmmmmmaybe, I'm just not sure what guarantees they make re "latest" compatibility [08:56] fwereade: grr [08:56] 04.15.979 ... value *errors.errorString = &errors.errorString{s:"cannot put charm: cannot write file \"local_3a_precise_2f_dummy-1\" to control bucket: remote error: handshake failure"} ("cannot put charm: cannot write file \"local_3a_precise_2f_dummy-1\" to control bucket: remote error: handshake failure") [08:56] wrtp, yeah, those piss me off :/ [08:56] fwereade: ok, i'll use 2012-01-12 then [08:56] wrtp, brb cig [08:56] wrtp, cool [08:59] fwereade: actually i think "latest" should be fine [09:03] wrtp, I'm happy to trust your research/judgment :) [09:04] fwereade: for metadata categories it talks about "version introduced" which to me implies that a category can't be retracted once introduced [09:05] fwereade: and all their examples use "latest" [09:05] wrtp, sounds sane to me :) [09:12] hello. [09:16] Aram, heyhey [09:17] fwereade: holy shit, it worked! [09:17] * fwereade cheers [09:17] wrtp, running unit agent? 
[09:17] fwereade: unit agent upgraded [09:18] wrtp, sweeeet [09:18] wrtp, and I'm just about to land charm upgrades too [09:18] wrtp, although I haven't actually implemented upgrade-charm yet [09:18] fwereade: i need to think of a nice test so we can live test that the uniter can run a charm [09:19] fwereade: how about a trivial charm that starts a web server? then we can poll for a while after the charm has started to see if the web server comes up [09:19] Aram: morning! [09:20] wrtp, yeah, sounds sensible, just checking for unit status is not adequate to tell that the *charm* is working [09:20] * wrtp is quite happy now [09:20] fwereade: indeed. i want to check that a hook has been executed for real [09:20] fwereade: this can lead into other tests that test more sophisticated functionality, i think [09:20] wrtp, sgtm [09:21] wrtp, will also test expose, which is nice [09:21] wrtp, maybe just an echo server? [09:21] fwereade: in fact, can i tell from the status that a charm has finished executing its start hook? [09:21] wrtp, yeah, once unit status is started [09:22] fwereade: sweet. that means that i won't have to wait too long after that [09:22] fwereade: and i've just implemented the unit watcher, so i can use that to watch the status [09:22] fwereade: lovely jubbly [09:22] wrtp, you could write an echo server with plain/pirate/3117 transforms, to test config-changed as well :) [09:23] fwereade: i think i'll just write a server that executes a specified jujuc callback and sends back the result [09:23] wrtp, how do you plan to do that? [09:23] fwereade: hmm, good point! [09:23] wrtp, :p [09:24] fwereade: darn, out of context callbacks [09:24] wrtp, yeah, I'm strongly inclined to do the preliminary work for that today, just because it makes the uniter much cleaner (and will, I *think*, improve performance a little) [09:25] wrtp, but you shouldn't count on the full thing being available any time soon at all [09:25] wrtp, but even a trivial echo server will verify that open-port works, and by extension (probably) the rest of the jujuc stuff [09:27] fwereade: maybe it's silly, but i'd prefer something that provides some actual info from the other side, so we *know* that we're talking to the right thing. [09:28] wrtp, well, anything you want to send back from jujuc can be grabbed and stored when you run the start hook, right? [09:29] fwereade: hmm, can i assume that python will be installed? [09:29] wrtp, but ISTM that config-changed is a useful mechanism for checking end-to-endness [09:29] wrtp, sorry I don't know [09:29] fwereade: yeah, i was thinking that [09:29] fwereade: what's the simplest web server i can write? [09:30] fwereade: that doesn't need to pull in any more resources than we already have [09:31] wrtp, if you have python, SimpleHTTPServer is pretty damn trivial [09:31] fwereade: yeah, if [09:31] wrtp, *but* the default python version is changing, so you want to be careful of that, [09:31] fwereade: oh pish [09:31] wrtp, I actually think you probably can guarantee *some* python [09:31] wrtp, but yeah, exactly [09:31] fwereade: perl is probably installed [09:32] fwereade: (although i've never written a line of perl in my life) [09:32] wrtp, whatever you need, it's one line in the install hook, surely? 
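
[Reader's note: a minimal sketch of the live test being designed here: poll the deployed charm's web server until it serves the expected content, which proves the start hook really ran and the port really got exposed. The port, URL, and expected body are hypothetical.]

    import (
        "fmt"
        "io"
        "net/http"
        "time"
    )

    // waitForCharm polls the unit's web server until it answers with
    // the expected content, proving the charm's start hook has run.
    func waitForCharm(addr string, timeout time.Duration) error {
        deadline := time.Now().Add(timeout)
        for time.Now().Before(deadline) {
            resp, err := http.Get("http://" + addr + ":12345/")
            if err == nil {
                body, _ := io.ReadAll(resp.Body)
                resp.Body.Close()
                if string(body) == "hello from the start hook\n" {
                    return nil
                }
            }
            time.Sleep(5 * time.Second)
        }
        return fmt.Errorf("charm never came up on %s", addr)
    }
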
[09:32] wrtp, in my very limited experience you're not missing much ;p [09:32] fwereade: i firmly believe that to be the case :-) [09:37] fwereade: i miss the inferno shell where i could write: listen 'tcp!*!12345' {echo hello there} [09:37] fwereade: and instantly have a working server [09:38] fwereade: i think we must have python because cloud-init is written in python. [09:38] fwereade: i'll just have to be a teeny bit careful of version changes [09:41] fwereade: does hook output get logged into /var/log/juju/unit*.log ? [09:41] wrtp, nope, but juju-log will [09:41] fwereade: hmm. where *does* hook output go? [09:42] wrtp, I suspect it goes nowhere at the moment [09:42] fwereade: i think that's wrong [09:42] wrtp, fair point, we just want to be careful not to do it how we did in python, where every line on stderr gets prefixed with ERROR: ;) [09:43] fwereade: just trying to deploy the real wordpress charm, just to see what happens... [09:45] fwereade: http://paste.ubuntu.com/1202263/ [09:45] fwereade: also: http://paste.ubuntu.com/1202265/ [09:46] fwereade: it's looking good! [09:46] fwereade: no way of knowing why the install hook failed though [09:46] wrtp, that's pretty cool :D [09:47] wrtp, I kinda would like to separate hook output from uniter output, but that's just an instinct, not sure it holds water [09:47] wrtp, yeah, debug-hooks would be useful for that [09:47] fwereade: i'm not sure it does. they do need to be readily distinguishable though [09:48] fwereade: what does debug-hooks do again? [09:48] wrtp, yeah, that's the heart of it [09:48] wrtp, whenever the uniter runs a hook it just gives the client a shell instead [09:49] fwereade: [09:49] 2012/09/13 09:42:35 JUJU HOOK some hook output [09:49] ? [09:50] fwereade: i don't think we need to know if it was written to stdout or stderr [09:50] wrtp, sounds reasonable [09:51] fwereade: i might just go and do that to scratch my itch. [09:51] wrtp, +1 [09:53] wrtp, btw, I think I have some idea why I wanted the agent tools to go in the agent tools dir -- because the uniter does write to it [09:54] fwereade: i don't think that's a problem. a given version of the uniter will always write the same thing, no? [09:54] wrtp, yeah, indeed, I'm not sure it's really a big deal [10:04] fwereade: ah, is it likely the reason for the failure is that the PATH isn't set up yet? [10:04] wrtp, ha! [10:04] wrtp, yes, very likely indeed [10:05] fwereade: hook output logging is done BTW, with the small matter of a few tests to go [10:05] fwereade: very happy with how easy it was to navigate the uniter code [10:24] wrtp, sweet :D [10:46] wrtp, sorry, I think I'm being dense... what's the use case for the unit watcher? [10:46] fwereade: the test uses it to watch for the agent tools version changing [10:47] fwereade: it can also be used to watch for the unit status changing [10:47] fwereade: otherwise we'll have to poll, which seems wrong. [10:47] wrtp, hmm, I'm -1 on test-only code polluting the API, where we can help it [10:47] wrtp, I'm comfortable polling in tests (although ofc I do appreciate it when I don't have to) [10:48] fwereade: i don't think it's strictly test only. we could easily have some client functionality that wants to watch some aspect of a unit. [10:48] wrtp, however, it doesn't cover status changes properly (down?) [10:49] wrtp, I dunno, I feel we have too much speculative code in state anyway [10:50] fwereade: FWIW we've already got MachineWatcher which is basically the same thing [10:51] wrtp, is MachineWatcher used? 
(or does it have a concrete use case in mind?) [10:51] fwereade: it's used to watch the instance id for one [10:51] wrtp, ok, so it has a use, and I'm fine with that (although *really* that STM like a job for a more specialized watcher) [10:52] fwereade: this was at niemeyer's request. i started with a more specialized watcher. [10:52] wrtp, ha, ok [10:53] fwereade: i think the UnitWatcher is sufficiently general that it doesn't really count as test-only code, even though that's how we're using it currently. [10:53] wrtp, I kinda feel a watcher that doesn't fire on certain important changes is a bit of a bad thing though [10:53] fwereade: we will eventually use it to do major-version upgrades [10:53] wrtp, if it's meant to watch status, it ought to actually watch status [10:54] fwereade: yeah, maybe it should watch the alive status too. [10:54] wrtp, a ToolsVersionWatcher that can handle both units and machines, and strips out unwanted changes, feels much cleaner to me [10:55] fwereade: i agree, but i've been told that's wrong. so here we are. [10:56] wrtp, (btw, tangentially related: with the provisioner bool in machine, can we drop the PA now and just run the appropriate workers in the MA?) [10:56] wrtp, bah [10:56] wrtp, this is the trouble with the "we should do more reviews ourselves" thing [10:56] fwereade: yes, that's on the todo list, maybe today [10:56] fwereade: i know [10:56] wrtp, sweet [11:23] fwereade: what do you think is the right behaviour for logging output from hooks that continue to run something printing in the background after exiting? [11:24] wrtp, hum, I rather feel that hooks probably shouldn't do that, but good question [11:24] fwereade: i'm tempted to throw away any output after the hook exits [11:24] fwereade: that way at least it's clean and we can't get any intermingling [11:24] wrtp, yeah, I'd assumed it would just be a CombinedOutput [11:24] wrtp, wait a mo [11:25] fwereade: i don't want to use CombinedOutput as i want to log lines as they happen [11:25] wrtp, won't plain hook output go to logdir/agentname.out anyway? [11:26] fwereade: why would it? [11:26] * fwereade goes to peer at docs [11:27] wrtp, ok, it wouldn't [11:30] wrtp, FWIW, log-until-it-ends matches the python [11:30] wrtp, but I agree that logging lines as they happen is a good thing [11:30] wrtp, (that is also what py does) [11:30] wrtp, bbs, updates need restart [12:00] rogpeppe, btw, when you said the machine/unit watchers would be used for major version upgrades... how? [12:00] rogpeppe, ah ok, sorry, now I see [12:00] fwereade: cool, np [12:00] rogpeppe, except [12:00] * fwereade thinks [12:02] rogpeppe, ok, so we upgrade everything, and everything knows not to hit state (apart from that bit?) until after the state itself has been upgraded? [12:02] fwereade: that's right [12:03] fwereade: well, actually we probably make a new state server [12:03] rogpeppe, doesn't that put quite a major restriction on the sort of state changes we can actually make? [12:03] fwereade: i'm not sure of the details [12:03] fwereade: how do you mean? [12:06] rogpeppe, well, where's it going to write the change? in the old document (created/managed by old code)? or in the new document, which doesn't exist yet and is on a different server? [12:07] fwereade: where is what going to write the change? [12:07] fwereade: the upgrader? [12:07] rogpeppe, yeah [12:07] rogpeppe, oh, wait, it writes the code version before it's running the new code, right? 
[12:07] rogpeppe, feels a bit off somehow [12:09] fwereade: it uploads the new tools, writes the tools version, waits until all agents are in "pending" state, transforms the database, then triggers all the agents to have at it [12:09] fwereade: by pending, i mean "major-upgrade-pending" of course [12:10] rogpeppe, ok, thanks :) [12:18] Aram, pre-review delivered on https://codereview.appspot.com/6492110/ -- but I'm worried I've got completely the wrong end of the stick [12:19] Aram, I thought the one thing we *weren't* meant to be notified of was a remove? (except when the watcher happens to have missed a Dead, in which case we should get a Dead change..?) [12:21] fwereade: I only implemented current behavior, not what we want it to be in the future. [12:21] you are right about the future. [12:22] Aram, my concern is that the future is *now* [12:23] Aram, AIUI we dropped the life stuff in state so we could focus on it in mstate [12:24] we can never switch, nor test everything if the API is not the same though. [12:24] Aram, but this API is useless [12:24] Aram, as is the one in state [12:27] Aram, actually, how many of these watchers have non-test clients currently? [12:28] Aram, I presume that at least the firewaller/provisioner/machiner does use some of them, maybe most of them [12:28] white:juju-core$ lsr | egrep 'go$' | egrep -v 'state1|mstate|test' | xargs egrep 'Watch[A-Z][A-Za-z]*' | wc [12:28] 64 394 5470 [12:30] white:juju-core$ lsr | egrep 'go$' | egrep -v 'state1|mstate|test' | xargs egrep '[A-Z][A-Za-z]*Change' | lsr | egrep 'go$' | egrep -v 'state1|mstate|test' | xargs egrep '[A-Z][A-Za-z]+Change[^A-Za-z]' | wc [12:30] 206 1304 18589 [12:31] nah, that's wrong [12:31] white:juju-core$ lsr | egrep 'go$' | egrep -v 'state|mstate|test' | xargs egrep '[A-Z][A-Za-z]*Change[^A-Za-z]' | wc [12:31] 8 52 792 [12:31] white:juju-core$ [12:31] white:juju-core$ [12:31] white:juju-core$ lsr | egrep 'go$' | egrep -v 'state|mstate|test' | xargs egrep 'Watch[A-Z][A-Za-z]*' | wc [12:32] 22 78 1799 [12:33] Aram, ok, so the plan is to implement broken watchers and only fix them once we've finished the switch (this is not *necessarily* a criticism, but I am disappointed if that's actually our only path to where we need to be) [12:34] I don't see how it can be any other way. we use watchers therefore we can't change our client code because we don't have "fixed" watchers. I can't do "fixed" watchers first since we would not have an easy path to transition to mstate since client code would want "broken" watchers. [12:35] to make the transition fluid the APIs have to be the same. [12:35] the real fix would have been to have fixed watchers in state as well. [12:35] Aram, yeah, I guess so, I can't come up with anything better :( [12:36] * fwereade quietly freaks out in a corner somewhere [12:36] * fwereade tries not to disturb anyone else [12:37] Aram, well, hmm, actually I'm starting to feel it *would* be better to break what we currently have working in the interest of getting some real code running against mstate [12:38] Aram, but... 
I suspect this is the cowboy-nature coming to the fore, and I should probably shut up [12:43] fwereade: https://codereview.appspot.com/6494132 [12:43] fwereade: i'm becoming convinced that stopping logging at the end of a hook is not right [12:44] rog, I'm becoming convinced that if we get output after the end of a hook we should spam the user with UR DOIN IT WRONG messages [12:44] fwereade: i think there's potentially great benefit in having all logs funnel into one place [12:44] fwereade: as it is, every server that anyone starts logs to a.n.other log file somewhere [12:45] fwereade: i remember having difficulty with that when writing a charm [12:45] rog, and I think that any sane service will have its own logging, and we are not in the business of indulging craziness like expecting hook child processes to work properly :) [12:45] rog, isn't that a job for logging charms? [12:46] rog, if I'm looking in the juju logs I care about juju [12:46] fwereade: i think logging should be more fundamental to the system than that [12:46] rog, if I care about what (say) riak is doing, surely I can look at riak's logs [12:46] fwereade: if i'm deploying a charm, i care about juju *and* what my stuff is doing, because the two interact [12:46] fwereade: we want to be able to see a time line of the whole system if possible [12:48] fwereade: i'm pretty sure that doing this right is a major key to getting juju easier to use and debug. [12:57] rog, that is fine by me :) [12:57] fwereade: there are various web services that do that at low cost [12:57] rog, having juju agents themselves mediate that sending directly feels bad though [12:58] fwereade: perhaps you're right. [13:02] rog, I'm not actually against logging, I just feel like there is more than one concern here (even if not quite as many as 2, IYSWIM) [13:02] fwereade: although when a daemon dies 1 second after you've started it in the background, saying "cannot do something", i didn't think so [13:02] fwereade: perhaps we could provide a standard way to say "watch this log file please" [13:02] fwereade: though i suppose logger(1) isn't bad for that [13:02] rog, feels to me like a feature tailor-made for a relation with eg rsyslog [13:02] fwereade: sadly subordinate charms don't work too well for logging currently [13:02] rog, ha, ok, I haven't investigated :( [13:02] rog, why not? [13:02] fwereade: you can't guarantee that the subordinate has started before the principal [13:02] fwereade: which i think you want if you want to set up a logging destination so that every message from the principal is logged. [13:04] rog, it doesn't sound like it would be *too* hard for a logging charm to catch up with messages from the past, but maybe that's crazy, it could certainly have pathological behaviour in some situations [13:04] fwereade: FWIW i think that we *should* guarantee that if a service has subordinates, their start hooks should complete before the principal's install hook gets run. [13:04] fwereade: depends if the messages are going onto disk or straight through the network [13:04] rog, the best we can do is guarantee that happens *some* of the time, though, right? [13:05] rog, why would they be going across the network before the subordinate is configured to send them there? [13:05] fwereade: of course if you add a subordinate after deploying a unit of service we can't guarantee it :-) [13:05] fwereade: it might not be possible to switch destinations mid-flow. 
ok, i admit i don't know anything about syslog :-) [13:06] rog, nor do I really :) [13:06] fwereade: but as a general point, i think it would make subordinates much more useful [13:06] fwereade: because they could be used to configure some aspect of the system for the principal, *before* the principal starts [13:07] fwereade: for instance, to tweak kernel parameters [13:07] rog, I still don't really see why it matters for the principal to run in an unhelpful mode for a few seconds before being bounced by the subordinate when it comes up [13:08] rog, and the subordinates will always have to be written so as to be able to deal with suddenly being deployed against a running unit anyway, surely? [13:08] fwereade: you're assuming that whatever it is the subordinate is setting up can be changed *after* the principal starts [13:08] fwereade: sure - some subordinates might not be able to work in that scenario [13:09] rog, I dunno, that level of interference feels to me like it would be better addressed by letting people launch on custom images [13:10] fwereade: it's a combinatorial problem - there are n-dimensions of tweaks, and we can't provide images that address all points in the space. [13:11] rog, I'm not suggesting that *we* should, I'm just suggesting that allowing people a mechanism for choosing their own images addresses this use case without introducing a whole new class of subordinate charms (ie these ones that *cannot* be started after the principal) [13:12] fwereade: i did think of a way around it using the current system primitives BTW [13:12] fwereade: i might have mentioned it before [13:12] * fwereade isn't sure whether or not he can hear a bell ringing [13:12] rog, go on :) [13:13] fwereade: a "sync" interface [13:14] * fwereade can hear a bell now, but can't remember how it was meant to work [13:14] fwereade: a principal provides it; a subordinate requires it; the principal only starts when it gets a "sync done" message from the subordinate [13:15] rog, how does the unit know whether or not it's got a subordinate when it runs its start hook in the first place? [13:15] rog, and can we have 2 synced charms? [13:15] fwereade: you'd set "required-subordinate-count" in the service config, or something like that [13:16] rog, hmmm, and how do we deal with people deploying syncing subordinates when the services are already running? [13:17] rog, sorry, I'm way off on a derail here [13:17] fwereade: i don't think that's a problem actually, but yeah [13:17] fwereade: it's an interesting example, i think [13:18] rog, yeah, but I should be writing actual code right now ;p [13:18] fwereade: indeed [13:18] fwereade: so i'll leave the logging as it is for now. review appreciated :-) [13:18] rog, oh yeah, that was why I started talking about this ;P [13:23] rog, LGTM [13:23] fwereade: thanks [13:29] niemeyer: morning! [13:29] rog: Morning! [13:29] niemeyer: i got the uniter one stage closer this morning - the unit upgrading worked. [13:30] rog: Woah! [13:30] niemeyer: and it failed because the PATH was not set up rather than anything else, i believe [13:30] rog: Aw.. almost there then :) [13:30] niemeyer: although charm stdout was lost, so i don't know (hence this morning's CL: https://codereview.appspot.com/6494132) [13:31] rog: Cool, looking [13:33] rog: Hmm [13:33] niemeyer: hmm? [13:34] rog: Why are we seeing this issue? 
When a process exits we can generally determine deterministically that we have all its output [13:34] niemeyer: no we can't [13:34] niemeyer: i know this because i see the issue all the time [13:35] rog: Ah, I see the issue, backgrounding.. ok [13:35] niemeyer: actually it happens with no backgrounding at all [13:37] hey, I just noticed: we just hit r500 :) [13:37] niemeyer: or... it's possible that os/exec works around the problem for us actually [13:38] rog: http://paste.ubuntu.com/1202638/ [13:38] rog: How does that work? [13:38] niemeyer: good question; i'm looking in the source [13:38] rog: How does it know to stop reading, more specifically [13:38] rog: Cool [13:39] rog: My understanding is that if a process has exited, we necessarily have all output that came from *it* into buffers [13:39] rog: So we can simply read [13:39] rog: If there's more, there's backgrounding going on that may be considered misbehaving [13:40] niemeyer: do we specifically disallow starting any background process from a hook without redirecting its stdout & stderr? [13:41] rog: This is bad practice, but I don't think we care in this case [13:42] rog: What we care, I think, is that we're logging everything the hook ever gave us by itself [13:42] niemeyer: we *do* still have the issue, i think [13:42] rog: So what *is* the issue? :) [13:43] niemeyer: the issue is that the process can write some stuff into the pipe, exit, we see the exit, but the pipe still has data in [13:43] rog: I think that's exactly what I mentioned above [13:43] niemeyer> rog: My understanding is that if a process has exited, we necessarily have all output that came from *it* into buffer [13:43] niemeyer: it happens because we're using a pipe rather than sending the data into a local buffer [13:43] niemeyer: in this case, we're not using a local buffer. [13:44] niemeyer: an alternative would be to block forever waiting for EOF, which is what would happen if we used CombinedOutput, for example [13:44] niemeyer: i prefer to be a little more robust, i think. [13:46] rog: Hm.. I don't understand.. those two things are exactly the same [13:46] rog: The only way to send data into local buffer is by using a pipe [13:46] rog: We certainly have to finish reading after the process has exited, not before [13:46] rog: Okay.. (?) [13:46] rog: I'd prefer to be robust, but meaningful.. (?) [13:46] rog: What is the actual symptom of the problem? [13:46] niemeyer: we'd see log messages produced by the hook after RunHook had returned [13:47] niemeyer: for a few microseconds at most [13:47] rog: How can the hook produce output after it has returned? [13:47] rog: Sorry, I'm pretty confused by now [13:47] niemeyer: because there's data still in the pipe [13:47] rog: It's not *producing* output [13:47] rog: There's data in buffers.. that's normal unix behavior [13:48] niemeyer: no, it's produced it, but we haven't read it yet [13:48] rog: Exactly.. that's handled without arbitrary delays [13:48] niemeyer: so we'll probably get EOF immediately and everything is hunky dory [13:48] niemeyer: but if we always wait for EOF then we get bitten if there *is* a background process that's keeping the pipe open [13:49] niemeyer: we could make the delay 2s or so if that would make you happier [13:49] rog: We don't get bitten, because we don't have to wait.. [13:49] niemeyer: i'm afraid we do. i'm just writing some code to demo the issue. 
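
[Reader's note: the demo code is lost, but the behaviour being argued over, plus the workaround eventually agreed on below, can be sketched like this: log each line of hook output as it arrives, and once the hook has exited give the reader only a short grace period instead of blocking for EOF, because a backgrounded child inherits the write end of the pipe and can hold it open indefinitely. The log prefix and the 50ms figure are illustrative.]

    import (
        "bufio"
        "log"
        "os"
        "os/exec"
        "time"
    )

    // runHook runs a hook, logging its combined output line by line.
    func runHook(c *exec.Cmd) error {
        r, w, err := os.Pipe()
        if err != nil {
            return err
        }
        c.Stdout = w
        c.Stderr = w // interleave; we don't care which stream it was
        if err := c.Start(); err != nil {
            r.Close()
            w.Close()
            return err
        }
        w.Close() // only the hook (and any children) now hold the write end
        done := make(chan struct{})
        go func() {
            defer close(done)
            scanner := bufio.NewScanner(r)
            for scanner.Scan() {
                log.Printf("JUJU HOOK %s", scanner.Text())
            }
        }()
        err = c.Wait()
        // The hook has exited, but wait(2) returning does not guarantee
        // the pipe is drained, and a background child may never close it.
        select {
        case <-done:
        case <-time.After(50 * time.Millisecond):
            log.Printf("JUJU HOOK (discarding output produced after hook exit)")
        }
        r.Close() // unblocks the reader goroutine if it is still going
        return err
    }
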
[13:49] rog: Thanks, that'd be awesome [13:51] niemeyer: essentially the OS doesn't guarantee that the pipe has been drained when wait(2) returns [13:52] rog: The pipe may never get drained if there's a process holding it [13:52] niemeyer: exactly [13:52] rog: We only care that the process has exited, and we can't read anything from it anymore [13:53] niemeyer: the process can exit *and* there still be data in the pipe. [13:53] niemeyer: which we can read. [13:53] niemeyer: anyway, one mo [13:53] rog: Then we read it.. (!?) [13:54] niemeyer: yeah, but we need to wait for it to be read, hence the <-logger.done [13:56] rog: What I mean is that we don't have to put a boundary on the reader to output its data in 50ms [13:56] rog: There's a deterministic event happening.. [13:56] niemeyer: only if there's no background process [13:56] rog: Always [13:57] rog: "I can't read data now" is deterministic [13:57] niemeyer: how do we know when that event has completed? [13:57] rog: No matter if there's something in background or not [13:57] niemeyer: we know there *might* be some data, but we don't have any way of knowing for sure, or how much there is [13:57] rog: Hence, "I can't read data now" [13:58] rog: If we can, doesn't matter how much there is, or how much time has passed, we can read it [13:58] niemeyer: how do we know "I can't read data now"? [13:58] niemeyer: there's no non-blocking read [13:58] rog: Because the system call tells us that? [13:58] rog: Oh, there isn't? [13:58] niemeyer: i don't think so [13:58] * niemeyer looks [14:00] niemeyer: we could probably use syscall.SetNonblock [14:00] niemeyer: but i'm reluctant to do that [14:00] This sucks :( [14:02] niemeyer: tbh i don't mind waiting a bit longer if someone starts a process in the background and that triggers the timeout [14:03] niemeyer: given they're not meant to be doing that anyway [14:04] rog: Well, the consequence isn't great.. besides the lack of determinism (we can be closing the door on the logger before we've read the *real output*) we're holding resources, perhaps consecutively, for things that may not exit [14:04] niemeyer: that's all true [14:06] rog: Anyway, thanks for the coverage [14:06] rog: It's a great workaround and we should definitely move on with it for now [14:07] niemeyer: np. i'm not sure what you'd like to do about the issue though. [14:07] niemeyer: ok, cool [14:07] rog: I'll finish the review so we can integrate this [14:07] niemeyer: thanks [14:10] niemeyer: FWIW i verified that CombinedOutput does block until EOF, even if the process has exited. [14:10] rog: Cheers. That's indeed the right thing for it to do [14:10] niemeyer: yup [14:11] niemeyer: but not what we want, i think [14:11] ( echo foo; sleep 5&; echo bar) | cat [14:11] niemeyer: yup [14:11] This is the issue, in summary :) [14:11] rog: Agreed [14:30] Gosh, and the bank calls me again.. the government is asking for new docs for the exchange contracts [14:30] Just what I was asking for [14:39] rog: Done, sorry for the delay [14:39] niemeyer: tyvm [14:40] niemeyer: do you think we just truncate long lines? [14:41] niemeyer: or should we send them as several log messages? [14:43] rog: I don't think we need to do much, to be honest [14:43] rog: ReadLine already truncates them [14:43] niemeyer: so you'd go with that? [14:44] niemeyer: (and ignore continuation lines) [14:44] rog: No, just log them [14:44] niemeyer: ok, so we'll split lines [14:44] niemeyer: fair enough [14:45] rog: Yeah, ReadLine already does it.. 
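
[Reader's note: the ReadLine behaviour being relied on, in miniature: bufio.Reader.ReadLine hands back at most one buffer's worth of a line per call, so an over-long line simply overflows into several log messages, each carrying its own log prefix. The 4096 matches the LineBufferSize quoted below.]

    import (
        "bufio"
        "io"
        "log"
    )

    // logLines logs every line read from r. A line longer than the
    // reader's buffer arrives in several ReadLine calls, and hence
    // as several log messages.
    func logLines(r io.Reader) {
        br := bufio.NewReaderSize(r, 4096)
        for {
            line, _, err := br.ReadLine()
            if len(line) > 0 {
                log.Printf("JUJU HOOK %s", line)
            }
            if err != nil {
                return // io.EOF or a real read error; either way, stop
            }
        }
    }
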
[14:45] niemeyer: yeah - it's always a difficult call whether to discard continuations of a split line though [14:46] niemeyer: i'm mildly tempted to print [%d bytes truncated at end of line] [14:46] niemeyer: in case someone cats an executable or something [14:47] rog: It's not truncated [14:47] niemeyer: but... if someone's printing a large lump of json, you probably want all of it, split or not [14:47] rog: ReadLine will return on the next line [14:47] rog: Ah, I see, you're thinking about changing the behavior [14:48] niemeyer: yeah, i'm concerned about changing the meaning of log messages [14:48] niemeyer: but i can make the buffer pretty big, so it's really not much of an issue [14:48] rog: Change the meaning of log messages? [14:48] rog: What's the meaning of log messages? :) [14:49] niemeyer: well, if we're grepping out messages by prefix (not uncommon), then we'll miss continuation lines [14:50] niemeyer: but i think it's common to have some line-length restriction, and better that than lose information [14:50] rog: Grepping for prefixes would still work fine, I believe [14:51] niemeyer: i'll use a buffer size of 128K or something [14:51] rog: Why? Please just keep the default [14:51] niemeyer: not if the prefix is part of the message logged, i believe [14:51] rog: The prefix will end up in the same place [14:51] niemeyer: yes, but if we grep for it, the continuation lines won't have it [14:52] rog: Yes, they won't, because they are continuation lines that don't have the prefix.. duh? [14:52] niemeyer: so our grep output won't have all messages with that prefix [14:52] rog: Man [14:52] rog: Okay, please do whatever [14:53] niemeyer: it depends how people are using the logging. 4K is probably well sufficient for almost all purposes, i'd guess [14:53] niemeyer: so i'll leave the default [14:57] niemeyer, https://codereview.appspot.com/6496120 should be near-enough trivial, assuming you're comfortable with the approach in the short/medium term [14:58] niemeyer: "s/0.1/0.2/ probably, given the internal change?" - is there a timeout change you intended to suggest but didn't? [15:03] rog: Huh.. indeed [15:03] rog: I typed it, but it got eaten somehow [15:04] rog: I was suggesting a small bump to 100ms [15:04] niemeyer: sounds good [15:04] rog: Cheers [15:04] fwereade: Looking [15:04] brb, cig [15:09] b [15:11] fwereade: Seems mostly sane [15:11] fwereade: There's a small bit that is feeling contrived that sounds easy to simplify [15:11] niemeyer, ha, now that's faint praise ;p [15:11] niemeyer, cool [15:12] niemeyer, I kept finding myself wanting to extract a whole new type but it doesn't quite feel like the right time yet [15:12] fwereade: We're sending (dataDir, agentName) aaaaall the way down to EnsureJujuSymlinks, so it can call AgentToolsDir, and then we send that information aaaaaall the way up so we can use it [15:13] fwereade: When all these functions really care about is the directory, not the dataDir, not the agentName [15:14] niemeyer, fair point... so, make EJC just take a dir then? or do I misapprehend? [15:15] fwereade: Yeah, both ensureFs and EJC can take the toolsDir themselves [15:15] fwereade: and we can generate that info at the top level by calling it explicitly from NewUniter [15:15] fwereade: Neither of them, I think, care about dataDir or agentName [15:16] niemeyer, I had considered dropping ensureFs entirely, how would you feel about that? -- just ensure the state dir, and EJC the toolsDir directly? 
[15:16] fwereade: +1 [15:16] niemeyer, cool, on it :) [15:17] fwereade: The other thing I quickly pondered was whether we should just stick filepath.Dir(os.Args[0]) in the PATH rather than passing toolsDir into RunHooks, but I'm not so sure about that [15:18] niemeyer, not so keen really [15:20] fwereade: Cool [15:24] niemeyer: i exported var LineBufferSize to make it easy for the tests to check line splitting. hope that seems ok, before i submit. [15:24] niemeyer: [15:24] // LineBufferSize holds the maximum length of a line read [15:24] // from a hook. Lines longer than this will overflow into [15:24] // subsequent log messages. [15:24] var LineBufferSize = 4096 [15:24] niemeyer: oops, i meant to paste this: https://codereview.appspot.com/6494132 [15:24] actually, it could easily be const [15:26] rog: It doesn't have to be public as well [15:26] niemeyer: ok, i'll add export_test.go [15:26] rog: Ugh.. why? [15:26] niemeyer: because the testing code needs access to the value [15:26] niemeyer: so it can verify the split easily [15:27] rog: Just hardcode the number in the test as well.. this is not nearly as interesting as it sounds :) [15:27] niemeyer: ok; thought you might not like that :-) [15:28] fwereade: i'm seeing sporadic uniter test failures: http://paste.ubuntu.com/1202839/ [15:28] rog, aw poo, I'll take a look [15:30] * fwereade is baffled, offhand [15:30] rog, I'll investigate more closely in a mo [15:30] fwereade: i only get it when i do go test ./... (i.e. it seems like a race of some kind) [15:30] rog, thanks, good to know [15:33] * niemeyer => lunch! [15:43] fwereade: i've just reproduced the behaviour in trunk (i was slightly concerned it was a change i'd made). it happens about 30% of the time for me. [15:44] rog, excellent -- I have not seen that one myself, but I'll see if I can figure it out [15:45] rog, ...but... maybe tomorrow :( cath kinda needs support asap, she's been ill all day and laura's been around too [15:45] fwereade: np [15:45] fwereade: go tend your flock :-) [15:45] rog, I'm hanging on for dear life to repropose one branch, then that's probably it for me [15:46] fwereade: do you mind if i make the PATH change? [15:46] rog, kinda, because that's what I'm proposing ;) [15:46] fwereade: ah, do it, DO IT! [15:46] rog, just forcing myself to witness a full test run ;) [15:47] fwereade: aw, i know that feeling very well [15:47] fwereade: live? [15:48] rog, afraid not, I should probably do that too, but for *that* I definitely don't have time :( [15:48] fwereade: 'sok, i'm sure local is fine [15:53] rog, https://codereview.appspot.com/6496120 and goodnight :) [15:56] fwereade: nnight. hope cath gets better before you have to leave... 
hopefully we can transition entirely next week during the sprint (fingers crossed) [16:55] rog: But we'll need to talk to fwereade anyway [16:56] niemeyer: what do you think about the UnitWatcher as a concept? [16:56] rog: Probably, but we'll need to evolve a bit before that, I suspect at least [16:56] niemeyer: shall i just poll for the time being? [16:56] rog: I personally think it's fine [16:56] rog: We'll need to watch it anyway for other reasons, I believe (e.g. life) [16:56] niemeyer: yeah [16:57] niemeyer: given fwereade's remarks, i wondered about making the unit watcher watch the unit's presence too [16:57] niemeyer: then at least it would signal all status changes [16:58] rog: Initial feeling is that this would cross two unrelated issues together [16:58] rog: I'd be curious to know more about his feelings [16:58] niemeyer: the difficulty is that you watch *almost* all unit status changes. [16:59] niemeyer: but yes, i tend to agree. [17:00] rog: Why did you need it again, for reference? [17:00] niemeyer: for watching the agent version [17:00] niemeyer: so that i can tell when the unit agent upgrades [17:00] rog: Ah, for testing only, ok [17:00] niemeyer: yes. although when we do major version upgrades, we'll probably need it too [17:04] rog: Interestingly, I notice we have a bunch of unrelated watchers on the unit.. ports, resolved, .. [17:05] rog: and a bunch of properties that can't be observed [17:05] rog: All of those are very rarely changed.. I wonder if we should move towards observing the unit itself [17:05] niemeyer: yeah. that has made me a bit uncomfortable. we can watch some properties of the unit because they happen to be stored in a certain way. that might not be true in mstate tho' [17:06] niemeyer: what do you mean by that? [17:06] rog: It's even the opposite, which kind of sucks [17:06] rog: We stored them in a certain way because we wanted to observe them [17:07] rog: let's say we then decide to observe an existing property with state (pre-mstate).. we'd be forced to observe the unit node itself [17:07] niemeyer: i'm not that familiar with mstate. what do you mean by the unit "node" ? [17:07] oh, sorry [17:07] pre-mstate [17:08] rog: I mean the document (in mstate) or node (in zk) that contains most unit information [17:08] rog: We've split certain bits out, originally, because we felt it'd be more efficient to observe it that way [17:08] rog: It now feels like this is a poor design [17:08] niemeyer: most but not all (thinking of the presence state, for example) [17:09] rog: Because we're attempting to anticipate every use case [17:09] rog: That's a very particular one [17:09] rog: That will always be special [17:09] niemeyer: true [17:09] rog: Because we'll have significant volume of small bits of information going on and off regularly [17:11] niemeyer: so you think the unit watcher (in mstate) should observe the entire unit document? [17:11] niemeyer: (i don't even know if that makes sense actually!) [17:11] rog: Right, we have already unified all settings in a single document even [17:12] rog: Because the cost of loading the extra settings is nothing compared to the cost of everything else [17:12] rog: (make query, transfer it, lookup index, lookup data, transfer back, unmarshal in, unmarshal out, blah blah) [17:12] niemeyer: so in fact the unit watcher in mstate *would* automatically see all unit changes, e.g. ports etc? 
[17:13] rog: Exactly, and so will every other watcher that watches the unit, even if we do have separate watchers [17:13] rog: We can even filter out the specific settings, if we do want that, but again it's pretty irrelevant [17:13] rog: The cost of the operation is not there [17:14] rog: In summary, my point is that I think we could do pretty well with a single unit watcher, instead of tons of per-setting watchers for the unit [17:14] niemeyer: i think it's still worth having typed channels rather than always passing around an object that you must invoke a method on to get the value [17:15] niemeyer: because channels are generic. i had to jump through hoops to watch both the machine agent tools and the unit agent tools with the same code. [17:15] rog: I don't know.. it doesn't feel worth it.. I believe these watchers will each have a single use case in the whole code [17:15] "generic" is maybe a bad way of saying it. [17:16] niemeyer: it also means that every client needs to check if the setting it's after has actually changed [17:16] niemeyer: i think part of the problem is that our watchers are so darn heavyweight [17:16] niemeyer: if we did it right, a new watcher should only be 6 or 7 lines of code, i think. [17:17] rog: Indeed, but instead of checking it, we're doing a lot of heavy lifting to do exactly the same in a generic way [17:17] rog: Even then, it doesn't feel great [17:18] rog: We need a new method on the type, a new type to put on the channel, a new function to handle it, etc etc, for *each* detail we want to watch [17:19] niemeyer: part of the problem for me is that interfaces don't work when you've got chan T, but only when you've got T [17:20] rog: That is unfortunate indeed.. [17:21] rog: at the same time, it's not reaaally an issue in this case.. we have type-specific channels in both cases [17:26] niemeyer: if i've got a chan *state.Unit, it's type specific, but it doesn't really represent what i'm watching. [17:26] rog: Well, if you're watching the unit it does represent what you're watching [17:27] niemeyer: this is what i had to do in livetests: http://paste.ubuntu.com/1203085/ [17:27] niemeyer: so i could treat units and machines in the same way [17:28] niemeyer: for the purposes of waiting for them to upgrade [17:30] rog: That seems pretty involved [17:30] rog: Isn't it just a matter of having a function that looks like [17:32] rog: func machineToolsHolder(ch <-chan *Machine) <-chan ToolsHolder [17:32] rog: ? [17:32] rog: To overcome exactly the issue you described above? [17:32] rog: The logic itself is exactly the same for both [17:33] niemeyer: that's essentially what i pasted [17:33] rog: Heh [17:33] rog: Except completely unrelated, right? :) [17:33] rog: I'm saying converting a *channel type* onto *another channel type* [17:34] rog: The *logic* is the same [17:34] rog: You're converting a *watcher* onto another *watcher* [17:34] niemeyer: well, it stores the machine too, so that the generic code can inspect the error if it wants [17:35] niemeyer: i'm not convinced it would be less code overall [17:36] niemeyer: ah, i see what you mean actually. [17:36] rog: It removes the duplication that you have there, with two watchers that look exactly the same.. 
[17:37] niemeyer: where MachineToolsHolder is something like: [17:37] type MachineToolHolder interface { [17:37] AgentTools() (*state.Tools, error) [17:37] } [17:37] Yep [17:37] i mean ToolsHolder of course [17:38] niemeyer: good enough for testing anyway [17:40] niemeyer: i still think it's a pity that we require that every stage in a channel pipeline have its own tomb. [17:40] niemeyer: but i lost that one ages ago. [17:45] rog: I don't think we require that [17:45] niemeyer: i think we do if we want to maintain the invariant that calling Stop means no more values will be produced on a channel. [17:46] rog: In fact, the mstate watcher mechanism explicitly does not have a per-channel tomb [17:46] rog: What we require in the design, since the beginning, is that we deterministically stop background activity [17:46] niemeyer: are you talking about the MachineWatcher? i should have a look then [17:46] rog: That sounds sane [17:47] rog: No, I'm talking about the watcher foundation and the way that MachineWatcher uses it [17:47] rog: We do not have per-channel tombs there [17:47] niemeyer: so the MachineWatcher doesn't have a tomb? cool! [17:47] rog: Heh [17:48] rog: The MachineWatcher has a tomb, because it has a goroutine that we want to stop deterministically [17:48] niemeyer: i think we'll usually find that every stage in a pipeline is like that [17:48] rog: No, I'm talking about the watcher foundation and the way that MachineWatcher uses it [17:48] rog: We do not have per-channel tombs there [17:48] niemeyer: a machine watcher is one stage down from the watcher foundation. [17:48] niemeyer: i don't see the watcher foundation as a pipeline. [17:49] rog: Okay, sorry [17:49] niemeyer: we *can* deterministically stop goroutines without a tomb per-stage if we're prepared to wait on the channel for the goroutine to acknowledge that it's been stopped (by closing the channel) [17:50] niemeyer: i know that was rejected ages ago, but i still think it's a nice way to go about things. [17:50] rog: We can't close a channel from the consumer side [17:50] niemeyer: it means the stop message can percolate deterministically down the pipeline [17:51] niemeyer: each stage in the pipeline can delegate the stop handling to the stage upstream [17:52] niemeyer: when that sees a stop, it closes the upstream channel, which then flows stage-by-stage back to the most downstream channel [17:52] rog: Doesn't work.. it means we have background activity because closing the channel leaves things running [17:52] rog: Deterministically stopping means Stop() returned, nothing else is happening [17:53] rog: It's one of the things we've fixed from Python, and it makes me very happy to be honest [17:53] niemeyer: a stop would be a two-stage process - we close the stop channel, then wait to see the downstream channel closed. [17:53] niemeyer: there's nothing left running [17:53] rog: Heh.. I'm sure you can create many other mechanisms to do the same [17:54] rog: We have one, that works, and works well [17:54] niemeyer: it means it's really heavyweight to do things that should be throwaway pieces of code, IMHO [17:55] niemeyer: the only thing that *needs* to do a select is the head of the stream. [17:55] rog: Sorry, but this isn't helpful at all.. you're building a new design as I mention the requirements, and saying that what we do is heavyweight despite the fact it *works*.
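(To make the adapter idea concrete: Go channels are invariant, so a chan *Machine is not a chan ToolsHolder even though *Machine implements the interface, and a small copying goroutine is the standard workaround. The stand-in types below are assumptions for illustration; only the AgentTools method comes from the interface sketched above.)

    package live

    // Stand-ins for the real state types, just enough for the sketch.
    type Tools struct{ URL string }

    type Machine struct{ tools *Tools }

    func (m *Machine) AgentTools() (*Tools, error) { return m.tools, nil }

    // ToolsHolder is the interface from the chat: anything that can
    // report its agent tools, be it a unit or a machine.
    type ToolsHolder interface {
        AgentTools() (*Tools, error)
    }

    // machineToolsHolder adapts a channel of machines into a channel of
    // ToolsHolder; a unitToolsHolder twin would be line-for-line the
    // same, which is exactly the duplication being debated.
    func machineToolsHolder(ch <-chan *Machine) <-chan ToolsHolder {
        out := make(chan ToolsHolder)
        go func() {
            defer close(out)
            for m := range ch {
                out <- m
            }
        }()
        return out
    }

The generic wait-for-upgrade loop can then range over a <-chan ToolsHolder without knowing which entity it is watching.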
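(And, for contrast, a sketch of the close-propagation scheme being argued against: only the head stage selects on a stop channel; every downstream stage just drains its input and closes its output, so a close percolates stage by stage to the tail. This is the rejected alternative, shown only to clarify the disagreement, not how juju's watchers work.)

    package live

    // head is the only stage that watches the stop channel; closing
    // stop makes it close its output.
    func head(stop <-chan struct{}) <-chan int {
        out := make(chan int)
        go func() {
            defer close(out)
            for i := 0; ; i++ {
                select {
                case out <- i:
                case <-stop:
                    return
                }
            }
        }()
        return out
    }

    // double is a throwaway stage: no tomb, no select; it drains its
    // input and closes its output when the input closes.
    func double(in <-chan int) <-chan int {
        out := make(chan int)
        go func() {
            defer close(out)
            for v := range in {
                out <- v * 2
            }
        }()
        return out
    }

    // stopPipeline is the two-stage stop described above: close the
    // stop channel, then drain the tail until it closes, at which
    // point no stage is producing anything.
    func stopPipeline(stop chan<- struct{}, tail <-chan int) {
        close(stop)
        for range tail {
        }
    }

The objection stands regardless: the tomb-per-stage convention gives every watcher the same Stop-means-stopped contract, and it is the one mechanism the rest of the codebase already relies on.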
[17:55] niemeyer: anyway, sorry, derail [17:55] niemeyer: yes it does [17:55] rog: If you want to reimplement juju again, I won't be around [17:55] :-) [18:06] niemeyer: i'm off for the evening. see ya tomorrow! [18:06] rog: Have a good time [18:06] niemeyer: when do you set off? [18:07] rog: Sometime Saturday only, I'll be around tomorrow still [18:07] niemeyer: cool [18:08] niemeyer: if you manage to review fwereade's toolsDir branch tonight, that would be awesome. i reckon we can get it running actual commands tomorrow possibly. [18:09] rog: I'm sure it must be ready to go in.. it was really a trivial detail, and it was great [18:14] jimbaker: ping [18:14] niemeyer, hi [18:14] jimbaker: Heya [18:14] jimbaker: I'm wondering about format 2 stuff [18:14] jimbaker: Were you the one pushing it? [18:14] niemeyer, yes [18:14] it's in trunk now [18:15] in terms of the raw string changes [18:15] jimbaker: Okay.. I've noticed there's a "-Not set-" value being used by "config get" [18:15] jimbaker: Have you killed that? [18:15] niemeyer, i didn't notice that in the code or usage... i'll see if it's still there [18:15] jimbaker: Ouch [18:16] jimbaker: config_get.py [18:18] jimbaker: Sorry, connection broken [18:19] niemeyer, yes that's unfortunate... i didn't touch juju get in the merged branch, but we should definitely address it so it is also raw [18:20] jimbaker: It should be nil or something.. "-Not set-" is a perfectly valid *set* value [18:20] niemeyer_, agreed [18:23] i understand that the attempt here is to provide some human readable info on how the settings have been made (such as the default flag). i do think it would be better if it were to simply roundtrip with juju set, but that's too late i suppose [18:24] jimbaker: Can you please fix it so it's "None" instead? [18:24] niemeyer_, np. so should we also have the new behavior respect format 2 for that service? === niemeyer_ is now known as niemeyer [18:25] jimbaker: No, this is client side [18:25] jimbaker: format 2 is about the charm [18:25] niemeyer, i understand [18:25] jimbaker: This is a bug in the juju get command, that I suggest actually fixing [18:25] jimbaker: I'll suggest the same in the Go side [18:26] jimbaker: yaml has a perfectly valid idea of "nil" [18:26] jimbaker: Which is what this means [18:26] ok, i'll propose this fix [18:26] jimbaker: Cheers === steev_ is now known as steev [19:10] niemeyer, heyhey, sorry I missed the conversation above [19:10] niemeyer, I did repropose the toolsDir PATH branch, maybe you missed it [19:11] niemeyer, https://codereview.appspot.com/6496120 [19:11] niemeyer, juju get is now fixed in trunk r577, and the change will be part of the 0.6 release [19:12] fwereade: Ah, sorry, I was postponing it because I didn't think you'd do it right now, but if you're active I'll have a look right away.. I'm sure it's good [19:12] jimbaker: Woohay, cheers [19:12] jimbaker, cool [19:13] (I wondered about that in the golang implementation :)) [19:13] fwereade, no need to port that bug :) [19:14] * fwereade suddenly has first-job flashbacks... bug-compatible reimplementation of directx... fun :) [19:14] we called it indirectx, of course [19:21] LOL [19:22] fwereade: Done, thanks! [19:27] niemeyer, sweet, tyvm [19:32] niemeyer, btw, I'm not 100% sure whether this is just me seeing patterns where none exist...
but ISTM that the 50ms spinning while waiting for the uniter makes things a lot flakier than the 200ms spinning when I'm running full test suites, and I have a theory that this is *because* I'm spending more time spinning rather than sitting back and allowing the uniter to do its thing [19:33] fwereade: Hmm.. this sounds a bit like we don't know what's actually going on [19:33] niemeyer, how would you feel about dialing them back a little, at least to, say, 100ms... (until I decide that I'm seeing just as many just-missed timeouts, and change it back) [19:33] niemeyer, ha, yes, that is a good way of putting it [19:34] fwereade: I'd rather dial them to 25ms so we can actually find out what's going on :) [19:34] niemeyer, haha :) [19:34] fwereade: The delay is preceding a pull.. if returning earlier makes it break anyhow, something is necessarily not ok [19:35] fwereade: In some cases this reflects background activity that is missing a Stop, for example [19:37] niemeyer, hmmm, my theory was no more sophisticated than "the more I do on the test goroutine, the less time is spent letting real things actually happen" [19:37] niemeyer, (I'm definitely not contradicting your suggestion) [19:38] fwereade: I find that unlikely.. 50ms is an eternity in computing time [19:38] fwereade: I find it more likely that we're simply lucky enough to have spotted a way to observe a problem [19:39] niemeyer, you are probably right... many uniter things take longer than I would naively expect them to [19:40] fwereade: Can I help you to debug the issue anyhow? [19:41] niemeyer, not obviously, except perhaps to guess at why this chunk of code might take 200ms: [19:42] if err = u.deployer.Deploy(u.charm); err != nil { [19:42] return err [19:42] } [19:42] if err = u.sf.Write(reason, Done, hi, url); err != nil { [19:42] return err [19:42] } [19:42] } [19:42] log.Printf("charm %q is deployed", url) [19:42] niemeyer, from the final deferred log in Deploy to the log at the bottom [19:42] niemeyer, but, yeah, that's not a helpful thing to ask [19:43] fwereade: I'll have a look at the code [19:43] niemeyer, specifically in the context of the jujud test [19:43] niemeyer, but I'll be staring at verbose runs of that on its own and seeing if I can spot anything, too [19:44] niemeyer, hmm, yeah, that's more like 10ms when run on its own [19:44] fwereade: Ah, ok [19:45] fwereade: That sounds about right for a disk seek [19:46] niemeyer, hmm, just got a couple of ~100ms ones in a row [19:47] fwereade: Ok, that may be the machine working.. keep in mind that we're fsyncing on the writes [19:47] niemeyer, ha! yes, I totally forgot that [19:48] niemeyer, ok, I feel a little less insane now, and in that case actually more comfortable just bumping timeouts across the board [19:48] fwereade: Maybe I misunderstand the original issue then [19:48] fwereade: I thought the 50ms was inside a loop [19:48] niemeyer, yeah, just when polling for things to have happened [19:48] fwereade: That would wait further for a few seconds until the desired state actually arrived [19:49] fwereade: It'd certainly be too little to be deterministically waiting *just* that, but it's not too little if it's inside a loop [19:49] fwereade: Do I misunderstand what you're actually covering? [19:51] niemeyer, the issue is really just that, when things fail, I see a lot of "waiting..."
spam from inside the tests; specifically, much more than I intuitively expected [19:52] niemeyer, being reminded that we're fsyncing every write makes that behaviour seem a lot less weird [19:52] niemeyer, ie I now have a mechanism for the uniter taking its own sweet time over things [19:53] niemeyer, rather than the half-baked "overeager tests + gc pauses + handwaving" theory [19:53] fwereade: LOL [19:53] fwereade: This is cool :) [19:53] niemeyer, and I'm now comfortable just bumping the top-level timeouts a little and relaxing [19:55] fwereade: We can certainly disable the fsyncing for tests later [19:55] fwereade: But we should make sure things actually work first :) [19:55] niemeyer, v good point [19:56] niemeyer, quite so :) === 45PAA1GWO is now known as hazmat [20:57] * niemeyer steps out for a while
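(As a closing footnote on the timeout discussion: the pattern at issue is roughly the polling helper below, and the fix being settled on is to raise the overall timeout rather than shrink the interval, since fsynced writes legitimately make individual uniter steps take tens of milliseconds. The names are illustrative, not jujud's actual test helpers.)

    package live

    import (
        "fmt"
        "time"
    )

    // waitFor polls cond at the given interval until it holds or the
    // timeout elapses; each idle round prints the "waiting..." spam
    // mentioned above.
    func waitFor(cond func() bool, interval, timeout time.Duration) error {
        deadline := time.Now().Add(timeout)
        for !cond() {
            if time.Now().After(deadline) {
                return fmt.Errorf("condition not met after %v", timeout)
            }
            fmt.Println("waiting...")
            time.Sleep(interval)
        }
        return nil
    }

With a 50ms interval and fsync-bound uniter steps of ~100ms each, several "waiting..." rounds per check are expected even when nothing is wrong, which matches the behaviour described above.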