[01:22] le sigh
[01:23] charm deployments all broke overnight
[01:23] was working yesterday
[01:23] now nothing deploys
[05:44] davecheney, ouch :(
[05:46] davecheney, what was it?
[06:09] no idea
[06:13] fwereade: i thought it was just the charms I was testing
[06:13] or some ec2 weirdness
[06:13] but I went back and testing charms i knew worked
[06:13] and they all jam with the same error
[06:13] or
[06:13] waiting for the error state to clear
[06:48] morning
[06:51] TheMue, heyhey
[06:51] davecheney, huh, what's the error?
[06:51] fwereade: heya
[07:07] morning all
[07:07] wrtp: hi
[07:07] davecheney: i hope that's not my fault
[07:07] TheMue: hiya
[07:08] davecheney: do you see the uniter version in the status?
[08:14] wrtp, fwereade: here's a trivial one https://codereview.appspot.com/6649044/
[08:15] TheMue: can't we get rid of FwDefault completely?
[08:15] TheMue, yeah, I was just about to ask about that :)
[08:25] as far as i understood niemeyer he wants to keep it
[08:26] TheMue: i'd prefer to have a method on Environ that returns the environment's default firewall mode.
[08:26] TheMue: as it is, FwDefault doesn't mean anything - we don't know what semantics to expect.
[08:26] wrtp: please add it as comment so that i can discuss it with niemeyer
[08:27] TheMue: ok, will do. i'm afraid that means that this branch can't be taken to be trivial and submitted immediately.
[08:27] wrtp: see also point 2 in his mail
[08:27] wrtp: it is only the first one in a row of 6
[08:28] TheMue: reading his email, it seems like he wants to get rid of FwDefault too
[08:29] TheMue: "
[08:29] We can't ever use FwDefault outside of the
[08:29] provider
[08:29] "
[08:32] wrtp: today we have no EnvironProvider.Validate. imho that is intended to to validate in the mode and in case of FwDefault return the providers default mode, instance or global
[08:32] wrtp: that's my interpretation of point 2
[08:33] TheMue: ah, i think i understand now.
[08:33] thinking
[08:34] wrtp: so different environments can return different defaults
[08:35] TheMue: i'm not sure i understand why we need an explicit default value. can't ValidateConfig see that the firewall mode is missing from the config, and substitute its default mode?
[08:36] TheMue: that is: we already have a good way of specifying default values - we leave the value unspecified
[08:37] yep
[08:45] whoops, doctor's appointment, gtg
[08:45] see you shortly
[09:22] moin.
[09:37] Aram: moin moin
[09:37] * TheMue has to step out to play taxi driver for his daughter. bbiab
[12:00] fwereade: it looks to me as if the cloud-init script runs as root. can you think of a reason to retain the sudos in environs/cloudinit ?
[12:00] wrtp, not offhand, no
[12:00] fwereade: i think they're just there as a legacy of the python
[12:00] fwereade: i'll remove 'em and see if anything breaks
[12:01] wrtp, that's probably the easiest way
[12:01] fwereade: BTW i think we should probably not be running our agents as root
[12:01] wrtp, except, wait, is everything broken since last night?
[12:01] fwereade: it seems to work for me. but i haven't tried deploying a charm
[12:01] wrtp, that would be the ideal but I'm not sure I see a good way round it
[12:02] fwereade: oh, because we need to be root to start LXC containers?
[12:02] * fwereade wonders if he broke config somehow, really ought to actually take a look at it, it's just the contextswitching...
[12:02] wrtp, I thought we did, yeah
[12:02] fwereade: hmm, that makes sense :-(
[12:03] fwereade: i'm just trying to deploy a charm now. will paste you the log output if it fails
[12:03] wrtp, debug output would probably be helpful if that can be arranged
[12:04] fwereade: i think debug output is always enabled currently
[12:06] fwereade: it certainly *seems* to be deploying ok
[12:08] fwereade: hmm, it seems to have executed the install and start hooks, but the status is still "pending"
[12:08] fwereade: http://paste.ubuntu.com/1273059/
[12:09] fwereade: current status: http://paste.ubuntu.com/1273062/
[12:10] * fwereade looks
[12:11] wrtp, huh, it seemed on dave's that he was always failing start hooks, that looks like the hook is fine
[12:11] fwereade: yeah, but shouldn't the status show something other than "pending"?
[12:12] wrtp, yeah, so somehow it seems ModeAbide is wedged, very early on
[12:13] wrtp, which is strange... I suspect it's waiting for a config event which should be guaranteed
[12:13] fwereade: is this unexpected? "2012/10/11 12:07:34 JUJU cannot read hook output: read |0: bad file descriptor"
[12:13] wrtp, isn't that your hookLogger? ;p
[12:13] fwereade: probs
[12:13] wrtp, I haven't seen it but AIUI that sort of thing is not entirely unepected is it?
[12:14] fwereade: that's maybe what happens when the fd isn't closed properly, i can't remember.
[12:14] wrtp, if it had debug output I would be able to tell what the filter was doing, I think
[12:14] fwereade: ah, is debug output not enabled by the global debug flag?
[12:14] wrtp, (even though I would also have to wade through the state debug stuff as well... sigh :))
[12:15] wrtp, hmm, not sure, haven't looked at our cloudinits for a while, didn't actually know it was doing the debug flag
[12:15] fwereade: ah, container doesn't enable the debug flag, grr
[12:16] niemeyer: yo!
[12:22] Morning!
[12:26] fwereade, niemeyer: trivial CL: https://codereview.appspot.com/6654043
[12:29] wrtp: LGTM
[12:29] niemeyer: thanks
[12:34] niemeyer, heyhey
[12:37] fwereade: Heya
[12:45] niemeyer: another small one: https://codereview.appspot.com/6655043/
[12:47] wrtp: LGTM
[12:47] niemeyer: thanks
[12:47] niemeyer: (the quick reviews are really appreciated!)
[12:48] wrtp: My pleasure
[12:49] fwereade: When you have a spare moment to continue yesterday's brainstorm, please ping
[12:49] niemeyer, should be good in just a sec
[13:03] niemeyer, ping
[13:03] fwereade: Yoyo
[13:04] fwereade: Can we have a quick call? I think we can save time
[13:04] niemeyer, sgtm
[13:05] niemeyer, https://plus.google.com/hangouts/_/044d83e4e11954e90f12cb6c2a82bf2e8ed1c724?authuser=0&hl=en#
[13:05] fwereade: Thanks.. funny that you can paste the URL and my phone is ringing before I manage to open the G+ page
[13:06] * fwereade is quick like ninja, on very rare occasions :)
[13:37] niemeyer: morning
[13:42] fss: Heya
[13:48] niemeyer: heya, did you read rogers comment about point 3 of the firewall mode changes? he prefers to put the global open/close/ports at Environ instead of EnvironProvider. i like that idea, because it feel more natural work on the concrete environment. what do you think?
[13:49] TheMue: He's totally right.. it was my mistake
[13:49] niemeyer: fine, will do so
[14:30] niemeyer: the final piece, i believe: https://codereview.appspot.com/6655044
[14:38] wrtp: Looking
[14:46] wrtp: done
[14:46] niemeyer: thanks
[14:46] wrtp: np
[14:47] niemeyer: the _ for the state was deliberate BTW. it means i can return nil, err and still see if the state was set.
[14:47] niemeyer: i'd prefer not to use bare returns
[14:48] wrtp: Let's please name it properly as usual
[14:48] niemeyer: ok, i'll rename the variable further down i guess
[14:48] wrtp: I don't understand why you have to
[14:48] niemeyer: if i don't do that, then my defer won't work
[14:49] niemeyer: because it won't be able to see that st!=nil
[14:49] wrtp: You don't need the defer either
[14:49] wrtp: See the rest of the review
[14:49] niemeyer: ah, ok
[14:50] niemeyer: i'm not sure about putting the password in the agent dir itself. do we guarantee that the agent dir is mode 700 ?
[14:51] niemeyer: mind you, i suppose not much information can be gleaned from the password length as it's always the same.
[14:52] wrtp: No, just the file
[15:21] niemeyer: PTAL https://codereview.appspot.com/6655044
[15:21] niemeyer, lbox appears to be panicking when I try to authenticate -- 3/4 times so far (the 4th time it complained about an incorrect password, which I'm ... pretty sure I typed right, but ofc cannot verify)
[15:22] niemeyer, is this reminiscent of anything known to you?
[15:23] niemeyer, this is the (interesting bit of, i think) the panic: http://paste.ubuntu.com/1273426/
[15:24] fwereade: No, there was an issue with auth when Go changed in an incompatible way between releases, but it shouldn't be happening now
[15:24] niemeyer, google seems to agree that my password is what I think it is... I think it is moderately unlikely that I would have typed it wrong on all 4 occasions
[15:24] fwereade: Try to drop your auth details (rm ~/.lbox*)
[15:25] niemeyer, don't seem to have any
[15:25] fwereade: Sorry, that should be .lpad
[15:27] niemeyer, is that connected to the google auth?
[15:27] fwereade: Sorry, I'm clearly on crack... that's ~/.goetveld*
[15:28] ha, now I'm getting a blank auth page on launchpad :/
[15:30] niemeyer, bah, same ol' panic
[15:34] * fwereade resorts to the universal panacea
[15:38] fwereade: Let me try to kill my auth and see if I can repro
[15:39] niemeyer, yeah, if I do a deliberate wrong password it fails nicely
[15:39] niemeyer, correct password has panicked every time
[15:39] wrtp: LGTM
[15:39] niemeyer, which is *insane* because I proposed earlier today
[15:39] niemeyer: thanks!
[15:40] fwereade: Did you upgrade in between?
[15:40] niemeyer, hum, I think I probably did
[15:40] niemeyer, not sure what from
[15:40] fwereade: I hope it's not an interim breakage again
[15:41] niemeyer: and with that, i think our current authentication stuff is done. woo.
[15:41] niemeyer: i'm gonna push a few trivial changes that i've been putting off for a while, but if you have something urgent that i should move on to, let me know.
[15:42] wrtp: Cool, the main thing is really to keep pushing on that front, testing etc.. but trivial changes as a break sounds good too
[15:49] fwereade: I'm facing EOF issues with LP before even getting there
[15:49] fwereade: The cloud gods are mad
[15:49] niemeyer: it seems to be working ok for me
[15:50] fwereade: Worked fine, but I think I was using a locally built lbox.. let me make sure I'm getting the one from the PPA
[15:52] * fwereade wonders what he did :/
[15:54] fwereade: Boom
[15:54] niemeyer, ha:
[15:54] 2012/10/11 17:54:23 RIETVELD Login on https://codereview.appspot.com successful.
[15:54] 2012/10/11 17:54:23 RIETVELD Login failed: Get http://example.com/marker: redirect blocked
[15:55] fwereade: Launchpad's lbox must be compiled with the broken tip
[15:55] fwereade: I'll fix that after lunch
[15:55] fwereade: Meanwhile, go get launchpad.net/lbox will get you going
[15:55] niemeyer, lovely, thanks
[15:58] Hmm
[15:58] It's compiling against stable, actually, which seems to indicate that it was released broken? Uh oh
[16:00] * niemeyer => lunch
[16:00] niemeyer, yeah, the freshly built version seems to fall over the same way
[16:00] niemeyer, I will also be back later
[16:00] niemeyer, (ty for the advice re config-changed, infinitely cleaner now)
[16:29] fwereade: it catched me too, lbox panicked *sigh*
[16:34] This really sucks
[16:35] I anticipated the bug, reported, made sure it was fixed, and even then it got released without the fix.. :(
[16:42] TheMue: go get launchpad.net/lbox
[16:43] I'll remove the Launchpad package
[16:43] Or perhaps just make it build against tip
[16:43] * niemeyer checks it
[16:49] niemeyer: got it, same error here. funnily the package one already worked twice today for me
[16:49] TheMue: Ah, you probably have go 1.0.3 installed
[16:49] TheMue: So hold off a bit until the package bulids
[16:50] TheMue: The broken one will work fine until you have to auth again
[16:50] niemeyer: yes, 1.0.3. so i'll wait
[16:53] niemeyer, fwereade, TheMue, Aram: this CL makes log messages consistent, as talked about on juju-dev. it's a large CL, but pretty trivial. https://codereview.appspot.com/6654044/
=== wrtp is now known as rogpeppe
[16:54] rogpeppe: *wow*
[17:00] rogpeppe: It's huge indeed, and in a quick sampling it doesn't feel so great
[17:00] rogpeppe: Comments sent
[17:00] niemeyer: thank you
[17:00] rogpeppe: It'd be good to have a more careful evaluation
[17:01] niemeyer: i thought the HOOK output was perhaps not so good
[17:01] niemeyer: i went with a literal interpretation of the rules to start with
[17:01] rogpeppe: » » fmt.Fprintf(os.Stderr, "%s: %v\n", filepath.Base(os.Args[0]), err
[17:01] rogpeppe: What did the literal rules say about fmt.Fprintf?
[17:02] niemeyer: i interpreted that as a kind of log output. perhaps that's a stretch.
[17:02] niemeyer: even if that doesn't get done in this CL, i think it's worth doing.
[17:02] rogpeppe: Yeah, *log* and *output* are not the same thing
[17:02] niemeyer: saying "error:" for every command is not great.
[17:03] niemeyer: the unix standard is to print the name of the command
[17:03] rogpeppe: Good old let's-derail-the-conversation-until-we-disagree
[17:03] niemeyer: ok, sorry, i thought it was pretty uncontroversial.
[17:04] niemeyer: i'll rewind those changes
[17:04] rogpeppe: It's *very* uncotroversial to bikeshed widely about log messages, send a message to the mailing list so we agree, explicitly state that it's about log.Printf/Debugf for *clarity*, then send a 100 files change that does something else entirely
[17:05] niemeyer: i'm sorry
[17:06] niemeyer: as a separate CL, might you agree in principle that changing the "error:" messages is a good idea?
[17:06] rogpeppe: No, I want to keep moving forward and stop the bikeshed immediately
[17:07] niemeyer: when i see shell script output, it's useful to know what commands have printed the messages.
[17:07] niemeyer: i've run across this a few times so far.
[17:07] .
[17:09] * rogpeppe thinks this is more than just a bikeshed colour. It adds to juju usability.
[17:09] but i'll stop there.
[17:10] Thanks, let's make it work
[17:11] perhaps i should abandon all the log printf changes too, if it's just bikeshedding.
[17:13] rogpeppe: Messages are inconsistent, we discussed this on IRC, and in the mailing list, and agreed on something. If you wanna drop it, someone else can pick it up later, no worries.
[17:14] rogpeppe: That's unrelated to continue fiddling with output messages, though.
[17:14] niemeyer: do you agree with dropping the cmd/ prefix for commands BTW?
[17:14] rogpeppe: Is it ok to have cmd/juju and juju prefixed by the same thing?
[17:15] niemeyer: hmm, good question.
[17:15] niemeyer: probably not. that's a good call.
[17:23] TheMue: lbox rebuilt.. wanna give it a try?
[17:23] niemeyer: yes
[17:25] niemeyer: yep, it works, thanks a lot
[17:25] TheMue: Phew, np
[17:26] fwereade: ^
[17:27] * TheMue is stepping out, dinner time
[17:27] TheMue: Enjoy
[17:27] niemeyer: first three CLs for the global mode are in
[17:46] niemeyer, having supper now, but https://codereview.appspot.com/6632062 reproposed
[17:47] fwereade: Sweet
[17:47] fwereade: Thanks much
[17:48] niemeyer: this should be better i hope: https://codereview.appspot.com/6654044
[17:48] i'm off now, night all
[17:49] rogpeppe: Thanks, have a good one
[17:50] fwereade: That looks very nice indeed
[17:56] niemeyer: and you
[18:16] niemeyer, fwiw we've had a few people today run into a nil memory ref in lbox.. https://pastebin.canonical.com/76339/
[18:16] hazmat: Yeah, it's fixed already
[18:16] hazmat: unfortunately 1.0.3 got released with a bug
[18:17] niemeyer, awesome re fix, thanks
[18:17] hazmat: Go 1.0.3 that is.. we've fixed the issue before it was out, but the release manager missed the fix
[18:19] niemeyer, ah.. i remember you mentioning that a while back re bug in go http client.
[18:20] cool
[18:20] hazmat: Exactly.. we've rushed to fix that shortly after the bug was introduced, back in july.. sadly the bug was merged onto the release and the fix wasn't
[18:34] niemeyer, cool, thanks
[18:44] fwereade: ping
[19:19] * niemeyer => doc.. back soonish
[21:23] % juju bootstrap --upload-tools
[21:23] error: cannot upload tools: cannot write file "tools/juju-0.0.1-precise-amd64.tgz" to control bucket: We encountered an internal error. Please try again.
[21:23] thanks, amazon
[21:39] niemeyer, pong
[21:49] davecheney, heyhey -- I never got to investigating your problems from yesterday, which I felt I kinda should have, but... well, er, I didn't
[21:49] davecheney, can I be of any assistance now though?
[21:51] fwereade: hey
[21:51] i'm trying to have a look now as part of adding juju remove-unit
[21:52] but I can't get an environment to bootstrap in any zone
[21:52] davecheney, ha, ouch :(
[21:52] us-east-1 is screwed
[21:52] ap-southeast-1 is broken
[21:52] trying us-west-1
[21:53] if I can get an env going
[21:53] who sort of debugging will be useful ?
[21:53] % juju bootstrap -e us-west-1 --upload-tools
[21:53] lucky(~/src/launchpad.net/juju-core/cmd/juju) % juju deploy -e us-west-1 -n 5 mysql
[21:53] error: cannot put charm: cannot make S3 control bucket: Your previous request to create the named bucket succeeded and you already own it.
[21:53] that is fucking great
[21:53] every non us-east-1 env now won't bootstrap
[21:54] davecheney, oof
[21:55] davecheney, well, I'm not sure that it would be, necessarily, with the logs you sent -- but rog was having a problem that seemed potentially kinda similar, that might have been helped by it
[21:55] davecheney, I just mean running the agents with --debug
[21:55] yeah, i saw he made that change last night
[21:55] i've merge from trunk
[21:55] so when I do get something deployed, it will be in --debug
[22:00] davecheney, I appear to have something bootstrapping in eu-west-1
[22:00] fwereade: seet
[22:00] sweet
[22:00] while I have you
[22:00] i've implemented conn.RemoveUnits(units ...)
[22:00] davecheney, I was planning to try to deploy mongodb and see what happened :0
[22:01] as setting unit.EnsureDying()
[22:01] and leaving it as that
[22:01] am i correct that the UA will detect that, do whatever, and set the unit to Dead ?
[22:01] davecheney, yeah, that should be enough
[22:01] davecheney, it would also be good to implement --force
[22:02] davecheney, which would call EnsureDead
[22:02] that was my initial attempt
[22:02] it didn't work
[22:02] davecheney, which will be handy right now for removing horribly borked deployments
[22:02] because it stepped around the UA
[22:03] davecheney, blocked by something? or nothing responded?
[22:03] davecheney, it shouldn't be the default but it should be possible, I think
[22:03] unit went away
[22:03] but the machine didnt
[22:03] fwereade: Yo
[22:04] davecheney: Morning!
[22:04] davecheney, machines shouldn't go away until we terminate-machine
[22:04] davecheney: Isn't that super early? :)
[22:04] niemeyer, heyhey
[22:04] fwereade: I was going to ask you about the service config watcher
[22:04] niemeyer, oh yes
[22:04] niemeyer: i was going to go for a ride, but it is pissing down
[22:04] fwereade: I'm wondering about what logic to implement for the multi-config world
[22:04] so, best to strike while the iron is hot
[22:05] fwereade: Pondering if it would be fine to do per-charm-url watching
[22:05] niemeyer, I think it would
[22:05] aaaaaaaaaaaargh!
[22:05] % juju status
[22:05] error: We encountered an internal error. Please try again.
[22:05] us-east-1 is screwed
[22:05] fwereade: and assume that the filter would be kind enough to re-watch so the modes/etc don't have to care
[22:05] fwereade: What do you think?
[22:05] niemeyer, the filter is already in a position to know what the unit's current charm is, so it should be near-enough trivial
[22:06] fwereade: Awesome! That makes things pretty easy
[22:06] fwereade: Now that we have explicit settings management, the next step is almost trivial
[22:07] niemeyer, the semantic change is that you say something like u.f.NotifyCharm(url, mustForceUpgrade), and assume that all config and upgrade events are filtered relative to that state
[22:07] awesome
[22:07] now i can't deploy at all
[22:07] 2012/10/11 22:07:16 JUJU state: opening state; mongo addresses: ["localhost:37017"]
[22:07] 2012/10/11 22:07:16 JUJU machiner: unauthorized access
[22:07] fwereade: Neat
[22:07] niemeyer, I just need to be slightly careful about not resetting the config watch when the charm didn't change so I don;t poo extra events into the stream
[22:07] davecheney: Uh
[22:07] that is on machine 0
[22:08] it's got itself locked out
[22:08] davecheney: I guess rogpeppe didn't really test it live :(
[22:08] davecheney, ey up
[22:08] william@diz:~/code/go/src/launchpad.net/juju-core$ juju deploy mongodb
[22:08] error: cannot put charm: cannot make S3 control bucket: Your previous request to create the named bucket succeeded and you already own it.
[22:08] davecheney: Do you have access-secret set up in your environment?
[22:08] davecheney: Perhaps it is assuming it is set
[22:08] niemeyer: nope
[22:09] davecheney: Try to set it in the environment config
[22:09] niemeyer: i have an admin-secret
[22:09] is that the same thing ?
[22:09] davecheney: Erm, sorry, that's the one
[22:09] fwereade: i get that error in every non us-east-1 env
[22:09] 2012/10/11 21:58:58 JUJU machiner: machine agent starting
[22:09] 2012/10/11 21:58:58 JUJU state: opening state; mongo addresses: ["localhost:37017"]
[22:09] 2012/10/11 21:58:58 JUJU machiner: unauthorized access
[22:09] 2012/10/11 21:59:01 JUJU machiner: rerunning machiner
[22:09] the machiner/PA never starts
[22:09] davecheney: Okay, so we'll need to debug it, or wait until rogpeppe addresses it
[22:09] davecheney, ah balls, isn't that that the namespace is global or something, just a mo
[22:09] davecheney: How does cloud-init look like?
[22:10] fwereade: yes, all buckets live inthe same namespace
[22:10] don't just copy the bucket config from one env to another
[22:10] but even then, it doesn't help
[22:10] ap-southeast-1:
[22:10] type: ec2
[22:10] control-bucket: juju-7
[22:10] niemeyer: checking cloud init now
[22:16] davecheney, bah, yeah, I appear to be just as screwed as you
[22:16] niemeyer: bootstrapping now
[22:16] * fwereade slopes back off to kick relations around some more
[22:22] niemeyer:
[22:22] 2012/10/11 22:20:46 JUJU:DEBUG watcher: loading new events from changelog collection...
[22:22] 2012/10/11 22:20:46 JUJU storing no-secrets environment configuration
[22:22] 2012/10/11 22:20:46 JUJU bootstrap-state initial password ""
[22:22] jujud-machine-0 start/running, process 10711
[22:23] and then
[22:23] 2012/10/11 22:20:46 JUJU machiner: machine agent starting
[22:23] 2012/10/11 22:20:46 JUJU state: opening state; mongo addresses: ["localhost:37017"]
[22:23] 2012/10/11 22:20:46 JUJU machiner: unauthorized access
[22:23] 2012/10/11 22:20:49 JUJU machiner: rerunning machiner
[22:23] 2012/10/11 22:20:49 JUJU machiner: machine agent starting
[22:28] davecheney: That looks wrong
[22:28] davecheney: It should have an initial password
[22:28] niemeyer: let me dig into cloudinit on our side
[22:29] davecheney: Or just have a look at what it looks like in the server
[22:29] davecheney: I'm pretty sure rogpeppe addressed that already in theory
[22:29] niemeyer: ironically, juju status works
[22:29] so _i_ can connect to the state
[22:29] just none of the agents
[22:30] davecheney: That empty initial password is a good hint of what's wrong
[22:30] davecheney: It should not be empty
[22:53] niemeyer, offhand, do you know what refcounts exist at the moment in state?
[22:54] niemeyer, I am at the point of needing to have a sane relation lifecycle in place, and probably service too
[22:55] fwereade: I don't think we have any yet
[22:55] niemeyer, am I right in recalling agreement that relation existence (with any life) should block service removal?
[22:55] niemeyer, sorry, service Dyingness even
[22:55] fwereade: THe thing we were going to add was machine units in the machine, but we ended up with the units themselves there
[22:55] I mean, unit names
[22:56] fwereade: Yes, that's my understanding as well
[22:56] niemeyer, great
[22:56] niemeyer, except, hm, when I say removal, I actually *mean* EnsureDying
[22:57] fwereade: I hadn't thought of that, but I guess it makes sense
[22:58] niemeyer, so: EnsureDying demands that no relations exist; AddRelation demands that the services be alive; therefore I think I need a relation-count on service
[22:58] niemeyer, sane-sounding?
[23:02] niemeyer, then, similarly, relation.EnsureDead demands that no units are in scope, while unit.EnterScope demands that the relation not be... I *think* Dead, but this reasoning also applies to Dying... so I think I need a units-in-scope-count on relation
[23:05] fwereade: Hmm.. I'm think it there's a chance we could do without, but I guess we need it if we want that semantics
[23:05] niemeyer, and then, I suspect, service.EnsureDead requires no units, while service.AddUnit demands that the service be Alive
[23:05] niemeyer, yeah, I am a little upset that there are so many, but I think that they are the sane way to manage the transactions
[23:05] fwereade: What would happen if we allowed EnsureDying on the service?
[23:06] fwereade: EnsureDying in general is a sign that the given entity is dying, with no practical consequences other than things getting into a termination procedure
[23:06] niemeyer, essentially we'd take down all the relations with the service, and I have a vague recollection that we've been requiring explicit relation removal before allowing people to remove a key piece of infrastructure
[23:10] fwereade: It actually doesn't sound to bad to allow it, I think, assuming the practical consequences are positive
[23:11] niemeyer, (I think that if we want remove-service => remove-many-relations, things potentially get somewhat complex as well -- I'm not sure that I can compose a transaction that will successfully remove every relation on a service reliably)
[23:12] niemeyer, (oh, wait, refcount=N; docExists x N)
[23:12] niemeyer, so we don't have to worry about handling it in the uniter
[23:13] niemeyer, I think I would prefer that such drastic action at least require a --force, though
[23:13] fwereade: Sorry, I missed the line there
[23:13] niemeyer, sorry, not quite sure what you saw, let me repaste
[23:13] niemeyer, (I think that if we want remove-service => remove-many-relations, things potentially get somewhat complex as well -- I'm not sure that I can compose a transaction that will successfully remove every relation on a service reliably)
[23:13] niemeyer, (oh, wait, refcount=N; docExists x N)
[23:13] niemeyer, so we don't have to worry about handling it in the uniter
[23:13] niemeyer, I think I would prefer that such drastic action at least require a --force, though
[23:14] fwereade: I mean I'm missing your line of thinking there, regarding refcount and doc-exists.. how does that affect things?
[23:16] niemeyer, if we're going to set a service to Dying, I think we should also set all its relations to Dying too, so that the counterpart services clean up nicely
[23:16] niemeyer, I think that if we do a transaction involving setting the service and N relations to Dying, we can assert that:
[23:16] niemeyer, service.relation-count == N
[23:17] niemeyer, and, N times over, that the respective relation document exists
[23:18] niemeyer, and I think be protected against anyone adding a relation to a dying service (assuming they assert service Alive ofc)
[23:18] fwereade: We can't assert that service and relation are both dying on the same transaction
[23:18] niemeyer, ...oh
[23:19] fwereade: Because nothing prevents a new relation from being inserted right before we start to execute that transaction
[23:19] niemeyer, wait, that wasn't what I wanted to assert
[23:19] fwereade: niemeyer, I think that if we do a transaction involving setting the service and N relations to Dying, we can assert that:
[23:20] fwereade: That's what I was referring to
[23:20] niemeyer, I'm setting, not asserting, Dying; is that relevant?
[23:22] niemeyer, actually, I'm not at all sure that I can do that
[23:23] niemeyer, so, ok, if we want to take down all relations with a service (which feels somewhat alarming to me), I think we have to have the uniter handling relation-killing when it detects a dying service -- just as it kills the unit. right?
[23:24] fwereade: I don't yet have a strong opinion on it going either way, so I'm happy to explore the option you feel most comfortable with
[23:24] fwereade: One thing to note, perhaps as a guideline for us,
[23:24] fwereade: is that the interface we offer to the user, doesn't have to match precisely the internal details
[23:26] fwereade: So, let's say we refcount
[23:26] fwereade: and at least for the moment, let's say we prevent people from killing services with relations
[23:26] :)
[23:27] fwereade: the at least for the moment is based on the above idea.. we can go this way even if later we decide to change
[23:27] fwereade: Because we can make the remove-service command deal with the manipulation
[23:27] fwereade: and break down in case someone else runs a race, for example
[23:27] fwereade: Either way, continuing
[23:28] fwereade: When do we *allow* the service to be EnsureDying'ed?
[23:28] fwereade: When all relations are Dying, or when all relations are Dead?
[23:28] niemeyer, IIRC we did a long time ago agree that it was when none even existed -- but I think that translates effectively to all Dead
[23:28] niemeyer, however I do not think this is optimal
[23:29] fwereade: Okay, that seems a bit unfriendly
[23:29] niemeyer, Dying, to me, makes a lot more sense
[23:29] fwereade: That's better, but still makes me ponder
[23:29] niemeyer, and this ofc changes the underlying assumptions of what I said above, I think
[23:31] fwereade: Okay, here is a strawman
[23:32] fwereade: We can do both:
[23:32] 1) Add refcounting to the service, so we can implement force/non-force logic and error out if we please
[23:32] niemeyer, dying service requires dying relations; dead service requires dead relations?
[23:32] % bzr conflicts
[23:32] Conflict adding file juju/deploy.go.BASE. Moved existing file to juju/deploy.go.BASE.moved.
[23:32] Conflict adding file juju/deploy.go.OTHER. Moved existing file to juju/deploy.go.OTHER.moved.
[23:32] Conflict adding file juju/deploy.go.THIS. Moved existing file to juju/deploy.go.THIS.moved.
[23:32] none of those six files exist
[23:33] niemeyer, ah no forget I said anything
[23:33] niemeyer, please continue
[23:33] 2) Implement logic that supports dying service with non-dying relations, and terminates properly
[23:34] fwereade: This means, in a way, we have the cake and can eat it as well.. to terminate setting service to Dying is enough, but we can tweak the API to our desires, so we can blow it up or not (perhaps supporting --force)
[23:35] davecheney: bzr conflicts reports what happened
[23:35] davecheney: Run "bzr resolved" to notify you've dealt with it
[23:35] niemeyer: bzr couldn't fix it
[23:35] i copied my changes away and did a revert
[23:36] it was caused by swiching branches with uncomitted changes
[23:36] niemeyer, ok, this is the return of the corrective agent :)
[23:36] fwereade: Oh, hopefully not
[23:36] fwereade: Why would we need it?
[23:37] niemeyer, well, I don;t think we can put it in the unit agent, because we can't know that any units will actually exist to run that logic, I think
[23:37] niemeyer, so I was imagining that that would have to handle (2)
[23:37] fwereade: Who sets service to Dead?
[23:37] fwereade: and then removes it?
[23:38] niemeyer, the unit agent, assuming one exists -- or, I presume, the client, if it can be sure that such action is warranted by reason of there being no units
[23:39] fwereade: Exactly.. the whoever removes the last unit, is probably the half-answer
[23:39] fwereade: This is the place, and the code path, to delete the relations
[23:40] niemeyer, I don't think we can sanely set service to Dead until *after* all relations are Dead (or gone), can we?
[23:41] niemeyer, otherwise we have a live relation referencing a dead service, and that's not going to end well
[23:41] fwereade: How can we have live relations if all units of one side are gone?
[23:42] fwereade: Well, I guess we have to wait until the other side acknowledges
[23:42] niemeyer, the relation scope may be unoccupied, but that doesn't imply anything about the relation's Life, does it?
[23:42] fwereade: And that sounds fine.. we can backtrack the service removal to the last thing that acks the release of a resource attached to that service
[23:43] niemeyer, yeah, that sounds right
[23:43] fwereade: Hmm..
[23:44] fwereade: We can probably implement something simpler
[23:44] fwereade: Way simpler, in fact
[23:44] niemeyer, this would make me happy
[23:44] fwereade: We can in fact set service and all its relations as dying at once
[23:44] niemeyer, that was what I suggested originally
[23:45] fwereade: Right, and I thought that wasn't possible, but it is
[23:45] fwereade: We have to loop, though
[23:45] fwereade: To avoid races
[23:45] fwereade: We have to assert on the number of relations
[23:45] fwereade: Hmm.. which is perhaps wrong
[23:46] fwereade: Yeah, we can't do that in fact.. :(
[23:46] * niemeyer thinks about alternatives
[23:47] * niemeyer => whiteboard
[23:47] * fwereade looks up at clock
[23:47] niemeyer, sorry to abandon you, but I should call it a night
[23:48] niemeyer, I look forward to continuing the discussion tomorrow
[23:48] fwereade: Okay, it's doable
[23:49] * fwereade cheers, and decides to stick around for a mo
[23:49] fwereade: We can take the refcount into account in the transaction
[23:49] fwereade: Sorry
[23:49] fwereade: The txn-revno
[23:50] fwereade: To guarantee that the refcount hasn't been incref'd and decref'd in-between
[23:50] fwereade: So we loop, and assert txn-revno is the same, and include service and all relations in the same transaction
[23:51] fwereade: This guarantees that either service dies with all its relations, or we get an ErrAbort
[23:51] fwereade: by die I mean become Dying
[23:51] niemeyer, yeah, I think that makes sense
[23:52] niemeyer, it was what I had being fumbling for with the docExistses
[23:52] fwereade: If we get ErrAbort, we start over and ensure we're still doing a sane transaction
[23:52] fwereade: The problem is that we can assert on non-existence
[23:52] fwereade: Erm, cannot
[23:53] fwereade: So we have to:
[23:53] 1) Grab service
[23:53] 2) Grab all its relations
[23:54] 3) Do a transaction with relations from 2, and include an assertion ensuring that service hasn't changed since (1)
[23:54] niemeyer, yep
[23:54] That means (2) observed everything (1) knew about
[23:54] and thus (3) is consistent
[23:54] niemeyer, yep, exactly
[23:54] fwereade: You can sleep well, I think :-)
[23:55] niemeyer, indeed :)
[23:55] fwereade, davecheney: Btw,
[23:55] fwereade, davecheney: Tomorrow is a national holiday here
[23:55] fwereade, davecheney: And I'll try to relax a bit.. but I'll be around at some point. There's at least one meeting I need to attend at 1:30
[23:56] niemeyer, cool
[23:56] Which is 16:30 UTC
[23:56] mramm: ^^
[23:56] niemeyer, I will surely see you around then :)
[23:56] fwereade: Yeah
[23:57] niemeyer, btw, I would be most grateful if you would take a very quick look at https://codereview.appspot.com/6650043/ before you go
[23:58] niemeyer, blast, I should repropose, it's on top of the old config-changed bits
[23:58] fwereade: Will do
[23:59] niemeyer, just want to propose another first though, just because all it's waiting on is a full test run
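
Editor's note: the three-step scheme niemeyer outlines at 23:53-23:54 (grab the service, grab its relations, then run one transaction that asserts the service has not changed since it was read, retrying on abort) can be sketched roughly as below. This is a toy in-memory model written for illustration only, not juju-core code: `Store`, `Service`, `Relation`, `AddRelation`, `snapshot`, `setAllDying`, `DestroyService`, and the `between` hook are all hypothetical names, `Revno` merely stands in for the txn-revno mentioned in the log, and `ErrAborted` stands in for the ErrAbort mentioned at 23:51.

```go
package main

import (
	"errors"
	"fmt"
)

// Life mirrors the Alive/Dying/Dead lifecycle discussed in the log.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// ErrAborted stands in for the abort error a failed assertion would produce.
var ErrAborted = errors.New("transaction aborted")

// Service and Relation are hypothetical in-memory documents.
type Service struct {
	Life  Life
	Revno int // stand-in for txn-revno: bumped on every change to the document
}

type Relation struct {
	Service string
	Life    Life
}

// Store is a toy replacement for the real state database.
type Store struct {
	Services  map[string]*Service
	Relations map[string]*Relation
}

func NewStore() *Store {
	return &Store{Services: map[string]*Service{}, Relations: map[string]*Relation{}}
}

// AddRelation requires an Alive service and bumps its revno, so that a
// concurrent destroy can notice its snapshot has gone stale.
func (s *Store) AddRelation(service, key string) error {
	svc := s.Services[service]
	if svc == nil || svc.Life != Alive {
		return errors.New("service is not alive")
	}
	s.Relations[key] = &Relation{Service: service, Life: Alive}
	svc.Revno++
	return nil
}

// snapshot covers steps 1 and 2: read the service revno, then list its relations.
func (s *Store) snapshot(service string) (int, []string, error) {
	svc := s.Services[service]
	if svc == nil {
		return 0, nil, errors.New("service not found")
	}
	var keys []string
	for key, rel := range s.Relations {
		if rel.Service == service {
			keys = append(keys, key)
		}
	}
	return svc.Revno, keys, nil
}

// setAllDying is step 3: an all-or-nothing change guarded by the revno assertion.
func (s *Store) setAllDying(service string, revno int, keys []string) error {
	svc := s.Services[service]
	if svc == nil || svc.Revno != revno {
		return ErrAborted
	}
	svc.Life = Dying
	svc.Revno++
	for _, key := range keys {
		s.Relations[key].Life = Dying
	}
	return nil
}

// DestroyService retries the read-then-assert cycle until it applies cleanly.
// The between hook lets a caller simulate another client racing with us.
func DestroyService(s *Store, service string, between func()) (int, error) {
	for attempt := 1; attempt <= 5; attempt++ {
		revno, keys, err := s.snapshot(service)
		if err != nil {
			return attempt, err
		}
		if between != nil {
			between()
		}
		if err := s.setAllDying(service, revno, keys); !errors.Is(err, ErrAborted) {
			return attempt, err // nil on success
		}
	}
	return 5, errors.New("too much contention")
}

func main() {
	s := NewStore()
	s.Services["wordpress"] = &Service{}
	s.AddRelation("wordpress", "db")
	attempts, err := DestroyService(s, "wordpress", nil)
	fmt.Println(attempts, err, s.Relations["db"].Life == Dying)
}
```

The reason for asserting on a revision number rather than on a relation count is the one given at 23:50: a count that was incremented and then decremented looks unchanged, while a revision number that moves on every write does not.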