/srv/irclogs.ubuntu.com/2012/10/11/#juju-dev.txt

davecheneyle sigh01:22
davecheneycharm deployments all broke overnight01:23
davecheneywas working yesterday01:23
davecheneynow nothing deploys01:23
fwereadedavecheney, ouch :(05:44
fwereadedavecheney, what was it?05:46
davecheneyno idea06:09
davecheneyfwereade: i thought it was just the charms I was testing06:13
davecheneyor some ec2 weirdness06:13
davecheneybut I went back and tested charms i knew worked06:13
davecheneyand they all jam with the same error06:13
davecheneyor06:13
davecheneywaiting for the error state to clear06:13
TheMuemorning06:48
fwereadeTheMue, heyhey06:51
fwereadedavecheney, huh, what's the error?06:51
TheMuefwereade: heya06:51
wrtpmorning all07:07
TheMuewrtp: hi07:07
wrtpdavecheney: i hope that's not my fault07:07
wrtpTheMue: hiya07:07
wrtpdavecheney: do you see the uniter version in the status?07:08
TheMuewrtp, fwereade: here's a trivial one https://codereview.appspot.com/6649044/08:14
wrtpTheMue: can't we get rid of FwDefault completely?08:15
fwereadeTheMue, yeah, I was just about to ask about that :)08:15
TheMueas far as i understood niemeyer he wants to keep it08:25
wrtpTheMue: i'd prefer to have a method on Environ that returns the environment's default firewall mode.08:26
wrtpTheMue: as it is, FwDefault doesn't mean anything - we don't know what semantics to expect.08:26
TheMuewrtp: please add it as comment so that i can discuss it with niemeyer08:26
wrtpTheMue: ok, will do. i'm afraid that means that this branch can't be taken to be trivial and submitted immediately.08:27
TheMuewrtp: see also point 2 in his mail08:27
TheMuewrtp: it is only the first one in a row of 608:27
wrtpTheMue: reading his email, it seems like he wants to get rid of FwDefault too08:28
wrtpTheMue: "08:29
wrtpWe can't ever use FwDefault outside of the08:29
wrtpprovider08:29
wrtp"08:29
TheMuewrtp: today we have no EnvironProvider.Validate. imho that is intended to validate the mode and, in case of FwDefault, return the provider's default mode, instance or global08:32
TheMuewrtp: that's my interpretation of point 208:32
wrtpTheMue: ah, i think i understand now.08:33
wrtpthinking08:33
TheMuewrtp: so different environments can return different defaults08:34
wrtpTheMue: i'm not sure i understand why we need an explicit default value. can't ValidateConfig see that the firewall mode is missing from the config, and substitute its default mode?08:35
wrtpTheMue: that is: we already have a good way of specifying default values - we leave the value unspecified08:36
TheMueyep08:37
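[Editor's sketch, not from the log] The "leave it unspecified" approach wrtp describes could look roughly like the Go below. The names FirewallMode, validateFirewallMode and the "firewall-mode" key are assumptions made for illustration; this is not the juju-core API.

```go
package main

import "fmt"

// FirewallMode is a hypothetical stand-in for the mode type under discussion.
type FirewallMode string

const (
	FwInstance FirewallMode = "instance"
	FwGlobal   FirewallMode = "global"
)

// validateFirewallMode picks the mode for a raw attribute map. A missing
// "firewall-mode" key (an assumed key name) means "use the provider's
// default", so no explicit FwDefault value is required.
func validateFirewallMode(attrs map[string]interface{}, providerDefault FirewallMode) (FirewallMode, error) {
	v, ok := attrs["firewall-mode"]
	if !ok {
		return providerDefault, nil
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("firewall-mode: expected string, got %T", v)
	}
	switch m := FirewallMode(s); m {
	case FwInstance, FwGlobal:
		return m, nil
	}
	return "", fmt.Errorf("invalid firewall mode %q", s)
}

func main() {
	// No mode in the config: the provider default is substituted.
	mode, err := validateFirewallMode(map[string]interface{}{}, FwInstance)
	fmt.Println(mode, err)
}
```

Under this sketch, different providers simply pass different defaults, which matches TheMue's point that environments can return different defaults.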
fwereadewhoops, doctor's appointment, gtg08:45
fwereadesee you shortly08:45
Arammoin.09:22
TheMueAram: moin moin09:37
* TheMue has to step out to play taxi driver for his daughter. bbiab09:37
wrtpfwereade: it looks to me as if the cloud-init script runs as root. can you think of a reason to retain the sudos in environs/cloudinit ?12:00
fwereadewrtp, not offhand, no12:00
wrtpfwereade: i think they're just there as a legacy of the python12:00
wrtpfwereade: i'll remove 'em and see if anything breaks12:00
fwereadewrtp, that's probably the easiest way12:01
wrtpfwereade: BTW i think we should probably not be running our agents as root12:01
fwereadewrtp, except, wait, is everything broken since last night?12:01
wrtpfwereade: it seems to work for me. but i haven't tried deploying a charm12:01
fwereadewrtp, that would be the ideal but I'm not sure I see a good way round it12:01
wrtpfwereade: oh, because we need to be root to start LXC containers?12:02
* fwereade wonders if he broke config somehow, really ought to actually take a look at it, it's just the contextswitching...12:02
fwereadewrtp, I thought we did, yeah12:02
wrtpfwereade: hmm, that makes sense :-(12:02
wrtpfwereade: i'm just trying to deploy a charm now. will paste you the log output if it fails12:03
fwereadewrtp, debug output would probably be helpful if that can be arranged12:03
wrtpfwereade: i think debug output is always enabled currently12:04
wrtpfwereade: it certainly *seems* to be deploying ok12:06
wrtpfwereade: hmm, it seems to have executed the install and start hooks, but the status is still "pending"12:08
wrtpfwereade: http://paste.ubuntu.com/1273059/12:08
wrtpfwereade: current status: http://paste.ubuntu.com/1273062/12:09
* fwereade looks12:10
fwereadewrtp, huh, it seemed on dave's that he was always failing start hooks, that looks like the hook is fine12:11
wrtpfwereade: yeah, but shouldn't the status show something other than "pending"?12:11
fwereadewrtp, yeah, so somehow it seems ModeAbide is wedged, very early on12:12
fwereadewrtp, which is strange... I suspect it's waiting for a config event which should be guaranteed12:13
wrtpfwereade: is this unexpected? "2012/10/11 12:07:34 JUJU cannot read hook output: read |0: bad file descriptor"12:13
fwereadewrtp, isn't that your hookLogger? ;p12:13
wrtpfwereade: probs12:13
fwereadewrtp, I haven't seen it but AIUI that sort of thing is not entirely unexpected is it?12:13
wrtpfwereade: that's maybe what happens when the fd isn't closed properly, i can't remember.12:14
fwereadewrtp, if it had debug output I would be able to tell what the filter was doing, I think12:14
wrtpfwereade: ah, is debug output not enabled by the global debug flag?12:14
fwereadewrtp, (even though I would also have to wade through the state debug stuff as well... sigh :))12:14
fwereadewrtp, hmm, not sure, haven't looked at our cloudinits for a while, didn't actually know it was doing the debug flag12:15
wrtpfwereade: ah, container doesn't enable the debug flag, grr12:15
wrtpniemeyer: yo!12:16
niemeyerMorning!12:22
wrtpfwereade, niemeyer: trivial CL: https://codereview.appspot.com/665404312:26
niemeyerwrtp: LGTM12:29
wrtpniemeyer: thanks12:29
fwereadeniemeyer, heyhey12:34
niemeyerfwereade: Heya12:37
wrtpniemeyer: another small one: https://codereview.appspot.com/6655043/12:45
niemeyerwrtp: LGTM12:47
wrtpniemeyer: thanks12:47
wrtpniemeyer: (the quick reviews are really appreciated!)12:47
niemeyerwrtp: My pleasure12:48
niemeyerfwereade: When you have a spare moment to continue yesterday's brainstorm, please ping12:49
fwereadeniemeyer, should be good in just a sec12:49
fwereadeniemeyer, ping13:03
niemeyerfwereade: Yoyo13:03
niemeyerfwereade: Can we have a quick call? I think we can save time13:04
fwereadeniemeyer, sgtm13:04
fwereadeniemeyer, https://plus.google.com/hangouts/_/044d83e4e11954e90f12cb6c2a82bf2e8ed1c724?authuser=0&hl=en#13:05
niemeyerfwereade: Thanks.. funny that you can paste the URL and my phone is ringing before I manage to open the G+ page13:05
* fwereade is quick like ninja, on very rare occasions :)13:06
fssniemeyer: morning13:37
niemeyerfss: Heya13:42
TheMueniemeyer: heya, did you read roger's comment about point 3 of the firewall mode changes? he prefers to put the global open/close/ports at Environ instead of EnvironProvider. i like that idea, because it feels more natural to work on the concrete environment. what do you think?13:48
niemeyerTheMue: He's totally right.. it was my mistake13:49
TheMueniemeyer: fine, will do so13:49
wrtpniemeyer: the final piece, i believe: https://codereview.appspot.com/665504414:30
niemeyerwrtp: Looking14:38
niemeyerwrtp: done14:46
wrtpniemeyer: thanks14:46
niemeyerwrtp: np14:46
wrtpniemeyer: the _ for the state was deliberate BTW. it means i can return nil, err and still see if the state was set.14:47
wrtpniemeyer: i'd prefer not to use bare returns14:47
niemeyerwrtp: Let's please name it properly as usual14:48
wrtpniemeyer: ok, i'll rename the variable further down i guess14:48
niemeyerwrtp: I don't understand why you have to14:48
wrtpniemeyer: if i don't do that, then my defer won't work14:48
wrtpniemeyer: because it won't be able to see that st!=nil14:49
niemeyerwrtp: You don't need the defer either14:49
niemeyerwrtp: See the rest of the review14:49
wrtpniemeyer: ah, ok14:49
wrtpniemeyer: i'm not sure about putting the password in the agent dir itself. do we guarantee that the agent dir is mode 700 ?14:50
wrtpniemeyer: mind you, i suppose not much information can be gleaned from the password length as it's always the same.14:51
niemeyerwrtp: No, just the file14:52
wrtpniemeyer: PTAL https://codereview.appspot.com/665504415:21
fwereadeniemeyer, lbox appears to be panicking when I try to authenticate -- 3/4 times so far (the 4th time it complained about an incorrect password, which I'm ... pretty sure I typed right, but ofc cannot verify)15:21
fwereadeniemeyer, is this reminiscent of anything known to you?15:22
fwereadeniemeyer, this is the (interesting bit of, i think) the panic: http://paste.ubuntu.com/1273426/15:23
niemeyerfwereade: No, there was an issue with auth when Go changed in an incompatible way between releases, but it shouldn't be happening now15:24
fwereadeniemeyer, google seems to agree that my password is what I think it is... I think it is moderately unlikely that I would have typed it wrong on all 4 occasions15:24
niemeyerfwereade: Try to drop your auth details (rm ~/.lbox*)15:24
fwereadeniemeyer, don't seem to have any15:25
niemeyerfwereade: Sorry, that should be .lpad15:25
fwereadeniemeyer, is that connected to the google auth?15:27
niemeyerfwereade: Sorry, I'm clearly on crack... that's ~/.goetveld*15:27
fwereadeha, now I'm getting a blank auth page on launchpad :/15:28
fwereadeniemeyer, bah, same ol' panic15:30
* fwereade resorts to the universal panacea15:34
niemeyerfwereade: Let me try to kill my auth and see if I can repro15:38
fwereadeniemeyer, yeah, if I do a deliberate wrong password it fails nicely15:39
fwereadeniemeyer, correct password has panicked every time15:39
niemeyerwrtp: LGTM15:39
fwereadeniemeyer, which is *insane* because I proposed earlier today15:39
wrtpniemeyer: thanks!15:39
niemeyerfwereade: Did you upgrade in between?15:40
fwereadeniemeyer, hum, I think I probably did15:40
fwereadeniemeyer, not sure what from15:40
niemeyerfwereade: I hope it's not an interim breakage again15:40
wrtpniemeyer: and with that, i think our current authentication stuff is done. woo.15:41
wrtpniemeyer: i'm gonna push a few trivial changes that i've been putting off for a while, but if you have something urgent that i should move on to, let me know.15:41
niemeyerwrtp: Cool, the main thing is really to keep pushing on that front, testing etc.. but trivial changes as a break sounds good too15:42
niemeyerfwereade: I'm facing EOF issues with LP before even getting there15:49
niemeyerfwereade: The cloud gods are mad15:49
wrtpniemeyer: it seems to be working ok for me15:49
niemeyerfwereade: Worked fine, but I think I was using a locally built lbox.. let me make sure I'm getting the one from the PPA15:50
* fwereade wonders what he did :/15:52
niemeyerfwereade: Boom15:54
fwereadeniemeyer, ha:15:54
fwereade2012/10/11 17:54:23 RIETVELD Login on https://codereview.appspot.com successful.15:54
fwereade2012/10/11 17:54:23 RIETVELD Login failed: Get http://example.com/marker: redirect blocked15:54
niemeyerfwereade: Launchpad's lbox must be compiled with the broken tip15:55
niemeyerfwereade: I'll fix that after lunch15:55
niemeyerfwereade: Meanwhile, go get launchpad.net/lbox will get you going15:55
fwereadeniemeyer, lovely, thanks15:55
niemeyerHmm15:58
niemeyerIt's compiling against stable, actually, which seems to indicate that it was released broken? Uh oh15:58
* niemeyer => lunch16:00
fwereadeniemeyer, yeah, the freshly built version seems to fall over the same way16:00
fwereadeniemeyer, I will also be back later16:00
fwereadeniemeyer, (ty for the advice re config-changed, infinitely cleaner now)16:00
TheMuefwereade: it caught me too, lbox panicked *sigh*16:29
niemeyerThis really sucks16:34
niemeyerI anticipated the bug, reported, made sure it was fixed, and even then it got released without the fix.. :(16:35
niemeyerTheMue: go get launchpad.net/lbox16:42
niemeyerI'll remove the Launchpad package16:43
niemeyerOr perhaps just make it build against tip16:43
* niemeyer checks it16:43
TheMueniemeyer: got it, same error here. funnily the package one already worked twice today for me16:49
niemeyerTheMue: Ah, you probably have go 1.0.3 installed16:49
niemeyerTheMue: So hold off a bit until the package builds16:49
niemeyerTheMue: The broken one will work fine until you have to auth again16:50
TheMueniemeyer: yes, 1.0.3. so i'll wait16:50
wrtpniemeyer, fwereade, TheMue, Aram: this CL makes log messages consistent, as talked about on juju-dev. it's a large CL, but pretty trivial. https://codereview.appspot.com/6654044/16:53
=== wrtp is now known as rogpeppe
TheMuerogpeppe: *wow*16:54
niemeyerrogpeppe: It's huge indeed, and in a quick sampling it doesn't feel so great17:00
niemeyerrogpeppe: Comments sent17:00
rogpeppeniemeyer: thank you17:00
niemeyerrogpeppe: It'd be good to have a more careful evaluation17:00
rogpeppeniemeyer: i thought the HOOK output was perhaps not so good17:01
rogpeppeniemeyer: i went with a literal interpretation of the rules to start with17:01
niemeyerrogpeppe: fmt.Fprintf(os.Stderr, "%s: %v\n", filepath.Base(os.Args[0]), err)17:01
niemeyerrogpeppe: What did the literal rules say about fmt.Fprintf?17:01
rogpeppeniemeyer: i interpreted that as a kind of log output. perhaps that's a stretch.17:02
rogpeppeniemeyer: even if that doesn't get done in this CL, i think it's worth doing.17:02
niemeyerrogpeppe: Yeah, *log* and *output* are not the same thing17:02
rogpeppeniemeyer: saying "error:" for every command is not great.17:02
rogpeppeniemeyer: the unix standard is to print the name of the command17:03
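[Editor's sketch, not from the log] The convention rogpeppe refers to, prefixing an error with the name of the command that printed it, can be illustrated in Go as follows. formatError and errExample are hypothetical names used only here. Whether juju should adopt this is exactly what is being disputed in the conversation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errExample is a placeholder error used only for this illustration.
var errExample = errors.New("cannot upload tools")

// formatError returns the message a command would print for err, prefixed
// with the base name of the executable rather than a fixed "error:" label.
func formatError(argv0 string, err error) string {
	return fmt.Sprintf("%s: %v\n", filepath.Base(argv0), err)
}

func main() {
	// Diagnostics go to standard error, as in the line quoted from the CL.
	fmt.Fprint(os.Stderr, formatError(os.Args[0], errExample))
}
```

With several commands chained in a shell script, the prefix shows which one failed. A fixed "error:" label does not.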
niemeyerrogpeppe: Good old let's-derail-the-conversation-until-we-disagree17:03
rogpeppeniemeyer: ok, sorry, i thought it was pretty uncontroversial.17:03
rogpeppeniemeyer: i'll rewind those changes17:04
niemeyerrogpeppe: It's *very* uncontroversial to bikeshed widely about log messages, send a message to the mailing list so we agree, explicitly state that it's about log.Printf/Debugf for *clarity*, then send a 100 files change that does something else entirely17:04
rogpeppeniemeyer: i'm sorry17:05
rogpeppeniemeyer: as a separate CL, might you agree in principle that changing the "error:" messages is a good idea?17:06
niemeyerrogpeppe: No, I want to keep moving forward and stop the bikeshed immediately17:06
rogpeppeniemeyer: when i see shell script output, it's useful to know what commands have printed the messages.17:07
rogpeppeniemeyer: i've run across this a few times so far.17:07
niemeyer.17:07
* rogpeppe thinks this is more than just a bikeshed colour. It adds to juju usability.17:09
rogpeppebut i'll stop there.17:09
niemeyerThanks, let's make it work17:10
rogpeppeperhaps i should abandon all the log printf changes too, if it's just bikeshedding.17:11
niemeyerrogpeppe: Messages are inconsistent, we discussed this on IRC, and in the mailing list, and agreed on something. If you wanna drop it, someone else can pick it up later, no worries.17:13
niemeyerrogpeppe: That's unrelated to continue fiddling with output messages, though.17:14
rogpeppeniemeyer: do you agree with dropping the cmd/ prefix for commands BTW?17:14
niemeyerrogpeppe: Is it ok to have cmd/juju and juju prefixed by the same thing?17:14
rogpeppeniemeyer: hmm, good question.17:15
rogpeppeniemeyer: probably not. that's a good call.17:15
niemeyerTheMue: lbox rebuilt.. wanna give it a try?17:23
TheMueniemeyer: yes17:23
TheMueniemeyer: yep, it works, thanks a lot17:25
niemeyerTheMue: Phew, np17:25
niemeyerfwereade: ^17:26
* TheMue is stepping out, dinner time17:27
niemeyerTheMue: Enjoy17:27
TheMueniemeyer: first three CLs for the global mode are in17:27
fwereadeniemeyer, having supper now, but https://codereview.appspot.com/6632062 reproposed17:46
niemeyerfwereade: Sweet17:47
niemeyerfwereade: Thanks much17:47
rogpeppeniemeyer: this should be better i hope: https://codereview.appspot.com/665404417:48
rogpeppei'm off now, night all17:48
niemeyerrogpeppe: Thanks, have a good one17:49
niemeyerfwereade: That looks very nice indeed17:50
rogpeppeniemeyer: and you17:56
hazmatniemeyer, fwiw we've had a few people today run  into a nil memory ref in lbox..  https://pastebin.canonical.com/76339/18:16
niemeyerhazmat: Yeah, it's fixed already18:16
niemeyerhazmat: unfortunately 1.0.3 got released with a bug18:16
hazmatniemeyer, awesome re fix, thanks18:17
niemeyerhazmat: Go 1.0.3 that is.. we've fixed the issue before it was out, but the release manager missed the fix18:17
hazmatniemeyer, ah.. i remember you mentioning that a while back re bug in go http client.18:19
hazmatcool18:20
niemeyerhazmat: Exactly.. we've rushed to fix that shortly after the bug was introduced, back in july.. sadly the bug was merged onto the release and the fix wasn't18:20
fwereadeniemeyer, cool, thanks18:34
niemeyerfwereade: ping18:44
* niemeyer => doc.. back soonish19:19
davecheney% juju bootstrap --upload-tools21:23
davecheneyerror: cannot upload tools: cannot write file "tools/juju-0.0.1-precise-amd64.tgz" to control bucket: We encountered an internal error. Please try again.21:23
davecheneythanks, amazon21:23
fwereadeniemeyer, pong21:39
fwereadedavecheney, heyhey -- I never got to investigating your problems from yesterday, which I felt I kinda should have, but... well, er, I didn't21:49
fwereadedavecheney, can I be of any assistance now though?21:49
davecheneyfwereade: hey21:51
davecheneyi'm trying to have a look now as part of adding juju remove-unit21:51
davecheneybut I can't get an environment to bootstrap in any zone21:52
fwereadedavecheney, ha, ouch :(21:52
davecheneyus-east-1 is screwed21:52
davecheneyap-southeast-1 is broken21:52
davecheneytrying us-west-121:52
davecheneyif I can get an env going21:53
davecheneywhat sort of debugging will be useful ?21:53
davecheney% juju bootstrap -e us-west-1 --upload-tools21:53
davecheneylucky(~/src/launchpad.net/juju-core/cmd/juju) % juju deploy -e us-west-1 -n 5 mysql21:53
davecheneyerror: cannot put charm: cannot make S3 control bucket: Your previous request to create the named bucket succeeded and you already own it.21:53
davecheneythat is fucking great21:53
davecheneyevery non us-east-1 env now won't bootstrap21:53
fwereadedavecheney, oof21:54
fwereadedavecheney, well, I'm not sure that it would be, necessarily, with the logs you sent -- but rog was having a problem that seemed potentially kinda similar, that might have been helped by it21:55
fwereadedavecheney, I just mean running the agents with --debug21:55
davecheneyyeah, i saw he made that change last night21:55
davecheneyi've merge from trunk21:55
davecheneyso when I do get something deployed, it will be in --debug21:55
fwereadedavecheney, I appear to have something bootstrapping in eu-west-122:00
davecheneyfwereade: seet22:00
davecheneysweet22:00
davecheneywhile I have you22:00
davecheneyi've implemented conn.RemoveUnits(units ...)22:00
fwereadedavecheney, I was planning to try to deploy mongodb and see what happened :022:00
davecheneyas setting unit.EnsureDying()22:01
davecheneyand leaving it as that22:01
davecheneyam i correct that the UA will detect that, do whatever, and set the unit to Dead ?22:01
fwereadedavecheney, yeah, that should be enough22:01
fwereadedavecheney, it would also be good to implement --force22:01
fwereadedavecheney, which would call EnsureDead22:02
davecheneythat was my initial attempt22:02
davecheneyit didn't work22:02
fwereadedavecheney, which will be handy right now for removing horribly borked deployments22:02
davecheneybecause it stepped around the UA22:02
fwereadedavecheney, blocked by something? or nothing responded?22:03
fwereadedavecheney, it shouldn't be the default but it should be possible, I think22:03
davecheneyunit went away22:03
davecheneybut the machine didnt22:03
niemeyerfwereade: Yo22:03
niemeyerdavecheney: Morning!22:04
fwereadedavecheney, machines shouldn't go away until we terminate-machine22:04
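[Editor's sketch, not from the log] The remove-unit behaviour discussed above (EnsureDying by default, EnsureDead with --force) might be sketched like this. The lifecycler interface, removeUnits and fakeUnit are hypothetical stand-ins. Only the method names EnsureDying and EnsureDead come from the conversation.

```go
package main

import "fmt"

// lifecycler is a hypothetical interface covering just the two calls
// mentioned in the discussion.
type lifecycler interface {
	EnsureDying() error
	EnsureDead() error
}

// removeUnits marks each unit Dying so the unit agent can shut it down
// cleanly. With force set it goes straight to Dead, bypassing the agent.
func removeUnits(force bool, units ...lifecycler) error {
	for _, u := range units {
		var err error
		if force {
			err = u.EnsureDead()
		} else {
			err = u.EnsureDying()
		}
		if err != nil {
			return err
		}
	}
	return nil
}

// fakeUnit is an in-memory unit used only to exercise removeUnits.
type fakeUnit struct {
	name string
	life string // "alive", "dying" or "dead"
}

func (u *fakeUnit) EnsureDying() error {
	if u.life == "alive" {
		u.life = "dying"
	}
	return nil
}

func (u *fakeUnit) EnsureDead() error {
	u.life = "dead"
	return nil
}

func main() {
	u := &fakeUnit{name: "mysql/0", life: "alive"}
	if err := removeUnits(false, u); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u.name, u.life)
}
```

In this sketch neither path touches the machine, consistent with fwereade's remark that machines stay until terminate-machine.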
niemeyerdavecheney: Isn't that super early? :)22:04
fwereadeniemeyer, heyhey22:04
niemeyerfwereade: I was going to ask you about the service config watcher22:04
fwereadeniemeyer, oh yes22:04
davecheneyniemeyer: i was going to go for a ride, but it is pissing down22:04
niemeyerfwereade: I'm wondering about what logic to implement for the multi-config world22:04
davecheneyso, best to strike while the iron is hot22:04
niemeyerfwereade: Pondering if it would be fine to do per-charm-url watching22:05
fwereadeniemeyer, I think it would22:05
davecheneyaaaaaaaaaaaargh!22:05
davecheney% juju status22:05
davecheneyerror: We encountered an internal error. Please try again.22:05
davecheneyus-east-1 is screwed22:05
niemeyerfwereade: and assume that the filter would be kind enough to re-watch so the modes/etc don't have to care22:05
niemeyerfwereade: What do you think?22:05
fwereadeniemeyer, the filter is already in a position to know what the unit's current charm is, so it should be near-enough trivial22:05
niemeyerfwereade: Awesome! That makes things pretty easy22:06
niemeyerfwereade: Now that we have explicit settings management, the next step is almost trivial22:06
fwereadeniemeyer, the semantic change is that you say something like u.f.NotifyCharm(url, mustForceUpgrade), and assume that all config and upgrade events are filtered relative to that state22:07
davecheneyawesome22:07
davecheneynow i can't deploy at all22:07
davecheney2012/10/11 22:07:16 JUJU state: opening state; mongo addresses: ["localhost:37017"]22:07
davecheney2012/10/11 22:07:16 JUJU machiner: unauthorized access22:07
niemeyerfwereade: Neat22:07
fwereadeniemeyer, I just need to be slightly careful about not resetting the config watch when the charm didn't change so I don't poo extra events into the stream22:07
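[Editor's sketch, not from the log] The care fwereade mentions, restarting the config watch only when the charm URL really changes, could be sketched as below. This filter type is a heavily reduced, hypothetical stand-in for the uniter's filter. The mustForceUpgrade argument from his example call is left out, and the charm URLs are made up.

```go
package main

import "fmt"

// filter is a hypothetical, much-reduced stand-in for the uniter's filter.
type filter struct {
	charmURL      string
	watchRestarts int // how many times the config watch has been (re)started
}

// NotifyCharm records the charm the unit is now running. The config watch
// is restarted only when the URL differs from the current one, so repeated
// notifications for the same charm add no extra events to the stream.
func (f *filter) NotifyCharm(url string) {
	if url == f.charmURL {
		return
	}
	f.charmURL = url
	f.watchRestarts++
}

func main() {
	f := &filter{}
	f.NotifyCharm("cs:precise/mysql-1")
	f.NotifyCharm("cs:precise/mysql-1") // same charm: watch left alone
	f.NotifyCharm("cs:precise/mysql-2")
	fmt.Println(f.watchRestarts)
}
```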
niemeyerdavecheney: Uh22:07
davecheneythat is on machine 022:07
davecheneyit's got itself locked out22:08
niemeyerdavecheney: I guess rogpeppe didn't really test it live :(22:08
fwereadedavecheney, ey up22:08
fwereadewilliam@diz:~/code/go/src/launchpad.net/juju-core$ juju deploy mongodb22:08
fwereadeerror: cannot put charm: cannot make S3 control bucket: Your previous request to create the named bucket succeeded and you already own it.22:08
niemeyerdavecheney: Do you have access-secret set up in your environment?22:08
niemeyerdavecheney: Perhaps it is assuming it is set22:08
davecheneyniemeyer: nope22:08
niemeyerdavecheney: Try to set it in the environment config22:09
davecheneyniemeyer: i have an admin-secret22:09
davecheneyis that the same thing ?22:09
niemeyerdavecheney: Erm, sorry, that's the one22:09
davecheneyfwereade: i get that error in every non us-east-1 env22:09
davecheney2012/10/11 21:58:58 JUJU machiner: machine agent starting22:09
davecheney2012/10/11 21:58:58 JUJU state: opening state; mongo addresses: ["localhost:37017"]22:09
davecheney2012/10/11 21:58:58 JUJU machiner: unauthorized access22:09
davecheney2012/10/11 21:59:01 JUJU machiner: rerunning machiner22:09
davecheneythe machiner/PA never starts22:09
niemeyerdavecheney: Okay, so we'll need to debug it, or wait until rogpeppe addresses it22:09
fwereadedavecheney, ah balls, isn't that because the namespace is global or something, just a mo22:09
niemeyerdavecheney: What does cloud-init look like?22:09
davecheneyfwereade: yes, all buckets live in the same namespace22:10
davecheneydon't just copy the bucket config from one env to another22:10
davecheneybut even then, it doesn't help22:10
davecheney  ap-southeast-1:22:10
davecheney    type: ec222:10
davecheney    control-bucket: juju-722:10
davecheneyniemeyer: checking cloud init now22:10
fwereadedavecheney, bah, yeah, I appear to be just as screwed as you22:16
davecheneyniemeyer: bootstrapping now22:16
* fwereade slopes back off to kick relations around some more22:16
davecheneyniemeyer:22:22
davecheney2012/10/11 22:20:46 JUJU:DEBUG watcher: loading new events from changelog collection...22:22
davecheney2012/10/11 22:20:46 JUJU storing no-secrets environment configuration22:22
davecheney2012/10/11 22:20:46 JUJU bootstrap-state initial password ""22:22
davecheneyjujud-machine-0 start/running, process 1071122:22
davecheneyand then22:23
davecheney2012/10/11 22:20:46 JUJU machiner: machine agent starting22:23
davecheney2012/10/11 22:20:46 JUJU state: opening state; mongo addresses: ["localhost:37017"]22:23
davecheney2012/10/11 22:20:46 JUJU machiner: unauthorized access22:23
davecheney2012/10/11 22:20:49 JUJU machiner: rerunning machiner22:23
davecheney2012/10/11 22:20:49 JUJU machiner: machine agent starting22:23
niemeyerdavecheney: That looks wrong22:28
niemeyerdavecheney: It should have an initial password22:28
davecheneyniemeyer: let me dig into cloudinit on our side22:28
niemeyerdavecheney: Or just have a look at what it looks like in the server22:29
niemeyerdavecheney: I'm pretty sure rogpeppe addressed that already in theory22:29
davecheneyniemeyer: ironically, juju status works22:29
davecheneyso _i_ can connect to the state22:29
davecheneyjust none of the agents22:29
niemeyerdavecheney: That empty initial password is a good hint of what's wrong22:30
niemeyerdavecheney: It should not be empty22:30
fwereadeniemeyer, offhand, do you know what refcounts exist at the moment in state?22:53
fwereadeniemeyer, I am at the point of needing to have a sane relation lifecycle in place, and probably service too22:54
niemeyerfwereade: I don't think we have any yet22:55
fwereadeniemeyer, am I right in recalling agreement that relation existence (with any life) should block service removal?22:55
fwereadeniemeyer, sorry, service Dyingness even22:55
niemeyerfwereade: The thing we were going to add was machine units in the machine, but we ended up with the units themselves there22:55
niemeyerI mean, unit names22:55
niemeyerfwereade: Yes, that's my understanding as well22:56
fwereadeniemeyer, great22:56
fwereadeniemeyer, except, hm, when I say removal, I actually *mean* EnsureDying22:56
niemeyerfwereade: I hadn't thought of that, but I guess it makes sense22:57
fwereadeniemeyer, so: EnsureDying demands that no relations exist; AddRelation demands that the services be alive; therefore I think I need a relation-count on service22:58
fwereadeniemeyer, sane-sounding?22:58
fwereadeniemeyer, then, similarly, relation.EnsureDead demands that no units are in scope, while unit.EnterScope demands that the relation not be... I *think* Dead, but this reasoning also applies to Dying... so I think I need a units-in-scope-count on relation23:02
niemeyerfwereade: Hmm.. I think there's a chance we could do without, but I guess we need it if we want those semantics23:05
fwereadeniemeyer, and then, I suspect, service.EnsureDead requires no units, while service.AddUnit demands that the service be Alive23:05
fwereadeniemeyer, yeah, I am a little upset that there are so many, but I think that they are the sane way to manage the transactions23:05
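[Editor's sketch, not from the log] The relation-count idea can be illustrated with an in-memory model. Here a mutex stands in for the state transaction: each method checks its assertion and applies its change as a single atomic step. All names are hypothetical, and this is not how juju's state package implements it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Life mirrors the Alive/Dying/Dead lifecycle named in the discussion.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

var (
	errNotAlive     = errors.New("service is not alive")
	errHasRelations = errors.New("service still has relations")
)

// service is a hypothetical in-memory model of a service document.
type service struct {
	mu            sync.Mutex
	life          Life
	relationCount int
}

// addRelation asserts that the service is Alive, then bumps the count.
func (s *service) addRelation() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.life != Alive {
		return errNotAlive
	}
	s.relationCount++
	return nil
}

// removeRelation drops one reference once a relation has gone away.
func (s *service) removeRelation() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.relationCount > 0 {
		s.relationCount--
	}
}

// ensureDying asserts that no relations exist, then marks the service Dying.
func (s *service) ensureDying() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.relationCount != 0 {
		return errHasRelations
	}
	if s.life == Alive {
		s.life = Dying
	}
	return nil
}

func main() {
	s := &service{}
	fmt.Println(s.addRelation()) // relation added while Alive
	fmt.Println(s.ensureDying()) // refused: one relation remains
	s.removeRelation()
	fmt.Println(s.ensureDying()) // accepted: count is back to zero
	fmt.Println(s.addRelation()) // refused: service is no longer Alive
}
```

The same pairing of a precondition with a counter would apply to the units-in-scope count on a relation and to the unit count on a service.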
niemeyerfwereade: What would happen if we allowed EnsureDying on the service?23:05
niemeyerfwereade: EnsureDying in general is a sign that the given entity is dying, with no practical consequences other than things getting into a termination procedure23:06
fwereadeniemeyer, essentially we'd take down all the relations with the service, and I have a vague recollection that we've been requiring explicit relation removal before allowing people to remove a key piece of infrastructure23:06
niemeyerfwereade: It actually doesn't sound too bad to allow it, I think, assuming the practical consequences are positive23:10
fwereadeniemeyer, (I think that if we want remove-service => remove-many-relations, things potentially get somewhat complex as well -- I'm not sure that I can compose a transaction that will successfully remove every relation on a service reliably)23:11
fwereadeniemeyer, (oh, wait, refcount=N; docExists x N)23:12
fwereadeniemeyer, so we don't have to worry about handling it in the uniter23:12
fwereadeniemeyer, I think I would prefer that such drastic action at least require a --force, though23:13
niemeyerfwereade: Sorry, I missed the line there23:13
fwereadeniemeyer, sorry, not quite sure what you saw, let me repaste23:13
fwereade<fwereade> niemeyer, (I think that if we want remove-service => remove-many-relations, things potentially get somewhat complex as well -- I'm not sure that I can compose a transaction that will successfully remove every relation on a service reliably)23:13
fwereade niemeyer, (oh, wait, refcount=N; docExists x N)23:13
fwereade niemeyer, so we don't have to worry about handling it in the uniter23:13
fwereade niemeyer, I think I would prefer that such drastic action at least require a --force, though23:13
niemeyerfwereade: I mean I'm missing your line of thinking there, regarding refcount and doc-exists.. how does that affect things?23:14
fwereadeniemeyer, if we're going to set a service to Dying, I think we should also set all its relations to Dying too, so that the counterpart services clean up nicely23:16
fwereadeniemeyer, I think that if we do a transaction involving setting the service and N relations to Dying, we can assert that:23:16
fwereadeniemeyer, service.relation-count == N23:16
fwereadeniemeyer, and, N times over, that the respective relation document exists23:17
fwereadeniemeyer, and I think be protected against anyone adding a relation to a dying service (assuming they assert service Alive ofc)23:18
niemeyerfwereade: We can't assert that service and relation are both dying on the same transaction23:18
fwereadeniemeyer, ...oh23:18
niemeyerfwereade: Because nothing prevents a new relation from being inserted right before we start to execute that transaction23:19
fwereadeniemeyer, wait, that wasn't what I wanted to assert23:19
niemeyerfwereade: <fwereade> niemeyer, I think that if we do a transaction involving setting the service and N relations to Dying, we can assert that:23:19
niemeyerfwereade: That's what I was referring to23:20
fwereadeniemeyer, I'm setting, not asserting, Dying; is that relevant?23:20
fwereadeniemeyer, actually, I'm not at all sure that I can do that23:22
fwereadeniemeyer, so, ok, if we want to take down all relations with a service (which feels somewhat alarming to me), I think we have to have the uniter handling relation-killing when it detects a dying service -- just as it kills the unit. right?23:23
niemeyerfwereade: I don't yet have a strong opinion on it going either way, so I'm happy to explore the option you feel most comfortable with23:24
niemeyerfwereade: One thing to note, perhaps as a guideline for us,23:24
niemeyerfwereade: is that the interface we offer to the user, doesn't have to match precisely the internal details23:24
niemeyerfwereade: So, let's say we refcount23:26
niemeyerfwereade: and at least for the moment, let's say we prevent people from killing services with relations23:26
fwereade:)23:26
niemeyerfwereade: the at least for the moment is based on the above idea.. we can go this way even if later we decide to change23:27
niemeyerfwereade: Because we can make the remove-service command deal with the manipulation23:27
niemeyerfwereade: and break down in case someone else runs a race, for example23:27
niemeyerfwereade: Either way, continuing23:27
niemeyerfwereade: When do we *allow* the service to be EnsureDying'ed?23:28
niemeyerfwereade: When all relations are Dying, or when all relations are Dead?23:28
fwereadeniemeyer, IIRC we did a long time ago agree that it was when none even existed -- but I think that translates effectively to all Dead23:28
fwereadeniemeyer, however I do not think this is optimal23:28
niemeyerfwereade: Okay, that seems a bit unfriendly23:29
fwereadeniemeyer, Dying, to me, makes a lot more sense23:29
niemeyerfwereade: That's better, but still makes me ponder23:29
fwereadeniemeyer, and this ofc changes the underlying assumptions of what I said above, I think23:29
niemeyerfwereade: Okay, here is a strawman23:31
niemeyerfwereade: We can do both:23:32
niemeyer1) Add refcounting to the service, so we can implement force/non-force logic and error out if we please23:32
fwereadeniemeyer, dying service requires dying relations; dead service requires dead relations?23:32
davecheney% bzr conflicts23:32
davecheneyConflict adding file juju/deploy.go.BASE.  Moved existing file to juju/deploy.go.BASE.moved.23:32
davecheneyConflict adding file juju/deploy.go.OTHER.  Moved existing file to juju/deploy.go.OTHER.moved.23:32
davecheneyConflict adding file juju/deploy.go.THIS.  Moved existing file to juju/deploy.go.THIS.moved.23:32
davecheneynone of those six files exist23:32
fwereadeniemeyer, ah no forget I said anything23:33
fwereadeniemeyer, please continue23:33
niemeyer2) Implement logic that supports dying service with non-dying relations, and terminates properly23:33
niemeyerfwereade: This means, in a way, we have the cake and can eat it as well.. to terminate setting service to Dying is enough, but we can tweak the API to our desires, so we can blow it up or not (perhaps supporting --force)23:34
niemeyerdavecheney: bzr conflicts reports what happened23:35
niemeyerdavecheney: Run "bzr resolved" to notify you've dealt with it23:35
davecheneyniemeyer: bzr couldn't fix it23:35
davecheneyi copied my changes away and did a revert23:35
davecheneyit was caused by switching branches with uncommitted changes23:36
fwereadeniemeyer, ok, this is the return of the corrective agent :)23:36
niemeyerfwereade: Oh, hopefully not23:36
niemeyerfwereade: Why would we need it?23:36
fwereadeniemeyer, well, I don't think we can put it in the unit agent, because we can't know that any units will actually exist to run that logic, I think23:37
fwereadeniemeyer, so I was imagining that that would have to handle (2)23:37
niemeyerfwereade: Who sets service to Dead?23:37
niemeyerfwereade: and then removes it?23:37
fwereadeniemeyer, the unit agent, assuming one exists -- or, I presume, the client, if it can be sure that such action is warranted by reason of there being no units23:38
niemeyerfwereade: Exactly.. whoever removes the last unit is probably the half-answer23:39
niemeyerfwereade: This is the place, and the code path, to delete the relations23:39
fwereadeniemeyer, I don't think we can sanely set service to Dead until *after* all relations are Dead (or gone), can we?23:40
fwereadeniemeyer, otherwise we have a live relation referencing a dead service, and that's not going to end well23:41
niemeyerfwereade: How can we have live relations if all units of one side are gone?23:41
niemeyerfwereade: Well, I guess we have to wait until the other side acknowledges23:42
fwereadeniemeyer, the relation scope may be unoccupied, but that doesn't imply anything about the relation's Life, does it?23:42
niemeyerfwereade: And that sounds fine.. we can backtrack the service removal to the last thing that acks the release of a resource attached to that service23:42
fwereadeniemeyer, yeah, that sounds right23:43
niemeyerfwereade: Hmm..23:43
niemeyerfwereade: We can probably implement something simpler23:44
niemeyerfwereade: Way simpler, in fact23:44
fwereadeniemeyer, this would make me happy23:44
niemeyerfwereade: We can in fact set service and all its relations as dying at once23:44
fwereadeniemeyer, that was what I suggested originally23:44
niemeyerfwereade: Right, and I thought that wasn't possible, but it is23:45
niemeyerfwereade: We have to loop, though23:45
niemeyerfwereade: To avoid races23:45
niemeyerfwereade: We have to assert on the number of relations23:45
niemeyerfwereade: Hmm.. which is perhaps wrong23:45
niemeyerfwereade: Yeah, we can't do that in fact.. :(23:46
* niemeyer thinks about alternatives23:46
* niemeyer => whiteboard23:47
* fwereade looks up at clock23:47
fwereadeniemeyer, sorry to abandon you, but I should call it a night23:47
fwereadeniemeyer, I look forward to continuing the discussion tomorrow23:48
niemeyerfwereade: Okay, it's doable23:48
* fwereade cheers, and decides to stick around for a mo23:49
niemeyerfwereade: We can take the refcount into account in the transaction23:49
niemeyerfwereade: Sorry23:49
niemeyerfwereade: The txn-revno23:49
niemeyerfwereade: To guarantee that the refcount hasn't been incref'd and decref'd in-between23:50
niemeyerfwereade: So we loop, and assert txn-revno is the same, and include service and all relations in the same transaction23:50
niemeyerfwereade: This guarantees that either service dies with all its relations, or we get an ErrAbort23:51
niemeyerfwereade: by die I mean become Dying23:51
fwereadeniemeyer, yeah, I think that makes sense23:51
fwereadeniemeyer, it was what I had been fumbling for with the docExistses23:52
niemeyerfwereade: If we get ErrAbort, we start over and ensure we're still doing a sane transaction23:52
niemeyerfwereade: The problem is that we can assert on non-existence23:52
niemeyerfwereade: Erm, cannot23:52
niemeyerfwereade: So we have to:23:53
niemeyer1) Grab service23:53
niemeyer2) Grab all its relations23:53
niemeyer3) Do a transaction with relations from 2, and include an assertion ensuring that service hasn't changed since (1)23:54
fwereadeniemeyer, yep23:54
niemeyerThat means (2) observed everything (1) knew about23:54
niemeyerand thus (3) is consistent23:54
fwereadeniemeyer, yep, exactly23:54
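The three steps above can be sketched as an optimistic retry loop. This is an in-memory illustration only, not the real state or transaction code: `Store`, `commitDying`, `EnsureServiceDying`, the `Revno` field (standing in for txn-revno) and `ErrAborted` (standing in for the abort error mentioned above) are all assumed names, and the `between` hook exists purely to simulate a racing change.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// ErrAborted stands in for the error returned when an assertion fails.
var ErrAborted = errors.New("transaction aborted")

type Service struct {
	Name  string
	Life  Life
	Revno int // stand-in for txn-revno: bumped on every change to the service
}

type Relation struct {
	Services []string
	Life     Life
}

type Store struct {
	services  map[string]*Service
	relations map[string]*Relation
}

func NewStore() *Store {
	return &Store{services: map[string]*Service{}, relations: map[string]*Relation{}}
}

func (s *Store) AddService(name string) { s.services[name] = &Service{Name: name} }

// AddRelation records the relation and bumps the revno of each service in it,
// modelling the refcount change on the service document.
func (s *Store) AddRelation(key string, services ...string) {
	s.relations[key] = &Relation{Services: services}
	for _, name := range services {
		s.services[name].Revno++
	}
}

// Service returns a snapshot copy of the named service.
func (s *Store) Service(name string) (Service, bool) {
	svc, ok := s.services[name]
	if !ok {
		return Service{}, false
	}
	return *svc, true
}

// RelationKeys returns the sorted keys of all relations involving the service.
func (s *Store) RelationKeys(name string) []string {
	var keys []string
	for key, rel := range s.relations {
		for _, n := range rel.Services {
			if n == name {
				keys = append(keys, key)
				break
			}
		}
	}
	sort.Strings(keys)
	return keys
}

func (s *Store) RelationLife(key string) Life { return s.relations[key].Life }

// commitDying is the "transaction": it asserts the service is unchanged since
// the snapshot, then marks the service and the given relations Dying together.
func (s *Store) commitDying(name string, revno int, keys []string) error {
	svc, ok := s.services[name]
	if !ok || svc.Revno != revno {
		return ErrAborted
	}
	for _, key := range keys {
		if _, ok := s.relations[key]; !ok {
			return ErrAborted
		}
	}
	for _, key := range keys {
		if rel := s.relations[key]; rel.Life == Alive {
			rel.Life = Dying
		}
	}
	svc.Life = Dying
	svc.Revno++
	return nil
}

// EnsureServiceDying follows steps (1)-(3), starting over on ErrAborted.
func (s *Store) EnsureServiceDying(name string, maxAttempts int, between func()) error {
	for i := 0; i < maxAttempts; i++ {
		svc, ok := s.Service(name) // (1) grab service
		if !ok {
			return fmt.Errorf("service %q not found", name)
		}
		if svc.Life != Alive {
			return nil
		}
		keys := s.RelationKeys(name) // (2) grab all its relations
		if between != nil {
			between() // simulation hook: a concurrent writer may act here
		}
		err := s.commitDying(name, svc.Revno, keys) // (3) assert and apply
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrAborted) {
			return err
		}
	}
	return fmt.Errorf("service %q: still racing after %d attempts", name, maxAttempts)
}

func main() {
	st := NewStore()
	st.AddService("wordpress")
	st.AddService("mysql")
	st.AddRelation("wordpress:db mysql:server", "wordpress", "mysql")
	err := st.EnsureServiceDying("wordpress", 3, nil)
	fmt.Println("error:", err, "relations:", st.RelationKeys("wordpress"))
}
```

Because adding a relation bumps the service's revno in this model, a relation created between steps (1) and (3) makes the assertion fail, and the next pass of the loop picks it up in step (2).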
niemeyerfwereade: You can sleep well, I think :-)23:54
fwereadeniemeyer, indeed :)23:55
niemeyerfwereade, davecheney: Btw,23:55
niemeyerfwereade, davecheney: Tomorrow is a national holiday here23:55
niemeyerfwereade, davecheney: And I'll try to relax a bit.. but I'll be around at some point. There's at least one meeting I need to attend at 1:3023:55
fwereadeniemeyer, cool23:56
niemeyerWhich is 16:30 UTC23:56
niemeyermramm: ^^23:56
fwereadeniemeyer, I will surely see you around then :)23:56
niemeyerfwereade: Yeah23:56
fwereadeniemeyer, btw, I would be most grateful if you would take a very quick look at https://codereview.appspot.com/6650043/ before you go23:57
fwereadeniemeyer, blast, I should repropose, it's on top of the old config-changed bits23:58
niemeyerfwereade: Will do23:58
fwereadeniemeyer, just want to propose another first though, just because all it's waiting on is a full test run23:59
