[00:01] wallyworld_: why is bootstrap doing sync tools ? [00:01] this is ec2 [00:01] not sure. it should only do that if it can't find any. i'll take a look [00:02] davecheney: well, it was looking for 1.15 [00:02] which it couldn't find [00:02] so you need --upload-tools [00:02] sure, but 1.12 exists in the s3 bucket [00:02] otherwise it will sync [00:03] but different minor version [00:03] 12 = 15 [00:03] != [00:03] wallyworld_: is that important [00:03] 1.12.0-precise-amd64 tools exist in the public s3 bucket [00:03] this is an ec2 environment [00:03] this is a regression [00:04] no - we can't guarantee 12 is compatible with 15 [00:04] we talked about this remember? [00:04] you asked the change be held till after 1.13 was released [00:04] yes, we talked [00:04] but i don't see how this is related [00:04] i am using 1.15 client [00:04] you are bootstrapping using juju 1.15 right? [00:05] 1.12 tools exist in the public bucket [00:05] why is bootstrap syncing the 1.12 tools [00:05] shouldn't it refuse to run saying there are no 1.15 tools ? [00:05] sync tools is dumb [00:05] it needs work [00:05] should I raise a bug ? [00:05] it is doing what it has always done and grab whatever tools it can find - the algorithm needs work [00:06] the problem is that folks in CTS will be using the devel version and they won't understand why their bootstrap times on ec2 have blown out [00:06] sure, raise a bug - it's being worked on as we speak [00:06] as part of the tools work [00:06] wallyworld_: i thought the change was exact match or TFO [00:06] TFO? [00:08] bootstrap requires an exact match, but if it can't find one, it does a sync tools. and sync tools needs some love. 
it's all in progress [00:09] i guess the expectation was that people running from source ie dev version, should always know to do an upload-tools [00:09] upload-tools = problem solved [00:10] /s/FTO/GTFO [00:11] ok, upload-tools it is [00:12] sorry about that - upload tools is a consequence of exact version match [00:12] i think it should do it automatically for dev versions [00:12] with an appropriate message [00:17] wallyworld_: 2013-09-10 00:15:00 INFO juju.environs.tools tools.go:131 filtering tools by series: precise [00:17] 2013-09-10 00:15:00 INFO juju.environs.tools tools.go:53 no architecture specified when finding tools, looking for any [00:17] ^ why does it say that [00:17] amd64 is the default [00:18] davecheney: the tools finding code it calling "findtools" without specifying an architecture - that's how it's always been. but now, with the simplestreams metadata, tools info is keyed of arch and series, so it logs that message [00:19] so the code which calls FindTools needs to be looked at to see if it can get access to an arch to pass in [00:19] davecheney: where is the default arch specified - i can't recall right now [00:20] and if amd64 is not found, doesn't it default back to i386? [00:20] in which case passing in nothing for arch is ok? since it will look for tools matching any arch? [00:23] bigjools: ffs. gwacl panics when asked for the url of a non-existent file :-( [00:24] wallyworld_: dunno, sounds wrong [00:24] the default is precise/amd64 [00:24] and the tools lookup clearly shows it is scoping by precise [00:24] wallyworld_: \o/ [00:24] it shold also scope by amd64 [00:25] davecheney: probs. but i'm not sure how arm or i386 would be specified [00:25] wallyworld_: I hear you're an expert in Go, you could fix it! [00:25] bigjools: have i told you today? [00:25] lol [00:25] actual lol [00:25] wallyworld_: ok [00:25] good point [01:06] bigjools: can you add me the the gwacl-hackers team, and +1 this? 
https://code.launchpad.net/~wallyworld/gwacl/storage-error-not-panic/+merge/184704 [01:06] wallyworld_: yes yes [01:06] bigjools: thanks, and we use the bot is assume? [01:06] ie just set to approved and the bot will work [01:06] wallyworld_: I hope so. [01:07] we'll find out :-) [01:07] wallyworld_: it requires one +1 review [01:07] juju-core does now too [01:07] for a little while [01:10] wallyworld_: you need to run " make format" [01:10] bigjools: i ran go fmt and it fucked everything [01:10] i had to revert and start again :-( [01:10] wallyworld_: yes, use make format :) [01:10] WHY??? [01:10] what's wrong with go fmt [01:10] because [01:10] because tabs are fucking evil [01:10] that is the problem with standards; everyone wants their own [01:11] there are so many to choose from [01:11] yeah agree. wish i had known that [01:11] ... the Aristocrats! [01:11] before if*cked myself [01:11] bzr has this really useful "commit" thing [01:12] bigjools: yes, but one does not expect to have to run it before doing a go fmt [01:12] changes pushed [01:13] wallyworld_: I never ever trust tools [01:13] ever [01:13] bigjools: that's why i don't trust you [01:13] and he's here ALL WEEK [01:13] yep :-D [01:17] wallyworld_: anyway I already told you once we used 4 space indents in gwacl and you said that was awesome [01:18] bigjools: and you expect me to remember that after many bottles of red have passed under the bridge? [01:18] and months of "go fmt" muscle memory [01:18] hahaha [01:18] we left our present [01:19] brb, someone just rang my doorbell [01:20] hey wait, why is that bag on fire [01:20] snork [01:23] o/ [01:23] * thumper makes a smoothie for lunch [01:27] davecheney: you wanna LGTM this? https://codereview.appspot.com/13620043/ [01:30] I wanna [01:30] done [01:30] LGTM [01:30] fire when ready [01:31] many thanks [01:32] thank you sir [02:13] wallyworld_: ping [02:13] hi [02:14] wallyworld_: got a few minutes for a hangout? 
[02:14] thumper: sure, just give me 5 [02:20] thumper: ok, free now [02:20] ok [02:21] wallyworld_: https://plus.google.com/hangouts/_/6b1e2ea0df710aa62ee610909253df3e89de4c9b?hl=en [03:21] somehow my branch revision just disappeared :( [03:21] * axw_ starts the null provider again [03:23] ?! [03:23] axw_: disappeared how? [03:23] nfi. [03:23] I committed [03:23] then did a go build [03:23] then my files are all gone, and the revision is back where it was before [03:24] fortunately all the hard stuff is already uploaded [03:33] weird === axw__ is now known as axw [04:22] axw: ing [04:22] p [04:24] thumper: pong [04:24] axw: are you backporting bug 1222664 to the 1.14 branch? [04:24] <_mup_> Bug #1222664: maas provider's instance is not a Stringer

[04:24] not this very moment, but I will [04:24] if you are busy [04:24] I can do it [04:24] won't take long at all [04:24] then you can ack it [04:25] if you don't mind, trying to unbugger my stream [04:25] kk [04:25] will do, right now [04:25] thanks [04:27] happy birthday thumper! Today is also my son's birthday. [04:28] jam: thanks, and happy birthday to your son :) [04:28] outed! happy birthday thumper [04:29] you sly bugger [04:33] axw, davecheney: https://code.launchpad.net/~thumper/juju-core/backport-bug-1222664-to-1.14/+merge/184718 I've kept it out of lbox for simplicity [04:33] thumper: approved, thank you [04:33] np [04:34] * thumper looks at the last issue [04:44] davecheney: ping [04:44] thumper: looking [04:44] davecheney: I don't need you to look [04:44] I need to talk to you [04:44] ack [04:44] davecheney: this one with the race to start [04:44] yup [04:44] you talk about the provisioner, but I don't see that in the logs [04:44] do you mean the state connection? [04:45] this is the machiner on machine 0 [04:45] which has the provisioner worker [04:45] my terminilogy is old [04:45] you mean the machine agent [04:45] the provisioner is a task, as is the machiner [04:46] * thumper thinks how to solve this sanely [04:46] thumper: the agent ? [04:46] is trying to conncet to a service that it hosts itself [04:46] hence, a gordian knot [04:46] * thumper nods [04:47] I think I have a plan [04:47] * thumper tries it [04:47] i guess that plan doesn't include making the api server a different process ? [04:48] https://bugs.launchpad.net/juju-core/+bug/1218329 [04:48] <_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series

[04:48] ^ is this really fix committed on 1.14 ? [04:48] i see no branch [04:49] I didn't branch [04:49] davecheney: yes it is there [04:49] I checked the last commit [04:49] I'll help axw understand the process later [04:49] when I'm done [04:49] ok [04:49] cool [04:49] perhaps we should write up a backporting howto [04:49] just checking [04:49] and shove it in the tree [04:49] davecheney: I see the problem with the machine agent [04:49] * thumper thinks [04:52] davecheney: bug #1218329 was landed, let me see if I can dig up the branch [04:52] <_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series

[04:53] jam: i can see the commit, it'll do [04:53] davecheney: https://code.launchpad.net/~axwalk/juju-core/lp1218329-azure-released-images/+merge/183381 [04:53] it is pretty small :) [04:54] jam: that is not the commit [04:54] that is the trunk commit [04:55] davecheney: I don't quite understand your "that is not the commit". The specific fix for bug 1218329 is just to change those 3 lines. [04:55] <_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series

[04:56] it is about having a good "juju init" value for azure, which was delayed until we had proper images upload to azure [04:57] jam: i am lookin for the mp for the merge onto the 1.14 branch [04:58] davecheney: it hasn't ever landed on 1.14 [04:58] Fix Committed is landed-in-trunk [04:59] jam: it's there, I just didn't do whatever the normal procedure is [04:59] jam: ok, i'm trying to get to the bottom of why it says ' [04:59] davecheney: http://bazaar.launchpad.net/~juju/juju-core/1.14/revision/1738 [04:59] thank you [05:00] i cannot find a mp for that [05:00] so i cannot link it to the issue [05:00] davecheney: there isn't one [05:00] that was my problem [05:00] according to axw [05:00] * axw nods [05:00] one problem with making the series branches not owned by the bot, is that people end up with direct commit access. [05:01] jam: how can we fix that ? [05:01] davecheney: we can change the owner and I can add the bot to controlling that branch [05:01] excellent, sgtm [05:03] thumper: is there a way to change the owner of a branch to a person where you aren't the direct owner? [05:03] davecheney: worst case, I create a bot branch, and just change what "lp:juju-core/1.14" points to [05:03] sgtm [05:04] jam: not sure I understand [05:04] but right now, I only have groups *I'm* in as potential owners, and the bot is intentionally not in ~juju so he can't touch things he hasn't been given direct access to. [05:04] thumper: I want the go-bot to own an existing branch [05:04] and the go-bot is a person? 
[05:04] but he isn't in ~juju, and I'm not the bot if I'm ~jameinel [05:04] thumper: yes [05:04] jam: you need to get an lp-admin to do that [05:04] you can't [05:04] thumper: then I'll just create another branch [05:04] you *used* to be able to hand branches to people [05:04] however [05:04] but I think that was considered a csecurity hole [05:04] you can pass it through an intermediate team [05:04] if you are both a member of the same team [05:05] you change it to the team [05:05] they change from the team to themselves [05:05] yes, it was a security hole [05:05] we closed it [05:06] thumper: I thought the plan was to just have confirmation by both parties [05:06] so someone gives it to you, but you have to 'accept' it. [05:06] but meh [05:06] nah [05:06] too much work [05:14] davecheney: how do I ask if a channel is closed? [05:14] thumper: you cannot [05:14] we normally use a tomb to provide that [05:15] select on a closed channel succeeds immediately yes? [05:18] yes [05:18] select { [05:18] case c, ok := <- ch: ; if !ok // closed [05:18] you can also do [05:19] c, ok := <- ch outside select and it will block until ch is closed [05:19] but there is no isClosed(ch) builtin [05:20] c, ok := <- ch will return if closed or something on the channel, right? [05:20] y [05:27] davecheney: https://code.launchpad.net/juju-core/1.14 is now pointing at ~go-bot/juju-core/1.14 [05:28] thumper: I deleted the ~juju/juju-core/1.14 branch, so it deleted your MP against it. But I really didn't want to have 2 1.14 branches that might cause confusion [05:28] ok [05:28] it had landed [05:28] thumper: yeah, if you want it for posterity, you can submit it against the new branch and mark it merged :) [05:29] nah, don't care [05:33] davecheney: so the change from ian was explicitly requested from fwereade_ about "juju bootstrap" will only support exact Major.Minor versions. 
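Editor's sketch of the channel exchange at 05:14-05:20: Go has no isClosed(ch) builtin, but the two-value receive reports closure. The helper names drain and pollOnce are hypothetical, chosen only for illustration. Note that a plain `c, ok := <-ch` returns as soon as a value is available or the channel is closed, whichever comes first.

```go
package main

import "fmt"

// drain receives from ch until it is closed and returns every value seen.
// The two-value receive sets ok to false once ch is closed and empty.
func drain(ch <-chan int) []int {
	var got []int
	for {
		v, ok := <-ch
		if !ok {
			return got // channel closed, nothing left to read
		}
		got = append(got, v)
	}
}

// pollOnce (hypothetical helper) checks ch without blocking.
// It reports whether a value was received and whether ch is closed.
// Caution: when a value is pending, this consumes it.
func pollOnce(ch <-chan int) (value int, received, closed bool) {
	select {
	case v, ok := <-ch:
		if !ok {
			return 0, false, true
		}
		return v, true, false
	default:
		// Nothing ready and not closed: the channel is still open.
		return 0, false, false
	}
}

func main() {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch)

	fmt.Println(drain(ch))
	_, _, closed := pollOnce(ch)
	fmt.Println(closed)
}
```

Because polling consumes a pending value, a dedicated signal such as the tomb mentioned at 05:14 is usually a cleaner way to ask "has this stopped?" than probing a data channel.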
[05:33] so for dev releases you have to use '--upload-tools' [05:34] because of the chance that we break bootstrap between minor versions (which we've done) [05:34] davecheney: I'm happy to have you in the conversation (I'm more on the "lets make it easier to be cross version compatible, rather than stricter"). And it doesn't make a *huge* difference for finally released things. [05:35] but it does mean you can't use dev to bootstrap stable [05:41] mgz: when you're up and around, I need to get some info from you to control the go-bot config. (You can't access the root bot with just the admin password, you have to have the cert.pem files as well) [05:41] mgz: also, I can't just test CACerts by renaming my own certs, because what we need to test is the bootstrap on the remote machine and "wget" *there* not on my local machine. [05:42] (I can set up a test-suite HTTPS server with self-signed certs pretty easily, so I have 'local' testing done) [05:51] * thumper gives up on subtlety [05:58] * thumper adds a sleep [06:02] thumper: how did you arrive at 3s ? [06:02] :) [06:02] would it be easier to just retry 1 time to connect rather than sleeping? [06:02] as in, try to connect for 3s [06:02] rather than sleep [06:02] have you looked at the code.. [06:03] hmm... [06:04] perhaps [06:04] thumper: line 147 [06:04] honestly, I didn't think of it from that side :) [06:04] when it says "failed to connect" error, just retry 1 time, or retry for 3s or whatever [06:04] and EOD now [06:04] an exercise for the reader [07:50] mornin' all [07:51] morning [07:51] morning [07:52] thumper not around anymore? hmm, ok, have to congratulate him later. [08:00] jam: ping [08:03] axw: do you know anything about the background for this, by any chance? https://codereview.appspot.com/13412047 [08:04] rogpeppe1: I think there's a race between the API server starting up, and something trying to connect to it; the connection fails, and brings down the process [08:04] or something like that. 
[08:04] axw: that's what the comment in the CL seems to imply, but i can't quite see how it can actually happen [08:04] * axw looks [08:05] axw: AFAICS if the API worker fails to connect to the API, it will finish with a non-fatal error, and be restarted after a little while by the top level runner [08:06] axw: that's how i *intended* it to work anyway :-) [08:08] hey rogpeppe1 [08:08] jam: hiya [08:08] jam: see my question to axw above [08:08] rogpeppe1: right now, when the APIWorker fails it restarts everything [08:08] https://bugs.launchpad.net/ubuntu/+source/charm-tools/+bug/1223225 - this needs fixing. [08:08] there is a bug on it [08:08] <_mup_> Bug #1223225: charm-tools needs to stop recommending juju [08:09] jam: really? [08:09] jam: if the connection to the API fails, it should not restart everything [08:09] rogpeppe1: bug #1220027 [08:09] <_mup_> Bug #1220027: worker/provisioner: cannot restart cleanly due to hard dependency on api server

[08:09] jam: the top level runner does not use allFatal [08:09] rogpeppe1: the openAPIAs code has DialTimeout of 0 [08:09] so it always restarts [08:10] jam: that should be fine, because the top level runner does *not* exit when one of its tasks exits [08:10] jam: so the APIWorker task should be restarted until the API server is available, no? [08:11] rogpeppe1: the issue is the "provisioner" is triggering restarts because it can't connect to the API [08:11] that is what the bug report claims [08:11] jam: why is the provisioner trying to connect to the API? [08:11] rogpeppe1: I haven't debugged this [08:12] I'm just conveying the context I have so far [08:12] jam: shouldn't it be using the API connection that is opened by openAPIState? [08:12] jam: sure, thanks [08:12] rogpeppe1: I see in worker.Runner.run code that does check isFatal [08:12] rogpeppe1: state.OpenState (something) returns both an api and state connection [08:12] am I looking at the right thing? [08:12] everything needs a connection to the api [08:12] maybe also the api [08:12] axw: isFatal is slightly-less than allFatal [08:12] jam: well, it then goes and calls killAll(workers) [08:12] davecheney: yeah, the API also (initially) needs a connection to the API, except on the bootstrap node, obviously [08:13] axw: see the definition of isFatal in jujud/agent.go for which errors are considered fatal at the top level [08:13] rogpeppe1: this creapt in late in the 1.11.x cycle [08:14] it is present in 1.12.0 [08:14] davecheney: what crept in, sorry? [08:14] rogpeppe1: the race condition triggering restarts [08:14] rogpeppe1: ah sorry, I misread your statement before - I thought you said it didn't call isFatal [08:15] mgz: poke [08:15] mgz [08:15] jam, davecheney: i'm not sure that https://bugs.launchpad.net/juju-core/+bug/1220027 is a bug at all [08:15] <_mup_> Bug #1220027: worker/provisioner: cannot restart cleanly due to hard dependency on api server

[08:16] jam, davecheney: i *think* it's working as intended [08:16] In case you are actually around now: [08:16] (9:41:32) jam: mgz: when you're up and around, I need to get some info from you to control the go-bot config. (You can't access the root bot with just the admin password, you have to have the cert.pem files as well) [08:16] (9:42:00) jam: mgz: also, I can't just test CACerts by renaming my own certs, because what we need to test is the bootstrap on the remote machine and "wget" *there* not on my local machine. [08:17] rogpeppe1: "but it causes extended delays" sounds like it is more than a 3s [08:17] delay [08:17] jam: it does, yes. but i don't see more than a 3 second delay in the bug report [08:18] davecheney: you've marked the bug as "papercut" - could you explain a bit more about why it's a particular problem? [08:18] rogpeppe1: so davecheney ran into this at someone's site (hence the papercut issue) so I'm guessing there could be more background. I agree that the particular log snippet only takes 3s from start to finish, but note the first line is "restarting "state" [08:18] rogpeppe1: I think the first line is pretty key [08:19] something caused the "state" worker to restart [08:19] 1:46:09 is restarting state in 3s, then 1:46:11 is restarting api in 3s, then 1:46:12 is starting "state" again. [08:19] rogpeppe1: papercut is (one of the many) bugs used by various parts of the company to indicate that this bug affects customers [08:20] the name is the least describtive, but comes from SABFDL [08:20] davecheney: yeah, i'm aware of the name - i just wondered how this was affecting customers [08:20] rogpeppe1: the process will eventually start up correctly, as the orer of job manage environ jobs is not specified [08:20] TheMue: as you're on call today, can you look at my goose branches? 
https://codereview.appspot.com/13379047/ and https://codereview.appspot.com/13396048/ [08:21] jam: yeah, that's odd, and something which probably isn't addressed by the proposed fix AFAICS [08:21] jam: sure [08:21] TheMue: thanks [08:21] davecheney: have you got the full log from when the problem happens? [08:21] davecheney: I think what rogpeppe1 is getting at, is that the bouncing provisioner might be a symptom rather than a cause. [08:21] however rogpeppe1: if the provisioner fails to start, won't it bounce the "state" worker, [08:21] causing the API server to bounce [08:22] causing them to get into a restart dance? [08:22] so the issue is that something that *isn't* being run by the APIWorker is depending on the API, which then kills the API servere [08:22] server [08:22] so it can't start itself. [08:22] jam: could you please take a look at https://codereview.appspot.com/13522043/ for me? need a lgtm here too. ;) [08:23] jam: ah, that seems wrong [08:23] jam: i thought the provisioner only talked to state [08:23] rogpeppe1: you can generate one for yourself [08:23] just deploy a few services on ec2 [08:24] and watch the machine-0 log [08:24] davecheney: it happens every time you deploy a service? [08:24] TheMue: so that one is "support empty strings locally as meaning "set this to the empty strings", but preserve the API path which means "empty strings ==> default value" ? === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: TheMue | Bugs: 9 Critical, 122 High - https://bugs.launchpad.net/juju-core/ [08:25] jam: yep [08:26] rogpeppe1: pretty reliably at the moment [08:27] davecheney: hmm, weird - i will take a deeper look. [08:27] rogpeppe1: ta [08:30] jam: both reviewed [08:34] fwereade_: rogpeppe1, TheMue: I'm not going to make the standup today (it is my son's 6th b-day party at school). so you can make dimitern crack the whip if you want. 
[08:34] jam: ok [08:34] jam: ok, have sweet-cake fun :-) [08:34] jam: and enjoy the b-day [08:39] ahasenack: ping [08:39] axw: hi [08:39] ahasenack: hi. stupid question - how did the /etc/init/juju* files get into your clean container? [08:39] axw: oh, that is run inside the container? [08:39] axw: I ran that on my host [08:39] ahh [08:40] yes, on the container [08:40] I'll have to run again then [08:40] axw: so my env is bootstrapped using lxc [08:40] ahasenack: ok, that'll be a problem. you can't add-machine an already bootstrapped machine [08:40] axw: I will bring up a new container with lxc-create and lxc-start, one that was never touched by juju [08:40] cool [08:40]