[00:13] <wallyworld> thumper: do you know how to kick tarmac in the balls to reset it?
[00:13] <thumper> wallyworld: sorry, no
[00:14] <wallyworld> ok. ta. ffs it seems i can never just land a branch without hassle
[00:17] <thumper> poos
[00:18] <thumper> I want the opposite of updateSecrets
[00:18] <thumper> I want to get some settings from the EnvironConfig
[00:18] <thumper> and put it into the local config
[00:18]  * thumper thinks
[00:20]  * thumper resolves to just use default hard coded port numbers
[00:20] <thumper> and allow people to override them in the config
[00:20] <thumper> too hard to be dynamic here
[01:04]  * thumper does a little dance
[01:05] <thumper> wallyworld: http://paste.ubuntu.com/5879343/
[01:06] <wallyworld> thumper: \o/
[01:06] <wallyworld> as Borat would say, niiiiiiiice
[01:08] <thumper> wallyworld: now... to break it up, land it and have some tests...
[01:08] <thumper> it has been super hacky so far
[01:08] <thumper> davecheney: local provider is working
[01:08] <wallyworld> it runs, who needs tests :-P
[01:08] <thumper> wallyworld: it is actually pretty hard to test some of this
[01:09] <thumper> as it requires root
[01:09] <wallyworld> yeah :-(
[01:09] <thumper> so test around the edges where you can
[01:09] <thumper> and have some form of live tests
[01:12]  * thumper goes to make lunch for the minions
[02:41] <thumper> wallyworld: didn't you write a function somewhere to get the container type from a machine id?
[02:41] <wallyworld> sorry, what's the context?
[02:42] <wallyworld> i've worked on 5 branches today
[02:43] <thumper> s'ok, found it
[02:43] <thumper> ContainerTypeFromId
[02:43] <wallyworld> cool
[02:47] <wallyworld> thumper: i'm dumb. i read your question as "why didn't you write...."  and was thinking that i thought i had. clearly my brain is dying
[02:48] <thumper> heh
[02:48] <thumper> you had already
[02:49]  * thumper enfixorates the ensure lxc bit
[02:50]  * thumper starts teasing apart the threads of the local provider work to land bits independently
[02:51] <thumper> hmm  7 pipes unlanded already
[02:52] <wallyworld> is that all? :-P
[02:52] <wallyworld> go on, try for 10
[02:52] <thumper> wallyworld: it probably will be by the time I've done the teasing
[02:52] <thumper> wallyworld: but IT WORKS!!!!
[02:52] <thumper> so far
[02:53] <wallyworld> thumper: so you sorted out the addressing for local?
[02:53]  * thumper wonders how to stress test it
[02:53] <thumper> what do you mean?
[02:53] <wallyworld> i thought you had to tell the containers what ip address they could use
[02:54] <wallyworld> or make the containers to look methods not yet existing on state.machine
[02:55] <thumper> wallyworld: no, not for the local provider
[02:55] <thumper> wallyworld: the default lxc settings work fine
[02:55] <thumper> wallyworld: the problem is having the user, on the outside of the containers getting the ip addresses of the containers
[02:55] <wallyworld> ah ok. so they all just use the localhost address?
[02:56] <thumper> wallyworld: instead of fucking around with lxc to get it from the outside
[02:56] <thumper> wallyworld: I want to fix it in state
[02:56] <thumper> wallyworld: no, they use a 10.0.3.0/24 address
[02:56] <thumper> wallyworld: which is routed through 10.0.3.1 bridge
[02:56] <wallyworld> ok
[02:57] <thumper> hence my email that I sent a few minutes ago
[02:57]  * wallyworld hits refresh
[03:18]  * thumper waits while lbox does its thing
[03:18] <thumper> this thing that lp does too
[03:20] <thumper>  bit 1: https://codereview.appspot.com/11321043
[03:28] <thumper> bit 2: https://codereview.appspot.com/11319044
[03:51] <thumper> bit 3: https://codereview.appspot.com/11325043
[03:51] <jam> thumper: but lbox does it synchronously and less accurately. (though you do know when it has actually finished, vs async in lp)
[03:52] <thumper> hi jam
[03:52] <thumper> when are you off for holidays?
[03:53] <jam> thumper: my flight is tomorrow evening.
[03:53] <jam> so I'll be working some tomorrow, but maybe not a full day
[03:54]  * thumper nods
[03:54]  * thumper goes to make coffee and take a break
[03:54] <thumper> will submit bit 4 RSN
[04:02] <wallyworld> jam: hi, tarmac is fooked and i don't know how to kick it in the guts
[04:02] <jam> wallyworld: can you point me to context?
[04:03] <wallyworld> jam: i approved a mp and nothing happened. the tarmac log says there was a bzr error merging (from memory, can't recall exactly), and now it appears to not even be trying to look at any new approvals to process
[04:04] <wallyworld> i at least wanted just to kick start it again
[04:08] <jam> wallyworld: it is running, but it just keeps failing on your proposal: http://paste.ubuntu.com/5879636/
[04:08] <wallyworld> jam: that timestamp is hours old
[04:09] <jam> wallyworld: UTC?
[04:09] <wallyworld> it only failed once and then stopped processing
[04:09] <jam> 4 hours
[04:09] <wallyworld> i think it was about 4 hours ago from memory
[04:10] <wallyworld> jam: and it was the 2nd try because the first run failed some tests
[04:11] <jam> wallyworld: yeah, date says 4 hrs ago
[04:11] <jam> I don't see anything running, which is strange, cron should still be firing off tarmac every minute
[04:11] <wallyworld> yes it is weird
[04:12] <jam> wallyworld: we use "flock ...." as a way to avoid having 2 tarmac processes running concurrently (you can use crontab -l to see it). I did a lot of searching, but I didn't see any processes running flock or python.
[04:12] <jam> So I just deleted the lock file
[04:12] <jam> and it is running now.
[04:12] <jam> It is going for ~jtv's code
[04:12] <wallyworld> hmm. ok. i didn't realise that's what we did
[04:12] <wallyworld> makes sense
[04:12] <wallyworld> so the error handling needs improving
[04:13] <wallyworld> to always release the lock
[04:13] <jam> wallyworld: it is the "flock" process outside of tarmac
[04:13] <jam> I didn't think we could get that wrong
[04:13] <jam> as it starts a process, and when that process dies it unlocks
[04:14] <jam> wallyworld: http://paste.ubuntu.com/5879648/
[04:14] <jam> from man flock
[04:14] <jam> makes me think it might have left a mongo or something running.
[04:15] <wallyworld> yeah, wouldn't be surprised
[04:15] <jam> wallyworld: --wait 3600 maybe? (so we auto-drop flock after 1 hour) not sure on that one.
[04:16] <wallyworld> an hour sounds reasonable, maybe even less
[04:16] <wallyworld> hopefully this won't happen too often
[04:16] <jam> wallyworld: so the test suite should only take 15min or so, but we might run multiple
[04:16] <jam> and I don't think flock actually kills the subprocess.
[04:16] <jam> it would just stop locking
[04:19] <jam> wallyworld: any ideas what would be non-ascii with what you submitted?
[04:19] <jam> ISTR there were also nonascii patches because gcc was outputting Unicode sequences.
[04:20] <wallyworld> jam: no, that's just the thing. it worked first time, some tests failed, and i just re-approved
[04:20] <jam> https://bugs.launchpad.net/tarmac/+bug/750930
[04:20] <_mup_> Bug #750930: breaks on non-ascii characters in verify_command output on failure <Tarmac:In Progress by jameinel> <https://launchpad.net/bugs/750930>
[04:20] <jam> that is my patch, which appears to still not have landed...
[04:20] <wallyworld> ah, i did push up a small change
[04:20] <wallyworld> i'll check the logs
[04:23] <wallyworld> jam: it was a one line fix to add "jc." in front of IsTrue cause a trunk change while the branch was in review broke my test
[04:23] <wallyworld> so i don't see any non-ascii in there
[04:24] <jam> wallyworld: I have a feeling it could be a gcc sort of issue (gcc uses fancy quotes if your terminal is marked UTF-8)
[04:24] <jam> so maybe something with gwacl or something
[04:24] <jam> anyway, I have the patch, not sure what the issue is, unfortunately.
[04:24] <jam> I'm putting together a branch for us
[04:24] <jam> because lp:tarmac is owned by rockstar only
[04:25] <jam> I already have a couple local-only patches, and I don't want to make that worse.
[04:25] <jam> the local patches are just logging changes
[04:25] <wallyworld> jam: ah, there was a gofmt change in code i didn't touch. the diff in loggerhead shows "nothing" changed, so it could have been a tab
[04:26] <wallyworld> or something
[04:26] <jam> wallyworld: well a tab is still ascii
[04:26] <jam> it sounds like a build that failed
[04:26] <wallyworld> true
[04:26] <thumper> bit 4: https://codereview.appspot.com/11326043
[04:29] <jam> wallyworld: note that the code that failed indicated it was trying to mark the proposal as failed
[04:29] <wallyworld> hmm. ok. i'll rerun the tests
[04:53] <wallyworld> jam: just got an email - it ran and merged ok that time
[04:53] <wallyworld> i didn't change anything
[05:10] <thumper> bit 5: https://codereview.appspot.com/11327043
[05:16] <thumper> bit 6:  https://codereview.appspot.com/11327044
[06:27] <thumper> bit 7: https://codereview.appspot.com/11330043
[06:28] <wallyworld> jam: tarmac hates me. this time it appears to be stuck on my branch because i did the prereq myself. so i guess i have to repropose against trunk and adjust all the downstream branches?
[06:34] <thumper> bit 8: https://codereview.appspot.com/11333043
[06:35] <thumper> and that last one enables the local provider
[06:35]  * thumper is done for the day
[06:35] <thumper> plz review nicely :)
[06:35] <thumper> laters...
[06:43] <davecheney> is anyone able to bootstrap on ec2 ?
[06:43] <davecheney> i'm seeing bootstrap nodes stillborn
[06:56] <wallyworld> davecheney: i did earlier today
[06:56] <davecheney> just sits there waiting for the mgo server to come up
[06:56] <davecheney> can't see to the instance either
[06:57] <wallyworld> hmmm. not sure sorry :-(
[07:04] <jam> wallyworld: you can just propose the prereq and manually mark it merged.
[07:05] <jam> davecheney: I was bootstrapping a lot yesterday, but I have not tried it today.
[07:34] <davecheney> jam: wallyworld i cannot bootstrap in ap-southeast-2
[07:34] <davecheney> ap-southeast-1 works
[07:35] <davecheney> -2 results in an instance that is running but does not respond at all on the network
[07:35] <davecheney> no ssh no nothing
[07:35] <jam> davecheney: can you get the boot information from ec2?
[07:37] <davecheney> oh, and it looks like destroy-environment doesn't work either
[07:37] <jam> davecheney: well if you can't read the s3 bucket, you can't destroy stuff, I think.
[07:38] <jam> It *sounds* like an ec2 side issue, but I could certainly be wrong.
[07:38] <davecheney> jam: get system log looks fine
[07:38] <davecheney> jam: if it was a bucket issue, --upload-tools would have failed
[07:38] <davecheney> they are the same bucket
[07:38] <jam> true enough
[07:42] <davecheney> hang on, my ap-southeast-1 is set to raring
[07:42] <davecheney> ... i wonder if I set it to precise
[07:42] <davecheney> ...
[07:48] <davecheney> sigh - no
[07:49] <davecheney> it's just ap-southeast-2 is busted today
[07:50] <rogpeppe> davecheney: hiya
[07:50] <rogpeppe> mornin' all
[07:51] <davecheney> rogpeppe: howdy
[07:53] <rogpeppe> davecheney: just wondering: if you needed to reinstall your laptop ubuntu, what would you use for backup and restore? just tar?
[07:55] <rogpeppe> davecheney: my laptop is in a bad state since upgrading to 13.04 and it's possible that reinstalling might fix things
[07:55] <davecheney> rogpeppe: i just tar up ~
[07:55] <davecheney> but really you want to avoid all the dot files shit in ~
[07:56] <rogpeppe> davecheney: i'm just concerned that i'll lose all the stuff outside $HOME that i've accumulated over the years.
[07:56] <davecheney> rogpeppe: ahh, i don't ever step outside $HOME for that reason
[07:56] <rogpeppe> davecheney: i'm thinking mostly of apt-get stuff
[07:57] <rogpeppe> davecheney: but i guess i can just stumble along until i find something missing, then apt-get it
[07:57] <davecheney> dpkg -l | grep ^ii
[07:58] <rogpeppe> davecheney: ah, cool. what's the significance of "ii" ?
[07:59] <davecheney> ii == installed
[07:59] <davecheney> lots of other stuff in there as turds
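
The `dpkg -l | grep ^ii` filter above keeps rows whose status column is "ii" (desired state install, current state installed). A rough sketch of the same filter in Go, assuming the usual whitespace-separated `dpkg -l` columns; the function name `installedPackages` and the sample rows are made up for illustration and are not output from a real machine:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// installedPackages returns the package names from `dpkg -l` style output
// whose status column is exactly "ii".
func installedPackages(dpkgOutput string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(dpkgOutput))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// A package row has at least: status, name, version.
		if len(fields) >= 3 && fields[0] == "ii" {
			names = append(names, fields[1])
		}
	}
	return names
}

// sampleOutput is illustrative data only (hypothetical rows).
const sampleOutput = `Desired=Unknown/Install/Remove/Purge/Hold
||/ Name     Version   Description
+++-========-=========-===========
ii  bzr      2.6.0     example row
rc  oldpkg   1.0       removed, config files remain
ii  golang   1.1       example row
`

func main() {
	for _, name := range installedPackages(sampleOutput) {
		fmt.Println(name)
	}
}
```

Rows such as "rc" (removed, config files left behind) are the leftovers that the filter drops.
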
[07:59] <rogpeppe> davecheney: and do you know of any way i can ask for only those packages which aren't depended on by others
[07:59] <rogpeppe> ?
[08:00] <davecheney> no, but jam or tim will know
[08:00] <rogpeppe> davecheney: something which could produce output suitable for tsort might work
[08:01] <davecheney> google says debtree
[08:01] <TheMue> morning
[08:01] <jam> I know you can ask for rdepends I believe, but I don't know a specific way to say "give me the list of packages explicitly installed ignoring their dependencies". I know it is possible, because when you "apt-get remove" it can tell you "these packages are no longer required"
[08:01] <jam> I just don't know it.
[08:10] <rogpeppe> hmm, the solution here looks plausible (though i'm sure i haven't explicitly installed *all* the 2080 packages i get when applying the first answer)
[08:10] <rogpeppe> http://unix.stackexchange.com/questions/3595/ubuntu-list-explicitly-installed-packages/3624#3624
[08:11] <TheMue> rogpeppe: what problem do you have? is your system in trouble?
[08:12] <rogpeppe> TheMue: yeah, various pieces of the system are broken in weird ways
[08:12] <rogpeppe> TheMue: it's possible that it's a hardware issue
[08:12] <rogpeppe> TheMue: but i have to try a fresh install first
[08:13] <TheMue> rogpeppe: iiirgks, that really doesn't sound good
[08:13] <TheMue> rogpeppe: i had to cleanup my package list after 13.04 upgrade
[09:02] <jamespage> davecheney, trying your no-strip suggestion now
[09:16] <pavel> guys, is juju-core 1.11.2 behavior stable today?
[09:18] <mgz> pavel:  what do you mean exactly?
[09:19] <pavel> I mean that I have weird errors all day
[09:19] <mgz> pastebin?
[09:19] <pavel> if there isn't any common issue, then it's on my side
[09:20] <mgz> yeah, as far as I know we've not broken the published tools or anything of late
[09:20] <pavel> ok, thanks
[09:56] <jam> wallyworld: when you're back, I figured out how flock works. It opens a file flocks it, then execs into the child process which means that all spawned processes hold open that flock until they all exit. And there was a 'mongo' running that meant the lock was permanently held.
[09:57] <mgz> hah
[09:58] <mgz> o_cloexec plz
[10:50] <jtv> Anybody up for a second review?  Nothing big — just discarding an unneeded complication: https://codereview.appspot.com/11322043
[10:51] <rogpeppe> jtv: looking
[10:51] <jtv> Thanks!
[10:57] <rogpeppe> jtv: out of interest, what does gwacl stand for?
[10:59] <TheMue> jtv: you've got a +1
[10:59] <dimitern> hey guys I need a review on the last bit of the deployer API stuff (client-side): https://codereview.appspot.com/11342043
[10:59] <TheMue> rogpeppe: i assume go windows azure cloud library (or client library)
[11:00] <TheMue> dimitern: *click*
[11:00] <rogpeppe> TheMue: ah, sounds plausible. i reckon it could do with a package doc comment...
[11:02] <TheMue> rogpeppe: yes, sounds reasonable. didn't look if there exists one
[11:05] <jam> mgz: well you need it for the first process, and you can set a flag as to 'don't inherit' but I don't know what it kills when.
[11:06] <rogpeppe> jtv: reviewed
[11:07] <mgz> jam: the other option is specifically when spawning mongo, to do the post-exec go through and close all file handles hack
[11:07] <rogpeppe> mgz: what's the issue?
[11:07] <jam> mgz: or not have the test suite crash without cleaning itself up?  :)
[11:08] <mgz> that too :)
[11:08] <dimitern> fwereade: ping
[11:08] <mgz> rogpeppe: flock persisting when a child process spawns with an fd open
[11:08] <jam> mgz: http://linux.die.net/man/1/flock
[11:09] <jam> I would consider --close, but that seems to negate the point of using flock in the first place
[11:09] <jam> (don't let 2 tarmac processes run concurrently)
[11:09] <rogpeppe> mgz: 5&- ?
[11:09] <rogpeppe> mgz: if 5 is your fd
[11:10] <jam> rogpeppe: earlier today we had a submission go haywire and the bot was hung unable to process new requests for 4+ hours.
[11:10] <rogpeppe> mgz: sorry, >5&-
[11:10] <jam> It would appear because mongod was spawned by a test case, and was not stopped when the 'go test' executable exited
[11:11] <rogpeppe> jam: hmm, interesting - it *should* be stopped at the end of the test
[11:11] <jam> rogpeppe: sure, but given the test suite failed via some sort of crash, some resource was not properly cleaned up
[11:11] <jam> in this case a mongo, which then had a file descriptor it inherited still open
[11:11] <jam> so while --close sounds like it might work (we want to hold open the handle for tarmac, but not for children)
[11:12] <jam> it sounds like it closes too early.
[11:12] <rogpeppe> jam: could we use process groups or sessions or something related, and kill the session after go test finishes?
[11:14] <rogpeppe> jam: it sounds like it might be a mistake to make gotest/mongo not hold the lock, because presumably we don't want multiple garbage mongod's accumulating
[11:18] <jam> rogpeppe: we don't, but I also don't mind *a* garbage mongod preventing us from landing anything until I come online, especially since I'm gone for 2+ weeks (2 vacation, 1 for Isle of Man)
[11:18] <jam> I guess I can just tell Martin and Wallyworld they have to deal with it :)
[11:19] <rogpeppe> jam: presumably it might end up as an arbitrary number of mongods if the same issue reoccurs.
[11:19] <jam> rogpeppe: it is slightly harder to discover, but easier to diagnose when it does happen. :)
[11:19] <rogpeppe> jam: in which case just {go test 200>&-} should do the job
[11:20]  * rogpeppe hasn't done fd manipulation in sh for a while, it seems :-)
[11:20] <jam> rogpeppe: the piece that knows what verify_command to run needs to know what handle flock opened. And you can force the flock handle, but it isn't like 200 is the default.
[11:20] <jtv> rogpeppe: thanks — I think it was Go Windows Azure Client Library.
[11:21] <jtv> And thanks TheMue too.  :)
[11:21] <rogpeppe> jam: aren't we writing both bits?
[11:22] <jam> rogpeppe: we do control both bits, they are spread far apart in terms of configuration, so if we find we have to have it we can, but I would avoid it
[11:22] <rogpeppe> jam: $FLOCK_HANDLE ?
[11:33] <jam> rogpeppe: https://plus.google.com/hangouts/_/f497381ca4d154890227b3b35a85a985b894b471 standup
[11:58] <jam> mramm: I'm in the 1:1 whenever you're ready
[11:58] <mramm> ok
[11:58] <mramm> answering questions for dimitern quickly
[11:58] <mramm> there in a min or two
[12:01] <jam> np
[12:20] <TheMue> dimitern: you've got a review
[12:21] <dimitern> TheMue: tyvm
[12:21] <TheMue> dimitern: yw
[12:22]  * TheMue => lunchtime
[12:25] <dimitern> fwereade: when you can, PTAL https://codereview.appspot.com/11342043/
[12:27]  * rogpeppe goes to lunch
[12:30] <wallyworld__> fwereade: you disappeared. i think we had sort of finished anyway
[12:38] <dimitern> wallyworld__, others: fwereade just texted me that his connection had gone haywire and he's going to lunch now
[12:38] <wallyworld__> ok
[12:47] <fwereade> jam, before I go, about the api addresses -- would it be that awful to connect to state and get api addresses from there? we should be able to assume sane/valid state info, right?
[12:48] <jam> fwereade: we have a state connection, though it is hidden behind the state.Machine we have in the code.
[12:48] <jam> we need it to SetPassword
[12:48] <jam> so we will have a state conn
[12:48] <jam> but I'm not sure how to get it passed in.
[12:48] <jam> I can investigate
[12:49] <fwereade> jam, I feel like it ought to be possible but, yeah, I guess all the entity stuff is a little tangly, maybe it's not worth it
[12:49] <fwereade> jam, have a little look but if it's going to be costly I guess we're fine without it
[12:50] <fwereade> jam, I'm just a bit worried about what'll happen if that code lives longer than we expect it to ;)
[12:53] <fwereade> rogpeppe, bug fix LGTM, thanks
[12:54] <fwereade> rogpeppe, that was *not* how I expected that Format to work though :)
[12:54] <fwereade> (evidently ;p)
[13:04] <jam> fwereade: that isn't how anyone who doesn't read the docs closely thinks it works
[13:04] <jam> given it is the first time I've seen a strftime that *wasn't* %H:%M:%S based
[13:05] <fwereade> jam, yeah, and there's even mention of those %H~s etc in the docs iirc
[13:05] <dimitern> fwereade: thanks for the review
[13:06] <jam> http://golang.org/pkg/time/#Time.Format doesn't actually mention the % versions that I can see
[13:06] <jam> and http://code.google.com/p/go/issues/detail?id=444 clearly indicates it doesn't want strftime
[13:07] <jam> fwereade: Given the issue, is 2006 actually better?
[13:07] <jam> which is 1
[13:07] <jam> what is 2
[13:07] <jam> IMO you still need the docs to figure out what to pass
[13:08] <jam> maybe if you did it enough you'd remember the magic date better
[13:09] <fwereade> jam, ha, clearly my crack consumption was much higher than usual that day... I could swear I remembered looking it up
[13:09] <fwereade> jam, maybe I was an idiot and looked up strftime instead
[13:09] <fwereade> jam, that's probably it
[13:10] <jam> fwereade: right, it must use strftime syntax, I'll go track it down and use it
[13:10] <fwereade> jam, yeah, indeed
[13:11] <fwereade> jam, there's something quite neat about what they do there but it's a touch astonishing too
[13:14] <rogpeppe> fwereade: i think it's only astonishing if you're used to strftime. i just wish they'd chosen a more memorable date - the y/m/d ordering is quite parochial
[13:26] <jam> fwereade: so.. s.State.APIAddresses() is actually wrong in the test suite (because it does the same "give me the default  port on all these things") s.APIInfo(c).Addrs has the correct value...
[13:26] <jam> It doesn't matter for real world case
[13:26] <jam> but it is true that we aren't recording in State the *actual* API Info
[13:26] <jam> (addresses)
[13:30]  * rogpeppe wishes there was an easy way of traversing forwards through a pipeline of merge proposals
[13:31] <jam> rogpeppe: pump ?
[13:31] <rogpeppe> jam: when reviewing
[13:31] <jam> or you mean web browse would link them
[13:31] <jam> LP links them together
[13:31] <jam> well, only towards the prereq maybe
[13:31] <rogpeppe> jam: does it? perhaps i've missed that.
[13:32] <rogpeppe> jam: exactly (and even then it only points to the branch, not the MP, i think)
[13:32] <jam> rogpeppe: so if you go to the MP page, it has links to the branch itself, and the prerequisite branch, if you click on the branch itself, it says "1 branch depending on this one"
[13:32] <jam> which you can click to
[13:32] <jam> and then get to the MP from there
[13:32] <jam> so the links are there
[13:32] <jam> but not direct
[13:33] <jam> hmmm.. the "1 branch dependent on this one" takes you to the page which has the list of merges to that branch
[13:33] <jam> which *doesn't* include the MPs that depend on the branch
[13:33] <jam> I wonder if it was intended to do so
[13:33] <jam> rogpeppe: poke tim :)
[13:34] <rogpeppe> jam: yeah, i just saw that.
[13:34] <rogpeppe> jam: it looks wrong.
[13:37] <abentley> jam: verrrrry longstanding bug.
[13:41] <jam> abentley: :)
[13:41] <jam> fwereade: https://codereview.appspot.com/11137044/ has been updated with a state.State.APIAddresses call. I wish that actually did the right thing in the test suite.
[13:42] <jam> (We don't record the API Addresses in the DB, so APIAddresses infers them from the State.Addresses that *are* recorded)
[14:03] <jam> wallyworld__: something is wrong with your branch. I'm getting: Running test command: go fmt ./... && go build ./... && go test ./...
[14:03] <jam> Command appears to be hung. There has been no output for 900 seconds. Sending SIGTERM.
[14:03] <jam> It is trying again right now, but other branches have been able to land I believe.
[14:08] <rogpeppe> jam: ah, i see now why mongo wasn't shut down
[15:12] <rogpeppe> hmm, my laptop seems to have stopped talking to its wired ethernet :(
[16:04] <jam> fwereade,  dimitern: ping about some recent timeouts on go-bot
[16:04] <fwereade> jam, oh yes?
[16:04] <jam> we've had several failures like this one: https://code.launchpad.net/~dimitern/juju-core/070-deployer-client-facade/+merge/174973
[16:05] <jam> Where it appears deployer tests are getting a 500ms timeout
[16:05] <dimitern> jam: yeah, i noticed
[16:05] <jam> I have the feeling Canonistack is overloaded, and go-bot is running slowly
[16:05] <jam> but I noticed the deployer code wakes up every 50ms, but doesn't do anything like StartSync in the inner loop
[16:06] <jam> fwereade: we have s.State.StartSync() at the beginning, but not in the inner loops
[16:06] <jam> this doesn't fix everything, but I think when tests are failing they aren't cleaning up cleanly, so we have a bunch of follow on failure.
[16:07] <fwereade> jam, yeah, those all do look like timeouts while the SUT is actually doing what it should, but slowly
[16:07] <jam> fwereade: right, it is getting some of them done (like svc 0 but not svc 1)
[16:08] <fwereade> jam, StartSync in inner loops is only necessary when there's no way to tell when a triggering action has actually taken place
[16:09] <jam> note stuff like: ok  	launchpad.net/juju-core/worker/uniter	365.596s
[16:09] <jam> which is one of the slower tests
[16:09] <jam> but 370s is super long
[16:10] <jam> tarmac was running the whole test suite in 15min
[16:10] <jam> fwereade: so I'm considering just bumping up the global LongWait (and changing the deployer code to use that value).
[16:10] <jam> But I figured I'd bring it up for discussion.
[16:11] <fwereade> jam, yeah, I'm +1 on that -- in normal circumstances these tests pass relatively fast, but in unhappy circumstances we still want them to work
[16:11] <fwereade> oof, yeah, I did not see that one
[16:12] <jam> fwereade: and long timeout is supposed to be the "waiting this long is a failure" not the "sleep a bit to let things progress"
[16:12] <fwereade> jam, yeah, indeed
[16:21] <fwereade> jam, rogpeppe1: how might we be getting an "unauthorized" error out of machiner?
[16:21] <rogpeppe1> fwereade: context?
[16:21] <fwereade> rogpeppe1, deploying to saucy on azure: http://paste.ubuntu.com/5880671/
[16:21] <rogpeppe1> fwereade: is machiner using the API now?
[16:21] <rogpeppe1> fwereade: (i'm presuming not)
[16:21] <fwereade> rogpeppe1, no, I don't think it actually is, which is what's baffling
[16:21] <rogpeppe1> fwereade: might line 22 etc be significant here?
[16:22] <rogpeppe1> fwereade: hmm, no
[16:22] <rogpeppe1> fwereade: that's expected
[16:22] <fwereade> rogpeppe1, I think that's normal, yeah
[16:24] <rogpeppe1> fwereade: this only happened deploying to saucy on azure?
[16:24] <rogpeppe1> s/happened/happens/
[16:24] <fwereade> rogpeppe1, I have only seen it reported there
[16:25] <fwereade> rogpeppe1, but I have not deployed today
[16:26] <rogpeppe1> fwereade: and this is on the bootstrap machine too, right? that's very odd.
[16:27] <rvba> rogpeppe1: this is only on the bootstrap machine.  I can't deploy nodes.
[16:28] <rogpeppe1> rvba: what happens when you run juju status?
[16:28] <rvba> rogpeppe1: http://paste.ubuntu.com/5881387/
[16:29] <rogpeppe1> rvba: is it possible you could give me ssh access to the bootstrap machine?
[16:29] <rvba> rogpeppe1: nothing is listening on the API port: http://paste.ubuntu.com/5881394/
[16:30] <rvba> rogpeppe1: sure, just one sec.
[16:30] <rvba> rogpeppe1: launchpad id?
[16:31] <rogpeppe1> rvba: that's not surprising
[16:31] <rvba> k
[16:31] <rvba> rogpeppe1: ssh ubuntu@juju-azure-saucyy7xrrjl4h9zemnvvbaqpfqtelpm2jzjjzu375hmczp20ldz.cloudapp.net
[16:32] <rogpeppe1> rvba: as the machine agent is failing to start
[16:32] <rogpeppe1> rvba: and that's what runs the API
[16:33] <rogpeppe1> rvba: my ssh public key is:
[16:33] <rogpeppe1> ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOjaOjVRHchF2RFCKQdgBqrIA5nOoqSprLK47l2th5I675jw+QYMIihXQaITss3hjrh3+5ITyBO41PS5rHLNGtlYUHX78p9CHNZsJqHl/z1Ub1tuMe+/5SY2MkDYzgfPtQtVsLasAIiht/5g78AMMXH3HeCKb9V9cP6/lPPq6mCMvg8TDLrPp/P2vlyukAsJYUvVgoaPDUBpedHbkMj07pDJqe4D7c0yEJ8hQo/6nS+3bh9Q1NvmVNsB1pbtk3RKONIiTAXYcjclmOljxxJnl1O50F5sOIi38vyl7Q63f6a3bXMvJEf1lnPNJKAxspIfEu8gRasny3FEsbHfrxEwVj rog@rog-x220
[16:33] <rogpeppe1> rvba: rogpeppe
[16:33] <rogpeppe1> rvba: (i think - perhaps it's rogpeppe@gmail.com; i never know what launchpad expects)
[16:34] <rvba> rogpeppe1: you should be able to login now, I've imported the key you put on lp.
[16:34] <rogpeppe1> rvba: am logged in, thanks
[16:34] <mgz> remember you can just do `ssh-import-id rogpeppe`
[16:34] <rvba> That's precisely what I did.
[16:39] <rogpeppe1> hmm, the agent config file looks ok, but mongo auth as that agent fails. http://paste.ubuntu.com/5881424/
[16:39] <rogpeppe1> and the log seems to indicate that the password for the MA was created correctly http://paste.ubuntu.com/5881429/
[16:42] <rogpeppe1> rvba: are you using tip?
[16:42] <rvba> rogpeppe1: yes
[16:43] <rogpeppe1> rvba: revno 1464?
[16:44] <rvba> rogpeppe1: yes
[16:48] <rogpeppe1> rvba: i'll just check to see if it works under ec2
[16:49] <rogpeppe1> hmm, looks like trunk is broken against the latest version of gwacl
[16:50] <rvba> rogpeppe1: yes, we're fixing this, use gwacl's revision number 182.
[16:50] <rogpeppe1> rvba: thanks. using that.
[17:15] <rogpeppe1> rvba: hmm, seems to work ok under ec2. can you reproduce the azure behaviour?
[17:16] <rvba> rogpeppe1: yes, I've tested it ~3 times this afternoon.
[17:20] <rvba> rogpeppe1: I'll try with a raring image right now.
[17:20] <rogpeppe1> rvba: please. i'm just trying ec2 with a saucy image (assuming it can find one)
[17:22] <rogpeppe1> oh bugger, forgot to use --upload-tools. discard as unverified all my previous assertions of okayness
[17:23] <rogpeppe1> darn, "no "saucy" images in us-east-1 "
[17:24] <rogpeppe1> i've reached eod and have to stop
[17:24] <rogpeppe1> rvba: please let me know how you get on with raring
[17:24] <rvba> rogpeppe1: will do… talk to you tomorrow!  Thanks for your help.
[17:24] <rogpeppe1> rvba: no help as yet, i'm afraid
[17:25] <rogpeppe1> rvba: g'night all
[17:26] <rvba> nn rogpeppe1
[17:53] <rvba> rogpeppe1: works ok on raring.
[21:06] <thumper> good morning
[21:19] <thumper> fark... bikeshed much?
[21:43] <fwereade> thumper, heh, sorry, is that my reviews to which you refer?
[21:44] <thumper> fwereade: no
[21:44] <thumper> fwereade: I gather you wanted to chat?
[21:45] <fwereade> thumper, yeah, can't quite remember what about -- was it the upload-tools stuff?
[21:45] <thumper> yeah
[21:45] <fwereade> thumper, I don't think it needs much actual discussion really; it's settled in my mind and I'm willing to accept it in the name of expediency, because I really want a local provider soon not late
[21:46] <thumper> :)
[21:46] <thumper> I'd be very surprised if it actually caused problems
[21:46] <fwereade> thumper, yeah, you may be right
[21:46] <thumper> but prepared to fix the problems should they occur
[21:46] <fwereade> thumper, I still feel it's a bit unnecessarily messy
[21:47] <fwereade> thumper, however, themue's back looking at auto-sync-tools again
[21:47] <thumper> I can accept that for now
[21:47]  * thumper nods
[21:47] <thumper> what is the auto-sync-tools option?
[21:47] <fwereade> thumper, and I kinda hope it'll wither naturally
[21:47] <fwereade> thumper, it's just a nicer first user experience
[21:47] <fwereade> thumper, if you can't find tools in this cloud, copy them from somewhere else
[21:48] <thumper> fwereade: hmm
[21:48] <thumper> fwereade: however the local provider should work disconnected
[21:48] <thumper> fwereade: providing that the user has the cloud image
[21:48]  * thumper thinks
[21:48] <thumper> hmm
[21:48] <thumper> the auto-update of the container may have problems
[21:48] <thumper> should check disconnected use
[21:49] <fwereade> thumper, istm that the jujud shortcut fits better with sync-tools than with upload-tools
[21:49] <fwereade> thumper, good point
[22:20] <wallyworld__> fwereade: i'm confused by your reference to agent.conf
[22:20] <fwereade> wallyworld__, ah, sorry
[22:21] <fwereade> wallyworld__, to be compatible, we need to be able to bootstrap older code with a newer client
[22:21] <wallyworld__> and we can i think, i tested it
[22:21] <wallyworld__> ah, i tested an older client
[22:21] <wallyworld__> against this new code
[22:21] <wallyworld__> the new provider-state file just appends info
[22:22] <fwereade> wallyworld__, so the args are deliberately minimal, and we're expected to write additional stuff to some well-known location such that newer code can use it if it's there
[22:22] <wallyworld__> so the older struct can read it just fine
[22:22] <fwereade> wallyworld__, sure, I understand that, that's a separate issue
[22:22] <fwereade> wallyworld__, and I'm not *really* too bothered about the BootstrapState issue
[22:23] <wallyworld__> i saw hw as a natural extension to the current bootstrap state, which right now is just instance ids
[22:23] <fwereade> wallyworld__, at least the type name indicates the sanity of what you do, and I think it doesn't matter if future code overwrites it and kills the hardware info
[22:23] <fwereade> wallyworld__, because its only actual client is bootstrap
[22:24] <wallyworld__> if it is over written later it doesn't matter cause it's been used by then
[22:24] <fwereade> wallyworld__, wrt agent.Conf, that's the existing mechanism for passing extra stuff into jujud
[22:24] <wallyworld__> i went to write a separate file but it involves a fair bit of extra code for little benefit
[22:24] <fwereade> wallyworld__, I have abiding grumpiness that it was designed as a single file with heaps of arbitrary duplication
[22:25] <wallyworld__> oh ok, i didn't know that about agent.conf, sorry
[22:25] <wallyworld__> or maybe you told me and i didn't understand
[22:25] <fwereade> wallyworld__, no worries at all, it's just another of these ill-documented details
[22:26] <fwereade> wallyworld__, so if you just tack a field in there it won't make it notably worse than it is today
[22:26] <wallyworld__> but we just want to pass stuff to the first bootstrap node here, not nodes in general
[22:26] <wallyworld__> so agent.conf is to be avoided i think
[22:26] <fwereade> wallyworld__, the agent conf has *so* much totally situational crap in there already that I can't get too worked up about it
[22:27] <wallyworld__> yes, i just tack an extra field in
[22:27] <wallyworld__> which is read as part of the final machine config process
[22:27] <fwereade> wallyworld__, it's *everything*... how to connect to *and* to serve both state and the api, and probably a few other things besides
[22:27] <wallyworld__> so given all this, are you ok with it now?
[22:28] <fwereade> wallyworld__, I'm not ok with the extra parameter to bootstrap-state, purely because it's an unnecessary compatibility break
[22:28] <fwereade> wallyworld__, I'm fine with the extra field in environs.BootstrapState
[22:29] <fwereade> wallyworld__, (but its precarious nature should be commented and justified I think)
[22:29] <wallyworld__> oh ok, you are talking about the jujud param
[22:29] <fwereade> wallyworld__, I'm talking about both in a kind of unhelpful way
[22:29] <fwereade> wallyworld__, the jujud param is the significant one
[22:57]  * thumper heading to take the dog to the dog park with the minions
[22:57] <thumper> bbl
[23:54] <davecheney> thumper: ping