[00:35] <perrito666> hey, anyone seen menn0?
[00:57] <thumper> perrito666: he is on holiday this week
[00:57] <perrito666> such is my luck
[00:57] <perrito666> thank you thumper
[00:57] <thumper> np
[01:05] <davecheney> thumper: wallyworld something is screwed with the cmd/juju tests
[01:05] <davecheney> 1,000's of goroutines all waiting on some semaphore to send the 404 message back to the client
[01:06] <wallyworld> is this new? or more likely they've been screwed for a while
[01:07] <davecheney> not new
[01:07] <davecheney> but really getting worse
[01:07] <davecheney> who knows
[01:07] <davecheney> what spec is the machine you are booting for running CI tests ?
[01:14] <axw> wallyworld: hey, was TestBootstrapNoTools failing?
[01:14] <axw> ah, I see a bug #
[01:15] <wallyworld> axw: yeah, on i386 and ppc64
[01:15] <wallyworld> davecheney: it's a m3.large i think
[01:15] <axw> sorry, should've run them all with i386 first
[01:15] <wallyworld> np
[01:15] <wallyworld> axw: i think that test is obsolete now anyway
[01:15] <davecheney> wallyworld: ok, that shouldn't be penalised for running cpu intensive jobs
[01:16] <davecheney> i know the smalls and mediums get heavily penalised for actually using the cpu
[01:16] <wallyworld> aws reduces their priority?
[01:18] <davecheney> yup
[01:18] <davecheney> spin up a small or a medium
[01:18] <davecheney> run mpstat
[01:19] <davecheney> and look at the %steal column
[01:19] <davecheney> can be up to 80% on t1.smalls
[01:19] <davecheney> the more cpu you use, the more you are penalised
[01:19] <davecheney> so when you REALLY need to go fast, you go the slowest
[01:19] <davecheney> yay, cloud
[01:22] <davecheney> wallyworld: when cmd/juju does pass, it passes by the barest of margins
[01:22] <davecheney> ok  github.com/juju/juju/cmd/juju 597.325s
[01:22] <davecheney> 2.675 seconds and this build would have failed
[01:22] <wallyworld> yep
[01:23] <wallyworld> if i had a magic wand to fix the tests i'd wave it
[01:23] <davecheney> go test -test.timeout=900s github.com/juju/juju/...
[01:23] <davecheney> possibly the option needs to go at the end
[01:23] <davecheney> but there is something horribly wrong with that cmd/juju test
[01:24] <axw> yeah, it goes end-to-end when it doesn't need to
[01:27] <davecheney> axw: do you know the name of the test off the top of your head
[01:27] <davecheney> i noticed it's using io.Pipe, not os.Pipe
[01:28] <davecheney> some buffering may help
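A minimal sketch of the io.Pipe point above, not taken from the juju code base. io.Pipe is synchronous, so a Write blocks until something reads the other end; putting a bufio.Writer in front lets small writes return straight away. The helper names and the 100ms window are assumptions made for illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"time"
)

// writeReturns reports whether one small Write on w returns within the
// window while nothing is reading from the other end of the pipe.
func writeReturns(w io.Writer, window time.Duration) bool {
	done := make(chan struct{})
	go func() {
		w.Write([]byte("404 page not found\n"))
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(window):
		return false
	}
}

// unbufferedWriteReturns writes straight into an io.Pipe with no reader.
func unbufferedWriteReturns() bool {
	pr, pw := io.Pipe()
	defer pr.Close() // unblocks the stuck writer goroutine on return
	return writeReturns(pw, 100*time.Millisecond)
}

// bufferedWriteReturns puts a bufio.Writer in front of the same kind of pipe.
func bufferedWriteReturns() bool {
	pr, pw := io.Pipe()
	defer pr.Close()
	// The data sits in the buffer until Flush is called or the buffer fills.
	return writeReturns(bufio.NewWriter(pw), 100*time.Millisecond)
}

func main() {
	fmt.Println("unbuffered write returns without a reader:", unbufferedWriteReturns())
	fmt.Println("buffered write returns without a reader:", bufferedWriteReturns())
}
```

Buffering only postpones the blocking: a Flush on the bufio.Writer still waits for a reader, so this would help only if the readers do eventually drain the pipe.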
[01:28] <axw> davecheney: which test?
[01:28] <davecheney> 11:23 < davecheney> but there is something horribly wrong with that cmd/juju test
[01:28] <davecheney> 11:24 < axw> yeah, it goes end-to-end when it doesn't need to
[01:28] <axw> I was referring to the entire package
[01:28] <davecheney> :(
[01:28] <wallyworld> davecheney: so many of our tests suck - they are not unit tests
[01:28] <wallyworld> there's no easy fix
[01:29] <davecheney> wallyworld: +1,000,000 to the 'not unit test' comment
[01:29] <wallyworld> yep :-(
[01:29] <wallyworld> doesn't help that mongo is *everywhere*
[01:30] <axw> we don't even need the API server in the mix, let alone mongo
[01:30] <axw> (for cmd/juju)
[01:32] <wallyworld> yup, the mongo comment was a general lament
[01:35] <perrito666> ok folks Ill go finish my sunday
[01:36] <perrito666> btw if someone wants to review https://github.com/juju/juju/pull/530 which has only been reviewed by junior reviewers and is in need of being merged
[01:37] <perrito666> cheers
[01:43] <thumper> davecheney: https://github.com/juju/juju/pull/570
[01:58] <davecheney> thumper: /me looks
[01:59] <davecheney> thumper: this all looks rather uncontroversial
[01:59] <thumper> davecheney: it should be :)
[02:15] <waigani> thumper: what version should the upgrade step target (adding state users as environ users)
[02:15] <thumper> waigani: 1.21.alpha1
[02:16] <waigani> thumper: a 'get all users' function is not jumping out at me, does one exist (I've got admin)?
[02:37] <thumper> waigani: I don't think there is one yet
[02:38] <davecheney> waigani: i don't think one exists
[02:38] <davecheney> we only have one user
[02:38] <davecheney> so you sort of knew the answer
[02:38] <waigani> in that case is it okay to just add admin to environ in the upgrade step?
[02:40] <davecheney> as long as you don't call them "admin"
[02:42] <waigani> sigh
[02:43] <thumper> waigani: no... there is an example with the upgrade step
[02:43] <thumper> waigani: already in the code that iterates through every user
[02:43] <thumper> waigani: it updates the last connection thingy
[02:43] <thumper> waigani: so just do something similar
[02:43] <waigani> thumper: okay, I'll take a look
[02:43] <thumper> davecheney: and I'm dealing with the admin name :)
[02:44] <waigani> davecheney, thumper: what are we calling admin once it's added as an environment user?
[02:44] <waigani> oh just saw your comment thumper, so I'll leave it admin for now?
[02:44] <thumper> waigani: what do you mean?
[02:44] <thumper> don't alter the existing user, just connect to the environment by adding an EnvUser
[02:44] <waigani> thumper: just trying to grok davecheney's comment about admin name
[02:45]  * thumper takes a deep breath and fixes state.Initialize
[02:46] <davecheney> waigani: the initial user of an environment is "admin"
[03:01] <davecheney> wallyworld__: thumper so, bad news
[03:02] <davecheney> there is no stand out test in cmd/juju
[03:02] <davecheney> they are all slow
[03:02] <davecheney> well they are fast
[03:02] <davecheney> milliseconds
[03:02] <davecheney> but the setup for each test is a few seconds each
[03:03] <wallyworld__> that's a common failing across most of our tests because they are not true unit tests
[03:03]  * thumper agrees
[03:03] <thumper> waigani: can't hear you
[03:03] <waigani> thumper: shit sorry, hangon
[03:29] <wallyworld__> axw: when you have a moment, a fairly small one https://github.com/juju/juju/pull/598
[03:29] <axw> wallyworld__: yup, looking
[03:33] <wallyworld__> axw: thanks, yeah, i'm only doing this for 1.20
[03:35] <axw> wallyworld__: cool
[03:35] <axw> sorry, missed simplification until after LGTM, but it doesn't really matter
[03:39] <thumper> davecheney: your branch landed \o/
[03:39] <thumper> mine is running now
[03:55] <waigani> thumper: usersC.Find(nil).All(&users) finds one zero valued user, yet st.User("admin") finds the admin user?
[04:00] <thumper> waigani: then it will be a different query to list them all
[04:00] <thumper> perhaps you just have to specify something that matches all possible ones
[04:02] <thumper> axw: do you have a few minutes to talk through a problem?
[04:02] <axw> thumper: in a few mins, lunch is about to come out of the oven
[04:03] <thumper> axw: ok, ping me when you have 10 minutes or so... I'll pause and do something else for now
[04:13] <axw> thumper: ready
[04:14] <thumper> axw: https://plus.google.com/hangouts/_/grprl36idkwixt2q2cpajgeip4a?authuser=1&hl=en
[04:37] <wallyworld_> thumper: is menno around today?
[04:37] <thumper> waigani: menno is on leave this week
[04:37] <wallyworld_> ok, ta
[04:38] <wallyworld_> he should have marked it on the calendar :-)
[04:39] <thumper> yeah, I'll go add it
[04:50] <davecheney> thumper: finally
[04:50] <davecheney> looking into just getting a little more out of the cmd/juju tests now
[04:50] <davecheney> alone they take 400s on my machine
[04:50] <davecheney> running with others they can take > 600
[05:10] <davecheney> wallyworld_: I have a fix for the slowness of cmd/juju
[05:10] <davecheney> you probably won't like it
[05:10] <wallyworld_> maybe :-)
[05:11] <davecheney> wallyworld_: do you want me to tell you what i've done
[05:11] <davecheney> or just send a PR
[05:11] <wallyworld_> tl;dr; version?
[05:12] <davecheney> i've moved some of the tests into another package
[05:12] <wallyworld_> which package?
[05:12] <davecheney> cmd/juju -> cmd/juju/test
[05:13] <wallyworld_> doesn't that just move the problem?
[05:13] <davecheney> the problem is the package takes > 600s to test
[05:13] <davecheney> split the package up
[05:13] <davecheney> each part takes less time
[05:13] <wallyworld_> has this been causing failed builds?
[05:13] <davecheney> yes
[05:14] <davecheney> well there is the usual nonsense with the repl sets
[05:14] <davecheney> but over the weekend cmd/juju has constantly been taking > 600 seconds
[05:14] <davecheney> it takes 400s on my machine uncontended
[05:14] <davecheney> and close to 550 with other tests running in parallel
[05:14] <wallyworld_> we could simply tweak the test timeout until the tests can be fixed
[05:14] <davecheney> i'm proposing my solution
[05:14] <davecheney> you're welcome to nack it
[05:14] <wallyworld_> ok
[05:15] <davecheney> but in my experience raising timeouts only leads to raising timeouts again
[05:15] <davecheney> and again
[05:15] <davecheney> and again
[05:16] <wallyworld_> well, we need to fix the test
[05:16] <wallyworld_> moving them just messes up the code base
[05:18] <wallyworld_> axw: remind me - to log onto juju's mongo, the password is recorded in the jenv file. yet mongo --ssl -u admin -p xxxx --port 37017 fails with an auth error
[05:19] <davecheney> wallyworld_: i didn't say it was a perfect solution
[05:19] <davecheney> but it solves the problem we have today
[05:19] <davecheney> where it takes two days to land a branch
[05:19] <wallyworld_> so does increasing the timeout
[05:19] <wallyworld_> without any churn to the code
[05:20] <davecheney> wallyworld_: i'll propose my solution, you can nack it
[05:20] <wallyworld_> two days?
[05:20] <axw> wallyworld_: try the hash of the password
[05:20] <wallyworld_> axw: sha256?
[05:21] <davecheney> wallyworld_: it took me friday, saturday and this morning to land my names branch
[05:21] <davecheney> it failed 7 times
[05:21] <axw> wallyworld_: I don't know what the hash type is, it's stored in the bootstrap agent's  conf file IIRC
[05:21] <wallyworld_> davecheney: timeouts? i've never had any failure due to timeouts like that
[05:21] <axw> wallyworld_: what are you trying to do?
[05:21] <wallyworld_> axw: i want to log in and look at the collections
[05:22] <davecheney> wallyworld_: this is what tim pinged you about this morning
[05:22] <davecheney> go check the build dashboard
[05:22] <axw> wallyworld_: https://github.com/kapilt/juju-dbinspect
[05:22] <davecheney> dozens of red builds all from cmd/juju timing out
[05:25] <wallyworld_> i see one red dot because of that
[05:25] <wallyworld_> maybe the other timeouts caused -p 2 to run and those worked
[05:27] <davecheney> wallyworld_: those are the timeouts i'm talking about
[05:27] <davecheney> and the reason why the build times have gone from 18 minutes to an hour
[05:27] <davecheney> in the last week
[05:27] <wallyworld_> we need to identify the root cause and fix that
[05:28] <wallyworld_> something must have changed to make the tests start running so long
[05:28] <davecheney> i noted on this channel 2 months ago that the times for cmd/juju were growing
[05:28] <davecheney> they have now passed the 10 minute mark
[05:28] <davecheney> this is the result
[05:28] <davecheney> every SINGLE test case in cmd/juju takes 3-4 seconds to setup
[05:29] <davecheney> every time we add a new command or option
[05:29] <davecheney> boom another 3-5 seconds gone
[05:33] <wallyworld_> yep, so there's a systemic problem with those tests
[05:34] <wallyworld_> moving stuff to a new package is simply rearranging the deck chairs on the titanic
[05:49] <davecheney> wallyworld_: no argument there
[05:49] <davecheney> but it has blocked landing anything for a week now
[05:49] <wallyworld_> not blocked
[05:49] <davecheney> (do you want to see the graph again)
[05:49] <davecheney> yes blocked
[05:49] <wallyworld_> landings have occurred
[05:50] <davecheney> my change took all weekend to land
[05:50] <wallyworld_> only one was blocked
[05:50] <davecheney> the tests have gone from 18 minutes to an hour
[05:50] <wallyworld_> only one red dot was attributable to this problem
[05:50] <davecheney> because of this
[05:50] <davecheney> so, instead of being able to land 3 changes per hour
[05:50] <davecheney> we can land less than one
[05:50] <davecheney> hows that for productivity ?
[05:50] <wallyworld_> that's not blocked by definition
[05:50] <davecheney> my change is downright horrible
[05:50] <davecheney> but as nobody else is stepping up to address this issue
[05:50] <wallyworld_> so we can work around it by running with -p 2 to start with
[05:50] <davecheney> i stand by it
[05:51] <wallyworld_> we need to draw a line in the sand and just fix the freaking tests
[05:53] <wallyworld_> just as curtis has now started being firm about blocking landings for regressions, we also need to take a first stance
[05:53] <wallyworld_> firm
[05:54] <davecheney> fair enough
[05:54] <davecheney> i don't know how to resolve this situation
[06:00] <wallyworld_> someone needs to look into what happens at test startup and determine how to better mock out the backend
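One possible shape for mocking out the backend, as a hedged sketch rather than the actual cmd/juju design: the command depends on a narrow interface, and the test supplies an in-memory fake, so no API server or mongo is started. EnvironAPI, infoCommand, fakeAPI and runWithFake are all hypothetical names invented for this illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// EnvironAPI is a hypothetical, narrow view of what one command needs
// from the backend. It is not a real juju interface.
type EnvironAPI interface {
	EnvironName() (string, error)
}

// infoCommand is a hypothetical command that only talks to EnvironAPI.
type infoCommand struct {
	api EnvironAPI
}

// Run prints the environment name, or wraps the backend error.
func (c *infoCommand) Run(out io.Writer) error {
	name, err := c.api.EnvironName()
	if err != nil {
		return fmt.Errorf("cannot get environment name: %w", err)
	}
	_, err = fmt.Fprintf(out, "environment: %s\n", name)
	return err
}

// fakeAPI is an in-memory stand-in used by tests; it needs no setup.
type fakeAPI struct {
	name string
	err  error
}

func (f fakeAPI) EnvironName() (string, error) { return f.name, f.err }

// errBackendDown is a canned failure for exercising the error path.
var errBackendDown = errors.New("backend down")

// runWithFake runs the command against the given API and captures output.
func runWithFake(api EnvironAPI) (string, error) {
	var buf bytes.Buffer
	err := (&infoCommand{api: api}).Run(&buf)
	return buf.String(), err
}

func main() {
	out, err := runWithFake(fakeAPI{name: "testenv"})
	fmt.Printf("%q %v\n", out, err)
	_, err = runWithFake(fakeAPI{err: errBackendDown})
	fmt.Println(err)
}
```

With this shape each test case costs microseconds instead of seconds of setup; the end-to-end path would then need only a handful of separate tests.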
[06:17] <wallyworld_> axw: i gotta go get my son and take him to doctor, will look at your branch when i get back
[06:18] <axw> wallyworld_: cheers
[06:18] <axw> it's a big one...
[06:21] <wallyworld_> that's what she said
[07:01] <dimitern> jam, hey, sorry I'll be 10m late for 1-1
[07:01] <jam> dimitern: np
[07:01] <jam> thanks for letting me know
[07:53] <TheMue> morning
[08:22] <jam> morning TheMue
[08:22] <jam> mgz: if you're still doing reviews today: https://github.com/juju/juju/pull/601
[08:48] <dimitern> jam, https://github.com/juju/juju/pull/601 LGTM
[08:51] <wallyworld_> axw: i wanted to get another fix done for 1.20. i can look at your branch now. if you have time, maybe you could look at https://github.com/juju/juju/pull/602
[08:51] <axw> wallyworld_: sure, looking
[08:52] <wallyworld_> axw: i'm not 100% sure it will fix the issue - the fix is based on reading the code
[08:54] <axw> wallyworld_: we shouldn't be starting the container provisioner until after the upgrades have finished, right?
[08:55] <wallyworld_> axw: it starts after the upgrade steps worker yes, but the upgrade work starts in parallel with it
[08:55] <wallyworld_> it's a bit confusing
[08:55] <wallyworld_> the upgrade work listens for upgrade requests
[08:55] <axw> wallyworld_: huh? how can it start after and in parallel?
[08:56] <wallyworld_> the upgrade steps worker does the upgrades
[08:56] <wallyworld_> there's 2 workers - upgrade and upgrade steps
[08:56] <axw> wallyworld_: ah ok. after upgrade steps, in parallel with upgrader
[08:56] <wallyworld_> yes, that's what it looked like
[08:57] <wallyworld_> and if there were no upgrade steps, it's all a bit of a race to see who starts first
[09:56] <perrito666> good morning everyone, I am OCR today along with mgz feel free to ask for reviews, Ill be taking a look at the queue anyway
[10:28] <jam> wallyworld_: this looks like a spurious failure to me: http://juju-ci.vapour.ws:8080/job/github-merge-juju/411/console
[10:28] <jam> I'll dig into it
[10:28] <jam> but in case you wanted it around for reference
[10:29] <wallyworld_> jam: thanks. its on my todo list this week to get these documented. our tests need work
[10:29] <jam> this isn't very helpful:
[10:29] <jam> goroutine 17700 [running]:
[10:29] <jam> 	goroutine running on other thread; stack unavailable
[10:29] <jam> created by launchpad.net/gocheck.(*suiteRunner).forkCall
[10:29] <jam> 	/home/ubuntu/juju-core_1.21-alpha1/src/launchpad.net/gocheck/gocheck.go:631 +0x23f
[10:29] <jam> the only thing "running" is trying to fork something
[10:33] <jam> wallyworld_: the second failure I don't fully understand, as it seems like maybe something didn't clean up in time and stayed bound to a port we thought we wanted to use in the next test
[10:36] <wallyworld_> jam: it seems we have all sorts of isolation issues, plus issues with mongo startup failing at various times. sadly, many of our unit tests aren't really unit tests
[10:46] <jam> dimitern: standup ?
[10:47] <dimitern> jam, omw
[11:00] <jam> TheMue: you weren't supposed to change your networking
[12:56] <perrito666> is there anyone besides menno that is familiar with upgrade mode?
[12:59] <dimitern> perrito666, wallyworld_ afaik, but he might be off already
[12:59] <perrito666> tx dimitern
[12:59]  * wallyworld_ is sorta here
[13:00] <wallyworld_> menno did all of the upgrade mode stuff though
[13:00] <wallyworld_> is there a specific question?
[13:01] <dimitern> wallyworld_, I see, ok it seems my knowledge is a bit out of date :)
[13:01] <wallyworld_> dimitern: i did the initial upgrade work, but tim's team has since taken it on
[13:01] <perrito666> wallyworld_: not really I am implementing a "restore mode" and william told me to discuss with menno to make sure I dont reinvent the wheel
[13:02] <wallyworld_> seems like there will be overlap there, or similar restrictions anyway
[13:03] <dimitern> perrito666, wallyworld_, what's a "mode"? a runner that runs its workers with delay from the rest?
[13:03] <dimitern> or more like a uniter mode
[13:03] <perrito666> wallyworld_: there will certainly be
[13:04] <perrito666> dimitern: it is an arbitrary term I believe
[13:04] <dimitern> :) ah
[13:04] <perrito666> in the case of upgrade
[13:04] <perrito666> its a state of the API server where it rejects most requests
[13:04] <wallyworld_> dimitern: at a high level, it means that the state server will reject connections while an upgrade is still running
[13:04] <perrito666> with an error indicating its upgrading
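A rough sketch of the "mode" idea described above, under stated assumptions: a gate in front of the API that rejects most requests with a dedicated error while an upgrade (or restore) is running. The type apiGate, the error text, and the allowed method names are hypothetical and are not the real juju implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUpgradeInProgress is a hypothetical error returned for rejected calls.
var errUpgradeInProgress = errors.New("upgrade in progress, API is limited")

// apiGate decides whether a request may proceed in the current mode.
type apiGate struct {
	mu        sync.Mutex
	upgrading bool
	allowed   map[string]bool // methods still served while upgrading
}

// newGate returns a gate with an illustrative allowlist.
func newGate() *apiGate {
	return &apiGate{allowed: map[string]bool{"Login": true, "Status": true}}
}

// SetUpgrading switches the restricted mode on or off.
func (g *apiGate) SetUpgrading(on bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.upgrading = on
}

// Check returns nil if the method may run, or errUpgradeInProgress if the
// gate is in upgrade mode and the method is not on the allowlist.
func (g *apiGate) Check(method string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.upgrading && !g.allowed[method] {
		return errUpgradeInProgress
	}
	return nil
}

func main() {
	g := newGate()
	g.SetUpgrading(true)
	fmt.Println("Deploy:", g.Check("Deploy"))
	fmt.Println("Status:", g.Check("Status"))
	g.SetUpgrading(false)
	fmt.Println("Deploy after upgrade:", g.Check("Deploy"))
}
```

A restore mode could reuse the same gate with a different error and allowlist, which is presumably the overlap being discussed.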
[13:04] <dimitern> wallyworld_, perrito666, I see, thanks guys!
[13:05] <wallyworld_> there's so much to keep track of
[13:05] <perrito666> dimitern: restore mode is to do something very similar
[13:05] <dimitern> yeah, just keeping all the far and wide networking effort going pretty much leaves me in the dark about what's going on with the other teams :/
[13:06] <wallyworld_> yup, same here
[13:09] <JoshStrobl> What hooks would be the appropriate place to call open-port? I'm guessing config-changed, start, install, etc. and close-port should be called on ./stop?
[13:15] <dimitern> JoshStrobl, in practice it doesn't matter - just do it (in a hook) before you need to use it; ideally both at config-changed time (close-port first, open-port after that)
[13:15] <dimitern> s/at//
[13:17] <dimitern> sorry, that should've been s/ideally both at/ideally at/
[13:17] <JoshStrobl> dimitern, thanks :)
[13:17] <dimitern> JoshStrobl, np :) and as for close-port - ideally in the stop hook
[13:18] <JoshStrobl> dimitern, yea I figured that much :D
[13:20] <dimitern> mgz, hey, it seems the merge bot has some issues today - multiple failures, timeouts..
[13:22] <ppetraki>  /join ##cacheio badblocks
[14:38] <ericsnow> mgz: you around?
[17:17] <arosales> fyi bug https://bugs.launchpad.net/juju-core/+bug/1322705 still is not targeted.
[17:17] <mup> Bug #1322705: juju help does not contain Joyent help information <joyent-provider> <ui> <juju-core:Triaged> <https://launchpad.net/bugs/1322705>
[17:17] <arosales> I'll see if I can get a branch proposal out
[23:02] <waigani_> thumper: standup?
[23:02] <thumper> coming
[23:23] <thumper> wallyworld: morning
[23:23] <wallyworld> thumper: hey
[23:23] <thumper> wallyworld: something to start your day :-) https://bugs.launchpad.net/juju-core/+bug/1361216
[23:23] <mup> Bug #1361216: unit tests for all series and archs fail <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1361216>
[23:23] <wallyworld> sigh
[23:23] <thumper> wallyworld: it isn't a sudden change
[23:24] <thumper> wallyworld: it is JujuConnSuite, replica set, and cmd/juju tests
[23:24] <thumper> three different issues
[23:24] <wallyworld> yep
[23:24] <wallyworld> i can't fix them all
[23:24] <wallyworld> it needs a whole of team approach
[23:25] <thumper> I think we could fix many by changing cmd/juju tests to use a mock
[23:25] <thumper> or mocks
[23:25] <wallyworld> you think?
[23:25] <thumper> yes
[23:25] <thumper> most of the cmd/juju tests do too much
[23:25] <wallyworld> that was sarcasm
[23:25] <thumper> they are end to end tests
[23:25] <thumper> oh
[23:25] <thumper> sorry
[23:25] <thumper> hard to tell
[23:25] <wallyworld> yeah
[23:26] <wallyworld> i'm just pissed off our tests are so bad
[23:26] <wallyworld> why oh why did anyone think it's a good idea to bring up the whole stack to run unit tests
[23:27] <wallyworld> i guess the same reason why we interleave mongo throughout all of our business logic :-(