[00:35] hey, anyone seen menn0?
[00:57] perrito666: he is on holiday this week
[00:57] such is my luck
[00:57] thank you thumper
[00:57] np
[01:05] thumper: wallyworld something is screwed with the cmd/juju tests
[01:05] 1,000's of goroutines all waiting on some semaphore to send the 404 message back to the client
[01:06] is this new? or more likely they've been screwed for a while
[01:07] not new
[01:07] but really getting worse
[01:07] who knows
[01:07] what spec is the machine you are booting for running CI tests ?
[01:14] wallyworld: hey, was TestBootstrapNoTools failing?
[01:14] ah, I see a bug #
[01:15] axw: yeah, on i386 and ppc64
[01:15] davecheney: it's a m3.large i think
[01:15] sorry, should've run them all with i386 first
[01:15] np
[01:15] axw: i think that test is obsolete now anyway
[01:15] wallyworld: ok, that shouldn't be penalised for running cpu intensive jobs
[01:16] i know the smalls and mediums get heavily penalised actually using the cpu
[01:16] aws reduces their priority?
[01:18] yup
[01:18] spin up a small or a medium
[01:18] run mpstat
[01:19] and look at the %steal column
[01:19] can be up to 80% on t1.smalls
[01:19] the more cpu you use, the more you are penalised
[01:19] so when you REALLY need to go fast, you go the slowest
[01:19] yay, cloud
[01:22] wallyworld: when cmd/juju does pass, it passed by the barest of margins
[01:22] ok github.com/juju/juju/cmd/juju 597.325s
[01:22] 2.675 seconds and this build would have failed
[01:22] yep
[01:23] if i had a magic wand to fix the tests i'd wave it
[01:23] go test -test.timeout=900s github.com/juju/juju/...
[01:23] possibly the option needs to go at the end
[01:23] but there is something horribly wrong with that cmd/juju test
[01:24] yeah, it goes end-to-end when it doesn't need to
[01:27] axw: do you know the name of the test off the top of your head
[01:27] i noticed it's using io.Pipe, not os.Pipe
[01:28] some buffering may help
[01:28] davecheney: which test?
[01:28] 11:23 < davecheney> but there is something horribly wrong with that cmd/juju test
[01:28] 11:24 < axw> yeah, it goes end-to-end when it doesn't need to
[01:28] I was referring to the entire package
[01:28] :(
[01:28] davecheney: so many of our tests suck - they are not unit tests
[01:28] there's no easy fix
[01:29] wallyworld: +1,000,000 to the 'not unit test' comment
[01:29] yep :-(
[01:29] doesn't help that mongo is *everywhere*
[01:30] we don't even need the API server in the mix, let alone mongo
[01:30] (for cmd/juju)
[01:32] yup, the mongo comment was a general lament
[01:35] ok folks I'll go finish my sunday
[01:36] btw if someone wants to review https://github.com/juju/juju/pull/530 which has only been reviewed by junior reviewers and is in need of being merged
[01:37] cheers
[01:43] davecheney: https://github.com/juju/juju/pull/570
[01:58] thumper: /me looks
[01:59] thumper: this all looks rather uncontroversial
[01:59] davecheney: it should be :)
[02:15] thumper: what version should the upgrade step target (adding state users as environ users)
[02:15] waigani: 1.21.alpha1
[02:16] thumper: a 'get all users' function is not jumping out at me, does one exist (I've got admin)?
[02:37] waigani: I don't think there is one yet
[02:38] waigani: i don't think one exists
[02:38] we only have one user
[02:38] so you sort of knew the answer
[02:38] in that case is it okay to just add admin to environ in the upgrade step?
[02:40] as long as you don't call them "admin"
[02:42] sigh
[02:43] waigani: no... there is an example with the upgrade step
[02:43] waigani: already in the code that iterates through every user
[02:43] waigani: it updates the last connection thingy
[02:43] waigani: so just do something similar
[02:43] thumper: okay, I'll take a look
[02:43] davecheney: and I'm dealing with the admin name :)
[02:44] davecheney, thumper: what are we calling admin once it's added as an environment user?
[02:44] oh just saw your comment thumper, so I'll leave it admin for now?
[02:44] waigani: what do you mean?
[02:44] don't alter the existing user, just connect to the environment by adding an EnvUser
[02:44] thumper: just trying to grok davecheney's comment about admin name
[02:45] * thumper takes a deep breath and fixes state.Initialize
[02:46] waigani: the initial user of an environment is "admin"
[03:01] wallyworld__: thumper so, bad news
[03:02] there is no stand out test in cmd/juju
[03:02] they are all slow
[03:02] well they are fast
[03:02] milliseconds
[03:02] but the setup for each test is a few seconds each
[03:03] that's a common failing across most of our tests because they are not true unit tests
[03:03] * thumper agrees
[03:03] waigani: can't hear you
[03:03] thumper: shit sorry, hangon
[03:29] axw: when you have a moment, a fairly small one https://github.com/juju/juju/pull/598
[03:29] wallyworld__: yup, looking
[03:33] axw: thanks, yeah, i'm only doing this for 1.20
[03:35] wallyworld__: cool
[03:35] sorry, missed simplification until after LGTM, but it doesn't really matter
[03:39] davecheney: your branch landed \o/
[03:39] mine is running now
[03:55] thumper: usersC.Find(nil).All(&users) finds one zero valued user, yet st.User("admin") finds the admin user?
[04:00] waigani: then it will be a different query to list them all
[04:00] perhaps you just have to specify something that matches all possible ones
[04:02] axw: do you have a few minutes to talk through a problem?
[04:02] thumper: in a few mins, lunch is about to come out of the oven
[04:03] axw: ok, ping me when you have 10 minutes or so... I'll pause and do something else for now
[04:13] thumper: ready
[04:14] axw: https://plus.google.com/hangouts/_/grprl36idkwixt2q2cpajgeip4a?authuser=1&hl=en
[04:37] thumper: is menno around today?
[04:37] waigani: menno is on leave this week
[04:37] ok, ta
[04:38] he should have marked it on the calendar :-)
[04:39] yeah, I'll go add it
[04:50] thumper: finally
[04:50] looking into just getting a little more out of the cmd/juju tests now
[04:50] alone they take 400s on my machine
[04:50] running with others they can take > 600
[05:10] wallyworld_: I have a fix for the slowness of cmd/juju
[05:10] you probably won't like it
[05:10] maybe :-)
[05:11] wallyworld_: do you want me to tell you what i've done
[05:11] or just send a PR
[05:11] tl;dr; version?
[05:12] i've moved some of the tests into another package
[05:12] which package?
[05:12] cmd/juju -> cmd/juju/test
[05:13] doesn't that just move the problem?
[05:13] the problem is the package takes > 600s to test
[05:13] split the package up
[05:13] each part takes less time
[05:13] has this been causing failed builds?
[05:13] yes
[05:14] well there is the usual nonsense with the repl sets
[05:14] but over the weekend cmd/juju has constantly been taking > 600 seconds
[05:14] it takes 400s on my machine uncontended
[05:14] and close to 550 with other tests running in parallel
[05:14] we could simply tweak the test timeout until the tests can be fixed
[05:14] i'm proposing my solution
[05:14] you're welcome to nack it
[05:14] ok
[05:15] but in my experience raising timeouts only leads to raising timeouts again
[05:15] and again
[05:15] and again
[05:16] well, we need to fix the test
[05:16] moving them just messes up the code base
[05:18] axw: remind me - to log onto juju's mongo, the password is recorded in the jenv file. yet mongo --ssl -u admin -p xxxx --port 37017 fails with an auth error
[05:19] wallyworld_: i didn't say it was a perfect solution
[05:19] but it solves the problem we have today
[05:19] where it takes two days to land a branch
[05:19] so does increasing the timeout
[05:19] without any churn to the code
[05:20] wallyworld_: i'll propose my solution, you can nack it
[05:20] two days?
[05:20] wallyworld_: try the hash of the password
[05:20] axw: sha256?
[05:21] wallyworld_: it took me friday, saturday and this morning to land my names branch
[05:21] it failed 7 times
[05:21] wallyworld_: I don't know what the hash type is, it's stored in the bootstrap agent's conf file IIRC
[05:21] davecheney: timeouts? i've never had any failure due to timeouts like that
[05:21] wallyworld_: what are you trying to do?
[05:21] axw: i want to log in and look at the collections
[05:22] wallyworld_: this is what tim pinged you about this morning
[05:22] go check the build dashboard
[05:22] wallyworld_: https://github.com/kapilt/juju-dbinspect
[05:22] dozens of red builds all from cmd/juju timing out
[05:25] i see one red dot because of that
[05:25] maybe the other timeouts caused -p 2 to run and those worked
[05:27] wallyworld_: those are the timeouts i'm talking about
[05:27] and the reason why the build times have gone from 18 minutes to an hour
[05:27] in the last week
[05:27] we need to identify the root cause and fix that
[05:28] something must have changed to make the tests start running so long
[05:28] i noted on this channel 2 months ago that the times for cmd/juju were growing
[05:28] they have now passed the 10 minute mark
[05:28] this is the result
[05:28] every SINGLE test case in cmd/juju takes 3-4 seconds to setup
[05:29] every time we add a new command or option
[05:29] boom another 3-5 seconds gone
[05:33] yep, so there's a systemic problem with those tests
[05:34] moving stuff to a new package is simply rearranging the deck chairs on the titanic
[05:49] wallyworld_: no argument there
[05:49] but it has blocked landing anything for a week now
[05:49] not blocked
[05:49] (do you want to see the graph again)
[05:49] yes blocked
[05:49] landings have occurred
[05:50] my change took all weekend to land
[05:50] only one was blocked
[05:50] the tests have gone from 18 minutes to an hour
[05:50] only one red dot was attributable to this problem
[05:50] because of this
[05:50] so, instead of being able to land 3 changes per hour
[05:50] we can land less than one
[05:50] hows that for productivity ?
[05:50] that's not blocked by definition
[05:50] my change is downright horrible
[05:50] but as nobody else is stepping up to address this issue
[05:50] so we can work around it by running with -p 2 to start with
[05:50] i stand by it
[05:51] we need to draw a line in the sand and just fix the freaking tests
[05:53] just as curtis has now started being firm about blocking landings for regressions, we also need to take a first stance
[05:53] firm
=== uru-afk is now known as urulama
[05:54] fair enough
[05:54] i don't know how to resolve this situation
[06:00] someone needs to look into what happens at test startup and determine how to better mock out the backend
[06:17] axw: i gotta go get my son and take him to doctor, will look at your branch when i get back
[06:18] wallyworld_: cheers
[06:18] it's a big one...
[06:21] that's what she said
[07:01] jam, hey, sorry I'll be 10m late for 1-1
[07:01] dimitern: np
[07:01] thanks for letting me know
[07:53] morning
[08:22] morning TheMue
[08:22] mgz: if you're still doing reviews today: https://github.com/juju/juju/pull/601
[08:48] jam, https://github.com/juju/juju/pull/601 LGTM
[08:51] axw: i wanted to get another fix done for 1.20. i can look at your branch now. if you have time, maybe you could look at https://github.com/juju/juju/pull/602
[08:51] wallyworld_: sure, looking
[08:52] axw: i'm not 100% sure it will fix the issue - the fix is based on reading the code
[08:54] wallyworld_: we shouldn't be starting the container provisioner until after the upgrades have finished, right?
[08:55] axw: it starts after the upgrade steps worker yes, but the upgrade work starts in parallel with it
[08:55] it's a bit confusing
[08:55] the upgrade work listens for upgrade requests
[08:55] wallyworld_: huh? how can it start after and in parallel?
[08:56] the upgrade steps worker does the upgrades
[08:56] there's 2 workers - upgrade and upgrade steps
[08:56] wallyworld_: ah ok. after upgrade steps, in parallel with upgrader
[08:56] yes, that's what it looked like
[08:57] and if there were no upgrade steps, it's all a bit of a race to see who starts first
[09:56] good morning everyone, I am OCR today along with mgz feel free to ask for reviews, I'll be taking a look at the queue anyway
[10:28] wallyworld_: this looks like a spurious failure to me: http://juju-ci.vapour.ws:8080/job/github-merge-juju/411/console
[10:28] I'll dig into it
[10:28] but in case you wanted it around for reference
[10:29] jam: thanks. it's on my todo list this week to get these documented. our tests need work
[10:29] this isn't very helpful:
[10:29] goroutine 17700 [running]:
[10:29] goroutine running on other thread; stack unavailable
[10:29] created by launchpad.net/gocheck.(*suiteRunner).forkCall
[10:29] /home/ubuntu/juju-core_1.21-alpha1/src/launchpad.net/gocheck/gocheck.go:631 +0x23f
[10:29] the only thing "running" is trying to fork something
[10:33] wallyworld_: the second failure I don't fully understand, as it seems like maybe something didn't clean up in time and stayed bound to a port we thought we wanted to use in the next test
[10:36] jam: it seems we have all sorts of isolation issues, plus issues with mongo startup failing at various times. sadly, many of our unit tests aren't really unit tests
[10:46] dimitern: standup ?
[10:47] jam, omw
[11:00] TheMue: you weren't supposed to change your networking
[12:56] is there anyone besides menno that is familiar with upgrade mode?
[12:59] perrito666, wallyworld_ afaik, but he might be off already
[12:59] tx dimitern
[12:59] * wallyworld_ is sorta here
[13:00] menno did all of the upgrade mode stuff though
[13:00] is there a specific question?
[13:01] wallyworld_, I see, ok it seems my knowledge is a bit out of date :)
[13:01] dimitern: i did the initial upgrade work, but tim's team has since taken it on
[13:01] wallyworld_: not really I am implementing a "restore mode" and william told me to discuss with menno to make sure I don't reinvent the wheel
[13:02] seems like there will be overlap there, or similar restrictions anyway
[13:03] perrito666, wallyworld_, what's a "mode"? a runner that runs its workers with delay from the rest?
[13:03] or more like a uniter mode
[13:03] wallyworld_: there will certainly be
[13:04] dimitern: it is an arbitrary term I believe
[13:04] :) ah
[13:04] in the case of upgrade
[13:04] it's a state of the API server where it rejects most requests
[13:04] dimitern: at a high level, it means that the state server will reject connections while an upgrade is still running
[13:04] with an error indicating it's upgrading
[13:04] wallyworld_, perrito666, I see, thanks guys!
[13:05] there's so much to keep track of
[13:05] dimitern: restore mode is to do something very similar
[13:05] yeah, just keeping all the far and wide networking effort going pretty much leaves me in the dark about what's going on with the other teams :/
[13:06] yup, same here
[13:09] What hooks would be the appropriate place to call open-port? I'm guessing config-changed, start, install, etc. and close-port should be called on ./stop?
[13:15] JoshStrobl, in practice it doesn't matter - just do it (in a hook) before you need to use it; ideally both at config-changed time (close-port first, open-port after that)
[13:15] s/at//
[13:17] sorry, that should've been s/ideally both at/ideally at/
[13:17] dimitern, thanks :)
[13:17] JoshStrobl, np :) and as for close-port - ideally in the stop hook
[13:18] dimitern, yea I figured that much :D
[13:20] mgz, hey, it seems the merge bot has some issues today - multiple failures, timeouts..
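A rough sketch of the "mode" idea described at 13:04 (an API server that rejects most requests while an upgrade or restore is running). This is not juju's implementation: the gate type, the method names, the allow-list, and the error text are all hypothetical and only illustrate the shape of such a check.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errRestricted is a made-up error; the real server's message may differ.
var errRestricted = errors.New("upgrade in progress - only limited API calls are allowed")

// gate (hypothetical) tracks whether the server is in a restricted mode
// and which request names are still permitted while it is.
type gate struct {
	restricted int32 // 1 while an upgrade or restore is running
	allowed    map[string]bool
}

// setRestricted switches the restricted mode on or off.
func (g *gate) setRestricted(on bool) {
	var v int32
	if on {
		v = 1
	}
	atomic.StoreInt32(&g.restricted, v)
}

// check returns an error for any request that is not allow-listed
// while the gate is in restricted mode.
func (g *gate) check(method string) error {
	if atomic.LoadInt32(&g.restricted) == 1 && !g.allowed[method] {
		return errRestricted
	}
	return nil
}

func main() {
	g := &gate{allowed: map[string]bool{"Status": true}}
	g.setRestricted(true)
	fmt.Println("Status:", g.check("Status"))
	fmt.Println("Deploy:", g.check("Deploy"))
	g.setRestricted(false)
	fmt.Println("Deploy after upgrade:", g.check("Deploy"))
}
```

A restore mode could plausibly reuse the same check with a different allow-list, which is the overlap mentioned in the conversation.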
[13:22] /join ##cacheio badblocks
=== HankM00dy is now known as thehe
=== psivaa_ is now known as psivaa
[14:38] mgz: you around?
=== Ursinha is now known as Ursinha-afk
=== jrwren_ is now known as jrwren
[17:17] fyi bug https://bugs.launchpad.net/juju-core/+bug/1322705 still is not targeted.
[17:17] Bug #1322705: juju help does not contain Joyent help information
[17:17] I'll see if I can get a branch proposal out
=== Ursinha-afk is now known as Ursinha
=== Ursinha is now known as Ursinha-afk
=== Ursinha-afk is now known as Ursinha
=== mwhudson_ is now known as mwhudson
=== JoshStrobl is now known as JoshStrobl[ZZZ]
[23:02] thumper: standup?
[23:02] coming
[23:23] wallyworld: morning
[23:23] thumper: hey
[23:23] wallyworld: something to start your day :-) https://bugs.launchpad.net/juju-core/+bug/1361216
[23:23] Bug #1361216: unit tests for all series and archs fail
[23:23] sigh
[23:23] wallyworld: it isn't a sudden change
[23:24] wallyworld: it is JujuConnSuite, replica set, and cmd/juju tests
[23:24] three different issues
[23:24] yep
[23:24] i can't fix them all
[23:24] it needs a whole of team approach
[23:25] I think we could fix many by changing cmd/juju tests to use a mock
[23:25] or mocks
[23:25] you think?
[23:25] yes
[23:25] most of the cmd/juju tests do too much
[23:25] that was sarcasm
[23:25] they are end to end tests
[23:25] oh
[23:25] sorry
[23:25] hard to tell
[23:25] yeah
[23:26] i'm just pissed off our tests are so bad
[23:26] why oh why did anyone think it's a good idea to bring up the whole stack to run unit tests
[23:27] i guess the same reason why we interleave mongo throughout all of our business logic :-(