perrito666 | hey, anyone seen menn0? | 00:35 |
---|---|---|
thumper | perrito666: he is on holiday this week | 00:57 |
perrito666 | such is my luck | 00:57 |
perrito666 | thank you thumper | 00:57 |
thumper | np | 00:57 |
davecheney | thumper: wallyworld something is screwed with the cmd/juju tests | 01:05 |
davecheney | 1,000's of goroutines all waiting on some semaphore to send the 404 message back to the client | 01:05 |
wallyworld | is this new? or more likely they've been screwed for a while | 01:06 |
davecheney | not new | 01:07 |
davecheney | but really getting worse | 01:07 |
davecheney | who knows | 01:07 |
davecheney | what spec is the machine you are booting for running CI tests ? | 01:07 |
axw | wallyworld: hey, was TestBootstrapNoTools failing? | 01:14 |
axw | ah, I see a bug # | 01:14 |
wallyworld | axw: yeah, on i386 and ppc64 | 01:15 |
wallyworld | davecheney: it's a m3.large i think | 01:15 |
axw | sorry, should've run them all with i386 first | 01:15 |
wallyworld | np | 01:15 |
wallyworld | axw: i think that test is obsolete now anyway | 01:15 |
davecheney | wallyworld: ok, that shouldn't be penalised for running cpu intensive jobs | 01:15 |
davecheney | i know the smalls and mediums get heavily penalised for actually using the cpu | 01:16 |
wallyworld | aws reduces their priority? | 01:16 |
davecheney | yup | 01:18 |
davecheney | spin up a small or a medium | 01:18 |
davecheney | run mpstat | 01:18 |
davecheney | and look at the %steal column | 01:19 |
davecheney | can be up to 80% on t1.smalls | 01:19 |
davecheney | the more cpu you use, the more you are penalised | 01:19 |
davecheney | so when you REALLY need to go fast, you go the slowest | 01:19 |
davecheney | yay, cloud | 01:19 |
davecheney | wallyworld: when cmd/juju does pass, it passed by the barest of margins | 01:22 |
davecheney | ok github.com/juju/juju/cmd/juju 597.325s | 01:22 |
davecheney | 2.675 seconds and this build would have failed | 01:22 |
wallyworld | yep | 01:22 |
wallyworld | if i had a magic wand to fix the tests i'd wave it | 01:23 |
davecheney | go test -test.timeout=900s github.com/juju/juju/... | 01:23 |
davecheney | possibly the option needs to go at the end | 01:23 |
davecheney | but there is something horribly wrong with that cmd/juju test | 01:23 |
axw | yeah, it goes end-to-end when it doesn't need to | 01:24 |
davecheney | axw: do you know the name of the test off the top of your head | 01:27 |
davecheney | i noticed it's using io.Pipe, not os.Pipe | 01:27 |
davecheney | some buffering may help | 01:28 |
axw | davecheney: which test? | 01:28 |
davecheney | 11:23 < davecheney> but there is something horribly wrong with that cmd/juju test | 01:28 |
davecheney | 11:24 < axw> yeah, it goes end-to-end when it doesn't need to | 01:28 |
axw | I was referring to the entire package | 01:28 |
davecheney | :( | 01:28 |
wallyworld | davecheney: so many of our tests suck - they are not unit tests | 01:28 |
wallyworld | there's no easy fix | 01:28 |
davecheney | wallyworld: +1,000,000 to the 'not unit test' comment | 01:29 |
wallyworld | yep :-( | 01:29 |
wallyworld | doesn't help that mongo is *everywhere* | 01:29 |
axw | we don't even need the API server in the mix, let alone mongo | 01:30 |
axw | (for cmd/juju) | 01:30 |
wallyworld | yup, the mongo comment was a general lament | 01:32 |
perrito666 | ok folks I'll go finish my sunday | 01:35 |
perrito666 | btw if someone wants to review https://github.com/juju/juju/pull/530 which has only been reviewed by junior reviewers and is in need of being merged | 01:36 |
perrito666 | cheers | 01:37 |
thumper | davecheney: https://github.com/juju/juju/pull/570 | 01:43 |
davecheney | thumper: /me looks | 01:58 |
davecheney | thumper: this all looks rather uncontroversial | 01:59 |
thumper | davecheney: it should be :) | 01:59 |
waigani | thumper: what version should the upgrade step target (adding state users as environ users) | 02:15 |
thumper | waigani: 1.21.alpha1 | 02:15 |
waigani | thumper: a 'get all users' function is not jumping out at me, does one exist (I've got admin)? | 02:16 |
thumper | waigani: I don't think there is one yet | 02:37 |
davecheney | waigani: i don't think one exists | 02:38 |
davecheney | we only have one user | 02:38 |
davecheney | so you sort of knew the answer | 02:38 |
waigani | in that case is it okay to just add admin to environ in the upgrade step? | 02:38 |
davecheney | as long as you don't call them "admin" | 02:40 |
waigani | sigh | 02:42 |
thumper | waigani: no... there is an example with the upgrade step | 02:43 |
thumper | waigani: already in the code that iterates through every user | 02:43 |
thumper | waigani: it updates the last connection thingy | 02:43 |
thumper | waigani: so just do something similar | 02:43 |
waigani | thumper: okay, I'll take a look | 02:43 |
thumper | davecheney: and I'm dealing with the admin name :) | 02:43 |
waigani | davecheney, thumper: what are we calling admin once it's added as an environment user? | 02:44 |
waigani | oh just saw your comment thumper, so I'll leave it admin for now? | 02:44 |
thumper | waigani: what do you mean? | 02:44 |
thumper | don't alter the existing user, just connect to the environment by adding an EnvUser | 02:44 |
waigani | thumper: just trying to grok davecheney's comment about admin name | 02:44 |
* thumper takes a deep breath and fixes state.Initialize | 02:45 | |
davecheney | waigani: the initial user of an environment is "admin" | 02:46 |
davecheney | wallyworld__: thumper so, bad news | 03:01 |
davecheney | there is no stand out test in cmd/juju | 03:02 |
davecheney | they are all slow | 03:02 |
davecheney | well they are fast | 03:02 |
davecheney | milliseconds | 03:02 |
davecheney | but the setup for each test is a few seconds each | 03:02 |
wallyworld__ | that's a common failing across most of our tests because they are not true unit tests | 03:03 |
* thumper agrees | 03:03 | |
thumper | waigani: can't hear you | 03:03 |
waigani | thumper: shit sorry, hangon | 03:03 |
wallyworld__ | axw: when you have a moment, a fairly small one https://github.com/juju/juju/pull/598 | 03:29 |
axw | wallyworld__: yup, looking | 03:29 |
wallyworld__ | axw: thanks, yeah, i'm only doing this for 1.20 | 03:33 |
axw | wallyworld__: cool | 03:35 |
axw | sorry, missed simplification until after LGTM, but it doesn't really matter | 03:35 |
thumper | davecheney: your branch landed \o/ | 03:39 |
thumper | mine is running now | 03:39 |
waigani | thumper: usersC.Find(nil).All(&users) finds one zero valued user, yet st.User("admin") finds the admin user? | 03:55 |
thumper | waigani: then it will be a different query to list them all | 04:00 |
thumper | perhaps you just have to specify something that matches all possible ones | 04:00 |
thumper | axw: do you have a few minutes to talk through a problem? | 04:02 |
axw | thumper: in a few mins, lunch is about to come out of the oven | 04:02 |
thumper | axw: ok, ping me when you have 10 minutes or so... I'll pause and do something else for now | 04:03 |
axw | thumper: ready | 04:13 |
thumper | axw: https://plus.google.com/hangouts/_/grprl36idkwixt2q2cpajgeip4a?authuser=1&hl=en | 04:14 |
wallyworld_ | thumper: is menno around today? | 04:37 |
thumper | waigani: menno is on leave this week | 04:37 |
wallyworld_ | ok, ta | 04:37 |
wallyworld_ | he should have marked it on the calendar :-) | 04:38 |
thumper | yeah, I'll go add it | 04:39 |
davecheney | thumper: finally | 04:50 |
davecheney | looking into just getting a little more out of the cmd/juju tests now | 04:50 |
davecheney | alone they take 400s on my machine | 04:50 |
davecheney | running with others they can take > 600 | 04:50 |
davecheney | wallyworld_: I have a fix for the slowness of cmd/juju | 05:10 |
davecheney | you probably won't like it | 05:10 |
wallyworld_ | maybe :-) | 05:10 |
davecheney | wallyworld_: do you want me to tell you what i've done | 05:11 |
davecheney | or just send a PR | 05:11 |
wallyworld_ | tl;dr; version? | 05:11 |
davecheney | i've moved some of the tests into another package | 05:12 |
wallyworld_ | which package? | 05:12 |
davecheney | cmd/juju -> cmd/juju/test | 05:12 |
wallyworld_ | doesn't that just move the problem? | 05:13 |
davecheney | the problem is the package takes > 600s to test | 05:13 |
davecheney | split the package up | 05:13 |
davecheney | each part takes less time | 05:13 |
wallyworld_ | has this been causing failed builds? | 05:13 |
davecheney | yes | 05:13 |
davecheney | well there is the usual nonsense with the repl sets | 05:14 |
davecheney | but over the weekend cmd/juju has constantly been taking > 600 seconds | 05:14 |
davecheney | it takes 400s on my machine uncontended | 05:14 |
davecheney | and close to 550 with other tests running in parallel | 05:14 |
wallyworld_ | we could simply tweak the test timeout until the tests can be fixed | 05:14 |
davecheney | i'm proposing my solution | 05:14 |
davecheney | you're welcome to nack it | 05:14 |
wallyworld_ | ok | 05:14 |
davecheney | but in my experience raising timeouts only leads to raising timeouts again | 05:15 |
davecheney | and again | 05:15 |
davecheney | and again | 05:15 |
wallyworld_ | well, we need to fix the test | 05:16 |
wallyworld_ | moving them just messes up the code base | 05:16 |
wallyworld_ | axw: remind me - to log onto juju's mongo, the password is recorded in the jenv file. yet mongo --ssl -u admin -p xxxx --port 37017 fails with an auth error | 05:18 |
davecheney | wallyworld_: i didn't say it was a perfect solution | 05:19 |
davecheney | but it solves the problem we have today | 05:19 |
davecheney | where it takes two days to land a branch | 05:19 |
wallyworld_ | so does increasing the timeout | 05:19 |
wallyworld_ | without any churn to the code | 05:19 |
davecheney | wallyworld_: i'll propose my solution, you can nack it | 05:20 |
wallyworld_ | two days? | 05:20 |
axw | wallyworld_: try the hash of the password | 05:20 |
wallyworld_ | axw: sha256? | 05:20 |
davecheney | wallyworld_: it took me friday, saturday and this morning to land my names branch | 05:21 |
davecheney | it failed 7 times | 05:21 |
axw | wallyworld_: I don't know what the hash type is, it's stored in the bootstrap agent's conf file IIRC | 05:21 |
wallyworld_ | davecheney: timeouts? i've never had any failure due to timeouts like that | 05:21 |
axw | wallyworld_: what are you trying to do? | 05:21 |
wallyworld_ | axw: i want to log in and look at the collections | 05:21 |
davecheney | wallyworld_: this is what tim pinged you about this morning | 05:22 |
davecheney | go check the build dashboard | 05:22 |
axw | wallyworld_: https://github.com/kapilt/juju-dbinspect | 05:22 |
davecheney | dozens of red builds all from cmd/juju timing out | 05:22 |
wallyworld_ | i see one red dot because of that | 05:25 |
wallyworld_ | maybe the other timeouts caused -p 2 to run and those worked | 05:25 |
davecheney | wallyworld_: those are the timeouts i'm talking about | 05:27 |
davecheney | and the reason why the build times have gone from 18 minutes to an hour | 05:27 |
davecheney | in the last week | 05:27 |
wallyworld_ | we need to identify the root cause and fix that | 05:27 |
wallyworld_ | something must have changed to make the tests start running so long | 05:28 |
davecheney | i noted on this channel 2 months ago that the times for cmd/juju were growing | 05:28 |
davecheney | they have now passed the 10 minute mark | 05:28 |
davecheney | this is the result | 05:28 |
davecheney | every SINGLE test case in cmd/juju takes 3-4 seconds to setup | 05:28 |
davecheney | every time we add a new command or option | 05:29 |
davecheney | boom another 3-5 seconds gone | 05:29 |
wallyworld_ | yep, so there's a systemic problem with those tests | 05:33 |
wallyworld_ | moving stuff to a new package is simply rearranging the deck chairs on the titanic | 05:34 |
davecheney | wallyworld_: no argument there | 05:49 |
davecheney | but it has blocked landing anything for a week now | 05:49 |
wallyworld_ | not blocked | 05:49 |
davecheney | (do you want to see the graph again) | 05:49 |
davecheney | yes blocked | 05:49 |
wallyworld_ | landings have occurred | 05:49 |
davecheney | my change took all weekend to land | 05:50 |
wallyworld_ | only one was blocked | 05:50 |
davecheney | the tests have gone from 18 minutes to an hour | 05:50 |
wallyworld_ | only one red dot was attributable to this problem | 05:50 |
davecheney | because of this | 05:50 |
davecheney | so, instead of being able to land 3 changes per hour | 05:50 |
davecheney | we can land less than one | 05:50 |
davecheney | how's that for productivity ? | 05:50 |
wallyworld_ | that's not blocked by definition | 05:50 |
davecheney | my change is downright horrible | 05:50 |
davecheney | but as nobody else is stepping up to address this issue | 05:50 |
wallyworld_ | so we can work around it by running with -p 2 to start with | 05:50 |
davecheney | i stand by it | 05:50 |
wallyworld_ | we need to draw a line in the sand and just fix the freaking tests | 05:51 |
wallyworld_ | just as curtis has now started being firm about blocking landings for regressions, we also need to take a first stance | 05:53 |
wallyworld_ | firm | 05:53 |
=== uru-afk is now known as urulama | ||
davecheney | fair enough | 05:54 |
davecheney | i don't know how to resolve this situation | 05:54 |
wallyworld_ | someone needs to look into what happens at test startup and determine how to better mock out the backend | 06:00 |
wallyworld_ | axw: i gotta go get my son and take him to doctor, will look at your branch when i get back | 06:17 |
axw | wallyworld_: cheers | 06:18 |
axw | it's a big one... | 06:18 |
wallyworld_ | that's what she said | 06:21 |
dimitern | jam, hey, sorry I'll be 10m late for 1-1 | 07:01 |
jam | dimitern: np | 07:01 |
jam | thanks for letting me know | 07:01 |
TheMue | morning | 07:53 |
jam | morning TheMue | 08:22 |
jam | mgz: if you're still doing reviews today: https://github.com/juju/juju/pull/601 | 08:22 |
dimitern | jam, https://github.com/juju/juju/pull/601 LGTM | 08:48 |
wallyworld_ | axw: i wanted to get another fix done for 1.20. i can look at your branch now. if you have time, maybe you could look at https://github.com/juju/juju/pull/602 | 08:51 |
axw | wallyworld_: sure, looking | 08:51 |
wallyworld_ | axw: i'm not 100% sure it will fix the issue - the fix is based on reading the code | 08:52 |
axw | wallyworld_: we shouldn't be starting the container provisioner until after the upgrades have finished, right? | 08:54 |
wallyworld_ | axw: it starts after the upgrade steps worker yes, but the upgrade work starts in parallel with it | 08:55 |
wallyworld_ | it's a bit confusing | 08:55 |
wallyworld_ | the upgrade work listens for upgrade requests | 08:55 |
axw | wallyworld_: huh? how can it start after and in parallel? | 08:55 |
wallyworld_ | the upgrade steps worker does the upgrades | 08:56 |
wallyworld_ | there's 2 workers - upgrade and upgrade steps | 08:56 |
axw | wallyworld_: ah ok. after upgrade steps, in parallel with upgrader | 08:56 |
wallyworld_ | yes, that's what it looked like | 08:56 |
wallyworld_ | and if there were no upgrade steps, it's all a bit of a race to see who starts first | 08:57 |
perrito666 | good morning everyone, I am OCR today along with mgz, feel free to ask for reviews, I'll be taking a look at the queue anyway | 09:56 |
jam | wallyworld_: this looks like a spurious failure to me: http://juju-ci.vapour.ws:8080/job/github-merge-juju/411/console | 10:28 |
jam | I'll dig into it | 10:28 |
jam | but in case you wanted it around for reference | 10:28 |
wallyworld_ | jam: thanks. it's on my todo list this week to get these documented. our tests need work | 10:29 |
jam | this isn't very helpful: | 10:29 |
jam | goroutine 17700 [running]: | 10:29 |
jam | goroutine running on other thread; stack unavailable | 10:29 |
jam | created by launchpad.net/gocheck.(*suiteRunner).forkCall | 10:29 |
jam | /home/ubuntu/juju-core_1.21-alpha1/src/launchpad.net/gocheck/gocheck.go:631 +0x23f | 10:29 |
jam | the only thing "running" is trying to fork something | 10:29 |
jam | wallyworld_: the second failure I don't fully understand, as it seems like maybe something didn't clean up in time and stayed bound to a port we thought we wanted to use in the next test | 10:33 |
wallyworld_ | jam: it seems we have all sorts of isolation issues, plus issues with mongo startup failing at various times. sadly, many of our unit tests aren't really unit tests | 10:36 |
jam | dimitern: standup ? | 10:46 |
dimitern | jam, omw | 10:47 |
jam | TheMue: you weren't supposed to change your networking | 11:00 |
perrito666 | is there anyone besides menno that is familiar with upgrade mode? | 12:56 |
dimitern | perrito666, wallyworld_ afaik, but he might be off already | 12:59 |
perrito666 | tx dimitern | 12:59 |
* wallyworld_ is sorta here | 12:59 | |
wallyworld_ | menno did all of the upgrade mode stuff though | 13:00 |
wallyworld_ | is there a specific question? | 13:00 |
dimitern | wallyworld_, I see, ok it seems my knowledge is a bit out of date :) | 13:01 |
wallyworld_ | dimitern: i did the initial upgrade work, but tim's team has since taken it on | 13:01 |
perrito666 | wallyworld_: not really, I am implementing a "restore mode" and william told me to discuss with menno to make sure I don't reinvent the wheel | 13:01 |
wallyworld_ | seems like there will be overlap there, or similar restrictions anyway | 13:02 |
dimitern | perrito666, wallyworld_, what's a "mode"? a runner that runs its workers with delay from the rest? | 13:03 |
dimitern | or more like a uniter mode | 13:03 |
perrito666 | wallyworld_: there will certainly be | 13:03 |
perrito666 | dimitern: it is an arbitrary term I believe | 13:04 |
dimitern | :) ah | 13:04 |
perrito666 | in the case of upgrade | 13:04 |
perrito666 | it's a state of the API server where it rejects most requests | 13:04 |
wallyworld_ | dimitern: at a high level, it means that the state server will reject connections while an upgrade is still running | 13:04 |
perrito666 | with an error indicating its upgrading | 13:04 |
dimitern | wallyworld_, perrito666, I see, thanks guys! | 13:04 |
wallyworld_ | there's so much to keep track of | 13:05 |
perrito666 | dimitern: restore mode is to do something very similar | 13:05 |
dimitern | yeah, just keeping all the far and wide networking effort going pretty much leaves me in the dark about what's going on with the other teams :/ | 13:05 |
wallyworld_ | yup, same here | 13:06 |
JoshStrobl | What hooks would be the appropriate place to call open-port? I'm guessing config-changed, start, install, etc. and close-port should be called on ./stop? | 13:09 |
dimitern | JoshStrobl, in practice it doesn't matter - just do it (in a hook) before you need to use it; ideally both at config-changed time (close-port first, open-port after that) | 13:15 |
dimitern | s/at// | 13:15 |
dimitern | sorry, that should've been s/ideally both at/ideally at/ | 13:17 |
JoshStrobl | dimitern, thanks :) | 13:17 |
dimitern | JoshStrobl, np :) and as for close-port - ideally in the stop hook | 13:17 |
JoshStrobl | dimitern, yea I figured that much :D | 13:18 |
dimitern | mgz, hey, it seems the merge bot has some issues today - multiple failures, timeouts.. | 13:20 |
ppetraki | /join ##cacheio badblocks | 13:22 |
=== HankM00dy is now known as thehe | ||
=== psivaa_ is now known as psivaa | ||
ericsnow | mgz: you around? | 14:38 |
=== Ursinha is now known as Ursinha-afk | ||
=== jrwren_ is now known as jrwren | ||
arosales | fyi bug https://bugs.launchpad.net/juju-core/+bug/1322705 still is not targeted. | 17:17 |
mup | Bug #1322705: juju help does not contain Joyent help information <joyent-provider> <ui> <juju-core:Triaged> <https://launchpad.net/bugs/1322705> | 17:17 |
arosales | I'll see if I can get a branch proposal out | 17:17 |
=== Ursinha-afk is now known as Ursinha | ||
=== Ursinha is now known as Ursinha-afk | ||
=== Ursinha-afk is now known as Ursinha | ||
=== mwhudson_ is now known as mwhudson | ||
=== JoshStrobl is now known as JoshStrobl[ZZZ] | ||
waigani_ | thumper: standup? | 23:02 |
thumper | coming | 23:02 |
thumper | wallyworld: morning | 23:23 |
wallyworld | thumper: hey | 23:23 |
thumper | wallyworld: something to start your day :-) https://bugs.launchpad.net/juju-core/+bug/1361216 | 23:23 |
mup | Bug #1361216: unit tests for all series and archs fail <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1361216> | 23:23 |
wallyworld | sigh | 23:23 |
thumper | wallyworld: it isn't a sudden change | 23:23 |
thumper | wallyworld: it is JujuConnSuite, replica set, and cmd/juju tests | 23:24 |
thumper | three different issues | 23:24 |
wallyworld | yep | 23:24 |
wallyworld | i can't fix them all | 23:24 |
wallyworld | it needs a whole of team approach | 23:24 |
thumper | I think we could fix many by changing cmd/juju tests to use a mock | 23:25 |
thumper | or mocks | 23:25 |
wallyworld | you think? | 23:25 |
thumper | yes | 23:25 |
thumper | most of the cmd/juju tests do too much | 23:25 |
wallyworld | that was sarcasm | 23:25 |
thumper | they are end to end tests | 23:25 |
thumper | oh | 23:25 |
thumper | sorry | 23:25 |
thumper | hard to tell | 23:25 |
wallyworld | yeah | 23:25 |
wallyworld | i'm just pissed off our tests are so bad | 23:26 |
wallyworld | why oh why did anyone think it's a good idea to bring up the whole stack to run unit tests | 23:26 |
wallyworld | i guess the same reason why we interleave mongo throughout all of our business logic :-( | 23:27 |