/srv/irclogs.ubuntu.com/2014/08/25/#juju-dev.txt

perrito666hey, anyone seen menn0?00:35
thumperperrito666: he is on holiday this week00:57
perrito666such is my luck00:57
perrito666thank you thumper00:57
thumpernp00:57
davecheneythumper: wallyworld something is screwed with the cmd/juju tests01:05
davecheney1,000's of goroutines all waiting on some semaphore to send the 404 message back to the client01:05
wallyworldis this new? or more likely they've been screwed for a while01:06
davecheneynot new01:07
davecheneybut really getting worse01:07
davecheneywho knows01:07
davecheneywhat spec is the machine you are booting for running CI tests ?01:07
axwwallyworld: hey, was TestBootstrapNoTools failing?01:14
axwah, I see a bug #01:14
wallyworldaxw: yeah, on i386 and ppc6401:15
wallyworlddavecheney: it's a m3.large i think01:15
axwsorry, should've run them all with i386 first01:15
wallyworldnp01:15
wallyworldaxw: i think that test is obsolete now anyway01:15
davecheneywallyworld: ok, that shouldn't be penalised for running cpu intensive jobs01:15
davecheneyi know the smalls and mediums get heavily penalised actually using the cpu01:16
wallyworldaws reduces their priority?01:16
davecheneyyup01:18
davecheneyspin up a small or a medium01:18
davecheneyrun mpstat01:18
davecheneyand look at the %steal column01:19
davecheneycan be up to 80% on t1.smalls01:19
davecheneythe more cpu you use, the more you are penalised01:19
davecheneyso when you REALLY need to go fast, you go the slowest01:19
davecheneyyay, cloud01:19
davecheneywallyworld: when cmd/juju does pass, it passes by the barest of margins01:22
davecheneyok  github.com/juju/juju/cmd/juju 597.325s01:22
davecheney2.675 seconds and this build would have failed01:22
wallyworldyep01:22
wallyworldif i had a magic wand to fix the tests i'd wave it01:23
davecheneygo test -test.timeout=900s github.com/juju/juju/...01:23
davecheneypossibly the option needs to go at the end01:23
davecheneybut there is something horribly wrong with that cmd/juju test01:23
axwyeah, it goes end-to-end when it doesn't need to01:24
davecheneyaxw: do you know the name of the test off the top of your head01:27
davecheneyi noticed it's using io.Pipe, not os.Pipe01:27
davecheneysome buffering may help01:28
axwdavecheney: which test?01:28
davecheney11:23 < davecheney> but there is something horribly wrong with that cmd/juju test01:28
davecheney11:24 < axw> yeah, it goes end-to-end when it doesn't need to01:28
axwI was referring to the entire package01:28
davecheney:(01:28
wallyworlddavecheney: so many of our tests suck - they are not unit tests01:28
wallyworldthere's no easy fix01:28
davecheneywallyworld: +1,000,000 to the 'not unit test' comment01:29
wallyworldyep :-(01:29
wallyworlddoesn't help that mongo is *everywhere*01:29
axwwe don't even need the API server in the mix, let alone mongo01:30
axw(for cmd/juju)01:30
wallyworldyup, the mongo comment was a general lament01:32
perrito666ok folks I'll go finish my sunday01:35
perrito666btw if someone wants to review https://github.com/juju/juju/pull/530 which has only been reviewed by junior reviewers and is in need of being merged01:36
perrito666cheers01:37
thumperdavecheney: https://github.com/juju/juju/pull/57001:43
davecheneythumper: /me looks01:58
davecheneythumper: this all looks rather uncontroversial01:59
thumperdavecheney: it should be :)01:59
waiganithumper: what version should the upgrade step target (adding state users as environ users)02:15
thumperwaigani: 1.21.alpha102:15
waiganithumper: a 'get all users' function is not jumping out at me, does one exist (I've got admin)?02:16
thumperwaigani: I don't think there is one yet02:37
davecheneywaigani: i don't think one exists02:38
davecheneywe only have one user02:38
davecheneyso you sort of knew the answer02:38
waiganiin that case is it okay to just add admin to environ in the upgrade step?02:38
davecheneyas long as you don't call them "admin"02:40
waiganisigh02:42
thumperwaigani: no... there is an example with the upgrade step02:43
thumperwaigani: already in the code that iterates through every user02:43
thumperwaigani: it updates the last connection thingy02:43
thumperwaigani: so just do something similar02:43
waiganithumper: okay, I'll take a look02:43
thumperdavecheney: and I'm dealing with the admin name :)02:43
waiganidavecheney, thumper: what are we calling admin once it's added as an environment user?02:44
waiganioh just saw your comment thumper, so I'll leave it admin for now?02:44
thumperwaigani: what do you mean?02:44
thumperdon't alter the existing user, just connect to the environment by adding an EnvUser02:44
waiganithumper: just trying to grok davecheney's comment about admin name02:44
* thumper takes a deep breath and fixes state.Initialize02:45
davecheneywaigani: the initial user of an environment is "admin"02:46
davecheneywallyworld__: thumper so, bad news03:01
davecheneythere is no stand out test in cmd/juju03:02
davecheneythey are all slow03:02
davecheneywell they are fast03:02
davecheneymilliseconds03:02
davecheneybut the setup for each test is a few seconds each03:02
wallyworld__that's a common failing across most of our tests because they are not true unit tests03:03
* thumper agrees03:03
thumperwaigani: can't hear you03:03
waiganithumper: shit sorry, hangon03:03
wallyworld__axw: when you have a moment, a fairly small one https://github.com/juju/juju/pull/59803:29
axwwallyworld__: yup, looking03:29
wallyworld__axw: thanks, yeah, i'm only doing this for 1.2003:33
axwwallyworld__: cool03:35
axwsorry, missed simplification until after LGTM, but it doesn't really matter03:35
thumperdavecheney: your branch landed \o/03:39
thumpermine is running now03:39
waiganithumper: usersC.Find(nil).All(&users) finds one zero valued user, yet st.User("admin") finds the admin user?03:55
thumperwaigani: then it will be a different query to list them all04:00
thumperperhaps you just have to specify something that matches all possible ones04:00
thumperaxw: do you have a few minutes to talk through a problem?04:02
axwthumper: in a few mins, lunch is about to come out of the oven04:02
thumperaxw: ok, ping me when you have 10 minutes or so... I'll pause and do something else for now04:03
axwthumper: ready04:13
thumperaxw: https://plus.google.com/hangouts/_/grprl36idkwixt2q2cpajgeip4a?authuser=1&hl=en04:14
wallyworld_thumper: is menno around today?04:37
thumperwaigani: menno is on leave this week04:37
wallyworld_ok, ta04:37
wallyworld_he should have marked it on the calendar :-)04:38
thumperyeah, I'll go add it04:39
davecheneythumper: finally04:50
davecheneylooking into just getting a little more out of the cmd/juju tests now04:50
davecheneyalone they take 400s on my machine04:50
davecheneyrunning with others they can take > 60004:50
davecheneywallyworld_: I have a fix for the slowness of cmd/juju05:10
davecheneyyou probably won't like it05:10
wallyworld_maybe :-)05:10
davecheneywallyworld_: do you want me to tell you what i've done05:11
davecheneyor just send a PR05:11
wallyworld_tl;dr; version?05:11
davecheneyi've moved some of the tests into another package05:12
wallyworld_which package?05:12
davecheneycmd/juju -> cmd/juju/test05:12
wallyworld_doesn't that just move the problem?05:13
davecheneythe problem is the package takes > 600s to test05:13
davecheneysplit the package up05:13
davecheneyeach part takes less time05:13
wallyworld_has this been causing failed builds?05:13
davecheneyyes05:13
davecheneywell there is the usual nonsense with the repl sets05:14
davecheneybut over the weekend cmd/juju has constantly been taking > 600 seconds05:14
davecheneyit takes 400s on my machine uncontended05:14
davecheneyand close to 550 with other tests running in parallel05:14
wallyworld_we could simply tweak the test timeout until the tests can be fixed05:14
davecheneyi'm proposing my solution05:14
davecheneyyou're welcome to nack it05:14
wallyworld_ok05:14
davecheneybut in my experience raising timeouts only leads to raising timeouts again05:15
davecheneyand again05:15
davecheneyand again05:15
wallyworld_well, we need to fix the test05:16
wallyworld_moving them just messes up the code base05:16
wallyworld_axw: remind me - to log onto juju's mongo, the password is recorded in the jenv file. yet mongo --ssl -u admin -p xxxx --port 37017 fails with an auth error05:18
davecheneywallyworld_: i didn't say it was a perfect solution05:19
davecheneybut it solves the problem we have today05:19
davecheneywhere it takes two days to land a branch05:19
wallyworld_so does increasing the timeout05:19
wallyworld_without any churn to the code05:19
davecheneywallyworld_: i'll propose my solution, you can nack it05:20
wallyworld_two days?05:20
axwwallyworld_: try the hash of the password05:20
wallyworld_axw: sha256?05:20
davecheneywallyworld_: it took me friday, saturday and this morning to land my names branch05:21
davecheneyit failed 7 times05:21
axwwallyworld_: I don't know what the hash type is, it's stored in the bootstrap agent's  conf file IIRC05:21
wallyworld_davecheney: timeouts? i've never had any failure due to timeouts like that05:21
axwwallyworld_: what are you trying to do?05:21
wallyworld_axw: i want to log in and look at the collections05:21
davecheneywallyworld_: this is what tim pinged you about this morning05:22
davecheneygo check the build dashboard05:22
axwwallyworld_: https://github.com/kapilt/juju-dbinspect05:22
davecheneydozens of red builds all from cmd/juju timing out05:22
wallyworld_i see one red dot because of that05:25
wallyworld_maybe the other timeouts caused -p 2 to run and those worked05:25
davecheneywallyworld_: those are the timeouts i'm talking about05:27
davecheneyand the reason why the build times have gone from 18 minutes to an hour05:27
davecheneyin the last week05:27
wallyworld_we need to identify the root cause and fix that05:27
wallyworld_something must have changed to make the tests start running so long05:28
davecheneyi noted on this channel 2 months ago that the times for cmd/juju were growing05:28
davecheneythey have now passed the 10 minute mark05:28
davecheneythis is the result05:28
davecheneyevery SINGLE test case in cmd/juju takes 3-4 seconds to setup05:28
davecheneyevery time we add a new command or option05:29
davecheneyboom another 3-5 seconds gone05:29
wallyworld_yep, so there's a systemic problem with those tests05:33
wallyworld_moving stuff to a new package is simply rearranging the deck chairs on the titanic05:34
davecheneywallyworld_: no argument there05:49
davecheneybut it has blocked landing anything for a week now05:49
wallyworld_not blocked05:49
davecheney(do you want to see the graph again)05:49
davecheneyyes blocked05:49
wallyworld_landings have occurred05:49
davecheneymy change took all weekend to land05:50
wallyworld_only one was blocked05:50
davecheneythe tests have gone from 18 minutes to an hour05:50
wallyworld_only one red dot was attributable to this problem05:50
davecheneybecause of this05:50
davecheneyso, instead of being able to land 3 changes per hour05:50
davecheneywe can land less than one05:50
davecheneyhow's that for productivity ?05:50
wallyworld_that's not blocked by definition05:50
davecheneymy change is downright horrible05:50
davecheneybut as nobody else is stepping up to address this issue05:50
wallyworld_so we can work around it by running with -p 2 to start with05:50
davecheneyi stand by it05:50
wallyworld_we need to draw a line in the sand and just fix the freaking tests05:51
wallyworld_just as curtis has now started being firm about blocking landings for regressions, we also need to take a first stance05:53
wallyworld_firm05:53
=== uru-afk is now known as urulama
davecheneyfair enough05:54
davecheneyi don't know how to resolve this situation05:54
wallyworld_someone needs to look into what happens at test startup and determine how to better mock out the backend06:00
wallyworld_axw: i gotta go get my son and take him to doctor, will look at your branch when i get back06:17
axwwallyworld_: cheers06:18
axwit's a big one...06:18
wallyworld_that's what she said06:21
dimiternjam, hey, sorry I'll be 10m late for 1-107:01
jamdimitern: np07:01
jamthanks for letting me know07:01
TheMuemorning07:53
jammorning TheMue08:22
jammgz: if you're still doing reviews today: https://github.com/juju/juju/pull/60108:22
dimiternjam, https://github.com/juju/juju/pull/601 LGTM08:48
wallyworld_axw: i wanted to get another fix done for 1.20. i can look at your branch now. if you have time, maybe you could look at https://github.com/juju/juju/pull/60208:51
axwwallyworld_: sure, looking08:51
wallyworld_axw: i'm not 100% sure it will fix the issue - the fix is based on reading the code08:52
axwwallyworld_: we shouldn't be starting the container provisioner until after the upgrades have finished, right?08:54
wallyworld_axw: it starts after the upgrade steps worker yes, but the upgrade work starts in parallel with it08:55
wallyworld_it's a bit confusing08:55
wallyworld_the upgrade work listens for upgrade requests08:55
axwwallyworld_: huh? how can it start after and in parallel?08:55
wallyworld_the upgrade steps worker does the upgrades08:56
wallyworld_there's 2 workers - upgrade and upgrade steps08:56
axwwallyworld_: ah ok. after upgrade steps, in parallel with upgrader08:56
wallyworld_yes, that's what it looked like08:56
wallyworld_and if there were no upgrade steps, it's all a bit of a race to see who starts first08:57
perrito666good morning everyone, I am OCR today along with mgz; feel free to ask for reviews, I'll be taking a look at the queue anyway09:56
jamwallyworld_: this looks like a spurious failure to me: http://juju-ci.vapour.ws:8080/job/github-merge-juju/411/console10:28
jamI'll dig into it10:28
jambut in case you wanted it around for reference10:28
wallyworld_jam: thanks. it's on my todo list this week to get these documented. our tests need work10:29
jamthis isn't very helpful:10:29
jamgoroutine 17700 [running]:10:29
jamgoroutine running on other thread; stack unavailable10:29
jamcreated by launchpad.net/gocheck.(*suiteRunner).forkCall10:29
jam/home/ubuntu/juju-core_1.21-alpha1/src/launchpad.net/gocheck/gocheck.go:631 +0x23f10:29
jamthe only thing "running" is trying to fork something10:29
jamwallyworld_: the second failure I don't fully understand, as it seems like maybe something didn't clean up in time and stayed bound to a port we thought we wanted to use in the next test10:33
wallyworld_jam: it seems we have all sorts of isolation issues, plus issues with mongo startup failing at various times. sadly, many of our unit tests aren't really unit tests10:36
jamdimitern: standup ?10:46
dimiternjam, omw10:47
jamTheMue: you weren't supposed to change your networking11:00
perrito666is there anyone besides menno that is familiar with upgrade mode?12:56
dimiternperrito666, wallyworld_ afaik, but he might be off already12:59
perrito666tx dimitern12:59
* wallyworld_ is sorta here12:59
wallyworld_menno did all of the upgrade mode stuff though13:00
wallyworld_is there a specific question?13:00
dimiternwallyworld_, I see, ok it seems my knowledge is a bit out of date :)13:01
wallyworld_dimitern: i did the initial upgrade work, but tim's team has since taken it on13:01
perrito666wallyworld_: not really, I am implementing a "restore mode" and william told me to discuss with menno to make sure I don't reinvent the wheel13:01
wallyworld_seems like there will be overlap there, or similar restrictions anyway13:02
dimiternperrito666, wallyworld_, what's a "mode"? a runner that runs its workers with delay from the rest?13:03
dimiternor more like a uniter mode13:03
perrito666wallyworld_: there will certainly be13:03
perrito666dimitern: it is an arbitrary term I believe13:04
dimitern:) ah13:04
perrito666in the case of upgrade13:04
perrito666it's a state of the API server where it rejects most requests13:04
wallyworld_dimitern: at a high level, it means that the state server will reject connections while an upgrade is still running13:04
perrito666with an error indicating it's upgrading13:04
dimiternwallyworld_, perrito666, I see, thanks guys!13:04
wallyworld_there's so much to keep track of13:05
perrito666dimitern: restore mode is to do something very similar13:05
dimiternyeah, just keeping all the far and wide networking effort going pretty much leaves me in the dark about what's going on with the other teams :/13:05
wallyworld_yup, same here13:06
JoshStroblWhat hooks would be the appropriate place to call open-port? I'm guessing config-changed, start, install, etc. and close-port should be called on ./stop?13:09
dimiternJoshStrobl, in practice it doesn't matter - just do it (in a hook) before you need to use it; ideally both at config-changed time (close-port first, open-port after that)13:15
dimiterns/at//13:15
dimiternsorry, that should've been s/ideally both at/ideally at/13:17
JoshStrobldimitern, thanks :)13:17
dimiternJoshStrobl, np :) and as for close-port - ideally in the stop hook13:17
JoshStrobldimitern, yea I figured that much :D13:18
dimiternmgz, hey, it seems the merge bot has some issues today - multiple failures, timeouts..13:20
ppetraki /join ##cacheio badblocks13:22
=== HankM00dy is now known as thehe
=== psivaa_ is now known as psivaa
ericsnowmgz: you around?14:38
=== Ursinha is now known as Ursinha-afk
=== jrwren_ is now known as jrwren
arosalesfyi bug https://bugs.launchpad.net/juju-core/+bug/1322705 still is not targetted.17:17
mupBug #1322705: juju help does not contain Joyent help information <joyent-provider> <ui> <juju-core:Triaged> <https://launchpad.net/bugs/1322705>17:17
arosalesI'll see if I can get a branch proposal out17:17
=== Ursinha-afk is now known as Ursinha
=== Ursinha is now known as Ursinha-afk
=== Ursinha-afk is now known as Ursinha
=== mwhudson_ is now known as mwhudson
=== JoshStrobl is now known as JoshStrobl[ZZZ]
waigani_thumper: standup?23:02
thumpercoming23:02
thumperwallyworld: morning23:23
wallyworldthumper: hey23:23
thumperwallyworld: something to start your day :-) https://bugs.launchpad.net/juju-core/+bug/136121623:23
mupBug #1361216: unit tests for all series and archs fail <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1361216>23:23
wallyworldsigh23:23
thumperwallyworld: it isn't a sudden change23:23
thumperwallyworld: it is JujuConnSuite, replica set, and cmd/juju tests23:24
thumperthree different issues23:24
wallyworldyep23:24
wallyworldi can't fix them all23:24
wallyworldit needs a whole of team approach23:24
thumperI think we could fix many by changing cmd/juju tests to use a mock23:25
thumperor mocks23:25
wallyworldyou think?23:25
thumperyes23:25
thumpermost of the cmd/juju tests do too much23:25
wallyworldthat was sarcasm23:25
thumperthey are end to end tests23:25
thumperoh23:25
thumpersorry23:25
thumperhard to tell23:25
wallyworldyeah23:25
wallyworldi'm just pissed off our tests are so bad23:26
wallyworldwhy oh why did anyone think it's a good idea to bring up the whole stack to run unit tests23:26
wallyworldi guess the same reason why we interleave mongo throughout all of our business logic :-(23:27
