/srv/irclogs.ubuntu.com/2014/12/16/#juju-dev.txt

thumpertime to go and make another coffee00:16
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
menn0wallyworld: that's done. it ended being a standalone function in the envcmd package (it doesn't need anything on EnvCommandBase.00:36
menn0wallyworld: PTA quick L00:36
wallyworldsure, ty00:36
=== kadams54-away is now known as kadams54
wallyworldmenn0: thanks, looks good, and i do like where the helper is now located00:38
menn0wallyworld: great. merging!00:39
thumper\o/00:43
thumperanastasiamac_: hello on call reviewer :-) http://reviews.vapour.ws/r/639/01:05
thumperrick_h_: that one is for you ^^^01:06
wallyworldthumper: sadly anastasia is ill today01:17
wallyworldaxw: standup?01:17
thumperwallyworld: oh no... stuff is going around01:17
axwwallyworld: sorry, omw01:18
thumperwallyworld: could you take a look?01:18
thumperwallyworld: it is very small01:18
wallyworldthumper: sure, after standup whivch is now01:18
thumpercheers01:18
wallyworldmenn0: great that that bug is fix committed, now we just gotta wait for CI to be unblocked01:37
menn0wallyworld: yep. am backporting to 1.21 now01:38
wallyworldthumper: review done01:42
thumperwallyworld: cheers01:42
thumperwallyworld: valid comments, will fix when I'm not in the middle of another branch :)01:43
wallyworldsure :-)01:43
wallyworldthumper: i have a tiny one for you when you get a chance http://reviews.vapour.ws/r/640/02:06
thumperwallyworld: have you tested it?02:08
wallyworldthumper: doing it now on local and ec202:08
wallyworldthumper: i wish we could change the primary target for a bug - i don't think we can, can we?02:09
thumperyes02:09
thumperlike, for ages02:10
wallyworldi forget02:10
wallyworldi want to target to juju-core as primary02:10
wallyworldthen i can have targets for 1.22 and 1.2102:10
wallyworldmaybe i need to delete juju-core first02:10
wallyworldand then reassign02:10
thumperwhat do you mean?02:11
thumperall bug tasks are considered equal02:11
thumperhttps://bugs.launchpad.net/juju-core/+bug/132895802:12
mupBug #1328958: Local provider fails when an unrelated /home/ubuntu directory exists <local-provider> <juju-core:In Progress by wallyworld> <juju-core (Ubuntu):Triaged> <https://launchpad.net/bugs/1328958>02:12
wallyworldthumper: soon that bug, i can't add a 1.21 series02:12
wallyworldas the primary target is the distro02:12
wallyworldunless i'm stupid, which is a distinct possibility02:13
thumperwallyworld: is that what you wanted?02:13
thumperwallyworld: look at the bug02:13
wallyworldthumper: yes, so how did you change the primary target?02:14
thumperyou need to get to that bug through the project, personally, I just hacked the url02:14
wallyworldah, yes, you are right02:14
wallyworldi forgot that02:15
rick_h_  thumper :)02:26
=== kadams54_ is now known as kadams54-away
=== kadams54-away is now known as kadams54_
thumpermenn0: ugh... having to deal with environment specific collections in a non-normal way03:35
thumpermenn0: which means hoop jumping03:35
thumperand likely to clash with outstanding work...03:35
thumperfooey03:35
menn0menn0: do you want getRawCollection()?03:56
menn0thumper: ^^^ (did it again)04:02
wallyworldmenn0: i was to remove the legacy fields in MachineStatus in api client in master - i think that's ok, rught?04:03
thumpermenn0: nah, in the end I went with state.ForEnviron so I could use the easy method of getting the info :-)04:05
menn0wallyworld: hmmm04:11
menn0wallyworld: if we want old clients to continue working those fields have to stay04:12
wallyworldmaybe we can't if we must retain compatability with 1.2004:12
menn0wallyworld: 1.20 should be fine04:12
wallyworldbut i thought we only needed compatability with 1.2804:12
menn0wallyworld: but pre 1.19 won't work04:12
wallyworld1.1804:12
menn0wallyworld: exactly04:13
menn0wallyworld: a 1.18 client will still look for those fields04:13
wallyworldhmmm, yeah, i t will :-(04:13
wallyworlddamn04:13
menn0wallyworld: when I wrote that comment I was under the impression that we could stop worry about compatibility after 2 versions or something04:14
wallyworldi'll fix the comment04:14
menn0wallyworld: the fields could be removed now that we have API versioning though04:14
menn0wallyworld: as suggested by the comment04:14
wallyworldhmmm, but the server side call still needs to be there for older clients04:16
wallyworldso i'll have to leave the processAgent() logic alone04:16
wallyworldwhich is where i'm currently looking04:16
menn0wallyworld: yep. there would need to be 2 versions of the struct and 2 code paths04:16
wallyworldmight let sleeping dogs lie for now04:17
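Menno's "2 versions of the struct and 2 code paths" can be sketched as follows. The server keeps filling in the legacy flat fields for clients on the old API version. Clients that negotiate a later version get only the newer nested field. This is a minimal sketch in Python. The version numbers, the field names and the `machine_status` helper are hypothetical and do not mirror Juju's actual status API.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AgentStatus:
    status: str
    info: str


@dataclass
class MachineStatusV0:
    # Legacy shape (hypothetical names): flat fields that old clients,
    # such as 1.18, read directly.
    agent_state: str
    agent_state_info: str
    agent: Optional[AgentStatus] = None  # newer field, sent alongside


@dataclass
class MachineStatusV1:
    # Newer shape: the legacy flat fields are gone.
    agent: AgentStatus


def machine_status(api_version, status, info):
    """Build the machine status result for the client's API version."""
    agent = AgentStatus(status, info)
    if api_version == 0:
        # Old code path: keep populating the legacy fields.
        return MachineStatusV0(agent_state=status,
                               agent_state_info=info,
                               agent=agent)
    # New code path: only the nested field.
    return MachineStatusV1(agent=agent)


print(asdict(machine_status(0, "started", "")))
print(asdict(machine_status(1, "started", "")))
```

The cost wallyworld is weighing is this duplication. Both shapes and both branches have to stay on the server for as long as old clients are supported.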
=== fuzzy_ is now known as Ponyo
=== kadams54_ is now known as kadams54-away
mattywmorning folks05:20
jam1axw: wallyworld: greetings06:02
jam1thought I'd try to catch you guys before we started for the day06:03
wallyworldjam1: hi06:06
axwjam1: hiya06:10
wallyworldjam1: let us know if you want to chat or whatever, maybe you've already started06:19
wallyworldfwereade: meeting?07:35
jam1wallyworld: we've started, though we're a bit hit and miss on internet connectivity, you should be seeing updates/responses in the doc08:16
wallyworldok, ty08:17
wallyworldaxw: ^^^^^08:17
axwyep I have been following updates08:17
axwthanks08:17
wallyworldfwereade: you around for our 1:1?08:17
TheMuemorning08:29
dimiternmorning TheMue09:11
TheMuedimitern: o/09:11
axwwallyworld: did you want to catch up still?10:02
wallyworldyeah, give me a couple of minutes10:02
axwsure10:03
wallyworldaxw: ok now, in our 1:110:11
axwbrt10:11
dimiternfwereade, hey10:23
dimiternfwereade, if you can have a look at http://reviews.vapour.ws/r/643/ - it's a backport to 1.21 of the fix for bug 1397376 you approved yesterday10:24
mupBug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:Fix Committed by dimitern> <juju-core 1.21:In Progress by dimitern> <https://launchpad.net/bugs/1397376>10:24
perrito666morning all10:25
dimiternperrito666, o/10:28
perrito666bug 1402826 seems to be fix committed what is the status tha triggers unlocking + topic change?10:38
mupBug #1402826: 1.21 cannot "add-machine lxc" to 1.18.1 <add-machine> <ci> <lxc> <regression> <juju-core:Fix Committed by menno.smits> <juju-core 1.21:Fix Committed by menno.smits> <https://launchpad.net/bugs/1402826>10:38
jam1perrito666: Fix Released is generally the actual unblock trunk, and it is supposed to only happen after CI actually does a run and sees that the failure isn't present anymore.10:52
jam1(AIUI)10:53
jam1perrito666: given that http://juju-ci.vapour.ws:8080/job/compatibility-control/ is now happy, maybe we can set it to Fixed ?10:54
jam1dimitern: ^^ think you can check into whether trunk is actually unblocked now /11:00
jam1?11:00
dimiternjam1, not sure how except by looking at the bugs - and the only one there is fix committed11:03
anastasiamac_jam1: dimitern: this seems to think that CI is still blocked http://goo.gl/4zd1e911:03
jam1dimitern: I mean just tracking through that the bug really is fixed (looks like it to me from the linked CI test that is now Blue)11:03
jam1anastasiamac_: so trunk *is currently blocked* my point is whether we should unblock it because the thing we said fixed it really did fix it11:04
dimiternjam1, I see the same, so it must be some lag11:05
anastasiamac_thnx for the link, jw4 :)11:06
anastasiamac_jam1: it would be gr8 to unblock CI - it's been a while :)11:07
dimiternmgz_, are you around?11:08
perrito666jam1: I think the topic thing is auto, isn't it?11:58
dimiternperrito666, nope it's not11:59
dimiternperrito666, I *did* try to set it via chanserv but it doesn't work like it used to and I get no error; trying to set it directly blows with "you're not a channel operator"12:00
=== ChanServ changed the topic of #juju-dev to: The topic for irc://irc.freenode.net:6697/#juju-dev is: https://juju.ubuntu.com/ | On-call reviewer: see calendar | Open critical bugs:
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Report bugs: https://bugs.launchpad.net/juju-core/
dimiternthere!12:04
jam1dimitern: worked for me12:04
jam1I used "/msg chanserv help" and "/msg chanserv help topic" to sort it out12:04
dimiternyeah, I was doing "topic #juju-core" not #juju-dev :)12:04
jam1ah, yeah12:04
perrito666mm, merge bot does not thing the same12:08
jam1perrito666:12:16
jam1so, the way the bot works, (AIUI) is it checks that this launchpad search has no items: https://bugs.launchpad.net/juju-core/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&12:16
jam1since we still have the one as FixCommitted, it still rejects it12:17
jam1perrito666: so my statement is, if you're confident that bug is fixed, mark it fix released12:19
perrito666jam1: ah, I see I always mix those up12:20
=== ev_ is now known as ev
jam1perrito666: well it did change onece12:21
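jam1's description of the landing bot reduces to a filter. Trunk stays closed while any Critical bug with the `ci` and `regression` tags is in one of the statuses the search lists, and Fix Committed is one of those statuses. This is a minimal sketch under those assumptions. The function names and the bug-record shape are hypothetical, and the sketch does not query Launchpad.

```python
# Statuses from the field.status:list values in the search URL. They
# include FIXCOMMITTED, so a fix that has merged still blocks trunk until
# the bug is moved to FIXRELEASED (or another status not in this set).
BLOCKING_STATUSES = {
    "NEW", "CONFIRMED", "TRIAGED", "INPROGRESS", "FIXCOMMITTED",
    "INCOMPLETE_WITH_RESPONSE", "INCOMPLETE_WITHOUT_RESPONSE",
}
# Assumption: a bug must carry both tags to count.
BLOCKING_TAGS = {"ci", "regression"}


def blocking_bugs(bugs):
    """Return the ids of bugs that would keep trunk closed.

    Each bug is a plain dict with "id", "importance", "status" and
    "tags" keys (a hypothetical shape, not the Launchpad API's).
    """
    return [
        bug["id"]
        for bug in bugs
        if bug["importance"] == "CRITICAL"
        and bug["status"] in BLOCKING_STATUSES
        and BLOCKING_TAGS <= set(bug["tags"])
    ]


def trunk_is_open(bugs):
    return not blocking_bugs(bugs)


bug = {"id": "example", "importance": "CRITICAL",
       "status": "FIXCOMMITTED", "tags": ["ci", "regression"]}
print(trunk_is_open([bug]))  # False: Fix Committed still blocks
bug["status"] = "FIXRELEASED"
print(trunk_is_open([bug]))  # True
```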
=== meetingology` is now known as meetingology
perrito666as more and more kids enter summer vacation period the internet at home worsens12:24
dimiternperrito666, would you mind reviewing this http://reviews.vapour.ws/r/643/ - it's a straight backport of a fix for bug 1397376 which was already approved by fwereade and landed on master yesterday12:34
mupBug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:Fix Committed by dimitern> <juju-core 1.21:In Progress by dimitern> <https://launchpad.net/bugs/1397376>12:34
perrito666dimitern: lgtmd but I am not a senior review dude12:57
dimiternperrito666, np, thanks anyway :)12:57
dimiternTheMue, can you lgtm it as well please? http://reviews.vapour.ws/r/643/12:58
TheMuedimitern: *click*13:02
TheMuedimitern: done, push it ;)13:04
dimiternTheMue, thanks!13:04
TheMueyw13:04
dimiternoh ffs... this is getting ridiculous ERROR: {"message":"API rate limit exceeded for jujubot.","documentation_url":"https://developer.github.com/v3/#rate-limiting"} Finished: FAILURE13:59
perrito666we maxed out our github quota?14:00
mgz_hm, is that the landing side or the reviewboard side?14:02
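The error body points at GitHub's rate-limiting documentation. The API reports the remaining quota in the `X-RateLimit-Remaining` response header and the reset time, as UTC epoch seconds, in `X-RateLimit-Reset`. A bot could use these to wait out the window instead of failing the job. This is a minimal sketch that assumes a plain dict of headers. The function name is hypothetical, and this is not the jujubot's code.

```python
import time


def seconds_until_reset(headers, now=None):
    """How long to wait before retrying a GitHub API call.

    Uses the X-RateLimit-Remaining and X-RateLimit-Reset response headers
    (the reset value is a UTC epoch timestamp in seconds). `headers` is
    assumed to be a plain dict using the canonical header capitalisation.
    """
    if now is None:
        now = time.time()
    # Missing header: assume there is quota left.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0  # quota left, no need to wait
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0, reset_at - int(now))


print(seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}, now=940))  # 60
```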
dimiternvoidspace, maas meeting?14:05
sinzuidimitern, what is blocking the backport of the https://bugs.launchpad.net/juju-core/+bug/1397376 to 1.21-beta4?14:17
mupBug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:Fix Committed by dimitern> <juju-core 1.21:In Progress by dimitern> <https://launchpad.net/bugs/1397376>14:17
perrito666sinzui: https://github.com/juju/juju/pull/132414:20
perrito666Seems that dimitern got a funky error14:20
perrito666<dimitern> oh ffs... this is getting ridiculous ERROR: {"message":"API rate limit exceeded for jujubot.","documentation_url":"https://developer.github.com/v3/#rate-limiting"} Finished: FAILURE14:20
jw4sinzui: who has to mark https://bugs.launchpad.net/juju-core/+bug/1402826 as fix released? menn0?14:20
mupBug #1402826: 1.21 cannot "add-machine lxc" to 1.18.1 <add-machine> <ci> <lxc> <regression> <juju-core:Fix Committed by menno.smits> <juju-core 1.21:Fix Committed by menno.smits> <https://launchpad.net/bugs/1402826>14:20
sinzuijw4, anyone who can verify the fix/test passes14:21
jw4sinzui: I see14:21
sinzuijw4, I will get to this in about 15 minutes. I need to sort out the block on beta4 first14:22
jw4sinzui: feeling like a one armed paper hanger?14:22
sinzuiyes14:22
dimiternsinzui, in a meeting, sorry14:23
sinzuiperrito666, dimitern looks like the job wrongly failed because the test couldn't clean up, not because of juju.14:23
sinzuidimitern, perrito666 I will clean up aws, make the job not lie, and send the merge back for merging14:23
jw4sinzui: fwiw the failing test http://juju-ci.vapour.ws:8080/job/compatibility-control/859/ is now passing http://juju-ci.vapour.ws:8080/job/compatibility-control/862/14:24
perrito666tx sinzui14:24
sinzuijw4, that is hard to read. the job tests *many* things. Without reading the data, we don't know which versions were under test14:25
perrito666sinzui: do you keep some sort of list of the amount of alcohol owed to you by us?14:25
jw4sinzui: cool - I'll quit bugging you now14:25
mgz_perrito666: please think of his health14:25
sinzuiperrito666, no, I suck at drinking14:25
perrito666sinzui: most people do, at least when using a straw14:25
perrito666mgz_: well I usually pay my teammates debts in beers :p but we can start doing sandwitches and stuff14:26
dimiternsinzui, that will be great14:27
perrito666gsamfira_: coming?14:30
perrito666alexisb: ?14:30
=== katco` is now known as katco
katcosinzui: hey lp has https://bugs.launchpad.net/juju-core/+bug/1402826 as fix committed, what's the state of CI?14:43
mupBug #1402826: 1.21 cannot "add-machine lxc" to 1.18.1 <add-machine> <ci> <lxc> <regression> <juju-core:Fix Committed by menno.smits> <juju-core 1.21:Fix Committed by menno.smits> <https://launchpad.net/bugs/1402826>14:43
katco/14:43
sinzuikatco, I need to retest to verify it is fixed. the test is a weekly test based on weekly streams14:43
katcosinzui: ah ok.. do you expect that anytime soon?14:44
sinzuiYes14:44
katcosinzui: you my friend.. rock!14:44
voidspacedimitern: sorry, was on lunch14:51
voidspacedimitern: didn't know we had one14:51
dimiternvoidspace, no worries, it was mostly about explaining why we need it :) - I'm preparing an agenda/minutes doc and will share later14:54
perrito666ericsnow: wwitzel3 natefinch15:01
wwitzel3perrito666: having issues getting hangouts to load15:02
perrito666I got kicked from the call first attempt15:02
wwitzel3plus.google.com is just timing out for me atm15:04
sinzuiSorry ladies and gentlemen, there is congestion in the merge/ci queue. We first need to unblock dimitern, to start testing beta4, then we can remove the block on master. which is also a block on beta4. We need to kick the landing bot to start moving15:08
perrito666wwitzel3: you broke google15:10
dimiternvoidspace, you should've received a link to the maas call agenda doc15:10
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: none
sinzuijw4, katco dimitern master is open. Don't all break it at once15:26
katcosinzui: :)15:26
sinzuidimitern, I see an intermittent test failure in the current merge. we will retest since we can see it passed in previous runs15:26
jw4sinzui: haha15:29
dimitern\o/ /o\15:29
dimiterndimitern, is it TestAddRemoveSet  for mongo/replicaset ?15:30
voidspacedimitern: I have, thanks15:35
voidspacedimitern: we should have the call in the calendar15:36
voidspacedimitern: subnet IP tracking only for aws... the trouble with that is that we need state for tracking15:37
voidspacedimitern: and the provider specific stuff doesn't usually have access to state15:38
dimiternvoidspace, it is on the calendar, but not the red one15:38
voidspacedimitern: you mean orange...15:38
voidspacedimitern: or whatever colour you pick...15:38
voidspacedimitern: if it's not on mine or juju-core I won't have it15:38
dimiternvoidspace, :) yeah I guess it can be orange-y15:38
dimiternvoidspace, you do have it on yours though?15:38
voidspacedimitern: we could have a "SupportsSensibleIPAllocation" provider method that returns false for aws15:39
dimiternvoidspace, we need to have a chat some time tomorrow to see how to go about the proposed change15:39
voidspacedimitern: and fallback to address picking15:39
dimiternvoidspace, yeah, that's a possibility15:39
voidspacedimitern: I *do* have it in my claendar15:39
voidspacedimitern: really sorry15:39
voidspacedimitern: I totally missed that :-(15:40
dimiternvoidspace, not to worry - it's fine :)15:40
voidspacedimitern: having two address allocation strategies is a pain15:41
voidspacedimitern: we could just not track unavailable ones15:41
voidspacedimitern: doesn't matter if we retry  them occassionally15:41
dimiternvoidspace, that's a good point15:42
voidspacedimitern: it's probably better not to track them15:42
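voidspace's two suggestions fit together. A provider capability check decides who picks the address. In the fallback path, addresses the provider rejects are simply skipped rather than recorded, so the only cost is an occasional retry. This is a minimal sketch of that idea. `supports_sensible_ip_allocation` follows the method name floated above. `allocate_address`, `AddressUnavailable` and `FakeEC2` are hypothetical stand-ins, not Juju's provider interface.

```python
import ipaddress


class AddressUnavailable(Exception):
    """Raised by a provider when it refuses a requested address."""


def allocate(provider, cidr, in_use):
    """Allocate an address on subnet `cidr` (hypothetical flow).

    `in_use` is the set of addresses already recorded in state as
    allocated. Addresses the provider rejects are skipped but NOT
    recorded, so a later call may try them again.
    """
    if provider.supports_sensible_ip_allocation():
        # The provider chooses the address itself.
        return provider.allocate_address(cidr)
    # Fallback: the caller, which has state access, picks candidates.
    for host in ipaddress.ip_network(cidr).hosts():
        addr = str(host)
        if addr in in_use:
            continue
        try:
            return provider.allocate_address(cidr, addr)
        except AddressUnavailable:
            continue  # not tracked; may be retried on a future call
    raise LookupError(f"no free address in {cidr}")


class FakeEC2:
    """Stand-in provider that cannot choose addresses itself."""

    def __init__(self, taken):
        self.taken = set(taken)  # addresses used without Juju knowing

    def supports_sensible_ip_allocation(self):
        return False

    def allocate_address(self, cidr, addr=None):
        if addr in self.taken:
            raise AddressUnavailable(addr)
        self.taken.add(addr)
        return addr


ec2 = FakeEC2(taken={"10.0.0.1"})
print(allocate(ec2, "10.0.0.0/29", in_use={"10.0.0.2"}))  # 10.0.0.3
```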
dimiternvoidspace, unfortunatelly, I have to go soon and can't think it through now15:42
voidspacedimitern: ok chap15:42
voidspacedimitern: see you later/tomorrow15:42
dimiternvoidspace, but will appreciate your ideas on it :)15:42
dimiternvoidspace, cheers15:43
natefinchericsnow: sorry, you have time now?16:03
ericsnowsure16:04
=== kadams54 is now known as kadams54-away
=== sebas538_ is now known as sebas5384
ericsnowwwitzel3: ready? (I'm in moonstone)16:46
ericsnowsinzui: wwitzel3 and I are working on a new provider (GCE) and need to know what needs to be done image-wise (i.e. for simplestreams)16:47
wwitzel3ericsnow: ok, omw16:47
ericsnowsinzui: from what I understand we need some URL where the different images are hosted16:48
ericsnowsinzui: and that URL will be on GCE somewhere?16:48
mgz_ericsnow: you're probably better off talking to ben howard etc about images for gce?16:48
sinzuiericsnow, I don't have any experience16:48
ericsnowsinzui: ah, okay16:48
ericsnowmgz_: k, thanks16:49
mgz_you shouldn't required gce hosted simplestreams to just get started on the provider16:50
mgz_you can provide your own image-metadata-url pointing to anywhere16:51
perrito666gsamfira_: ping me whenever you are available please16:56
sinzuiericsnow, I have no experience with image streams the team in #cloudware know all about it.16:57
ericsnowsinzui: yeah, I'm hitting up utlemming (Ben Howard) about it16:58
ericsnowsinzui: thanks though16:58
sinzuiericsnow, as mgz_ we don't *need* to provide specific streams for a cloud, but it is nice to do. The Juju QA team can create local agent streams once it get credentials to test in the cloud and setup in local storage. if there isn't local storage, we don't need create agent streams17:00
=== kadams54-away is now known as kadams54
ericsnowsinzui: cool.  That would work great.  I'll send you an email just so we can track this better.17:22
ericsnowsinzui: also, what do you mean by "local storage"17:22
ericsnowsinzui: the equivalent of s3 in AWS?17:23
sinzuiericsnow, if the cloud has a storage mechanism like swift, s3, manta, then we can place agents in the cloud for faster delivery17:24
ericsnowsinzui: got it :)17:24
natefinchfwereade: you around?17:28
voidspaceg'night all18:25
perrito666If any OCR is around http://reviews.vapour.ws/r/645/19:42
perrito666if anyone else wants to, also welcome19:42
menn0thumper: bug 1403151 might be of interest to you20:16
mupBug #1403151: local provider stops bootstrapping: "Job is already running" <juju-core:New> <https://launchpad.net/bugs/1403151>20:16
thumperhmm20:16
=== kadams54 is now known as kadams54-away
menn0I sometimes forget how much better modern version control systems are20:43
menn0a file I'd changed in my branch had been moved to another directory by someone else20:43
menn0yet my change still applied cleanly because git knew how the file had moved20:44
ericsnowwhat is appropriate instance root disk size for a juju machine?20:47
ericsnowGCE lets you set whatever disk size you want (as long as the image you use fits in it)20:48
=== _thumper_ is now known as thumper
=== kadams54-away is now known as kadams54
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1403200
sinzuinatefinch, thumper: is someone about to solve bug 1403200 that block merges and the release of 1.21-beta421:10
mupBug #1403200: mass upgrades do no complete <ci> <maas-provider> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1403200>21:10
menn0sinzui: i've just started looking at the ticket21:11
menn0sinzui: the description in the ticket and the console output i'm seeing don't match21:11
menn0sinzui: the test appears to give up and destroy the env 2 minutes after starting the upgrade21:12
menn0sinzui: can you explain where you're seeing an upgrade still running after 30 mins?21:12
sinzuimenn0, jog ran his own run separate to ensure we got log21:13
menn0sinzui: also - side issue - the test looks like it attempts to capture the logs for after the env has been destroyed and the maas node is turned off21:13
sinzuimenn0, I think jog can help you get to the machine finfolk machines too21:14
sinzuimenn0, ! I just reported a bug about that too21:14
sinzuiIndeed the capture failed, which is why jog ran it himself21:14
menn0sinzui: well it seems like the behaviour jog is seeing when running tests manually is not matching what is happening during the actual CI runs21:15
sinzuimenn0, Our tests exit early because juju client raised an error trying to talk to the server21:15
sinzuimenn0, if not having an api server is not an error, then juju needs to not exit with an error code21:16
menn0sinzui: I don't see that in the console output21:16
sinzuinot all do that that21:16
menn0sinzui: as the upgrade runs it is possible to get a "maintenance in progress" error21:16
menn0sinzui: it could be that21:16
sinzuithis one just timesout after 10 minutes http://juju-ci.vapour.ws:8080/job/maas-upgrade-trusty-amd64/336/console21:16
menn0sinzui: it is short-lived21:16
sinzuiit didn't timeout the previous revision21:16
sinzuimenn0, jog let the env try to upgrade for 30 minutes21:17
menn0sinzui: ok21:18
menn0sinzui: i'm not saying there isn't a problem21:18
sinzuimenn0, I was hoping we could extend the timeout21:18
menn0sinzui: for that run (336) I don't see a 10 minute timeout21:19
menn0sinzui: it looks like 2 minutes to me21:19
menn02014-12-16 20:34:14 INFO juju.cmd supercommand.go:329 command finished21:19
menn01.20.14: 1, 0, 2, dummy-sink/0, dummy-source/0 ....timeout 600.00s juju --show-log destroy-environment maas-upgrade-trusty-amd64 --force -y21:19
menn02014-12-16 20:36:19 INFO juju.cmd supercommand.go:37 running juju [1.20.14-trusty-amd64 gc]21:19
menn02014-12-16 20:36:19 INFO juju.provider.common destroy.go:15 destroying environment "maas-upgrade-trusty-amd64"21:19
menn02014-12-16 20:36:20 INFO juju.cmd supercommand.go:329 command finished21:19
menn020:34:14 to 20:36:1921:19
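The gap menn0 reads off the two timestamps is easy to confirm with the standard library. It is just over two minutes. For comparison, the `600.00s` in the pasted command line is ten minutes.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
# "command finished" line logged just before the status dots
finished = datetime.strptime("2014-12-16 20:34:14", FMT)
# destroy-environment starts running
destroying = datetime.strptime("2014-12-16 20:36:19", FMT)

gap = destroying - finished
print(gap)                  # 0:02:05
print(gap.total_seconds())  # 125.0
print(600 / 60)             # "timeout 600.00s" expressed in minutes: 10.0
```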
sinzuimenn0, ah right21:20
sinzuibut the time is set to 20 minutes.21:20
sinzuimenn0, don't focus on that. We need to focus on the logs that jog attached...21:20
sinzuijog join in the conversation, can you get menn0 machines or rerun tests with him to learn why 30 minutes does not complete an ujpgrade21:21
jogsinzui, I think the "1, 0, 2, dummy-sink/0, dummy-source/0" output in the above is from our status check, which gets a non-zero exit code and may fail that job at that point21:21
jogyup, I ran this on my desktop using a few KVM machines21:21
sinzuithe copy remote logs is swallowing the error the juju command returned21:22
jogwe can manually try to reproduce on finfolk so others can get access...21:22
jogmenn0, I still have the env setup if there is anything you would like me to capture or check21:23
menn0jog: I just checked what you've attached to the bug and that look pretty complete thanks21:25
menn0jog: can you send me the steps you used to set up the env and run the test as well?21:25
jogsure21:25
menn0jog: also, where are the logs for machine-1?21:27
jogmachine-1 was added and removed before I tried I deployed by charms and did the upgrade21:27
jogs/I tried I deployed by/I deployed my/21:28
menn0jog: ok np21:30
sinzuimenn0, jog: I just pushed a change to not swallow the command error when we cannot get logs. any reason not to re-run one of the failing jobs?21:38
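The harness problem sinzui describes is a common one. When log capture runs as cleanup and itself fails, it can hide the error returned by the command under test. This is a minimal sketch of the shape of such a fix, assuming a Python harness. The function and parameter names are hypothetical, and this is not the CI scripts' actual code.

```python
def run_with_log_capture(run, capture_logs, warn=print):
    """Run a test step, then try to capture remote logs.

    A failure while capturing logs is reported but never replaces the
    step's own outcome: if `run` raised, that exception still propagates,
    and if it succeeded, its return value is returned.
    """
    try:
        return run()
    finally:
        try:
            capture_logs()
        except Exception as err:  # e.g. the node is already unreachable
            warn(f"could not capture logs: {err}")
```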
menn0sinzui: please do21:39
sinzuijog, I think we fail to get logs because the script tried to use the dns address instead of ip21:39
jogsinzui, go ahead I don't think there is anything to capture from there21:39
sinzuilets check back in 20 minutes to see where it is http://juju-ci.vapour.ws:8080/view/Juju%20Revisions/job/maas-upgrade-trusty-amd64/337/console21:40
menn0jog: I think it might help because the attempt to capture logs seems to come after the message about the node being shut down21:40
menn0jog: are you going to email those extra details or attach to the bug?21:40
jogI will attach to the bug21:41
sinzuimenn0, the nodes were shutdown after logs fails21:41
sinzuithe unless juju shutthem down21:41
menn0jog: looking at the logs from your manual run, the reason the upgrade didn't complete is because it didn't really start21:44
menn0jog: machine-0 is "waiting for the other state servers to be ready for upgrade"21:45
menn0jog: were there ever any other state servers in this environment?21:45
menn0jog: do you still have it up? I might ask you to check some things in the MongoDB soon if it is.21:46
joghmm, there should not have been any other state servers and yes it's still up21:46
menn0jog: ok, interesting.21:46
menn0jog: let me just figure out a command for you to run.21:47
menn0jog: ok, let's do some exploring21:49
menn0jog: please run: juju ssh 021:50
jogok21:50
menn0jog: and then once you're in: mongo 127.0.0.1:37017/admin --ssl --username "admin" --password "`sudo grep oldpassword /var/lib/juju/agents/machine-*/agent.conf  | cut -d' ' -f2`"21:50
menn0jog you might need to install the mongo command line tools for that to work (you'll get a hint)21:50
jogmenn0, so I have to log in directly with ssh, since 'juju ssh' rejects the connection 'ERROR login failed - maintenance in progress'21:52
menn0jog: of course.21:53
sinzuiJuju just called upgrade21:54
menn0jog: let me know when you're in and running the mongo shell21:55
jogmenn0, I'm there... had to install the mongo cmd line client first21:56
menn0jog: now: use juju21:56
jogdone21:57
menn0jog: then: db.upgradeInfo.find().pretty()21:57
joghttps://pastebin.canonical.com/122444/21:57
sinzuimenn0, jog. in the console log we can see after upgrade, status was polled a few times by the dots, then status timed out after 2 minutes because there was nothing to talk too21:58
sinzuimenn0, Should the state-server not me available for more than two minutes during an upgrade?21:59
* sinzui prefers to prove the tests are wrong so that he can get a blessed revision21:59
menn0the state server should be available but you can expect it to return "maintenance in progress" for a little bit during the upgrade22:00
menn0I would have thought 2 minutes would have been long enough though22:00
menn0but I don't know how fast this hardware is22:00
menn0waiting 5 mins might be better22:00
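menn0's point is that a brief "maintenance in progress" refusal during an upgrade is expected. A status check should therefore retry on that error rather than fail, and give up only after a longer deadline. This is a minimal sketch of such a wait loop. `wait_for_status`, `StatusError` and the injectable clock are hypothetical. The 300-second default simply reflects the "5 mins" suggestion.

```python
import time


class StatusError(Exception):
    """Hypothetical error raised when a status call fails."""


def wait_for_status(get_status, timeout=300.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it succeeds or `timeout` seconds pass.

    Errors mentioning "maintenance in progress" are treated as transient,
    since the API server refuses logins while an upgrade runs. Any other
    error is raised immediately.
    """
    deadline = clock() + timeout
    while True:
        try:
            return get_status()
        except StatusError as err:
            if "maintenance in progress" not in str(err):
                raise
            if clock() >= deadline:
                raise TimeoutError(
                    f"still in maintenance after {timeout:.0f}s") from err
        sleep(interval)
```

With `timeout=120` this loop gives up in the situation sinzui describes. The longer wait menn0 suggests corresponds to the default `timeout=300`.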
menn0jog: now this: db.machines.find({}, {jobs:1, life:1})22:01
jog{ "_id" : "3", "jobs" : [  1 ], "life" : 0 }22:01
jog{ "_id" : "2", "jobs" : [  1 ], "life" : 0 }22:01
jog{ "_id" : "0", "jobs" : [  2,  1 ], "life" : 0 }22:01
menn0jog: and db.instanceData.find({}, {"machineid": 1, "instanceid": 1, "env-uuid":1})22:03
jog{ "_id" : "2", "instanceid" : "/MAAS/api/1.0/nodes/node-63cfc508-3719-11e4-8b0a-52540018f567/" }22:03
jog{ "_id" : "3", "instanceid" : "/MAAS/api/1.0/nodes/node-640f98fe-3719-11e4-821a-52540018f567/" }22:03
jog{ "_id" : "0", "instanceid" : "/MAAS/api/1.0/nodes/node-63cebcda-3719-11e4-821a-52540018f567/" }22:03
menn0jog: thanks this is all very helpful22:04
menn0jog: let me have a think and a hunt through the code22:04
jognp22:04
thumpermenn0: ideas?22:04
menn0jog: so far everything looks correct22:04
jw4thumper: trivial typo maybe? http://reviews.vapour.ws/r/646/22:05
menn0thumper, jog: it appears that for some reason the state server thinks it had to wait for other state servers to signal they were ready for the upgrade, but there are no other state servers22:05
thumperhmm... interesting22:06
=== kadams54 is now known as kadams54-away
sinzuithe state-server has voices in its head or is so lonely it is thinks it has friends22:09
sinzuimenn0, I have a 5 minute timeout in place. Do you want me to retest?22:09
menn0sinzui: might as well22:10
jw4tx waigani22:12
menn0jog: can you pls also run: db.stateServers.find({_id: "e"}, {machineids: 1, votingmachineids:1, "env-uuid":1})22:14
jog{ "_id" : "e", "machineids" : [  "0" ], "votingmachineids" : [  "0" ] }22:14
=== kadams54-away is now known as kadams54
menn0jog: also: db.instanceData.find({"env-uuid": {"$exists": true}}).count()22:17
jog022:17
menn0jog: db.instanceData.find({_id: {"$in": ["0"]}})22:19
jog{ "_id" : "0", "arch" : "amd64", "cpucores" : NumberLong(1), "instanceid" : "/MAAS/api/1.0/nodes/node-63cebcda-3719-11e4-821a-52540018f567/", "mem" : NumberLong(2048), "tags" : [  "virtual" ], "txn-queue" : [  "5490714b164546420f00000a_1cb1133b",  "5490910816454643c3000002_02253369" ], "txn-revno" : NumberLong(2) }22:20
menn0jog: hmm everything looks normal22:21
menn0jog: I can't see why the upgrade didn't proceee22:21
menn0proceed22:21
menn0jog: let me try and repro locally using your instructions.22:22
menn0jog: pls leave the env up if you can22:22
jogmenn0, I've tried with LXCs and don't see the same issue22:22
menn0jog: interesting22:23
jogsinzui, should we pause the CI jobs and I can use the finfolk MaaS to reproduce?22:23
sinzuijog, build-revision is disabled...I already paused it22:24
jogsinzui, ok I meant the retry of jobs from Jenkins... I can try manual steps to bring it up so the machines stay around long enough to poke around22:25
sinzuijog, that is what I am doing right22:26
menn0jog: I haven't used maas before. is there a good guide for getting it working under kvm?22:32
menn0jog: i've found several22:32
jogmenn0, we have a doc the describes how we setup our env but it's not really trivial... sinzui, do you think we can give access to finfolk?22:34
* jog has a hard stop to pick up kids from school and will be back in about 45 minutes.22:35
sinzuijog, menn0 I think IS already gave everyone in canonical access to finfolk22:36
sinzuijog, but I don't see menn0  in /home.22:37
sinzuimenn0, I think you need the ssh rules to get to the machine, as you have cloud-city you have the keys to use the gateway we setup.22:38
menn0sinzui: ok, let me try22:38
sinzuijog, maybe i should abort this current test of maas upgrade. I think it has been stalled for 15 minutes22:43
sinzuimenn0, jog: I terminated the job using the -HUP, but that didn't call cleanup. We might have a dirty maas now :(22:52
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
ericsnowwallyworld: you have a few minutes?23:05
=== kadams54 is now known as kadams54-away
joghi menn0, sinzui, I'm back23:25
menn0jog: hi again23:26
menn0jog: i'm on finfolk now23:26
menn0jog: I can repro the problem23:26
joggreat!23:26
menn0jog: just uploading a instrumented version of juju now23:27
menn0jog: hopefully the extra logging will tell me something23:27
menn0jog, sinzui: so the issue is that the state server doesn't think it's the master!23:35
menn0jog, sinzui: I've never seen this before23:35
* menn0 continues digging23:35
sinzuiIt is a modest state-server23:35
jogheh sinzui, I was thinking the same thing23:35
menn0when jujud comes up on 1.20 mongo tells the state server it's the master23:39
menn0after rebooting into 1.21 mongo tells the state server it's no longer the master23:40
menn0yet if I use the shell, mongodb says that the instance on machine-0 is the primary/master23:41
menn0so something in juju must be getting it wrong23:41
menn0but why only on maas...23:42
menn0moar digging23:42
menn0sinzui, jog: these tests stopped working for both master and 1.21 when dimiter's fix for bug 1397376 when in23:46
mupBug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:Fix Committed by dimitern> <juju-core 1.21:Fix Committed by dimitern> <https://launchpad.net/bugs/1397376>23:46
menn0looking at that now23:48
ericsnowwallyworld: ping23:49
katcoericsnow: i think he is off today23:49
ericsnowkatco: k, thanks23:49
katcoericsnow: np23:49
jogmenn0, sinzui, and timing of that commit was when we had our MaaS down for upgrading with the CI jobs just getting re-enabled after the weekend :(23:53
menn0jog, sinzui: i'm beginning to see how that commit could cause the behaviour we're seeing23:56
menn0jog, sinzui: adding more logging23:56
sinzui:)23:57
menn0jog, sinzui: with this release the isMaster check always returns false on maas23:57
menn0jog, sinzui: upgrades not completing is just fallout from that23:58
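At this point menn0 has the symptom but not yet the exact mechanism. The isMaster check always returns false on MAAS, even though the mongo shell reports machine-0 as primary. One way such a check can fail permanently on a single-node replica set: it compares the primary's host with a single address chosen from the machine's address list, and a change in the addresses causes a different entry to be chosen. This is a minimal sketch of that failure mode only. The selection rule, scopes and addresses are assumptions. It is not Juju's actual code and not the confirmed cause of bug 1403200.

```python
def select_peer_address(addresses):
    """Choose the one address this machine is identified by.

    Hypothetical rule for this sketch: the first address with scope
    "local-cloud", else the first address of any scope.
    """
    for value, scope in addresses:
        if scope == "local-cloud":
            return value
    return addresses[0][0] if addresses else None


def is_master(primary_host_port, machine_addresses):
    """Compare the replica set primary (host:port) with our chosen address.

    Only IPv4 addresses and hostnames are handled; a bracketed IPv6
    host:port would need proper parsing.
    """
    primary_host = primary_host_port.rpartition(":")[0]
    return select_peer_address(machine_addresses) == primary_host


# Mongo reports the primary by IP; the address lists are illustrative.
print(is_master("10.0.0.5:37017", [("10.0.0.5", "local-cloud")]))  # True
# If a hostname now comes first with the same scope, the comparison fails
# on every call, even though the shell still shows this node as primary.
print(is_master("10.0.0.5:37017",
                [("node-1.maas", "local-cloud"),
                 ("10.0.0.5", "local-cloud")]))  # False
```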
jog:)23:58
=== kadams54 is now known as kadams54-away
