[00:16] time to go and make another coffee === kadams54-away is now known as kadams54 === kadams54 is now known as kadams54-away [00:36] wallyworld: that's done. it ended up being a standalone function in the envcmd package (it doesn't need anything on EnvCommandBase). [00:36] wallyworld: quick PTAL [00:36] sure, ty === kadams54-away is now known as kadams54 [00:38] menn0: thanks, looks good, and i do like where the helper is now located [00:39] wallyworld: great. merging! [00:43] \o/ [01:05] anastasiamac_: hello on call reviewer :-) http://reviews.vapour.ws/r/639/ [01:06] rick_h_: that one is for you ^^^ [01:17] thumper: sadly anastasia is ill today [01:17] axw: standup? [01:17] wallyworld: oh no... stuff is going around [01:18] wallyworld: sorry, omw [01:18] wallyworld: could you take a look? [01:18] wallyworld: it is very small [01:18] thumper: sure, after standup which is now [01:18] cheers [01:37] menn0: great that that bug is fix committed, now we just gotta wait for CI to be unblocked [01:38] wallyworld: yep. am backporting to 1.21 now [01:42] thumper: review done [01:42] wallyworld: cheers [01:43] wallyworld: valid comments, will fix when I'm not in the middle of another branch :) [01:43] sure :-) [02:06] thumper: i have a tiny one for you when you get a chance http://reviews.vapour.ws/r/640/ [02:08] wallyworld: have you tested it? [02:08] thumper: doing it now on local and ec2 [02:09] thumper: i wish we could change the primary target for a bug - i don't think we can, can we? [02:09] yes [02:10] like, for ages [02:10] i forget [02:10] i want to target juju-core as primary [02:10] then i can have targets for 1.22 and 1.21 [02:10] maybe i need to delete juju-core first [02:10] and then reassign [02:11] what do you mean?
[02:11] all bug tasks are considered equal [02:12] https://bugs.launchpad.net/juju-core/+bug/1328958 [02:12] Bug #1328958: Local provider fails when an unrelated /home/ubuntu directory exists [02:12] thumper: on that bug, i can't add a 1.21 series [02:12] as the primary target is the distro [02:13] unless i'm stupid, which is a distinct possibility [02:13] wallyworld: is that what you wanted? [02:13] wallyworld: look at the bug [02:14] thumper: yes, so how did you change the primary target? [02:14] you need to get to that bug through the project; personally, I just hacked the url [02:14] ah, yes, you are right [02:15] i forgot that [02:26] thumper :) === kadams54_ is now known as kadams54-away === kadams54-away is now known as kadams54_ [03:35] menn0: ugh... having to deal with environment-specific collections in a non-normal way [03:35] menn0: which means hoop jumping [03:35] and likely to clash with outstanding work... [03:35] fooey [03:56] menn0: do you want getRawCollection()? [04:02] thumper: ^^^ (did it again) [04:03] menn0: i want to remove the legacy fields in MachineStatus in api client in master - i think that's ok, right?
[04:05] menn0: nah, in the end I went with state.ForEnviron so I could use the easy method of getting the info :-) [04:11] wallyworld: hmmm [04:12] wallyworld: if we want old clients to continue working, those fields have to stay [04:12] maybe we can't if we must retain compatibility with 1.20 [04:12] wallyworld: 1.20 should be fine [04:12] but i thought we only needed compatibility with 1.28 [04:12] wallyworld: but pre 1.19 won't work [04:12] 1.18 [04:13] wallyworld: exactly [04:13] wallyworld: a 1.18 client will still look for those fields [04:13] hmmm, yeah, it will :-( [04:13] damn [04:14] wallyworld: when I wrote that comment I was under the impression that we could stop worrying about compatibility after 2 versions or something [04:14] i'll fix the comment [04:14] wallyworld: the fields could be removed now that we have API versioning though [04:14] wallyworld: as suggested by the comment [04:16] hmmm, but the server-side call still needs to be there for older clients [04:16] so i'll have to leave the processAgent() logic alone [04:16] which is where i'm currently looking [04:16] wallyworld: yep. there would need to be 2 versions of the struct and 2 code paths [04:17] might let sleeping dogs lie for now === fuzzy_ is now known as Ponyo === kadams54_ is now known as kadams54-away [05:20] morning folks [06:02] axw: wallyworld: greetings [06:03] thought I'd try to catch you guys before we started for the day [06:06] jam1: hi [06:10] jam1: hiya [06:19] jam1: let us know if you want to chat or whatever, maybe you've already started [07:35] fwereade: meeting? [08:16] wallyworld: we've started, though we're a bit hit and miss on internet connectivity, you should be seeing updates/responses in the doc [08:17] ok, ty [08:17] axw: ^^^^^ [08:17] yep I have been following updates [08:17] thanks [08:17] fwereade: you around for our 1:1? [08:29] morning [09:11] morning TheMue [09:11] dimitern: o/ [10:02] wallyworld: did you want to catch up still?
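menn0's point at 04:16 is that dropping the legacy MachineStatus fields safely means serving two response shapes, one per API version, so 1.18 clients keep seeing the fields they look for. Below is a minimal sketch of that idea. The struct names, the legacy field names and the `facadeVersion` switch are hypothetical illustrations, not juju's actual API types.

```go
package main

import "fmt"

// AgentStatus is a hypothetical structured agent status used by newer clients.
type AgentStatus struct {
	Status  string
	Version string
}

// MachineStatusV1 is what newer clients receive: only the structured field.
type MachineStatusV1 struct {
	Id    string
	Agent AgentStatus
}

// MachineStatusV0 additionally keeps flattened legacy fields that old clients read.
type MachineStatusV0 struct {
	Id           string
	AgentState   string // legacy, duplicated from Agent.Status
	AgentVersion string // legacy, duplicated from Agent.Version
	Agent        AgentStatus
}

// statusForVersion picks the response shape by the API version the client negotiated.
// Keeping both code paths lets the V0 branch be deleted once old clients are gone.
func statusForVersion(facadeVersion int, id string, a AgentStatus) interface{} {
	if facadeVersion == 0 {
		return MachineStatusV0{Id: id, AgentState: a.Status, AgentVersion: a.Version, Agent: a}
	}
	return MachineStatusV1{Id: id, Agent: a}
}

func main() {
	a := AgentStatus{Status: "started", Version: "1.21-beta4"}
	fmt.Printf("%+v\n", statusForVersion(0, "0", a))
	fmt.Printf("%+v\n", statusForVersion(1, "0", a))
}
```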
[10:02] yeah, give me a couple of minutes [10:03] sure [10:11] axw: ok now, in our 1:1 [10:11] brt [10:23] fwereade, hey [10:24] fwereade, if you can have a look at http://reviews.vapour.ws/r/643/ - it's a backport to 1.21 of the fix for bug 1397376 you approved yesterday [10:24] Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints [10:25] morning all [10:28] perrito666, o/ [10:38] bug 1402826 seems to be fix committed; what is the status that triggers unlocking + topic change? [10:38] Bug #1402826: 1.21 cannot "add-machine lxc" to 1.18.1 [10:52] perrito666: Fix Released is generally what actually unblocks trunk, and it is supposed to only happen after CI actually does a run and sees that the failure isn't present anymore. [10:53] (AIUI) [10:54] perrito666: given that http://juju-ci.vapour.ws:8080/job/compatibility-control/ is now happy, maybe we can set it to Fixed? [11:00] dimitern: ^^ think you can check into whether trunk is actually unblocked now / [11:00] ? [11:03] jam1, not sure how except by looking at the bugs - and the only one there is fix committed [11:03] jam1: dimitern: this seems to think that CI is still blocked http://goo.gl/4zd1e9 [11:03] dimitern: I mean just tracking through that the bug really is fixed (looks like it to me from the linked CI test that is now Blue) [11:04] anastasiamac_: so trunk *is currently blocked*; my point is whether we should unblock it because the thing we said fixed it really did fix it [11:05] jam1, I see the same, so it must be some lag [11:06] thnx for the link, jw4 :) [11:07] jam1: it would be gr8 to unblock CI - it's been a while :) [11:08] mgz_, are you around? [11:58] jam1: I think the topic thing is auto, isn't it?
[11:59] perrito666, nope it's not [12:00] perrito666, I *did* try to set it via chanserv but it doesn't work like it used to and I get no error; trying to set it directly blows up with "you're not a channel operator" === ChanServ changed the topic of #juju-dev to: The topic for irc://irc.freenode.net:6697/#juju-dev is: https://juju.ubuntu.com/ | On-call reviewer: see calendar | Open critical bugs: === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Report bugs: https://bugs.launchpad.net/juju-core/ [12:04] there! [12:04] dimitern: worked for me [12:04] I used "/msg chanserv help" and "/msg chanserv help topic" to sort it out [12:04] yeah, I was doing "topic #juju-core" not #juju-dev :) [12:04] ah, yeah [12:08] mm, merge bot does not think the same [12:16] perrito666: [12:16] so, the way the bot works (AIUI) is that it checks that this launchpad search has no items: https://bugs.launchpad.net/juju-core/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&field.tag=ci+regression+& [12:17] since we still have the one as FixCommitted, it still rejects it [12:19] perrito666: so my statement is, if you're confident that bug is fixed, mark it fix released [12:20] jam1: ah, I see I always mix those up === ev_ is now known as ev [12:21] perrito666: well it did change once === meetingology` is now known as meetingology [12:24] as more and more kids enter summer vacation period the internet at home worsens [12:34] perrito666, would you mind reviewing this http://reviews.vapour.ws/r/643/ - it's a straight backport of a fix for bug 1397376 which was already approved by fwereade and landed on master yesterday [12:34] Bug #1397376: maas provider: 1.21b3 removes ip from
api-endpoints [12:57] dimitern: lgtmd but I am not a senior review dude [12:57] perrito666, np, thanks anyway :) [12:58] TheMue, can you lgtm it as well please? http://reviews.vapour.ws/r/643/ [13:02] dimitern: *click* [13:04] dimitern: done, push it ;) [13:04] TheMue, thanks! [13:04] yw [13:59] oh ffs... this is getting ridiculous ERROR: {"message":"API rate limit exceeded for jujubot.","documentation_url":"https://developer.github.com/v3/#rate-limiting"} Finished: FAILURE [14:00] we maxed out our github quota? [14:02] hm, is that the landing side or the reviewboard side? [14:05] voidspace, maas meeting? [14:17] dimitern, what is blocking the backport of the https://bugs.launchpad.net/juju-core/+bug/1397376 to 1.21-beta4? [14:17] Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints [14:20] sinzui: https://github.com/juju/juju/pull/1324 [14:20] Seems that dimitern got a funky error [14:20] oh ffs... this is getting ridiculous ERROR: {"message":"API rate limit exceeded for jujubot.","documentation_url":"https://developer.github.com/v3/#rate-limiting"} Finished: FAILURE [14:20] sinzui: who has to mark https://bugs.launchpad.net/juju-core/+bug/1402826 as fix released? menn0? [14:20] Bug #1402826: 1.21 cannot "add-machine lxc" to 1.18.1 [14:21] jw4, anyone who can verify the fix/test passes [14:21] sinzui: I see [14:22] jw4, I will get to this in about 15 minutes. I need to sort out the block on beta4 first [14:22] sinzui: feeling like a one armed paper hanger? [14:22] yes [14:23] sinzui, in a meeting, sorry [14:23] perrito666, dimitern looks like the job wrongly failed because the test couldn't clean up, not because of juju. [14:23] dimitern, perrito666 I will clean up aws, make the job not lie, and send the merge back for merging [14:24] sinzui: fwiw the failing test http://juju-ci.vapour.ws:8080/job/compatibility-control/859/ is now passing http://juju-ci.vapour.ws:8080/job/compatibility-control/862/ [14:24] tx sinzui [14:25] jw4, that is hard to read. 
the job tests *many* things. Without reading the data, we don't know which versions were under test [14:25] sinzui: do you keep some sort of list of the amount of alcohol owed to you by us? [14:25] sinzui: cool - I'll quit bugging you now [14:25] perrito666: please think of his health [14:25] perrito666, no, I suck at drinking [14:25] sinzui: most people do, at least when using a straw [14:26] mgz_: well I usually pay my teammates' debts in beers :p but we can start doing sandwiches and stuff [14:27] sinzui, that will be great [14:30] gsamfira_: coming? [14:30] alexisb: ? === katco` is now known as katco [14:43] sinzui: hey lp has https://bugs.launchpad.net/juju-core/+bug/1402826 as fix committed, what's the state of CI? [14:43] Bug #1402826: 1.21 cannot "add-machine lxc" to 1.18.1 [14:43] / [14:43] katco, I need to retest to verify it is fixed. the test is a weekly test based on weekly streams [14:44] sinzui: ah ok.. do you expect that anytime soon? [14:44] Yes [14:44] sinzui: you my friend.. rock! [14:51] dimitern: sorry, was at lunch [14:51] dimitern: didn't know we had one [14:54] voidspace, no worries, it was mostly about explaining why we need it :) - I'm preparing an agenda/minutes doc and will share later [15:01] ericsnow: wwitzel3 natefinch [15:02] perrito666: having issues getting hangouts to load [15:02] I got kicked from the call on the first attempt [15:04] plus.google.com is just timing out for me atm [15:08] Sorry ladies and gentlemen, there is congestion in the merge/ci queue. We first need to unblock dimitern, to start testing beta4, then we can remove the block on master, which is also a block on beta4. We need to kick the landing bot to start moving [15:10] wwitzel3: you broke google [15:10] voidspace, you should've received a link to the maas call agenda doc === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: none [15:26] jw4, katco dimitern master is open.
Don't all break it at once [15:26] sinzui: :) [15:26] dimitern, I see an intermittent test failure in the current merge. We will retest since we can see it passed in previous runs [15:29] sinzui: haha [15:29] \o/ /o\ [15:30] dimitern, is it TestAddRemoveSet for mongo/replicaset? [15:35] dimitern: I have, thanks [15:36] dimitern: we should have the call in the calendar [15:37] dimitern: subnet IP tracking only for aws... the trouble with that is that we need state for tracking [15:38] dimitern: and the provider-specific stuff doesn't usually have access to state [15:38] voidspace, it is on the calendar, but not the red one [15:38] dimitern: you mean orange... [15:38] dimitern: or whatever colour you pick... [15:38] dimitern: if it's not on mine or juju-core I won't have it [15:38] voidspace, :) yeah I guess it can be orange-y [15:38] voidspace, you do have it on yours though? [15:39] dimitern: we could have a "SupportsSensibleIPAllocation" provider method that returns false for aws [15:39] voidspace, we need to have a chat some time tomorrow to see how to go about the proposed change [15:39] dimitern: and fall back to address picking [15:39] voidspace, yeah, that's a possibility [15:39] dimitern: I *do* have it in my calendar [15:39] dimitern: really sorry [15:40] dimitern: I totally missed that :-( [15:40] voidspace, not to worry - it's fine :) [15:41] dimitern: having two address allocation strategies is a pain [15:41] dimitern: we could just not track unavailable ones [15:41] dimitern: doesn't matter if we retry them occasionally [15:42] voidspace, that's a good point [15:42] dimitern: it's probably better not to track them [15:42] voidspace, unfortunately, I have to go soon and can't think it through now [15:42] dimitern: ok chap [15:42] dimitern: see you later/tomorrow [15:42] voidspace, but will appreciate your ideas on it :) [15:43] voidspace, cheers [16:03] ericsnow: sorry, you have time now?
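jam1's description at 12:16 of how the merge bot decides whether trunk is blocked amounts to a filter over Launchpad bugs. The sketch below models that filter locally. The `Bug` type and the sample data are hypothetical, it does not call Launchpad, and it assumes a bug must carry both the `ci` and `regression` tags (the search URL lists both, but the log does not say whether they are ANDed or ORed).

```go
package main

import "fmt"

// Bug is a hypothetical, trimmed-down view of a Launchpad bug task.
type Bug struct {
	ID         int
	Status     string
	Importance string
	Tags       []string
}

// blockingStatuses mirrors the field.status list in the search URL quoted at 12:16.
// "Fix Committed" is in the list; "Fix Released" is not.
var blockingStatuses = map[string]bool{
	"New":                           true,
	"Confirmed":                     true,
	"Triaged":                       true,
	"In Progress":                   true,
	"Fix Committed":                 true,
	"Incomplete (with response)":    true,
	"Incomplete (without response)": true,
}

// hasAllTags reports whether the bug carries every wanted tag (assumed AND semantics).
func hasAllTags(b Bug, want ...string) bool {
	have := make(map[string]bool, len(b.Tags))
	for _, t := range b.Tags {
		have[t] = true
	}
	for _, w := range want {
		if !have[w] {
			return false
		}
	}
	return true
}

// blockers returns the IDs of bugs that would keep trunk closed.
func blockers(bugs []Bug) []int {
	var ids []int
	for _, b := range bugs {
		if b.Importance == "Critical" && blockingStatuses[b.Status] && hasAllTags(b, "ci", "regression") {
			ids = append(ids, b.ID)
		}
	}
	return ids
}

func main() {
	// Illustrative data only: the importance and tags here are assumed.
	bugs := []Bug{{ID: 1402826, Status: "Fix Committed", Importance: "Critical", Tags: []string{"ci", "regression"}}}
	fmt.Println("blocked by:", blockers(bugs)) // blocked by: [1402826]
	bugs[0].Status = "Fix Released"
	fmt.Println("blocked by:", blockers(bugs)) // blocked by: []
}
```

The detail that tripped people up in the discussion is visible in the status list: Fix Committed still blocks, and only moving the bug to Fix Released (after CI confirms the fix) clears the gate.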
[16:04] sure === kadams54 is now known as kadams54-away === sebas538_ is now known as sebas5384 [16:46] wwitzel3: ready? (I'm in moonstone) [16:47] sinzui: wwitzel3 and I are working on a new provider (GCE) and need to know what needs to be done image-wise (i.e. for simplestreams) [16:47] ericsnow: ok, omw [16:48] sinzui: from what I understand we need some URL where the different images are hosted [16:48] sinzui: and that URL will be on GCE somewhere? [16:48] ericsnow: you're probably better off talking to ben howard etc about images for gce? [16:48] ericsnow, I don't have any experience [16:48] sinzui: ah, okay [16:49] mgz_: k, thanks [16:50] you shouldn't require gce hosted simplestreams to just get started on the provider [16:51] you can provide your own image-metadata-url pointing to anywhere [16:56] gsamfira_: ping me whenever you are available please [16:57] ericsnow, I have no experience with image streams; the team in #cloudware knows all about it. [16:58] sinzui: yeah, I'm hitting up utlemming (Ben Howard) about it [16:58] sinzui: thanks though [17:00] ericsnow, as mgz_ said, we don't *need* to provide specific streams for a cloud, but it is nice to do. The Juju QA team can create local agent streams once it gets credentials to test in the cloud and sets up local storage. If there isn't local storage, we don't need to create agent streams === kadams54-away is now known as kadams54 [17:22] sinzui: cool. That would work great. I'll send you an email just so we can track this better. [17:22] sinzui: also, what do you mean by "local storage"? [17:23] sinzui: the equivalent of s3 in AWS? [17:24] ericsnow, if the cloud has a storage mechanism like swift, s3, manta, then we can place agents in the cloud for faster delivery [17:24] sinzui: got it :) [17:28] fwereade: you around?
[18:25] g'night all [19:42] If any OCR is around http://reviews.vapour.ws/r/645/ [19:42] if anyone else wants to, also welcome [20:16] thumper: bug 1403151 might be of interest to you [20:16] Bug #1403151: local provider stops bootstrapping: "Job is already running" [20:16] hmm === kadams54 is now known as kadams54-away [20:43] I sometimes forget how much better modern version control systems are [20:43] a file I'd changed in my branch had been moved to another directory by someone else [20:44] yet my change still applied cleanly because git knew how the file had moved [20:47] what is an appropriate instance root disk size for a juju machine? [20:48] GCE lets you set whatever disk size you want (as long as the image you use fits in it) === _thumper_ is now known as thumper === kadams54-away is now known as kadams54 === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1403200 [21:10] natefinch, thumper: is someone about to solve bug 1403200 that blocks merges and the release of 1.21-beta4? [21:10] Bug #1403200: mass upgrades do no complete [21:11] sinzui: i've just started looking at the ticket [21:11] sinzui: the description in the ticket and the console output i'm seeing don't match [21:12] sinzui: the test appears to give up and destroy the env 2 minutes after starting the upgrade [21:12] sinzui: can you explain where you're seeing an upgrade still running after 30 mins? [21:13] menn0, jog did a separate run of his own to ensure we got logs [21:13] sinzui: also - side issue - the test looks like it attempts to capture the logs after the env has been destroyed and the maas node is turned off [21:14] menn0, I think jog can help you get to the finfolk machines too [21:14] menn0,
! I just reported a bug about that too [21:14] Indeed the capture failed, which is why jog ran it himself [21:15] sinzui: well it seems like the behaviour jog is seeing when running tests manually is not matching what is happening during the actual CI runs [21:15] menn0, Our tests exit early because the juju client raised an error trying to talk to the server [21:16] menn0, if not having an api server is not an error, then juju needs to not exit with an error code [21:16] sinzui: I don't see that in the console output [21:16] not all do that [21:16] sinzui: as the upgrade runs it is possible to get a "maintenance in progress" error [21:16] sinzui: it could be that [21:16] this one just times out after 10 minutes http://juju-ci.vapour.ws:8080/job/maas-upgrade-trusty-amd64/336/console [21:16] sinzui: it is short-lived [21:16] it didn't time out on the previous revision [21:17] menn0, jog let the env try to upgrade for 30 minutes [21:18] sinzui: ok [21:18] sinzui: i'm not saying there isn't a problem [21:18] menn0, I was hoping we could extend the timeout [21:19] sinzui: for that run (336) I don't see a 10 minute timeout [21:19] sinzui: it looks like 2 minutes to me [21:19] 2014-12-16 20:34:14 INFO juju.cmd supercommand.go:329 command finished [21:19] 1.20.14: 1, 0, 2, dummy-sink/0, dummy-source/0 ....timeout 600.00s juju --show-log destroy-environment maas-upgrade-trusty-amd64 --force -y [21:19] 2014-12-16 20:36:19 INFO juju.provider.common destroy.go:15 destroying environment "maas-upgrade-trusty-amd64" [21:19] 2014-12-16 20:36:20 INFO juju.cmd supercommand.go:329 command finished [21:19] 20:34:14 to 20:36:19 [21:20] menn0, ah right [21:20] but the time is set to 20 minutes. [21:20] menn0, don't focus on that. We need to focus on the logs that jog attached...
[21:21] jog, join in the conversation. Can you get menn0 machines or rerun tests with him to learn why 30 minutes does not complete an upgrade? [21:21] sinzui, I think the "1, 0, 2, dummy-sink/0, dummy-source/0" output in the above is from our status check, which gets a non-zero exit code and may fail that job at that point [21:21] yup, I ran this on my desktop using a few KVM machines [21:22] the copy remote logs step is swallowing the error the juju command returned [21:22] we can manually try to reproduce on finfolk so others can get access... [21:23] menn0, I still have the env set up if there is anything you would like me to capture or check [21:25] jog: I just checked what you've attached to the bug and that looks pretty complete, thanks [21:25] jog: can you send me the steps you used to set up the env and run the test as well? [21:25] sure [21:27] jog: also, where are the logs for machine-1? [21:27] machine-1 was added and removed before I tried I deployed by charms and did the upgrade [21:28] s/I tried I deployed by/I deployed my/ [21:30] jog: ok np [21:38] menn0, jog: I just pushed a change to not swallow the command error when we cannot get logs. Any reason not to re-run one of the failing jobs? [21:39] sinzui: please do [21:39] jog, I think we fail to get logs because the script tried to use the dns address instead of ip [21:39] sinzui, go ahead, I don't think there is anything to capture from there [21:40] let's check back in 20 minutes to see where it is http://juju-ci.vapour.ws:8080/view/Juju%20Revisions/job/maas-upgrade-trusty-amd64/337/console [21:40] jog: I think it might help because the attempt to capture logs seems to come after the message about the node being shut down [21:40] jog: are you going to email those extra details or attach to the bug?
[21:41] I will attach to the bug [21:41] menn0, the nodes were shut down after getting the logs failed [21:41] that is, unless juju shut them down [21:44] jog: looking at the logs from your manual run, the reason the upgrade didn't complete is that it didn't really start [21:45] jog: machine-0 is "waiting for the other state servers to be ready for upgrade" [21:45] jog: were there ever any other state servers in this environment? [21:46] jog: do you still have it up? I might ask you to check some things in the MongoDB soon if it is. [21:46] hmm, there should not have been any other state servers and yes it's still up [21:46] jog: ok, interesting. [21:47] jog: let me just figure out a command for you to run. [21:49] jog: ok, let's do some exploring [21:50] jog: please run: juju ssh 0 [21:50] ok [21:50] jog: and then once you're in: mongo 127.0.0.1:37017/admin --ssl --username "admin" --password "`sudo grep oldpassword /var/lib/juju/agents/machine-*/agent.conf | cut -d' ' -f2`" [21:50] jog: you might need to install the mongo command-line tools for that to work (you'll get a hint) [21:52] menn0, so I have to log in directly with ssh, since 'juju ssh' rejects the connection 'ERROR login failed - maintenance in progress' [21:53] jog: of course. [21:54] Juju just called upgrade [21:55] jog: let me know when you're in and running the mongo shell [21:56] menn0, I'm there... had to install the mongo cmd line client first [21:56] jog: now: use juju [21:57] done [21:57] jog: then: db.upgradeInfo.find().pretty() [21:57] https://pastebin.canonical.com/122444/ [21:58] menn0, jog: in the console log we can see that after the upgrade, status was polled a few times (the dots), then status timed out after 2 minutes because there was nothing to talk to [21:59] menn0, Should the state-server not be available for more than two minutes during an upgrade?
[21:59] * sinzui prefers to prove the tests are wrong so that he can get a blessed revision [22:00] the state server should be available, but you can expect it to return "maintenance in progress" for a little bit during the upgrade [22:00] I would have thought 2 minutes would have been long enough though [22:00] but I don't know how fast this hardware is [22:00] waiting 5 mins might be better [22:01] jog: now this: db.machines.find({}, {jobs:1, life:1}) [22:01] { "_id" : "3", "jobs" : [ 1 ], "life" : 0 } [22:01] { "_id" : "2", "jobs" : [ 1 ], "life" : 0 } [22:01] { "_id" : "0", "jobs" : [ 2, 1 ], "life" : 0 } [22:03] jog: and db.instanceData.find({}, {"machineid": 1, "instanceid": 1, "env-uuid":1}) [22:03] { "_id" : "2", "instanceid" : "/MAAS/api/1.0/nodes/node-63cfc508-3719-11e4-8b0a-52540018f567/" } [22:03] { "_id" : "3", "instanceid" : "/MAAS/api/1.0/nodes/node-640f98fe-3719-11e4-821a-52540018f567/" } [22:03] { "_id" : "0", "instanceid" : "/MAAS/api/1.0/nodes/node-63cebcda-3719-11e4-821a-52540018f567/" } [22:04] jog: thanks, this is all very helpful [22:04] jog: let me have a think and a hunt through the code [22:04] np [22:04] menn0: ideas? [22:04] jog: so far everything looks correct [22:05] thumper: trivial typo maybe? http://reviews.vapour.ws/r/646/ [22:05] thumper, jog: it appears that for some reason the state server thinks it had to wait for other state servers to signal they were ready for the upgrade, but there are no other state servers [22:06] hmm... interesting === kadams54 is now known as kadams54-away [22:09] the state-server has voices in its head or is so lonely it thinks it has friends [22:09] menn0, I have a 5 minute timeout in place. Do you want me to retest?
[22:10] sinzui: might as well [22:12] tx waigani [22:14] jog: can you pls also run: db.stateServers.find({_id: "e"}, {machineids: 1, votingmachineids:1, "env-uuid":1}) [22:14] { "_id" : "e", "machineids" : [ "0" ], "votingmachineids" : [ "0" ] } === kadams54-away is now known as kadams54 [22:17] jog: also: db.instanceData.find({"env-uuid": {"$exists": true}}).count() [22:17] 0 [22:19] jog: db.instanceData.find({_id: {"$in": ["0"]}}) [22:20] { "_id" : "0", "arch" : "amd64", "cpucores" : NumberLong(1), "instanceid" : "/MAAS/api/1.0/nodes/node-63cebcda-3719-11e4-821a-52540018f567/", "mem" : NumberLong(2048), "tags" : [ "virtual" ], "txn-queue" : [ "5490714b164546420f00000a_1cb1133b", "5490910816454643c3000002_02253369" ], "txn-revno" : NumberLong(2) } [22:21] jog: hmm everything looks normal [22:21] jog: I can't see why the upgrade didn't proceee [22:21] proceed [22:22] jog: let me try and repro locally using your instructions. [22:22] jog: pls leave the env up if you can [22:22] menn0, I've tried with LXCs and don't see the same issue [22:23] jog: interesting [22:23] sinzui, should we pause the CI jobs and I can use the finfolk MaaS to reproduce? [22:24] jog, build-revision is disabled...I already paused it [22:25] sinzui, ok I meant the retry of jobs from Jenkins... I can try manual steps to bring it up so the machines stay around long enough to poke around [22:26] jog, that is what I am doing right now [22:32] jog: I haven't used maas before. Is there a good guide for getting it working under kvm? [22:32] jog: i've found several [22:34] menn0, we have a doc that describes how we set up our env but it's not really trivial... sinzui, do you think we can give access to finfolk? [22:35] * jog has a hard stop to pick up kids from school and will be back in about 45 minutes. [22:36] jog, menn0 I think IS already gave everyone in Canonical access to finfolk [22:37] jog, but I don't see menn0 in /home.
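The queries above show a single state server (machine 0 in both machineids and votingmachineids), yet machine-0 logged that it was waiting for the other state servers. A deliberately simplified, hypothetical model of an upgrade readiness gate shows how that can deadlock. If the only state server wrongly believes it is not the master (which is what menn0 finds at 23:35), it waits for a master that never starts the upgrade. This illustrates the symptom only. It is not juju's upgrade worker code, and `canStartUpgrade` is an invented name.

```go
package main

import "fmt"

// canStartUpgrade is a simplified, hypothetical model of the gate a state server
// passes before running upgrade steps:
//   - the master waits until every listed state server has signalled it is ready;
//   - a non-master waits for the master to mark the upgrade as started.
func canStartUpgrade(isMaster bool, stateServers []string, ready map[string]bool, masterStarted bool) bool {
	if isMaster {
		for _, id := range stateServers {
			if !ready[id] {
				return false
			}
		}
		return true
	}
	return masterStarted
}

func main() {
	servers := []string{"0"}            // matches machineids: [ "0" ] above
	ready := map[string]bool{"0": true} // machine 0 has signalled readiness

	fmt.Println(canStartUpgrade(true, servers, ready, false))  // lone master proceeds
	fmt.Println(canStartUpgrade(false, servers, ready, false)) // waits on a master that never starts
}
```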
[22:38] menn0, I think you need the ssh rules to get to the machine; as you have cloud-city, you have the keys to use the gateway we set up. [22:38] sinzui: ok, let me try [22:43] jog, maybe i should abort this current test of maas upgrade. I think it has been stalled for 15 minutes [22:52] menn0, jog: I terminated the job using -HUP, but that didn't call cleanup. We might have a dirty maas now :( === kadams54 is now known as kadams54-away === kadams54-away is now known as kadams54 [23:05] wallyworld: you have a few minutes? === kadams54 is now known as kadams54-away [23:25] hi menn0, sinzui, I'm back [23:26] jog: hi again [23:26] jog: i'm on finfolk now [23:26] jog: I can repro the problem [23:26] great! [23:27] jog: just uploading an instrumented version of juju now [23:27] jog: hopefully the extra logging will tell me something [23:35] jog, sinzui: so the issue is that the state server doesn't think it's the master! [23:35] jog, sinzui: I've never seen this before [23:35] * menn0 continues digging [23:35] It is a modest state-server [23:35] heh sinzui, I was thinking the same thing [23:39] when jujud comes up on 1.20, mongo tells the state server it's the master [23:40] after rebooting into 1.21, mongo tells the state server it's no longer the master [23:41] yet if I use the shell, mongodb says that the instance on machine-0 is the primary/master [23:41] so something in juju must be getting it wrong [23:42] but why only on maas...
moar digging [23:46] sinzui, jog: these tests stopped working for both master and 1.21 when dimiter's fix for bug 1397376 went in [23:46] Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints [23:48] looking at that now [23:49] wallyworld: ping [23:49] ericsnow: i think he is off today [23:49] katco: k, thanks [23:49] ericsnow: np [23:53] menn0, sinzui, and that commit landed while we had our MaaS down for upgrading, with the CI jobs just getting re-enabled after the weekend :( [23:56] jog, sinzui: i'm beginning to see how that commit could cause the behaviour we're seeing [23:56] jog, sinzui: adding more logging [23:57] :) [23:57] jog, sinzui: with this release, the isMaster check always returns false on maas [23:58] jog, sinzui: upgrades not completing is just fallout from that [23:58] :) === kadams54 is now known as kadams54-away
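menn0 traces the stall to an isMaster check that always returns false on MAAS after the fix for bug 1397376, even though the mongo shell still reports machine-0 as primary. The log does not say exactly how that check works. The sketch below assumes it compares an address the machine believes it has with the host part of the replica-set primary's host:port. Under that assumption, a change in which address (or address form) gets selected makes the comparison fail for the very node that is primary. The function and all addresses here are made up for illustration. Only the mongo port 37017 comes from the log.

```go
package main

import (
	"fmt"
	"net"
)

// isMaster is a hypothetical, simplified check: it compares this machine's chosen
// peer address with the host part of the replica-set primary's "host:port".
func isMaster(peerAddr, primaryHostPort string) (bool, error) {
	host, _, err := net.SplitHostPort(primaryHostPort)
	if err != nil {
		return false, err
	}
	return host == peerAddr, nil
}

func main() {
	primary := "10.0.30.5:37017" // made-up primary as reported by the replica set

	ok, _ := isMaster("10.0.30.5", primary)
	fmt.Println(ok) // true: same address form on both sides

	ok, _ = isMaster("node-1.maas", primary)
	fmt.Println(ok) // false: same node, but described by a hostname instead of an IP
}
```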