[01:15] * wallyworld sighs. ocr on leave means reviews kinda stall
[01:20] wallyworld, we should start having folks assign back-up for ocr when they go on vacation
[01:20] yes we should - we used to have that as a policy
[01:20] i guess we still do in theory
[01:20] folks just have to follow it :-)
[02:04] wallyworld: just so you know, i'm currently dealing with a nasty upgrade issue that CTS has run into
[02:04] oh no
[02:04] bug?
[02:05] wallyworld: seems to affect any upgrade from 1.24.x
[02:05] wallyworld: the agent won't restart... a worker is failing to stop
[02:05] wallyworld: only seems to happen with big complex envs (bootstack in this case)
[02:05] wallyworld: i'm using the bootstack staging env atm for the repro and am making some progress
[02:06] wallyworld: RT 82240... i'll make sure there's a Juju ticket if there isn't already
[02:06] thumper was looking at this last week but I've taken it over in his absence
[02:06] oh, ty
[02:07] menn0: is it bug 1468653
[02:07] Bug #1468653: jujud hanging after upgrading from 1.24.0 to 1.24.1(and 1.24.2)
[02:07] menn0, that is a bootstack bug
[02:08] alexisb: I don't think so
[02:08] alexisb: jujud is definitely not doing the right thing... it's getting stuck when trying to shut down
[02:08] wallyworld: that is the ticket though (LP was timing out for me)
[02:09] ok
[02:58] wallyworld: this is definitely leadership related
[02:58] oh joy
[02:58] wallyworld: when I reproed there were 9 stuck API connections
[02:58] wallyworld: and there were 9 goroutines waiting in BlockUntilLeadershipReleased
[02:58] wallyworld: in 1.24.0 at least there's a naked channel read there
[02:59] stuck before upgrade during shutdown of agents?
[02:59] wallyworld: yep
[02:59] wallyworld: when did you fix all the naked channel ops?
[02:59] i didn't fix those
[02:59] i think maybe william did?
[02:59] or tim?
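The "naked channel read" mentioned at [02:58] can be illustrated with a minimal Go sketch. This is not the juju source: blockNaked, blockWithAbort, and errStopped are hypothetical names chosen for illustration, and the real BlockUntilLeadershipReleased may look quite different. The sketch only shows why a bare receive can leave a goroutine stuck during shutdown, and how a select on an extra abort channel lets it return.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStopped is a hypothetical error returned when the wait is aborted.
var errStopped = errors.New("worker stopped")

// blockNaked waits with a bare ("naked") channel read. If nothing is ever
// sent on released and it is never closed, the caller blocks forever.
func blockNaked(released <-chan struct{}) error {
	<-released
	return nil
}

// blockWithAbort waits for either the release or an abort signal, so a
// shutdown can unblock the caller even if the release never happens.
func blockWithAbort(released, abort <-chan struct{}) error {
	select {
	case <-released:
		return nil
	case <-abort:
		return errStopped
	}
}

func main() {
	released := make(chan struct{}) // never closed: the release never comes
	abort := make(chan struct{})
	done := make(chan error, 1)

	go func() { done <- blockWithAbort(released, abort) }()

	// Simulate the server shutting down while the request is still waiting.
	close(abort)

	select {
	case err := <-done:
		fmt.Println("request returned:", err)
	case <-time.After(time.Second):
		fmt.Println("request is still blocked")
	}
}
```

Swapping blockWithAbort for blockNaked in the goroutine would leave the request blocked until the timeout fires, which is the kind of hang described above for long-running requests during an upgrade.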
[03:00] ok, I thought you did some
[03:00] if i did i can't remember
[03:00] anyway, I should hopefully have this soon
[03:00] i guess we think that 1.24.2 is ok
[03:01] and 1.24.0 upgrades may need a manual process if it hangs
[03:01] not sure, I think the problem still happens when upgrading from 1.24.2
[03:01] I'll check that soon
[03:01] ok
[03:02] it takes about 30 mins to build up the env to test it
[03:03] and I don't want to tear it down just yet until I've finished looking at this env
[03:08] ok
[03:17] wallyworld: ok, i understand the problem now
[03:17] wallyworld: checking to see if someone has already fixed it in a later 1.24 or in master
[03:17] ok
[03:18] wallyworld: basically if any of the leadership API requests are active (and some are quite long running) while an upgrade is initiated the server will get stuck
[03:19] wallyworld: the more units you have the more likely you are to hit the problem
[03:20] menn0: yes, that sounds very plausible based on what has been observed before and what andrew/william fixed
[03:20] i don't think there's a fix we can do because 1.24.0 is already running
[03:22] wallyworld: that's true, but it would be good to ensure that the next 1.24.0 doesn't have the problem
[03:22] sorry 1.24.x
[03:23] wallyworld: from looking at the code, the problem is still there in master
[03:23] wallyworld: hangout?
[03:23] sure, sec
[03:24] wallyworld: onyx standup?
[04:02] wallyworld: it's not actually as simple as we thought... the thing returned by NewLeaseManager is actually the singleton which is supposedly getting killed
[04:02] wallyworld: there must be some other aspect
[04:03] otp, sec
[04:03] wallyworld: no worries
[04:03] wallyworld: i'll sort it out
[04:04] ty
[08:51] wallyworld: I have a fix for the problem
[08:51] wallyworld: it's way past my EOD and i'm not working tomorrow so I'm writing up notes for thumper so he can write some tests around it and land it
[08:52] wallyworld: we're not out of the woods yet though...
post upgrade about 50% of the units on this env have hook failures
[08:52] wallyworld: will send an email
[08:52] menn0: thanks for sticking with it, i'll talk to tim tomorrow
[09:48] Bug #1471231 changed: debugLogDBIntSuite teardown fails
[12:46] morning
[12:59] morning perrito666
[13:04] there is something about annual medical check that makes me feel old
[13:04] * perrito666 sighs and makes appt
[13:17] perrito666: wait until u get kids... :)
[13:18] anastasiamac: +1
[13:25] perrito666: katco I cannot find the on-call reviewer calendar. Any clues?
[13:25] sinzui: it's just the juju team calendar
[13:35] katco: Any more clues? Canonical's Google Calendar tells me none of the juju email addresses have a calendar.
[13:41] wwitzel3: Can you review http://reviews.vapour.ws/r/2140/
[13:42] man, coming back from vacation is always so hard
[13:43] natefinch: o/ hope you had a good time
[13:43] katco: amazing time. Could have used another week (and a raise to be able to afford it ;)
[13:44] katco: he had, seen it on Instagram ;)
[13:44] natefinch: lol
[13:44] TheMue: :)
[13:44] sinzui: taking a look now
[13:45] natefinch: looked like a lot of fun in a cool environment
[13:45] TheMue: it was great. We did it last year in a house half this size... the extra interior space and nicer beach made this year even better.
[13:45] (and 50% more expensive... but worth it)
[13:46] ericsnow: wwitzel3: natefinch: we have 2 meetings overlapping. just meet in moonstone
[13:46] sinzui: just the dep updates? I was able to update to them and build juju and bootstrap, so combined with the tests you did, LGTM.
[13:46] natefinch: your family grew, you need the space
[13:46] ;)
[13:47] TheMue: yeah, I gotta stop doing thing
[13:47] natefinch: cute family, no need to stop
[13:47] wwitzel3: it is, I just wanted a dev to ask the hard questions about consequences. Thank you. This is the comparable branch for master. http://reviews.vapour.ws/r/2141/
[13:49] TheMue: haha...
the number of bedrooms in my house, seats in my car, and lack of hair on my head say otherwise ;)
[13:49] natefinch: hmm, ok, there are constraints, yep :D
[13:58] sinzui: tim and wayne
[13:58] thank you perrito666 katco and fwereade sorted me out
[13:58] katco: I didn't check my calendar until now and just realized we have the iteration meeting nowish. I have to take my daughter to a swim lesson in about 15 minutes... can we push the iteration meeting back a couple hours? Sorry for the late notice... I forgot we'd pushed the iteration meeting to today.
[13:59] natefinch: not really, i am taking the middle of today off to catch up on some things. you need to start checking your calendar dude
[14:00] katco: I know, I know. Totally my fault. I'm sorry.
=== ericsnow is now known as ericsnow_afk
[16:37] Is there someone who owns the CentOS support within Juju?
[16:40] alexisb: ^^
[16:40] (I figure you'd be the most likely to know :)
[16:41] gsamfira and team did the work
[17:06] katco, wwitzel3, ericsnow_afk: how goes?
[17:06] natefinch: good, I'm just working on wpm bugs
[17:08] cherylj: I might be able to answer questions
[17:20] natefinch: pick up some of the bugs in the backlog if you don't mind
[17:20] wwitzel3: please tag the bug you're working on and move to actively working
[17:21] katco: will do
[17:21] katco: thanks
=== liam_ is now known as Guest62504
[17:46] katco: FYI: one bug was fixed by someone else, one was marked invalid, and one seems to be assigned to gsamfira, though that was 5 days ago, so I'm not sure if he's actually working on it. The other bug in the backlog is being worked on by wwitzel3.
I could do my "clean up assigned bugs" task, unless you think there's something more important
=== ericsnow_afk is now known as ericsnow
[19:48] natefinch, pending katco's arrival, there are plenty of bugs against 1.25 you can tackle :)
[19:48] lots and lots
[19:48] alexisb: heh ok
[19:56] Bug #1424892 changed: rsyslog-gnutls is not installed when enable-os-refresh-update is false
[20:12] mgz: don't suppose you're around?
[20:16] sinzui: is my CI blockers bookmark incorrect? It shows no blockers, but trying to merge some code to main returns "does not match fixes-blah" My bookmark, for reference: https://bugs.launchpad.net/juju-core/+bugs?field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&field.tags_combinator=ALL
[20:53] natefinch: The status changed about 6 months ago, and the tags 3 months ago: look at this
[20:54] https://bugs.launchpad.net/juju-core/+bugs?field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&field.tag=ci+blocker+&field.tags_combinator=ALL
[20:54] natefinch: CI is testing the fixes now
[20:55] looks like the osx change is good too
[20:56] sinzui: mind if I add that link to the blocking bugs wiki page that Martin made today?
That way, hopefully it'll get updated if the requirements change
[20:57] natefinch: go ahead
[20:59] done
[20:59] thanks sinzui
[21:08] Bug #1473461 changed: OSX/darwin builds fail: undefined: password.EnsureJujudPassword
[21:33] ah mah gard, so many emails
[21:35] fwereade: I'm here
[21:44] thumper, heyhey
[21:44] thumper, not so critical really, I think it's just JujuConnSuite being shite
[21:44] thumper, and I've convinced myself that it's an INFO log anyway so it's moot
[21:45] thumper, but if, in your Copious Free Time, you were to come up with a clean way of separating the logging (that wasn't just "replace JujuConnSuite"), that would be awesome
[21:51] fwereade: there is a way...
[21:52] * fwereade is all ears
[21:53] the base suite brings in a logging suite
[21:53] the logging suite captures the logs
[21:53] and replaces the default logger (stderr) with one that goes to gocheck
[21:54] so... wondering what the problem is
[21:54] thumper, well, me too, I'm vaguely assuming that because JCS has everything running all at once there's some global logging setup somewhere that dumps the state stuff into the stderr of the testing.Context
[21:55] thumper, I imagine the cmd.Logger or whatever it is has a hand in it?
[21:55] IIRC, there was some change to the default loggers with the log roller
[21:55] thumper, and it's not wrong to be sending all those logs to stdout
[21:55] but I've not looked deeply
[21:56] thumper, it's just that it's happening in the same process, which is out of the ordinary, and so gets logged with everything else
[21:56] thumper, I guess the answer with that specific test is to run it against an api stub and check it doesn't log when nicely isolated
[21:58] :)
[22:36] \o/
[22:49] hi davecheney
[22:50] davecheney: how'd the conference go for you?
[22:51] thumper: excellently
[22:51] i guess that means I beat axw back to Australia
[23:13] is there any way to see what jujud is doing, load wise?
we've got one constantly sitting between 100% - 150% cpu, and the logs aren't particularly illuminating - doesn't look too busy at all
[23:20] Bug #1446871 changed: Unit hooks fail on windows if PATH is uppercase
[23:43] bradm: best suggestion is to change the log settings to debug
[23:43] bradm: or are they at debug already?
[23:44] * thumper takes a deep breath and resolves conflicts between master and jes-cli branch
[23:56] thumper: we have a 20G log file, so either we're on debug or it's very very verbose for info, but we'll check.
[23:57] yes, we're definitely on debug
[23:57] we're seeing a lot about ClaimLeadership