* wallyworld sighs. ocr on leave means reviews kinda stall | 01:15 | |
alexisb | wallyworld, we should start having folks assign back-up for ocr when they go on vacation | 01:20 |
wallyworld | yes we should - we used to have that as a policy | 01:20 |
wallyworld | i guess we still do in theory | 01:20 |
wallyworld | folks just have to follow it :-) | 01:20 |
menn0 | wallyworld: just so you know, i'm currently dealing with a nasty upgrade issue that CTS has run in to | 02:04 |
wallyworld | oh no | 02:04 |
wallyworld | bug? | 02:04 |
menn0 | wallyworld: seems to affect any upgrade from 1.24.x | 02:05 |
menn0 | wallyworld: the agent won't restart... a worker is failing to stop | 02:05 |
menn0 | wallyworld: only seems to happen with big complex envs (bootstack in this case) | 02:05 |
menn0 | wallyworld: i'm using the bootstack staging env atm to repro and am making some progress | 02:05 |
menn0 | wallyworld: RT 82240... i'll make sure there's a Juju ticket if there isn't already | 02:06 |
menn0 | thumper was looking at this last week but I've taken it over in his absence | 02:06 |
wallyworld | oh, ty | 02:06 |
wallyworld | menn0: is it bug 1468653 | 02:07 |
mup | Bug #1468653: jujud hanging after upgrading from 1.24.0 to 1.24.1(and 1.24.2) <canonical-bootstack> <juju-core:Triaged> <juju-core 1.24:In Progress by thumper> <https://launchpad.net/bugs/1468653> | 02:07 |
alexisb | menn0, that is a bootstack bug | 02:07 |
menn0 | alexisb: I don't think so | 02:08 |
menn0 | alexisb: jujud is definitely not doing the right thing... it's getting stuck when trying to shut down | 02:08 |
menn0 | wallyworld: that is the ticket though (LP was timing out for me) | 02:08 |
wallyworld | ok | 02:09 |
menn0 | wallyworld: this is definitely leadership related | 02:58 |
wallyworld | oh joy | 02:58 |
menn0 | wallyworld: when I reproed there were 9 stuck API connections | 02:58 |
menn0 | wallyworld: and there were 9 goroutines waiting in BlockUntilLeadershipReleased | 02:58 |
menn0 | wallyworld: in 1.24.0 at least there's a naked channel read there | 02:58 |
wallyworld | stuck before upgrade during shutdown of agents? | 02:59 |
menn0 | wallyworld: yep | 02:59 |
menn0 | wallyworld: when did you fix all the naked channel ops? | 02:59 |
wallyworld | i didn't fix those | 02:59 |
wallyworld | i think maybe william did? | 02:59 |
wallyworld | or tim? | 02:59 |
menn0 | ok, I thought you did some | 03:00 |
wallyworld | if i did i can't remember | 03:00 |
menn0 | anyway, I should hopefully have this soon | 03:00 |
wallyworld | i guess we think that 1.24.2 is ok | 03:00 |
wallyworld | and 1.24.0 upgrades may need a manual process if it hangs | 03:01 |
menn0 | not sure, I think the problem still happens when upgrading from 1.24.2 | 03:01 |
menn0 | I'll check that soon | 03:01 |
wallyworld | ok | 03:01 |
menn0 | it takes about 30 mins to build up the env to test it | 03:02 |
menn0 | and I don't want to tear it down just yet until I've finished looking at this env | 03:03 |
wallyworld | ok | 03:08 |
menn0 | wallyworld: ok, i understand the problem now | 03:17 |
menn0 | wallyworld: checking to see if someone has already fixed it in a later 1.24 or in master | 03:17 |
wallyworld | ok | 03:17 |
menn0 | wallyworld: basically if any of the leadership API requests are active (and some are quite long running) while an upgrade is initiated the server will get stuck | 03:18 |
menn0 | wallyworld: the more units you have the more likely you are to hit the problem | 03:19 |
wallyworld | menn0: yes, that sounds very plausible based on what has been observed before and what andrew/william fixed | 03:20 |
wallyworld | i don't think there's a fix we can do because 1.24.0 is already running | 03:20 |
menn0 | wallyworld: that's true, but it would be good to ensure that the next 1.24.0 doesn't have the problem | 03:22 |
menn0 | sorry 1.24.x | 03:22 |
menn0 | wallyworld: from looking at the code, the problem is still there in master | 03:23 |
menn0 | wallyworld: hangout? | 03:23 |
wallyworld | sure, sec | 03:23 |
menn0 | wallyworld: onyx standup? | 03:24 |
menn0 | wallyworld: it's not actually as simple as we thought... the thing returned by NewLeaseManager is actually the singleton which is supposedly getting killed | 04:02 |
menn0 | wallyworld: there must be some other aspect | 04:02 |
wallyworld | otp, sec | 04:03 |
menn0 | wallyworld: no worries | 04:03 |
menn0 | wallyworld: i'll sort it out | 04:03 |
wallyworld | ty | 04:04 |
menn0 | wallyworld: I have a fix for the problem | 08:51 |
menn0 | wallyworld: it's way past my EOD and i'm not working tomorrow so I'm writing up notes for thumper so he can write some tests around it and land it | 08:51 |
menn0 | wallyworld: we're not out of the woods yet though... post upgrade about 50% of the units on this env have hook failures | 08:52 |
menn0 | wallyworld: will send an email | 08:52 |
wallyworld | menn0: thanks for sticking with it, i'll talk to tim tomorrow | 08:52 |
mup | Bug #1471231 changed: debugLogDBIntSuite teardown fails <ci> <unit-tests> <juju-core db-log:Fix Committed> <https://launchpad.net/bugs/1471231> | 09:48 |
perrito666 | morning | 12:46 |
alexisb | morning perrito666 | 12:59 |
perrito666 | there is something about annual medical check that makes me feel old | 13:04 |
* perrito666 sighs and makes appt | 13:04 | |
anastasiamac | perrito666: wait until u get kids... :) | 13:17 |
ashipika | anastasiamac: +1 | 13:18 |
sinzui | perrito666: katco I cannot find the on-call reviewer calendar. Any clues? | 13:25 |
katco | sinzui: it's just the juju team calendar | 13:25 |
sinzui | katco: Any more clues? Canonical's Google Calendar tells me none of the juju email addresses have a calendar? | 13:35 |
sinzui | wwitzel3: Can you review http://reviews.vapour.ws/r/2140/ | 13:41 |
natefinch | man, coming back from vacation is always so hard | 13:42 |
katco | natefinch: o/ hope you had a good time | 13:43 |
natefinch | katco: amazing time. Could have used another week (and a raise to be able to afford it ;) | 13:43 |
TheMue | katco: he had, seen it on Instagram ;) | 13:44 |
katco | natefinch: lol | 13:44 |
katco | TheMue: :) | 13:44 |
wwitzel3 | sinzui: taking a look now | 13:44 |
TheMue | natefinch: looked like a lot of fun in a cool environment | 13:45 |
natefinch | TheMue: it was great. We did it last year in a house half this size... the extra interior space and nicer beach made this year even better. | 13:45 |
natefinch | (and 50% more expensive... but worth it) | 13:45 |
katco | ericsnow: wwitzel3: natefinch: we have 2 meetings overlapping. just meet in moonstone | 13:46 |
wwitzel3 | sinzui: just the dep updates? I was able to update to them and build juju and bootstrap, so combined with the tests you did, LGTM. | 13:46 |
TheMue | natefinch: your family grew, you need the space | 13:46 |
TheMue | ;) | 13:46 |
natefinch | TheMue: yeah, I gotta stop doing thing | 13:47 |
TheMue | natefinch: cute family, no need to stop | 13:47 |
sinzui | wwitzel3: it is, I just wanted a dev to ask the hard questions about consequences. Thank you. This is the comparable branch for master. http://reviews.vapour.ws/r/2141/ | 13:47 |
natefinch | TheMue: haha... the number of bedrooms in my house, seats in my car, and lack of hair on my head say otherwise ;) | 13:49 |
TheMue | natefinch: hmm, ok, there are constraints, yep :D | 13:49 |
perrito666 | sinzui: tim and wayne | 13:58 |
sinzui | thank you perrito666 katco and fwereade sorted me out | 13:58 |
natefinch | katco: I didn't check my calendar until now and just realized we have the iteration meeting nowish. I have to take my daughter to a swim lesson in about 15 minutes... can we push the iteration meeting back a couple hours? Sorry for the late notice... I forgot we'd pushed the iteration meeting to today. | 13:58 |
katco | natefinch: not really, i am taking the middle of today off to catch up on some things. you need to start checking your calendar dude | 13:59 |
natefinch | katco: I know, I know. Totally my fault. I'm sorry. | 14:00 |
=== ericsnow is now known as ericsnow_afk | ||
cherylj | Is there someone who owns the CentOS support within Juju? | 16:37 |
cherylj | alexisb: ^^ | 16:40 |
cherylj | (I figure you'd be the most likely to know :) | 16:40 |
alexisb | gsamfira and team did the work | 16:41 |
natefinch | katco, wwitzel3, ericsnow_afk: how goes? | 17:06 |
wwitzel3 | natefinch: good, I'm just working on wpm bugs | 17:06 |
bogdanteleaga | cherylj: I might be able to answer questions | 17:08 |
katco | natefinch: pick up some of the bugs in the backlog if you don't mind | 17:20 |
katco | wwitzel3: please tag the bug you're working on and move to actively working | 17:20 |
natefinch | katco: will do | 17:21 |
wwitzel3 | katco: thanks | 17:21 |
=== liam_ is now known as Guest62504 | ||
natefinch | katco: FYI: one bug was fixed by someone else, one was marked invalid, and one seems to be assigned to gsamfira, though that was 5 days ago, so I'm not sure if he's actually working on it. The other bug in the backlog is being worked on by wwitzel3. I could do my "clean up assigned bugs" task, unless you think there's something more important | 17:46 |
=== ericsnow_afk is now known as ericsnow | ||
alexisb | natefinch, pending katco's arrival, there are plenty of bugs against 1.25 you can tackle :) | 19:48 |
alexisb | lots and lots | 19:48 |
natefinch | alexisb: heh ok | 19:48 |
mup | Bug #1424892 changed: rsyslog-gnutls is not installed when enable-os-refresh-update is false <cloud-init> <logging> <juju-core:Fix Released by natefinch> <juju-core 1.24:Fix Released by natefinch> <https://launchpad.net/bugs/1424892> | 19:56 |
natefinch | mgz: don't suppose you're around? | 20:12 |
natefinch | sinzui: is my CI blockers bookmark incorrect? It shows no blockers, but trying to merge some code to main returns "does not match fixes-blah" My bookmark, for reference: https://bugs.launchpad.net/juju-core/+bugs?field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&field.tags_combinator=ALL | 20:16 |
sinzui | natefinch: The statuses changed about 6 months ago, and the tags 3 months ago: look at this | 20:53 |
sinzui | https://bugs.launchpad.net/juju-core/+bugs?field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&field.tag=ci+blocker+&field.tags_combinator=ALL | 20:54 |
sinzui | natefinch: CI is testing the fixes now | 20:54 |
sinzui | looks like the osx change is good too | 20:55 |
natefinch | sinzui: mind if I add that link to the blocking bugs wiki page that Martin made today? That way, hopefully it'll get updated if the requirements change | 20:56 |
sinzui | natefinch: go ahead | 20:57 |
natefinch | done | 20:59 |
natefinch | thanks sinzui | 20:59 |
mup | Bug #1473461 changed: OSX/darwin builds fail: undefined: password.EnsureJujudPassword <blocker> <ci> <osx> <regression> <juju-core:Fix Released by bteleaga> <https://launchpad.net/bugs/1473461> | 21:08 |
thumper | ah mah gard, so many emails | 21:33 |
thumper | fwereade: I'm here | 21:35 |
fwereade | thumper, heyhey | 21:44 |
fwereade | thumper, not so critical really, I think it's just JujuConnSuite being shite | 21:44 |
fwereade | thumper, and I've convinced myself that it's an INFO log anyway so it's moot | 21:44 |
fwereade | thumper, but if, in your Copious Free Time, you were to come up with a clean way of separating the logging (that wasn't just "replace JujuConnSuite"), that would be awesome | 21:45 |
thumper | fwereade: there is a way... | 21:51 |
* fwereade is all ears | 21:52 | |
thumper | the base suite brings in a logging suite | 21:53 |
thumper | the logging suite captures the logs | 21:53 |
thumper | and replaces the default logger (stderr) with one that goes to gocheck | 21:53 |
thumper | so... wondering what the problem is | 21:54 |
fwereade | thumper, well, me too, I'm vaguely assuming that because JCS has everything running all at once there's some global logging setup somewhere that dumps the state stuff into the stderr of the testing.Context | 21:54 |
fwereade | thumper, I imagine the cmd.Logger or whatever it is has a hand in it? | 21:55 |
thumper | IIRC, there was some change to the default loggers with the log roller | 21:55 |
fwereade | thumper, and it's not wrong to be sending all those logs to stdout | 21:55 |
thumper | but I've not looked deeply | 21:55 |
fwereade | thumper, it's just that it's happening in the same process, which is out of the ordinary, and so gets logged with everything else | 21:56 |
fwereade | thumper, I guess the answer with that specific test is to run it against an api stub and check it doesn't log when nicely isolated | 21:56 |
thumper | :) | 21:58 |
davecheney | \o/ | 22:36 |
thumper | hi davecheney | 22:49 |
thumper | davecheney: how'd the conference go for you? | 22:50 |
davecheney | thumper: excellently | 22:51 |
davecheney | i guess that means I beat axw back to australia | 22:51 |
bradm | is there any way to see what jujud is doing, load wise? we've got one constantly sitting between 100% - 150% cpu, and the logs aren't particularly illuminating - doesn't look too busy at all | 23:13 |
mup | Bug #1446871 changed: Unit hooks fail on windows if PATH is uppercase <ci> <hooks> <windows> <juju-core:Fix Released by natefinch> <juju-core 1.24:Fix Released by natefinch> <https://launchpad.net/bugs/1446871> | 23:20 |
thumper | bradm: best suggestion is to change the log settings to debug | 23:43 |
thumper | bradm: or are they at debug already? | 23:43 |
* thumper takes a deep breath and resolves conflicts between master and jes-cli branch | 23:44 | |
bradm | thumper: we have a 20G log file, so either we're on debug or it's very very verbose for info, but we'll check. | 23:56 |
bradm | yes, we're definitely on debug | 23:57 |
bradm | we're seeing a lot about ClaimLeadership | 23:57 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!