[01:59] https://github.com/juju/juju/pull/11076 for anyone wanting a simple PR to review
[02:01] thumper: lgtm ty
[02:02] kelvinliu: thanks
[02:08] np
[02:47] babbageclunk: i've updated https://github.com/juju/juju/pull/11071 with the correct fix, just re-testing now
[02:47] would be awesome if you could take another look
[02:53] tlm: for QA steps, you should also run juju controller-config to check the value and also test updating it
[03:02] wallyworld: ta, just found a bug with it as well. Will update all in a sec
[03:02] ok
[03:05] wallyworld: package move https://github.com/juju/juju/pull/11077
[03:05] ok
[03:13] thumper: can't see any issues
[03:13] wallyworld: I'm running the tests locally as well to ensure everything builds and passes
[03:14] sgtm
[03:17] wallyworld: all updated
[03:17] righto
[03:25] tlm: i've left a couple of things to look at, let me know if anything is unclear
[03:26] cheers
[03:33] wallyworld: sent you a private message just in case notifications are still off
[03:50] wallyworld: ok looking
[03:50] ta
[03:57] kelvinliu: i have a trivial forward port of 2.7 into develop https://github.com/juju/juju/pull/11078
[03:58] wallyworld: lgtm ty
[03:58] ta
[03:59] tlm: also forgot to mention sorry, the controller name PR should be targeted against develop not 2.7
[04:01] np will shift it around
[04:04] wallyworld: have we fixed the bootstrap / upgrade issue that we were hitting with the build number?
[04:05] thumper: that's the PR that babbageclunk is looking at currently
[04:05] sweet
[09:15] getting nothing past hml, yesterday - haha
[10:09] Any news on a new libjuju release?
[10:10] zeestrat, doing it now
[10:10] :D
[10:10] Yay :D
[10:11] zeestrat, just catching up on what the changes are
[11:01] manadart, you around?
[11:01] stickupkid: Yep.
[11:01] quick ho?
[11:01] Sure.
[11:05] manadart, CR please https://github.com/juju/python-libjuju/pull/378
[11:19] zeestrat, hopefully we can verify that everything is fine so that 2.7.0 can go out https://github.com/juju/python-libjuju/pull/378
[11:20] zeestrat, I ran against this, but managed to mitigate the issue luckily https://github.com/juju/python-libjuju/issues/377
[11:40] manadart, when you get time https://github.com/juju/juju/pull/11067
[11:40] stickupkid: Yep.
[12:34] stickupkid: Got a sec to HO?
[12:55] anyone quick cr? https://github.com/juju/juju/pull/11079
[12:55] nammn_de: Done.
[12:56] manadart: thanks!
[13:25] manadart, i'm in daily
[13:26] watching you eat
=== narindergupta is now known as narinderguptamac
[14:26] hey folks, how do I get a leaderless application to get a leader?
[14:27] [INFO] canonical-livepatch does not have a leader <- appears repeatedly and juju status indeed shows no unit for that application with the nice asterisk *
[14:35] roadmr: hmmm, can you check the logs to see if there's a reason no one's winning leadership?
[14:35] roadmr: the cases we've seen with this in the past have been more leader stomping, where it's taken more than the time allotted for the leader hook to run and someone else tries to get it and the first one gives up
[14:47] zeestrat: 2.7.0 is released https://discourse.jujucharms.com/t/pylibjuju-2-7-0-release-notes/2489
[14:48] zeestrat, we had to bump the facades to 2.7.0, which was a bit of a pain, but now it's done; the release went smoothly :D
[14:48] let us/me know if you hit any issues
[14:48] gnuoy: ^
[14:48] thedac: ^ /me can't recall who wanted it
[14:48] rick_h, me ! it was me !
[14:48] thanks
[14:50] stickupkid: ty very much!
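A minimal sketch of the controller-config QA flow wallyworld describes above; "controller-name" is only a hypothetical key used to illustrate the check/update cycle, not necessarily the key the PR under review adds:

    juju controller-config                       # list all settings and confirm the new key appears
    juju controller-config controller-name       # read the single (hypothetical) key back
    juju controller-config controller-name=prod  # test updating it...
    juju controller-config controller-name       # ...and confirm the new value stuck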
We'll let you know :)
[15:32] rick_h: stickupkid: thanks, I'll spread that knowledge to my team.
[15:40] roadmr: wondering if https://bugs.launchpad.net/juju/+bug/1853055 is related to your issue
[15:40] Bug #1853055: "ERROR could not determine leader" but `juju status` says there is a leader
[15:40] although your status shows a leader?
[15:41] one thing i've had to do to work around that is scrape the json status for the unit that shows "leader" and address that unit specifically
[15:41] maybe see what juju status --format yaml (or json) shows
[15:43] cmars: no, we see no leader :(
[15:43] rick_h: 2020-01-07 15:28:36 INFO juju.worker.leadership tracker.go:217 canonical-livepatch leadership for canonical-livepatch/2 denied (no further explanation as to why)
[15:44] anarchy!
[15:45] cmars: exactly what I said :)
[15:46] on some units I see "2020-01-06 14:28:25 DEBUG juju.worker.leadership tracker.go:125 canonical-livepatch/1827 making initial claim for canonical-livepatch leadership" right before the above
[15:48] hml, I've updated https://github.com/juju/juju/pull/11072
[15:54] stickupkid: approved
[15:54] ta
[16:02] achilleasa: quick cr? https://github.com/juju/charmrepo/pull/158
[16:02] nammn_de: looking
[16:35] rick_h: i have made the patch as small as possible. This now should mostly enhance bash completion ux https://github.com/juju/juju/pull/11032/
[16:36] nammn_de: k, let's try it
[16:36] roadmr: can you file a bug please?
[16:41] rick_h: thanks! something special needed to put that into edge besides just merging it? I did quite a lot of local testing and I don't see any reason why this should make it worse... :D
[16:41] nammn_de: no, just land it in develop and it'll go into edge when our builds get back happy
[16:41] nammn_de: and just keep an eye out for any issues
[16:54] nammn_de: can I get a quick CR on https://github.com/juju/charm/pull/301?
[16:59] rick_h: will do!
[17:21] nammn_de: or stickupkid can someone take a look at https://github.com/juju/juju/pull/11080?
[17:36] achilleasa: approved
[17:51] hml, there is fallout from my PR around handling outputs correctly, i.e. we put the empty value in stderr and not stdout, can you review my PR https://github.com/juju/cmd/pull/75
[17:52] achilleasa, or can you look into it (see above)
[17:53] stickupkid: looking
[17:56] stickupkid: done
[18:09] rick_h: Hi! regarding roadmr's leaderless issue, we have https://paste.ubuntu.com/p/Cp8tbkgHkv/ in the logs
[18:09] cmars: ^
[18:09] even after adding a new unit
[18:10] that log is from a fresh unit
[18:10] verterok: do you know if you're using legacy leases or raft leases?
[18:10] rick_h: this is an IS-managed controller, the main one...let me check
[18:11] verterok: I'm not seeing "leadership for" in the current code so getting a bug with juju version, where it's at (prodstack I thought moved off legacy leases but axino can correct me), and model version details would be good please
[18:11] verterok: I'm a bit distracted atm but if we can get the details pulled together I can ask someone to investigate
[18:12] rick_h: 2.6.10 in client and model
[18:12] verterok: ok, good to know. on prodstack 4.5 you're saying?
[18:12] yup
[18:13] features []
[18:13] no legacy leases !
[18:13] axino: thx
[18:13] axino: ok cool ty for the confirmation there
[19:22] stickupkid: back, reviewing now.
[19:22] stickupkid: or not.
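A hedged sketch of the status-scraping workaround cmars describes above; it assumes the JSON status marks the leader unit with a "leader" field (as recent Juju releases do) and uses "myapp" as a placeholder application name:

    # print whichever unit of a principal application currently holds leadership
    juju status --format json | \
      jq -r '.applications["myapp"].units | to_entries[] | select(.value.leader) | .key'
    # note: subordinate units such as canonical-livepatch nest under their principal's
    # "subordinates" map rather than the application's own "units" map, so the jq path
    # above would need adjusting for that case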
:-)
[20:26] roadmr, cmars, rick_h: this looks pretty similar to what we are seeing: https://bugs.launchpad.net/juju/+bug/1656275
[20:26] Bug #1656275: unit leadership gets confused
[20:27] but in a version that should have that fix...which points to a regression of some kind
[20:27] 😱
[20:27] <_thumper_> verterok: I know that there was additional work later on that
[20:28] <_thumper_> but I would have thought that it would be in 2.6.10
=== _thumper_ is now known as thumper
[20:30] thumper: right, we are seeing a similar symptom: leadership claim -> denied
[20:30] only difference I can guess is that in our case it's a subordinate...in case that makes any difference
[20:30] verterok: I am wondering about whether this could be scale and timeout related...
[20:32] 🦎 <- lots of scales here
[20:32] I just see 🦎
[20:32] that is a rectangle...
[20:33] heh
[20:33] thumper: haha you copy-pasted the rectangle and I see exactly what I posted :)
[20:33] thumper: the model itself is nothing crazy, 18 machines with the same number of canonical-livepatch subordinates
[20:33] verterok: but this is on the shared ps4.5 controller?
[20:34] thumper: yup
[20:34] that has scale
[20:34] indeed
[20:34] only two things to mention: this env has been up for a loong time and it sees a lot of churn (applications being created/removed very frequently)
[20:34] since we add a new application on every new code version rollout
[20:34] that tends to make juju unhappy
[20:34] thumper: anything we could do to recover? is there any way to force a leader?
[20:35] this is blocking rollouts to staging, as we use mojo and it doesn't like to have applications without a leader :)
[20:35] verterok: it looks like raft thinks there is probably a leader, but mongo doesn't
[20:36] verterok: we need to work out who raft thinks is the leader
[20:36] you can probably do that with juju run over the app with 'is-leader'
[20:36] then shut down that unit for a minute to force expiration
[20:36] thumper: all return false
[20:36] ah...
[20:36] wat?
[20:37] verterok: did you file the bug already?
[20:37] verterok: please please make the bug title "ANARCHY!!!!!!"
[20:37] thumper: https://paste.ubuntu.com/p/RwbDQFyYf3/
[20:37] babbageclunk: thoughts ? ^^^
[20:37] roadmr: didn't file one as I found 1656275
[20:38] ahh... bummer :(
[20:38] verterok: please file a new one
[20:38] thumper: will do
[20:38] verterok: it is probably a different issue
[20:40] it is anarchy!!! hahah
[20:40] roadmr: can let you the honors
[20:41] *do
[20:41] * roadmr won't pass up the chance to file a silly bug
[20:46] verterok: hm - we tried restarting juju agents for that application but not the machine-X agents, think it might help?
[20:46] roadmr: I can restart all agents
[20:47] thumper: missed this - reading back
[20:48] https://bugs.launchpad.net/juju/+bug/1858693 has a summary
[20:48] Bug #1858693: ANARCHY!!!!!!! Entirely leaderless application spotted in the wild
[20:53] roadmr: all machine agents restarted, no changes
[20:54] verterok: if you deploy a new application, does the new unit become the leader?
[20:55] babbageclunk: a new application or add-unit?
[20:56] verterok: trying to determine whether all of leadership is broken, or whether there's some kind of pinning happening for that application specifically
[20:56] new application (can be an existing charm but with a new name)
[20:56] babbageclunk: leadership for other applications seems to be fine
[20:57] do you mean, you can see leaders for other applications?
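A rough sketch of the probe thumper suggests above: ask every unit whether it believes it is the leader, then bounce the one that says yes so its lease can expire. The unit name and the agent service name are assumptions to verify on the machine:

    # every unit of the application reports whether it currently thinks it is the leader
    juju run --application canonical-livepatch is-leader
    # if some unit answers True, stop its agent for a minute or so to force lease expiration
    # (service name assumed to follow the jujud-unit-<unit> pattern on these releases)
    juju ssh canonical-livepatch/2 'sudo systemctl stop jujud-unit-canonical-livepatch-2'
    # ...wait out the lease, then bring the agent back
    juju ssh canonical-livepatch/2 'sudo systemctl start jujud-unit-canonical-livepatch-2'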
Or you've seen leadership change for other applications when the problem has been happening?
[20:57] because the former might not indicate it's fine
[20:57] it might just be stuck in a non-visible way
[20:57] babbageclunk: let me try deploying a trivial ubuntu unit, would that be enough?
[20:57] yup
[20:58] thanks
[20:58] running: juju deploy cs:ubuntu -n 2
[21:01] babbageclunk: I get the leadership * as expected in juju status
[21:01] babbageclunk: and juju run --application ubuntu is-leader returns True and False as expected
[21:02] verterok: ok - so it sounds like the canonical-livepatch application has leadership pinned?
[21:02] Was there an upgrade-series done at some point?
[21:03] babbageclunk: no upgrade-series AFAIK, the env is old but always on xenial
[21:04] verterok: oh, another useful thing might be to remove the leader unit of the new ubuntu and make sure that the other one eventually claims leadership
[21:04] hmm, that's the only thing that would do a pin that I know of
[21:04] babbageclunk: we do remove units and deploy new ones on each new code revision
[21:05] babbageclunk: which also means killing and spawning new subordinates (which includes canonical-livepatch)
[21:05] verterok: 18 units in all, you said? each add/remove has a 20% or so chance of removing the current leader
[21:05] right - which would explain why the leader went away? But not why there's not a new one
[21:05] (we add 4 units, then remove the 4 old ones)
[21:05] roadmr: right, 18
[21:06] It might be that the raft time isn't being updated, so it's not expiring the old one.
[21:06] well if elections happened every 3 weeks eventually you'd also want there to not be a leader, right?
[21:06] (j/k not helping haha)
[21:07] if you kill the leader of the new ubuntu, does leadership switch to the other unit?
[21:07] verterok: ^
[21:07] checking
[21:08] gimme 2'...because I need to deploy them again (killed them too soon :P)
[21:08] ha doh
[21:08] babbageclunk: where can I check the controller logs? (I assume it's you who asked for those)
[21:08] yeah, that's me
[21:08] roadmr: IS can
[21:08] ah ok
[21:09] babbageclunk: how long could it take to elect a new leader?
[21:10] ok, after a brief moment of panic, it worked
[21:10] should happen in a minute or less
[21:11] babbageclunk: /4 was leader, and after remove-unit the * is now on /3
[21:11] huh.
[21:12] it's a pain but I'm kind of tempted to suggest you remove and re-add canonical-livepatch.
[21:13] babbageclunk: that was our plan if we needed to unblock deploys
[21:13] is that a non-suggestion suggestion? :)
[21:14] I mean, it's the tactical nuclear fallback suggestion ;)
[21:14] if there's nothing else to be gleaned from the current live environment, we might as well
[21:14] we need to wait for controller logs anyway, if that's where more clues might be
[21:15] Well, I could definitely do with the logs
[21:15] as long as you're ok with waiting
[21:17] filing the RT
[21:18] but I'm not sure how it could get into this state without the application being pinned.
[21:21] babbageclunk: do you have a sec to drop by #is in case Alexandre needs more specifics on what to dig for in the logs?
[21:21] sure
[21:21] (I can triangulate if needed but might be more efficient if you're there)
[21:29] thanks for the help babbageclunk
[21:29] * verterok EODs
[22:10] hi cory_fu: the current 2.7 edge `2.7.1+2.7-ec91b32` should be working fine for CaaS cmr now.
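A sketch of the "tactical nuclear fallback" babbageclunk and verterok settle on above, i.e. removing and re-deploying the subordinate; the charm URL and the principal application name ("my-principal") are assumptions for illustration:

    # drop the leaderless subordinate application entirely
    juju remove-application canonical-livepatch
    # once its units have disappeared from juju status, deploy it again and
    # re-relate it to the principal so fresh subordinate units spawn
    juju deploy cs:canonical-livepatch
    juju add-relation canonical-livepatch my-principal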
[22:10] kelvinliu: Great, thanks
[22:10] I'll give it a try
[22:10] cory_fu: ty,
[22:19] babbageclunk: i just saw the bug - if you remove a unit that is the leader, juju does not immediately elect a new leader, the current lease needs to time out. this is a known issue / works as designed. there are bugs like bug 1469731 raised back in the 1.23 days
[22:19] Bug #1469731: Leadership must be dropped before removing lead unit
[22:20] wallyworld: yeah, but in this case it's been ages
[22:20] ah, ok, longer than the lease timeout
[22:22] 👴 ages :)
[22:22] wallyworld: yup as in hours
[22:22] :-(
[22:22] wallyworld: but other applications seem to be ok - leadership changes fine
[22:23] i guess we'll need the raft logs, engine reports etc
[22:26] yeah, although none of that really helps if leases seem to be changing ok for other applications
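A hedged sketch of how the evidence wallyworld mentions could be pulled together, assuming client access to the controller model and shell access to a controller machine; juju.worker.leadership is the module seen in the log lines earlier in this conversation, while the raft module name and the juju_engine_report introspection helper are assumptions from memory:

    # leadership/lease chatter from the controller model
    juju debug-log -m controller --replay --include-module juju.worker.leadership
    juju debug-log -m controller --replay --include-module juju.worker.raft
    # on a controller machine itself, dump the dependency engine report
    juju_engine_report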