
thumperhttps://github.com/juju/juju/pull/11076 for anyone wanting a simple PR to review01:59
kelvinliuthumper: lgtm ty02:01
thumperkelvinliu: thanks02:02
wallyworldbabbageclunk: i've updated https://github.com/juju/juju/pull/11071 with the correct fix, just re-testing now02:47
wallyworldwould be awesome if you could take another look02:47
wallyworldtlm: for QA steps, you should also run juju controller-config to check the value and also test updating it02:53
tlmwallyworld: ta, just found a bug with it as well. Will update all in a sec03:02
thumperwallyworld: package move https://github.com/juju/juju/pull/1107703:05
wallyworldthumper: can't see any issues03:13
thumperwallyworld: I'm running the tests locally as well to ensure everything builds and passes03:13
tlmwallyworld: all updated03:17
wallyworldtlm: i've left a couple of things to look at, let me know if anything is unclear03:25
tlmwallyworld: sent you a private message just in case notifications are still off03:33
babbageclunkwallyworld: ok looking03:50
wallyworldkelvinliu: i have a trivial forward port of 2.7 into develop https://github.com/juju/juju/pull/1107803:57
kelvinliuwallyworld: lgtm ty03:58
wallyworldtlm: also forgot to mention sorry, the controller name PR should be targeted against develop not 2.703:59
tlmnp will shift it around04:01
thumperwallyworld: have we fixed the bootstrap / upgrade issue that we were hitting with the build number?04:04
wallyworldthumper: that's the PR that babbageclunk is looking at currently04:05
stickupkidgetting nothing past hml, yesterday - haha09:15
zeestratAny news on a new libjuju release?10:09
stickupkidzeestrat, doing it now10:10
zeestratYay :D10:10
stickupkidzeestrat, just catching up on what the changes are10:11
stickupkidmanadart, you around?11:01
manadartstickupkid: Yep.11:01
stickupkidquick ho?11:01
stickupkidmanadart, CR please https://github.com/juju/python-libjuju/pull/37811:05
stickupkidzeestrat, hopefully we can verify that everything is fine so that 2.7.0 can go out https://github.com/juju/python-libjuju/pull/37811:19
stickupkidzeestrat, I ran against this, but managed to mitigate this issue luckily https://github.com/juju/python-libjuju/issues/37711:20
stickupkidmanadart, when you get time https://github.com/juju/juju/pull/1106711:40
manadartstickupkid: Yep.11:40
manadartstickupkid: Got a sec to HO?12:34
nammn_deanyone  quick cr? https://github.com/juju/juju/pull/1107912:55
manadartnammn_de: Done.12:55
nammn_demanadart: thankso!12:56
stickupkidmanadart, i'm in daily13:25
stickupkidwatching you eat13:26
=== narindergupta is now known as narinderguptamac
roadmrhey folks, how do I get a leaderless application to get a leader?14:26
roadmr[INFO] canonical-livepatch does not have a leader <- appears repeatedly and juju status indeed shows no unit for that application with the nice asterisk *14:27
rick_hroadmr:  hmmm, can you check the logs to see if there's a reason no one's winning leadership?14:35
rick_hroadmr:  the cases we've seen with this in the past have been more leader stomping, where it's taken more than the time allotted for the leader hook to run and someone else tries to get it and the first one gives up14:35
stickupkidzeestrat: 2.7.0 is released https://discourse.jujucharms.com/t/pylibjuju-2-7-0-release-notes/248914:47
stickupkidzeestrat, we had to bump the facades to 2.7.0, which was a bit of a pain, but now it's done the release went smoothly :D14:48
stickupkidlet us/me know if you hit any issues14:48
rick_hgnuoy:  ^14:48
rick_hthedac:  ^ /me can't recall who wanted it14:48
gnuoyrick_h, me ! it was me !14:48
zeestratstickupkid: ty very much! We'll let you know :)14:50
thedacrick_h: stickupkid: thanks, I'll spread that knowledge to my team.15:32
cmarsroadmr: wondering if https://bugs.launchpad.net/juju/+bug/1853055 is related to your issue15:40
mupBug #1853055: "ERROR could not determine leader" but `juju status`  says there is a leader <juju:New> <https://launchpad.net/bugs/1853055>15:40
cmarsalthough your status shows a leader?15:40
cmarsone thing i've had to do to work around that, is scrape the json status for the unit that shows "leader" and address that unit specifically15:41
cmarsmaybe see what juju status --format yaml (or json) shows15:41
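A minimal sketch of the workaround cmars describes: load the output of juju status --format json and look for the unit flagged as leader. The key names used here ("applications", "units", "subordinates", "leader") are an assumption about the status layout, and the sample document is hypothetical and heavily trimmed.

```python
import json


def find_leaders(status, application):
    """Return the sorted names of units of `application` flagged as leader."""
    leaders = []
    apps = status.get("applications", {})
    # Principal units are listed under their own application.
    for name, unit in (apps.get(application, {}).get("units") or {}).items():
        if unit.get("leader"):
            leaders.append(name)
    # Assumption: subordinate units are nested under the principal unit
    # they are attached to, keyed by their own unit name.
    for app in apps.values():
        for unit in (app.get("units") or {}).values():
            for sub_name, sub in (unit.get("subordinates") or {}).items():
                if sub_name.split("/")[0] == application and sub.get("leader"):
                    leaders.append(sub_name)
    return sorted(leaders)


# Hypothetical status document, for illustration only.
sample = json.loads("""
{
  "applications": {
    "ubuntu": {
      "units": {
        "ubuntu/3": {
          "leader": true,
          "subordinates": {"canonical-livepatch/10": {}}
        },
        "ubuntu/4": {
          "subordinates": {"canonical-livepatch/11": {}}
        }
      }
    },
    "canonical-livepatch": {"subordinate-to": ["ubuntu"]}
  }
}
""")

print(find_leaders(sample, "ubuntu"))
print(find_leaders(sample, "canonical-livepatch"))
```

An empty list for an application corresponds to the "no unit with the asterisk" situation roadmr reports.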
roadmrcmars: no, we see no leader :(15:43
roadmrrick_h: 2020-01-07 15:28:36 INFO juju.worker.leadership tracker.go:217 canonical-livepatch leadership for canonical-livepatch/2 denied (no further explanation as to why)15:43
roadmrcmars: exactly what I said :)15:45
roadmron some units I see "2020-01-06 14:28:25 DEBUG juju.worker.leadership tracker.go:125 canonical-livepatch/1827 making initial claim for canonical-livepatch leadership" right before the above15:46
stickupkidhml, I've updated https://github.com/juju/juju/pull/1107215:48
hmlstickupkid: approved15:54
nammn_deachilleasa: quick cr? https://github.com/juju/charmrepo/pull/15816:02
achilleasanammn_de: looking16:02
nammn_derick_h: i have made the patch as small as possible. This now should mostly enhance bash completion ux https://github.com/juju/juju/pull/11032/16:35
rick_hnammn_de:  k, let's try it16:36
rick_hroadmr:  can you file a bug please?16:36
nammn_derick_h: thanks! something special needed to put that into edge besides just merging it? I did quite a lot of local testing and I don't see any reason why this should make it worse... :D16:41
rick_hnammn_de:  no, just land it in develop and it'll go into edge when our builds get back happy16:41
rick_hnammn_de:  and just keep an eye out for any issues16:41
achilleasanammn_de: can I get a quick CR on https://github.com/juju/charm/pull/301?16:54
nammn_derick_h:  will do!16:59
achilleasanammn_de: or stickupkid can someone take a look at https://github.com/juju/juju/pull/11080?17:21
nammn_deachilleasa: approved17:36
stickupkidhml, there is fallout from my PR around handling outputs correctly, i.e. we put the empty value in stderr and not stdout, can you review my PR https://github.com/juju/cmd/pull/7517:51
stickupkidachilleasa, or can you look into it (see above)17:52
achilleasastickupkid: looking17:53
achilleasastickupkid: done17:56
verterokrick_h: Hi! regarding roadmr leaderless issue, we have https://paste.ubuntu.com/p/Cp8tbkgHkv/ in the logs18:09
verterokcmars: ^18:09
verterokeven after adding a new unit18:09
verterokthat log is from a fresh unit18:10
rick_hverterok:  do you know if you're using legacy leases or raft leases?18:10
verterokrick_h: this is an IS-managed controller, the main one...let me check18:10
rick_hverterok:  I'm not seeing "leadership for" in the current code so getting a bug with juju version, where it's at (prodstack I thought moved off legacy leases but axino can correct me), and model version details would be good please18:11
rick_hverterok:  I'm a bit distracted atm but if we can get the details pulled together I can ask someone to investigate18:11
verterokrick_h: 2.6.10 in client and model18:12
rick_hverterok:  ok, good to know. on prodstack 4.5 you're saying?18:12
axinofeatures                 []18:13
axinono legacy leases !18:13
verterokaxino: thx18:13
rick_haxino:  ok cool ty for the confirmation there18:13
hmlstickupkid: back, reviewing now.19:22
hmlstickupkid: or not.  :-)19:22
verterokroadmr, cmars, rick_h: this looks pretty similar to what we are seeing: https://bugs.launchpad.net/juju/+bug/165627520:26
mupBug #1656275: unit leadership gets confused <leadership> <logging> <model-migration> <juju:Fix Released by thumper> <https://launchpad.net/bugs/1656275>20:26
verterokbut in a version that should have that fix...which points to a regression of some kind20:27
_thumper_verterok: I know that there was additional work later on that20:27
_thumper_but I would have thought that it would be in 2.6.1020:28
=== _thumper_ is now known as thumper
verterokthumper: right, we are seeing a similar symptom: leadership claim -> denied20:30
verterokonly difference I can guess is that in our case it is a subordinate...in case that makes any difference20:30
thumperverterok: I am wondering about whether this could be scale and timeout related...20:30
roadmr🦎  <- lots of scales here20:32
thumperI just see 🦎20:32
thumperthat is a rectangle...20:32
roadmrthumper: haha you copy-pasted the rectangle and I see exactly what I posted :)20:33
verterokthumper: the model itself is nothing crazy, 18 machines with same number of canonical-livepatch subordinates20:33
thumperverterok: but this is on the shared ps4.5 controller?20:33
verterokthumper: yup20:34
thumperthat has scale20:34
roadmronly two things to mention: this env has been up for a loong time and it sees a lot of churn (applications being created/removed very frequently)20:34
roadmrsince we add a new application on every new code version rollout20:34
roadmrthat tends to make juju unhappy20:34
verterokthumper: anything we could do to recover? is there any way to force a leader?20:34
verterokthis is blocking rollouts to staging, as we use mojo and it doesn't like to have applications without a leader :)20:35
thumperverterok: it looks like raft thinks there is probably a leader, but mongo doesn't20:35
thumperverterok: we need to work out who raft thinks is the leader20:36
thumperyou can probably do that with juju run over the app with 'is-leader'20:36
thumperthen shut down that unit for a minute to force expiration20:36
verterokthumper: all return false20:36
roadmrverterok: did you file the bug already?20:37
roadmrverterok: please please make the bug title "ANARCHY!!!!!!"20:37
verterokthumper: https://paste.ubuntu.com/p/RwbDQFyYf3/20:37
thumperbabbageclunk: thoughts ? ^^^20:37
verterokroadmr: didn't file one as I found 165627520:37
roadmrahh... bummer :(20:38
thumperverterok: please file a new one20:38
verterokthumper: will do20:38
thumperverterok: it is probably a different issue20:38
roadmrit is anarchy!!! hahah20:40
verterokroadmr: I can let you have the honors20:40
* roadmr won't pass up the chance to file a silly bug20:41
roadmrverterok: hm - we tried restarting juju agents for that application but not the machine-X agents, think it might help?20:46
verterokroadmr: I can restart all agents20:46
babbageclunkthumper: missed this - reading back20:47
roadmrhttps://bugs.launchpad.net/juju/+bug/1858693 has a summary20:48
mupBug #1858693: ANARCHY!!!!!!! Entirely leaderless application spotted in the wild <juju:New> <https://launchpad.net/bugs/1858693>20:48
verterokroadmr: all machine agents restarted, no changes20:53
babbageclunkverterok: if you deploy a new application, does the new unit become the leader?20:54
verterokbabbageclunk: a new application or add-unit?20:55
babbageclunkverterok: trying to determine whether all of leadership is broken, or whether there's some kind of pinning happening for that application specifically20:56
babbageclunknew application (can be an existing charm but with a new name)20:56
verterokbabbageclunk: leadership for other applications seems to be fine20:56
babbageclunkdo you mean, you can see leaders for other applications? Or you've seen leadership change for other applications when the problem has been happening?20:57
babbageclunkbecause the former might not indicate it's fine20:57
babbageclunkit might just be stuck in a non-visible way20:57
verterokbabbageclunk: let me try deploying a trivial ubuntu unit, would that be enough?20:57
verterokrunning: juju deploy cs:ubuntu -n 220:58
verterokbabbageclunk: I get the leadership * as expected in juju status21:01
verterokbabbageclunk: and juju run --application ubuntu is-leader returns True and False as expected21:01
babbageclunkverterok: ok - so it sounds like the canonical-livepatch application has leadership pinned?21:02
babbageclunkWas there an upgrade-series done at some point?21:02
verterokbabbageclunk: no upgrade-series AFAIK, the env is old but always in xenial21:03
babbageclunkverterok: oh, another useful thing might be to remove the leader unit of the new ubuntu and make sure that the other one eventually claims leadership21:04
babbageclunkhmm, that's the only thing that would do a pin that I know of21:04
verterokbabbageclunk: we do remove units and deploy new ones on each new code revision21:04
verterokbabbageclunk: which also means killing and spawning new subordinates (which includes canonical-livepatch)21:05
roadmrverterok: 18 units in all, you said? each add/remove has a 20% or so chance of removing the current leader21:05
babbageclunkright - which would explain why the leader went away? But not why there's not a new one21:05
roadmr(we add 4 units, then remove the 4 old ones)21:05
verterokroadmr: right, 1821:05
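A quick check of roadmr's "20% or so" estimate, as a sketch. It assumes the current leader is equally likely to be any of the 18 existing units and that 4 of them are removed in one rollout; the function name is made up for illustration.

```python
from fractions import Fraction


def leader_removal_chance(total_units, removed_units):
    """Chance that the leader is among the removed units, assuming the
    leader is equally likely to be any one of the existing units."""
    return Fraction(removed_units, total_units)


chance = leader_removal_chance(18, 4)
print(chance)                  # 2/9
print(f"{float(chance):.1%}")  # 22.2%
```

So under those assumptions roughly one rollout in five removes the current leader, which matches the estimate.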
babbageclunkIt might be that the raft time isn't being updated, so it's not expiring the old one.21:06
roadmrwell if elections happened every 3 weeks eventually you'd also want there to not be a leader, right?21:06
roadmr(j/k not helping haha)21:06
babbageclunkif you kill the leader of the new ubuntu, does leadership switch to the other unit?21:07
babbageclunkverterok: ^21:07
verterokgimme 2'...because I need to deploy them again (killed them too soon :P)21:08
babbageclunkha doh21:08
roadmrbabbageclunk: where can I check the controller logs? (I assume it's you who asked for those)21:08
babbageclunkyeah, that's me21:08
verterokroadmr: IS can21:08
roadmrah ok21:08
verterokbabbageclunk: how long could it take to elect a new leader?21:09
verterokok, after a brief moment of panic, it worked21:10
babbageclunkshould happen in a minute or less21:10
verterokbabbageclunk: /4 was leader, and after remove-unit the * is now in /321:11
babbageclunkit's a pain but I'm kind of tempted to suggest you remove and re-add canonical-livepatch.21:12
verterokbabbageclunk: that was our plan if we needed to unblock deploys21:13
roadmris that a non-suggestion suggestion? :)21:13
babbageclunkI mean, it's the tactical nuclear fallback suggestion ;)21:14
roadmrif there's nothing else to be gleaned from the current live environment, we might as well21:14
roadmrwe need to wait for controller logs anyway,if that's where more clues might be21:14
babbageclunkWell, I could definitely do with the logs21:15
babbageclunkas long as you're ok with waiting21:15
roadmrfiling the RT21:17
babbageclunkbut I'm not sure how it could get into this state without the application being pinned.21:18
roadmrbabbageclunk: do you have a sec to drop by #is in case Alexandre needs more specifics on what to dig for in the logs?21:21
roadmr(I can triangulate if needed but might be more efficient if you're there)21:21
verterokthanks for the help babbageclunk21:29
* verterok EODs21:29
kelvinliuhi cory_fu: the current 2.7 edge `2.7.1+2.7-ec91b32` should be working fine for CaaS cmr now.22:10
cory_fukelvinliu: Great, thanks22:10
cory_fuI'll give it a try22:10
kelvinliucory_fu: ty,22:10
wallyworldbabbageclunk: i just saw the bug - if you remove a unit that is leader, juju does not immediately elect a new leader, the current lease needs to time out. this is a known issue / works as designed.  there are bugs like bug 1469731 raised back in 1.23 days22:19
mupBug #1469731: Leadership must be dropped before removing lead unit <canonical-is> <charm> <charmers> <leadership> <teardown> <juju:Triaged> <postgresql (Juju Charms Collection):New> <https://launchpad.net/bugs/1469731>22:19
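A toy model of the behaviour wallyworld describes, as a sketch only: a claim by another unit is denied until the current holder's lease times out. This is not juju's lease code; the class, its method and the 60 second duration are all hypothetical.

```python
class LeaseTable:
    """Toy lease store: one leadership lease per application."""

    def __init__(self, duration):
        self.duration = duration  # lease length in seconds (illustrative)
        self.leases = {}          # application -> (holder, expiry time)

    def claim(self, application, unit, now):
        """Try to claim or renew leadership for `unit` at time `now`."""
        holder, expiry = self.leases.get(application, (None, 0.0))
        # Another unit still holds an unexpired lease: the claim is denied,
        # even if that unit has already been removed from the model.
        if holder is not None and holder != unit and now < expiry:
            return False
        self.leases[application] = (unit, now + self.duration)
        return True


table = LeaseTable(duration=60)
print(table.claim("ubuntu", "ubuntu/4", now=0))   # True: first claim wins
print(table.claim("ubuntu", "ubuntu/3", now=30))  # False: lease still held
print(table.claim("ubuntu", "ubuntu/3", now=61))  # True: old lease expired
```

In this model a removed leader blocks new claims only until its lease expires, which is why hours without a leader points at something else.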
babbageclunkwallyworld: yeah, but in this case it's been ages22:20
wallyworldah, ok, longer than the lease timeout22:20
roadmr👴  ages :)22:22
babbageclunkwallyworld: yup as in hours22:22
babbageclunkwallyworld: but other applications seem to be ok - leadership changes fine22:22
wallyworldi guess we'll need the raft logs, engine reports etc22:23
babbageclunkyeah, although none of that really helps if leases seem to be changing ok for other applications22:26

