/srv/irclogs.ubuntu.com/2022/05/25/#ubuntu-server.txt

tewardbryceh: around?00:12
tewardoh nvm i can't read00:13
tewardcarry on00:13
=== xispita is now known as Guest5697
=== xispita_ is now known as xispita
cpaelzergood morning05:12
jamespagecoreycb: hey - would you have time to complete the submitter information for https://bugs.launchpad.net/ubuntu/+source/jaraco.context/+bug/197560006:22
ubottuLaunchpad bug 1975600 in jaraco.context (Ubuntu) "[MIR] jaraco.context" [Undecided, New]06:22
jamespagethen I can complete the MIR team review for you06:22
=== y0sh- is now known as y0sh_
=== xispita is now known as Guest7996
=== xispita_ is now known as xispita
=== thegodsq- is now known as thegodsquirrel
lvoytekGood morning12:39
ahasenackkanashiro: I started this discussion: https://lists.clusterlabs.org/pipermail/users/2022-May/030296.html13:03
ahasenackthere is a node remove command that works, but I'm kind of leaning towards a full cluster removal when making changes. Depending on what you change, you may get away with it, or you will get phantom data13:04
ahasenack`pcs cluster destroy` does a lot of things, it goes over /var/lib/pcsd, /var/lib/pacemaker and removes many files13:05
kanashiroahasenack, maybe the charms would be better using the high-level cluster management tools like pcs and crmsh instead of doing all of this manually (?)14:20
ahasenackmaybe14:20
ahasenackbut a bigger change14:20
ahasenackI think the key thing is changing nodeid, not just the name14:20
ahasenackif you keep the nodeid the same, and then change the name, all is fine (testing that now)14:20
kanashiroright, it makes sense, but from one of the answers in the thread I think if you restart first corosync and then pacemaker all should be fine14:22
kanashirodid you test that?14:22
ahasenackthe phantom node is all about pacemaker, yes14:22
ahasenackI've been doing "systemctl restart pacemaker corosync", unsure if the order in that command line affects things14:23
ahasenackbut after the package is installed, both are running, nothing that can be done about that (easily, other than policy-rc.d)14:23
ahasenackso the "contamination" with node1 happens right after install14:24
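Editor's sketch of the restart order raised above (corosync first, then pacemaker), issued as two separate `systemctl restart` calls instead of one combined command line. The log does not confirm that the order matters, so this is an assumption. The helper names are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess

# Order suggested in the discussion: corosync first, then pacemaker.
# Whether the order matters is NOT confirmed in the log.
RESTART_ORDER = ["corosync", "pacemaker"]


def restart_commands(units=RESTART_ORDER):
    """Build one 'systemctl restart' argv per unit, preserving the given order."""
    return [["systemctl", "restart", unit] for unit in units]


def restart_in_order(dry_run=True):
    """Print (dry run) or execute the restarts one at a time, in order."""
    for argv in restart_commands():
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence if a restart fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    restart_in_order(dry_run=True)
```

Running with `dry_run=False` needs root and a host where both services are installed.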
kanashiroso a possible minimum change to fix this would be to create a dependency between the pacemaker and corosync systemd services(?)14:24
ahasenackI have vague recollections of nish doing that in the past, and suffering a lot14:24
ahasenackit involved creating a file in one maintainerscript and checking for that file in another maintainer script14:25
ahasenackinter-package RPC :)14:25
kanashiroif we think this is too much we can at least document this in the server guide, so once we see this happening we can point users to it14:26
ahasenackeven in the case where you keep the nodeid the same, and crm status is clean, the "node1" node is still referenced in old cib files14:26
ahasenackwhich seems right, if I understand it correctly14:26
ahasenackwhat I don't get yet is, let's say I deploy 3 nodes14:27
ahasenackall 3 get node1, nodeid=1 (default pkg install)14:27
ahasenackthen in node1 I change name to be hostname, keep nodeid=1, adjust ring0_addr14:27
ahasenackand add the other 2 nodes to the config, with ids 2 and 314:27
ahasenackand send the config to them via scp, and restart everything14:27
ahasenackI don't get why changing nodeid from 1 to 2 and 3 in the *other* nodes doesn't introduce the same problem14:28
ahasenackmaybe because nodeid 1 is still around, it just has another name, and is no longer myself14:28
ahasenackI go from node1/id1, node1/id1, node1/id1 to f1/id1, f2/id2, f3/id314:29
ahasenack(fN being the new names)14:29
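Editor's sketch of the target layout described above (f1/id1, f2/id2, f3/id3), rendered as a corosync.conf `nodelist` section. The `ring0_addr` values are hypothetical, since the log never states the real addresses, and the helper name `render_nodelist` is illustrative only.

```python
# Hypothetical addresses: the log does not give the real ring0_addr values.
NODES = [
    ("f1", 1, "10.0.0.1"),
    ("f2", 2, "10.0.0.2"),
    ("f3", 3, "10.0.0.3"),
]


def render_nodelist(nodes):
    """Render a corosync.conf nodelist section from (name, nodeid, addr) tuples."""
    lines = ["nodelist {"]
    for name, nodeid, addr in nodes:
        lines += [
            "    node {",
            f"        name: {name}",
            f"        nodeid: {nodeid}",
            f"        ring0_addr: {addr}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_nodelist(NODES))
```

The same rendered section would be copied to all three nodes, which matches the "send the config to them via scp" step.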
kanashiroI *think* that in this case the cluster has quorum and they vote to make sure that node does not exist. In a single-node cluster I am not sure when to consider it quorate14:30
ahasenackif I change node1/id1 to f1/id101, then node1 is still in the list, but offline, even with the 3 nodes14:30
ahasenackf1/id101 does not replace node1/id114:30
ahasenackand in reality, id1 really disappeared from the cluster in that case, no other node assumed id114:31
ahasenackhence it shows offline14:31
ahasenackby "disappeared" I mean there is no host anymore responding to pings on id114:31
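Editor's sketch: a toy model of the observation above, not pacemaker's actual membership logic. It assumes earlier state remembers nodes by nodeid, so an id that no configured node owns any more is left behind as an offline "phantom", while an id that stays in use merely gets a new name. Function names are hypothetical.

```python
def phantom_ids(known, configured):
    """Node ids remembered from earlier state that no configured node owns now."""
    return sorted(set(known) - set(configured))


def renamed_ids(known, configured):
    """Node ids that are still in use but now carry a different name."""
    return {
        nodeid: (known[nodeid], configured[nodeid])
        for nodeid in known
        if nodeid in configured and known[nodeid] != configured[nodeid]
    }


if __name__ == "__main__":
    known = {1: "node1"}  # state right after the default package install

    # Case 1: f1 keeps nodeid 1, so node1 is renamed and nothing is orphaned.
    keep_id = {1: "f1", 2: "f2", 3: "f3"}
    print(phantom_ids(known, keep_id), renamed_ids(known, keep_id))

    # Case 2: f1 moves to nodeid 101, so id 1 has no owner and shows as offline.
    new_id = {101: "f1", 2: "f2", 3: "f3"}
    print(phantom_ids(known, new_id), renamed_ids(known, new_id))
```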
ahasenackok, I may be starting to get this14:31
ahasenackthe charm does change the node ids too14:31
ahasenackfrom 1 to 1001 or something like that14:32
ahasenack2 to 1002, and so on14:32
ahasenackthe approach they took to fix it might be the simplest one after all. Pre-seed a config file14:32
ahasenackit's like one of the responses in the thread, don't start pacemaker until the config is final14:32
ahasenackachieves the same14:32
kanashiroI think that's the main takeaway here: do not start pacemaker until everything in corosync is set14:35
ahasenackeach cib-N.raw file in /var/lib/pacemaker/cib/ is like a state, right. I can diff between them to see what changed14:45
ahasenackthere is probably a corosync/pacemaker (or crmsh/pcs?) command to show that, I've seen some "diff" commands in some help output14:45
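Editor's sketch: a plain-text comparison of two CIB snapshots using Python's standard `difflib`, assuming the `cib-N.raw` files are readable XML text. This is a generic text diff and not a pacemaker tool. Pacemaker also ships a `crm_diff` utility, which may be what the help output referred to. The example file numbers are hypothetical.

```python
import difflib
from pathlib import Path


def diff_cib(old_path, new_path):
    """Return a unified diff between two CIB snapshot files as one string."""
    old = Path(old_path).read_text().splitlines(keepends=True)
    new = Path(new_path).read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=str(old_path), tofile=str(new_path))
    )


if __name__ == "__main__":
    # Hypothetical snapshot numbers; reading them usually needs root.
    cib_dir = Path("/var/lib/pacemaker/cib")
    first, second = cib_dir / "cib-1.raw", cib_dir / "cib-2.raw"
    if first.exists() and second.exists():
        print(diff_cib(first, second))
```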
ahasenackmessing with these attributes in a live cluster is dangerous14:55
ahasenackMay 25 14:54:59 f3 pacemaker-controld[6239]:  warning: Node 'node1' and 'f1' share the same cluster nodeid: 1 f114:56
ahasenackMay 25 14:54:59 f3 pacemaker-controld[6239]:  error: crm_find_peer: Forked child 6391 to record non-fatal assert at membership.c:590 : member weirdness14:56
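Editor's sketch: a small parser that pulls the two node names and the shared nodeid out of the pacemaker-controld warning quoted above, which could help when scanning a journal for this condition. The regular expression is derived only from that one log line, so other pacemaker versions may word the message differently. The function name is hypothetical.

```python
import re

# Pattern derived from the single warning line quoted in the log.
CLASH_RE = re.compile(
    r"Node '(?P<first>[^']+)' and '(?P<second>[^']+)' "
    r"share the same cluster nodeid: (?P<nodeid>\d+)"
)


def find_nodeid_clash(line):
    """Return (first_name, second_name, nodeid) or None if the line does not match."""
    match = CLASH_RE.search(line)
    if match is None:
        return None
    return match.group("first"), match.group("second"), int(match.group("nodeid"))


if __name__ == "__main__":
    sample = (
        "May 25 14:54:59 f3 pacemaker-controld[6239]:  warning: "
        "Node 'node1' and 'f1' share the same cluster nodeid: 1 f1"
    )
    print(find_nodeid_clash(sample))
```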
sergiodjkanashiro: hey, is https://bugs.launchpad.net/ubuntu/+source/openvpn/+bug/1975574 the bug you mentioned you were going to take a look at during our housekeeping call today?19:43
ubottuLaunchpad bug 1975574 in openvpn (Ubuntu Kinetic) "OpenSSL 3.0 support in OpenVPN 2.5" [High, Confirmed]19:43
ahasenacksounds like it19:43
sergiodjI will mark it as server-todo and bump its priority to high, just in case19:44
sergiodjah, sorry19:44
sergiodjLucas already did that, but I had opened the URL before his update19:44
sergiodjkanashiro: nevermind :)19:44
kanashiro:)19:57
giu--hi to all21:55
