teward | bryceh: around? | 00:12 |
---|---|---|
teward | oh nvm i can't read | 00:13 |
teward | carry on | 00:13 |
=== xispita is now known as Guest5697 | ||
=== xispita_ is now known as xispita | ||
cpaelzer | good morning | 05:12 |
jamespage | coreycb: hey - would you have time to complete the submitter information for https://bugs.launchpad.net/ubuntu/+source/jaraco.context/+bug/1975600 | 06:22 |
ubottu | Launchpad bug 1975600 in jaraco.context (Ubuntu) "[MIR] jaraco.context" [Undecided, New] | 06:22 |
jamespage | then I can complete the MIR team review for you | 06:22 |
=== y0sh- is now known as y0sh_ | ||
=== xispita is now known as Guest7996 | ||
=== xispita_ is now known as xispita | ||
=== thegodsq- is now known as thegodsquirrel | ||
lvoytek | Good morning | 12:39 |
ahasenack | kanashiro: I started this discussion: https://lists.clusterlabs.org/pipermail/users/2022-May/030296.html | 13:03 |
ahasenack | there is a node remove command that works, but I'm kind of leaning towards a full cluster removal when making changes. Depending on what you change, you may get away with it, or you will get phantom data | 13:04 |
ahasenack | `pcs cluster destroy` does a lot of things, it goes over /var/lib/pcsd, /var/lib/pacemaker and removes many files | 13:05 |
kanashiro | ahasenack, maybe the charms would be better using the high-level cluster management tools like pcs and crmsh instead of doing all of this manually (?) | 14:20 |
ahasenack | maybe | 14:20 |
ahasenack | but a bigger change | 14:20 |
ahasenack | I think the key thing is changing nodeid, not just the name | 14:20 |
ahasenack | if you keep the nodeid the same, and then change the name, all is fine (testing that now) | 14:20 |
kanashiro | right, it makes sense, but from one of the answers in the thread I think if you restart first corosync and then pacemaker all should be fine | 14:22 |
kanashiro | did you test that? | 14:22 |
ahasenack | the phantom node is all about pacemaker, yes | 14:22 |
ahasenack | I've been doing "systemctl restart pacemaker corosync", unsure if the order in that command line affects things | 14:23 |
ahasenack | but after the package is installed, both are running, nothing that can be done about that (easily, other than policy-rc.d) | 14:23 |
ahasenack | so the "contamination" with node1 happens right after install | 14:24 |
kanashiro | so a possible minimum change to fix this would be to create a dependency between the pacemaker and corosync systemd services(?) | 14:24 |
ahasenack | I have vague recollections of nish doing that in the past, and suffering a lot | 14:24 |
ahasenack | it involved creating a file in one maintainerscript and checking for that file in another maintainer script | 14:25 |
ahasenack | inter-package RPC :) | 14:25 |
kanashiro | if we think this is too much we can at least document this in the server guide, so once we see this happening we can point users to it | 14:26 |
ahasenack | even in the case where you keep the nodeid the same, and crm status is clean, the "node1" node is still referenced in old cib files | 14:26 |
ahasenack | which seems right, if I understand it correctly | 14:26 |
ahasenack | what I don't get yet is, let's say I deploy 3 nodes | 14:27 |
ahasenack | all 3 get node1, nodeid=1 (default pkg install) | 14:27 |
ahasenack | then in node1 I change name to be hostname, keep nodeid=1, adjust ring0_addr | 14:27 |
ahasenack | and add the other 2 nodes to the config, with ids 2 and 3 | 14:27 |
ahasenack | and send the config to them via scp, and restart everything | 14:27 |
ahasenack | I don't get why changing nodeid from 1 to 2 and 3 in the *other* nodes doesn't introduce the same problem | 14:28 |
ahasenack | maybe because nodeid 1 is still around, it just has another name, and is no longer myself | 14:28 |
ahasenack | I go from node1/id1, node1/id1, node1/id1 to f1/id1, f2/id2, f3/id3 | 14:29 |
ahasenack | (fN being the new names) | 14:29 |
kanashiro | I *think* that in this case the cluster has quorum and they vote to make sure that node does not exist. In a single-node cluster I am not sure when to consider it quorate | 14:30 |
ahasenack | if I change node1/id1 to f1/id101, then node1 is still in the list, but offline, even with the 3 nodes | 14:30 |
ahasenack | f1/id101 does not replace node1/id1 | 14:30 |
ahasenack | and in reality, id1 really disappeared from the cluster in that case, no other node assumed id1 | 14:31 |
ahasenack | hence it shows offline | 14:31 |
ahasenack | by "disappeared" I mean there is no host anymore responding to pings on id1 | 14:31 |
ahasenack | ok, I may be starting to get this | 14:31 |
ahasenack | the charm does change the node ids too | 14:31 |
ahasenack | from 1 to 1001 or something like that | 14:32 |
ahasenack | 2 to 1002, and so on | 14:32 |
ahasenack | the approach they took to fix it might be the simplest one after all. Pre-seed a config file | 14:32 |
ahasenack | it's like one of the responses in the thread, don't start pacemaker until the config is final | 14:32 |
ahasenack | achieves the same | 14:32 |
kanashiro | I think that's the main takeaway here: do not start pacemaker until everything in corosync is set | 14:35 |
ahasenack | each cib-N.raw file in /var/lib/pacemaker/cib/ is like a state, right. I can diff between them to see what changed | 14:45 |
ahasenack | there is probably a corosync/pacemaker (or crmsh/pcs?) command to show that, I've seen some "diff" commands in some help output | 14:45 |
ahasenack | messing with these attributes in a live cluster is dangerous | 14:55 |
ahasenack | May 25 14:54:59 f3 pacemaker-controld[6239]: warning: Node 'node1' and 'f1' share the same cluster nodeid: 1 f1 | 14:56 |
ahasenack | May 25 14:54:59 f3 pacemaker-controld[6239]: error: crm_find_peer: Forked child 6391 to record non-fatal assert at membership.c:590 : member weirdness | 14:56 |
sergiodj | kanashiro: hey, is https://bugs.launchpad.net/ubuntu/+source/openvpn/+bug/1975574 the bug you mentioned you were going to take a look at during our housekeeping call today? | 19:43 |
ubottu | Launchpad bug 1975574 in openvpn (Ubuntu Kinetic) "OpenSSL 3.0 support in OpenVPN 2.5" [High, Confirmed] | 19:43 |
ahasenack | sounds like it | 19:43 |
sergiodj | I will mark it as server-todo and bump its priority to high, just in case | 19:44 |
sergiodj | ah, sorry | 19:44 |
sergiodj | Lucas already did that, but I had opened the URL before his update | 19:44 |
sergiodj | kanashiro: nevermind :) | 19:44 |
kanashiro | :) | 19:57 |
giu-- | hi to all | 21:55 |
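
Note on the 14:45 remark about diffing the cib-N.raw files in /var/lib/pacemaker/cib/: a minimal sketch of that idea, assuming the snapshots are plain XML text and that a line-based diff is good enough to spot a changed node name or nodeid. The helper names `diff_cib` and `diff_cib_files` are hypothetical, not part of pacemaker, and reading that directory will likely need elevated privileges. Pacemaker also ships a `crm_diff` tool, which may be the "diff" command seen in the help output, but this sketch does not depend on it.

```python
import difflib
import sys
from pathlib import Path


def diff_cib(old_text: str, new_text: str,
             old_name: str = "old", new_name: str = "new") -> list[str]:
    """Return unified-diff lines between two CIB snapshots given as text.

    This is a plain line-based text diff; it does not understand XML,
    so reordered attributes would show up as changes.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


def diff_cib_files(old_path: str, new_path: str) -> list[str]:
    """Diff two snapshot files on disk (hypothetical helper)."""
    old = Path(old_path)
    new = Path(new_path)
    return diff_cib(old.read_text(), new.read_text(), old.name, new.name)


if __name__ == "__main__":
    # Usage: script.py <older cib-N.raw> <newer cib-N.raw>
    if len(sys.argv) == 3:
        print("\n".join(diff_cib_files(sys.argv[1], sys.argv[2])))
```

Run against two consecutive snapshots, an empty result means the two states are textually identical; any `-`/`+` pair shows what changed between them.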