[02:34] wallyworld: https://github.com/juju/systems/pull/1 please take a look
[02:34] sure
[02:40] wallyworld: thanks, if we are happy with how this works I can move forward
[02:58] hpidcock: did you copy the channel code from snapd?
[03:01] wallyworld: mostly, excluding architecture stuff
[03:07] hpidcock: +1 but with fixes
[03:07] architecture stuff included
[03:07] plus system string parsing
[03:28] wallyworld: thanks for the thorough review
[03:28] np
[04:00] wallyworld or kelvinliu: please review this PR https://github.com/juju/charm/pull/321
[04:00] otp with tom, soon
[04:01] wallyworld: no rush
[04:03] looking
[04:25] hpidcock: lgtm ty
[04:26] kelvinliu: much appreciated. wallyworld can you take a quick look too
[04:33] ok
[04:37] hpidcock: since we've gone to v8, we can drop dependencies.tsv IMO
[04:37] sounds good
[04:38] wallyworld: doneski
[04:49] hpidcock: mainly missing tests
[04:49] since it is a library, we should be careful to have coverage
=== salmankhan1 is now known as salmankhan
[19:31] with the mysql-innodb-cluster is there a way to get logged in to the mysql instance?
[19:31] with the regular mysql charm you could do something like mysql -u root -p`sudo cat /var/lib/mysql/mysql.passwd`
[20:35] also, is it expected that if you remove a mysql-innodb-cluster unit you will get "MySQL InnoDB Cluster not healthy: None"
[20:35] when you add a new unit but it receives the same ip address as the unit that was previously removed
[20:36] it is also known that if you accidentally juju add-unit mysql-innodb-cluster --to 1
[20:37] and then do a juju remove-unit mysql-innodb-cluster/1
[20:37] it will not uninstall mysql or remove that node from the mysql cluster
[20:38] I guess I'm wondering if these are known/expected behaviors or should be filed as bugs against the charm
[21:28] tychicus: those do sound like bugs. Please file them!
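(A minimal sketch of the remove/re-add sequence described above, assuming a MAAS-backed model; unit and machine numbers are illustrative:)

    # remove an existing cluster unit
    juju remove-unit mysql-innodb-cluster/1
    # add a replacement unit; in the case reported above MAAS handed the new
    # machine the same IP address as the removed unit, and the application then
    # sat at "MySQL InnoDB Cluster not healthy: None"
    juju add-unit mysql-innodb-cluster
    # the accidental placement also mentioned above, which later needed manual
    # cleanup of mysql on that machine after the unit was removed
    juju add-unit mysql-innodb-cluster --to 1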
[21:29] petevg[m]: thanks, will do
[21:42] tychicus, the innodb charm does expect 3 units, so the 'not healthy' msg seems expected
[21:43] it happens when 1 unit has been removed and you attempt to add a new unit back
[21:44] if you deploy to, say, an lxd container and that container receives an ip address that was previously a member of the cluster, it will never join the cluster
[21:44] in this case MaaS is handing out the IP addresses
[21:45] if you add another unit, the next unit with an IP address that has never been a member of the cluster will join just fine
[21:45] i'm curious as to why you removed a unit to begin with
[21:46] also, there are some actions for removing/adding cluster members
[21:46] https://opendev.org/openstack/charm-mysql-innodb-cluster/src/branch/master/src/actions.yaml
[21:46] all of the instances were taken down
[21:47] then came back up, but did not seem to recluster once they were all back online
[21:49] so I ran juju run-action mysql-innodb-cluster/leader reboot-cluster-from-complete-outage
[21:49] this brought up 2 of the 3 nodes
[21:50] I attempted to do a rejoin-instance address=10.x.x.x
[21:52] GTID set check of the MySQL instance at '10.x.x.x:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster
[21:52] so then I did a remove-instance address=10.x.x.x
[21:53] which told me that force was null and expected a bool
[21:53] so I retried with force
[21:54] then attempted both rejoin-instance and add-instance
[21:54] but neither of those worked
[21:54] so I figured that I would remove the unit and add a new one
[21:56] but I accidentally added it as a container on a host that already had a mysql-innodb-cluster container running; not wanting 2 of my 3 mysql instances on the same physical host, I removed that unit
[21:56] tychicus: Hi, trying to follow all that went down here. When a new node has an old IP address it is a bit complicated.
[21:57] The cluster-status action is your friend here. That will tell you what the cluster *thinks* is in its metadata.
[21:57] yes,
[21:57] In normal operations you would remove a unit; run the remove-instance action with the old address; then add a new unit
[21:58] yes, that is what I did
[21:58] but when I added a new unit, MaaS gave me the same IP address as the old unit
[21:59] What I would have been interested in is what cluster status showed before you added the new unit, as well as what it says now
[22:00] i'll see if I can get that into a pastebin for you
[22:00] do you just want cluster status, or also cluster rescan?
[22:01] cluster status, with some idea about which IP addresses are the nodes that have been in the cluster the whole time
[22:03] To answer your original question, you can get the passwords to access the local mysql and the cluster (clusteruser) from leader-get: `juju run --unit mysql-innodb-cluster/leader leader-get`
[22:04] oh that is fantastic
[22:05] I'll include some of that output as well since it will have more IP address "history"
[22:05] Just make sure to sanitize the output for public consumption
[22:11] https://pastebin.ubuntu.com/p/tkFyCDDfvk/
[22:11] Ok, give me a sec to try and understand this.
[22:12] :)
[22:12] thanks
[22:20] tychicus: OK, I think we can handle this. But I do want to make sure you have taken a backup. Anytime we are hitting the remove-instance button there is the possibility of human error.
[22:20] You have a backup, right? :) And please double and triple check what I type.
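(A sketch of the credential and status commands mentioned above; the mysql login line is an assumption based on the clusteruser password being present in leader-get, not something shown in the log:)

    # leader settings hold the generated passwords (local root and clusteruser)
    juju run --unit mysql-innodb-cluster/leader leader-get
    # show what the cluster metadata currently thinks its members are
    juju run-action --wait mysql-innodb-cluster/leader cluster-status
    # assumed login once the clusteruser password has been retrieved
    mysql -u clusteruser -p -h <unit-ip>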
[22:20] First we will remove the 33.1 node. It is actually good that it is still running: `juju run-action --wait mysql-innodb-cluster/leader remove-instance address=10.100.33.1`
[22:20] In theory we don't need `force` because the node is currently running.
[22:20] Once that is complete, do another cluster-status to confirm 33.1 is NOT in the cluster; at that point you can stop mysql on the physical host and manually uninstall.
[22:23] I do have a backup, but the last time I tried to restore from backup I ran into some other GTID issue, but that's another story
[22:23] https://pastebin.ubuntu.com/p/Sgx2Y3XVd4/
[22:24] OK, the required force may be another bug we can file. Let's try with force=True
[22:24] should I try first with force=False?
[22:25] Oh, good point. Yes. Let's try that first
[22:25] so it says outcome success, status completed
[22:26] so now I have 3 nodes in cluster status
[22:26] and no 33.1 node?
[22:26] correct
[22:27] You are safe to stop mysql on the physical node and use apt to remove the mysql packages, and lastly you can remove the /var/lib/mysql dir if you want.
[22:27] Two potential bugs we can file here. I agree "MySQL InnoDB Cluster not healthy: None" isn't helpful; it should display the statusText: "Cluster is ONLINE and can tolerate up to ONE failure." The other is why the original reboot-cluster-from-complete-outage failed, which may be harder to ascertain at this point.
[22:28] ok, I have the answer to why it failed
[22:28] GTID conflict
[22:29] for some reason the unit that had previously been the RW was no longer the RW
[22:30] I don't think that I have those logs any more, but I do have the output of when I attempted to get it to rejoin the cluster
[22:30] Ok, I had thought (naively) that mysql was smart enough to use the node with the newest data. But that is more to do with mysql itself than our charms. Still, it might be worth documenting what happened so we can keep an eye on it.
[22:30] OK
[22:31] https://pastebin.ubuntu.com/p/jn92h2xQgk/
[22:32] Thanks
[22:33] I may be wrong here, but I don't think that the add-instance action supports recoverymethod=clone
[22:34] one thought that I did have was to dump the database from the new RW and restore to the instance that would not rejoin
[22:34] but alas I didn't try that
[22:36] I'll see if we can use the recoverymethod=clone option in add-instance or even rejoin-instance. I'll have to dig into the mysql docs.
[22:44] thanks again for your help
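(For reference, the recovery sequence discussed above, condensed; addresses are illustrative, and force should only be added if the action refuses to run without it, as happened here:)

    # recover the cluster after every unit went down at once
    juju run-action --wait mysql-innodb-cluster/leader reboot-cluster-from-complete-outage
    # try to bring the remaining member back in (failed here with a GTID conflict)
    juju run-action --wait mysql-innodb-cluster/leader rejoin-instance address=10.x.x.x
    # drop the stale member from the cluster metadata, then re-check membership
    # before stopping and uninstalling mysql on the old machine
    juju run-action --wait mysql-innodb-cluster/leader remove-instance address=10.100.33.1
    juju run-action --wait mysql-innodb-cluster/leader cluster-status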