[00:22] hpidcock: a small PR for the security review, no rush https://github.com/juju/juju/pull/11981
[00:23] wallyworld: ok looking now
[00:23] no rush == rush
[00:23] fake news
[00:36] wallyworld: added a comment, but overall looks ok
[00:36] ty
[06:18] kelvinliu: not sure if you'll have any time, here's an azure PR for initial spaces support https://github.com/juju/juju/pull/11983
[06:19] looking
[06:21] kelvinliu: fyi, the core logic to use spaces info to get subnets is copied from the openstack provider. i'll get joe to look also next week when he's back. in the meantime we can test on site
[06:22] ok
[06:34] wallyworld: lgtm, ty
[06:34] awesome ty
[16:17] What is the best way to troubleshoot "Incomplete relation: vault"?
[16:17] The OSD log shows: DEBUG juju-log secrets-storage:78: Deferring OSD preparation as vault not ready
[16:18] I've tried removing and re-adding the relation
[16:18] juju remove-relation ceph-osd:secrets-storage vault:secrets
[16:18] everything else can communicate with vault correctly
[16:20] vault status shows: Unit is ready (active: true, mlock: disabled)
[16:27] tychicus: Hi. Can we first try the refresh-secrets action? `juju run-action --wait vault/0 refresh-secrets`
[16:28] done, that triggered "Deferring OSD preparation as vault not ready" on the OSDs
[16:30] Ok, let me check what the code is looking for. That might give us a hint. One sec
[16:35] I believe it is checking vault_kv here https://opendev.org/openstack/charm-ceph-osd/src/branch/master/hooks/ceph_hooks.py#L494
[16:36] Right, which is checking the relation data for secrets-storage here: https://github.com/juju/charm-helpers/blob/master/charmhelpers/contrib/openstack/vaultlocker.py#L39
[16:39] So our next step is to see what is set on the relation and whether anything is missing.
[16:39] `juju run --unit ceph-osd/0 -- "relation-ids secrets-storage"`
[16:39] `juju run --unit ceph-osd/0 -- "relation-get -r secrets-storage: - vault/0"`
[16:39] Note: you may need to change the unit numbers above, and please sanitize the output to your satisfaction before pasting.
[16:41] ceph-osd/0_role_id: '"e8e9709d-3686-cd80-a3b7-f3fec2517a8e"'
[16:41] ceph-osd/0_token: '"s.6oyksjfe9pE55nQEODREBrIx"'
[16:41] ceph-osd/1_role_id: '"9804cb6a-1cb6-99a5-f7e1-ac19f5fe0273"'
[16:41] ceph-osd/1_token: '"s.COcndyxMYZ0b8o7XZKeTYQzG"'
[16:41] ceph-osd/2_role_id: '"8ad4d5c7-2e92-2a2a-1aef-f298187b162b"'
[16:41] ceph-osd/2_token: '"s.lyIyWDVnkh72WnmhUPA8gB8k"'
[16:41] ceph-osd/3_role_id: '"9c641dc6-0da9-3d20-efc7-647ed05796d5"'
[16:41] ceph-osd/3_token: '"s.sOyYRReEL6CgSKXDnWzwr0gn"'
[16:41] ceph-osd/4_role_id: '"a79dacf5-d21c-882d-8362-ac280bd7e42d"'
[16:41] ceph-osd/4_token: '"s.aZzNwScAezAeBsXMrcvE6gM3"'
[16:41] ceph-osd/5_role_id: '"d1596430-0a95-d317-464d-c7708560ca84"'
[16:41] ceph-osd/5_token: '"s.jCuF9G1niDiBQcxBBPauxOIo"'
[16:41] egress-subnets: 10.100.113.0/32
[16:41] ingress-address: 10.100.113.0
[16:41] private-address: 10.100.113.0
[16:41] Ok, one sec
[16:43] At first glance we seem to be missing the vault_url setting. Unless that was a paste error.
[16:44] the vault_url is missing
[16:44] Ok, that is our culprit. Let me look at the vault side and see how that could happen.
[16:45] if I run the same command for nova-compute/0 it does return the vault_url
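(For reference, a minimal sketch of the relation-data check walked through above, assuming juju 2.x `juju run --unit` syntax and the unit names from the paste; the relation id and output will differ in other deployments.)

```sh
# Sketch only: inspect what vault/0 has published on the secrets-storage
# relation, as seen from ceph-osd/0 (juju 2.x syntax, unit names assumed).
rid=$(juju run --unit ceph-osd/0 -- relation-ids secrets-storage)

# A healthy relation should carry a vault_url key alongside the per-unit
# role_id/token entries; its absence is what leaves the OSDs deferring.
juju run --unit ceph-osd/0 -- relation-get -r "$rid" - vault/0 \
    | grep -q '^vault_url:' \
    && echo "vault_url present" \
    || echo "vault_url missing from the relation data"
```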
[16:52] Hey, got a question re: juju 2.6.x upgrading (https://discourse.juju.is/t/pre-juju-2-6-5-upgrade-steps-for-large-log-collections/1633)
[16:52] There's a sentence which says "Please stop the controllers when running this script."
[16:53] What precisely does this mean? Does this mean all systemd juju-related services except juju-db? (I presume juju-db would need to be left running, since that's where mongo is running from...?)
[16:53] or am I wrong there?
[16:54] I don't want to assume and make a mistake; it sounds like the consequences of getting it wrong are less than fun.
[16:54] tychicus: OK, I am guessing this is our problem: https://github.com/openstack/charm-vault/blob/master/src/reactive/vault_handlers.py#L536 and https://github.com/openstack-charmers/charm-interface-vault-kv/blob/master/provides.py#L41 If vault and ceph-osd do not share a common space binding for secrets-storage, vault never publishes the vault_url.
[16:55] tychicus: My team desperately needs to make that clearer. Sorry to make you jump through hoops. But I am pretty confident that is the problem.
[17:00] vultaire, it would be stopping the jujud-machine-X services for the controller machines.
[17:00] so my current settings are vault: endpoint-bindings: secrets: os-internal-api
[17:00] so my current settings are vault: endpoint-bindings: secrets: os-public-api
[17:00] @jam: thanks for the clarification, perfect.
[17:01] sorry, the first entry was in error
[17:01] vultaire, note that stopping the controllers effectively stops everything else, since the agents can't talk to the controller, but you don't have to explicitly stop them.
[17:02] and ceph-osd: endpoint-bindings: secrets-storage: os-internal-api
[17:02] @vultaire, also note pjdc's comment that you can just drop the collection entirely, which is likely to be faster.
[17:02] Juju will recreate it during controller startup
[17:02] updating vault to vault: endpoint-bindings: secrets: os-internal-api
[17:02] tychicus: That would do it. Those need to match.
[17:02] should resolve the issue
[17:02] ack - I'll simply stop the jujud-machine units. Also, the script was updated to include pjdc's suggestion
[17:02] so no worries there
[17:02] ah, good.
[17:05] thedac: after updating the binding, what would need to happen to update the vault_url?
[17:06] or can the vault url not be changed once it is set?
[17:08] If the bindings are updated, a refresh-secrets action run should fix things. Juju only recently added the feature to update a binding, so depending on the juju version a re-deploy on the ceph-osd side may be necessary.
[17:09] thanks!
[17:10] No problem
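(For reference, a minimal sketch of the fix agreed on above: bind vault's `secrets` endpoint and ceph-osd's `secrets-storage` endpoint to the same space, then re-run the action. The space name os-internal-api comes from the paste, and rebinding a deployed application with `juju bind` is only available on newer juju releases; on older versions a redeploy may be needed, as noted at [17:08].)

```sh
# Sketch only: give both endpoints a common space binding, then re-trigger
# the secrets handshake so vault publishes vault_url on the relation.
# Requires a juju release that supports rebinding deployed applications.
juju bind vault secrets=os-internal-api
juju bind ceph-osd secrets-storage=os-internal-api

# Re-issue tokens and republish the vault data to related units.
juju run-action --wait vault/0 refresh-secrets
```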