=== janitha78 is now known as janitha7
[12:39] Hello! I'm using cloud-init 22.3.4 and am encountering a problem where it remains in "status: running" forever (hours) with no indication in the logs about where it is stuck or why; "cloud-init analyze show" indicates that init-local and init-network have finished the states successfully; moreover, if I run "cloud-init clean" followed by
[12:39] "cloud-init --debug init" and "cloud-init --debug modules", everything completes successfully. What else can I do to determine why/where it is hanging forever?
[13:39] hi andrew49
[13:41] andrew49: do a full clean of logs and state. enable logging properly: basically, just make sure this file exists: https://github.com/canonical/cloud-init/blob/main/config/cloud.cfg.d/05_logging.cfg looks like this, and is named .cfg and so will be included
[13:41] then do a reboot and see what the logs say
[13:49] andrew49: actually, don't do that yet (unless you see no logs at all). You said 'cloud-init analyze show' shows init-local and init-network. In /var/log/cloud-init.log, is there a "running 'modules:config'" or "running 'modules:final'" message in the logs?
[13:49] if not there's likely something else that was blocking cloud-init from running its final stages
[13:50] does "systemctl --failed" or "systemd-analyze critical-chain" show anything unusual?
[13:57] falcojr I do not see either of those entries in /var/log/cloud-init.log; "systemctl --failed" shows systemd-remount-fs.service as "failed" - maybe that is the problem?
[13:58] it looks like the error in the unit is "mount: /: can't find LABEL=cloudimg-rootfs."
[14:03] andrew49: I'd be surprised if that was the problem (unless literally mount '/' failed...which...would give you bigger problems I think :P ), but cloud-init actually runs 4 separate times on boot, so once the first service starts, status will say 'running' until the final service has completed. cloud-config.service has some dependencies during boot, so if those don't complete or are taking forever for some reason, it will look like cloud-init is...
[14:03] ... never completing
[14:03] does it seem likely that the fact that this systemd-remount-fs.service is "failed" is the thing holding it up then?
[14:04] "systemd-analyze critical-chain" reports "Bootup is not yet finished" so not a lot of extra info there
[14:06] andrew49: try 'systemctl list-jobs' ?
[14:06] anything 'waiting' there?
[14:08] yes a number of things - snapd.autoimport.service, cloud-init.target, cloud-config.service, snapd.seeded.service, cloud-final.service, multi-user.target, ubuntu-advantage.service, graphical.target, and systemd-update-utmp-runlevel.service
[14:09] question. when using nocloud-net datasource. is it possible for cloud-init to supply the syserial/mbserial to the datasource http server?
[14:10] eg something like `ds=nocloud-net;s=http://10.10.0.1:8000/$sysserial/`
[14:11] andrew49: cloud-config.service relies on snapd completing. I would check the snap logs to see if anything is hanging there
[14:14] itjamie-temp: I think that could work, but then cloud-init will try contacting http://10.10.0.1:8000/$sysserial/user-data for user data and similarly for vendor-data and meta-data
[14:14] falcojr I see several errors, e.g.
[14:14] "[change 142 "Setup snap \"snapd\" (17029) security profiles" task] failed: cannot reload udev rules: exit status 1"
[14:14] and
[14:14] "error trying to compare the snap system key: system-key missing on disk"
[14:15] andrew49: Unfortunately, I don't know enough about snap to meaningfully help debug that
[14:15] falcojr okay thanks, I have enough to go on for now so I'll work on figuring it out and report back what I find
[14:16] @falco
[14:16] falcojr is there a list of variables cloud-init has available? eg i just guessed $sysserial
[14:22] falcojr I think the issue I'm hitting is https://bugs.launchpad.net/snapd/+bug/1712808; this container does have 'security.privileged: "true"' as described there
[14:22] -ubottu:#cloud-init- Launchpad bug 1712808 in snapd "udev interface fails in privileged containers" [Medium, Confirmed]
[14:28] itjamie-temp: oh sorry, I think I misunderstood your initial question. I don't think there's any variable substitution that can be applied there
[14:28] damn. that would have been really useful...
[14:29] falcojr: so, i just realized something, and I don't know why it took me almost a week: I can mock the ifconfig -a output as '' in cases where it has no bearing on the outcome.
[14:29] meena: yeah, were you trying to feed it realistic results before?
[14:30] falcojr: pretty much everywhere where it was failing
[14:30] doh...
[14:30] falcojr: i just did my thing of readResource('assets/netinfo/freebsd-ifconfig-output')
[14:30] but that's a lot of work, for: we're just checking for … something completely different.
[14:31] but, tbf, i was searching for generic solutions
[14:32] where would i open a feature request for cloud-init ?
[14:33] itjamie-temp: link is in the /topic
[14:34] well, it says bugs, but, still
[14:34] ok so its fine to open a bug for an FR?
[14:34] itjamie-temp: yes
[14:34] I open at least one per week :P
[14:38] https://bugs.launchpad.net/cloud-init/+bug/1994980 well fingers crossed.
[14:38] -ubottu:#cloud-init- Launchpad bug 1994980 in cloud-init "FR for variable substitution in nocloud-net urls (eg system serial number)" [Undecided, New]
[14:58] sooooo close
[15:49] falcojr: all bugs, no, all tests fixed
[16:25] jrm: i think our package should patch whatever is causing cloud-init to print its version to… our version (in the net/cloud-init-devel package)
[16:35] falcojr following up on my issue, it looks like as noted in https://discuss.linuxcontainers.org/t/snap-inside-privileged-lxd-container/13691 that adding "security.nesting: true" in addition to "security.privileged: true" (the latter being what I really want) is sufficient to avoid this problem and allow snapd to successfully install (and thus
[16:35] cloud-init to finish); thanks again for the help!
[16:48] ah, great. Glad you found it
[20:53] falcojr: thanks for the review. Will adapt how flags are parsed, and actually add OpenBSD and NetBSD outputs to our assets for testing
[20:54] tomorrow.
[20:55] Sounds good
[22:47] Re https://bugs.launchpad.net/bugs/1994980 if i take a stab at a pr is there any guidance you'd like to give on what vars should be available for substitution?
[22:47] -ubottu:#cloud-init- Launchpad bug 1994980 in cloud-init "FR for variable substitution in nocloud-net urls (eg system serial number)" [Wishlist, Triaged]
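
Editor's note: to make the feature request in bug 1994980 concrete, here is a minimal Python sketch of what `$sysserial`-style substitution in a nocloud-net seed URL could look like. It is not cloud-init code. As stated at [14:28], no such substitution is known to exist. The variable names `sysserial` and `mbserial`, the helper names `read_dmi_variables` and `expand_seed_url`, and the choice of `string.Template` are all assumptions made for illustration. The sysfs paths are the usual Linux DMI locations.

```python
from string import Template
from urllib.parse import quote

# Hypothetical variable names mapped to Linux sysfs DMI files. The names
# come from the question in the log; cloud-init is not known to support them.
DMI_SOURCES = {
    "sysserial": "/sys/class/dmi/id/product_serial",
    "mbserial": "/sys/class/dmi/id/board_serial",
}


def read_dmi_variables(sources=DMI_SOURCES):
    """Read each DMI file that is present and readable; skip the rest."""
    values = {}
    for name, path in sources.items():
        try:
            with open(path, encoding="utf-8") as handle:
                values[name] = handle.read().strip()
        except OSError:
            # Missing file or insufficient permissions: leave the variable unset.
            continue
    return values


def expand_seed_url(url, variables):
    """Replace $name placeholders in url with percent-encoded values.

    Placeholders without a matching variable are left untouched, so a URL
    that uses no variables passes through unchanged.
    """
    quoted = {name: quote(value, safe="") for name, value in variables.items()}
    return Template(url).safe_substitute(quoted)
```

A call such as `expand_seed_url("http://10.10.0.1:8000/$sysserial/", read_dmi_variables())` would produce the per-machine seed URL. The serial files are usually readable only by root, so an unprivileged run will most likely leave the placeholder in place. Percent-encoding the value keeps a serial that contains spaces or slashes from changing the URL path structure.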