[12:39] <andrew49> Hello! I'm using cloud-init 22.3.4 and am encountering a problem where it remains in "status: running" forever (hours) with no indication in the logs about where it is stuck or why; "cloud-init analyze show" indicates that init-local and init-network have finished the states successfully; moreover, if I run "cloud-init clean" followed by
[12:39] <andrew49> "cloud-init --debug init" and "cloud-init --debug modules", everything completes successfully. What else can I do to determine why/where it is hanging forever?
[13:39] <meena> hi andrew49 
[13:41] <meena> andrew49: do a full clean of logs and state, then enable logging properly: basically, just make sure this file exists, looks like this, and is named .cfg so that it will be included: https://github.com/canonical/cloud-init/blob/main/config/cloud.cfg.d/05_logging.cfg
[13:41] <meena> then do a reboot and see what the logs say
[13:49] <falcojr> andrew49: actually, don't do that yet (unless you see no logs at all). You said 'cloud-init analyze show' shows init-local and init-network. In /var/log/cloud-init.log, is there a "running 'modules:config'" or "running 'modules:final'" message in the logs?
[13:49] <falcojr> if not there's likely something else that was blocking cloud-init from running its final stages
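A minimal sketch of the log check falcojr describes above: it scans the text of /var/log/cloud-init.log for "running '<stage>'" markers and reports which boot stages have started. The function names, the stage list, and the sample log lines are assumptions for illustration; the exact line format may differ between cloud-init versions.

```python
import re

# Stage names as they would appear in "running '<stage>'" log messages.
# Assumed for illustration; compare against your own cloud-init.log.
EXPECTED_STAGES = ["init-local", "init", "modules:config", "modules:final"]


def stages_started(log_text):
    """Return the expected stages that have a "running '<stage>'" marker."""
    seen = set(re.findall(r"running '([^']+)'", log_text))
    return [stage for stage in EXPECTED_STAGES if stage in seen]


def stages_missing(log_text):
    """Return the expected stages with no marker in the log text."""
    started = stages_started(log_text)
    return [stage for stage in EXPECTED_STAGES if stage not in started]


# Hypothetical excerpt: only the two init stages have logged a marker.
SAMPLE_LOG = """\
... Cloud-init v. 22.3.4 running 'init-local' at ...
... Cloud-init v. 22.3.4 running 'init' at ...
"""

if __name__ == "__main__":
    print("started:", stages_started(SAMPLE_LOG))
    print("missing:", stages_missing(SAMPLE_LOG))
```

If both 'modules:config' and 'modules:final' come back as missing, that matches the situation in this conversation: the later stages never started, so the cause is probably outside cloud-init itself.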
[13:50] <falcojr> does "systemctl --failed" or "systemd-analyze critical-chain" show anything unusual?
[13:57] <andrew49> falcojr I do not see either of those entries in /var/log/cloud-init.log; "systemctl --failed" shows systemd-remount-fs.service as "failed" - maybe that is the problem?
[13:58] <andrew49> it looks like the error in the unit is "mount: /: can't find LABEL=cloudimg-rootfs."
[14:03] <falcojr> andrew49: I'd be surprised if that was the problem (unless literally mount '/' failed...which...would give you bigger problems I think :P ), but cloud-init actually runs 4 separate times on boot, so once the first service starts, status will say 'running' until the final service has completed. cloud-config.service has some dependencies during boot, so if those don't complete or are taking forever for some reason, it will look like cloud-init is...
[14:03] <falcojr> ... never completing
[14:03] <andrew49> does it seem likely that the fact that this systemd-remount-fs.service is "failed" is the thing holding it up then?
[14:04] <andrew49> "systemd-analyze critical-chain" reports "Bootup is not yet finished" so not a lot of extra info there
[14:06] <falcojr> andrew49: try 'systemctl list-jobs' ?
[14:06] <falcojr> anything 'waiting' there?
[14:08] <andrew49> yes a number of things - snapd.autoimport.service, cloud-init.target, cloud-config.service, snapd.seeded.service, cloud-final.service, multi-user.target, ubuntu-advantage.service, graphical.target, and systemd-update-utmp-runlevel.service
[14:09] <itjamie-temp> question: when using the nocloud-net datasource, is it possible for cloud-init to supply the sysserial/mbserial to the datasource http server?
[14:10] <itjamie-temp> eg something like `ds=nocloud-net;s=http://10.10.0.1:8000/$sysserial/`
[14:11] <falcojr> andrew49: cloud-config.service relies on snapd completing. I would check the snap logs to see if anything is hanging there
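A rough sketch of the 'systemctl list-jobs' check from this exchange: it picks out the units whose job is still in the 'waiting' state. The function name, the job IDs, and the sample output are made up for illustration, and the parser assumes the usual four-column JOB/UNIT/TYPE/STATE layout.

```python
def waiting_jobs(list_jobs_output):
    """Return the unit names whose job state is 'waiting'."""
    units = []
    for line in list_jobs_output.splitlines():
        fields = line.split()
        # Job rows are assumed to look like: <job id> <unit> <type> <state>.
        # The header row and the "N jobs listed." footer do not match this.
        if len(fields) == 4 and fields[0].isdigit() and fields[3] == "waiting":
            units.append(fields[1])
    return units


# Hypothetical output; job IDs and states are invented for the example.
SAMPLE_OUTPUT = """\
JOB UNIT                 TYPE  STATE
101 snapd.seeded.service start running
102 cloud-config.service start waiting
103 cloud-final.service  start waiting

3 jobs listed.
"""

if __name__ == "__main__":
    for unit in waiting_jobs(SAMPLE_OUTPUT):
        print(unit)
```

On a live system the input could come from something like subprocess.run(["systemctl", "list-jobs", "--no-pager"], capture_output=True, text=True).stdout. A job that stays in 'running' while others wait on it is a good place to look next, which is why the snap logs were the next thing to check here.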
[14:14] <falcojr> itjamie-temp: I think that could work, but then cloud-init will try contacting http://10.10.0.1:8000/$sysserial/user-data for user data and similarly for vendor-data and meta-data
[14:14] <andrew49> falcojr I see several errors, e.g.
[14:14] <andrew49> "[change 142 "Setup snap \"snapd\" (17029) security profiles" task] failed: cannot reload udev rules: exit status 1"
[14:14] <andrew49> and
[14:14] <andrew49> "error trying to compare the snap system key: system-key missing on disk"
[14:15] <falcojr> andrew49: Unfortunately, I don't know enough about snap to meaningfully help debug that
[14:15] <andrew49> falcojr okay thanks, I have enough to go on for now so I'll work on figuring it out and report back what I find
[14:16] <itjamie-temp> @falco
[14:16] <itjamie-temp> falcojr is there a list of variables cloud-init has available? eg i just guessed $sysserial
[14:22] <andrew49> falcojr I think the issue I'm hitting is https://bugs.launchpad.net/snapd/+bug/1712808; this container does have 'security.privileged: "true"' as described there
[14:22] -ubottu:#cloud-init- Launchpad bug 1712808 in snapd "udev interface fails in privileged containers" [Medium, Confirmed]
[14:28] <falcojr> itjamie-temp: oh sorry, I think I misunderstood your initial question. I don't think there's any variable substitution that can be applied there
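To illustrate the point about the seed URL being used as-is, here is a small sketch of how the per-file URLs would be derived from it. This is not cloud-init's actual code; the function name and the plain string concatenation are assumptions based on the description in this conversation.

```python
# File names requested under the seed URL, as mentioned in the conversation.
SEED_FILES = ("user-data", "vendor-data", "meta-data")


def seed_urls(seed):
    """Append each seed file name to the seed URL exactly as given."""
    return {name: seed + name for name in SEED_FILES}


if __name__ == "__main__":
    # "$sysserial" is passed through literally; nothing expands it.
    for name, url in seed_urls("http://10.10.0.1:8000/$sysserial/").items():
        print(name, "->", url)
```

With no substitution step, the HTTP server would receive a request path containing the literal text "$sysserial" rather than the machine's serial number.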
[14:28] <itjamie-temp> damn. that would have been really useful...
[14:29] <meena> falcojr: so, i just realized something, and I don't know why it took me almost a week: I can mock the ifconfig -a output as '' in cases where it has no bearing on the outcome.
[14:29] <falcojr> meena: yeah, were you trying to feed it realistic results before?
[14:30] <meena> falcojr: pretty much everywhere where it was failing
[14:30] <falcojr> doh...
[14:30] <meena> falcojr: i just did my thing of readResource('assets/netinfo/freebsd-ifconfig-output')
[14:30] <meena> but that's a lot of work, for: we're just checking for … something completely different.
[14:31] <meena> but, tbf, i was searching for generic solutions
[14:32] <itjamie-temp> where would i open a feature request for cloud-init ?
[14:33] <meena> itjamie-temp: link is in the /topic
[14:34] <meena> well, it says bugs, but, still
[14:34] <itjamie-temp> ok so it's fine to open a bug for an FR?
[14:34] <meena> itjamie-temp: yes
[14:34] <meena> I open at least one per week :P
[14:38] <itjamie-temp> https://bugs.launchpad.net/cloud-init/+bug/1994980 well fingers crossed.
[14:38] -ubottu:#cloud-init- Launchpad bug 1994980 in cloud-init "FR for variable substitution in nocloud-net urls (eg system serial number)" [Undecided, New]
[14:58] <meena> sooooo close
[15:49] <meena> falcojr: all bugs, no, all tests fixed
[16:25] <meena> jrm: i think our package should patch whatever is causing cloud-init to print its version to… our version (in the net/cloud-init-devel package)
[16:35] <andrew49> falcojr following up on my issue, it looks like as noted in https://discuss.linuxcontainers.org/t/snap-inside-privileged-lxd-container/13691 that adding "security.nesting: true" in addition to "security.privileged: true" (the latter being what I really want) is sufficient to avoid this problem and allow snapd to successfully install (and thus
[16:35] <andrew49> cloud-init to finish); thanks again for the help!
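A hedged sketch of the workaround andrew49 reports: detect a container config that has "security.privileged" set without "security.nesting", and build the 'lxc config set' command that would add it. The function names and the instance name "c1" are hypothetical, and the config is represented as a plain dict of strings for the example.

```python
def needs_nesting(config):
    """True if the container is privileged but nesting is not enabled."""
    privileged = config.get("security.privileged", "false") == "true"
    nesting = config.get("security.nesting", "false") == "true"
    return privileged and not nesting


def nesting_command(instance):
    """Build the argv that enables security.nesting on an LXD instance."""
    return ["lxc", "config", "set", instance, "security.nesting", "true"]


if __name__ == "__main__":
    # Hypothetical config matching the situation described above.
    config = {"security.privileged": "true"}
    if needs_nesting(config):
        print(" ".join(nesting_command("c1")))
```

The command is only printed here; running it for real would be something like subprocess.run(nesting_command("c1"), check=True) on the LXD host.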
[16:48] <falcojr> ah, great. Glad you found it
[20:53] <meena> falcojr: thanks for the review. Will adapt how flags are parsed, and actually add OpenBSD and NetBSD outputs to our assets for testing
[20:54] <meena> tomorrow.
[20:55] <falcojr> Sounds good
[22:47] <itjamie> Re https://bugs.launchpad.net/bugs/1994980 if i take a stab at a pr is there any guidance you'd like to give on what vars should be available for substitution?
[22:47] -ubottu:#cloud-init- Launchpad bug 1994980 in cloud-init "FR for variable substitution in nocloud-net urls (eg system serial number)" [Wishlist, Triaged]