=== janitha78 is now known as janitha7 | ||
andrew49 | Hello! I'm using cloud-init 22.3.4 and am encountering a problem where it remains in "status: running" forever (hours) with no indication in the logs about where it is stuck or why; "cloud-init analyze show" indicates that init-local and init-network have finished the states successfully; moreover, if I run "cloud-init clean" followed by | 12:39 |
---|---|---|
andrew49 | "cloud-init --debug init" and "cloud-init --debug modules", everything completes successfully. What else can I do to determine why/where it is hanging forever? | 12:39 |
meena | hi andrew49 | 13:39 |
meena | andrew49: do a full clean of logs and state. enable logging properly: basically, just make sure this file exists: https://github.com/canonical/cloud-init/blob/main/config/cloud.cfg.d/05_logging.cfg looks like this, and is named .cfg and so will be included | 13:41 |
meena | then do a reboot and see what the logs say | 13:41 |
falcojr | andrew49: actually, don't do that yet (unless you see no logs at all). You said 'cloud-init analyze show' shows init-local and init-network. In /var/log/cloud-init.log, is there a "running 'modules:config'" or "running 'modules:final'" message in the logs? | 13:49 |
falcojr | if not there's likely something else that was blocking cloud-init from running its final stages | 13:49 |
falcojr | does "systemctl --failed" or "systemd-analyze critical-chain" show anything unusual? | 13:50 |
andrew49 | falcojr I do not see either of those entries in /var/log/cloud-init.log; "systemctl --failed" shows systemd-remount-fs.service as "failed" - maybe that is the problem? | 13:57 |
andrew49 | it looks like the error in the unit is "mount: /: can't find LABEL=cloudimg-rootfs." | 13:58 |
falcojr | andrew49: I'd be surprised if that was the problem (unless literally mount '/' failed...which...would give you bigger problems I think :P ), but cloud-init actually runs 4 separate times on boot, so once the first service starts, status will say 'running' until the final service has completed. cloud-config.service has some dependencies during boot, so if those don't complete or are taking forever for some reason, it will look like cloud-init is... | 14:03 |
falcojr | ... never completing | 14:03 |
andrew49 | does it seem likely that the fact that this systemd-remount-fs.service is "failed" is the thing holding it up then? | 14:03 |
andrew49 | "systemd-analyze critical-chain" reports "Bootup is not yet finished" so not a lot of extra info there | 14:04 |
falcojr | andrew49: try 'systemctl list-jobs' ? | 14:06 |
falcojr | anything 'waiting' there? | 14:06 |
andrew49 | yes a number of things - snapd.autoimport.service, cloud-init.target, cloud-config.service, snapd.seeded.service, cloud-final.service, multi-user.target, ubuntu-advantage.service, graphical.target, and systemd-update-utmp-runlevel.service | 14:08 |
itjamie-temp | question. when using nocloud-net datasource. is it possible for cloud-init to supply the sysserial/mbserial to the datasource http server? | 14:09 |
itjamie-temp | eg something like `ds=nocloud-net;s=http://10.10.0.1:8000/$sysserial/` | 14:10 |
falcojr | andrew49: cloud-config.service relies on snapd completing. I would check the snap logs to see if anything is hanging there | 14:11 |
falcojr | itjamie-temp: I think that could work, but then cloud-init will try contacting http://10.10.0.1:8000/$sysserial/user-data for user data and similarly for vendor-data and meta-data | 14:14 |
andrew49 | falcojr I see several errors, e.g. | 14:14 |
andrew49 | "[change 142 "Setup snap \"snapd\" (17029) security profiles" task] failed: cannot reload udev rules: exit status 1" | 14:14 |
andrew49 | and | 14:14 |
andrew49 | "error trying to compare the snap system key: system-key missing on disk" | 14:14 |
falcojr | andrew49: Unfortunately, I don't know enough about snap to meaningfully help debug that | 14:15 |
andrew49 | falcojr okay thanks, I have enough to go on for now so I'll work on figuring it out and report back what I find | 14:15 |
itjamie-temp | @falco | 14:16 |
itjamie-temp | falcojr is there a list of variables cloud-init has available? eg i just guessed $sysserial | 14:16 |
andrew49 | falcojr I think the issue I'm hitting is https://bugs.launchpad.net/snapd/+bug/1712808; this container does have 'security.privileged: "true"' as described there | 14:22 |
-ubottu:#cloud-init- Launchpad bug 1712808 in snapd "udev interface fails in privileged containers" [Medium, Confirmed] | 14:22 |
falcojr | itjamie-temp: oh sorry, I think I misunderstood your initial question. I don't think there's any variable substitution that can be applied there | 14:28 |
itjamie-temp | damn. that would have been really useful... | 14:28 |
meena | falcojr: so, i just realized something, and I don't know why it took me almost a week: I can mock the ifconfig -a output as '' in cases where it has no bearing on the outcome. | 14:29 |
falcojr | meena: yeah, were you trying to feed it realistic results before? | 14:29 |
meena | falcojr: pretty much everywhere where it was failing | 14:30 |
falcojr | doh... | 14:30 |
meena | falcojr: i just did my thing of readResource('assets/netinfo/freebsd-ifconfig-output') | 14:30 |
meena | but that's a lot of work, for: we're just checking for … something completely different. | 14:30 |
meena | but, tbf, i was searching for generic solutions | 14:31 |
itjamie-temp | where would i open a feature request for cloud-init ? | 14:32 |
meena | itjamie-temp: link is in the /topic | 14:33 |
meena | well, it says bugs, but, still | 14:34 |
itjamie-temp | ok so its fine to open a bug for an FR? | 14:34 |
meena | itjamie-temp: yes | 14:34 |
meena | I open at least one per week :P | 14:34 |
itjamie-temp | https://bugs.launchpad.net/cloud-init/+bug/1994980 well fingers crossed. | 14:38 |
-ubottu:#cloud-init- Launchpad bug 1994980 in cloud-init "FR for variable substitution in nocloud-net urls (eg system serial number)" [Undecided, New] | 14:38 |
meena | sooooo close | 14:58 |
meena | falcojr: all bugs, no, all tests fixed | 15:49 |
meena | jrm: i think our package should patch whatever is causing cloud-init to print its version to… our version (in the net/cloud-init-devel package) | 16:25 |
andrew49 | falcojr following up on my issue, it looks like as noted in https://discuss.linuxcontainers.org/t/snap-inside-privileged-lxd-container/13691 that adding "security.nesting: true" in addition to "security.privileged: true" (the latter being what I really want) is sufficient to avoid this problem and allow snapd to successfully install (and thus | 16:35 |
andrew49 | cloud-init to finish); thanks again for the help! | 16:35 |
falcojr | ah, great. Glad you found it | 16:48 |
meena | falcojr: thanks for the review. Will adapt how flags are parsed, and actually add OpenBSD and NetBSD outputs to our assets for testing | 20:53 |
meena | tomorrow. | 20:54 |
falcojr | Sounds good | 20:55 |
itjamie | Re https://bugs.launchpad.net/bugs/1994980 if i take a stab at a pr is there any guidance you'd like to give on what vars should be available for substitution? | 22:47 |
-ubottu:#cloud-init- Launchpad bug 1994980 in cloud-init "FR for variable substitution in nocloud-net urls (eg system serial number)" [Wishlist, Triaged] | 22:47 |
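
Editor's note: the step that unblocked andrew49's debugging was falcojr's suggestion to run 'systemctl list-jobs' and look for jobs in the 'waiting' state. A minimal sketch of that check follows, assuming the usual four-column layout (JOB, UNIT, TYPE, STATE) of that command's output. The helper name `waiting_jobs` and the sample text are hypothetical: the job ids and `example.service` are made up for illustration, and only the other unit names come from the conversation.

```python
def waiting_jobs(list_jobs_output: str) -> list[str]:
    """Return the unit names whose job state is 'waiting'.

    Expects text shaped like `systemctl list-jobs` output:
    a header row, one row per job, then a summary line.
    """
    units = []
    for line in list_jobs_output.splitlines():
        fields = line.split()
        # Job rows have exactly four columns and start with a numeric job id;
        # this skips the header, blank lines, and the "N jobs listed." summary.
        if len(fields) == 4 and fields[0].isdigit() and fields[3] == "waiting":
            units.append(fields[1])
    return units


# Illustrative sample, NOT captured from the log above. Job ids and
# example.service are invented; the other unit names appear in the chat.
SAMPLE = """\
JOB UNIT                  TYPE  STATE
120 cloud-config.service  start waiting
121 cloud-final.service   start waiting
 95 example.service       start running
  1 multi-user.target     start waiting

4 jobs listed.
"""

if __name__ == "__main__":
    for unit in waiting_jobs(SAMPLE):
        print(unit)
```

On a real system the input would come from running `systemctl list-jobs` (for example via `subprocess.run(["systemctl", "list-jobs"], capture_output=True, text=True).stdout`). Units stuck in 'waiting', such as cloud-config.service here, point at a dependency that never finished; in this conversation that dependency turned out to be snapd.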