[12:24] <Raito_Bezarius> Hello, I'm trying to understand if cloud-init supports update_etc_hosts for NixOS
[12:40] <Raito_Bezarius> https://github.com/canonical/cloud-init/pull/561 I have made this
[14:31] <Odd_Bloke> Hello everyone, I'm back from my two weeks of vacation. o/
[14:33] <Odd_Bloke> I'm catching up on scrollback and as we're introducing more and more development tools into cloud-init, I wonder if we should consider splitting the Ubuntu package into two: one for the actual "init" functionality, and one that people can install on dev machines (to get schema validation, MIME archive creation, etc.) without cloud-init (potentially) running on boot.
[14:34] <Odd_Bloke> (Potentially this could be implemented by just moving the systemd units to a separate package; we probably don't want to try splitting the source tree in packaging.)
[16:05] <smoser> i have 2 thoughts.
[16:05] <smoser> a.) with the generator, if you don't want cloud-init to run, touch /etc/cloud/cloud-init.disabled
[16:06] <smoser> b.) right, the only tool in cloud-init that i think isn't really meant to be run on a system that *does* use cloud-init during boot is 'make-mime'
[16:07] <rharper> maybe net-convert
[16:07] <smoser> maybe... that is a dev tool though.
[16:08] <smoser> the obvious thing that i'd think to add would be the `cloud-localds` functionality from cloud-utils
[16:08] <rharper> yeah, the schema stuff is in the middle ground as well
[16:08] <smoser> but doing *that* would mean dependencies
[16:08] <smoser> which i wouldn't want a package to pick up (mkiso)
[16:08] <rharper> smoser: right
[16:09] <smoser> i can see the value of schema on non-running system for sure.
[16:11] <Odd_Bloke> The `analyze` subcommands can take an input file, so they could also be used on a non-running system.
[16:27] <smoser> but the
[16:28] <smoser> s/but the//
[16:40] <Odd_Bloke> I think we can categorise our commands as: functional (init etc.), on-instance utilities (collect-logs, query, clean, etc.), power-user development (at least schema, make-mime, perhaps analyse and others; localds would fit here, I think), developer (net-convert, render? (I don't know what render does))
[16:42] <Odd_Bloke> As I said, I don't think splitting up our code tree in packaging is worthwhile, but we could feasibly have (names not intended as suggestions for actual package names): cloud-init-library, cloud-init-systemd-services, and then cloud-init-power-users-install-me which would depend on cloud-init-library and packages which are only required for {power-user,developer} subcommands.
[16:43] <Odd_Bloke> Which would give us a way of installing mkiso on systems where people are opting into having the full functionality available, without bloating the dependency set of the packages that get installed on every system (-library and -systemd-services).
[16:55] <rharper> Odd_Bloke: render lets users expand/test jinja templated files
[17:18] <smoser> Odd_Bloke: the issue is that if you don't split up the code tree, then cloud-init-library has to have the mkiso dependency
[17:18] <smoser> or it's just broken, waiting for someone to try to use the library and fail
[17:41] <Odd_Bloke> I think it would be reasonable to exit with "`cloud-init localds` requires `mkisofs` to be installed" or similar; it's an interactive command so people can respond to that.
[17:41] <Odd_Bloke> But I agree that it's not ideal.
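The "exit with a clear message" approach Odd_Bloke describes could look roughly like the sketch below. It is only an illustration: `cloud-init localds` is a subcommand being discussed here, not one that exists, and `require_tool` and `localds_main` are hypothetical names.

```python
import shutil
import sys


def require_tool(subcommand: str, tool: str) -> None:
    """Exit with a readable message if `tool` cannot be found on PATH."""
    if shutil.which(tool) is None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"`cloud-init {subcommand}` requires `{tool}` to be installed")


def localds_main() -> None:
    # Hypothetical entry point for the proposed `localds` subcommand.
    require_tool("localds", "mkisofs")
    print("mkisofs found; the seed image would be built here")
```

Because the check runs only when the subcommand is invoked, the base package would not need to depend on `mkisofs`, at the cost smoser points out: the failure surfaces only at use time.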
[18:15] <ananke> hmm, this makes no sense, but so far all the symptoms agree: 'cloud-init status --wait' claims that 'status: done', but tail'ing cloud-init-output.log still shows final modules still running (package installs/updates specifically). was there ever a bug related to this?
[18:16] <blackboxsw> ananke: https://bugs.launchpad.net/cloud-init/+bug/1890528 ?
[18:16] <ananke> thank you, I'll check. this may be it.
[18:16] <blackboxsw> ananke: did cloud-init status --long show an error condition too?
[18:17] <ananke> This is on kali, based on debian: cloud-init/now 20.1-2 all [installed,upgradable to: 20.2-2]
[18:17] <blackboxsw> that would exit early
[18:17] <ananke> blackboxsw: nope, no errors
[18:17] <ananke> I've been trying to figure out for the past hour why our packer configuration is failing on kali 2020.2, while it worked on kali 2020.1, and I was seeing some odd race conditions in the output logs
[18:19] <ananke> we use cloud-init status --wait before proceeding to next steps, and that seemed to first exit when there were problems with bootcmd, but even after removing everything and no errors, I still see it claiming it's done, while modules are still running
[18:20] <blackboxsw> ananke: after "removing everything" do you mean running "sudo cloud-init clean"?
[18:20] <blackboxsw> if a "clean" is not performed, some artifacts will exist on the system that would trick cloud-init status into thinking it is done
[18:21] <ananke> blackboxsw: no, I mean after removing anything in our packer config that pertains to early stages/etc
[18:23] <ananke> we pass a cloud-init config via user-data, and we then tell packer to wait until cloud-init status --wait exits
[18:23] <blackboxsw> ananke: so, cloud-init status --wait looks at /run/cloud-init/status.json
[18:24] <ananke> k, I'll redo the process and see what's in that file
[18:25] <blackboxsw> If it sees start and finished times for each key (stage), then it assumes that cloud-init is complete
[18:25] <blackboxsw> yeah check that and /run/cloud-init/result.json contents
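The completeness rule blackboxsw describes can be sketched as below. This is not cloud-init's own code: the layout (a top-level `v1` key holding one entry per stage, each with `start` and `finished`) and the stage names are assumptions about the file format, and the real check may consider more, such as `result.json`.

```python
import json

# Assumed stage keys under "v1" in status.json.
STAGES = ("init-local", "init", "modules-config", "modules-final")


def load_status(path: str = "/run/cloud-init/status.json") -> dict:
    """Read the status file that `cloud-init status` inspects."""
    with open(path) as fh:
        return json.load(fh)


def looks_complete(status: dict) -> bool:
    """True when every stage has both a start and a finished timestamp."""
    v1 = status.get("v1") or {}
    return all(
        (v1.get(stage) or {}).get("start")
        and (v1.get(stage) or {}).get("finished")
        for stage in STAGES
    )
```

A check like this only reflects what was last written to the file, so anything else that runs the stages and updates those timestamps can make the result misleading.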
[18:32] <ananke> here's a sample output, after I ssh to the system that's being provisioned: https://dpaste.com/3DT6BTAEW
[18:34] <ananke> so modules-final is not done, yet claims it is?
[18:47] <Odd_Bloke> ananke: Hmm, what is it that's performing those downloads?
[18:47] <Odd_Bloke> Is that just cloud-config, or have you passed in a script or similar?
[18:49] <ananke> Odd_Bloke: just cloud-config
[18:50] <Odd_Bloke> ananke: Could we see a full cloud-init.log?
[18:50] <ananke> sure, do hold
[18:51] <Odd_Bloke> https://www.youtube.com/watch?v=6g4dkBF5anU
[18:53] <ananke> Odd_Bloke: : http://sprunge.us/KExsDJ
[18:55] <ananke> so this part is still not finished: 2020-08-31 18:34:20,861 - util.py[DEBUG]: apt-install [apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install fio time xrdp xorgxrdp aptitude elinks gedit htop leafpad nano nmap vim] took 131.844 seconds
[18:56] <ananke> which is populated via the 'packages:' directive
[18:57] <Odd_Bloke> ananke: It looks like "cloud-init mode 'modules' took" appears 4 times in the log; I believe we would only expect to see it twice: once for each of `--mode config` and `--mode final`.
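The diagnostic Odd_Bloke used here, counting how often the "modules" stage reports completion, can be reproduced with a few lines. The marker string is the one quoted in the chat; the function name and the default log path (taken from later in this log) are otherwise illustrative.

```python
MARKER = "cloud-init mode 'modules' took"


def count_module_runs(log_text: str) -> int:
    """Count log lines that report a finished `modules` stage run."""
    return sum(MARKER in line for line in log_text.splitlines())


def count_module_runs_in_file(path: str = "/var/log/cloud-init.log") -> int:
    """Same count, read from a log file on disk."""
    with open(path, errors="replace") as fh:
        return count_module_runs(fh.read())
```

On a single normal boot the expectation stated above is a count of 2; a higher number suggests the stages were also invoked by something other than the systemd units.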
[18:58] <ananke> ohh, damn, I think you just nailed the issue. in packer as a workaround we had to kick off 'cloud-init modules --mode=config' and 'cloud-init modules --mode=final' manually, and I never took that out
[18:59] <Odd_Bloke> That'd do it!
[18:59] <ananke> crap, I'm so sorry
[18:59] <blackboxsw> nice Odd_Bloke
[19:00] <ananke> Here's the offending code:
[19:00] <ananke>                 "cloud-init modules --mode=config; echo cloud-init module config error code is $?",
[19:00] <ananke>                 "cloud-init modules --mode=final; echo cloud-init module final error code is $?",
[19:00] <ananke>                 "/usr/bin/cloud-init status --wait; echo cloud-init status error code is $?",
[19:01] <ananke> It's interesting to see what the consequence is though
[19:03] <blackboxsw> +1 generally cloud-init doesn't prescribe machines invoking each stage directly as part of the boot process, because cloud-init uses systemd service/unit ordering to ensure each stage starts after the appropriate stage in system boot.
[19:04] <blackboxsw> ...and if cloud-init ever added boot stages, images which only called into specific cloud-init boot stages might miss the introduction of a new cloud-init boot or configuration stage (which *may* actually happen this year per https://bugs.launchpad.net/cloud-init/+bug/1892851)
[19:05] <blackboxsw> but I get that it is a good option while developing a new system to call into those stages directly
[19:05] <ananke> yeah, we had to put those workarounds due to broken Kali AMI
[19:06] <blackboxsw> roger. if you find there is something that makes sense to upstream, I'm sure we'd love to help get that support in
[19:06] <blackboxsw> into master
[19:07] <ananke> I can't imagine this would be a common issue, but I wonder if putting some locking/semaphore checking makes sense.
[19:10] <ananke> funny enough, I did remove these commands from packer earlier this morning, but then put them back because things were still breaking. so while I fixed the other things (I had to enable/start cloud-init-local.service and cloud-config.service), I never bothered to go back to this section
[19:13] <blackboxsw> ananke: each config module has its own semaphore based on whether the module should be run per-boot, per-once, per-instance or per-always
[19:13] <blackboxsw> so as a whole, locking the entire boot stage doesn't quite work because some components within that stage want to be run always
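A toy model of per-module semaphores may help show why a single stage-wide lock does not fit. Everything here is hypothetical (the function name, the marker file naming, the three frequency strings) and it is not how cloud-init stores its semaphores; it only illustrates that "always" modules must never be blocked while "instance" and "once" modules are.

```python
from pathlib import Path


def claim_run(sem_dir: Path, module: str, frequency: str, instance_id: str) -> bool:
    """Return True if `module` should run now, recording the claim on disk."""
    if frequency == "always":
        return True  # never gated, so a stage-wide lock would be wrong here
    if frequency == "instance":
        marker = sem_dir / f"{module}.{instance_id}"
    elif frequency == "once":
        marker = sem_dir / f"{module}.once"
    else:
        raise ValueError(f"unknown frequency: {frequency!r}")
    try:
        # exist_ok=False creates the file exclusively, so two concurrent
        # callers cannot both claim the same marker.
        marker.touch(exist_ok=False)
    except FileExistsError:
        return False
    return True
```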
[20:42] <ananke> on an unrelated (or semi-related) note, I need to figure out if there's a way to tail cloud-init's log _while_ waiting for cloud-init status --wait to finish
[20:43] <blackboxsw> tail -f /var/log/cloud-init.log?
[20:43] <blackboxsw> :) from a fork
[20:44] <ananke> blackboxsw: that would work, but I'd have to kill it after cloud-init status --wait exits
[20:45] <ananke> otherwise that entire thing would hang and packer would never finish that stage
[20:48] <ananke> On error conditions I can dump the logs to stdout, which for us means packer sends them to gitlab CICD console and we can inspect them that way. However, it would be nice to view them real time, and see what's happening with the system as we wait ~20-30 mins for various cloud-init stages to finish
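One way to get the "tail while waiting, then stop tailing" behaviour ananke asks for is a small wrapper that starts the follower in the background and terminates it once the wait command returns. This is a sketch, not something from cloud-init or packer; `follow_while` is a hypothetical helper name.

```python
import subprocess
from typing import Sequence


def follow_while(wait_cmd: Sequence[str], follow_cmd: Sequence[str]) -> int:
    """Run follow_cmd in the background until wait_cmd exits.

    Returns wait_cmd's exit code so the caller can still act on failures.
    """
    follower = subprocess.Popen(list(follow_cmd))
    try:
        return subprocess.run(list(wait_cmd)).returncode
    finally:
        # Stop the follower so the calling step does not hang forever.
        follower.terminate()
        try:
            follower.wait(timeout=5)
        except subprocess.TimeoutExpired:
            follower.kill()
            follower.wait()
```

For the case in this log it would be called as `follow_while(["cloud-init", "status", "--wait"], ["tail", "-F", "/var/log/cloud-init.log"])`. The follower inherits stdout, so its output should show up wherever the provisioner already streams the step's output.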
[23:32] <blackboxsw> ugh