[15:09] <spetrosi> Hi folks, I am looking at the cloud-init feature of creating a /etc/NetworkManager/conf.d/99-cloud-init.conf file to ensure that DNS is managed by cloud-init. The only workaround I see to this is to add a `/etc/NetworkManager/conf.d/9A-override-99-cloud-init.conf` file with `dns=default` to override cloud-init's configuration. Is there some more valid fix? Btw I do not have a `resolve` section in my /etc/cloud/cloud.cfg
[15:24] <minimal> spetrosi: which DataSource are you using?
[15:26] <spetrosi> minimal, it is recommended in https://access.redhat.com/solutions/4757761
[15:28] <minimal> I can't read that as I don't have a login. However I did ask which DataSource you are using in order to understand your problem...
[15:30] <minimal> as for some DataSources any DNS server configuration will come from network-config and in other cases via user-data
[15:38] <spetrosi> minimal, It's a VM provisioned from OpenStack
[15:39] <minimal> ok, so network settings including DNS are coming from either OpenStack DS or ConfigDrive DS
[15:40] <minimal> and you want NetworkManager to use the cloud-init DNS setting or to *not* use the cloud-init DNS settings?
[15:56] <spetrosi> minimal, to *not* use the cloud-init DNS settings
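A small sketch of the workaround spetrosi describes. It assumes that NetworkManager applies conf.d files in lexical filename order with later files winning, and that the `dns` key lives in the `[main]` section. The helper name is hypothetical; the filenames are the ones from the question.

```python
import configparser
import io

CLOUD_INIT_CONF = "99-cloud-init.conf"
OVERRIDE_CONF = "9A-override-99-cloud-init.conf"


def render_dns_override(dns_mode: str = "default") -> str:
    """Render the body of a NetworkManager drop-in that sets [main] dns."""
    parser = configparser.ConfigParser()
    parser["main"] = {"dns": dns_mode}
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def read_last(names):
    """Return the filename that sorts last, i.e. the one assumed to win."""
    return sorted(names)[-1]


if __name__ == "__main__":
    # "9A-..." sorts after "99-...", so the override would be read last.
    print(read_last([OVERRIDE_CONF, CLOUD_INIT_CONF]))
    print(render_dns_override())
```

The rendered text would be written to `/etc/NetworkManager/conf.d/9A-override-99-cloud-init.conf`, followed by a NetworkManager reload.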
[16:15] <smoser> meena: be nice. https://bugs.launchpad.net/cloud-init/+bug/2006784 ;)
[16:15] -ubottu:#cloud-init- Launchpad bug 2006784 in cloud-init "dhcp lookups depend on end-of-life dhclient" [Undecided, New]
[16:16] <smoser> wrt dhcp for ephemeral. i really think cloud-init should have a builtin (python) dhcp client that it can interact with in this case rather than relying on some binary to execute.
[17:01] <minimal> smoser: I talked with holmanb in the past about adding both dhcpcd and udhcpc support
[17:03] <minimal> smoser: interesting that the report is regarding NixOS, which is *not* a currently supported distro...
[17:04] <holmanb> also nixos's design isn't a good fit for cloud-init imo
[17:05] <minimal> well it is more "your software doesn't support a situation in a distro that you don't support (and therefore haven't tested)"
[17:06] <minimal> though they seem to have added it themselves: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/system/cloud-init.nix
[17:09] <minimal> their selection of enabled modules is "interesting" as some of them may not work as the modules don't define NixOS (or any family for it) as supported...
[17:09] <holmanb> smoser: Current plans are to add fallback support for dhcpcd in the near term - other projects require an external dhcp client as well, and dhcpcd seems to meet the most needs.
[17:10] <holmanb> smoser: I explored some manually byte packed dhcp frames a while back to test out the idea - it seems doable, especially if limited to the functionality currently used rather than trying to implement a fully-featured client.
[17:11] <smoser> yeah, i kind of looked at that too. it was really surprising to me though. i found a number of dhcp clients in python, but they all required that the interface you were using already have an ip address.
[17:11] <smoser> which, seems, surprising to me. and was then not useful for cloud-init. but i could have been missing something.
[17:13] <smoser> https://github.com/lvfrazao/dhcppython looks like it might fit the bill. i don't think google turned that up so easily when i last looked.
[17:18] <minimal> smoser: from a quick scan it doesn't appear to do DHCPv6 though
[17:18] <smoser> does anything use that in ephemeral ?
[17:18] <holmanb> not currently
[17:19] <smoser> if it magically worked, then its a huge improvement over what we have there. it has no dependencies.
[17:19] <minimal> perhaps not at present, but there is the future scenario of an IPv6-only environment to consider
[17:19] <smoser> but probably not packaged.
[17:32] <smoser> well, it didn't "just work" for me in a container test. but that could be user error.
[17:33] <smoser> but i really do think that such an approach is preferable compared to trying to kick multiple crappy dhcp clients into doing exactly what you want. where "what you want" is not really a normal thing.
[17:39] <meena> smoser: im always nice
[17:39] <holmanb> smoser: in that case what would be the story for distro support? "use dhclient until you can get this new dependency packaged"?
[17:40] <smoser> or vendor i guess. 
[17:50] <minimal> still need to support dhcpcd and/or another client outside of the ephemeral stuff - i.e. the client used by the distro - currently cloud-init writes a dhclient "hook" script (from memory)
[17:51] <minimal> cloudinit/dhclient_hook.py
[17:52] <minimal> which writes /etc/dhcp/dhclient-exit-hooks.d/hook-dhclient
[17:52] <meena> smoser: i think we discussed adding scapy as dependency a few years back
[17:53] <holmanb> scapy is pretty big iirc
[17:53] <holmanb> which I think was the impetus for me looking into crafting byte-packed frames
[17:54] <meena> and probably overpowered for most things we need
[17:54] <smoser> yeah, scapy is huge.
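A rough sketch of the byte-packed approach holmanb mentions, limited to building a DHCPDISCOVER payload with Python's `struct` module. The function name, the example MAC address, and the requested option list are illustrative assumptions, not cloud-init code. Sending the payload from an interface that has no address yet (the part smoser found missing in existing Python clients) is not covered here.

```python
import struct

# Constants from RFC 2131 / RFC 2132.
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])
BOOTREQUEST = 1
HTYPE_ETHERNET = 1
FLAG_BROADCAST = 0x8000
OPT_MESSAGE_TYPE = 53
OPT_PARAM_REQUEST_LIST = 55
OPT_END = 255
DHCPDISCOVER = 1


def build_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload (UDP body only, no IP/UDP headers)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    zero_ip = bytes(4)
    # Fixed 236-byte BOOTP header: op, htype, hlen, hops, xid, secs, flags,
    # ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file.
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        BOOTREQUEST, HTYPE_ETHERNET, 6, 0,
        xid, 0, FLAG_BROADCAST,
        zero_ip, zero_ip, zero_ip, zero_ip,
        mac,   # struct pads short "s" fields with null bytes
        b"", b"",
    )
    options = (
        bytes([OPT_MESSAGE_TYPE, 1, DHCPDISCOVER])
        # Request subnet mask (1), router (3), DNS servers (6).
        + bytes([OPT_PARAM_REQUEST_LIST, 3, 1, 3, 6])
        + bytes([OPT_END])
    )
    return header + DHCP_MAGIC_COOKIE + options


if __name__ == "__main__":
    packet = build_discover(bytes.fromhex("020000000001"), 0x12345678)
    print(len(packet), packet[:12].hex())
```

Restricting the builder to the few fields and options an ephemeral lease needs keeps it small, which matches the "limited to the functionality currently used" idea from the discussion.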
[17:57] <holmanb> minimal: What's the background on dhclient_hook? I see what it does, but I'm missing context on why we support this, and the docs for it are.... on par with many of our more obscure features
[17:59] <meena> FreeBSD has a package, https://www.freshports.org/net/scapy/ but disables a lot of features by default
[18:02] <minimal> holmanb: not sure myself - looking on a NoCloud based box the hook script appears to be designed to only activate on Azure to run "cloud-init dhclient-hook up/down <interface>"
[18:05] <holmanb> minimal: I see the initial commit message for that file explains it
[18:06] <minimal> I should have looked for that ;-)
[18:09] <minimal> my point being the need to create equivalent hooks for dhcpcd and udhcpc
[18:40] <holmanb> The hook is currently executed by dhclient, which is executed by cloud-init, right? 
[18:44]  * holmanb boots on azure to poke at it
[18:44] <minimal> the hook is run by dhclient, not sure if that is when dhclient is run by cloud-init as part of ephemeral DHCP or if it is later in boot when the OS brings up network interfaces
[19:31] <holmanb> minimal: dhclient-hook looks vestigial to me
[19:33] <holmanb> minimal: azure doesn't appear to be consuming that env var json file anymore
[19:34] <minimal> holmanb: cool. Less work to handle other dhclient alternatives then
[19:49] <meena> holmanb: 🪓❓
[19:55] <holmanb> meena: mayhaps
[19:56] <holmanb> meena: Something that's a top level cloud-init command should probably be deprecated first, but since this looked to be cloud-specific cmd, and is unlikely to have external consumers (all it does is filter some specific env vars and stuff them in a file as json), it might be something we could axe without deprecation.
[19:57] <holmanb> Either way, it would be best to check with msft to make sure this file isn't used in some other way (i.e. WALinuxAgent directly), and to make sure I didn't miss something.
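For context, the behaviour holmanb summarizes (filter some env vars and store them in a file as JSON) can be sketched roughly as below. The prefixes, function names, and output path handling are hypothetical and chosen for illustration; this is not the code in cloudinit/dhclient_hook.py.

```python
import json
import os


def filter_lease_vars(environ, prefixes=("new_", "old_")):
    """Keep only variables whose names start with one of the given prefixes."""
    return {k: v for k, v in environ.items() if k.startswith(prefixes)}


def write_lease_json(path, environ=None):
    """Dump the filtered variables to `path` as JSON and return them."""
    env = dict(os.environ) if environ is None else environ
    data = filter_lease_vars(env)
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)
    return data
```

A DHCP client hook would call something like `write_lease_json(...)` with the environment the client exports, so any consumer of the resulting file would need checking before the command is removed.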
[20:01] <minimal> holmanb: there's nothing in the hotplug network interface functionality (i.e. AWS and Openstack currently) that relies on dhclient and/or its hooks?
[20:03] <holmanb> minimal: I'm not sure on that one. I thought that was udev-based.
[20:09] <jchittum> definitely want to be careful with Azure and hotplugging networking. hotplug networking is done to set up the Advanced Networking, which is a core functional requirement of AKS
[20:09] <jchittum> there's a dragon in there somewhere, as we found out with the systemd / udev shenanigans in August
[20:34] <holmanb> jchittum: +1 more than one dragon I suspect
[20:39] <blackboxsw> minimal/holmanb sorry for delay on separate CA certs PR. I'm almost done on my end, couple minor tweaks to unittest/docs/integrationtests
[20:40] <blackboxsw> PR looks really good thx
[20:40] <holmanb> cjp256: github won't let me request you as a reviewer for some reason, but if you get a chance sometime https://github.com/canonical/cloud-init/pull/2015
[20:40] -ubottu:#cloud-init- Pull 2015 in canonical/cloud-init "[RFC] remove vestigal dhclient_hook command" [Open]
[20:40] <holmanb> cjp256: which is followup to https://github.com/canonical/cloud-init/commit/5ad0768a796bc07232476d0d29b5225f1e6e131c
[20:40] -ubottu:#cloud-init- Commit 5ad0768 in canonical/cloud-init "sources/azure: remove lease file parsing (#1302)"
[22:01] <blackboxsw> ok done on https://github.com/canonical/cloud-init/pull/1962 think this'll make 23.1 release.
[22:01] -ubottu:#cloud-init- Pull 1962 in canonical/cloud-init "overhaul cc_ca_certs functionality" [Open]
[22:03] <minimal> blackboxsw: Thanks, I'll have a look shortly
[22:05] <blackboxsw> there's a secondary bug with SSH host keys floating around.... that I also want to take a look at to see if we can tackle it for this release. https://bugs.launchpad.net/cloud-init/+bug/1999164
[22:05] -ubottu:#cloud-init- Launchpad bug 1999164 in cloud-init "when multiple SSH host key certificates are defined, only one HostCertificate is referenced in sshd_config" [High, Confirmed]
[22:05] <blackboxsw> It's now in our selected for development queue, so hopefully we can make some substantive progress there shortly
[22:08] <blackboxsw> note that I may be incorrect on the CA certs behavior suggestion from RHEL (esposem). holmanb/minimal do you understand that they wanted us to still wait on setting this behavior in general and landing it in tip of main until they change RHEL behavior? Or did minimal's backing out RHEL changes avoid this concern (because redhat will still continue to fully `remove_default_ca_certs`)?
[22:09] <blackboxsw> minimal/holmanb Let's restate that question clearly. Do you think we're still waiting on a fix in redhat before we can land PR 1962?
[22:10] <holmanb> blackboxsw: I don't sense a change in behavior coming from RHEL, just a change in docs (which already landed).
[22:11] <blackboxsw> Thank you. I wanted to confirm my understanding that minimal backed out any rhel-related changes.
[22:13] <holmanb> blackboxsw: minimal previously had rhel "not implemented" in hopes of using a RHEL dedicated utility, but from RHEL it seems that their utility isn't intended to manage system certs, so that part has been reverted to former behavior, iirc, which is to delete the certs
[22:13] <minimal> blackboxsw: yes I reverted RHEL and FreeBSD related changes as things were unclear
[22:16] <emper0r> hi.. i'm having a little problem with cloud-init when booting an ec2 instance in aws..
[22:16] <emper0r> it is taking 6 hours
[22:17] <emper0r> the boot is stuck until I get the login prompt
[22:17] <blackboxsw> emper0r: that's a problem :/ `systemd-analyze blame` or `cloud-init analyze show`?
[22:17] <blackboxsw> should tell you the blocking service that is taking so long in boot
[22:17] <emper0r> in fact i took an svg image
[22:18] <emper0r> and it says
[22:18] <emper0r> cloud-init.service (6h 21min 55.264s)
[22:19] <blackboxsw> cloud-init analyze blame may tell you what part of cloud-init is blocking.
[22:19] <emper0r> want to see the full blame output?
[22:20] <blackboxsw> sure pastebin.com works
[22:20] <emper0r> wait let me paste in some site 1 sec
[22:21] <emper0r> https://dpaste.com/CGA622DJS
[22:22] <blackboxsw> great and 'cloud-init analyze blame' and 'cloud-init analyze show' if possible please
[22:22] <emper0r> 1 sec
[22:22] <blackboxsw> something is likely blocking cloud-init.service from running... but not sure yet what that is .... also surprising that mysqld.service is taking 2 hours to come up BTW
[22:23] <blackboxsw> makes me think network throughput, retries and timeouts with a misconfigured network or proxy issues etc
[22:23] <emper0r> yes we have many databases using the InnoDB engine, so when mysql starts it runs internal checks and rollbacks before it comes up ok; i need to be able to log in while it is doing that...
[22:24] <emper0r> 1 sec to get output 
[22:26] <emper0r> blackboxsw: both output here https://dpaste.com/8EPJRFCBH
[22:33] <blackboxsw> emper0r: thanks, your cloud-init analyze blame (and show) both show that on the latest boot, whatever bootcmd script is provided to cloud-init as user-data or vendor-data in these images took a really long time to complete. line 264 of your paste -- Boot Record 07 --
[22:33] <blackboxsw>      22913.76400s (init-network/config-bootcmd)
[22:33] <blackboxsw> I'd check `sudo cloud-init query userdata` and see what script your image is running and debug there
[22:34] <blackboxsw> in a `bootcmd` section.
[22:35] <blackboxsw> as that's what took the 6 hrs
[22:39] <emper0r> blackboxsw: https://dpaste.com/AKJ95THPC
[22:39] <emper0r> it is a simple script we created to automatically register any ec2 instance in our private hosted zone
[22:43] <emper0r> wait.. now checking that script log
[22:43] <emper0r> i see the timeout registering into our zone
[22:43] <emper0r> strange
[22:43] <emper0r> weird
[22:44] <emper0r> Create/Update EC2 record
[22:44] <emper0r> Error: RequestError: send request failed
[22:44] <emper0r> caused by: Get "https://route53.amazonaws.com/2013-04-01/hostedzonesbyname?dnsname=
[22:44] <emper0r> dial tcp 54.239.31.187:443: i/o timeout
[22:44] <emper0r> and the start and stop logs are
[22:45] <emper0r> 20230209-15:01 and ends 20230209-21:23
[22:45] <emper0r> with exactly those 6 hours in the middle
[22:45] <blackboxsw> +1. yeah I haven't played w/ route 53 offerings much, just read a couple docs on it.... but interesting little side script there and falling over on the setup/config I think
[22:45] <emper0r> i guess that is the cause
[22:46] <emper0r> hmm i have to check whether there is a timeout option to cap it at x min to avoid this
[22:46] <emper0r> thanks for everything blackboxsw, i guess I can handle it from here..
[22:46] <emper0r> :)
[22:47] <blackboxsw> take care
[22:47] <emper0r> u2
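One way to cap how long a registration script like the one above can block boot, sketched in Python under assumptions: `run_with_deadline` is a hypothetical helper, and the command and time limit are placeholders. It relies on `subprocess.run(..., timeout=...)`, which kills the child process when the limit expires.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd; return its exit code, or None if it exceeded the time limit."""
    try:
        completed = subprocess.run(cmd, timeout=seconds, check=False)
    except subprocess.TimeoutExpired:
        # The child has already been killed by subprocess.run at this point.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Placeholder command standing in for the DNS registration script.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    if run_with_deadline(slow, 1) is None:
        print("registration timed out; continuing boot")
```

Wrapping the call this way lets a `bootcmd` step fail fast instead of waiting out repeated network timeouts.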