[07:33] <caribou> Hello,
[07:33] <caribou> Any hint on what's needed to get the 23.2 SRU of cloud-init in verified ?
[08:58] <sorinp> Hello! I'm trying to use "ds=nocloud-net s=http://boot-server.localdomain/cloud-init/" while iPXE booting a UEFI system, using this ipxe script: https://ipxe.archlinux.org/releng/netboot/archlinux.ipxe. The kernel parameter appears in /proc/cmdline, but I can see in the webserver's logs that requests for the files are not made. I can only see the requests for system images and initrd while the system is 
[08:58] <sorinp> booting up properly. Can you help me figure out what I'm missing, please?
[08:59] <sorinp> I have a modified copy of the archlinux.ipxe script on my server, with the kernel parameter added.
[09:00] <meena> sorinp: you need a ; instead of a space
[09:00] <meena>  "ds=nocloud-net;s=http://boot-server.localdomain/cloud-init/" 
[09:02] <sorinp> meena: I tried that too; no luck. Although in the docs it is without a ;.
[09:03] <meena> weird, i thought i saw a commit recently 
[09:04] <sorinp> meena: It looks like today it has a ;. Yesterday it hadn't.
[09:05] <sorinp> meena: Anyway, I tried both ways.
[09:05] <meena> that's how it goes sometimes
[09:06] <sorinp> This is my current line: imgargs vmlinuz-linux initrd=amd-ucode.img initrd=intel-ucode.img initrd=initramfs-linux.img archiso_http_srv=${mirrorurl}iso/${release}/ archisobasedir=arch cms_verify=y ${extrabootoptions} network-config=disable 'ds=nocloud-net;s=http://boot-server.localdomain/cloud-init/'
[09:06] <sorinp> ... and it does not work.
[09:07] <sorinp> The image is the standard archlinux installation media.
[09:12] <meena> I'm a bit confused as to what https://ipxe.archlinux.org/releng/netboot/archlinux.ipxe even does
[09:13] <sorinp> It's the bootloader configuration, like grub.cfg. Over the network, grub is replaced by ipxe as bootloader.
[09:14] <sorinp> https://ipxe.org
[09:16] <meena> sorinp: now is the time to confess I'm a FreeBSD user and developer, and the last time i worked on Linux at that level may be 8 years ago
[09:18] <sorinp> meena: I think ipxe can boot FreeBSD, too.
[09:18] <meena> 10:07 <sorinp> The image is the standard archlinux installation media. ⬅️ and that comes with cloud-init preinstalled? 
[09:19] <sorinp> meena: Yes, and if it is booted from a USB stick, works as intended.
[10:20] <sorinp> meena: Here is the archlinux cloud-init installation documentation: https://wiki.archlinux.org/title/Install_Arch_Linux_via_SSH#Installation_on_a_headless_server
[11:12] <meena> sorinp: if cloud-init is installed and enabled, it should leave a trace on the machine about its failure
[11:14] <sorinp> meena: That's what I wanted to do yesterday and forgot to, before writing here: collect-logs...
[11:45] <sorinp> meena: Here is the log: https://dpaste.org/B6nnY/raw
[11:46] <sorinp> meena: I do not understand why it's saying notfound, b/c if I'm using wget, I can download user-data
[11:49] <meena> is_ds_enabled(IBMCloud) = true.
[11:49] <meena> No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[11:49] <meena> [up 78.95s] returning 1
[11:50] <sorinp> I don't see what's wrong.
[11:55] <sorinp> meena: what you pasted is the outcome, not the cause. Why can't it find the ds, as configured in the kernel command line?
[11:56] <sorinp> meena: I can see the NoCloud module is available and its configuration in the kernel command line is correct.
[11:57] <sorinp> meena: Or is there a way to tell ds-identify to not search, but use the kernel commandline configuration?
[12:35] <falcojr> I'm guessing that the quote before ds=nocloud-net is causing issues
[13:06] <sorinp> falcojr: That quote is an opening quote, and its pair, the closing quote is after the URL.
[13:07] <sorinp> falcojr: There is a note in the docs saying: "When supplementing kernel parameters in GRUB’s boot menu take care to single-quote this full value to avoid GRUB interpreting the semi-colon as a reserved word. See: GRUB quoting"
[13:08] <sorinp> falcojr: I know this is not grub, but I think quoting should not interfere with how the configuration is read, should it?
[13:09] <sorinp> falcojr: I will try without quoting, too.
[13:26] <sorinp> falcojr: It looks like it's initializing cloud-init when the kernel parameter is not quoted, but:
[13:26] <sorinp> falcojr: DataSourceNoCloud.py[DEBUG]: Seed from http://boot-server.localdomain/cloud-init/ not supported by DataSourceNoCloud [seed=None][dsmode=net]
[13:27] <sorinp> falcojr: Also, there is this error: cmdline.py[ERROR]: Expected base64 encoded kernel commandline parameter network-config. Ignoring network-config=disable.
[13:35] <sorinp> Ah. It should be 'disabled', not 'disable'.
[13:59] <falcojr> also, I can confirm that the quote does matter. Quoting the value causes it to not be recognized
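The quoting effect discussed above can be illustrated with a minimal sketch. It assumes the bootloader passes the single quotes through literally, so the token in /proc/cmdline begins with `'ds=` rather than `ds=`. The helper `find_ds_token` is hypothetical and uses naive whitespace splitting; it is not cloud-init's actual parser.

```python
# Illustration only: naive whitespace tokenisation of a kernel command line.
# find_ds_token is a hypothetical helper, NOT cloud-init's real parsing code.

def find_ds_token(cmdline: str):
    """Return the first token that starts with 'ds=', or None if there is none."""
    for token in cmdline.split():
        if token.startswith("ds="):
            return token
    return None


unquoted = (
    "initrd=initramfs-linux.img "
    "ds=nocloud-net;s=http://boot-server.localdomain/cloud-init/"
)
# Assumption: the quotes reach the kernel command line unchanged.
quoted = (
    "initrd=initramfs-linux.img "
    "'ds=nocloud-net;s=http://boot-server.localdomain/cloud-init/'"
)

print(find_ds_token(unquoted))  # the ds= token is found
print(find_ds_token(quoted))    # no token starts with "ds=" because of the leading quote
```

Under that assumption, a parser that looks for a token beginning with `ds=` would skip the quoted form entirely, which is consistent with the "No ds found" result in the earlier log.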
[13:59] <falcojr> holmanb: I think this is actually a regression introduced in https://github.com/canonical/cloud-init/pull/2093/files#diff-8e27dfc14b3ffceee1e29d3cf58dc25309c0ac070879febd47c578de04ea2ddd
[13:59] -ubottu:#cloud-init- Pull 2093 in canonical/cloud-init "Standardize NoCloud Command Line Parsing (SC-1457)" [Merged]
[14:00] <minimal> falcojr: in the past I had to quote the "ds=" value as otherwise everything after the semicolon didn't get passed by bootloader to kernel
[14:01] <falcojr> yeah, this is a recent change in behavior
[14:03] <sorinp> falcojr: I looked at the code and in DataSourceNoCloud.py at line 165 it should set seedfound = proto, but it appears that it does not do it and enters in the next 'if' at line 167 and returns False.
[14:04] <sorinp> falcojr: It does this even if the proto is http://, which is defined as a valid proto.
[14:06] <blackboxsw> caribou: sorry for delay on SRU verification and communication there, there were a couple of flaky integration tests I was wrapping up re-verification on before attaching logs. I was going to add those logs today. But, we also need to investigate this ds=nocloud kernel cmdline behavior as this may be a regression that blocks SRU
[14:14] <caribou> blackboxsw: ok, thanks for the info. I'll check on this one to see how bad it looks for us
[14:38] <blackboxsw> caribou: should only affect nocloud datasource we expect
[14:39] <caribou> ok, good to know, thanks
[14:39] <blackboxsw> we're working on a fix now and will see whether we can expedite SRU aging process once we have a fix up
[14:39] <blackboxsw> as it seems all other integration tests are clear
[14:41] <caribou> I might install the one in -proposed and will catch up the rest on -updates depending on how long the new one takes to hit -updates. We're ramping up our new network infra which relies on commits in 23.2
[14:49] <sorinp> falcojr: Do you have any suggestions for me regarding the seedfrom issue?
[14:50] <falcojr> sorinp for now you can remove the quote from the kernel command line. We'll get the issue fixed soon
[14:52] <sorinp> falcojr: I already removed it and passed that milestone. But now it's not recognizing the s=http://boot-server.localdomain/cloud-init/ parameter.
[14:56] <sorinp> falcojr: This is the error: DataSourceNoCloud.py[DEBUG]: Seed from http://boot-server.localdomain/cloud-init/ not supported by DataSourceNoCloud [seed=None][dsmode=net]
[15:24] <holmanb> I don't believe that the kernel commandline thing is a regression.
[15:24] <holmanb> @sorinp we've been trying to improve the docs and code to be more standardized and clearer. If you see any specific improvements / clarifications in the docs I'd love to have your feedback.
[15:26] <holmanb> It's a tricky thing to document, since most of the quote-specific behavior to document is grub-specific, but grub isn't the only way to set kernel commandline for a kernel (other ways: qemu, other bootloaders, etc)
[15:27] <minimal> holman: I don't think the quote issue was grub specific though
[15:27] <minimal> I think I also saw it with Syslinux
[15:28] <holmanb> depends which quote issue you mean
[15:28] <minimal> having to put quotes around a seed "ds=" cmdline parameter to avoid it being truncated
[15:28] <holmanb> single quoting to work around semi-colon meaning the end of the line is definitely a bootloader behavior, not a cloud-init one
[15:29] <holmanb> but a very common mistake so worth documenting
[15:30] <holmanb> minimal: you're right -> not grub-specific, bootloader-specific
[15:36] <blackboxsw> yes we had recently added docs to point out that GRUB needs any semi-colon delimited kernel cmdline to be enclosed within quotes because ; is a reserved character that causes grub to break apart separate commands so it treats that semi-colon as end of value for the kernel param appended. 
[15:36] <blackboxsw> `Note
[15:36] <blackboxsw> When supplementing kernel parameters in GRUB’s boot menu take care to single-quote this full value to avoid GRUB interpreting the semi-colon as a reserved word. See: GRUB quoting
[15:36] <blackboxsw> `
[15:37] <blackboxsw> Maybe we need to make that doc more general than just GRUB? https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html
[15:45] <minimal> well I can retest with Syslinux and Limine to confirm whether or not they behave the same
[15:56] <sorinp> holmanb: With ipxe, the kernel parameter is not passed correctly without quotes, because in cloud-init.log: https://dpaste.org/6oivx/raw
[16:00] <sorinp> holmanb: Actually here is more of that log: https://dpaste.org/e1qfZ/raw
[16:03] <sorinp> holmanb: Even though I do not know how it actually read that parameter and we have this line: 2023-06-26 13:38:06,382 - DataSourceNoCloud.py[DEBUG]: Seed from http://boot-server.localdomain/cloud-init/ not supported by DataSourceNoCloud [seed=None][dsmode=net]
[16:14] <minimal> sorinp: looking at cloud-init.log from an old seed-url VM I have handy I think your problem is the "network-config=disabled" you have - remove that
[16:14] <minimal> you can't fetch the YAML docs from the (base) url without a network connection...
[16:18] <falcojr> minimal: I don't think that's it. For some reason it's getting recognized as NoCloud rather than NoCloudNet
[16:19] <sorinp> minimal: In the case of network booting, the network is already set up. without network-config=disabled, cloud-init is adding a network file in /etc/systemd/network that is causing network to malfunction.
[16:21] <falcojr> Need to double check but I think the regex here needs to include a '-'. https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L1180
[16:21] <minimal> falcojr: I'm comparing his log with mine (admittedly for c-i 23.1.1). My log has basically the *same* contents as his except for the "network config is disabled by cmdline" and then continues onwards to "networking.py[DEBUG]
[16:22] <sorinp> minimal: Can you add network-config=disabled and try again?
[16:23] <minimal> oops, "networking.py[DEBUG]: applying net config names for {'ethernet': {'eth0': {'dhcp4': True, 'set-name': 'eth0', 'match': {'macaddress': '08:00:27:93:98:0d'}}}, 'version': 2}"
[16:24] <sorinp> minimal: network-config is disabling the network configuration functions of cloud-init. That should be an isolated event and everything else should not assume that network is not configured, just because network-config is disabled.
[16:24] <minimal> well I'm just commenting on differences I see between your log and mine
[16:27] <minimal> so it renders eni config (in my case) and finishes cloud-init "init-local", the OS then ifups eth0 using that config, and then in cloud-init "init" stage the seed URLs are accessed
[16:28] <sorinp> falcojr: I agree.
[16:28] <falcojr> the error in the log means that it doesn't like that seed url starts with "http://" which should be valid if we're using the DataSourceNoCloudNet class
[16:29] <falcojr> but we're not, because we failed the ds_detect check here https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceNoCloud.py#L371
[16:29] <falcojr> because of the regex I linked earlier
[16:30] <sorinp> falcojr: Perhaps it should also include a colon: http://boot-server.localdomain:16384/cloud-init
[16:30] <falcojr> it's only trying to match the datasource name
[16:30] <falcojr> but in this case it's "nocloud-net", so because of the hyphen it doesn't match
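The hyphen problem described above can be reproduced with a small sketch. Both patterns below are hypothetical and written only for illustration; they are not copied from cloud-init's source. The first allows only letters and digits in the datasource name, the second also allows a hyphen.

```python
import re

# Hypothetical patterns for illustration; NOT taken from cloud-init.
NAME_WITHOUT_HYPHEN = re.compile(r"(?:^|\s)ds=([A-Za-z0-9]+)(?:;|\s|$)")
NAME_WITH_HYPHEN = re.compile(r"(?:^|\s)ds=([A-Za-z0-9-]+)(?:;|\s|$)")


def ds_name(pattern, cmdline):
    """Return the datasource name captured by pattern, or None if no match."""
    match = pattern.search(cmdline)
    return match.group(1) if match else None


cmdline = (
    "archisobasedir=arch "
    "ds=nocloud-net;s=http://boot-server.localdomain/cloud-init/"
)

# The name must be followed by ';', whitespace, or end of line, so a
# character class without '-' cannot match "nocloud-net" at all.
print(ds_name(NAME_WITHOUT_HYPHEN, cmdline))
print(ds_name(NAME_WITH_HYPHEN, cmdline))
```

With these assumed patterns, only the second one captures `nocloud-net`; the first returns no match rather than a truncated name, because the hyphen is neither part of the name class nor an accepted terminator.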
[16:30] <sorinp> ah.
[16:31] <sorinp> falcojr: Are there any regexes applied on the URL, besides looking at proto?
[16:32] <falcojr> sorinp: not that I know of, but I'd have to do more checking
[16:32] <minimal> yeah I noticed in your logfile the line "Looking for data source in: ['nocloud-net', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']"
[16:32] <minimal> and wonder what the DS was called "nocloud-init" and not "NoCloud"
[16:33] <minimal> s/wonder what/wondered why/
[16:33] <minimal> as that's the official DS name
[16:33] <sorinp> s/nocloud-init/nocloud-net/
[16:34] <minimal> in the c-i 23.1.1. VM I have handy that same line refers to the ds as "NoCloud" even though I'm using nocloud-net
[16:34] <minimal> so it's a change in behaviour
[16:35] <falcojr> yeah, we had some changes here in the past few months, so I'm still trying to tease out what changed and where things were happening before
[16:38] <sorinp> falcojr: perhaps a git diff with a working version could reveal where to look...
[16:38] <falcojr> yeah, already going down the git rabbit hole
[16:38] <minimal> sorinp: tried adding the "network-config=disabled" entry and rebooting - it does behave a little differently, do see the "main.py[DEBUG]: [local] Exiting without datasource" line but then it jumps to cloud-init mode 'init' and continues on
[16:39] <sorinp> minimal: I could upload the full log, but it is tedious, because it has many pages and dpaste does not support file uploads.
[16:43] <minimal> rebooted again without "network-config=disabled" and *again* it sets up the DHCP fallback config (which then c-i "init" mode uses to retrieve URLs)
[16:43] <minimal> so "network-config=disabled" *IS* (at least in c-i 23.1.1) *directly* affecting whether fallback network config is created or not (which it needs to access URLs)
[16:44] <falcojr> holmanb: can you add any context here?
[16:45] <minimal> falcojr / holmanb: you probably remember I highlighted several months ago that nocloud-net seed is NOT using ephemeral DHCP but rather using fallback config
[16:46] <sorinp> Is this the dependency setting? https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L42
[16:46] <blackboxsw> sorinp: sorry I missed original context here, what distribution release is this failing on?
[16:46] <blackboxsw> archlinux using tip of cloud-init?
[16:46] <sorinp> blackboxsw: It's archlinux.
[16:46] <blackboxsw> and what cloud-init version or commit is installed there?
[16:47] <sorinp> blackboxsw: I think so. cloud-init is installed in the installation media, where I'm trying to use it on.
[16:47] <blackboxsw> `cloud-init --version` may help
[16:47] <sorinp> Let me boot up and have a look.
[16:47] <blackboxsw> thxthx
[16:48] <blackboxsw> sry also re-reading minimal's conversation on this too.
[16:48] <holmanb> falcojr: the hyphen thing you pointed out sounds right. still digging
[16:49] <minimal> blackboxsw: from his logfile he's using 23.2
[16:49] <falcojr> I think there's two different problems we're talking about right now. I think minimal is right that it won't work if you have network disabled. I consider that a configuration issue. Before that, the hyphen issue happens and appears to be a cloud-init bug.
[16:51] <sorinp> falcojr: I cannot use network-config because it messes up the network configuration and there's no point in using cloud-config if I have to manually have access to the machine.
[16:52] <sorinp> blackboxsw: version = 23.2
[16:53] <falcojr> sorinp: but is it a problem letting cloud-init configure the network? You don't need to pass network configuration to let cloud-init bring up a fallback interface
[16:53] <minimal> sorinp: if ephemeral dhcp was used for seed then I don't think the "network-config: disabled" would be an issue
[16:54] <sorinp> falcojr: Why should cloud-init configure the network if something else is?
[16:54] <minimal> falcojr: I think his problem is that the fallback config is *rendered* to whichever network config mechanism his distro uses
[16:55] <minimal> it is not just brought up without touching network config files like ephemeral dhcp is
[16:56] <minimal> sorinp: the seed stuff happens early in boot before the OS brings up networking so cloud-init needs a network connection in order to be able to fetch seed URLs
[16:56] <blackboxsw> thx minimal sorinp on version
[16:56] <sorinp> falcojr: I think network configuration should be optional and other modules should not depend on cloud-init having the network configuration modules enabled.
[16:56] <minimal> in a similar fashion to cloud-init accessing Cloud Providers metadata servers in order to retrieve network & user-data config from them
[16:57] <minimal> sorinp: how do you expect c-i to fetch content from seed "http" urls without an active network connection?
[16:58] <sorinp> minimal: I disabled the network and I can ssh into the machine. How can I ssh into the machine if network is not working?!
[16:59] <sorinp> minimal: Correction: I disabled the network-config of cloud-init.
[16:59] <minimal> you ssh'ed into the machine during the early stage of boot where cloud-init's "init-local" ran at? nope, you ssh'ed in once the machine finished booting
[17:00] <minimal> there are 4 parts to cloud-init that run at different stages of booting, "init-local" runs very early, before networking and ssh
[17:00] <sorinp> minimal: Yes. But if I enable network-config, I will not be able to even connect the machine to the network.
[17:01] <sorinp> minimal: cloud-init is creating a file with 10- in front of it, while the distro has all files with 20-.
[17:02] <minimal> sorinp: I'm not disagreeing with you, I'm simply pointing out that in order for cloud-init to retrieve seed URLs it needs an active network connection to be in place and as there is no active network connection when it checks (as the OS has not yet brought up a connection) then cloud-init decides to configure one
[17:02] <sorinp> minimal: I only need one thing from cloud-init to do: setup a ssh key for the root user.
[17:03] <sorinp> minimal: The network was brought up by the bootloader (ipxe) and inherited by the OS before runlevel 3.
[17:03] <minimal> sorinp: nope, you need it to retrieve YAML docs from the seed urls as well in order for it to obtain things like the ssh key
[17:05] <sorinp> minimal: correct. But I only need cloud-init to set up the ssh key. Then I can connect with another tool and set up the machine.
[17:06] <sorinp> minimal: What I'm booting up is an installation medium downloaded from the vendor. This installation medium relies on cloud-init to configure ssh for remote access when a headless machine is installed over ssh.
[17:07] <sorinp> However, when cloud-init is initializing the network, it takes down the connection and the machine is disconnected from the network.
[17:10] <sorinp> The idea is to have a machine that has no OS installed. It is connected to the network and powered up. This machine does not have an ILO or DRAC or any other remote KVM option. It gets its network configuration from the DHCP server and boots up the installation medium from the network.
[17:11] <minimal> sorinp: I understand that PXE booting typically hands over its network connection to an OS. However I suspect that cloud-init is not checking for any existing connection in the "init-local" stage
[17:11] <minimal> I believe cloud-init's seed stuff is not used by many people and I've found some issues with it myself in the past
[17:12] <sorinp> Then it announces itself on mDNS as archiso.local and, with the cloud-init configuration it gets from the boot arguments, it sets up a ssh key for root and it's waiting for another tool to connect (ansible in this case) to set it up.
[17:13] <minimal> you're talking about how you *want/expect* cloud-init to behave, I'm reflecting on how cloud-init currently appears to behave
[17:15] <sorinp> minimal: Yes. :) I was expecting that if I disable the network-config, systemd is launching cloud-init and has the network as After= in its service file when the network is already configured.
[17:16] <sorinp> minimal: What you are saying is that the cloud-init service is an early service.
[17:17] <minimal> what I have said several times is that the cloud-init "init-local" is an early service
[17:17] <minimal> there are typically 4 different services for different parts of cloud-init
[17:18] <sorinp> I can see 3 cloud-init services on arch installation medium: cloud-init-local, cloud-init and cloud-init-hotplug.
[17:19] <sorinp> *cloud-init-hotplugd
[17:19] <minimal> I'm using non-systemd so I have cloud-config, cloud-final, cloud-init, cloud-init-hotplugd, and cloud-init-local
[17:21] <sorinp> I'm starting to think it will be impossible for me to have the setup I described above...
[17:22] <minimal> well I'm intending to (eventually) have a similar setup myself in the future.
[17:22] <minimal> I don't think it is impossible, just not currently catered for
[17:25] <sorinp> I will write a wiki on wiki.archlinux.org that will detail how to set up the system like I did.
[17:26] <sorinp> But I wanted to have this piece working first, otherwise it is kinda pointless. One can use a USB stick that is working with cloud-init properly, but needs physical access.
[17:26] <sorinp> Or "remote hands".
[17:27] <sorinp> I already patched archiso for mDNS.
[17:28] <minimal> another option would be a network boot to run an "installer" to "dd" a prepared disk image onto the machine's HDD/SSD, where NoCloud is passed config via either an ISO or a FAT partition in the disk image
[17:28] <minimal> I do that for physical machines
[17:30] <sorinp> minimal: I could also patch the installation medium to download cloud-init.img or cloud-init.iso from the boot-server and loop mount it in the boot-up process, but having this nocloud-net option looked like everything is in place already.
[17:32] <minimal> or pass "ip=" or "network-config=" on cmdline to give it network config info? not sure if that works with nocloud-net
[17:33] <sorinp> minimal: For that network interface one needs to set up mDNS for the whole system to work.
[17:33] <sorinp> minimal: I don't want static IP configuration.
[17:34] <sorinp> minimal: It has to work on IPv6 networks where people won't remember the IPv6 address.
[17:34] <minimal> I didn't mention static
[17:38] <minimal> you can pass full (Base64-encoded) Network-config v1 or v2 settings via network-config=
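A minimal sketch of producing such a Base64-encoded `network-config=` value follows. The YAML document is an assumed example of a version 2 network config (the interface name `eth0` and the DHCP settings are illustrative, not taken from this conversation), and `encode_network_config` is a hypothetical helper.

```python
import base64

# Assumed example of a network-config v2 document; "eth0" is illustrative.
NETWORK_CONFIG = """\
version: 2
ethernets:
  eth0:
    dhcp4: true
    dhcp6: true
"""


def encode_network_config(yaml_text: str) -> str:
    """Base64-encode a network-config YAML document for the kernel cmdline."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


# The resulting string is what would be appended to the kernel command line.
param = "network-config=" + encode_network_config(NETWORK_CONFIG)
print(param)
```

Standard Base64 output contains no spaces or semicolons, so the value should not need bootloader quoting for those characters; whether a given bootloader imposes a length limit on the command line is a separate question.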
[17:41] <sorinp> minimal: So basically ds=nocloud-net does not work at all, unless you pass network-config?
[17:41] <minimal> sorinp: no, I didn't say that. I have nocloud-net working myself
[17:44] <sorinp> minimal: Ok, but perhaps the default network-config is working for you, no?
[17:45] <sorinp> Because from what I understand network-config=disabled is not the default.
[17:52] <minimal> the default/fallback config uses DHCPv4, so obviously if your environment is an IPv6-only one then that won't work for you
[18:06] <sorinp> Should the yaml start with '---'?
[18:07] <sorinp> I'm trying to create a network configuration and encode it.
[18:07] <sorinp> I'll be missing for 30 - 60 minutes.
[19:08] <meena> sorinp: yes, the YAML needs to be valid YAML 
[20:22] <sorinp> I have one more question about NoCloudNet: does it actively check that network-config is not disabled and refuses to use the network if it is?
[20:57] <blackboxsw> sorinp: I don't think that's the case, it's just a bug we had in detection of `nocloud-net` I believe. We're working on verifying the fix right now to make sure this works. Branch in progress is here. We should have it up for review today. https://github.com/blackboxsw/cloud-init/tree/fix-nocloud-net-detect-from-cmdline
[20:58] <blackboxsw> that bug affects only the 23.2 release, not 23.1.2... so sure enough, we will probably push a point release 23.2.1 for this regression and push it into cloud-init Ubuntu SRU 
[21:02] <sorinp> blackboxsw: That sounds awesome! If you already have a fix, it's brilliant!
[21:03] <sorinp> I can let the archiso guys know that the new cloud-init version has a fix and they can release a fixed installation medium.
[21:08] <meena> sorinp: given the set_trace here https://github.com/blackboxsw/cloud-init/commit/78b42af8d74c3d8f8c1ba2134ad77a08edf4b506 this wip is very wip
[21:08] -ubottu:#cloud-init- Commit 78b42af in blackboxsw/cloud-init "wip"
[21:12] <sorinp> meena: This is quite a change...
[21:13] <sorinp> The regexes look better. :)
[21:46] <effendy[m]> Hello. Is bootcmd supposed to run just once like runcmd? Is that what the documentation implies? I find it a little bit ambiguous. In my experience, bootcmd runs after every boot (at least with the Ubuntu cloud images).
[21:48] <effendy[m]> Ah, never mind, it's not ambiguous. It says clearly that "bootcmd will run on every boot" :)
[23:04] <blackboxsw> ok https://github.com/canonical/cloud-init/pull/4204 pushed for review related to the bug discussed today. I created #4203 github issue for tracking
[23:04] -ubottu:#cloud-init- Pull 4204 in canonical/cloud-init "nocloud: parse_cmdline no longer detects nocloud-net datasource" [Open]