[14:02] <falcojr> blackboxsw: PRs are up for the rest of the ubuntu series
[14:07] <flaf> Hi @all. I'm using cloud-init on Red Hat 8 with metadata where the only interface (ens192) is renamed to eth0. Cloud-init creates a classic udev rule in /etc/udev/rules.d/70-persistent-net.rules. OK. But even if I delete the rendered file /etc/udev/rules.d/70-persistent-net.rules, after a reboot the interface is still renamed to eth0, as if cloud-init renamed the interface without using the udev rule.
[14:08] <flaf> If I delete the rendered file /etc/udev/rules.d/70-persistent-net.rules and disable cloud-init, the interface is not renamed and keeps its default name ens192.
[14:10] <flaf> So I imagine that cloud-init renames the interface at every reboot without using the udev rule file. Is that the expected behaviour and, if so, why? I would like to understand. Thx for your help.
[14:13] <rbasak> Are you sure it's not ifnames or biosdevname that's doing this? I'm not familiar with modern RH specifically, but I thought it might be helpful to mention that there are also a bunch of other mechanisms for NIC renaming that exist generally. Or are you sure it's cloud-init doing this?
[14:19] <flaf> rbasak: maybe, but if I rm the file /etc/udev/rules.d/70-persistent-net.rules and reboot, the interface is still renamed to eth0; and if I "touch /etc/cloud/cloud-init.disabled" and reboot, then the interface keeps its default name ens192. Then if I rm /etc/cloud/cloud-init.disabled and reboot, the interface is renamed to eth0 again, etc. So I have the feeling that cloud-init renames
[14:19] <flaf> the interface on the fly during each boot. I'm not saying it's a bug; maybe it's expected behaviour... I would like to understand.
[14:22] <falcojr> flaf: what cloud? cloud-init will apply whatever network config is fed to it, but will also generate a fallback networking config if none is fed to it...though that shouldn't involve renaming anything
[14:23] <flaf> falcojr: the VM is a VMware/vSAN VM and I use the "VMware" datasource with this metadata:
[14:24] <flaf> http://paste.alacon.org/47370 <= my metadata
[14:27] <falcojr> set-name: "eth0" ?
[14:27] <flaf> falcojr: Is it incorrect?
[14:28] <falcojr> flaf: that's why your interface name is being set to eth0
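(For context: a v2 network-config like the one being discussed would contain a set-name stanza along these lines. This is a hedged sketch, not the actual paste content; only the MAC address, which appears later in this log, and the eth0 name are taken from the conversation — the addressing details are assumptions:)

```yaml
version: 2
ethernets:
  eth0:
    match:
      macaddress: "00:50:56:a4:ee:a3"
    set-name: "eth0"    # this is what drives the rename to eth0
    dhcp4: true         # assumed; addressing is not shown in the log
```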
[14:28] <flaf> With this metadata I have these 2 rendered files http://paste.alacon.org/47371
[14:28] <flaf> falcojr: ^^
[14:30] <flaf> falcojr: yes indeed. But that is not exactly my question. My question is: why, after “rm /etc/udev/rules.d/70-persistent-net.rules && reboot”, is the interface still renamed to eth0?
[14:31] <flaf> falcojr: I admit it's strange to rename the interface to eth0 in the metadata and then rm the udev rule, but it's just a test to try to understand the mechanism.
[14:31] <falcojr> flaf: the udev rule specifies what to do when the device is added to an already running system. This is separate from the network configuration that gets applied on boot
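(For context: a persistent-net rule like the one cloud-init renders on RHEL-family systems matches a NIC by MAC address and pins its name. The rule below is a hedged sketch using the MAC that appears later in this log, not the exact contents of the paste:)

```
# /etc/udev/rules.d/70-persistent-net.rules (illustrative sketch)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:a4:ee:a3", NAME="eth0"
```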
[14:34] <flaf> falcojr: ah... but in this case, where is the location of the conf file which allows cloud-init to know that the interface must be renamed at each reboot?
[14:37] <flaf> falcojr: it's strange; before using cloud-init, I already renamed interfaces via a file /etc/udev/rules.d/70-persistent-net.rules and a reboot. In fact, that was how I used to rename interfaces before cloud-init. Something in your answer is unclear to me.
[14:44] <falcojr> sorry, I meant that in the context of what cloud-init does. udev rules do get applied on boot, but if you're using cloud-init, we use other network configuration
[14:45] <falcojr> flaf: re: location of conf file...there's not a single configuration file for that. It depends on whether cloud-init detects this boot as having a new instance or whether the datasource specifies to re-fetch metadata on each boot
[14:46] <flaf> falcojr: in other words, for me the udev rule file is enough to ensure the interface renaming at each boot; I don't understand why cloud-init uses another mechanism, nor what this mechanism is.
[14:47] <flaf> and why, in this case, write a udev rule file if it's ultimately useless.
[14:48] <flaf> cloud-init creates a udev rule file but this file is not actually necessary. It's curious.
[14:58] <falcojr> flaf: sorry, I am not explaining this well. cloud-init will take the network configuration from the metadata and then write out network configuration that is specific to the distro it is working on. For some distros, this includes a udev file. But this is all based on the higher level network config...the thing you posted as the metadata in one of
[14:58] <falcojr> your first paste links. It may be true that with sysconfig, a udev rule is no longer necessary. I don't know the details of how rhel sets up networking very well. Did you check that the udev rule isn't getting regenerated on boot?
[15:00] <flaf> falcojr: the udev rule file is generated by cloud-init after the first boot (or if the cloud-init cache has been cleaned of course). This is, according to my test, the only moment where the udev rule file is generated.
[15:05] <falcojr> flaf: then in that case, I also find it odd that disabling cloud-init changes the network name on a subsequent boot :)
[15:05] <flaf> falcojr: Ok, we can imagine that the udev rule file is unnecessary, but which mechanism does cloud-init use to rename the interface? I tried to find out without success. Red Hat 8 uses systemd; does cloud-init use a systemd generator or something like that?
[15:08] <falcojr> cloud-init does have a systemd generator, but only to determine if cloud-init should run
[15:20] <flaf> falcojr: thx for your help. At this stage of my understanding of my initial question, I would say roughly that 1) cloud-init manages to rename the interface according to my metadata (I imagine via python code which creates on-the-fly udev rules in the /run/ directory during each boot) and 2) in addition cloud-init puts a udev rule file (in the filesystem) so that the interface renaming
[15:20] <flaf> persists even when cloud-init is disabled (or even uninstalled from the OS), which might make sense.
[15:24] <minimal> flaf: on every boot cloud-init looks at the names of the network interfaces and at the network config from the provided "metadata" to see if any interfaces need renaming. This can be seen in /var/log/cloud-init.log if debug is enabled
[15:25] <minimal> e.g. you might see a line "__init__.py[DEBUG]: no interfaces to rename"
[15:30] <flaf> minimal: falcojr: the thing that bothers me is that I thought the role of cloud-init was limited to writing configuration at boot (network config or otherwise) specific to the distribution, and then letting the OS apply these configurations itself at boot (via the usual boot mechanism of the distribution). I didn't think cloud-init changed the state of the OS other than like
[15:30] <flaf> this.
[15:32] <minimal> flaf: some cloud-init modules may run only on 1st boot but some other c-i modules can run on every boot
[15:33] <flaf> I have difficulty understanding why cloud-init needs to do anything other than "write config and let the OS boot with this conf".
[15:34] <minimal> flaf: it does - it writes the network config (eni, netplan, etc. depending on distro) and it writes the udev persistent netdev rules which the OS then uses
[15:35] <minimal> your c-i network config tells c-i to rename the ethernet interface to eth0, and so c-i writes config files to ensure this happens - but you are not happy that it writes the config files to achieve this?
[15:37] <flaf> minimal: but above (yes it's a little long, sorry) I have a simple example which seems to indicate that cloud-init renames the interface _without_ using the udev rule file it created itself.
[15:38] <minimal> flaf: c-i itself does not use the udev rule, it creates the udev rule
[15:39] <flaf> minimal: no, that's not my question. Let me explain.
[15:40] <flaf> minimal: here is my metadata http://paste.alacon.org/47370 and here are the 2 rendered conf files generated by c-i http://paste.alacon.org/47371 (a network conf file and a udev rule file).
[15:41] <flaf> minimal: my question: if I make this test: “rm /etc/udev/rules.d/70-persistent-net.rules && reboot”, the interface is still renamed to eth0, why?
[15:43] <flaf> minimal: however, the udev rule file is not present after reboot. So the udev rule file is gone but the interface is still renamed, why?
[15:44] <minimal> flaf: as I pointed out earlier, if you enable debugging for c-i (maybe it already is) and look at /var/log/cloud-init.log, you will see c-i deciding whether any network interfaces need to be renamed based on the metadata you provided
[15:45] <minimal> on *EVERY* boot c-i checks if the expected network interfaces ("eth0" in your case) exists or if another interface needs to be renamed to "eth0"
[15:46] <flaf> minimal: and c-i renames the interface if needed, even if the udev rule file is not present?
[15:46] <minimal> it is still renamed because the metadata you provided to c-i told c-i that it should be renamed
[15:47] <minimal> flaf: as I told you earlier, c-i does not use udev rule files at all - udev uses udev rule files
[15:49] <flaf> minimal: so, in this specific case, c-i changes the OS state without using the process "1) write conf files and 2) let the OS boot normally and use these conf files", correct?
[15:50] <minimal> flaf: have you looked at /var/log/cloud-init.log with debug enabled to see what is happening?
[15:51] <flaf> minimal: and my second question: in this case, why does c-i create a udev rule file if it's ultimately unnecessary?
[15:53] <minimal> why don't you look at the log as explained to see what is actually happening? that should answer your questions for your specific situation and Linux distro
[15:53] <flaf> minimal: I'm seeing that now...
[15:58] <flaf> minimal: here is my log https://gist.github.com/flaf/5858afbdc0f6441890768ccc976f7620
[16:00] <minimal> flaf: so you see this line: 2021-11-03 14:59:08,129 - __init__.py[DEBUG]: no work necessary for renaming of [['00:50:56:a4:ee:a3', 'eth0', 'vmxnet3', '0x07b0']]
[16:02] <flaf> minimal: yes but I'm not sure to understand.
[16:02] <flaf> minimal: I can also see that cloud-init runs a plain "ip link set" command to rename the interface.
[16:04] <flaf> minimal: so why sometimes “no work necessary for renaming of [['00:50:56:a4:ee:a3', 'eth0', 'vmxnet3', '0x07b0']]” and sometimes “Running command ['ip', 'link', 'set', 'ens192', 'name', 'eth0'] with allowed return codes [0] (shell=False, capture=True)”?
[16:10] <flaf> minimal: ah, maybe this is the explanation: at the first boot, cloud-init uses “ip link set” to rename the interface to eth0 because its default name is ens192. But at the next boot, the udev rule has already renamed the interface to eth0, so I get the line “no work necessary for renaming ...”. Is that correct?
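(Editor's note: the per-boot decision described here can be sketched as a small function. This is a simplified illustration of the idea, not cloud-init's actual code; the function name and data shapes are invented for the example:)

```python
def needed_renames(current, expected):
    """Return (old_name, new_name) ops for interfaces whose name
    does not match what the network metadata asks for.

    current:  list of (mac, name) tuples found on this boot
    expected: dict of mac -> desired name from the metadata
    """
    by_mac = dict(current)
    ops = []
    for mac, want in expected.items():
        have = by_mac.get(mac)
        if have is not None and have != want:
            ops.append((have, want))  # would become: ip link set <have> name <want>
    return ops

# First boot: the kernel named the NIC ens192 -> one rename is needed.
print(needed_renames([("00:50:56:a4:ee:a3", "ens192")],
                     {"00:50:56:a4:ee:a3": "eth0"}))   # → [('ens192', 'eth0')]

# Subsequent boots: the udev rule already renamed it -> "no work necessary".
print(needed_renames([("00:50:56:a4:ee:a3", "eth0")],
                     {"00:50:56:a4:ee:a3": "eth0"}))   # → []
```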
[16:10] <minimal> the ens192 lines I assume are from the 1st boot
[16:11] <minimal> do you have logs from when you deleted udev rules file and rebooted?
[16:12] <flaf> minimal: no but I can make the test now...
[16:15] <minimal> that would then answer your questions
[16:18] <flaf> minimal: https://gist.github.com/flaf/5858afbdc0f6441890768ccc976f7620#gistcomment-3949882 => boot starts at 2021-11-03 16:13:47,353
[16:18] <flaf> this is unclear for me.
[16:25] <flaf> minimal: the other point for me is: why does cloud-init create a udev rule file when it already performs a rename at each boot via "ip link set"?
[16:43] <minimal> flaf: my understanding is that during 1st boot c-i renames the interface AND creates the udev file so that during subsequent boots udev will rename the interface so that c-i does not need to do so
[16:45] <flaf> minimal: ok. it's my understanding too.
[16:49] <flaf> minimal: c-i renaming the interface at each boot *and* creating the udev file seems redundant to me, but why not. I notice that interface naming is a special case where the process is not only "a) write conf files and b) normal boot": c-i can change some settings without using conf files (via "ip link set").
[16:49] <minimal> c-i renaming does not happen at each boot
[16:50] <minimal> as the creation of the udev file on 1st boot avoids this
[16:51] <flaf> minimal: yes, sorry, that was an approximation: c-i renaming happens at each boot *if required*.
[16:51] <minimal> flaf: yes because for some DataSources it is possible that the network-config can change between boots
[16:54] <flaf> minimal: Ah ok, it's clearer now. Thx for your help.
[16:55] <minimal> flaf: c-i is working as designed
[16:55] <flaf> minimal: yes indeed, and I have understood lot things now. Many thx minimal.
[17:06] <flaf> falcojr: minimal: big thx for your help (and sorry for my poor english). Now I'll take some notes of all this info. Bye. :)
[20:54] <akutz> Does anyone here have any guidance on how to build cloud-init 21.3 for CentOS 8 Stream?
[21:07] <blackboxsw> akutz: ./tools/run-container centos/8 --package   # from cloud-init repo uses LXD to launch centos 8 container and build a cloud-init RPM using the spec template in packages/redhat/cloud-init.spec.in I think
[21:08] <blackboxsw> we use it for our copr build jenkins publisher, which also builds centos/8 packages for testing/development from tip of main  @ https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/
[21:09] <akutz> Grazzie!
[21:13] <akutz> blackboxsw: There isn't a single successful 21.3 build. The last successful build was for 21.2, and now y'all are on 21.4. Should there not have been a successful 21.3 build?
[21:13] <akutz> I'm looking at https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/package/cloud-init/
[21:13] <akutz> (under packages)
[21:13] <akutz> Then I clicked on "cloud-init"
[21:15] <blackboxsw> akutz: I think the "failed" status is a global there due to failures on ppc64 build envs. if I look into each build https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/build/2918713/  I see success for epel-8 based RPMs. checking details in a centos/8 container to see if I can find a recently built cloud-init from the copr repo
[21:15] <akutz> Ahh. Oops, sorry to be an alarmist.
[21:16] <akutz> Do you know which one was the official build by chance? I guess it's not there since this is a dev channel after all.
[21:18] <akutz> Either way, thank you again sir!
[21:19] <blackboxsw> yeah: "(182/182): cloud-init-21.4-1.el8.noarch.rpm     349 kB/s | 1.1 MB     00:03    " on my `yum update` in a container when adding this text to /etc/yum.repos.d/CentOS-cloudinit.repo    https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/repo/epel-8/group_cloud-init-cloud-init-dev-epel-8.repo
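(For context: the downloaded .repo file is a standard yum/dnf repo stanza roughly like the sketch below; the section name and baseurl shown are illustrative of copr's usual layout, not verified — the authoritative content comes from the URL above:)

```
[copr:copr.fedorainfracloud.org:group_cloud-init:cloud-init-dev]
name=Copr repo for cloud-init-dev owned by @cloud-init
baseurl=https://download.copr.fedorainfracloud.org/results/@cloud-init/cloud-init-dev/epel-8-$basearch/
enabled=1
gpgcheck=1
```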
[21:20] <blackboxsw> and no worries, we probably should drop the EL-7 ppc64le builders anyway to avoid triggering the red/green failure alarms
[21:22] <blackboxsw> akutz: we are going through an upstream release 21.4 of cloud-init at the moment and we should publish those "more stable" builds to https://copr.fedorainfracloud.org/coprs/g/cloud-init/el-testing/
[21:22] <blackboxsw> actually looks like falcojr already has thanks James!
[21:23] <blackboxsw> akutz: so 21.4 is latest release, and we should only push updates to the el-testing repo when we cut upstream releases and SRU that content back to earlier versions of Ubuntu
[21:23] <blackboxsw> that way consumers don't have as much churn/risk of daily breakage
[21:23] <akutz> Ack, thanks!
[21:23] <akutz> 21.4 already, fantastic. Y'all rock!
[21:24]  * blackboxsw tips the hat to falcojr :)
[21:25] <blackboxsw> akutz: note we are starting to get other PRs that add dependencies on the netifaces python module. we may need to chat with you / discuss on the mailing list over the coming weeks about whether we can garner enough support in cloudinit.net modules to close the gaps you mentioned the VMware datasource needed support for,
[21:25] <blackboxsw> if we are to mainline that "missing" functionality in existing cloud-init network modules/functions.
[21:27] <blackboxsw> https://github.com/canonical/cloud-init/pull/1079 adds another datasource importing python's netifaces module, which I think you mentioned VMware was using due to some failures in cloud-init on MacBooks or OSX, I can't quite recall
[21:28] <blackboxsw> I'll go back to the original VMware PR to get more context. But we'll probably have a couple of questions if we try to drop the python netifaces dependency long-term
[21:43] <akutz> blackboxsw: Are you indicating this is a bad thing or just something you'd rather avoid? I'm happy to readdress this. Maybe we should set up a Zoom between a few of us to review the reasons I put it there in the first place and we can determine if there's a safe way off of it.
[21:45] <blackboxsw> akutz: I think from cloud-init's perspective the fewer additional python module dependencies the better, to avoid image bloat due to pulling in lots of python package dependencies. I don't think it's too "bad" in this case, but I think we might be able to solve the platform/environment use-cases that you mentioned as gaps when we introduced the VMware datasource which imported netifaces.
[21:45] <blackboxsw> If we *can* solve that in cloud-init proper without additional package dependency, that's a small win in my mind.
[21:46] <blackboxsw> yes I think it's worth us enumerating the known gaps to see if we can't tackle them before our 22.1 release early next year. Then we can avoid adding more datasource consumers of netifaces if unnecessary.
[21:47] <blackboxsw> I think it's worth us starting an email thread on this to see if there are interested parties and we can carve out investigatory work to see if this is possible/valuable.
[21:49] <akutz> Sure thing. I'm akutz at vmware dot com if you want to kick something off and CC me.
[21:49] <blackboxsw> will talk w/ falcojr tomorrow and someone will reflect this to cloud-init@lists.launchpad.net too
[21:49] <blackboxsw> +1 will do
[21:49] <akutz> I did not really consider how opening pandora's box would cause *other* people to start depending on this. I just thought it would be an AI for me to remove my own dep at some point. Thank you for letting me know!
[21:52] <blackboxsw> no worries/blame. it was a risk once it was included. But we can sort out our trajectory and see if it's reasonable feedback on the new Xensource datasource too, to avoid additional dependencies there
[22:15] <blackboxsw> falcojr: for SRU review of 21.4, I want to wait until we at least have LXD images present in image streams for Jammy, to make sure our LXD handling actually works as intended. currently I see that image builds are stuck at Nov 1st, which was before our upload.
[22:15] <blackboxsw> so, I'll approve but let's not merge until we have a successful Jammy test run with released images containing the LXD datasource