/srv/irclogs.ubuntu.com/2023/08/16/#cloud-init.txt

meena	someone wanna do some editing to make this a bit more readable? https://github.com/canonical/cloud-init/issues/4043#issuecomment-1680103576	07:56
-ubottu:#cloud-init- Issue 4043 in canonical/cloud-init "Don't Break On Duplicate Mac Addresses" [Open]		07:56
=== randomvariable__ is now known as randomvariable_
=== mitchdz_ is now known as mitchdz
meena	would changing the tests that currently use Bash, even tho they don't need it, to use sh, speed up the test suite?	11:23
ShaneAH	Hi All, I'm still working on building a golden image that mounts var partitions and I'm getting very close. I am trying to mount the partitions very early on in the packer process so I mount everything to an alt_var run a tar command to copy existing var folders to alt_var and then I'm using sed to change fstab from mounting alt_var to mounting var	14:49
ShaneAH	and then reboot the packer VM to continue the configuration process. The problem I'm currently running into is that when I reboot the VM cloud-init is running and removing my changes to fstab and re-applying the /alt_var mountpoint. My understanding is that cc_mounts.py is per instance, how does cloud-init determine if the instance scripts have	14:49
ShaneAH	already been executed?	14:49
minimal	ShaneAH: by the instance-id indicated in the meta-data	14:51
minimal	if a previous instance-id has been recorded and then a new one is provided then obviously it is a change in instance-id and all per-instance modules need to be run	14:52
minimal	as I indicated to you previously, when building any "golden" image cloud-init should be cleaned at the end of the process so that upon boot of any VM created using the golden image there is no instance data from any previous cloud-init runs present and cloud-init then runs per-instance modules	14:54
ShaneAH	is that the guid recorded here? /var/lib/cloud/instances/8457ac95-cbe8-474b-a1ae-6185007a12c7	14:54
ShaneAH	understood but I am in the middle of building that image. I do have cleanup once the image is complete.	14:54
minimal	have you cleaned up cloud-init at the end of your golden image creation?	14:55
ShaneAH	"sudo cloud-init clean --logs --machine-id --seed",	14:55
ShaneAH	"sudo rm -rf /var/lib/cloud/",	14:55
ShaneAH	"sudo systemctl stop walinuxagent.service",	14:55
ShaneAH	"sudo rm -rf /var/lib/waagent/",	14:55
ShaneAH	"sudo rm -f /var/log/waagent.log",	14:55
ShaneAH	"sudo rm -f /var/lib/systemd/random-seed",	14:55
ShaneAH	"sudo rm -f /var/lib/systemd/credential.secret"	14:55
minimal	ok, this is happening during creation?	14:55
ShaneAH	correct.	14:55
ShaneAH	I'm trying to get /var on a mount point in the middle of my golden image build.	14:55
minimal	so then I assume the instance-id will not have changed and per-instance modules won't be run	14:55
ShaneAH	One would think not but I'm trying to confirm. I'm running a packer -debug session trying to capture things before and after the boot. I'm just uncertain what defines a new instance.	14:56
minimal	have you looked at the cloud-init logfile?	14:56
ShaneAH	Yep, and I see cc_mounts.py running twice. Once on initial boot of the packer VM and then again after rebooting.	14:57
minimal	how are you providing metadata/user-data/network-config during the Packer run?	14:57
minimal	the logfile will show whether cloud-init determines if the instance-id has changed or remains the same	14:58
ShaneAH	2023-08-16 14:14:48,138 - stages.py[DEBUG]: previous iid found to be NO_PREVIOUS_INSTANCE_ID	15:01
ShaneAH	2023-08-16 14:14:50,113 - stages.py[DEBUG]: previous iid found to be 8457ac95-cbe8-474b-a1ae-6185007a12c7	15:01
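The two stages.py lines above are what show whether cloud-init found a prior instance-id on each boot. A minimal sketch, assuming the log keeps the "previous iid found to be" wording quoted above, that pulls those values out of a log in order. The function name `previous_iids` is hypothetical and not part of cloud-init:

```python
import re

# Matches the stages.py debug line quoted in the log above.
_PREV_IID = re.compile(r"previous iid found to be (\S+)")


def previous_iids(log_text):
    """Return the 'previous iid' value from each matching log line, in order."""
    found = []
    for line in log_text.splitlines():
        match = _PREV_IID.search(line)
        if match:
            found.append(match.group(1))
    return found
```

Run against the contents of /var/log/cloud-init.log, one entry should appear per boot; a first entry of NO_PREVIOUS_INSTANCE_ID followed by the same UUID on later boots would match what ShaneAH reports here.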
ShaneAH	I'm using the custom_data_file in the azure-arm source to provide a file to cloud-init.	15:01
minimal	a user_data file?	15:01
minimal	so Azure is providing the meta-data?	15:02
ShaneAH	Hmm...	15:02
ShaneAH	yes, I believe the answer is yes.	15:02
minimal	in which case they decide whether to provide the same instance-id or not across reboots	15:03
ShaneAH	cloud-init query --all right?	15:03
ShaneAH	drwxr-xr-x 2 root root 4096 Aug 16 14:19 data	15:04
ShaneAH	drwxr-xr-x 2 root root 4096 Aug 16 14:14 handlers	15:04
ShaneAH	lrwxrwxrwx 1 root root 61 Aug 16 14:19 instance -> /var/lib/cloud/instances/8457ac95-cbe8-474b-a1ae-6185007a12c7	15:04
ShaneAH	drwxr-xr-x 3 root root 4096 Aug 16 14:14 instances	15:04
ShaneAH	drwxr-xr-x 6 root root 4096 Aug 16 14:14 scripts	15:04
ShaneAH	drwxr-xr-x 2 root root 4096 Aug 16 14:14 seed	15:04
ShaneAH	drwxr-xr-x 2 root root 4096 Aug 16 14:15 sem	15:04
ShaneAH	packer@pkrvmclv34au9u3:/var/lib/cloud$ cloud-init query --all | grep instance	15:04
ShaneAH	"instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",	15:04
ShaneAH	"instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",	15:04
ShaneAH	"instance_id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",	15:04
ShaneAH	"userdata": "<redacted for non-root user> file:/var/lib/cloud/instance/user-data.txt",	15:04
ShaneAH	"instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",	15:04
ShaneAH	"instance_id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",	15:04
ShaneAH	"vendordata": "<redacted for non-root user> file:/var/lib/cloud/instance/vendor-data.txt"	15:04
ShaneAH	so it looks like the same instance id right?	15:05
minimal	in cloud-init.log you may see things like "__init__.py[DEBUG] Update datasource metadata and network config due to events: boot-new-instance"	15:07
ShaneAH	meeting just started, I'll look for that in just a bit.	15:08
ShaneAH38	hmm, seems libera may not like that I connected to VPN. :(	16:15
ShaneAH38	The instance id doesn't seem to have changed, what else would cause cloud-init to think it needs to re-run the config?	16:42
ShaneAH38	for giggles I rebooted the VM again while waiting at a debug point in packer and cloud-init again re-wrote the fstab file.	17:08
minimal	if it doesn't believe the instance-id has changed then I wouldn't expect it to run per-instance modules, it may run other types of modules however	17:08
ShaneAH38	cc_mounts is per instance right?	17:09
minimal	that's what the docs say	17:09
ShaneAH38	I cloned the repo as well and "frequency" is per instance...	17:10
minimal	and cloud-init.log will also state that	17:10
ShaneAH38	so it seems there is something odd about the instance detection.	17:11
minimal	why? you haven't provided any cloud-init.log file to look at	17:12
minimal	in cloud-init.log do you see "helpers.py[DEBUG]: config-mounts already ran (freq=once-per-instance)" ?	17:15
ShaneAH38	I can absolutely do that, is there anything that should be scrubbed?	17:16
minimal	you can run "cloud-init collect-logs" which I believe if "-u" is not specified should exclude sensitive info	17:17
ShaneAH38	I don't see helpers.py but I do see handlers.py	17:17
ShaneAH38	ok, sec.	17:18
ShaneAH38	do people usually post the entire tar.gz?	17:19
minimal	typically there would be a "handlers.py[DEBUG]: finish: init-network/config-mounts: SUCCESS: config-mounts previously ran" line immediately after that one	17:19
minimal	it is recommended to add that to any Github Issue raised	17:20
ShaneAH38	I see several "previously ran" but none for config-mounts.	17:22
ShaneAH38	grep "previously ran" cloud-init.log | grep mounts shows nothing	17:22
minimal	I'm shooting in the dark without seeing the logfile	17:23
ShaneAH38	working on that now.	17:23
ShaneAH38	sigh, uploads failing.	17:27
blackboxsw	ShaneAH38: as mentioned by minimal, `cloud-init clean` will cause the instance-id to be set to NO_PREVIOUS_INSTANCE_ID again, so that would also cause all modules to re-run. Beyond that, meta-data changing the `instance-id` value to a different UUID will also trigger such a PER_INSTANCE event.	17:33
ShaneAH38	Well, and paste.opendev won't let me put the whole log in there...	17:34
blackboxsw	ShaneAH38: If you are collecting install logs from your packer install and trying to track instance-id changes across the full deployment for filing an issue in github, I'd suggest not providing `--logs` to cloud-init clean, as you'd likely want to preserve those early install stage logs to see how many instance-id triggers were present through your install and reboot.	17:34
ShaneAH38	blackboxsw thanks for the info. And yes, I see the first instance of NO_PREVIOUS_INSTANCE_ID but the reboot does show an instance-id that exists and if I query metadata after the reboot the instance-id is the same as the instance folder.	17:35
ShaneAH38	Would it be appropriate for me to start a github issue so that I can upload the log files there?	17:37
blackboxsw	"do people usually post the entire tar.gz?" generally to a github issue if filed. One thing to peek at for credential leaks would be the file /run/cloud-init/instance-data-sensitive.json and/or /var/log/cloud-init.log, to ensure passwords aren't represented in user-data in the tar.gz	17:37
blackboxsw	ShaneAH38: yes let's start with a github issue explaining your deployment and if it needs to go to support type discussions we can go from there to other channels where appropriate	17:43
blackboxsw	falcojr: holmanb the more I read https://github.com/canonical/cloud-init/pull/4325 and the original bug https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2012044 , the more I don't want to try supporting images with a python source installed version of cloudinit from a deb-package postinst. There are so many places for that to fall over or not cover corner cases.	17:44
-ubottu:#cloud-init- Pull 4325 in canonical/cloud-init "check whether old version is empty in postinst" [Open]		17:44
-ubottu:#cloud-init- Launchpad bug 2012044 in cloud-init (Ubuntu) "/var/lib/cloud/data/upgraded-network file touched after apt install cloudinit" [Low, Triaged]		17:44
blackboxsw	I don't know if I'm just being grumpy, but it feels out of typical support scenarios to tell folks they can create an image with an unpackaged version of cloudinit from source and also expect to upgrade to a newer packaged version of cloudinit and expect that to work out of the box.	17:46
blackboxsw	talk me down if that feels like a reasonable solution/use-case for folks in various distributions `python3 setup.py install cloud-init; apt/yum/zypper install cloud-init`	17:48
ShaneAH38	Issue created. https://github.com/canonical/cloud-init/issues/4359	17:49
-ubottu:#cloud-init- Issue 4359 in canonical/cloud-init "Rebooting VM during packer build causes cloud-init to run instance modules again" [Open]		17:49
blackboxsw	my shallow guess in this space would be that someone interested in source installed software would typically continue to upgrade that package from source as needed and avoid the distribution packages altogether	17:51
blackboxsw	. A transition from source installl to distro deb/rpm-based packages is something that represents adoption of a different packaging and delivery vehicle for their software that likely will always need manual interaction at some level	17:51
minimal	blackboxsw: agreed. We've already seen that the cloud.cfg contents for Debian packaged c-i is quite different than for upstream. Likewise probably for at least some other distros	17:52
blackboxsw	thanks ShaneAH38 for the issue, it'll better help triage what's going on there	17:52
minimal	blackboxsw: and indeed it's possible a distro packaged c-i may alternatively use /etc/cloud/cloud.cfg.d/ files to override cloud.cfg and so that may also have an impact then on behaviour	17:54
minimal	ShaneAH38: the info in your issue shows Packer doing a "sed" on /etc/fstab to change /alt_var entries to /var - but there's no info on how those entries end up in fstab in the first place and what their contents are	18:03
ShaneAH38	minimal that's in the description of the issue. I put the cloud-init file I'm sending to packer.	18:11
ShaneAH38	mounts:	18:11
ShaneAH38	- [ "/dev/disk/azure/scsi1/lun0-part1", "/alt_home", "auto", "defaults,nofail", "0", "2" ]	18:11
ShaneAH38	- [ "/dev/disk/azure/scsi1/lun0-part2", "/alt_tmp", "auto", "defaults,nofail", "0", "2" ]	18:11
ShaneAH38	- [ "/dev/disk/azure/scsi1/lun0-part3", "/alt_var", "auto", "defaults,nofail", "0", "2" ]	18:11
ShaneAH38	- [ "/dev/disk/azure/scsi1/lun0-part4", "/alt_var/log", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]	18:11
ShaneAH38	- [ "/dev/disk/azure/scsi1/lun0-part5", "/alt_var/log/audit", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]	18:11
ShaneAH38	- [ "/dev/disk/azure/scsi1/lun0-part6", "/alt_var/tmp", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]	18:11
ShaneAH38	and the log file at 2023-08-16 14:19:03,095 shows those being replaced.	18:13
ShaneAH38	2023-08-16 14:19:03,095 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /dev/disk/azure/scsi1/lun0-part1 /home auto defaults,nofail,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part2 /tmp auto defaults,nofail,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part3 /var auto defaults,nofail,comment=cloudconfig 0 2', '-	18:13
ShaneAH38	/dev/disk/azure/scsi1/lun0-part4 /var/log auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part5 /var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part6 /var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+	18:13
ShaneAH38	/dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part2 /alt_tmp auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part3 /alt_var auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part4 /alt_var/log auto	18:13
ShaneAH38	defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part5 /alt_var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part6 /alt_var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2']	18:13
minimal	that's the user-data you're using before you reboot the VM? or provided to both pre-reboot and post-reboot?	18:13
minimal	also please don't post large amounts of text in IRC	18:13
ShaneAH38	so the reboot does not change any userdata.	18:13
ShaneAH38	(sry)	18:13
ShaneAH38	there's not really an opportunity to change it in the packer process.	18:14
minimal	so during the reboot the fstab will contain entries for the disk's partitions with /var mountpoints and the user-data will contain entries for mounting the same disk partitions on /alt_var	18:16
ShaneAH38	correct.	18:16
minimal	so are you expecting as the end result the fstab to have both /var and /alt_var entries for the same partitions?	18:17
ShaneAH38	I'm expecting to only have /var not the /alt_ partitions after the reboot.	18:17
ShaneAH38	I was not expecting cloud-init to run again.	18:17
ShaneAH38	(well at least not the instance specific modules)	18:17
minimal	I assume it is to do with these being Azure ephemeral disks	18:19
minimal	I see disk_setup also ran	18:19
ShaneAH38	right, I saw that.	18:19
minimal	earlier in the logs there are lines relating to the Azure DS that indicate that NTFS cannot be mounted	18:22
ShaneAH38	so the disks I'm creating are not ephemeral...So are you saying that cloud-init is going to run like a Desired State Configuration system and continuously try and apply its settings?	18:22
ShaneAH38	Yeah, I saw that also and just haven't chased that down yet.	18:22
minimal	and there's reference to "sem" files for both disk_setup and mounts not existing	18:23
ShaneAH38	I'm not messing with the ephemeral stuff	18:23
ShaneAH38	let me put the fstab in the github issue	18:24
minimal	that would help	18:24
ShaneAH38	done.	18:24
ShaneAH38	Other than the partitions I'm not trying to be clever	18:24
ShaneAH38	FWIW I really do appreciate your assistance, I'm not sure how I would proceed so thanks.	18:25
ShaneAH1	minimal you mention sem files but I'm not following.	19:57
meena	blackboxsw: if these patches of mine ever get merged https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273122 we could consider running tests on Cirrus CI on FreeBSD, as a stop gap until i got the lxd stuff done	20:02
-ubottu:#cloud-init- bugs.freebsd.org bug 273122 in Ports & Packages "lang/python311: backport netlink support" [Affects Only Me, In Progress]		20:02
minimal	meena: with #4348 TestGetProcPpid.test_get_proc_ppid_ps is failing for me on Alpine. Seems to be due to choice of options passed to "ps" - I suspect Busybox's ps doesn't support some of them	20:08
minimal	will investigate further	20:08
minimal	ShaneAH1: was referring to the lines "DataSourceAzure.py[DEBUG]: Marker "/var/lib/cloud/instances/sem/config_mounts" for module "mounts" did not exist", and the similar line regarding disk_setup	20:10
meena	minimal: i checked the man page before confidently declaring it should work everywhere	20:18
minimal	meena: it actually seems that is_Linux isn't giving the expected result for Alpine	20:18
minimal	and so get_proc_ppid_ps is called, not get_proc_ppid_linux	20:19
minimal	?	20:19
minimal	oops	20:19
meena	how?	20:19
minimal	don't know, that's the only explanation as to why get_proc_ppid_ps is being called when I'm building on Alpine	20:20
meena	weird.	20:21
minimal	the testcase is test_get_proc_ppid_ps	20:23
minimal	and is_Linux is mocked	20:23
minimal	and "m_is_Linux.return_value = False" in the testcase so it acts as though it is not linux	20:25
minimal	so the testcase then calling get_proc_ppid will result in that calling get_proc_ppid_ps	20:26
ShaneAH1	minimal The instance at 2023-08-16 14:14:50,405 is the initial boot of the packer vm. If I look in the /var/lib/cloud/sem folder I only see one file config_scripts_per_once.once	20:28
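To compare which "sem" marker files exist before and after a reboot, a small listing helper is enough. This is a sketch only; `list_sem_markers` is a hypothetical name, and the idea that per-instance markers sit in a `sem` directory under the `/var/lib/cloud/instance` symlink shown earlier is an assumption, not something confirmed in this log:

```python
from pathlib import Path


def list_sem_markers(sem_dir):
    """Return the sorted names of marker files found directly in sem_dir.

    A missing directory yields an empty list rather than an error, so the
    helper can be pointed at paths that may not exist yet.
    """
    path = Path(sem_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())
```

For example, `list_sem_markers("/var/lib/cloud/sem")` would cover the folder mentioned above, and the same call on the assumed per-instance directory could be saved before and after the reboot and diffed.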
minimal	which then uses subp to execute "ps" with the "-p" option which is not recognised by Busybox ps	20:30
minimal	meena: you checked the GNU ps manpage? or also the Busybox manpage? ;-)	20:30
minimal	but I also don't understand why a testcase for a function, get_proc_ppid_ps, that is not intended to be used on Linux is then running that function on Linux...	20:31
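For context on the Linux path being contrasted with "ps" here: a parent PID can be read from procfs without running any external command. The following is a sketch under that assumption and is not cloud-init's actual get_proc_ppid_linux code; both function names are hypothetical:

```python
def parse_ppid_from_stat(stat_text):
    """Extract the parent PID from the contents of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')' characters, so split after the LAST ')'.
    """
    after_comm = stat_text.rsplit(")", 1)[1]
    # Remaining fields start with: state, ppid, pgrp, ...
    return int(after_comm.split()[1])


def read_ppid(pid):
    """Read the parent PID of `pid` from procfs (Linux only)."""
    with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as fh:
        return parse_ppid_from_stat(fh.read())
```

Because this only needs /proc, it does not depend on which "ps" options the installed implementation accepts.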
meena	minimal: busybox	20:34
minimal	meena: https://busybox.net/Dwonloads/Busybox.html#ps only shows "-o" and "-T" as supported options	20:49
minimal	oops, https://busybox.net/downloads/BusyBox.html#ps	20:49
minimal	but I don't understand why the "ps" isn't mocked in the testcase	20:50
meena	because /proc isn't mocked either	20:56
meena	anyway, it looks like i smashed ps & pscan	20:57
ShaneAH1	FWIW I removed all of my partition manipulation except for home so now there is only a single partition that I am trying to mount. I was hoping that maybe something odd was happening given that I was working with var but the same behaviour exists.	21:19
dbungert	minimal: I'm poking around on the UEFI / grub seed question - what does seed mean there? I have a guess but wanted to hear your elaboration.	21:33
minimal	dbungert: there are several ways to seed the Linux kernel's entropy	21:49
minimal	one of those is from the UEFI itself - a bootloader (if it supports this) can pass it also to the kernel	21:49
minimal	dbungert: I think this might be systemd-boot's equivalent: https://github.com/systemd/systemd/blob/main/src/boot/efi/random-seed.c	21:54
minimal	meena: I added "procps-ng" package to the Alpine package's "checkdepends" to install the full version of "ps" during testing	21:57
meena	minimal: i think i would rather fix that test to exclude alpine. since it has a working /proc	22:01
minimal	don't all Linux distros have a working /proc?	22:03
minimal	dbungert: this is related: https://patchwork.kernel.org/project/linux-arm-kernel/patch/1475749646-10844-2-git-send-email-ard.biesheuvel@linaro.org/	22:17
minimal	"Note that the config table could be generated by the EFI stub or by any other UEFI driver or application (e.g., GRUB)"	22:18
meena	https://github.com/canonical/cloud-init/commit/8a70dbc49e609ac900b5f7b5b4358b0ccaf6c4aa#r124702899	23:23
-ubottu:#cloud-init- Commit 8a70dbc in canonical/cloud-init "util: Fix get_proc_ppid() on non-Linux systems (#4348)"		23:23
meena	minimal: re proc: i guess? I don't know. I spend way too much time in FreeBSD land	23:24
