[07:56] <meena> someone wanna do some editing to make this a bit more readable? https://github.com/canonical/cloud-init/issues/4043#issuecomment-1680103576
[07:56] -ubottu:#cloud-init- Issue 4043 in canonical/cloud-init "Don't Break On Duplicate Mac Addresses" [Open]
[11:23] <meena> would changing the tests that currently use Bash, even tho they don't need it, to use sh, speed up the test suite?
[14:49] <ShaneAH> Hi All, I'm still working on building a golden image that mounts var partitions and I'm getting very close. I am trying to mount the partitions very early on in the packer process, so I mount everything to an alt_var, run a tar command to copy existing var folders to alt_var, and then use sed to change fstab from mounting alt_var to mounting var
[14:49] <ShaneAH> and then reboot the packer VM to continue the configuration process.  The problem I'm currently running into is that when I reboot the VM cloud-init is running and removing my changes to fstab and re-applying the /alt_var mountpoint.  My understanding is that cc_mounts.py is per instance, how does cloud-init determine if the instance scripts have
[14:49] <ShaneAH> already been executed?
[14:51] <minimal> ShaneAH: by the instance-id indicated in the meta-data
[14:52] <minimal> if a previous instance-id has been recorded and then a new one is provided then obviously it is a change in instance-id and all per-instance modules need to be run
[14:54] <minimal> as I indicated to you previously, when building any "golden" image cloud-init should be cleaned at the end of the process so that upon boot of any VM created using the golden image there is *no* instance data from any previous cloud-inits present and cloud-init then runs per-instance modules
[14:54] <ShaneAH> is that the guid recorded here? /var/lib/cloud/instances/8457ac95-cbe8-474b-a1ae-6185007a12c7
[14:54] <ShaneAH> understood but I am in the middle of building that image.  I do have cleanup once the image is complete.
[14:55] <minimal> have you cleaned up cloud-init at the end of your golden image creation?
[14:55] <ShaneAH>  "sudo cloud-init clean --logs --machine-id --seed",
[14:55] <ShaneAH>         "sudo rm -rf /var/lib/cloud/",
[14:55] <ShaneAH>         "sudo systemctl stop walinuxagent.service",
[14:55] <ShaneAH>         "sudo rm -rf /var/lib/waagent/",
[14:55] <ShaneAH>         "sudo rm -f /var/log/waagent.log",
[14:55] <ShaneAH>         "sudo rm -f /var/lib/systemd/random-seed",
[14:55] <ShaneAH>         "sudo rm -f /var/lib/systemd/credential.secret"
[14:55] <minimal> ok, this is happening during creation?
[14:55] <ShaneAH> correct.
[14:55] <ShaneAH> I'm trying to get /var on a mount point in the middle of my golden image build.
[14:55] <minimal> so then I assume the instance-id will not have changed and per-instance modules won't be run
[14:56] <ShaneAH> One would think not but I'm trying to confirm.  I'm running a packer -debug session trying to capture things before and after the boot.  I'm just uncertain what defines a new instance.
[14:56] <minimal> have you looked at the cloud-init logfile?
[14:57] <ShaneAH> Yep, and I see cc_mounts.py running twice.  Once on initial boot of the packer VM and then again after rebooting.
[14:57] <minimal> how are you providing metadata/user-data/network-config during the Packer run?
[14:58] <minimal> the logfile will show whether cloud-init determines if the instance-id has changed or remains the same
[15:01] <ShaneAH> 2023-08-16 14:14:48,138 - stages.py[DEBUG]: previous iid found to be NO_PREVIOUS_INSTANCE_ID
[15:01] <ShaneAH> 2023-08-16 14:14:50,113 - stages.py[DEBUG]: previous iid found to be 8457ac95-cbe8-474b-a1ae-6185007a12c7
[15:01] <ShaneAH> I'm using the custom_data_file in the azure-arm source to provide a file to cloud-init.
[15:01] <minimal> a user_data file?
[15:02] <minimal> so Azure is providing the meta-data?
[15:02] <ShaneAH> Hmm...
[15:02] <ShaneAH> yes, I believe the answer is yes.
[15:03] <minimal> in which case they decide whether to provide the same instance-id or not across reboots
[15:03] <ShaneAH> cloud-init query --all right?
[15:04] <ShaneAH> drwxr-xr-x 2 root root 4096 Aug 16 14:19 data
[15:04] <ShaneAH> drwxr-xr-x 2 root root 4096 Aug 16 14:14 handlers
[15:04] <ShaneAH> lrwxrwxrwx 1 root root   61 Aug 16 14:19 instance -> /var/lib/cloud/instances/8457ac95-cbe8-474b-a1ae-6185007a12c7
[15:04] <ShaneAH> drwxr-xr-x 3 root root 4096 Aug 16 14:14 instances
[15:04] <ShaneAH> drwxr-xr-x 6 root root 4096 Aug 16 14:14 scripts
[15:04] <ShaneAH> drwxr-xr-x 2 root root 4096 Aug 16 14:14 seed
[15:04] <ShaneAH> drwxr-xr-x 2 root root 4096 Aug 16 14:15 sem
[15:04] <ShaneAH> packer@pkrvmclv34au9u3:/var/lib/cloud$ cloud-init query --all | grep instance
[15:04] <ShaneAH>    "instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",
[15:04] <ShaneAH>  "instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",
[15:04] <ShaneAH>  "instance_id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",
[15:04] <ShaneAH>  "userdata": "<redacted for non-root user> file:/var/lib/cloud/instance/user-data.txt",
[15:04] <ShaneAH>   "instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",
[15:04] <ShaneAH>   "instance_id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",
[15:04] <ShaneAH>  "vendordata": "<redacted for non-root user> file:/var/lib/cloud/instance/vendor-data.txt"
[15:05] <ShaneAH> so it looks like the same instance id right?
[15:07] <minimal> in cloud-init.log you may see things like "__init__.py[DEBUG] Update datasource metadata and network config due to events: boot-new-instance"
[15:08] <ShaneAH> meeting just started, I'll look for that in just a bit.
[16:15] <ShaneAH38> hmm, seems libera may not like that I connected to VPN. :(
[16:42] <ShaneAH38> The instance id doesn't seem to have changed, what else would cause cloud-init to think it needs to re-run the config?
[17:08] <ShaneAH38> for giggles I rebooted the VM again while waiting at a debug point in packer and cloud-init again re-wrote the fstab file.
[17:08] <minimal> if it doesn't believe the instance-id has changed then I wouldn't expect it to run per-instance modules, it may run other types of modules however
[17:09] <ShaneAH38> cc_mounts is per instance right?
[17:09] <minimal> that's what the docs say
[17:10] <ShaneAH38> I cloned the repo as well and "frequency" is per instance...
[17:10] <minimal> and cloud-init.log will also state that
[17:11] <ShaneAH38> so it seems there is something odd about the instance detection.
[17:12] <minimal> why? you haven't provided any cloud-init.log file to look at
[17:15] <minimal> in cloud-init.log do you see "helpers.py[DEBUG]: config-mounts already ran (freq=once-per-instance)" ?
[17:16] <ShaneAH38> I can absolutely do that, is there anything that should be scrubbed?
[17:17] <minimal> you can run "cloud-init collect-logs" which I believe if "-u" is *not* specified should exclude sensitive info
[17:17] <ShaneAH38> I don't see helpers.py but I do see handlers.py
[17:18] <ShaneAH38> ok, sec.
[17:19] <ShaneAH38> do people usually post the entire tar.gz?
[17:19] <minimal> typically there would be a "handlers.py[DEBUG]: finish: init-network/config-mounts: SUCCESS: config-mounts previously ran" line immediately after that one
[17:20] <minimal> it is recommended to add that to any Github Issue raised
[17:22] <ShaneAH38> I see several "previously ran" but none for config-mounts.
[17:22] <ShaneAH38> grep "previously ran" cloud-init.log  | grep mounts shows nothing
[17:23] <minimal> I'm shooting in the dark without seeing the logfile
[17:23] <ShaneAH38> working on that now.
[17:27] <ShaneAH38> sigh, uploads failing.
[17:33] <blackboxsw> ShaneAH38: as mentioned by minimal the cloud-init clean will trigger the instance-id to be set to NO_PREVIOUS_INSTANCE_ID again so that'd also cause all modules to re-run.  And beyond that meta-data changing the `instance-id` value to a different UUID will also trigger such a PER_INSTANCE event.
[17:34] <ShaneAH38> Well, and paste.opendev won't let me put the whole log in there...
[17:34] <blackboxsw> ShaneAH38: If you are collecting install logs from your packer install and trying to track instance-id changes across the full deployment for filing an issue in github, I'd suggest not providing `--logs` to cloud-init clean, as you'd likely want to preserve those early install stage logs to see how many instance-id triggers were present through your install and reboot.
[17:35] <ShaneAH38> blackboxsw thanks for the info.  And yes, I see the first instance of NO_PREVIOUS_INSTANCE_ID but the reboot does show an instance-id that exists and if I query metadata after the reboot the instance-id is the same as the instance folder.
[17:37] <ShaneAH38> Would it be appropriate for me to start a github issue so that I can upload the log files there?
[17:37] <blackboxsw> "do people usually post the entire tar.gz?"  generally to a github issue if filed.   One thing to peek at for credentials leaks would be the file /run/cloud-init/instance-data-sensitive.json and/or /var/log/cloud-init.log to ensure passwords aren't represented in user-data in the tar.gz
[17:43] <blackboxsw> ShaneAH38: yes let's start with a github issue explaining your deployment and if it needs to go to support type discussions we can go from there to other channels where appropriate
[17:44] <blackboxsw> falcojr: holmanb the more I read https://github.com/canonical/cloud-init/pull/4325 and the original bug https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2012044 , the more I don't want to try supporting images with a python source installed version of cloudinit from a deb-package postinst. There are so many places for that to fall over or not cover corner cases.
[17:44] -ubottu:#cloud-init- Pull 4325 in canonical/cloud-init "check whether old version is empty in postinst" [Open]
[17:44] -ubottu:#cloud-init- Launchpad bug 2012044 in cloud-init (Ubuntu) "/var/lib/cloud/data/upgraded-network file touched after apt install cloudinit" [Low, Triaged]
[17:46] <blackboxsw> I don't know if I'm just being grumpy, but it feels out of typical support scenarios to tell folks they can create an image with an unpackaged version of cloudinit from source and also expect to upgrade to a newer packaged version of cloudinit and expect that to work out of the box.
[17:48] <blackboxsw> talk me down if that feels like a reasonable solution/use-case for folks in various distributions `python3 setup.py install cloud-init; apt/yum/zypper install cloud-init`
[17:49] <ShaneAH38> Issue created. https://github.com/canonical/cloud-init/issues/4359
[17:49] -ubottu:#cloud-init- Issue 4359 in canonical/cloud-init "Rebooting VM during packer build causes cloud-init to run instance modules again" [Open]
[17:51] <blackboxsw> my shallow guess in this space would be that someone interested in source installed software would typically continue to upgrade that package from source as needed and avoid the distribution packages altogether
[17:51] <blackboxsw> . A transition from source install to distro deb/rpm-based packages is something that represents adoption of a different packaging and delivery vehicle for their software that likely will always need manual interaction at some level
[17:52] <minimal> blackboxsw: agreed. We've already seen that the cloud.cfg contents for Debian packaged c-i is quite different than for upstream. Likewise probably for at least some other distros
[17:52] <blackboxsw> thanks ShaneAH38 for the issue, it'll better help triage what's going on there
[17:54] <minimal> blackboxsw: and indeed it's possible a distro packaged c-i may alternatively use /etc/cloud/cloud.cfg.d/ files to override cloud.cfg and so that may also have an impact then on behaviour
[18:03] <minimal> ShaneAH38: the info in your issue shows Packer doing a "sed" on /etc/fstab to change /alt_var entries to /var - but there's no info on how those entries end up in fstab in the first place and what their contents are
[18:11] <ShaneAH38> minimal that's in the description of the issue.  I put the cloud-init file I'm sending to packer.
[18:11] <ShaneAH38> mounts:
[18:11] <ShaneAH38>  - [ "/dev/disk/azure/scsi1/lun0-part1", "/alt_home", "auto", "defaults,nofail", "0", "2" ]
[18:11] <ShaneAH38>  - [ "/dev/disk/azure/scsi1/lun0-part2", "/alt_tmp", "auto", "defaults,nofail", "0", "2" ]
[18:11] <ShaneAH38>  - [ "/dev/disk/azure/scsi1/lun0-part3", "/alt_var", "auto", "defaults,nofail", "0", "2" ]
[18:11] <ShaneAH38>  - [ "/dev/disk/azure/scsi1/lun0-part4", "/alt_var/log", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]
[18:11] <ShaneAH38>  - [ "/dev/disk/azure/scsi1/lun0-part5", "/alt_var/log/audit", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]
[18:11] <ShaneAH38>  - [ "/dev/disk/azure/scsi1/lun0-part6", "/alt_var/tmp", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]
[18:13] <ShaneAH38> and the log file at 2023-08-16 14:19:03,095 shows those being replaced.
[18:13] <ShaneAH38> 2023-08-16 14:19:03,095 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /dev/disk/azure/scsi1/lun0-part1 /home auto defaults,nofail,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part2 /tmp auto defaults,nofail,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part3 /var auto defaults,nofail,comment=cloudconfig 0 2', '-
[18:13] <ShaneAH38> /dev/disk/azure/scsi1/lun0-part4 /var/log auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part5 /var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part6 /var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+
[18:13] <ShaneAH38> /dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part2 /alt_tmp auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part3 /alt_var auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part4 /alt_var/log auto
[18:13] <ShaneAH38> defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part5 /alt_var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part6 /alt_var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2']
[18:13] <minimal> that's the user-data you're using *before* you reboot the VM? or provided to both pre-reboot and post-reboot?
[18:13] <minimal> also please don't post large amounts of text in IRC
[18:13] <ShaneAH38> so the reboot does not change any userdata.
[18:13] <ShaneAH38> (sry)
[18:14] <ShaneAH38> there's not really an opportunity to change it in the packer process.
[18:16] <minimal> so during the reboot the fstab will contain entries for the disk's partitions with /var mountpoints and the user-data will contain entries for mounting the *same* disk partitions on /alt_var
[18:16] <ShaneAH38> correct.
[18:17] <minimal> so are you expecting as the end result the fstab to have both /var and /alt_var entries for the *same* partitions?
[18:17] <ShaneAH38> I'm expecting to only have /var not the /alt_ partitions after the reboot.
[18:17] <ShaneAH38> I was not expecting cloud-init to run again.
[18:17] <ShaneAH38> (well at least not the instance specific modules)
[18:19] <minimal> I assume it is to do with these being Azure ephemeral disks
[18:19] <minimal> I see disk_setup also ran
[18:19] <ShaneAH38> right, I saw that.
[18:22] <minimal> earlier in the logs there are lines relating to the Azure DS that indicate that NTFS cannot be mounted
[18:22] <ShaneAH38> so the disks I'm creating are not ephemeral...So are you saying that cloud-init is going to run like a Desired State Configuration system and continuously try and apply its settings?
[18:22] <ShaneAH38> Yeah, I saw that also and just haven't chased that down yet.
[18:23] <minimal> and there's reference to "sem" files for both disk_setup and mounts not existing
[18:23] <ShaneAH38> I'm not messing with the ephemeral stuff
[18:24] <ShaneAH38> let me put the fstab in the github issue
[18:24] <minimal> that would help
[18:24] <ShaneAH38> done.
[18:24] <ShaneAH38> Other than the partitions I'm not trying to be clever
[18:25] <ShaneAH38> FWIW I really do appreciate your assistance, I'm not sure how I would proceed so thanks.
[19:57] <ShaneAH1> minimal you mention sem files but I'm not following.
[20:02] <meena> blackboxsw: if these patches of mine ever get merged https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273122 we could consider running tests on Cirrus CI on FreeBSD, as a stop gap until i got the lxd stuff done
[20:02] -ubottu:#cloud-init- bugs.freebsd.org bug 273122 in Ports & Packages "lang/python311: backport netlink support" [Affects Only Me, In Progress]
[20:08] <minimal> meena: with #4348 TestGetProcPpid.test_get_proc_ppid_ps is failing for me on Alpine. Seems to be due to choice of options passed to "ps" - I suspect Busybox's ps doesn't support some of them
[20:08] <minimal> will investigate further
[20:10] <minimal> ShaneAH1: was referring to the lines "DataSourceAzure.py[DEBUG]: Marker "/var/lib/cloud/instances/sem/config_mounts" for module "mounts" did not exist". And the similar line regarding disk_setup
[20:18] <meena> minimal: i checked the man page before confidently declaring it should work everywhere
[20:18] <minimal> meena: it actually seems that is_Linux isn't giving the expected result for Alpine
[20:19] <minimal> and so get_proc_ppid_ps is called, not get_proc_ppid_linux
[20:19] <minimal> ?
[20:19] <minimal> oops
[20:19] <meena> how?
[20:20] <minimal> don't know, that's the only explanation as to why get_proc_ppid_ps is being called when I'm building on Alpine
[20:21] <meena> weird.
[20:23] <minimal> the testcase is test_get_proc_ppid_ps
[20:23] <minimal> and is_Linux is mocked
[20:25] <minimal> and "m_is_Linux.return_value = False" in the testcase so it acts as though it is not linux
[20:26] <minimal> so the testcase then calling get_proc_ppid will result in that calling get_proc_ppid_ps
[20:28] <ShaneAH1> minimal The instance at 2023-08-16 14:14:50,405 is the initial boot of the packer vm.  If I look in the /var/lib/cloud/sem folder I only see one file config_scripts_per_once.once
[20:30] <minimal> which then uses subp to execute "ps" with the "-p" option which is not recognised by Busybox ps
[20:30] <minimal> meena: you checked the *GNU* ps manpage? or also the Busybox manpage? ;-)
[20:31] <minimal> but I also don't understand why a testcase for a function, get_proc_ppid_ps, that is not intended to be used on Linux is then running that function on Linux...
[20:34] <meena> minimal: busybox
[20:49] <minimal> meena: https://busybox.net/Dwonloads/Busybox.html#ps only shows "-o" and "-T" as supported options
[20:49] <minimal> oops, https://busybox.net/downloads/BusyBox.html#ps
[20:50] <minimal> but I don't understand why the "ps" isn't mocked in the testcase
[20:56] <meena> because /proc isn't mocked either
[20:57] <meena> anyway, it looks like i smashed ps & pscan
[21:19] <ShaneAH1> FWIW I removed all of my partition manipulation except for home so now there is only a single partition that I am trying to mount.  I was hoping that maybe something odd was happening given that I was working with var but the same behaviour exists.
[21:33] <dbungert> minimal: I'm poking around on the UEFI / grub seed question - what does seed mean there?  I have a guess but wanted to hear your elaboration.
[21:49] <minimal> dbungert: there are several ways to seed the Linux kernels' entropy
[21:49] <minimal> one of those from the UEFI itself - a bootloader (if it supports this) can pass it also to the kernel
[21:54] <minimal> dbungert: I *think* this might be systemd-boot's equivalent: https://github.com/systemd/systemd/blob/main/src/boot/efi/random-seed.c
[21:57] <minimal> meena: I added "procps-ng" package to the Alpine package's "checkdepends" to install the full version of "ps" during testing
[22:01] <meena> minimal: i think i would rather fix that test to exclude alpine. since it has a working /proc
[22:03] <minimal> don't all Linux distros have a working /proc?
[22:17] <minimal> dbungert: this is related: https://patchwork.kernel.org/project/linux-arm-kernel/patch/1475749646-10844-2-git-send-email-ard.biesheuvel@linaro.org/
[22:18] <minimal> "Note that the config table could be generated by the EFI stub or by any other UEFI driver or application (e.g., GRUB)"
[23:23] <meena> https://github.com/canonical/cloud-init/commit/8a70dbc49e609ac900b5f7b5b4358b0ccaf6c4aa#r124702899
[23:23] -ubottu:#cloud-init- Commit 8a70dbc in canonical/cloud-init "util: Fix get_proc_ppid() on non-Linux systems (#4348)"
[23:24] <meena> minimal: re proc: i guess? I don't know. I spend way too much time in FreeBSD land