/srv/irclogs.ubuntu.com/2023/08/16/#cloud-init.txt

meenasomeone wanna do some editing to make this a bit more readable? https://github.com/canonical/cloud-init/issues/4043#issuecomment-168010357607:56
-ubottu:#cloud-init- Issue 4043 in canonical/cloud-init "Don't Break On Duplicate Mac Addresses" [Open]07:56
=== randomvariable__ is now known as randomvariable_
=== mitchdz_ is now known as mitchdz
meenawould changing the tests that currently use Bash, even tho they don't need it, to use sh, speed up the test suite?11:23
ShaneAHHi All, I'm still working on building a golden image that mounts var partitions and I'm getting very close. I am trying to mount the partitions very early on in the packer process, so I mount everything to an alt_var, run a tar command to copy the existing var folders to alt_var, and then use sed to change fstab from mounting alt_var to mounting var14:49
ShaneAHand then reboot the packer VM to continue the configuration process.  The problem I'm currently running into is that when I reboot the VM cloud-init is running and removing my changes to fstab and re-applying the /alt_var mountpoint.  My understanding is that cc_mounts.py is per instance, how does cloud-init determine if the instance scripts have14:49
ShaneAHalready been executed?14:49
minimalShaneAH: by the instance-id indicated in the meta-data14:51
minimalif a previous instance-id has been recorded and then a new one is provided then obviously it is a change in instance-id and all per-instance modules need to be run14:52
minimalas I indicated to you previously, when building any "golden" image cloud-init should be cleaned at the end of the process so that upon boot of any VM created using the golden image there is *no* instance data from any previous cloud-inits present and cloud-init then runs per-instance modules14:54
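A minimal sketch of the comparison minimal describes, assuming the decision reduces to comparing the recorded instance-id with the one the datasource now reports. The helper name `is_new_instance` is hypothetical and is not cloud-init's actual code; only the `NO_PREVIOUS_INSTANCE_ID` marker is taken from the log output quoted later in this channel.

```python
# Marker cloud-init logs when no earlier instance-id is recorded
# (seen in the stages.py debug lines quoted in this log).
NO_PREVIOUS_INSTANCE_ID = "NO_PREVIOUS_INSTANCE_ID"


def is_new_instance(previous_iid: str, current_iid: str) -> bool:
    """Hypothetical helper: True when per-instance modules would be expected to run.

    A cleaned image has no recorded id, so the previous value is the
    NO_PREVIOUS_INSTANCE_ID marker and never equals a real instance-id.
    """
    return previous_iid != current_iid


iid = "8457ac95-cbe8-474b-a1ae-6185007a12c7"
first_boot = is_new_instance(NO_PREVIOUS_INSTANCE_ID, iid)  # nothing recorded yet
after_reboot = is_new_instance(iid, iid)                     # same id across a reboot
```

Under this model a plain reboot with an unchanged instance-id would not be expected to re-run per-instance modules, which is the expectation being tested in the discussion.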
ShaneAHis that the guid recorded here? /var/lib/cloud/instances/8457ac95-cbe8-474b-a1ae-6185007a12c714:54
ShaneAHunderstood but I am in the middle of building that image.  I do have cleanup once the image is complete.14:54
minimalhave you cleaned up cloud-init at the end of your golden image creation?14:55
ShaneAH "sudo cloud-init clean --logs --machine-id --seed",14:55
ShaneAH        "sudo rm -rf /var/lib/cloud/",14:55
ShaneAH        "sudo systemctl stop walinuxagent.service",14:55
ShaneAH        "sudo rm -rf /var/lib/waagent/",14:55
ShaneAH        "sudo rm -f /var/log/waagent.log",14:55
ShaneAH        "sudo rm -f /var/lib/systemd/random-seed",14:55
ShaneAH        "sudo rm -f /var/lib/systemd/credential.secret"14:55
minimalok, this is happening during creation?14:55
ShaneAHcorrect.14:55
ShaneAHI'm trying to get /var on a mount point in the middle of my golden image build.14:55
minimalso then I assume the instance-id will not have changed and per-instance modules won't be run14:55
ShaneAHOne would think not but I'm trying to confirm.  I'm running a packer -debug session trying to capture things before and after the boot.  I'm just uncertain what defines a new instance.14:56
minimalhave you looked at the cloud-init logfile?14:56
ShaneAHYep, and I see cc_mounts.py running twice.  Once on initial boot of the packer VM and then again after rebooting.14:57
minimalhow are you providing metadata/user-data/network-config during the Packer run?14:57
minimalthe logfile will show whether cloud-init determines if the instance-id has changed or remains the same14:58
ShaneAH2023-08-16 14:14:48,138 - stages.py[DEBUG]: previous iid found to be NO_PREVIOUS_INSTANCE_ID15:01
ShaneAH2023-08-16 14:14:50,113 - stages.py[DEBUG]: previous iid found to be 8457ac95-cbe8-474b-a1ae-6185007a12c715:01
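As a rough illustration of how the two lines above can be pulled out of a larger cloud-init.log, here is a small sketch. The function name `previous_iids` and the regular expression are assumptions made for this example; the line format is copied from the two log lines just quoted.

```python
import re

# Matches lines such as:
# 2023-08-16 14:14:48,138 - stages.py[DEBUG]: previous iid found to be NO_PREVIOUS_INSTANCE_ID
IID_RE = re.compile(
    r"^(?P<ts>\S+ \S+) - stages\.py\[DEBUG\]: previous iid found to be (?P<iid>\S+)$"
)


def previous_iids(lines):
    """Return (timestamp, previous instance-id) pairs in the order they were logged."""
    found = []
    for line in lines:
        match = IID_RE.match(line.strip())
        if match:
            found.append((match.group("ts"), match.group("iid")))
    return found


sample = [
    "2023-08-16 14:14:48,138 - stages.py[DEBUG]: previous iid found to be NO_PREVIOUS_INSTANCE_ID",
    "2023-08-16 14:14:50,113 - stages.py[DEBUG]: previous iid found to be 8457ac95-cbe8-474b-a1ae-6185007a12c7",
    "an unrelated log line",
]
pairs = previous_iids(sample)
```

To run it against a real file, pass an open file handle instead of `sample`, since iterating a file yields its lines.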
ShaneAHI'm using the custom_data_file in the azure-arm source to provide a file to cloud-init.15:01
minimala user_data file?15:01
minimalso Azure is providing the meta-data?15:02
ShaneAHHmm...15:02
ShaneAHyes, I believe the answer is yes.15:02
minimalin which case they decide whether to provide the same instance-id or not across reboots15:03
ShaneAHcloud-init query --all right?15:03
ShaneAHdrwxr-xr-x 2 root root 4096 Aug 16 14:19 data15:04
ShaneAHdrwxr-xr-x 2 root root 4096 Aug 16 14:14 handlers15:04
ShaneAHlrwxrwxrwx 1 root root   61 Aug 16 14:19 instance -> /var/lib/cloud/instances/8457ac95-cbe8-474b-a1ae-6185007a12c715:04
ShaneAHdrwxr-xr-x 3 root root 4096 Aug 16 14:14 instances15:04
ShaneAHdrwxr-xr-x 6 root root 4096 Aug 16 14:14 scripts15:04
ShaneAHdrwxr-xr-x 2 root root 4096 Aug 16 14:14 seed15:04
ShaneAHdrwxr-xr-x 2 root root 4096 Aug 16 14:15 sem15:04
ShaneAHpacker@pkrvmclv34au9u3:/var/lib/cloud$ cloud-init query --all | grep instance15:04
ShaneAH   "instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",15:04
ShaneAH "instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",15:04
ShaneAH "instance_id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",15:04
ShaneAH "userdata": "<redacted for non-root user> file:/var/lib/cloud/instance/user-data.txt",15:04
ShaneAH  "instance-id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",15:04
ShaneAH  "instance_id": "8457ac95-cbe8-474b-a1ae-6185007a12c7",15:04
ShaneAH "vendordata": "<redacted for non-root user> file:/var/lib/cloud/instance/vendor-data.txt"15:04
ShaneAHso it looks like the same instance id right?15:05
minimalin cloud-init.log you may see things like "__init__.py[DEBUG] Update datasource metadata and network config due to events: boot-new-instance"15:07
ShaneAHmeeting just started, I'll look for that in just a bit.15:08
ShaneAH38hmm, seems libera may not like that I connected to VPN. :(16:15
ShaneAH38The instance id doesn't seem to have changed, what else would cause cloud-init to think it needs to re-run the config?16:42
ShaneAH38for giggles I rebooted the VM again while waiting at a debug point in packer and cloud-init again re-wrote the fstab file.17:08
minimalif it doesn't believe the instance-id has changed then I wouldn't expect it to run per-instance modules, it may run other types of modules however17:08
ShaneAH38cc_mounts is per instance right?17:09
minimalthat's what the docs say17:09
ShaneAH38I cloned the repo as well and "frequency" is per instance...17:10
minimaland cloud-init.log will also state that17:10
ShaneAH38so it seems there is something odd about the instance detection.17:11
minimalwhy? you haven't provided any cloud-init.log file to look at17:12
minimalin cloud-init.log do you see "helpers.py[DEBUG]: config-mounts already ran (freq=once-per-instance)" ?17:15
ShaneAH38I can absolutely do that, is there anything that should be scrubbed?17:16
minimalyou can run "cloud-init collect-logs" which I believe if "-u" is *not* specified should exclude sensitive info17:17
ShaneAH38I don't see helpers.py but I do see handlers.py17:17
ShaneAH38ok, sec.17:18
ShaneAH38do people usually post the entire tar.gz?17:19
minimaltypically there would be a "handlers.py[DEBUG]: finish: init-network/config-mounts: SUCCESS: config-mounts previously ran" line immediately after that one17:19
minimalit is recommended to add that to any Github Issue raised17:20
ShaneAH38I see several "previously ran" but none for config-mounts.17:22
ShaneAH38grep "previously ran" cloud-init.log  | grep mounts shows nothing17:22
minimalI'm shooting in the dark without seeing the logfile17:23
ShaneAH38working on that now.17:23
ShaneAH38sigh, uploads failing.17:27
blackboxswShaneAH38: as mentioned by minimal, the cloud-init clean will trigger the instance-id to be set to NO_PREVIOUS_INSTANCE_ID again, so that'd also cause all modules to re-run. And beyond that, meta-data changing the `instance-id` value to a different UUID will also trigger such a PER_INSTANCE event.17:33
ShaneAH38Well, and paste.opendev won't let me put the whole log in there...17:34
blackboxswShaneAH38: If you are collecting install logs from your packer install and trying to track instance-id changes across the full deployment for filing an issue in github, I'd suggest not providing `--logs` to cloud-init clean, as you'd likely want to preserve those early install stage logs to see how many instance-id triggers were present through your install and reboot.17:34
ShaneAH38blackboxsw thanks for the info.  And yes, I see the first instance of NO_PREVIOUS_INSTANCE_ID but the reboot does show an instance-id that exists and if I query metadata after the reboot the instance-id is the same as the instance folder.17:35
ShaneAH38Would it be appropriate for me to start a github issue so that I can upload the log files there?17:37
blackboxsw"do people usually post the entire tar.gz?"  generally to a github issue if filed.   One thing to peek at for credentials leaks would be the file /run/cloud-init/instance-data-sensitive.json and/or /var/log/cloud-init.log to ensure passwords aren't represented in user-data in the tar.gz17:37
blackboxswShaneAH38: yes let's start with a github issue explaining your deployment and if it needs to go to support type discussions we can go from there to other channels where appropriate17:43
blackboxswfalcojr: holmanb the more I read https://github.com/canonical/cloud-init/pull/4325 and the original bug https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2012044 , the more I don't want to try supporting images with a python source installed version of cloudinit from a deb-package postinst. There are so many places for that to fall over or not cover corner cases.17:44
-ubottu:#cloud-init- Pull 4325 in canonical/cloud-init "check whether old version is empty in postinst" [Open]17:44
-ubottu:#cloud-init- Launchpad bug 2012044 in cloud-init (Ubuntu) "/var/lib/cloud/data/upgraded-network file touched after apt install cloudinit" [Low, Triaged]17:44
blackboxswI don't know if I'm just being grumpy, but it feels out of typical support scenarios to tell folks they can create an image with an unpackaged version of cloudinit from source and also expect to upgrade to a newer packaged version of cloudinit and expect that to work out of the box.17:46
blackboxswtalk me down if that feels like a reasonable solution/use-case for folks in various distributions `python3 setup.py install cloud-init; apt/yum/zypper install cloud-init`17:48
ShaneAH38Issue created. https://github.com/canonical/cloud-init/issues/435917:49
-ubottu:#cloud-init- Issue 4359 in canonical/cloud-init "Rebooting VM during packer build causes cloud-init to run instance modules again" [Open]17:49
blackboxswmy shallow guess in this space would be that someone interested in source installed software would typically continue to upgrade that package from source as needed and avoid the distribution packages altogether17:51
blackboxsw. A transition from source install to distro deb/rpm-based packages is something that represents adoption of a different packaging and delivery vehicle for their software that likely will always need manual interaction at some level17:51
minimalblackboxsw: agreed. We've already seen that the cloud.cfg contents for Debian packaged c-i is quite different than for upstream. Likewise probably for at least some other distros17:52
blackboxswthanks ShaneAH38 for the issue, it'll better help triage what's going on there17:52
minimalblackboxsw: and indeed it's possible a distro packaged c-i may alternatively use /etc/cloud/cloud.cfg.d/ files to override cloud.cfg and so that may also have an impact then on behaviour17:54
minimalShaneAH38: the info in your issue shows Packer doing a "sed" on /etc/fstab to change /alt_var entries to /var - but there's no info on how those entries end up in fstab in the first place and what their contents are18:03
ShaneAH38minimal that's in the description of the issue.  I put the cloud-init file I'm sending to packer.18:11
ShaneAH38mounts:18:11
ShaneAH38 - [ "/dev/disk/azure/scsi1/lun0-part1", "/alt_home", "auto", "defaults,nofail", "0", "2" ]18:11
ShaneAH38 - [ "/dev/disk/azure/scsi1/lun0-part2", "/alt_tmp", "auto", "defaults,nofail", "0", "2" ]18:11
ShaneAH38 - [ "/dev/disk/azure/scsi1/lun0-part3", "/alt_var", "auto", "defaults,nofail", "0", "2" ]18:11
ShaneAH38 - [ "/dev/disk/azure/scsi1/lun0-part4", "/alt_var/log", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]18:11
ShaneAH38 - [ "/dev/disk/azure/scsi1/lun0-part5", "/alt_var/log/audit", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]18:11
ShaneAH38 - [ "/dev/disk/azure/scsi1/lun0-part6", "/alt_var/tmp", "auto", "defaults,nofail,x-mount.mkdir", "0", "2" ]18:11
ShaneAH38and the log file at 2023-08-16 14:19:03,095 shows those being replaced.18:13
ShaneAH382023-08-16 14:19:03,095 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /dev/disk/azure/scsi1/lun0-part1 /home auto defaults,nofail,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part2 /tmp auto defaults,nofail,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part3 /var auto defaults,nofail,comment=cloudconfig 0 2', '-18:13
ShaneAH38/dev/disk/azure/scsi1/lun0-part4 /var/log auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part5 /var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '- /dev/disk/azure/scsi1/lun0-part6 /var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+18:13
ShaneAH38/dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part2 /alt_tmp auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part3 /alt_var auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part4 /alt_var/log auto18:13
ShaneAH38defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part5 /alt_var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part6 /alt_var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2']18:13
minimalthat's the user-data you're using *before* you reboot the VM? or provided to both pre-reboot and post-reboot?18:13
minimalalso please don't post large amounts of text in IRC18:13
ShaneAH38so the reboot does not change any userdata.18:13
ShaneAH38(sry)18:13
ShaneAH38there's not really an opportunity to change it in the packer process.18:14
minimalso during the reboot the fstab will contain entries for the disk's partitions with /var mountpoints and the user-data will contain entries for mounting the *same* disk partitions on /alt_var18:16
ShaneAH38correct.18:16
minimalso are you expecting as the end result the fstab to have both /var and /alt_var entries for the *same* partitions?18:17
ShaneAH38I'm expecting to only have /var not the /alt_ partitions after the reboot.18:17
ShaneAH38I was not expecting cloud-init to run again.18:17
ShaneAH38(well at least not the instance specific modules)18:17
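For reference, the fstab rewrite being discussed (changing the `/alt_` mountpoints to their final locations) can be sketched as below. This is only an illustration of the intent of the Packer "sed" step, not the actual command used in the build; the function name `strip_alt_prefix` is hypothetical, and the sample entries are taken from the cc_mounts.py log line pasted above.

```python
def strip_alt_prefix(fstab_text: str, prefix: str = "/alt_") -> str:
    """Rewrite mountpoints such as /alt_var/log to /var/log.

    Only the second whitespace-separated field (the mountpoint) is touched.
    Comment lines and short lines are passed through unchanged. Rewritten
    lines are re-joined with single spaces, so column alignment is lost.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if not is_comment and len(fields) >= 2 and fields[1].startswith(prefix):
            fields[1] = "/" + fields[1][len(prefix):]
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


before = (
    "/dev/disk/azure/scsi1/lun0-part3 /alt_var auto defaults,nofail,comment=cloudconfig 0 2\n"
    "/dev/disk/azure/scsi1/lun0-part4 /alt_var/log auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2\n"
)
after = strip_alt_prefix(before)
```

Note that the user-data still lists the `/alt_` mountpoints, so if the mounts module runs again it would be expected to put those entries back, which matches the "Changes to fstab" line in the log.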
minimalI assume it is to do with these being Azure ephemeral disks18:19
minimalI see disk_setup also ran18:19
ShaneAH38right, I saw that.18:19
minimalearlier in the logs there are lines relating to the Azure DS that indicate that NTFS cannot be mounted18:22
ShaneAH38so the disks I'm creating are not ephemeral...So are you saying that cloud-init is going to run like a Desired State Configuration system and continuously try and apply its settings?18:22
ShaneAH38Yeah, I saw that also and just haven't chased that down yet.18:22
minimaland there's reference to "sem" files for both disk_setup and mounts not existing18:23
ShaneAH38I'm not messing with the ephemeral stuff18:23
ShaneAH38let me put the fstab in the github issue18:24
minimalthat would help18:24
ShaneAH38done.18:24
ShaneAH38Other than the partitions I'm not trying to be clever18:24
ShaneAH38FWIW I really do appreciate your assistance, I'm not sure how I would proceed so thanks.18:25
ShaneAH1minimal you mention sem files but I'm not following.19:57
meenablackboxsw: if these patches of mine ever get merged https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273122 we could consider running tests on Cirrus CI on FreeBSD, as a stop gap until i got the lxd stuff done20:02
-ubottu:#cloud-init- bugs.freebsd.org bug 273122 in Ports & Packages "lang/python311: backport netlink support" [Affects Only Me, In Progress]20:02
minimalmeena: with #4348 TestGetProcPpid.test_get_proc_ppid_ps is failing for me on Alpine. Seems to be due to choice of options passed to "ps" - I suspect Busybox's ps doesn't support some of them20:08
minimalwill investigate further20:08
minimalShaneAH1: was referring to the lines "DataSourceAzure.py[DEBUG]: Marker "/var/lib/cloud/instances/sem/config_mounts" for module "mounts" did not exist", and the similar line regarding disk_setup20:10
meenaminimal: i checked the man page before confidently declaring it should work everywhere20:18
minimalmeena: it actually seems that is_Linux isn't giving the expected result for Alpine20:18
minimaland so get_proc_ppid_ps is called, not get_proc_ppid_linux20:19
minimal?20:19
minimaloops20:19
meenahow?20:19
minimaldon't know, that's the only explanation as to why get_proc_ppid_ps is being called when I'm building on Alpine20:20
meenaweird.20:21
minimalthe testcase is test_get_proc_ppid_ps20:23
minimaland is_Linux is mocked20:23
minimaland "m_is_Linux.return_value = False" in the testcase so it acts as though it is not linux20:25
minimalso the testcase then calling get_proc_ppid will result in that calling get_proc_ppid_ps20:26
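The effect minimal describes can be reproduced with a small stand-alone sketch. Everything here is a stand-in: the `util` namespace and the function bodies are hypothetical and only mimic the dispatch between the two code paths, they are not cloud-init's real implementations.

```python
import types
from unittest import mock

# Stand-in for the module under test; the real functions live in cloud-init.
util = types.SimpleNamespace()
util.is_Linux = lambda: True  # pretend the host is Linux


def get_proc_ppid_linux(pid):
    return "linux"  # placeholder for the /proc-based lookup


def get_proc_ppid_ps(pid):
    return "ps"  # placeholder for the lookup that shells out to "ps"


def get_proc_ppid(pid):
    """Dispatch on the platform check, as described in the discussion."""
    if util.is_Linux():
        return get_proc_ppid_linux(pid)
    return get_proc_ppid_ps(pid)


def path_taken_with_mocked_platform():
    # Forcing is_Linux to return False selects the "ps" path even on a
    # Linux host; the patch is undone when the with-block exits.
    with mock.patch.object(util, "is_Linux", return_value=False):
        return get_proc_ppid(1)
```

With the platform check mocked this way, the test exercises the "ps" branch on whatever host runs the suite, so the host's own "ps" implementation matters unless the subprocess call is mocked as well.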
ShaneAH1minimal The instance at 2023-08-16 14:14:50,405 is the initial boot of the packer vm.  If I look in the /var/lib/cloud/sem folder I only see one file config_scripts_per_once.once20:28
minimalwhich then uses subp to execute "ps" with the "-p" option which is not recognised by Busybox ps20:30
minimalmeena: you checked the *GNU* ps manpage? or also the Busybox manpage? ;-)20:30
minimalbut I also don't understand why a testcase for a function, get_proc_ppid_ps, that is not intended to be used on Linux is then running that function on Linux...20:31
meenaminimal: busybox20:34
minimalmeena: https://busybox.net/Dwonloads/Busybox.html#ps only shows "-o" and "-T" as supported options20:49
minimaloops, https://busybox.net/downloads/BusyBox.html#ps20:49
minimalbut I don't understand why the "ps" isn't mocked in the testcase20:50
meenabecause /proc isn't mocked either20:56
meenaanyway, it looks like i smashed ps & pscan20:57
ShaneAH1FWIW I removed all of my partition manipulation except for home so now there is only a single partition that I am trying to mount.  I was hoping that maybe something odd was happening given that I was working with var but the same behaviour exists.21:19
dbungertminimal: I'm poking around on the UEFI / grub seed question - what does seed mean there?  I have a guess but wanted to hear your elaboration.21:33
minimaldbungert: there are several ways to seed the Linux kernel's entropy21:49
minimalone of those is from the UEFI itself - a bootloader (if it supports this) can pass it also to the kernel21:49
minimaldbungert: I *think* this might be systemd-boot's equivalent: https://github.com/systemd/systemd/blob/main/src/boot/efi/random-seed.c21:54
minimalmeena: I added "procps-ng" package to the Alpine package's "checkdepends" to install the full version of "ps" during testing21:57
meenaminimal: i think i would rather fix that test to exclude alpine. since it has a working /proc22:01
minimaldon't all Linux distros have a working /proc?22:03
minimaldbungert: this is related: https://patchwork.kernel.org/project/linux-arm-kernel/patch/1475749646-10844-2-git-send-email-ard.biesheuvel@linaro.org/22:17
minimal"Note that the config table could be generated by the EFI stub or by any other UEFI driver or application (e.g., GRUB)"22:18
meenahttps://github.com/canonical/cloud-init/commit/8a70dbc49e609ac900b5f7b5b4358b0ccaf6c4aa#r12470289923:23
-ubottu:#cloud-init- Commit 8a70dbc in canonical/cloud-init "util: Fix get_proc_ppid() on non-Linux systems (#4348)"23:23
meenaminimal: re proc: i guess? I don't know. I spend way too much time in FreeBSD land23:24
