/srv/irclogs.ubuntu.com/2023/03/20/#cloud-init.txt

caribouHello,10:50
caribouSince the drop of Python 3.6 support has been announced for 23.2, will there be any intermediate releases (e.g. 23.1.x) between now and the release of 23.2?10:50
caribouI am just curious whether our PR (https://github.com/canonical/cloud-init/pull/2033) has a chance of being integrated into a Python 3.6-supported version10:50
-ubottu:#cloud-init- Pull 2033 in canonical/cloud-init "Adapt DataSourceScaleway to IPv6" [Open]10:50
caribouotherwise we will look at how we handle the distros with only python 3.6 support10:51
meenacaribou: which distros are those?11:00
caribouFrom our side, AlmaLinux 8, RockyLinux 8, CentOS Stream 8 & Ubuntu Bionic (this last one will be EOS by mid-April)11:01
cariboubtw, when I say "only python 3.6 support" I mean natively on their cloud images. Of course newer versions may be installable but their cloud images are delivered only with 3.611:02
meenawhat's the support timeline for those rhel-ish 8 systems?11:04
cariboudunno, I'll check that out11:21
cariboumeena: AlmaLinux 8: 1 May 202411:23
caribouRockyLinux 8: 31 May 202411:23
caribouCentOS Stream 8: 31 May 202411:23
caribouFWIW: the rhel-ish distros all run some flavor of 22.111:33
meenacaribou: given the timeline for 23.2, that seems fitting then11:44
meenaso, yeah, i would push reviewers to get that in, if you're going to be supporting those OSes for longer than that11:45
falcojrwe actually got enough response to that 3.6 email to the mailing list that we'll likely be extending support for a bit longer. We still need to agree on a timeline though, so if you have any specific concerns, let us know soon11:56
cariboufalcojr: other than the PR cited above, which holmanb is already on, nothing, thanks12:47
blackboxswhey falcojr holmanb aciba I'm seeing intermittent resize2fs errors on lunar. Have we seen those elsewhere? I saw one fly by on azure too but it's not affecting all platforms or all builds on the same platform https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-gce/lastSuccessfulBuild/testReport/junit/tests.integration_tests/test_upgrade/test_clean_boot_of_upgraded_package/15:00
blackboxswI ask because I'd like to try to queue an upload for lunar to help out this MAAS bug with our postinst15:00
blackboxswhttps://bugs.launchpad.net/bugs/200974615:01
-ubottu:#cloud-init- Launchpad bug 2009746 in maas (Ubuntu) "dpkg-reconfigure cloud-init: yaml.load errors during MAAS deloyment of Ubuntu 23.04(Lunar)" [Undecided, Confirmed]15:01
blackboxswthe rest of the jenkins jobs seem to look pretty healthy, with the exception of intermittent errors due to flaky tests that don't persist across test runs15:01
falcojrthat failure doesn't look familiar to me15:02
falcojrSuperblock checksum does not match superblock while trying to open /dev/root15:02
holmanbcaribou: pr 2033 should be able to get into a 3.6 release. I expect we will likely make 23.4 the final release with 3.6 support (will follow up in the thread publicly at some point)15:02
holmanbblackboxsw: I also haven't seen that in our tests before15:06
holmanbsome sort of fs corruption issue15:06
caribouholmanb: thanks for the info !15:34
parideblackboxsw, it doesn't look familiar to me15:42
acibablackboxsw: I have not seen that error previously.15:48
blackboxswaciba: paride: holmanb: thx. ok, I also just confirmed we have nothing in cloud-init that's touched this specific area of code, so nothing that we've introduced there. I'll try kicking off another jenkins run to assert that this isn't a persistent problem on Azure/GCP15:51
holmanbblackboxsw: I reproduced it on GCE15:51
blackboxswI'm not seeing it on LXC VMs 15:51
holmanbblackboxsw: I suspect an image build issue15:51
blackboxswand not seeing on ec215:52
blackboxsw@holmanb do you have /etc/cloud/build.info on your GCE image?15:52
blackboxswI'm launching Azure now to see15:52
holmanbon first boot, the kernel log shows issues with the partition table in that test15:53
holmanbblackboxsw: yes15:53
holmanbbuild_name: server15:53
holmanbserial: 2023031915:53
holmanbhttps://dpaste.org/wbZwR15:53
blackboxsw+1 thx, we may want to take that to the CPC team to see if they are aware of image build issues at the moment15:54
holmanbwhat's odd is that the error only happens on the initial cloud-init run, not the subsequent one - it makes me curious why this isn't seen in other tests15:59
minimalholmanb: does growpart run during the initial cloud-init run? If so, then as part of growing the partition, won't it move the GPT header?16:03
holmanbminimal: I think that's right16:07
holmanbminimal: however the GPT alt header was incorrect before systemd even started, so something was wrong in the image prior to cloud-init doing anything16:07
minimalso could this be the result of a disk image being created at a given size but being put onto a larger disk (so the GPT header at the end of the disk image is no longer at the end of the disk)?16:08
holmanbminimal: that would be one way to cause this16:09
minimali.e. you create a 1GB disk image, so the alt GPT header is around the 1GB mark; then you create a VM with a 2GB disk using that disk image, and the alt GPT header is still around the 1GB mark, at least until growpart runs16:10
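A quick way to check for that condition on a booted instance (a sketch using sgdisk from the gdisk package; not a command from this discussion, and the device name is only an example):

```
# warns if the secondary (backup) GPT header is not at the end of the disk
sudo sgdisk --verify /dev/sda
# if needed, relocate the backup GPT data structures to the end of the disk
sudo sgdisk -e /dev/sda
```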
minimalholmanb: was that kernel output taken from Azure? I'm guessing so16:12
holmanbyes16:12
minimalI'm assuming your disk image is 8GB and 20GB is AFAIK the smallest disk size for Azure16:13
minimalFYI, as I'm building disk images myself, I seem to remember that Azure has rules for disk images to be included in their Marketplace; from memory, one of those is that disk images MUST be 20GB in size lol16:14
minimalwell, as a minimum16:14
holmanbminimal: oops, no this is from GCE, not azure16:14
holmanband a 10G root fs16:15
minimalok, I was guessing those figures in the kernel output were bytes16:15
minimalso then is the disk image 4GB in size?16:16
holmanbI think so16:19
holmanbsince I see:16:20
holmanbkernel: EXT4-fs (sda1): resizing filesystem from 1020155 to 2593019 blocks16:20
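For context, a rough sanity check of those block counts (assuming the default ext4 block size of 4096 bytes) does line up with a ~4GB image grown onto a ~10GB root disk:

```
# 1020155 blocks * 4096 bytes ~= 4.2 GB (the original image filesystem)
echo $(( 1020155 * 4096 / 1000000000 ))   # prints 4
# 2593019 blocks * 4096 bytes ~= 10.6 GB (the resized root filesystem)
echo $(( 2593019 * 4096 / 1000000000 ))   # prints 10
```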
minimalso the kernel message is specific to GPT; you wouldn't see such a message for MBR partitioned/labelled disks.16:22
jchittumi've been following a bit, but i'm not caught up on the case -- it's using `cloud-init` to do image-resize on launch for lunar?16:24
holmanbjchittum: yes, which is default behavior on cloud-init boot16:28
holmanb*for the root partition16:29
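For reference, that default behaviour can be spelled out in user-data; the sketch below reflects what I understand the documented cloud-init defaults to be (growpart plus resize_rootfs), so treat the exact keys and values as an assumption to verify against the cloud-init docs rather than a snippet from this discussion:

```
# write user-data that makes cloud-init's default resize behaviour explicit
cat > user-data <<'EOF'
#cloud-config
growpart:
  mode: auto        # grow the root partition to fill the disk on first boot
  devices: ["/"]
resize_rootfs: true  # then grow the root filesystem (e.g. via resize2fs)
EOF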
jchittummaking sure my brain has it: on first boot, cloud-init expands root to equal the instance disk size, and you're seeing intermittent issues with resize on GCE and Azure16:31
jchittumminimal: azure default disk is 30gb16:31
jchittumHow will it show up on an instance? cloud-init will fail to finish successfully?16:32
holmanbjchittum: I wouldn't say intermittent - I reproduced it on my first GCE boot. I can retry a couple times to check if it is intermittent16:33
blackboxswIt's possible that it only just started showing up on GCE/Azure. Prior runs from 3 days ago looked good https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-gce/46/16:34
jchittumi'm looking at tests from yesterday16:35
blackboxswyep, Azure 3 days ago clean, 2 days ago failure to resize16:35
jchittumWhat will it show up as? cloud-init having a failed status? 16:35
holmanbjchittum: symptoms would be resize2fs fails during boot, cloud-init will otherwise complete but with errors in logs16:36
holmanbjchittum: a quick test would be: `grep resize2fs /var/log/cloud-init.log`16:36
jchittumwell, i can say we don't have a test for that now, so, no, it's not failing our CI16:36
holmanb*requires the colon to filter out other logs for the quick test: `grep 'resize2fs\:' /var/log/cloud-init.log`16:38
blackboxswyes, as holmanb said, cloud-init passes setup but fails to resize the disk, so the disk will be smaller than the available size. Grepping for `Traceback` in /var/log/cloud-init.log shows:16:39
blackboxswcloudinit.subp.ProcessExecutionError: Unexpected error while running command.16:39
blackboxswCommand: ('resize2fs', '/dev/root')16:39
blackboxswor easier `resize2fs: Superblock checksum does not match superblock`16:39
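Putting the two checks together, a quick triage on a suspect instance could look like this (just the commands already mentioned above, collected in one place):

```
# did resize2fs fail during boot?
grep 'resize2fs:' /var/log/cloud-init.log      # look for "Superblock checksum does not match superblock"
# pull the full traceback around the failure
grep -A 3 'Traceback' /var/log/cloud-init.log
```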
blackboxswfull log here for those inside canonical https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-azure/50/testReport/tests.integration_tests/test_upgrade/test_clean_boot_of_upgraded_package/16:40
waldiblackboxsw: how did you manage to break superblock checksums?16:41
blackboxswwaldi: an excellent question. not certain yet. I wish I had the power to do that. I break everything else at home. Just noticed it from our daily jenkins test runners and still digging into the 'how' that happened16:51
blackboxswit definitely feels like a strange corner case failure during image publishing. But we can report here when we find out what gives.16:51
blackboxswit doesn't look like anything cloud-init related on first blush.16:52
blackboxswit could be patches to resize2fs or something during image publish that triggered this case in Ubuntu Lunar. (doesn't affect other stable releases such as bionic, focal, jammy, kinetic etc).16:54
waldiblackboxsw: what comes to mind is https://bugs.debian.org/1023450, where the kernel used the wrong checksum on resize in linux 6.0. but that sounds different16:55
-ubottu:#cloud-init- Debian bug 1023450 in linux "e2fsprogs - Does not agree with kernel on clean state" [Important, Open]16:55
blackboxswand specifically looking in ubuntu's git archive for what's landed in ubuntu/devel distro branches ... I see no changes in e2fsprogs since Feb 17th that were published to Lunar, so it feels like that's not the culprit either (no deltas between successful runs on ubuntu vs failures in the last few days/week)16:59
blackboxswchecking, waldi, to see if the fix you mentioned is already included, just in case17:00
waldidid you check what kernel changes got in, both in the build environment and the final image?17:06
waldibecause i assume you use the kernel ext4 implementation during image build and not the mke2fs one17:07
jchittumfrom an image standpoint, I see no package diffs between 20230316 and 2023031917:07
blackboxswahh, sorry, the bug is kernel-owned (6.0), not e2fsprogs. And Ubuntu 23.04 (Lunar) is 6.1-based.17:08
blackboxswchecking kernel logs for the 'related fix and discussion' to see if it's already in the Ubuntu kernel17:08
holmanbblackboxsw: lunar has 6.1 in -proposed still, which I don't see in use on gcp (this was repro'd on 5.19)17:09
blackboxswholmanb: the orig bug report claims it showed up in version 6.0-1~exp1.... I'm not sure if that issue would have been present in 5.19 .... but we can try to track any backports applied to see17:10
blackboxswI think it's this commit but digging https://github.com/torvalds/linux/commit/9a8c5b0d061517:11
-ubottu:#cloud-init- Commit 9a8c5b0 in torvalds/linux "ext4: update the backup superblock's at the end of the online resize"17:11
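One way to check whether that fix made it into a given Ubuntu kernel (a sketch, assuming a local checkout of the relevant Ubuntu kernel source tree; the grep pattern just matches the upstream commit subject quoted above):

```
# search the ext4 online-resize history for the upstream fix
git log --oneline --grep 'update the backup superblock' -- fs/ext4/resize.c
# compare against the kernel actually running on the affected instance
uname -r
```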
holmanbsure, I'm still curious why this would have only started failing in the last couple of days, however17:14
minimalblackboxsw: one other possibility, the latest release of e2fsprogs, 1.47.0, changes mkfs.ext4 to enable 2 additional options by default: metadata_csum_seed and orphan_file17:16
minimalcould you be creating the ext4 fs for the disk image using this latest version of mkfs.ext4?17:17
holmanbmetadata_csum_seed is actually disabled in lunar17:17
minimalholmanb: does lunar contain e2fsprogs 1.47.0?17:17
holmanbminimal: 1.47.0-1ubuntu117:17
holmanband I saw a message about the orphan file thing in the klog but haven't dug into how it would / could be related17:18
holmanbminimal: https://changelogs.ubuntu.com/changelogs/pool/main/e/e2fsprogs/e2fsprogs_1.47.0-1ubuntu1/changelog17:18
minimalok, because in general the orphan_file option presents a problem - older versions of fsck do not know about it and cannot fsck such a fs... which causes data recovery issues (e.g. if you're using an ISO/recovery OS with an older version)17:19
minimalok, so metadata_csum_seed is disabled, but not orphan_file - that'll cause the fsck "compatibility" issues I mentioned17:21
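A quick way to confirm which of those features actually ended up enabled on a given image's root filesystem (a sketch using dumpe2fs from e2fsprogs; the device name is only an example):

```
# print the superblock and list the enabled ext4 feature flags
sudo dumpe2fs -h /dev/sda1 | grep -i 'features'
# check the "Filesystem features:" line for metadata_csum_seed and orphan_file
```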
jchittumdigging further back, I see e2fsprogs migrated from 1.46.6~rc1-1ubuntu1 to 1.47.0-1ubuntu1 somewhere around 2023031117:35
jchittumthere has not been a kernel SRU since, and azure is still sitting on 5.19.0.1010.9 17:36
jchittumso we are time-aligned as well (sorry, was in meetings; had this up and typed but didn't send)17:37
