caribou | Hello, | 10:50 |
caribou | Since the drop of support for Python 3.6 has been announced for 23.2, will there be any intermediate releases (i.e. 23.1.x) between now and release of 23.2 ? | 10:50 |
caribou | I am just curious to know if there is any chance that our PR (https://github.com/canonical/cloud-init/pull/2033) may have a chance to be integrated in a python 3.6 supported version | 10:50 |
-ubottu:#cloud-init- Pull 2033 in canonical/cloud-init "Adapt DataSourceScaleway to IPv6" [Open] | 10:50 | |
caribou | otherwise we will look at how we handle the distros with only python 3.6 support | 10:51 |
meena | caribou: which distros are those? | 11:00 |
caribou | From our side, AlmaLinux-8, RockyLinux-8, CentOS Stream 8 & Ubuntu Bionic (this last one will be EOS by mid-April) | 11:01 |
caribou | btw, when I say "only python 3.6 support" I mean natively on their cloud images. Of course newer versions may be installable but their cloud images are delivered only with 3.6 | 11:02 |
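A quick way to confirm what a given cloud image ships natively, assuming shell access to a freshly booted instance (the exact interpreter path can vary by distro):

```sh
# Check the interpreter the image ships by default
python3 --version
# Or print the (major, minor) tuple cloud-init would run under
python3 -c 'import sys; print(sys.version_info[:2])'
```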
meena | what's the support timeline for those rhel-ish 8 systems? | 11:04 |
caribou | dunno, I'll check that out | 11:21 |
caribou | meena: AlmaLinux 8 : 1 May 2024 | 11:23 |
caribou | RockyLinux 8 : 31 May 2024 | 11:23 |
caribou | CentOS Stream 8 : 31 May 2024 | 11:23 |
caribou | FWIW: the rhel-ish distros all run some flavor of 22.1 | 11:33 |
meena | caribou: given the timeline for 23.2, that seems fitting then | 11:44 |
meena | so, yeah, i would push reviewers to get that in, if you're going to be supporting those OSes for longer than that | 11:45 |
falcojr | we actually got enough response to that 3.6 email to the mailing list that we'll likely be extending support for a bit longer. We still need to agree on a timeline though, so if you have any specific concerns, let us know soon | 11:56 |
caribou | falcojr: other than the PR cited above, which holmanb is already on, nothing, thanks | 12:47 |
blackboxsw | hey falcojr holmanb aciba I'm seeing intermittent resize2fs errors on lunar. Have we seen those elsewhere? I saw one fly by on azure too but it's not affecting all platforms or all builds on the same platform https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-gce/lastSuccessfulBuild/testReport/junit/tests.integration_tests/test_upgrade/test_clean_boot_of_upgraded_package/ | 15:00 |
blackboxsw | I ask because I'd like to try to queue an upload for lunar to help out this MAAS bug with our postinst | 15:00 |
blackboxsw | https://bugs.launchpad.net/bugs/2009746 | 15:01 |
-ubottu:#cloud-init- Launchpad bug 2009746 in maas (Ubuntu) "dpkg-reconfigure cloud-init: yaml.load errors during MAAS deloyment of Ubuntu 23.04(Lunar)" [Undecided, Confirmed] | 15:01 | |
blackboxsw | the rest of the jenkins jobs seem to look pretty healthy with the exception of intermittent errors due to flaky tests that don't persist across test runs | 15:01 |
falcojr | that failure doesn't look familiar to me | 15:02 |
falcojr | Superblock checksum does not match superblock while trying to open /dev/root | 15:02 |
holmanb | caribou: pr 2033 should be able to get into a 3.6 release. I expect we will likely make 23.4 the final release with 3.6 support (will follow up in the thread publicly at some point) | 15:02 |
holmanb | blackboxsw: I also haven't seen that in our tests before | 15:06 |
holmanb | some sort of fs corruption issue | 15:06 |
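For anyone reproducing this, a minimal read-only way to inspect the filesystem state is sketched below; /dev/sda1 is a placeholder for whatever device backs /dev/root on the affected instance:

```sh
# Print the primary superblock fields (features, checksum type, block counts)
sudo dumpe2fs -h /dev/sda1
# Force a consistency check; -n opens read-only and answers "no" to all prompts
sudo e2fsck -fn /dev/sda1
```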
caribou | holmanb: thanks for the info ! | 15:34 |
paride | blackboxsw, it doesn't look familiar to me | 15:42 |
aciba | blackboxsw: I did not see that error, previously. | 15:48 |
blackboxsw | aciba: paride holmanb thx. ok I also just confirmed we have nothing in cloud-init that's touched this specific area of code. So nothing that we've introduced there. I'll try kicking off another jenkins run to assert that this isn't a persistent problem on Azure/GCP | 15:51 |
holmanb | blackboxsw: I reproduced it on GCE | 15:51 |
blackboxsw | I'm not seeing it on LXC VMs | 15:51 |
holmanb | blackboxsw: I suspect an image build issue | 15:51 |
blackboxsw | and not seeing on ec2 | 15:52 |
blackboxsw | @holmanb do you have /etc/cloud/build.info on your GCE image? | 15:52 |
blackboxsw | I'm launching Azure now to see | 15:52 |
holmanb | on first boot, the kernel log shows issues with the partition table in that test | 15:53 |
holmanb | blackboxsw: yes | 15:53 |
holmanb | build_name: server | 15:53 |
holmanb | serial: 20230319 | 15:53 |
holmanb | https://dpaste.org/wbZwR | 15:53 |
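For reference, the build serial being compared above comes straight from the image; on an Ubuntu cloud image it can be read with:

```sh
# Ubuntu cloud images record their build name and serial here
cat /etc/cloud/build.info
```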
blackboxsw | +1 thx we may want to take that by CPC team to see if they are aware of image build issues at the moment | 15:54 |
holmanb | what's odd is that the error only happens on the initial cloud-init run, not the subsequent one - it makes me curious why this isn't seen in other tests | 15:59 |
minimal | holmanb: does growpart run during the initial cloud-init run? If so then as part of growing the partition won't it move the GPT header? | 16:03 |
holmanb | minimal: I think that's right | 16:07 |
holmanb | minimal: however the GPT alt header was incorrect before systemd even started, so something was wrong in the image prior to cloud-init doing anything | 16:07 |
minimal | so is this not the result of a disk image being created at a given size, but being put onto a disk of a larger size (and so the GPT header at the end of the disk image is no longer at the end of the disk)? | 16:08 |
holmanb | minimal: that would be one way to cause this | 16:09 |
minimal | i.e. you create a 1GB disk image and so the alt GPT header is around the 1GB mark but you create a VM with a 2GB disk using that disk image and so the alt GPT header is still around the 1GB mark, at least until growpart runs | 16:10 |
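A sketch of how that scenario could be checked and repaired by hand, assuming sgdisk (from the gdisk package) is available and /dev/sda is the disk in question:

```sh
# Report GPT problems, e.g. a backup header that is not at the end of the disk
sudo sgdisk --verify /dev/sda
# Relocate the backup GPT header/table to the true end of the (larger) disk
sudo sgdisk --move-second-header /dev/sda
```

On a normal boot, cloud-init's growpart step is expected to take care of this while growing the root partition.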
minimal | holmanb: was that kernel output taken from Azure? I'm guessing so | 16:12 |
holmanb | yes | 16:12 |
minimal | I'm assuming your disk image is 8GB and 20GB is AFAIK the smallest disk size for Azure | 16:13 |
minimal | FYI as I'm building disk images myself I seem to remember that Azure has rules for disk images to be included in their Marketplace, from memory one of those is that disk images MUST be 20GB in size lol | 16:14 |
minimal | well, as a minimum | 16:14 |
holmanb | minimal: oops, no this is from GCE, not azure | 16:14 |
holmanb | and a 10G root fs | 16:15 |
minimal | ok, I was guessing those figures in the kernel output were bytes | 16:15 |
minimal | so then is the disk image 4GB in size? | 16:16 |
holmanb | I think so | 16:19 |
holmanb | since I see: | 16:20 |
holmanb | kernel: EXT4-fs (sda1): resizing filesystem from 1020155 to 2593019 blocks | 16:20 |
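Those block counts line up with that guess, assuming ext4's default 4096-byte block size:

```sh
echo $((1020155 * 4096))   # ~4.2 GB (~3.9 GiB) -- the filesystem as shipped in the image
echo $((2593019 * 4096))   # ~10.6 GB (~9.9 GiB) -- after growing to the 10G root disk
```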
minimal | so the kernel message is specific to GPT, you wouldn't see such a message for MBR partitioned/labelled disks. | 16:22 |
jchittum | i've been following a bit, but i'm not caught up on the case -- it's using `cloud-init` to do image-resize on launch for lunar? | 16:24 |
holmanb | jchittum: yes, which is default behavior on cloud-init boot | 16:28 |
holmanb | *for the root partition | 16:29 |
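On a healthy boot the effect is easy to eyeball; a minimal check, again with /dev/sda standing in for the root disk:

```sh
# The root partition should span (almost) the whole disk after growpart runs
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sda
# And the filesystem should have been grown to match by resize2fs
df -h /
```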
jchittum | make sure my brain has it : first boot, cloud-init expands root to equal instance disk size. and you're seeing intermittent issues with resize on GCE and Azure | 16:31 |
jchittum | minimal: azure default disk is 30gb | 16:31 |
jchittum | How will it show up on an instance? cloud-init will fail to finish successfully? | 16:32 |
holmanb | jchittum: I wouldn't say intermittent - I reproduced it on my first GCE boot. I can retry a couple times to check if it is intermittent | 16:33 |
blackboxsw | It's possible that it only just started showing up on GCE/Azure. prior runs from 3 days ago looked good https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-gce/46/ | 16:34 |
jchittum | i'm looking at tests from yesterday | 16:35 |
blackboxsw | yep, Azure 3 days ago clean, 2 days ago failure to resize | 16:35 |
jchittum | What will it show up as? cloud-init having a failed status? | 16:35 |
holmanb | jchittum: symptoms would be resize2fs fails during boot, cloud-init will otherwise complete but with errors in logs | 16:36 |
holmanb | jchittum: a quick test would be: `grep resize2fs /var/log/cloud-init.log` | 16:36 |
jchittum | well, i can say we don't have a test for that now, so, no, it's not failing our CI | 16:36 |
holmanb | *requires the colon to filter out other logs for the quick test: `grep 'resize2fs\:' /var/log/cloud-init.log` | 16:38 |
blackboxsw | yes as holmanb said, cloud-init passes setup, but fails to resize the disk so the disk will be smaller than the available size. Grepping for `Traceback` in /var/log/cloud-init.log shows ` | 16:39 |
blackboxsw | 'cloudinit.subp.ProcessExecutionError: Unexpected error while running ' | 16:39 |
blackboxsw | 'command.\n' | 16:39 |
blackboxsw | "Command: ('resize2fs', '/dev/root')\n"` | 16:39 |
blackboxsw | or easier `resize2fs: Superblock checksum does not match superblock` | 16:39 |
blackboxsw | full log here for those inside canonical https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-azure/50/testReport/tests.integration_tests/test_upgrade/test_clean_boot_of_upgraded_package/ | 16:40 |
waldi | blackboxsw: how did you manage to break superblock checksums? | 16:41 |
blackboxsw | waldi: an excellent question. not certain yet. I wish I had the power to do that. I break everything else at home. Just noticed it from our daily jenkins test runners and still digging into the 'how' that happened | 16:51 |
blackboxsw | it definitely feels like a strange corner case failure during image publishing. But we can report here when we find out what gives. | 16:51 |
blackboxsw | it doesn't look like anything cloud-init related on first blush. | 16:52 |
blackboxsw | it could be patches to resize2fs or something during image publish that triggered this case in Ubuntu Lunar. (doesn't affect other stable releases such as bionic, focal, jammy, kinetic etc). | 16:54 |
waldi | blackboxsw: what comes to mind is https://bugs.debian.org/1023450, where the kernel used the wrong checksum on resize in linux 6.0. but that sounds different | 16:55 |
-ubottu:#cloud-init- Debian bug 1023450 in linux "e2fsprogs - Does not agree with kernel on clean state" [Important, Open] | 16:55 | |
blackboxsw | and specifically looking in ubuntu's git archive for what's landed in ubuntu/devel distro branches... I see no changes in e2fsprogs since Feb 17th that were published to Lunar, so it feels like that's not the culprit either (no deltas between success runs on ubuntu vs failures in the last few days/week) | 16:59 |
blackboxsw | checking waldi to see if your patch is already included, just in case | 17:00 |
waldi | did you check what kernel changes got in, both in the build environment and the final image? | 17:06 |
waldi | because i assume you use the kernel ext4 implementation during image build and not the mke2fs one | 17:07 |
jchittum | from an image standpoint,i see no package diffs between 20230316 and 20230319 | 17:07 |
blackboxsw | ahh. sry, that bug is against the kernel (6.0), not e2fsprogs. And Ubuntu 23.04 (Lunar) is 6.1 based. | 17:08 |
blackboxsw | checking kernel logs for the 'fix' and seeing if it's already in the kernel | 17:08 |
blackboxsw | checking kernel logs for the 'related fix and discussion' to see if it's already in the Ubuntu kernel. | 17:08 |
holmanb | blackboxsw: lunar has 6.1 in -proposed still, which I don't see in use on gcp (this was repro'd on 5.19) | 17:09 |
blackboxsw | holmanb: the orig bug report claimed it showed up in version 6.0-1~exp1... I'm not sure if that issue would have been present in 5.19... but we can try to track any backports applied to see | 17:10 |
blackboxsw | I think it's this commit but digging https://github.com/torvalds/linux/commit/9a8c5b0d0615 | 17:11 |
-ubottu:#cloud-init- Commit 9a8c5b0 in torvalds/linux "ext4: update the backup superblock's at the end of the online resize" | 17:11 | |
holmanb | sure, I'm still curious why this would have only started failing in the last couple of days, however | 17:14 |
minimal | blackboxsw: one other possibility, the latest release of e2fsprogs, 1.47.0, changes mkfs.ext4 to enable 2 additional options by default: metadata_csum_seed and orphan_file | 17:16 |
minimal | could you be creating the ext4 fs for the disk image using this latest version of mkfs.ext4? | 17:17 |
holmanb | metadata_csum_seed is actually disabled in lunar | 17:17 |
minimal | holmanb: does lunar contain e2fsprogs 1.47.0? | 17:17 |
holmanb | minimal 1.47.0-1ubuntu1 | 17:17 |
holmanb | and I saw a message about the orphan file thing in the klog but haven't dug into how it would / could be related | 17:18 |
holmanb | minimal: https://changelogs.ubuntu.com/changelogs/pool/main/e/e2fsprogs/e2fsprogs_1.47.0-1ubuntu1/changelog | 17:18 |
minimal | ok, because in general the orphan_file option presents a problem - older versions of fsck do not know about it and cannot fsck such a fs... which causes data recovery issues (i.e. if you're using an ISO/recovery OS with an older version) | 17:19 |
minimal | ok, so metadata_csum_seed is disabled, but not orphan_file - that'll cause the fsck "compatibility" issues I mentioned | 17:21 |
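A hedged way to check whether a given image was built with those newer defaults (again with /dev/sda1 standing in for the root device):

```sh
# Look for orphan_file / metadata_csum_seed among the enabled features
sudo dumpe2fs -h /dev/sda1 | grep -i 'filesystem features'
# Version of e2fsprogs installed on the instance (the image build host may differ)
dpkg -s e2fsprogs | grep -i '^version'
# mke2fs defaults (base_features) live here on Debian/Ubuntu
grep -A5 '\[defaults\]' /etc/mke2fs.conf
```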
jchittum | digging further back, i see e2fsprogs migrated from 1.46.6~rc1-1ubuntu1 to 1.47.0-1ubuntu1 somewhere around 20230311 | 17:35 |
jchittum | there has not been a kernel SRU since, and azure is still sitting on 5.19.0.1010.9 | 17:36 |
jchittum | so we are time-aligned as well (sorry, was in meetings; had this up and typed but didn't send) | 17:37 |