/srv/irclogs.ubuntu.com/2023/03/20/#cloud-init.txt

caribou	Hello,	10:50
caribou	Since the drop of support for Python 3.6 has been announced for 23.2, will there be any intermediate releases (i.e. 23.1.x) between now and release of 23.2 ?	10:50
caribou	I am just curious to know if there is any chance that our PR (https://github.com/canonical/cloud-init/pull/2033) may have a chance to be integrated in a python 3.6 supported version	10:50
-ubottu:#cloud-init- Pull 2033 in canonical/cloud-init "Adapt DataSourceScaleway to IPv6" [Open]		10:50
caribou	otherwise we will look at how we handle the distros with only python 3.6 support	10:51
meena	caribou: which distros are those?	11:00
caribou	From our side, AlmaLinux-8, RockyLinux-8, CentOS-stream 8 & Ubuntu Bionic (this last one will be EOS by mid april)	11:01
caribou	btw, when I say "only python 3.6 support" I mean natively on their cloud images. Of course newer versions may be installable but their cloud images are delivered only with 3.6	11:02
meena	what's the support timeline for those rgel-ish 8 systems?	11:04
caribou	dunno, I'll check that out	11:21
caribou	meena: Almalinux 8 : 1 may 2024	11:23
caribou	RockyLinux 8 : 31 may 2024	11:23
caribou	CentOS Stream 8 : 31 may 2024	11:23
caribou	FWIW : the rhel-ish distro all run some flavor of 22.1	11:33
meena	caribou: given the time line for 23.2, that seems fitting then	11:44
meena	so, yeah, i would push reviewers to get that in, if you're going to be supporting those OSes for longer than that	11:45
falcojr	we actually got enough response to that 3.6 email to the mailing list that we'll likely be extending support for a bit longer. We still need to agree on a time line though, so if you have any specific concerns, let us know soon	11:56
caribou	falcojr: other the the PR cited above and for which holmanb is already on it nothing, thanks	12:47
blackboxsw	hey falcojr holmanb aciba I'm seeing intermittent resize2fs errors on lunar. Have we seen those elsewhere? I saw one fly by on azure too but it's not affecting all platforms or all builds on the same platform https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-gce/lastSuccessfulBuild/testReport/junit/tests.integration_tests/test_upgrade/test_clean_boot_of_upgraded_package/	15:00
blackboxsw	I ask because I'd like to try to queue an upload for lunar to help out this MAAS bug with ourt postinst	15:00
blackboxsw	https://bugs.launchpad.net/bugs/2009746	15:01
-ubottu:#cloud-init- Launchpad bug 2009746 in maas (Ubuntu) "dpkg-reconfigure cloud-init: yaml.load errors during MAAS deloyment of Ubuntu 23.04(Lunar)" [Undecided, Confirmed]		15:01
blackboxsw	the rest of the jenkins jobs seem to look pretty healthy with the exception of an intermittent errors due to flaky tests that don't persist acrosss test runs	15:01
falcojr	that failure doesn't look familiar to me	15:02
falcojr	Superblock checksum does not match superblock while trying to open /dev/root	15:02
holmanb	caribou: pr 2033 should be able to get into a 3.6 release. I expect we will likely make 23.4 the final release with 3.6 support (will follow up in the thread publicly at some point)	15:02
holmanb	blackboxsw: I also haven't seen that in our tests before	15:06
holmanb	some sort of fs corruption issue	15:06
caribou	holmanb: thanks for the info !	15:34
paride	blackboxsw, it doesn't looks familiar to me	15:42
aciba	blackboxsw: I did not see that error, previously.	15:48
blackboxsw	aciba:parid holmanb thx. ok I also just confirmed we have nothing in cloud-init that's touched this specific area of code. So nothing that we've introduced there. I'll try kicking off another jenkins run to assert that this isn't a persistent problem on Azure/GCP	15:51
holmanb	blackboxsw: I reproduced it on GCE	15:51
blackboxsw	I'm not seeing it on LXC VMs	15:51
holmanb	blackboxsw: I suspect an image build issue	15:51
blackboxsw	and not seeing on ec2	15:52
blackboxsw	@holmanb do you have /etc/cloud/build.info on your GCE image?	15:52
blackboxsw	I'm launching Azure now to see	15:52
holmanb	on first boot, the kernel log shows issues with the partition table in that test	15:53
holmanb	blackboxsw: yes	15:53
holmanb	build_name: server	15:53
holmanb	serial: 20230319	15:53
holmanb	https://dpaste.org/wbZwR	15:53
blackboxsw	+1 thx we may want to take that by CPC team to see if they are aware of image build issues at the moment	15:54
holmanb	what's odd is that in that the error only happens on the initial cloud-init run, not the subsequent - it makes me curious why this isn't seen in other tests	15:59
minimal	holmanb: does growpart run during the initial cloud-init run? If so then as part of growing the partition won't it move the GPT header?	16:03
holmanb	minimal: I think that's right	16:07
holmanb	minimal: however the GPT alt header was incorrect before systemd even started, so something was wrong in the image prior to cloud-init doing anything	16:07
minimal	so is this not the result of a disk image being created of a given size, but being put onto a disk of a larger size (and so the GPT header at the end of the disk image is no long at the end of the disk?	16:08
holmanb	minimal: that would be one way to cause this	16:09
minimal	i.e. you create a 1GB disk image and so the alt GPT header is around the 1GB mark but you create a VM with a 2GB disk using that disk image and so the alt GPT header is still around the 1GB mark, at least until growpart runs	16:10
minimal	holmanb: was that kernel output taken from Azure? I'm guessing so	16:12
holmanb	yes	16:12
minimal	I'm assuming your disk image is 8GB and 20Gb is AFAIK the small disk size for Azure	16:13
minimal	FYI as I'm building disk images myself I seem to remember that Azure has rules for disk images to be included into their Marketplace, from memory one of those is that disk images MUST be 20Gb in size lol	16:14
minimal	well, as a minimum	16:14
holmanb	minimal: oops, no this is from GCE, not azure	16:14
holmanb	and a 10G root fs	16:15
minimal	ok, I was guessing those figures in the kernel output were bytes	16:15
minimal	so then is the disk image 4Gb in size?	16:16
holmanb	I think so	16:19
holmanb	since I see:	16:20
holmanb	kernel: EXT4-fs (sda1): resizing filesystem from 1020155 to 2593019 blocks	16:20
minimal	so the kernel message is specific to GPT, you wouldn't see such a message for MBR partitioned/labelled disks.	16:22
jchittum	i've been following a bit, but i'm not caught up on the case -- it's using `cloud-init` to do image-resize on launch for lunar?	16:24
holmanb	jchittum: yes, which is default behavior on cloud-init boot	16:28
holmanb	*for the root partition	16:29
jchittum	make sure my brain has it : first boot, cloud-init expands root to equal instance disk size. and you're seeing intermittent issues with resize on GCE and Azure	16:31
jchittum	minimal: azure default disk is 30gb	16:31
jchittum	How will it show up on an instance? cloud-init will fail to finish succesfully?	16:32
holmanb	jchittum: I wouldn't say intermittent - I reproduced it on my first GCE boot. I can retry a couple times to check if it is intermittent	16:33
blackboxsw	It's possible that in only just started showing up. on GCE/Azure. prior runs from 3 days ago looked good https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-gce/46/	16:34
jchittum	i'm looking at tests from yesterday	16:35
blackboxsw	yep, Azure 3 days ago clean, 2 days ago failure to resize	16:35
jchittum	What will it show up as? cloud-init having a failed status?	16:35
holmanb	jchittum: symptoms would be resize2fs fails during boot, cloud-init will otherwise complete but with errors in logs	16:36
holmanb	jchittum: a quick test would be: `grep resize2fs /var/log/cloud-init.log`	16:36
jchittum	well, i can say we don't have a test for that now, so, no, it's not failing our CI	16:36
holmanb	*requires the colon to filter out other logs for the quick test: `grep 'resize2fs\:' /var/log/cloud-init.log`	16:38
blackboxsw	yes as holmanb said, cloud-init passes setup, but fails to resize the disk so the disk will be smaller than availbale size. Grepping for `Traceback` in /var/log/cloud-init.log shows `	16:39
blackboxsw	'cloudinit.subp.ProcessExecutionError: Unexpected error while running '	16:39
blackboxsw	'command.\n'	16:39
blackboxsw	"Command: ('resize2fs', '/dev/root')\n"`	16:39
blackboxsw	or easier `resize2fs: Superblock checksum does not match superblock`	16:39
blackboxsw	full log here for those inside canonical https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-lunar-azure/50/testReport/tests.integration_tests/test_upgrade/test_clean_boot_of_upgraded_package/	16:40
waldi	blackboxsw: how did you manage to break superblock checksums?	16:41
blackboxsw	waldi: an excellent question. not certain yet. I wish I had the power to do that. I break everything else at home. Just noticed it from our daily jenkins test runners and still digging into the 'how' that happened	16:51
blackboxsw	it definitely feels like a strange corner case failure during image publishing. But we can report here when we find out what gives.	16:51
blackboxsw	it doesn't look like anything cloud-init related on first blush.	16:52
blackboxsw	it could be patches to resize2fs or something during image publish that triggered this case in Ubuntu Lunar. (doesn't affect other stable releases such as bionic, focal, jammy, kinetic etc).	16:54
waldi	blackboxsw: what comes to mind is https://bugs.debian.org/1023450, where the kernel used the wrong checksum on resize in linux 6.0. but that sounds different	16:55
-ubottu:#cloud-init- Debian bug 1023450 in linux "e2fsprogs - Does not agree with kernel on clean state" [Important, Open]		16:55
blackboxsw	and specifically looking in ubuntu's git archive for what's landed in ubuntu/devel distro branches .... I see no changes in e2fsprogs since Feb 17th that published to Lunar, so it feels like that's not the culprit either (no deltas between success runs on ubuntu vs failures in the last few days/week)	16:59
blackboxsw	checking waldi to see if your patch is already included, just in case	17:00
waldi	did you check what kernel changes got in, both in the build environment and the final image?	17:06
waldi	because i assume you use the kernel ext4 implementation during image build and not the mke2fs one	17:07
jchittum	from an image standpoint,i see no package diffs between 20230316 and 20230319	17:07
blackboxsw	ahh. sry bug is kernel-owned 6.0 not e2fsprogs. And Ubuntu 23.04 (Lunar) is 6.1 based.	17:08
blackboxsw	checking kernel logs for the 'fix' and seeing if it's already in the kernel	17:08
blackboxsw	checking kernel logs for the 'related fix and discussion' to see if it's already in the Ubuntu kernel.	17:08
holmanb	blackboxsw: lunar has 6.1 in -proposed still, which I don't see in use on gcp (this was repro'd on 5.19)	17:09
blackboxsw	holmanb: orig bug report claimed it showed up in Found in version 6.0-1~exp1.... I'm not sure if that issue would have been present in 5.19 .... but we can try to track any backports applied to see	17:10
blackboxsw	I think it's this commit but digging https://github.com/torvalds/linux/commit/9a8c5b0d0615	17:11
-ubottu:#cloud-init- Commit 9a8c5b0 in torvalds/linux "ext4: update the backup superblock's at the end of the online resize"		17:11
holmanb	sure, I'm still curious why this would have only started failing in the last couple of days, however	17:14
minimal	blackboxsw: one other possibility, the latest release of e2fsprogs, 1.47.0, changes mkfs.ext4 to enable 2 additional options by default: metadata_csum_seed and orphan_file	17:16
minimal	could you be creating the ext4 fs for the disk image using this latest version of mkfs.ext4?	17:17
holmanb	metadata_csum_seed is actually disabled in lunar	17:17
minimal	holmanb: does lunar contain e2fsprogs 1.47.0?	17:17
holmanb	minimal 1.47.0-1ubuntu1	17:17
holmanb	and I saw a message about the orphan file thing in the klog but haven't dug into how it would / could be related	17:18
holmanb	minimal: https://changelogs.ubuntu.com/changelogs/pool/main/e/e2fsprogs/e2fsprogs_1.47.0-1ubuntu1/changelog	17:18
minimal	ok, because in general the orphan_file option presents a problem - older versions of fsck do not know about it and cannot fsck such a fs....which causes data recovery issues (i.e. if you're using a ISO/recovery OS with an older version)	17:19
minimal	ok, so metadata_csum_seed is disable, but not orphan_file - that'll cause the fsck "compatibility" issues I mentioned	17:21
jchittum	digging further back, i see e2fsprogs migrate from 1.46.6~rc1-1ubuntu1 to 1.47.0-1ubuntu1 somewhere around 20230311	17:35
jchittum	there has not been a kernel SRU since, and azure is still sitting on 5.19.0.1010.9	17:36
jchittum	so we are time aligned as well (sorry, was in meetings had this up and typed but didn't send)	17:37

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!