[03:36] <williamo> I have multiple u20.04 VMs that refuse to boot kernel versions after 5.4.0-80, what do?
[03:36] <lotuspsychje> could try !HWE williamo 
[03:36] <williamo> !HWE
[03:37] <lotuspsychje> williamo: or, investigate why exactly the kernels after -80 don't wanna boot
[03:37] <williamo> every kernel up to and including 5.4.0-80 boots and works. kernels after that seem to fail to read the disks.
[03:38] <lotuspsychje> !info linux-image-generic focal
[03:38] <williamo> running amd64
[03:39] <lotuspsychje> even the latest -90 one doesnt boot williamo ?
[03:39] <williamo> the furthest I've gotten is a busybox prompt after tossing a 'break' into the kernel boot line.
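(for readers following along: the 'break' williamo mentions is an initramfs-tools debug parameter. A rough sketch of how it's typically used on a stock Ubuntu install; the break=mount stage is an assumption about where the disk probing happens, other stages like break=premount exist too:)

```shell
# At the GRUB menu, press 'e' on the Ubuntu entry and append break=mount
# to the line starting with 'linux', e.g.:
#   linux /boot/vmlinuz-5.4.0-90-generic root=UUID=... ro break=mount
# Then boot with Ctrl-x. The initramfs drops to a busybox shell just
# before mounting the root filesystem, where you can poke at the disks:
(initramfs) cat /proc/partitions
(initramfs) dmesg | tail -n 50
```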
[03:39] <williamo> correct. 5.4.0-90 does not boot
[03:39] <lotuspsychje> thats indeed a weird one
[03:40] <lotuspsychje> williamo: can you still grab a dmesg from a failed boot?
[03:41] <williamo> not sure how I would. can't read/write/touch the disks without it dying, don't have networking as far as I can tell
[03:41] <williamo> best I can do is screenshot from ESX console
[03:42] <lotuspsychje> williamo: or if you can text boot with F1 at boot and see how far you can go
[03:42] <lotuspsychje> maybe we'll get lucky and see at which point it gets stuck
[03:42] <williamo> I do have VMs that do boot up after -80, the only difference on the esx side is if there are a mixture of encrypted/nonencrypted disks
[03:43] <williamo> if all the disks are either encrypted or unencrypted in esx, it boots. 
[03:44] <williamo> it looks like it gets stuck at the point where it is trying to talk to the disks and get them sorted
[03:45] <lotuspsychje> if you can log or screenshot something, that could help the volunteers trace what the bottleneck is
[03:46] <williamo> https://imgur.com/dUz2966
[03:48] <lotuspsychje> udev database hmm
[03:50] <williamo> https://imgur.com/crPuuVl It gets about this far, and it just sits there till about 240s for those timeouts to occur
[03:51] <williamo> and I think I remember hearing from someone else that 18.04 is having the same issue
[03:52] <lotuspsychje> not sure myself williamo, dont think i saw that udev error before, dont find any related bugs right away either
[03:53] <lotuspsychje> you think you could try a !hwe kernel for a test?
[03:53] <lotuspsychje> see if the 5.11 and higher series influence this
[03:53] <williamo> sure
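(the !hwe factoid referenced above expands to roughly the following; a sketch assuming a stock focal install, where the HWE metapackage pulls in the 5.11+ kernel series mentioned:)

```shell
# Install the focal hardware-enablement (HWE) kernel stack, then reboot
# into the new kernel
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-20.04
sudo reboot
```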
[03:57] <williamo> I'll wait for the full timeout, but I'm getting the same issue
[03:59] <lotuspsychje> ouch
[04:04] <williamo>  https://imgur.com/mOJaGQI https://imgur.com/A9hLpMh
[04:05] <lotuspsychje> williamo: maybe we should file a new !bug on this, and attach all your logs you shared to it
[04:06] <lotuspsychje> williamo: can you reproduce this on 1 machine only or several?
[04:07] <williamo> I can reproduce this on other VMs that have a similar configuration with 1 of 3 disks esx encrypted. as far as the vm guest is concerned, this should be 100% transparent and shouldn't matter, but for some reason it does.
[04:07] <williamo> or 2/3 disks encrypted
[04:09] <lotuspsychje> try ubuntu-bug linux to start your bug file, then add your story & logs to it and ill ping some volunteers about it later, see if they find something
[04:09] <williamo> why can't we just encrypt/unencrypt all? we have stupid amounts of data that has to be encrypted, and even more dumb amounts of data that is too large to be encrypted.
[04:11] <williamo> and now to type this uuid url in by hand.
[04:23] <williamo> hooray, first bug report. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1950712
[04:23] <williamo> now to go bug that other person for info about 18.04 that might be doing the same stuff
[04:24] <williamo> in 8 hours
[04:32] <lotuspsychje> great work williamo !
[04:34] <lotuspsychje> hang around a bit and we can see if more volunteers are awake, if we can trace some more
[04:56] <lotuspsychje> tomreyn: can you take a look for williamo see if you got any ideas to try? ^
[05:03] <tomreyn> lotuspsychje: i can try ;)
[05:04] <tomreyn> williamo: since this seems to be esx related, have you verified that you're running the latest / a supported, fully patched esx server version?
[05:04] <tomreyn> williamo: have you tried booting without those extra kernel parameters? why are you using them, are they actually needed?
[05:08] <tomreyn> there's a kernel oops on your screenshot, but (a) it may not be the first, and thus a side effect of a former one, (b) we can't see the actually failing module there, since it already scrolled off the screen. you may want to look into some kernel debugging options to get a better idea of what's failing there.
[05:08] <tomreyn> https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks
[05:10] <tomreyn> the easiest will likely be to set up a serial console
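(for reference, a typical serial-console setup on Ubuntu 20.04 looks like the sketch below; ttyS0 at 115200 is an assumption, but it matches what an ESXi virtual serial port usually presents as:)

```shell
# /etc/default/grub -- after editing, apply with: sudo update-grub
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
```

with that in place, kernel messages from a failed boot land on the virtual serial port instead of scrolling off the VM console.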
[05:20] <tomreyn> searching the web for the err message printed on your screen, "device [..] not initialized in udev database even after" points to a bug in lvm2, but those are from around 2019, and this bug has since been fixed in debian, and probably ubuntu, too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925247 ; additionally, those are first triggered by grub-mkconfig during OS install, not during boot.
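(a quick way to check whether the installed lvm2 predates that fix; the 2.03.02 cutoff is my reading of the linked Debian bug, treat it as an assumption to verify against the bug log:)

```shell
# lvm2_predates_fix VERSION -> exit 0 if VERSION sorts strictly before
# the release assumed to contain the fix from Debian bug #925247.
lvm2_predates_fix() {
    fixed="2.03.02"
    # sort -V does version-aware ordering; the first line is the older one
    older=$(printf '%s\n%s\n' "$1" "$fixed" | sort -V | head -n1)
    [ "$older" = "$1" ] && [ "$1" != "$fixed" ]
}

# On the affected VM you would feed it the real installed version, e.g.:
#   lvm2_predates_fix "$(dpkg-query -W -f='${Version}' lvm2)"
lvm2_predates_fix "2.02.176" && echo "2.02.176 predates the fix"
```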
[05:38] <tomreyn> in the 5.4.0-80 boot dmesg, you have multiple "BAR 13: no space for [io  size 0x1000]" errors. that's memory allocation failing for the VMware PCI Express Root Ports.
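(aside: BAR allocation failures like that are sometimes worked around with the pci=realloc kernel parameter, which lets the kernel reassign PCI bridge resources; whether it helps under ESXi is untested here, so treat this as something to try rather than a known fix:)

```shell
# /etc/default/grub: add pci=realloc to the existing kernel command line,
# keeping whatever options are already there, then run: sudo update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc"
```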
[09:33] <jamespage> icey, coreycb: doko fixed up greenlet/eventlet sufficiently for packages to build and unit test OK so I'm dealing with the openstack packages that need to drop depends on python3-crypto
[09:35] <icey> fantastic
[13:29] <coreycb> jamespage: ah thanks for doing that! it was on my list and hadn't gotten to it.
[13:53] <icey> jamespage: see that ceph risc failure? best fail reason ever:  "/usr/bin/ar: unable to copy file '../../lib/librgw_a.a'; reason: Success"
[13:54] <jamespage> coreycb: np
[13:54] <jamespage> icey: gotta live a risc failure after 23 hours of building...
[13:54] <jamespage> live/love rather
[13:55] <icey> jamespage: and only 85% done :-P
[14:59] <xnox> /usr/bin/ar: unable to copy file '../../lib/librgw_a.a'; reason: Success => i feel like retrying ceph riscv64 build
[15:01] <ginggs> nothing succeeds like success
[15:42] <williamo> tomreyn, only 1/3 disks has LVM, the other 2 are also showing issues with accessing data
[16:31] <williamo> frustrating, esx will not output serial console to text files for encrypted VMs
[22:32] <tomreyn> williamo: did you verify that esx is up to date, though?