/srv/irclogs.ubuntu.com/2021/11/12/#ubuntu-server.txt

=== genii is now known as genii-core
[03:36] <williamo> I have multiple u20.04 VMs that refuse to boot kernel versions after 5.4.0-80, what do?
[03:36] <lotuspsychje> could try !HWE williamo
[03:36] <williamo> !HWE
[03:36] <ubottu> The Ubuntu LTS enablement stacks provide newer kernel and X support for existing LTS releases, see https://wiki.ubuntu.com/Kernel/LTSEnablementStack
[03:37] <lotuspsychje> williamo: or, investigate why the -80 kernels don't wanna boot exactly
[03:37] <williamo> every kernel up to and including 5.4.0-80 boots and works. kernels after that seem to fail to read the disks.
[03:38] <lotuspsychje> !info linux-image-generic focal
[03:38] <ubottu> linux-image-generic (5.4.0.90.94, focal): Generic Linux kernel image. In component main, is optional. Built by linux-meta. Size 3 kB / 18 kB. (Only available for amd64, armhf, arm64, powerpc, ppc64el, s390x.)
[03:38] <williamo> running amd64
[03:39] <lotuspsychje> even the latest -90 one doesn't boot williamo ?
[03:39] <williamo> the furthest I've gotten is a busybox prompt after tossing a 'break' into the kernel boot line.
[03:39] <williamo> correct. 5.4.0-90 does not boot
[03:39] <lotuspsychje> that's indeed a weird one
[03:40] <lotuspsychje> williamo: can you still grab a dmesg from a failed boot one?
[03:41] <williamo> not sure how I would. can't read/write/touch the disks without it dying, don't have networking as far as I can tell
[03:41] <williamo> best I can do is screenshot from ESX console
[03:42] <lotuspsychje> williamo: or if you can textboot F1 at boot and see how far you can go
[03:42] <lotuspsychje> maybe we get lucky and see at which point it gets stuck
[03:42] <williamo> I do have VMs that do boot up after -80, the only difference on the esx side is if there is a mixture of encrypted/nonencrypted disks
[03:43] <williamo> if all the disks are either encrypted or unencrypted in esx, it boots.
[03:44] <williamo> it looks like it gets stuck at the point where it is trying to talk to the disks and get them sorted
[03:45] <lotuspsychje> if you can log or screenshot something, that could be helpful for the volunteers to help tracing what's the bottleneck
[03:46] <williamo> https://imgur.com/dUz2966
[03:48] <lotuspsychje> udev database hmm
[03:50] <williamo> https://imgur.com/crPuuVl It gets about this far, and it just sits there till about 240s for those timeouts to occur
[03:51] <williamo> and I think I remember hearing from someone else that 18.04 is having the same issue
[03:52] <lotuspsychje> not sure myself williamo, don't think i saw that udev error before, don't find any related bugs right away either
[03:53] <lotuspsychje> you think you could try a !hwe kernel for a test?
[03:53] <lotuspsychje> see if the 5.11 and higher series influence this
[03:53] <williamo> sure
[03:57] <williamo> I'll wait for the full timeout, but I'm getting the same issue
[03:59] <lotuspsychje> ouch
[04:04] <williamo> https://imgur.com/mOJaGQI https://imgur.com/A9hLpMh
[04:05] <lotuspsychje> williamo: maybe we should file a new !bug on this, and attach all the logs you shared to it
[04:06] <lotuspsychje> williamo: can you reproduce this on 1 machine only or several?
[04:07] <williamo> I can reproduce this on other VMs that have a similar configuration with 1 of 3 disks esx encrypted. as far as the vm guest is concerned, this should be 100% transparent and shouldn't matter, but for some reason it does.
[04:07] <williamo> or 2/3 disks encrypted
[04:09] <lotuspsychje> try ubuntu-bug linux to start your bug file, then add your story & logs to it and i'll ping some volunteers about it later, see if they find something
[04:09] <williamo> why can't we just encrypt/unencrypt all? we have stupid amounts of data that has to be encrypted, and even more dumb amounts of data that is too large to be encrypted.
[04:11] <williamo> and now to type this uuid url in by hand.
[04:23] <williamo> hooray, first bug report. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1950712
[04:23] <ubottu> Launchpad bug 1950712 in linux (Ubuntu) "ESX u20.04 VM refuses to boot after upgrading to kernel 5.4.0-81 and up" [Undecided, New]
[04:23] <williamo> now to go bug that other person for info about 18.04 that might be doing the same stuff
[04:24] <williamo> in 8 hours
=== ^wuseman is now known as Guest6898
[04:32] <lotuspsychje> great work williamo !
[04:34] <lotuspsychje> hang around a bit and we can see if more volunteers are awake, if we can trace some more
[04:56] <lotuspsychje> tomreyn: can you take a look for williamo see if you got any ideas to try? ^
[05:03] <tomreyn> lotuspsychje: i can try ;)
[05:04] <tomreyn> williamo: since this seems to be esx related, have you verified that you're running the latest / a supported, fully patched esx server version?
[05:04] <tomreyn> williamo: have you tried booting without those extra kernel parameters? why are you using them, are they actually needed?
[05:08] <tomreyn> there's a kernel oops on your screenshot, but (a) it may not be the first, and thus a side effect of a former one, (b) we can't see the actually failing module there, since it already scrolled off the screen. you may want to look into some kernel debugging options to get a better idea of what's failing there.
[05:08] <tomreyn> https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks
[05:10] <tomreyn> the easiest will likely be to set up a serial console
[05:20] <tomreyn> searching the web for the err message printed on your screen, "device [..] not initialized in udev database even after" points to a bug in lvm2, but those are from around 2019, and this bug has since been fixed in debian, and probably ubuntu, too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925247 ; additionally, those are first triggered by grub-mkconfig during OS install, not during boot.
[05:20] <ubottu> Debian bug 925247 in fai-client "setting up lvm2 hangs for a long time" [Normal, Open]
[05:38] <tomreyn> in the 5.4.0-80 boot dmesg, you have multiple "BAR 13: no space for [io  size 0x1000]" errors. that's memory allocation failing for the VMware PCI Express Root Ports.
=== PC_ is now known as deksar
=== calcmandan_ is now known as calcmandan
[09:33] <jamespage> icey, coreycb: doko fixed up greenlet/eventlet sufficiently for packages to build and unit test OK so I'm dealing with the openstack packages that need to drop depends on python3-crypto
[09:35] <icey> fantastic
=== cpaelzer_ is now known as cpaelzer
[13:29] <coreycb> jamespage: ah thanks for doing that! it was on my list and hadn't gotten to it.
[13:53] <icey> jamespage: see that ceph risc failure? best fail reason ever:  "/usr/bin/ar: unable to copy file '../../lib/librgw_a.a'; reason: Success"
[13:54] <jamespage> coreycb: np
[13:54] <jamespage> icey: gotta live a risc failure after 23 hours of building...
[13:54] <jamespage> live/love rather
[13:55] <icey> jamespage: and only 85% done :-P
[14:59] <xnox> /usr/bin/ar: unable to copy file '../../lib/librgw_a.a'; reason: Success => i feel like retrying ceph riscv64 build
[15:01] <ginggs> nothing succeeds like success
=== genii-core is now known as genii
[15:42] <williamo> tomreyn, only 1/3 disks has LVM, the other 2 are also showing issues with accessing data
[16:31] <williamo> frustrating, esx will not output serial console to text files for encrypted VMs
[22:32] <tomreyn> williamo: did you verify that esx is up to date, though?
=== genii is now known as genii-core
