=== genii is now known as genii-core | ||
williamo | I have multiple u20.04 VMs that refuse to boot kernel versions after 5.4.0-80, what do? | 03:36 |
---|---|---|
lotuspsychje | could try !HWE williamo | 03:36 |
williamo | !HWE | 03:36 |
ubottu | The Ubuntu LTS enablement stacks provide newer kernel and X support for existing LTS releases, see https://wiki.ubuntu.com/Kernel/LTSEnablementStack | 03:36 |
lotuspsychje | williamo: or, investigate exactly why the post -80 kernels don't want to boot | 03:37 |
williamo | every kernel up to and including 5.4.0-80 boots and works. kernels after that seem to fail to read the disks. | 03:37 |
lotuspsychje | !info linux-image-generic focal | 03:38 |
ubottu | linux-image-generic (5.4.0.90.94, focal): Generic Linux kernel image. In component main, is optional. Built by linux-meta. Size 3 kB / 18 kB. (Only available for amd64, armhf, arm64, powerpc, ppc64el, s390x.) | 03:38 |
williamo | running amd64 | 03:38 |
lotuspsychje | even the latest -90 one doesn't boot williamo ? | 03:39 |
williamo | the furthest I've gotten is a busybox prompt after tossing a 'break' into the kernel boot line. | 03:39 |
williamo | correct. 5.4.0-90 does not boot | 03:39 |
lotuspsychje | that's indeed a weird one | 03:39 |
lotuspsychje | williamo: can you still grab a dmesg from a failed boot one? | 03:40 |
williamo | not sure how I would. can't read/write/touch the disks without it dying, don't have networking as far as I can tell | 03:41 |
williamo | best I can do is screenshot from ESX console | 03:41 |
lotuspsychje | williamo: or if you can textboot F1 at boot and see how far you can go | 03:42 |
lotuspsychje | maybe we get lucky and see at which point it gets stuck | 03:42 |
williamo | I do have VMs that do boot up after -80, the only difference on the esx side is if there are a mixture of encrypted/nonencrypted disks | 03:42 |
williamo | if all the disks are either encrypted or unencrypted in esx, it boots. | 03:43 |
williamo | it looks like it gets stuck at the point where it is trying to talk to the disks and get them sorted | 03:44 |
lotuspsychje | if you can log or screenshot something, that could help the volunteers trace what the bottleneck is | 03:45 |
williamo | https://imgur.com/dUz2966 | 03:46 |
lotuspsychje | udev database hmm | 03:48 |
williamo | https://imgur.com/crPuuVl It gets about this far, and it just sits there till about 240s for those timeouts to occur | 03:50 |
williamo | and I think I remember hearing from someone else that 18.04 is having the same issue | 03:51 |
lotuspsychje | not sure myself williamo, don't think I saw that udev error before, and I don't find any related bugs right away either | 03:52 |
lotuspsychje | you think you could try a !hwe kernel for a test? | 03:53 |
lotuspsychje | see if the 5.11 and higher series influence this | 03:53 |
williamo | sure | 03:53 |
williamo | I'll wait for the full timeout, but I'm getting the same issue | 03:57 |
lotuspsychje | ouch | 03:59 |
williamo | https://imgur.com/mOJaGQI https://imgur.com/A9hLpMh | 04:04 |
lotuspsychje | williamo: maybe we should file a new !bug on this, and attach all your logs you shared to it | 04:05 |
lotuspsychje | williamo: can you reproduce this on 1 machine only or several? | 04:06 |
williamo | I can reproduce this on other VMs that have a similar configuration, with 1 of 3 disks ESX-encrypted. As far as the VM guest is concerned, this should be 100% transparent and shouldn't matter, but for some reason it does. | 04:07 |
williamo | or 2/3 disks encrypted | 04:07 |
lotuspsychje | try ubuntu-bug linux to start your bug report, then add your story & logs to it and I'll ping some volunteers about it later, see if they find something | 04:09 |
williamo | why can't we just encrypt/unencrypt all? we have stupid amounts of data that has to be encrypted, and even more dumb amounts of data that is too large to be encrypted. | 04:09 |
williamo | and now to type this uuid url in by hand. | 04:11 |
williamo | hooray, first bug report. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1950712 | 04:23 |
ubottu | Launchpad bug 1950712 in linux (Ubuntu) "ESX u20.04 VM refuses to boot after upgrading to kernel 5.4.0-81 and up" [Undecided, New] | 04:23 |
williamo | now to go bug that other person for info about 18.04 that might be doing the same stuff | 04:23 |
williamo | in 8 hours | 04:24 |
=== ^wuseman is now known as Guest6898 | ||
lotuspsychje | great work williamo ! | 04:32 |
lotuspsychje | hang around a bit and we can see if more volunteers are awake, if we can trace some more | 04:34 |
lotuspsychje | tomreyn: can you take a look for williamo see if you got any ideas to try? ^ | 04:56 |
tomreyn | lotuspsychje: i can try ;) | 05:03 |
tomreyn | williamo: since this seems to be esx related, have you verified that you're running the latest / a supported, fully patched esx server version? | 05:04 |
tomreyn | williamo: have you tried booting without those extra kernel parameters? why are you using them, are they actually needed? | 05:04 |
tomreyn | there's a kernel oops on your screenshot, but (a) it may not be the first, and thus a side effect of a former one, (b) we can't see the actually failing module there, since it already scrolled off the screen. you may want to look into some kernel debugging options to get a better idea of what's failing there. | 05:08 |
tomreyn | https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks | 05:08 |
tomreyn | the easiest will likely be to set up a serial console | 05:10 |
tomreyn | searching the web for the err message printed on your screen, "device [..] not initialized in udev database even after" points to a bug in lvm2, but those are from around 2019, and this bug has since been fixed in debian, and probably ubuntu, too: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925247 ; additionally, those are first triggered by grub-mkconfig during OS install, not during boot. | 05:20 |
ubottu | Debian bug 925247 in fai-client "setting up lvm2 hangs for a long time" [Normal, Open] | 05:20 |
tomreyn | in the 5.4.0-80 boot dmesg, you have multiple "BAR 13: no space for [io size 0x1000]" errors. that's I/O space allocation failing for the VMware PCI Express Root Ports. | 05:38 |
=== PC_ is now known as deksar | ||
=== calcmandan_ is now known as calcmandan | ||
jamespage | icey, coreycb: doko fixed up greenlet/eventlet sufficiently for packages to build and unit test OK so I'm dealing with the openstack packages that need to drop depends on python3-crypto | 09:33 |
icey | fantastic | 09:35 |
=== cpaelzer_ is now known as cpaelzer | ||
coreycb | jamespage: ah thanks for doing that! it was on my list and hadn't gotten to it. | 13:29 |
icey | jamespage: see that ceph risc failure? best fail reason ever: "/usr/bin/ar: unable to copy file '../../lib/librgw_a.a'; reason: Success" | 13:53 |
jamespage | coreycb: np | 13:54 |
jamespage | icey: gotta live a risc failure after 23 hours of building... | 13:54 |
jamespage | live/love rather | 13:54 |
icey | jamespage: and only 85% done :-P | 13:55 |
xnox | /usr/bin/ar: unable to copy file '../../lib/librgw_a.a'; reason: Success => i feel like retrying ceph riscv64 build | 14:59 |
ginggs | nothing succeeds like success | 15:01 |
=== genii-core is now known as genii | ||
williamo | tomreyn, only 1 of 3 disks has LVM, the other 2 are also showing issues with accessing data | 15:42 |
williamo | frustrating, esx will not output serial console to text files for encrypted VMs | 16:31 |
tomreyn | williamo: did you verify that esx is up to date, though? | 22:32 |
=== genii is now known as genii-core |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!