/srv/irclogs.ubuntu.com/2024/02/09/#ubuntu-kernel.txt

=== chris14_ is now known as chris14
lilo_booterhi - mentioned this a couple of months back, but with work requirements, i have been unable to look into it - basically, since the release of kernel 6.5 on ubuntu 22.04, i have been unable to boot my main computer on the current kernel - i have been using kernel-6.2.0-39-generic since the problem started (and this has been fine so far)10:45
arighililo_booter, can you elaborate more on "unable to boot"? do you see some errors in the console or just a black screen or...?10:50
lilo_booteri have time to diagnose the issue now - the first thing is that i currently have 6.5.0-17 and it doesn't boot in either normal or recovery mode - some normal diagnostics are shown before it reboots to the bios - there is no record of the attempt to boot in journalctl, and if there is any error reported in the diagnostics it's impossible to say as it just clears the screen immediately - i10:50
lilo_booterhave tried filming it, but that is not showing any errors 10:50
arighican you provide a brief description of your hardware / system? is that a standard Ubuntu installation?10:51
lilo_booterstandard 22.04 - ryzen 9, 64GB, nvidia graphics, 2 SSD drives10:52
lilo_bootercan reboot into the old kernel and get a more detailed report10:53
lilo_bootermachine is back up - are there specific commands you want me to run?10:55
lilo_booterCPU: AMD Ryzen 9 5950X (32) @ 3.400GHz, GPU: NVIDIA GeForce RTX 3070 Ti, GPU: NVIDIA GeForce RTX 3070 Ti10:57
ogra_you should probably not use lilo then 😉10:58
ogra_(SCNR)10:58
lilo_booter:)10:59
lilo_booterthe nick just shows how long ago it was when i first connected to irc :)11:00
lilo_booteri guess you guys have a jira like system for reporting issues like this? can you provide any pointers as to the kind of stuff i would need to provide in creating a ticket?11:03
arighililo_booter, are you using nouveau by chance? asking because we had some recent boot issues reported that were caused by nouveau11:03
lilo_booternope - nvidia drivers11:04
arighiI tried... :D11:04
lilo_booterappreciate any help at all :) - i'm ok for now - i'm just worried that the working kernel might get removed at some point...11:06
arighiwe have a quite similar config, my desktop is an AMD Ryzen 7 5800X with an NVIDIA GeForce RTX 307011:06
arighibut apparently I can boot any 6.5, 6.7 and 6.8 kernel11:07
lilo_booteri guess i can try creating a usb boot disk for the latest 22.04 to see if that boots - but i guess that puts me in the nouveau domain11:08
arighihave you tried to boot a fresh ubuntu image, let's say from USB, just to see if the problem is the kernel itself or maybe a specific setting in your system?11:08
arighililo_booter, great minds think alike :)11:08
lilo_booterseems that way :) - thanks11:08
lilo_booterok - will break for a bit - few errands to run - but will continue in that line of attack this afternoon - thanks for the help so far arighi :)11:09
arighiI'd try that anyway, it should be a fairly easy and quick test and if it boots we know that we need to dig in your system settings11:09
arighililo_booter, np, thanks for reporting this11:09
lilo_booterno worries11:10
lilo_booterhmm - i see this - [Firmware Bug]: TPM Final Events table missing or invalid12:40
lilo_booterdevice-mapper - duplicate ima measurements will not be recorded in the ima log12:42
lilo_booterplatform eisa.0: EISA: cannot allocate resource for mainboard plus same for EISA slots 1 through 8 - 0 eisa cards detected12:44
lilo_booteracpi PNPOC14 01 through 04 duplicate WMI GUID12:45
lilo_booterusb: port power management may be unreliable12:46
lilo_booterusb 3-4: config 1 has an invalid interface number: 2 but max is 112:46
lilo_booterGPT:primary header thinks Alt. header is not at end of disk - use gnu parted to correct - against sda which is main hd i guess12:50
lilo_booterthis is running from a second internal drive - 5.15.0 came up - but it dumped me in a recovery shell at the end of the boot 12:55
lilo_bootertrying from fresh usb now, booting by way of bios boot - selected manual override for the device - got grub menu, selected "try or install" - "republic of gamers" logo shows up, usb drive is continuing to be accessed - keyboard is unresponsive - cannot remove logo to see diagnostics13:17
lilo_booterwill leave it for a while, but does appear to be hung13:20
lilo_booteryeah - usb boots up in safe graphics, hangs with normal boot13:33
lilo_booteri'll simplify the system - unplug all usb except keyboard, mouse and drive itself - and will try to disable this annoying "republic of gamers" crap - definitely not helping 13:34
lilo_booterdisabled logo - black screen after accessing "try or install" and appears to have hung again - no diagnostics shown13:42
lilo_booteri can try disabling the onboard network card - i also have a firewire card which i could take out i guess13:42
lilo_booterarighi: i guess this usb behaviour might overlap with the suggestion about nouveau?13:44
arighililo_booter, could be... we're planning to disable nouveau in the upcoming 6.8 kernel, hopefully it'll help to fix/debug issues like this14:18
lilo_booterwell, i started by disabling TPM in the bios - rebooted by way of second internal disk - unsurprisingly that removed the TPM firmware diagnostic, but otherwise behaved the same14:20
lilo_booterbut i do have a new one - mce: [hardware error]: cpu 0: machine check: 0 bank 27: faa00000000080b (may have miscounted 0s there)14:21
lilo_booterthe usb 3-4 config error still shows14:22
lilo_booterhmm - how do i map a /dev/disk/by-uuid/xxxx-xxxx to a device/partition14:24
mdiewalilo_booter: use ls -l14:27
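A minimal sketch of what the `ls -l` hint relies on: the entries under /dev/disk/by-uuid are symlinks to the real device nodes, so resolving the link gives the partition. The helper name `resolve_by_uuid` is made up for illustration, and the code assumes a Linux system where udev populates that directory.

```python
import os


def resolve_by_uuid(uuid, base="/dev/disk/by-uuid"):
    """Return the device node a by-uuid symlink points to, or None.

    Assumption: `base` holds one symlink per filesystem UUID, as udev
    sets up on a typical Linux system.
    """
    link = os.path.join(base, uuid)
    if not os.path.islink(link):
        # No symlink for this UUID: no currently attached partition has it.
        return None
    # realpath follows the relative link (e.g. ../../sda1) to the device node.
    return os.path.realpath(link)
```

Calling `resolve_by_uuid("xxxx-xxxx")` with a real UUID in place of the placeholder should return a path such as a partition under /dev, and `None` when no attached partition carries that UUID.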
lilo_booterhmm - thanks - weird though - on the boot from the second drive (which is running a very old 22.04 - can't recall why i installed it in the first place tbh), it's running a check on A840-C7E5 - it doesn't exist though14:33
=== fling_ is now known as fling
lilo_booterdisabling network devices in the bios didn't help either14:52
=== NotEickmeyer is now known as Eickmeyer
lilo_booterah - back to usb - try or install again - this time i edited the bootup to remove "quiet splash" and finally get a log - hangs immediately after detecting usb device with vendor 1d6b and product 000315:02
lilo_booterwell, i'm confused - that's a usb hub and i guess it must be internal to the machine as i definitely don't have anything plugged in15:18
lilo_booteri've even swapped out my keyboard/mouse - same (though the bios now detects 1 keyboard, 1 mouse, 1 hub rather than 2 keyboards, 2 mouse, 1 hub)15:20
lilo_booterthere's a scary option in the bios to disable "usb device" - singular - no explanation of what it does - if it disables all, then surely i would lose the keyboard and mouse necessary to re-enable them?15:25
lilo_booteri'm at a loss - don't get it at all - the fact that the usb drive fails to boot in non-safe mode suggests it's not down to settings in my install at least15:54
lilo_booterah - but ... it's probably worth checking 24.04 to see if it has the same issue15:55
lilo_booter24.04 running under qemu at least - good start :)16:15
lilo_booterbut almost identical behaviour on boot (except it doesn't hang after detecting the usb hub - just reboots the entire machine)16:20
lilo_booterand safe mode no longer works - same reboot16:22
lilo_booterand actually, qemu doesn't seem to get past the initial boot - hangs when starting the gui (i got impatient and killed it, so may have eventually recovered)16:27
lilo_booterunsure how to continue - any suggestions would be very welcome :)16:32
lilo_booter(and no worries if you don't feel like parsing the text above - quite happy to repeat, rephrase and retest)16:34
lilo_bootertried 24.04 under virt-manager - not altogether surprised that it's working there :)16:51
arighililo_booter, hm... it's pretty hard to debug this from here without seeing any specific kernel errors, it seems to be related to some issues with your specific hardware, if the system reboots it's quite difficult to get some errors, maybe in the case when it's hanging you can try to boot with initcall_debug to have a better understanding of where it gets stuck16:53
arighior try the usual "safe" settings, like boot with "noapic nolapic", even if for what I read it seems to be related to some kind of usb issues...?16:53
lilo_booteri suspect usb, yeah, but could be a red herring - certainly turning up logging seems like a good idea - from the hang and the lack of journalctl, i'm not sure what we'll get - but mostly, i don't know where i specify any of these settings :) - guessing i can edit the grub command to use them though? 16:57
lilo_bootertried your noapic nolapic suggestion on the 24.04 version - ends in a kernel panic17:09
lilo_booteri did see that it mentioned firewire before the panic (usb stuff was above), so i will definitely take that card out next 17:11
arighililo_booter, oh yes, put them in /etc/default/grub (after quiet splash) then `sudo update-grub` and reboot, if you have both a good and a bad kernel installed, otherwise if you boot with an image you need to add the extra boot options directly from grub (editing the entry in the grub menu when it's booting)18:04
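The /etc/default/grub edit described above can be sketched roughly as follows. This assumes the options go into the `GRUB_CMDLINE_LINUX_DEFAULT="..."` line (where "quiet splash" normally lives on a stock Ubuntu install) and that the value is double-quoted. The function name `add_kernel_params` is hypothetical, and it only transforms text: writing the file back and running `sudo update-grub` are left to the user.

```python
import re


def add_kernel_params(grub_text, params):
    """Append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT.

    Assumption: the setting appears as a double-quoted single line.
    Parameters already present are not added a second time.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def repl(match):
        current = match.group(2).split()
        for param in params:
            if param not in current:
                current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text
```

For example, passing the file contents and `["initcall_debug"]` turns a `"quiet splash"` value into `"quiet splash initcall_debug"`. When booting from a USB image there is no installed file to edit, so the same options have to be typed into the boot entry from the grub menu instead.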
lilo_booterfirewire card removed - system reboots into 22.04 current kernel :D18:23
lilo_booterjeez - i am happy that it's as simple as that (though i am somewhat sad to be forced to remove the card :()18:24
arighililo_booter, yay! nice debugging session :) at least we know where the problem is, what kind of firewire card is that?18:29
lilo_booternot a lot on the card - "made in china" (no surprise there) - asmedia (possibly followed by the letters lp2 but written in a weird way such that it could be lp squared iyswim) and asm1083 on one chip (old bugger here - need magnifying glasses to read the rest) - there's also a via chip vt6307 made in taiwan - manufactured/passed qa 2021 (which is definitely around the time i bought it)18:35
lilo_booterwill happily replace if someone can point me at a compatible card (but do want such a beast - have a number of old dv tapes and devices here...)18:38
lilo_booterplugging all the usb devices back in now :)18:44
lilo_booterhot plugging worked at least - nice to have sound again18:47
JanClilo_booter: if that is a PCIE device there should be some information about it in old boot logs (with a working kernel)20:40
JanCstuff like what driver was loaded for it etc.20:40
lilo_booterJanC: cool - will reinstall and reboot on the working kernel tomorrow - will let you know what i find21:22
JanClilo_booter: I'm no kernel dev, just pointing out where you might find more info than on the card itself sometimes21:23
JanCand there should probably be previous boot logs from the working kernel still21:24
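One way to look through an earlier boot's kernel messages for the card, as a rough sketch: it assumes a persistent systemd journal, uses `journalctl -k -b <offset>` to select kernel messages from a previous boot, and filters on the keywords "firewire" and "1394", which are only a guess at what the relevant lines contain. The function names are made up for illustration.

```python
import subprocess


def kernel_log_cmd(boot_offset=-1):
    """Build the journalctl command for kernel messages of one boot.

    -k limits output to kernel messages; -b -1 is the previous boot,
    -b -2 the one before that, and so on.
    """
    return ["journalctl", "-k", "-b", str(boot_offset), "--no-pager"]


def grep_lines(text, keywords=("firewire", "1394")):
    """Return the lines containing any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [
        line
        for line in text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


def previous_boot_matches(boot_offset=-1):
    """Run journalctl and return the matching kernel log lines."""
    result = subprocess.run(
        kernel_log_cmd(boot_offset),
        capture_output=True,
        text=True,
        check=False,
    )
    return grep_lines(result.stdout)
```

Since the failed boots left no journal entries, the offset has to point at a boot of the working kernel with the card installed; `journalctl --list-boots` shows which offsets exist.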
