nick | message | time |
---|---|---|
lilo_booter | hi - mentioned this a couple of months back, but with work requirements, i have been unable to look into it - basically, since the release of kernel 6.5 on ubuntu 22.04, i have been unable to boot my main computer on the current kernel - i have been using kernel-6.2.0-39-generic since the problem started (and this has been fine so far) | 10:45 |
arighi | lilo_booter, can you elaborate more on "unable to boot"? do you see some errors in the console or just a black screen or...? | 10:50 |
lilo_booter | i have time to diagnose the issue now - the first thing is that i currently have 6.5.0-17 and it doesn't boot in either normal or recovery mode - some normal diagnostics are shown before it reboots to the bios - journalctl has no record of the boot attempt, and if any error is reported in the diagnostics it's impossible to tell, as the screen is cleared immediately | 10:50 |
lilo_booter | i have tried filming it, but the footage doesn't show any errors | 10:50 |
arighi | can you provide a brief description of your hardware / system? is that a standard Ubuntu installation? | 10:51 |
lilo_booter | standard 22.04 - ryzen 9, 64GB, nvidia graphics, 2 SSD drives | 10:52 |
lilo_booter | can reboot into the old kernel and get a more detailed report | 10:53 |
lilo_booter | machine is back up - are there specific commands you want me to run? | 10:55 |
lilo_booter | CPU: AMD Ryzen 9 5950X (32) @ 3.400GHz, GPU: NVIDIA GeForce RTX 3070 Ti, GPU: NVIDIA GeForce RTX 3070 Ti | 10:57 |
ogra_ | you should probably not use lilo then 😉 | 10:58 |
ogra_ | (SCNR) | 10:58 |
lilo_booter | :) | 10:59 |
lilo_booter | the nick just shows how long ago it was that i first connected to irc :) | 11:00 |
lilo_booter | i guess you guys have a jira-like system for reporting issues like this? can you give me any pointers on the kind of stuff i would need to provide when creating a ticket? | 11:03 |
arighi | lilo_booter, are you using nouveau by chance? asking because we had some recent boot issues reported that were caused by nouveau | 11:03 |
lilo_booter | nope - nvidia drivers | 11:04 |
arighi | I tried... :D | 11:04 |
lilo_booter | appreciate any help at all :) - i'm ok for now - i'm just worried that the working kernel might get removed at some point... | 11:06 |
arighi | we have a quite similar config, my desktop is an AMD Ryzen 7 5800X with an NVIDIA GeForce RTX 3070 | 11:06 |
arighi | but apparently I can boot any 6.5, 6.7 and 6.8 kernel | 11:07 |
lilo_booter | i guess i can try creating a usb boot disk for the latest 22.04 to see if that boots - but i guess that puts me in the nouveau domain | 11:08 |
arighi | have you tried to boot a fresh ubuntu image, let's say from USB, just to see if the problem is the kernel itself or maybe a specific setting in your system? | 11:08 |
arighi | lilo_booter, great minds think alike :) | 11:08 |
lilo_booter | seems that way :) - thanks | 11:08 |
lilo_booter | ok - will break for a bit - few errands to run - but will continue along that line of attack this afternoon - thanks for the help so far arighi :) | 11:09 |
arighi | I'd try that anyway, it should be a fairly easy and quick test and if it boots we know that we need to dig in your system settings | 11:09 |
arighi | lilo_booter, np, thanks for reporting this | 11:09 |
lilo_booter | no worries | 11:10 |
lilo_booter | hmm - i see this - [Firmware Bug]: TPM Final Events table missing or invalid | 12:40 |
lilo_booter | device-mapper - duplicate ima measurements will not be recorded in the ima log | 12:42 |
lilo_booter | platform eisa.0: EISA: cannot allocate resource for mainboard, plus the same for EISA slots 1 through 8, then 0 EISA cards detected | 12:44 |
lilo_booter | acpi PNP0C14 01 through 04: duplicate WMI GUID | 12:45 |
lilo_booter | usb: port power management may be unreliable | 12:46 |
lilo_booter | usb 3-4: config 1 has an invalid interface number: 2 but max is 1 | 12:46 |
lilo_booter | GPT:Primary header thinks Alt. header is not at the end of the disk - use GNU Parted to correct - reported against sda, which i guess is the main hd | 12:50 |
lilo_booter | this is running from a second internal drive - 5.15.0 came up - but it dumped me in a recovery shell at the end of the boot | 12:55 |
lilo_booter | trying from a fresh usb now, booting via the bios boot menu - selected manual override for the device - got the grub menu, selected "try or install" - the "Republic of Gamers" logo shows up and the usb drive is still being accessed - keyboard is unresponsive - cannot remove the logo to see diagnostics | 13:17 |
lilo_booter | will leave it for a while, but does appear to be hung | 13:20 |
lilo_booter | yeah - usb boots up in safe graphics, hangs with normal boot | 13:33 |
lilo_booter | i'll simplify the system - unplug all usb except keyboard, mouse and the drive itself - and will try to disable this annoying "Republic of Gamers" crap - definitely not helping | 13:34 |
lilo_booter | disabled the logo - black screen after selecting "try or install", and it appears to have hung again - no diagnostics shown | 13:42 |
lilo_booter | i can try disabling the onboard network card - i also have a firewire card which i could take out i guess | 13:42 |
lilo_booter | arighi: i guess this usb behaviour might overlap with your suggestion about nouveau? | 13:44 |
arighi | lilo_booter, could be... we're planning to disable nouveau in the upcoming 6.8 kernel, hopefully it'll help to fix/debug issues like this | 14:18 |
lilo_booter | well, i started by disabling TPM in the bios - rebooted via the second internal disk - unsurprisingly that removed the TPM firmware diagnostic, but otherwise it behaved the same | 14:20 |
lilo_booter | but i do have a new one - mce: [hardware error]: cpu 0: machine check: 0 bank 27: faa00000000080b (may have miscounted 0s there) | 14:21 |
lilo_booter | the usb 3-4 config error still shows | 14:22 |
lilo_booter | hmm - how do i map a /dev/disk/by-uuid/xxxx-xxxx to a device/partition | 14:24 |
mdiewa | lilo_booter: use ls -l | 14:27 |
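For reference, here is a minimal sketch of mdiewa's suggestion, assuming the standard udev layout. Each entry in /dev/disk/by-uuid is a symlink to the real device node, so `ls -l /dev/disk/by-uuid` shows the targets, and `readlink -f` resolves a single entry to an absolute path such as /dev/sda2. The function name `uuid_to_device` is hypothetical and is not a system tool. `lsblk -o NAME,UUID` or `findfs UUID=...` (from util-linux) should give the same mapping.

```shell
# uuid_to_device UUID [DIR]  (hypothetical helper, not a system command)
# Prints the device node that a /dev/disk/by-uuid entry points at.
# DIR defaults to /dev/disk/by-uuid; overriding it is only useful for testing.
uuid_to_device() {
  local uuid="$1"
  local dir="${2:-/dev/disk/by-uuid}"
  local link="$dir/$uuid"
  # readlink -f prints a path even when the entry is missing, so check that
  # the symlink actually exists first.
  if [ ! -L "$link" ]; then
    echo "no by-uuid entry for $uuid" >&2
    return 1
  fi
  # Follow the (usually relative, e.g. ../../sda2) link to an absolute path.
  readlink -f "$link"
}
```

For example, `uuid_to_device A840-C7E5` either prints the device node or, when no such entry exists, reports that on stderr and returns non-zero. Without that check it would silently echo back a path that does not exist.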
lilo_booter | hmm - thanks - weird though - on the boot from the second drive (which is running a very old 22.04 - can't recall why i installed it in the first place tbh), it's running a check on A840-C7E5 - it doesn't exist though | 14:33 |
lilo_booter | disabling network devices in the bios didn't help either | 14:52 |
lilo_booter | ah - back to usb - try or install again - this time i edited the boot entry to remove "quiet splash" and finally got a log - it hangs immediately after detecting a usb device with vendor 1d6b and product 0003 | 15:02 |
lilo_booter | well, i'm confused - that's a usb hub and i guess it must be internal to the machine as i definitely don't have anything plugged in | 15:18 |
lilo_booter | i've even swapped out my keyboard/mouse - same result (though the bios now detects 1 keyboard, 1 mouse, 1 hub rather than 2 keyboards, 2 mice, 1 hub) | 15:20 |
lilo_booter | there's a scary option in the bios to disable "usb device" - singular - with no explanation of what it does - if it disables all of them, surely i would lose the keyboard and mouse needed to re-enable them? | 15:25 |
lilo_booter | i'm at a loss - don't get it at all - the fact that the usb drive fails to boot in non-safe mode suggests it's not down to settings in my install at least | 15:54 |
lilo_booter | ah - but ... it's probably worth checking 24.04 to see if it has the same issue | 15:55 |
lilo_booter | 24.04 running under qemu at least - good start :) | 16:15 |
lilo_booter | but almost identical behaviour on boot (except it doesn't hang after detecting the usb hub - it just reboots the entire machine) | 16:20 |
lilo_booter | and safe mode no longer works - same reboot | 16:22 |
lilo_booter | and actually, qemu doesn't seem to get past the initial boot - it hangs when starting the gui (i got impatient and killed it, so it may have eventually recovered) | 16:27 |
lilo_booter | unsure how to continue - any suggestions would be very welcome :) | 16:32 |
lilo_booter | (and no worries if you don't feel like parsing the text above - quite happy to repeat, rephrase and retest) | 16:34 |
lilo_booter | tried 24.04 under virt-manager - not altogether surprised that it's working there :) | 16:51 |
arighi | lilo_booter, hm... it's pretty hard to debug this from here without seeing any specific kernel errors. it seems to be related to some issue with your specific hardware, and if the system reboots it's quite difficult to capture any errors. maybe in the case where it hangs you can try booting with initcall_debug to get a better idea of where it gets stuck | 16:53 |
arighi | or try the usual "safe" settings, like booting with "noapic nolapic", even though from what I've read it seems to be related to some kind of usb issue...? | 16:53 |
lilo_booter | i suspect usb, yeah, but it could be a red herring - turning up logging certainly seems like a good idea - given the hang and the lack of journalctl entries, i'm not sure what we'll get - but mostly, i don't know where i specify any of these settings :) - guessing i can edit the grub command to use them though? | 16:57 |
lilo_booter | tried your noapic nolapic suggestion on the 24.04 version - it ends in a kernel panic | 17:09 |
lilo_booter | i did see that it mentioned firewire before the panic (usb stuff was above), so i will definitely take that card out next | 17:11 |
arighi | lilo_booter, oh yes: if you have both a good and a bad kernel installed, put them in /etc/default/grub (after quiet splash), then run `sudo update-grub` and reboot. otherwise, if you boot from an image, you need to add the extra boot options directly in grub (by editing the entry in the grub menu while it's booting) | 18:04 |
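Here is a sketch of the persistent option arighi describes. It assumes a stock Ubuntu /etc/default/grub whose default line is `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"`, with the extra parameters appended inside the quotes. `initcall_debug` is the example used here. `noapic nolapic` would go in the same place, but testing one change at a time is probably clearer. Adding `ignore_loglevel` is my own assumption, not something suggested in the channel. `initcall_debug` logs at debug level, so without it those lines may never reach the console.

```shell
# /etc/default/grub (excerpt; assumes the stock Ubuntu default "quiet splash")
# Extra kernel parameters go after "quiet splash", inside the quotes.
# initcall_debug logs each initcall as it is called and when it returns,
# which helps narrow down where boot stops; ignore_loglevel (an assumption,
# see above) lets those debug-level messages reach the console.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash initcall_debug ignore_loglevel"

# Afterwards, regenerate /boot/grub/grub.cfg and reboot:
#   sudo update-grub
```

For a one-off test, such as booting the USB image, the same parameters can be added without saving anything. Press `e` on the entry in the grub menu, append them to the line starting with `linux`, and boot with Ctrl-X or F10.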
lilo_booter | firewire card removed - system reboots into 22.04 current kernel :D | 18:23 |
lilo_booter | jeez - i am happy that it's as simple as that (though i am somewhat sad to be forced to remove the card :() | 18:24 |
arighi | lilo_booter, yay! nice debugging session :) at least we know where the problem is. what kind of firewire card is that? | 18:29 |
lilo_booter | not a lot on the card - "made in china" (no surprise there) - ASMedia (possibly followed by the letters lp2, but written in a weird way such that it could be lp squared iyswim) and ASM1083 on one chip (old bugger here - need magnifying glasses to read the rest) - there's also a VIA VT6307 chip made in taiwan - manufactured/passed qa 2021 (which is definitely around the time i bought it) | 18:35 |
lilo_booter | will happily replace it if someone can point me at a compatible card (but i do want such a beast - i have a number of old dv tapes and devices here...) | 18:38 |
lilo_booter | plugging all the usb devices back in now :) | 18:44 |
lilo_booter | hot plugging worked at least - nice to have sound again | 18:47 |
JanC | lilo_booter: if that is a PCIe device, there should be some information about it in old boot logs (with a working kernel) | 20:40 |
JanC | stuff like what driver was loaded for it etc. | 20:40 |
lilo_booter | JanC: cool - will reinstall and reboot on the working kernel tomorrow - will let you know what i find | 21:22 |
JanC | lilo_booter: I'm no kernel dev, just pointing out where you might find more info than on the card itself sometimes | 21:23 |
JanC | and there should probably be previous boot logs from the working kernel still | 21:24 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!