adac | I try to install the grub again to my disks in a chrooted environment, following basically these commands here: https://blog.michael.franzl.name/2014/01/29/remote-server-hetzner-rebooting/ however for one disk I get the error grub-install: error: unable to identify a filesystem in hostdisk//dev/nvme0n1; safety check can't be performed | 20:54 |
---|---|---|
adac | the problem here is that my machine won't boot anymore since that fails | 20:55 |
adac | any ideas what I can do about this? | 20:55 |
tomreyn | "hostdisk//dev/nvme0n1" is obviously wrong, and i've never seen this. how did you install? | 20:58 |
tomreyn | and which version of ubuntu server is this? | 20:58 |
adac | tomreyn, actually at the moment I'm on the rescue system of the provider and I have chrooted into my machine which is ubuntu 18.04 | 21:00 |
adac | This was a raid and I had to have one disk replaced | 21:00 |
tomreyn | and this rescue system is compatible with ubuntu 18.04 ? | 21:01 |
tomreyn | oh you chrooted so you're using 18.04 tools | 21:01 |
adac | tomreyn, good question. usually when the grub was missing, which happened more often before, this used to work | 21:01 |
adac | tomreyn, how can I see if it is compatible? | 21:02 |
tomreyn | there's no simple check to tell | 21:02 |
tomreyn | but if you chrooted to the existing ubuntu 18.04 installation with the same boot mode it usually boots in, and with all virtual file systems bind mounted into it, you should be able to recover it if the kernel of the recovery system is not extremely old | 21:03 |
tomreyn | try this https://askubuntu.com/a/831241 | 21:04 |
tomreyn | you'll need to get your SW RAID ready as the first step, though | 21:05 |
tomreyn | disabling secure boot in the system's bios and upgrading the bios can make things easier, especially if it's old. | 21:06 |
adac | tomreyn, actually back when this happened before on other machines after a disk swap, I did not have to get the SW raid ready first. I just went into the rescue system and re-installed grub | 21:07 |
adac | not sure why this time it fails | 21:07 |
tomreyn | journalctl -b \| grep DMI: tells you about board and bios version. hetzner provides bios upgrade tools on their network share, and you can access the system bios from their KVM (which is free for 2 hours) | 21:07 |
adac | journalctl -b | 21:08 |
adac | -- Logs begin at Thu 2022-02-03 04:27:42 CET, end at Sun 2022-02-20 20:02:29 CET. -- | 21:08 |
adac | -- No entries -- | 21:08 |
adac | tomreyn, ok thanks | 21:08 |
tomreyn | you must be working on the rescue system still | 21:09 |
tomreyn | dmesg \| grep DMI: then | 21:09 |
tomreyn | please use a pastebin to share multi-line outputs | 21:09 |
adac | https://pastebin.com/HrBJXnYh | 21:10 |
tomreyn | first line of this output is your mainboard + bios version | 21:11 |
adac | tomreyn, ok I see. It might be worth upgrading then | 21:12 |
tomreyn | you may want to check this with support, too. | 21:13 |
adac | tomreyn, yes I will ask them. It is strange behaviour | 21:13 |
tomreyn | that's a customized (for hetzner) gigabyte desktop board you have there | 21:14 |
tomreyn | https://www.gigabyte.com/Motherboard/B360-HD3P-rev-10/support#support-dl-bios would be the closest, but the release date given for version F4 there does not match up with the build date of yours | 21:24 |
adac | tomreyn, I think I could ask them that they would upgrade it for me | 21:25 |
tomreyn | possibly. i don't know whether it's actually needed, just suggesting it generally. | 21:28 |
tomreyn | it's old, though, they should probably do it without asking | 21:28 |
adac | tomreyn, yes I see. Yeah they usually just do when there is a problem | 21:28 |
adac | I'm doing a backup in the rescue system of my virtual machine images first now | 21:29 |
adac | then I need to see how to proceed | 21:29 |
tomreyn | you said this problem occurred after / due to a disk swap. i'm assuming the disk that was swapped contained the (single copy of the) ESP, and you had a mirror md raid spun across other partitions on this and the other nvme storage (which you still have one leg of, just without the esp) | 21:32 |
tomreyn | if you re-create an esp there and re-populate it, you could possibly boot from there also without the raid restored, but in failed state. | 21:33 |
tomreyn | there are ways to keep a backup copy of the active ESP on another storage (such as the one which contains the other leg of a RAID-1), if you want to consider this in the future | 21:35 |
adac | Yes that could be the reason | 21:37 |
adac | what would booting in failed state mean? | 21:38 |
adac | usually the raid is off already since before the disk swap one has to remove the disk from the raid | 21:38 |
adac | which I also did this time | 21:39 |
adac | so back then I was always able to enter the rescue system, install the grub again, then reboot and it would boot | 21:40 |
adac | afterwards i put the disk back into the raid | 21:40 |
tomreyn | if this is a raid-1 array then it has to have a minimum number of 2 devices. if you only have a total of two then removing one would bring the array into failed state, from what i remember. | 21:42 |
tomreyn | the difference between earlier recoveries of this type and now is probably that earlier, the disk that was affected was the one without the ESP on it, but this time it was the one with the ESP on it, so you lost that | 21:43 |
tomreyn | i'm still guessing, though, don't actually know how you boot or manage the ESP | 21:43 |
adac | tomreyn, https://pastebin.com/R7dNyAjA see that is that array you probably mean. That was right after i removed nvme0n1 from the raid | 21:44 |
adac | tomreyn, ok I see. How can I check if the ESP is actually there? | 21:45 |
tomreyn | '(F)' means failed device | 21:45 |
tomreyn | the _ in '[_U]' means you're missing the first leg of this two device array | 21:46 |
adac | Yes exactly the one that I moved out of the raid | 21:47 |
tomreyn | fdisk -l should list all partitions and their flags, if there's an ESP there it should show | 21:47 |
adac | tomreyn, actually this is the situation: https://pastebin.com/CS5ECmfM | 21:51 |
adac | searched for esp string. Not found | 21:51 |
tomreyn | hmm, do you actually boot in efi mode, though? | 21:53 |
tomreyn | those are mbr/dos partition tables | 21:53 |
adac | tomreyn, I'm afraid I do not really know | 21:55 |
tomreyn | /dev/nvme1n1 contains a 512 MB partition, nvme1n1p2, this could be /boot, could also be ESP, but not on an mbr partitioned disk, so you probably boot in bios mode | 21:55 |
tomreyn | is sda the recovery system? | 21:56 |
adac | https://pastebin.com/n1GKYWkZ this is the current situation in the rescue system after the steps in https://blog.michael.franzl.name/2014/01/29/remote-server-hetzner-rebooting/ | 21:57 |
adac | sda is an additional disk actually that was added | 21:57 |
adac | since I ran out of space for my logs | 21:58 |
tomreyn | i see. i guess in your situation all you may need to do is to re-create the missing partitions on nvme0n1 to match those of nvme1n1 and then to install grub to both /dev/nvme0n1 and /dev/nvme1n1 | 22:02 |
tomreyn | and then, either before or after reboot, to resynch the raid by re-adding the nvme0n1 raid members | 22:03 |
adac | tomreyn, Yes that sounds about right | 22:03 |
adac | tomreyn, thank you so much man | 22:04 |
tomreyn | you're welcome | 22:04 |
tomreyn | this guide you're following is not bind mounting /dev/pts into the chroot. you may need to do that also | 22:06 |
tomreyn | hmm, no, you probably won't need it | 22:07 |
adac | I'll just finish the backup, just to be safe. I have backups, but no up-to-date one of the whole virtual machines. then I will try to create these partitions | 22:10 |
adac | can I somehow see the exact partitions of the working disk and maybe just use exactly these? | 22:11 |
adac | sfdisk -d /dev/nvme1n1 \| sfdisk /dev/nvme0n1 | 22:13 |
adac | Copy over partition table from /dev/nvme1n1 to /dev/nvme0n1 | 22:13 |
tomreyn | i guess this can work. might fail for the last partition if that's larger on the existing disk than there is space available on the new one | 22:17 |
adac | tomreyn, kk I will try afterwards | 22:18 |
adac | I'm just wondering. this usually happened after the disk was changed and when i was already booted into the system | 22:19 |
adac | then, before putting the disk into the raid, we copied over the partition tables with a script | 22:19 |
adac | maybe on the other disks, they were already there | 22:20 |
tomreyn | that seems unlikely, unless you had been using those disks before | 22:20 |
adac | not sure tbh. I will finish the backup then let's try it out :D | 22:21 |
adac | aaaaa these disk issues are a nightmare! | 22:22 |
tomreyn | good luck on that, i'll make that a fortunate outcome in my dreams. | 22:26 |
adac | tomreyn, hehe thanks!! Have a good night!! :-) | 22:28 |
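
Two read-only checks come up in the conversation above: whether the system boots in EFI or BIOS mode, and which md arrays are missing a leg (shown as `_` in a status field such as `[_U]`). The following is a minimal shell sketch of both, not something either participant ran. The function names `boot_mode` and `degraded_arrays` are made up for illustration. `boot_mode` only reports how the currently running kernel was booted (here that would be the rescue system), which is assumed, not guaranteed, to match the installed system.

```shell
#!/bin/sh
# Report how the running kernel was booted: "efi" or "bios".
# Optional argument: an alternative root directory (used for testing only).
boot_mode() {
    root="${1:-}"
    # The kernel exposes /sys/firmware/efi only when booted via UEFI.
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "efi"
    else
        echo "bios"
    fi
}

# Read mdstat-formatted text on stdin and print the name of every
# array whose status field (e.g. [_U]) contains a missing member.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md2 : active raid1 nvme1n1p3[1]"
        /^md[0-9a-zA-Z_]* *:/ { name = $1 }
        # Status line, e.g. "... blocks super 1.2 [2/1] [_U]"
        /\[[U_]+\]/ {
            if (match($0, /\[[U_]+\]/)) {
                status = substr($0, RSTART, RLENGTH)
                if (index(status, "_") > 0 && name != "") print name
            }
        }
    '
}
```

Typical use would be `boot_mode` followed by `degraded_arrays < /proc/mdstat`. Neither function writes to any disk.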
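
The recovery order tomreyn suggests (copy the partition table from the healthy disk to the replacement, install grub to both disks, then re-add the replacement's partitions to the arrays) can be laid out as a script that only prints the commands for review and never executes them. This is a sketch under assumptions: `print_recovery_plan` is a hypothetical helper, the `mdname:partnum` pairs are placeholders because the real array layout is only in the pastebins, and the `p` partition suffix assumes NVMe-style device names. The `grub-install` lines are meant to be run inside the chroot.

```shell
#!/bin/sh
# Print (never execute) the recovery commands discussed in the log.
#   $1 = healthy disk, $2 = replacement disk
#   remaining args = hypothetical "mdname:partnum" pairs, e.g. md0:1
print_recovery_plan() {
    good="$1"
    new="$2"
    shift 2

    # 1. Clone the partition table from the healthy disk.
    echo "sfdisk -d $good | sfdisk $new"

    # 2. Install the boot loader to both disks.
    echo "grub-install $good"
    echo "grub-install $new"

    # 3. Re-add each partition of the new disk to its array.
    for pair in "$@"; do
        md="${pair%%:*}"
        part="${pair##*:}"
        echo "mdadm /dev/$md --add ${new}p${part}"
    done
}
```

For example, `print_recovery_plan /dev/nvme1n1 /dev/nvme0n1 md0:1` prints four commands to check against the actual layout before running anything by hand. As noted in the log, copying the partition table may fail for the last partition if the new disk is smaller than the existing one.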