/srv/irclogs.ubuntu.com/2020/09/13/#ubuntu-server.txt

=== JanC_ is now known as JanC
=== Wryhder is now known as Lucas_Gray
=== StathisA_ is now known as StathisA
=== vlm_ is now known as vlm
fretegimorning17:58
fretegianyone around to help with a raid issue?17:59
fretegihave a raid 1 array, that i just cannot seem to properly repair.  so thought is to make a new array, degraded.  mount it, copy data over, then add a second disk18:00
RoyKfretegi: one drive down?18:29
fretegiRoyK, yea so funny story, parted was run against 1 drive in the array18:30
fretegiRoyK, and no matter what when i re-add the drive, raid then shows active with both devices, but wont survive a reboot18:31
RoyKfretegi: did you try to remove the bad-ish drive and zero its superblock and then add it again?18:32
RoyKas in no --re-add18:32
RoyKiirc --remove will remove the superblock, so probably not needed to --zero-superblock18:33
fretegiRoyK, so what ive done this time is, on the device that was removed.. i zero'd the superblocks, ran DD for the first 1024, ran parted to make a new label and new partition scheme, built a new degraded raid device, mounted and am rsync'ing data over now from the working device in the old raid.  then zero that drive and add to the new md118:33
RoyKfretegi: can you check and pastebin 'mdadm --examine /dev/sdX' where x is the name of each member of the raid?18:33
fretegiRoyK, yea i did.. zero'd superblocks, tried to remove device, updated the mdadm.conf.  all this 5x or so.  same thing.. it would add the device, rebuild raid, show active..  then die on reboot18:34
RoyKfretegi: also, have you checked smart data?18:34
RoyKsmartctl -a /dev/sdX18:34
fretegiRoyK, yup checked smart data, both involved drives are good18:34
RoyKand /etc/mdadm/mdadm.conf is fine?18:34
RoyKand initramfs is updated with its contents?18:34
fretegiRoyK, yup, mdadm was even updated after the last 2 tries to just make sure the proper UUID was referenced..18:35
fretegiRoyK, ah... i did NOT update initramfs...18:36
fretegiis that necessary if the md device in question is only a data logical volume, no OS related data on it?18:36
RoyKupdate-initramfs -u18:37
RoyKupdate-initramfs -u -k all18:37
RoyKperhaps if you want to update it for all installed kernels18:37
RoyKbut the former should do as well18:37
fretegiright but is that needed if the raid does not contain OS partitions?18:37
fretegierr... logical vols18:37
RoyKprobably not, but it won't hurt ;)18:38
fretegigood thinking18:38
RoyKfretegi: you can generate the mdadm.conf stuff with mdadm --detail --scan18:38
fretegiRoyK, that is exactly what i did18:38
fretegiRoyK, so when remove drives from a mdadm raid, is there anything else you have to do besides zero out the superblocks to make the drive available for use again?18:39
RoyKfretegi: can you please pastebin output of 'cat /proc/mdstat' and 'mdadm --examine' for both drives first?18:40
fretegiRoyK, i was fiddling with that disk and no matter what mdadm would still reference evidence of a prior xfs filesystem, but yet could not build the damn array18:40
fretegiRoyK, sure, but the second drive is now part of a second degraded array18:40
RoyKhm - that's weird18:41
fretegiRoyK, https://dpaste.org/wERH18:42
fretegiRoyK, no i did that intentionally...  seemed everyone was gone for Sunday so i just decided to take another approach.. made a new degraded array using the disk that has been a pita, copy the data over, remove old array and add that disk to the new array.18:43
RoyKhm18:43
RoyKon which one is the rootfs now?18:43
fretegiRoyK, md0 old array, md1 new array, rootfs not on either18:43
RoyKany data on them at all?18:44
fretegihttps://dpaste.org/yysq18:45
fretegiRoyK, oh yea, data perfectly intact in md0 (and backed up).  currently rsync'ing to md118:45
RoyKfretegi: good, but you can probably stop that rsync18:46
RoyKyou won't be able to add sdc1 to md0 as member disk, it's too small18:47
RoyKbetter remove that partition and try with the full disk instead18:47
RoyKthey should be the same size18:47
fretegiRoyK, so that leads me to where i am now.... several people in here mentioned that having the md device set up on the full disk and not a partition was a problem18:48
fretegithat was the primary reason for my saying heck with it and just building a new array...18:48
RoyKbut md1 is your new array?18:49
fretegiright18:49
RoyKwhich is on a partition18:49
fretegiusing a partition, because the old array was just on the full disks, and folks were saying that was a bad approach18:49
fretegialthough this raid array been working fine for like 5 years18:49
RoyKAFAIK the only reason to use a partition is to have grub work with it, since that can be rather troublesome without it18:50
RoyKfretegi: itæs *not* a bad approach18:50
RoyKpartitions aren't needed18:50
RoyKs/itæs/it's/18:50
fretegiRoyK, see that is exactly my understanding, unless of course there was some reason u did need a partition, but mdadm was not that reason18:51
fretegiso to back up a tick...18:51
RoyKI generally use partitions if I 1: need to boot off the device and thus need grub or 2: for some reason can't use lvm18:51
RoyKso my typical raid is a bunch of disks with raid put on top of the disks directly and then lvm on there and lastly, xfs (or perhaps ext4 if it's smaller stuff or somewhere I might want to shrink the fs) on top18:52
RoyKI've used this approach for work and home machines for at least a decade - works well18:53
fretegiRoyK, originally i had md0 raid 1 built on 2 entire disks (sdb & sdc).  The array is just for a data volume, all os components on another disk.  the wrong disk had parted run against it (sdc) which of course took it out of array.  md0 has lvm with xfs.  i re-added that sdc to md0 5 or 6x, wont survive boot.  so i said hell with it and made a new degraded array from sdc, am copying data over and intend to break down md0, adding disk18:55
fretegisdb into md118:55
RoyKwith "smaller", I mean less than say 5TiB or something I know won't need to grow and/or is sufficiently fast for fsck to finish within a short while (which isn't necessarily what happens with large ext4 filesystems)18:55
RoyKdid you re-add or add? also, did you add sdc1 or sdc?18:56
fretegiok so very similar setups to what i am doing here18:56
RoyKbetter remove those partitions before you add the whole drive, though - I've seen leftovers hanging around in some cases and that's not pretty18:56
fretegiRoyK, so at that time, sdc had no partitions.  since building a new array, and many people in here criticized that lack of a partition, i created one for this array18:56
fretegiRoyK, well the partition is the entire disk18:57
RoyKwhoever criticized you for that should have his or her head examined by moths18:57
fretegiso was gonna partition sdb and add to md118:57
RoyKjust don't use partitions18:58
fretegiRoyK, yea made no sense to me, but was like 4 people in here lol18:58
RoyKI really can't understand why - partitions have no function there18:58
RoyKall md needs is a chunk of storage18:58
fretegiRoyK, so am i better off trying to re add sdc to md0? just takes hours to rebuild and hate to waste that all over again to not have it survive a reboot19:00
RoyKso, ok - my advice: the working 1-drive mirror of md0 is the one with data, right? if so, stop md1 and zero the drive's superblock and better dd some zeros on it as well. Add it to md0 and wait for it to resync. While waiting, run mdadm --detail --scan to get the config line for mdadm.conf and copy it or redirect it into the file. run editor /etc/mdadm/mdadm.conf and remove whatever leftovers from old19:01
RoyKstuff there. update initramfs as mentioned above and wait for resync to finish. this should do it.19:02
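The sequence RoyK describes above can be laid out as a short checklist script. This is only a sketch, not something that was run in the channel: the device names (/dev/md0 for the array holding the data, /dev/md1 for the temporary array, /dev/sdc for the disk going back into md0) are assumptions taken from the conversation, and `recover_plan` plus the `KEEP_ARRAY`, `TEMP_ARRAY`, and `DISK` variables are hypothetical names. The script deliberately prints the commands instead of executing them, so the order can be reviewed before anything touches a disk.

```shell
#!/bin/sh
# Prints the recovery commands in order; nothing is executed.
# Assumed names from the conversation: md0 holds the data, md1 is the
# temporary array, sdc is the disk going back into md0.
KEEP_ARRAY=${KEEP_ARRAY:-/dev/md0}
TEMP_ARRAY=${TEMP_ARRAY:-/dev/md1}
DISK=${DISK:-/dev/sdc}

recover_plan() {
    # Stop the temporary array so its member disk is released.
    echo "mdadm --stop $TEMP_ARRAY"
    # Clear md metadata, then overwrite the first GiB (partition table included).
    echo "mdadm --zero-superblock $DISK"
    echo "dd if=/dev/zero of=$DISK bs=1M count=1024"
    # Hand the clean disk back to the array that holds the data.
    echo "mdadm $KEEP_ARRAY --add $DISK"
    # Regenerate the ARRAY line, tidy mdadm.conf by hand, then rebuild the initramfs.
    echo "mdadm --detail --scan"
    echo "# edit /etc/mdadm/mdadm.conf: drop stale ARRAY lines, keep the line printed above"
    echo "update-initramfs -u"
}

recover_plan
```

Overriding a variable, for example `DISK=/dev/sdd sh plan.sh` (file name hypothetical), prints the same checklist for a different disk.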
fretegiRoyK, do i have to do anything else to remove md1?19:03
RoyKmdadm --stop /dev/md119:04
fretegiRoyK, no i get that, i mean between stopping it and pulling any reference from mdadm.conf, thats all i need to thoroughly delete md1 right? just dont want this thing to try to spin up md1 later is all19:05
RoyKdoublecheck that all data is in place and mdadm --zero-superblock /dev/sdX (was it c?) and remove the partition table, preferably with dd if=/dev/zero of=/dev/sdX bs=1M count=1k or something - that one writes a gig of zeros, so overkill, but what the hell19:05
RoyKfretegi: let's fix mdadm.conf when you have added the last drive first19:05
fretegiRoyK, ok, md1 stopped, lvm volumes etc. that were mounted are gone, dev/sdc superblocks zero'd dd'd 2k count on sdc.. mdadm.conf has md1 references #'d and heres some output19:13
fretegihttps://dpaste.org/YpcS19:13
fretegiso just grow md0 and add sdc?19:14
fretegiRoyK, data confirmed to still be active on md019:15
RoyKfretegi: add sdc first19:15
fretegiRoyK, since it shows no unused devices dont i need to grow it?19:17
fretegiRoyK, md0 that is19:17
RoyKjust add the new one first and it'll become a spare19:18
fretegiRoyK, done https://dpaste.org/gJxU19:19
fretegiRoyK, mdadm --detail https://dpaste.org/U05w19:20
RoyKgood - mdadm --grow /dev/md0 --raid-devices 2 # iirc ;)19:21
fretegiRoyK, building https://dpaste.org/qT5Y19:22
fretegiRoyK, mdadm --detail --scan outputs the same UUID as what is already in the mdadm.conf19:24
RoyKthat is correct19:24
RoyKjust update initramfs, then19:24
fretegiRoyK, actually isnt that mdadm.conf line supposed to reference the # of devices?19:26
fretegioutput shows the spare drive still, guess because its rebuilding19:26
RoyKfretegi: pastebin /proc/mdstat again19:27
fretegihttps://dpaste.org/UYZ419:28
RoyKok, finished in two and a half hours19:28
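The estimate RoyK reads off the paste comes from the recovery line in /proc/mdstat. A minimal sketch of pulling that figure out with awk follows; `mdstat_progress` is a hypothetical helper name, and it assumes the usual `recovery = N% (...) finish=...min` layout of that line rather than anything confirmed by the pastes in this log.

```shell
#!/bin/sh
# Reads /proc/mdstat-style text on stdin and prints
# "<array> <percent> <finish>" for each array that is rebuilding.
mdstat_progress() {
    awk '
        # Remember which array the following detail lines belong to.
        /^md[^ ]* :/ { array = $1 }
        # Progress line, e.g. "recovery =  5.2% (...) finish=141.2min speed=..."
        /(recovery|resync) *=/ {
            pct = ""; fin = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       { pct = $i; sub(/^=/, "", pct) }
                if ($i ~ /^finish=/) { fin = substr($i, 8) }
            }
            print array, pct, fin
        }
    '
}
```

On a live system this would be called as `mdstat_progress < /proc/mdstat`; it prints nothing when no array is rebuilding.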
fretegiright19:29
fretegithen update initramfs19:29
fretegiand should be good19:29
RoyKnah - it should be fine now19:29
RoyKthe uuid won't change19:29
fretegithen why was this thing not started on boot before?19:29
RoyKjust remove old references to old arrays and then add the current one19:29
fretegihave not done anything different on this go than the last 5x19:29
RoyKI have no idea :)19:29
fretegiremove from mdadm.conf?19:30
RoyKI just followed my own playbook on how to debug these sort of things19:30
RoyKyes, just remove those arrays defined there and take the output from mdadm --detail --scan and add it to the end (perhaps with >>)19:30
fretegiRoyK, oh i get it, and your process exactly lines up with my understanding... i just could not get the damned md0 to start on boot, couldnt figure out why19:30
fretegiRoyK, even when the output still shows a spare?19:31
fretegihttps://dpaste.org/x1NT19:31
RoyKoh - remove that part19:31
RoyKjust remove "spares=1 "19:32
RoyKbut you could run it again later if you like, but it'll probably show the same19:32
RoyKwithout that spare19:32
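While the array is still rebuilding, the scan output carries a `spares=1` token that RoyK says to remove. A small filter can do that before the line goes into mdadm.conf. This is a sketch only: `strip_spares` is a hypothetical helper, and it assumes the token appears as a whitespace-separated `spares=N` word on the ARRAY line.

```shell
#!/bin/sh
# Reads "mdadm --detail --scan" style output on stdin and drops any
# "spares=N" token, leaving the rest of each ARRAY line untouched.
strip_spares() {
    sed -E 's/[[:space:]]+spares=[0-9]+//'
}
```

A possible use, run as root after the stale ARRAY lines have been removed from the file, would be `mdadm --detail --scan | strip_spares >> /etc/mdadm/mdadm.conf`.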
fretegionce fully recovered19:32
RoyKgood luck :)19:33
freteginow we just wait and see i suppose ha19:33
fretegiletcha know in 2.5 hours19:33
fretegii mean there is nothing i need to do to have this thing start on boot right?19:33
RoyKI'll probably be awake :)19:33
RoyKnah - not really19:34
fretegiyea thats what i thought...  so weird19:34
fretegiand worst of it was... md0 wouldnt start as a device was missing, which means that LVM couldnt load the fs on md0, but oddly that prevented a boot...  guess it caused LVM to freak out and since i have /boot and / on lvm volumes, just in a dif. group, machine would not boot19:36
fretegiout comes the live cd, mounting all the file systems blah blah19:36
RoyKpartitions were invented in the eighties when filesystems didn't support big drives (such as MSDOS 3.3's max 32MB). AFAIK they are still needed for grub to work, but that might change over time or perhaps has already. We have stuff like LVM today that does this *way* more flexibly19:36
fretegilvm is awesome, just never had it choke like this lol19:37
fretegiand didnt think it choking on my data volumes would take the system down19:37
fretegiwell... prevent a boot anyway19:37
RoyKthere's a kernel setting to allow boot degraded19:40
RoyKimho it should be on by default19:41
RoyKwhich distro is this?19:41
fretegiubuntu 16.0419:41
RoyKhm19:43
RoyKthat seems to be fixed https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/163504919:44
ubottuLaunchpad bug 1635049 in mdadm (Ubuntu Xenial) "Boot fails with degraded mdadm raid" [High,Fix released]19:44
fretegiin 16.04?19:44
RoyKfretegi: the bug is for 160419:44
fretegiRoyK, ah gotcha where there ya have it ;)19:44
RoyKfretegi: pastebin output of lsb_release -a19:44
fretegiNo LSB modules are available.19:45
fretegiDistributor ID: Ubuntu19:45
fretegiDescription: Ubuntu 16.04.7 LTS19:45
fretegiRelease: 16.0419:45
fretegiCodename: xenial19:45
fretegisorry, thought would be one line19:45
RoyKnp19:45
RoyKbut that should be updated nicely19:45
fretegiyea shes old19:45
fretegihttps://dpaste.org/VGEq19:48
fretegiall looks good19:48
RoyKand nothing nasty in dmesg?19:52
RoyKpreferably dmesg -T if 16.04 supports that flag19:52
fretegi[Wed Sep  9 16:32:06 2020] cgroup: new mount options do not match the existing superblock, will be ignored20:05
fretegiactually u know what... i shrank that raid to just 1 device before this most recent reboot..  DMESG doesnt go back far enough now to see the issue20:07
RoyKfretegi: then /var/log/kern.log.something should show it20:21
RoyKor whatever that was called in 16.04 ;)20:21
RoyKfretegi: still running?20:33
fretegiyup another 100 min20:37
RoyKit usually slows down towards the end - it's about double the amount of sectors per track on the outside compared to the inside of the disk, so half the speed, since the spin rate is the same20:39
fretegigotcha20:42
fretegigonna run an errand while this is building, bbiab, appreciate all the help buddy!  see ya soon20:43
fridtjof[m]Alright, finally found time to go after the qemu-img issue again!21:46
fridtjof[m]sarnold: so far, i can also reproduce with upstream qemu-img. Time to bisect!21:47
RoyKfretegi: how's it going?22:25
fridtjof[m]found the bad commit! 34fa110e424e9a6a9b7e0274c3d4bfee766eb7ed23:15
fretegiRoyK, rebooting now23:38
RoyKfretegi: did you update initramfs first?23:38
fretegiRoyK, yup and same thing23:39
fretegidamn md0 is inactive23:39
RoyKdamn23:39
fretegihttps://dpaste.org/bQkv23:40
RoyKare you in the rescue thing?23:40
fretegiseriously have no idea what the heck is the deal here23:40
RoyKfretegi: does lsblk show the other drive?23:40
RoyKit really shouldn't be (S) anyway23:41
RoyKif a disk fell out, well, no big deal23:41
RoyKfretegi: have you considered updating the kernel?23:41
RoyKit might be that there's a stone old bug around that no one bothers to fix23:41
fretegihttps://dpaste.org/9rLd23:42
fretegishows the 2 disks23:42
RoyKok, and mdadm --examine for those?23:42
RoyKand pastebin output of uname -a as well23:43
fretegihttps://dpaste.org/B3iP23:43
fretegisee sdc all eff'd up, you saw the post when it was building.. i sent --examine output and all was well23:44
fretegihttps://dpaste.org/N5P823:44
RoyKhm - never seen that23:45
fretegiso im thinking either 1 i just nuke md0 start over from backup (but seems kinda like cheating, we are linux guys afterall) or degrade md0, make md1 with sdc only, copy data over, then nuke md0 and add sdb into md1..23:46
RoyKanything in BIOS flagging sdc as a raid drive or something?23:46
fretegitheres a thought, have not looked, but have not made any bios changes either tho23:46
fretegipita to confirm, not easy to get a screen hooked up to this thing23:47
RoyKtake a look or just reset to defaults and turn off anything that looks like raid in there23:47
fretegibut never made a bios changes so...23:47
fretegishould all be ahci23:47
RoyKI don't know - I can just guess23:47
RoyKbut it seems like the superblock has been overwritten by something and that something is either the BIOS or some nasty virus of sorts - no idea23:48
fretegiwell the way i broke it was by simply running parted23:49
fretegimade it a gpt label23:50
fretegiis that wrong23:50
RoyKnot sure it's relevant, but http://mbrwizard.com/thembr.php23:50
RoyKas I said - there's no need for partitions23:50
RoyKbe it MBR or GPT23:51
fretegiso i ran parted after dd'ing the drive, was that wrong?23:51
RoyKbut try to hook up a monitor to that thing and check BIOS settings23:52
RoyKfretegi: I can't understand why you'd want to do that23:52
RoyKanyway - I guess parted maybe asked you if you wanted to save your changes or similar?23:52
fretegiRoyK, well, superblock is gone.. so ave to reassemble anyway right?23:53
fretegias a best case23:53
fretegihave to reassemble*23:53
RoyKif you want a partition table there, add one, but then you'll need to rsync the stuff over to the new array23:54
RoyKI'll suggest not using a partition table at all, since you don't need it23:54
fretegii mean to fix this array..  to get this disk back in... gonna have to reassemble anyway23:54
RoyKthen first try to reassemble the raid with the working drive23:55
RoyKmdadm --assemble /dev/md0 after mdadm --stop /dev/md023:55
fretegiyea got that already23:56
RoyKit should tell you 'assembled with 1 out of 2 drives' or something like that23:56
fretegimdadm --assemble --scan didnt work.. but i could force it with mdadm --assemble /dev/md0 /dev/sdb -f23:56
fretegithen started with just sdb23:56
RoyKthen zero the first megs on the target drive (not the one in the raid, it could be rather messy) and try to add it. better check lsblk /dev/sdc to check23:57
RoyKor just mdadm --examine23:57
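Since the whole problem started with a tool being pointed at the wrong disk, the check RoyK suggests before zeroing can be made mechanical. A minimal sketch follows, assuming sdX-style device names; `in_md_array` is a hypothetical helper that only inspects /proc/mdstat-style text and does not replace looking at lsblk or mdadm --examine.

```shell
#!/bin/sh
# Exit status 0 if the given disk, or a numbered partition on it, is listed
# as a member of any md array in /proc/mdstat-style text read from stdin.
# Assumes sdX-style names (sdb, sdc1); other naming schemes are not handled.
in_md_array() {
    disk=$(basename "$1")
    # Member entries look like " sdb[0]" or " sdc1[2]" on the "mdN :" line.
    grep -Eq "^md[^ ]* :.*[[:space:]]${disk}[0-9]*\[[0-9]+\]"
}
```

A guard along the lines of `in_md_array /dev/sdc < /proc/mdstat && echo "sdc is in an array, not touching it"` could then sit in front of the dd command.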
fretegiso i dont have to shrink md0?23:58
fretegiwont let me fail or remove sdc, doesnt even see it as a valid raid device23:58
fretegiRoyK, kinda curious...  what if i nuke the superblock and dd the first few k bytes of sdc.  make md1 degraded and reboot.. see if it starts md1 on boot23:59
