=== markthomas is now known as markthomas|away
=== Lcawte is now known as Lcawte|Away
[01:47] Anyone have any experience rebuilding broken arrays in mdadm? I had a drive completely die and now when I boot it drops into an "initramfs" prompt. I was told booting from a disc/USB would at least get me to root, but I'm not sure how to safely rebuild the array
[02:09] How broken is broken?
[02:09] Degraded or dead?
[02:12] maybe it didn't start degraded for some reason?
[02:12] not automatically
[02:19] it is dead
[02:19] trying to load the degraded array is ineffective
[02:19] just brings me to the initramfs prompt
[02:19] whether I select y/N
[02:20] so, you remove the faulty disk from the array and it still doesn't start if you tell it to start degraded?
[02:20] exactly
[02:21] I hope you have backups...
[02:21] I have the OS on an SSD drive so not sure why that is
[02:21] I would think it would at least load to root even if it can't mount the array
[02:22] oh no, that doesn't sound promising
[02:22] no backups LOL
[02:23] what if you mount the SSD manually?
[02:23] from the initramfs prompt?
[02:24] I was thinking about trying to boot from a USB to get to root
[02:24] BTW: did you try to exit from the initramfs prompt?
[02:25] didn't know that was an option
[02:26] saying I'm inexperienced in Ubuntu would be putting it lightly... my first run at Ubuntu and running an array
[02:27] would I just do a change directory command to exit out of initramfs?
[02:27] IME it will (try to) continue booting
[02:27] no, you press Ctrl+D or you run exit
[02:28] sometimes that works if you ended up at an initramfs prompt because of a time-out
[02:28] will give it a shot real quick
[02:29] obviously, it's not going to solve any RAID problems
[02:30] Assuming that this is Ubuntu
[02:30] Try typing "exit" into the initramfs
[02:31] It's probably dropping into initramfs because it's asking if you want to boot degraded and (being a server, assuming headless) you don't tell it to boot degraded fast enough and the operation times out
[02:32] I hit Ctrl+D
[02:32] it's checking drives now
[02:32] Uh
[02:32] Did you do the disk check
[02:32] Or is the disk check automatic
[02:32] automatic
[02:32] If you invoked the disk check CANCEL IT RIGHT THE FUCK NOW
[02:32] how do you cancel?
[02:33] If it's automatic, nevermind, let it run
[02:33] LOL wasn't an option
[02:33] just started checking
[02:34] Considering that it's running I'm thinking that it's just degraded and not broken
[02:34] okay that brought me to root
[02:34] was able to login with no issues
[02:34] Okay, so you're now at console I assume
[02:34] yes
[02:34] can you do this command?
[02:35] Wait, before executing commands
[02:35] What drives are on the system that are managed by mdadm?
[02:35] What is /dev/sd[X] where X are the drives in the RAID?
[02:36] there are 3 2TB drives, sda1, sdb1, sdc1 I believe
[02:36] Ok.
[02:36] first step, backup the existing superblock
[02:36] Please run the following commands: mdadm --examine /dev/sda1 >> raid.status
[02:37] mdadm --examine /dev/sdb1 >> raid.status
[02:37] and mdadm --examine /dev/sdc1 >> raid.status
[02:37] unfortunately that is the issue, so I opened up the box to add more memory and lo and behold when I rebooted it had a dead drive
[02:37] Even so
[02:37] so no memory available for backup
[02:37] iDealz: so the SSD is sdd then?
[02:37] sdd isn't on the array
[02:37] The SSD is sdd I assume?
[02:37] Then save the superblock to wherever the SSD is
[02:38] It's a very small file
[02:38] A 4 GB flash drive should be more than enough to save it
[02:38] ah ok
[02:38] Hell a 1 GB flash drive is more than enough to save it
[02:38] so run the examine commands first?
[02:38] No
[02:38] a 256 kB flash drive is ... (etc.) :p
[02:38] Mount the backup location
[02:39] Then run the command so that it saves the raid.status file there
[02:39] following the logic, but don't know the commands
[02:39] <-- feels like an idiot
[02:40] mdadm --examine prints the contents of the metadata stored on the device
[02:41] >> raid.status saves the printed metadata to a file named raid.status which will be saved in the current working directory
[02:41] So what you'd do is mount the backup location (SSD, flash drive, whatever), cd to that backup folder, then run the commands
[02:41] okay so the command you gave above is backing up the superblock?
[02:42] Backing up how to rebuild the superblock, yes
[02:42] the SSD should be mounted automatically I would imagine
[02:42] Yeah, you can backup to that
[02:42] The advantage of backing up how to rebuild the superblock is that you see what commands went into building the superblock in the first place, which gives you an idea of how the system is configured
[02:42] okay so good to run mdadm --examine /dev/sdb1 >> raid.status ?
[02:43] As long as your current directory is the SSD, sure.
[02:43] Hmm.
[02:43] I think it would be better if you ran this command instead
[02:43] does it matter what directory I'm in?
[02:43] Yes it matters
[02:44] Because the current directory you are in would be the location where the raid.status file will be saved when you run that command
[02:44] Thus if you're in the RAID array you're essentially saving the backup back to the array
[02:44] you probably want ">" instead of ">>" ?
[02:44] Anyhow, run this command instead: mdadm --examine /dev/sd[abc]1 >> raid.status
[02:45] So that you run the command only once instead of three times
[02:45] okay
[02:45] and I'm not in the array so should be good there
[02:46] Once it's done please open raid.status and pastebin the contents here
[02:50] Is it done?
[02:51] "pastebin" meaning you put it on a site like paste.ubuntu.com :)
[02:51] yep, had to restart the router... server wasn't connected
[02:52] Ah
[02:56] okay so sda1 isn't connected atm due to it being dead
[02:56] I get the following response:
[02:57] mdadm: no md superblock detected on /dev/sda1
[02:57] did it still run the examine?
[02:57] looks like the file is there
[02:57] okay will pastebin
[02:59] blah, is there a quick way to pastebin from terminal?
[03:00] @iDealz: So /dev/sda1 is the broken drive I assume?
[03:00] Do you have a spare drive that you can swap into sda1's slot?
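
For reference, the backup step being described boils down to something like the following sketch, assuming the array members are /dev/sd[abc]1 and that the home directory lives on the SSD rather than on the array (adjust paths to your own layout):

    cd ~                                          # any directory on the SSD, not on the array
    mdadm --examine /dev/sd[abc]1 > raid.status   # ">" truncates a stale copy; ">>" would append
    less raid.status                              # sanity-check the metadata before proceeding
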
[03:01] As for pastebin from terminal, there's a package for that, sudo apt-get install pastebinit
[03:01] Then cat raid.stats | pastebinit
[03:02] yes @ the spare drive
[03:02] and sda1 is the broken drive
[03:02] Or you can also do cat raid.stats | curl -F 'sprunge=<' http://sprunge.us
[03:04] hmm it didn't like sudo apt-get install pastebinit
[03:04] Ok, do the cat to sprunge instead
[03:04] To pastebin from terminal (assuming you have curl installed), you can either do pastebinit or do | curl -F 'sprunge=<-' http://sprunge.us
[03:05] cat: raid.stats: No such file or directory curl: (26) couldn't open file ""
[03:06] raid.status
[03:06] Not stats
[03:06] Don't worry, we make typos too
[03:07] okay now it just says curl: (26) couldn't open file ""
[03:08] did it create the folder but not the file when it got hung up on the "no md superblock on /dev/sda1"?
[03:09] perhaps I should've left the drive connected
[03:12] hmm so the raid must be running in a degraded capacity as well... I can see this server and the raid contents on my network
[03:12] hopefully that is a good sign
[03:23] It should create the file even if it hung up on /dev/sda1
[03:23] But to be sure, run it like this: mdadm --examine /dev/sd[bc]1 >> raid.status
[03:23] Sorry about the gap, had to handle an incoming ticket
[03:27] Anyhow, once the superblock is backed up, please do the following:
[03:27] a) Insert the new, working disk into the array
[03:29] b) Run mdadm --manage /dev/mdN -r /dev/sda1 <-- Replace /dev/mdN with the name of the RAID array
[03:29] c) Run mdadm --manage /dev/mdN -a /dev/sda1 <-- Replace /dev/mdN with the name of the RAID array
[03:29] d) mdadm --stop /dev/mdN <-- Replace /dev/mdN with the name of the RAID array
[03:30] e) mdadm --assemble --run --force --update=resync /dev/mdN /dev/sda1 /dev/sdb1 /dev/sdc1 <-- Replace /dev/mdN with the name of the RAID array
[03:42] Sachiru: sorry had stepped away
[03:42] No problem
[03:45] so I did nano raid.status and there is information in the file
[03:46] but will run it with just bc as well
[03:48] still the curl msg when I try to send it to sprunge
[03:50] will hook up the new drive though and move ahead with your directions
[03:51] is there a way to double check the name of the array?
[03:52] want to make sure I get this right LOL
[03:55] Uh
[03:55] Could you do "ls -l /dev/ | grep md" and paste the output?
[03:57] yep 1 sec
[03:58] brw-rw---- 1 root disk 9, 0 Oct 1 22:54 md0
=== thumper is now known as thumper-afk
[04:01] Ok
[04:01] So your array is md0
[04:01] No other array on the system I assume?
[04:02] nope
[04:02] Ok, confirmed, your only array is md0
[04:02] Just run the steps in sequence, then after step e
[04:02] Run "cat /proc/mdstat"
[04:02] so replace mdN with md0
[04:02] That should show you the progress of the resync
[04:02] Yep
[04:02] mdN with md0
[04:03] alright, need to shut the server down real quick to plug in the new drive
[04:03] Just so you understand, step b detaches sda1 from the raid array, step c attaches the new drive
[04:03] does it need to be formatted in any fashion before steps b and c?
[04:04] No
[04:04] No formatting at all
[04:04] the resync (step e) will do the format.
[04:04] okay
[04:04] If you format it beforehand you run the risk of breaking the array even further
[04:08] hmm perhaps sda1 is the SSD
[04:08] Wait
[04:09] Can you paste in the contents of raid.status
[04:09] Here?
[04:09] So that I can check?
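
Pulling the lettered steps above together, a minimal sketch of the whole replace-and-resync sequence, assuming the array is /dev/md0 and the failed member was /dev/sda1 (substitute your own device names):

    mdadm --examine /dev/sd[abc]1 > raid.status    # back up the metadata first
    mdadm --manage /dev/md0 -r /dev/sda1           # b) detach the failed member
    # ...power down and physically swap in the replacement disk...
    mdadm --manage /dev/md0 -a /dev/sda1           # c) attach the new disk, unformatted
    mdadm --stop /dev/md0                          # d) the array must not be in use
    mdadm --assemble --run --force --update=resync /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1   # e)
    cat /proc/mdstat                               # watch the rebuild progress
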
[04:09] yes
[04:09] Please do
[04:09] Did you already do step b? If so, stop
[04:09] And paste the contents of raid.status first
[04:09] Back in 15 minutes, lunch break
[04:10] no, haven't started on the steps, just hooked up the new drive
[04:10] I'll be here
[04:11] Ok
[04:12] Paste the contents of raid.status here via pastebin, while I eat lunch
[04:12] okay
[04:21] http://pastebin.com/QSHvXKA6
[04:22] Back. Currently reading
[04:23] Okay
[04:23] Yep, sda is the SSD
[04:23] ok, to be sure I did "sudo mdadm --examine /dev/sd[abcd]1 >> raid.status"
[04:23] So array members are sdb sdc and sdd
[04:23] So which disk is the broken one?
[04:25] Could you please paste the output of "cat /proc/mdstat"?
[04:25] And "mdadm -D /dev/md0"?
[04:26] this looks like the line that might be important
[04:26] output was long
[04:26] md0 : active raid5 sdc1[1] sdd1[3]
[04:26] Paste complete output into pastebin please
[04:26] Then paste the pastebin link here
[04:27] In the future, when I say "paste the output of X here please", I mean "upload the output to pastebin then copy the pastebin link here please"
[04:27] here is the first one in its entirety
[04:27] Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
[04:27] md0 : active raid5 sdc1[1] sdd1[3]
[04:27]       3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
[04:27] wasn't that long after all
[04:27] Unless the output is just one or two lines
[04:27] Ok
[04:27] So it appears that sdb is the broken drive
[04:27] Am I correct in assuming that the new unbroken drive is already in?
[04:28] yes
[04:29] Ok, please run this command: mdadm --manage /dev/md0 -a /dev/sdb1
[04:29] Just that command
[04:29] What's the output?
[04:29] here is from mdadm -D /dev/md0
[04:29] http://pastebin.com/3aSnQ4sV
[04:29] will do the last one now
[04:30] mdadm: cannot find /dev/sdb1: No such file or directory
[04:30] Could you please paste the output of "ls -l /dev/ | grep sdb"?
[04:31] brw-rw---- 1 root disk 8, 16 Oct 2 00:09 sdb
[04:35] Ok, please run this command instead: mdadm --manage /dev/md0 -a /dev/sdb
[04:36] Remove the 1, it should be /dev/sdb, not /dev/sdb1
[04:38] Is the command done?
[04:38] mdadm: added /dev/sdb
[04:38] was the output
[04:42] Ok
[04:43] Please run the following commands in sequence
[04:43] ok
[04:43] sudo mdadm --stop /dev/md0
[04:43] sudo mdadm --assemble --run --force --update=resync /dev/md0 /dev/sdb /dev/sdc1 /dev/sdd1
[04:43] Then paste the output here
[04:43] problem with first command
[04:44] mdadm: Cannot get exclusive access to /dev/md0: Perhaps a running process, mounted filesystem or active volume group?
[04:44] Ah.
[04:44] Could you unmount it first please?
[04:45] how do I unmount?
[04:45] Is it actively in use?
[04:45] Stop everything that uses it first
[04:46] ahh it is in use
[04:47] so programs like sabnzb I need to stop prior to unmount?
[04:47] Before stopping? Yes
[04:47] You're not supposed to resync/reassemble a linux-raid drive while it is in use.
[04:47] That's why I like ZFS so much
[04:48] hmm
[04:48] so I have a number of programs that start on boot
[04:49] I believe they all reside on the SSD and some write to the array
[04:50] Could you please paste the output of "cat /proc/mdstat"?
[04:50] And "mdadm --misc --detail /dev/md0"
[04:51] Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
[04:51] md0 : active raid5 sdb[4] sdc1[1] sdd1[3]
[04:51]       3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
[04:51]       [>....................]  recovery = 3.6% (70843128/1953381888) finish=325.1min speed=96500K/sec
[04:51] unused devices: <none>
[04:52] http://pastebin.com/vLC5pMy0
[04:53] I'm somewhat at a loss on how to stop the programs though... sabnzbd was running and I paused it through its web portal
[04:53] Ah good
[04:53] No need to stop the programs
[04:53] It's starting to do the rebuild
[04:53] okay
[04:53] It should be done in about 5 to 6 hours
[04:53] that's it?
[04:54] Congratulations, you just fixed your first degraded RAID array!
[04:54] LOL you just fixed my degraded RAID array. Thank you so very much Sachiru
[04:54] No problem
[04:54] One last recommendation
[04:54] yes?
[04:54] After 24 hours and during times of low to no usage
[04:55] run a scrub on the array
[04:55] To fully check the data on all drives
[04:55] Just to be sure.
[04:55] is running a scrub fairly simple?
[04:55] Yes
[04:55] I'm sure I can google and find the commands, I don't want to take up any more of your time
[04:55] It's simply a matter of running "echo check > /sys/block/md0/md/sync_action"
[04:56] It will be quite I/O intensive however and you should expect some slowdown while running the scrub, so schedule it for off-peak hours
[04:56] It should also take around 5-6 hours to complete.
[04:56] okay, will start it before bed
[04:56] If you're in the middle of a scrub and need to abort it, run "echo idle > /sys/block/md0/md/sync_action"
[04:57] After running the scrub you can do "cat /sys/block/md0/md/mismatch_cnt". That should show how many errors were detected and fixed by the scrub
[04:58] Have fun with your server! I accept paypal.
[05:00] Thanks again Sachiru! and while you were likely kidding would be more than happy to paypal you a little token for your time
[05:00] Nah it's cool
[05:01] LINUX guru and a standup guy
[05:01] thanks!
[05:02] now for sleep
[05:03] send him a postcard ;)
[05:05] * JanC remembers back in the late 1980s / early 1990s some software was distributed as "postcardware"; you had to send the author a postcard of your town/area to be licensed to use it :p
[05:07] (I bet most users didn't, but the author still got a huge postcard collection)
[05:34] That sounds pretty cool
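
A minimal sketch of the scrub workflow recommended above, assuming the array is md0 as in this conversation:

    echo check > /sys/block/md0/md/sync_action    # start the scrub ("check" counts mismatches;
                                                  # "repair" would also rewrite them)
    cat /proc/mdstat                              # shows scrub progress
    echo idle > /sys/block/md0/md/sync_action     # abort a running scrub if needed
    cat /sys/block/md0/md/mismatch_cnt            # mismatch count from the last scrub
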
=== kickinz1|afk is now known as kickinz1
[06:23] Good morning.
[09:07] hey all
[09:07] lordievader: hey
=== thumper-afk is now known as thumper
[10:19] lo, I'm looking for help with Landscape alerts. I looked at a Landscape server today and it had a misconfigured MTA. The mailq was 0, but fixing the MTA caused >1000 emails to be sent to a ticketing system. It just never seemed to run out of steam. Every time the MTA is started, Landscape just keeps sending piles of emails all alerting the same thing.
[10:19] Is this expected or is there an alerts queue I could acknowledge or clear in Landscape?
[10:31] When trying to install jenkins with apt-get I get the response that there is no candidate for that package. Is it not packaged for 14.04?
=== Lcawte|Away is now known as Lcawte
[11:07] morning
[11:13] Rovanion: yes, it looks like jenkins was packaged in 12.04 only.
=== Lcawte is now known as Lcawte|Away
[12:30] zul, jamespage: heat's out and I'm starting on it
[12:30] coreycb: cinder is out as well
[12:30] zul, k
[12:30] zul, cinder is done, jamespage did that
[12:31] coreycb: k
[12:33] coreycb/jamespage: the tools/config/generate_config works ok?
[12:33] zul, yep, seems to
=== Adri2000 is now known as Guest10811
=== Guest10811 is now known as Adri2000
[14:45] how's it going guys
[14:45] shellshock a problem ?
[14:46] Valduare: all the updates were promptly released by the security team. I'm not aware of any issue.
[14:47] ok
[14:49] jamespage, were you doing anything with getting a newer version of python-eventlet?
[14:49] coreycb, I was trying to avoid doing that - the diff is 25k for the 0.13 -> 0.15.2
[14:49] jamespage, ah right I recall you saying that
[14:51] Valduare: just apply your security updates like normal
[14:53] ok
[14:54] I also use some smoothwall virt routers
[14:54] but no patch on them yet...
[14:54] what does that mean for me
[14:58] jamespage, heat wants hacking>=0.8.0,<0.9 and 0.9.2 is in utopic
[14:58] zul ^
[14:59] coreycb: patch the test-requirements.txt then
[15:00] zul, ok
[15:17] zul, jamespage: heat is ready for review for juno https://code.launchpad.net/~corey.bryant/heat/2014.2-rc1/+merge/236912
[15:17] coreycb: you have merge conflicts
[15:19] zul, doh, fixing
[15:19] coreycb, retarget that to /juno methinks
[15:25] zul, this should be better - https://code.launchpad.net/~corey.bryant/heat/2014.2-rc1/+merge/236916
[15:28] coreycb: + libpython2.7-stdlib ??
[15:29] zul, that's for argparse
[15:29] oh no you don't want argparse
[15:29] doko will shoot you
[15:30] zul, heh -- ok
=== markthomas|away is now known as markthomas
[16:09] zul, heat's ready for re-review
[16:12] coreycb: k
[16:53] Am I wrong in thinking that if bridge_stp is off in an /etc/network/interfaces bridge stanza, there's no point in having _fd, _hello, _maxage etc. lines?
=== donspaulding_ is now known as donspaulding
[17:48] Please help with a RAID issue (this is not an Ubuntu question but I am running Server 14.04)
[17:48] I have 4 SSDs in my server connected to 2 x 2-port RAID cards. I am 90% sure that I have set BOTH RAID cards to see their 2 drives as a striped array, however my server sees 1 array and 2 separate drives.
[17:53] <<<--- PICNIC
=== kriskrop1 is now known as kriskropd
=== bilde2910 is now known as bilde2910|away
=== markthomas is now known as markthomas|away
=== darkness is now known as Guest91880
=== mattgrif_ is now known as mattgriffin
=== kickinz1 is now known as kickinz1|afk
[19:35] anyone running trusty using the official postgres apt repo?
=== a1berto_ is now known as a1berto
=== Lcawte|Away is now known as Lcawte
=== kickinz1|afk is now known as kickinz1
=== kickinz1 is now known as kickinz1|afk
=== markthomas|away is now known as markthomas
=== krtaylor is now known as krtaylor_away
[22:49] When I boot my local server I am greeted with this message: "Incrementally starting RAID arrays / mdadm: CREATE user root not found / mdadm: CREATE group disk not found / Incrementally started RAID arrays". I have booted on live Debian, mounted the raid, and ensured it is still working, and it is. Tried installing grub over again but it is missing the partition table so it never installs. What should I do. There are 5 hdd in raid0 on
[22:52] CodeVent: if it doesn't have a partition table, you need to give it one
[22:52] assuming there isn't one already there that is broken
[22:53] CodeVent: man fdisk
[22:54] what happens to the old data or old table?
[22:55] 100% data loss
[22:56] for the data and table
[22:57] forgive my ignorance, if I can mount the raid, why do I need a new table?
[22:57] with the live image correct?
[22:58] yes
[22:58] and I assume you attempted the grub install with that right?
[22:59] yes, it could not find the superblock?
[22:59] I can boot up shortly and say exactly what's wrong in a min
[23:00] that information would be useful
[23:00] Simplest way to mitigate denial-of-service attacks? Like AB?
[23:02] ApplesInArrays1: ask your ISP to block the worst offenders at ingress and talk with packet sources to get them squelched
[23:03] Sure
[23:03] Here's another one:
[23:03] Sometimes my MySQL goes offline and I have to login to restart it. Best way to deal with it?
[23:04] anything in the logs say why it failed?
[23:04] SegFault
[23:05] interesting; check dmesg for other segfaults, disk errors, etc
[23:05] it has also failed due to AB testing before
[23:05] does it die on specific tasks?
[23:05] for the segfault, I couldn't figure out why and I investigated
[23:05] I tried to reproduce it, but couldn't
[23:05] I had someone else look at the logs, they couldn't tell either.
[23:06] Anyways, I'm interested in a way of having mysql 'revive' by itself.
[23:06] dang :/
[23:06] maybe install mcelog, you might find machine check exceptions getting logged..
[23:06] It might be a band-aid, but it'd help out immensely with what I'm doing right now
[23:06] maybe run memtest86+ or other memory stressors...
[23:06] i'm interested in a way of having mysql 'revive' by itself.
[23:07] ApplesInArrays1: yeah, makes sense, that might even make it easier to track down what specifically killed it
[23:07] Alright, I can shut it down by AB testing with another machine.
[23:07] and being offline until I wake up and figure out what's wrong isn't really the best.
[23:07] Is there a way to revive MySQL if it's not running?
[23:08] ApplesInArrays1: this is where service monitoring systems like runit can help, but you could script together something like a "while true ; do sleep 10 ; service mysql status | something && /etc/init.d/mysql restart ; done"
[23:10] Sounds like a script is the way to go. thanks
[23:10] good luck :)
[23:12] while true; do sleep 60; service mysql status | service mysql restart; done
[23:12] Would that work?
[23:13] service mysql status returns "mysql stop/waiting"
[23:13] ApplesInArrays1: no, you need to inspect the output of service mysql status to see if it is still running or dead; or, if the service mysql status output doesn't know when it dies, find something better for determining when mysql is alive or dead. (nagios is likely to have a mysql monitoring script available that you can steal)
[23:14] mysql start/running, process 1104 = on. mysql stop/waiting = off
[23:14] Ahh, I see
[23:15] .. just so long as it always -knows- when mysql has unexpectedly died. it probably does, but that might not always have been true..
[23:16] voidstar:
[23:16] root@debian:/# grub-install /dev/md128
[23:16] Installing for i386-pc platform.
[23:16] grub-install: warning: File system `ext2' doesn't support embedding.
[23:16] grub-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged.
[23:16] grub-install: error: will not proceed with blocklists.
[23:19] How would I go about sending 'service mysql start' every 5 minutes through bash?
[23:19] ApplesInArrays1: if it came to that, /etc/crontab
[23:20] im having issues with my samba setup on my headless file server. I had to change routers and now samba isn't working. nothing changed other than the hardware of the router. the configs are the same, heck even the internal IPs are the same. I have a pastebin of the testparm command, could anyone help me figure out this issue? http://pastebin.com/v8D7wsx3
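
Fleshing out the watchdog one-liner above into something runnable — a sketch only, not a substitute for a real monitor like runit, monit or nagios, and assuming the upstart-era "service mysql status" output on this release ("start/running" when alive, "stop/waiting" when dead):

    #!/bin/bash
    # naive MySQL watchdog: poll once a minute, start the service if upstart
    # reports it as stopped
    while true; do
        sleep 60
        if service mysql status | grep -q 'stop/waiting'; then
            service mysql start
        fi
    done
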
[23:21] I just typed that, now I'm stuck
[23:21] user:root@scrapy2:/etc# crontab
[23:21] CodeVent: http://askubuntu.com/questions/420778/i-need-step-by-step-guidence-to-recover-grub
[23:21] first result on google
[23:22] my search query was "ext2 ubuntu error: will not proceed with blocklists"
[23:22] ApplesInArrays1: try ^C
[23:23] ApplesInArrays1: if that doesn't work, try ^D
[23:23] yes, I have followed this too
[23:23] ^C works for some reason. Strange.
[23:24] what is the output of fsck of that device?
[23:24] ApplesInArrays1: good good :) first, run "man 5 crontab", and then edit your /etc/crontab file :) hehe
[23:24] */15 * * * * /bin/bash /etc/cron.d/clear-mixtape-dir.sh
[23:24] I can follow this template.
[23:25] I'd save "service mysql start" in a file (/etc/cron.d/clear-mixtape-dir.sh)
[23:25] oh, /etc/cron.d/ looks handy
[23:25] WARNING: Re-reading the partition table failed with error 22: Invalid argument. The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8). Syncing disks.
[23:25] do that instead, yes :) ignore /etc/crontab
[23:25] when I reboot same error.
[23:26] I also have a directory I should clear out every day
[23:26] /var/web/html/img
[23:26] I could use bash to clean out that folder once a day, yeah?
[23:26] ApplesInArrays1: yeah
=== trifort_ is now known as trifort
=== Lcawte is now known as Lcawte|Away
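
To round off the cron discussion above, a sketch of the two jobs in crontab syntax; the file names and schedules are illustrative, the cleanup path is the one mentioned in the conversation, and note that files in /etc/cron.d/ use crontab format with a user field (they are not shell scripts):

    # /etc/cron.d/mysql-revive -- nudge MySQL every five minutes; starting an
    # already-running service is a harmless no-op
    */5 * * * * root service mysql start >/dev/null 2>&1

    # /etc/cron.d/clear-img-dir -- empty the image directory once a day at 04:00
    0 4 * * * root find /var/web/html/img -mindepth 1 -delete
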