Fedora: re-mounting a SATA raid array

zefflyn

Registered. User.
Hello sweet open-source OS gurus,

I had this sweet little Dell that I was using for a webserver. I had an add-in SATA card with two 250 Gig SATA drives set up as mirrors, and an IDE boot drive.

In fstab, the raid pair was set as:
/dev/sda1 /web ext3

Then, some Bad Stuff happened: PS blew and took the mobo with it. I got a new mobo & memory (from newegg), then went through a saga of finding a functional matching PS from Fry's, then a new case because the Dell case & wires wouldn't fit the board... and then after putting it all together and turning it on, the SATA card bleeped that one of the SATA drives was dead.

So I popped in a new drive, entered the RAID bios, hit Rebuild, came back a few hours later, and all was good.

But booting the machine now, it hangs on Initializing Kernel Device. I can boot single-user mode, but there's no sign of /dev/sda1.

dmesg has the line "md: Autodetecting RAID arrays.", not followed by any errors.

Right before entering the root password, the system says:
fsck.ext3 /dev/sda1: no valid superblock. blah blah try an alternate superblock using e2fsck -b 8193 /dev/sda1

But trying that produces: no such file or dir /dev/sda1

The system has Fedora 3 installed. I downloaded Fedora 8 and booted it to do an install/upgrade, and after passing the selections for graphical install, English, US keyboard, etc., I get this error:
Error mounting device sda1 as /web - no such device or address
Devices in /etc/fstab should be specified by label, not device name.

It says it'll have to reinitialize the device, which will wipe out all existing data.


So, /web is what I didn't want to lose. How would I find out if the system sees the RAID array, and how can I re-mount it without wiping out /web?
 

Aegon

Bongo Maniac
Honestly, I've never played with hardware raid. So this is just an idea and I hope someone out there knows more.

"mdadm" might be a good command to check out, but I doubt it because it sounds like your array appears as a single device (the hardware does the mirroring, and the software thinks there is only one block device). But if this isn't the case, check out the mdadm command.

But I'd probably start with "MAKEDEV /dev/sda1"
Read the man page (or whatever docs the script has) before you try it; I've never done it myself.

If that works, then you can try the suggested e2fsck command.
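For what it's worth, here is roughly how I'd expect that to go. On older Fedora, MAKEDEV is a script in /dev, and if it's missing, mknod can create the node by hand. A sketch only; I'm assuming the standard SCSI-disk device numbers, so check `ls -l /dev` on a working box first:

```shell
# Older Fedora ships MAKEDEV as a script living in /dev:
cd /dev && ./MAKEDEV sda

# Failing that, create the nodes manually: SCSI/SATA disks are block
# devices with major number 8, /dev/sda at minor 0, /dev/sda1 at minor 1.
mknod /dev/sda  b 8 0
mknod /dev/sda1 b 8 1
```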

I hope it helps some. At least until someone who knows what they are doing arrives. Backup your data before you trust me.
 

Aegon

Bongo Maniac
Wait a minute... it hangs while initializing kernel devices? That is probably your RAID driver. See if there is a newer driver out there.

Are you good with the kernel debugger? Getting into the LVM (or whatever layer this is) shouldn't be a big deal if you are familiar with the territory. That way you can know for sure what the problem is.
 

zefflyn

Registered. User.
Thanks for the tips!

I only had a bit of time to try it before leaving, but MAKEDEV doesn't seem to be on my system, or I typo'd it the two times I tried.

mdadm is there, so I'll read the man page and see how to use it. And I'll look for a driver for the SATA card and try installing that.
 

crow

Guest
How many SATA slots do you have? Is it mounted as a master or slave? It could very well be on /dev/sdb or something. What does dmesg show in terms of hardware probing? Try disabling RAID first and commenting out the mount in your fstab, then check your /proc info files and find out what's going on. My guess is that it's actually on a different device now.
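Concretely, the checks would look something like this (the device names are guesses; adjust to whatever actually shows up):

```shell
# Which block devices did the kernel actually register?
cat /proc/partitions

# What did the kernel log while probing the controller and disks?
dmesg | grep -iE 'scsi|sata|sd[a-z]'

# With the old /web line commented out of /etc/fstab, probe each
# candidate partition read-only:
for d in /dev/sda1 /dev/sdb1 /dev/sdc1; do
    mount -o ro "$d" /mnt 2>/dev/null && echo "$d mounted" && umount /mnt
done
```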
 

Aegon

Bongo Maniac
The whole thing sounds fishy to me. Crow's advice is good; see if the disk moved.

But I imagine that a power failure capable of taking out the mobo and a disk may have taken out the RAID card too. Or it might have killed the RAID card, and the disk you replaced may still be good.

I would consider trying a live CD like Knoppix or Ubuntu Live to see if another OS can get the hardware working. If it can't, you might have a problem with the RAID card.

If you have another raid card, switch it out as a test. <-unlikely.

Have you done any tests to see if the "dead" hard drive is actually dead? If you can show that the drive works, that is a strong indication that the controller is broken.
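Something non-destructive like this would do it, assuming the pulled drive shows up as /dev/sdb on a plain SATA port (and smartmontools may need installing first):

```shell
# Stream the whole device read-only; any I/O error aborts dd and
# leaves a trace in dmesg:
dd if=/dev/sdb of=/dev/null bs=1M

# If smartmontools is available, ask the drive for its own verdict:
smartctl -H /dev/sdb          # overall health assessment
smartctl -l error /dev/sdb    # the drive's internal error log
```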
 

zefflyn

Registered. User.
Actually, I don't think the PS killed the MoBo... but while working on it, listening to the piezo buzzer beep its error, I got mad and pulled the buzzer off the mobo... and later discovered it was the SATA RAID card making the buzzing. :blush

The Dell mobo didn't have on-board SATA, but I think the new one does; I'm trying to remember. I'll look for what crow described.
 

zefflyn

Registered. User.
Finally got to tweak on this again.

No matter what I did on the rebuilt machine, it wouldn't see the SATA RAID controller (LSI MegaRAID SATA 150-6D). Looking in /proc/partitions, nothing would change whether I had the card plugged in, unplugged, or a normal SATA drive plugged into the MoBo.

So I ran out and bought a little xPC Shuttle micro ATX machine. nVidia 7025 chipset, AMD A64 X2 4800, 2 Gig Ram, etc.

Popped the boot IDE drive in, and it powered on, went past the "Configuring kernel parameters" step where it was hanging before, went through the "discovering new hardware" wizard, and then hung.

Booting single-user with the RAID card in and the old /dev/sda1 commented out, it shows /dev/sda in /proc/partitions!

So I added /dev/sda to fstab where the old entry was, rebooted to single-user mode, and it complains that /dev/sda has a bad superblock.

How do I find out where the alternate superblocks are, so that I can try 'e2fsck -b ##' against them?
 

zefflyn

Registered. User.
OK, discovered 'mke2fs -n' to see where the alternate superblocks are. Running it without the -b option, it calculated /dev/sda to be using 4096-byte blocks.

But running 'e2fsck -b ## /dev/sda' with each of the 4096-block alternates failed; it produces:
e2fsck: Bad magic number in super-block while trying to open /dev/sda

The superblock could not be read bla bla bla. Try again with an alternate superblock:
e2fsck -b 8193 <device>

So I re-ran 'mke2fs -n -b 2048' and 'mke2fs -n -b 1024' for more alternates, but none of those worked either. :(
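For anyone following along, the sequence looks roughly like this (the backup locations come from the mke2fs output; 32768 and 98304 are the typical first backups for 4096-byte blocks):

```shell
# -n is a dry run: print the geometry, including where backup
# superblocks WOULD be placed, without writing anything:
mke2fs -n /dev/sda

# Then hand each printed backup location to e2fsck:
e2fsck -b 32768 /dev/sda
e2fsck -b 98304 /dev/sda
```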

Oh the irony. Everything on the RAID mirror is gone, while everything on the boot drive is safe.
 

Aegon

Bongo Maniac
When e2fsck told you to try "e2fsck -b 8193 /dev/sda", you tried "mke2fs".

But mke2fs won't tell you anything in particular about your old superblock, instead it will only tell you about what would happen if you made a new superblock with the given parameters...

The error from e2fsck... Is the device unmounted? (I ask because it looks like you've added it to the fstab.) I would hope that e2fsck would tell you it isn't working because the device is still mounted, but it might instead say something unhelpful like "bad magic number in superblock..."

What are you passing to the -b flag? 4097? 8193? 12289?

Have you used "od" to look at the superblocks yet? The magic number for ext3 is easy to find on the web, so you should be able to see it at the offsets you've listed (multiples of 4096; BTW, 4096 is the usual size of a page of memory, so a block of data can be read and dumped directly into a page).
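Something like this, assuming the filesystem really starts at the very beginning of /dev/sda: the primary ext2/ext3 superblock sits 1024 bytes in, and its magic number (0xEF53, stored little-endian as the bytes 53 ef) is 56 bytes into the superblock:

```shell
# Primary superblock magic at absolute offset 1024 + 56 = 1080; a
# healthy ext3 filesystem prints "53 ef" here:
dd if=/dev/sda bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1

# A backup superblock starts at the beginning of its block, e.g. the
# one at block 32768 with 4096-byte blocks:
dd if=/dev/sda bs=1 skip=$((32768*4096+56)) count=2 2>/dev/null | od -An -tx1
```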

Also, why is your superblock ruined? Does dmesg have disk errors in it? I'm still concerned about the original failure of the disks.

I'm just throwing ideas out...
 

zefflyn

Registered. User.
Don't know what 'od' does, I'll give that a try.

I didn't know what the block size was or where the alternate superblocks would be, but e2fsck's man page suggests using mke2fs to find out, which led to the above.

When I leave the /dev/sda entry out of /etc/fstab, I get no errors. When I put it in, I get superblock errors and it won't mount.

But I'm going on vacation for a week, and hopefully it'll magically work when I get back.

Thanks Thomas!
 

Wrong Way

Well-known member
If you are totally sure it was a mirror and you had "hardware fault tolerance" with only two discs, I would grab the HDD that was jumpered as disc "0", or the first in the series, and try to read it directly. Once you have the single disc operating, you can back up the data and then worry about your array later.

I suppose it would be possible to write a stripe set to only two discs, but it would not provide you with any benefits, except maybe a fantasy "read" benchmark. So if there is no stripe set, you should be able to look at the data on either of the two discs independently. The only problem you should encounter looking at the second disc in the series would be if it had an OS on it: the boot ini file would point to the wrong channel on the second disc. Since the data on both discs is identical in a mirror, the boot file would be the same too.
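In Linux terms that would be something like the following, mounted read-only so nothing gets modified. The device and paths are examples, and some hardware RAID formats keep metadata that can shift or hide the filesystem, so this isn't guaranteed:

```shell
# Attach the old disk 0 to a plain SATA port, then mount its first
# partition read-only:
mkdir -p /mnt/rescue
mount -t ext3 -o ro /dev/sdb1 /mnt/rescue

# Copy everything off before experimenting any further:
mkdir -p /backup/web
cp -a /mnt/rescue/. /backup/web/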
 

zefflyn

Registered. User.
It was a mirrored pair. The SATA RAID controller has six ports, but I was only using two. Disk 0 is the one that reportedly went bad, so I pulled it, popped in another, went into the RAID WebBIOS, and hit Rebuild.

When I get back next week, I'll grab the original disk 0 and try your suggestion.
 

crow

Guest
You can't refer to it as just /dev/sda; the partitions on it will be /dev/sda1, /dev/sda2, etc.

Here is what you do:
Boot up with that line commented out in your /etc/fstab.
Run "fdisk -l" or "df"; you should see the partition as well as the block size.
If you still can't see the block size, use dumpe2fs /dev/sdaX (where X is the partition #) to list it.

alternatively, you can try running this tool: http://www.cgsecurity.org/wiki/TestDisk
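Those steps, spelled out (the partition number is a placeholder):

```shell
# With the /web line commented out of /etc/fstab:
fdisk -l        # every disk the kernel sees, plus partition tables
df -h           # what is actually mounted right now

# Read the filesystem's own metadata; -h prints just the superblock
# header, which includes the block size:
dumpe2fs -h /dev/sda1 | grep -i 'block size'
```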
 

zefflyn

Registered. User.
OK, back from vacation, giving it another try.

fdisk -l : Gives the data on hda1 through 5, all normal, then says:
Disk /dev/sda: 134.7 GB, 137434759168 bytes
255 heads, 63 sectors/track, 16708 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sda doesn't contain a valid partition table.

df produces one line for /dev/hda.

TestDisk looks sweet! I'm playing with it now.
 

zefflyn

Registered. User.
:laughing: :laughing:

I had given up on this a couple days ago.

I used TestDisk, and it seemed super promising! I went and got the original disk 0 from the RAID mirror set, plugged it directly into the second SATA port, and ran TestDisk. It saw both drives, and running 'fdisk -l' at the command line showed me both disks (the new boot disk and this failed disk). I ran TestDisk's utility to scan and repair the file system. It spun through most of the disk pretty quickly, then spent an hour or two scanning the last 5%, which it said had filesystem errors.

It finally finished, showed me the filesystem it had found, and asked if I wanted to save it! I cheered, saved it, then exited and rebooted, as it had instructed me to do.

But on rebooting, I couldn't find the second disk. fdisk -l showed a /dev/sda2, but I couldn't get it to do anything with it. /etc/fstab didn't show the disk. I searched futilely on the web for clues on how to mount the disk, gradually losing hope, thinking TestDisk had overwritten the filesystem and wiped out my data... and eventually gave up.

Today, while installing PHP and MySQL, I rebooted, then looked for something on the desktop... and saw an icon called "120 Meg Volume." Puzzled, I clicked on it, and stared in disbelief as I realized that was the failed disk, and there was all my data! :banana

I was able to copy everything off, without disk read errors! Now I just need to figure out how to get a new MySQL install to read my old data files.
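From what I've read, if the old tables were MyISAM (the default back then), the data files should be portable by plain copy. A sketch, with the paths and the database name "mydb" as placeholders:

```shell
# Stop the server, drop the old database directory into the new
# datadir, and fix ownership:
service mysqld stop
cp -a /backup/web/mysql/data/mydb /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql/mydb
service mysqld start

# Sanity-check the copied tables:
mysqlcheck mydb
```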

Woohoo! Thanks for the tips!
 

zefflyn

Registered. User.
Yep! And for added sweetness, I tar'd up the entire mysql/data directory, put it on a USB stick, opened it on my Windows machine, copied the data files into the MySQL data directory, and they all opened right up! I'm totally back in business!
 