Re: Softraid Multi-drive Failure

Author: Joe Fleming
Date:  
To: Main PLUG discussion list
Subject: Re: Softraid Multi-drive Failure

Yeah, I've done that in a non-RAID setup before. It
worked really well, except for a bunch of stray files and directories
whose names were just inode numbers. Everything inside the
directories was fine though, and that's all I cared about. I had pretty
much a 100% recovery of that drive and was quite happy I figured it
all out. I was also lucky, because the data wasn't on any other drive;
it isn't backed up daily, of course.
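In case it helps anyone later: those inode-named entries end up under
lost+found, and you can figure out what they are with something like this
(the mount point and inode number here are just made-up examples):

  # orphaned files/dirs show up in lost+found named after their inode number
  ls -la /mnt/recovered/lost+found
  # guess what a nameless file actually is from its contents
  file '/mnt/recovered/lost+found/#12345'
  # or hunt for a particular file by a string you know it contains
  grep -rl 'some known string' /mnt/recovered/lost+found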

I don't think there's a problem with the card. One of the drives has been
making a chirping sound (not a click) for a while now and I've been
expecting it to fail. I'm guessing, since I can still hear it, that it's not
the drive that actually DID fail on me. I'm assuming what happened was that
one drive failed for real (checking the superblock seems to verify
corruption) and another one just dropped out because mdadm messed up a
little. The superblock on the other drive was still fine and I was
able to force it all back online like I mentioned. Now, back to copying
all this data over... this is going to take a long, long time.
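For the archives, the sequence I used was roughly the following (the device
names are just examples from my box, not something to paste in blindly):

  # compare each member's RAID superblock; the truly dead drive looks corrupt
  mdadm --examine /dev/sda1
  mdadm --examine /dev/sdb1
  mdadm --examine /dev/sdc1

  # stop the half-assembled array, then force it up with the members that look sane
  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdc1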

-Joe

Eric Shubert wrote:

I did this last summer (rebuilt w/ --assemble --force) and it worked ok.

For a drive that's really failed, you can dd as much of it as you can to
a new drive, run fsck on the new drive, then add it back in w/
--assemble --force. That worked for me as well, IIRC. It did lose
whatever couldn't be read from the old drive w/ dd, though.
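A rough sketch of that, assuming /dev/sdb is the dying drive and /dev/sdc is
the replacement (names and partition layout are only placeholders):

  # copy everything that's still readable; skip bad sectors and pad them with zeros
  dd if=/dev/sdb of=/dev/sdc bs=64k conv=noerror,sync

  # repair the filesystem on the copy (only applies if the partition holds a
  # filesystem directly, e.g. a RAID1 member or a plain data disk)
  fsck -y /dev/sdc1

  # then force the array together with the copy standing in for the dead drive
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdc1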

Given that 2 drives went at once, consider that the i/o card or MB (or
PS) might be having issues. That was the case with the incident last
summer (replaced the i/o card, then the MB was failing).

Joe Fleming wrote:


That's exactly what I want to do here; just bring one of the drives back up
long enough to get the data off it. I suspect one of the drives
really did fail; I've been waiting for it to happen, in fact. But since
the other drive claims to have failed at the EXACT same time, I really
don't think that it did.

I saw the --force option, but there was no indication of whether it was going
to rebuild the array. The assemble option might simply imply that,
though... it does say "This usage assembles one or more raid arrays
from pre-existing components", which sounds promising enough.

I think you've described exactly what I was trying to do; assemble (NOT
rebuild) and copy. Thanks!

-Joe


I've had luck in the past recovering from a multi-drive failure, where
the other failed drive was not truly dead but rather was dropped
because of an IO error caused by a thermal calibration or something
similar. The trick is to re-add the drive to the array using the
option that forces it NOT to try to rebuild the array. This used to
require several options like --really-force and --really-dangerous,
but now I think it's just something like --assemble --force /dev/md0.
This forces the array to come back up in its degraded (still down 1
disk) state. If possible, replace the failed disk or copy your data
off before the other flaky drive fails.
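Once the forced assemble works, it's worth sanity-checking the degraded state
and pulling data off right away; something like this, where the mount point
and destination are just placeholders:

  # confirm the array is up but degraded (one member shown as missing/failed)
  cat /proc/mdstat
  mdadm --detail /dev/md0

  # mount read-only so nothing writes to the flaky set, then copy everything off
  mount -o ro /dev/md0 /mnt/raid
  rsync -a /mnt/raid/ /mnt/backup/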
------------------------------------------------------------------------

---------------------------------------------------
PLUG-discuss mailing list -
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss