Joe Fleming wrote:
Hey all, I have a Debian box that was acting as a 4-drive RAID-5 mdadm
softraid server. I heard one of the drives making strange noises, but
mdstat reported no problems with any of the drives.
I decided to copy the data off the array so I had a backup before I
tried to figure out which drive it was. Unfortunately, in the middle of
copying said data, 2 of the drives dropped out at the same time. Since
RAID-5 only tolerates a single drive failure, the whole array is
basically hosed now. I've had drives drop out on me before, but never 2
at once. Sigh.
I tried to Google a little about dealing with multi-drive failures with
mdadm, but I couldn't find much in my initial looking. I'm going to
keep digging, but I thought I'd post a question to the group and see
what happens. So, is there a way to tell mdadm to "unmark" one of the 2
drives as failed and try to bring up the array again WITHOUT rebuilding
it? I really don't think both of the drives failed on me simultaneously
and I'd like to try to return 1 of the 2 to the array and test my
theory. If I can get the array back up, I can either keep trying to
copy data off it or add a new replacement and try to rebuild. I'm
pretty much a novice with mdadm, though, and I don't see an option that will let me
do what I want. Can anyone offer me some advice or point me in the
right direction..... or am I just SOL?
As a side note, why can't hard drive manufacturers make drives that
last anymore? I've had like 5 drives fail on me in the last year... WD,
Seagate, Hitachi, they all suck equally! I can't find any that last for
any reasonable amount of time, and all the warranties leave you with
reman'd drives, which fail even more rapidly; some even show up DOA.
Plus, I'm not sending my unencrypted data off to some random place!
Sorry for venting, just a little ticked off at all of this. Thanks in
advance for any help.
-Joe
I've had luck in the past recovering from a multi-drive failure where
the other "failed" drive was not truly dead but had been dropped
because of an I/O error caused by a thermal calibration or something
similar. The trick is to re-add the drive to the array using the
option that forces it NOT to try to rebuild. This used to require
several options like --really-force and --really-dangerous, but now I
think it's just something like --assemble --force /dev/md0. This
forces the array to come back up in its degraded (still down 1 disk)
state. If possible, replace the failed disk or copy your data off
before the other flaky drive fails.
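
In case it helps, here is a rough sketch of the sequence. The member
names /dev/sda1 through /dev/sdd1 and the /mnt mount point are just
placeholders; substitute whatever your box actually uses, which
"mdadm --detail /dev/md0" or dmesg will tell you:

    # stop the half-dead array so mdadm will let you reassemble it
    mdadm --stop /dev/md0

    # compare the event counters on the members; a drive that was only
    # kicked out (not truly dead) should be close to the others
    mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 | grep -i events

    # force-assemble; --force tells mdadm to accept a member whose
    # event count is slightly stale instead of refusing to start
    mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # confirm it came up degraded, then mount read-only and copy data off
    cat /proc/mdstat
    mount -o ro /dev/md0 /mnt

If you already know which drive is the truly dead one, just leave it
out of the --assemble line; mdadm will happily start a RAID-5 with one
member missing.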