Softraid Multi-drive Failure
Charles Jones
charles.jones at ciscolearning.org
Fri Jan 9 14:53:30 MST 2009
Joe Fleming wrote:
> Hey all, I have a Debian box that was acting as a 4 drive RAID-5 mdadm
> softraid server. I heard one of the drives making strange noises but
> mdstat reported no problems with any of the drives. I decided to copy
> the data off the array so I had a backup before I tried to figure out
> which drive it was. Unfortunately, in the middle of copying said data,
> 2 of the drives dropped out at the same time. Since RAID-5 is only
> tolerant to one failure at a time, basically the whole array is hosed
> now. I've had drives drop out on me before, but never 2 at once. Sigh.
>
> I tried to Google a little about dealing with multi-drive failures
> with mdadm, but I couldn't find much in my initial looking. I'm going
> to keep digging, but I thought I'd post a question to the group and
> see what happens. So, is there a way to tell mdadm to "unmark" one of
> the 2 drives as failed and try to bring up the array again WITHOUT
> rebuilding it? I really don't think both of the drives failed on me
> simultaneously and I'd like to try to return 1 of the 2 to the array
> and test my theory. If I can get the array back up, I can either keep
> trying to copy data off it or add a new replacement and try to
> rebuild. I'm pretty novice with mdadm though, and I don't see an option
> that will let me do what I want. Can anyone offer me some advice or
> point me in the right direction..... or am I just SOL?
>
> As a side note, why can't hard drive manufacturers make drives that
> last anymore? I've had like 5 drives fail on me in the last year...
> WD, Seagate, Hitachi, they all suck equally! I can't find any that
> last for any reasonable amount of time, and all the warranties leave
> you with reman'd drives which fail even more rapidly, some even show
> up DOA. Plus, I'm not sending my unencrypted data off to some random
> place! Sorry for venting, just a little ticked off at all of this.
> Thanks in advance for any help.
>
> -Joe
I've had luck in the past recovering from a multi-drive failure where
the other failed drive was not truly dead, but rather was dropped because
of an I/O error caused by a thermal recalibration or something similar.
The trick is to re-assemble the array using the option that forces it NOT
to try to rebuild. This used to require several options like
--really-force and --really-dangerous, but now I think it's just
something like --assemble --force /dev/md0. This forces the array to come
back up in its degraded (still down 1 disk) state. If possible, replace
the degraded disk or copy your data off before the other flaky drive
fails.
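To make that concrete, here's a rough sketch of the sequence I'd try.
The device names (/dev/md0 and /dev/sd[abcd]1) are just placeholders for
your actual array and member partitions:

  # Stop the broken array first
  mdadm --stop /dev/md0

  # Check the event counts and states recorded in each member's superblock;
  # the drive that dropped out last will usually be very close to the others
  mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # Force-assemble from the members; mdadm picks the drives whose event
  # counts are close enough and brings the array up degraded (no rebuild)
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # Mount read-only and copy the data off before touching anything else
  mount -o ro /dev/md0 /mnt/recovery

Only after the data is safely copied off would I add a replacement disk
and let the array resync.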