Softraid Multi-drive Failure
Charles Jones
charles.jones at ciscolearning.org
Fri Jan 9 14:53:30 MST 2009
Joe Fleming wrote:
> Hey all, I have a Debian box that was acting as a 4 drive RAID-5 mdadm
> softraid server. I heard one of the drives making strange noises but
> mdstat reported no problems with any of the drives. I decided to copy
> the data off the array so I had a backup before I tried to figure out
> which drive it was. Unfortunately, in the middle of copying said data,
> 2 of the drives dropped out at the same time. Since RAID-5 is only
> tolerant to one failure at a time, basically the whole array is hosed
> now. I've had drives drop out on me before, but never 2 at once. Sigh.
>
> I tried to Google a little about dealing with multi-drive failures
> with mdadm, but I couldn't find much in my initial looking. I'm going
> to keep digging, but I thought I'd post a question to the group and
> see what happens. So, is there a way to tell mdadm to "unmark" one of
> the 2 drives as failed and try to bring up the array again WITHOUT
> rebuilding it? I really don't think both of the drives failed on me
> simultaneously and I'd like to try to return 1 of the 2 to the array
> and test my theory. If I can get the array back up, I can either keep
> trying to copy data off it or add a new replacement and try to
> rebuild. I'm pretty novice with mdadm though, and I don't see an option
> that will let me do what I want. Can anyone offer me some advice or
> point me in the right direction..... or am I just SOL?
>
> As a side note, why can't hard drive manufacturers make drives that
> last anymore? I've had like 5 drives fail on me in the last year...
> WD, Seagate, Hitachi, they all suck equally! I can't find any that
> last for any reasonable amount of time, and all the warranties leave
> you with reman'd drives which fail even more rapidly, some even show
> up DOA. Plus, I'm not sending my unencrypted data off to some random
> place! Sorry for venting, just a little ticked off at all of this.
> Thanks in advance for any help.
>
> -Joe
I've had luck in the past recovering from a multi-drive failure where
the other failed drive was not truly dead, but rather was dropped because
of an I/O error caused by a thermal recalibration or something similar.
The trick is to re-assemble the array using the option that forces it NOT
to try to rebuild. This used to require several options like
--really-force and --really-dangerous, but now I think it's just
something like --assemble --force /dev/md0. This forces the array to come
back up in its degraded (still down 1 disk) state. If possible, replace
the degraded disk or copy your data off before the other flaky drive
fails.
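To make that concrete, here's a rough sketch of the sequence I'd try.
The device names (/dev/md0 and /dev/sd[abcd]1) are just placeholders for
your actual array and member partitions:

  # Stop the broken array first
  mdadm --stop /dev/md0

  # Check the event counts and states recorded in each member's superblock;
  # the drive that dropped out last will usually be very close to the others
  mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # Force-assemble from the members; mdadm picks the drives whose event
  # counts are close enough and brings the array up degraded (no rebuild)
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # Mount read-only and copy the data off before touching anything else
  mount -o ro /dev/md0 /mnt/recovery

Only after the data is safely copied off would I add a replacement disk
and let the array resync.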