Joe Fleming wrote:
> Hey all, I have a Debian box that was acting as a 4-drive RAID-5 mdadm
> softraid server. I heard one of the drives making strange noises, but
> mdstat reported no problems with any of the drives. I decided to copy
> the data off the array so I had a backup before I tried to figure out
> which drive it was. Unfortunately, in the middle of copying said data,
> 2 of the drives dropped out at the same time. Since RAID-5 is only
> tolerant to one failure at a time, basically the whole array is hosed
> now. I've had drives drop out on me before, but never 2 at once. Sigh.
>
> I tried to Google a little about dealing with multi-drive failures
> with mdadm, but I couldn't find much in my initial looking. I'm going
> to keep digging, but I thought I'd post a question to the group and
> see what happens. So, is there a way to tell mdadm to "unmark" one of
> the 2 drives as failed and try to bring up the array again WITHOUT
> rebuilding it? I really don't think both of the drives failed on me
> simultaneously, and I'd like to try to return 1 of the 2 to the array
> and test my theory. If I can get the array back up, I can either keep
> trying to copy data off it or add a new replacement and try to
> rebuild. I'm pretty novice with mdadm, though, and I don't see an
> option that will let me do what I want. Can anyone offer me some
> advice or point me in the right direction... or am I just SOL?
>
> As a side note, why can't hard drive manufacturers make drives that
> last anymore? I've had like 5 drives fail on me in the last year...
> WD, Seagate, Hitachi, they all suck equally! I can't find any that
> last for any reasonable amount of time, and all the warranties leave
> you with reman'd drives, which fail even more rapidly; some even show
> up DOA. Plus, I'm not sending my unencrypted data off to some random
> place! Sorry for venting, just a little ticked off at all of this.
> Thanks in advance for any help.
>
> -Joe

I've had luck in the past recovering from a multi-drive failure where
one of the "failed" drives was not truly dead but had simply been
dropped because of an I/O error caused by a thermal recalibration or
something similar. The trick is to re-assemble the array using the
option that forces it NOT to try to rebuild. This used to require
several options like --really-force and --really-dangerous, but now I
think it's just something like --assemble --force /dev/md0. This
brings the array back up in its degraded state (still down one disk).
If possible, replace the failed disk or copy your data off before the
other flaky drive dies for real.
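
For what it's worth, here is a rough sketch of the sequence I would
try. The device names are made up for the example (I'm assuming the
array is /dev/md0 built from /dev/sdb1 through /dev/sde1), so
substitute your own, and compare the event counters before forcing
anything:

    # stop whatever half-assembled state md left the array in
    mdadm --stop /dev/md0

    # look at each member's superblock; the "Events" counts tell you
    # which drive fell out first (lowest count, probably really bad)
    # and which one only just dropped
    mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 | grep -i events

    # force-assemble from the good members plus the recently dropped
    # one, leaving out the drive that has been dead the longest;
    # --force tells mdadm to ignore the "failed" mark instead of
    # refusing to start (no rebuild happens at this point)
    mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # confirm it came up degraded, then mount read-only and copy data off
    cat /proc/mdstat
    mdadm --detail /dev/md0
    mount -o ro /dev/md0 /mnt

Only after the data is safely copied somewhere else would I --add a
replacement drive and let it resync.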