RAID and Failed Disk

Author: Kevin Buettner
Date:  
New-Topics: mail server solution
Subject: RAID and Failed Disk
On Oct 7, 11:06am, Narayanasamy, Sundar wrote:

> I have a Linux server at a remote site. Is there anyway, I could
> check whether it has any failed hard disks?


If it's software RAID, inspecting /proc/mdstat should tell you what
you need to know.

E.g., here's what I see when I run "cat /proc/mdstat" on one of my machines:

    Personalities : [raid1]
    read_ahead 1024 sectors
    md0 : active raid1 hdg5[1] hde5[0]
          4194688 blocks [2/2] [UU]

    md4 : active raid1 hdg6[1] hde6[0]
          4194688 blocks [2/2] [UU]

    md1 : active raid1 hdg7[1] hde7[0]
          17685760 blocks [2/2] [UU]

    md2 : active raid1 hdg8[1] hde8[0]
          17684736 blocks [2/2] [UU]

    md3 : active raid1 hdg10[1] hde10[0]
          14307904 blocks [2/2] [UU]


If one of the disks in (e.g.) md0 had failed, it might instead show:

    md0 : active raid1 hdg5[1] hde5[0]
          4194688 blocks [2/1] [_U]


This shows that only one of the two mirrors is operational and that
partition hde5 has failed. Of course, if there were a problem with the
hde drive itself, then all of the entries above would likely indicate a
problem with their hde partition.
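
Since the box is remote, you may want something you can run from cron
that checks this automatically. Here's a rough sketch in C (not an
existing tool, just an illustration; the file name and the parsing
heuristic are my own) that reads /proc/mdstat and flags any array whose
status string contains a "_", which is how a missing mirror shows up:

    /*
     * degraded-check.c -- rough illustration only.
     * Scan /proc/mdstat and report any array whose status string
     * (e.g. "[2/1] [_U]") contains a '_', i.e. a missing or failed
     * member.  Exits nonzero if anything looks degraded.
     */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/mdstat", "r");
        char line[256];
        char array[32] = "?";
        char *status;
        int degraded = 0;

        if (f == NULL) {
            perror("/proc/mdstat");
            return 2;
        }

        while (fgets(line, sizeof(line), f) != NULL) {
            /* Lines beginning with "mdN" name the array that the
               following status line belongs to. */
            if (strncmp(line, "md", 2) == 0)
                sscanf(line, "%31s", array);

            /* The per-array status line ends in something like
               "[2/2] [UU]"; a '_' after "] [" means a member is down. */
            status = strstr(line, "] [");
            if (status != NULL && strchr(status + 3, '_') != NULL) {
                printf("%s looks degraded: %s", array, line);
                degraded = 1;
            }
        }

        fclose(f);
        return degraded;
    }

A shell one-liner over /proc/mdstat would do the same job; the point is
just that a degraded array is easy to detect mechanically from that file.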

You should also see a message in the kernel log. According to
drivers/md/raid1.c, it'll look like this:

    #define DISK_FAILED KERN_ALERT \
    "raid1: Disk failure on %s, disabling device. \n" \
    "    Operation continuing on %d devices\n"


There are some other messages which may be printed too. You may want to
inspect the kernel source yourself to see what they are. (There's
probably some real documentation about this somewhere too...)

Kevin