How to detect hard disk failure?

Devator asked:

One of my servers has a hard disk failure. It’s running software RAID; the system locked up, and according to /proc/mdstat (and /var/log/messages), the disk is really down:

Personalities : [raid1]
md2 : active raid1 sdb2[1]
      104320 blocks [2/1] [_U]

md5 : active raid1 sdb5[1]
      2104448 blocks [2/1] [_U]

md6 : active raid1 sdb6[1]
      830134656 blocks [2/1] [_U]

md1 : active raid1 sdb1[1]
      143363968 blocks [2/1] [_U]

and

Nov  5 22:04:37 m38501 smartd[4467]: Device: /dev/sda, not capable of SMART self-check

However, when I run smartctl -H /dev/sda, it passes the test. It also passes the test with smartctl --test=short /dev/sda.

So, is smartctl a broken testing tool, or am I doing something completely off?

My answer:


Maybe an intermittent error with the drive electronics? That’s the first thing that comes to mind. Be safe and replace the drive.
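
Since SMART can report a healthy drive even after md has dropped it from the array, it may be worth watching /proc/mdstat directly as well. Here is a minimal Python sketch of that idea, not a definitive tool. It assumes array names of the form mdN and the status-line layout shown in the question; the function name degraded_arrays and the sample text are made up for illustration.

```python
import re

# Matches an array status line such as "104320 blocks [2/1] [_U]":
# [expected members / active members] followed by one flag per member.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Assumption: arrays are named md0, md1, ... (other naming schemes are not handled).
HEADER_RE = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, flags)} for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            # "_" marks a missing or failed member; "U" marks a healthy one.
            if active < expected or "_" in flags:
                degraded[current] = (expected, active, flags)
            current = None
    return degraded


if __name__ == "__main__":
    # Sample taken from the first array in the question above.
    sample = (
        "Personalities : [raid1]\n"
        "md2 : active raid1 sdb2[1]\n"
        "      104320 blocks [2/1] [_U]\n"
    )
    for name, (expected, active, flags) in degraded_arrays(sample).items():
        print(f"{name}: {active}/{expected} members active [{flags}]")
```

On a real system you would pass in the contents of /proc/mdstat instead of the sample string, for example from a cron job that sends mail whenever the result is non-empty. In the output from the question, every array shows [2/1] [_U], so all four would be reported as degraded regardless of what smartctl says about /dev/sda.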


View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.