SMART and exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

grigio asked:

Looking at the logs, I see several errors. The SMART status is PASSED, but even though the output is verbose, it isn't clear what is going on. Should the disk be replaced?

The dmesg and SMART logs are attached:

# dmesg
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x64
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:61:5d:64/00:00:5f:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:63:5d:64/40:00:5f:00:00/00 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        5f 64 5d 63 
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
sd 0:0:0:0: [sda] CDB: Read(10): 28 00 5f 64 5d 61 00 00 08 00
ata1: EH complete 

# smartctl -x /dev/sda
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,

Model Family:     Seagate Barracuda 7200.12 family
Device Model:     ST31000528AS
Serial Number:    9VP8WA3X
Firmware Version: CC38
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Feb 10 16:13:22 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


Error 36 [15] occurred at disk power-on lifetime: 16731 hours (697 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 5f 5d 00 64 63 00 00  Error: UNC at LBA = 0x5f5d006463 = 409582199907

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            8  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

My answer:

Congratulations, you’ve just had your first unrecoverable read error.

sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed

This means the drive was unable to read from a sector of the disk, and in this case, was also unable to reallocate a good sector to take the place of the bad sector.
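The same information can be read straight out of the hex sense bytes that the kernel printed just above that message. The following is a minimal Python sketch, assuming the standard descriptor-format sense layout (response code 0x72, sense key in byte 1, additional sense code and qualifier in bytes 2 and 3, descriptors from byte 8). The function name `decode_descriptor_sense` and the two small lookup tables are made up for illustration and only cover the codes seen in this log.

```python
# Sense bytes copied from the dmesg excerpt in the question.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 5f 64 5d 63"

# Only the codes that appear in this log; the SCSI standards define many more.
SENSE_KEYS = {0x3: "Medium Error"}
ADDITIONAL_SENSE = {(0x11, 0x04): "Unrecovered read error - auto reallocate failed"}


def decode_descriptor_sense(hex_string):
    """Decode descriptor-format sense data into key, additional sense text and LBA."""
    data = bytes.fromhex(hex_string)
    if data[0] & 0x7F != 0x72:
        raise ValueError("not current descriptor-format sense data")
    key, asc, ascq = data[1] & 0x0F, data[2], data[3]

    lba = None
    pos, end = 8, min(8 + data[7], len(data))  # byte 7 = length of the descriptor area
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + desc_len]
        # Type 0x00 is the Information descriptor: a VALID flag, a reserved
        # byte, then an 8-byte big-endian value (here, the failing LBA).
        if desc_type == 0x00 and len(body) >= 10 and body[0] & 0x80:
            lba = int.from_bytes(body[2:10], "big")
        pos += 2 + desc_len

    return {
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ADDITIONAL_SENSE.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
        "lba": lba,
    }


result = decode_descriptor_sense(SENSE_HEX)
print(result)
```

For the bytes in the question this gives sense key Medium Error, the same additional sense text as the kernel message, and a failing LBA of 0x5f645d63 (1600413027).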

A URE can make the drive appear failed to a RAID controller, putting your RAID array in a degraded state. In an array with single redundancy, the next read error on any remaining disk can then cost you data, so you should replace the disk immediately. (And woe to you if this isn’t in a RAID…)

Despite the PASSED status from SMART, you should be able to get a warranty replacement when your drive shows any unrecoverable read errors.
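If you want to document the failing sector for the replacement request, it can be pulled from the `cmd` and `res` lines in dmesg. The following is a rough Python sketch, not a definitive parser. It assumes the libata field order is `first/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, where the first byte is the command opcode on a `cmd` line and the status register on a `res` line. That is my reading of the kernel's error-handler output, so check it against your kernel version. The names `DMESG`, `TASKFILE` and `parse_taskfiles` are made up for illustration.

```python
import re

# Lines copied from the dmesg excerpt in the question.
DMESG = """\
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:61:5d:64/00:00:5f:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:63:5d:64/40:00:5f:00:00/00 Emask 0x9 (media error)
"""

# Assumed layout (see the note above): one byte, five bytes, five bytes, one byte.
FIVE = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
TASKFILE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/" + FIVE + "/" + FIVE + r"/([0-9a-f]{2})")


def parse_taskfiles(log_text):
    """Return one record per cmd/res line with the 48-bit LBA reassembled."""
    records = []
    for kind, first, low, high, _device in TASKFILE.findall(log_text):
        feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
        _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
            int(b, 16) for b in high.split(":")
        )
        records.append({
            "kind": kind,
            "first": int(first, 16),  # command opcode (cmd) or status register (res)
            "feature": feature,       # on a res line this is the error register
            "count": hob_nsect << 8 | nsect,
            "lba": (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
                    | lbah << 16 | lbam << 8 | lbal),
        })
    return records


cmd, res = parse_taskfiles(DMESG)
# Status bit 0x01 is ERR, error bit 0x40 is UNC (matches "{ DRDY ERR }" and "{ UNC }").
is_unc = bool(res["first"] & 0x01) and bool(res["feature"] & 0x40)
print(f"read of {cmd['count']} sectors starting at LBA {cmd['lba']}")
print(f"failed at LBA {res['lba']} (offset {res['lba'] - cmd['lba']} in the request), UNC={is_unc}")
```

On the log above, the request starts at LBA 0x5f645d61 for 8 sectors, which agrees with the Read(10) CDB, and the drive reports the failure at LBA 0x5f645d63.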

View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.