EDAC error messages at boot each time, is it a hardware issue?

Edward asked:

While looking through dmesg I noticed EDAC error messages in the logs, and then found that I get the same EDAC messages at every boot. This is on CentOS 6.4 x86_64. I suspected a memory problem, so I ran memtest86 from a CentOS 6.0 Live DVD and it didn’t show any problems. I tried removing one stick of RAM at a time and powering up, and I still got the EDAC error messages. Wondering if it was a recent kernel problem, I booted from the CentOS 6.0 Live DVD and looked in the log, and there was an EDAC message there too, just like with CentOS 6.4.

This is the error message:

Jul  5 00:44:19 mybox kernel: dracut: Switching root
Jul  5 00:44:19 mybox kernel: readahead: starting
Jul  5 00:44:19 mybox kernel: udev: starting version 147
Jul  5 00:44:19 mybox kernel: EDAC MC: Ver: 2.1.0 Jun 12 2013
Jul  5 00:44:19 mybox kernel: EDAC MC0: Giving out device to 'i3000_edac' 'i3000': DEV 0000:00:00.0
Jul  5 00:44:19 mybox kernel: EDAC PCI0: Giving out device to module 'i3000_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED)
Jul  5 00:44:19 mybox kernel: tg3.c:v3.124 (March 21, 2012)

I’m not experiencing any other problems with the system. It’s running on a Dell PowerEdge SC430 with 4 GB of RAM. It has two internal 80 GB drives running a software RAID and the external eSATA drives are also running a software RAID.

If it is a hardware problem, would it only be related to memory? Could it be something else? I’m willing to try more things out to get to the bottom of this, but I’m not sure what the next step is at this point. Thanks!

EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, label "": i3000 CE

My answer:


The “Giving out device” lines mean that the EDAC driver (i3000_edac) has initialized and is talking to the hardware; they are informational, not errors. One refers to the memory controller (MC0) and the other to the PCI controller (PCI0).

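The lines containing CE report correctable errors, i.e. the ECC hardware detected an error and successfully corrected it. If you only see one every few months, no big deal; cosmic rays or whatever. If you are seeing a lot of these, as in the log above where the same page, offset, row and channel repeat over and over, then it’s time to replace the affected RAM because it’s probably going to die on you soon. The row and channel fields (row 0, channel 1 here) tell you which module to look at, though how they map to a physical DIMM slot depends on the board.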
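
To judge whether you are in the "one every few months" case or the "a lot of these" case, it can help to tally the CE lines per location. The following is only a rough sketch: the regular expression assumes the exact message format shown in the question (other kernel versions and EDAC drivers format these lines differently), and the function name count_correctable_errors is made up for this example.

```python
import re
from collections import Counter

# Assumption: CE lines look like the ones quoted in the question, e.g.
# EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, row 0, channel 1, ...
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), .*?"
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)


def count_correctable_errors(lines):
    """Count CE events per (controller, row, channel, page, offset)."""
    counts = Counter()
    for line in lines:
        m = CE_LINE.search(line)
        if m:  # lines that are not CE reports are simply skipped
            key = (int(m["mc"]), int(m["row"]), int(m["channel"]),
                   m["page"], m["offset"])
            counts[key] += 1
    return counts


# Small demo using a line copied from the question plus an unrelated line.
demo = [
    'EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, '
    'row 0, channel 1, label "": i3000 CE',
    'EDAC MC0: CE page 0x1521e, offset 0xb00, grain 128, syndrome 0x49, '
    'row 0, channel 1, label "": i3000 CE',
    "EDAC MC: Ver: 2.1.0 Jun 12 2013",
]
for (mc, row, channel, page, offset), n in count_correctable_errors(demo).most_common():
    print(f"MC{mc} row {row} channel {channel} page {page} offset {offset}: {n} CE")
```

In practice you would feed it the lines from dmesg or /var/log/messages instead of the demo list. Many hits on the same row and channel, and especially on the same page and offset, point at one failing spot on one module rather than random one-off events.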


View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.