Questions: Memory ECC Errors Increasing

  • Which specific DIMM slot is generating the errors?
  • Are the errors on a single DIMM or spread across multiple DIMMs?
  • Is the error rate accelerating (trend toward uncorrectable errors)?
  • What is the DIMM part number, serial number, and manufacturer?
  • Is the server under warranty and eligible for DIMM replacement?
  • Can the workload be failed over to another replica before replacement?
  • Does replacing the DIMM require a server shutdown (that is, the slot is not hot-swappable)?
  • Are there any uncorrectable errors (UEs) in addition to correctable errors (CEs)?
  • Could this be caused by a memory controller issue rather than the DIMM itself?
  • What does the EDAC (Error Detection and Correction) subsystem report in sysfs?
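
Several of the questions above (which DIMM, single or multiple, CEs versus UEs, whether the rate is accelerating) can be answered in part from the Linux EDAC counters in sysfs. The following is a minimal sketch, assuming a Linux host with an EDAC driver loaded that exposes per-DIMM entries under /sys/devices/system/edac/mc/mc*/dimm*/. The function names read_edac_counts and is_accelerating are hypothetical helpers for illustration, and the "accelerating" test is a deliberately simple heuristic, not a vendor-defined threshold.

```python
from pathlib import Path

# Standard EDAC location on Linux; other layouts are not handled here.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def _read(path, default=""):
    """Read a sysfs attribute, returning `default` if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def read_edac_counts(root=EDAC_ROOT):
    """Return one dict per DIMM: controller, DIMM entry, label, CE and UE counts."""
    rows = []
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        for dimm in sorted(mc.glob("dimm[0-9]*")):
            rows.append({
                "controller": mc.name,
                "dimm": dimm.name,
                "label": _read(dimm / "dimm_label"),
                "ce": int(_read(dimm / "dimm_ce_count", "0") or "0"),
                "ue": int(_read(dimm / "dimm_ue_count", "0") or "0"),
            })
    return rows


def is_accelerating(samples):
    """Heuristic (an assumption, not a standard): given CE counts sampled at
    equal intervals, report True if each interval's increase exceeds the last."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return len(deltas) >= 2 and all(d2 > d1 for d1, d2 in zip(deltas, deltas[1:]))


if __name__ == "__main__":
    for row in read_edac_counts():
        print(row)
```

Errors concentrated on one DIMM entry point toward that module, while errors spread across every DIMM behind the same controller are more consistent with a controller, channel, or CPU-side problem. Note that these counters normally reset on reboot or driver reload, so trend analysis needs samples stored outside the host, and dimm_label may hold a generic value unless the platform or an administrator has populated it.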