Questions to Determine

  • Does the BMC event log show the reboots as OS-initiated or as hardware-initiated power cycles?
  • Are there any thermal events (CPU, inlet, exhaust temperature warnings) preceding the reboots?
  • Is the PSU event log clean, or are there power fault/loss events?
  • Does mcelog show uncorrectable memory errors (UCEs), not just correctable ones?
  • Is the BIOS configured to reboot on UCE, or does it halt?
  • Could the CMOS battery be failing, causing BIOS settings or the real-time clock to reset intermittently?
  • Are there known firmware bugs in this iDRAC/BIOS version that cause spurious reboots?
  • Is there a hardware watchdog timer that could be triggering the reboot?
  • Have the power cables and PDU outlets been checked for intermittent contact?
  • Is kdump configured and functional? If so, why are no crash dumps being generated?
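
For the kdump question, a quick check is whether a crash kernel is reserved and loaded at all. The sketch below is a minimal example, assuming a Linux host that exposes /proc/cmdline, /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size. The helper names kdump_problems and read_or_empty are hypothetical, and the three checks are not exhaustive (the dump target, free disk space and the kdump service state are not covered).

```python
from pathlib import Path
from typing import List


def kdump_problems(cmdline: str, crash_loaded: str, crash_size: str) -> List[str]:
    """Return reasons a crash dump could not be captured (empty list: none found).

    Hypothetical helper: it only inspects the three strings passed in.
    """
    problems = []
    # The crash kernel needs memory reserved at boot via crashkernel=...
    if not any(arg.startswith("crashkernel=") for arg in cmdline.split()):
        problems.append("no crashkernel= parameter on the kernel command line")
    # kexec_crash_size reports the reserved bytes; 0 means nothing is reserved.
    if crash_size.strip() in ("", "0"):
        problems.append("no memory reserved for the crash kernel")
    # kexec_crash_loaded reads "1" when a crash kernel is currently loaded.
    if crash_loaded.strip() != "1":
        problems.append("crash kernel is not loaded")
    return problems


def read_or_empty(path: str) -> str:
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    found = kdump_problems(
        read_or_empty("/proc/cmdline"),
        read_or_empty("/sys/kernel/kexec_crash_loaded"),
        read_or_empty("/sys/kernel/kexec_crash_size"),
    )
    for problem in found:
        print("PROBLEM:", problem)
    if not found:
        print("crash kernel is reserved and loaded")
```

If this reports no problems and there are still no dumps, that points back to the first question: a hardware-initiated power cycle or a watchdog reset gives the kernel no chance to panic, so kdump never runs.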