
Grading Checklist

  • Reviewed BMC/iDRAC system event log for the exact event type (power cycle vs. reset vs. OS shutdown)
  • Distinguished between OS-level crash and hardware-level power cycle
  • Checked mcelog for uncorrectable memory errors and correlated them with reboot timestamps
  • Investigated PSU health and power event logs
  • Verified that kdump is configured and explained why it may not capture a hardware-initiated reset or power loss
  • Considered hardware causes: failing PSU, loose power cable, bad memory DIMM
  • Checked for thermal throttling or shutdown events
  • Reviewed BIOS settings for behavior on uncorrectable errors
  • Proposed diagnostic steps: run memory diagnostics, swap the suspect DIMM, check PSU redundancy
  • Addressed the database impact and recommended failover while diagnosing
  • Considered firmware/BIOS update as part of the resolution
  • Mentioned physical inspection: power cable seating, PSU seating, DIMM seating
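The timestamp correlation called for in the mcelog item can be sketched as below. This is a minimal illustration, not a prescribed tool. It assumes you have already parsed reboot times (for example from `journalctl --list-boots` or the BMC event log) and memory-error times (from mcelog) into `datetime` objects on the same clock. The function name `correlate` and the 10-minute/5-minute window are hypothetical choices. The window extends past each boot because an error that caused a reset can be reported only on the next boot, when the kernel reads the machine-check banks that survived the reset.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def correlate(reboots, errors,
              before=timedelta(minutes=10),  # hypothetical look-back window
              after=timedelta(minutes=5)):   # errors may be logged on the next boot
    """Map each reboot time to the error times in [boot - before, boot + after].

    Both inputs are lists of datetime objects on the same clock. Check BMC
    vs. OS clock skew before trusting the match.
    """
    errors = sorted(errors)
    result = {}
    for boot in sorted(reboots):
        # Binary search for the slice of errors inside the window.
        lo = bisect_left(errors, boot - before)
        hi = bisect_right(errors, boot + after)
        result[boot] = errors[lo:hi]
    return result


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    boots = [datetime(2024, 1, 1, 12, 0)]
    mces = [datetime(2024, 1, 1, 11, 55), datetime(2024, 1, 1, 9, 0)]
    for boot, hits in correlate(boots, mces).items():
        print(boot, "->", hits)
```

A reboot with no nearby memory errors points away from the DIMMs and toward power (PSU, cabling) or thermal causes. A consistent match supports running memory diagnostics on the reported DIMM.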