Grading Checklist

  • Checked lspci output to confirm whether the device is visible on the PCIe bus
  • Reviewed dmesg for PCIe enumeration errors, AER messages, or NVMe driver failures
  • Verified NVMe kernel modules are loaded (lsmod | grep nvme)
  • Checked BMC/iDRAC system event log for hardware fault entries
  • Considered physical layer issues: drive seating, riser card, PCIe slot failure
  • Investigated whether BIOS/firmware changes during the patch window affected PCIe bifurcation or slot enablement settings
  • Proposed a physical reseat of the drive as a diagnostic step
  • Identified the need to isolate the fault by swap-testing: a known-good device in the same slot, or the same drive in a different slot
  • Addressed the application impact (degraded database cluster) and any immediate mitigation
  • Mentioned checking NVMe drive health via nvme smart-log if the drive becomes visible again
  • Considered thermal or power delivery issues as potential causes
  • Documented the resolution path and whether a drive RMA is needed
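
The first three checks in the list (lspci visibility, dmesg errors, loaded NVMe modules) can be scripted as a quick triage pass. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, and the match strings are assumptions that may need tuning for a given kernel and platform. It parses text already captured from lspci, dmesg, and lsmod, so it can also be run against saved logs.

```python
import re
import subprocess

# Assumption: lspci labels NVMe devices with this class name.
NVME_CLASS = "non-volatile memory controller"
# Assumption: lines mentioning AER or nvme are worth a manual look.
DMESG_PATTERN = re.compile(r"\bAER\b|nvme", re.IGNORECASE)


def nvme_controllers(lspci_text):
    """Return lspci lines whose device class looks like an NVMe controller."""
    return [line for line in lspci_text.splitlines()
            if NVME_CLASS in line.lower()]


def suspicious_dmesg_lines(dmesg_text):
    """Return dmesg lines that mention AER or the NVMe driver."""
    return [line for line in dmesg_text.splitlines()
            if DMESG_PATTERN.search(line)]


def nvme_module_loaded(lsmod_text):
    """True if a module named exactly 'nvme' appears in lsmod output."""
    return any(line.split()[0] == "nvme"
               for line in lsmod_text.splitlines() if line.strip())


def run(cmd):
    """Run a command and return its stdout as text (empty if it fails)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

On a live host, `nvme_controllers(run(["lspci"]))` returning an empty list suggests the drive is not enumerating on the PCIe bus, which points toward the physical-layer and BIOS items in the checklist rather than the driver. Note that `nvme_module_loaded` can return False on kernels where the NVMe driver is built in rather than loaded as a module, and reading dmesg may require elevated privileges.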