Grading Checklist
- Checked `lspci` output to confirm whether the device is visible on the PCIe bus
- Reviewed `dmesg` for PCIe enumeration errors, AER messages, or NVMe driver failures
- Verified NVMe kernel modules are loaded (`lsmod | grep nvme`)
- Checked BMC/iDRAC system event log for hardware fault entries
- Considered physical layer issues: drive seating, riser card, PCIe slot failure
- Investigated whether BIOS/firmware changes during the patch window affected PCIe bifurcation or slot enablement settings
- Proposed a physical reseat of the drive as a diagnostic step
- Identified the need to test the slot with another device or the drive in another slot to isolate the fault
- Addressed the application impact (degraded database cluster) and any immediate mitigation
- Mentioned checking NVMe drive health via `nvme smart-log` if the drive becomes visible again
- Considered thermal or power delivery issues as potential causes
- Documented the resolution path and whether a drive RMA is needed
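
As a rough illustration, the command-line items in the checklist could be collected into a small shell helper for a first-pass triage. This is only a sketch, not part of the rubric: the `check` and `triage` function names, the `/dev/nvme0` default device path, the grep patterns, and the use of `ipmitool` to read the BMC event log are all assumptions, and `dmesg`, `nvme`, and `ipmitool` typically need root.

```shell
#!/bin/sh
# Hypothetical first-pass triage helper for a missing NVMe drive.
# Function names, grep patterns, and the default device path are assumptions.

# check LABEL TOOL COMMAND
# Prints a header, skips the step if TOOL is not installed, otherwise runs
# COMMAND and notes when it produced no matches or exited non-zero.
check() {
    label=$1
    tool=$2
    cmd=$3
    printf '== %s ==\n' "$label"
    if ! command -v "$tool" >/dev/null 2>&1; then
        printf 'SKIP: %s not found\n' "$tool"
        return 0
    fi
    sh -c "$cmd" 2>&1 || printf 'NOTE: no matches or non-zero exit from: %s\n' "$cmd"
}

# triage [DEVICE]
# Runs the command-line checks in roughly the order the checklist lists them.
triage() {
    dev=${1:-/dev/nvme0}   # assumed device path; adjust for the affected drive

    # Is the controller visible on the PCIe bus at all?
    check "PCIe bus" lspci "lspci -nn | grep -i 'non-volatile'"

    # Enumeration errors, AER messages, or NVMe driver failures in the kernel log.
    check "Kernel log" dmesg "dmesg | grep -iE 'nvme|aer|pcie'"

    # Are the NVMe kernel modules loaded?
    check "Kernel modules" lsmod "lsmod | grep nvme"

    # BMC system event log (assumes ipmitool; iDRAC sites may use other tooling).
    check "BMC event log" ipmitool "ipmitool sel elist"

    # Drive health, only meaningful once the drive is visible again.
    check "NVMe SMART log" nvme "nvme smart-log $dev"
}
```

After sourcing the file, `triage` runs every step against the default device and `triage /dev/nvme1` targets a different one. Steps whose tool is missing are reported as skipped, not treated as failures, so the output shows which checks were actually performed.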