Symptoms: Power Supply Redundancy Lost¶
- Server
k8s-worker-17(HPE ProLiant DL380 Gen10) triggered a BMC alert: "Power Supply Redundancy Lost." - The server is still running on a single PSU (PSU 1). PSU 2 shows "Failed" status.
- The server is part of a Kubernetes cluster and runs production workloads.
- No service impact yet, but the server has lost its power redundancy safety margin.
- Each PSU is rated at 800W; the server's current draw is approximately 520W.
- PSU 1 is on Power Distribution Unit (PDU) A; PSU 2 was on PDU B (dual-feed for redundancy).
- The datacenter facility team confirmed PDU B is healthy -- other servers on PDU B are fine.
- The server has been in production for 18 months.
- The amber PSU LED on the rear of the chassis is lit for PSU 2.