Questions to Determine
- What is the per-outlet power draw for each device on the overloaded phase (L2)?
- Which servers were recently added, and were they properly accounted for in capacity planning?
- Is the load imbalance a phase distribution problem (too many servers on L2 outlets)?
- Can any non-critical workloads be gracefully powered down or migrated to reduce immediate risk?
- Are the new GPU servers drawing more power than their nameplate rating indicates?
- Can servers be redistributed across phases by moving power cables to different PDU outlets?
- Is the redundant PDU (PDU-B) also at risk of overload if PDU-A trips?
- What is the total rack power budget and how does current draw compare to the planned allocation?
- Are there any servers with redundant PSUs that could shift one PSU to a different phase?
- Should any servers be relocated to a different rack with available capacity?
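Several of the questions above (per-phase load, the L2 imbalance, and whether PDU-B survives a PDU-A trip) come down to simple arithmetic over per-outlet readings. The following is a minimal sketch of that arithmetic, not a definitive tool. Every device name and amperage figure is hypothetical, as are the 20 A per-phase breaker rating and the 80% continuous-load limit; substitute the values from your own PDUs and electrical code. It also assumes that when one PDU trips, each dual-corded device shifts the full load of the failed cord onto its surviving cord.

```python
from collections import defaultdict

# Hypothetical per-outlet readings: (device, pdu, phase, amps).
# Replace with real values polled from the PDUs.
READINGS = [
    ("web-01",   "PDU-A", "L1", 2.0), ("web-01",   "PDU-B", "L1", 2.0),
    ("db-01",    "PDU-A", "L2", 3.5), ("db-01",    "PDU-B", "L2", 3.5),
    ("gpu-01",   "PDU-A", "L2", 7.0), ("gpu-01",   "PDU-B", "L2", 7.0),
    ("gpu-02",   "PDU-A", "L2", 7.0), ("gpu-02",   "PDU-B", "L3", 7.0),
    ("store-01", "PDU-A", "L3", 4.0), ("store-01", "PDU-B", "L3", 4.0),
]

BREAKER_AMPS = 20.0  # assumed per-phase breaker rating
DERATING = 0.8       # assumed continuous-load limit (fraction of breaker rating)


def phase_totals(readings):
    """Sum measured amps per (pdu, phase)."""
    totals = defaultdict(float)
    for _device, pdu, phase, amps in readings:
        totals[(pdu, phase)] += amps
    return dict(totals)


def overloaded(totals, breaker_amps=BREAKER_AMPS, derating=DERATING):
    """Return only the entries whose load exceeds the derated limit."""
    limit = breaker_amps * derating
    return {key: amps for key, amps in totals.items() if amps > limit}


def failover_totals(readings, failed_pdu, surviving_pdu):
    """Estimate per-phase load on surviving_pdu after failed_pdu trips.

    Assumption: a device with a cord on the surviving PDU moves the full
    load of its failed cord onto that cord's phase. Devices with no cord
    on the surviving PDU simply go dark and add no load.
    """
    surviving_phase = {
        device: phase
        for device, pdu, phase, _amps in readings
        if pdu == surviving_pdu
    }
    totals = defaultdict(float)
    for device, pdu, phase, amps in readings:
        if pdu == surviving_pdu:
            totals[phase] += amps
        elif pdu == failed_pdu and device in surviving_phase:
            totals[surviving_phase[device]] += amps
    return dict(totals)


if __name__ == "__main__":
    print("Over limit now:", overloaded(phase_totals(READINGS)))
    after_trip = failover_totals(READINGS, "PDU-A", "PDU-B")
    print("PDU-B after PDU-A trip:", after_trip)
    print("Over limit after trip:", overloaded(after_trip))
```

With the sample data, only PDU-A phase L2 exceeds the derated limit under normal operation, but two PDU-B phases would exceed it after a PDU-A trip. That is the scenario the PDU-B question is meant to surface: a rack can look safe on the redundant side and still have no real failover headroom.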