Level: L1: Foundations | Topics: Power & UPS | Domain: Datacenter & Hardware
Power & UPS - Primer
Why This Matters
Power is the most fundamental dependency in a datacenter. Every server, switch, and storage array stops instantly without it. Power failures cascade — a tripped breaker takes out a rack, a UPS failure takes out a row, a generator failure takes out a facility. Understanding power infrastructure, redundancy models, and monitoring prevents outages that no amount of software resilience can fix.
Power Distribution Chain
Remember: The power chain mnemonic is "U-A-G-U-P-S": Utility feed, Automatic Transfer Switch, Generator (backup), UPS, PDU, Server PSU. Power flows left to right through the chain, with the generator feeding the ATS as an alternate source rather than sitting in line. Each component adds a layer of protection. If any single link fails without redundancy, everything downstream goes dark.

```
Utility Feed → ATS (Automatic Transfer Switch) → UPS → PDU → Server PSU → Components
                ↑
        Generator (backup)
```
Automatic Transfer Switch (ATS)
Switches the load between utility power and the generator within seconds. Dual-feed facilities have two independent utility feeds, each with its own ATS.
Generator
Diesel generators provide backup during utility outages:

- Startup time: 10-30 seconds
- The UPS bridges the gap during transfer
- Fuel supply determines runtime (typically 24-72 hours)
- Requires regular testing (monthly load tests)
UPS (Uninterruptible Power Supply)
UPS Types
| Type | How It Works | Efficiency | Transfer Time | Use Case |
|---|---|---|---|---|
| Offline/Standby | Switches to battery on failure | 95-98% | 5-12ms | Desktop, small network |
| Line-Interactive | Regulates voltage, battery on failure | 95-98% | 2-4ms | Small server rooms |
| Online/Double-Conversion | Load always runs from the inverter; utility feeds the rectifier and keeps the battery charged | 90-95% | 0ms (no transfer) | Datacenter standard |

Fun fact: The concept of the UPS dates back to 1934, when John Hanley patented an "apparatus for maintaining an uninterrupted supply of electric current." Modern online/double-conversion UPS technology was developed in the 1970s. APC (now Schneider Electric) and Eaton/Powerware are among the dominant vendors in the datacenter UPS market. A large datacenter UPS can weigh several tons and hold enough battery capacity to power the facility for 10-15 minutes, which comfortably covers the 10-30 seconds generators need to start.
Double-conversion is the datacenter standard because it provides:

- Zero transfer time (always on inverter)
- Clean power (no voltage sags, surges, or frequency variations)
- Complete isolation from utility power quality issues
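UPS Sizing
UPS capacity is rated in both kVA (kilovolt-amperes, apparent power) and kW (kilowatts, real power). The two are related by the load's power factor: kW = kVA × power factor. Size the UPS so that the IT load fits within the kW rating, not just the kVA rating.
Design for 15-20 minutes of runtime, enough for generator startup or graceful shutdown.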
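The kVA/kW relationship can be illustrated with a short sizing sketch. The power factor of 0.9, the 80% maximum loading, and the 2x safety factor are assumptions picked for the example rather than vendor figures, and the function names are hypothetical.

```python
def required_kva(load_kw, power_factor=0.9, max_load_fraction=0.8):
    """Smallest kVA rating that keeps the UPS at or below max_load_fraction.

    power_factor and max_load_fraction are illustrative assumptions.
    """
    if not 0 < power_factor <= 1:
        raise ValueError("power_factor must be in (0, 1]")
    if not 0 < max_load_fraction <= 1:
        raise ValueError("max_load_fraction must be in (0, 1]")
    rating_kw = load_kw / max_load_fraction  # real power the UPS must deliver
    return rating_kw / power_factor          # convert kW to kVA


def bridges_generator_start(runtime_s, generator_start_s=30, safety_factor=2.0):
    """True if battery runtime covers generator startup with a safety margin."""
    return runtime_s >= generator_start_s * safety_factor


print(round(required_kva(40.0), 1))      # 55.6 kVA for a 40 kW load
print(bridges_generator_start(15 * 60))  # True: 15 minutes of runtime
```

Under these assumptions a 40 kW IT load calls for roughly a 56 kVA unit; real sizing should use the manufacturer's runtime curves for the chosen battery configuration.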
UPS Monitoring

```bash
# NUT (Network UPS Tools): standard Linux UPS monitoring
upsc myups                  # show UPS status
upsc myups ups.status       # OL=online, OB=on battery, LB=low battery
upsc myups battery.charge   # battery percentage
upsc myups battery.runtime  # estimated runtime in seconds

# APC UPS via apcupsd
apcaccess status

# IPMI power readings from server
ipmitool dcmi power reading
```
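For scripting, the `name: value` lines that `upsc` prints can be parsed into a dictionary. The following is a minimal sketch; the sample text is made-up illustrative output, the helper names are hypothetical, and only the three status flags listed above are mapped.

```python
STATUS_FLAGS = {"OL": "online", "OB": "on battery", "LB": "low battery"}


def parse_upsc(output):
    """Turn 'name: value' lines (as printed by upsc) into a dict of strings."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not contain a colon
            values[key.strip()] = value.strip()
    return values


def describe_status(ups_status):
    """Expand space-separated status flags; unmapped flags pass through unchanged."""
    return [STATUS_FLAGS.get(flag, flag) for flag in ups_status.split()]


# Illustrative sample only, not captured from a real device
sample = """battery.charge: 87
battery.runtime: 1140
ups.status: OB LB"""

info = parse_upsc(sample)
print(describe_status(info["ups.status"]))  # ['on battery', 'low battery']
print(int(info["battery.runtime"]) // 60)   # 19 (minutes)
```

In practice the text would come from running `upsc myups` via `subprocess.run(..., capture_output=True, text=True)`.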
PDU (Power Distribution Unit)
PDUs distribute power from the UPS to the equipment in each rack.
PDU Types
| Type | Features |
|---|---|
| Basic | Power strip, no monitoring |
| Metered | Shows power draw per circuit |
| Monitored | Network-connected, per-outlet monitoring |
| Switched | Remote power cycling per outlet |
| Intelligent | Monitoring + switching + environmental sensors |
Rack Power Layout
Standard datacenter practice is dual PDUs per rack:

- PDU A: fed from Power Feed A (UPS A)
- PDU B: fed from Power Feed B (UPS B)
- Each server connects one PSU to PDU A and one to PDU B
- Either PDU can handle the full rack load if the other fails
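The last point implies a check worth automating: the combined load of both feeds must fit on one PDU. A minimal sketch follows, assuming per-PDU current readings in amps and an 80% loading limit (mirroring the PDU warning threshold listed under Monitoring and Alerting); the function name and figures are hypothetical.

```python
def survives_feed_loss(load_a_amps, load_b_amps, breaker_amps, max_fraction=0.8):
    """True if a single PDU can carry both feeds' load within the derated limit."""
    combined = load_a_amps + load_b_amps
    return combined <= breaker_amps * max_fraction


# Two 30 A PDUs: 11 A + 10 A fails over safely, 14 A + 13 A does not
print(survives_feed_loss(11, 10, 30))  # True  (21 A <= 24 A)
print(survives_feed_loss(14, 13, 30))  # False (27 A > 24 A)
```

Note that each PDU in the failing case sits below 50% of its breaker rating, so per-PDU alerts alone would not catch the problem.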
Server PSU Redundancy
Redundancy Models
| Config | Description | Failure Tolerance |
|---|---|---|
| 1+0 | Single PSU | None: any PSU failure = server down |
| 1+1 | Two PSUs, either can carry the full load (load-sharing or active/standby) | Survives one PSU failure |
| 2+1 | Three PSUs, two needed, one spare | Survives one failure under full load |
| 2+2 | Four PSUs, N+N redundancy | Survives two failures, including loss of an entire feed |
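The N+M notation reduces to simple arithmetic: N supplies are needed to carry the load, M are spare, and the server stays up while at least N remain. A small sketch, with a hypothetical function name:

```python
def server_stays_up(required, spare, failed):
    """N+M check: 'required' PSUs carry the load, 'spare' are redundant."""
    installed = required + spare
    return installed - failed >= required


for required, spare in [(1, 0), (1, 1), (2, 1), (2, 2)]:
    tolerated = max(f for f in range(required + spare + 1)
                    if server_stays_up(required, spare, f))
    print(f"{required}+{spare}: tolerates {tolerated} PSU failure(s)")
```

This counts supplies only; it does not model which feed each PSU is cabled to, which matters for 2+2 configurations split across two PDUs.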
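```bash
# Check PSU status via IPMI
ipmitool sdr type "Power Supply"

# Check PSU redundancy (sensor names vary by vendor, so adjust the pattern)
ipmitool sensor list | grep -i psu

# Dell iDRAC (legacy getconfig syntax; newer iDRAC firmware uses `racadm get`)
racadm getconfig -g cfgPowerSupply

# Read current power consumption
ipmitool dcmi power reading
```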
Power Budgeting
Per-Rack Calculations
Typical rack capacity: 5-10 kW (standard), up to 30+ kW (high density).

Example:

```
20 servers × 500W average = 10,000W = 10 kW
2 ToR switches × 150W     = 300W
Total: ~10.3 kW per rack
With 1+1 PSU redundancy, each PDU must handle the full 10.3 kW
```
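The same budget can be computed in a few lines, which makes it easy to rerun as the rack contents change. This sketch reuses the example figures above; the 208 V single-phase supply and unity power factor used for the current estimate are assumptions, and the function names are hypothetical.

```python
def rack_load_watts(devices):
    """devices: iterable of (count, watts_each) pairs."""
    return sum(count * watts for count, watts in devices)


def amps_single_phase(watts, volts):
    """Current drawn at a given voltage, assuming a power factor of 1."""
    return watts / volts


load_w = rack_load_watts([(20, 500), (2, 150)])  # 20 servers, 2 ToR switches
print(load_w / 1000)                             # 10.3 (kW)
print(round(amps_single_phase(load_w, 208), 1))  # 49.5 (A at the assumed 208 V)
```

At the assumed voltage this rack draws about 49.5 A in total, which is the figure each PDU would have to carry alone after a feed failure.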
PUE (Power Usage Effectiveness)
Fun fact: PUE was created by The Green Grid in 2007 and became an ISO standard (ISO/IEC 30134-2) in 2016. Google's datacenters average a PUE of 1.10, meaning non-IT overhead (cooling, lighting, power conversion losses) adds only 10% on top of the IT load, or about 9% of total facility power. The industry average is around 1.58. Every 0.1 drop in PUE across a large datacenter can save millions of dollars per year in electricity costs.

```
PUE = Total Facility Power / IT Equipment Power

PUE 1.0 = perfect (theoretical limit, not achievable in practice)
PUE 1.2 = excellent
PUE 1.5 = average
PUE 2.0 = poor (half the total power goes to overhead such as cooling)
```
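Because PUE is a ratio against IT power, the overhead share of total facility power is 1 - 1/PUE rather than PUE - 1. A short sketch of both calculations, with illustrative input figures and hypothetical function names:

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power over IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_share_of_total(pue_value):
    """Fraction of total facility power consumed by non-IT overhead."""
    return 1 - 1 / pue_value


print(pue(1100, 1000))                                # 1.1
print(round(overhead_share_of_total(1.10) * 100, 1))  # 9.1 (% of total)
print(round(overhead_share_of_total(2.0) * 100, 1))   # 50.0 (% of total)
```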
Graceful Shutdown
When battery is low and no generator is available:

```
# NUT automatic shutdown configuration
# /etc/nut/upsmon.conf
MONITOR myups@localhost 1 admin secret master
SHUTDOWNCMD "/sbin/shutdown -h +0"
FINALDELAY 5
```

```bash
# Manual graceful shutdown
shutdown -h +2 "Power failure - shutting down in 2 minutes"

# Shut down VMs first, then the hypervisor
for vm in $(virsh list --name); do
    virsh shutdown "$vm"   # sends a shutdown request and returns immediately
done

# Wait up to 60 seconds for guests to stop before halting the host
for _ in $(seq 1 12); do
    [ -z "$(virsh list --name)" ] && break
    sleep 5
done
shutdown -h now
```
Shutdown Order
Gotcha: The most dangerous mistake during an emergency power shutdown is turning off storage before servers have finished writing. If a database server is mid-transaction when its storage array loses power, the result is corrupted data that persists after power is restored. Shutdown order is not optional; it is a data integrity contract. Script it, test it, and make sure on-call knows where the runbook is.

1. Applications (drain connections, flush buffers)
2. Virtual machines
3. Hypervisors / bare-metal OS
4. Storage arrays (after all servers are down)
5. Network switches (last, because they are needed for management until the end)
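One way to script this ordering is to tag each system with a tier and sort by it. The sketch below only produces the plan; the tier labels and host names are hypothetical, and the actual shutdown commands are left to the site runbook.

```python
# Tiers in the order they must be shut down
SHUTDOWN_ORDER = ["application", "vm", "hypervisor", "storage", "network"]


def shutdown_plan(hosts):
    """hosts: dict of name -> tier. Returns names in safe shutdown order."""
    rank = {tier: position for position, tier in enumerate(SHUTDOWN_ORDER)}
    unknown = [name for name, tier in hosts.items() if tier not in rank]
    if unknown:
        raise ValueError(f"hosts with unknown tier: {unknown}")
    return sorted(hosts, key=lambda name: rank[hosts[name]])


inventory = {
    "tor-sw1": "network",
    "san1": "storage",
    "db-app": "application",
    "kvm1": "hypervisor",
    "vm-web": "vm",
}
print(shutdown_plan(inventory))
# ['db-app', 'vm-web', 'kvm1', 'san1', 'tor-sw1']
```

Rejecting unknown tiers up front is deliberate: a host that silently drops out of the plan is exactly the kind of gap that causes data loss during a real event.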
Monitoring and Alerting
Key power metrics to monitor:

- UPS battery charge and estimated runtime
- UPS input/output voltage and frequency
- PDU per-circuit amperage (prevent overloads)
- Server power consumption (watts)
- Inlet temperature per rack

```bash
# SNMP polling for PDU/UPS (common in monitoring systems)
snmpwalk -v2c -c public pdu-a.dc.local .1.3.6.1.4.1

# Prometheus exporters exist for NUT, APC, and most PDU brands
```

Alert thresholds:

- UPS on battery → immediate page
- Battery < 50% → warning
- Battery < 20% → critical, initiate shutdown
- PDU circuit > 80% capacity → warning
- Inlet temp > 35 °C → warning
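These thresholds translate directly into a rule function. The sketch below assumes the readings have already been collected (for example via NUT or SNMP) and uses hypothetical parameter names and severity labels.

```python
def classify(on_battery, battery_pct, pdu_load_fraction, inlet_temp_c):
    """Return a list of (severity, message) tuples for the thresholds above."""
    alerts = []
    if on_battery:
        alerts.append(("page", "UPS on battery"))
    if battery_pct < 20:
        alerts.append(("critical", "battery below 20%, initiate shutdown"))
    elif battery_pct < 50:
        alerts.append(("warning", "battery below 50%"))
    if pdu_load_fraction > 0.80:
        alerts.append(("warning", "PDU circuit above 80% capacity"))
    if inlet_temp_c > 35:
        alerts.append(("warning", "inlet temperature above 35 C"))
    return alerts


for severity, message in classify(True, 18, 0.85, 30):
    print(f"{severity}: {message}")
```

The battery checks use `elif` so that a reading below 20% raises only the critical alert rather than both.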
Quick Reference
| Task | Command |
|---|---|
| UPS status (NUT) | upsc myups |
| UPS status (APC) | apcaccess status |
| Server power draw | ipmitool dcmi power reading |
| PSU status | ipmitool sdr type "Power Supply" |
| Graceful shutdown | shutdown -h +2 "message" |
| Check battery | upsc myups battery.charge |
| Runtime estimate | upsc myups battery.runtime |
Wiki Navigation
Related Content
- Case Study: Power Supply Redundancy Lost (Case Study, L1) — Power & UPS
- Case Study: Rack PDU Overload Alert (Case Study, L1) — Power & UPS
- Case Study: Server Intermittent Reboot (Case Study, L2) — Power & UPS