Power & UPS - Primer

Why This Matters

Power is the most fundamental dependency in a datacenter. Every server, switch, and storage array stops instantly without it. Power failures cascade — a tripped breaker takes out a rack, a UPS failure takes out a row, a generator failure takes out a facility. Understanding power infrastructure, redundancy models, and monitoring prevents outages that no amount of software resilience can fix.

Power Distribution Chain

Remember: The power chain mnemonic is "U-A-G-U-P-S": Utility feed, Automatic Transfer Switch, Generator (backup), UPS, PDU, Server PSU. Power flows through this chain in order, with the generator feeding the ATS as an alternate source rather than sitting in series. Each component adds a layer of protection. If any single link fails without redundancy, everything downstream goes dark.

Utility Feed → ATS (Automatic Transfer Switch) → UPS → PDU → Server PSU → Components
                ↑
            Generator (backup)

Automatic Transfer Switch (ATS)

Switches the load between utility power and the generator within seconds. Dual-feed facilities have two independent utility feeds, each with its own ATS.

Generator

Diesel generators provide backup during utility outages:
- Startup time: 10-30 seconds
- UPS bridges the gap during transfer
- Fuel supply determines runtime (typically 24-72 hours)
- Requires regular testing (monthly load tests)

UPS (Uninterruptible Power Supply)

UPS Types

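| Type | How It Works | Efficiency | Transfer Time | Use Case |
|------|--------------|------------|---------------|----------|
| Offline/Standby | Switches to battery on failure | 95-98% | 5-12 ms | Desktop, small network |
| Line-Interactive | Regulates voltage, battery on failure | 95-98% | 2-4 ms | Small server rooms |
| Online/Double-Conversion | Load always runs from the inverter; utility feeds the rectifier and keeps the battery charged | 90-95% | 0 ms (no transfer) | Datacenter standard |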

Fun fact: The concept of the UPS dates back to 1934 when John Hanley patented an "apparatus for maintaining an uninterrupted supply of electric current." Modern online/double-conversion UPS technology was developed in the 1970s. APC (now Schneider Electric) and Eaton/Powerware dominate the datacenter UPS market. A large datacenter UPS can weigh several tons and contain enough battery capacity to power the facility for 10-15 minutes — just enough to bridge the gap until generators spin up.

Double-conversion is the datacenter standard because it provides:
- Zero transfer time (always on inverter)
- Clean power (no voltage sags, surges, or frequency variations)
- Complete isolation from utility power quality issues

UPS Sizing

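UPS capacity is rated in kVA (kilovolt-amperes, apparent power) and kW (kilowatts, real power):

Power Factor = kW / kVA (typically 0.9 for modern UPS)
Runtime (hours) ≈ Usable battery energy (Wh) / Load (W)

The runtime formula is an idealized estimate; inverter losses and high discharge rates shorten real-world runtime.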

Design for 15-20 minutes of runtime — enough for generator startup or graceful shutdown.
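
The sizing arithmetic can be sketched in a few lines of shell. This is a rough illustration only: the function names `ups_kw` and `ups_runtime_min` and the sample figures (100 kVA, 20 kWh of battery, 72 kW load) are assumptions chosen for the example, and the runtime ignores inverter losses and battery discharge curves.

```shell
# Usable real power (kW) from a kVA rating and a power factor.
ups_kw() {
    awk -v kva="$1" -v pf="$2" 'BEGIN { printf "%.1f\n", kva * pf }'
}

# Idealized runtime in minutes: battery energy (Wh) / load (W) * 60.
ups_runtime_min() {
    awk -v wh="$1" -v w="$2" 'BEGIN { printf "%.1f\n", wh / w * 60 }'
}

ups_kw 100 0.9                 # prints 90.0
ups_runtime_min 20000 72000    # prints 16.7
```

In this example a 100 kVA unit supports about 90 kW of real load, and 20 kWh of battery lands inside the 15-20 minute design window at an 80% load level. Vendor runtime charts should take precedence over this estimate.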

UPS Monitoring

# NUT (Network UPS Tools) — standard Linux UPS monitoring
upsc myups                    # show UPS status
upsc myups ups.status         # OL=online, OB=on battery, LB=low battery
upsc myups battery.charge     # battery percentage
upsc myups battery.runtime    # estimated runtime in seconds

# APC UPS via apcupsd
apcaccess status

# IPMI power readings from server
ipmitool dcmi power reading
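
For scripting, the `ups.status` flags can be collapsed into a single label. The following is a minimal sketch: the function name `ups_state` and the label strings are illustrative assumptions, not part of NUT. The flags `OL`, `OB`, and `LB` are the ones listed above; a real status string may carry additional flags such as `CHRG`.

```shell
# Map a NUT ups.status string (space-separated flags) to one label.
# Low battery is checked first because it is the most urgent condition.
ups_state() {
    case " $1 " in
        *" LB "*) echo "low-battery" ;;
        *" OB "*) echo "on-battery" ;;
        *" OL "*) echo "online" ;;
        *)        echo "unknown" ;;
    esac
}

# Against a live UPS (the name "myups" follows the examples above):
#   ups_state "$(upsc myups ups.status)"
ups_state "OL CHRG"          # prints online
ups_state "OB DISCHRG LB"    # prints low-battery
```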

PDU (Power Distribution Unit)

PDUs distribute power from the UPS to the equipment in each rack.

PDU Types

| Type | Features |
|------|----------|
| Basic | Power strip, no monitoring |
| Metered | Shows power draw per circuit |
| Monitored | Network-connected, per-outlet monitoring |
| Switched | Remote power cycling per outlet |
| Intelligent | Monitoring + switching + environmental sensors |

Rack Power Layout

Standard datacenter practice is dual PDUs per rack:
- PDU A: fed from Power Feed A (UPS A)
- PDU B: fed from Power Feed B (UPS B)
- Each server connects one PSU to PDU A, one to PDU B
- Either PDU can handle full rack load if the other fails
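
The last point implies that the combined draw of both PDUs must fit on one of them. A quick check might look like the sketch below. The function name `failover_ok` and the sample amperages are hypothetical, and the 80% default simply reuses the PDU warning threshold from the alerting section of this page; local electrical code may require a different limit.

```shell
# Would a single PDU stay within its limit if it had to carry both loads?
# Args: amps on PDU A, amps on PDU B, breaker rating in amps,
#       allowed load as a percentage of the rating (defaults to 80).
failover_ok() {
    awk -v a="$1" -v b="$2" -v rating="$3" -v pct="${4:-80}" 'BEGIN {
        print ((a + b <= rating * pct / 100) ? "ok" : "overload")
    }'
}

failover_ok 10 11 30    # 21 A combined vs 24 A limit: prints ok
failover_ok 14 13 30    # 27 A combined vs 24 A limit: prints overload
```

In practice this means each PDU should normally run at no more than about half of its allowed load.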

Server PSU Redundancy

Redundancy Models

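| Config | Description | Failure Tolerance |
|--------|-------------|-------------------|
| 1+0 | Single PSU | None: any PSU failure = server down |
| 1+1 | Two PSUs, each able to carry the full load (load-sharing or active/standby) | Survives one PSU failure |
| 2+1 | Three PSUs, two needed for full load, one spare | Survives one failure under full load |
| 2+2 | Four PSUs, N+N redundancy (two per feed) | Survives two PSU failures or the loss of one feed |
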
# Check PSU status via IPMI
ipmitool sdr type "Power Supply"

# Check PSU redundancy
ipmitool sensor list | grep -i psu

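# Dell iDRAC (sensor summary includes PSU status; older firmware may
# also expose a legacy group via `racadm getconfig -g cfgServerPowerSupply`)
racadm getsensorinfo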

# Read current power consumption
ipmitool dcmi power reading

Power Budgeting

Per-Rack Calculations

Typical rack capacity: 5-10 kW (standard), up to 30+ kW (high density)

Example:
  20 servers × 500 W average = 10,000 W = 10 kW
  2 ToR switches × 150 W = 300 W
  Total: ~10.3 kW per rack

  With 1+1 PSU redundancy split across two feeds, each PDU must be able to carry the full 10.3 kW
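
The same budget can be computed with a small helper, which is handy when the rack contents change. This is a sketch under stated assumptions: `rack_kw` is a hypothetical name, and the inputs are average wattages, not nameplate or peak figures.

```shell
# Sum rack load from "count watts_each" pairs and print the total in kW.
rack_kw() {
    total=0
    while [ "$#" -ge 2 ]; do
        total=$((total + $1 * $2))
        shift 2
    done
    awk -v w="$total" 'BEGIN { printf "%.1f\n", w / 1000 }'
}

rack_kw 20 500 2 150    # 20 servers at 500 W, 2 ToR switches at 150 W: prints 10.3
```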

PUE (Power Usage Effectiveness)

Fun fact: PUE was created by The Green Grid in 2007 and became an ISO standard (ISO/IEC 30134-2) in 2016. Google's datacenters average a PUE of 1.10, meaning non-IT overhead (cooling, lighting, power conversion losses) adds only 10% on top of the IT load, or about 9% of total facility power. The industry average is around 1.58. Every 0.1 drop in PUE across a large datacenter can save millions of dollars per year in electricity costs.

PUE = Total Facility Power / IT Equipment Power

PUE 1.0 = perfect (impossible)
PUE 1.2 = excellent
PUE 1.5 = average
PUE 2.0 = poor (half the power goes to non-IT overhead such as cooling)
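
A short worked example makes the ratio concrete. The helper names `pue` and `overhead_pct` and the sample readings are assumptions for illustration; both arguments must be in the same unit (for example kW).

```shell
# PUE = total facility power / IT equipment power.
pue() {
    awk -v total="$1" -v it="$2" 'BEGIN { printf "%.2f\n", total / it }'
}

# Share of total facility power that is non-IT overhead, in percent.
overhead_pct() {
    awk -v total="$1" -v it="$2" 'BEGIN { printf "%.1f\n", (total - it) / total * 100 }'
}

pue 1100 1000             # prints 1.10
overhead_pct 1100 1000    # prints 9.1
overhead_pct 2000 1000    # prints 50.0
```

Note that overhead as a share of total power is (PUE - 1) / PUE, so a PUE of 1.10 corresponds to roughly 9% overhead and a PUE of 2.0 to 50%.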

Graceful Shutdown

When battery is low and no generator is available:

# NUT automatic shutdown configuration
# /etc/nut/upsmon.conf
MONITOR myups@localhost 1 admin secret master
SHUTDOWNCMD "/sbin/shutdown -h +0"
FINALDELAY 5

# Manual graceful shutdown
shutdown -h +2 "Power failure — shutting down in 2 minutes"

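# Shut down VMs first, then the hypervisor
for vm in $(virsh list --name); do
    virsh shutdown "$vm"
done
# Wait up to 60 seconds for guests to stop instead of sleeping blindly
for _ in $(seq 1 12); do
    [ -z "$(virsh list --name)" ] && break
    sleep 5
done
shutdown -h now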

Shutdown Order

Gotcha: The most dangerous mistake during emergency power shutdown is turning off storage before servers have finished writing. If a database server is mid-transaction when its storage array loses power, you get corrupted data that survives the power restoration. Shutdown order is not optional — it is a data integrity contract. Script it, test it, and make sure on-call knows where the runbook is.

  1. Applications (drain connections, flush buffers)
  2. Virtual machines
  3. Hypervisors / bare-metal OS
  4. Storage arrays (after all servers are down)
  5. Network switches (last — needed for management until the end)
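
One way to keep this order from drifting is to encode it in a script with a dry-run mode that can be tested safely. The sketch below is illustrative only: `run_phases` and the per-phase `stop_*` functions are hypothetical names that a real runbook would have to define, and dry-run is the default so that running it by accident does nothing.

```shell
# Run shutdown phases in a fixed order and stop at the first failure.
# With DRY_RUN=1 (the default) the phases are only printed.
run_phases() {
    for phase in applications vms hypervisors storage network; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would stop: $phase"
        else
            # stop_applications, stop_vms, ... are site-specific and not defined here.
            "stop_$phase" || { echo "phase failed: $phase" >&2; return 1; }
        fi
    done
}

run_phases
```

Stopping at the first failed phase is a deliberate choice here: it avoids powering off storage while servers may still be writing.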

Monitoring and Alerting

Key power metrics to monitor:
- UPS battery charge and estimated runtime
- UPS input/output voltage and frequency
- PDU per-circuit amperage (prevent overloads)
- Server power consumption (watts)
- Inlet temperature per rack

# SNMP polling for PDU/UPS (common in monitoring systems)
snmpwalk -v2c -c public pdu-a.dc.local .1.3.6.1.4.1

# Prometheus exporters exist for NUT, APC, and most PDU brands

Alert thresholds:
- UPS on battery → immediate page
- Battery < 50% → warning
- Battery < 20% → critical, initiate shutdown
- PDU circuit > 80% capacity → warning
- Inlet temp > 35 °C → warning
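
These thresholds translate directly into small classification helpers. The sketch below assumes integer battery percentages (as typically returned by `upsc myups battery.charge`); the function names `battery_alert` and `pdu_alert` are hypothetical, and a production setup would normally express the same rules in the monitoring system itself.

```shell
# Classify battery charge (integer percent) using the thresholds above.
battery_alert() {
    if [ "$1" -lt 20 ]; then
        echo "critical"
    elif [ "$1" -lt 50 ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Classify a PDU circuit by measured amps against its rated amps.
pdu_alert() {
    awk -v amps="$1" -v rating="$2" 'BEGIN {
        print ((amps / rating * 100 > 80) ? "warning" : "ok")
    }'
}

battery_alert 45    # prints warning
battery_alert 15    # prints critical
pdu_alert 26 30     # about 87% of rating: prints warning
```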

Quick Reference

| Task | Command |
|------|---------|
| UPS status (NUT) | upsc myups |
| UPS status (APC) | apcaccess status |
| Server power draw | ipmitool dcmi power reading |
| PSU status | ipmitool sdr type "Power Supply" |
| Graceful shutdown | shutdown -h +2 "message" |
| Check battery | upsc myups battery.charge |
| Runtime estimate | upsc myups battery.runtime |
