Rack & Data Center Operations¶
Reference guide for physical rack layout, cabling, power distribution, thermal management, and asset inventory. This is the physical layer that everything else depends on.
Mental Model¶
[Application Layer ] Pods, services, user traffic
|
[Compute Layer ] Servers (CPU, memory, disk, NIC)
|
[Network Layer ] Switches, patch panels, cables (ToR / spine-leaf)
|
[Power Layer ] PDUs, UPS, breakers, utility feed (A + B)
|
[Cooling Layer ] CRAC/CRAH, in-row, hot/cold aisle containment
|
[Physical Layer ] Rack, floor, room, building
Every layer depends on the ones below it, so a fault in power, cooling, or cabling surfaces as an application outage. A software engineer who understands the physical layer can troubleshoot faster and design more resilient systems.
Physical Rack Layout¶
Standard 42U Rack Anatomy¶
Front Rear
+------------------+ +------------------+
42U| [blank panel] | | [cable mgmt] |
41U| [blank panel] | | [cable mgmt] |
40U| [ToR switch A] | | [patch panel A] |
39U| [ToR switch B] | | [patch panel B] |
38U| [blank panel] | | [cable mgmt] |
37U| [server 8 2U] | | |
36U| [server 8 ] | | |
35U| [server 7 2U] | | |
34U| [server 7 ] | | |
33U| [server 6 2U] | | |
32U| [server 6 ] | | |
31U| [server 5 2U] | | |
30U| [server 5 ] | | |
29U| [server 4 2U] | | |
28U| [server 4 ] | | |
27U| [server 3 2U] | | |
26U| [server 3 ] | | |
25U| [server 2 2U] | | |
24U| [server 2 ] | | |
23U| [server 1 2U] | | |
22U| [server 1 ] | | |
21U| [blank panel] | | |
... (expansion) | |
5U| [blank panel] | | [cable mgmt] |
4U| [blank panel] | | [cable mgmt] |
3U| [UPS / battery] | | |
2U| [PDU A (vert)] | | [PDU B (vert)] |
1U| [shelf/drawer] | | [shelf/drawer] |
+------------------+ +------------------+
Note: The layout above is simplified. Vertical (0U) PDUs mount on the rear posts and don't consume U-space, so the U2 position shown for the PDUs is illustrative only.
U-Space Planning¶
| Component | Typical Size | Quantity | Total U | Notes |
|---|---|---|---|---|
| ToR switch | 1U | 2 | 2U | Redundant pair (A/B) |
| Patch panel | 1U | 2 | 2U | One per switch |
| Cable management | 1U | 3-4 | 3-4U | Between sections |
| 2U server (R760) | 2U | 8 | 16U | Main compute |
| 1U server (R660) | 1U | 4 | 4U | Lighter workloads |
| Blanking panels | 1U | ~10 | 10U | Fill gaps for airflow |
| Shelf/drawer | 1U | 1 | 1U | Crash cart tools |
| Total | | | ~38U | Leaves ~4U for expansion |
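The tally in the table can be checked with a few lines of shell. This is a minimal sketch, assuming the quantities above with cable management counted as 3U. The `rack_u_used` function and its `name:size:qty` input format are illustrative, not part of any tool.

```shell
# Sum rack-unit usage from "name:size_u:qty" entries (illustrative format).
rack_u_used() {
  local total=0 entry size qty
  for entry in "$@"; do
    # Split on ":"; the name field is not needed for the sum.
    IFS=: read -r _ size qty <<< "$entry"
    total=$(( total + size * qty ))
  done
  echo "$total"
}

RACK_U=42
used=$(rack_u_used "tor-switch:1:2" "patch-panel:1:2" "cable-mgmt:1:3" \
  "server-2u:2:8" "server-1u:1:4" "blanking:1:10" "shelf:1:1")
echo "Used: ${used}U, free: $(( RACK_U - used ))U"
```

Keeping the component list in one place makes it easy to re-run the sum whenever a server is added or a blanking panel is removed.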
Weight Distribution¶
- Standard rack: 1000-1500 kg (2200-3300 lb) weight capacity
- Dell R760 (2U, loaded): ~30-35 kg (66-77 lb)
- 8 servers: ~280 kg (617 lb) — well within limits
- Rule of thumb: heaviest equipment at the bottom, switches at top
- Check raised floor tile load rating: standard tiles handle ~450 kg per tile
- Distribute weight evenly across the rack footprint
Rail Kit Types¶
| Type | Dell Part | Use Case |
|---|---|---|
| Sliding rails | ReadyRails II | Standard rackmount, tool-less |
| Static rails | Static ReadyRails | Fixed mount, no slide-out |
| 2-post adapter | 2-Post kit | Telco/open racks |
| Cable management | CMA | Hinged arm for rear cable routing |
Always install the Cable Management Arm (CMA) — it prevents cable damage when sliding servers out for maintenance.
Network Cabling¶
Cable Types Reference¶
| Cable Type | Speed | Max Distance | Use Case |
|---|---|---|---|
| Cat6 | 10 Gbps | 55m (10G), 100m (1G) | Short runs, management network |
| Cat6a | 10 Gbps | 100m | Standard server-to-switch |
| DAC (Twinax) | 10-100 Gbps | 1-7m | ToR to server (cheapest 10G+) |
| AOC | 10-400 Gbps | 7-100m | Inter-rack, longer than DAC |
| OM3 MMF | 10 Gbps | 300m | Intra-building backbone |
| OM4 MMF | 100 Gbps | 150m | High-speed intra-building |
| OS2 SMF | 100+ Gbps | 10+ km | Inter-building, WAN |
DAC vs AOC: For server-to-ToR connections under 5 meters, DAC (Direct Attach Copper) is cheapest and lowest latency. Beyond 5m, use AOC (Active Optical Cable) or fiber with transceivers.
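The distance rule can be written as a small helper. This is a sketch only: `pick_cable` is a hypothetical name, lengths are whole meters, a run of exactly 5m is treated as DAC (the text leaves that case open), and the 100m ceiling for AOC is taken from the table above.

```shell
# Suggest a cable type for a link based on run length in whole meters.
# Thresholds follow the guidance above; the exact-5m case is an assumption.
pick_cable() {
  local meters=$1
  if (( meters <= 5 )); then
    echo "DAC"
  elif (( meters <= 100 )); then
    echo "AOC"
  else
    echo "fiber+transceivers"
  fi
}

pick_cable 3     # server to ToR in the same rack
pick_cable 20    # inter-rack run
pick_cable 300   # backbone run
```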
Structured Cabling Standards (TIA-568)¶
Key requirements:
- Maximum horizontal cable run: 90m (permanent link) + 10m patch cords = 100m total
- Minimum bend radius: 4x cable diameter (Cat6a), 10x for fiber
- Pull tension: max 25 lbs for Cat6a, never yank fiber
- Cable pathways: separate power and data by at least 200mm (or use shielded cable)
Labeling Conventions¶
Every cable should have labels at both ends. Use a consistent scheme:
Format: <rack>-<U>-<port>
Examples:
A01-40-1 Rack A01, U40, Port 1 (switch port)
A01-22-iDRAC Rack A01, U22, iDRAC port
A01-22-NIC1 Rack A01, U22, NIC port 1
A01-22-NIC2 Rack A01, U22, NIC port 2
Patch panel labels mirror the switch port:
PP-A01-01 through PP-A01-48
Cross-connect (inter-rack):
A01-40-25--B01-40-25 Rack A01 switch port 25 to Rack B01 switch port 25
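Generating labels from a script keeps the scheme consistent across a label printer run. The helpers below are a sketch: the function names are hypothetical, and the parser assumes rack IDs are one uppercase letter followed by two digits (as in `A01`), which the examples suggest but the scheme does not state.

```shell
# Build a cable label in the <rack>-<U>-<port> format.
cable_label() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Build a cross-connect label: two endpoint labels joined by "--".
cross_connect_label() {
  printf '%s--%s\n' "$1" "$2"
}

# Split a label back into its fields; returns non-zero if it is malformed.
parse_cable_label() {
  local re='^([A-Z][0-9]{2})-([0-9]{1,2})-([A-Za-z0-9]+)$'
  [[ $1 =~ $re ]] || return 1
  echo "rack=${BASH_REMATCH[1]} u=${BASH_REMATCH[2]} port=${BASH_REMATCH[3]}"
}

cable_label A01 22 NIC1
cross_connect_label "$(cable_label A01 40 25)" "$(cable_label B01 40 25)"
parse_cable_label "A01-22-iDRAC"
```

Running the parser over existing labels during an audit is a cheap way to catch hand-written labels that drifted from the format.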
Switch Topology: ToR vs EoR¶
Top-of-Rack (ToR) — recommended for most deployments:
Rack A Rack B
+--[ToR-A1]--+ +--[ToR-B1]--+
| [server] | | [server] |
| [server] | | [server] |
| [server] |--uplink----| [server] |
| [server] | | [server] |
+------------+ +------------+
| |
+-------[Spine/Agg]------+
- Each rack has its own switch pair
- Short cable runs (1-3m DAC) from server to switch
- Easy to manage, scale by adding racks
- More switches to buy
End-of-Row (EoR) — sometimes used for smaller deployments:
Rack A Rack B Rack C Rack D (EoR)
[server] [server] [server] +--[EoR switch]--+
[server] [server] [server] | |
[server] [server] [server] +----------------+
| | | |
+----long cables----+-----------+
- Fewer switches (centralized)
- Longer cable runs (Cat6a needed)
- Harder to manage, cable congestion at EoR rack
Port Mapping Documentation¶
Maintain a port map for every switch:
# switch-a01-tor1.portmap.csv
switch,port,speed,vlan,connected_to,description
A01-ToR1,Eth1/1,25G,100,A01-U22-NIC1,server-01 data
A01-ToR1,Eth1/2,25G,100,A01-U24-NIC1,server-02 data
A01-ToR1,Eth1/3,25G,100,A01-U26-NIC1,server-03 data
...
A01-ToR1,Eth1/47,100G,trunk,B01-ToR1-Eth1/47,inter-rack uplink
A01-ToR1,Eth1/48,100G,trunk,Spine1-Eth1/1,spine uplink
A01-ToR1,Mgmt0,1G,10,A01-U22-iDRAC,management network
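A port map is only useful if each switch port appears once. The check below is a sketch that flags duplicated `switch,port` pairs; `find_duplicate_ports` is a hypothetical helper, and the sample file is made up for illustration with a deliberate duplicate on `Eth1/2`.

```shell
# Print any switch,port pair that appears more than once in a port map CSV.
# Comment lines, the header row, and "..." placeholder lines are skipped.
find_duplicate_ports() {
  awk -F, '
    /^#/ || $1 == "switch" || NF < 2 { next }
    seen[$1 "," $2]++ == 1 { print $1 "," $2 }
  ' "$1"
}

# Illustrative sample: the third data row reuses Eth1/2 on purpose.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample.portmap.csv
switch,port,speed,vlan,connected_to,description
A01-ToR1,Eth1/1,25G,100,A01-U22-NIC1,server-01 data
A01-ToR1,Eth1/2,25G,100,A01-U24-NIC1,server-02 data
A01-ToR1,Eth1/2,25G,100,A01-U26-NIC1,server-03 data
EOF
find_duplicate_ports "$sample"
rm -f "$sample"
```

An empty result means no port is listed twice. The check does not verify that the far end actually matches the cabling.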
Power Distribution¶
PDU Types¶
| Type | Features | Use Case | Cost |
|---|---|---|---|
| Basic | Power distribution only | Dev/test | Low |
| Metered | Per-outlet power monitoring | Track per-server power draw | Medium |
| Switched | Remote per-outlet on/off + metered | Remote power control | Med-High |
| ATS | Automatic Transfer Switch | Dual-feed failover | High |
Recommendation: Metered PDUs minimum for production. Switched if you need remote power cycling without iDRAC.
Redundant Power (A+B Feeds)¶
Every production rack should have dual power feeds from separate circuits:
Utility Power
|
+--[UPS A]--[Panel A]--[Breaker A]--[PDU A]--+-- Server PSU 1
| |
+--[UPS B]--[Panel B]--[Breaker B]--[PDU B]--+-- Server PSU 2
- Each PSU in a dual-PSU server connects to a different PDU
- If PDU A fails (or its entire upstream path), PSU 2 keeps the server running
- Servers should be configured for redundant PSU mode (not load-balancing)
Power Budget Worksheet¶
Rack: A01
Location: Room 1, Row A, Position 1
Circuit A: 30A @ 208V single-phase = 6,240W capacity
Circuit B: 30A @ 208V single-phase = 6,240W capacity
Rule: Keep continuous load at or below 80% of the circuit rating (NEC continuous-load rule)
Usable per circuit: 4,992W
Equipment | Qty | Watts Each | Total W | Feed
--------------------------|-----|------------|---------|------
Dell R760 (2U, loaded) | 8 | 750W | 6,000W | A+B split
Dell S5248 ToR switch | 2 | 350W | 700W | A+B split
Patch panel (passive) | 2 | 0W | 0W | -
Cable management | 4 | 0W | 0W | -
PDU overhead | 2 | 50W | 100W | A+B
| | |---------|
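Total rack power: | | | 6,800W |
Per-feed, both feeds up: | | | 3,400W | 68% of 4,992W usable
Headroom per feed: | | | 1,592W | Normal operation only
Per-feed, one feed lost: | | | 6,800W | Exceeds 4,992W usable
Note: For true A+B redundancy the surviving feed must carry the whole rack. At 6,800W this rack is within limits only while both feeds are up; reduce the load or provision larger circuits before adding servers.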
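The worksheet arithmetic, together with a single-feed failover check (the surviving feed has to carry the whole rack), can be sketched in shell. Function names are hypothetical, and the wattages are the worksheet's planning figures rather than measured draw.

```shell
# Usable watts on a circuit: amps * volts, derated to 80% for continuous load.
usable_watts() {
  echo $(( $1 * $2 * 80 / 100 ))
}

# Report per-feed load in normal operation (load shared across A and B)
# and whether one feed alone could carry the full rack within the limit.
feed_check() {
  local total_w=$1 usable_w=$2
  local shared=$(( total_w / 2 ))
  echo "normal: ${shared}W per feed ($(( shared * 100 / usable_w ))% of usable)"
  if (( total_w <= usable_w )); then
    echo "failover: OK"
  else
    echo "failover: OVER by $(( total_w - usable_w ))W"
  fi
}

usable=$(usable_watts 30 208)               # 30A @ 208V single-phase
total=$(( 8 * 750 + 2 * 350 + 2 * 50 ))     # servers + switches + PDU overhead
feed_check "$total" "$usable"
```

With the worksheet values this reports 3400W per feed in normal operation and a failover overage, which is the case to design around if the rack must ride through the loss of a feed.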
UPS Sizing¶
UPS sizing formula:
Required VA = Total Watts / Power Factor
Example:
Total rack power: 6,800W
Power factor: 0.9 (typical for modern PSUs)
Required VA: 6,800 / 0.9 = 7,556 VA
Target runtime: 10 minutes (enough for generator start)
UPS size: 10 kVA unit per rack (with margin)
For the full room:
20 racks * 7,556 VA = ~151 kVA
UPS: 200 kVA system with N+1 redundancy
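The same formula in shell, as a sketch. Bash has no floating point, so the power factor is passed as a percentage (90 for 0.9) and the result is rounded up; `required_va` is a hypothetical helper name.

```shell
# Required VA = watts / power factor, rounded up to a whole VA.
# pf_pct is the power factor as a percentage (90 means 0.9).
required_va() {
  local watts=$1 pf_pct=$2
  echo $(( (watts * 100 + pf_pct - 1) / pf_pct ))
}

rack_va=$(required_va 6800 90)
room_va=$(( rack_va * 20 ))
echo "Per rack: ${rack_va} VA"
echo "Room (20 racks): $(( (room_va + 500) / 1000 )) kVA"
```

The output is the minimum load figure only. Runtime, battery aging, and N+1 redundancy still have to be added on top, as in the example above.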
Thermal Management¶
Hot Aisle / Cold Aisle Containment¶
Cold Aisle (intake)
============================
| Rack A | | Rack B | Server fronts face the cold aisle
| front | | front | Cool air enters server intakes
============================
||
Perforated floor tiles
||
============================
| Rack A | | Rack B |
| rear | | rear | Hot exhaust exits rear of servers
============================
Hot Aisle (exhaust) Hot air rises to return plenum
============================ or in-row cooling units
Cold aisle containment: Enclose the cold aisle with doors/roof. Prevents hot air from mixing with cold supply air. Most common approach.
Hot aisle containment: Enclose the hot aisle and duct exhaust directly to CRAC/CRAH return. More efficient but harder to work in (it's hot).
Blanking Panels¶
Always fill empty U-spaces with blanking panels. Without them, hot exhaust recirculates through gaps to the cold aisle, creating hot spots.
Impact of missing blanking panels:
- 1U gap: ~2-3C inlet temp increase for servers above it
- Multiple gaps: can cause thermal shutdowns in extreme cases
- More cooling energy wasted compensating for recirculation
Cooling Types¶
| Type | How It Works | Capacity | Best For |
|---|---|---|---|
| Raised floor CRAC | Chilled air under floor, up through tiles | 30-100 kW | Traditional DC |
| In-row cooling | Cooling unit between racks in the row | 30-50 kW | Modern, high-density |
| Rear-door heat ex | Liquid-cooled door on rack rear | 30-40 kW/rack | Retrofit, very dense |
| Direct liquid | Cold plates on CPU/GPU, liquid loop | 100+ kW/rack (limited by the coolant loop) | HPC, AI/GPU clusters |
Monitoring Inlet Temperatures¶
Server inlet temp is the critical metric — it's what the server actually breathes.
# Read inlet temp via iDRAC Redfish
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Chassis/System.Embedded.1/Thermal \
| jq '.Temperatures[] | select(.Name | test("Inlet")) | {Name, ReadingCelsius, Status: .Status.Health}'
# Via IPMI
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sdr type temperature
# Via racadm
racadm getsensorinfo | grep -i inlet
ASHRAE Temperature Guidelines¶
| Class | Recommended Range | Allowable Range | Humidity | Use Case |
|---|---|---|---|---|
| A1 | 18-27C (64-81F) | 15-32C (59-90F) | 20-80% RH | Enterprise servers |
| A2 | 18-27C (64-81F) | 10-35C (50-95F) | 20-80% RH | IT equipment |
| A3 | 18-27C (64-81F) | 5-40C (41-104F) | 8-85% RH | Ruggedized |
| A4 | 18-27C (64-81F) | 5-45C (41-113F) | 8-90% RH | Extreme edge |
Dell PowerEdge servers are typically rated for ASHRAE A2. Target 20-25C inlet temperature for optimal performance and component longevity.
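An inlet reading can be classified against the A2 row of the table with a short function. This is a sketch: `classify_inlet` is a hypothetical name, it takes whole degrees Celsius, and a fractional sensor reading would need truncating first (for example with `${reading%.*}`).

```shell
# Classify an inlet reading (whole degrees C) against the ASHRAE A2 ranges
# from the table above: recommended 18-27, allowable 10-35.
classify_inlet() {
  local t=$1
  if (( t >= 18 && t <= 27 )); then
    echo "recommended"
  elif (( t >= 10 && t <= 35 )); then
    echo "allowable"
  else
    echo "out-of-range"
  fi
}

for reading in 22 30 40; do
  echo "${reading}C: $(classify_inlet "$reading")"
done
```

A monitoring job could alert on anything other than "recommended", which leaves time to act before a server reaches its allowable limit.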
Thermal Troubleshooting¶
Thermal alert / server throttling
|
+-- Check inlet temp (should be 18-27C)
| |
| +-- Inlet > 30C?
| | +-- Check blanking panels (any gaps?)
| | +-- Check cold aisle containment (doors open?)
| | +-- Check CRAC/CRAH status (running? setpoint?)
| | +-- Check for blocked floor tiles (cables under floor?)
| | +-- Check neighboring rack exhaust (hot aisle leaking?)
| |
| +-- Inlet normal but server hot?
| +-- Check for failed fans (iDRAC sensor / SEL)
| +-- Check for dust buildup (front intake filters)
| +-- Check CPU/workload (abnormal 100% CPU?)
| +-- Airflow obstruction inside server (cables, loose components)
Asset Inventory & Labeling¶
CMDB Basics¶
A Configuration Management Database (CMDB) tracks every asset and its relationships. At minimum, track:
Server record:
- Service Tag (Dell unique ID)
- Model (PowerEdge R760, etc.)
- Serial Number
- Location: Room / Row / Rack / U-position
- IP addresses: iDRAC, OS management, data interfaces
- MAC addresses: iDRAC, NIC1-4
- CPU: model, count, cores
- Memory: total GB, DIMM layout
- Storage: controller, disk count/type/size, RAID config
- Firmware versions: BIOS, iDRAC, PERC, NIC
- OS: distribution, version, kernel
- Cluster membership: k8s node name, cluster name
- Purchase date
- Warranty expiration
- Status: Provisioning / Active / Maintenance / Decommissioned
Labeling Standards¶
Physical labels on every server (front and rear):
Front label (on bezel or chassis front):
+-----------------------------------+
| SVR-A01-22 | SvcTag: ABC1234 |
| 10.0.10.101 | iDRAC: 10.0.20.101 |
+-----------------------------------+
Rear label (on chassis rear, visible from hot aisle):
+-----------------------------------+
| SVR-A01-22 | NIC1: ToR1 Eth1/1 |
| iDRAC port | NIC2: ToR2 Eth1/1 |
+-----------------------------------+
Naming Convention¶
Format: <role>-<rack>-<U>
Examples:
svr-a01-22 Server in rack A01 at U22
sw-a01-40 Switch in rack A01 at U40
pdu-a01-a PDU A in rack A01
pdu-a01-b PDU B in rack A01
iDRAC naming:
idrac-a01-22 iDRAC for server at rack A01, U22
DNS records:
svr-a01-22.dc1.example.com A 10.0.10.101
idrac-a01-22.dc1.example.com A 10.0.20.101
sw-a01-40.dc1.example.com A 10.0.30.40
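Names and DNS records can be derived from the rack position rather than typed by hand. The helpers below are a sketch with hypothetical function names; the domain and addresses are the example values used above.

```shell
# Build a hostname in the <role>-<rack>-<U> format, lower-cased.
asset_hostname() {
  printf '%s-%s-%s\n' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]'
}

# Derive the iDRAC name from a server hostname (svr-... becomes idrac-...).
idrac_name_for() {
  echo "${1/#svr-/idrac-}"
}

# Emit a zone-file style A record: <host>.<domain> A <ip>
dns_a_record() {
  printf '%s.%s A %s\n' "$1" "$2" "$3"
}

host=$(asset_hostname svr A01 22)
dns_a_record "$host" dc1.example.com 10.0.10.101
dns_a_record "$(idrac_name_for "$host")" dc1.example.com 10.0.20.101
```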
QR/Barcode Asset Tags¶
Use QR codes on asset tags for fast scanning during audits:
QR content (URL to CMDB record):
https://cmdb.example.com/asset/ABC1234
Physical label with QR:
+---------------------------+
| [QR CODE] | SVR-A01-22 |
| | ABC1234 |
| | R760 |
+---------------------------+
iDRAC Auto-Discovery to Inventory¶
Automate inventory collection from iDRAC across the fleet:
#!/usr/bin/env bash
# collect-inventory.sh — Scan iDRAC IPs and dump inventory to CSV
set -euo pipefail
IDRAC_USER="${IDRAC_USER:-root}"
IDRAC_PASS="${IDRAC_PASS:?Set IDRAC_PASS}"
SUBNET="10.0.20"
OUTPUT="inventory-$(date +%Y%m%d).csv"
echo "ServiceTag,Model,BiosVersion,iDRACVersion,PowerState,MemoryGiB,CPUs,iDRACIP" > "$OUTPUT"
for octet in $(seq 101 150); do
  ip="${SUBNET}.${octet}"
  # Quick check if iDRAC is reachable
  if ! timeout 2 bash -c "echo >/dev/tcp/${ip}/443" 2>/dev/null; then
    continue
  fi
  # -f makes curl fail on HTTP errors (e.g. 401), so bad responses are skipped
  data=$(curl -skf --connect-timeout 5 -u "${IDRAC_USER}:${IDRAC_PASS}" \
    "https://${ip}/redfish/v1/Systems/System.Embedded.1" 2>/dev/null) || continue
  idrac_ver=$(curl -skf --connect-timeout 5 -u "${IDRAC_USER}:${IDRAC_PASS}" \
    "https://${ip}/redfish/v1/Managers/iDRAC.Embedded.1" 2>/dev/null \
    | jq -r '.FirmwareVersion // "unknown"') || idrac_ver="unknown"
  # The SKU field is written to the ServiceTag column; skip hosts whose
  # response cannot be parsed instead of aborting the whole scan
  echo "$data" | jq -r --arg ip "$ip" --arg idrac "$idrac_ver" '[
    .SKU // "unknown",
    .Model // "unknown",
    .BiosVersion // "unknown",
    $idrac,
    .PowerState // "unknown",
    (.MemorySummary.TotalSystemMemoryGiB // 0 | tostring),
    (.ProcessorSummary.Count // 0 | tostring),
    $ip
  ] | @csv' >> "$OUTPUT" || continue
  echo "Collected: ${ip}"
done
echo "Inventory saved to ${OUTPUT}"
echo "Total servers: $(( $(wc -l < "$OUTPUT") - 1 ))"
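Once the CSV exists, a quick summary helps spot unexpected hardware. The sketch below counts servers per model; `count_by_model` is a hypothetical helper, it assumes model names contain no commas, and the sample rows are invented for illustration.

```shell
# Count rows per Model (column 2) in an inventory CSV, skipping the header.
# Quotes added by jq's @csv are stripped before counting.
count_by_model() {
  awk -F, 'NR > 1 { gsub(/"/, "", $2); n[$2]++ }
           END { for (m in n) print n[m], m }' "$1" | sort -k2
}

# Illustrative sample with made-up tags and versions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ServiceTag,Model,BiosVersion,iDRACVersion,PowerState,MemoryGiB,CPUs,iDRACIP
"TAG0001","PowerEdge R760","x.y.z","a.b.c","On","512","2","10.0.20.101"
"TAG0002","PowerEdge R760","x.y.z","a.b.c","On","512","2","10.0.20.102"
"TAG0003","PowerEdge R660","x.y.z","a.b.c","On","256","2","10.0.20.103"
EOF
count_by_model "$sample"
rm -f "$sample"
```

The same pattern works for grouping by BIOS or iDRAC version (columns 3 and 4) to find firmware drift across the fleet.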
Rack Elevation Diagram Template¶
Track what's in each U of each rack:
# rack-elevation-a01.txt
# Rack: A01 | Room: DC1 | Row: A | Position: 01
# Updated: 2026-03-05
#
# U | Equipment | Service Tag | Power A | Power B | Notes
# ----|---------------------|-------------|---------|---------|------------------
# 42 | Cable management | - | - | - |
# 41 | Blanking panel | - | - | - |
# 40 | Dell S5248F-ON (A) | XYZ001 | A-1 | B-1 | ToR switch A
# 39 | Dell S5248F-ON (B) | XYZ002 | A-2 | B-2 | ToR switch B
# 38 | Patch panel 48-port | - | - | - | PP-A01-A
# 37 | Patch panel 48-port | - | - | - | PP-A01-B
# 36 | Cable management | - | - | - |
# 35 | Dell R760 (2U top) | ABC008 | A-3 | B-3 | svr-a01-34
# 34 | Dell R760 (2U bot) | (cont.) | | |
# 33 | Dell R760 (2U top) | ABC007 | A-4 | B-4 | svr-a01-32
# 32 | Dell R760 (2U bot) | (cont.) | | |
# 31 | Dell R760 (2U top) | ABC006 | A-5 | B-5 | svr-a01-30
# 30 | Dell R760 (2U bot) | (cont.) | | |
# 29 | Dell R760 (2U top) | ABC005 | A-6 | B-6 | svr-a01-28
# 28 | Dell R760 (2U bot) | (cont.) | | |
# 27 | Dell R760 (2U top) | ABC004 | A-7 | B-7 | svr-a01-26
# 26 | Dell R760 (2U bot) | (cont.) | | |
# 25 | Dell R760 (2U top) | ABC003 | A-8 | B-8 | svr-a01-24
# 24 | Dell R760 (2U bot) | (cont.) | | |
# 23 | Dell R760 (2U top) | ABC002 | A-9 | B-9 | svr-a01-22
# 22 | Dell R760 (2U bot) | (cont.) | | |
# 21 | Dell R760 (2U top) | ABC001 | A-10 | B-10 | svr-a01-20
# 20 | Dell R760 (2U bot) | (cont.) | | |
# 19 | Blanking panel | - | - | - |
# .. | Blanking panels | - | - | - | Expansion space
# 4 | Blanking panel | - | - | - |
# 3 | Cable management | - | - | - |
# 2 | Shelf (crash cart) | - | - | - |
# 1 | Blanking panel | - | - | - |
#
# PDUs (0U vertical mount, rear):
# PDU-A: APC AP8886 (metered), Feed A, Circuit A-01
# PDU-B: APC AP8886 (metered), Feed B, Circuit B-01