
Rack & Data Center Operations

Reference guide for physical rack layout, cabling, power distribution, thermal management, and asset inventory. The physical layer that everything else depends on.

Mental Model

[Application Layer ]  Pods, services, user traffic
|
[Compute Layer     ]  Servers (CPU, memory, disk, NIC)
|
[Network Layer     ]  Switches, patch panels, cables (ToR / spine-leaf)
|
[Power Layer       ]  PDUs, UPS, breakers, utility feed (A + B)
|
[Cooling Layer     ]  CRAC/CRAH, in-row, hot/cold aisle containment
|
[Physical Layer    ]  Rack, floor, room, building

Every outage is ultimately physical. A software engineer who understands the physical layer can troubleshoot faster and design more resilient systems.


Physical Rack Layout

Standard 42U Rack Anatomy

         Front                    Rear
    +------------------+    +------------------+
 42U|  [blank panel]   |    |  [cable mgmt]    |
 41U|  [blank panel]   |    |  [cable mgmt]    |
 40U|  [ToR switch A]  |    |  [patch panel A] |
 39U|  [ToR switch B]  |    |  [patch panel B] |
 38U|  [blank panel]   |    |  [cable mgmt]    |
 37U|  [server 8  2U]  |    |                  |
 36U|  [server 8    ]  |    |                  |
 35U|  [server 7  2U]  |    |                  |
 34U|  [server 7    ]  |    |                  |
 33U|  [server 6  2U]  |    |                  |
 32U|  [server 6    ]  |    |                  |
 31U|  [server 5  2U]  |    |                  |
 30U|  [server 5    ]  |    |                  |
 29U|  [server 4  2U]  |    |                  |
 28U|  [server 4    ]  |    |                  |
 27U|  [server 3  2U]  |    |                  |
 26U|  [server 3    ]  |    |                  |
 25U|  [server 2  2U]  |    |                  |
 24U|  [server 2    ]  |    |                  |
 23U|  [server 1  2U]  |    |                  |
 22U|  [server 1    ]  |    |                  |
 21U|  [blank panel]   |    |                  |
 ...|  (expansion)     |    |                  |
  5U|  [blank panel]   |    |  [cable mgmt]    |
  4U|  [blank panel]   |    |  [cable mgmt]    |
  3U|  [UPS / battery] |    |                  |
  2U|  [PDU A (vert)]  |    |  [PDU B (vert)]  |
  1U|  [shelf/drawer]  |    |  [shelf/drawer]  |
    +------------------+    +------------------+

Note: Vertical (0U) PDUs mount at the rear of the rack and don't consume U-space; they are drawn at 2U only to show that each rack has an A-side and a B-side PDU. The layout is simplified.

U-Space Planning

Component          | Typical Size | Quantity | Total U | Notes
-------------------|--------------|----------|---------|------------------------
ToR switch         | 1U           | 2        | 2U      | Redundant pair (A/B)
Patch panel        | 1U           | 2        | 2U      | One per switch
Cable management   | 1U           | 3-4      | 3-4U    | Between sections
2U server (R760)   | 2U           | 8        | 16U     | Main compute
1U server (R660)   | 1U           | 4        | 4U      | Lighter workloads
Blanking panels    | 1U           | ~10      | 10U     | Fill gaps for airflow
Shelf/drawer       | 1U           | 1        | 1U      | Crash cart tools
Total              |              |          | ~38U    | Leaves 4U for expansion
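The U-space arithmetic above is easy to get wrong as a rack evolves. A minimal sketch in bash, assuming components are passed as hypothetical `size:qty` pairs (for example `2:8` for eight 2U servers); `u_budget` is an illustrative name, not an existing tool.

```shell
# u_budget: add up U-space for "size:qty" pairs and report what is left.
# Usage: u_budget <rack_units> <size:qty>...   (hypothetical helper)
u_budget() {
    local rack_u=$1; shift
    local used=0 item
    for item in "$@"; do
        # ${item%%:*} = U per device, ${item##*:} = device count
        used=$(( used + ${item%%:*} * ${item##*:} ))
    done
    echo "used=${used} free=$(( rack_u - used ))"
}
# Components from the table, cable management counted at 3U:
#   u_budget 42 1:2 1:2 1:3 2:8 1:4 1:10 1:1
```

With cable management counted at 3U, the table's components come to 38U, leaving 4U in a 42U rack.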

Weight Distribution

  • Standard rack: 1000-1500 kg (2200-3300 lb) weight capacity
  • Dell R760 (2U, loaded): ~30-35 kg (66-77 lb)
  • 8 servers: ~280 kg (617 lb) — well within limits
  • Rule of thumb: heaviest equipment at the bottom, switches at top
  • Check the raised floor tile load rating: casters and leveling feet create concentrated (point) loads, and common tiles are rated around 450 kg (1,000 lb) concentrated
  • Distribute weight evenly across the rack footprint
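The weight figures above are simple multiplication, but scripting them keeps kilograms, pounds, and the rack rating in one place. A sketch assuming whole-kilogram per-device estimates (such as the ~35 kg upper figure for a loaded R760) and a rated capacity from the rack vendor; `rack_load` is hypothetical, and it ignores the rack's own weight, which also lands on the floor tiles.

```shell
# rack_load: total equipment weight (kg and lb) and percent of rack capacity.
# Usage: rack_load <capacity_kg> <kg_each:qty>...   (hypothetical helper)
# Integer math throughout, so results round down.
rack_load() {
    local capacity=$1; shift
    local total=0 item
    for item in "$@"; do
        total=$(( total + ${item%%:*} * ${item##*:} ))
    done
    local lb=$(( total * 220462 / 100000 ))     # 1 kg = 2.20462 lb
    echo "total_kg=${total} total_lb=${lb} pct_of_capacity=$(( total * 100 / capacity ))"
}
# Eight R760s at ~35 kg each in a rack rated for 1000 kg:
#   rack_load 1000 35:8
```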

Rail Kit Types

Type             | Dell Part         | Use Case
-----------------|-------------------|----------------------------------
Sliding rails    | ReadyRails II     | Standard rackmount, tool-less
Static rails     | Static ReadyRails | Fixed mount, no slide-out
2-post adapter   | 2-Post kit        | Telco/open racks
Cable management | CMA               | Hinged arm for rear cable routing

Always install the Cable Management Arm (CMA) — it prevents cable damage when sliding servers out for maintenance.


Network Cabling

Cable Types Reference

Cable Type   | Speed       | Max Distance         | Use Case
-------------|-------------|----------------------|-------------------------------
Cat6         | 10 Gbps     | 55m (10G), 100m (1G) | Short runs, management network
Cat6a        | 10 Gbps     | 100m                 | Standard server-to-switch
DAC (Twinax) | 10-100 Gbps | 1-7m                 | ToR to server (cheapest 10G+)
AOC          | 10-400 Gbps | 7-100m               | Inter-rack, longer than DAC
OM3 MMF      | 10 Gbps     | 300m                 | Intra-building backbone
OM4 MMF      | 100 Gbps    | 100m (400m at 10G)   | High-speed intra-building
OS2 SMF      | 100+ Gbps   | 10+ km               | Inter-building, WAN

DAC vs AOC: For server-to-ToR connections under 5 meters, DAC (Direct Attach Copper) is cheapest and lowest latency. Beyond 5m, use AOC (Active Optical Cable) or fiber with transceivers.
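The distance rule of thumb can be written down as a lookup. A sketch that encodes only the thresholds from this page (DAC below 5 m, AOC or OM4 multimode up to 100 m, single-mode beyond that); real choices also depend on link speed and which optics the switch supports. `pick_cable` is a hypothetical helper.

```shell
# pick_cable: suggest an interconnect type from the run length in whole meters.
# Thresholds follow this page's rule of thumb; they are not vendor limits.
pick_cable() {
    local meters=$1
    if (( meters < 5 )); then
        echo "DAC"
    elif (( meters <= 100 )); then
        echo "AOC or MMF (OM4)"
    else
        echo "SMF (OS2)"
    fi
}
```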

Structured Cabling Standards (TIA-568)

Key requirements:
  • Maximum horizontal cable run: 90m (permanent link) + 10m patch cords = 100m total
  • Minimum bend radius: 4x cable diameter (Cat6a), 10x for fiber
  • Pull tension: max 25 lbf (110 N) for Cat6a; never yank fiber
  • Cable pathways: separate power and data by at least 200mm (or use shielded cable)
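The length limits lend themselves to a mechanical check during cable planning. A minimal sketch, assuming whole-meter lengths and that all patch cords in the channel are summed into one figure; `check_channel` is a hypothetical helper and covers length only, not bend radius or separation.

```shell
# check_channel: validate a copper channel against the TIA-568 length limits.
# Usage: check_channel <permanent_link_m> <total_patch_cord_m>
# Returns non-zero on a violation. Staying within both limits keeps the
# channel at or under 100m.
check_channel() {
    local link=$1 patch=$2
    if (( link > 90 )); then
        echo "FAIL: permanent link ${link}m exceeds 90m"; return 1
    fi
    if (( patch > 10 )); then
        echo "FAIL: patch cords ${patch}m exceed 10m"; return 1
    fi
    echo "OK: channel $(( link + patch ))m"
}
```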

Labeling Conventions

Every cable should have labels at both ends. Use a consistent scheme:

Format: <rack>-<U>-<port>

Examples:
  A01-40-1    Rack A01, U40, Port 1 (switch port)
  A01-22-iDRAC  Rack A01, U22, iDRAC port
  A01-22-NIC1   Rack A01, U22, NIC port 1
  A01-22-NIC2   Rack A01, U22, NIC port 2

Patch panel labels mirror the switch port:
  PP-A01-01 through PP-A01-48

Cross-connect (inter-rack):
  A01-40-25--B01-40-25   Rack A01 switch port 25 to Rack B01 switch port 25
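Labels stay consistent when they are generated and validated rather than typed by hand. A sketch assuming rack IDs are one uppercase letter plus two digits (as in `A01`), U positions without leading zeros, and alphanumeric port names; `make_label`, `is_label`, and `is_xconnect` are hypothetical helpers.

```shell
# make_label: build a <rack>-<U>-<port> label, e.g. make_label A01 22 NIC1
make_label() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# is_label: succeed only if the argument matches <rack>-<U>-<port>.
# Assumes racks like A01, U 1-99 without leading zeros, alphanumeric ports.
is_label() {
    [[ $1 =~ ^[[:upper:]][0-9]{2}-[1-9][0-9]?-[A-Za-z0-9]+$ ]]
}

# is_xconnect: a cross-connect is two valid labels joined by "--".
is_xconnect() {
    local left=${1%%--*} right=${1#*--}
    [[ $1 == *--* ]] && is_label "$left" && is_label "$right"
}
```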

Switch Topology: ToR vs EoR

Top-of-Rack (ToR) — recommended for most deployments:

Rack A                     Rack B
+--[ToR-A1]--+            +--[ToR-B1]--+
|  [server]  |            |  [server]  |
|  [server]  |            |  [server]  |
|  [server]  |--uplink----|  [server]  |
|  [server]  |            |  [server]  |
+------------+            +------------+
      |                         |
      +-------[Spine/Agg]-------+

  • Each rack has its own switch pair
  • Short cable runs (1-3m DAC) from server to switch
  • Easy to manage, scale by adding racks
  • More switches to buy

End-of-Row (EoR) — sometimes used for smaller deployments:

Rack A    Rack B    Rack C    Rack D (EoR)
[server]  [server]  [server]  +--[EoR switch]--+
[server]  [server]  [server]  |                |
[server]  [server]  [server]  +----------------+
   |         |         |           |
   +----long cables----+-----------+

  • Fewer switches (centralized)
  • Longer cable runs (Cat6a needed)
  • Harder to manage, cable congestion at EoR rack

Port Mapping Documentation

Maintain a port map for every switch:

# switch-a01-tor1.portmap.csv
switch,port,speed,vlan,connected_to,description
A01-ToR1,Eth1/1,25G,100,A01-U22-NIC1,server-01 data
A01-ToR1,Eth1/2,25G,100,A01-U24-NIC1,server-02 data
A01-ToR1,Eth1/3,25G,100,A01-U26-NIC1,server-03 data
...
A01-ToR1,Eth1/47,100G,trunk,B01-ToR1-Eth1/47,inter-rack uplink
A01-ToR1,Eth1/48,100G,trunk,Spine1-Eth1/1,spine uplink
A01-ToR1,Mgmt0,1G,10,OOB-mgmt-switch,switch out-of-band management
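Keeping the port map as plain CSV means standard tools can answer "where is this plugged in?" questions. A sketch using `awk`, assuming the header and column order shown above and no commas inside fields; `find_port` is a hypothetical helper.

```shell
# find_port: print "switch port vlan" for every row whose connected_to
# column (5th field) exactly matches the given endpoint.
# Usage: find_port <portmap.csv> <endpoint>
find_port() {
    awk -F, -v target="$2" '
        NR == 1 { next }                      # skip the header row
        $5 == target { print $1, $2, $4 }     # switch, port, vlan
    ' "$1"
}
```

For example, `find_port switch-a01-tor1.portmap.csv A01-U24-NIC1` would print the ToR port for server-02's first NIC.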

Power Distribution

PDU Types

Type     | Features                           | Use Case                    | Cost
---------|------------------------------------|-----------------------------|---------
Basic    | Power distribution only            | Dev/test                    | Low
Metered  | Per-outlet power monitoring        | Track per-server power draw | Medium
Switched | Remote per-outlet on/off + metered | Remote power control        | Med-High
ATS      | Automatic Transfer Switch          | Dual-feed failover          | High

Recommendation: Metered PDUs minimum for production. Switched if you need remote power cycling without iDRAC.

Redundant Power (A+B Feeds)

Every production rack should have dual power feeds from separate circuits:

Utility Power
  |
  +--[UPS A]--[Panel A]--[Breaker A]--[PDU A]--+-- Server PSU 1
  |                                             |
  +--[UPS B]--[Panel B]--[Breaker B]--[PDU B]--+-- Server PSU 2
  • Each PSU in a dual-PSU server connects to a different PDU
  • If PDU A fails (or its entire upstream path), PSU 2 keeps the server running
  • Configure each server's PSU redundancy policy for A/B grid redundancy, and make sure either PSU alone can carry the server's full load

Power Budget Worksheet

Rack: A01
Location: Room 1, Row A, Position 1

Circuit A: 30A @ 208V single-phase = 6,240W capacity
Circuit B: 30A @ 208V single-phase = 6,240W capacity

Rule: Never exceed 80% of circuit capacity (NEC code)
Usable per circuit: 4,992W

Equipment                 | Qty | Watts Each | Total W | Feed
--------------------------|-----|------------|---------|------
Dell R760 (2U, loaded)    |  8  |    750W    | 6,000W  | A+B split
Dell S5248 ToR switch     |  2  |    350W    |   700W  | A+B split
Patch panel (passive)     |  2  |      0W    |     0W  | -
Cable management          |  4  |      0W    |     0W  | -
PDU overhead              |  2  |     50W    |   100W  | A+B
                          |     |            |---------|
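Total rack power:         |     |            | 6,800W  |
Per-feed, normal (50/50): |     |            | 3,400W  | 68% of usable
Per-feed, other feed down:|     |            | 6,800W  | 136% of usable: NOT redundant

Redundancy check: if one feed fails, the surviving feed carries the entire
rack, so total rack power must fit within ONE circuit's usable 4,992W.
At 6,800W this rack is not redundant on 30A @ 208V single-phase feeds. Options:
  • Cap the rack at 5 servers: 5 x 750W + 700W + 100W = 4,550W (91% of usable)
  • Use larger circuits, e.g. 30A @ 208V three-phase:
    208 x 30 x 1.732 = ~10,800W capacity, ~8,650W usable (6,800W is ~79%)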
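The worksheet arithmetic, including what happens when one feed is lost, can be scripted. A sketch assuming single-phase circuits, the 80% continuous-load rule above, and a rack total that already includes every device on both PDUs; `feed_check` is a hypothetical helper that returns non-zero when the surviving feed would be overloaded.

```shell
# feed_check: compare total rack load against one circuit's usable capacity.
# With A+B feeds, either feed must be able to carry the whole rack alone.
# Usage: feed_check <amps> <volts> <rack_watts>
feed_check() {
    local amps=$1 volts=$2 load=$3
    local capacity=$(( amps * volts ))          # single-phase: W = A x V
    local usable=$(( capacity * 80 / 100 ))     # 80% continuous-load limit
    local normal=$(( load / 2 ))                # both feeds up, 50/50 split
    echo "usable_per_feed=${usable}W normal_per_feed=${normal}W failover_load=${load}W"
    if (( load > usable )); then
        echo "NOT REDUNDANT: surviving feed would be at $(( load * 100 / usable ))% of usable"
        return 1
    fi
    echo "REDUNDANT: surviving feed at $(( load * 100 / usable ))% of usable"
}
```

For rack A01, `feed_check 30 208 6800` reports 3,400W per feed in normal operation but 136% of usable capacity on a single feed.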

UPS Sizing

UPS sizing formula:
  Required VA = Total Watts / Power Factor

  Example:
    Total rack power: 6,800W
    Power factor: 0.9 (typical for modern PSUs)
    Required VA: 6,800 / 0.9 = 7,556 VA

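  Target runtime: 10 minutes (standby generators typically start and take the load within about 10 seconds; the margin covers a failed start or a graceful shutdown)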
  UPS size: 10 kVA unit per rack (with margin)

For the full room:
  20 racks * 7,556 VA = ~151 kVA
  UPS: 200 kVA system with N+1 redundancy
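The VA formula rounds up in practice, since the UPS must cover the whole load. A sketch that does the division in `awk` because bash arithmetic is integer-only, assuming the 0.9 power factor from the example; `ups_va` is a hypothetical helper.

```shell
# ups_va: required UPS VA = watts / power_factor, rounded up to a whole VA.
# Usage: ups_va <watts> <power_factor>
ups_va() {
    awk -v w="$1" -v pf="$2" 'BEGIN {
        va = w / pf
        rounded = int(va)
        if (rounded < va) rounded++     # ceiling for positive values
        print rounded
    }'
}
```

`ups_va 6800 0.9` gives the 7,556 VA used above; multiply by the rack count before adding redundancy for the room.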

Thermal Management

Hot Aisle / Cold Aisle Containment

    Top view, two rows sharing one cold aisle:

              Hot Aisle (exhaust)          Hot air rises to return plenum
    ====================================   or in-row cooling units
    |  Row A: rack rears  (exhaust)    |   Hot exhaust exits rear of servers
    |  Row A: rack fronts (intake)     |
    ====================================
              Cold Aisle (intake)          Server fronts face the cold aisle
           [perforated floor tiles]        Cool air enters server intakes
    ====================================
    |  Row B: rack fronts (intake)     |
    |  Row B: rack rears  (exhaust)    |
    ====================================
              Hot Aisle (exhaust)

Cold aisle containment: Enclose the cold aisle with doors/roof. Prevents hot air from mixing with cold supply air. Most common approach.

Hot aisle containment: Enclose the hot aisle and duct exhaust directly to CRAC/CRAH return. More efficient but harder to work in (it's hot).

Blanking Panels

Always fill empty U-spaces with blanking panels. Without them, hot exhaust recirculates through gaps to the cold aisle, creating hot spots.

Impact of missing blanking panels:
  • 1U gap: ~2-3C inlet temp increase for servers above it
  • Multiple gaps: can cause thermal shutdowns in extreme cases
  • More cooling energy wasted compensating for recirculation

Cooling Types

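Type                     | How It Works                              | Capacity (typical) | Best For
-------------------------|-------------------------------------------|--------------------|---------------------
Raised floor CRAC        | Chilled air under floor, up through tiles | 30-100 kW per unit | Traditional DC
In-row cooling           | Cooling unit between racks in the row     | 30-50 kW per unit  | Modern, high-density
Rear-door heat exchanger | Liquid-cooled door on rack rear           | 30-40 kW per rack  | Retrofit, very dense
Direct liquid            | Cold plates on CPU/GPU, liquid loop       | 100+ kW per rack   | HPC, AI/GPU clusters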

Monitoring Inlet Temperatures

Server inlet temp is the critical metric — it's what the server actually breathes.

# Read inlet temp via iDRAC Redfish
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Chassis/System.Embedded.1/Thermal \
  | jq '.Temperatures[] | select(.Name | test("Inlet")) | {Name, ReadingCelsius, Status: .Status.Health}'

# Via IPMI
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sdr type temperature

# Via racadm
racadm getsensorinfo | grep -i inlet
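To alert on the inlet reading, the `ipmitool` sensor listing can be reduced to a single number. A sketch that assumes the pipe-separated `name | id | status | entity | reading` layout ipmitool commonly prints for `sdr type temperature` (verify against your BMC's output); `inlet_c` is a hypothetical helper that reads that text on stdin and prints the first numeric inlet reading.

```shell
# inlet_c: print the first numeric inlet temperature (degrees C) found in
# `ipmitool ... sdr type temperature` output read from stdin.
# Sensors reporting "No Reading" or similar are skipped.
inlet_c() {
    awk -F'|' '
        tolower($1) ~ /inlet/ {
            split($5, f, " ")           # " 23 degrees C" -> f[1] = "23"
            if (f[1] ~ /^[0-9.]+$/) { print f[1]; exit }
        }'
}
# Usage:
#   ipmitool -I lanplus -H 10.0.10.101 -U root -P password sdr type temperature | inlet_c
```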

ASHRAE Temperature Guidelines

Class | Recommended Range | Allowable Range | Humidity  | Use Case
------|-------------------|-----------------|-----------|-------------------
A1    | 18-27C (64-81F)   | 15-32C (59-90F) | 20-80% RH | Enterprise servers
A2    | 18-27C (64-81F)   | 10-35C (50-95F) | 20-80% RH | IT equipment
A3    | 18-27C (64-81F)   | 5-40C (41-104F) | 8-85% RH  | Ruggedized
A4    | 18-27C (64-81F)   | 5-45C (41-113F) | 8-90% RH  | Extreme edge

Dell PowerEdge servers are typically rated for ASHRAE A2. Target 20-25C inlet temperature for optimal performance and component longevity.
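The ranges in the table translate directly into alert levels. A sketch for class A2, the rating cited above for PowerEdge servers, assuming whole-degree Celsius inlet readings; `a2_status` and its output words are hypothetical.

```shell
# a2_status: classify a whole-degree inlet temperature against ASHRAE A2.
# Recommended 18-27C, allowable 10-35C (values from the table above).
a2_status() {
    local t=$1
    if (( t >= 18 && t <= 27 )); then
        echo "recommended"
    elif (( t >= 10 && t <= 35 )); then
        echo "allowable"
    else
        echo "out-of-range"
    fi
}
```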

Thermal Troubleshooting

Thermal alert / server throttling
|
+-- Check inlet temp (should be 18-27C)
|   |
|   +-- Inlet > 30C?
|   |   +-- Check blanking panels (any gaps?)
|   |   +-- Check cold aisle containment (doors open?)
|   |   +-- Check CRAC/CRAH status (running? setpoint?)
|   |   +-- Check for blocked floor tiles (cables under floor?)
|   |   +-- Check neighboring rack exhaust (hot aisle leaking?)
|   |
|   +-- Inlet normal but server hot?
|       +-- Check for failed fans (iDRAC sensor / SEL)
|       +-- Check for dust buildup (front intake filters)
|       +-- Check CPU/workload (abnormal 100% CPU?)
|       +-- Check for airflow obstruction inside server (cables, loose components)

Asset Inventory & Labeling

CMDB Basics

A Configuration Management Database (CMDB) tracks every asset and its relationships. At minimum, track:

Server record:
  - Service Tag (Dell unique ID)
  - Model (PowerEdge R760, etc.)
  - Serial Number
  - Location: Room / Row / Rack / U-position
  - IP addresses: iDRAC, OS management, data interfaces
  - MAC addresses: iDRAC, NIC1-4
  - CPU: model, count, cores
  - Memory: total GB, DIMM layout
  - Storage: controller, disk count/type/size, RAID config
  - Firmware versions: BIOS, iDRAC, PERC, NIC
  - OS: distribution, version, kernel
  - Cluster membership: k8s node name, cluster name
  - Purchase date
  - Warranty expiration
  - Status: Provisioning / Active / Maintenance / Decommissioned

Labeling Standards

Physical labels on every server (front and rear):

Front label (on bezel or chassis front):
  +------------------------------------+
  |  SVR-A01-22  |  SvcTag: ABC1234    |
  |  10.0.10.101 |  iDRAC: 10.0.20.101 |
  +------------------------------------+

Rear label (on chassis rear, visible from hot aisle):
  +-----------------------------------+
  |  SVR-A01-22  |  NIC1: Eth1/3     |
  |  iDRAC port  |  NIC2: Eth1/4     |
  +-----------------------------------+

Naming Convention

Format: <role>-<rack>-<U>

Examples:
  svr-a01-22     Server in rack A01 at U22
  sw-a01-40      Switch in rack A01 at U40
  pdu-a01-a      PDU A in rack A01
  pdu-a01-b      PDU B in rack A01

iDRAC naming:
  idrac-a01-22   iDRAC for server at rack A01, U22

DNS records:
  svr-a01-22.dc1.example.com       A    10.0.10.101
  idrac-a01-22.dc1.example.com     A    10.0.20.101
  sw-a01-40.dc1.example.com        A    10.0.30.40
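Because names follow location, DNS records can be generated instead of typed by hand. A sketch that prints A records in the layout above. It takes the IP as an argument because the addressing plan cannot be derived from the name, and it defaults to the example domain `dc1.example.com`. `dns_a` is a hypothetical helper.

```shell
# dns_a: print an A record for <role>-<rack>-<U> in the given domain.
# Usage: dns_a <role> <rack> <U> <ip> [domain]
dns_a() {
    local role=$1 rack=$2 u=$3 ip=$4 domain=${5:-dc1.example.com}
    local host
    # Lowercase the rack ID with tr so this also works on older bash
    host="${role}-$(printf '%s' "$rack" | tr '[:upper:]' '[:lower:]')-${u}"
    printf '%-32s A    %s\n' "${host}.${domain}" "$ip"
}
# Usage:
#   dns_a svr A01 22 10.0.10.101
#   dns_a idrac A01 22 10.0.20.101
```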

QR/Barcode Asset Tags

Use QR codes on asset tags for fast scanning during audits:

QR content (URL to CMDB record):
  https://cmdb.example.com/asset/ABC1234

Physical label with QR:
  +---------------------------+
  |  [QR CODE]  | SVR-A01-22 |
  |             | ABC1234    |
  |             | R760       |
  +---------------------------+

iDRAC Auto-Discovery to Inventory

Automate inventory collection from iDRAC across the fleet:

#!/usr/bin/env bash
# collect-inventory.sh — Scan iDRAC IPs and dump inventory to CSV
set -euo pipefail

IDRAC_USER="${IDRAC_USER:-root}"
IDRAC_PASS="${IDRAC_PASS:?Set IDRAC_PASS}"
SUBNET="10.0.20"
OUTPUT="inventory-$(date +%Y%m%d).csv"

echo "ServiceTag,Model,BiosVersion,iDRACVersion,PowerState,MemoryGiB,CPUs,iDRACIP" > "$OUTPUT"

for octet in $(seq 101 150); do
    ip="${SUBNET}.${octet}"
    # Quick check if iDRAC is reachable
    if ! timeout 2 bash -c "echo >/dev/tcp/${ip}/443" 2>/dev/null; then
        continue
    fi

    # -f: fail on HTTP errors (e.g. 401 from bad credentials) instead of
    # handing an error body to jq
    data=$(curl -skf --connect-timeout 5 -u "${IDRAC_USER}:${IDRAC_PASS}" \
        "https://${ip}/redfish/v1/Systems/System.Embedded.1" 2>/dev/null) || continue

    idrac_ver=$(curl -skf --connect-timeout 5 -u "${IDRAC_USER}:${IDRAC_PASS}" \
        "https://${ip}/redfish/v1/Managers/iDRAC.Embedded.1" 2>/dev/null \
        | jq -r '.FirmwareVersion // "unknown"') || idrac_ver="unknown"

    # SKU carries the Service Tag on Dell systems. Under set -e a jq parse
    # error would abort the whole scan, so skip that host instead.
    if ! echo "$data" | jq -r --arg ip "$ip" --arg idrac "$idrac_ver" '[
        .SKU // "unknown",
        .Model // "unknown",
        .BiosVersion // "unknown",
        $idrac,
        .PowerState // "unknown",
        (.MemorySummary.TotalSystemMemoryGiB // 0 | tostring),
        (.ProcessorSummary.Count // 0 | tostring),
        $ip
    ] | @csv' >> "$OUTPUT"; then
        echo "Skipped: ${ip} (unparseable Redfish response)" >&2
        continue
    fi

    echo "Collected: ${ip}"
done

echo "Inventory saved to ${OUTPUT}"
echo "Total servers: $(( $(wc -l < "$OUTPUT") - 1 ))"
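The CSV this script writes also makes firmware drift easy to spot. A sketch assuming the column order above and that every data field is double-quoted: jq's `@csv` quotes strings, and the script converts the numeric fields with `tostring`. `bios_drift` is a hypothetical helper that lists each model running more than one BIOS version.

```shell
# bios_drift: report models whose servers run more than one BIOS version.
# Usage: bios_drift inventory-YYYYMMDD.csv
# Splitting on "," leaves a stray quote only on the first and last fields,
# so Model ($2) and BiosVersion ($3) come out clean.
bios_drift() {
    awk -F'","' '
        NR == 1 { next }                                  # unquoted header
        !seen[$2 SUBSEP $3]++ { versions[$2] = versions[$2] " " $3; n[$2]++ }
        END { for (m in n) if (n[m] > 1) print m ":" versions[m] }
    ' "$1"
}
```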

Rack Elevation Diagram Template

Track what's in each U of each rack:

# rack-elevation-a01.txt
# Rack: A01 | Room: DC1 | Row: A | Position: 01
# Updated: 2026-03-05
#
# U   | Equipment           | Service Tag | Power A | Power B | Notes
# ----|---------------------|-------------|---------|---------|------------------
# 42  | Cable management    | -           | -       | -       |
# 41  | Blanking panel      | -           | -       | -       |
# 40  | Dell S5248F-ON (A)  | XYZ001      | A-1     | B-1     | ToR switch A
# 39  | Dell S5248F-ON (B)  | XYZ002      | A-2     | B-2     | ToR switch B
# 38  | Patch panel 48-port | -           | -       | -       | PP-A01-A
# 37  | Patch panel 48-port | -           | -       | -       | PP-A01-B
# 36  | Cable management    | -           | -       | -       |
# 35  | Dell R760 (2U top)  | ABC008      | A-3     | B-3     | svr-a01-34
# 34  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 33  | Dell R760 (2U top)  | ABC007      | A-4     | B-4     | svr-a01-32
# 32  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 31  | Dell R760 (2U top)  | ABC006      | A-5     | B-5     | svr-a01-30
# 30  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 29  | Dell R760 (2U top)  | ABC005      | A-6     | B-6     | svr-a01-28
# 28  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 27  | Dell R760 (2U top)  | ABC004      | A-7     | B-7     | svr-a01-26
# 26  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 25  | Dell R760 (2U top)  | ABC003      | A-8     | B-8     | svr-a01-24
# 24  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 23  | Dell R760 (2U top)  | ABC002      | A-9     | B-9     | svr-a01-22
# 22  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 21  | Dell R760 (2U top)  | ABC001      | A-10    | B-10    | svr-a01-20
# 20  | Dell R760 (2U bot)  | (cont.)     |         |         |
# 19  | Blanking panel      | -           | -       | -       |
# ..  | Blanking panels     | -           | -       | -       | Expansion space
#  4  | Blanking panel      | -           | -       | -       |
#  3  | Cable management    | -           | -       | -       |
#  2  | Shelf (crash cart)  | -           | -       | -       |
#  1  | Blanking panel      | -           | -       | -       |
#
# PDUs (0U vertical mount, rear):
#   PDU-A: APC AP8886 (metered), Feed A, Circuit A-01
#   PDU-B: APC AP8886 (metered), Feed B, Circuit B-01
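Hand-edited elevation files drift from reality; a quick lint catches a U listed twice or beyond the rack height. A sketch assuming the `# <U>  | ...` row layout of the template above and a 42U rack; `check_elevation` is a hypothetical helper that exits non-zero when it finds a problem.

```shell
# check_elevation: flag duplicate or out-of-range U numbers in an elevation file.
# Only rows like "# 35  | Dell R760 ..." are checked; header, "..", and
# PDU comment lines do not match the pattern and are ignored.
check_elevation() {
    awk -F'|' -v max=42 '
        $1 ~ /^#[ \t]*[0-9]+[ \t]*$/ {
            u = $1; gsub(/[^0-9]/, "", u); u += 0
            if (u < 1 || u > max) { print "out of range: U" u; bad = 1 }
            if (seen[u]++)        { print "duplicate: U" u;    bad = 1 }
        }
        END { exit bad }
    ' "$1"
}
```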