Dell Server Management¶
Reference guide for managing Dell PowerEdge servers in a DevOps context. Covers iDRAC, BIOS configuration, OpenManage, hardware diagnostics, and RAID/storage.
See Also¶
- Dell PowerEdge Deep Dive — conceptual guide and interview preparation covering RAID internals, iDRAC architecture, firmware management theory, and common failure modes
Mental Model¶
[Your Automation ] Ansible, Redfish scripts, OMECLI
|
[Management Plane ] iDRAC9 / Lifecycle Controller / OpenManage Enterprise
|
[Firmware Layer ] BIOS/UEFI, PERC, NIC firmware, iDRAC firmware
|
[Hardware ] CPU, DIMMs, disks, PSUs, fans, backplane
|
[Physical ] Rack, cabling, power, cooling
Everything above the hardware layer is firmware you can update, configure, and automate remotely. The management plane (iDRAC) is your primary interface: it stays reachable even when the OS is down, as long as the chassis has standby power.
iDRAC & Lifecycle Controller¶
What iDRAC Is¶
iDRAC (integrated Dell Remote Access Controller) is a dedicated BMC (Baseboard Management Controller) with its own network interface, CPU, and memory. It runs independently of the host OS. Think of it as a tiny always-on management computer embedded in every Dell server.
Key capabilities:
- Remote console (virtual KVM): full BIOS-to-OS access from a browser
- Virtual media: mount ISOs remotely for OS installs
- Power control: power on/off/cycle/graceful shutdown
- Sensor monitoring: temps, voltages, fan speeds, power draw
- Alerting: SNMP traps, email, syslog forwarding, Redfish events
- Firmware updates: flash BIOS, PERC, NIC, iDRAC itself
- Lifecycle Controller: embedded deployment and update tool
iDRAC Versions¶
| Generation | iDRAC Version | Key Additions |
|---|---|---|
| 12G (R620) | iDRAC7 | Basic HTML5 console, RACADM |
| 13G (R630) | iDRAC8 | Improved HTML5, Redfish v1 (partial) |
| 14G (R640) | iDRAC9 | Full Redfish, Telemetry streaming, group management |
| 15G (R650) | iDRAC9 | Enhanced Redfish, zero-touch provisioning |
| 16G (R760) | iDRAC9 | Telemetry enhancements, security hardening |
RACADM CLI¶
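racadm is the iDRAC command-line interface. It runs in three ways: locally on the managed server (installed in the host OS), inside an SSH session to the iDRAC itself, or remotely from a management station with -r. Commands below without -r assume a local or SSH session.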
# Remote connection
racadm -r 192.168.1.100 -u root -p <password> getversion
# Get system identification (model, service tag, BIOS version)
racadm -r 192.168.1.100 -u root -p <password> get BIOS.SysInformation
# Get iDRAC network config
racadm get iDRAC.NIC
# Set static IP on iDRAC
racadm set iDRAC.IPv4.Address 10.0.10.50
racadm set iDRAC.IPv4.Netmask 255.255.255.0
racadm set iDRAC.IPv4.Gateway 10.0.10.1
racadm set iDRAC.IPv4.DHCPEnable Disabled
# Power operations
racadm serveraction powerup
racadm serveraction powerdown
racadm serveraction powercycle
racadm serveraction graceshutdown
# Get system event log
racadm getsel
# Clear system event log
racadm clrsel
# Get sensor readings
racadm getsensorinfo
# Get storage controller info
racadm storage get controllers
# Get virtual disk info
racadm storage get vdisks
# Schedule a BIOS configuration job
racadm jobqueue create BIOS.Setup.1-1
# Export server config profile (SCP) to file
racadm get -t xml -f server_config.xml
# Import server config profile
racadm set -t xml -f server_config.xml
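Repeating -r/-u/-p on every remote call is tedious, and it leaves the password in shell history. Below is a minimal sketch of a wrapper that reads credentials from environment variables, plus a loop that runs one command against several iDRACs. IDRAC_HOST, IDRAC_USER, IDRAC_PASS and the hosts-file layout are hypothetical choices for this example. racadm still receives the password through -p, so it remains visible in the process list while the command runs.

```bash
#!/usr/bin/env bash
# Remote racadm wrapper. IDRAC_HOST, IDRAC_USER and IDRAC_PASS are
# hypothetical environment variable names chosen for this sketch.
idrac() {
  if [ -z "${IDRAC_HOST:-}" ] || [ -z "${IDRAC_USER:-}" ] || [ -z "${IDRAC_PASS:-}" ]; then
    echo "idrac: set IDRAC_HOST, IDRAC_USER and IDRAC_PASS" >&2
    return 1
  fi
  racadm -r "$IDRAC_HOST" -u "$IDRAC_USER" -p "$IDRAC_PASS" "$@"
}

# Run one racadm subcommand against every iDRAC listed in a file (one host
# per line), prefixing each output line with the host it came from.
idrac_each() {
  local hosts_file=$1 host
  shift
  while read -r host; do
    [ -z "$host" ] && continue               # skip blank lines
    # </dev/null keeps racadm from consuming the host list on stdin
    IDRAC_HOST=$host idrac "$@" </dev/null | sed "s/^/${host}: /"
  done < "$hosts_file"
}

# Example: idrac_each idracs.txt getversion
```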
Redfish API¶
Redfish is the modern REST API standard for server management (DMTF standard). iDRAC9 has comprehensive Redfish support.
# Base discovery
curl -sk -u root:password https://idrac-ip/redfish/v1/ | jq .
# System inventory
curl -sk -u root:password https://idrac-ip/redfish/v1/Systems/System.Embedded.1 | jq '{
Model: .Model,
SerialNumber: .SerialNumber,
BiosVersion: .BiosVersion,
PowerState: .PowerState,
TotalMemoryGiB: .MemorySummary.TotalSystemMemoryGiB,
ProcessorCount: .ProcessorSummary.Count
}'
# Get all storage controllers
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage | jq '.Members[]."@odata.id"'
# Get system event log entries
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Entries \
| jq '.Members[] | {Created, Message, Severity}'
# Hard reset (ForceRestart) via Redfish
curl -sk -u root:password \
-X POST https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset \
-H 'Content-Type: application/json' \
-d '{"ResetType": "ForceRestart"}'
# Set next one-time boot to PXE
curl -sk -u root:password \
-X PATCH https://idrac-ip/redfish/v1/Systems/System.Embedded.1 \
-H 'Content-Type: application/json' \
-d '{"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}'
# Get iDRAC firmware version
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1 | jq .FirmwareVersion
# List firmware inventory (all components)
curl -sk -u root:password \
https://idrac-ip/redfish/v1/UpdateService/FirmwareInventory | jq '.Members[]."@odata.id"'
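Every call above repeats the same curl flags. Below is a small helper that takes the method, path, and optional JSON body. It is a sketch only: the environment variable names are hypothetical, and -k is kept to match the examples above, which assume iDRAC's default self-signed certificate.

```bash
#!/usr/bin/env bash
# Minimal Redfish helper: rf <METHOD> <path> [json-body]
# IDRAC_HOST, IDRAC_USER and IDRAC_PASS are hypothetical env var names.
# -k skips TLS verification; drop it once iDRAC has a trusted certificate.
rf() {
  local method=$1 path=$2 body=${3:-}
  local args=(-sk -u "${IDRAC_USER}:${IDRAC_PASS}" -X "$method")
  if [ -n "$body" ]; then
    args+=(-H 'Content-Type: application/json' -d "$body")
  fi
  curl "${args[@]}" "https://${IDRAC_HOST}${path}"
}

# Examples (require a reachable iDRAC):
#   rf GET /redfish/v1/Managers/iDRAC.Embedded.1 | jq -r .FirmwareVersion
#   rf POST /redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset '{"ResetType": "ForceRestart"}'
```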
Alerting Setup¶
Configure iDRAC to send alerts for critical events:
# Enable email alerts
racadm set iDRAC.EmailAlert.1.Enable Enabled
racadm set iDRAC.EmailAlert.1.Address ops-team@example.com
racadm set iDRAC.EmailAlert.1.CustomMsg "Dell Server Alert"
# Configure SMTP
racadm set iDRAC.RemoteHosts.SMTPServerIPAddress smtp.example.com
racadm set iDRAC.RemoteHosts.SMTPPort 587
# Enable SNMP traps
racadm set iDRAC.SNMP.TrapFormat SNMPv2
racadm set iDRAC.SNMPAlert.1.Enable Enabled
racadm set iDRAC.SNMPAlert.1.DestAddr 10.0.10.200
# Forward to syslog
racadm set iDRAC.SysLog.SysLogEnable Enabled
racadm set iDRAC.SysLog.Server1 10.0.10.201
racadm set iDRAC.SysLog.Port 514
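The alerting setup above is just a list of attribute/value pairs, so it can be kept as data and replayed on every server. The sketch below assumes a hypothetical plain-text settings file with one "Attribute Value" pair per line. Everything after the first space is treated as the value, so messages that contain spaces survive. Values should not be quoted in the file, and lines starting with # are ignored.

```bash
#!/usr/bin/env bash
# Apply "racadm set" commands from a settings file.
# File format (hypothetical, for this sketch): "<Attribute> <Value>" per line;
# blank lines and lines starting with '#' are skipped.
# Returns 1 if any individual set command failed.
apply_racadm_settings() {
  local file=$1 attr value failed=0
  while read -r attr value; do
    case "$attr" in ''|'#'*) continue ;; esac
    if ! racadm set "$attr" "$value" </dev/null; then
      echo "failed: $attr" >&2
      failed=1
    fi
  done < "$file"
  return "$failed"
}

# Example file contents:
#   iDRAC.SysLog.SysLogEnable Enabled
#   iDRAC.EmailAlert.1.CustomMsg Dell Server Alert
```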
Lifecycle Controller¶
The Lifecycle Controller (LC) is a persistent embedded tool in iDRAC that handles:
- OS Deployment: Built-in OS install wizard (mount ISO, configure RAID, install)
- Firmware Updates: Pull from Dell.com or local repository, apply without OS
- Hardware Diagnostics: Run ePSA/eSA diagnostics remotely
- Server Configuration Profiles (SCP): Export/import full server config as XML/JSON
To open the LC interface, press F10 during POST, either at the local console or through the iDRAC virtual console.
SCPs are powerful for fleet standardization:
# Export full SCP via Redfish
curl -sk -u root:password \
-X POST https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Oem/EID_674_Manager.ExportSystemConfiguration \
-H 'Content-Type: application/json' \
-d '{
"ExportFormat": "JSON",
"ShareParameters": {
"Target": "ALL"
}
}'
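# Import SCP to configure a new server identically. The action takes the profile
# as a string in ImportBuffer, so wrap the exported file instead of posting it raw.
jq -n --rawfile scp server_config.json \
'{ImportBuffer: $scp, ShareParameters: {Target: "ALL"}, ShutdownType: "Graceful"}' \
| curl -sk -u root:password \
-X POST https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Oem/EID_674_Manager.ImportSystemConfiguration \
-H 'Content-Type: application/json' \
-d @-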
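Both SCP actions run asynchronously. iDRAC accepts the request and returns a Location header pointing at a job whose ID starts with JID_. Automation then has to wait for that job before moving on. The sketch below pulls the job ID out of response headers and polls it. The Jobs URI, the JobState values it matches, and the environment variable names are assumptions to check against your iDRAC firmware's Redfish documentation.

```bash
#!/usr/bin/env bash
# Extract the job ID (JID_...) from HTTP response headers read on stdin,
# e.g. the output of: curl -sk -D - -o /dev/null ...
job_id_from_headers() {
  tr -d '\r' | sed -n 's#^[Ll]ocation:.*/\(JID_[0-9A-Za-z]*\)$#\1#p'
}

# Poll a Dell job until it finishes. IDRAC_HOST, IDRAC_USER and IDRAC_PASS are
# hypothetical env var names; the JobState values matched below are assumed.
# Returns 0 on success, 1 on failure, 2 after max_polls attempts (default 90).
wait_for_job() {
  local jid=$1 max_polls=${2:-90} interval=${3:-10} state i
  for (( i = 0; i < max_polls; i++ )); do
    state=$(curl -sk -u "${IDRAC_USER}:${IDRAC_PASS}" \
      "https://${IDRAC_HOST}/redfish/v1/Managers/iDRAC.Embedded.1/Jobs/${jid}" \
      | jq -r '.JobState')
    echo "${jid}: ${state:-unknown}" >&2
    case "$state" in
      Completed) return 0 ;;
      Failed|CompletedWithErrors) return 1 ;;
    esac
    sleep "$interval"
  done
  echo "${jid}: timed out" >&2
  return 2
}
```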
BIOS/UEFI Configuration¶
Boot Modes¶
| Mode | Description | When to Use |
|---|---|---|
| UEFI | Modern boot with GPT, Secure Boot, fast POST | Default for all new deployments |
| Legacy | MBR-based BIOS boot | Only for legacy OS/tools |
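Changing boot mode switches to a separate boot sequence list, and an OS installed in one mode generally will not boot in the other (UEFI installs use GPT and an EFI System Partition; legacy installs use MBR). Choose the mode before OS install.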
Golden BIOS Config Checklist¶
Use this as a baseline for new server intake:
Boot Mode:
[x] UEFI mode enabled
[x] Secure Boot enabled (if OS supports it)
[x] Boot order: 1) PXE NIC1 2) RAID VD0 3) UEFI Shell
Performance:
[x] System Profile: Performance (not "Energy Efficient" for compute nodes)
[x] Logical Processor (Hyperthreading): Enabled
[x] Virtualization Technology (VT-x): Enabled
[x] SR-IOV Global Enable: Enabled (if using network passthrough)
[x] NUMA optimization: Enabled
[x] x2APIC: Enabled
Memory:
[x] Memory Operating Mode: Optimizer Mode
[x] Node Interleaving: Disabled (let NUMA manage it)
[x] Memory Refresh Rate: 1x (default, unless error-prone DIMMs)
Security:
[x] TPM 2.0: Enabled (used by BitLocker and measured boot; Secure Boot itself does not require a TPM)
[x] TPM Security: On with Pre-boot Measurements
[x] AC Power Recovery: Last (or On, for headless servers)
[x] System Password: Set (prevent local BIOS changes)
I/O:
[x] Embedded NIC1: Enabled with PXE
[x] Embedded NIC2-4: Enabled, PXE disabled
[x] Slot disablement: Disable unused PCIe slots
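A checklist like this is easiest to enforce when it is kept as data and compared against each server. The drift-check sketch below assumes both the golden baseline and the current values are available as Attribute=Value lines. Current values could, for example, be produced from the Redfish Bios resource with jq -r '.Attributes | to_entries[] | "\(.key)=\(.value)"'. The sample attribute names are the Dell BIOS names used elsewhere in this guide. Associative arrays require bash 4 or later.

```bash
#!/usr/bin/env bash
# Report BIOS attributes whose current value differs from the golden baseline.
# Both files hold "Attribute=Value" lines (format assumed for this sketch).
# Prints "Attribute: expected=X actual=Y" per drifted attribute; returns 1 on drift.
bios_drift() {
  local golden=$1 current=$2 key value drift=0
  declare -A actual=()
  while IFS='=' read -r key value; do
    [ -n "$key" ] && actual["$key"]=$value
  done < "$current"
  while IFS='=' read -r key value; do
    case "$key" in ''|'#'*) continue ;; esac
    if [ "${actual[$key]-<missing>}" != "$value" ]; then
      echo "${key}: expected=${value} actual=${actual[$key]-<missing>}"
      drift=1
    fi
  done < "$golden"
  return "$drift"
}
```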
Applying BIOS Settings via Automation¶
# Via RACADM
racadm set BIOS.SysProfileSettings.SysProfile PerfOptimized
racadm set BIOS.ProcSettings.LogicalProc Enabled
racadm set BIOS.ProcSettings.ProcVirtualization Enabled
racadm set BIOS.SysSecurity.TpmSecurity On
# Create a BIOS config job (applies on next reboot)
racadm jobqueue create BIOS.Setup.1-1
racadm serveraction powercycle
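# Via Redfish: PATCH stages the values as pending in Bios/Settings
curl -sk -u root:password \
-X PATCH https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Bios/Settings \
-H 'Content-Type: application/json' \
-d '{
"Attributes": {
"SysProfile": "PerfOptimized",
"LogicalProc": "Enabled",
"ProcVirtualization": "Enabled",
"TpmSecurity": "On"
}
}'
# Create a BIOS config job for the pending values (runs at the next reboot)
curl -sk -u root:password \
-X POST https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/Jobs \
-H 'Content-Type: application/json' \
-d '{"TargetSettingsURI": "/redfish/v1/Systems/System.Embedded.1/Bios/Settings"}'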
Dell OpenManage¶
OpenManage Enterprise (OME)¶
OME is the fleet management console for Dell servers. It runs as a virtual appliance (VM) and manages hundreds to thousands of servers from a single pane.
Key capabilities:
- Discovery: Auto-discover servers by IP range, subnet, or Active Directory
- Inventory: Hardware inventory across the fleet (CPU, memory, disks, firmware)
- Compliance Baselines: Define firmware/config baselines, detect drift, remediate
- Firmware Updates: Batch update firmware using Dell Update Packages (DUPs)
- Alerting: Aggregate alerts from all managed servers
- Templates: Golden server config templates, deployable to new hardware
- Warranty: Pull warranty status from Dell.com API
- Reports: Hardware lifecycle, compliance, inventory reports
OME vs Alternatives¶
| Feature | OME | MAAS (Canonical) | Foreman/Katello |
|---|---|---|---|
| Dell integration | Native, deepest | Generic power drivers (IPMI, Redfish) | Plugin (IPMI + smart_proxy) |
| Discovery | Auto (Redfish) | PXE auto-enlistment | PXE-based discovery |
| Firmware updates | Native (DUPs) | Not built-in | Katello content views |
| OS provisioning | Basic (LC) | Excellent | Excellent |
| Config management | SCP/templates | Cloud-init | Puppet/Ansible |
| Scale | 8000+ servers | ~1000 machines | Large scale |
| Cost | Free with warranty | Free | Free |
| Best for | Dell-only fleet | Multi-vendor, Ubuntu | Multi-vendor, RHEL |
OMECLI (Command Line)¶
# Discover servers by IP range
omecli discover --range 10.0.10.1-10.0.10.50 --protocol REDFISH
# List managed devices
omecli device list --format json | jq '.[] | {Name, ServiceTag, PowerState}'
# Check firmware compliance
omecli baseline compliance --baseline-name "Production-R660" --format json
# Update firmware on a group
omecli firmware update --group "Rack-A" --catalog "Dell-Latest"
# Export inventory report
omecli report run --report-name "Full Inventory" --format csv --output inventory.csv
Hardware Diagnostics¶
System Event Log (SEL)¶
The SEL is your first stop for hardware issues. It's stored in iDRAC non-volatile memory and persists across reboots and OS reinstalls.
# View SEL via RACADM
racadm getsel
# Example output:
# 1 | 03/05/2026 | 14:23:01 | Critical | Memory | DIMM A1 correctable ECC error rate exceeded
# 2 | 03/05/2026 | 14:23:45 | Warning | Storage | Physical Disk 0:1:2 predicted failure
# View SEL via Redfish
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Entries \
| jq '.Members[] | select(.Severity != "OK") | {Created, Message, Severity}'
# View Lifecycle Log (more detail than SEL)
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Lclog/Entries \
| jq '.Members[:20] | .[] | {Created, Message}'
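To pull only actionable entries out of output in the pipe-delimited layout shown in the example above, a small awk filter is enough. Real racadm getsel formatting varies by iDRAC firmware, so treat that layout, and the Ok < Warning < Critical ranking, as assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Print SEL entries at or above a severity threshold (default Warning) from lines like:
#   1 | 03/05/2026 | 14:23:01 | Critical | Memory | DIMM A1 ...
# Output: "<Severity>: <Component> - <Message> (<date> <time>)"
sel_filter() {
  local min=${1:-Warning}
  awk -F' *[|] *' -v min="$min" '
    BEGIN { rank["Ok"] = 0; rank["Warning"] = 1; rank["Critical"] = 2 }
    NF >= 6 && ($4 in rank) && rank[$4] >= rank[min] {
      print $4 ": " $5 " - " $6 " (" $2 " " $3 ")"
    }'
}

# Example: racadm getsel | sel_filter Critical
```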
LED Indicators¶
| LED Color / Pattern | Meaning | Action |
|---|---|---|
| Solid blue (system health/ID, 14G and later) | System on and healthy | Normal operation |
| Blinking blue | System ID active (locate in rack) | Informational only |
| Solid amber | System in fail-safe mode | Check SEL for details |
| Blinking amber | System fault | Check SEL, may need replacement |
| No LED, no power | PSU failure or no AC | Check power, PSU, PDU |
| Fast blinking amber on disk status LED | Drive failed | Check PERC status, replace |
| Green, then amber, then off on disk status LED | Predicted drive failure | Check PERC status, plan replace |
| Slow blinking green on disk status LED | Rebuild in progress | Monitor rebuild |
| Solid green on disk | Disk online, healthy | Normal operation |
| Blinking green on disk activity LED | Disk activity (I/O) | Normal operation |
Diagnostic Decision Tree¶
Server is unhealthy / alert fired
|
+-- Can you reach iDRAC?
| |
| +-- YES: Check SEL (racadm getsel / Redfish LogServices)
| | |
| | +-- Memory errors (ECC) -> Check DIMM slot, plan replacement
| | +-- Disk predictive failure -> Check PERC, hot spare, plan replace
| | +-- PSU fault -> Check redundancy, swap PSU
| | +-- Temp warning -> Check fans, airflow, ambient temp
| | +-- CPU machine check -> Reseat CPU, escalate to Dell
| | +-- PERC battery low -> Replace PERC battery/capacitor
| |
| +-- NO: Check iDRAC network config, cable, iDRAC power
| |
| +-- iDRAC LED off -> Check dedicated iDRAC port, try shared NIC mode
| +-- iDRAC LED on but no ping -> VLAN/IP misconfiguration
|
+-- Server won't POST?
|
+-- Check front panel LEDs
+-- Check PSU LEDs (both A and B feeds)
+-- Remove recent hardware changes (new DIMMs, cards)
+-- Reseat components, try minimal config (1 CPU, 1 DIMM, no drives)
+-- If still no POST -> motherboard or CPU failure, contact Dell
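For alert routing, the "iDRAC reachable" branch of the tree can be approximated as a keyword lookup on the SEL or Lifecycle Log message. This is a sketch only. The keyword patterns are illustrative assumptions, not Dell's message catalog. Order matters: PSU patterns are checked before disk patterns, so a predicted PSU failure is not mistaken for a disk issue.

```bash
#!/usr/bin/env bash
# Map an SEL message to the first-response action from the decision tree.
# Keyword patterns are illustrative assumptions; the first match wins.
triage_sel_message() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *ecc*|*dimm*|*memory*)         echo "memory: check DIMM slot, plan replacement" ;;
    *psu*|*"power supply"*)        echo "psu: check redundancy, swap PSU" ;;
    *predict*|*disk*|*drive*)      echo "disk: check PERC and hot spare, plan replacement" ;;
    *temperature*|*thermal*|*fan*) echo "thermal: check fans, airflow, ambient temperature" ;;
    *"machine check"*|*cpu*)       echo "cpu: reseat CPU, escalate to Dell" ;;
    *battery*)                     echo "perc: replace PERC battery/capacitor" ;;
    *)                             echo "unknown: review full SEL and Lifecycle Log" ;;
  esac
}
```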
Common Hardware Issues¶
Memory (ECC errors):
# Check DIMM health via Redfish (jq -r emits raw paths, without JSON quotes, for the inner URLs)
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Memory \
| jq -r '.Members[]."@odata.id"' | while read -r dimm; do
curl -sk -u root:password "https://idrac-ip${dimm}" \
| jq '{Name: .Name, Status: .Status.Health, SizeGiB: (.CapacityMiB/1024)}'
done
Disk health (SMART + PERC):
# List physical disks with health
racadm storage get pdisks -o -p Status,State,MediaType,Size,Model
# Check drive health and SSD wear via Redfish (jq -r emits raw paths for the inner URLs)
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1 \
| jq -r '.Drives[]."@odata.id"' | while read -r drive; do
curl -sk -u root:password "https://idrac-ip${drive}" \
| jq '{Name: .Name, Model: .Model, CapacityGB: (.CapacityBytes/1073741824|round), Status: .Status.Health, PredictedLifeLeft: .PredictedMediaLifeLeftPercent}'
done
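To turn the per-drive loop into an alert, change its inner jq filter to jq -r '[.Name, .PredictedMediaLifeLeftPercent] | @tsv', which emits tab-separated pairs, and pipe the result through a threshold check. The sketch below uses an arbitrary 20% default threshold. Spinning disks usually report no value, which appears as an empty field and is skipped rather than treated as worn out.

```bash
#!/usr/bin/env bash
# Read "<drive name>\t<percent life left>" lines on stdin and print drives at
# or below the threshold (default 20%, an arbitrary choice for this sketch).
# Empty percentages (typical for spinning disks) are skipped.
worn_drives() {
  local threshold=${1:-20}
  awk -F'\t' -v t="$threshold" '$2 != "" && $2 + 0 <= t + 0 { print $1 " (" $2 "% life left)" }'
}
```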
RAID & Storage (PERC Controller)¶
RAID Levels Quick Reference¶
| Level | Min Disks | Capacity | Fault Tolerance | Use Case |
|---|---|---|---|---|
| RAID 0 | 2 | N * disk | None | Scratch/temp (never prod) |
| RAID 1 | 2 | 50% of total | 1 disk | Boot drives, OS mirrors |
| RAID 5 | 3 | (N-1) * disk | 1 disk | Read-heavy, moderate data |
| RAID 6 | 4 | (N-2) * disk | 2 disks | Large arrays, write-light |
| RAID 10 | 4 | 50% of total | 1 per mirror | Databases, write-heavy |
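The Capacity column reduces to simple arithmetic. The helper below encodes the table's formulas and minimum disk counts. It ignores controller metadata overhead and accepts integer sizes only; the result is in whatever unit you pass for the per-disk size.

```bash
#!/usr/bin/env bash
# Usable capacity per the RAID table, ignoring controller metadata.
# Usage: raid_usable <level> <disk count> <size per disk>
raid_usable() {
  local level=$1 n=$2 size=$3 min
  case "$level" in
    0|1)  min=2 ;;
    5)    min=3 ;;
    6|10) min=4 ;;
    *) echo "unsupported RAID level: $level" >&2; return 2 ;;
  esac
  if (( n < min )); then
    echo "RAID $level needs at least $min disks" >&2; return 1
  fi
  case "$level" in
    0)    echo $(( n * size )) ;;
    1|10) if (( n % 2 )); then
            echo "RAID $level needs an even number of disks" >&2; return 1
          fi
          echo $(( n * size / 2 )) ;;   # half the raw capacity holds mirror copies
    5)    echo $(( (n - 1) * size )) ;;  # one disk's worth of parity
    6)    echo $(( (n - 2) * size )) ;;  # two disks' worth of parity
  esac
}
```

For example, four 2000 GB disks in RAID 5 give raid_usable 5 4 2000, which prints 6000.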
PERC Controller Management¶
Dell PERC (PowerEdge RAID Controller) cards are managed from the CLI with perccli, Dell's rebranded build of Broadcom storcli; the command syntax is the same.
# Show controller summary
perccli /c0 show
# List all virtual disks
perccli /c0/vall show
# List all physical disks
perccli /c0/eall/sall show
# Show detailed disk info (including SMART)
perccli /c0/e252/s0 show all
# Create RAID 1 from two disks (slot 0 and 1 on enclosure 252)
perccli /c0 add vd type=r1 drives=252:0,1
# Create RAID 5 from four disks with write-back cache and read-ahead
perccli /c0 add vd type=r5 drives=252:0-3 wb ra
# Create RAID 10 from six disks
perccli /c0 add vd type=r10 drives=252:0-5 pdperarray=2
# Configure a global hot spare
perccli /c0/e252/s6 add hotsparedrive
# Configure a dedicated hot spare for drive group 0
perccli /c0/e252/s6 add hotsparedrive dgs=0
# Check rebuild progress (reported per physical drive)
perccli /c0/eall/sall show rebuild
# Start consistency check
perccli /c0/v0 start cc
# Set cache policy (write-back vs write-through)
perccli /c0/v0 set wrcache=wb
perccli /c0/v0 set rdcache=ra
# Locate a physical disk (blink LED)
perccli /c0/e252/s0 start locate
# Stop disk LED
perccli /c0/e252/s0 stop locate
# Check battery/capacitor backup unit
perccli /c0/cv show all
RAID via Redfish¶
# List storage controllers
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage \
| jq '.Members[]."@odata.id"'
# Get RAID controller details
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1 \
| jq '{Name, Status: .Status.Health, SpeedGbps: .StorageControllers[0].SpeedGbps, Volumes: [.Volumes."@odata.id"]}'
# List volumes (virtual disks)
curl -sk -u root:password \
https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Volumes \
| jq '.Members[]."@odata.id"'
# Create a RAID 1 volume via Redfish
curl -sk -u root:password \
-X POST https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Volumes \
-H 'Content-Type: application/json' \
-d '{
"VolumeType": "Mirrored",
"Name": "OS-Mirror",
"Drives": [
{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Drives/Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1"},
{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Drives/Disk.Bay.1:Enclosure.Internal.0-1:RAID.Integrated.1-1"}
]
}'
RAID Operations Runbook¶
Replacing a failed disk:
1. Identify the failed disk: perccli /c0/eall/sall show (look for "Failed" or "Offline")
2. Blink the disk LED: perccli /c0/e252/s2 start locate
3. Physically replace the disk (hot-swappable on most PowerEdge)
4. If a hot spare was configured, rebuild starts automatically
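5. If no hot spare was configured, PERC normally starts rebuilding onto a replacement drive in the same slot automatically; if it does not, assign the new drive as a dedicated spare for the drive group (perccli /c0/e252/s2 add hotsparedrive dgs=0) or create a new VD
6. Monitor rebuild: perccli /c0/e252/s2 show rebuild
7. Rough timing: a 1TB 10K SAS drive rebuilds in about 2-4 hours; an 8TB NL-SAS drive can take 12-24+ hours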
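The timings in step 7 follow from time ≈ capacity / sustained rebuild rate. Assume a rate of about 100 MB/s, which is an illustrative figure and not a Dell specification. A 1 TB drive then takes 1,000,000 MB / 100 MB/s = 10,000 s, about 2.8 hours, and an 8 TB drive takes about 22 hours. Real rates depend on drive type, the PERC rebuild-rate setting, and foreground I/O. A quick estimator:

```bash
#!/usr/bin/env bash
# Rough rebuild time in hours: capacity / sustained rebuild rate.
# Usage: rebuild_hours <capacity_GB> <rate_MB_per_s>   (1 GB = 1000 MB here)
rebuild_hours() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 3600 }'
}
```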
Online capacity expansion:
# Add a disk to an existing RAID 5 (online: migrate to the same level with an added drive)
perccli /c0/v0 start migrate type=r5 option=add drives=252:4
# Monitor expansion
perccli /c0/v0 show migrate
Migrating RAID levels:
# Migrate RAID 5 to RAID 6 (requires an additional disk)
perccli /c0/v0 start migrate type=r6 option=add drives=252:4
Ansible Integration (dellemc.openmanage)¶
The dellemc.openmanage Ansible collection provides modules for automating iDRAC management at scale.
Example Playbook: Server Inventory¶
---
- name: Gather Dell server inventory
  hosts: idrac_hosts
  gather_facts: false
  connection: local
  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"
    validate_certs: false
  tasks:
    - name: Get system inventory
      dellemc.openmanage.idrac_system_info:
        # Inventory hostnames are labels; the iDRAC address is in ansible_host
        idrac_ip: "{{ ansible_host | default(inventory_hostname) }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        validate_certs: "{{ validate_certs }}"
      register: sys_info
    - name: Display system summary
      ansible.builtin.debug:
        msg: >
          {{ sys_info.system_info.System[0].Model }}
          | SN: {{ sys_info.system_info.System[0].ServiceTag }}
          | BIOS: {{ sys_info.system_info.BIOS[0].BIOSReleaseDate }}
          | iDRAC: {{ sys_info.system_info.iDRAC[0].FirmwareVersion }}
Example Playbook: Firmware Update¶
---
- name: Update firmware from Dell repository
  hosts: idrac_hosts
  gather_facts: false
  connection: local
  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"
  tasks:
    - name: Update firmware from Dell.com catalog
      dellemc.openmanage.idrac_firmware:
        idrac_ip: "{{ ansible_host | default(inventory_hostname) }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        validate_certs: false  # iDRAC ships with a self-signed certificate
        share_name: "https://downloads.dell.com"
        reboot: true
        job_wait: true
        catalog_file_name: "Catalog.xml"
      register: firmware_result
    - name: Show update results
      ansible.builtin.debug:
        var: firmware_result
Example Playbook: BIOS Configuration¶
---
- name: Apply golden BIOS config
  hosts: idrac_hosts
  gather_facts: false
  connection: local
  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"
  tasks:
    - name: Set performance BIOS attributes
      dellemc.openmanage.idrac_bios:
        idrac_ip: "{{ ansible_host | default(inventory_hostname) }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        validate_certs: false
        # Stage the changes as a job that runs on the next host reboot
        apply_time: OnReset
        attributes:
          SysProfile: "PerfOptimized"
          LogicalProc: "Enabled"
          ProcVirtualization: "Enabled"
          TpmSecurity: "On"
          BootMode: "Uefi"
      register: bios_result
    # Reboot the host (not the iDRAC) so the pending BIOS job runs
    - name: Reboot to apply BIOS changes
      dellemc.openmanage.redfish_powerstate:
        baseuri: "{{ ansible_host | default(inventory_hostname) }}"
        username: "{{ idrac_user }}"
        password: "{{ idrac_password }}"
        validate_certs: false
        reset_type: GracefulRestart
      when: bios_result.changed
iDRAC Inventory File¶
# inventory/idrac-hosts.yml
all:
  children:
    idrac_hosts:
      hosts:
        rack-a-server-01:
          ansible_host: 10.0.10.101
        rack-a-server-02:
          ansible_host: 10.0.10.102
        rack-a-server-03:
          ansible_host: 10.0.10.103
      vars:
        ansible_connection: local
        ansible_python_interpreter: /usr/bin/python3
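For larger racks, writing this file by hand gets error-prone. The sketch below generates an inventory with the same group layout from "name ip" lines, for instance a list exported from a CMDB. The idrac_hosts group name and the vars are copied from the example above. The input format is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Emit an Ansible YAML inventory (idrac_hosts group, same layout as the
# example inventory) from "<name> <idrac-ip>" lines on stdin.
make_idrac_inventory() {
  local name ip
  printf 'all:\n  children:\n    idrac_hosts:\n      hosts:\n'
  while read -r name ip; do
    [ -z "$name" ] && continue
    printf '        %s:\n          ansible_host: %s\n' "$name" "$ip"
  done
  printf '      vars:\n        ansible_connection: local\n        ansible_python_interpreter: /usr/bin/python3\n'
}

# Example: make_idrac_inventory < racks.txt > inventory/idrac-hosts.yml
```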
See Also¶
- Dell PowerEdge Deep Dive — conceptual knowledge, interview patterns, architecture mental models, and troubleshooting frameworks. Use the deep-dive when you need to understand Dell systems; use this guide when you need to operate them.