Dell Server Management

Reference guide for managing Dell PowerEdge servers in a DevOps context. Covers iDRAC, BIOS configuration, OpenManage, hardware diagnostics, and RAID/storage.

See Also

  • Dell PowerEdge Deep Dive — conceptual guide and interview preparation covering RAID internals, iDRAC architecture, firmware management theory, and common failure modes

Mental Model

[Your Automation    ]  Ansible, Redfish scripts, OMECLI
|
[Management Plane   ]  iDRAC9 / Lifecycle Controller / OpenManage Enterprise
|
[Firmware Layer     ]  BIOS/UEFI, PERC, NIC firmware, iDRAC firmware
|
[Hardware           ]  CPU, DIMMs, disks, PSUs, fans, backplane
|
[Physical           ]  Rack, cabling, power, cooling

Everything above the hardware layer can be updated, configured, and automated remotely. The management plane (iDRAC) is your primary interface: it runs on standby power, so it stays reachable even when the host OS is down or the server is powered off.


iDRAC & Lifecycle Controller

What iDRAC Is

iDRAC (integrated Dell Remote Access Controller) is a dedicated BMC (Baseboard Management Controller) with its own network interface, CPU, and memory. It runs independently of the host OS. Think of it as a tiny always-on management computer embedded in every Dell server.

Key capabilities:

  • Remote console (virtual KVM): full BIOS-to-OS access from a browser
  • Virtual media: mount ISOs remotely for OS installs
  • Power control: power on/off/cycle/graceful shutdown
  • Sensor monitoring: temps, voltages, fan speeds, power draw
  • Alerting: SNMP traps, email, syslog forwarding, Redfish events
  • Firmware updates: flash BIOS, PERC, NIC, iDRAC itself
  • Lifecycle Controller: embedded deployment and update tool

iDRAC Versions

Generation | iDRAC Version | Key Additions
12G (R620) | iDRAC7        | Basic HTML5 console, RACADM
13G (R630) | iDRAC8        | Improved HTML5, Redfish v1 (partial)
14G (R640) | iDRAC9        | Full Redfish, Telemetry streaming, group management
15G (R650) | iDRAC9        | Enhanced Redfish, zero-touch provisioning
16G (R760) | iDRAC9        | Telemetry enhancements, security hardening

RACADM CLI

racadm is the iDRAC command-line interface. It can run locally (on the managed server), remotely via SSH, or from a management station.

# Remote connection
racadm -r 192.168.1.100 -u root -p <password> getversion

# Get system inventory
racadm -r 192.168.1.100 -u root -p <password> get BIOS.SysInformation

# Get iDRAC network config
racadm get iDRAC.NIC

# Set static IP on iDRAC (disable DHCP first; static address fields are typically rejected while DHCP is enabled)
racadm set iDRAC.IPv4.DHCPEnable Disabled
racadm set iDRAC.IPv4.Address 10.0.10.50
racadm set iDRAC.IPv4.Netmask 255.255.255.0
racadm set iDRAC.IPv4.Gateway 10.0.10.1

# Power operations
racadm serveraction powerup
racadm serveraction powerdown
racadm serveraction powercycle
racadm serveraction graceshutdown

# Get system event log
racadm getsel

# Clear system event log
racadm clrsel

# Get sensor readings
racadm getsensorinfo

# Get storage controller info
racadm storage get controllers

# Get virtual disk info
racadm storage get vdisks

# Schedule a BIOS configuration job
racadm jobqueue create BIOS.Setup.1-1

# Export server config profile (SCP) to file
racadm get -t xml -f server_config.xml

# Import server config profile
racadm set -t xml -f server_config.xml
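
When RACADM is driven from a script, it helps to build the argument list in one place so host selection and credentials stay consistent. The following is a minimal Python sketch, assuming the remote -r/-u/-p form shown above. The helper names racadm_argv, power_argv, and run_racadm are hypothetical, and run_racadm only works where the racadm binary is on PATH.

```python
import subprocess

# Power actions taken from the serveraction examples above.
POWER_ACTIONS = {"powerup", "powerdown", "powercycle", "graceshutdown"}


def racadm_argv(host, user, password, *command):
    """Return the argv list for a remote RACADM call (nothing is executed)."""
    if not command:
        raise ValueError("a racadm subcommand is required")
    return ["racadm", "-r", host, "-u", user, "-p", password, *command]


def power_argv(host, user, password, action):
    """Build a serveraction call, rejecting actions not listed above."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return racadm_argv(host, user, password, "serveraction", action)


def run_racadm(host, user, password, *command, timeout=120):
    """Run one RACADM command and return stdout; raises on a non-zero exit."""
    result = subprocess.run(
        racadm_argv(host, user, password, *command),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


print(power_argv("192.168.1.100", "root", "secret", "powercycle"))
```

Passing the argv as a list avoids shell quoting problems, but the password is still visible in the process list, so a short-lived service account is worth considering.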

Redfish API

Redfish is the modern REST API standard for server management (DMTF standard). iDRAC9 has comprehensive Redfish support.

# Base discovery
curl -sk -u root:password https://idrac-ip/redfish/v1/ | jq .

# System inventory
curl -sk -u root:password https://idrac-ip/redfish/v1/Systems/System.Embedded.1 | jq '{
  Model: .Model,
  SerialNumber: .SerialNumber,
  BiosVersion: .BiosVersion,
  PowerState: .PowerState,
  TotalMemoryGiB: .MemorySummary.TotalSystemMemoryGiB,
  ProcessorCount: .ProcessorSummary.Count
}'

# Get all storage controllers
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage | jq '.Members[]."@odata.id"'

# Get system event log entries
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Entries \
  | jq '.Members[] | {Created, Message, Severity}'

# Force a restart via Redfish (use "ResetType": "PowerCycle" for a full power cycle)
curl -sk -u root:password \
  -X POST https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset \
  -H 'Content-Type: application/json' \
  -d '{"ResetType": "ForceRestart"}'

# Set next one-time boot to PXE
curl -sk -u root:password \
  -X PATCH https://idrac-ip/redfish/v1/Systems/System.Embedded.1 \
  -H 'Content-Type: application/json' \
  -d '{"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}'

# Get iDRAC firmware version
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1 | jq .FirmwareVersion

# List firmware inventory (all components)
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/UpdateService/FirmwareInventory | jq '.Members[]."@odata.id"'
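
The curl calls above translate directly into a small script. This is a sketch using only the Python standard library, assuming HTTP Basic auth and a self-signed iDRAC certificate (the equivalent of curl -k). The function names build_request, send, and one_time_pxe_boot are hypothetical; the endpoint and payload are the ones used in the PXE example above.

```python
import base64
import json
import ssl
import urllib.request

SYSTEM_PATH = "/redfish/v1/Systems/System.Embedded.1"


def build_request(host, path, user, password, method="GET", body=None):
    """Build (but do not send) a Redfish request with HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"https://{host}{path}", data=data, headers=headers, method=method
    )


def send(request, verify_tls=False):
    """Send the request and decode the JSON reply, if there is one."""
    context = ssl.create_default_context()
    if not verify_tls:  # mirrors curl -k; enable verification once certs are signed
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def one_time_pxe_boot(host, user, password):
    """Request that sets the next boot to PXE, once."""
    body = {"Boot": {"BootSourceOverrideTarget": "Pxe",
                     "BootSourceOverrideEnabled": "Once"}}
    return build_request(host, SYSTEM_PATH, user, password, method="PATCH", body=body)


request = one_time_pxe_boot("idrac-ip", "root", "password")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from send makes the payloads easy to inspect or unit test without touching a live iDRAC.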

Alerting Setup

Configure iDRAC to send alerts for critical events:

# Enable email alerts
racadm set iDRAC.EmailAlert.1.Enable Enabled
racadm set iDRAC.EmailAlert.1.Address ops-team@example.com
racadm set iDRAC.EmailAlert.1.CustomMsg "Dell Server Alert"

# Configure SMTP
racadm set iDRAC.RemoteHosts.SMTPServerIPAddress smtp.example.com
racadm set iDRAC.RemoteHosts.SMTPPort 587

# Enable SNMP traps
racadm set iDRAC.SNMP.TrapFormat SNMPv2
racadm set iDRAC.SNMP.Alert.1.Enable Enabled
racadm set iDRAC.SNMP.Alert.1.DestAddr 10.0.10.200

# Forward to syslog
racadm set iDRAC.SysLog.SysLogEnable Enabled
racadm set iDRAC.SysLog.Server1 10.0.10.201
racadm set iDRAC.SysLog.Port 514

Lifecycle Controller

The Lifecycle Controller (LC) is a persistent embedded tool in iDRAC that handles:

  • OS Deployment: Built-in OS install wizard (mount ISO, configure RAID, install)
  • Firmware Updates: Pull from Dell.com or local repository, apply without OS
  • Hardware Diagnostics: Run ePSA/eSA diagnostics remotely
  • Server Configuration Profiles (SCP): Export/import full server config as XML/JSON

To access the Lifecycle Controller GUI, reboot the server and press F10 during POST, either at the local console or through the iDRAC virtual console.

SCPs are powerful for fleet standardization:

# Export full SCP via Redfish
curl -sk -u root:password \
  -X POST https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Oem/EID_674_Manager.ExportSystemConfiguration \
  -H 'Content-Type: application/json' \
  -d '{
    "ExportFormat": "JSON",
    "ShareParameters": {
      "Target": "ALL"
    }
  }'

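# Import SCP to configure a new server identically.
# import_request.json is the request body, not the raw profile: it carries the
# exported profile as a string in "ImportBuffer" plus "ShareParameters": {"Target": "ALL"}.
curl -sk -u root:password \
  -X POST https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Oem/EID_674_Manager.ImportSystemConfiguration \
  -H 'Content-Type: application/json' \
  -d @import_request.json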
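
The import action expects the profile wrapped in a request body rather than posted raw. The sketch below shows one way to build that body in Python. It assumes the Dell OEM parameters ImportBuffer and ShutdownType behave as I recall them from Dell's Redfish documentation, so check both against your firmware's API guide. The function name build_scp_import_body and the sample profile are hypothetical.

```python
import json


def build_scp_import_body(profile, target="ALL", shutdown_type="Graceful"):
    """Wrap an exported profile in a request body for ImportSystemConfiguration.

    ImportBuffer and ShutdownType are assumed parameter names; verify them
    against the API guide for your iDRAC firmware.
    """
    # The profile travels as a string, so a parsed JSON profile is serialized first.
    buffer = profile if isinstance(profile, str) else json.dumps(profile)
    return {
        "ImportBuffer": buffer,
        "ShareParameters": {"Target": target},
        "ShutdownType": shutdown_type,
    }


# Hypothetical, heavily truncated profile used only to show the wrapping.
exported = {"SystemConfiguration": {"Components": []}}
body = build_scp_import_body(exported)
print(json.dumps(body, indent=2))
```

Writing the result to a file gives a body that can be posted with curl -d @file in the same way as the export request.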

BIOS/UEFI Configuration

Boot Modes

Mode   | Description                                  | When to Use
UEFI   | Modern boot with GPT, Secure Boot, fast POST | Default for all new deployments
Legacy | MBR-based BIOS boot                          | Only for legacy OS/tools

Changing boot mode wipes the boot device list. Plan this before OS install.

Golden BIOS Config Checklist

Use this as a baseline for new server intake:

Boot Mode:
  [x] UEFI mode enabled
  [x] Secure Boot enabled (if OS supports it)
  [x] Boot order: 1) PXE NIC1  2) RAID VD0  3) UEFI Shell

Performance:
  [x] System Profile: Performance (not "Energy Efficient" for compute nodes)
  [x] Logical Processor (Hyperthreading): Enabled
  [x] Virtualization Technology (VT-x): Enabled
  [x] SR-IOV Global Enable: Enabled (if using network passthrough)
  [x] NUMA optimization: Enabled
  [x] x2APIC: Enabled

Memory:
  [x] Memory Operating Mode: Optimizer Mode
  [x] Node Interleaving: Disabled (let NUMA manage it)
  [x] Memory Refresh Rate: 1x (default, unless error-prone DIMMs)

Security:
  [x] TPM 2.0: Enabled (required for BitLocker and measured boot; Secure Boot itself does not depend on it)
  [x] TPM Security: On with Pre-boot Measurements
  [x] AC Power Recovery: Last (or On, for headless servers)
  [x] System Password: Set (prevent local BIOS changes)

I/O:
  [x] Embedded NIC1: Enabled with PXE
  [x] Embedded NIC2-4: Enabled, PXE disabled
  [x] Slot disablement: Disable unused PCIe slots

Applying BIOS Settings via Automation

# Via RACADM
racadm set BIOS.SysProfileSettings.SysProfile PerfOptimized
racadm set BIOS.ProcSettings.LogicalProc Enabled
racadm set BIOS.ProcSettings.ProcVirtualization Enabled
racadm set BIOS.SysSecurity.TpmSecurity On

# Create a BIOS config job (applies on next reboot)
racadm jobqueue create BIOS.Setup.1-1
racadm serveraction powercycle

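# Via Redfish (this stages the values as pending; as with RACADM, they take effect only after a BIOS config job runs at the next reboot)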
curl -sk -u root:password \
  -X PATCH https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Bios/Settings \
  -H 'Content-Type: application/json' \
  -d '{
    "Attributes": {
      "SysProfile": "PerfOptimized",
      "LogicalProc": "Enabled",
      "ProcVirtualization": "Enabled",
      "TpmSecurity": "On"
    }
  }'
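
Before pushing a baseline, it is useful to know which attributes have actually drifted. The sketch below compares a server's current BIOS attributes with a golden set and builds a PATCH body containing only the differences. It is written in Python and assumes the attribute names used in the Redfish example above; GOLDEN_BIOS, bios_drift, and patch_body are hypothetical names, and the sample "current" values are invented for illustration.

```python
# Golden values taken from the Redfish PATCH example above.
GOLDEN_BIOS = {
    "SysProfile": "PerfOptimized",
    "LogicalProc": "Enabled",
    "ProcVirtualization": "Enabled",
    "TpmSecurity": "On",
}


def bios_drift(current, golden=GOLDEN_BIOS):
    """Return {attribute: (expected, actual)} for every attribute off baseline."""
    return {
        name: (expected, current.get(name))
        for name, expected in golden.items()
        if current.get(name) != expected
    }


def patch_body(drift):
    """Build a Bios/Settings PATCH body that restores only the drifted attributes."""
    return {"Attributes": {name: expected for name, (expected, _) in drift.items()}}


# Invented sample, shaped like the Attributes object a Bios resource returns.
current = {
    "SysProfile": "PerfPerWattOptimizedDapc",
    "LogicalProc": "Enabled",
    "ProcVirtualization": "Disabled",
    "TpmSecurity": "On",
}
drift = bios_drift(current)
print(drift)
print(patch_body(drift))
```

Sending only the drifted attributes keeps the pending-settings job small and makes the change log easier to read.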

Dell OpenManage

OpenManage Enterprise (OME)

OME is the fleet management console for Dell servers. It runs as a virtual appliance (VM) and manages hundreds to thousands of servers from a single pane.

Key capabilities:

  • Discovery: Auto-discover servers by IP range, subnet, or Active Directory
  • Inventory: Hardware inventory across the fleet (CPU, memory, disks, firmware)
  • Compliance Baselines: Define firmware/config baselines, detect drift, remediate
  • Firmware Updates: Batch update firmware using Dell Update Packages (DUPs)
  • Alerting: Aggregate alerts from all managed servers
  • Templates: Golden server config templates to deploy to new hardware
  • Warranty: Pull warranty status from Dell.com API
  • Reports: Hardware lifecycle, compliance, inventory reports

OME vs Alternatives

Feature           | OME                | MAAS (Canonical)     | Foreman/Katello
Dell integration  | Native, deepest    | Plugin (IPMI only)   | Plugin (IPMI + smart_proxy)
Discovery         | Auto (Redfish)     | Manual enlist        | PXE-based discovery
Firmware updates  | Native (DUPs)      | Not built-in         | Katello content views
OS provisioning   | Basic (LC)         | Excellent            | Excellent
Config management | SCP/templates      | Cloud-init           | Puppet/Ansible
Scale             | 8000+ servers      | ~1000 machines       | Large scale
Cost              | Free with warranty | Free                 | Free
Best for          | Dell-only fleet    | Multi-vendor, Ubuntu | Multi-vendor, RHEL

OMECLI (Command Line)

# Illustrative syntax: confirm subcommand names and flags against the CLI build you have installed
# Discover servers by IP range
omecli discover --range 10.0.10.1-10.0.10.50 --protocol REDFISH

# List managed devices
omecli device list --format json | jq '.[] | {Name, ServiceTag, PowerState}'

# Check firmware compliance
omecli baseline compliance --baseline-name "Production-R660" --format json

# Update firmware on a group
omecli firmware update --group "Rack-A" --catalog "Dell-Latest"

# Export inventory report
omecli report run --report-name "Full Inventory" --format csv --output inventory.csv

Hardware Diagnostics

System Event Log (SEL)

The SEL is your first stop for hardware issues. It's stored in iDRAC non-volatile memory and persists across reboots and OS reinstalls.

# View SEL via RACADM
racadm getsel
# Example output:
# 1 | 03/05/2026 | 14:23:01 | Critical | Memory | DIMM A1 correctable ECC error rate exceeded
# 2 | 03/05/2026 | 14:23:45 | Warning  | Storage | Physical Disk 0:1:2 predicted failure

# View SEL via Redfish
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Sel/Entries \
  | jq '.Members[] | select(.Severity != "OK") | {Created, Message, Severity}'

# View Lifecycle Log (more detail than SEL)
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Managers/iDRAC.Embedded.1/LogServices/Lclog/Entries \
  | jq '.Members[:20] | .[] | {Created, Message}'
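
For reporting or ticket creation, the text output is easier to work with once it is parsed into records. This is a minimal Python sketch that assumes the pipe-delimited layout of the example output above (id, date, time, severity, category, message); real racadm getsel output can differ by firmware, so treat the column order as an assumption. The function names are hypothetical.

```python
FIELDS = ("id", "date", "time", "severity", "category", "message")


def parse_sel_line(line):
    """Parse one 'id | date | time | severity | category | message' line.

    Returns None for headers, blank lines, or anything that does not fit.
    """
    # Drop a leading comment marker, then split into at most six columns
    # so a '|' inside the message text is preserved.
    parts = [part.strip() for part in line.lstrip("# ").split("|", 5)]
    if len(parts) != len(FIELDS) or not parts[0].isdigit():
        return None
    return dict(zip(FIELDS, parts))


def parse_sel(text):
    """Parse a block of SEL output, skipping lines that do not match."""
    return [record for record in map(parse_sel_line, text.splitlines()) if record]


def actionable(records, levels=("Critical", "Warning")):
    """Keep only entries whose severity calls for follow-up."""
    return [record for record in records if record["severity"] in levels]


sample = """\
1 | 03/05/2026 | 14:23:01 | Critical | Memory | DIMM A1 correctable ECC error rate exceeded
2 | 03/05/2026 | 14:23:45 | Warning  | Storage | Physical Disk 0:1:2 predicted failure
"""
for entry in actionable(parse_sel(sample)):
    print(entry["severity"], entry["category"], entry["message"])
```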

LED Indicators

LED Color / Pattern      | Meaning                                                      | Action
Solid blue (front panel) | System on and healthy; system ID mode inactive               | Normal operation
Blinking blue            | System ID mode active (locate in rack)                       | Informational only
Solid amber              | System in fail-safe mode                                     | Check SEL for details
Blinking amber           | System fault                                                 | Check SEL, may need replacement
No LED, no power         | PSU failure or no AC                                         | Check power, PSU, PDU
Blinking amber on disk   | Disk failed (a green, amber, off cycle = predicted failure)  | Check PERC status, plan replace
Solid green on disk      | Disk online, healthy                                         | Normal operation
Blinking green on disk   | Activity LED: disk I/O; status LED: identify or rebuild      | Normal operation

Diagnostic Decision Tree

Server is unhealthy / alert fired
|
+-- Can you reach iDRAC?
|   |
|   +-- YES: Check SEL (racadm getsel / Redfish LogServices)
|   |   |
|   |   +-- Memory errors (ECC) -> Check DIMM slot, plan replacement
|   |   +-- Disk predictive failure -> Check PERC, hot spare, plan replace
|   |   +-- PSU fault -> Check redundancy, swap PSU
|   |   +-- Temp warning -> Check fans, airflow, ambient temp
|   |   +-- CPU machine check -> Reseat CPU, escalate to Dell
|   |   +-- PERC battery low -> Replace PERC battery/capacitor
|   |
|   +-- NO: Check iDRAC network config, cable, iDRAC power
|       |
|       +-- iDRAC LED off -> Check dedicated iDRAC port, try shared NIC mode
|       +-- iDRAC LED on but no ping -> VLAN/IP misconfiguration
|
+-- Server won't POST?
    |
    +-- Check front panel LEDs
    +-- Check PSU LEDs (both A and B feeds)
    +-- Remove recent hardware changes (new DIMMs, cards)
    +-- Reseat components, try minimal config (1 CPU, 1 DIMM, no drives)
    +-- If still no POST -> motherboard or CPU failure, contact Dell
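
The SEL branch of the tree above can be expressed as a small rule table, which is handy for annotating alerts automatically. This is a Python sketch, not a complete classifier: the keyword lists are assumptions chosen to match the example messages in this guide, and the actions are copied from the tree. TRIAGE_RULES and triage are hypothetical names.

```python
# (keywords, action) pairs, checked in order; actions come from the decision tree.
TRIAGE_RULES = [
    (("ecc", "dimm", "memory"), "Check DIMM slot, plan replacement"),
    (("predicted failure", "predictive failure", "physical disk"),
     "Check PERC, hot spare, plan replace"),
    (("psu", "power supply"), "Check redundancy, swap PSU"),
    (("temp", "thermal", "fan"), "Check fans, airflow, ambient temp"),
    (("machine check",), "Reseat CPU, escalate to Dell"),
    (("battery",), "Replace PERC battery/capacitor"),
]

FALLBACK = "No rule matched: review the SEL entry manually"


def triage(message):
    """Map a SEL message to the first matching action from the decision tree."""
    text = message.lower()
    for keywords, action in TRIAGE_RULES:
        if any(keyword in text for keyword in keywords):
            return action
    return FALLBACK


for msg in (
    "DIMM A1 correctable ECC error rate exceeded",
    "Physical Disk 0:1:2 predicted failure",
    "PERC battery low",
):
    print(f"{msg} -> {triage(msg)}")
```

Rule order matters: a message such as "PSU 1 fan failure" matches the PSU rule before the fan rule, which is usually the more useful starting point.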

Common Hardware Issues

Memory (ECC errors):

# Check DIMM health via Redfish (jq -r strips the quotes so the path can be appended to the URL)
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Memory \
  | jq -r '.Members[]."@odata.id"' | while read -r dimm; do
    curl -sk -u root:password "https://idrac-ip${dimm}" \
      | jq '{Name: .Name, Status: .Status.Health, SizeGiB: (.CapacityMiB/1024)}'
  done

Disk health (SMART + PERC):

# List physical disks with health
racadm storage get pdisks -o -p Status,State,MediaType,Size,Model

# Check drive health and predicted media life via Redfish (jq -r strips the quotes from each drive path)
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1 \
  | jq -r '.Drives[]."@odata.id"' | while read -r drive; do
    curl -sk -u root:password "https://idrac-ip${drive}" \
      | jq '{Name: .Name, Model: .Model, CapacityGiB: (.CapacityBytes/1073741824|round), Status: .Status.Health, PredictedLifeLeft: .PredictedMediaLifeLeftPercent}'
  done
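
Once the drive documents are fetched, deciding which ones need attention is a pure data problem. The Python sketch below works on the same Redfish Drive fields queried above (Name, CapacityBytes, Status.Health, PredictedMediaLifeLeftPercent). The 20 percent wear threshold and the function name summarize_drive are assumptions for illustration, not Dell guidance.

```python
def summarize_drive(drive, min_life_percent=20):
    """Condense a Redfish Drive document and flag drives that need attention.

    min_life_percent is an assumed threshold; tune it to your replacement policy.
    """
    health = drive.get("Status", {}).get("Health")
    life = drive.get("PredictedMediaLifeLeftPercent")  # often null on HDDs
    capacity = drive.get("CapacityBytes")
    worn = life is not None and life < min_life_percent
    return {
        "Name": drive.get("Name"),
        "CapacityGiB": round(capacity / 1024**3) if capacity else None,
        "Health": health,
        "LifeLeftPercent": life,
        "NeedsAttention": health != "OK" or worn,
    }


# Invented sample documents, trimmed to the fields used above.
drives = [
    {"Name": "SSD 0", "CapacityBytes": 480 * 1024**3,
     "Status": {"Health": "OK"}, "PredictedMediaLifeLeftPercent": 12},
    {"Name": "HDD 1", "CapacityBytes": 2 * 1024**4,
     "Status": {"Health": "OK"}, "PredictedMediaLifeLeftPercent": None},
]
for drive in drives:
    print(summarize_drive(drive))
```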


RAID & Storage (PERC Controller)

RAID Levels Quick Reference

Level   | Min Disks | Capacity     | Fault Tolerance | Use Case
RAID 0  | 2         | N * disk     | None            | Scratch/temp (never prod)
RAID 1  | 2         | 50% of total | 1 disk          | Boot drives, OS mirrors
RAID 5  | 3         | (N-1) * disk | 1 disk          | Read-heavy, moderate data
RAID 6  | 4         | (N-2) * disk | 2 disks         | Large arrays, write-light
RAID 10 | 4         | 50% of total | 1 per mirror    | Databases, write-heavy
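
The capacity column can be turned into a quick calculator for sizing arrays. A minimal Python sketch follows, assuming identical disks and ignoring controller metadata and filesystem overhead; usable_capacity_tb is a hypothetical helper name.

```python
# level -> (minimum disks, function giving the number of data-bearing disks)
RAID_RULES = {
    0: (2, lambda n: n),
    1: (2, lambda n: n / 2),
    5: (3, lambda n: n - 1),
    6: (4, lambda n: n - 2),
    10: (4, lambda n: n / 2),
}


def usable_capacity_tb(level, disks, disk_tb):
    """Usable capacity in TB for `disks` identical drives of `disk_tb` each."""
    if level not in RAID_RULES:
        raise ValueError(f"unsupported RAID level: {level}")
    minimum, data_disks = RAID_RULES[level]
    if disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    if level in (1, 10) and disks % 2:
        raise ValueError(f"RAID {level} needs an even number of disks")
    return data_disks(disks) * disk_tb


print(usable_capacity_tb(5, 4, 4))    # four 4 TB disks in RAID 5
print(usable_capacity_tb(6, 6, 8))    # six 8 TB disks in RAID 6
print(usable_capacity_tb(10, 6, 2))   # six 2 TB disks in RAID 10
```

For example, four 4 TB disks in RAID 5 give (4-1) * 4 = 12 TB, while six 8 TB disks in RAID 6 give (6-2) * 8 = 32 TB.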

PERC Controller Management

Dell PERC (PowerEdge RAID Controller) is managed from the CLI with perccli, Dell's rebranded build of Broadcom's storcli; the two share the same command syntax.

# Show controller summary
perccli /c0 show

# List all virtual disks
perccli /c0/vall show

# List all physical disks
perccli /c0/eall/sall show

# Show detailed disk info (including SMART)
perccli /c0/e252/s0 show all

# Create RAID 1 from two disks (slot 0 and 1 on enclosure 252)
perccli /c0 add vd type=r1 drives=252:0,1

# Create RAID 5 from four disks with write-back cache and read-ahead
perccli /c0 add vd type=r5 drives=252:0-3 wb ra

# Create RAID 10 from six disks (three mirrored pairs)
perccli /c0 add vd type=r10 drives=252:0-5 pdperarray=2

# Configure a global hot spare
perccli /c0/e252/s6 add hotsparedrive

# Configure a dedicated hot spare for drive group 0
perccli /c0/e252/s6 add hotsparedrive dgs=0

# Check rebuild progress (rebuilds are tracked per physical disk)
perccli /c0/eall/sall show rebuild

# Start consistency check
perccli /c0/v0 start cc

# Set cache policy (write-back vs write-through)
perccli /c0/v0 set wrcache=wb
perccli /c0/v0 set rdcache=ra

# Locate a physical disk (blink LED)
perccli /c0/e252/s0 start locate

# Stop disk LED
perccli /c0/e252/s0 stop locate

# Check battery/capacitor backup unit
perccli /c0/cv show all

RAID via Redfish

# List storage controllers
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage \
  | jq '.Members[]."@odata.id"'

# Get RAID controller details
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1 \
  | jq '{Name, Status: .Status.Health, SpeedGbps, Volumes: [.Volumes."@odata.id"]}'

# List volumes (virtual disks)
curl -sk -u root:password \
  https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Volumes \
  | jq '.Members[]."@odata.id"'

# Create a RAID 1 volume via Redfish
curl -sk -u root:password \
  -X POST https://idrac-ip/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Volumes \
  -H 'Content-Type: application/json' \
  -d '{
    "VolumeType": "Mirrored",
    "Name": "OS-Mirror",
    "Drives": [
      {"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Drives/Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1"},
      {"@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Drives/Disk.Bay.1:Enclosure.Internal.0-1:RAID.Integrated.1-1"}
    ]
  }'

RAID Operations Runbook

Replacing a failed disk:

  1. Identify the failed disk: perccli /c0/eall/sall show (look for "Failed" or "Offline")
  2. Blink the disk LED: perccli /c0/e252/s2 start locate
  3. Physically replace the disk (hot-swappable on most PowerEdge)
  4. If a hot spare was configured, the rebuild starts automatically
  5. If there is no hot spare and the replacement does not start rebuilding on its own, assign it as a dedicated spare: perccli /c0/e252/s2 add hotsparedrive dgs=0
  6. Monitor rebuild: perccli /c0/eall/sall show rebuild
  7. Rebuilds on 1TB SAS 10K take ~2-4 hours; 8TB NL-SAS take 12-24+ hours
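
The rebuild times quoted above follow from simple arithmetic: a rebuild rewrites the whole disk, so duration is roughly capacity divided by sustained rebuild rate. The Python sketch below shows the calculation. The rates of 100 MB/s and 120 MB/s are assumptions picked for illustration; real rates depend on the controller's rebuild priority and on foreground I/O.

```python
def rebuild_hours(disk_tb, rate_mb_per_s):
    """Hours to rewrite a full disk at a sustained rate (decimal TB and MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (disk_tb * 1_000_000) / rate_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


# Assumed sustained rebuild rates; measure your own arrays before planning around these.
print(f"1 TB at 100 MB/s: {rebuild_hours(1, 100):.1f} h")
print(f"8 TB at 120 MB/s: {rebuild_hours(8, 120):.1f} h")
```

Because duration scales linearly with capacity, large NL-SAS arrays spend much longer in a degraded state, which is one reason RAID 6 is preferred for them.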

Online capacity expansion:

# Add a disk to existing RAID 5 (online); adding drives is done as a migration to the same RAID level
perccli /c0/v0 start migrate type=raid5 option=add drives=252:4

# Monitor expansion
perccli /c0/v0 show migrate

Migrating RAID levels:

# Migrate RAID 5 to RAID 6 (requires additional disk)
perccli /c0/v0 start migrate type=raid6 option=add drives=252:4


Ansible Integration (dellemc.openmanage)

The dellemc.openmanage Ansible collection provides modules for automating iDRAC management at scale.

# Install the collection
ansible-galaxy collection install dellemc.openmanage

Example Playbook: Server Inventory

---
- name: Gather Dell server inventory
  hosts: idrac_hosts
  gather_facts: false
  connection: local

  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"
    validate_certs: false

  tasks:
    - name: Get system inventory
      dellemc.openmanage.idrac_system_info:
        idrac_ip: "{{ ansible_host | default(inventory_hostname) }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        validate_certs: "{{ validate_certs }}"
      register: sys_info

    - name: Display system summary
      ansible.builtin.debug:
        msg: >
          {{ sys_info.system_info.System[0].Model }}
          | SN: {{ sys_info.system_info.System[0].ServiceTag }}
          | BIOS date: {{ sys_info.system_info.BIOS[0].BIOSReleaseDate }}
          | iDRAC: {{ sys_info.system_info.iDRAC[0].FirmwareVersion }}

Example Playbook: Firmware Update

---
- name: Update firmware from Dell repository
  hosts: idrac_hosts
  gather_facts: false
  connection: local

  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"

  tasks:
    - name: Update firmware from Dell.com catalog
      dellemc.openmanage.idrac_firmware:
        idrac_ip: "{{ ansible_host | default(inventory_hostname) }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        share_name: "https://downloads.dell.com"
        reboot: true
        job_wait: true
        catalog_file_name: "Catalog.xml"
      register: firmware_result

    - name: Show update results
      ansible.builtin.debug:
        var: firmware_result

Example Playbook: BIOS Configuration

---
- name: Apply golden BIOS config
  hosts: idrac_hosts
  gather_facts: false
  connection: local

  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"

  tasks:
    - name: Set performance BIOS attributes
      dellemc.openmanage.idrac_bios:
        idrac_ip: "{{ ansible_host | default(inventory_hostname) }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        attributes:
          SysProfile: "PerfOptimized"
          LogicalProc: "Enabled"
          ProcVirtualization: "Enabled"
          TpmSecurity: "On"
          BootMode: "Uefi"
      register: bios_result

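    # idrac_reset restarts the iDRAC itself, not the host, so the host reboot
    # uses redfish_powerstate. Skip this task if your collection version's
    # idrac_bios already restarts the server when it applies the settings.
    - name: Reboot to apply BIOS changes
      dellemc.openmanage.redfish_powerstate:
        baseuri: "{{ ansible_host | default(inventory_hostname) }}"
        username: "{{ idrac_user }}"
        password: "{{ idrac_password }}"
        reset_type: "GracefulRestart"
      when: bios_result.changed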

iDRAC Inventory File

# inventory/idrac-hosts.yml
all:
  children:
    idrac_hosts:
      hosts:
        rack-a-server-01:
          ansible_host: 10.0.10.101
        rack-a-server-02:
          ansible_host: 10.0.10.102
        rack-a-server-03:
          ansible_host: 10.0.10.103
      vars:
        ansible_connection: local
        ansible_python_interpreter: /usr/bin/python3
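
For larger racks, the inventory can be generated rather than typed. The Python sketch below builds the same structure as the file above from a name prefix and a starting address, assuming iDRAC addresses are contiguous. build_inventory is a hypothetical helper, and the output is JSON, which Ansible's YAML inventory loader generally accepts because JSON is a subset of YAML.

```python
import ipaddress
import json


def build_inventory(name_prefix, first_ip, count):
    """Return an Ansible inventory dict for `count` consecutive iDRAC addresses."""
    start = ipaddress.ip_address(first_ip)
    hosts = {
        f"{name_prefix}-{index + 1:02d}": {"ansible_host": str(start + index)}
        for index in range(count)
    }
    return {
        "all": {
            "children": {
                "idrac_hosts": {
                    "hosts": hosts,
                    "vars": {
                        "ansible_connection": "local",
                        "ansible_python_interpreter": "/usr/bin/python3",
                    },
                }
            }
        }
    }


print(json.dumps(build_inventory("rack-a-server", "10.0.10.101", 3), indent=2))
```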

See Also

  • Dell PowerEdge Deep Dive — conceptual knowledge, interview patterns, architecture mental models, and troubleshooting frameworks. Use the deep-dive when you need to understand Dell systems; use this guide when you need to operate them.