Solution: iDRAC Unreachable, OS Up

Triage

  1. Since the host OS is accessible via SSH, use ipmitool to query the BMC locally:

    sudo ipmitool mc info          # Is the BMC responsive at all?
    sudo ipmitool lan print 1      # What's the current network config?
    

  2. Check if the BMC IP, netmask, and gateway are correct:

    sudo ipmitool lan print 1 | grep -E "IP Address|Subnet|Gateway|VLAN"
    

  3. From another host on the management VLAN, check for IP conflicts. Replies from more than one MAC address indicate a duplicate IP:

    arping -c 3 10.50.1.122
    

  4. Check switch port status for the dedicated management NIC port (requires switch access or remote hands).
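
The comparison in step 2 can be scripted so that a wrong address or gateway is not missed by eye. The following is a minimal sketch, assuming the "Label : value" layout that ipmitool lan print emits; the helper names lan_field and check_bmc_lan are hypothetical, and the expected values are the ones used elsewhere in this runbook.

```shell
#!/bin/sh
# Hypothetical helpers; not part of ipmitool.

# Print the value of one field from `ipmitool lan print` output on stdin.
# $1 = exact field label, e.g. "IP Address" or "Default Gateway IP".
lan_field() {
    awk -F' *: ' -v key="$1" '$1 == key { print $2; exit }'
}

# Compare captured output against the expected network settings.
# $1 = file holding the captured output
# $2 = expected IP, $3 = expected netmask, $4 = expected gateway
# Returns 0 if all three match, 1 otherwise.
check_bmc_lan() {
    status=0
    for pair in "IP Address=$2" "Subnet Mask=$3" "Default Gateway IP=$4"; do
        label=${pair%%=*}
        want=${pair#*=}
        got=$(lan_field "$label" < "$1")
        if [ "$got" != "$want" ]; then
            echo "MISMATCH $label: got '$got', expected '$want'"
            status=1
        fi
    done
    return "$status"
}

# Usage (on the affected host):
#   sudo ipmitool lan print 1 > /tmp/bmc-lan.txt
#   check_bmc_lan /tmp/bmc-lan.txt 10.50.1.122 255.255.255.0 10.50.1.1
```

Matching on the exact label keeps "IP Address" from also matching "IP Address Source".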

Root Cause

Most commonly, the iDRAC/BMC firmware has hung or crashed. The BMC is an independent embedded system with its own OS; it can lock up while the host OS continues running normally. Typical triggers are extended uptime (months), memory leaks in the BMC firmware, and failed automatic firmware checks.

Less common causes: a VLAN configuration change on the switch side, an IP conflict with a newly provisioned device, or a partially unseated cable.

Fix

  1. BMC cold reset from the host OS (safest first step -- does NOT affect the running OS):

    sudo ipmitool mc reset cold
    
    Wait 2-3 minutes for the BMC to reinitialize. Then test connectivity:
    ping -c 3 10.50.1.122
    

  2. If the BMC was unresponsive to ipmitool mc info locally, the IPMI driver may need reloading:

    sudo modprobe -r ipmi_si && sudo modprobe ipmi_si
    sudo ipmitool mc info
    

  3. If BMC cold reset does not restore connectivity, reconfigure the network:

    sudo ipmitool lan set 1 ipsrc static
    sudo ipmitool lan set 1 ipaddr 10.50.1.122
    sudo ipmitool lan set 1 netmask 255.255.255.0
    sudo ipmitool lan set 1 defgw ipaddr 10.50.1.1
    sudo ipmitool mc reset cold
    

  4. If the dedicated management NIC is physically down (confirmed via switch), request remote hands to reseat the cable.

  5. After recovery, verify full iDRAC functionality:

       • Web UI accessible
       • Virtual console works
       • SNMP/Redfish alerts configured
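
Instead of waiting a fixed 2-3 minutes after the cold reset in step 1, the connectivity test can be retried until it succeeds. This is a minimal sketch of that idea; wait_for is a hypothetical helper, and the 300-second timeout and 10-second interval in the usage comment are assumptions rather than values from the procedure above.

```shell
#!/bin/sh
# Retry a probe command until it succeeds or the timeout is reached.
# $1 = timeout in seconds, $2 = delay between attempts in seconds,
# remaining arguments = the probe command to run.
# Elapsed time counts only the sleeps, so the real wait is slightly longer.
wait_for() {
    timeout_s=$1
    interval_s=$2
    shift 2
    elapsed=0
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
}

# Usage, run from a host that can reach the management VLAN:
#   sudo ipmitool mc reset cold        # on the affected host
#   wait_for 300 10 ping -c 1 -W 2 10.50.1.122 \
#       && echo "BMC is back" || echo "BMC still unreachable"
```

A non-zero exit after the timeout is the signal to move on to the next step rather than resetting again.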

Rollback / Safety

  • ipmitool mc reset cold resets only the BMC, not the host. The running OS and all services are unaffected.
  • Do NOT rely on ipmitool mc reset warm for this problem -- it is less reliable than a cold reset for clearing hung states.
  • If the BMC is completely unresponsive to local IPMI commands, the last resort is removing AC power from the server (an OS reboot alone does not restart the BMC). This requires coordination and downtime.
  • Always verify the BMC is reachable after any maintenance that touches management networking.

Common Traps

  • Trap: Rebooting the entire server to fix a BMC issue. The BMC can be reset independently.
  • Trap: Assuming a network issue when the BMC firmware has simply hung. Always check local IPMI first.
  • Trap: Not checking for IP conflicts. A new server provisioned with the same management IP will cause intermittent connectivity for both.
  • Trap: Forgetting that some Dell servers offer both a dedicated iDRAC NIC and a shared LOM option. If the configuration was changed to "shared" mode, iDRAC traffic goes through one of the host's LOM ports, which may not be on the management VLAN.
  • Trap: Not setting up BMC reachability monitoring. If you only check when you need it, you discover the outage at the worst time.
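
The last trap can be addressed with even a very small periodic check. The sketch below is one possible shape, assuming a plain text file with one BMC address per line and ICMP reachability as the health signal; report_unreachable, the BMC_PROBE variable, and the file path in the usage comment are hypothetical names. A real deployment would more likely feed an existing monitoring system.

```shell
#!/bin/sh
# Probe command run against each address. The default uses Linux ping;
# override BMC_PROBE to substitute a different check.
: "${BMC_PROBE:=ping -c 2 -W 2}"

# Print one line per unreachable BMC.
# $1 = file with one BMC address per line; blank lines and lines
#      starting with '#' are skipped.
# Returns 0 if every BMC answered, 1 if any did not.
report_unreachable() {
    failed=0
    while IFS= read -r addr; do
        case $addr in
            ''|'#'*) continue ;;
        esac
        # BMC_PROBE is intentionally unquoted so its options are split.
        if ! $BMC_PROBE "$addr" >/dev/null 2>&1 </dev/null; then
            echo "BMC unreachable: $addr"
            failed=$((failed + 1))
        fi
    done < "$1"
    [ "$failed" -eq 0 ]
}

# Usage, e.g. from cron on a host inside the management VLAN:
#   report_unreachable /etc/bmc-hosts.txt || echo "check BMCs" >&2
```

Running the check from inside the management VLAN avoids false alarms caused by routing or firewall rules between networks.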