Solution: BIOS Settings Reset After CMOS Battery Replacement
Triage
- Assess the situation: The server cannot boot, and 12 VMs are down. This is a high-priority recovery.
- Determine the correct BIOS settings:
  - Check the CMDB for the documented settings baseline for this server role.
  - If there is no CMDB entry, find an identical server (same model, same role) and export its BIOS config by running `racadm get BIOS` on the reference server.
  - Or export a Server Configuration Profile (SCP): `racadm get -t xml -f bios_reference.xml`
- Enter BIOS setup (F2 at POST) and compare current (default) settings with the required settings.
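The comparison between the reset server and the reference can be partly scripted if both sides are saved as text dumps. The following is a minimal sketch, assuming each dump is the saved output of a `racadm get BIOS.<group>` query with `Name=Value` lines. The function name `diff_bios_dumps` and the file names are hypothetical.

```shell
# Sketch: list BIOS attributes whose values differ between two saved dumps.
# Assumption: each dump holds "Name=Value" lines; group headers, blank
# lines and anything else that does not look like Name=Value are ignored.
# A leading '#' on an attribute line is stripped before comparing.

# Usage: diff_bios_dumps REFERENCE_FILE CURRENT_FILE
# Prints one line per differing attribute: "Name: current -> reference".
diff_bios_dumps() {
    awk -F '=' '
        { sub(/\r$/, ""); sub(/^#/, "") }            # tidy each line
        !/^[A-Za-z0-9_.]+=/ { next }                 # keep Name=Value only
        NR == FNR { ref[$1] = $2; next }             # first file: reference
        ($1 in ref) && ref[$1] != $2 {
            printf "%s: %s -> %s\n", $1, $2, ref[$1]
        }
    ' "$1" "$2"
}
```

For example, `diff_bios_dumps reference_proc.txt current_proc.txt` would print only the attributes that need changing, which keeps the F2 session short. Attributes present in only one of the two files are not reported by this sketch.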
Root Cause
Replacing the CMOS battery on a Dell PowerEdge R640 clears all NVRAM-stored BIOS settings, restoring them to factory defaults. This is expected behavior: the CMOS battery maintains the BIOS settings while the server is unpowered. When the old battery was removed, the settings were lost, and fitting a new battery does not bring them back.
Key settings that reverted to defaults and their impact:
- Boot Mode: UEFI -> Legacy (OS cannot boot; EFI System Partition not recognized in Legacy mode)
- VT-x / VT-d: Enabled -> Disabled (KVM cannot start VMs without hardware virtualization)
- NUMA Node Interleaving: Disabled -> Enabled (memory performance degradation for NUMA-aware workloads)
- System Profile: Performance -> Balanced (reduces CPU frequency and throughput)
- SR-IOV: Enabled -> Disabled (VMs using SR-IOV NICs will fail to start)
Fix
- Enter BIOS Setup (F2 at POST or via iDRAC virtual console).
- Restore critical settings (in priority order):
  a. Boot Mode (System BIOS > Boot Settings):
     - Boot Mode: UEFI
     - Secure Boot: Disabled (unless specifically required)
     - Boot Sequence: UEFI (should auto-detect the EFI boot entries)
  b. Processor Settings (System BIOS > Processor Settings):
     - Virtualization Technology (VT-x): Enabled
     - VT for Direct I/O (VT-d): Enabled
     - Number of Cores per Processor: All
     - Logical Processor (Hyper-Threading): Enabled
  c. Memory Settings (System BIOS > Memory Settings):
     - Node Interleaving: Disabled
     - Memory Operating Mode: Optimizer Mode
  d. System Profile (System BIOS > System Profile Settings):
     - System Profile: Performance
     - CPU Power Management: Maximum Performance
     - C-States: Disabled
     - Turbo Boost: Enabled
  e. Integrated Devices:
     - SR-IOV Global Enable: Enabled
     - OS Watchdog Timer: Disabled
- Save and reboot: The server should now boot from the EFI System Partition.
- Post-boot verification:
  - Confirm OS boots: `uname -r`
  - Verify VT-x: `grep -c vmx /proc/cpuinfo` (should be > 0)
  - Verify NUMA: `numactl --hardware` (should show separate nodes, not interleaved)
  - Start VMs: `virsh list --all`, then `virsh start <vm>` for each
  - Verify SR-IOV: `lspci | grep "Virtual Function"`
- Export BIOS profile for future use: `racadm get -t xml -f /tmp/srv-hyp-03_bios_profile.xml`
  - Store in version control or configuration management system.
  - To restore in the future: `racadm set -t xml -f srv-hyp-03_bios_profile.xml`
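When the F2 menus are awkward to reach, the critical settings from the restore step could also be staged through `racadm`. The sketch below is illustrative only: the attribute names and values are assumptions based on typical 14th-generation PowerEdge naming and should each be confirmed with `racadm get BIOS.<group>` on the target before use. The `RACADM` variable and the function name are hypothetical. By default the script only prints the commands.

```shell
# Sketch: stage BIOS settings via racadm instead of the F2 menus.
# RACADM defaults to "echo racadm", so nothing is changed (dry run).
# Set RACADM to a real racadm invocation to apply the settings.
# ASSUMPTION: attribute names/values below must be verified on the
# target server with "racadm get BIOS.<group>" before running for real.
RACADM="${RACADM:-echo racadm}"

stage_bios_settings() {
    $RACADM set BIOS.BiosBootSettings.BootMode Uefi
    $RACADM set BIOS.ProcSettings.ProcVirtualization Enabled
    $RACADM set BIOS.ProcSettings.LogicalProc Enabled
    $RACADM set BIOS.MemSettings.NodeInterleave Disabled
    $RACADM set BIOS.SysProfileSettings.SysProfile PerfOptimized
    $RACADM set BIOS.IntegratedDevices.SriovGlobalEnable Enabled
    # Staged BIOS values stay pending until a configuration job runs;
    # this schedules the job together with a power cycle.
    $RACADM jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW
}
```

Because the OS is down in this scenario, a real run would most likely go through remote racadm against the iDRAC, for example `RACADM="racadm -r <idrac-ip> -u <user> -p <password>"`. Reviewing the dry-run output first is the safer order.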
Rollback / Safety
- BIOS changes require a reboot to take effect; no way to avoid the reboot.
- If the wrong boot mode is selected and the server does not boot, re-enter BIOS and switch back.
- The boot loader files live on the EFI System Partition (disk), so they survive the reset to Legacy and back. The boot entries that point to them are UEFI variables kept by the firmware; if those entries are missing after switching back to UEFI, recreate them with `efibootmgr`.
- VMs will not start until VT-x is re-enabled; do not attempt to start them before verifying BIOS settings.
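The "verify before starting VMs" rule can be enforced with a small pre-flight check. This is a sketch under stated assumptions: the host is a two-socket Intel system, so more than one NUMA node is expected, and the function names are hypothetical. It reuses the same commands as the post-boot verification.

```shell
# Sketch: gate VM start-up on the BIOS settings being visible to the OS.

# Count /proc/cpuinfo lines that mention the vmx flag; a non-zero result
# means VT-x is visible. The optional file argument allows checking a
# saved copy instead of the live /proc/cpuinfo.
vmx_line_count() {
    grep -c -w vmx "${1:-/proc/cpuinfo}"
}

# Read "numactl --hardware" output on stdin and print the node count
# taken from its "available: N nodes (...)" line.
numa_node_count() {
    awk '$1 == "available:" { print $2; exit }'
}

# Usage: preflight_ok VMX_LINES NUMA_NODES
# Succeeds only if VT-x is visible and more than one NUMA node is exposed
# (ASSUMPTION: two-socket host; one node suggests interleaving is on).
preflight_ok() {
    [ "$1" -gt 0 ] && [ "$2" -gt 1 ]
}

# Example use on the hypervisor (not executed here):
#   if preflight_ok "$(vmx_line_count)" "$(numactl --hardware | numa_node_count)"; then
#       virsh start <vm>
#   fi
```

On a single-socket configuration the node-count threshold would need to be relaxed, since one node is then the correct result.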
Common Traps
- Forgetting to export BIOS settings before hardware maintenance: This entire incident is preventable with a pre-maintenance SCP export.
- Only fixing the boot mode: Getting the server to boot is step one, but VMs will fail without VT-x/VT-d, and performance will degrade without proper NUMA and power profile settings.
- Assuming UEFI boot entries are lost: The boot loaders are on the EFI System Partition (disk), not in CMOS. Switching back to UEFI mode should find them automatically in most cases; if an entry is missing, recreate it with efibootmgr.
- Enabling Node Interleaving: This is a common mistake. Node Interleaving sounds like it helps NUMA, but it actually disables NUMA topology and interleaves all memory, destroying locality.
- Not checking SR-IOV: If VMs use SR-IOV virtual functions, they will fail to start with cryptic PCI errors if SR-IOV is disabled in BIOS.
- Skipping the profile export after recovery: If you don't export the profile now, the next CMOS event will cause the same problem.
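The first and last traps both come down to having a recent profile export on hand. A pre-maintenance export step might look like the sketch below, which wraps the same `racadm get -t xml -f` command used in the Fix section. The dated file-naming scheme, the `RACADM` variable, and the function name are hypothetical conventions, not an established standard. By default the script only prints the command.

```shell
# Sketch: export a Server Configuration Profile before hardware maintenance.
# RACADM defaults to "echo racadm" (dry run: prints the command only).
RACADM="${RACADM:-echo racadm}"

# Usage: export_scp HOSTNAME [DIRECTORY]
# Writes to DIRECTORY/HOSTNAME_bios_profile_YYYYMMDD.xml (hypothetical
# naming scheme; adjust to the local repository layout).
export_scp() {
    scp_host="$1"
    scp_dir="${2:-.}"
    scp_file="$scp_dir/${scp_host}_bios_profile_$(date +%Y%m%d).xml"
    $RACADM get -t xml -f "$scp_file"
}
```

Calling `export_scp srv-hyp-03 /tmp` as part of the maintenance checklist, then committing the resulting file, would leave a known-good profile to restore from.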