Firmware & BIOS - Street-Level Ops
Real-world firmware management and boot troubleshooting for datacenter servers.
Check boot mode: UEFI or legacy BIOS?
[ -d /sys/firmware/efi ] && echo "UEFI" || echo "BIOS"
# UEFI
# See UEFI boot entries
efibootmgr -v
# BootCurrent: 0001
# BootOrder: 0001,0002,0003
# Boot0001* ubuntu HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
# Boot0002* UEFI: PXE IPv4 Intel(R) Ethernet
# Boot0003* UEFI: Built-in EFI Shell
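After a firmware update reshuffles the boot entries, BootCurrent tells you which entry the machine actually booted from. The helper below is a minimal sketch that reads `efibootmgr` output in the format shown above and prints that entry's line. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# boot_current_entry (hypothetical helper): print the Boot#### line that matches BootCurrent.
# Reads efibootmgr output on stdin. efibootmgr prints BootCurrent before the entries,
# and active entries carry a trailing "*" (e.g. "Boot0001*").
boot_current_entry() {
  awk '
    /^BootCurrent:/ { cur = "Boot" $2 }
    $1 == cur || $1 == cur "*" { print; exit }
  '
}
# Usage: efibootmgr -v | boot_current_entry
```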
Read BMC System Event Log after a crash
# Pull the last 20 events
ipmitool sel elist | tail -20
# 1 | 03/14/2024 | 02:15:33 | Memory #0x01 | Correctable ECC
# 2 | 03/14/2024 | 02:15:33 | Memory #0x01 | Correctable ECC
# 3 | 03/14/2024 | 02:17:01 | System Event | OEM System Boot Event
# Filter for boot and memory issues
ipmitool sel elist | grep -iE "memory|boot|cpu|critical|post"
# 1 | 03/14/2024 | 02:15:33 | Memory #0x01 | Correctable ECC
# Clear the SEL after documenting findings
ipmitool sel clear
Gotcha: The SEL is a fixed-size log (typically 512-2048 entries depending on vendor). When it fills up, some BMCs stop recording and set an overflow flag, while others silently overwrite the oldest entries. Either way, events are lost. Export the SEL to a syslog server or monitoring system before it fills. Run `ipmitool sel info` to check the entry count, free space, and overflow flag.
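A sketch of an automated fill check, e.g. for a cron job. It assumes the "Percent Used : NN%" and "Overflow : true|false" lines that `ipmitool sel info` prints; other BMC tools format this differently. The function name and the 80% default threshold are hypothetical.

```bash
#!/usr/bin/env bash
# sel_usage_check (hypothetical): warn when the SEL is nearly full or has overflowed.
# Reads `ipmitool sel info` output on stdin; $1 = warning threshold in percent.
# Returns 0 if healthy, 1 if a warning was printed.
sel_usage_check() {
  local threshold=${1:-80}
  awk -F: -v max="$threshold" '
    /^Percent Used/ { gsub(/[ %]/, "", $2); used = $2 + 0 }
    /^Overflow/     { gsub(/ /, "", $2); overflow = $2 }
    END {
      if (overflow == "true") { print "WARN: SEL overflow flag set, events were lost"; exit 1 }
      if (used >= max)        { print "WARN: SEL " used "% used, export and clear it"; exit 1 }
      print "OK: SEL " used "% used"
    }
  '
}
# Usage: ipmitool sel info | sel_usage_check 80 || ipmitool sel elist > "/var/tmp/sel-$(hostname).txt"
```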
Remote power control via IPMI
# Check power state
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret power status
# Chassis Power is on
# Hard power cycle a hung server
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret power cycle
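# Difference: "power cycle" = power off, pause of at least 1 second, power on; "power reset" = hard reset without removing power
# If "power cycle" is rejected (for example because the chassis is already off), use "power off" then "power on"
# Force PXE boot on the next reboot only (ipmitool's default; on UEFI systems some BMCs also need options=efiboot)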
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret chassis bootdev pxe
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret power reset
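Passing `-P secret` exposes the BMC password to anyone who can run `ps` and leaves it in shell history. ipmitool's `-E` flag reads the password from the `IPMI_PASSWORD` environment variable instead. The wrapper below is a minimal sketch of that approach. `bmc`, `pxe_reboot`, `IPMI_USER`, and `DRY_RUN` are hypothetical names, not part of ipmitool.

```bash
#!/usr/bin/env bash
# bmc (hypothetical): run ipmitool against a host with the password taken from
# $IPMI_PASSWORD via -E, so it never appears on the command line.
# IPMI_USER is this sketch's own variable (default "admin"); DRY_RUN=1 only prints the command.
bmc() {
  local host=$1; shift
  local cmd=(ipmitool -I lanplus -H "$host" -U "${IPMI_USER:-admin}" -E "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# pxe_reboot (hypothetical): one-time PXE boot followed by a hard reset
pxe_reboot() {
  bmc "$1" chassis bootdev pxe || return 1
  bmc "$1" power reset
}
# Usage: read -rs IPMI_PASSWORD; export IPMI_PASSWORD; pxe_reboot 10.0.10.5
```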
Check sensor readings for hardware anomalies
# Temperature sensors
ipmitool sensor list | grep -i temp
# Inlet Temp | 24.000 | degrees C | ok | 3.000 | 42.000
# Exhaust Temp | 37.000 | degrees C | ok | 3.000 | 70.000
# CPU1 Temp | 58.000 | degrees C | ok | 3.000 | 98.000
# Fan speeds (low RPM = failing fan)
ipmitool sensor list | grep -i fan
# Fan1 | 7200.000 | RPM | ok | 600 | na
# Fan2 | 7200.000 | RPM | ok | 600 | na
# BMC firmware version
ipmitool mc info | grep -i "firmware revision"
# Firmware Revision : 2.82
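Grepping by sensor type works interactively. For monitoring, it is usually more useful to flag any threshold sensor whose status column is not "ok". ipmitool uses "nc", "cr", and "nr" for the non-critical, critical, and non-recoverable states. This sketch assumes the "|"-separated `ipmitool sensor list` layout. It skips discrete sensors, which report a raw hex state in the status column, and sensors with no reading ("na"). The function name is hypothetical.

```bash
#!/usr/bin/env bash
# sensor_alerts (hypothetical): print threshold sensors whose status is not "ok".
# Reads `ipmitool sensor list` on stdin: name | reading | unit | status | thresholds...
sensor_alerts() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      name = trim($1); reading = trim($2); unit = trim($3); status = trim($4)
      if (unit == "discrete") next                 # hex state, not ok/nc/cr/nr
      if (status == "ok" || status == "na") next   # healthy, or no reading (e.g. empty slot)
      printf "%s: %s %s [%s]\n", name, reading, unit, status
    }
  '
}
# Usage: ipmitool sensor list | sensor_alerts
```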
Firmware updates with fwupd
# List devices that fwupd can update
fwupdmgr get-devices
# Dell Inc. BIOS
# Device ID: ...
# Current version: 2.18.1
# Vendor: Dell Inc.
# Update State: Success
# Check for available updates (run fwupdmgr refresh first to download current metadata)
fwupdmgr get-updates
# Dell Inc. BIOS has firmware updates:
# Version: 2.19.0
# Description: Security fixes for speculative execution
# Apply all available updates (requires reboot)
fwupdmgr update
Under the hood: For system firmware, fwupd uses the UEFI Capsule Update mechanism. It stages the firmware payload in the EFI System Partition and sets UEFI variables that tell the firmware to apply the update on the next boot. If the update fails mid-flight, servers with dual-bank (A/B) firmware flash can usually fall back to the previous image. Confirm that your hardware has this before counting on it.
Dell server fleet updates with DSU
# Inventory current firmware
dsu --inventory
# Component Installed Available
# BIOS 2.18.1 2.19.0
# iDRAC 6.10.00.00 6.10.30.00
# NIC (Broadcom) 22.31.5.5 22.41.6.1
# Apply all upgrades (requires maintenance window)
dsu --apply-upgrades --non-interactive
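Before opening a maintenance window, it can help to see which components would actually change. The sketch below compares installed and available versions with GNU `sort -V`, which orders dotted versions numerically. It assumes input in the simplified "component installed available" layout shown above. Real DSU output is formatted differently, so adapt the field extraction to your version. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# pending_updates (hypothetical): list components whose available version is newer.
# Input lines: "<component name...> <installed> <available>" (whitespace-separated).
pending_updates() {
  local -a f
  local n name installed available newest
  while read -r -a f; do
    n=${#f[@]}
    (( n >= 3 )) || continue
    installed=${f[n-2]}
    available=${f[n-1]}
    [[ $available == [0-9]* ]] || continue    # skips the header row
    [ "$installed" = "$available" ] && continue
    name="${f[*]:0:n-2}"                      # component names may contain spaces
    # GNU sort -V: 2.9 sorts before 2.18, unlike a plain string sort
    newest=$(printf '%s\n%s\n' "$installed" "$available" | sort -V | tail -n1)
    if [ "$newest" = "$available" ]; then
      printf '%s: %s -> %s\n' "$name" "$installed" "$available"
    fi
  done
}
```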
Boot log investigation
# Current boot log
journalctl -b 0 | head -50
# Previous boot (if the server rebooted unexpectedly; requires a persistent journal, i.e. /var/log/journal exists)
journalctl -b -1 | tail -100
# List all recorded boots with timestamps
journalctl --list-boots
# -2 abc123... Tue 2024-03-12 03:00:01 — Tue 2024-03-12 14:22:33
# -1 def456... Tue 2024-03-12 14:23:01 — Thu 2024-03-14 02:15:00
# 0 789abc... Thu 2024-03-14 02:17:01 — Thu 2024-03-14 10:30:00
# Current kernel command line
cat /proc/cmdline
# BOOT_IMAGE=/vmlinuz-6.5.0-25 root=UUID=... ro quiet splash
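`journalctl -b -1` only helps if the previous boot's journal survived the reboot. With journald's default `Storage=auto`, logs are written to disk only when `/var/log/journal` exists. Otherwise they live in `/run/log/journal` (tmpfs) and vanish at reboot. A minimal sketch of checking that first, assuming the default storage setting; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# journal_is_persistent (hypothetical): true if the on-disk journal directory exists.
# Assumes journald's default Storage=auto.
journal_is_persistent() {
  [ -d "${1:-/var/log/journal}" ]
}

# prev_boot_errors (hypothetical): messages at priority "err" or worse from the previous boot
prev_boot_errors() {
  if ! journal_is_persistent; then
    echo "No persistent journal; enable it with:" >&2
    echo "  mkdir -p /var/log/journal && systemctl restart systemd-journald" >&2
    return 1
  fi
  journalctl -b -1 -p err --no-pager
}
```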
Secure Boot verification
# Check if Secure Boot is enforced
mokutil --sb-state
# SecureBoot enabled
# List enrolled keys
mokutil --list-enrolled | grep -i "issuer\|subject" | head -10
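# Enroll a key for a custom kernel module: prompts for a one-time password now,
# then confirm the enrollment in MokManager at the next boot (needs console access, e.g. SOL)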
mokutil --import /root/mok-signing-key.der
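The DER certificate passed to `mokutil --import` has to come from somewhere. A common approach is a self-signed key pair generated with OpenSSL, shown here as a sketch. The function name, the CN, and the 10-year lifetime are placeholders. The private key stays on the build host, and modules are then signed with the kernel's `scripts/sign-file` helper (`sign-file sha256 <key> <cert.der> <module.ko>`), whose location varies by distribution.

```bash
#!/usr/bin/env bash
# make_mok_key (hypothetical): create an RSA-2048 key plus a self-signed DER certificate
# for kernel module signing. The file names match the mokutil example above.
make_mok_key() {
  local outdir=${1:-/root}
  openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
    -subj "/CN=Example module signing key" \
    -keyout "$outdir/mok-signing-key.priv" \
    -outform DER -out "$outdir/mok-signing-key.der" || return 1
  chmod 600 "$outdir/mok-signing-key.priv"   # keep the private key readable by root only
}
# Usage: make_mok_key /root && mokutil --import /root/mok-signing-key.der
```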
Change UEFI boot order from Linux
# Current boot order
efibootmgr
# BootOrder: 0003,0001,0002
# Set disk boot first, PXE second
efibootmgr -o 0001,0002,0003
# Delete a stale boot entry
efibootmgr -b 0004 -B
# Add a new entry pointing to a custom EFI binary
efibootmgr -c -d /dev/sda -p 1 -L "Custom Linux" -l '\EFI\custom\grubx64.efi'
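`efibootmgr -o` replaces the whole BootOrder. Any entry left out of the list is no longer tried at boot, although its Boot#### variable still exists. A safer pattern is to move one entry to the front and keep the rest in their existing order. This sketch reads `efibootmgr` output and prints the new order. It does not verify that the entry exists, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# boot_order_with_first (hypothetical): print BootOrder with entry $1 moved to the front,
# preserving the relative order of all other entries. Reads efibootmgr output on stdin.
boot_order_with_first() {
  awk -v first="$1" '
    /^BootOrder:/ {
      n = split($2, ids, ",")
      out = first
      for (i = 1; i <= n; i++)
        if (ids[i] != first) out = out "," ids[i]
      print out
    }
  '
}
# Usage: efibootmgr -o "$(efibootmgr | boot_order_with_first 0001)"
```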
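GRUB recovery after firmware update
# A firmware update reset or reordered the UEFI boot entries and the server no longer reaches GRUB
# Boot rescue media in UEFI mode, mount the installed system, then chroot into it
# (example devices: /dev/sda2 = root filesystem, /dev/sda1 = EFI System Partition)
mount /dev/sda2 /mnt
mount /dev/sda1 /mnt/boot/efi
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
# A plain bind of /sys does not include efivarfs, which grub-install needs to write the boot entry
mount -t efivarfs efivarfs /sys/firmware/efi/efivars
# Reinstall GRUB to the ESP and recreate its UEFI boot entry
grub-install --target=x86_64-efi --efi-directory=/boot/efi
# Regenerate the config (RHEL-family systems use grub2-mkconfig instead)
grub-mkconfig -o /boot/grub/grub.cfg
exit
reboot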
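If you are not sure which partition is the ESP on the rescue system, the GPT partition type GUID identifies it: `c12a7328-f81f-11d2-ba4b-00a0c93ec93b` for GPT, or type `0xef` on MBR disks. A minimal sketch that filters `lsblk -rno NAME,PARTTYPE` output; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# find_esp (hypothetical): print device paths of EFI System Partitions.
# Reads `lsblk -rno NAME,PARTTYPE` (raw, no header) on stdin.
find_esp() {
  awk 'tolower($2) == "c12a7328-f81f-11d2-ba4b-00a0c93ec93b" || tolower($2) == "0xef" {
    print "/dev/" $1
  }'
}
# Usage from the rescue environment: lsblk -rno NAME,PARTTYPE | find_esp
```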
Serial-over-LAN for headless debugging
# Connect to serial console via IPMI SOL
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret sol activate
# You see POST output, GRUB menu, and kernel boot messages
# Press ~. to disconnect
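# Enable the serial console on the Linux side (append to the kernel command line)
# console=tty0 console=ttyS1,115200n8
# The last console= entry becomes /dev/console; ttyS1 (COM2) is a common SOL port, but check which port your BMC redirects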
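Debug clue: If SOL connects but shows garbled output, the baud rate is most likely mismatched. The BMC, BIOS/UEFI, GRUB, and Linux kernel must all agree on the same baud rate (usually 115200). Check each layer independently: the BMC serial config, the GRUB `serial` command, and the kernel `console=` parameter.
Remember: The SOL disconnect sequence is tilde-dot (`~.`), the same as the SSH escape. If SOL hangs, press Enter first, then `~.` to break out cleanly. When ipmitool itself runs inside an SSH session, the SSH client intercepts `~.` and drops the SSH connection instead. Type `~~.` to pass the escape through, or pick a different SOL escape character with ipmitool's `-e` option.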
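Two of those layers can be compared from inside the OS. This sketch extracts the speed from GRUB's serial command and the baud rate from the last `console=ttyS*` entry on the kernel command line. It assumes a Debian/Ubuntu-style `/etc/default/grub` with a line such as `GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"`. The BMC and BIOS/UEFI settings still have to be checked separately, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# check_serial_baud (hypothetical): compare GRUB's serial speed with the kernel console baud.
# $1 = GRUB defaults file, $2 = kernel command line file (overridable for testing).
check_serial_baud() {
  local grub_default=${1:-/etc/default/grub} cmdline=${2:-/proc/cmdline}
  local grub_baud kernel_baud
  # Ignore commented-out lines, then take the first --speed= value
  grub_baud=$(grep -v '^[[:space:]]*#' "$grub_default" | grep -o -- '--speed=[0-9]*' | head -n1 | cut -d= -f2)
  # The last console= on the command line is the primary console
  kernel_baud=$(grep -o 'console=ttyS[0-9]*,[0-9]*' "$cmdline" | tail -n1 | cut -d, -f2)
  if [ -z "$grub_baud" ] || [ -z "$kernel_baud" ]; then
    echo "MISSING: grub=${grub_baud:-none} kernel=${kernel_baud:-none}"
    return 1
  fi
  if [ "$grub_baud" != "$kernel_baud" ]; then
    echo "MISMATCH: grub=$grub_baud kernel=$kernel_baud"
    return 1
  fi
  echo "OK: $grub_baud"
}
```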