Firmware & BIOS - Street-Level Ops

Real-world firmware management and boot troubleshooting for datacenter servers.

Check boot mode: UEFI or legacy BIOS?

[ -d /sys/firmware/efi ] && echo "UEFI" || echo "BIOS"
# UEFI

# See UEFI boot entries
efibootmgr -v
# BootCurrent: 0001
# BootOrder: 0001,0002,0003
# Boot0001* ubuntu        HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
# Boot0002* UEFI: PXE IPv4 Intel(R) Ethernet
# Boot0003* UEFI: Built-in EFI Shell

Read BMC System Event Log after a crash

# Pull the last 20 events
ipmitool sel elist | tail -20
#    1 | 03/14/2024 | 02:15:33 | Memory #0x01 | Correctable ECC
#    2 | 03/14/2024 | 02:15:33 | Memory #0x01 | Correctable ECC
#    3 | 03/14/2024 | 02:17:01 | System Event  | OEM System Boot Event

# Filter for boot and memory issues
ipmitool sel elist | grep -iE "memory|boot|cpu|critical|post"
#    1 | 03/14/2024 | 02:15:33 | Memory #0x01 | Correctable ECC

# Clear the SEL after documenting findings
ipmitool sel clear

Gotcha: The SEL is a fixed-size log (typically 512-2048 entries, depending on the vendor). When it fills up, the BMC either stops recording new events or overwrites the oldest ones, depending on the implementation, and in both cases you lose data without warning. Export the SEL to a syslog server or monitoring system before it fills. Run ipmitool sel info to check the entry count and the percentage used.
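To catch a filling SEL before events are lost, a small check can read the Percent Used field from ipmitool sel info. This is a minimal sketch: the function names and the 80% threshold are illustrative, and it assumes the BMC reports a numeric Percent Used value (some report "unknown", which this sketch treats as "not full").

```shell
# Print the "Percent Used" value (digits only) from `ipmitool sel info` output on stdin.
sel_percent_used() {
  awk -F':' '/^Percent Used/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Return 0 and print a warning when usage is at or above the threshold (default 80).
sel_nearly_full() {
  local threshold="${1:-80}" used
  used="$(sel_percent_used)"
  if [ -n "$used" ] && [ "$used" -ge "$threshold" ]; then
    echo "SEL is ${used}% full; export it before clearing"
    return 0
  fi
  return 1
}

# Usage on a live host (not run here):
#   ipmitool sel info | sel_nearly_full 80 && ipmitool sel elist > "/var/tmp/sel-$(hostname).txt"
```

Run from cron or a monitoring agent, this gives you a chance to save the log before running ipmitool sel clear.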

Remote power control via IPMI

# Check power state
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret power status
# Chassis Power is on

# Hard power cycle a hung server
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret power cycle

# Difference: "power cycle" = off then on; "power reset" = immediate reboot
# Use "power off" then "power on" if cycle does not work

# Force PXE boot on next reboot (one-time)
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret chassis bootdev pxe
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret power reset
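Passing -P secret on the command line leaves the BMC password in shell history and in the process list. ipmitool's -E flag reads the password from the IPMI_PASSWORD environment variable instead. The wrapper below is a sketch; the function name bmc and the variables BMC_HOST and BMC_USER are names made up for this example.

```shell
# Hypothetical wrapper: BMC_HOST and BMC_USER are names chosen for this sketch.
# -E makes ipmitool read the password from the IPMI_PASSWORD environment variable.
bmc() {
  ipmitool -I lanplus -H "${BMC_HOST:?set BMC_HOST}" -U "${BMC_USER:?set BMC_USER}" -E "$@"
}

# Usage on an admin host (not run here; `read -s` is a bash feature):
#   export BMC_HOST=10.0.10.5 BMC_USER=admin
#   read -rs IPMI_PASSWORD && export IPMI_PASSWORD
#   bmc power status
#   bmc chassis bootdev pxe options=efiboot   # request a UEFI PXE boot on the next reboot
#   bmc power reset
```

The options=efiboot argument matters on UEFI hosts: without it, some BMCs request a legacy PXE boot that the firmware then ignores.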

Check sensor readings for hardware anomalies

# Temperature sensors
ipmitool sensor list | grep -i temp
# Inlet Temp       | 24.000     | degrees C  | ok    | 3.000  | 42.000
# Exhaust Temp     | 37.000     | degrees C  | ok    | 3.000  | 70.000
# CPU1 Temp        | 58.000     | degrees C  | ok    | 3.000  | 98.000

# Fan speeds (low RPM = failing fan)
ipmitool sensor list | grep -i fan
# Fan1             | 7200.000   | RPM        | ok    | 600    | na
# Fan2             | 7200.000   | RPM        | ok    | 600    | na

# BMC firmware version
ipmitool mc info | grep -i "firmware revision"
# Firmware Revision         : 2.82
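Eyeballing fan lines does not scale past a few hosts. The sketch below reads ipmitool sensor list output on stdin and prints any RPM sensor below a minimum speed. The function name low_fans and the 1000 RPM default are illustrative assumptions; only the first three pipe-separated fields are used, so any additional threshold columns are ignored.

```shell
# Print "<sensor> <rpm>" for every RPM sensor reading below the given minimum.
# $1 = minimum acceptable RPM (default 1000, an arbitrary example value).
low_fans() {
  awk -F'|' -v min="${1:-1000}" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      name = trim($1); value = trim($2); unit = trim($3)
      # Skip sensors with no numeric reading (for example "na").
      if (unit == "RPM" && value ~ /^[0-9.]+$/ && value + 0 < min + 0)
        printf "%s %d\n", name, value
    }'
}

# Usage on a live host (not run here):
#   ipmitool sensor list | low_fans 1000
```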

Firmware updates with fwupd

# List devices that fwupd can update
fwupdmgr get-devices
# Dell Inc. BIOS
#   Device ID:     ...
#   Current version: 2.18.1
#   Vendor:         Dell Inc.
#   Update State:   Success

# Check for available updates
fwupdmgr get-updates
# Dell Inc. BIOS has firmware updates:
#   Version: 2.19.0
#   Description: Security fixes for speculative execution

# Apply all available updates (requires reboot)
fwupdmgr update

Under the hood: fwupd uses the UEFI Capsule Update mechanism. It stages the firmware payload in the EFI System Partition and sets UEFI variables that tell the firmware to apply the update on the next boot. If the update fails mid-flight, platforms with dual-bank flash usually fall back to the previous firmware image.
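After the reboot, the firmware records the outcome of each capsule in the EFI System Resource Table, which Linux exposes under /sys/firmware/efi/esrt/entries (readable by root). The sketch below translates the last_attempt_status code of each entry into text; the function name is illustrative, and codes not listed in the case statement are printed as raw numbers.

```shell
# Print "<fw_class GUID> <status text>" for every ESRT entry.
# $1 = ESRT entries directory (defaults to the live sysfs path).
esrt_last_attempts() {
  local root="${1:-/sys/firmware/efi/esrt/entries}" entry code text
  for entry in "$root"/entry*; do
    [ -r "$entry/last_attempt_status" ] || continue
    code="$(cat "$entry/last_attempt_status")"
    case "$code" in
      0) text="success" ;;
      1) text="unsuccessful" ;;
      2) text="insufficient resources" ;;
      3) text="incorrect version" ;;
      4) text="invalid image format" ;;
      5) text="authentication error" ;;
      6) text="AC power not connected" ;;
      7) text="battery power too low" ;;
      *) text="status code $code" ;;
    esac
    echo "$(cat "$entry/fw_class") $text"
  done
}

# Usage on a live UEFI host, as root (not run here):
#   esrt_last_attempts
```

fwupdmgr get-history reports the same outcome from fwupd's side, which helps when the two disagree.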

Dell server fleet updates with DSU

# Inventory current firmware
dsu --inventory
# Component          Installed    Available
# BIOS               2.18.1       2.19.0
# iDRAC              6.10.00.00   6.10.30.00
# NIC (Broadcom)     22.31.5.5    22.41.6.1

# Apply all upgrades (requires maintenance window)
dsu --apply-upgrades --non-interactive

Boot log investigation

# Current boot log
journalctl -b 0 | head -50

# Previous boot (if the server rebooted unexpectedly)
journalctl -b -1 | tail -100

# List all recorded boots with timestamps
journalctl --list-boots
#  -2 abc123... Tue 2024-03-12 03:00:01 — Tue 2024-03-12 14:22:33
#  -1 def456... Tue 2024-03-12 14:23:01 — Thu 2024-03-14 02:15:00
#   0 789abc... Thu 2024-03-14 02:17:01 — Thu 2024-03-14 10:30:00

# Current kernel command line
cat /proc/cmdline
# BOOT_IMAGE=/vmlinuz-6.5.0-25 root=UUID=... ro quiet splash
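Scrolling through the previous boot's log by hand is slow. A filter for common crash signatures narrows it down. The pattern list below is an illustrative assumption, not an exhaustive set, and a hard hang or power loss often leaves nothing in the journal at all, in which case the BMC SEL is the better source.

```shell
# Filter journal text on stdin for common kernel crash signatures.
# The pattern list is an example; extend it for your hardware and drivers.
crash_hints() {
  grep -iE 'kernel panic|out of memory|hardware error|machine check|soft lockup|hard lockup|watchdog'
}

# Usage on a live host (not run here):
#   journalctl -k -b -1 --no-pager | crash_hints
```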

Secure Boot verification

# Check if Secure Boot is enforced
mokutil --sb-state
# SecureBoot enabled

# List enrolled keys
mokutil --list-enrolled | grep -i "issuer\|subject" | head -10

# Enroll a key for a custom kernel module (interactive at next boot)
mokutil --import /root/mok-signing-key.der
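The import step assumes a signing certificate already exists. One common way to create one is a self-signed key pair from openssl, sketched below. The function name, file names, key size, and 10-year validity are example choices, not requirements.

```shell
# Generate a self-signed key pair for kernel module signing.
# $1 = output directory, $2 = certificate common name (both illustrative).
make_mok_key() {
  local dir="${1:-.}" cn="${2:-Local module signing key}"
  # The private key is written in PEM; the certificate in DER, which mokutil expects.
  openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
    -subj "/CN=${cn}/" \
    -keyout "$dir/mok-signing-key.priv" \
    -outform DER -out "$dir/mok-signing-key.der"
  chmod 600 "$dir/mok-signing-key.priv"
}

# Usage (not run here; the sign-file path varies by distribution):
#   make_mok_key /root
#   mokutil --import /root/mok-signing-key.der
#   /usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 \
#       /root/mok-signing-key.priv /root/mok-signing-key.der my_module.ko
```

Keep the .priv file off the server once modules are signed; anyone holding it can sign code that the firmware will trust.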

Change UEFI boot order from Linux

# Current boot order
efibootmgr
# BootOrder: 0003,0001,0002

# Set disk boot first, PXE second
efibootmgr -o 0001,0002,0003

# Delete a stale boot entry
efibootmgr -b 0004 -B

# Add a new entry pointing to a custom EFI binary
efibootmgr -c -d /dev/sda -p 1 -L "Custom Linux" -l '\EFI\custom\grubx64.efi'
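Before reordering or deleting entries, it helps to see the boot order with labels resolved rather than bare numbers. The sketch below parses efibootmgr output from stdin; the function name is illustrative, and it assumes the device path, when printed, is separated from the label by a tab.

```shell
# Print "<entry id> <label>" in BootOrder sequence from efibootmgr output on stdin.
boot_order_labels() {
  awk '
    /^BootOrder:/ { n = split($2, order, ",") }
    /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ] / {
      id = substr($0, 5, 4)
      label = substr($0, 11)
      sub(/\t.*/, "", label)        # drop the device path, if present
      sub(/[ ]+$/, "", label)       # drop trailing spaces
      labels[id] = label
    }
    END { for (i = 1; i <= n; i++) print order[i], labels[order[i]] }
  '
}

# Usage on a live UEFI host (not run here):
#   efibootmgr | boot_order_labels
```

An ID that prints with an empty label is listed in BootOrder but has no matching entry, which usually marks it as a stale reference worth cleaning up.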

GRUB recovery after firmware update

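# Firmware update changed boot order, GRUB is missing
# Boot from rescue media in UEFI mode, then mount the installed system and chroot into it
# (/dev/sda2 = root, /dev/sda1 = EFI System Partition here; adjust to your layout):
mount /dev/sda2 /mnt
mount /dev/sda1 /mnt/boot/efi
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
# efivarfs is a separate mount that a plain --bind of /sys does not carry over;
# grub-install needs it to register the boot entry in NVRAM
mount --bind /sys/firmware/efi/efivars /mnt/sys/firmware/efi/efivars

chroot /mnt
grub-install --target=x86_64-efi --efi-directory=/boot/efi
grub-mkconfig -o /boot/grub/grub.cfg
exit
umount -R /mnt
reboot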

Serial-over-LAN for headless debugging

# Connect to serial console via IPMI SOL
ipmitool -I lanplus -H 10.0.10.5 -U admin -P secret sol activate

# You see POST output, GRUB menu, and kernel boot messages
# Press ~. to disconnect

# Enable console on the Linux side (add to kernel cmdline)
# console=tty0 console=ttyS1,115200n8

Debug clue: If SOL connects but shows garbled output, the baud rate is mismatched. The BMC, BIOS/UEFI, GRUB, and Linux kernel must all agree on the same baud rate (usually 115200). Check each layer independently: BMC serial config, BIOS/UEFI console redirection settings, GRUB serial command, and kernel console= parameter.
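Two of those layers can be compared from a shell on the host. The sketch below extracts the baud rate from the kernel command line and from GRUB_SERIAL_COMMAND in /etc/default/grub. The function names are illustrative, and the GRUB file location is an assumption that holds on Debian and Ubuntu style systems.

```shell
# Print the baud rate of the last console=ttyS* argument on the kernel command line.
# $1 = file to read (defaults to the running kernel's command line).
kernel_console_baud() {
  sed -n 's/.*console=ttyS[0-9]*,\([0-9][0-9]*\).*/\1/p' "${1:-/proc/cmdline}"
}

# Print the --speed value from GRUB_SERIAL_COMMAND.
# $1 = GRUB defaults file (path is an assumption; adjust for your distribution).
grub_serial_baud() {
  sed -n 's/^GRUB_SERIAL_COMMAND=.*--speed=\([0-9][0-9]*\).*/\1/p' "${1:-/etc/default/grub}"
}

# Usage on a live host (not run here):
#   [ "$(kernel_console_baud)" = "$(grub_serial_baud)" ] || echo "baud mismatch"
```

Empty output from either function means that layer has no serial console configured, which is a finding of its own.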

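Remember: The SOL disconnect sequence is tilde-dot (~.), the same as the SSH escape. If you reached ipmitool through an SSH session, type ~~. instead so that SSH passes the escape through rather than closing your own connection. If SOL hangs, press Enter first, then ~. to break out cleanly.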