Incident Replay: Serial Console Output Garbled

Setup

  • System context: Headless server in a remote datacenter. Primary access is via serial-over-LAN (SOL) through iDRAC. The server is not responding to SSH, and the SOL serial console shows garbled text.
  • Time: Saturday 01:30 UTC
  • Your role: On-call SRE (remote, no datacenter access)

Round 1: Alert Fires

[Pressure cue: "Server app-remote-03 is not responding to health checks. SSH times out. Only access is iDRAC serial console — but the output is garbage characters."]

What you see: Connecting via ipmitool sol activate shows streams of garbled characters — misaligned text, wrong symbols, line breaks in wrong places. You cannot read any output or type commands.

Choose your action:

- A) Reboot the server via iDRAC power control
- B) Check the serial console baud rate settings
- C) Try the iDRAC virtual console (graphical KVM) instead
- D) Reset the iDRAC BMC

If you chose B (recommended):

[Result: ipmitool sol info 1 shows the SOL baud rate is 19200, but the server's GRUB and kernel are configured for 115200 baud. The mismatch is causing the garbled output. Proceed to Round 2.]
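The comparison behind this result can be sketched in shell. This is a minimal sketch rather than a definitive procedure: `sol_rate_bps`, `kernel_console_rate`, and `rates_match` are hypothetical helper names, and the sketch assumes that `ipmitool sol info` prints a `Volatile Bit Rate (kbps)` line and that the kernel console sits on a ttyS port.

```shell
# Hypothetical helpers (not part of ipmitool) for comparing both ends of the link.

# Reads `ipmitool sol info 1` output on stdin and prints the volatile SOL
# bit rate in bits per second. ipmitool reports the rate in kbps (e.g. 19.2).
sol_rate_bps() {
    awk -F: '/^[[:space:]]*Volatile Bit Rate/ { printf "%.0f\n", $2 * 1000 }'
}

# Prints the baud rate from a console=ttySN,RATE argument found in the
# kernel command line passed as $1. Prints nothing if there is no such argument.
kernel_console_rate() {
    printf '%s\n' "$1" | sed -n 's/.*console=ttyS[0-9]*,\([0-9]*\).*/\1/p'
}

# Succeeds only when the first rate is non-empty and equal to the second.
rates_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

With the values from this incident, the SOL side yields 19200 and a `console=ttyS0,115200` argument yields 115200, so `rates_match` fails and flags the mismatch.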

If you chose A:

[Result: Server reboots but the serial console is still garbled — the baud rate mismatch persists. You cannot even see POST output correctly.]

If you chose C:

[Result: iDRAC graphical console shows the server is at a kernel panic screen. Useful — you can now see the actual problem. But you still need serial console for future headless access. Partial win.]

If you chose D:

[Result: iDRAC reset takes 3 minutes. Serial console is still garbled after reset because the baud rate setting is persistent.]

Round 2: First Triage Data

[Pressure cue: "You can see via KVM that the server had a kernel panic. But you need serial console working for ongoing management."]

What you see: A recent iDRAC firmware update reset the SOL baud rate to 19200 (the default). The OS side still uses 115200, both on the kernel command line that GRUB passes (console=ttyS0,115200) and in /etc/default/grub.
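For orientation, the OS side of this setting usually lives in /etc/default/grub, which uses shell variable syntax. The fragment below is a sketch of a common layout, assuming the first serial port (ttyS0, GRUB unit 0); the actual contents of this file on app-remote-03 are not part of the incident data.

```shell
# /etc/default/grub (illustrative fragment; values are assumed, not taken from the server)

# Show the GRUB menu on both the serial port and the local console.
GRUB_TERMINAL="serial console"

# Speed GRUB itself uses on the serial port.
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"

# Speed the kernel uses on the serial port once it takes over from GRUB.
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200"
```

GRUB and the kernel each carry their own speed setting, so both must agree with the SOL rate configured on the BMC.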

Choose your action:

- A) Change the SOL baud rate to 115200 via ipmitool
- B) Change the OS serial console config to 19200 to match the BMC
- C) Set both sides to 9600 for maximum compatibility
- D) Disable serial console and rely on iDRAC KVM only

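If you chose A (recommended):

[Result: Run ipmitool sol set volatile-bit-rate 115.2 1 and ipmitool sol set non-volatile-bit-rate 115.2 1. ipmitool takes the bit rate in kbps, so 115200 baud is written as 115.2. Serial console output is now readable. You can see the kernel panic message. Proceed to Round 3.]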
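Because ipmitool's SOL bit-rate parameters accept kbps tokens (9.6, 19.2, 38.4, 57.6, 115.2) rather than raw baud values, a small translation step helps avoid typos at 01:30. The sketch below uses hypothetical helper names (`sol_rate_token`, `sol_set_commands`) and only prints the commands for review instead of running them.

```shell
# Hypothetical helper: map a baud rate in bits per second to the kbps token
# accepted by `ipmitool sol set volatile-bit-rate` / `non-volatile-bit-rate`.
sol_rate_token() {
    case "$1" in
        9600)   echo 9.6 ;;
        19200)  echo 19.2 ;;
        38400)  echo 38.4 ;;
        57600)  echo 57.6 ;;
        115200) echo 115.2 ;;
        *)      echo "unsupported baud rate: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the commands that set both the volatile and the
# non-volatile rate. $1 is the baud rate, $2 the channel (defaults to 1).
sol_set_commands() {
    token=$(sol_rate_token "$1") || return 1
    channel=${2:-1}
    echo "ipmitool sol set volatile-bit-rate $token $channel"
    echo "ipmitool sol set non-volatile-bit-rate $token $channel"
}
```

Calling `sol_set_commands 115200 1` prints the two commands for channel 1. Setting the non-volatile rate as well is what makes the fix survive a BMC reset.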

If you chose B:

[Result: Requires booting the OS to change the config, but the OS has panicked. Chicken-and-egg problem.]

If you chose C:

[Result: 9600 is slow and unnecessary. 115200 is the standard for modern servers.]

If you chose D:

[Result: KVM works but serial console is needed for boot-level debugging and automation. Do not give up a management tool.]

Round 3: Root Cause Identification

[Pressure cue: "Serial console is fixed. Now deal with the kernel panic."]

What you see: The kernel panic message reads: "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)". The kernel cannot find the root filesystem. The panic appeared after a kernel update that did not rebuild the initramfs with the RAID driver, so the new kernel has no way to reach the disks behind the RAID controller.

Choose your action:

- A) Boot from the previous kernel version via GRUB
- B) Boot from a rescue image and rebuild initramfs
- C) Reinstall the OS from PXE
- D) Check if the RAID array is degraded via iDRAC hardware logs

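If you chose A (recommended):

[Result: Via serial console, interrupt GRUB and select the previous kernel. The server boots successfully. Then rebuild the initramfs for the new kernel. Pass the new kernel's version explicitly, because uname -r now reports the old kernel you booted: dracut --force /boot/initramfs-<new-kernel-version>.img <new-kernel-version>. Proceed to Round 4.]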
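When the server is running the previous kernel, `uname -r` names that old kernel, so the version of the broken kernel has to come from somewhere else. The sketch below is one way to derive it, assuming the broken kernel is the newest version listed under /lib/modules and that images follow the /boot/initramfs-VERSION.img naming used above. `newest_other_kernel` and `dracut_command_for` are hypothetical helper names, and the version sort relies on GNU `sort -V`.

```shell
# Hypothetical helper: read kernel versions on stdin (one per line, e.g. from
# `ls /lib/modules`) and print the newest one that is not the running
# version given as $1.
newest_other_kernel() {
    grep -Fxv "$1" | sort -V | tail -n 1
}

# Print (do not run) the dracut command that rebuilds the initramfs for the
# kernel version in $1. The version appears twice: once in the image path and
# once as dracut's explicit kernel-version argument.
dracut_command_for() {
    echo "dracut --force /boot/initramfs-$1.img $1"
}
```

On the recovered server this could be driven with `ls /lib/modules | newest_other_kernel "$(uname -r)"`, then reviewing the printed dracut command before running it.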

If you chose B:

[Result: Works but requires PXE or USB rescue boot. More complex when you only have remote access.]

If you chose C:

[Result: Overkill — the server is fine, just needs the right kernel or initramfs fix.]

If you chose D:

[Result: RAID is healthy — the issue is the initramfs missing the RAID driver for the new kernel. Hardware is fine.]

Round 4: Remediation

[Pressure cue: "Server is back. Prevent recurrence."]

Actions:

1. Verify server is running and services are healthy
2. Rebuild initramfs for the new kernel and reboot to verify
3. Add SOL baud rate to the post-firmware-update verification checklist
4. Add initramfs rebuild verification to the kernel update automation
5. Document the serial console configuration standard (115200 baud) for all servers
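The initramfs verification in action 4 could look roughly like the check below. It is a sketch under assumptions: `initramfs_has_driver` is a hypothetical helper, the listing is expected to come from dracut's `lsinitrd`, and `megaraid_sas` in the usage note is only an example driver name, since the incident does not say which RAID driver the server needs.

```shell
# Hypothetical check for kernel-update automation. Reads an initramfs file
# listing on stdin (for example from `lsinitrd IMAGE`) and succeeds only if a
# kernel module named $1 is present, either plain (.ko) or compressed
# (.ko.xz, .ko.gz, .ko.zst).
initramfs_has_driver() {
    grep -Eq "/$1\.ko(\.(xz|gz|zst))?\$"
}
```

A post-update hook might run `lsinitrd /boot/initramfs-<new-kernel-version>.img | initramfs_has_driver megaraid_sas` and refuse to schedule the reboot when the check fails.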

Damage Report

  • Total downtime: 45 minutes (kernel panic + diagnosis + recovery)
  • Blast radius: Single application server; traffic rerouted to other instances
  • Optimal resolution time: 15 minutes (fix baud rate -> read panic -> boot old kernel)
  • If every wrong choice was made: 90+ minutes with blind troubleshooting and unnecessary reinstalls

Cross-References