Linux Boot Process — Street Ops

Real-world operational scenarios for boot problems. These are the situations you'll face when a server won't come back up after a reboot, a kernel update goes sideways, or someone fat-fingers an fstab entry.


Recovering from Failed Boot: Rescue and Single-User Mode

Scenario: Server won't boot after a change, you need to fix it

Method 1: GRUB menu rescue (most common)

  1. At the GRUB menu, highlight the default entry and press e
  2. Find the line that starts with linux (linux16 or linuxefi on some older RHEL/CentOS releases)
  3. Append systemd.unit=rescue.target to the end of that line, or replace quiet splash with single
  4. Press Ctrl+X to boot
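# Rescue mode gives you a root shell with local filesystems mounted
# You'll be prompted for the root password

# If the normal mounts are what is failing, use emergency mode instead:
# Append: systemd.unit=emergency.target
# Emergency mode mounts only the root filesystem (read-only) and starts no services
# Most distros still ask for the root password here; if it is unknown,
# use init=/bin/bash (Method 2) or rd.break (Method 3)
# Remount root read-write before editing anything:
$ mount -o remount,rw /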

Method 2: init=/bin/bash (bypass init entirely)

Append init=/bin/bash to the kernel command line in GRUB. This drops you to a bash shell as PID 1, before any services start:

# Root filesystem is read-only. Remount:
$ mount -o remount,rw /

# Make your fix (edit fstab, fix config, etc.)
$ vim /etc/fstab

# Sync and reboot
$ sync
$ reboot -f    # Force reboot (normal reboot won't work without init)

Method 3: rd.break (break into initramfs)

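Append rd.break to the kernel line (dracut-based distros such as RHEL, CentOS, and Fedora). Boot stops inside the initramfs after the real root has been mounted at /sysroot, but before control is handed over to it: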

# Real root is mounted at /sysroot (read-only)
switch_root:/# mount -o remount,rw /sysroot
switch_root:/# chroot /sysroot
sh-5.1# passwd root    # Reset root password
sh-5.1# touch /.autorelabel   # If SELinux is enabled (RHEL)
sh-5.1# exit
switch_root:/# exit    # Leave the initramfs shell and continue booting; or force a restart with reboot -f

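This is the standard way to reset a lost root password on RHEL/CentOS. With SELinux enabled, do not skip the /.autorelabel step: passwd run from the initramfs rewrites /etc/shadow with the wrong security context, and logins keep failing until the filesystem is relabeled on the next boot.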


GRUB Repair

Scenario: GRUB is broken or missing — system drops to grub> or grub rescue>

If you get grub> prompt (full GRUB shell):

# List available partitions
grub> ls
(hd0) (hd0,gpt1) (hd0,gpt2) (hd0,gpt3)

# Find which partition has /boot
grub> ls (hd0,gpt2)/
boot/ etc/ home/ ...

# Set root and boot manually
grub> set root=(hd0,gpt2)
grub> linux /boot/vmlinuz-5.15.0-91-generic root=/dev/sda2
grub> initrd /boot/initrd.img-5.15.0-91-generic
grub> boot

If you get grub rescue> prompt (minimal shell, modules not loaded):

# Find the partition with GRUB modules
grub rescue> ls (hd0,gpt2)/boot/grub/
grub rescue> set prefix=(hd0,gpt2)/boot/grub
grub rescue> set root=(hd0,gpt2)
grub rescue> insmod normal
grub rescue> normal
# This should bring up the GRUB menu

Full GRUB reinstall from live USB:

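# Boot from live USB, mount the system
# (if /boot is a separate partition, mount it at /mnt/boot before the ESP)
$ sudo mount /dev/sda2 /mnt
$ sudo mount /dev/sda1 /mnt/boot/efi   # If UEFI
$ sudo mount --bind /dev /mnt/dev
$ sudo mount --bind /proc /mnt/proc
$ sudo mount --bind /sys /mnt/sys
$ sudo mount -t efivarfs efivarfs /mnt/sys/firmware/efi/efivars   # If UEFI: lets grub-install write boot entries
$ sudo mount --bind /run /mnt/run       # udev and LVM runtime state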

$ sudo chroot /mnt

# Reinstall GRUB (Debian/Ubuntu command names; RHEL uses grub2-install and grub2-mkconfig -o /boot/grub2/grub.cfg)
# For BIOS:
$ grub-install /dev/sda
$ update-grub

# For UEFI:
$ grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu
$ update-grub

$ exit
$ sudo umount -R /mnt
$ sudo reboot

initramfs Regeneration

Scenario: Boot fails because initramfs is missing drivers or is corrupt

Symptoms: kernel panic with "Unable to mount root fs," "VFS: Cannot open root device," or "no working init found."

Fix from rescue mode or live USB:

# After chrooting into the system (see GRUB repair steps)

# Debian/Ubuntu:
$ update-initramfs -u -k $(uname -r)

# In a live-USB chroot, uname -r reports the live system's kernel, not the installed one.
# List the installed versions and pass the right one explicitly:
$ ls /lib/modules/
5.15.0-91-generic  5.15.0-92-generic
$ update-initramfs -u -k 5.15.0-92-generic

# RHEL/CentOS (dracut):
$ dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

# Verbose mode to see what's included:
$ dracut -fv /boot/initramfs-5.15.0-92-generic.img 5.15.0-92-generic 2>&1 | tee /tmp/dracut.log

# Force-include specific kernel drivers (e.g., if the RAID driver is missing):
$ dracut -f --add-drivers "megaraid_sas" /boot/initramfs-$(uname -r).img $(uname -r)
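
Before and after a rebuild, it helps to confirm the driver is actually inside the image. The sketch below is one way to do that, not part of the original procedure: initramfs_has_module is a hypothetical helper name, and it assumes the module file is named exactly after the module (some drivers use hyphens on disk where the module name has underscores). It reads a file listing on stdin, so it works with lsinitramfs (Debian/Ubuntu) or lsinitrd (RHEL).

```shell
# Hypothetical helper: succeed if a file listing on stdin contains the named
# kernel module, plain (.ko) or compressed (.ko.xz, .ko.zst, .ko.gz).
initramfs_has_module() {
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}

# Example usage (commented out so sourcing this file has no side effects):
#   lsinitramfs /boot/initrd.img-5.15.0-92-generic | initramfs_has_module megaraid_sas \
#       && echo "driver present" || echo "driver MISSING"
#   lsinitrd /boot/initramfs-5.15.0-92-generic.img | initramfs_has_module megaraid_sas
```

If the check fails after a rebuild, the driver is probably not installed for that kernel version at all; look under /lib/modules/<version>/ before blaming the initramfs tooling.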

Scenario: initramfs was accidentally deleted

# If /boot/initrd.img-5.15.0-91-generic is gone:

# From rescue mode or live USB chroot:
$ update-initramfs -c -k 5.15.0-91-generic   # Create new (not update)

# On RHEL:
$ dracut /boot/initramfs-5.15.0-91-generic.img 5.15.0-91-generic

Boot Performance Analysis

Scenario: System takes 90 seconds to boot, need to find the bottleneck

# Overall boot time breakdown
$ systemd-analyze
Startup finished in 3.2s (firmware) + 1.5s (loader) + 4.8s (kernel) + 82.1s (userspace) = 91.6s
multi-user.target reached after 82.1s in userspace.

# Clearly the problem is in userspace. Find the culprits:
$ systemd-analyze blame | head -15
         65.234s NetworkManager-wait-online.service
          8.123s snapd.service
          3.456s plymouth-quit-wait.service
          2.345s docker.service
          1.234s dev-sda2.device
          ...

# The critical chain shows dependencies:
$ systemd-analyze critical-chain
multi-user.target @82.1s
└─NetworkManager-wait-online.service @16.8s +65.2s
  └─NetworkManager.service @14.5s +2.3s
    └─dbus.service @12.1s +0.4s
      └─basic.target @12.0s
        └─sockets.target @12.0s

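# NetworkManager-wait-online is the bottleneck.
# If not needed (server with static IP), disable it. Units ordered after
# network-online.target (NFS mounts, some daemons) will then no longer wait for the network:
$ sudo systemctl disable NetworkManager-wait-online.service

# Or reduce timeout with a drop-in (the empty ExecStart= clears the original command):
$ sudo mkdir -p /etc/systemd/system/NetworkManager-wait-online.service.d/
$ cat <<EOF | sudo tee /etc/systemd/system/NetworkManager-wait-online.service.d/timeout.conf
[Service]
ExecStart=
ExecStart=/usr/bin/nm-online -s -q --timeout=10
EOF
$ sudo systemctl daemon-reload   # Pick up the new drop-in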

# Generate SVG boot chart for detailed visualization
$ systemd-analyze plot > /tmp/boot-chart.svg
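
On a host with hundreds of units, eyeballing the blame list gets tedious. As a rough sketch (slow_units is a hypothetical helper name, and the 5-second default is an arbitrary assumption), the filter below reads systemd-analyze blame output on stdin and prints only the units at or above a threshold, with the time normalized to seconds. It handles the min, s, and ms suffixes and ignores anything else.

```shell
# Hypothetical helper: filter `systemd-analyze blame` output (stdin) down to
# units that took at least N seconds (default 5). Prints "<seconds> <unit>".
slow_units() {
    awk -v limit="${1:-5}" '
    {
        secs = 0
        # Every field except the last is a time component such as 1min, 5.234s, 120ms.
        # Multiplying or dividing makes awk use the leading number and drop the suffix.
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9.]+min$/)     secs += $i * 60
            else if ($i ~ /^[0-9.]+ms$/) secs += $i / 1000
            else if ($i ~ /^[0-9.]+s$/)  secs += $i * 1
        }
        if (secs >= limit) printf "%.1f %s\n", secs, $NF
    }'
}

# Example usage (commented out so sourcing this file has no side effects):
#   systemd-analyze blame | slow_units 5
```

Keep in mind that blame reports how long each unit took to start, not how long it delayed boot; units that start in parallel overlap, so cross-check anything it flags against critical-chain.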

Kernel Panic Troubleshooting

Scenario: System shows kernel panic during boot

Common kernel panic messages and their causes:

"VFS: Unable to mount root fs on unknown-block(0,0)" - Root device not found. Wrong root= parameter, missing storage driver in initramfs, or hardware failure.

# Fix: verify root device at GRUB shell
grub> ls (hd0,gpt2)/
# If this works, the partition exists

# Check root= parameter matches
grub> cat (hd0,gpt2)/etc/fstab
# Find the UUID of the root partition
# Update the linux line with correct root=UUID=...

"Kernel panic - not syncing: No working init found" - The kernel tried /sbin/init, /etc/init, /bin/init, and /bin/sh and could not run any of them. This usually means the initramfs is corrupt or the root filesystem is damaged.

# Boot with init=/bin/bash to verify filesystem
# If that works, rebuild initramfs

# If filesystem is damaged:
# Boot from live USB
$ sudo fsck -y /dev/sda2

"Kernel panic - not syncing: Attempted to kill init!" - PID 1 (init/systemd) crashed. Check for corrupt systemd binary or broken shared libraries.

# Boot with init=/bin/bash
# Check systemd binary
$ file /sbin/init    # Should be ELF executable or symlink to systemd
$ ldd /lib/systemd/systemd   # Check for missing libraries

# Reinstall systemd
# Debian: apt-get install --reinstall systemd
# RHEL: yum reinstall systemd

fsck on Boot Failure

Scenario: Boot drops to emergency shell because filesystem check failed

# Typical message:
# "Give root password for maintenance (or press Control-D to continue)"
# or: "You are in emergency mode"

# Check what failed:
$ journalctl -xb --no-pager | grep -i "fsck\|error\|fail"

# Run fsck manually (filesystem must be UNMOUNTED)
$ umount /dev/sda3        # If it's not root
$ fsck -y /dev/sda3       # Auto-fix errors

# For root filesystem, boot from live USB:
$ sudo fsck -y /dev/sda2

# If XFS:
$ sudo xfs_repair /dev/sda2

# If XFS repair fails:
$ sudo xfs_repair -L /dev/sda2   # Reset journal (data loss possible!)

# After fixing, reboot
$ reboot
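
Running the wrong repair tool against a filesystem wastes time at best, so it is worth checking the type first. The sketch below is illustrative only: repair_command is a hypothetical helper, and it deliberately covers just the two families used in this section (ext2/3/4 and XFS). blkid -o value -s TYPE is the real call that reports the filesystem type of a device.

```shell
# Hypothetical helper: map a filesystem type to the repair command used in
# this section. Returns non-zero for types it does not know about.
repair_command() {
    case "$1" in
        ext2|ext3|ext4) echo "fsck -y" ;;
        xfs)            echo "xfs_repair" ;;
        *)              return 1 ;;
    esac
}

# Example usage (commented out; it only prints the command, it does not run it):
#   dev=/dev/sda3
#   fstype=$(sudo blkid -o value -s TYPE "$dev")
#   if cmd=$(repair_command "$fstype"); then
#       echo "Unmount $dev, then run: sudo $cmd $dev"
#   else
#       echo "No repair command known here for type: $fstype"
#   fi
```

Printing the command instead of executing it is intentional: the device must be unmounted first, and that is a decision to make by hand.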

Scenario: fstab entry has wrong UUID and system won't boot

# The system drops to emergency mode because a mount failed

# Check what failed
$ systemctl --failed
  UNIT              LOAD   ACTIVE SUB    DESCRIPTION
  mnt-data.mount    loaded failed failed Mount /mnt/data

# Check fstab
$ cat /etc/fstab
# There's a UUID that doesn't match any existing device

# Find correct UUIDs
$ blkid
/dev/sda1: UUID="abc123" TYPE="ext4"
/dev/sda2: UUID="def456" TYPE="ext4"
/dev/sdb1: UUID="789012" TYPE="xfs"     # This is the correct UUID

# Fix fstab
$ vim /etc/fstab
# Update the UUID

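# Test before rebooting!
$ findmnt --verify   # Checks fstab syntax and that each source device exists
$ mount -a           # Mounts everything in fstab that is not already mounted
# If neither reports errors, it's safe to reboot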
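
Comparing fstab against blkid by eye is error-prone when the UUIDs are long. As a minimal sketch of automating that comparison (fstab_uuids and missing_fstab_uuids are hypothetical helper names, and only UUID= entries are handled, not LABEL= or PARTUUID=), the first function extracts every UUID that fstab references and the second reports the ones no block device currently carries. It relies on blkid -U, which looks up the device for a given filesystem UUID.

```shell
# Hypothetical helper: print every UUID referenced by a non-comment fstab line.
fstab_uuids() {
    awk '$1 !~ /^#/ && $1 ~ /^UUID=/ {
        sub(/^UUID=/, "", $1)   # strip the UUID= prefix
        gsub(/"/, "", $1)       # strip optional quotes around the value
        print $1
    }' "${1:-/etc/fstab}"
}

# Hypothetical helper: print the fstab UUIDs that blkid cannot resolve to a device.
# Run as root so blkid can probe every device.
missing_fstab_uuids() {
    fstab_uuids "${1:-/etc/fstab}" | while read -r uuid; do
        blkid -U "$uuid" >/dev/null 2>&1 || echo "$uuid"
    done
}

# Example usage (commented out so sourcing this file has no side effects):
#   sudo sh -c '. ./fstab-check.sh; missing_fstab_uuids /etc/fstab'
```

An empty result means every UUID in fstab resolves to a device. A device that is merely unplugged or not yet assembled (RAID, LVM) will also show up as missing, so treat the output as a list of entries to look at, not a list of typos.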

Recovering from a Bad Kernel Update

Scenario: New kernel won't boot, need to roll back

At the GRUB menu:

  1. Select "Advanced options for Ubuntu" (or similar)
  2. Choose the previous kernel version
  3. System boots with the old kernel

To make the rollback permanent:

# Once booted on the old kernel:
$ uname -r
5.15.0-91-generic   # Old (working) kernel

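# Set GRUB to default to this kernel
$ grep menuentry /boot/grub/grub.cfg | head -10
# Find the exact submenu and menu entry titles and join them with ">", for example:
$ sudo vim /etc/default/grub
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.15.0-91-generic"

# Or set by index (counting from 0; entry 0 is typically the newest kernel)
GRUB_DEFAULT="1>2"   # Submenu index 1, entry index 2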

# Regenerate GRUB config
$ sudo update-grub

# Optionally remove the broken kernel
$ sudo apt-get remove linux-image-5.15.0-92-generic    # Debian
$ sudo dnf remove kernel-5.15.0-92.el9                  # RHEL

Boot Logging and Forensics

Scenario: System rebooted unexpectedly, need to find out why

# Check journal from the previous boot
$ journalctl -b -1 --no-pager | tail -50

# Look specifically for the shutdown/crash
# (--grep takes PCRE2 patterns: alternation is a plain |, and an
# all-lowercase pattern matches case-insensitively)
$ journalctl -b -1 -p crit
$ journalctl -b -1 --grep="panic|oom|segfault|watchdog"

# Check if it was an OOM kill
$ journalctl -b -1 -k --grep="oom|killed process"

# Check for hardware errors
$ journalctl -b -1 -k --grep="hardware error|mce|ghes"

# Check for watchdog timeout
$ journalctl -b -1 -k --grep="watchdog|nmi|lockup"

# See last log entries before the reboot
$ journalctl -b -1 -n 100 --no-pager

# List all boots with their timestamps
$ journalctl --list-boots
-3 abc... Mon 2026-03-16 10:00:00  Mon 2026-03-16 18:30:00
-2 def... Mon 2026-03-16 18:35:00  Tue 2026-03-17 02:15:00  # Short uptime
-1 ghi... Tue 2026-03-17 02:20:00  Wed 2026-03-18 14:00:00
 0 jkl... Wed 2026-03-18 14:05:00  present

# Check if it was a clean shutdown or crash
$ last -x reboot shutdown | head -10
reboot   system boot  5.15.0-91-generic Wed Mar 18 14:05   still running
shutdown system down  5.15.0-91-generic Wed Mar 18 14:00 - 14:05  (00:05)
reboot   system boot  5.15.0-91-generic Tue Mar 17 02:20 - 14:00 (1+11:40)
# The 14:05 boot has a "shutdown" record directly below it: clean shutdown.
# When a "reboot" line is followed by another "reboot" line with no "shutdown"
# in between, the system went down uncleanly. User sessions cut off that way
# show "crash" as their logout time in plain last output.

Full /boot Partition: Emergency Cleanup

Scenario: /boot is full and you can't install kernel updates or regenerate initramfs

# Check /boot usage
$ df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       477M  470M     0 100% /boot

# List installed kernels
$ ls -la /boot/vmlinuz-*
$ dpkg --list 'linux-image-*' | grep '^ii'    # Debian
$ rpm -qa kernel                                # RHEL

# Find current kernel (DO NOT REMOVE)
$ uname -r
5.15.0-91-generic

# Remove old kernels (keep current + one fallback)
# Debian/Ubuntu:
$ sudo apt-get purge linux-image-5.15.0-{85,86,87,88,89}-generic
$ sudo apt-get autoremove --purge

# RHEL/CentOS:
$ sudo dnf remove kernel-5.15.0-{85,86,87,88,89}.el9

# If apt won't run because /boot is full, manual cleanup:
$ sudo rm /boot/vmlinuz-5.15.0-85-generic
$ sudo rm /boot/initrd.img-5.15.0-85-generic
$ sudo rm /boot/System.map-5.15.0-85-generic
$ sudo rm /boot/config-5.15.0-85-generic
# Then run apt to clean up package state:
$ sudo apt-get -f install
$ sudo apt-get autoremove --purge

# Set kernel retention policy to prevent recurrence
# Debian (in /etc/apt/apt.conf.d/):
$ echo 'Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";' | \
    sudo tee /etc/apt/apt.conf.d/52-kernel-cleanup

# RHEL (in /etc/dnf/dnf.conf):
$ sudo grep installonly_limit /etc/dnf/dnf.conf
installonly_limit=3    # Keep only 3 kernel versions
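
Typing version numbers into a brace list by hand is where the running kernel gets deleted by accident. The sketch below is one possible guard, under stated assumptions: removable_kernels is a hypothetical helper, it identifies kernels by their /boot/vmlinuz-<version> file, and it uses sort -V from GNU coreutils. It prints every installed version except the running kernel and the newest of the rest (the fallback), and it removes nothing itself.

```shell
# Hypothetical helper: list kernel versions under a boot directory that are
# candidates for removal. Keeps the running kernel and the newest other one.
#   $1 = boot directory (default /boot)
#   $2 = running kernel version (default: uname -r)
removable_kernels() {
    boot_dir="${1:-/boot}"
    running="${2:-$(uname -r)}"
    for image in "$boot_dir"/vmlinuz-*; do
        # Skip the unexpanded glob when no kernel images exist.
        [ -e "$image" ] && printf '%s\n' "${image##*/vmlinuz-}"
    done |
        grep -vxF -- "$running" |   # never offer the running kernel
        sort -V |                   # oldest first, version-aware
        sed '$d'                    # drop the newest remaining one (fallback)
}

# Example usage (commented out; review the list before purging anything):
#   removable_kernels /boot
#   removable_kernels /boot | sed 's/^/linux-image-/' | xargs -r echo sudo apt-get purge
```

The usage line only echoes the apt-get command so the list can be checked first. On RHEL the rescue image (vmlinuz-0-rescue-...) also matches the glob; filter it out before acting on the output.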