# What Happens When You Press the Power Button

Topics: BIOS/UEFI, GRUB, kernel, initramfs, systemd, filesystems, PID 1
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (everything is explained from scratch)
The Mission¶
You press the power button on a server. Forty-five seconds later, you can SSH in.
In those 45 seconds, the machine went from no electricity to a running Linux system with a filesystem, a network stack, running services, and a login prompt. It executed firmware written in the 1970s, a bootloader that's a small operating system in its own right, an operating system kernel that unpacked itself from a compressed archive, a temporary filesystem that exists only in RAM, and a process manager that brought up 200 services in parallel.
This lesson follows the boot sequence from the moment electricity hits the motherboard to the moment you can log in. Each stage solves exactly one problem, then hands off to the next. Understanding this chain is the foundation for debugging every boot failure you'll ever encounter.
Stage 1: Power-On — From Electricity to Firmware¶
You press the button. The power supply unit (PSU) stabilizes voltage on multiple rails (3.3V, 5V, 12V) and sends a "Power Good" signal to the motherboard. This takes about 100–500 milliseconds.
The CPU begins executing instructions from a fixed, hardwired address — the reset vector.
On x86 systems, this is 0xFFFFFFF0, the top of the 32-bit address space. The instruction
there is a jump to the firmware's entry point.
At this moment:

- RAM is not initialized (DRAM needs training — the firmware discovers how much RAM exists and configures timings)
- No storage devices are available
- No USB, PCIe, GPU, or network
- There is no operating system, no filesystem, no processes
The firmware has to bootstrap everything from nothing.
BIOS (the old way)¶
BIOS (Basic Input/Output System) was created by Gary Kildall for CP/M in 1975 and adopted by IBM for the original PC in 1981. It remained fundamentally unchanged for nearly 40 years.
Power on → POST (Power-On Self-Test) → Enumerate boot devices →
Read first 512 bytes (MBR) from boot device → Jump to MBR code
POST (Power-On Self-Test) checks that basic hardware works: CPU, RAM, keyboard controller, video. If POST fails, the motherboard emits beep codes (one long, two short = video failure, etc.) — this predates displays being available.
The MBR (Master Boot Record) is exactly 512 bytes:
┌─────────────────────────────────────────────────┐
│ Bootloader code (440 bytes) │ ← Tiny. Barely enough for anything.
│ Disk signature (4 bytes) │
│ Null (2 bytes) │
│ Partition table (64 bytes = 4 × 16-byte entries)│ ← Max 4 partitions. Max 2TB disk.
│ Boot signature: 0x55AA (2 bytes) │
└─────────────────────────────────────────────────┘
440 bytes of code. That's not enough to do anything useful — it's barely enough to find and load the real bootloader from a partition.
Trivia: The 0x55AA boot signature has been the same since the original IBM PC in 1981. Every x86 machine in the world still looks for those two bytes at offset 510–511 of the first sector to decide if a disk is bootable. The specific value was chosen because its bit pattern (01010101 10101010) alternates between 0 and 1, making it unlikely to occur randomly and easy to detect with simple hardware.
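You can reproduce the firmware's check with coreutils alone. A minimal sketch: create a blank 512-byte "sector", stamp the signature at offsets 510–511, and read it back (`mbr.img` is a scratch file here, not a real disk):

```shell
# Build a fake 512-byte sector and stamp the 0x55AA boot signature at
# offsets 510-511 -- the same two bytes BIOS checks on sector 0.
# (mbr.img is a scratch file, not a real disk.)
dd if=/dev/zero of=mbr.img bs=512 count=1 2>/dev/null
printf '\x55\xaa' | dd of=mbr.img bs=1 seek=510 conv=notrunc 2>/dev/null

# Dump offsets 510-511; a BIOS would now consider this sector bootable
od -A d -t x1 -j 510 -N 2 mbr.img
# → 0000510 55 aa
```

On a real machine the equivalent read would be `dd` from the first sector of the boot disk (requires root).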
UEFI (the modern way)¶
UEFI (Unified Extensible Firmware Interface) began replacing BIOS around 2005 and became the default on most hardware by 2015.
Power on → POST → UEFI reads NVRAM boot variables →
Load EFI application from ESP (EFI System Partition) →
EFI app is the bootloader
Key differences from BIOS:
| | BIOS | UEFI |
|---|---|---|
| Boot config | Fixed: read MBR from first disk | Flexible: NVRAM boot entries point to specific EFI files |
| Partition table | MBR (2TB max, 4 partitions) | GPT (9.4 ZB max, 128 partitions) |
| Bootloader location | 440 bytes in MBR + post-MBR gap | Full binary on the ESP (FAT32 partition) |
| Secure Boot | No | Yes — cryptographic chain of trust |
| Pre-boot environment | 16-bit real mode, 1MB address space | 32/64-bit, GiB of address space, drivers, shell |
The EFI System Partition (ESP) is a small FAT32 partition (typically 512MB) mounted at `/boot/efi/`. It contains bootloader binaries:
ls /boot/efi/EFI/
# → ubuntu/ Microsoft/ Boot/
ls /boot/efi/EFI/ubuntu/
# → shimx64.efi grubx64.efi grub.cfg
# See UEFI boot entries
efibootmgr -v
# → Boot0001* ubuntu HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
# → Boot0002* EFI Network PciRoot(0x0)/Pci(0x1c,0x0)/.../MAC(...)
Secure Boot — the chain of trust¶
Secure Boot ensures that only signed code runs before the OS:
UEFI firmware (has Microsoft's key enrolled)
→ loads shimx64.efi (signed by Microsoft, contains distro vendor's key)
→ loads grubx64.efi (signed by distro vendor)
→ loads vmlinuz (signed by distro vendor)
If any signature fails, boot halts. This prevents bootkits — malware that hides in the boot chain below the OS.
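The mechanics can be sketched with plain openssl RSA signatures. This is only a conceptual model (real UEFI Secure Boot uses PE/Authenticode signatures and key databases, not raw RSA files), and every file name below is invented:

```shell
# Conceptual sketch of a signing chain: each stage only runs what it can
# verify. NOT the real PE/Authenticode format UEFI uses.
openssl genrsa -out vendor.key 2048 2>/dev/null
openssl rsa -in vendor.key -pubout -out vendor.pub 2>/dev/null
echo 'pretend kernel image' > vmlinuz.fake

# "Vendor" signs the kernel with its private key
openssl dgst -sha256 -sign vendor.key -out vmlinuz.sig vmlinuz.fake

# "Bootloader" verifies the signature before handing over control
openssl dgst -sha256 -verify vendor.pub -signature vmlinuz.sig vmlinuz.fake
# → Verified OK

# Tamper with the image: verification fails, and a real bootloader would halt
echo 'bootkit' >> vmlinuz.fake
openssl dgst -sha256 -verify vendor.pub -signature vmlinuz.sig vmlinuz.fake \
  || echo 'signature check failed -- boot halts'
```

The chain works because each stage's public key is baked into the stage before it, all the way up to the keys enrolled in firmware.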
Gotcha: Secure Boot uses Microsoft's key as the root of trust on most hardware. Linux distributions use a "shim" signed by Microsoft that contains the distro's own key. If you build a custom kernel without signing it, Secure Boot will refuse to load it. Either sign your kernel with a Machine Owner Key (MOK) enrolled in UEFI, or disable Secure Boot (which removes the protection).
Stage 2: GRUB — Loading the Kernel¶
The firmware found and loaded the bootloader. On most Linux systems, that's GRUB2 (GRand Unified Bootloader).
GRUB2 is a small operating system in its own right. It includes:

- Filesystem drivers (ext4, XFS, Btrfs, FAT, NTFS)
- A command-line shell with a scripting language
- Module loading system
- Network stack (for network boot)
- Configuration parser
This complexity is why grub.cfg is auto-generated, not hand-edited.
What GRUB actually does¶
1. Reads its configuration (`/boot/grub/grub.cfg`)
2. Presents a menu (or auto-selects the default after a timeout)
3. Loads the kernel image (`vmlinuz`) into memory
4. Loads the initramfs image into memory
5. Optionally loads CPU microcode (errata fixes, security mitigations)
6. Assembles the kernel command line
7. Jumps to the kernel's entry point
# See what GRUB is configured to boot
cat /boot/grub/grub.cfg | grep menuentry
# → menuentry 'Ubuntu, with Linux 6.5.0-44-generic' ...
# → menuentry 'Ubuntu, with Linux 6.5.0-44-generic (recovery mode)' ...
# See the kernel command line that was used for the current boot
cat /proc/cmdline
# → BOOT_IMAGE=/vmlinuz-6.5.0-44-generic root=UUID=abc123... ro quiet splash
The kernel command line — the contract¶
These parameters tell the kernel and initramfs how to behave:
| Parameter | What it does |
|---|---|
| `root=UUID=...` | Where to find the real root filesystem |
| `ro` | Mount root read-only initially (fsck safety) |
| `quiet` | Suppress most boot messages |
| `splash` | Show graphical splash instead of text |
| `single` or `1` | Boot to single-user mode (rescue) |
| `init=/bin/bash` | Skip init entirely, drop to shell (emergency) |
| `systemd.unit=rescue.target` | Boot to rescue target |
| `rd.break` | Break into initramfs shell before `switch_root` |
| `nomodeset` | Disable kernel mode setting (GPU troubleshooting) |
| `console=ttyS0,115200` | Serial console (headless servers) |
Gotcha: Never edit `/boot/grub/grub.cfg` directly — it's regenerated every time you run `update-grub` (Debian/Ubuntu) or `grub2-mkconfig` (RHEL/Fedora). Your changes will be silently overwritten. Edit `/etc/default/grub` for defaults, or add scripts in `/etc/grub.d/` for custom entries.

Trivia: GRUB's name stands for GRand Unified Bootloader. "Grand Unified" is a physics reference — a Grand Unified Theory unifies the fundamental forces. GRUB "unifies" booting across different operating systems and architectures. The original GRUB was part of the GNU project; GRUB2 was a complete rewrite.
Stage 3: Kernel Initialization — From Compressed Blob to Running OS¶
GRUB loaded the kernel into memory and jumped to it. The first thing the kernel does is decompress itself.
Self-extraction¶
The kernel image (vmlinuz — the "z" stands for compressed) is a compressed archive with
a small decompressor prepended. The decompressor runs first, unpacks the real kernel into
memory, then jumps to it.
# See the compression format
file /boot/vmlinuz-$(uname -r)
# → Linux kernel x86 boot executable bzImage, ... compressed
# Kernel size compressed vs uncompressed
ls -lh /boot/vmlinuz-$(uname -r)
# → 12M (compressed)
# Actual kernel is 30-50MB uncompressed
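This self-extracting layout is why you can't simply gunzip a vmlinuz: the compressed payload sits behind a stub. The kernel source's `extract-vmlinux` script recovers it by scanning for compression magic bytes; here is a toy version of that trick using an invented scratch file instead of a real kernel:

```shell
# Toy version of the extract-vmlinux trick: a compressed payload glued after
# a stub can be recovered by scanning for the gzip magic bytes (1f 8b).
# ("image" is a scratch file standing in for vmlinuz.)
printf 'FAKE-DECOMPRESSOR-STUB' > image
echo 'hello from the kernel' | gzip >> image

# Find the byte offset of the gzip header, then decompress from there
offset=$(grep -abo $'\x1f\x8b' image | head -1 | cut -d: -f1)
tail -c +$((offset + 1)) image | gzip -d
# → hello from the kernel
```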
Hardware discovery¶
Once decompressed, the kernel initializes the machine:
- CPU features — detects instruction sets, security mitigations, number of cores
- Memory — builds the memory map, sets up page tables
- Interrupts — configures the interrupt controller
- ACPI — reads hardware description tables from firmware
- Bus enumeration — discovers PCI/PCIe devices (network cards, storage controllers, GPUs)
- Built-in drivers — initializes drivers compiled directly into the kernel
All of this happens before any userspace code runs. You can see it in the kernel ring buffer:
dmesg | head -50
# → [ 0.000000] Linux version 6.5.0-44-generic ...
# → [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.5.0-44-generic root=UUID=...
# → [ 0.000000] BIOS-provided physical RAM map:
# → [ 0.003201] ACPI: RSDP 0x00000000000E0000 ...
# → [ 0.123456] PCI: Using host bridge windows ...
# → [ 0.234567] SCSI subsystem initialized
# → [ 0.345678] nvme nvme0: pci function 0000:01:00.0
Under the Hood: The kernel timestamps in `dmesg` start at 0.000000 — the moment the kernel takes control. Everything before that (firmware, GRUB) is invisible to Linux. On a typical server, the kernel initializes hardware in 1-3 seconds. The rest of the boot time is firmware (5-30 seconds on servers, due to hardware discovery and RAID controller init) and userspace services.
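Because the bracketed timestamps are just seconds since the kernel took over, subtracting two of them gives elapsed kernel time. A small awk sketch over sample log lines (not live dmesg output):

```shell
# Sketch: compute the gap between the first and last kernel log timestamps.
# (Sample lines below; pipe in "dmesg" on a real system.)
awk -F'[][]' '{ t[NR] = $2 } END { printf "elapsed: %.3fs\n", t[NR] - t[1] }' <<'EOF'
[    0.000000] Linux version 6.5.0-44-generic
[    2.345678] Freeing unused kernel memory
EOF
# → elapsed: 2.346s
```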
Stage 4: Initramfs — The Bridge to the Real Root¶
The kernel is running, but it has a problem: it needs to mount the root filesystem, and the root filesystem might be on an LVM volume, inside a LUKS encrypted container, on a software RAID, behind an iSCSI target, or on an NVMe drive whose driver isn't compiled into the kernel.
The kernel can't know in advance what storage setup it will encounter. So it doesn't try. Instead, it unpacks a temporary filesystem into RAM — the initramfs (initial RAM filesystem) — which contains just enough tools and drivers to find and mount the real root.
# See the initramfs file
ls -lh /boot/initrd.img-$(uname -r)
# → 60M (compressed archive)
# Peek inside
lsinitramfs /boot/initrd.img-$(uname -r) | head -20
# → .
# → bin
# → bin/busybox
# → etc
# → etc/modprobe.d
# → lib/modules/6.5.0-44-generic/kernel/drivers/nvme/
# → scripts/local-top/
What initramfs does¶
1. The kernel unpacks the initramfs CPIO archive into a tmpfs at `/`
2. The kernel executes `/init` (a shell script or systemd)
3. `/init` loads kernel modules needed for storage (nvme, md, dm-crypt, lvm)
4. `/init` discovers and assembles storage (RAID arrays, LVM volumes)
5. `/init` unlocks encrypted volumes (LUKS — prompts for passphrase)
6. `/init` finds and mounts the real root filesystem
7. `/init` calls `switch_root` — replaces the temporary RAM root with the real root
8. `/init` execs the real PID 1 (`/sbin/init` → systemd)
┌──────────────────┐ ┌──────────────────┐
│ Initramfs (RAM) │ │ Real Root (disk) │
│ │ │ │
│ /init │ ──→ │ /sbin/init │
│ /bin/busybox │ │ /usr/lib/systemd/ │
│ /lib/modules/ │ │ /etc/ │
│ /scripts/ │ │ /var/ │
│ │ │ /home/ │
│ (temporary) │ │ (permanent) │
└──────────────────┘ └──────────────────┘
↑ switch_root ↓
Under the Hood: `switch_root` is a clever trick. It mounts the real root filesystem, moves `/proc`, `/sys`, and `/dev` over to it, deletes everything in the initramfs (freeing the RAM), makes the real root the new `/`, and execs the real init process. After this point, the initramfs doesn't exist — its RAM is freed and reusable.

Name Origin: "initramfs" = "initial RAM filesystem." It replaced the older "initrd" (initial RAM disk), which was a block device in RAM with a real filesystem on it. initramfs uses a simpler CPIO archive unpacked into a tmpfs — no block device, no filesystem formatting overhead. The filename is still often `initrd.img` for historical reasons, even though it's actually an initramfs.
When initramfs fails¶
If the initramfs can't find the root filesystem, you get dropped to an emergency shell:
[ 3.456789] Gave up waiting for root file system device.
ALERT! UUID=abc123-def456-... does not exist. Dropping to a shell!
BusyBox v1.36.1 (Ubuntu 1:1.36.1-6ubuntu3) built-in shell (ash)
(initramfs) _
Common causes:
- Wrong root=UUID=... in the kernel command line (typo, or UUID changed after reinstall)
- Storage driver not included in the initramfs (changed controller, forgot to rebuild)
- LUKS or LVM tools missing from initramfs
- Disk failed or was removed
# Rebuild initramfs (after fixing the problem)
# Debian/Ubuntu:
sudo update-initramfs -u
# RHEL/Fedora:
sudo dracut --force
# Verify a specific driver is included
lsinitramfs /boot/initrd.img-$(uname -r) | grep nvme
# → lib/modules/.../kernel/drivers/nvme/host/nvme.ko
Gotcha: If you change your storage controller (migrate from SATA to NVMe, or change RAID controller), you MUST rebuild the initramfs before rebooting. The new driver won't be in the old initramfs, and the kernel will panic because it can't find the root filesystem. This catches people during cloud migrations and hardware refreshes.
Stage 5: PID 1 — systemd Takes Over¶
The initramfs has mounted the real root filesystem and called switch_root. Now the kernel
needs to start PID 1 — the first real userspace process.
The kernel tries these paths in order:
1. /sbin/init
2. /etc/init
3. /bin/init
4. /bin/sh
On modern systems, /sbin/init is a symlink to /usr/lib/systemd/systemd. When systemd
starts as PID 1, it:
- Reads its configuration and unit files
- Builds a dependency graph of all services
- Starts services in parallel (respecting dependencies)
- Reaches the target (typically `multi-user.target` for servers)
# What target did we boot into?
systemctl get-default
# → multi-user.target
# See the full dependency tree
systemctl list-dependencies multi-user.target
# How long did boot take?
systemd-analyze
# → Startup finished in 2.5s (firmware) + 3.1s (loader) + 1.8s (kernel) + 8.4s (userspace) = 15.8s
# What took the longest?
systemd-analyze blame | head -10
# → 5.2s NetworkManager-wait-online.service
# → 2.1s snapd.service
# → 1.5s accounts-daemon.service
# Critical path (bottleneck chain)
systemd-analyze critical-chain
PID 1 is special¶
PID 1 has unique behavior in the Linux kernel:
- **It can't be killed.** The kernel only delivers signals that PID 1 has explicitly registered handlers for. Sending SIGKILL to PID 1 does nothing.

- **If it exits, the kernel panics.** PID 1 dying means the system is dead — there's no recovery.

- **It reaps orphans.** When any process's parent dies, the orphaned children are re-parented to PID 1. PID 1 must call `wait()` to clean up these orphans, or they become zombies.
Under the Hood: The reason PID 1 can't be killed by unregistered signals is a kernel design decision, not a bug. The kernel checks: "Is the target PID 1? Does PID 1 have a handler for this signal? If not, drop the signal." This prevents accidental system death. In containers, this same behavior causes problems — if your app is PID 1 and doesn't handle SIGTERM, `docker stop` has to wait 10 seconds and then SIGKILL.
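The re-parenting behavior is easy to observe. A sketch (Linux-only, reads `/proc`; note that on desktops and inside containers the new parent may be the nearest "subreaper" rather than PID 1 itself):

```shell
# Sketch: orphan a background process and watch the kernel re-parent it.
# On a plain server the new parent is PID 1; under some desktop session
# managers or in containers it may be a subreaper instead.
( sleep 5 & echo $! > orphan.pid )   # the subshell (the parent) exits at once
sleep 0.2                            # give the kernel a moment to re-parent

# Field 4 of /proc/<pid>/stat is the parent PID
awk '{ print "orphan re-parented to PID " $4 }' "/proc/$(cat orphan.pid)/stat"
```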
The boot target hierarchy¶
emergency.target
↓
rescue.target (single-user)
↓
sysinit.target (mounts, swap, udev, journald)
↓
basic.target (sockets, timers, paths)
↓
multi-user.target (networking, services, login)
↓
graphical.target (display manager, desktop)
systemd doesn't execute these sequentially — it resolves the dependency graph and starts
as many things in parallel as possible. multi-user.target might trigger 200 units to
start, and systemd figures out which can run simultaneously based on their After=,
Requires=, and Wants= declarations.
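Those declarations live in unit files. A minimal hypothetical unit (the name `myapp.service` and the `/opt/myapp/bin/server` path are invented for illustration) showing the ordering directives systemd feeds into its graph:

```
[Unit]
Description=My Application (hypothetical example)
# Don't start until the network is actually configured
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/opt/myapp/bin/server
Restart=on-failure

[Install]
# Pulled in when the system boots to multi-user.target
WantedBy=multi-user.target
```

`After=` only controls ordering; `Wants=`/`Requires=` control whether the dependency gets started at all. Units with no ordering relationship between them start in parallel.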
Trivia: systemd replaced SysV init — a system from 1983 that used numbered shell scripts (`S01networking`, `S02ssh`, `S03docker`) executed in sequence. On a modern system with 200 services, sequential boot took 30-60 seconds. systemd's parallel dependency resolution gets the same system up in under 10 seconds. It's also one of the most controversial projects in Linux history — the Debian vote in 2014 nearly split the project and spawned the Devuan fork (a Debian variant specifically without systemd).
Stage 6: Services Start — From Target to Login¶
systemd reaches multi-user.target and brings up the services that make the system useful:
| Service | What it does | When it starts |
|---|---|---|
| `systemd-journald` | Logging | Very early (sysinit.target) |
| `systemd-udevd` | Device management | Very early (sysinit.target) |
| `NetworkManager` / `systemd-networkd` | Networking | After basic.target |
| `sshd` | Remote access | After network.target |
| `docker` / `containerd` | Container runtime | After network.target |
| `postgresql` / `mysql` | Database | After network.target |
| `nginx` / `apache` | Web server | After network.target |
| `getty@tty1` | Console login prompt | After multi-user.target |
When SSH is reachable, the boot is "done"¶
For servers, the practical definition of "boot complete" is: can you SSH in? This requires:
- Kernel booted ✓
- Root filesystem mounted ✓
- systemd reached multi-user.target ✓
- Network is configured ✓
- sshd is listening ✓
# Check if the system finished booting
systemctl is-system-running
# → running ← everything is fine
# → degraded ← some non-critical unit failed
# → starting ← still booting
# → maintenance ← in rescue/emergency mode
When a service fails to start¶
# Check what's broken
systemctl list-units --failed
# → UNIT LOAD ACTIVE SUB DESCRIPTION
# → myapp.service loaded failed failed My Application
# See what happened
systemctl status myapp
# → Active: failed (Result: exit-code)
# → Process: ExecStart=/opt/myapp/bin/server (code=exited, status=1)
# Full logs
journalctl -u myapp -b # -b = current boot only
Gotcha: A service that starts successfully during boot but crashes 5 seconds later can look like a boot problem. Check `systemctl status` — if it says `Active: failed` with a recent timestamp, the service started and then crashed. Check the journal for the actual error.
The Complete Boot — One Picture¶
[1] Power button → PSU stabilizes → "Power Good" signal
(100-500ms)
[2] CPU reset vector → Firmware entry point
BIOS: POST → enumerate boot devices → read MBR (512 bytes)
UEFI: POST → read NVRAM boot variables → load EFI app from ESP
(2-30 seconds, mostly hardware discovery)
[3] Bootloader (GRUB2)
Read grub.cfg → select kernel → load vmlinuz + initramfs into RAM
Assemble kernel command line → jump to kernel
(1-3 seconds)
[4] Kernel
Decompress self → CPU/memory/interrupt setup → ACPI discovery
Bus enumeration → built-in driver init → mount initramfs
(1-3 seconds)
[5] Initramfs
Load storage drivers → assemble RAID/LVM → unlock LUKS
Find root filesystem → mount it → switch_root
(0.5-5 seconds, more if LUKS passphrase needed)
[6] PID 1 (systemd)
Build dependency graph → parallel service activation
sysinit.target → basic.target → multi-user.target
(3-15 seconds)
[7] Login ready
getty on console, sshd on network, display manager on desktop
System is operational.
Total: 8–60 seconds depending on hardware (servers with RAID controllers are slow; NVMe desktops are fast) and services (databases take longer to start than web servers).
Debugging Boot Failures¶
When boot breaks, the symptoms tell you which stage failed:
| Symptom | Failed stage | What to check |
|---|---|---|
| No video, beep codes | Stage 1 — Firmware/POST | Hardware failure, RAM not seated |
| GRUB rescue prompt | Stage 2 — Bootloader | grub.cfg missing or corrupt, wrong partition |
| GRUB menu but "file not found" | Stage 2 — Bootloader | Kernel or initramfs file missing from /boot |
| Kernel panic early | Stage 3 — Kernel | Hardware incompatibility, bad kernel |
| "Gave up waiting for root device" | Stage 4 — Initramfs | Wrong root UUID, missing storage driver |
| BusyBox or dracut shell | Stage 4 — Initramfs | Can't find/mount root, LUKS can't unlock |
| Emergency or rescue shell | Stage 6 — systemd | fstab error, critical service failure |
| "degraded" but usable | Stage 6 — systemd | Non-critical service failed, check systemctl --failed |
Recovery techniques¶
# Boot to rescue mode (from GRUB menu, edit kernel line)
# Add: systemd.unit=rescue.target
# Or: single
# Or: 1
# Boot to emergency mode (minimal, almost nothing running)
# Add: systemd.unit=emergency.target
# Skip init entirely (absolute last resort)
# Add: init=/bin/bash
# → You get a root shell, but no services, no networking, no journal
# → Filesystem is read-only; remount with: mount -o remount,rw /
# Break into initramfs shell (before root is mounted)
# Add: rd.break
# → Useful for fixing root filesystem issues or resetting passwords
Gotcha: After editing `/etc/fstab`, always test with `mount -a` before rebooting. A typo in fstab can make the system unbootable — it drops to emergency mode because it can't mount a filesystem. On remote servers, this means a trip to the datacenter or a console session. Use `nofail` on non-critical mount entries.

Gotcha: Keep at least two kernels installed. If a kernel update introduces a regression, you can boot the previous kernel from the GRUB menu. If you've already deleted the old kernel, your only option is a live USB.
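As a rough first pass before `mount -a`, field-count typos can be caught mechanically: fstab lines have 4-6 whitespace-separated fields. A sketch over sample lines (for real validation use `findmnt --verify`):

```shell
# Rough sanity check for fstab-style lines: non-comment entries should have
# 4-6 fields. (Sample lines below; point it at /etc/fstab on a real system.)
awk 'NF && $1 !~ /^#/ && (NF < 4 || NF > 6) {
       print "line " NR ": " NF " fields -- check for typos"
     }' <<'EOF'
UUID=abc123 /       ext4 errors=remount-ro 0 1
/dev/sdb1   /data   ext4 defaults
/dev/sdc1   /backup ext4 defaults 0 2 oops
EOF
# → line 3: 7 fields -- check for typos
```

This only catches structural typos, of course — a wrong UUID or filesystem type passes the field count and still breaks boot.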
Flashcard Check¶
Q1: What does the CPU execute first when power is applied?
The instruction at the reset vector (0xFFFFFFF0 on x86). This is hardwired in the CPU and points to the firmware's entry point.
Q2: What's in the MBR's 440 bytes of bootloader code?
Just enough to find and load the real bootloader (GRUB stage 1.5 or stage 2). 440 bytes can't fit a filesystem driver, so it relies on fixed disk offsets.
Q3: Why does the kernel decompress itself?
The kernel image (vmlinuz) is stored compressed to save space on `/boot`. A small decompressor stub runs first, unpacks the real kernel, then jumps to it.
Q4: What is the initramfs for?
It's a temporary RAM filesystem containing drivers and tools to find and mount the real root filesystem. Needed because the kernel can't know in advance whether root is on LVM, LUKS, RAID, NVMe, iSCSI, or NFS.
Q5: What happens if PID 1 exits?
Kernel panic. PID 1 dying is a fatal condition. The system is dead and requires a reboot.
Q6: systemd-analyze blame shows NetworkManager-wait-online.service at 15 seconds. What is this?
It waits until the network is fully configured (DHCP lease, IP assigned, routes set). Many services depend on `network-online.target`, which waits for this. It's often the single biggest boot time contributor on servers.
Q7: GRUB shows "file not found" for the kernel. What broke?
The kernel file is missing from `/boot/`. Common causes: partition full, kernel deleted without updating GRUB, or wrong partition UUID in grub.cfg.
Q8: After changing the RAID controller, the system panics with "unable to mount root fs." Why?
The initramfs doesn't contain the driver for the new controller. Rebuild it with `update-initramfs -u` (Debian) or `dracut --force` (RHEL) before changing hardware.
Exercises¶
Exercise 1: Explore your own boot (hands-on)¶
# How long did your boot take?
systemd-analyze
# What was the bottleneck?
systemd-analyze blame | head -10
# See the critical chain
systemd-analyze critical-chain
# What kernel command line was used?
cat /proc/cmdline
# When was the kernel built?
uname -a
# How big is your initramfs?
ls -lh /boot/initrd.img-$(uname -r) 2>/dev/null || ls -lh /boot/initramfs-$(uname -r).img
# What's inside it?
lsinitramfs /boot/initrd.img-$(uname -r) 2>/dev/null | wc -l
# How many files are in your initramfs?
Exercise 2: Read the boot messages (investigation)¶
# See kernel messages from the current boot
journalctl -k -b | head -100
# Find when specific hardware was detected
dmesg | grep -i nvme # NVMe drives
dmesg | grep -i eth # Network interfaces
dmesg | grep -i usb # USB devices
# See systemd's boot progress
journalctl -b | grep "Reached target"
# → Reached target Local File Systems.
# → Reached target Network.
# → Reached target Multi-User System.
What target was reached last? How long after kernel start?
Exercise 3: Boot failure triage (think)¶
For each scenario, identify which boot stage failed and what you'd do:
- Server powers on, fans spin, but no video output and you hear 3 short beeps
- GRUB menu appears but all entries say "error: file not found"
- Kernel starts loading, then you see "Kernel panic - VFS: Unable to mount root fs"
- System boots to a text prompt that says "(initramfs)" and won't go further
- System boots but `systemctl is-system-running` says "degraded"
Answers
1. **Stage 1 — POST failure.** Beep codes indicate hardware failure (often RAM). Reseat RAM modules. Check motherboard manual for beep code meaning.
2. **Stage 2 — Bootloader.** GRUB can't find kernel files on `/boot`. Possible: partition table changed, `/boot` partition was reformatted, or files were deleted. Boot from live USB, mount the partition, reinstall kernel and rebuild GRUB.
3. **Stage 4 — Initramfs.** The kernel ran but couldn't find the root filesystem. Most likely: wrong `root=UUID` in kernel command line, or initramfs doesn't have the storage driver. Boot with `rd.break` to get an initramfs shell and investigate.
4. **Stage 4 — Initramfs.** The initramfs ran but couldn't find/mount root. From the `(initramfs)` shell: run `blkid` to find your disks, check if the root UUID matches what's in the kernel command line. If the disk is there but not the right UUID, you can mount it manually and fix GRUB.
5. **Stage 6 — systemd.** Boot completed but a non-critical service failed. `systemctl list-units --failed` shows what's broken. Fix or disable the failing service.

Exercise 4: The mental model (think)¶
Draw the five stages from memory:
What problem does each stage solve?
Answer
| Stage | Problem it solves |
|-------|------------------|
| Firmware | "What device do I boot from?" |
| Bootloader (GRUB) | "Which kernel, with what parameters?" |
| Kernel | "How do I initialize this hardware?" |
| Initramfs | "How do I find and mount the real root filesystem?" |
| PID 1 (systemd) | "How do I bring up all the services?" |

Cheat Sheet¶
Boot Investigation¶
| What you need | Command |
|---|---|
| Total boot time | systemd-analyze |
| Slowest services | systemd-analyze blame |
| Boot bottleneck chain | systemd-analyze critical-chain |
| Boot timeline SVG | systemd-analyze plot > boot.svg |
| Kernel command line | cat /proc/cmdline |
| Kernel boot messages | dmesg or journalctl -k -b |
| Current boot target | systemctl get-default |
| System boot status | systemctl is-system-running |
| Failed services | systemctl list-units --failed |
Recovery¶
| Situation | Kernel parameter to add |
|---|---|
| Need rescue shell | systemd.unit=rescue.target |
| Need emergency shell | systemd.unit=emergency.target |
| Need initramfs shell | rd.break |
| Skip init entirely | init=/bin/bash |
| Serial console | console=ttyS0,115200 |
Initramfs¶
| Task | Command |
|---|---|
| Rebuild (Debian/Ubuntu) | sudo update-initramfs -u |
| Rebuild (RHEL/Fedora) | sudo dracut --force |
| List contents | lsinitramfs /boot/initrd.img-$(uname -r) |
| Check for specific driver | lsinitramfs ... \| grep nvme |
GRUB¶
| Task | Command |
|---|---|
| Update GRUB config | sudo update-grub (Debian) or sudo grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL) |
| Edit defaults | sudo vim /etc/default/grub then update-grub |
| See boot entries | grep menuentry /boot/grub/grub.cfg |
| See UEFI boot entries | efibootmgr -v |
Takeaways¶
- **Five stages, five handoffs.** Firmware → Bootloader → Kernel → Initramfs → PID 1. Each one solves exactly one problem and passes control to the next.

- **The initramfs exists because root is complicated.** LVM, LUKS, RAID, NVMe, iSCSI — the kernel can't know what storage setup it will encounter. The initramfs carries the tools to discover and mount it.

- **PID 1 is sacred.** It can't be killed, and if it exits, the kernel panics. In containers, your app is PID 1 — which means it has these same responsibilities.

- **systemd parallelizes boot.** It builds a dependency graph and starts everything it can simultaneously. `systemd-analyze blame` shows you where the time goes.

- **The symptom tells you the stage.** Beep codes = firmware. GRUB prompt = bootloader. Kernel panic = kernel or initramfs. Emergency shell = systemd. Match the symptom to the stage, and you know where to look.

- **Always rebuild initramfs after hardware changes.** New storage controller, new filesystem type, new LUKS config — all require rebuilding. Forgetting this is the #1 cause of "it was working before the migration."
Related Lessons¶
- The Hanging Deploy — what happens after boot: processes, signals, and systemd services
- The Disk That Filled Up — when the filesystem that was mounted during boot fills up