Quiz: Kernel Troubleshooting

6 questions

L0 (1 question)

1. What command shows kernel error messages with human-readable timestamps, and what patterns should you grep for?

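dmesg -T -l err,crit,alert,emerg shows kernel messages at error level and above with human-readable timestamps. The -T timestamps are derived from boot time and can be inaccurate after a suspend/resume. Pipe the output through a case-insensitive grep for these patterns: 'oom' or 'out of memory' (memory issues), 'i/o error' or 'medium error' (disk problems), 'link down' (network; many drivers log it as 'Link is Down'), 'mce' or 'machine check' (CPU/hardware), and 'ext4' or 'xfs' together with 'corrupt' (filesystem issues).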
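As a rough illustration of the pattern list, the sketch below sorts kernel log lines into categories. The function names and category labels are made up for this example, and the regexes are one possible encoding of the patterns, not a complete rule set. The network pattern is widened to also accept "Link is Down", which is an assumption about driver wording.

```python
import re

# Category -> case-insensitive regex, one possible encoding of the quiz patterns.
# The category names are hypothetical labels chosen for this sketch.
PATTERNS = {
    "memory": re.compile(r"\boom|out of memory", re.IGNORECASE),
    "disk": re.compile(r"i/o error|medium error", re.IGNORECASE),
    # Widened (assumption): many drivers log "Link is Down" rather than "link down".
    "network": re.compile(r"link (is )?down", re.IGNORECASE),
    "hardware": re.compile(r"\bmce\b|machine check", re.IGNORECASE),
    "filesystem": re.compile(r"(ext4|xfs).*corrupt|corrupt.*(ext4|xfs)", re.IGNORECASE),
}


def classify(line):
    """Return the sorted list of categories whose pattern matches the line."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(line))


def summarize(lines):
    """Count matching lines per category; lines matching nothing are ignored."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name in classify(line):
            counts[name] += 1
    return counts
```

One way to feed it is to capture the dmesg output with subprocess.run and pass the result of splitlines() to summarize().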

L1 (2 questions)

1. What is the difference between a kernel oops and a kernel panic?

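An oops means the kernel detected a bug but was able to recover: it kills the offending process, marks itself tainted, and keeps running in a possibly degraded state. A panic is fatal: the kernel cannot continue, so the system halts, or reboots if kernel.panic is set to a nonzero timeout. An oops escalates to a panic if kernel.panic_on_oops=1 is set or if it happens in interrupt context. Both print a stack trace to the kernel log. Only a panic forces an immediate reboot, but a kernel that has oopsed should be rebooted at the next opportunity because its internal state may be inconsistent.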
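The distinction can be checked mechanically in a log excerpt. The following is a minimal sketch, assuming the common x86 message wording ("Oops:", "Kernel panic - not syncing"). Exact strings vary by architecture and kernel version, and the function names are hypothetical.

```python
import re

# Message fragments assumed from typical x86 kernel output; not exhaustive.
PANIC_RX = re.compile(r"Kernel panic - not syncing")
OOPS_RX = re.compile(
    r"\bOops: |BUG: unable to handle|BUG: kernel NULL pointer dereference"
)


def classify_event(log_text):
    """Return 'panic', 'oops', or 'none' for a kernel log excerpt."""
    # Check for panic first: an oops that escalated shows both messages.
    if PANIC_RX.search(log_text):
        return "panic"
    if OOPS_RX.search(log_text):
        return "oops"
    return "none"


def will_escalate(panic_on_oops_value):
    """True if the text read from the panic_on_oops sysctl is '1'."""
    return panic_on_oops_value.strip() == "1"
```

The argument to will_escalate() would typically be the contents of /proc/sys/kernel/panic_on_oops.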

2. What is a kernel panic and how do you capture and analyze the crash dump?

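A kernel panic is an unrecoverable kernel error, for example a null pointer dereference, corrupted data structures, or a hardware fault. To capture it, configure kdump: install kdump-tools (Debian/Ubuntu) or kexec-tools (RHEL/Fedora), add the crashkernel=256M boot parameter, reboot, and enable the kdump service. On panic, kdump captures memory to /var/crash/. Analyze it with the crash utility, which needs the debug-symbol vmlinux matching the crashed kernel: crash /usr/lib/debug/vmlinux /var/crash/vmcore (exact paths vary by distribution). Inside crash, use bt (backtrace), log (kernel log buffer), and ps (process list). Common causes: bad RAM, driver bugs, filesystem corruption.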
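Before relying on kdump, it helps to confirm that the reservation actually took effect. The sketch below is one way to check, assuming the boot parameters are read from /proc/cmdline and the load state from /sys/kernel/kexec_crash_loaded. The helper names are illustrative.

```python
def crashkernel_values(cmdline):
    """Return every crashkernel= value found on a kernel command line."""
    prefix = "crashkernel="
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """True if the capture kernel is loaded; False if not or if the file is missing."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip() == "1"
    except FileNotFoundError:
        return False


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line as a string."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

An empty list from crashkernel_values(read_cmdline()) means no memory was reserved, so a panic would leave no dump.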

L2 (2 questions)

1. What is the REISUB sequence and when would you use it?

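REISUB is the Magic SysRq sequence for rebooting a hung system as safely as possible. Hold Alt+SysRq and press, in order: R (un-Raw, switch the keyboard out of raw mode), E (tErminate, send SIGTERM to all processes except init), I (kIll, send SIGKILL to all processes except init), S (Sync disk buffers), U (Unmount, which remounts all filesystems read-only), B (reBoot immediately). Wait 2-5 seconds between keys so each step can finish. Use it when the system no longer responds to normal commands but the kernel still handles keyboard input; it is much gentler on filesystems than a hard power cycle. It requires SysRq to be enabled: kernel.sysrq=1 enables every function, while other nonzero values are a bitmask that may permit only some of the keys.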
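Because kernel.sysrq can be a bitmask rather than 0 or 1, not every REISUB key is necessarily permitted. The sketch below decodes a value using the bit assignments from the kernel's SysRq documentation (4 keyboard control, 16 sync, 32 remount read-only, 64 process signalling, 128 reboot). The function name is hypothetical.

```python
# REISUB key -> sysrq bitmask bit that must be set for the key to work.
REISUB_BITS = [
    ("r", 4),    # keyboard control (unraw)
    ("e", 64),   # signalling of processes (SIGTERM)
    ("i", 64),   # signalling of processes (SIGKILL)
    ("s", 16),   # sync
    ("u", 32),   # remount read-only
    ("b", 128),  # reboot/poweroff
]


def allowed_reisub_keys(sysrq_value):
    """Return the REISUB keys, in order, that a kernel.sysrq value permits."""
    if sysrq_value == 0:
        return ""        # SysRq disabled completely
    if sysrq_value == 1:
        return "reisub"  # exactly 1 enables all functions
    return "".join(key for key, bit in REISUB_BITS if sysrq_value & bit)
```

For example, a value of 176 (128 + 32 + 16) allows only sync, remount, and reboot, so the E and I steps would be silently skipped from the keyboard.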

2. How do you use /proc/PID/oom_score_adj and cgroup memory limits together to protect critical services?

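Layer the two mechanisms (memory.max and memory.low require cgroup v2):
1. Set oom_score_adj=-900 on critical processes (database, control plane) so the OOM killer targets other processes first. The valid range is -1000 to 1000, and -1000 exempts the process entirely.
2. Set cgroup memory.max on non-critical services to cap their memory, so they trigger an OOM within their own cgroup before they can cause a system-wide OOM.
3. Set memory.low on critical services for best-effort memory protection: the kernel reclaims from other cgroups first.

In systemd, the equivalent unit settings are OOMScoreAdjust=-900 and MemoryLow=512M on critical services and MemoryMax=2G on non-critical ones. Monitor the oom_kill counter in each cgroup's memory.events file.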
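The layering can be sketched as two small helpers: one renders a systemd drop-in for either kind of service, the other reads the oom_kill counter from the text of a memory.events file. Both function names are invented for this example; only the directive and field names come from the answer above.

```python
def render_dropin(oom_score_adjust=None, memory_max=None, memory_low=None):
    """Build the text of a systemd [Service] drop-in from the given settings."""
    lines = ["[Service]"]
    if oom_score_adjust is not None:
        # oom_score_adj accepts values from -1000 to 1000.
        if not -1000 <= oom_score_adjust <= 1000:
            raise ValueError("oom_score_adjust must be between -1000 and 1000")
        lines.append(f"OOMScoreAdjust={oom_score_adjust}")
    if memory_max is not None:
        lines.append(f"MemoryMax={memory_max}")
    if memory_low is not None:
        lines.append(f"MemoryLow={memory_low}")
    return "\n".join(lines) + "\n"


def oom_kill_count(memory_events_text):
    """Return the oom_kill counter from memory.events content, or 0 if absent."""
    for line in memory_events_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "oom_kill":
            return int(parts[1])
    return 0
```

A critical service would get render_dropin(oom_score_adjust=-900, memory_low="512M"), and a non-critical one render_dropin(memory_max="2G").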

L3 (1 question)

1. How does kdump work and what are the key commands to analyze a crash dump?

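kdump reserves memory at boot (via the crashkernel= parameter) for a secondary capture kernel. When the main kernel panics, the capture kernel is started via kexec without going through firmware, so the crashed kernel's memory stays intact. The capture kernel saves that memory state to /var/crash/ as a vmcore file, then reboots. Analysis: 'crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/*/vmcore'. Here $(uname -r) is only correct if the running kernel is the same release that crashed; otherwise substitute the crashed kernel's release. Inside crash: 'bt' for backtrace (the most recent frame is at the top, so read bottom-up to follow the call chain), 'log' for the kernel log, 'ps' for the process list, 'kmem -i' for a memory summary.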
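When several dumps have accumulated, picking the right vmcore and symbol file by hand is error-prone. The sketch below assumes the layout used in the command above (one subdirectory per dump under /var/crash, debug vmlinux under /usr/lib/debug/lib/modules/). Other distributions use different paths, and the helper names are hypothetical.

```python
import glob
import os


def latest_vmcore(crash_root="/var/crash"):
    """Return the most recently modified <crash_root>/*/vmcore, or None."""
    candidates = glob.glob(os.path.join(crash_root, "*", "vmcore"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def crash_argv(kernel_release, vmcore_path):
    """Build the crash command for a given kernel release and vmcore file."""
    # Assumed debuginfo location; it must match the kernel that crashed.
    vmlinux = f"/usr/lib/debug/lib/modules/{kernel_release}/vmlinux"
    return ["crash", vmlinux, vmcore_path]
```

The resulting list can be passed to subprocess.run(). The kernel release should be that of the crashed kernel, which is not necessarily what os.uname().release reports after the reboot.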