Quiz: Kernel Troubleshooting
6 questions
L0 (1 question)
1. What command shows kernel error messages with human-readable timestamps, and what patterns should you grep for?
dmesg -T -l err,crit,alert,emerg shows kernel messages at error level and above with human-readable timestamps. Key patterns to grep for, case-insensitively (grep -i): 'oom' or 'out of memory' (memory exhaustion), 'i/o error' or 'medium error' (disk problems), 'link is down' or 'link down' (network), 'mce' or 'machine check' (CPU/hardware), and 'ext4' or 'xfs' together with 'corrupt' (filesystem issues).
L1 (2 questions)
1. What is the difference between a kernel oops and a kernel panic?
An oops means the kernel detected a bug in its own code: the offending process is usually killed and the kernel keeps running, degraded and marked tainted. A panic is fatal: the kernel cannot continue, and the system halts or reboots. An oops escalates to a panic if panic_on_oops=1 is set. Both print a stack trace to the kernel log, but only a panic forces an immediate reboot; after an oops, plan a reboot anyway, because kernel state may no longer be trustworthy.
2. What is a kernel panic and how do you capture and analyze the crash dump?
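A kernel panic is an unrecoverable kernel error, such as a null pointer dereference, corrupted data structures, or a hardware fault. To capture a dump, configure kdump: install kdump-tools (Debian/Ubuntu) or kexec-tools (RHEL family) and set the crashkernel=256M boot parameter to reserve memory for the capture kernel. On panic, kdump saves the memory image under /var/crash/. Analyze it with the crash utility: crash /usr/lib/debug/vmlinux /var/crash/vmcore (exact paths vary by distribution, and vmlinux must be the debug-symbol build that matches the crashed kernel). Inside crash, use bt (backtrace), log (kernel log buffer), and ps (process list). Common causes: bad RAM, driver bugs, filesystem corruption.
L2 (2 questions)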
1. What is the REISUB sequence and when would you use it?
REISUB is the Magic SysRq sequence for rebooting a completely hung system as safely as possible. Hold Alt+SysRq and press each key in turn: R (un-Raw, switch the keyboard out of raw mode), E (tErminate, send SIGTERM to all processes except init), I (kIll, send SIGKILL to all processes except init), S (Sync disk buffers), U (Unmount, remount filesystems read-only), B (reBoot). Wait 2-5 seconds between keys. It is safer than a hard power-off because data is flushed and filesystems are remounted read-only before the reboot. It requires SysRq to be enabled, for example kernel.sysrq=1 (all functions).
2. How do you use /proc/PID/oom_score_adj and cgroup memory limits together to protect critical services?
Layer both:
1. Set oom_score_adj=-900 on critical processes (database, control plane) so the OOM killer targets other processes first.
2. Set cgroup v2 memory.max on non-critical services to cap their memory, so an OOM kill happens inside their cgroup before the whole system runs out.
3. Set memory.low on critical services for best-effort memory protection (the kernel reclaims from other cgroups first).
In systemd, the equivalent unit settings are OOMScoreAdjust=-900, MemoryMax=2G, and MemoryLow=512M. Monitor each cgroup's memory.events file for the oom_kill count.
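The layering above can be sketched as a few shell helpers. This is a minimal sketch, not a definitive setup: the function names (critical_dropin, noncritical_dropin, oom_kill_count) are hypothetical, the values come from the answer above, and it assumes a cgroup v2 host where memory.events contains "key value" lines.

```shell
# Hypothetical helpers; the names are illustrative assumptions.

# Print a systemd drop-in for a critical service: lower its OOM score
# and give it best-effort memory protection.
# $1 = OOMScoreAdjust value, $2 = MemoryLow value
critical_dropin() {
    printf '[Service]\nOOMScoreAdjust=%s\nMemoryLow=%s\n' "$1" "$2"
}

# Print a systemd drop-in for a non-critical service: hard-cap its memory
# so it is OOM-killed inside its own cgroup first.
# $1 = MemoryMax value
noncritical_dropin() {
    printf '[Service]\nMemoryMax=%s\n' "$1"
}

# Print the oom_kill counter from a cgroup v2 memory.events file.
# $1 = path to memory.events
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}
```

For example, critical_dropin -900 512M prints a drop-in that could be saved under /etc/systemd/system/<unit>.service.d/ and picked up with systemctl daemon-reload plus a service restart. oom_kill_count /sys/fs/cgroup/system.slice/<unit>.service/memory.events would then read the counter for that unit's cgroup. Here <unit> is a placeholder, not a real service name.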
L3 (1 question)
1. How does kdump work and what are the key commands to analyze a crash dump?