Kernel Troubleshooting
11 cards — 🟢 3 easy | 🟡 5 medium | 🔴 3 hard
🟢 Easy (3)
1. How do you view kernel messages with human-readable timestamps and filter for errors?
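The answer below uses dmesg -l to keep only error-level messages. The following is a minimal Python sketch of the same filtering. It assumes records in the /dev/kmsg text format (prefix,sequence,microseconds-since-boot,flags;message), with the syslog level in the low three bits of the prefix. The function names and sample records are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple

# Syslog level names accepted by dmesg -l; a lower number means more severe.
LEVELS = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
          "warn": 4, "notice": 5, "info": 6, "debug": 7}


class KmsgRecord(NamedTuple):
    level: int
    seconds: float  # time since boot, like dmesg's default [  12.345678] prefix
    message: str


def parse_kmsg(line: str) -> KmsgRecord:
    """Parse one record in the assumed 'prefix,seq,usec,flags;message' format."""
    header, _, message = line.partition(";")
    fields = header.split(",")
    prefix, usec = int(fields[0]), int(fields[2])
    # The low 3 bits of the prefix are the level; the higher bits are the facility.
    return KmsgRecord(prefix & 7, usec / 1_000_000, message.rstrip("\n"))


def filter_levels(lines: Iterable[str], wanted: str) -> Iterator[KmsgRecord]:
    """Yield records whose level is listed in `wanted`, e.g. "err,crit,alert,emerg"."""
    allowed = {LEVELS[name] for name in wanted.split(",")}
    for line in lines:
        if line.startswith(" "):  # continuation lines (" KEY=value") carry no level
            continue
        record = parse_kmsg(line)
        if record.level in allowed:
            yield record


# Hypothetical sample records, not captured from a real machine.
SAMPLE = [
    "6,339,5140900,-;NET: Registered protocol family 10",
    "3,340,7201234,-;ata1.00: failed command: READ FPDMA QUEUED",
    " SUBSYSTEM=scsi",
    "4,341,9000000,-;CPU0: Core temperature above threshold",
]

if __name__ == "__main__":
    for rec in filter_levels(SAMPLE, "err,crit,alert,emerg"):
        print(f"[{rec.seconds:12.6f}] {rec.message}")
```

Lower level numbers are more severe, so err,crit,alert,emerg covers levels 3 down to 0.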
Use dmesg -T to print human-readable timestamps. These can be inaccurate after suspend/resume, because the log's time source is not updated across a suspend. Filter for errors with dmesg -l err,crit,alert,emerg, or combine both: dmesg -T -l err,crit,alert,emerg. Use dmesg -w to follow new messages in real time. dmesg is usually the first command to run when diagnosing system-level issues.
2. What patterns should you grep for in dmesg when troubleshooting hardware or system issues?
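The following small Python sketch applies the category patterns from the answer below to a list of log lines, much as a series of grep -Ei runs would. The category names, function name and sample lines are hypothetical. A line can land in several categories because the patterns overlap: any "I/O error" line also matches the hardware pattern "error".

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Case-insensitive patterns per category, copied from the answer below.
PATTERNS = {
    "hardware": r"error|fault|fail|warn",
    "memory": r"oom|out of memory|page allocation failure",
    "disk": r"i/o error|medium error|sector|ata|scsi",
    "network": r"link down|link up|carrier|dropped|reset",
    "cpu": r"mce|machine check|thermal|throttl",
    "filesystem": r"ext4|xfs|corrupt|mount|remount",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def categorize(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each category to the lines matching it; a line may appear in several."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        for name, regex in COMPILED.items():
            if regex.search(line):
                hits[name].append(line)
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical log lines for illustration.
    sample = ["Buffer I/O error on dev sda1, logical block 0", "eth0: carrier lost"]
    for category, lines in categorize(sample).items():
        print(category, lines)
```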
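Useful case-insensitive patterns (for example, dmesg -T | grep -Ei "i/o error|medium error"):
Hardware: "error|fault|fail|warn"
Memory: "oom|out of memory|page allocation failure"
Disk: "i/o error|medium error|sector|ata|scsi"
Network: "link down|link up|carrier|dropped|reset"
CPU: "mce|machine check|thermal|throttl"
Filesystem: "ext4|xfs|corrupt|mount|remount"
Short patterns such as "ata" also match unrelated words (for example "data"), so expect some noise.
3. What is the difference between a kernel oops and a kernel panic?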
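An oops is the kernel's report of a bug it detected in kernel code, such as an invalid memory access. The kernel kills the offending process, but the system usually keeps running, possibly degraded, and the kernel is marked tainted. A panic is fatal: the kernel cannot continue, so the system halts. It reboots instead if the kernel.panic sysctl sets a reboot timeout. An oops escalates to a panic when kernel.panic_on_oops=1 is set.
🟡 Medium (5)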
1. What is kdump and how does it capture crash dumps during a kernel panic?
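You can confirm that the crashkernel= reservation from the answer below is actually in effect on the running system. The following sketch reads the kernel command line to check this. It assumes /proc/cmdline is readable. It treats /sys/kernel/kexec_crash_loaded (which reads 1 when a capture kernel is loaded) as optional and reports None if the file is absent. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def crashkernel_setting(cmdline: str) -> Optional[str]:
    """Return the crashkernel= value from a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def kdump_status(cmdline_path: str = "/proc/cmdline",
                 loaded_path: str = "/sys/kernel/kexec_crash_loaded") -> dict:
    """Summarize whether memory is reserved and a capture kernel appears loaded."""
    cmdline = Path(cmdline_path).read_text()
    loaded_file = Path(loaded_path)
    # Assumption: this sysfs file may be missing; report None rather than guess.
    loaded = loaded_file.read_text().strip() == "1" if loaded_file.exists() else None
    return {"crashkernel": crashkernel_setting(cmdline),
            "capture_kernel_loaded": loaded}


if __name__ == "__main__":
    if Path("/proc/cmdline").exists():
        print(kdump_status())
```

A missing crashkernel= value means the capture kernel has no memory to boot into. In that case, fix the boot parameters and reboot before testing kdump.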
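kdump reserves memory at boot for a second (capture) kernel. When the main kernel panics, it uses kexec to boot the capture kernel directly, without going back through firmware. The capture kernel then saves the crashed kernel's memory image (vmcore) to disk, by default under /var/crash/. Setup steps:
- install kexec-tools
- enable the kdump service
- add crashkernel=256M to the kernel command line (the right size depends on the system)
- reboot so the reservation takes effect

Without kdump, the forensic evidence of a crash is lost on reboot.
2. What is the SysRq REISUB sequence, and when would you use it?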
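The kernel.sysrq value mentioned below is a bitmask, except for 0 (everything disabled) and 1 (everything enabled). The following Python sketch uses the bit values from the kernel's SysRq documentation to report which REISUB keys a given value would block. You can read the current value from /proc/sys/kernel/sysrq. The helper names are hypothetical.

```python
from typing import Dict, List

# kernel.sysrq bit values (0 disables all SysRq functions, 1 enables all of them).
SYSRQ_BITS: Dict[int, str] = {
    2: "console log level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signal processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of RT tasks",
}

# The bit each REISUB key depends on.
REISUB_NEEDS = {"R": 4, "E": 64, "I": 64, "S": 16, "U": 32, "B": 128}


def enabled_functions(sysrq: int) -> List[str]:
    """List the SysRq function groups a kernel.sysrq value allows."""
    if sysrq == 1:
        return list(SYSRQ_BITS.values())
    return [name for bit, name in SYSRQ_BITS.items() if sysrq & bit]


def reisub_missing(sysrq: int) -> List[str]:
    """Return the REISUB keys that this kernel.sysrq value would not allow."""
    if sysrq == 1:
        return []
    return [key for key, bit in REISUB_NEEDS.items() if not sysrq & bit]


if __name__ == "__main__":
    for value in (0, 1, 176, 244):
        print(value, "missing:", reisub_missing(value) or "none")
```

For example, a value of 176 (16 + 32 + 128) allows S, U and B but not R, E or I.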
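REISUB is a safe emergency reboot sequence for a hung system:
- R (un-Raw): take the keyboard out of raw mode.
- E (tErminate): send SIGTERM to all processes except init.
- I (kIll): send SIGKILL to all processes except init.
- S (Sync): flush dirty data to disk.
- U (Unmount): remount all filesystems read-only.
- B (reBoot): reboot immediately. This step does not sync or unmount on its own, which is why S and U come first.

Press Alt+SysRq with each key in turn, waiting 2-5 seconds between keys so each step can finish. It is the cleanest way to reboot when nothing else responds. Enable all SysRq functions with kernel.sysrq=1 in a file under /etc/sysctl.d/. Other values are a bitmask that enables only selected functions.
3. What does it mean when a kernel is "tainted," and why does it matter?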
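Because /proc/sys/kernel/tainted is a bitmask, a non-zero value needs decoding. The following Python sketch uses the bit-to-letter table from the kernel's "Tainted kernels" documentation, covering bits 0 to 17. Newer kernels may define more bits, which the sketch reports as bitN. The function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Index = bit number in /proc/sys/kernel/tainted.
TAINT_FLAGS = [
    ("P", "proprietary module loaded"),
    ("F", "module force-loaded"),
    ("S", "kernel running on an out-of-specification system"),
    ("R", "module force-unloaded"),
    ("M", "processor reported a machine check exception"),
    ("B", "bad page referenced or unexpected page flags"),
    ("U", "taint requested by userspace"),
    ("D", "kernel died recently (oops or BUG)"),
    ("A", "ACPI table overridden by user"),
    ("W", "kernel issued a warning"),
    ("C", "staging driver loaded"),
    ("I", "workaround for platform firmware bug applied"),
    ("O", "externally built (out-of-tree) module loaded"),
    ("E", "unsigned module loaded"),
    ("L", "soft lockup occurred"),
    ("K", "kernel live patched"),
    ("X", "auxiliary taint (distribution-defined)"),
    ("T", "kernel built with struct randomization plugin"),
]
MEANINGS = dict(TAINT_FLAGS)


def decode_taint(value: int) -> List[str]:
    """Return the flag letters set in a taint bitmask; unknown bits become 'bitN'."""
    letters = []
    for bit in range(value.bit_length()):
        if value & (1 << bit):
            letters.append(TAINT_FLAGS[bit][0] if bit < len(TAINT_FLAGS) else f"bit{bit}")
    return letters


def explain_taint(value: int) -> List[str]:
    """Pair each set flag with its meaning."""
    return [f"{flag}: {MEANINGS.get(flag, 'not in this table')}" for flag in decode_taint(value)]


if __name__ == "__main__":
    path = Path("/proc/sys/kernel/tainted")
    if path.exists():
        value = int(path.read_text())
        print(value, explain_taint(value) or "not tainted")
```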
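A tainted kernel is in a state that upstream developers treat as suspect when debugging. Typical reasons are that non-upstream code was loaded or that an error has already occurred. Common taint flags:
- P: proprietary module loaded, for example nvidia
- F: module force-loaded
- W: kernel warning issued
- D: oops or BUG occurred
- E: unsigned module loaded

Check with cat /proc/sys/kernel/tainted: 0 means not tainted, and any other value is a bitmask of the flags. Taint matters because vendors and kernel developers may refuse or deprioritize bug reports from tainted kernels, since the state of such a system cannot be fully trusted or reproduced.
4. How does the OOM killer work, and how do you investigate OOM kill events?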
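The following Python sketch performs the dmesg grep from the answer below and also extracts each victim's PID and process name. It assumes OOM kill messages contain the text "Killed process <pid> (<name>)"; check this against your own kernel's messages. The function name and log lines are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed common fragment of OOM kill messages (system-wide and memory cgroup).
KILLED = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, name) for every OOM kill found in the given log lines."""
    victims = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


if __name__ == "__main__":
    # Hypothetical dmesg -T style lines.
    log = [
        "[Mon Jan  1 10:00:00 2024] Out of memory: Killed process 4242 (java) total-vm:8000kB",
        "[Mon Jan  1 10:05:00 2024] eth0: link up",
    ]
    print(oom_victims(log))
```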
When the kernel cannot satisfy a memory allocation even after reclaiming memory, the OOM killer picks a victim and kills it to free memory. The victim is usually the process using the most memory, after priority adjustments. Check for OOM kills: dmesg | grep -i "out of memory\|oom-killer\|killed process". Adjust OOM priority: echo -1000 > /proc/
5. What SysRq keys help debug hung systems without rebooting?
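Besides the keyboard combination, you can send the same keys by writing a single character to /proc/sysrq-trigger as root, which is useful on remote machines. According to the kernel's SysRq documentation, this path works regardless of the kernel.sysrq setting. The following cautious Python sketch accepts only the dump keys t, m and w, so it cannot accidentally sync, signal or reboot. The helper name is hypothetical.

```python
from pathlib import Path

# Non-destructive SysRq dump keys from the answer below.
DUMP_KEYS = {"t": "task states", "m": "memory info", "w": "blocked (D-state) tasks"}


def trigger_dump(key: str, trigger: str = "/proc/sysrq-trigger") -> None:
    """Write one SysRq dump key to the trigger file (root is needed on a real system).

    The dump output goes to the kernel log; read it afterwards with dmesg.
    """
    if key not in DUMP_KEYS:
        raise ValueError(f"refusing non-dump SysRq key {key!r}")
    Path(trigger).write_text(key)
```

A typical use is trigger_dump("w") followed by dmesg, to see which tasks are stuck in D state.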
t = dump task states (debug hung processes), m = dump memory info (debug memory issues), w = dump blocked D-state tasks (debug I/O hangs), s = sync all filesystems, e = send SIGTERM to all processes except init. The dumps go to the kernel log, so read them with dmesg. Access via Alt+SysRq+
🔴 Hard (3)
1. How do you analyze a kernel crash dump using the crash utility, and how do you read a backtrace?
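The following heuristic Python sketch pulls the likely faulting function out of crash's bt output. It assumes two things about the output:
- frames are printed as "#N [stack address] function at address"
- an "[exception RIP: function+offset]" line appears when the crash came from an exception

If that line is absent, the sketch falls back to the first frame not in a small, hypothetical and non-exhaustive set of crash-handling functions. The sample backtrace is illustrative, not taken from a real dump.

```python
import re
from typing import List, Optional, Tuple

FRAME = re.compile(r"#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
EXCEPTION_RIP = re.compile(r"\[exception RIP: ([^+\]]+)")

# Hypothetical, incomplete set of panic/oops/page-fault handling functions to skip.
HANDLERS = {"machine_kexec", "__crash_kexec", "crash_kexec", "panic", "oops_end",
            "die", "no_context", "page_fault", "exc_page_fault",
            "asm_exc_page_fault", "page_fault_oops", "do_trap", "do_error_trap"}


def frames(bt_text: str) -> List[Tuple[int, str]]:
    """Return (frame number, function) pairs, top (#0, most recent) first."""
    return [(int(m.group(1)), m.group(2)) for m in FRAME.finditer(bt_text)]


def faulting_function(bt_text: str) -> Optional[str]:
    """Prefer the exception RIP; otherwise take the first non-handler frame."""
    rip = EXCEPTION_RIP.search(bt_text)
    if rip:
        return rip.group(1)
    for _, func in frames(bt_text):
        if func not in HANDLERS:
            return func
    return None


# Hypothetical bt output shaped like a SysRq-c test crash.
SAMPLE_BT = """\
PID: 2481   TASK: ffff8a4f3a2b0000  CPU: 1   COMMAND: "bash"
 #0 [ffffb1d4c0a3fb58] machine_kexec at ffffffff9a25f0be
 #1 [ffffb1d4c0a3fbb0] __crash_kexec at ffffffff9a35a7bd
 #2 [ffffb1d4c0a3fc70] crash_kexec at ffffffff9a35b6ad
 #3 [ffffb1d4c0a3fc80] oops_end at ffffffff9a223c4d
 #4 [ffffb1d4c0a3fca0] no_context at ffffffff9a26e6a3
 #5 [ffffb1d4c0a3fd20] page_fault at ffffffff9ac0117e
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffff9a6b1372  RSP: ffffb1d4c0a3fdd8  RFLAGS: 00010246
 #6 [ffffb1d4c0a3fdd8] __handle_sysrq at ffffffff9a6b1a2d
 #7 [ffffb1d4c0a3fe08] write_sysrq_trigger at ffffffff9a6b1ec8
"""

if __name__ == "__main__":
    print(faulting_function(SAMPLE_BT))
```

Reading the printed frames from the bottom (#7) up to the exception line follows the call chain in the order it executed.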
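Install crash and the kernel-debuginfo package for the kernel that crashed. Open the dump: crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/*/vmcore. $(uname -r) is only correct if the system is running the same kernel that crashed, and the glob should match a single dump directory. Key commands: bt (backtrace), log (kernel log at crash time), ps (process list), sys (system info).

Reading a backtrace:
- Frame #0 at the top is the most recent call. The bottom frame is where the call chain entered the kernel, for example a system call, so reading bottom-up follows execution toward the crash.
- The top frames are usually crash-handling code (panic, oops_end, page-fault handlers).
- When the crash came from an exception, the function that actually faulted is named in the [exception RIP: ...] line below those frames.
- The root cause is usually in that function or in the middle frames that called it.
2. What are Machine Check Exceptions (MCEs), and how do you diagnose them?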
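The following Python sketch is an equivalent of the dmesg | grep -i "machine check\|mce" check from the answer below. It also counts matching lines per reported CPU number, which shows at a glance which CPUs are logging errors. It only counts lines; decoding the error details is left to mcelog. The function name and sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

MCE_LINE = re.compile(r"machine check|mce", re.IGNORECASE)
CPU_FIELD = re.compile(r"\bCPU\s*(\d+)")


def mce_lines_by_cpu(lines: Iterable[str]) -> Counter:
    """Count MCE-related lines per CPU number; None collects lines without one."""
    counts: Counter = Counter()
    for line in lines:
        if MCE_LINE.search(line):
            cpu = CPU_FIELD.search(line)
            counts[int(cpu.group(1)) if cpu else None] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical kernel log lines.
    sample = [
        "mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: be00000000800400",
        "mce: [Hardware Error]: Machine check events logged",
    ]
    print(mce_lines_by_cpu(sample))
```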
MCEs are hardware errors detected and reported by the CPU. Common causes:
- faulty RAM: test with memtest86+
- an overheating CPU: check the thermal sensors
- a failing CPU: replace it

Check with dmesg | grep -i "machine check\|mce". Install mcelog for detailed analysis; many newer distributions use rasdaemon for this instead. MCEs indicate real hardware faults. Software can log and sometimes work around them, but it cannot repair them.
3. How do you configure kdump for remote crash dump storage, and what does the dump level (-d flag) control?