Linux Debugging
12 cards — 🟢 4 easy | 🟡 6 medium | 🔴 2 hard
🟢 Easy (4)
1. Why is dstat a good first tool when a machine is misbehaving?
Show answer
dstat shows CPU, disk, network, and memory stats updating in real time in a single view. It quickly answers "is the machine CPU-bound, disk-bound, memory-bound, or network-bound?" without switching between tools.
2. What three questions can lsof answer?
Show answer
1. What files is this process holding open? (lsof -p
2. Who is listening on this port? (lsof -i :
3. Why won't this filesystem unmount? (lsof /mount/point)
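The first of these questions can also be answered on a host where lsof is not installed, because the kernel exposes each open descriptor as a symlink under the process's fd directory in /proc. The following is a minimal Python sketch of that idea, assuming a Linux host; the function name `open_files` and the `proc_root` parameter are hypothetical and exist only for illustration.

```python
import os


def open_files(pid, proc_root="/proc"):
    """Return {fd_number: target} for each descriptor the process holds open.

    proc_root is a hypothetical parameter so the function can be pointed
    at a copy of the tree; on a live Linux host the default is correct.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to a path, or to a pseudo-target
            # such as "socket:[inode]" or "pipe:[inode]".
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

Calling `open_files(os.getpid())` lists the current process's own descriptors. Inspecting a process owned by another user generally requires root, as it does with lsof. Running the function twice a few minutes apart and comparing the sizes of the two results is a quick way to see whether the descriptor count is growing.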
3. How do you list all TCP connections with process info using ss?
Show answer
ss -tp shows established TCP connections with the owning process. Add -l for listening sockets (ss -tlnp) or -u for UDP (ss -unp).
4. When should you check dmesg during debugging and what would you look for?
Show answer
Check dmesg when processes are killed unexpectedly (OOM killer messages), hardware errors are suspected (disk I/O errors, NIC failures), or containers crash without application logs. Look for: "Out of memory: Killed process", "I/O error", "segfault at", or "hardware error".
🟡 Medium (6)
1. What kinds of problems is strace best at revealing?
Show answer
File access errors (ENOENT, EACCES), network connection failures, missing libraries, permission problems, signal delivery, and slow syscalls. It shows every syscall a process makes with arguments and return values.
2. What does perf help you understand that top does not?
Show answer
perf shows WHERE CPU time is going at the function level (hot functions, call stacks) and can trace kernel and user-space events. top only shows per-process CPU percentage.
3. A process is leaking file descriptors. How would you confirm this and find what it is opening?
Show answer
Check /proc/
4. A service is slow but top shows low CPU usage. What does that tell you and what do you check next?
Show answer
Low CPU with high latency means the process is waiting, not computing. It is likely I/O-bound or blocked on a lock or network call. Check: iostat (disk I/O saturation), ss -tp (connection state: many CLOSE_WAIT or SYN_SENT?), strace -p
5. What do vmstat and iostat show that top does not?
Show answer
vmstat shows memory, swap, I/O, and CPU stats per interval (reveals swapping and I/O wait). iostat shows per-device disk I/O statistics (throughput, queue depth, utilization per disk).
6. How do you find which process is preventing a port from being reused?
Show answer
lsof -i :
🔴 Hard (2)
1. What are common debugging mistakes that Linux tools can prevent?
Show answer
Reaching for application logs only (instead of OS-level data), assuming "slow" means CPU (could be disk or network), assuming "network issue" without packet capture, ignoring /proc, and restarting services before gathering evidence.
2. You need to strace a production service handling 10K requests/sec. What precautions do you take?