Container Runtime Debugging - Skill Check
Mental model (bottom-up)
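When kubectl isn't enough, go deeper. The debugging hierarchy is: kubectl (API level) → crictl (container runtime) → nsenter (Linux namespaces) → /proc (kernel), with strace as the last step for tracing system calls. Each level down strips away a layer of abstraction and shows more of what the node actually sees, but everything below kubectl generally needs root access on the node. Distroless containers with no shell require ephemeral debug containers or namespace entry from the host.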
Visual stack
[kubectl exec/debug ] through API server → kubelet → CRI
|
[crictl ] direct to containerd/CRI-O on the node
|
[nsenter ] enter container's Linux namespaces (net, pid, mnt)
|
[/proc/$PID ] kernel-level process info, fd counts, cgroups
|
[strace ] trace system calls to see what's blocking
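The descent through the stack can be sketched as a pair of shell helpers. This is a minimal sketch, assuming a containerd or CRI-O node with crictl configured and root access. The function names runtime_id and container_pid are hypothetical, and the .info.pid template path is an assumption about what the runtime exposes in its crictl inspect output.

```shell
#!/bin/sh
# Hypothetical helpers for walking from the API level down to a host PID.

# Strip the runtime scheme from a containerID as reported in pod status,
# e.g. "containerd://abc123" -> "abc123".
runtime_id() {
    printf '%s\n' "${1#*://}"
}

# Resolve a container ID to the host PID of its main process.
# Assumption: the runtime's inspect output exposes the PID at .info.pid.
container_pid() {
    crictl inspect --output go-template --template '{{.info.pid}}' "$1"
}

# Usage on the node (needs a cluster, so left as comments):
#   ID=$(runtime_id "$(kubectl get pod <pod> \
#         -o jsonpath='{.status.containerStatuses[0].containerID}')")
#   PID=$(container_pid "$ID")
#   nsenter -t "$PID" -n ip addr      # network namespace
#   ls -la "/proc/$PID/root"          # container filesystem from the host
```

Once the host PID is known, every lower level of the stack (nsenter, /proc, strace) takes it as input.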
Glossary
- crictl - CLI for CRI-compatible container runtimes (replacement for docker on K8s nodes)
- nsenter - enter one or more Linux namespaces of a running process
- ephemeral container - kubectl debug injects a debug container into a running pod
- /proc/$PID/root - direct access to container's filesystem from the host
- cgroup - Linux resource control group (CPU throttling, memory limits)
- nr_throttled - cgroup counter of CFS enforcement periods in which the group hit its CPU quota and was throttled
- zombie process - child process that exited but wasn't reaped by PID 1
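A zombie, as defined in the last glossary entry, shows up in ps output with a state that starts with Z. The following is a minimal sketch for spotting zombies from the node or from a debug container, assuming a procps-style ps that accepts -eo with headerless columns. filter_zombies and list_zombies are hypothetical names.

```shell
#!/bin/sh
# Read "pid ppid stat comm" lines on stdin and print only the zombies.
filter_zombies() {
    awk '$3 ~ /^Z/ { print "zombie pid=" $1 " parent=" $2 " cmd=" $4 }'
}

# List zombies among all processes visible from the current PID namespace.
# Assumption: ps supports "-eo name=" to print columns without headers.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies
}
```

The parent PID in each line identifies the process that is failing to reap its children. If that parent is PID 1 of the container, an init wrapper is the usual remedy.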
Core questions (easy -> hard)
- How do you debug a distroless container with no shell?
  - kubectl debug -it <pod> --image=busybox --target=<container>: starts an ephemeral container that shares the target's PID namespace, so you see its processes.
- How do you run tcpdump in a container that doesn't have it?
  - From the node: nsenter -t $PID -n -- tcpdump -i eth0. This enters the container's network namespace and runs the host's tcpdump there.
- How do you check if a container is being CPU-throttled?
  - Read /sys/fs/cgroup/cpu/<cgroup>/cpu.stat and look at nr_throttled (count) and throttled_time (nanoseconds). On cgroup v2 nodes the file is /sys/fs/cgroup/<cgroup>/cpu.stat and the time field is throttled_usec (microseconds).
- Pod is OOMKilled. How do you see actual memory usage from the node?
  - cat /proc/$PID/smaps_rollup for the memory breakdown, /proc/$PID/oom_score for kill priority, and the cgroup's memory.usage_in_bytes (memory.current on cgroup v2) for current usage.
- Application is hanging. How do you find what it's waiting on?
  - strace -p $PID -f to see which syscall it's stuck on. futex() = lock wait, epoll_wait() = I/O wait, connect() = TCP hang.
- How do you check open file descriptors for a leak?
  - ls /proc/$PID/fd | wc -l for the count, ls -la /proc/$PID/fd to see what they point to, grep "Max open files" /proc/$PID/limits for the limit.
- Container has zombie processes. Why, and how do you fix it?
  - PID 1 in the container isn't reaping its children. Fix: use tini or dumb-init as the entrypoint, or set shareProcessNamespace: true so the pod's pause container becomes PID 1 and reaps them.
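The CPU-throttling check can be turned into a ratio, which is easier to judge than a raw counter. This is a minimal sketch, assuming the cpu.stat file passed in contains nr_periods and nr_throttled lines, as it does on both cgroup v1 and v2 when a CPU limit is in effect. throttle_ratio is a hypothetical name.

```shell
#!/bin/sh
# Print how many CFS periods were throttled, as a count and a whole percent.
# $1: path to a cgroup cpu.stat file.
throttle_ratio() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0)
                printf "%d/%d periods throttled (%d%%)\n",
                       throttled, periods, int(100 * throttled / periods)
            else
                print "no CFS periods recorded (no CPU limit set?)"
        }
    ' "$1"
}

# Usage on the node (the path depends on the cgroup version and driver):
#   throttle_ratio /sys/fs/cgroup/cpu/<cgroup>/cpu.stat
```

The counters are cumulative since the cgroup was created, so comparing two readings taken a few seconds apart shows whether throttling is happening now.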
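For the file descriptor question, the count and the limit can be read together so a leak is visible at a glance. This is a minimal sketch, assuming a Linux /proc layout. fd_usage is a hypothetical name, and its optional second argument (an alternate proc root) exists only to make the helper easy to exercise against a fixture directory.

```shell
#!/bin/sh
# Report open file descriptors against the soft limit for one process.
# $1: PID   $2: proc root (optional, defaults to /proc)
# The body runs in a subshell so its variables do not leak to the caller.
fd_usage() (
    pid=$1
    proc=${2:-/proc}
    # One entry per open descriptor; tr strips padding some wc builds add.
    open=$(ls "$proc/$pid/fd" | wc -l | tr -d ' ')
    # Fourth field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
    echo "$open open of $limit allowed"
)

# Usage on the node, run as root:
#   fd_usage "$PID"
```

A count that climbs steadily toward the limit across repeated runs points to a leak. ls -la /proc/$PID/fd then shows which sockets or files are accumulating.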
Wiki Navigation
Prerequisites
- Containers Deep Dive (Topic Pack, L1)
Related Content
- Container Runtime Drills (Drill, L2) — Container Runtimes
- Container Runtime Flashcards (CLI) (flashcard_deck, L1) — Container Runtimes
- Containers Deep Dive (Topic Pack, L1) — Container Runtimes
- Deep Dive: Containers How They Really Work (deep_dive, L2) — Container Runtimes
- Interview: Docker Container Debugging (Scenario, L1) — Container Runtimes
- cgroups & Linux Namespaces (Topic Pack, L2) — Container Runtimes