Portal | Level: L2: Operations | Topics: Container Runtimes | Domain: Kubernetes

Container Runtime Debugging - Skill Check¶

Mental model (bottom-up)¶

When kubectl isn't enough, go deeper. The debugging hierarchy is: kubectl (API level) → crictl (container runtime) → nsenter (Linux namespaces) → /proc (kernel) → strace (system calls). Each level down bypasses one more layer of abstraction and exposes more detail, at the cost of needing access to the node. Distroless containers with no shell require ephemeral debug containers or namespace entry from the host.

Visual stack¶

[kubectl exec/debug  ]  through API server → kubelet → CRI
|
[crictl               ]  direct to containerd/CRI-O on the node
|
[nsenter              ]  enter container's Linux namespaces (net, pid, mnt)
|
[/proc/$PID           ]  kernel-level process info, fd counts, cgroups
|
[strace               ]  trace system calls to see what's blocking
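
The crictl → nsenter → /proc steps all start from the container's host PID. Below is a minimal sketch of finding it, assuming you are root on the node and crictl is configured for its runtime. The helper names container_id and container_pid are hypothetical. The .info.pid template path is what containerd typically reports, and other runtimes may lay out the inspect output differently.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any tool.

# container_id NAME: ID of the first running container whose name matches NAME.
container_id() {
  crictl ps --name "$1" -q | head -n 1
}

# container_pid ID: host PID of the container's main process.
# Assumption: the runtime exposes the PID at .info.pid in `crictl inspect`.
container_pid() {
  crictl inspect --output go-template --template '{{.info.pid}}' "$1"
}

# Usage on a node (not executed here; "my-app" is a placeholder name):
#   cid=$(container_id my-app)
#   pid=$(container_pid "$cid")
#   nsenter -t "$pid" -n ip addr        # run the host's ip inside the container's netns
#   ls "/proc/$pid/root/etc"            # browse the container filesystem from the host
```

With the PID in hand, the remaining levels of the stack (nsenter, /proc/$PID, strace -p) all take it as their argument.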

Glossary¶

crictl - CLI for CRI-compatible container runtimes (replacement for docker on K8s nodes)
nsenter - enter one or more Linux namespaces of a running process
ephemeral container - kubectl debug injects a debug container into a running pod
/proc/$PID/root - direct access to container's filesystem from the host
cgroup - Linux resource control group (CPU throttling, memory limits)
nr_throttled - cgroup counter of CPU enforcement periods in which the cgroup hit its quota and was throttled
zombie process - child process that exited but wasn't reaped by PID 1

Core questions (easy -> hard)¶

How do you debug a distroless container with no shell?
kubectl debug -it <pod> --image=busybox --target=<container> — shares PID namespace so you see its processes.
How do you run tcpdump in a container that doesn't have it?
From node: nsenter -t $PID -n -- tcpdump -i eth0 — enters the container's network namespace using the host's tcpdump.
How do you check if a container is being CPU-throttled?
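On cgroup v1, read /sys/fs/cgroup/cpu/<cgroup>/cpu.stat and look at nr_throttled (count) and throttled_time (nanoseconds). On cgroup v2 the file is /sys/fs/cgroup/<cgroup>/cpu.stat and the time field is throttled_usec (microseconds). Compare nr_throttled with nr_periods to see how often throttling happens.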
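
A raw nr_throttled count is easier to judge as a share of nr_periods, which appears in cpu.stat on both cgroup v1 and v2. The sketch below computes that ratio. The function name throttle_report is hypothetical, and the path to cpu.stat is passed in because the cgroup directory layout depends on the node's cgroup version and driver.

```shell
#!/usr/bin/env bash
# throttle_report FILE: summarise throttling from a cgroup cpu.stat file.
throttle_report() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0)
        printf "throttled %d of %d periods (%.1f%%)\n", throttled, periods, 100 * throttled / periods
      else
        print "no CFS periods recorded (no CPU limit set?)"
    }
  ' "$1"
}

# Usage on a node (not executed here; <cgroup> is left as a placeholder):
#   throttle_report /sys/fs/cgroup/<cgroup>/cpu.stat
```

A high percentage over a sustained interval suggests the CPU limit is too low for the workload. The counters are cumulative, so sample twice and compare to see current behavior.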
Pod is OOMKilled. How do you see actual memory usage from the node?
cat /proc/$PID/smaps_rollup for memory breakdown, /proc/$PID/oom_score for kill priority, and the cgroup's memory.usage_in_bytes (memory.current on cgroup v2) for current usage against the limit.
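
The cgroup file names differ between versions: v1 uses memory.usage_in_bytes and memory.limit_in_bytes, v2 uses memory.current and memory.max. A small sketch that handles both follows. The function name cgroup_mem is hypothetical, and the cgroup directory is an argument because its location varies by node setup.

```shell
#!/usr/bin/env bash
# cgroup_mem DIR: print current memory usage and limit for a cgroup directory.
cgroup_mem() {
  local dir=$1
  if [ -r "$dir/memory.current" ]; then            # cgroup v2
    echo "usage=$(cat "$dir/memory.current") limit=$(cat "$dir/memory.max")"
  elif [ -r "$dir/memory.usage_in_bytes" ]; then   # cgroup v1
    echo "usage=$(cat "$dir/memory.usage_in_bytes") limit=$(cat "$dir/memory.limit_in_bytes")"
  else
    echo "no memory controller files in $dir" >&2
    return 1
  fi
}

# Usage on a node (not executed here; <cgroup> is left as a placeholder):
#   cgroup_mem /sys/fs/cgroup/<cgroup>
```

On v2 the limit reads as the literal string max when no limit is set.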
Application is hanging. How do you find what it's waiting on?
strace -p $PID -f to see which syscall it's stuck on. futex() = lock wait, epoll_wait() = I/O wait, connect() = TCP hang.
How do you check open file descriptors for a leak?
ls /proc/$PID/fd | wc -l for count, ls -la /proc/$PID/fd to see what they point to, grep "Max open files" /proc/$PID/limits for the limit.
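
Those three commands can be combined into one check that is easy to run repeatedly while watching for growth. This is a sketch under assumptions: the function name fd_usage is hypothetical, and the optional second argument (an alternate proc root) exists only to make the function easy to exercise against fixture data.

```shell
#!/usr/bin/env bash
# fd_usage PID [PROCROOT]: open descriptor count against the soft limit.
fd_usage() {
  local pid=$1 proc=${2:-/proc} count limit
  # One directory entry per open descriptor.
  count=$(ls "$proc/$pid/fd" | wc -l | tr -d ' ')
  # Fourth column of the "Max open files" row is the soft limit.
  limit=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
  echo "pid=$pid open=$count soft_limit=$limit"
}

# Usage on a node (not executed here):
#   watch -n 5 "$(declare -f fd_usage); fd_usage $PID"
```

A count that climbs steadily toward the soft limit without dropping back is the usual signature of a leak.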
Container has zombie processes. Why and how to fix?
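PID 1 in the container isn't reaping children: the application runs as PID 1 and never calls wait() on orphans it inherits. Fix: use tini/dumb-init as entrypoint, or set shareProcessNamespace: true in the pod spec so the pod's pause container becomes PID 1 and reaps them.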
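
To confirm the diagnosis, list the zombies together with their parent PID, since the parent is the process that is failing to reap. A minimal sketch that reads /proc/<pid>/status follows. The function name list_zombies is hypothetical, and the optional proc-root argument is an assumption added so the function can be pointed at /proc/$PID/root/proc or at fixture data.

```shell
#!/usr/bin/env bash
# list_zombies [PROCROOT]: print "pid ppid name" for every process in state Z.
list_zombies() {
  local proc=${1:-/proc} status
  for status in "$proc"/[0-9]*/status; do
    [ -r "$status" ] || continue          # process may have exited mid-scan
    awk '
      $1 == "Name:"  { name = $2 }
      $1 == "State:" { state = $2 }
      $1 == "Pid:"   { pid = $2 }
      $1 == "PPid:"  { ppid = $2 }
      END { if (state == "Z") print pid, ppid, name }
    ' "$status" 2>/dev/null
  done
}

# Usage on a node (not executed here):
#   list_zombies            # every zombie visible from the host
```

If every zombie reports the container's main process as its parent, an init wrapper or a shared process namespace is the likely fix.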

Prerequisites¶

Containers Deep Dive (Topic Pack, L1)

Container Runtime Drills (Drill, L2) — Container Runtimes
Container Runtime Flashcards (CLI) (flashcard_deck, L1) — Container Runtimes
Containers Deep Dive (Topic Pack, L1) — Container Runtimes
Deep Dive: Containers How They Really Work (deep_dive, L2) — Container Runtimes
Interview: Docker Container Debugging (Scenario, L1) — Container Runtimes
cgroups & Linux Namespaces (Topic Pack, L2) — Container Runtimes
