Portal | Level: L2: Operations | Topics: Container Runtimes | Domain: Kubernetes

Container Runtime Debugging - Skill Check

Mental model (bottom-up)

When kubectl isn't enough, go deeper. The debugging hierarchy is: kubectl (API level) → crictl (container runtime) → nsenter (Linux namespaces) → /proc (kernel). Each level down gives you more visibility, but from crictl onward you need root access on the node. Distroless containers with no shell require ephemeral debug containers or namespace entry from the host.

Visual stack

[kubectl exec/debug  ]  through API server → kubelet → CRI
|
[crictl               ]  direct to containerd/CRI-O on the node
|
[nsenter              ]  enter container's Linux namespaces (net, pid, mnt)
|
[/proc/$PID           ]  kernel-level process info, fd counts, cgroups
|
[strace               ]  trace system calls to see what's blocking
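
The descent through the stack can be sketched as a few shell helpers. This is a minimal sketch, not a definitive tool: the function names and the PROC_ROOT variable are hypothetical, and container_pid assumes containerd-style inspect output where the host PID sits at .info.pid (other runtimes may lay this out differently). Run it as root on the node.

```shell
#!/usr/bin/env bash
# Sketch of stepping down the stack for one container. Run as root on the node.
# Function names and PROC_ROOT are hypothetical helpers, not part of any tool.

PROC_ROOT="${PROC_ROOT:-/proc}"   # overridable so the parsing can be tried on a copy

# crictl level: resolve a container name to its host PID.
# Assumption: containerd-style inspect output, where the PID is at .info.pid.
container_pid() {
  local name="$1" id
  id="$(crictl ps --name "$name" -q | head -n 1)"
  if [ -z "$id" ]; then
    echo "no running container matches '$name'" >&2
    return 1
  fi
  crictl inspect --output go-template --template '{{.info.pid}}' "$id"
}

# nsenter level: run a host binary inside the container's network namespace.
in_netns() {
  local pid="$1"
  shift
  nsenter -t "$pid" -n -- "$@"
}

# /proc level: print the process name, state, and cgroup path for a PID.
proc_summary() {
  local pid="$1" dir="$PROC_ROOT/$1"
  [ -d "$dir" ] || { echo "no such process: $pid" >&2; return 1; }
  awk '/^(Name|State):/ { printf "%s %s\n", $1, $2 }' "$dir/status"
  sed 's/^/cgroup: /' "$dir/cgroup"
}
```

A possible session, with my-app as a placeholder container name: pid="$(container_pid my-app)", then in_netns "$pid" ss -tn to list its TCP sockets using the host's ss, then proc_summary "$pid" to find the cgroup to inspect next.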

Glossary

  • crictl - CLI for CRI-compatible container runtimes (replacement for docker on K8s nodes)
  • nsenter - enter one or more Linux namespaces of a running process
  • ephemeral container - temporary debug container that kubectl debug injects into a running pod
  • /proc/$PID/root - direct access to container's filesystem from the host
  • cgroup - Linux resource control group (CPU throttling, memory limits)
  • nr_throttled - cgroup counter in cpu.stat: the number of CPU quota periods in which the cgroup was throttled
  • zombie process - child process that has exited but has not yet been reaped by its parent (ultimately PID 1 in the container)
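
A raw nr_throttled count is easier to judge as a share of nr_periods, which sits next to it in cpu.stat. The helper below is a minimal sketch under that assumption; throttle_ratio is a hypothetical name, and the file argument is whichever cpu.stat belongs to the container's cgroup.

```shell
#!/usr/bin/env bash
# Sketch: percentage of CPU quota periods in which a cgroup was throttled.
# throttle_ratio is a hypothetical helper; pass it the path to a cpu.stat file.

throttle_ratio() {
  awk '
    $1 == "nr_periods"   { periods = $2 }     # quota periods elapsed
    $1 == "nr_throttled" { throttled = $2 }   # periods that hit the quota
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"                        # no quota set or no periods yet
    }
  ' "$1"
}
```

For a cpu.stat containing nr_periods 200 and nr_throttled 50, throttle_ratio prints 25.0, meaning the container ran out of quota in a quarter of its periods.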

Core questions (easy -> hard)

  • How do you debug a distroless container with no shell?
  • kubectl debug -it <pod> --image=busybox --target=<container>: the ephemeral container joins the target container's PID namespace, so you can see its processes.
  • How do you run tcpdump in a container that doesn't have it?
  • From the node: nsenter -t $PID -n -- tcpdump -i eth0, where $PID is the container's PID on the host. This runs the host's tcpdump inside the container's network namespace.
  • How do you check if a container is being CPU-throttled?
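  • Read cpu.stat in the container's cgroup: /sys/fs/cgroup/cpu/<cgroup>/cpu.stat on cgroup v1, /sys/fs/cgroup/<cgroup>/cpu.stat on cgroup v2. Compare nr_throttled (throttled periods) with nr_periods, and check throttled_time (nanoseconds, v1) or throttled_usec (microseconds, v2).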
  • Pod is OOMKilled. How do you see actual memory usage from the node?
  • cat /proc/$PID/smaps_rollup for a memory breakdown, /proc/$PID/oom_score for kill priority, and the cgroup's memory.usage_in_bytes (cgroup v1) or memory.current (cgroup v2) for the usage that counts against the limit.
  • Application is hanging. How do you find what it's waiting on?
  • strace -p $PID -f to see which syscall it's stuck in. futex() = lock wait, epoll_wait() = waiting for I/O events, connect() = TCP connection hang.
  • How do you check open file descriptors for a leak?
  • ls /proc/$PID/fd | wc -l for count, ls -la /proc/$PID/fd to see what they point to, grep "Max open files" /proc/$PID/limits for the limit.
  • Container has zombie processes. Why and how to fix?
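  • PID 1 in the container isn't reaping children, typically because the application runs as PID 1 and never waits on orphaned processes. Fix: use tini/dumb-init as entrypoint, or set shareProcessNamespace: true so the pod's pause container becomes PID 1 and reaps them.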
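
The file descriptor and zombie checks above can be wrapped into two small helpers. This is a sketch under stated assumptions: fd_usage, count_zombies, and PROC_ROOT are hypothetical names, the soft limit is taken from the fourth field of the "Max open files" line, and count_zombies scans every process visible under PROC_ROOT rather than a single container.

```shell
#!/usr/bin/env bash
# Sketch: fd-leak and zombie checks read straight from /proc.
# fd_usage, count_zombies, and PROC_ROOT are hypothetical helpers.

PROC_ROOT="${PROC_ROOT:-/proc}"   # overridable so the parsing can be tried on a copy

# Print "<open fds>/<soft limit>" for one PID.
fd_usage() {
  local dir="$PROC_ROOT/$1" open limit
  open="$(ls -1 "$dir/fd" | wc -l | tr -d ' ')"
  # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
  limit="$(awk '/^Max open files/ { print $4 }' "$dir/limits")"
  echo "$open/$limit"
}

# Count processes under PROC_ROOT whose state is Z (zombie).
count_zombies() {
  local n=0 f
  for f in "$PROC_ROOT"/[0-9]*/status; do
    [ -r "$f" ] || continue                   # process may have exited already
    if grep -q '^State:[[:space:]]*Z' "$f" 2>/dev/null; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Running fd_usage "$PID" a few times, some seconds apart, shows whether the count climbs toward the limit. A count_zombies result that keeps growing points at a PID 1 that is not reaping.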

Wiki Navigation

Prerequisites