
Portal | Level: L2: Operations | Topics: Container Runtimes | Domain: Kubernetes

Container Runtime Debugging Drills

Under the hood: On Kubernetes nodes, crictl is the standard CLI for CRI-compatible runtimes (containerd, CRI-O). It replaces the docker CLI on nodes that run one of those runtimes instead of Docker. Key equivalents: docker ps = crictl ps, docker logs = crictl logs, docker inspect = crictl inspect. The runtime socket is typically at /run/containerd/containerd.sock (containerd) or /run/crio/crio.sock (CRI-O); if crictl cannot find it, pass the socket with --runtime-endpoint.

Remember: To debug a container from the host: find the container PID with crictl inspect, then use nsenter -t <PID> -n to enter the container's network namespace. This lets you run ss, ip, and tcpdump inside the container's network context without installing tools in the container image.

Drill 1: Find Container PID

Difficulty: Easy

Q: A container named web-app is running on the node. Using crictl, how do you find the PID of the main process inside the container?

Answer
# Find the container ID
CONTAINER_ID=$(crictl ps --name web-app -q)

# Get the PID
crictl inspect $CONTAINER_ID | jq '.info.pid'
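
`crictl ps --name` filters by pattern, so the lookup above can return no ID or several. A minimal sketch of a guard, assuming a hypothetical helper named `require_single_id` (not part of crictl) that reads the `-q` output on stdin:

```shell
# Hypothetical helper: print the single ID found on stdin,
# or fail with a message when there are zero or several.
require_single_id() {
    awk 'NF { n++; id = $1 }
         END {
             if (n != 1) {
                 print "expected exactly 1 container, found " (n + 0) > "/dev/stderr"
                 exit 1
             }
             print id
         }'
}

# Usage on a node:
#   CONTAINER_ID=$(crictl ps --name web-app -q | require_single_id) || exit 1
#   crictl inspect "$CONTAINER_ID" | jq '.info.pid'
```

Failing early avoids passing an empty or multi-line value to `crictl inspect`.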

Drill 2: Ephemeral Debug Container

Difficulty: Easy

Q: You need to debug a running pod api-server but it uses a distroless image with no shell. How do you attach a debug container with networking tools?

Answer
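kubectl debug -it api-server --image=nicolaka/netshoot --target=api-server
The `--target` flag takes the name of a container in the pod (assumed here to be called api-server, like the pod) and places the debug container in that container's process namespace, so you can see its processes. The pod's network namespace is shared with every container in the pod regardless. The debug container has curl, dig, tcpdump, etc.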

Drill 3: Network Namespace Debugging

Difficulty: Medium

Q: You're on a node and need to run tcpdump inside a container's network namespace, but the container doesn't have tcpdump installed. How?

Answer
# Get the container PID
PID=$(crictl inspect <container-id> | jq '.info.pid')

# Use nsenter to enter just the network namespace, using the HOST's tcpdump
nsenter -t $PID -n -- tcpdump -i eth0 -nn -c 50 port 80
The `-n` flag enters only the network namespace. tcpdump comes from the host but sees the container's network stack.

Drill 4: Check Open File Descriptors

Difficulty: Medium

Q: A container is suspected of leaking file descriptors. How do you check how many FDs it has open from the host?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

# Count open file descriptors
ls /proc/$PID/fd | wc -l

# See what they are
ls -la /proc/$PID/fd

# Check the process limits for max FDs
grep "Max open files" /proc/$PID/limits
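
The open-FD count only means something next to the limit. A minimal sketch that prints both together, assuming a hypothetical helper named `fd_usage`; the `PROC_ROOT` override is also an assumption, added only so the function can be exercised against a fixture directory:

```shell
# Hypothetical helper: print "<open-fds> <soft-limit>" for a host PID.
# Reading another user's /proc/<pid>/fd requires root.
fd_usage() {
    pid=$1
    proc=${PROC_ROOT:-/proc}   # override is for testing only
    open=$(ls "$proc/$pid/fd" | wc -l | tr -d ' ')
    # Column 4 of the "Max open files" row is the soft limit
    soft=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
    echo "$open $soft"
}

# Usage on a node:
#   fd_usage "$PID"
```

Running it a few times a minute apart shows whether the count keeps climbing toward the soft limit.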

Drill 5: Debug OOMKilled Container

Difficulty: Medium

Q: A pod keeps getting OOMKilled. From the node, how do you check the actual memory usage of the container and its OOM score?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

# Current memory usage summary
cat /proc/$PID/smaps_rollup

# OOM score (higher = more likely to be killed, max 1000)
cat /proc/$PID/oom_score

# OOM score adjustment
cat /proc/$PID/oom_score_adj

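# Find the container's cgroup path
cat /proc/$PID/cgroup
# Then read the memory limit and current usage (cgroup v1):
cat /sys/fs/cgroup/memory/<cgroup-path>/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/<cgroup-path>/memory.usage_in_bytes
# cgroup v2 equivalents (memory.max reads "max" when no limit is set):
cat /sys/fs/cgroup/<cgroup-path>/memory.max
cat /sys/fs/cgroup/<cgroup-path>/memory.current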
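
Which pair of files exists depends on the node's cgroup version. A minimal sketch, assuming a hypothetical helper named `cgroup_mem`, that reads whichever pair is present in a given cgroup directory (`memory.current`/`memory.max` on cgroup v2, `memory.usage_in_bytes`/`memory.limit_in_bytes` on v1):

```shell
# Hypothetical helper: print "usage=<bytes> limit=<bytes|max>" for a cgroup directory.
cgroup_mem() {
    dir=$1
    if [ -f "$dir/memory.max" ]; then
        # cgroup v2 file names
        echo "usage=$(cat "$dir/memory.current") limit=$(cat "$dir/memory.max")"
    elif [ -f "$dir/memory.limit_in_bytes" ]; then
        # cgroup v1 file names
        echo "usage=$(cat "$dir/memory.usage_in_bytes") limit=$(cat "$dir/memory.limit_in_bytes")"
    else
        echo "no memory controller files in $dir" >&2
        return 1
    fi
}

# Usage on a node (the directory is the one derived from /proc/$PID/cgroup):
#   cgroup_mem "/sys/fs/cgroup/<cgroup-path>"           # cgroup v2
#   cgroup_mem "/sys/fs/cgroup/memory/<cgroup-path>"    # cgroup v1
```

Usage sitting at the limit shortly before each restart is consistent with the cgroup OOM killer rather than node-wide memory pressure.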

Drill 6: Trace System Calls

Difficulty: Medium

Q: An application is hanging. You suspect it's blocked on a syscall. How do you use strace to find out what it's doing?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

# See what the process is doing right now
strace -p $PID -f -tt

# Just see a summary of time spent in each syscall
strace -p $PID -c -f

# Trace only network-related syscalls
strace -p $PID -e trace=network

# Trace file operations (openat is what most current programs use to open files)
strace -p $PID -e trace=open,openat,read,write,close
Common findings:
- Stuck on `futex()` → thread waiting for a lock
- Stuck on `epoll_wait()` → waiting for I/O events (often normal)
- Stuck on `connect()` → TCP connection not completing
- Repeated `read()` returning 0 → connection closed by remote

Drill 7: Check DNS from Container Namespace

Difficulty: Easy

Q: A container can't resolve service names. How do you check its DNS configuration and test resolution without exec'ing into it?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

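# Check resolv.conf (read through /proc so it works even if the image has no cat)
cat /proc/$PID/root/etc/resolv.conf

# Test DNS resolution using host's tools in container's network.
# nsenter -n does not change the mount namespace, so nslookup would otherwise read
# the HOST's /etc/resolv.conf; pass the container's nameserver explicitly.
nsenter -t $PID -n -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10

# Check if DNS port is reachable over TCP (replace 10.96.0.10 with the nameserver from resolv.conf)
nsenter -t $PID -n -- nc -zv 10.96.0.10 53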
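
The nameserver address does not have to be typed by hand. A minimal sketch, assuming a hypothetical helper named `first_nameserver`, that pulls the first nameserver out of the container's resolv.conf so host tools query the resolver the container actually uses:

```shell
# Hypothetical helper: print the first nameserver address
# from resolv.conf content supplied on stdin.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }'
}

# Usage on a node (needs root; $PID is the container's host PID):
#   NS=$(first_nameserver < "/proc/$PID/root/etc/resolv.conf")
#   nsenter -t "$PID" -n -- nslookup kubernetes.default.svc.cluster.local "$NS"
#   nsenter -t "$PID" -n -- nc -zv "$NS" 53
```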

Drill 8: Container Filesystem Investigation

Difficulty: Medium

Q: You need to check if a config file inside a running container has the correct contents, but the container has no shell. How?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

# Enter the mount namespace to see the container's filesystem.
# The command is executed from the container's filesystem, so these three
# only work if the image ships cat, ls, and stat.
nsenter -t $PID -m -- cat /app/config.yaml

# List files in the container
nsenter -t $PID -m -- ls -la /app/

# Check file permissions
nsenter -t $PID -m -- stat /app/config.yaml

# Alternative that works for images with no tools at all: access via /proc
cat /proc/$PID/root/app/config.yaml
The `/proc/$PID/root/` path gives you direct access to the container's root filesystem from the host, using the host's own tools.
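
Checking for "the correct contents" can be reduced to a byte-for-byte comparison against a known-good copy. A minimal sketch, assuming a hypothetical helper named `container_file_matches` and a reference file kept on the host; the `PROC_ROOT` override is an assumption added only so the function can be run against a fixture:

```shell
# Hypothetical helper: succeed if a file inside the container is identical
# to a reference copy on the host.
# Usage: container_file_matches <host-pid> <path-in-container> <reference-file>
container_file_matches() {
    proc=${PROC_ROOT:-/proc}   # override is for testing only
    cmp -s "$proc/$1/root$2" "$3"
}

# Usage on a node (expected-config.yaml is a hypothetical reference file):
#   if container_file_matches "$PID" /app/config.yaml ./expected-config.yaml; then
#       echo "config matches"
#   else
#       diff "/proc/$PID/root/app/config.yaml" ./expected-config.yaml
#   fi
```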

Drill 9: Debug Node

Difficulty: Easy

Q: You need to check the kubelet logs and disk usage on a node but only have kubectl access (no SSH). How?

Answer
# Debug the node — gives you a pod with host filesystem at /host
kubectl debug node/worker-1 -it --image=ubuntu

# Inside the debug pod:
chroot /host bash

# Now you have full host access
journalctl -u kubelet -n 50 --no-pager
df -h
crictl ps
systemctl status containerd

Drill 10: Identify Resource Throttling

Difficulty: Hard

Q: A container is slow but not OOMKilled. You suspect CPU throttling. How do you confirm from the node?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

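# Find the cgroup path. Run the line that matches the node's cgroup version.
# cgroup v1: select the cpu controller line (a plain "grep cpu" also matches cpuset)
CGROUP=$(awk -F: '$2 ~ /(^|,)cpu(,|$)/ {print $3}' /proc/$PID/cgroup)
# cgroup v2: there is a single "0::<path>" line
CGROUP=$(awk -F: '$1 == "0" {print $3}' /proc/$PID/cgroup)

# Check CPU throttling (cgroup v1)
cat /sys/fs/cgroup/cpu${CGROUP}/cpu.stat
# Look for:
#   nr_throttled: number of periods in which the container was throttled
#   throttled_time: total time throttled (nanoseconds)

# cgroup v2
cat /sys/fs/cgroup${CGROUP}/cpu.stat
# Look for:
#   nr_throttled
#   throttled_usec: total time throttled (microseconds)

# Check the CPU quota (cgroup v1)
cat /sys/fs/cgroup/cpu${CGROUP}/cpu.cfs_quota_us    # -1 = no limit
cat /sys/fs/cgroup/cpu${CGROUP}/cpu.cfs_period_us   # Usually 100000 (100ms)
# quota/period = CPU cores allowed

# cgroup v2: one file containing "<quota> <period>"; quota is "max" when there is no limit
cat /sys/fs/cgroup${CGROUP}/cpu.max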
If `nr_throttled` is high and increasing, the container needs a higher CPU limit or is genuinely overloaded.
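
A raw `nr_throttled` count is easier to judge as a share of all scheduling periods. A minimal sketch, assuming two hypothetical helpers named `throttle_pct` and `cpu_cores`; both only do arithmetic on values read from the files above:

```shell
# Hypothetical helper: percentage of CFS periods in which the cgroup was throttled.
# Reads cpu.stat content on stdin; nr_periods and nr_throttled appear in
# both the cgroup v1 and v2 versions of the file.
throttle_pct() {
    awk '$1 == "nr_periods"   { periods = $2 }
         $1 == "nr_throttled" { throttled = $2 }
         END {
             if (periods > 0) printf "%.1f\n", 100 * throttled / periods
             else print "0.0"
         }'
}

# Hypothetical helper: CPU cores allowed by a numeric quota/period pair (microseconds).
cpu_cores() {
    awk -v q="$1" -v p="$2" 'BEGIN { printf "%.2f\n", q / p }'
}

# Usage on a node:
#   throttle_pct < "/sys/fs/cgroup${CGROUP}/cpu.stat"
#   cpu_cores 50000 100000
```

Because the counters are cumulative since the cgroup was created, sampling twice and comparing the two readings shows whether throttling is happening now.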

Drill 11: crictl vs docker vs kubectl

Difficulty: Easy

Q: What's the difference between crictl, docker, and kubectl exec? When would you use each?

Answer

| Tool | Level | When to Use |
|------|-------|-------------|
| `kubectl exec` | Kubernetes API | First choice. Works from anywhere with kubeconfig. |
| `crictl` | Container runtime (CRI) | When kubectl doesn't work or you need runtime-level info. Requires node access. |
| `docker` | Docker daemon | Only if Docker is the node's runtime (dockershim was removed in Kubernetes 1.24). |

Key differences:
- `kubectl exec` goes through API server → kubelet → CRI → container
- `crictl exec` goes directly to the CRI (containerd/CRI-O) on the node
- `crictl` can show info kubectl can't: container PIDs, low-level state, image layers

Use `crictl` when:
- API server is down
- kubelet is having issues
- You need container-level details not exposed by kubectl

Drill 12: Investigate Zombie Processes

Difficulty: Hard

Q: A container's memory keeps growing and you suspect zombie processes. How do you check for zombies and what causes them?

Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')

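# Check for zombie processes in the container's PID namespace.
# ps reads /proc, so enter the mount namespace too (-m); with -p alone ps still
# reads the host's /proc and lists host processes. With -m, ps must exist in the image.
nsenter -t $PID -p -m -- ps aux | awk '$8 ~ /^Z/'

# Count zombies
nsenter -t $PID -p -m -- ps aux | awk '$8 ~ /^Z/ {count++} END {print count+0}'

# Check what the zombies' parent is (PPID column)
nsenter -t $PID -p -m -- ps -ef | grep defunct

# Without any tools in the image: list zombie children of the container's main process from the host
ps -o pid,ppid,stat,comm --ppid $PID | awk '$3 ~ /^Z/'
Zombies in containers are caused by a parent process, often the application running as PID 1, not reaping its child processes. A zombie holds only a process-table entry, so a pile-up exhausts PIDs rather than memory. Fixes:
1. Use `tini` or `dumb-init` as the entrypoint (acts as init process)
2. Set `shareProcessNamespace: true` in the pod spec (the pause container becomes PID 1 and reaps orphaned processes)
3. Fix the application to properly wait() for child processes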
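
To see which process is failing to reap, it helps to group zombies by parent. A minimal sketch, assuming a hypothetical helper named `zombies_by_parent` that expects `ps -eo pid,ppid,stat,comm` output on stdin:

```shell
# Hypothetical helper: print "<ppid> <zombie-count>" lines, sorted by parent PID.
# Column 3 is STAT; any state starting with Z is a zombie.
zombies_by_parent() {
    awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}

# Usage on a node (PIDs are host PIDs):
#   ps -eo pid,ppid,stat,comm | zombies_by_parent
```

A parent that matches the container's main PID points at the entrypoint itself, which is the case an init wrapper such as `tini` addresses.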
