Container Runtime Debugging Drills
Under the hood: On Kubernetes nodes, `crictl` is the standard CLI for CRI-compatible runtimes (containerd, CRI-O). It replaces `docker` commands on nodes that use containerd. Key equivalents: `docker ps` = `crictl ps`, `docker logs` = `crictl logs`, `docker inspect` = `crictl inspect`. The runtime socket is typically at `/run/containerd/containerd.sock` or `/run/crio/crio.sock`.

Remember: To debug a container from the host, find the container PID with `crictl inspect`, then use `nsenter -t <PID> -n` to enter the container's network namespace. This lets you run `ss`, `ip`, and `tcpdump` inside the container's network context without installing tools in the container image.
Drill 1: Find Container PID
Difficulty: Easy
Q: A container named web-app is running on the node. Using crictl, how do you find the PID of the main process inside the container?
Answer
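One way to do this, sketched under the assumption that the node runs containerd (where `crictl inspect` reports the host PID under `.info.pid`, as the later drills also assume) and that `jq` is installed on the node. The helper name `container_pid` is hypothetical, not part of `crictl`.

```shell
# container_pid is a hypothetical helper name, not a crictl subcommand.
container_pid() {
    name="$1"
    # --name filters running containers by name; -q prints only the container ID.
    # Take the first match in case the filter returns several containers.
    cid=$(crictl ps --name "$name" -q | head -n 1)
    if [ -z "$cid" ]; then
        echo "no running container matches: $name" >&2
        return 1
    fi
    # On containerd, the host PID of the container's main process is under .info.pid.
    crictl inspect "$cid" | jq '.info.pid'
}

# Usage (on the node, as root):
#   container_pid web-app
```

The printed number is the PID as seen from the host, which is what `nsenter -t` and `/proc/<PID>` expect.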
Drill 2: Ephemeral Debug Container
Difficulty: Easy
Q: You need to debug a running pod api-server but it uses a distroless image with no shell. How do you attach a debug container with networking tools?
Answer
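A minimal sketch using `kubectl debug` to attach an ephemeral container. The wrapper name `debug_pod` is hypothetical, and the default image `nicolaka/netshoot` is an assumption; any image that ships the networking tools you need would work. The target container name must be supplied by you, since it is not given in the question.

```shell
# debug_pod is a hypothetical wrapper around kubectl debug.
# The default image is an assumption: substitute any image with the tools you need.
debug_pod() {
    pod="$1"
    target="$2"
    image="${3:-nicolaka/netshoot}"
    # -it: interactive terminal; --target: container in the pod to attach alongside.
    kubectl debug -it "$pod" --image="$image" --target="$target"
}

# Usage:
#   debug_pod api-server <container-name>
```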
The `--target` flag shares the process namespace with the specified container, so you can see its processes. The debug container has curl, dig, tcpdump, etc.

Drill 3: Network Namespace Debugging
Difficulty: Medium
Q: You're on a node and need to run tcpdump inside a container's network namespace, but the container doesn't have tcpdump installed. How?
Answer
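A sketch of the approach, assuming containerd (PID under `.info.pid`), `jq` and `tcpdump` installed on the node, and root access. The helper name `netns_tcpdump` is hypothetical; the tcpdump arguments in the usage line are only an example.

```shell
# netns_tcpdump is a hypothetical helper name.
netns_tcpdump() {
    cid="$1"
    shift
    # Host PID of the container's main process (containerd layout assumed).
    pid=$(crictl inspect "$cid" | jq '.info.pid')
    # -t selects the target process, -n enters only its network namespace.
    # The remaining arguments are passed through to the host's tcpdump binary.
    nsenter -t "$pid" -n -- tcpdump "$@"
}

# Usage (example filter):
#   netns_tcpdump <container-id> -i any -nn port 53
```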
The `-n` flag enters only the network namespace. tcpdump comes from the host but sees the container's network stack.

Drill 4: Check Open File Descriptors
Difficulty: Medium
Q: A container is suspected of leaking file descriptors. How do you check how many FDs it has open from the host?
Answer
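A minimal sketch that reads the kernel's per-process view under `/proc`, so nothing needs to be installed in the container. The helper names `fd_count` and `fd_limit` are hypothetical.

```shell
# fd_count and fd_limit are hypothetical helper names.
fd_count() {
    # Each entry under /proc/<pid>/fd is one open file descriptor.
    ls "/proc/$1/fd" | wc -l
}

fd_limit() {
    # Soft and hard limits on open files for the process.
    grep 'Max open files' "/proc/$1/limits"
}

# Usage (on the node, as root):
#   PID=$(crictl inspect <container-id> | jq '.info.pid')
#   fd_count "$PID"
#   fd_limit "$PID"
#   ls -l /proc/$PID/fd    # shows what each descriptor points to
```

A count that climbs steadily toward the limit across repeated runs is the usual sign of a leak. Note that this counts only the main process; child processes have their own `/proc/<pid>/fd` directories.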
Drill 5: Debug OOMKilled Container
Difficulty: Medium
Q: A pod keeps getting OOMKilled. From the node, how do you check the actual memory usage of the container and its OOM score?
Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')
# Current memory usage summary
cat /proc/$PID/smaps_rollup
# OOM score (higher = more likely to be killed, max 1000)
cat /proc/$PID/oom_score
# OOM score adjustment
cat /proc/$PID/oom_score_adj
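# Find the container's cgroup path
cat /proc/$PID/cgroup
# cgroup v1: memory limit and current usage
cat /sys/fs/cgroup/memory/<cgroup-path>/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/<cgroup-path>/memory.usage_in_bytes
# cgroup v2: the equivalent files live in the unified hierarchy
cat /sys/fs/cgroup/<cgroup-path>/memory.max
cat /sys/fs/cgroup/<cgroup-path>/memory.current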
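To judge how close a container is to its limit, the usage and limit values can be turned into a percentage. The sketch below is one way to do that; the helper name `cgroup_mem_pct` is hypothetical, and it assumes you pass the container's cgroup directory. It checks for the cgroup v2 file names (`memory.current`, `memory.max`) first and falls back to the cgroup v1 names.

```shell
# cgroup_mem_pct is a hypothetical helper; pass it the container's cgroup directory.
cgroup_mem_pct() {
    dir="$1"
    if [ -f "$dir/memory.current" ]; then
        # cgroup v2 file names
        usage=$(cat "$dir/memory.current")
        limit=$(cat "$dir/memory.max")
    else
        # cgroup v1 file names
        usage=$(cat "$dir/memory.usage_in_bytes")
        limit=$(cat "$dir/memory.limit_in_bytes")
    fi
    # On cgroup v2 the literal string "max" means no limit is set.
    if [ "$limit" = "max" ]; then
        echo "usage=$usage limit=none"
        return 0
    fi
    pct=$(awk -v u="$usage" -v l="$limit" 'BEGIN { printf "%.1f", 100 * u / l }')
    echo "usage=$usage limit=$limit pct=$pct"
}

# Usage:
#   cgroup_mem_pct /sys/fs/cgroup/<cgroup-path>
```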
Drill 6: Trace System Calls
Difficulty: Medium
Q: An application is hanging. You suspect it's blocked on a syscall. How do you use strace to find out what it's doing?
Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')
# See what the process is doing right now
strace -p $PID -f -tt
# Just see a summary of time spent in each syscall
strace -p $PID -c -f
# Trace only network-related syscalls
strace -p $PID -e trace=network
# Trace file operations (modern libc opens files via openat, so include it)
strace -p $PID -e trace=open,openat,read,write,close
Drill 7: Check DNS from Container Namespace
Difficulty: Easy
Q: A container can't resolve service names. How do you check its DNS configuration and test resolution without exec'ing into it?
Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')
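# Check resolv.conf through /proc (uses the host's cat, so it works even if the image has no shell or coreutils)
cat /proc/$PID/root/etc/resolv.conf
# Test DNS resolution using the host's tools in the container's network namespace.
# With -n only, nslookup still reads the host's /etc/resolv.conf, so pass the
# nameserver from the container's resolv.conf explicitly (10.96.0.10 here)
nsenter -t $PID -n -- nslookup kubernetes.default.svc.cluster.local 10.96.0.10
# Check if the DNS port is reachable over TCP
nsenter -t $PID -n -- nc -zv 10.96.0.10 53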
Drill 8: Container Filesystem Investigation
Difficulty: Medium
Q: You need to check if a config file inside a running container has the correct contents, but the container has no shell. How?
Answer
PID=$(crictl inspect <container-id> | jq '.info.pid')
# Enter the mount namespace to see the container's filesystem.
# Binaries are resolved inside that namespace, so cat, ls, and stat must exist in the image
nsenter -t $PID -m -- cat /app/config.yaml
# List files in the container
nsenter -t $PID -m -- ls -la /app/
# Check file permissions
nsenter -t $PID -m -- stat /app/config.yaml
# Alternative for images without those tools: read through /proc with the host's binaries
cat /proc/$PID/root/app/config.yaml
Drill 9: Debug Node
Difficulty: Easy
Q: You need to check the kubelet logs and disk usage on a node but only have kubectl access (no SSH). How?
Answer
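A sketch using `kubectl debug` against the node object, which starts a pod on that node with the node's root filesystem mounted at `/host`. The wrapper name `debug_node` is hypothetical, the default image `busybox` is an assumption, and the follow-up commands assume the kubelet runs as a systemd unit named `kubelet`.

```shell
# debug_node is a hypothetical wrapper; the default image is an assumption.
debug_node() {
    node="$1"
    image="${2:-busybox}"
    # Starts an interactive pod on the node, in the host's namespaces,
    # with the node's root filesystem available under /host.
    kubectl debug "node/$node" -it --image="$image"
}

# Usage:
#   debug_node <node-name>
#
# Then, inside the debug pod (assuming a systemd-managed kubelet):
#   chroot /host journalctl -u kubelet --no-pager | tail -n 100
#   chroot /host df -h
```

The debug pod is not removed automatically when you exit, so delete it with `kubectl delete pod` afterwards.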
Drill 10: Identify Resource Throttling
Difficulty: Hard
Q: A container is slow but not OOMKilled. You suspect CPU throttling. How do you confirm from the node?
Answer
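PID=$(crictl inspect <container-id> | jq '.info.pid')
# cgroup v1: take the path from the cpu controller line only
# (a plain "grep cpu" would also match the cpuset line)
CGROUP=$(grep -E '^[0-9]+:([^:]*,)?cpu(,[^:]*)?:' /proc/$PID/cgroup | cut -d: -f3)
# Check CPU throttling (cgroup v1)
cat /sys/fs/cgroup/cpu${CGROUP}/cpu.stat
# Look for:
#   nr_throttled: number of periods in which the container was throttled
#   throttled_time: total time throttled (nanoseconds)
# Check the CPU quota (cgroup v1)
cat /sys/fs/cgroup/cpu${CGROUP}/cpu.cfs_quota_us # -1 = no limit
cat /sys/fs/cgroup/cpu${CGROUP}/cpu.cfs_period_us # Usually 100000 (100ms)
# quota/period = CPU cores allowed
# cgroup v2: there is a single unified hierarchy, and the path is on the "0::" line
CGROUP=$(grep '^0::' /proc/$PID/cgroup | cut -d: -f3)
cat /sys/fs/cgroup${CGROUP}/cpu.stat
# Look for:
#   nr_throttled
#   throttled_usec (microseconds)
cat /sys/fs/cgroup${CGROUP}/cpu.max # "<quota> <period>" in microseconds; "max" = no limit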
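Raw counters are easier to interpret as ratios. The sketch below computes the share of scheduler periods in which the container was throttled (`nr_throttled / nr_periods`, both present in `cpu.stat` on cgroup v1 and v2) and the CPU allowance in cores (`quota / period`). The helper names `throttle_pct` and `cpu_limit_cores` are hypothetical.

```shell
# throttle_pct and cpu_limit_cores are hypothetical helper names.

# Percentage of enforcement periods in which the cgroup was throttled.
# Argument: path to a cpu.stat file.
throttle_pct() {
    awk '$1 == "nr_periods"   { p = $2 }
         $1 == "nr_throttled" { t = $2 }
         END { if (p > 0) printf "%.1f\n", 100 * t / p; else print "0.0" }' "$1"
}

# CPU cores allowed. Arguments: quota and period, both in microseconds.
cpu_limit_cores() {
    awk -v q="$1" -v p="$2" 'BEGIN { printf "%.2f\n", q / p }'
}

# Usage (cgroup v2 path shown):
#   throttle_pct /sys/fs/cgroup${CGROUP}/cpu.stat
#   cpu_limit_cores 50000 100000    # quota 50ms per 100ms period = half a core
```

The counters are cumulative since the cgroup was created, so sampling `cpu.stat` twice and comparing the difference shows whether throttling is happening now rather than at some point in the past.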
Drill 11: crictl vs docker vs kubectl
Difficulty: Easy
Q: What's the difference between crictl, docker, and kubectl exec? When would you use each?
Answer
| Tool | Level | When to Use |
|------|-------|-------------|
| `kubectl exec` | Kubernetes API | First choice. Works from anywhere with kubeconfig. |
| `crictl` | Container runtime (CRI) | When kubectl doesn't work or you need runtime-level info. Requires node access. |
| `docker` | Docker daemon | Only if Docker is the runtime (dockershim was removed in K8s 1.24). |

Key differences:
- `kubectl exec` goes through API server → kubelet → CRI → container
- `crictl exec` goes directly to the CRI (containerd/CRI-O) on the node
- `crictl` can show info kubectl can't: container PIDs, low-level state, image layers

Use `crictl` when:
- API server is down
- kubelet is having issues
- You need container-level details not exposed by kubectl

Drill 12: Investigate Zombie Processes
Difficulty: Hard
Q: A container's memory keeps growing and you suspect zombie processes. How do you check for zombies and what causes them?
Answer
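PID=$(crictl inspect <container-id> | jq '.info.pid')
# Check for zombie processes in the container's PID namespace.
# ps reads /proc, so enter the mount namespace as well (-m); with -p alone it
# would still list the host's processes. This requires ps in the container image
nsenter -t $PID -p -m -- ps aux | awk '$8 ~ /^Z/'
# Count zombies (prints 0 when there are none)
nsenter -t $PID -p -m -- ps aux | awk '$8 ~ /^Z/ {count++} END {print count+0}'
# Check what the zombies' parent is (PPID column)
nsenter -t $PID -p -m -- ps -ef | grep defunct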
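On the cause: a zombie is a child process that has exited but whose parent has not yet called `wait()` to collect its exit status. In containers this commonly happens when the application runs as PID 1 and spawns children without reaping them. When the image has no `ps`, zombies can also be found from the host by reading `/proc/<pid>/status` directly. The sketch below does that; the helper name `list_zombies` is hypothetical, and it scans every process on the node, so match the parent PID column against the container's PID to narrow the result.

```shell
# list_zombies is a hypothetical helper name.
# Prints "PID PPID NAME" for every process in state Z.
# Optional argument: proc root to scan (defaults to /proc).
list_zombies() {
    proc="${1:-/proc}"
    for status in "$proc"/[0-9]*/status; do
        # Skip processes that exited between the glob expansion and the read.
        [ -r "$status" ] || continue
        awk '/^Name:/  { n = $2 }
             /^State:/ { s = $2 }
             /^Pid:/   { p = $2 }
             /^PPid:/  { pp = $2 }
             END { if (s == "Z") print p, pp, n }' "$status" 2>/dev/null
    done
}

# Usage (on the node):
#   list_zombies
```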