
Containers Deep Dive — Trivia & Interesting Facts

Surprising, historical, and little-known facts about container internals and runtime debugging.


nsenter was the original container debugging superpower

Before kubectl debug or docker exec, operators used nsenter to enter a container's Linux namespaces from the host. Shipped in util-linux since v2.23 (2013), it lets you step into any combination of a process's namespaces (mount, network, PID, etc.) independently. You could enter a container's network namespace while keeping the host's filesystem, giving you full debugging tools against the container's network stack.
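A minimal sketch of the technique, assuming a Docker host and a container named myapp (a placeholder):

```shell
# Find the host PID of the container's init process.
PID=$(docker inspect --format '{{.State.Pid}}' myapp)

# Enter only the network namespace: these commands see the container's
# interfaces, routes, and sockets, but keep the host's filesystem,
# so every host-installed tool (ip, ss, tcpdump) is available.
sudo nsenter --target "$PID" --net ip addr
sudo nsenter --target "$PID" --net ss -tlnp
```

Combining flags such as --mount and --pid pulls in more of the container's view; leaving them off is exactly what makes this a debugging superpower.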


kubectl debug with ephemeral containers took four years to stabilize

Ephemeral containers — the ability to inject a debug container into a running pod — was proposed as KEP-277 in 2018 and did not reach stable/GA status until Kubernetes 1.25 in August 2022. The delay was partly philosophical: some maintainers argued that attaching new containers to running pods violated the immutability principle. The compromise was making ephemeral containers unable to have ports, readiness probes, or resource guarantees.
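On a 1.25+ cluster the injection is a single command; the pod, image, and container names below are placeholders:

```shell
# Inject a busybox ephemeral container into the running pod "mypod",
# sharing the process namespace of its "myapp" container:
kubectl debug -it mypod --image=busybox --target=myapp

# The injected container appears in the pod spec, but per the
# compromise described above it can declare no ports or probes:
kubectl get pod mypod -o jsonpath='{.spec.ephemeralContainers[*].name}'
```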


strace inside a container requires capabilities most runtimes strip by default

Running strace in a container typically fails with "Operation not permitted" because CAP_SYS_PTRACE is not among the capabilities Docker grants by default, and older default seccomp profiles additionally blocked the ptrace syscall outright (relaxed for kernels 4.8+ as of Docker 19.03). The fix is --cap-add=SYS_PTRACE in Docker or the equivalent securityContext in Kubernetes. This single capability unlocks strace, gdb, ltrace, and most memory debuggers. Many teams add it only to debug sidecars, never to production containers.
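In Kubernetes the equivalent of --cap-add is a capability on the container's securityContext; a config sketch with illustrative names:

```yaml
containers:
- name: debug-sidecar
  image: ubuntu:22.04
  command: ["sleep", "infinity"]
  securityContext:
    capabilities:
      add: ["SYS_PTRACE"]
```

Setting shareProcessNamespace: true on the pod lets such a sidecar attach strace or gdb to the application container's PIDs directly.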


/proc inside a container is a filtered view of the host

When you read /proc/meminfo or /proc/cpuinfo inside a container, you are seeing the host's values — not the container's limits. This has caused countless misconfigurations where JVMs or applications allocate memory based on the host's 256 GB of RAM instead of the container's 512 MB limit. LXCFS was created to solve this by FUSE-mounting a cgroup-aware /proc, and newer JVM versions (8u191+) read cgroup limits directly.
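The mismatch is easy to demonstrate from inside any memory-limited container (cgroup v2 path shown first; v1 uses memory/memory.limit_in_bytes):

```shell
# /proc reports the host's totals even under a memory limit:
grep MemTotal /proc/meminfo

# The limit the kernel will actually enforce lives in the cgroup tree:
cat /sys/fs/cgroup/memory.max 2>/dev/null \
  || cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
  || echo "no memory limit visible at this cgroup level"
```

Container-aware JVMs (8u191+ with UseContainerSupport) read the second value, not the first.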


Container runtime logs are often in a completely different place than you expect

Container stdout/stderr goes through the runtime's logging driver, which varies by configuration. Docker can send logs to json-file, journald, syslog, fluentd, or other drivers. In Kubernetes, container logs are symlinked from /var/log/containers/ to /var/log/pods/ to the actual log file managed by the container runtime. If a node's disk fills up, it is often these log files (not application data) that are the culprit.
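On a node you can walk the chain yourself (default kubelet paths; the log file name is illustrative):

```shell
# Each entry here is a symlink into /var/log/pods/:
ls -l /var/log/containers/ 2>/dev/null | head

# Resolve one all the way to the runtime-managed file:
readlink -f /var/log/containers/myapp-*_default_*.log 2>/dev/null

# When a node's disk fills, start measuring here:
du -sh /var/log/pods/ /var/log/containers/ 2>/dev/null || true
```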


Core dumps in containers require special configuration to be useful

When a process crashes inside a container, the core dump is written according to the host's /proc/sys/kernel/core_pattern — not the container's. If the pattern writes to a path that does not exist in the container's filesystem, the dump is silently lost. Additionally, core dumps reference the container's libraries at specific paths, so analyzing them on the host with gdb requires mapping the container's filesystem. Kubernetes has no built-in core dump collection.
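Checking the pattern from the host (or from inside a container, since the file is the same) shows where dumps will really go:

```shell
# One global setting, shared by the host and every container:
cat /proc/sys/kernel/core_pattern

# A relative pattern like "core" resolves inside the crashing
# process's own filesystem and working directory; a pipe pattern
# (starting with "|") runs the helper program on the HOST, outside
# the container's mount namespace entirely.
```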


The OOM killer leaves a breadcrumb trail in dmesg, not in container logs

When the kernel's OOM killer terminates a container process, the evidence appears in the host's dmesg / kernel ring buffer, not in the container's stdout/stderr; the container simply restarts or vanishes. The dmesg output shows the process name, its RSS, the cgroup's memory limit, and every process that was considered for killing. kubectl describe pod does surface an OOMKilled last state, but the per-process detail lives only in the kernel log, and many operators miss it because they only check kubectl logs, which shows nothing for OOM-killed containers.
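The trail is a grep away on the node (reading the ring buffer may require root):

```shell
# Kernel log, not container log:
dmesg -T 2>/dev/null | grep -iE "killed process|oom-kill" || true

# On systemd nodes the same records are available via the journal:
# journalctl -k | grep -i oom
```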


runc has a debug mode that logs every operation it performs

Passing --debug to runc (optionally with --log to direct output to a file) produces verbose logs of every operation the runtime performs: namespace creation, cgroup configuration, mount operations, seccomp filter application, and capability drops. This is invaluable for diagnosing "container won't start" failures that surface only as cryptic one-line errors at the Docker/containerd level.


Docker's built-in healthcheck was ignored by Kubernetes on purpose

Docker introduced HEALTHCHECK instructions in Dockerfiles in 2016, but Kubernetes deliberately ignores them. The Kubernetes team decided that health checking belongs at the orchestration layer (via liveness/readiness probes) rather than baked into the image. This means a perfectly healthy container according to Docker may be killed by Kubernetes, or vice versa, if the two health definitions diverge.
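An image's HEALTHCHECK therefore has to be restated as probes to have any effect under Kubernetes; a sketch, with endpoint paths and port as placeholders:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
```

Keeping the probe command identical to the image's HEALTHCHECK is the simplest way to stop the two definitions diverging.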


tcpdump in a sidecar captures traffic invisible to the host

Because containers in a Kubernetes pod share a network namespace, a tcpdump sidecar can capture all traffic for the pod — including loopback traffic between containers — without modifying the application container. This technique catches traffic that host-level packet captures miss, since inter-container communication on localhost never hits the host's network interfaces.
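A sketch of such a sidecar (image and names are illustrative; tcpdump needs the NET_RAW capability, which is worth declaring explicitly under restricted policies):

```yaml
containers:
- name: app
  image: myapp:latest
- name: sniffer
  image: nicolaka/netshoot
  command: ["tcpdump", "-i", "any", "-w", "/pcap/pod.pcap"]
  securityContext:
    capabilities:
      add: ["NET_RAW"]
  volumeMounts:
  - name: pcap
    mountPath: /pcap
volumes:
- name: pcap
  emptyDir: {}
```

The -i any flag is what picks up loopback, so container-to-container localhost traffic lands in the capture.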


crictl replaced docker CLI for debugging on Kubernetes nodes

When Kubernetes deprecated its built-in Docker support (dockershim) in 1.20 (December 2020) and removed it in 1.24, operators on containerd and CRI-O nodes lost docker ps, docker logs, and docker inspect. The replacement, crictl, speaks the CRI (Container Runtime Interface) protocol and works with containerd, CRI-O, or any compliant runtime. However, crictl deliberately lacks build and push commands — it is a pure debugging tool, not a development workflow.
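The muscle-memory translation on a node looks like this (IDs come from the listing commands):

```shell
crictl ps                          # ~ docker ps
crictl pods                        # pod sandboxes; no docker equivalent
crictl logs <container-id>         # ~ docker logs
crictl inspect <container-id>      # ~ docker inspect
crictl exec -it <container-id> sh  # ~ docker exec -it
```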


Slim.ai and Docker Slim can trace which files a container actually uses

Docker Slim (now Slim.ai) runs a container in a monitored sandbox, traces every file access and syscall, then rebuilds the image containing only the files that were actually touched. This approach has produced images 30x smaller than the original while remaining functional. The technique is essentially dynamic analysis applied to container optimization — the opposite of the static analysis that scanners perform.
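The CLI entry point is a single build command (the tool now also ships under the name slim; the image name below is a placeholder):

```shell
# Run the target under instrumentation, exercise it with HTTP probes,
# and emit a minimal image tagged with a .slim suffix:
docker-slim build --http-probe myapp:latest

# Compare sizes before and after:
docker images myapp
docker images myapp.slim
```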