

Containers Deep Dive - Primer

Why This Matters

Containers are not VMs. They are not "lightweight virtual machines." They are processes running on a shared Linux kernel, isolated by kernel primitives that have existed for over a decade. If you only know docker run, you are driving a car without understanding the engine, transmission, or brakes. When something breaks — and it will — you will be helpless.

This primer covers the Linux primitives that make containers work, the runtime stack that orchestrates them, the image format that packages them, and the networking and storage models that connect them. Everything here is foundational. Every debugging session, every performance investigation, every security audit comes back to these building blocks.


Linux Namespaces

Namespaces are the isolation mechanism. Each namespace type restricts what a process can see of a particular system resource. A container is a process (or group of processes) running inside a set of namespaces.

The Seven Namespace Types

| Namespace | Flag | What it isolates | Kernel version |
|---|---|---|---|
| PID | CLONE_NEWPID | Process IDs — container sees its own PID tree, PID 1 is the entrypoint | 2.6.24 (2008) |
| NET | CLONE_NEWNET | Network stack — interfaces, routes, iptables, sockets | 2.6.29 (2009) |
| MNT | CLONE_NEWNS | Mount points — container has its own filesystem view | 2.4.19 (2002, the first namespace!) |
| UTS | CLONE_NEWUTS | Hostname and NIS domain name | 2.6.19 (2006) |
| IPC | CLONE_NEWIPC | System V IPC, POSIX message queues | 2.6.19 (2006) |
| USER | CLONE_NEWUSER | User and group IDs — root inside container maps to unprivileged user outside | 3.8 (2013) |
| CGROUP | CLONE_NEWCGROUP | Cgroup root directory — container sees only its own cgroup hierarchy | 4.6 (2016) |

Inspecting namespaces

# List namespaces for a process
ls -la /proc/<pid>/ns/
# lrwxrwxrwx 1 root root 0 ... cgroup -> 'cgroup:[4026531835]'
# lrwxrwxrwx 1 root root 0 ... ipc -> 'ipc:[4026532456]'
# lrwxrwxrwx 1 root root 0 ... mnt -> 'mnt:[4026532454]'
# lrwxrwxrwx 1 root root 0 ... net -> 'net:[4026532459]'
# lrwxrwxrwx 1 root root 0 ... pid -> 'pid:[4026532457]'
# lrwxrwxrwx 1 root root 0 ... user -> 'user:[4026531837]'
# lrwxrwxrwx 1 root root 0 ... uts -> 'uts:[4026532455]'

# Enter a container's network namespace
nsenter -t <pid> -n ip addr

# Enter all namespaces of a container process
nsenter -t <pid> -m -u -i -n -p -- /bin/sh

# Create a new namespace manually (for learning)
unshare --pid --fork --mount-proc /bin/bash
# You're now in a new PID namespace. `ps aux` shows only your shell.

PID namespace details

PID 1 inside the container is special. It is the init process. If PID 1 exits, the entire container stops. If PID 1 does not handle SIGCHLD, zombie processes accumulate. This is why tini and dumb-init exist — they act as proper init processes that reap children.

# Inside a container
ps aux
# PID 1 is your ENTRYPOINT
# From the host, the same process has a different, higher PID

# Host view
ps aux | grep <container-process>
# PID is something like 28473 on the host
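A common remedy for the zombie-reaping problem is to install a minimal init and make it the entrypoint. A sketch using tini (base image, package layout, and app path are illustrative, not prescriptive):

```dockerfile
FROM python:3.11-slim
# tini runs as PID 1, reaps orphaned children, and forwards signals to the app
RUN apt-get update && \
    apt-get install -y --no-install-recommends tini && \
    rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["python", "app.py"]
```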

NET namespace details

Each container gets its own network stack: interfaces, IP addresses, routing table, iptables rules, and socket listings. The container's eth0 is one end of a veth pair; the other end lives in the host namespace and is plugged into a bridge (usually docker0 or cni0).

# From host: find the veth pair for a container
PID=$(docker inspect --format '{{.State.Pid}}' <container>)
nsenter -t $PID -n ip link show eth0
# Note the interface index (e.g., "if4")
# On host: ip link | grep "if4" shows the veth peer
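The `@ifN` suffix printed by `ip link` names the peer interface's index, which is how you pair a container `eth0` with its host-side veth. A minimal sketch of extracting it (the sample line and index are made up for illustration):

```shell
# Sample `ip link` output line; "@if7" is the ifindex of the veth peer
line="4: eth0@if7: <BROADCAST,MULTICAST,UP> mtu 1500"
peer=$(echo "$line" | sed -n 's/.*@if\([0-9]*\):.*/\1/p')
echo "$peer"   # the peer index to grep for on the host
```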

USER namespace details

User namespaces allow mapping UID 0 inside the container to an unprivileged UID on the host. This is the foundation of rootless containers. Inside the container, the process thinks it is root. On the host, it is UID 100000 (or similar). If the process escapes the container, it has no privileges.

# Check if a container uses user namespaces
cat /proc/<pid>/uid_map
# Format: <inside-uid> <outside-uid> <range>
# "0 100000 65536" means container root maps to host UID 100000
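The mapping arithmetic is linear: host UID = outside-start + (container UID − inside-start), as long as the container UID falls within the range. A quick shell sketch using the example line above:

```shell
# Translate a container-side UID to its host UID from a uid_map line
map="0 100000 65536"        # <inside-uid> <outside-uid> <range>
set -- $map
inside=$1; outside=$2; range=$3
cuid=0                      # container-side UID to translate (root)
echo $(( outside + (cuid - inside) ))   # host UID
```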

Cgroups: Resource Limits

Namespaces isolate visibility. Cgroups (control groups) limit resources. Without cgroups, a container could consume all CPU, memory, or I/O on the host and starve everything else.

Cgroups v1 vs v2

| Aspect | cgroups v1 | cgroups v2 |
|---|---|---|
| Hierarchy | Separate hierarchy per controller (cpu, memory, blkio, pids, etc.) | Single unified hierarchy |
| Mount point | /sys/fs/cgroup/<controller>/ | /sys/fs/cgroup/ (unified) |
| Controller enablement | Per-hierarchy | Per-subtree via cgroup.subtree_control |
| Delegation | Complex, error-prone | Clean delegation to unprivileged processes |
| PSI (pressure stall info) | Not available | Built-in (cpu.pressure, memory.pressure, io.pressure) |
| Memory accounting | Approximate | More accurate, includes kernel memory |
| Default in modern distros | Legacy | Ubuntu 22.04+, Fedora 31+, RHEL 9+ |

Key cgroup controllers

# CPU — limit CPU time
# v2: cpu.max = "<quota> <period>" in microseconds
echo "50000 100000" > /sys/fs/cgroup/<group>/cpu.max  # 50% of one CPU

# Memory — hard limit (OOM kill) and soft limit (reclaim pressure)
echo 536870912 > /sys/fs/cgroup/<group>/memory.max    # 512MB hard limit
echo 268435456 > /sys/fs/cgroup/<group>/memory.high   # 256MB soft limit (reclaim starts)

# PIDs — prevent fork bombs
echo 256 > /sys/fs/cgroup/<group>/pids.max

# I/O — limit block device throughput
# Format: "<major>:<minor> rbps=<bytes> wbps=<bytes>"
echo "8:0 rbps=10485760 wbps=10485760" > /sys/fs/cgroup/<group>/io.max  # 10MB/s
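The numbers in these writes can be sanity-checked with shell arithmetic, which is a useful habit before echoing values into cgroup files:

```shell
# cpu.max "<quota> <period>": quota microseconds of CPU time per period
quota=50000; period=100000
echo "$(( quota * 100 / period ))% of one CPU"

# memory limits are plain byte counts
echo $(( 512 * 1024 * 1024 ))   # the memory.max value above
echo $(( 256 * 1024 * 1024 ))   # the memory.high value above
```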

Checking container cgroup limits

# Docker: inspect resource limits
docker inspect --format '{{.HostConfig.Memory}}' <container>
docker inspect --format '{{.HostConfig.NanoCpus}}' <container>

# From inside the container (cgroups v2)
cat /sys/fs/cgroup/memory.max
cat /sys/fs/cgroup/cpu.max
cat /sys/fs/cgroup/pids.max

# Current usage
cat /sys/fs/cgroup/memory.current
cat /sys/fs/cgroup/cpu.stat

Pressure Stall Information (cgroups v2 only)

PSI tells you whether a cgroup is resource-starved, not just how much it is using.

cat /sys/fs/cgroup/<group>/memory.pressure
# some avg10=5.23 avg60=2.10 avg300=0.87 total=123456
# "some" = at least one task stalled waiting for memory
# "full" = all tasks stalled (severe)

cat /sys/fs/cgroup/<group>/cpu.pressure
cat /sys/fs/cgroup/<group>/io.pressure
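For alerting, you usually want just one field out of a pressure line. A small sketch parsing `avg10` from the sample output shown above (the values are illustrative):

```shell
# Extract avg10 from the "some" line of a PSI file
line="some avg10=5.23 avg60=2.10 avg300=0.87 total=123456"
avg10=$(echo "$line" | tr ' ' '\n' | sed -n 's/^avg10=//p')
echo "$avg10"   # 10-second average stall percentage
```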

Union Filesystems and Image Layers

Container images are not monolithic disk images. They are stacks of read-only layers, composed into a single filesystem view using a union filesystem (usually OverlayFS on modern Linux).

How OverlayFS works

+-------------------+
| Container Layer   |  ← writable (upperdir), changes go here
+-------------------+
| Image Layer N     |  ← read-only (lowerdir)
+-------------------+
| Image Layer N-1   |  ← read-only (lowerdir)
+-------------------+
| ...               |
+-------------------+
| Base Layer        |  ← read-only (lowerdir)
+-------------------+
        |
    [merged view]    ← what the container process sees (mount point)

Debug clue: If a container is using unexpectedly high disk I/O, check if it is modifying large files from image layers. Each modification triggers a full file copy-up to the writable layer. A common culprit: application log files written to a path that exists in the image layer, causing multi-megabyte copy-ups on first write.

Copy-on-write: When a container modifies a file from a lower layer, the file is copied up to the writable layer first. The original in the lower layer is untouched. This is why modifying large files in containers is expensive — the entire file is copied, even for a one-byte change.

Whiteout files: Deleting a file from a lower layer creates a whiteout marker in the upper layer. The file still exists in the image layer, consuming space. This is why cleanup (apt-get clean, rm -rf /var/lib/apt/lists/*) must run in the same RUN instruction as the install that created the files.

# Inspect OverlayFS mount for a running container
docker inspect --format '{{.GraphDriver.Data}}' <container>
# Shows: LowerDir, UpperDir, MergedDir, WorkDir

# See what the container has written (upperdir = container changes)
ls /var/lib/docker/overlay2/<id>/diff/

# Check layer sizes
docker history <image>
# Shows each layer, its command, and its size

Image layer best practices

Each Dockerfile instruction that modifies the filesystem creates a new layer. Layer ordering determines cache efficiency:

# BAD: copying source first invalidates all subsequent layers on any code change
COPY . /app
RUN pip install -r requirements.txt

# GOOD: copy requirements first, install, then copy source
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY . /app
# Now source changes don't re-trigger pip install

Container Runtimes

The "container runtime" is not one thing. It is a stack with clearly defined boundaries.

The runtime stack

High-level runtime (container management)
├── containerd — manages container lifecycle, image pulls, storage, networking setup
└── CRI-O     — lightweight alternative, built specifically for Kubernetes CRI

Low-level runtime (actually creates the container)
├── runc      — reference OCI runtime, creates namespaces + cgroups + executes process
├── crun      — C implementation, faster startup, lower memory
├── gVisor    — sandboxed runtime, intercepts syscalls (security-focused)
└── Kata      — micro-VM runtime, each container is a lightweight VM

containerd architecture

# containerd is the most common high-level runtime
# Docker uses containerd internally since Docker 1.11

# Interact with containerd directly
ctr containers list
ctr images list
ctr tasks list

# On Kubernetes nodes, use crictl instead
crictl ps
crictl pods
crictl images

runc: what actually happens

When you docker run, here is what runc does at the lowest level:

  1. Creates a new set of Linux namespaces (PID, NET, MNT, UTS, IPC, optionally USER, CGROUP)
  2. Sets up cgroups with resource limits
  3. Pivots the root filesystem to the container's rootfs (prepared from image layers)
  4. Drops capabilities, applies seccomp filters, sets up AppArmor/SELinux profiles
  5. Executes the container's entrypoint process as PID 1 in the new namespace set

# You can run runc manually (for learning)
mkdir -p mycontainer/rootfs
docker export $(docker create busybox) | tar -C mycontainer/rootfs -xf -
cd mycontainer
runc spec  # generates config.json (OCI runtime spec)
runc run mycontainer

OCI Specification

Who made it: The OCI was founded in 2015 by Docker, CoreOS, Google, Microsoft, and others under the Linux Foundation. Docker donated its container image format and runtime (runc) as the starting point. The OCI exists because the industry needed container portability guarantees — an image built with Docker must run on any OCI-compliant runtime.

The Open Container Initiative (OCI) defines two specs that make containers portable across runtimes.

Image Spec

An OCI image consists of:

| Component | Purpose |
|---|---|
| Manifest | JSON listing all layers (digests) and the config |
| Config | JSON with environment, entrypoint, exposed ports, labels, architecture |
| Layers | Compressed tar archives (gzip or zstd), each representing a filesystem diff |
| Index (optional) | Multi-architecture manifest list |

# Inspect an image manifest
docker manifest inspect nginx:1.25
# Shows: mediaType, config digest, layer digests, platform info

# Inspect image config
docker inspect nginx:1.25
# Shows: Env, Cmd, Entrypoint, ExposedPorts, Labels, Architecture

# Inspect layers
docker save nginx:1.25 | tar -tf -
# Shows: manifest.json, config JSON, layer tar.gz files

Runtime Spec

The OCI runtime spec (config.json) defines:

- Root filesystem path
- Mounts
- Process (args, env, cwd, capabilities, rlimits)
- Linux-specific: namespaces, cgroups path, seccomp profile, sysctl, AppArmor profile
- Hooks: prestart, createRuntime, createContainer, startContainer, poststart, poststop
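A heavily trimmed sketch of what a config.json can look like (field names follow the OCI runtime spec; the values and cgroups path are placeholders — `runc spec` generates the complete file):

```json
{
  "ociVersion": "1.0.2",
  "process": {
    "args": ["/bin/sh"],
    "env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"],
    "cwd": "/"
  },
  "root": { "path": "rootfs", "readonly": false },
  "linux": {
    "namespaces": [
      { "type": "pid" }, { "type": "network" }, { "type": "mount" },
      { "type": "uts" }, { "type": "ipc" }
    ],
    "cgroupsPath": "/mycontainer"
  }
}
```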

Digests vs tags

Tags are mutable pointers. nginx:1.25 can point to different images on different days. Digests are immutable content hashes.

# Pull by tag (mutable — NOT reproducible)
docker pull nginx:1.25

# Pull by digest (immutable — reproducible)
docker pull nginx@sha256:6db391d1c0cfb...

# Get the digest of an image
docker inspect --format '{{.RepoDigests}}' nginx:1.25
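In CI scripts it is worth failing fast on mutable references. A tiny sketch of telling the two forms apart (the digest value is a placeholder):

```shell
# Classify an image reference as digest-pinned or mutable tag
ref="nginx@sha256:abc123"   # placeholder digest for illustration
case "$ref" in
  *@sha256:*) kind="digest" ;;   # immutable, reproducible
  *)          kind="tag"    ;;   # mutable, may change between pulls
esac
echo "$kind"
```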

Docker Architecture

Docker is a user-facing tool. Under the hood, it delegates to containerd and runc.

docker CLI (client)
    |
    | REST API (usually /var/run/docker.sock)
    |
dockerd (Docker daemon)
    |
    | gRPC
    |
containerd
    |
    | OCI runtime spec
    |
containerd-shim → runc → [container process]

containerd-shim: A per-container process that parents the container process. It allows containerd to restart without killing running containers and reports exit status back.

# See the process tree
pstree -p $(pgrep dockerd)
# dockerd → containerd → containerd-shim → <your process>

# Docker socket location
ls -la /var/run/docker.sock

# Docker storage driver
docker info | grep "Storage Driver"
# Usually: overlay2

Dockerfile Best Practices

Layer ordering for cache efficiency

# Least-frequently-changing content goes first
FROM python:3.11-slim

# System deps change rarely
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# App deps change occasionally
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# App code changes frequently — last
COPY . .

Multi-stage builds

# Build stage — has compilers, dev tools
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app/server .

# Runtime stage — minimal image
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

Result: final image has no compiler, no source code, no package manager. Attack surface is minimal.

COPY vs ADD

| Feature | COPY | ADD |
|---|---|---|
| Copy files from build context | Yes | Yes |
| Copy from URL | No | Yes (but don't — use curl in RUN) |
| Auto-extract tar archives | No | Yes |
| Predictable behavior | Yes | Surprising |

Rule: Always use COPY unless you specifically need tar auto-extraction.

RUN consolidation

# BAD: three layers, apt cache persists in layer 1
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# GOOD: one layer, clean up in the same layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

ENTRYPOINT vs CMD

# ENTRYPOINT = the executable (hard to override, use exec form)
# CMD = default arguments (easy to override)

ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8000"]

# docker run myapp                → python app.py --port 8000
# docker run myapp --port 9000    → python app.py --port 9000
# docker run --entrypoint sh myapp → sh (overrides entrypoint)

Shell form vs exec form:

# Exec form (preferred) — process is PID 1, receives signals directly
ENTRYPOINT ["python", "app.py"]

# Shell form — wraps in /bin/sh -c, PID 1 is sh, signals don't reach your process
ENTRYPOINT python app.py

Non-root USER

RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /sbin/nologin appuser
USER appuser
WORKDIR /app

.dockerignore

# .dockerignore
.git
.github
__pycache__
*.pyc
.env
.env.*
node_modules
.vscode
*.md
!README.md
Dockerfile
docker-compose*.yml

Without .dockerignore, COPY . . sends the entire build context (including .git, node_modules, secrets) to the daemon. This slows builds and risks leaking secrets.


Container Networking

Network drivers

| Driver | Use case | How it works |
|---|---|---|
| bridge (default) | Single-host, container-to-container | Virtual bridge (docker0), veth pairs, NAT for external access |
| host | Performance-critical, no isolation | Container shares host network namespace directly |
| none | Security-sensitive, custom networking | No networking at all |
| overlay | Multi-host (Swarm, some K8s setups) | VXLAN encapsulation between hosts |
| macvlan | Direct L2 access needed | Container gets its own MAC address on the physical network |

Bridge networking internals

# Default bridge
docker network inspect bridge
# Shows: subnet (usually 172.17.0.0/16), gateway, connected containers

# How traffic flows:
# Container eth0 → veth pair → docker0 bridge → iptables NAT → host eth0 → internet

# See the bridge
brctl show docker0   # or: ip link show docker0
# See veth pairs
ip link show type veth

# See NAT rules
iptables -t nat -L -n | grep MASQUERADE
iptables -t nat -L -n | grep DNAT  # port mappings

DNS resolution in user-defined networks

# Default bridge: no container-name DNS; containers must use --link (deprecated) or raw IPs
# User-defined bridge: built-in DNS server at 127.0.0.11 resolves container names

docker network create mynet
docker run -d --name db --network mynet postgres:16
docker run -it --network mynet busybox nslookup db
# Resolves to the container's IP on mynet

Port mapping

# Map host port 8080 to container port 80
docker run -p 8080:80 nginx

# Map to specific host interface
docker run -p 127.0.0.1:8080:80 nginx

# Random host port
docker run -p 80 nginx
docker port <container>  # shows the assigned host port

# How it works: iptables DNAT rule
iptables -t nat -L DOCKER -n
# DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80

Container Storage

Three storage options

| Type | Managed by Docker | Persists after container removal | Use case |
|---|---|---|---|
| Volumes | Yes (/var/lib/docker/volumes/) | Yes | Database data, shared config |
| Bind mounts | No (any host path) | Yes (it's the host filesystem) | Development, host config files |
| tmpfs | No (memory-backed) | No | Secrets, temp data, scratch space |

# Named volume
docker volume create pgdata
docker run -v pgdata:/var/lib/postgresql/data postgres:16

# Bind mount
docker run -v /host/path:/container/path:ro nginx
# :ro = read-only inside container

# tmpfs
docker run --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp

# Inspect a volume
docker volume inspect pgdata
# Shows: Mountpoint, Driver, Labels

Volume gotchas

# Anonymous volumes are created when Dockerfile has VOLUME directive
# They persist but are easy to lose track of
docker volume ls -f dangling=true  # orphaned volumes

# Volume mount vs image data: if the volume is empty, Docker copies
# image data into it (first run only). If the volume has data, it
# shadows the image data completely.

Security

Linux capabilities

Containers do not run with full root capabilities. Docker drops dangerous capabilities by default and keeps a minimal set.

# Default capabilities (Docker)
# CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID,
# SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE

# Drop all, add only what you need
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp

# Check capabilities of a running container
docker inspect --format '{{.HostConfig.CapAdd}}' <container>
docker inspect --format '{{.HostConfig.CapDrop}}' <container>

# Inside the container
cat /proc/1/status | grep Cap
# Decode: capsh --decode=<hex-value>
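The Cap* values are hex bitmasks, one bit per capability number. The decode that capsh does can be sketched with plain shell arithmetic (the mask below is the commonly cited Docker default effective set; treat it as illustrative):

```shell
# Check whether one capability bit is set in a CapEff-style mask
capeff=0x00000000a80425fb   # commonly cited Docker default capability mask
bit=10                      # CAP_NET_BIND_SERVICE is capability number 10
echo $(( (capeff >> bit) & 1 ))   # 1 means the capability is present
```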

Seccomp profiles

Seccomp filters restrict which system calls a container can make. Docker's default profile blocks ~44 syscalls (including reboot, kexec_load, mount, umount).

# Run with default seccomp profile (automatic)
docker run myapp

# Run with custom seccomp profile
docker run --security-opt seccomp=profile.json myapp

# Run with no seccomp (dangerous, for debugging only)
docker run --security-opt seccomp=unconfined myapp

# Check if seccomp is active
docker inspect --format '{{.HostConfig.SecurityOpt}}' <container>

AppArmor and SELinux

# AppArmor (Debian/Ubuntu)
docker run --security-opt apparmor=docker-default myapp
# Check: cat /proc/<pid>/attr/current

# SELinux (RHEL/CentOS/Fedora)
docker run --security-opt label=type:container_t myapp
# Check: ps -eZ | grep <container-process>

Rootless containers

Run the entire Docker daemon as an unprivileged user. No root anywhere in the stack.

# Install rootless Docker
dockerd-rootless-setuptool.sh install

# Verify
docker info | grep "rootless"
# "rootless" should appear in Security Options

# Limitations: no binding to ports < 1024, no AppArmor,
# limited storage drivers (overlay2 requires kernel 5.11+)

Read-only root filesystem

docker run --read-only --tmpfs /tmp --tmpfs /var/run myapp
# Container cannot write to any path except /tmp and /var/run
# Prevents malware from modifying binaries or writing to disk

BuildKit and Build Cache

BuildKit is the modern Docker build engine (default since Docker 23.0). It is faster, supports concurrent layer building, and has better caching.

Key BuildKit features

# Enable BuildKit (if not default)
DOCKER_BUILDKIT=1 docker build .

# Build with cache mount (don't re-download pip packages)
# syntax=docker/dockerfile:1
FROM python:3.11-slim
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Build with secret mount (don't leak secrets in layers)
RUN --mount=type=secret,id=mytoken \
    TOKEN=$(cat /run/secrets/mytoken) && \
    curl -H "Authorization: Bearer $TOKEN" https://api.example.com/install.sh | sh

# Build command with secrets
docker build --secret id=mytoken,src=./token.txt .

# Cache export/import (for CI)
docker build --cache-to type=local,dest=/tmp/cache .
docker build --cache-from type=local,src=/tmp/cache .

Build cache invalidation

Cache is invalidated when:

- The instruction changes (different RUN command)
- A COPY/ADD source file changes (content hash, not timestamp)
- Any parent layer is invalidated (cascade)
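The content-hash rule is easy to verify yourself: touching a file changes its mtime but not its hash, so the COPY cache key is unchanged. A minimal sketch (filename and contents are arbitrary):

```shell
# COPY cache keys come from content hashes, not timestamps
f=$(mktemp)
printf 'flask==3.0\n' > "$f"
h1=$(sha256sum "$f" | cut -d' ' -f1)
touch "$f"                           # mtime changes, content does not
h2=$(sha256sum "$f" | cut -d' ' -f1)
[ "$h1" = "$h2" ] && echo "cache hit"
rm -f "$f"
```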

# See build cache usage
docker builder prune --all --dry-run

# Clear build cache
docker builder prune --all

Podman: Docker Alternative

Podman is a daemonless, rootless container engine. It is CLI-compatible with Docker but architecturally different.

Key differences from Docker

| Aspect | Docker | Podman |
|---|---|---|
| Daemon | dockerd (always running) | No daemon (fork/exec model) |
| Root requirement | Requires root by default | Rootless by default |
| Socket | /var/run/docker.sock | No socket needed (but can emulate) |
| Systemd integration | Requires extra config | Native (generates systemd units) |
| Pod concept | No native pods | First-class pods (like K8s) |
| OCI compliance | Yes | Yes |
| Compose | docker-compose / docker compose | podman-compose or podman compose |

# Drop-in replacement for Docker CLI
alias docker=podman

# Run a container
podman run -d --name web -p 8080:80 nginx:1.25

# Create a pod (Kubernetes-style)
podman pod create --name mypod -p 8080:80
podman run -d --pod mypod --name web nginx:1.25
podman run -d --pod mypod --name app myapp:latest
# Both containers share the same network namespace (like K8s)

# Generate a systemd unit file
podman generate systemd --new --name web > ~/.config/systemd/user/container-web.service
systemctl --user enable --now container-web

# Generate a Kubernetes YAML
podman generate kube mypod > pod.yaml

Podman rootless internals

Podman uses user namespaces to map the container's UID 0 to your unprivileged UID. It uses slirp4netns (or pasta) for network namespace connectivity without root.

# Check subuid/subgid mappings
cat /etc/subuid
# youruser:100000:65536

# Verify rootless mode
podman info | grep rootless

Pulling It All Together

When you run docker run -d -p 8080:80 --memory=512m --name web nginx:1.25, here is what happens:

  1. Docker CLI parses flags, sends request to dockerd via REST API
  2. dockerd resolves the image, checks local cache, pulls if needed
  3. containerd prepares the rootfs (assembles overlay layers), creates an OCI runtime spec
  4. containerd-shim is spawned; it invokes runc
  5. runc creates namespaces (PID, NET, MNT, UTS, IPC), sets up cgroups (512MB memory limit), pivots root to the overlay mount, drops capabilities, applies seccomp
  6. runc executes nginx as PID 1 inside the new namespace set
  7. Docker networking creates a veth pair, plugs one end into docker0 bridge, assigns an IP, adds iptables DNAT rule for port 8080 → container:80
  8. containerd-shim remains as the parent process, reporting status back to containerd

Every debugging technique in the street ops and footguns files traces back to one of these layers. Know which layer you are operating at, and you will know which tools to reach for.


Runtime Debugging

When kubectl logs isn't enough, you need to go deeper. Container runtime debugging is the ability to inspect containers at the OS level -- examining namespaces, filesystems, network stacks, and processes.

crictl -- Container Runtime CLI

crictl talks directly to the container runtime (containerd/CRI-O), bypassing Kubernetes. Essential when kubelet is down.

crictl ps                          # list all containers
crictl pods                        # list pods
crictl inspect <container-id>      # inspect a container
crictl logs <container-id>         # get logs (bypasses kubelet)
crictl exec -it <container-id> sh  # execute in container
crictl stats                       # container stats

nsenter -- Enter Container Namespaces

# Find the container's PID
PID=$(crictl inspect <container-id> | jq '.info.pid')

# Enter network namespace (most common)
nsenter -t $PID -n -- ip addr
nsenter -t $PID -n -- ss -tlnp
nsenter -t $PID -n -- curl localhost:8000/health

# Enter mount namespace to inspect files
nsenter -t $PID -m -- ls -la /app/
nsenter -t $PID -m -- cat /etc/resolv.conf

| Flag | Namespace | Isolates |
|---|---|---|
| -m | Mount | Filesystem view |
| -u | UTS | Hostname |
| -n | Network | Network stack |
| -p | PID | Process tree |

kubectl debug -- Ephemeral Containers

# Debug with network tools
kubectl debug -it pod/myapp -n ns --image=nicolaka/netshoot -- bash

# Share process namespace (see app's processes)
kubectl debug -it pod/myapp -n ns --image=busybox:1.36 --target=myapp -- sh

# Debug a node
kubectl debug node/worker-1 -it --image=busybox:1.36

strace -- System Call Tracing

When a process crashes or hangs with no useful logs:

strace -p $PID -f -e trace=network     # network calls
strace -p $PID -f -e trace=open,read   # file operations

| Pattern | Meaning |
|---|---|
| connect() = -1 ECONNREFUSED | Target service down |
| connect() = -1 ETIMEDOUT | Network/firewall issue |
| open("...") = -1 ENOENT | Missing file/mount |
| futex(FUTEX_WAIT) hanging | Deadlock |

Common Runtime Debugging Pitfalls

  1. Distroless images -- No shell. Use kubectl debug with a debug image.
  2. Read-only filesystems -- Can't write temp files. Use ephemeral containers.
  3. Dropped capabilities -- Container may lack NET_ADMIN, SYS_PTRACE.
  4. PID namespace isolation -- Use --target flag with kubectl debug.
