
Containers Deep Dive - Footguns & Pitfalls

Mistakes that silently break builds, leak secrets, waste disk, or cause outages. Every one of these has been made in production by experienced engineers.


1. Running as root in containers

Your Dockerfile has no USER directive. The process runs as UID 0 inside the container. If an attacker exploits your application, they have root inside the container. Combined with a kernel vulnerability or a misconfigured mount, they escape to the host as root.

# BAD: default is root
FROM python:3.11-slim
COPY . /app
CMD ["python", "app.py"]

# GOOD: create and switch to unprivileged user
FROM python:3.11-slim
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /sbin/nologin appuser
COPY --chown=appuser:appuser . /app
USER appuser
CMD ["python", "app.py"]

Why it bites: Many base images default to root. Your app works fine as root during development. You never notice until a security scan flags it or an incident forces you to care.
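Two quick checks catch this early; a sketch, assuming a built image tagged myapp (a hypothetical name), a running Docker daemon, and an image that includes the `id` binary:

```shell
# What USER is baked into the image? Empty output means root.
docker inspect --format '{{.Config.User}}' myapp

# What UID does the process actually get at runtime?
docker run --rm myapp id -u
# 0 means root
```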


2. Using the latest tag

You deploy myapp:latest to production. It works. Three weeks later, someone pushes a new build to latest. Your next pod restart pulls a completely different image. Your production environment is now running untested code.

# BAD
docker pull nginx:latest
# What version did you get? Who knows.

# GOOD
docker pull nginx:1.25.4
# Better: pin the digest
docker pull nginx@sha256:6db391d1c0cfb30588ba0bf72ea999404f2764e...

Why it bites: latest is not "the latest version." It is whatever someone last tagged as latest. It could be a dev build. It could be months old. It is never reproducible.
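When you need to know what latest actually resolved to, the pulled image records it; a sketch, assuming the image has already been pulled from a registry:

```shell
# Print the digest the local image was pulled as
docker inspect --format '{{index .RepoDigests 0}}' nginx:latest
# nginx@sha256:...
```

Record that digest and deploy it explicitly instead of the tag.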


3. Not pinning base image digests

You pin the tag: FROM python:3.11-slim. But tags are mutable. The Python team pushes a security patch. python:3.11-slim now points to a different image. Your build produces a different result on Tuesday than it did on Monday, with no change to your code.

# Mutable (tag can change underneath you)
FROM python:3.11-slim

# Immutable (digest is a content hash)
FROM python:3.11-slim@sha256:abc123def456...

Why it bites: This is the most insidious reproducibility problem. Everything looks the same — same Dockerfile, same code, same CI pipeline — but the output is different. Debugging "it worked yesterday" when the base image silently changed is maddening.
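To pin a digest you first have to look it up; a sketch using buildx, which can inspect a tag in the registry without pulling the image:

```shell
# Resolve the digest a tag currently points to
docker buildx imagetools inspect python:3.11-slim
# Name:    docker.io/library/python:3.11-slim
# Digest:  sha256:...

# Then pin it in the Dockerfile:
# FROM python:3.11-slim@sha256:<digest from above>
```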


4. COPY . invalidating the build cache

Your Dockerfile copies all source code before installing dependencies. Any change to any file — a README edit, a comment, a whitespace change — invalidates the COPY . layer and everything after it. Your pip install runs from scratch every time.

# BAD: any file change busts the pip install cache
COPY . /app
RUN pip install -r /app/requirements.txt

# GOOD: copy only requirements first
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r /app/requirements.txt
COPY . /app

Why it bites: Builds that should take 30 seconds take 5 minutes. Multiply by every developer, every push, every CI run. The cumulative cost is enormous, and the fix is two lines.
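BuildKit can go one step further: even when requirements.txt does change, a cache mount lets pip reuse its download cache across builds. A sketch, assuming BuildKit is enabled (note that --no-cache-dir must be dropped, or there is no cache to reuse):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim
COPY requirements.txt /app/
# Persist pip's cache directory between builds without baking it into a layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r /app/requirements.txt
COPY . /app
```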


5. Secrets in build args (visible in history)

You pass a private registry token as a build arg. It works. But docker history shows every ARG value in plain text. Anyone who pulls your image can extract the token.

# BAD: secret visible in docker history
ARG NPM_TOKEN
RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc && \
    npm install && \
    rm .npmrc
# The ARG value is baked into the layer metadata. Deleting .npmrc doesn't help.

# GOOD: use BuildKit secret mounts
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm install
# Secret is mounted at build time, never written to a layer

# Build:
docker build --secret id=npmrc,src=.npmrc -t myapp .

Why it bites: The rm gives a false sense of security. Even when the file is deleted in the same RUN, the ARG value is still recorded in the image's layer metadata and shows up in docker history --no-trunc. A file removed only in a later RUN is worse: it still exists intact in the earlier layer and can be extracted with docker save by unpacking the layer tars.
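Auditing an existing image for this leak is cheap; a sketch, assuming an image tagged myapp and a secret whose name contains "token":

```shell
# Layer metadata records every ARG-expanded command line
docker history --no-trunc myapp | grep -i token
# Any hit means the value is recoverable by anyone who can pull the image.
```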


6. Not cleaning up in RUN layers

You install packages with apt-get install in one RUN instruction and run apt-get clean in the next. The clean-up only adds whiteout entries to the new layer; the package lists and cache still exist, byte for byte, in the earlier layer. Your image ships megabytes of apt cache that appear deleted but are still stored and downloaded on every pull.

# BAD: cache exists in layer 1, "deletion" in layer 2 creates a whiteout but doesn't save space
RUN apt-get update && apt-get install -y curl
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# GOOD: install and clean in the same RUN
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Why it bites: Image is 200MB bigger than it needs to be. CI and deploys are slower. Every pull, on every node, pays the cost.
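Layer sizes tell you whether an existing image has this problem; a sketch, assuming an image tagged myapp:

```shell
# Size and creating command for each layer, newest first
docker history --format '{{.Size}}\t{{.CreatedBy}}' myapp
# A large apt-get install layer followed by a ~0B clean-up layer is the
# signature of this mistake: the "deleted" cache is still in the image.
```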


7. Missing .dockerignore (huge build context)

You run docker build . in your project root. Docker sends the entire directory to the daemon as the build context. This includes .git (hundreds of MB), node_modules (hundreds of MB), .env files (secrets), test data, documentation, everything.

# Check context size. The classic builder prints it outright;
# BuildKit reports it as "transferring context" in its progress output.
docker build . 2>&1 | grep -E "Sending build context|transferring context"
# "Sending build context to Docker daemon  847.3MB"

# Fix: create .dockerignore
cat > .dockerignore << 'EOF'
.git
.github
node_modules
__pycache__
*.pyc
.env
.env.*
*.md
!README.md
.vscode
.idea
tests/
docs/
EOF

Why it bites: Build takes 30 seconds just to send the context. Worse, if .env is in the context and you COPY . ., your secrets are in the image.
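To see exactly what lands in the context, one trick is to build a throwaway image that copies the whole context and then list it. A sketch, assuming a Docker daemon; context-dump is a hypothetical tag:

```shell
# Build an image containing nothing but the build context
printf 'FROM busybox\nCOPY . /ctx\n' | docker build -f- -t context-dump .

# Then inspect what got sent
docker run --rm context-dump du -sh /ctx
docker run --rm context-dump find /ctx -maxdepth 2
```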


8. ENTRYPOINT vs CMD confusion

You set CMD ["python", "app.py"] in your Dockerfile. Someone runs docker run myapp --debug. Instead of passing --debug to your app, Docker replaces the entire CMD with --debug and the container crashes with "executable file not found."

# Only CMD: entire command is replaced by docker run args
CMD ["python", "app.py"]
# docker run myapp --debug → tries to run "--debug" as a command

# ENTRYPOINT + CMD: args replace CMD only, ENTRYPOINT stays
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8000"]
# docker run myapp --debug → python app.py --debug

The shell form trap:

# Exec form (correct) — process is PID 1, receives SIGTERM
ENTRYPOINT ["python", "app.py"]

# Shell form (broken) — /bin/sh -c is PID 1, your process doesn't get signals
ENTRYPOINT python app.py
# SIGTERM goes to sh, your app never gets it, container hangs until SIGKILL (10s timeout)

Why it bites: Your graceful shutdown doesn't work. Your health checks fail during rolling updates. Your container takes 10 seconds to stop instead of 1 because Docker has to escalate to SIGKILL.


9. Volume mount shadowing container data

Your image has application data at /app/data/. You mount a volume at /app/data/. The volume is empty. The container starts with an empty /app/data/ — the image's data is completely hidden.

# Image has /app/data/config.yaml baked in
# But at runtime:
docker run -v newvolume:/app/data myapp
# /app/data/ is now empty. config.yaml is gone.

# Docker volumes: if the volume is empty AND it's a named volume,
# Docker copies image data into it on first use.
# But: bind mounts NEVER copy. And subsequent runs with a non-empty volume
# always shadow.

Why it bites: Works in development (no volumes). Fails in production (volumes mounted for persistence). The error message is "file not found" for a file you can clearly see in the Dockerfile.
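Both behaviors are quick to reproduce; a sketch, assuming a Docker daemon and an image myapp with files baked into /app/data (demo-vol and /tmp/empty are arbitrary names):

```shell
# Named volume, first use: Docker copies the image's /app/data into it
docker run --rm -v demo-vol:/app/data myapp ls /app/data

# Bind mount of an empty host directory: the image's files are shadowed
mkdir -p /tmp/empty
docker run --rm -v /tmp/empty:/app/data myapp ls /app/data
```
</grinsert_placeholder>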


10. Container running but process exited (zombies and PID 1)

Your container shows as "running" but the actual application process exited. PID 1 is a shell script that launched your app in the background and is now doing nothing. The container is alive but useless. Health checks might even pass if they check the wrong thing.

# BAD: shell script backgrounding the real process
ENTRYPOINT ["/bin/sh", "-c", "myapp & tail -f /dev/null"]
# myapp crashes. tail keeps PID 1 alive. Container stays "running."

# BAD: shell form (implicit /bin/sh -c wrapper)
ENTRYPOINT myapp
# sh is PID 1. myapp is a child. sh doesn't forward signals.

# GOOD: exec form, direct process
ENTRYPOINT ["myapp"]

# GOOD: if you need a wrapper script, exec into the final process
#!/bin/sh
# setup code here
exec myapp "$@"
# exec replaces sh with myapp. myapp is now PID 1.

Why it bites: Monitoring shows the container is healthy. But no requests are being served. You spend 30 minutes debugging the network before realizing the process isn't running.
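You can watch exec do its job outside a container; a minimal sketch using a temporary wrapper script:

```shell
# Demonstrate that `exec` replaces the shell in place, so the final
# process keeps the wrapper's PID (which would be PID 1 in a container).
cat > /tmp/entrypoint.sh <<'EOF'
#!/bin/sh
echo "wrapper pid: $$"
exec sh -c 'echo "app pid: $$"'
EOF

sh /tmp/entrypoint.sh
# Both lines print the same PID: the app took over the wrapper's process,
# so signals sent to PID 1 now reach the app directly.
```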


11. --restart=always without log limits

You set --restart=always on a container that crashes on startup. It restarts. Crashes. Restarts. Each crash dumps error logs. Docker's default logging driver has no size limit. After a weekend, the node's disk is full with gigabytes of repeated error messages in /var/lib/docker/containers/<id>/<id>-json.log.

# BAD
docker run --restart=always mycrashingapp

# GOOD: set log limits
docker run --restart=always \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  myapp

# Or set daemon-wide defaults in /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Why it bites: Disk fills silently. Other containers on the same host start failing. You get paged at 3am for a disk space issue caused by a crash loop that started Friday evening.
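Two checks make the problem visible before the disk fills; a sketch, assuming the json-file log driver and root access on the node (mycontainer is a hypothetical name):

```shell
# Which container logs are eating the disk?
sudo du -sh /var/lib/docker/containers/*/*-json.log | sort -h | tail -5

# Does a given container actually have log limits applied?
docker inspect --format '{{.HostConfig.LogConfig}}' mycontainer
```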


12. Bridge network DNS resolution gotchas

You create two containers on the default bridge network. You try to reach one by name. It doesn't resolve. You spend 20 minutes debugging DNS before discovering that the default bridge network does not support DNS resolution between containers.

# Default bridge: NO DNS resolution
docker run --name web nginx
docker run --rm busybox nslookup web
# nslookup: can't resolve 'web'

# User-defined bridge: DNS works
docker network create mynet
docker run --name web --network mynet nginx
docker run --rm --network mynet busybox nslookup web
# Name: web  Address: 172.18.0.2

Why it bites: Every Docker tutorial uses the default bridge. It works with --link (deprecated) or IP addresses. When you try to do it "properly" with names, it silently fails.


13. No tmpfs on /tmp (disk fills from temp files)

Your application writes temporary files to /tmp. Inside the container, /tmp is part of the writable overlay layer, which lives on the host's disk at /var/lib/docker/overlay2/. A burst of temp file creation fills the overlay storage. Other containers on the same host are affected.

# BAD: /tmp writes go to overlay storage on host disk
docker run myapp

# GOOD: /tmp is memory-backed, auto-cleaned, size-limited
docker run --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp

Why it bites: You sized your host disk for image storage, not for temp file bursts. The overlay2 directory fills up, and suddenly Docker cannot start any new containers.
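Verifying the mount takes one command; a sketch, assuming a Docker daemon:

```shell
# df inside the container should show a tmpfs on /tmp with the size cap
docker run --rm --tmpfs /tmp:rw,noexec,nosuid,size=100m busybox df -h /tmp
# A tmpfs filesystem capped at the 100m limit should appear for /tmp.
```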


14. Ignoring security scan warnings

Trivy reports 47 HIGH and 3 CRITICAL vulnerabilities. You look at the list, see they are all in base image packages, and decide "we'll update the base image next sprint." That was six months ago. One of the CRITICALs is a remotely exploitable RCE in OpenSSL.

# The minimum viable security posture
trivy image --severity CRITICAL --exit-code 1 myapp:latest
# Block deploys with critical CVEs

# Track unfixed vulns separately
trivy image --ignore-unfixed --severity HIGH,CRITICAL myapp:latest
# Only show fixable issues — these are your action items

# Suppress known acceptable risks
echo "CVE-2024-9999" >> .trivyignore
trivy image myapp:latest
# But document WHY each suppression exists

Why it bites: The scan is there to protect you. Ignoring it because "most of the CVEs are in the base image" means you are trusting that no one will exploit those CVEs in your specific deployment. That is a bet, not a strategy.


Runtime Debugging Footguns

15. nsenter into the wrong PID namespace

You run nsenter -t <pid> -n but grab the wrong PID -- the pause container, a sidecar, or a different container entirely. You debug the wrong thing and draw the wrong conclusions.

Fix: Verify the PID first: crictl inspect <container-id> | jq '.info.pid'. Cross-reference with crictl ps.
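Put together, the safe flow looks like this; a sketch, assuming crictl and jq on the node and a container named myapp (a hypothetical name):

```shell
# Resolve the container ID, then its PID, then sanity-check before entering
CID=$(crictl ps --name myapp -q | head -1)
PID=$(crictl inspect "$CID" | jq -r '.info.pid')
cat "/proc/$PID/comm"          # should print your app's binary, not "pause"
nsenter -t "$PID" -n ss -tlnp  # now you are in the right network namespace
```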


16. strace on a production process causing latency

strace intercepts every syscall via ptrace, stopping the traced process on both syscall entry and exit. The overhead is dramatic: a request that took 10ms can take 500ms. You've turned a debugging session into an incident.

Fix: Use strace -c for statistical summary. For production, prefer eBPF-based tools (bpftrace, perf) with much lower overhead.
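A sketch of both options, assuming root on the node and the target PID in $PID; the 5-second windows and SIGTERM-on-timeout behavior are approximations:

```shell
# Attach for a bounded window, summary only; still ptrace-based,
# so expect overhead while attached
timeout 5 strace -c -f -p "$PID"

# perf's syscall tracing avoids ptrace and is far cheaper
timeout 5 perf trace -p "$PID"
```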


17. crictl rm on a running container

You meant to remove a stopped container but removed a running one. If it backed a bare pod or a StatefulSet pod with critical state, you just killed it without warning.

Fix: Use crictl stop first, then crictl rm. Better yet, use kubectl delete pod which respects graceful shutdown.


18. /proc misinterpretation inside containers

cat /proc/meminfo inside a container shows the HOST's total memory, not the container's limit. cat /proc/cpuinfo shows all host CPUs. You set JVM heap based on this and OOMKill the container.

Fix: Read the cgroup limits instead: cat /sys/fs/cgroup/memory.max on cgroup v2 hosts, or cat /sys/fs/cgroup/memory/memory.limit_in_bytes on v1. Use JVM -XX:+UseContainerSupport (the default since JDK 10) so the heap is sized from the container limit, not the host.
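A defensive way to read the real limit from inside a container, handling both cgroup layouts; a sketch (the optional root argument exists only so the function can be pointed at a test directory):

```shell
# Print the container's memory limit under either cgroup hierarchy.
# cgroup v2 exposes memory.max ("max" means unlimited);
# cgroup v1 exposes memory/memory.limit_in_bytes.
mem_limit() {
  root=${1:-/sys/fs/cgroup}
  if [ -f "$root/memory.max" ]; then                      # cgroup v2
    cat "$root/memory.max"
  elif [ -f "$root/memory/memory.limit_in_bytes" ]; then  # cgroup v1
    cat "$root/memory/memory.limit_in_bytes"
  else
    echo "no cgroup memory controller found" >&2
    return 1
  fi
}
# Usage inside a container: mem_limit
```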


19. Ignoring dmesg during container debugging

A container is OOMKilled but kubectl describe pod just says "OOMKilled" with no details. The actual kernel OOM killer message -- which process was killed, how much memory it used, what triggered it -- is in dmesg on the host node.

Fix: Always check dmesg -T | grep -i oom on the node when debugging OOMKilled containers.