Docker / Containers - Street-Level Ops

Real-world container workflows for building, debugging, and operating in production.

Build and Ship

# Build with a specific tag and no cache
docker build -t myapp:v2.1.0 --no-cache .

# Multi-platform build (ARM + AMD64)
docker buildx build --platform linux/amd64,linux/arm64 -t registry.example.com/myapp:v2.1.0 --push .

# Tag for a private registry and push
docker tag myapp:v2.1.0 registry.example.com/team/myapp:v2.1.0
docker push registry.example.com/team/myapp:v2.1.0

# Pin by digest for production reproducibility
docker pull registry.example.com/team/myapp@sha256:abc123...
docker inspect --format='{{index .RepoDigests 0}}' myapp:v2.1.0

Gotcha: --no-cache rebuilds every layer from scratch, which re-downloads all packages. On slow networks this turns a 30-second build into 10+ minutes. Use --no-cache-filter <stage> (BuildKit) to bust the cache for just one stage instead of the whole Dockerfile.

Debug a Running Container

# Get a shell
docker exec -it myapp /bin/sh

# Check what process is running (is PID 1 correct?)
docker exec myapp ps aux
# Or from the host:
docker top myapp

# Read container logs with timestamps
docker logs --tail 100 --follow --timestamps myapp

# Check exit code and OOM status of a stopped container
docker inspect myapp --format '{{.State.ExitCode}}'
docker inspect myapp --format '{{.State.OOMKilled}}'
# Output: 137 / true

# Remember: exit code = 128 + signal number
# 137 = 128 + 9 (SIGKILL, OOM), 143 = 128 + 15 (SIGTERM, graceful)
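The arithmetic above is easy to wrap in a small helper for scripting (a hypothetical function, not a Docker command):

```shell
# decode_exit CODE: explain a container exit code.
# Codes above 128 mean the process was killed by signal (CODE - 128).
decode_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case $sig in
      9)  echo "$code = 128+9 (SIGKILL, likely OOM or docker kill)" ;;
      15) echo "$code = 128+15 (SIGTERM, graceful stop)" ;;
      *)  echo "$code = 128+$sig (killed by signal $sig)" ;;
    esac
  else
    echo "$code = application exit status"
  fi
}

decode_exit 137   # → 137 = 128+9 (SIGKILL, likely OOM or docker kill)
```

Feed it the inspect output, e.g. `decode_exit "$(docker inspect myapp --format '{{.State.ExitCode}}')"`.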

# Copy files out for analysis
docker cp myapp:/var/log/app.log ./app.log

# Check the container's network config
docker exec myapp cat /etc/resolv.conf
docker inspect myapp --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'

Debug Networking

# Enter a container's network namespace from the host (requires root)
PID=$(docker inspect --format '{{.State.Pid}}' myapp)
nsenter -t $PID -n ip addr show
nsenter -t $PID -n ss -tlnp

# Check port mappings
docker port myapp
# 8080/tcp -> 0.0.0.0:8080

# Test connectivity between containers on a custom network
docker network create backend
docker run -d --name db --network backend postgres:16
docker run --rm --network backend nicolaka/netshoot curl -v db:5432
# A "Connected to db" line in the -v output confirms DNS + TCP reachability;
# the failed HTTP exchange after it is expected, since Postgres does not speak HTTP
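When the dependency is not up yet, a retry loop is handy. A minimal sketch (hypothetical helper; assumes the `backend` network from above and netshoot's bundled `nc`):

```shell
# wait_for_port HOST PORT TRIES: poll until HOST:PORT accepts a TCP
# connection from a throwaway container on the backend network.
wait_for_port() {
  host=$1; port=$2; tries=$3
  i=1
  while [ "$i" -le "$tries" ]; do
    if docker run --rm --network backend nicolaka/netshoot nc -z "$host" "$port"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Typical use: `wait_for_port db 5432 30` before launching the app container.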

Resource Inspection

# Live resource usage (CPU, memory, I/O)
docker stats myapp --no-stream
# CONTAINER  CPU %  MEM USAGE / LIMIT    MEM %   NET I/O          BLOCK I/O
# myapp      2.34%  342.1MiB / 512MiB    66.82%  12.3kB / 8.1kB   0B / 4.1MB
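For scripting alerts off `docker stats`, the `MEM USAGE / LIMIT` column can be parsed directly. A sketch (hypothetical helper; assumes both values use the same unit, as in the output above):

```shell
# mem_pct "USED / LIMIT": print memory usage as an integer percent.
# Assumes used and limit share the same unit (e.g. both MiB).
mem_pct() {
  echo "$1" | awk '{
    used = $1; limit = $3
    gsub(/[A-Za-z]+/, "", used); gsub(/[A-Za-z]+/, "", limit)
    printf "%d\n", (used / limit) * 100 + 0.5   # round to nearest integer
  }'
}

mem_pct "342.1MiB / 512MiB"   # → 67
```

Pair it with `docker stats --no-stream --format '{{.MemUsage}}' myapp` in a cron job to alert above a threshold.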

# Check configured resource limits
docker inspect myapp --format '{{.HostConfig.Memory}}'
docker inspect myapp --format '{{.HostConfig.NanoCpus}}'

# Check cgroup limits directly
# cgroup v1:
cat /sys/fs/cgroup/memory/docker/$(docker inspect -f '{{.Id}}' myapp)/memory.limit_in_bytes
# cgroup v2 (systemd cgroup driver, most modern distros):
cat /sys/fs/cgroup/system.slice/docker-$(docker inspect -f '{{.Id}}' myapp).scope/memory.max
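Which cgroup path applies depends on the host's cgroup version; v2 exposes a `cgroup.controllers` file at the hierarchy root. A quick detector (a sketch based on that marker file):

```shell
# cgroup_version: print "v1" or "v2" depending on the unified-hierarchy marker
cgroup_version() {
  if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "v2"
  else
    echo "v1"
  fi
}

cgroup_version
```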

Disk Cleanup

# See what is eating disk
docker system df
# TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
# Images          45      8       12.34GB   9.87GB (79%)
# Containers      12      3       234.5MB   200.1MB (85%)
# Build Cache     24      0       2.1GB     2.1GB

# Clean up stopped containers, unused images, build cache
docker system prune -a --volumes
# WARNING: --volumes deletes named volumes too — data loss risk

# Remove dangling images only (safe, conservative)
docker image prune

# List images created months ago (review before removing)
docker images --format '{{.Repository}}:{{.Tag}} {{.CreatedSince}}' | grep "months"

# Remove unused images older than 30 days (720h)
docker image prune -a --filter "until=720h"

Volume Operations

# Create a named volume
docker volume create pgdata

# Run postgres with persistent data
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:16

# Find volume on host
docker volume inspect pgdata --format '{{.Mountpoint}}'
# /var/lib/docker/volumes/pgdata/_data

# Backup a volume
docker run --rm -v pgdata:/source -v "$(pwd)":/backup alpine \
  tar czf /backup/pgdata-backup.tar.gz -C /source .

# Restore a volume
docker run --rm -v pgdata-restored:/target -v "$(pwd)":/backup alpine \
  tar xzf /backup/pgdata-backup.tar.gz -C /target
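The backup one-liner is worth wrapping so filenames carry a date. A hypothetical wrapper around the same alpine+tar pattern:

```shell
# backup_volume VOLUME: archive a named volume to ./VOLUME-YYYYMMDD.tar.gz
backup_volume() {
  vol=$1
  out="${vol}-$(date +%Y%m%d).tar.gz"
  docker run --rm -v "$vol":/source -v "$(pwd)":/backup alpine \
    tar czf "/backup/$out" -C /source .
  echo "$out"
}
```

`backup_volume pgdata` writes something like `pgdata-20240115.tar.gz` to the current directory and prints the filename.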

Security Hardening at Runtime

# Run as non-root with read-only filesystem
docker run -d --user 1000:1000 --read-only --tmpfs /tmp:size=64m myapp

# Drop all capabilities, add only what is needed
docker run -d --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp

# Prevent privilege escalation
docker run -d --security-opt=no-new-privileges myapp

# Use Docker's built-in init (handles PID 1 and zombies)
docker run -d --init myapp
# --init injects a tini-based init (docker-init) as PID 1, which reaps zombies and forwards signals

# Scan an image for CVEs before deploying
trivy image myapp:v2.1.0

Compose for Local Dev

# Start all services
docker compose up -d

# Rebuild after code changes
docker compose up -d --build

# View logs from a specific service
docker compose logs -f api

# Run a one-off command in a service
docker compose exec api python manage.py migrate

# Tear down everything including volumes
docker compose down -v

Default trap: docker compose down keeps volumes. docker compose down -v deletes them. Muscle memory from typing -v during dev will destroy production data if you run it on the wrong host.
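One defensive option is a wrapper that refuses the destructive flag unless explicitly armed (a hypothetical shell helper, not a Docker feature):

```shell
# dc_down: pass-through to `docker compose down`, but refuse -v/--volumes
# unless ALLOW_VOLUME_WIPE=yes is set in the environment.
dc_down() {
  for arg in "$@"; do
    case $arg in
      -v|--volumes)
        if [ "${ALLOW_VOLUME_WIPE:-}" != "yes" ]; then
          echo "refusing to delete volumes; set ALLOW_VOLUME_WIPE=yes to proceed" >&2
          return 1
        fi ;;
    esac
  done
  docker compose down "$@"
}
```

Drop it into the shell rc on production hosts so the dangerous path requires a deliberate extra step.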

Inspect Image Layers

# See layer history and sizes
docker history myapp:v2.1.0

# Check total image size
docker images myapp:v2.1.0 --format '{{.Size}}'

# Compare two image tags
docker inspect myapp:v2.0.0 --format '{{.RootFS.Layers}}' > old.txt
docker inspect myapp:v2.1.0 --format '{{.RootFS.Layers}}' > new.txt
diff old.txt new.txt

Remember the FRAC layer mnemonic: FROM sets the base, RUN creates a layer, ADD/COPY creates a layer; everything else (ENV, LABEL, EXPOSE) adds metadata only. Combining related RUN steps (e.g. install and cleanup in one instruction) keeps files deleted in the same layer out of the final image.


Quick Reference