Docker / Containers - Street-Level Ops¶
Real-world container workflows for building, debugging, and operating in production.
Build and Ship¶
# Build with a specific tag and no cache
docker build -t myapp:v2.1.0 --no-cache .
# Multi-platform build (ARM + AMD64)
docker buildx build --platform linux/amd64,linux/arm64 -t registry.example.com/myapp:v2.1.0 --push .
# Tag for a private registry and push
docker tag myapp:v2.1.0 registry.example.com/team/myapp:v2.1.0
docker push registry.example.com/team/myapp:v2.1.0
# Pin by digest for production reproducibility
docker pull registry.example.com/team/myapp@sha256:abc123...
docker inspect --format='{{index .RepoDigests 0}}' myapp:v2.1.0
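Digest references always have the shape `name@sha256:<hex>`, which makes them easy to split in deploy scripts with plain parameter expansion. A minimal sketch (the reference below is a made-up example):

```shell
# Split an image reference into name and digest using shell parameter expansion
ref="registry.example.com/team/myapp@sha256:abc123"   # illustrative value
name=${ref%%@*}     # everything before the @
digest=${ref#*@}    # everything after the @
echo "$name"        # registry.example.com/team/myapp
echo "$digest"      # sha256:abc123
```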
Gotcha:
`--no-cache` rebuilds every layer from scratch, which re-downloads all packages. On slow networks this turns a 30-second build into 10+ minutes. Use `--no-cache-filter <stage>` (BuildKit) to bust the cache for just one stage instead of the whole Dockerfile.
Debug a Running Container¶
# Get a shell
docker exec -it myapp /bin/sh
# Check what process is running (is PID 1 correct?)
docker exec myapp ps aux
# Or from the host:
docker top myapp
# Read container logs with timestamps
docker logs --tail 100 --follow --timestamps myapp
# Check exit code and OOM status of a stopped container
docker inspect myapp --format '{{.State.ExitCode}}'
docker inspect myapp --format '{{.State.OOMKilled}}'
# Output: 137 / true
# Remember: exit code = 128 + signal number
# 137 = 128 + 9 (SIGKILL, OOM), 143 = 128 + 15 (SIGTERM, graceful)
# Copy files out for analysis
docker cp myapp:/var/log/app.log ./app.log
# Check the container's network config
docker exec myapp cat /etc/resolv.conf
docker inspect myapp --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
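The 128 + signal rule above is easy to wrap in a helper. This is plain POSIX shell and needs no Docker; the function name is illustrative:

```shell
# Translate a container exit code into a signal name or plain app exit status
decode_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    kill -l $((code - 128))      # e.g. 137 -> KILL, 143 -> TERM
  else
    echo "app exit ($code)"      # the application itself returned this status
  fi
}
decode_exit 137   # KILL
decode_exit 143   # TERM
```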
Debug Networking¶
# Enter a container's network namespace from the host
PID=$(docker inspect --format '{{.State.Pid}}' myapp)
nsenter -t $PID -n ip addr show
nsenter -t $PID -n ss -tlnp
# Check port mappings
docker port myapp
# 8080/tcp -> 0.0.0.0:8080
# Test connectivity between containers on a custom network
docker network create backend
docker run -d --name db --network backend postgres:16
docker run --rm --network backend nicolaka/netshoot nc -zv db 5432
# nc -z tests the TCP handshake only; curl would connect but then error,
# since Postgres does not speak HTTP
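If the debug image has bash (netshoot ships it), bash's `/dev/tcp` gives a connectivity check with no extra tools at all. A sketch; the helper name is illustrative:

```shell
# Check whether a TCP port accepts connections using bash's /dev/tcp device
tcp_check() {
  host=$1; port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
tcp_check db 5432        # run inside a container on the same network
```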
Resource Inspection¶
# Live resource usage (CPU, memory, I/O)
docker stats myapp --no-stream
# CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
# myapp 2.34% 342.1MiB / 512MiB 66.82% 12.3kB / 8.1kB 0B / 4.1MB
# Check configured resource limits
docker inspect myapp --format '{{.HostConfig.Memory}}'
docker inspect myapp --format '{{.HostConfig.NanoCpus}}'
# Check cgroup limits directly (cgroup v1 path shown; on cgroup v2 the file is
# memory.max and the path depends on the cgroup driver, e.g.
# /sys/fs/cgroup/system.slice/docker-<containerid>.scope/memory.max)
cat /sys/fs/cgroup/memory/docker/$(docker inspect -f '{{.Id}}' myapp)/memory.limit_in_bytes
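Which path applies depends on the host's cgroup version. A quick probe: the file `/sys/fs/cgroup/cgroup.controllers` only exists at the root of the unified v2 hierarchy.

```shell
# Detect whether the host uses cgroup v1 or the unified v2 hierarchy
cgroup_version() {
  if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "v2"
  else
    echo "v1"
  fi
}
cgroup_version
```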
Disk Cleanup¶
# See what is eating disk
docker system df
# TYPE TOTAL ACTIVE SIZE RECLAIMABLE
# Images 45 8 12.34GB 9.87GB (79%)
# Containers 12 3 234.5MB 200.1MB (85%)
# Build Cache 0 0 2.1GB 2.1GB
# Clean up stopped containers, unused images, build cache
docker system prune -a --volumes
# WARNING: --volumes deletes named volumes too — data loss risk
# Remove dangling images only (safe, conservative)
docker image prune
# Find old images by age (this only lists them; review, then remove with docker rmi)
docker images --format '{{.Repository}}:{{.Tag}} {{.CreatedSince}}' | grep "months"
# Or let Docker decide: remove unused images older than 30 days (720h)
docker image prune -a --filter "until=720h"
Volume Operations¶
# Create a named volume
docker volume create pgdata
# Run postgres with persistent data
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:16
# Find volume on host
docker volume inspect pgdata --format '{{.Mountpoint}}'
# /var/lib/docker/volumes/pgdata/_data
# Backup a volume
docker run --rm -v pgdata:/source -v $(pwd):/backup alpine \
tar czf /backup/pgdata-backup.tar.gz -C /source .
# Restore a volume
docker run --rm -v pgdata-restored:/target -v $(pwd):/backup alpine \
tar xzf /backup/pgdata-backup.tar.gz -C /target
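The tar flags above can be sanity-checked locally without Docker: same `czf`/`xzf`/`-C` usage, just with temp directories standing in for the volume mounts.

```shell
# Round-trip a directory through tar to verify the backup/restore flags
src=$(mktemp -d); dst=$(mktemp -d); work=$(mktemp -d)
echo "hello" > "$src/data.txt"
tar czf "$work/backup.tar.gz" -C "$src" .   # backup: archive contents, not the path
tar xzf "$work/backup.tar.gz" -C "$dst"     # restore into the target
diff "$src/data.txt" "$dst/data.txt" && echo "round-trip OK"
```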
Security Hardening at Runtime¶
# Run as non-root with read-only filesystem
docker run -d --user 1000:1000 --read-only --tmpfs /tmp:size=64m myapp
# Drop all capabilities, add only what is needed
docker run -d --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp
# Prevent privilege escalation
docker run -d --security-opt=no-new-privileges myapp
# Use Docker's built-in init (handles PID 1 and zombies)
docker run -d --init myapp
# --init injects tini as PID 1, which reaps zombies and forwards signals
# Scan an image for CVEs before deploying
trivy image myapp:v2.1.0
Compose for Local Dev¶
# Start all services
docker compose up -d
# Rebuild after code changes
docker compose up -d --build
# View logs from a specific service
docker compose logs -f api
# Run a one-off command in a service
docker compose exec api python manage.py migrate
# Tear down everything including volumes
docker compose down -v
Default trap:
`docker compose down` keeps volumes. `docker compose down -v` deletes them. Muscle memory from typing `-v` during dev will destroy production data if you run it on the wrong host.
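One way to blunt that trap is a wrapper that refuses `-v` when a marker file is present on the host. Everything here (function name, marker path) is illustrative, not a Docker feature:

```shell
# Refuse "-v"/"--volumes" when a production marker file exists on this host
: "${PROD_MARKER:=/etc/PRODUCTION}"    # hypothetical marker; create it on prod hosts
dc() {
  if [ -f "$PROD_MARKER" ]; then
    for arg in "$@"; do
      if [ "$arg" = "-v" ] || [ "$arg" = "--volumes" ]; then
        echo "refusing '$arg' on a production host" >&2
        return 1
      fi
    done
  fi
  docker compose "$@"
}
```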
Inspect Image Layers¶
# See layer history and sizes
docker history myapp:v2.1.0
# Check total image size
docker images myapp:v2.1.0 --format '{{.Size}}'
# Compare two image tags
docker inspect myapp:v2.0.0 --format '{{.RootFS.Layers}}' > old.txt
docker inspect myapp:v2.1.0 --format '{{.RootFS.Layers}}' > new.txt
diff old.txt new.txt
Remember: Docker layer mnemonic: FRAC.
`FROM` sets the base, `RUN` creates a layer, `ADD`/`COPY` creates a layer, everything else (`ENV`, `LABEL`, `EXPOSE`) adds metadata only. Fewer `RUN` instructions mean fewer layers; chaining cleanup into the same `RUN` (install, use, delete in one instruction) is what actually shrinks the image, since files deleted in a later layer still take up space in the earlier one.
Quick Reference¶
- Cheatsheet: Docker
- Deep Dive: Containers How They Really Work