
Docker / Containers - Primer

Why This Matters

Containers are the deployment unit for modern infrastructure. Every CI/CD pipeline builds them. Every orchestrator runs them. Every cloud platform expects them. If you cannot build, ship, debug, and secure a container, you cannot operate modern production systems. Docker is the tool that made containers mainstream, and even in a world of containerd, Podman, and Kubernetes CRI, Docker's CLI and image format remain the baseline every ops engineer must know cold.

Who made it: Docker was created by Solomon Hykes at dotCloud (a PaaS company) and open-sourced in March 2013. It did not invent containers — Linux namespaces date back to 2002, cgroups were merged into the mainline kernel by 2008, and LXC provided userspace tooling on top of both. Docker's breakthrough was the developer experience: a simple CLI, a layered image format, and a public registry (Docker Hub). The OCI (Open Container Initiative) was founded in 2015 when Docker donated its image and runtime specifications; the 1.0 versions of both specs were published in 2017.

Name origin: Docker is named after dockworkers who load and unload shipping containers — the same metaphor that inspired the container analogy. The whale logo (Moby Dock) carries containers on its back. The underlying runtime library was originally called libcontainer; it was later donated to the OCI as runc, which remains the reference OCI runtime.

Core Concepts

1. Images vs Containers

An image is a read-only, layered filesystem snapshot. A container is a running (or stopped) instance of an image with a writable layer on top.

# Pull an image and inspect its layers
docker pull nginx:1.25
docker image inspect nginx:1.25 --format '{{.RootFS.Layers}}' | tr ' ' '\n' | wc -l
# Prints the number of layers (the count varies by image and version)

# Each instruction in a Dockerfile creates a layer
docker history nginx:1.25

Layers and copy-on-write: Images are built from stacked layers. When a container writes to a file from a lower layer, the storage driver copies that file into the writable container layer (copy-on-write). This is why containers start fast — they share the base layers.

Tags vs digests: Tags are mutable pointers (nginx:1.25 can change). Digests are immutable content hashes (nginx@sha256:abc123...). In production, pin by digest or a verified tag to avoid surprise changes.

# Pull by digest for reproducibility
docker pull nginx@sha256:6db391d1c0cfb30588ba0bf72ea999404f2764febf0f1f196acd5867ac7d7b7d

# Show the digest of a local image
docker inspect --format='{{index .RepoDigests 0}}' nginx:1.25

Image IDs: The image ID is a SHA256 hash of the image's configuration JSON. Because that JSON embeds build metadata such as timestamps, two independent builds of the same Dockerfile usually produce different IDs; only byte-identical configurations share an ID.
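Content addressing can be demonstrated without Docker at all: a digest is just a SHA-256 over bytes, so identical input always yields an identical hash. A minimal sketch using sha256sum (the sample strings are illustrative):

```shell
# Digests are plain SHA-256 content hashes: the same bytes always hash to
# the same value, and a one-character change produces a different hash.
printf 'FROM alpine:3.19\n' | sha256sum
printf 'FROM alpine:3.20\n' | sha256sum   # one character differs
```

This is why a digest reference like nginx@sha256:... can never silently point at different content the way a tag can.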

2. Dockerfile Best Practices

# Multi-stage build: build in one stage, copy artifacts to a slim runtime
FROM golang:1.22 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app

FROM alpine:3.19
RUN adduser -D -u 1000 appuser
COPY --from=builder /app /app
USER appuser
ENTRYPOINT ["/app"]

Layer ordering: Put instructions that change rarely (install OS packages) before instructions that change often (copy source code). Docker caches layers top-down and invalidates everything below a changed layer.
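As a sketch of cache-friendly ordering for a hypothetical Node.js service (file and image names are illustrative):

```dockerfile
# Changes rarely: base image
FROM node:20-alpine
WORKDIR /srv

# Changes occasionally: dependency manifests only, so `npm ci` stays cached
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Changes constantly: source code goes last, invalidating only this layer
COPY . .
CMD ["node", "server.js"]
```

Editing source code now rebuilds only the final COPY layer; the dependency install layer is reused from cache.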

.dockerignore: Always include one. Without it, COPY . . sends your entire build context — including .git, node_modules, test fixtures, and secrets — to the daemon.

# .dockerignore
.git
.env
*.pem
node_modules
__pycache__

Non-root user: Never run production containers as root. Create a user in the Dockerfile and switch to it with USER.

COPY vs ADD: Use COPY unless you specifically need ADD's tar extraction or URL fetching. COPY is explicit and predictable.

CMD vs ENTRYPOINT: ENTRYPOINT defines the executable. CMD provides default arguments. Together they let you build containers that behave like commands:

ENTRYPOINT ["curl"]
CMD ["--help"]
# docker run myimage              → curl --help
# docker run myimage https://x.co → curl https://x.co

3. Container Lifecycle

# Full lifecycle
docker create --name myapp nginx:1.25    # Created (not running)
docker start myapp                        # Running
docker pause myapp                        # Paused (frozen via cgroups freezer)
docker unpause myapp                      # Running again
docker stop myapp                         # Sends SIGTERM, waits 10s, then SIGKILL
docker kill myapp                         # Sends SIGKILL immediately
docker rm myapp                           # Removes the container

Default trap: docker stop sends SIGTERM and waits 10 seconds (the --stop-timeout default), then sends SIGKILL. Many applications (for example, a Python process that never registers a SIGTERM handler) do not shut down gracefully. If your container takes exactly 10 seconds to stop, it is being SIGKILLed: it never received or handled the SIGTERM. Fix: add explicit SIGTERM handling in your application or use tini as PID 1.

Signals and PID 1: The container's entrypoint runs as PID 1. PID 1 is special in Linux: the kernel does not apply default signal actions to it, so any signal the process has not installed a handler for is simply ignored. If your app does not handle SIGTERM, docker stop waits the full timeout and then kills it.
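A minimal signal-forwarding entrypoint sketch (a hypothetical entrypoint.sh; tini does this and also reaps zombies):

```shell
#!/bin/sh
# Hypothetical entrypoint.sh: run the real app as a child of this shell
# and forward SIGTERM so `docker stop` produces a prompt, clean shutdown.

forward_term() {
  kill -TERM "$child" 2>/dev/null   # relay the signal to the application
  wait "$child"                     # give it time to finish cleanup
}
trap forward_term TERM

"$@" &        # launch the application passed as arguments
child=$!
wait "$child"
```

Used as ENTRYPOINT ["/entrypoint.sh", "/app"], the shell runs as PID 1 but has installed a TERM handler, so the signal is delivered and relayed instead of being ignored.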

The PID 1 / zombie problem: PID 1 is supposed to reap orphaned child processes. Most application processes do not do this, so zombie processes accumulate. Use tini or --init:

# Use Docker's built-in init process
docker run --init myapp

# Or install tini in your Dockerfile
RUN apk add --no-cache tini
ENTRYPOINT ["tini", "--", "/app"]

4. Networking

Docker provides three built-in network drivers:

| Driver | Use case |
| ------ | -------- |
| bridge | Default. Isolated network with NAT. Containers talk via internal IPs. |
| host   | Container shares the host's network namespace. No isolation, no NAT overhead. |
| none   | No networking. Container is fully isolated. |

# Port mapping: host 8080 → container 80
docker run -d -p 8080:80 nginx:1.25

# List port mappings
docker port <container_id>

# Create a custom bridge network
docker network create backend

# Containers on the same custom network get DNS resolution by container name
docker run -d --name db --network backend postgres:16
docker run -d --name app --network backend myapp
# Inside 'app': ping db  → resolves to db's container IP

Docker Compose networks: Compose automatically creates a network per project. Services resolve each other by service name. You can define additional networks to isolate frontend from backend:

services:
  web:
    networks: [frontend, backend]
  api:
    networks: [backend]
  db:
    networks: [backend]

networks:
  frontend:
  backend:

5. Volumes and Bind Mounts

Containers are ephemeral. Anything written to the container's writable layer is lost when the container is removed. Persistent data requires volumes or bind mounts.

# Named volume (Docker manages the storage location)
docker volume create pgdata
docker run -d -v pgdata:/var/lib/postgresql/data postgres:16

# Bind mount (you control the host path)
docker run -d -v /host/config:/etc/app/config:ro myapp

# tmpfs mount (in-memory, never hits disk — good for secrets)
docker run -d --tmpfs /tmp:size=64m myapp

| Type | Managed by | Survives rm | Shareable | Use case |
| ---- | ---------- | ----------- | --------- | -------- |
| Named volume | Docker | Yes | Yes | Database data, persistent state |
| Bind mount | You | Yes | Yes | Dev config, host log access |
| tmpfs | Kernel | No | No | Secrets, scratch space |

Volume drivers: Docker supports pluggable volume drivers for NFS, cloud block storage, etc. This is how stateful containers run in orchestrated environments.
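For example, the built-in local driver can mount an NFS export as a named volume. A sketch (the server address and export path are placeholders, and the command needs a running Docker daemon):

```shell
# Sketch: a named volume backed by NFS via the local driver.
# 10.0.0.5 and /exports/pgdata are placeholder values.
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=10.0.0.5,rw \
  --opt device=:/exports/pgdata \
  pgdata-nfs
```

The NFS share is mounted lazily, the first time a container uses the volume.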

# Inspect a volume to find its mountpoint
docker volume inspect pgdata --format '{{.Mountpoint}}'
# /var/lib/docker/volumes/pgdata/_data

6. Resource Limits

Without limits, a single container can consume all host memory and CPU, taking down everything else.

# Memory limit: container gets OOM-killed if it exceeds 512MB
docker run -d --memory=512m --memory-swap=512m myapp

# CPU limit: container gets 1.5 CPU cores worth of time
docker run -d --cpus=1.5 myapp

# CPU shares (relative weight, soft limit)
docker run -d --cpu-shares=512 myapp

Cgroups under the hood: Docker uses Linux cgroups (control groups) to enforce these limits. --memory sets the cgroup memory limit (memory.limit_in_bytes on cgroup v1, memory.max on cgroup v2). --cpus sets CFS (Completely Fair Scheduler) quota and period.

# Verify cgroup limits for a running container
docker inspect <id> --format '{{.HostConfig.Memory}}'
# Or read the cgroup directly (cgroup v1 path shown; on cgroup v2 hosts
# look at /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.max):
cat /sys/fs/cgroup/memory/docker/<id>/memory.limit_in_bytes

OOM behavior: When a container exceeds its memory limit, the kernel's OOM killer terminates it. Docker reports this as exit code 137 (128 + 9 = SIGKILL). Check with:

docker inspect <id> --format '{{.State.OOMKilled}}'
# true
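The 128 + signal convention is plain POSIX and observable without Docker. A quick sketch:

```shell
# Any process killed by SIGKILL (signal 9) exits with status 128 + 9 = 137,
# the same code Docker reports for an OOM-killed container.
sleep 60 &
pid=$!
kill -KILL "$pid"
wait "$pid"
echo "exit status: $?"   # 137
```

The same arithmetic explains 143 (128 + 15) for containers stopped cleanly by SIGTERM.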

7. Debugging Running Containers

# Execute a command inside a running container
docker exec -it myapp /bin/sh
docker exec myapp cat /etc/resolv.conf

# View logs (stdout/stderr of PID 1)
docker logs myapp
docker logs --tail 50 --follow --timestamps myapp

# Full container metadata
docker inspect myapp

# Inspect specific fields
docker inspect myapp --format '{{.State.Status}}'
docker inspect myapp --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'

# Copy files in/out of a container
docker cp myapp:/var/log/app.log ./app.log
docker cp ./config.yaml myapp:/etc/app/config.yaml

# Enter the container's network namespace directly (when exec is not available)
PID=$(docker inspect --format '{{.State.Pid}}' myapp)
nsenter -t $PID -n ip addr show

Practical debugging flow:

1. docker logs — check for crash output or error messages
2. docker inspect — check state, exit code, OOM status, network config
3. docker exec — get a shell, check filesystem, run diagnostic commands
4. docker cp — extract log files or config for offline analysis
5. nsenter — debug networking when the container has no shell or ip tools

8. Security

# Run as non-root (should also be in Dockerfile, but enforce at runtime)
docker run --user 1000:1000 myapp

# Read-only root filesystem (writable volumes still work)
docker run --read-only --tmpfs /tmp myapp

# Prevent privilege escalation
docker run --security-opt=no-new-privileges myapp

# Drop all capabilities, add only what you need
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp

Seccomp profiles: Docker ships with a default seccomp profile that blocks ~44 dangerous syscalls (including reboot, mount, kexec_load). Never run with --security-opt seccomp=unconfined in production.

Image scanning: Scan images for known CVEs before deploying:

# Trivy (open source, widely used)
trivy image myapp:latest

# Docker Scout (Docker's built-in scanner)
docker scout cves myapp:latest

Remember: Mnemonic for Docker security hardening: RRND. Read-only rootfs, Run as non-root, No new privileges, Drop all capabilities. The four flags: --read-only --user 1000 --security-opt=no-new-privileges --cap-drop=ALL. Apply all four to every production container unless you have a documented exception.
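Put together on one command line (myapp is a placeholder image; the tmpfs keeps a read-only rootfs workable for apps that write to /tmp):

```shell
# RRND in practice: read-only rootfs, non-root user, no privilege
# escalation, all capabilities dropped. `myapp` is a placeholder image.
docker run -d \
  --read-only --tmpfs /tmp \
  --user 1000:1000 \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  myapp
```

If the app then fails to start, its error output usually tells you exactly which exception (a needed capability, a writable path) to document and add back.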

General rules:

- Never use --privileged unless you have a documented, reviewed reason
- Never mount the Docker socket into a container (it gives full host control)
- Use multi-stage builds to keep build tools out of the production image
- Pin base images to specific versions, not latest
- Rebuild images regularly to pick up security patches in base layers

9. Registry Operations

# Log in to a registry
docker login registry.example.com

# Tag for a private registry
docker tag myapp:1.0 registry.example.com/team/myapp:1.0

# Push
docker push registry.example.com/team/myapp:1.0

# Pull
docker pull registry.example.com/team/myapp:1.0

# List tags (using the registry HTTP API)
curl -s https://registry.example.com/v2/team/myapp/tags/list | jq .

Image pruning: Docker accumulates images, containers, volumes, and build cache. Disk fills up.

# Remove stopped containers, unused networks, dangling images, build cache
docker system prune

# Also remove unused images (not just dangling)
docker system prune -a

# Check disk usage
docker system df

Private registries: Most teams run a private registry (Harbor, AWS ECR, GCR, GitHub Container Registry). Authentication is typically handled via docker login or credential helpers configured in ~/.docker/config.json. In CI, use short-lived tokens rather than long-lived passwords.
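For example, an ECR login in CI typically looks like this (the region and account ID are placeholders; the issued token is short-lived, so each pipeline run fetches a fresh one):

```shell
# Short-lived registry login: get-login-password returns a temporary token
# piped straight into docker login. All values below are placeholders.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.us-east-1.amazonaws.com
```

Passing the token via --password-stdin also keeps it out of shell history and process listings.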

