
Docker Image Internals

Scope

This document explains Docker image internals and the related OCI image model. It covers:

  • layers
  • manifests
  • config objects
  • storage drivers / snapshotters
  • overlayfs mental model
  • tagging vs digests
  • build cache implications
  • runtime writable layer
  • common misconceptions and operational consequences

Big picture

Conceptually, a Docker image is not a single opaque blob. It is a content-addressed bundle of metadata plus one or more filesystem layers.

Simplified model

image reference (name:tag or digest)
  -> manifest
  -> config object
  -> ordered layer blobs
  -> local unpack/snapshot storage
  -> container runtime mounts layers + writable layer

An image is build-time packaged state. A container is runtime execution state derived from it.


Image identity: tag vs digest

Tags

Examples:

  • nginx:latest
  • ubuntu:24.04

Tags are mutable pointers. They are convenient names, not immutable truth.

Digests

Example form:

  • sha256:...

Digests identify content immutably.

Why this matters

If you deploy by tag:

  • the same text may point to different content later

If you deploy by digest:

  • you know exactly which bytes you meant

For reproducible infrastructure, digests are king and tags are rumors.
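The digest behavior follows directly from content addressing. A minimal sketch in Python, using plain `hashlib` (the same sha256 construction OCI digests use; the layer bytes are made up):

```python
import hashlib

def content_digest(blob: bytes) -> str:
    """Return an OCI-style digest string for a blob."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

layer_v1 = b"layer contents"
layer_v2 = b"layer contents, one byte changed"

d1 = content_digest(layer_v1)
d2 = content_digest(layer_v2)

# The digest is derived from the bytes themselves: identical content
# always yields the same digest, and any change yields a new one.
print(d1 == content_digest(layer_v1))  # True
print(d1 == d2)                        # False
```

A tag, by contrast, is just a registry-side name that can be re-pointed at new content; the digest cannot be.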


OCI image structure

Modern Docker images align with OCI image concepts.

Core pieces:

  • manifest
  • config object
  • layer blobs

Manifest

The manifest describes:

  • which config object belongs to the image
  • which layers, in order, compose it
  • media types and digests

Config object

Contains metadata such as:

  • environment defaults
  • command / entrypoint
  • working directory
  • labels
  • history
  • root filesystem diff IDs
  • architecture/OS info

Layer blobs

Compressed or otherwise packaged filesystem diffs representing changes introduced at build steps.
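The pieces above can be sketched as plain data. Field names follow the OCI image spec; the digest and size values are abbreviated placeholders, not real blobs:

```python
# Sketch of an OCI image manifest: it points at one config blob and an
# ordered list of layer blobs, each identified by media type + digest.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:aaaa...",  # placeholder
        "size": 7023,
    },
    "layers": [  # order matters: the first entry is the lowest layer
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:bbbb...",  # placeholder
            "size": 32654,
        },
    ],
}

# Sketch of the config object the manifest points at.
config = {
    "architecture": "amd64",
    "os": "linux",
    "config": {"Env": ["PATH=/usr/local/bin"], "Cmd": ["nginx"]},
    "rootfs": {"type": "layers", "diff_ids": ["sha256:cccc..."]},  # placeholder
}
```

Note that the manifest references layers by the digest of the compressed blob, while the config's `diff_ids` are digests of the uncompressed diffs; the two lists describe the same layers in the same order.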


Layers

Each image layer is a filesystem delta, not usually a full filesystem copy.

A layer can represent:

  • added files
  • modified files
  • deleted files (represented via whiteout semantics in union filesystems)

Ordered stacking

Layers are ordered. Later layers can override or mask earlier content.

This is the core reason container images are efficient to distribute and cache:

  • common base layers are reused
  • only changed layers need transfer/storage

Build steps and layers

In a Dockerfile-style mental model, instructions often produce new layers or new image metadata.

Common consequences:

  • combining operations affects layer count and cache behavior
  • deleting files in a later layer masks them from the merged view but does not remove them from earlier layers, so image size does not shrink the way people naively imagine
  • order matters for cache efficiency

Example

If you install huge packages in one layer and "delete" them later, the lower-layer content may still exist in image history/storage; you only masked it from the merged view.

That is why careless image construction creates obese images.
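The masking effect can be modeled in a few lines (a toy model, not real layer storage; a value of `None` stands in for a whiteout marker):

```python
# Toy model of ordered layer stacking. Each layer maps path -> content;
# None represents a whiteout (deletion marker) over a lower-layer file.
layers = [
    {"/app/bin": "binary", "/tmp/huge.deb": "300MB of package data"},  # lower
    {"/tmp/huge.deb": None},                                           # upper: "rm /tmp/huge.deb"
]

def merged_view(layers):
    """Apply layers in order; later entries override or mask earlier ones."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                view.pop(path, None)  # whiteout hides the lower-layer file
            else:
                view[path] = content
    return view

view = merged_view(layers)
print("/tmp/huge.deb" in view)       # False: masked from the merged rootfs
print("/tmp/huge.deb" in layers[0])  # True: bytes still stored in the lower layer
```

The merged view lies about size: the lower layer still ships with the image, whiteout or not.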


Overlayfs / overlay2 mental model

A common Docker storage mechanism on Linux uses OverlayFS.

Runtime view

lowerdirs = image layers (read-only)
upperdir  = container writable layer
workdir   = overlay bookkeeping
merged    = presented root filesystem
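The lookup order that layout implies can be sketched as follows (a toy model; real OverlayFS works on inodes and directories, not dicts):

```python
# Toy overlay read path: check the writable upper layer first, then
# walk the read-only lower layers from top-most to bottom-most.
lowers = [
    {"/etc/os-release": "base image"},  # lowest image layer
    {"/app/server": "app layer"},       # higher image layer
]
upper = {"/app/server": "patched in container"}

def read(path):
    if path in upper:
        return upper[path]
    for layer in reversed(lowers):      # top-most lower layer wins
        if path in layer:
            return layer[path]
    raise FileNotFoundError(path)

print(read("/app/server"))      # "patched in container": upper masks lower
print(read("/etc/os-release"))  # "base image": served from a lower layer
```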

Why overlay matters

  • image layers stay read-only and shared
  • each container gets its own writable layer
  • reads can come from lower layers
  • writes may trigger copy-up from lower layer into upper layer

Copy-up implication

If a container modifies a file that exists in a lower layer, OverlayFS may copy it into the writable layer first. That can be surprisingly expensive for some workloads.
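A toy model of the copy-up cost (hypothetical paths; real OverlayFS copies at the file level on first write):

```python
# Lower layers are read-only; the first write to a lower-layer file
# copies the whole file into the writable upper layer, then modifies it.
lower = {"/data/big.log": "A" * 1_000_000}  # read-only image layer
upper = {}                                   # container writable layer

def append_write(path, data):
    """Appending a few bytes still copies the whole file up first."""
    if path not in upper and path in lower:
        upper[path] = lower[path]            # copy-up: whole-file copy
    upper[path] = upper.get(path, "") + data

append_write("/data/big.log", "one new line\n")

print(len(upper["/data/big.log"]))           # ~1 MB copied up for a tiny append
print(lower["/data/big.log"] == "A" * 1_000_000)  # True: lower layer untouched
```

This is one reason append-heavy workloads (logs, databases) belong on volumes rather than the writable layer.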


Local storage representation

A runtime stores image content locally using content-addressed blobs and snapshot metadata. Historically Docker emphasized storage drivers; newer stacks increasingly involve containerd snapshotters and content stores.

Important idea:

  • the registry representation
  • the local unpacked representation
  • the mounted runtime rootfs

are related but not identical views.


Writable layer vs volumes

A running container has a writable layer, but that is not the ideal home for durable data.

Writable layer

  • tied to container lifecycle
  • can be slower/awkward for some heavy write workloads
  • disappears with container deletion unless preserved via container commit/image tricks, which is usually not how sane systems manage state

Volumes / bind mounts

  • externalize persistent or host-coupled data
  • survive container recreation as configured
  • often better for databases, application data, caches that should persist, etc.

Whiteouts and deletions

Union filesystem semantics use whiteout markers to represent deletions of files from lower layers.

Why this matters

The merged rootfs says "this file is gone," but the bytes in an older layer may still exist in the image history/storage graph.

That is the image-layer version of sweeping dirt under a rug and then declaring the house immaculate.


Multi-arch images

An image tag may refer not to one image manifest but to a manifest list / image index containing entries for multiple platforms.

Example platforms:

  • linux/amd64
  • linux/arm64

The runtime chooses the appropriate platform-specific image based on host/requested platform.

Why you care

  • same tag may resolve differently on different architectures
  • buildx and multi-platform publishing make this common
  • debugging "works on my machine" image issues sometimes comes down to architecture mismatch
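Resolution from an image index can be sketched like this (toy structure; a real index carries media types and sizes per the OCI spec, and the digests here are placeholders):

```python
# Toy resolution of a multi-arch tag: an image index lists one manifest
# per platform; the runtime picks the entry matching the host.
index = [
    {"platform": {"os": "linux", "architecture": "amd64"}, "digest": "sha256:aaaa..."},
    {"platform": {"os": "linux", "architecture": "arm64"}, "digest": "sha256:bbbb..."},
]

def resolve(index, os_name, arch):
    for entry in index:
        p = entry["platform"]
        if p["os"] == os_name and p["architecture"] == arch:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")

# Same tag, different digest depending on the pulling host:
print(resolve(index, "linux", "amd64"))
print(resolve(index, "linux", "arm64"))
```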

Pull flow

Simplified pull sequence:

client/runtime resolves reference
  -> authenticate if needed
  -> fetch manifest/index
  -> determine platform
  -> fetch config and missing layers by digest
  -> verify digests
  -> store blobs locally
  -> unpack/snapshot for runtime use as needed

This is content-addressed, which is why shared layers and digest verification work.
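The digest-verification step can be sketched directly (same sha256 construction registries use; the blob contents are made up):

```python
import hashlib

def verify_blob(blob: bytes, expected_digest: str) -> bool:
    """Recompute the digest of fetched bytes and compare to the manifest's claim."""
    algo, _, want = expected_digest.partition(":")
    assert algo == "sha256"
    return hashlib.sha256(blob).hexdigest() == want

layer = b"fake layer bytes"
good = "sha256:" + hashlib.sha256(layer).hexdigest()

print(verify_blob(layer, good))               # True
print(verify_blob(layer + b"tampered", good)) # False: bytes changed in transit
```

Because every blob is fetched and checked by digest, a layer already present locally never needs to be downloaded again, regardless of which image references it.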


Build cache

Image builds often reuse prior layers when an instruction and its inputs are unchanged.

Why cache matters

  • dramatic speedup
  • predictable rebuild behavior when designed well
  • poor Dockerfile ordering destroys cache efficiency

Good general pattern

Put more stable steps earlier:

  • base image
  • package manager metadata/install patterns
  • dependency manifests
  • source copy later if source changes frequently

Bad pattern

Copy the entire repo first, then install dependencies. Tiny code change now invalidates expensive dependency layers.
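One way to see why ordering matters: model each step's cache key as a hash over its parent's key plus its own instruction and inputs (a simplification of what real builders compute; file names and digests here are hypothetical):

```python
import hashlib

def step_key(parent_key: str, instruction: str, input_digest: str = "") -> str:
    """Toy cache key: covers the parent chain, the instruction text, and
    the content copied in. Any change ripples down to later steps."""
    h = hashlib.sha256((parent_key + instruction + input_digest).encode())
    return h.hexdigest()

base = step_key("", "FROM python:3.12-slim")
deps = step_key(base, "COPY requirements.txt .", input_digest="reqs-v1")
install = step_key(deps, "RUN pip install -r requirements.txt")
src = step_key(install, "COPY . .", input_digest="src-v1")

# A source-only change: COPY . . gets a new key, but the expensive
# install step's key is unchanged, so its layer is reused from cache.
src2 = step_key(install, "COPY . .", input_digest="src-v2")
print(src != src2)  # True

# If requirements.txt changes instead, every step after it is invalidated.
deps2 = step_key(base, "COPY requirements.txt .", input_digest="reqs-v2")
install2 = step_key(deps2, "RUN pip install -r requirements.txt")
print(install != install2)  # True
```

In the bad pattern, `COPY . .` sits above the install step, so every code change changes the install step's parent key and forces a full reinstall.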


Container image security implications

What image internals tell you operationally

  • mutable tags are risky
  • layer history can preserve data you thought you deleted
  • base image lineage matters
  • too many packages increase attack surface
  • secrets copied during build may end up in layers/history
  • image scans operate on filesystem/package content but are only part of the story

Common mistakes

  • baking secrets into layers
  • giant general-purpose base images
  • relying on latest
  • assuming deleting a secret in later build step removes it from image history safely

Runtime startup from image

When starting a container from an image:

  1. image reference resolves to content
  2. local snapshot/mount structure is prepared
  3. writable upper layer is added
  4. OCI runtime spec says what process to run
  5. process executes in that merged filesystem

So the image supplies filesystem and metadata defaults; the runtime supplies the process environment and isolation context.


Debugging image issues

Symptom: image huge

Likely causes:

  • poor layer ordering
  • package caches retained
  • unnecessary tooling in runtime image
  • copied build artifacts / source / test data
  • deleting in later layers instead of avoiding inclusion earlier

Symptom: rebuild slow

Likely causes:

  • cache invalidation too early in Dockerfile
  • mutable base image changes
  • dependency install step not isolated
  • registry/cache not reused

Symptom: file "deleted" but image still large

Whiteout/layer-history problem.

Symptom: different results on different hosts

Possible causes:

  • multi-arch tag resolves differently
  • mutable tag drift
  • different runtime unpack/storage behavior
  • different build context or ignored files

Interview angles

Questions hidden here:

  • difference between image and container
  • what a layer is
  • why tags are mutable and digests are preferable
  • how overlayfs presents image + writable layer
  • why deleting files in later layers may not shrink image as expected
  • what multi-arch images are
  • why Dockerfile instruction order affects caching

Strong answers tie image structure to runtime consequences.


Mental model to keep

A Docker image is:

  • a manifest and config
  • plus ordered content-addressed filesystem layers

A running container is:

  • that read-only layered filesystem
  • plus a writable upper layer
  • plus a process launched by the runtime

If you separate build-time image identity from runtime container state, most confusion disappears.

