
Kubernetes Pod Lifecycle

Scope

This document explains what really happens from "I applied YAML" to "my Pod is running" and then to termination. It covers:

  • scheduling
  • admission
  • API objects
  • kubelet behavior
  • image pulls
  • sandbox creation
  • init containers
  • probes
  • restarts
  • termination
  • failure patterns

This is the pod-centric view of Kubernetes internals.


Big picture

A Pod is the smallest deployable workload unit in Kubernetes, but it is not "just a container." It is a scheduling and runtime envelope for one or more containers that share some resources and lifecycle rules.

End-to-end flow

kubectl apply / controller creates Pod spec
  -> API server stores object in etcd
  -> scheduler assigns a node
  -> kubelet on that node notices assigned Pod
  -> kubelet asks runtime to create pod sandbox
  -> networking for Pod is set up
  -> volumes are prepared
  -> init containers run
  -> app containers start
  -> readiness gates traffic
  -> probes monitor health
  -> Pod runs, restarts containers as needed
  -> termination begins when deleted / evicted / completed
  -> graceful shutdown
  -> resources cleaned up
The same flow as a Mermaid diagram:

```mermaid
flowchart TD
    User[kubectl apply] --> API[API Server]
    API --> etcd[(etcd)]
    API --> Sched[Scheduler]
    Sched -->|assigns node| KL[Kubelet]
    KL --> Sandbox[Create Pod Sandbox]
    Sandbox --> CNI[CNI Network Setup]
    Sandbox --> Vols[Volume Mounts]
    CNI --> Init[Init Containers]
    Vols --> Init
    Init -->|sequential, must pass| App[App Containers]
    App --> Running[Running Pod]
    Running -.->|startup probe| SP[Startup Probe]
    Running -.->|liveness| LP[Liveness Probe]
    Running -.->|readiness| RP[Readiness Probe]
    LP -.->|fail| Restart[Container Restart]
    RP -.->|pass| Endpoints[Added to Endpoints]
    KL --- CRI[Container Runtime]
    CRI --> Sandbox
```

Pod fundamentals

A Pod contains one or more containers that share:

  • network namespace
  • IP address
  • port space
  • some storage volumes
  • pod-level metadata and policy

Containers in the same Pod are intentionally coupled. If you need independent scaling or lifecycle, they should usually not be in the same Pod.
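
As a minimal sketch (all names and images here are illustrative), a two-container Pod whose containers share one network namespace and an emptyDir volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar        # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: log-tailer          # sidecar joins the same network namespace and volume
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /var/log/app/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}
```

Because both containers share the Pod's IP and port space, the sidecar could reach the web server at localhost:80 without any Service in between.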


Object creation path

1. Pod spec submitted

The Pod spec may come from:

  • direct Pod manifest
  • Deployment -> ReplicaSet -> Pod
  • StatefulSet
  • DaemonSet
  • Job / CronJob
  • custom controllers

Most real Pods come from controllers, not from you hand-creating naked Pods forever like a caveman.
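
For illustration, a minimal Deployment whose Pod template is what actually becomes Pods (names and the image reference are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:                     # this template, not the Deployment itself, is what the
    metadata:                   # ReplicaSet stamps out as Pods
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0   # hypothetical image
```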

2. API server validation and admission

The API server:

  • authenticates request
  • authorizes action
  • validates schema
  • runs admission control / mutation / policy
  • persists desired state into etcd

Possible mutations here:

  • default values inserted
  • sidecars injected
  • security settings modified
  • labels/annotations added
  • image policy enforced

Important consequence: the Pod you wrote and the Pod actually stored may differ.


Scheduling

3. Pod is Pending and unscheduled

Initially the Pod usually has no .spec.nodeName. It sits in Pending state waiting for the scheduler.

4. Scheduler evaluates the Pod

The scheduler filters nodes based on hard constraints, then scores candidates.

Typical constraints:

  • resource requests
  • taints/tolerations
  • node selectors
  • node affinity
  • pod affinity / anti-affinity
  • topology spread constraints
  • volume topology constraints
  • special runtime class constraints

Key reality

Scheduling is based mostly on requested resources and policy, not actual runtime usage.

If requests are wrong, scheduling decisions are wrong.
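
A sketch of the container-level resource stanza involved, to make the distinction concrete: requests drive placement, limits are enforced at runtime.

```yaml
resources:
  requests:           # the scheduler uses these numbers for placement decisions
    cpu: "250m"
    memory: 256Mi
  limits:             # enforced by the node at runtime, not consulted for scheduling
    cpu: "1"
    memory: 512Mi
```

A container idling at 10 MiB still "occupies" its full 256Mi request on the node from the scheduler's point of view.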

5. Node assignment

The scheduler binds the Pod to a node by setting spec.nodeName.

At this point the pod is still not running. It is merely assigned.


Kubelet takes over

6. Kubelet watches for assigned Pods

The kubelet on the chosen node sees the Pod assignment and begins reconciliation. Kubelet's job is basically:

  • observe desired Pod state
  • observe actual local runtime state
  • do local work until they match

Kubelet is the node-local truth enforcer.

7. Kubelet prepares Pod environment

This includes:

  • pulling secrets/config needed locally
  • preparing volumes
  • calculating sandbox configuration
  • consulting CNI for networking
  • talking to CRI runtime

Pod sandbox creation

8. Runtime creates sandbox

For CRI runtimes, the first step is often creation of a Pod sandbox. Think of the sandbox as the pod-level environment:

  • network namespace
  • shared Linux namespaces as configured
  • some cgroup structure
  • basic infra / pause container pattern in some implementations

Why sandbox exists

All containers in a Pod need to share pod-level resources. The sandbox anchors that shared environment.

Pause container

In many implementations, a tiny "pause" container holds the shared namespaces alive. The app containers then join them.


Networking

9. CNI plugin sets up Pod network

The kubelet asks the runtime, which invokes CNI plugin logic to:

  • create/attach interfaces
  • assign Pod IP
  • add routes
  • configure veth pair / bridge / overlay / ENI / whatever the environment uses
  • set DNS config

Why this matters:

  • Pod startup can fail before your app even begins if networking setup fails
  • many "container failed to start" issues are actually CNI failures

Volumes

10. Volume setup

The kubelet prepares declared volumes:

  • emptyDir
  • ConfigMaps
  • Secrets
  • projected volumes
  • PersistentVolumeClaim-backed storage
  • CSI volumes
  • hostPath
  • tmpfs-backed secret/config mechanisms under the hood as appropriate

Volume mount preparation happens before containers that depend on those mounts start.
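
A sketch combining two of the common cases above, a ConfigMap volume and an emptyDir (the ConfigMap name and image are assumptions):

```yaml
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical
      volumeMounts:
        - name: config
          mountPath: /etc/app
          readOnly: true
        - name: scratch
          mountPath: /tmp/work
  volumes:
    - name: config
      configMap:
        name: app-config    # if this ConfigMap does not exist, the Pod sticks in ContainerCreating
    - name: scratch
      emptyDir: {}
```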


Image pull and container creation

11. Init containers start first

If init containers exist, they run sequentially. Each must complete successfully before the next starts and before app containers start.

Use them for:

  • one-time setup
  • migrations
  • dependency checks
  • asset preparation

Do not use them as a dumping ground for random startup sins.

Failure pattern

If an init container keeps failing, the Pod never proceeds to app containers. It remains stuck in a not-fully-initialized state.
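
A typical dependency-check init container looks like this sketch (the `db` hostname, port, and images are assumptions):

```yaml
spec:
  initContainers:
    - name: wait-for-db           # must exit 0 before any app container starts
      image: busybox:1.36
      command: ["sh", "-c", "until nc -z db 5432; do echo waiting; sleep 2; done"]
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical
```

If `db` never answers, the Pod stays stuck at Init:0/1 indefinitely, which is exactly the failure pattern described above.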

12. App container creation

For each app container, kubelet/runtime does roughly:

  • pull image if needed
  • create container with pod namespace attachments
  • mount volumes
  • set environment
  • apply resource settings
  • start process

The Pod may now move toward Running, but traffic should not necessarily be sent yet.


Pod phases vs container states

People often confuse Pod phase with detailed runtime state.

Pod phase

High-level values include:

  • Pending
  • Running
  • Succeeded
  • Failed
  • Unknown

This is a broad summary, not a precise state machine for every internal transition.

Container states

Each container has more detailed states such as:

  • Waiting
  • Running
  • Terminated

Those include reasons like:

  • ImagePullBackOff
  • CrashLoopBackOff
  • ContainerCreating
  • OOMKilled
  • Error
  • Completed

Always inspect container state details, not just Pod phase.


Readiness, liveness, and startup

Readiness probe

Determines whether the Pod should receive traffic through Services/endpoints.

A Pod can be running but not ready.

That distinction is critical.

Liveness probe

Determines whether kubelet should restart a container that appears unhealthy.

Use it carefully. Bad liveness probes are self-inflicted denial-of-service.

Startup probe

Gives slow-starting apps more time before liveness/readiness logic begins punishing them.

Common anti-pattern

People use liveness to detect dependency failures too aggressively, then their app restarts endlessly instead of stabilizing.
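
A sketch of all three probes on one container (paths, port, and image are assumptions, and the thresholds are examples, not recommendations):

```yaml
containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical
    startupProbe:               # holds off liveness/readiness while the app boots
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30      # up to 30 * 10s = 300s allowed for startup
      periodSeconds: 10
    livenessProbe:              # failure here restarts the container
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    readinessProbe:             # failure here only removes the Pod from endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```

Note the division of labor: only the liveness probe triggers restarts, and checking external dependencies belongs in readiness, not liveness, to avoid the anti-pattern above.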


Restarts and crash loops

Kubelet may restart containers in a Pod depending on restart policy and workload type.

Restart policies

Common behavior differs for:

  • naked Pods
  • Jobs
  • higher-level controllers

For most long-running app Pods managed by controllers, repeated app-container failure results in restart attempts and eventually CrashLoopBackOff.

CrashLoopBackOff

This is not a root cause. It is the symptom that:

  • container starts
  • fails
  • restarts
  • backoff increases

Root causes are usually:

  • bad config
  • missing secret
  • migration failed
  • permission problem
  • wrong command
  • dependency unavailable
  • OOM kill
  • probe misconfiguration

Pod readiness in service routing

When a Pod becomes Ready, endpoint information is updated so Services can send traffic to it.

That means there is a control-plane + kube-proxy/CNI propagation path between "container looks healthy" and "network traffic now reaches it."

This is why readiness changes are not pure local process facts. They affect routing.


Eviction, preemption, and disruption

Pods do not leave only because you deleted them.

They can also be removed due to:

  • node pressure eviction
  • node drain
  • preemption by higher-priority workloads
  • taint-based eviction
  • controller rollout replacement
  • pod disruption budget interactions
  • underlying node failure

Eviction due to pressure

Common causes:

  • memory pressure
  • disk pressure
  • PID pressure

Kubernetes is not sentimental. Under pressure it will kill your stuff.
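
For voluntary disruptions such as node drains, a PodDisruptionBudget can limit how many Pods go down at once. A sketch (name and labels are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-pdb
spec:
  minAvailable: 2         # drains/voluntary evictions must leave at least 2 Pods running
  selector:
    matchLabels:
      app: demo
```

PDBs only constrain voluntary disruptions; node-pressure eviction and outright node failure do not consult them.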


Termination path

1. Deletion requested

A Pod receives a deletion timestamp.

2. Grace period starts

Kubelet begins graceful shutdown, usually:

  • remove from endpoints/readiness path
  • run preStop hooks where configured
  • send SIGTERM to containers
  • wait for the termination grace period to elapse
  • send SIGKILL if processes remain

Important point

A deleted Pod object does not mean the process vanished instantly. There is a termination dance.
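
A sketch of the two knobs involved (the image and the 5-second sleep are illustrative; the sleep buys time for endpoint removal to propagate before SIGTERM arrives):

```yaml
spec:
  terminationGracePeriodSeconds: 45       # default is 30
  containers:
    - name: app
      image: registry.example.com/app:1.0 # hypothetical
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]
```

The grace period covers the preStop hook plus the time after SIGTERM; a preStop hook that runs long eats into the time the process has to shut down cleanly.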

3. Volumes/network/sandbox cleanup

After processes exit, kubelet/runtime cleans up:

  • containers
  • sandbox
  • mounts
  • network attachment state
  • cgroup structures
  • logs subject to retention/runtime behavior

Static Pods and mirror Pods

Static Pods are managed directly by kubelet from local files, not from the normal API-driven scheduling flow.

They are often used for control-plane components in certain cluster setups.

Important distinction:

  • kubelet runs them because local config says so
  • API server may show mirror objects, but the API is not the source of truth for those Pods
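
The kubelet discovers static Pods via its configuration, sketched here as a KubeletConfiguration fragment (the path shown is the common kubeadm default, an assumption for other setups):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests   # kubelet runs any Pod manifest dropped here
```

Deleting the mirror Pod through the API does nothing lasting; the kubelet recreates it as long as the file exists.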

Debugging workflow

Step 1 - find where lifecycle stopped

Ask:

  • rejected by API?
  • unscheduled?
  • stuck at image pull?
  • stuck at CNI?
  • init container failed?
  • app container failed?
  • probes failing?
  • evicted?

Step 2 - inspect events first

Events often expose the stage of failure fastest.

Step 3 - inspect container states and reasons

Do not stop at Pending or CrashLoopBackOff.

Step 4 - separate control plane vs node-local failure

  • Scheduler problem?
  • Admission/policy problem?
  • Kubelet/CRI problem?
  • CNI problem?
  • CSI/storage problem?
  • app problem?

Common production failure patterns

Pod stuck in Pending

Usually one of:

  • no schedulable nodes
  • requests too large
  • taints
  • affinity impossible
  • PVC not bindable
  • image pull secret/policy issues in some cases after scheduling

Pod stuck in ContainerCreating

Usually node-local setup issue:

  • CNI failed
  • volume mount failed
  • runtime slow/broken
  • image pull waiting

ImagePullBackOff

  • wrong image name
  • auth failure
  • registry unavailable
  • tag missing
  • network/DNS problem

CrashLoopBackOff

  • app crashes immediately
  • probe kills app
  • OOM
  • config/secret wrong
  • command/args wrong

Pod Running but not Ready

  • readiness probe failing
  • app listening on wrong port/interface
  • sidecar/init dependency not satisfied
  • readiness gate condition not met

Pod terminates slowly

  • app ignores SIGTERM
  • long preStop hook
  • stuck IO
  • finalizers blocking object removal, or confusing controller behavior
  • volume detach or node issues prolong cleanup

Interview angles

Good questions hidden here:

  • difference between Pending and Running
  • what kubelet does after scheduling
  • what the pause container is
  • difference between liveness and readiness
  • what CrashLoopBackOff actually means
  • how a Pod gets an IP
  • what happens on Pod deletion
  • what init containers are for
  • why a Pod can be Running but still not serve traffic

Strong answers explain the control-plane handoff to the kubelet and the sandbox/network/init/app/probe sequence.


Mental model to keep

A Pod lifecycle is a relay race:

  1. API server stores desired state
  2. scheduler picks a node
  3. kubelet on that node reconciles the Pod
  4. runtime creates sandbox and containers
  5. CNI and storage wire up dependencies
  6. probes decide traffic eligibility
  7. kubelet keeps reconciling until termination
  8. graceful shutdown and cleanup occur

If the Pod fails, ask which runner dropped the baton.

