Portal | Level: L2: Operations | Topics: Kubernetes Core | Domain: Kubernetes
Kubernetes Pod Lifecycle¶
Scope¶
This document explains what really happens from "I applied YAML" to "my Pod is running" and then to termination. It covers:
- scheduling
- admission
- API objects
- kubelet behavior
- image pulls
- sandbox creation
- init containers
- probes
- restarts
- termination
- failure patterns
This is the pod-centric view of Kubernetes internals.
Big picture¶
A Pod is the smallest deployable workload unit in Kubernetes, but it is not "just a container." It is a scheduling and runtime envelope for one or more containers that share some resources and lifecycle rules.
End-to-end flow¶
```text
kubectl apply / controller creates Pod spec
  -> API server stores object in etcd
  -> scheduler assigns a node
  -> kubelet on that node notices assigned Pod
  -> kubelet asks runtime to create pod sandbox
  -> networking for Pod is set up
  -> volumes are prepared
  -> init containers run
  -> app containers start
  -> readiness gates traffic
  -> probes monitor health
  -> Pod runs, restarts containers as needed
  -> termination begins when deleted / evicted / completed
  -> graceful shutdown
  -> resources cleaned up
```
```mermaid
flowchart TD
    User[kubectl apply] --> API[API Server]
    API --> etcd[(etcd)]
    API --> Sched[Scheduler]
    Sched -->|assigns node| KL[Kubelet]
    KL --> Sandbox[Create Pod Sandbox]
    Sandbox --> CNI[CNI Network Setup]
    Sandbox --> Vols[Volume Mounts]
    CNI --> Init[Init Containers]
    Vols --> Init
    Init -->|sequential, must pass| App[App Containers]
    App --> Running[Running Pod]
    Running -.->|startup probe| SP[Startup Probe]
    Running -.->|liveness| LP[Liveness Probe]
    Running -.->|readiness| RP[Readiness Probe]
    LP -.->|fail| Restart[Container Restart]
    RP -.->|pass| Endpoints[Added to Endpoints]
    KL --- CRI[Container Runtime]
    CRI --> Sandbox
```
Pod fundamentals¶
A Pod contains one or more containers that share:
- network namespace
- IP address
- port space
- some storage volumes
- pod-level metadata and policy
Containers in the same Pod are intentionally coupled. If you need independent scaling or lifecycle, they should usually not be in the same Pod.
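A minimal sketch of that coupling (names and images are illustrative): two containers in one Pod can reach each other over localhost because they share the Pod's network namespace.

```yaml
# Illustrative two-container Pod: both containers share one network
# namespace, so the sidecar reaches the app over localhost directly.
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo        # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25          # listens on port 80 inside the shared namespace
    - name: sidecar
      image: curlimages/curl:8.8.0
      # Polls the app via localhost -- no Service or Pod IP lookup needed,
      # because both containers live in the same network namespace.
      command: ["sh", "-c", "while true; do curl -s localhost:80 >/dev/null; sleep 5; done"]
```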
Object creation path¶
1. Pod spec submitted¶
The Pod spec may come from:
- direct Pod manifest
- Deployment -> ReplicaSet -> Pod
- StatefulSet
- DaemonSet
- Job / CronJob
- custom controllers
Most real Pods come from controllers, not from you hand-creating naked Pods forever like a caveman.
2. API server validation and admission¶
The API server:
- authenticates request
- authorizes action
- validates schema
- runs admission control / mutation / policy
- persists desired state into etcd
Possible mutations here:
- default values inserted
- sidecars injected
- security settings modified
- labels/annotations added
- image policy enforced
Important consequence: the Pod you wrote and the Pod actually stored may differ.
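A quick way to see the difference (file and pod names illustrative):

```bash
# What you wrote vs. what the API server actually persisted
# (defaults filled in, sidecars injected, labels added, etc.).
kubectl get pod my-app -o yaml > stored.yaml   # pod name illustrative
diff my-app.yaml stored.yaml                   # expect added defaults/mutations
```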
Scheduling¶
3. Pod is Pending and unscheduled¶
Initially the Pod usually has no .spec.nodeName. It sits in Pending state waiting for the scheduler.
4. Scheduler evaluates the Pod¶
The scheduler filters nodes based on hard constraints, then scores candidates.
Typical constraints:
- resource requests
- taints/tolerations
- node selectors
- node affinity
- pod affinity / anti-affinity
- topology spread constraints
- volume topology constraints
- special runtime class constraints
Key reality¶
Scheduling is based mostly on requested resources and policy, not actual runtime usage.
If requests are wrong, scheduling decisions are wrong.
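For example, the scheduler places the Pod below based on the declared requests alone, not on what the process actually consumes (values illustrative):

```yaml
# Requests drive scheduling; limits are enforced at runtime by the
# kubelet/runtime, not at scheduling time.
apiVersion: v1
kind: Pod
metadata:
  name: sized-pod                # illustrative
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"            # scheduler reserves this much on the chosen node
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"        # exceeding this risks an OOM kill
```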
5. Node assignment¶
The scheduler binds the Pod to a node by setting .spec.nodeName.
At this point the Pod is still not running. It is merely assigned.
Kubelet takes over¶
6. Kubelet watches for assigned Pods¶
The kubelet on the chosen node sees the Pod assignment and begins reconciliation. Kubelet's job is basically:
- observe desired Pod state
- observe actual local runtime state
- do local work until they match
Kubelet is the node-local truth enforcer.
7. Kubelet prepares Pod environment¶
This includes:
- pulling secrets/config needed locally
- preparing volumes
- calculating sandbox configuration
- consulting CNI for networking
- talking to CRI runtime
Pod sandbox creation¶
8. Runtime creates sandbox¶
For CRI runtimes, the first step is often creation of a Pod sandbox. Think of the sandbox as the pod-level environment:
- network namespace
- shared Linux namespaces as configured
- some cgroup structure
- basic infra / pause container pattern in some implementations
Why sandbox exists¶
All containers in a Pod need to share pod-level resources. The sandbox anchors that shared environment.
Pause container¶
In many implementations, a tiny "pause" container holds the shared namespaces alive. The app containers then join them.
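On a node running containerd or CRI-O, you can often see this layer directly, assuming crictl is installed and configured for the node's runtime socket:

```bash
# Sandboxes (one per Pod) vs. the containers running inside them.
crictl pods    # lists Pod sandboxes known to the CRI runtime
crictl ps      # lists containers; each maps back to a sandbox ID
```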
Networking¶
9. CNI plugin sets up Pod network¶
The kubelet asks the runtime, which invokes CNI plugin logic to:
- create/attach interfaces
- assign Pod IP
- add routes
- configure veth pair / bridge / overlay / ENI / whatever the environment uses
- set DNS config
Why this matters:
- Pod startup can fail before your app even begins if networking setup fails
- many "container failed to start" issues are actually CNI failures
Volumes¶
10. Volume setup¶
The kubelet prepares declared volumes:
- emptyDir
- ConfigMaps
- Secrets
- projected volumes
- PersistentVolumeClaim-backed storage
- CSI volumes
- hostPath
- tmpfs-backed secret/config storage under the hood, where applicable
Volume mount preparation happens before containers that depend on those mounts start.
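A sketch of that ordering guarantee (names illustrative): the kubelet materializes both volumes below before the container's entrypoint runs, so the mounts are already visible at startup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo              # illustrative
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: scratch
          mountPath: /scratch    # emptyDir, wiped when the Pod goes away
        - name: app-config
          mountPath: /etc/app    # populated from the ConfigMap before start
  volumes:
    - name: scratch
      emptyDir: {}
    - name: app-config
      configMap:
        name: app-config         # assumed to exist in the namespace
```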
Image pull and container creation¶
11. Init containers start first¶
If init containers exist, they run sequentially. Each must complete successfully before the next starts and before app containers start.
Use them for:
- one-time setup
- migrations
- dependency checks
- asset preparation
Do not use them as a dumping ground for random startup sins.
Failure pattern¶
If an init container keeps failing, the Pod never proceeds to app containers. It remains stuck in a not-fully-initialized state.
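A common dependency-check sketch (image and address illustrative): the app container below does not start until the init container exits 0, and if it never does, the Pod stays stuck exactly as described above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo                # illustrative
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Blocks until the database answers. If this keeps failing, the
      # Pod never reaches its app containers (the failure pattern above).
      command: ["sh", "-c", "until nc -z db.example.svc 5432; do sleep 2; done"]
  containers:
    - name: app
      image: nginx:1.25
```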
12. App container creation¶
For each app container, kubelet/runtime does roughly:
- pull image if needed
- create container with pod namespace attachments
- mount volumes
- set environment
- apply resource settings
- start process
The Pod may now move toward Running, but traffic should not necessarily be sent yet.
Pod phases vs container states¶
People often confuse Pod phase with detailed runtime state.
Pod phase¶
High-level values include:
- Pending
- Running
- Succeeded
- Failed
- Unknown
This is a broad summary, not a precise state machine for every internal transition.
Container states¶
Each container has more detailed states such as:
- Waiting
- Running
- Terminated
Those include reasons like:
- ImagePullBackOff
- CrashLoopBackOff
- ContainerCreating
- OOMKilled
- Error
- Completed
Always inspect container state details, not just Pod phase.
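Getting past the phase summary looks like this (pod name illustrative):

```bash
# Phase is the broad summary; containerStatuses carry the real reasons.
kubectl get pod my-app -o jsonpath='{.status.phase}'
kubectl get pod my-app \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'
kubectl describe pod my-app    # human-readable states, reasons, and events
```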
Readiness, liveness, and startup¶
Readiness probe¶
Determines whether the Pod should receive traffic through Services/endpoints.
A Pod can be running but not ready.
That distinction is critical.
Liveness probe¶
Determines whether kubelet should restart a container that appears unhealthy.
Use it carefully. Bad liveness probes are self-inflicted denial-of-service.
Startup probe¶
Gives slow-starting apps more time before liveness/readiness logic begins punishing them.
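A container spec sketch combining all three (paths, ports, and timings illustrative): the startup probe holds off the other two until the app has had up to five minutes to boot.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                         # illustrative
spec:
  containers:
    - name: app
      image: my-app:1.0                    # illustrative
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet: {path: /healthz, port: 8080}
        periodSeconds: 10
        failureThreshold: 30               # 30 x 10s = up to 5 minutes to start
      livenessProbe:
        httpGet: {path: /healthz, port: 8080}
        periodSeconds: 10                  # failing this restarts the container
      readinessProbe:
        httpGet: {path: /ready, port: 8080}
        periodSeconds: 5                   # failing this only pulls it from endpoints
```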
Common anti-pattern¶
People use liveness to detect dependency failures too aggressively, then their app restarts endlessly instead of stabilizing.
Restarts and crash loops¶
Kubelet may restart containers in a Pod depending on restart policy and workload type.
Restart policies¶
The Pod-level restartPolicy field (Always, OnFailure, or Never) combines with the workload type, so common behavior differs for:
- naked Pods
- Jobs
- higher-level controllers
For most long-running app Pods managed by controllers, repeated app-container failure results in restart attempts and eventually CrashLoopBackOff.
CrashLoopBackOff¶
This is not a root cause. It is the symptom that:
- container starts
- fails
- restarts
- backoff increases
Root causes are usually:
- bad config
- missing secret
- migration failed
- permission problem
- wrong command
- dependency unavailable
- OOM kill
- probe misconfiguration
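To work back from the symptom to one of these causes, the crashed attempt's logs are usually the fastest route (pod name illustrative):

```bash
kubectl logs my-app --previous    # output of the crashed attempt, not the new one
kubectl describe pod my-app       # exit code, reason (e.g. OOMKilled), restart count
kubectl get pod my-app -o jsonpath='{.status.containerStatuses[0].lastState}'
```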
Pod readiness in service routing¶
When a Pod becomes Ready, endpoint information is updated so Services can send traffic to it.
That means there is a control-plane + kube-proxy/CNI propagation path between "container looks healthy" and "network traffic now reaches it."
This is why readiness changes are not pure local process facts. They affect routing.
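You can watch that propagation directly (service name and label illustrative): a Pod that is Running but not Ready simply never shows up in the endpoints list.

```bash
# Only Ready Pods' IPs appear in the endpoints object.
kubectl get endpoints my-service -o wide
# Compare with the Pods' READY column: Running != Ready.
kubectl get pods -l app=my-app -o wide
```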
Eviction, preemption, and disruption¶
Pods do not leave only because you deleted them.
They can also be removed due to:
- node pressure eviction
- node drain
- preemption by higher-priority workloads
- taint-based eviction
- controller rollout replacement
- pod disruption budget interactions
- underlying node failure
Eviction due to pressure¶
Common causes:
- memory pressure
- disk pressure
- PID pressure
Kubernetes is not sentimental. Under pressure it will kill your stuff.
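The pressure signals live on the Node object, so that is where to look (node name illustrative):

```bash
# MemoryPressure / DiskPressure / PIDPressure conditions drive eviction.
kubectl describe node worker-1 | grep -A 8 "Conditions:"
kubectl get events --field-selector reason=Evicted    # Pods killed under pressure
```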
Termination path¶
1. Deletion requested¶
A Pod receives a deletion timestamp.
2. Grace period starts¶
Kubelet begins graceful shutdown, usually:
- remove the Pod from the endpoints/readiness path
- run preStop hooks where configured
- send SIGTERM to containers
- wait up to the termination grace period
- send SIGKILL to any processes that remain
Important point¶
A deleted Pod object does not mean the process vanished instantly. There is a termination dance.
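A sketch of tuning that dance (values illustrative): the preStop sleep gives endpoint removal a head start, and the grace period bounds how long SIGTERM handling may take before SIGKILL.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo                    # illustrative
spec:
  terminationGracePeriodSeconds: 45      # total budget before SIGKILL (default is 30)
  containers:
    - name: app
      image: my-app:1.0                  # illustrative
      lifecycle:
        preStop:
          exec:
            # Short pause so endpoint removal can propagate before the
            # process gets SIGTERM and stops accepting new connections.
            command: ["sh", "-c", "sleep 5"]
```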
3. Volumes/network/sandbox cleanup¶
After processes exit, kubelet/runtime cleans up:
- containers
- sandbox
- mounts
- network attachment state
- cgroup structures
- logs (subject to retention and runtime behavior)
Static Pods and mirror Pods¶
Static Pods are managed directly by kubelet from local files, not from the normal API-driven scheduling flow.
They are often used for control-plane components in certain cluster setups.
Important distinction:
- kubelet runs them because local config says so
- API server may show mirror objects, but the API is not the source of truth for those Pods
Debugging workflow¶
Step 1 - find where lifecycle stopped¶
Ask:
- rejected by API?
- unscheduled?
- stuck at image pull?
- stuck at CNI?
- init container failed?
- app container failed?
- probes failing?
- evicted?
Step 2 - inspect events first¶
Events often expose the stage of failure fastest.
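Fast ways to pull them (pod name illustrative):

```bash
kubectl describe pod my-app    # events appear at the bottom of the output
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.name=my-app
```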
Step 3 - inspect container states and reasons¶
Do not stop at Pending or CrashLoopBackOff.
Step 4 - separate control plane vs node-local failure¶
- Scheduler problem?
- Admission/policy problem?
- Kubelet/CRI problem?
- CNI problem?
- CSI/storage problem?
- app problem?
Common production failure patterns¶
Pod stuck in Pending¶
Usually one of:
- no schedulable nodes
- requests too large
- taints
- affinity impossible
- PVC not bindable
- image pull secret/policy issues in some cases after scheduling
Pod stuck in ContainerCreating¶
Usually node-local setup issue:
- CNI failed
- volume mount failed
- runtime slow/broken
- image pull waiting
ImagePullBackOff¶
- wrong image name
- auth failure
- registry unavailable
- tag missing
- network/DNS problem
CrashLoopBackOff¶
- app crashes immediately
- probe kills app
- OOM
- config/secret wrong
- command/args wrong
Pod Running but not Ready¶
- readiness probe failing
- app listening on wrong port/interface
- sidecar/init dependency not satisfied
- readiness gate condition not met
Pod terminates slowly¶
- app ignores SIGTERM
- long preStop hook
- stuck IO
- finalizers or controller behavior confusion
- volume detach or node issues prolong cleanup
Interview angles¶
Good questions hidden here:
- difference between Pending and Running
- what kubelet does after scheduling
- what the pause container is
- difference between liveness and readiness
- what CrashLoopBackOff actually means
- how a Pod gets an IP
- what happens on Pod deletion
- what init containers are for
- why a Pod can be Running but still not serve traffic
Strong answers explain the control-plane handoff to the kubelet and the sandbox/network/init/app/probe sequence.
Mental model to keep¶
A Pod lifecycle is a relay race:
- API server stores desired state
- scheduler picks a node
- kubelet on that node reconciles the Pod
- runtime creates sandbox and containers
- CNI and storage wire up dependencies
- probes decide traffic eligibility
- kubelet keeps reconciling until termination
- graceful shutdown and cleanup occur
If the Pod fails, ask which runner dropped the baton.
References¶
- Kubernetes Pod lifecycle
- Pods
- Container lifecycle hooks
- Resource management for Pods and containers
- Pod security standards
Practice¶
- Topic primer: K8s Ops
- Drills: kubectl Drills
- Skillcheck: Kubernetes Under the Covers
- Runbooks: CrashLoopBackOff, OOMKilled
Wiki Navigation¶
Prerequisites¶
- Kubernetes Ops (Production) (Topic Pack, L2)
Related Content¶
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Kubernetes Core
- Case Study: Alert Storm — Flapping Health Checks (Case Study, L2) — Kubernetes Core
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured (Case Study, L2) — Kubernetes Core
- Case Study: CrashLoopBackOff No Logs (Case Study, L1) — Kubernetes Core
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager (Case Study, L2) — Kubernetes Core
- Case Study: DaemonSet Blocks Eviction (Case Study, L2) — Kubernetes Core
- Case Study: Deployment Stuck — ImagePull Auth Failure, Vault Secret Rotation (Case Study, L2) — Kubernetes Core
- Case Study: Drain Blocked by PDB (Case Study, L2) — Kubernetes Core
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP (Case Study, L2) — Kubernetes Core
- Case Study: ImagePullBackOff Registry Auth (Case Study, L1) — Kubernetes Core
Pages that link here¶
- Chaos Engineering & Fault Injection
- Kubernetes Under the Covers
- Practical Kubernetes Ops - Street Ops
- Primer
- Runbook: OOMKilled Container
- Runbook: Pod CrashLoopBackOff
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning