What Happens When Kubernetes Evicts Your Pod
- lesson
- node-pressure
- eviction-signals
- qos-classes
- graceful-termination
- rescheduling
- l2
Topics: node pressure, eviction signals, QoS classes, graceful termination, rescheduling
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: Basic Kubernetes understanding
## The Mission
Your pod is healthy. It's passing readiness probes. It has plenty of memory within its limits. Then suddenly:
The pod was evicted — not because it did anything wrong, but because the node was under pressure. Kubernetes decided your pod was the one to sacrifice.
## Why Kubernetes Evicts Pods
The kubelet monitors node resources. When resources cross a threshold, it starts evicting pods to protect the node:
| Signal | Eviction threshold | What it measures |
|---|---|---|
| `memory.available` | < 100Mi (default) | Free RAM on the node |
| `nodefs.available` | < 10% | Free disk on the node's root filesystem |
| `nodefs.inodesFree` | < 5% | Free inodes |
| `imagefs.available` | < 15% | Free disk for container images |
| `pid.available` | < 100 | Available PIDs |
```bash
# Check node conditions
kubectl describe node mynode | grep -A5 Conditions
# → MemoryPressure  False
# → DiskPressure    False
# → PIDPressure     False
# → Ready           True
# If any pressure is True, evictions are happening or imminent
```
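The kubelet's decision here boils down to comparing each observed signal against its threshold. A minimal Python sketch of that logic (not kubelet code — threshold values mirror the defaults in the table above, and the data is illustrative):

```python
# Illustrative sketch of kubelet-style eviction-signal checks.
# Thresholds mirror the defaults listed in the table above.

DEFAULT_THRESHOLDS = {
    "memory.available": 100 * 1024 * 1024,  # < 100Mi free RAM (bytes)
    "nodefs.available": 0.10,               # < 10% free root disk (fraction)
    "nodefs.inodesFree": 0.05,              # < 5% free inodes (fraction)
    "imagefs.available": 0.15,              # < 15% free image disk (fraction)
}

def breached_signals(observed: dict) -> list:
    """Return the eviction signals whose observed value is below threshold."""
    return sorted(
        name for name, threshold in DEFAULT_THRESHOLDS.items()
        if observed.get(name, float("inf")) < threshold
    )

# A node with 80Mi free RAM and 7% free root disk trips two signals:
observed = {
    "memory.available": 80 * 1024 * 1024,
    "nodefs.available": 0.07,
    "nodefs.inodesFree": 0.20,
    "imagefs.available": 0.40,
}
print(breached_signals(observed))
# → ['memory.available', 'nodefs.available']
```

Once any signal is breached, the corresponding node condition (MemoryPressure, DiskPressure, PIDPressure) flips to True and eviction begins.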
## The Eviction Order
When the kubelet decides to evict, it doesn't pick randomly. It follows this priority:
1. BestEffort pods (no resource requests or limits) — killed first
2. Burstable pods (requests < limits) — sorted by how far over their request they are
3. Guaranteed pods (requests == limits) — killed last
Within each QoS class, pods using the most resources above their request are evicted first.
```yaml
# BestEffort — no resources defined, killed first
containers:
  - name: myapp
    # No resources section at all
```

```yaml
# Burstable — requests < limits, killed second
containers:
  - name: myapp
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi
```

```yaml
# Guaranteed — requests == limits, killed last
containers:
  - name: myapp
    resources:
      requests:
        memory: 512Mi
        cpu: 500m
      limits:
        memory: 512Mi
        cpu: 500m
```
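The ordering described above can be expressed as a sort key. This is a simplified model of what the text describes — the real kubelet ranking also factors in pod priority, and QoS classification also considers CPU — so treat it as an illustration, not kubelet code:

```python
# Simplified sketch of eviction ordering per the QoS rules above.
# Real kubelet ranking also weighs pod priority; this is illustrative.

QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}  # lower = evicted sooner

def qos_class(requests: int, limits: int) -> str:
    if requests == 0 and limits == 0:
        return "BestEffort"
    if requests == limits:
        return "Guaranteed"
    return "Burstable"

def eviction_order(pods: list) -> list:
    """Sort pods so the first element is the first eviction candidate."""
    def key(pod):
        name, usage, requests, limits = pod
        overage = usage - requests  # how far above its request the pod is
        return (QOS_RANK[qos_class(requests, limits)], -overage)
    return [pod[0] for pod in sorted(pods, key=key)]

# (name, memory usage Mi, request Mi, limit Mi) — hypothetical workloads
pods = [
    ("guaranteed-db", 500, 512, 512),
    ("burstable-api", 400, 256, 512),  # 144Mi over its request
    ("burstable-web", 300, 256, 512),  # 44Mi over its request
    ("besteffort-job", 50, 0, 0),      # no requests/limits at all
]
print(eviction_order(pods))
# → ['besteffort-job', 'burstable-api', 'burstable-web', 'guaranteed-db']
```

Note that the BestEffort job dies first even though it uses the least memory — QoS class dominates raw usage.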
Gotcha: BestEffort pods get `oom_score_adj: 1000`, so the kernel's OOM killer also targets them first. If a cluster under memory pressure contains any pods without resource requests/limits, those pods die first. Always set resources.
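For the curious, the kubelet derives `oom_score_adj` from the QoS class roughly as follows. This is a sketch of the formula as I understand it (Guaranteed pods get -997, BestEffort 1000, Burstable scales inversely with the memory request); treat the exact clamping as approximate:

```python
# Approximate sketch of per-QoS oom_score_adj assignment.
# Exact kubelet clamping may differ; this is illustrative.

GUARANTEED_ADJ = -997
BESTEFFORT_ADJ = 1000

def oom_score_adj(qos: str, memory_request: int = 0, node_capacity: int = 1) -> int:
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "BestEffort":
        return BESTEFFORT_ADJ
    # Burstable: the larger the memory request relative to the node,
    # the lower the score (less likely to be OOM-killed).
    adj = 1000 - (1000 * memory_request) // node_capacity
    return max(3, min(adj, 999))  # clamp away from both extremes

gib = 1024 ** 3
print(oom_score_adj("BestEffort"))                    # → 1000
print(oom_score_adj("Burstable", 4 * gib, 16 * gib))  # → 750
print(oom_score_adj("Guaranteed"))                    # → -997
```

The practical takeaway: the bigger your memory request, the safer your process is from the kernel OOM killer as well as from kubelet eviction.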
## Graceful Eviction Sequence
1. kubelet detects node pressure (e.g., memory.available < 100Mi)
2. kubelet selects pods for eviction (QoS priority order)
3. Pod status set to "Failed" with reason "Evicted"
4. preStop hook runs (if configured)
5. Pod receives SIGTERM
6. terminationGracePeriodSeconds countdown (default 30s) — but hard eviction thresholds (the defaults above) kill pods immediately with no grace period, and soft thresholds cap the grace at the kubelet's --eviction-max-pod-grace-period
7. If the pod hasn't exited: SIGKILL
8. Controller (e.g., the Deployment) notices the pod is gone
9. Controller creates a replacement pod
10. Scheduler places the replacement on a node with capacity
Gotcha: Evicted pods are NOT rescheduled on the same node (it's under pressure). The replacement goes to a different node. But if ALL nodes are under pressure, the replacement pod stays Pending indefinitely.
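If your app needs time to drain connections on shutdown, make the grace period and preStop hook explicit rather than relying on defaults. A hypothetical pod spec fragment:

```yaml
# Hypothetical fragment: give the app time to drain before SIGKILL.
spec:
  terminationGracePeriodSeconds: 60   # default is 30s
  containers:
    - name: myapp
      lifecycle:
        preStop:
          exec:
            # e.g. stop accepting new work, let in-flight requests finish
            command: ["sh", "-c", "sleep 10"]
```

Keep in mind that when a hard eviction threshold is crossed, the kubelet kills pods without honoring the full grace period — preStop hooks mostly help with soft evictions, node drains, and rollouts.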
## Ephemeral Storage Eviction
The most confusing eviction: your pod isn't using much memory or CPU, but it's evicted for ephemeral-storage, which includes:

```
Ephemeral storage = container writable layer
                  + emptyDir volumes
                  + container log files (/var/log/containers/)
                  + ANY writes inside the container
```
A container that writes large temp files, generates big logs, or has a build cache filling up can trigger ephemeral storage eviction.
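A back-of-the-envelope check using the breakdown above (all numbers hypothetical — in a real cluster, `kubectl describe pod` events report the actual offending usage):

```python
# Illustrative accounting of ephemeral-storage usage per the breakdown above.
# Numbers are hypothetical.

MI = 1024 * 1024

def ephemeral_usage(writable_layer: int, empty_dirs: int, logs: int) -> int:
    """Everything the kubelet charges against the pod's ephemeral-storage limit."""
    return writable_layer + empty_dirs + logs

limit = 2048 * MI  # a 2Gi ephemeral-storage limit
usage = ephemeral_usage(
    writable_layer=300 * MI,   # temp files written inside the container
    empty_dirs=1500 * MI,      # a build cache in an emptyDir volume
    logs=400 * MI,             # stdout/stderr captured by the node
)
print(usage > limit)  # → True: the pod is an eviction candidate
```

Notice the pod's own temp files are the smallest component here — the emptyDir cache and the logs, which teams often forget to count, push it over the limit.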
```yaml
# Set ephemeral storage limits
containers:
  - name: myapp
    resources:
      requests:
        ephemeral-storage: 1Gi
      limits:
        ephemeral-storage: 2Gi
```
## Debugging Evictions
```bash
# Find evicted pods
kubectl get pods --field-selector status.phase=Failed | grep Evicted

# See why a pod was evicted
kubectl describe pod <evicted-pod-name>
# → Status:  Failed
# → Reason:  Evicted
# → Message: The node was low on resource: memory.
# →          Container myapp was using 780Mi, request is 512Mi.

# Check node pressure events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i evict

# Check node resource allocation
kubectl describe node <node-name> | grep -A20 "Allocated resources"
# →           Requests      Limits
# → cpu       3200m (80%)   6400m (160%)
# → memory    12Gi (75%)    24Gi (150%)
# ↑ 150% memory limits = node is overcommitted
```
Gotcha: Limits can exceed node capacity (overcommit). Kubernetes schedules based on requests, not limits. If every pod on a node bursts to its limit simultaneously, total usage exceeds node capacity → evictions. This is by design (efficient utilization) but catches teams who set high limits "just in case."
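The overcommit arithmetic is worth seeing directly. A sketch with hypothetical allocatable and pod figures (matching the percentages in the describe output above):

```python
# Illustrative overcommit check: Kubernetes schedules on requests,
# so total limits can legally exceed what the node actually has.

GI = 1024 ** 3
node_allocatable = 16 * GI

# (pod name, memory request, memory limit) — hypothetical workloads
pods = [
    ("api", 2 * GI, 6 * GI),
    ("worker", 4 * GI, 8 * GI),
    ("cache", 6 * GI, 10 * GI),
]

total_requests = sum(req for _, req, _ in pods)
total_limits = sum(lim for _, _, lim in pods)

print(f"requests: {total_requests / node_allocatable:.0%} of allocatable")  # → 75%
print(f"limits:   {total_limits / node_allocatable:.0%} of allocatable")    # → 150%
# Scheduling succeeds (requests fit), but if all three pods burst to
# their limits simultaneously, usage exceeds the node → evictions.
```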
## Preventing Evictions
```yaml
# 1. Set appropriate requests (based on actual usage, not guesses)
resources:
  requests:
    memory: 400Mi   # 95th percentile of actual usage
    cpu: 200m
  limits:
    memory: 600Mi   # 1.5x request for burst headroom
    cpu: 500m
```
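Where does a number like "95th percentile of actual usage" come from? One approach: pull memory samples from your metrics system (Prometheus or similar — the samples below are hard-coded stand-ins) and compute the percentile:

```python
# Illustrative: derive a memory request from observed usage samples.
# In practice the samples come from your metrics system; here they
# are hypothetical hard-coded values in Mi.

def p95(samples):
    """95th percentile by nearest-rank on sorted samples."""
    ordered = sorted(samples)
    rank = max(0, round(0.95 * len(ordered)) - 1)
    return ordered[rank]

usage_mi = [310, 295, 350, 420, 380, 305, 390, 360, 330, 345]
request = p95(usage_mi)
print(f"memory request: {request}Mi, limit: {int(request * 1.5)}Mi")
# → memory request: 420Mi, limit: 630Mi
```

With only a handful of samples the nearest-rank p95 is effectively the max; over days of real data it smooths out to a usable request.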
```yaml
# 2. Use a PodDisruptionBudget to prevent draining too many at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2   # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: myapp
```
```yaml
# 3. Use a PriorityClass for critical workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Critical workloads that should not be evicted"
```
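A PriorityClass does nothing until pods reference it. A hypothetical Deployment fragment attaching the class above:

```yaml
# Hypothetical fragment referencing the "critical" PriorityClass.
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
        - name: myapp
          image: myapp:latest   # placeholder image
```

Pod priority influences both scheduler preemption and, as a tie-breaker, the kubelet's eviction ranking — but it does not override QoS basics, so critical pods still need proper requests and limits.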
## Flashcard Check
Q1: Node has MemoryPressure=True. Which pod dies first?
BestEffort (no resource requests/limits). Then Burstable pods using most above their request. Guaranteed pods die last.
Q2: Pod evicted for "ephemeral-storage." What counts as ephemeral?
Container writable layer + emptyDir volumes + container logs + any writes inside the container. Set ephemeral-storage limits to prevent surprise evictions.
Q3: All nodes are under pressure. What happens to evicted pods?
The replacement pod stays Pending indefinitely — no node has capacity to schedule it. You need to either add nodes or reduce workload.
## Cheat Sheet
| Task | Command |
|---|---|
| Find evicted pods | kubectl get pods --field-selector status.phase=Failed |
| Node pressure | kubectl describe node NODE \| grep Pressure |
| Node allocation | kubectl describe node NODE \| grep -A20 "Allocated" |
| Clean up evicted pods | kubectl delete pods --field-selector status.phase=Failed |
| Check QoS class | kubectl get pod POD -o jsonpath='{.status.qosClass}' |
## Takeaways
- Always set resource requests. Pods without them are BestEffort and die first during node pressure. Set requests based on actual usage, not guesses.
- Eviction follows QoS priority. BestEffort → Burstable → Guaranteed. Set requests == limits for critical pods to get Guaranteed QoS.
- Ephemeral storage is sneaky. Container logs, temp files, and the writable layer all count. Set ephemeral-storage limits or get surprise evictions.
- Overcommit is by design. Limits can exceed node capacity; Kubernetes schedules by requests. If everyone bursts simultaneously, evictions happen.
## Related Lessons
- Out of Memory — when cgroup OOM killer fires before kubelet evicts
- The Disk That Filled Up — ephemeral storage and node disk pressure
- What Happens When You kubectl apply — the scheduler that places replacement pods