What Happens When Kubernetes Evicts Your Pod
- lesson
- node-pressure
- eviction-signals
- qos-classes
- graceful-termination
- rescheduling
- l2
Topics: node pressure, eviction signals, QoS classes, graceful termination, rescheduling
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: Basic Kubernetes understanding
## The Mission
Your pod is healthy. It's passing readiness probes. It has plenty of memory within its limits. Then suddenly:
The pod was evicted — not because it did anything wrong, but because the node was under pressure. Kubernetes decided your pod was the one to sacrifice.
## Why Kubernetes Evicts Pods
The kubelet monitors node resources. When resources cross a threshold, it starts evicting pods to protect the node:
| Signal | Eviction threshold | What it measures |
|---|---|---|
| `memory.available` | < 100Mi (default) | Free RAM on the node |
| `nodefs.available` | < 10% | Free disk on the node's root filesystem |
| `nodefs.inodesFree` | < 5% | Free inodes |
| `imagefs.available` | < 15% | Free disk for container images |
| `pid.available` | < 100 | Available PIDs |
```bash
# Check node conditions
kubectl describe node mynode | grep -A5 Conditions
# → MemoryPressure  False
# → DiskPressure    False
# → PIDPressure     False
# → Ready           True
# If any pressure is True, evictions are happening or imminent
```
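The kubelet's decision here boils down to comparing each observed signal against its threshold. A minimal Python sketch of that logic (not kubelet code — threshold values mirror the defaults in the table above, and the data is illustrative):

```python
# Illustrative sketch of kubelet-style eviction-signal checks.
# Thresholds mirror the defaults listed in the table above.

DEFAULT_THRESHOLDS = {
    "memory.available": 100 * 1024 * 1024,  # < 100Mi free RAM (bytes)
    "nodefs.available": 0.10,               # < 10% free root disk (fraction)
    "nodefs.inodesFree": 0.05,              # < 5% free inodes (fraction)
    "imagefs.available": 0.15,              # < 15% free image disk (fraction)
}

def breached_signals(observed: dict) -> list:
    """Return the eviction signals whose observed value is below threshold."""
    return sorted(
        name for name, threshold in DEFAULT_THRESHOLDS.items()
        if observed.get(name, float("inf")) < threshold
    )

# A node with 80Mi free RAM and 7% free root disk trips two signals:
observed = {
    "memory.available": 80 * 1024 * 1024,
    "nodefs.available": 0.07,
    "nodefs.inodesFree": 0.20,
    "imagefs.available": 0.40,
}
print(breached_signals(observed))
# → ['memory.available', 'nodefs.available']
```

Once any signal is breached, the corresponding node condition (MemoryPressure, DiskPressure, PIDPressure) flips to True and eviction begins.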
## The Eviction Order
When the kubelet decides to evict, it doesn't pick randomly. It follows this priority:
1. BestEffort pods (no resource requests or limits) — killed first
2. Burstable pods (requests < limits) — sorted by how far over their request they are
3. Guaranteed pods (requests == limits) — killed last
Within each QoS class, pods using the most resources above their request are evicted first.
```yaml
# BestEffort — no resources defined, killed first
containers:
  - name: myapp
    # No resources section at all
```

```yaml
# Burstable — requests < limits, killed second
containers:
  - name: myapp
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi
```

```yaml
# Guaranteed — requests == limits, killed last
containers:
  - name: myapp
    resources:
      requests:
        memory: 512Mi
        cpu: 500m
      limits:
        memory: 512Mi
        cpu: 500m
```
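The ordering described above can be expressed as a sort key. This is a simplified model of what the text describes — the real kubelet ranking also factors in pod priority, and QoS classification also considers CPU — so treat it as an illustration, not kubelet code:

```python
# Simplified sketch of eviction ordering per the QoS rules above.
# Real kubelet ranking also weighs pod priority; this is illustrative.

QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}  # lower = evicted sooner

def qos_class(requests: int, limits: int) -> str:
    if requests == 0 and limits == 0:
        return "BestEffort"
    if requests == limits:
        return "Guaranteed"
    return "Burstable"

def eviction_order(pods: list) -> list:
    """Sort pods so the first element is the first eviction candidate."""
    def key(pod):
        name, usage, requests, limits = pod
        overage = usage - requests  # how far above its request the pod is
        return (QOS_RANK[qos_class(requests, limits)], -overage)
    return [pod[0] for pod in sorted(pods, key=key)]

# (name, memory usage Mi, request Mi, limit Mi) — hypothetical workloads
pods = [
    ("guaranteed-db", 500, 512, 512),
    ("burstable-api", 400, 256, 512),  # 144Mi over its request
    ("burstable-web", 300, 256, 512),  # 44Mi over its request
    ("besteffort-job", 50, 0, 0),      # no requests/limits at all
]
print(eviction_order(pods))
# → ['besteffort-job', 'burstable-api', 'burstable-web', 'guaranteed-db']
```

Note that the BestEffort job dies first even though it uses the least memory — QoS class dominates raw usage.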
Gotcha: BestEffort pods get `oom_score_adj: 1000`, so the kernel's OOM killer also targets them first. If a cluster under memory pressure contains any pods without resource requests/limits, those pods die first. Always set resources.
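For the curious, the kubelet derives `oom_score_adj` from the QoS class roughly as follows. This is a sketch of the formula as I understand it (Guaranteed pods get -997, BestEffort 1000, Burstable scales inversely with the memory request); treat the exact clamping as approximate:

```python
# Approximate sketch of per-QoS oom_score_adj assignment.
# Exact kubelet clamping may differ; this is illustrative.

GUARANTEED_ADJ = -997
BESTEFFORT_ADJ = 1000

def oom_score_adj(qos: str, memory_request: int = 0, node_capacity: int = 1) -> int:
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "BestEffort":
        return BESTEFFORT_ADJ
    # Burstable: the larger the memory request relative to the node,
    # the lower the score (less likely to be OOM-killed).
    adj = 1000 - (1000 * memory_request) // node_capacity
    return max(3, min(adj, 999))  # clamp away from both extremes

gib = 1024 ** 3
print(oom_score_adj("BestEffort"))                    # → 1000
print(oom_score_adj("Burstable", 4 * gib, 16 * gib))  # → 750
print(oom_score_adj("Guaranteed"))                    # → -997
```

The practical takeaway: the bigger your memory request, the safer your process is from the kernel OOM killer as well as from kubelet eviction.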
## Graceful Eviction Sequence
1. kubelet detects node pressure (e.g., memory.available < 100Mi)
2. kubelet selects pods for eviction (QoS priority order)
3. Pod status set to "Failed" with reason "Evicted"
4. preStop hook runs (if configured)
5. Pod receives SIGTERM
6. terminationGracePeriodSeconds countdown (default 30s) — but hard eviction thresholds (the defaults above) kill pods immediately with no grace period, and soft thresholds cap the grace at the kubelet's --eviction-max-pod-grace-period
7. If the pod hasn't exited: SIGKILL
8. Controller (e.g., the Deployment) notices the pod is gone
9. Controller creates a replacement pod
10. Scheduler places the replacement on a node with capacity
Gotcha: Evicted pods are NOT rescheduled on the same node (it's under pressure). The replacement goes to a different node. But if ALL nodes are under pressure, the replacement pod stays Pending indefinitely.
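If your app needs time to drain connections on shutdown, make the grace period and preStop hook explicit rather than relying on defaults. A hypothetical pod spec fragment:

```yaml
# Hypothetical fragment: give the app time to drain before SIGKILL.
spec:
  terminationGracePeriodSeconds: 60   # default is 30s
  containers:
    - name: myapp
      lifecycle:
        preStop:
          exec:
            # e.g. stop accepting new work, let in-flight requests finish
            command: ["sh", "-c", "sleep 10"]
```

Keep in mind that when a hard eviction threshold is crossed, the kubelet kills pods without honoring the full grace period — preStop hooks mostly help with soft evictions, node drains, and rollouts.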
## Ephemeral Storage Eviction
The most confusing eviction: your pod isn't using much memory or CPU, but it's evicted for ephemeral-storage, which includes:

```
Ephemeral storage = container writable layer
                  + emptyDir volumes
                  + container log files (/var/log/containers/)
                  + ANY writes inside the container
```
A container that writes large temp files, generates big logs, or has a build cache filling up can trigger ephemeral storage eviction.
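A back-of-the-envelope check using the breakdown above (all numbers hypothetical — in a real cluster, `kubectl describe pod` events report the actual offending usage):

```python
# Illustrative accounting of ephemeral-storage usage per the breakdown above.
# Numbers are hypothetical.

MI = 1024 * 1024

def ephemeral_usage(writable_layer: int, empty_dirs: int, logs: int) -> int:
    """Everything the kubelet charges against the pod's ephemeral-storage limit."""
    return writable_layer + empty_dirs + logs

limit = 2048 * MI  # a 2Gi ephemeral-storage limit
usage = ephemeral_usage(
    writable_layer=300 * MI,   # temp files written inside the container
    empty_dirs=1500 * MI,      # a build cache in an emptyDir volume
    logs=400 * MI,             # stdout/stderr captured by the node
)
print(usage > limit)  # → True: the pod is an eviction candidate
```

Notice the pod's own temp files are the smallest component here — the emptyDir cache and the logs, which teams often forget to count, push it over the limit.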
```yaml
# Set ephemeral storage limits
containers:
  - name: myapp
    resources:
      requests:
        ephemeral-storage: 1Gi
      limits:
        ephemeral-storage: 2Gi
```
## Debugging Evictions
```bash
# Find evicted pods
kubectl get pods --field-selector status.phase=Failed | grep Evicted

# See why a pod was evicted
kubectl describe pod <evicted-pod-name>
# → Status:  Failed
# → Reason:  Evicted
# → Message: The node was low on resource: memory.
# →          Container myapp was using 780Mi, request is 512Mi.

# Check node pressure events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i evict

# Check node resource allocation
kubectl describe node <node-name> | grep -A20 "Allocated resources"
# →           Requests      Limits
# → cpu       3200m (80%)   6400m (160%)
# → memory    12Gi (75%)    24Gi (150%)
# ↑ 150% memory limits = node is overcommitted
```
Gotcha: Limits can exceed node capacity (overcommit). Kubernetes schedules based on requests, not limits. If every pod on a node bursts to its limit simultaneously, total usage exceeds node capacity → evictions. This is by design (efficient utilization) but catches teams who set high limits "just in case."
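The overcommit arithmetic is worth seeing directly. A sketch with hypothetical allocatable and pod figures (matching the percentages in the describe output above):

```python
# Illustrative overcommit check: Kubernetes schedules on requests,
# so total limits can legally exceed what the node actually has.

GI = 1024 ** 3
node_allocatable = 16 * GI

# (pod name, memory request, memory limit) — hypothetical workloads
pods = [
    ("api", 2 * GI, 6 * GI),
    ("worker", 4 * GI, 8 * GI),
    ("cache", 6 * GI, 10 * GI),
]

total_requests = sum(req for _, req, _ in pods)
total_limits = sum(lim for _, _, lim in pods)

print(f"requests: {total_requests / node_allocatable:.0%} of allocatable")  # → 75%
print(f"limits:   {total_limits / node_allocatable:.0%} of allocatable")    # → 150%
# Scheduling succeeds (requests fit), but if all three pods burst to
# their limits simultaneously, usage exceeds the node → evictions.
```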
## Preventing Evictions
```yaml
# 1. Set appropriate requests (based on actual usage, not guesses)
resources:
  requests:
    memory: 400Mi   # 95th percentile of actual usage
    cpu: 200m
  limits:
    memory: 600Mi   # 1.5x request for burst headroom
    cpu: 500m
```
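Where does a number like "95th percentile of actual usage" come from? One approach: pull memory samples from your metrics system (Prometheus or similar — the samples below are hard-coded stand-ins) and compute the percentile:

```python
# Illustrative: derive a memory request from observed usage samples.
# In practice the samples come from your metrics system; here they
# are hypothetical hard-coded values in Mi.

def p95(samples):
    """95th percentile by nearest-rank on sorted samples."""
    ordered = sorted(samples)
    rank = max(0, round(0.95 * len(ordered)) - 1)
    return ordered[rank]

usage_mi = [310, 295, 350, 420, 380, 305, 390, 360, 330, 345]
request = p95(usage_mi)
print(f"memory request: {request}Mi, limit: {int(request * 1.5)}Mi")
# → memory request: 420Mi, limit: 630Mi
```

With only a handful of samples the nearest-rank p95 is effectively the max; over days of real data it smooths out to a usable request.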
```yaml
# 2. Use a PodDisruptionBudget to prevent draining too many at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2   # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: myapp
```
```yaml
# 3. Use a PriorityClass for critical workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Critical workloads that should not be evicted"
```
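A PriorityClass does nothing until pods reference it. A hypothetical Deployment fragment attaching the class above:

```yaml
# Hypothetical fragment referencing the "critical" PriorityClass.
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
        - name: myapp
          image: myapp:latest   # placeholder image
```

Pod priority influences both scheduler preemption and, as a tie-breaker, the kubelet's eviction ranking — but it does not override QoS basics, so critical pods still need proper requests and limits.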
## Flashcard Check
Q1: Node has MemoryPressure=True. Which pod dies first?
BestEffort (no resource requests/limits). Then Burstable pods using most above their request. Guaranteed pods die last.
Q2: Pod evicted for "ephemeral-storage." What counts as ephemeral?
Container writable layer + emptyDir volumes + container logs + any writes inside the container. Set ephemeral-storage limits to prevent surprise evictions.
Q3: All nodes are under pressure. What happens to evicted pods?
The replacement pod stays Pending indefinitely — no node has capacity to schedule it. You need to either add nodes or reduce workload.
## Cheat Sheet
| Task | Command |
|---|---|
| Find evicted pods | kubectl get pods --field-selector status.phase=Failed |
| Node pressure | kubectl describe node NODE \| grep Pressure |
| Node allocation | kubectl describe node NODE \| grep -A20 "Allocated" |
| Clean up evicted pods | kubectl delete pods --field-selector status.phase=Failed |
| Check QoS class | kubectl get pod POD -o jsonpath='{.status.qosClass}' |
## Takeaways
- Always set resource requests. Pods without them are BestEffort and die first during node pressure. Set requests based on actual usage, not guesses.
- Eviction follows QoS priority. BestEffort → Burstable → Guaranteed. Set requests == limits for critical pods to get Guaranteed QoS.
- Ephemeral storage is sneaky. Container logs, temp files, and the writable layer all count. Set ephemeral-storage limits or get surprise evictions.
- Overcommit is by design. Limits can exceed node capacity; Kubernetes schedules by requests. If everyone bursts simultaneously, evictions happen.
## Related Lessons
- Out of Memory — when cgroup OOM killer fires before kubelet evicts
- The Disk That Filled Up — ephemeral storage and node disk pressure
- What Happens When You kubectl apply — the scheduler that places replacement pods