
Runbook: Pod Eviction

Symptoms

  • Pods in Evicted status across multiple deployments
  • kubectl describe pod shows The node was low on resource: [memory|ephemeral-storage]
  • Node shows MemoryPressure, DiskPressure, or PIDPressure conditions
  • Alert for pod evictions or node pressure

Fast Triage

# Count evicted pods
kubectl get pods -A --field-selector status.phase=Failed | grep Evicted | wc -l

# See eviction details
kubectl get pods -A --field-selector status.phase=Failed -o json | \
  jq -r '.items[] | select(.status.reason=="Evicted") | "\(.metadata.namespace)/\(.metadata.name): \(.status.message)"'

# Check node conditions
kubectl describe nodes | grep -A5 "Conditions:"

# Check node resource usage
kubectl top nodes

Likely Causes (ranked)

  1. Ephemeral storage exhaustion — container logs, emptyDir, or tmp files filling node disk
  2. Memory pressure — node running out of allocatable memory
  3. DiskPressure — node root disk filling up (images, logs, containers)
  4. PID exhaustion — too many processes on the node
  5. BestEffort pods targeted first — pods without resource requests are the first eviction candidates under any kind of pressure
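To confirm which of these is in play, the node conditions can be pulled in one pass rather than scanning `kubectl describe` output (a sketch, assuming jq is installed):

```shell
# Print every non-Ready node condition that is currently True;
# a MemoryPressure/DiskPressure/PIDPressure line points at the likely cause.
kubectl get nodes -o json | jq -r '
  .items[] as $n |
  $n.status.conditions[] |
  select(.type != "Ready" and .status == "True") |
  "\($n.metadata.name): \(.type) since \(.lastTransitionTime)"'
```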

Evidence Interpretation

What bad looks like:

Status:    Failed
Reason:    Evicted
Message:   The node was low on resource: ephemeral-storage.
           Container app was using 2Gi, which exceeds its request of 0.

Eviction priority order:

1. BestEffort pods (no requests/limits) — evicted first
2. Burstable pods (partial requests) — evicted next
3. Guaranteed pods (requests == limits) — evicted last
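The API server records each pod's QoS class in `.status.qosClass`, so you can verify where your workloads sit in this order (a sketch, assuming jq):

```shell
# List pods grouped by QoS class; BestEffort pods are the first eviction candidates.
kubectl get pods -A -o json | jq -r '
  .items[] |
  "\(.status.qosClass) \(.metadata.namespace)/\(.metadata.name)"' | sort
```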

Fix

1. Clean up evicted pods

[!TIP] Evicted pods are already dead — they will never run again. Don't troubleshoot them; delete them and focus on the root cause (node resource pressure) to prevent future evictions.

# Delete all evicted pods (they're already dead)
kubectl get pods -A --field-selector status.phase=Failed -o json | \
  jq -r '.items[] | select(.status.reason=="Evicted") | "\(.metadata.namespace) \(.metadata.name)"' | \
  xargs -n2 kubectl delete pod -n

2. Fix ephemeral storage pressure

# On the node:
# Check disk usage
df -h
du -sh /var/lib/containerd/*   # Container runtime storage
du -sh /var/log/pods/*         # Pod logs

# Clean up
crictl rmi --prune              # Remove unused images
journalctl --vacuum-size=500M   # Trim journal logs

# In manifests — set ephemeral storage limits:
# resources:
#   requests:
#     ephemeral-storage: "1Gi"
#   limits:
#     ephemeral-storage: "2Gi"
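To find the workloads most exposed to this failure mode, you can list pods that have at least one container with no ephemeral-storage limit (a sketch, assuming jq):

```shell
# Pods with a container lacking an ephemeral-storage limit —
# these can fill the node disk without bound.
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(any(.spec.containers[]; .resources.limits["ephemeral-storage"] == null)) |
  "\(.metadata.namespace)/\(.metadata.name)"'
```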

3. Fix memory pressure

# Find the most memory-hungry pods cluster-wide
kubectl top pods -A --sort-by=memory | head -20

# Check which pods have no memory requests (BestEffort)
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(any(.spec.containers[]; .resources.requests.memory == null)) |
  "\(.metadata.namespace)/\(.metadata.name)"'

# Fix: add memory requests and limits to all workloads
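One way to apply that fix without hand-editing manifests is `kubectl set resources`; the deployment name, namespace, and sizes below are placeholders to tune per workload:

```shell
# Hypothetical deployment name and sizes — adjust to your workload.
kubectl set resources deployment/web -n default \
  --requests=memory=256Mi --limits=memory=512Mi
```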

4. Fix disk pressure from images

# On the node:
crictl images | sort -k4 -rh | head -20    # Largest images (SIZE is field 4)
crictl rmi --prune                          # Remove unused

# If container runtime disk is separate:
df -h /var/lib/containerd
# May need to expand the disk or move image storage to a larger volume

5. Adjust eviction thresholds

# Check node allocatable (capacity minus system reservations and eviction thresholds):
kubectl get node <name> -o json | jq '.status.allocatable'

# KubeletConfiguration fields (typically /var/lib/kubelet/config.yaml):
# evictionHard:
#   memory.available: "100Mi"
#   nodefs.available: "10%"
#   imagefs.available: "15%"
# evictionSoft:
#   memory.available: "200Mi"
# evictionSoftGracePeriod:
#   memory.available: "1m30s"
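To see the thresholds the running kubelet is actually using, its `configz` endpoint can be read through the API server proxy (a sketch, assuming jq; `<node-name>` is a placeholder):

```shell
# Dump the live kubelet config and extract the eviction settings:
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" \
  | jq '.kubeletconfig | {evictionHard, evictionSoft, evictionSoftGracePeriod}'
```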

Verification

# Confirm no evicted pods remain
kubectl get pods -A --field-selector status.phase=Failed | grep Evicted
# Expected: no output

# Confirm node pressure conditions are cleared
kubectl describe node <node-name> | grep -A5 Conditions
# Expected: MemoryPressure=False, DiskPressure=False
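The condition check can also be made machine-checkable, e.g. as a post-incident gate in a script (a sketch, assuming jq; exits non-zero while any pressure condition is still True):

```shell
# Succeeds (exit 0, prints "true") only when no node reports a pressure condition:
kubectl get nodes -o json | jq -e '
  [.items[].status.conditions[]
   | select((.type | endswith("Pressure")) and .status == "True")]
  | length == 0'
```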

Prevention

  • Always set resource requests and limits — prevents BestEffort pods
  • Set ephemeral-storage requests on all containers
  • Monitor node disk usage and set alerts at 80%
  • Use emptyDir.sizeLimit to prevent runaway temp file usage
  • Configure log rotation (container runtime + application level)
  • Right-size nodes to match workload requirements
  • Use cluster autoscaler to add capacity before evictions happen
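For the emptyDir point above, a minimal volume sketch (name and size are placeholders): once usage exceeds sizeLimit, the kubelet evicts the pod instead of letting it fill the node disk.

```yaml
volumes:
  - name: scratch        # hypothetical volume name
    emptyDir:
      sizeLimit: 1Gi     # eviction trigger for this volume
```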

Automated Cleanup

# CronJob to clean evicted pods (its ServiceAccount needs RBAC to list/delete pods cluster-wide)
kubectl create cronjob cleanup-evicted \
  --image=bitnami/kubectl:latest \
  --schedule="0 */6 * * *" \
  -- /bin/sh -c 'kubectl get pods -A --field-selector status.phase=Failed -o json | jq -r ".items[] | select(.status.reason==\"Evicted\") | \"\(.metadata.namespace) \(.metadata.name)\"" | xargs -n2 kubectl delete pod -n'
