
Incident Replay: DaemonSet Blocks Node Eviction

Setup

  • System context: Kubernetes cluster with 10 worker nodes. A rolling OS upgrade requires draining nodes one at a time. The drain operation hangs on every node.
  • Time: Saturday 02:00 UTC (maintenance window)
  • Your role: Platform engineer performing the rolling upgrade

Round 1: Alert Fires

[Pressure cue: "Maintenance window started 30 minutes ago. First node drain has been stuck for 25 minutes. 9 more nodes to go. Window closes in 4 hours."]

What you see: kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data hangs with: "evicting pod kube-system/log-collector-xxxxx" and never completes. The log-collector is a DaemonSet pod.

Choose your action:

  • A) Force delete the stuck pod: kubectl delete pod --force --grace-period=0
  • B) Check why --ignore-daemonsets is not skipping this DaemonSet pod
  • C) Add --timeout=60s to the drain command to force it through
  • D) Skip the drain and just reboot the node

[Result: The log-collector pod is managed by a DaemonSet, but it also has a PodDisruptionBudget (PDB) with minAvailable: 1. The --ignore-daemonsets flag skips DaemonSet pods from eviction, but the PDB is still being evaluated. The drain API respects PDBs even for DaemonSet pods. Proceed to Round 2.]
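The blocking mechanics can be sketched as a tiny simulation. This is a hedged illustration, not the real controller code: when a PDB's budget is exhausted, the eviction API rejects the request with HTTP 429, and kubectl drain retries indefinitely, which is the apparent hang.

```python
# Hedged sketch of the eviction decision; the function name and structure
# are illustrative, not the actual Kubernetes API server code.
def try_evict(disruptions_allowed: int) -> str:
    """Return the eviction API's answer given the PDB's remaining budget."""
    if disruptions_allowed > 0:
        # Budget available: the eviction is granted and the pod terminates.
        return "evicted"
    # Budget exhausted: the API answers 429 and kubectl drain retries
    # forever, which is exactly the "hang" seen in Round 1.
    return "429 Too Many Requests"

print(try_evict(0))  # prints "429 Too Many Requests"
```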

If you chose A:

[Result: Force-deleting the pod removes it, but the DaemonSet immediately recreates it. The drain command picks it up again and gets stuck again. Infinite loop.]

If you chose C:

[Result: The timeout makes the drain fail rather than complete. Drain cordons the node first, so the node is left cordoned with its pods still running, and the upgrade cannot safely proceed. Does not help.]

If you chose D:

[Result: Rebooting without draining causes ungraceful pod termination. Stateful workloads may lose data. Running workloads get abruptly killed.]

Round 2: First Triage Data

[Pressure cue: "1 hour into a 4-hour window. Zero nodes upgraded. Team is getting anxious."]

What you see: The PDB log-collector-pdb has minAvailable: 1 and there is exactly 1 log-collector pod per node (DaemonSet). When drain tries to evict the pod on worker-01, the PDB blocks it because removing 1 of 10 pods would violate minAvailable=1 per-node... wait, minAvailable is cluster-wide. With 10 replicas and minAvailable=1, evicting 1 should leave 9 (above minimum). Something else is wrong.
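The cluster-wide arithmetic above can be written out. A simplified sketch, assuming an integer minAvailable and ignoring percentages and maxUnavailable:

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    # Simplified: disruptionsAllowed = max(0, currentHealthy - minAvailable)
    # for an integer minAvailable (percentage values and maxUnavailable
    # are computed differently and are omitted here).
    return max(0, current_healthy - min_available)

# 10 healthy log-collector pods, minAvailable=1: evicting one should be fine.
print(disruptions_allowed(10, 1))  # → 9
```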

Choose your action:

  • A) Check the PDB status with kubectl get pdb log-collector-pdb -o yaml
  • B) Delete the PDB and retry the drain
  • C) Check if there are unhealthy log-collector pods on other nodes
  • D) Check if the pod has a finalizer preventing eviction

[Result: PDB status shows expectedPods: 50 and disruptionsAllowed: 0. Fifty pods is far more than the 10 log-collector replicas: the PDB selector matches ALL pods in the namespace, not just log-collector pods. Only 45 of the 50 matched pods are healthy, and pending evictions elsewhere in the namespace count against the same shared budget, pinning disruptionsAllowed at 0. Proceed to Round 3.]

If you chose B:

[Result: Deleting the PDB works — drain proceeds. But the PDB exists for a reason (protecting the logging pipeline). You need a targeted fix.]

If you chose C:

[Result: All log-collector pods are healthy. The PDB is the blocker, not pod health.]

If you chose D:

[Result: No finalizers on the pod. The eviction API is being blocked by the PDB, not by finalizers.]

Round 3: Root Cause Identification

[Pressure cue: "Identified the PDB issue. Fix it correctly."]

What you see: Root cause: The PDB's label selector is too broad — it matches all pods in kube-system, not just log-collector pods. When drain tries to evict any pod, the PDB's disruption budget counts all matched pods, and concurrent evictions push the allowed disruptions to 0.
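A PDB with this shape would reproduce the problem. The manifest below is a hypothetical reconstruction; apart from the name, namespace, and minAvailable, the field values are illustrative. Note that in policy/v1 an empty selector matches every pod in the namespace:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: log-collector-pdb
  namespace: kube-system
spec:
  minAvailable: 1
  # Problem: in policy/v1, an empty selector ({}) selects ALL pods in
  # the namespace, not just the log-collector DaemonSet's pods.
  selector: {}
```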

Choose your action:

  • A) Fix the PDB selector to match only log-collector pods, then retry drain
  • B) Change minAvailable from 1 to 0 to allow full disruption
  • C) Delete the PDB, complete the upgrade, then recreate it correctly
  • D) Use --disable-eviction=true flag on kubectl drain

[Result: Update the PDB selector to matchLabels: {app: log-collector}. Apply the corrected PDB. Drain now proceeds — log-collector pod is evicted, DaemonSet recreates it on the node after upgrade. Proceed to Round 4.]
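The corrected manifest might look like this. The app: log-collector label comes from the fix above; the remaining fields are illustrative:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: log-collector-pdb
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: log-collector   # scoped to the DaemonSet's pods only
```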

If you chose B:

[Result: minAvailable=0 means the PDB provides no protection at all. Might as well delete it.]

If you chose C:

[Result: Works for the maintenance window but risky — if someone forgets to recreate the PDB, the logging pipeline has no disruption protection.]

If you chose D:

[Result: --disable-eviction bypasses the eviction API entirely, using direct delete. This ignores ALL PDBs, not just the broken one. Dangerous for other workloads with valid PDBs.]

Round 4: Remediation

[Pressure cue: "First node draining. Continue the rolling upgrade."]

Actions:

  1. Verify the drain completes on k8s-worker-01
  2. Perform the OS upgrade on the drained node
  3. Uncordon: kubectl uncordon k8s-worker-01
  4. Repeat for the remaining 9 nodes
  5. After the upgrade: verify the corrected PDB is in place and its selector is accurate
  6. Add PDB selector validation to the CI pipeline
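The per-node loop can be sketched in shell. A hedged sketch, not a production runbook: the node names, the out-of-band upgrade step, and the KUBECTL indirection (defaulting to echo so the sketch dry-runs safely) are all illustrative.

```shell
#!/bin/sh
set -eu
# Dry-run by default: set KUBECTL=kubectl to act on a real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"

for node in k8s-worker-01 k8s-worker-02; do   # ...through k8s-worker-10
    # Drain now succeeds: the corrected PDB matches only log-collector pods.
    $KUBECTL drain "$node" --ignore-daemonsets --delete-emptydir-data
    # Perform the OS upgrade out of band here (e.g. via your patching tool).
    $KUBECTL uncordon "$node"
done
```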

Damage Report

  • Total downtime: 0 (rolling upgrade, workloads moved between nodes)
  • Blast radius: Maintenance window delayed by 1 hour; remaining 9 nodes upgraded within window
  • Optimal resolution time: 15 minutes (identify PDB -> fix selector -> resume drain)
  • If every wrong choice was made: 2+ hours plus risk of data loss from forced reboots

Cross-References