Incident Replay: DaemonSet Blocks Node Eviction¶
Setup¶
- System context: Kubernetes cluster with 10 worker nodes. A rolling OS upgrade requires draining nodes one at a time. The drain operation hangs on every node.
- Time: Saturday 02:00 UTC (maintenance window)
- Your role: Platform engineer performing the rolling upgrade
Round 1: Alert Fires¶
[Pressure cue: "Maintenance window started 30 minutes ago. First node drain has been stuck for 25 minutes. 9 more nodes to go. Window closes in 4 hours."]
What you see:
kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data hangs with: "evicting pod kube-system/log-collector-xxxxx" and never completes. The log-collector is a DaemonSet pod.
Choose your action:
- A) Force delete the stuck pod: kubectl delete pod --force --grace-period=0
- B) Check why --ignore-daemonsets is not skipping this DaemonSet pod
- C) Add --timeout=60s to the drain command to force it through
- D) Skip the drain and just reboot the node
If you chose B (recommended):¶
[Result: The log-collector pod is managed by a DaemonSet, but it also has a PodDisruptionBudget (PDB) with minAvailable: 1. The --ignore-daemonsets flag excludes DaemonSet pods from eviction, but the PDB is still evaluated: the eviction API respects PDBs even for DaemonSet pods. Proceed to Round 2.]
If you chose A:¶
[Result: Force-deleting the pod removes it, but the DaemonSet immediately recreates it. The drain command picks it up again and gets stuck again. Infinite loop.]
If you chose C:¶
[Result: The timeout causes the drain to fail rather than complete. The node is left cordoned but not drained, so it is still not safe to upgrade. Does not help.]
If you chose D:¶
[Result: Rebooting without draining causes ungraceful pod termination. Stateful workloads may lose data. Running workloads get abruptly killed.]
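The loop in choice A is ordinary controller reconciliation: the DaemonSet controller restores its desired state (one pod per node) as soon as the pod disappears. A toy Python sketch of that loop (purely illustrative; the names are invented and no real Kubernetes API is involved):

```python
# Toy model of DaemonSet reconciliation: the controller ensures exactly one
# pod per node, so force-deleting its pod just triggers recreation.

def reconcile(nodes, pods):
    """Recreate the DaemonSet pod on any node that is missing one."""
    for node in nodes:
        if node not in pods:
            pods[node] = f"log-collector-{node}"  # controller recreates the pod
    return pods

nodes = ["worker-01"]
pods = {"worker-01": "log-collector-worker-01"}

del pods["worker-01"]          # kubectl delete pod --force --grace-period=0
pods = reconcile(nodes, pods)  # controller runs before the drain finishes
assert "worker-01" in pods     # the pod is back; drain gets stuck again
```

This is why force deletion can never converge here: the drain and the controller are pulling the same pod in opposite directions.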
Round 2: First Triage Data¶
[Pressure cue: "1 hour into a 4-hour window. Zero nodes upgraded. Team is getting anxious."]
What you see:
The PDB log-collector-pdb has minAvailable: 1 and there is exactly 1 log-collector pod per node (DaemonSet). When drain tries to evict the pod on worker-01, the PDB blocks it because removing 1 of 10 pods would violate minAvailable=1 per-node... wait, minAvailable is cluster-wide. With 10 pods and minAvailable=1, evicting 1 should leave 9 (above the minimum). Something else is wrong.
Choose your action:
- A) Check the PDB status with kubectl get pdb log-collector-pdb -o yaml
- B) Delete the PDB and retry the drain
- C) Check if there are unhealthy log-collector pods on other nodes
- D) Check if the pod has a finalizer preventing eviction
If you chose A (recommended):¶
[Result: PDB status shows currentHealthy: 9, desiredHealthy: 1, disruptionsAllowed: 0. Wait: 9 healthy with a desired minimum of 1 should allow 8 disruptions, so disruptionsAllowed: 0 means something is wrong. The selector is the clue: the PDB matches ALL pods in the namespace, not just log-collector pods, and the drain's concurrent evictions of other matched pods are consuming the disruption budget before the log-collector eviction is processed. Proceed to Round 3.]
If you chose B:¶
[Result: Deleting the PDB works — drain proceeds. But the PDB exists for a reason (protecting the logging pipeline). You need a targeted fix.]
If you chose C:¶
[Result: All log-collector pods are healthy. The PDB is the blocker, not pod health.]
If you chose D:¶
[Result: No finalizers on the pod. The eviction API is being blocked by the PDB, not by finalizers.]
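The arithmetic behind the recommended path can be sketched as follows. This is a simplified model of the disruption controller, assuming an integer minAvailable; in the real status, in-flight evictions are tracked in status.disruptedPods:

```python
def disruptions_allowed(current_healthy: int, min_available: int,
                        in_flight_evictions: int = 0) -> int:
    """Simplified PDB budget: healthy pods above the minAvailable floor,
    minus evictions already initiated but not yet completed."""
    return max(0, current_healthy - min_available - in_flight_evictions)

# With only the 10 log-collector pods matched, eviction would be allowed:
assert disruptions_allowed(current_healthy=9, min_available=1) == 8

# With the over-broad selector, the drain's concurrent evictions of other
# matched kube-system pods consume the budget before log-collector's turn:
assert disruptions_allowed(current_healthy=9, min_available=1,
                           in_flight_evictions=8) == 0
```

The takeaway: disruptionsAllowed is computed over everything the selector matches, so a too-broad selector lets unrelated evictions starve the budget.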
Round 3: Root Cause Identification¶
[Pressure cue: "Identified the PDB issue. Fix it correctly."]
What you see: Root cause: The PDB's label selector is too broad — it matches all pods in kube-system, not just log-collector pods. When drain tries to evict any pod, the PDB's disruption budget counts all matched pods, and concurrent evictions push the allowed disruptions to 0.
Choose your action:
- A) Fix the PDB selector to match only log-collector pods, then retry drain
- B) Change minAvailable from 1 to 0 to allow full disruption
- C) Delete the PDB, complete the upgrade, then recreate it correctly
- D) Use --disable-eviction=true flag on kubectl drain
If you chose A (recommended):¶
[Result: Update the PDB selector to matchLabels: {app: log-collector} and apply the corrected PDB. Drain now proceeds: the log-collector pod is evicted, and the DaemonSet recreates it on the node after the upgrade. Proceed to Round 4.]
If you chose B:¶
[Result: minAvailable=0 means the PDB provides no protection at all. Might as well delete it.]
If you chose C:¶
[Result: Works for the maintenance window but risky — if someone forgets to recreate the PDB, the logging pipeline has no disruption protection.]
If you chose D:¶
[Result: --disable-eviction bypasses the eviction API entirely, using direct delete. This ignores ALL PDBs, not just the broken one. Dangerous for other workloads with valid PDBs.]
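The before/after of the recommended fix can be sketched as two hypothetical manifests. The incident only says the selector was "too broad"; an empty selector, which in policy/v1 matches every pod in the namespace, is used here as one illustrative way that happens:

```yaml
# Before (hypothetical): over-broad selector covers every pod in kube-system
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: log-collector-pdb
  namespace: kube-system
spec:
  minAvailable: 1
  selector: {}          # empty selector in policy/v1 matches ALL pods in the namespace
---
# After: scoped to the DaemonSet's pods only
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: log-collector-pdb
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: log-collector
```

Only the selector changes; minAvailable: 1 remains a sensible floor once the budget covers just the 10 log-collector pods.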
Round 4: Remediation¶
[Pressure cue: "First node draining. Continue the rolling upgrade."]
Actions:
1. Verify the drain completes on k8s-worker-01 (kubectl drain exits cleanly)
2. Perform OS upgrade on the drained node
3. Uncordon: kubectl uncordon k8s-worker-01
4. Repeat for remaining 9 nodes
5. After upgrade: verify the corrected PDB is in place and selectors are accurate
6. Add PDB selector validation to the CI pipeline
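Step 6 could start as small as the sketch below: an illustrative CI check over parsed PDB manifests (assumed to be available as Python dicts), not a real admission controller:

```python
def validate_pdb(pdb: dict) -> list:
    """Flag PDB selectors that are empty or missing. In policy/v1 an empty
    selector matches every pod in the namespace, which is almost never intended."""
    errors = []
    selector = pdb.get("spec", {}).get("selector") or {}
    if not selector.get("matchLabels") and not selector.get("matchExpressions"):
        name = pdb.get("metadata", {}).get("name", "<unnamed>")
        errors.append(f"{name}: selector matches ALL pods in the namespace")
    return errors

broken = {"metadata": {"name": "log-collector-pdb"},
          "spec": {"minAvailable": 1, "selector": {}}}
fixed = {"metadata": {"name": "log-collector-pdb"},
         "spec": {"minAvailable": 1,
                  "selector": {"matchLabels": {"app": "log-collector"}}}}

assert validate_pdb(broken)       # over-broad PDB is flagged
assert not validate_pdb(fixed)    # scoped PDB passes
```

Running a check like this against every manifest in the repo would have caught this PDB before it ever reached the cluster.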
Damage Report¶
- Total downtime: 0 (rolling upgrade, workloads moved between nodes)
- Blast radius: Maintenance window delayed by 1 hour; remaining 9 nodes upgraded within window
- Optimal resolution time: 15 minutes (identify PDB -> fix selector -> resume drain)
- If every wrong choice was made: 2+ hours plus risk of data loss from forced reboots
Cross-References¶
- Primer: Kubernetes Ops
- Primer: Kubernetes Node Lifecycle
- Primer: Kubernetes Pods & Scheduling
- Footguns: Kubernetes Ops