Solution

Triage

  1. Confirm the drain is stuck and identify which pod is blocking:
    kubectl get pods -n prod -o wide --field-selector spec.nodeName=node-3.internal
    
  2. List PodDisruptionBudgets in the namespace:
    kubectl get pdb -n prod
    
  3. Inspect the specific PDB:
    kubectl describe pdb payment-service-pdb -n prod
    
  4. Check how many replicas the deployment has and how many are ready:
    kubectl get deployment payment-service -n prod
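
    Representative output for step 2, assuming the names from this incident (the AGE value is illustrative). The key signal is ALLOWED DISRUPTIONS of 0, which means the eviction API will refuse every eviction request:
    
    NAME                  MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    payment-service-pdb   1               N/A               0                     30d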
    

Root Cause

The PDB payment-service-pdb specifies minAvailable: 1. The deployment has exactly 1 replica. The Kubernetes eviction API refuses to evict a pod when doing so would violate the PDB. Since evicting the only replica would bring available pods to 0 (below the minimum of 1), the drain operation blocks indefinitely waiting for the PDB condition to be satisfiable.

This is a configuration conflict: the PDB guarantees at least 1 pod is always available, but the deployment only runs 1 pod, making voluntary disruption impossible.
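
A minimal sketch of the two conflicting objects (field names are standard Kubernetes; resource names and labels are assumed from the incident):

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: payment-service-pdb
      namespace: prod
    spec:
      minAvailable: 1          # requires >= 1 available pod at all times
      selector:
        matchLabels:
          app: payment-service
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: payment-service
      namespace: prod
    spec:
      replicas: 1              # only 1 pod exists, so evicting it would
                               # drop availability to 0 and violate the PDB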

Fix

Immediate (unblock the drain):

  1. Scale the deployment up so a second replica is running on another node:
    kubectl scale deployment payment-service -n prod --replicas=2
    
  2. Wait for the new replica to become Ready:
    kubectl wait --for=condition=Ready pod -l app=payment-service -n prod --timeout=120s
    
  3. The drain should now proceed automatically, since evicting one of two pods still satisfies minAvailable: 1.
  4. After the drain completes (the node stays cordoned for maintenance), scale back down if desired:
    kubectl scale deployment payment-service -n prod --replicas=1
    

Alternative (if no capacity exists for a second replica):

Cancel the hung drain (Ctrl+C) and re-run it with --disable-eviction to bypass the eviction API entirely:

kubectl drain node-3.internal --ignore-daemonsets --disable-eviction

This deletes the pod directly instead of going through the eviction API, bypassing PDB checks. This WILL cause downtime.

Rollback / Safety

  • If the drain was started with --delete-emptydir-data, ensure no important data lives in emptyDir volumes.
  • Verify that the payment-service pod is healthy on its new node after drain completes.
  • If the service has a readiness probe, confirm it passes before declaring success.
  • Do not uncordon the drained node until maintenance is complete.

Common Traps

  • Assuming --force bypasses PDBs. It does not. --force only handles pods not managed by a controller. Only --disable-eviction (Kubernetes 1.18+) bypasses PDB enforcement.
  • Forgetting to check cluster capacity. Scaling to 2 replicas does nothing if the second pod is stuck in Pending due to insufficient resources.
  • Setting minAvailable equal to replicas count. This is a common misconfiguration. Use maxUnavailable: 1 instead, or ensure replicas always exceeds minAvailable.
  • Not cleaning up after an aborted drain. If you Ctrl+C the drain command, the node remains cordoned. Run kubectl uncordon node-3.internal to make the node schedulable again if you abandon the maintenance.
  • Ignoring PDBs in IaC. Fix the Helm chart or Terraform module that defines the PDB, not just the live object.
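
A PDB expressing the last two points, written the way the IaC fix would look (a sketch; names and labels assumed from the incident):

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: payment-service-pdb
      namespace: prod
    spec:
      maxUnavailable: 1        # always permits one voluntary eviction,
                               # regardless of replica count
      selector:
        matchLabels:
          app: payment-service

With a single replica this still allows the eviction (at the cost of a brief outage while the pod reschedules), so pair it with replicas: 2 or more if drains must be zero-downtime.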