Drill: Safely Drain a Kubernetes Node

Goal

Perform a safe node drain workflow: cordon the node, check PodDisruptionBudgets, drain, and verify workloads moved.

Setup

  • kubectl configured with cluster access
  • A multi-node cluster with workloads running
  • Appropriate RBAC permissions for node operations

Commands

Check current node status:

kubectl get nodes -o wide

Check what is running on the target node:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>

Cordon the node (prevent new scheduling):

kubectl cordon <node-name>

Verify node shows SchedulingDisabled:

kubectl get node <node-name>
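
The cordon check can be scripted. A minimal sketch: the STATUS column of `kubectl get node` shows `Ready,SchedulingDisabled` for a cordoned node, so a helper can simply look for that substring. The function name and the sample node line are assumptions for illustration; in a real cluster the input would come from `kubectl get node <node-name> --no-headers`.

```shell
# Sketch (assumed helper name): succeed if the node line from
# `kubectl get node <node-name> --no-headers` shows SchedulingDisabled.
is_cordoned() {
  grep -q 'SchedulingDisabled'
}

# Example usage against a live cluster (node name is a placeholder):
#   kubectl get node <node-name> --no-headers | is_cordoned && echo "cordoned"
```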

Check PodDisruptionBudgets that may block the drain:

kubectl get pdb --all-namespaces
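
A PDB blocks eviction when its allowed disruptions drop to zero, so it helps to filter for exactly those. A minimal sketch, assuming input of `<namespace> <name> <disruptionsAllowed>` per line, which a live cluster can produce via `kubectl get pdb -A -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' --no-headers` (the helper name and sample PDB names below are hypothetical):

```shell
# Sketch (assumed helper name): print namespace/name of every PDB whose
# disruptionsAllowed is 0 -- these will block kubectl drain.
# Expected input lines: <namespace> <name> <disruptionsAllowed>
blocked_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}
```

Run the drain only after this prints nothing, or scale up the affected workloads first so the budgets have headroom.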

Perform the drain:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

If pods without controllers exist, add force:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
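
Drains can stall temporarily (a PDB briefly at zero, a slow-terminating pod), so in automation it is common to bound each attempt and retry a few times. A sketch of a generic retry wrapper (the helper is an assumption, not a kubectl feature; `kubectl drain` does accept a `--timeout` flag to bound a single attempt):

```shell
# Sketch (assumed helper name): run a command up to N times,
# pausing briefly between failed attempts.
retry() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep 1  # brief backoff between attempts
    fi
  done
  return 1
}

# Example usage (node name is a placeholder):
#   retry 3 kubectl drain <node-name> --ignore-daemonsets \
#     --delete-emptydir-data --timeout=120s
```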

Verify only DaemonSet pods remain on the drained node (workload pods should no longer be listed):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>
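
This check can also be reduced to a single number: the count of pods on the node that are not DaemonSet-managed, which should be zero after a successful drain. A sketch, assuming input of `<namespace> <pod> <ownerKind>` per line, which could come from `kubectl get pods -A --field-selector spec.nodeName=<node-name> -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' --no-headers` (the helper name is hypothetical):

```shell
# Sketch (assumed helper name): count pods whose owner is not a DaemonSet.
# Expected input lines: <namespace> <pod> <ownerKind>
non_daemonset_pods() {
  awk '$3 != "DaemonSet" { n++ } END { print n + 0 }'
}
```

Wrapping this in a polling loop that waits for the count to reach 0 gives a simple readiness gate before starting maintenance.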

Uncordon when maintenance is complete:

kubectl uncordon <node-name>

What to Look For

  • After cordon, node status shows SchedulingDisabled
  • PDB violations block drain with explicit error messages
  • DaemonSet pods remain on the node (expected with --ignore-daemonsets)
  • Workload pods should appear on other nodes after drain completes

Common Mistakes

  • Draining without checking PDBs first, causing the drain to hang
  • Forgetting --ignore-daemonsets, which causes the drain to fail on any node running DaemonSet pods (in practice, nearly every node)
  • Using --force without understanding that it deletes pods not managed by a controller, which are gone permanently and not rescheduled
  • Forgetting to uncordon the node after maintenance

Cleanup

kubectl uncordon <node-name>