
Node Maintenance - Street-Level Ops

Real-world workflows for cordoning, draining, patching, and upgrading Kubernetes nodes safely.

Pre-Flight Checks

# See node status and versions
kubectl get nodes -o wide
# NAME        STATUS   VERSION   INTERNAL-IP   OS-IMAGE             KERNEL-VERSION
# worker-01   Ready    v1.28.3   10.0.1.21     Ubuntu 22.04.3 LTS   5.15.0-91
# worker-02   Ready    v1.28.3   10.0.1.22     Ubuntu 22.04.3 LTS   5.15.0-91
# worker-03   Ready    v1.28.3   10.0.1.23     Ubuntu 22.04.3 LTS   5.15.0-91

# Check PDB headroom before starting
kubectl get pdb -A
# NAME       MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# api-pdb    2               N/A               1                     30d
# If ALLOWED DISRUPTIONS is 0, the drain will HANG (not fail) until headroom opens
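Since a drain with zero PDB headroom hangs rather than fails, it is worth checking for blocked PDBs up front. A minimal guard sketch (the jsonpath filter syntax works on recent kubectl versions; verify on yours):

```shell
# Sketch: list PDBs that currently allow zero disruptions, so you can
# abort (or wait) before starting a drain that would hang on them
pdbs_with_zero_headroom() {
  kubectl get pdb -A -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
}

blocked=$(pdbs_with_zero_headroom 2>/dev/null || true)
if [ -n "${blocked}" ]; then
  echo "Drain would hang on these PDBs:"
  echo "${blocked}"
fi
```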

# Check what pods are on the target node
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o wide

Cordon, Drain, Uncordon

# Step 1: Mark node unschedulable (no new pods)
kubectl cordon worker-03
# node/worker-03 cordoned

kubectl get node worker-03
# worker-03   Ready,SchedulingDisabled   <none>   45d   v1.28.3

# Step 2: Dry-run the drain first
kubectl drain worker-03 --ignore-daemonsets --delete-emptydir-data --dry-run=client
# pod/myapp-abc123 would be evicted
# pod/worker-xyz789 would be evicted

# Step 3: Execute the drain
kubectl drain worker-03 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=120 \
  --timeout=300s

# Step 4: Perform maintenance (SSH to node, patch, reboot, etc.)

# Step 5: Uncordon
kubectl uncordon worker-03
# node/worker-03 uncordoned

# Verify pods are rescheduling
kubectl get pods -A -o wide | grep worker-03

Remember the maintenance mnemonic C-D-M-U: Cordon (stop scheduling), Drain (evict pods), Maintain (patch/reboot), Uncordon (resume scheduling). Note that kubectl drain cordons the node automatically as its first step, but cordoning explicitly first gives you a checkpoint to review what will move before any eviction starts, and it keeps the node unschedulable even if the drain is interrupted. Never uncordon until maintenance is actually finished, or new pods will land on the node mid-work.
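The four steps can be condensed into a small wrapper. This is a sketch, not a drop-in tool: it assumes passwordless SSH to the node and a MAINTENANCE_CMD variable you define yourself.

```shell
# Condensed C-D-M-U wrapper (sketch). MAINTENANCE_CMD is a placeholder,
# e.g. MAINTENANCE_CMD='apt-get update && apt-get upgrade -y'
maintain_node() {
  local node="$1"
  kubectl cordon "${node}"                          # C: stop scheduling
  kubectl drain "${node}" \
    --ignore-daemonsets --delete-emptydir-data \
    --timeout=300s                                  # D: evict pods
  ssh "${node}" "${MAINTENANCE_CMD}"                # M: patch/reboot
  kubectl uncordon "${node}"                        # U: resume scheduling
}

# Usage: maintain_node worker-03
```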

OS Patching

# After cordon and drain, SSH to the node
ssh worker-03

# Apply OS updates
apt-get update && apt-get upgrade -y

# If kernel was updated, reboot
reboot

# Wait for node to rejoin cluster (from control plane)
kubectl get nodes -w
# worker-03   NotReady   ...
# worker-03   Ready      ...

# Uncordon
kubectl uncordon worker-03

Kubelet Upgrade

# On the node (after cordon + drain):
# Make sure the apt repo for the target minor version is configured first
# (the pkgs.k8s.io repositories are split per minor version)
apt-mark unhold kubelet kubectl
apt-get update
apt-get install -y kubelet=1.29.0-1.1 kubectl=1.29.0-1.1
apt-mark hold kubelet kubectl

systemctl daemon-reload
systemctl restart kubelet

# Verify
systemctl status kubelet
journalctl -u kubelet --no-pager -n 20

# From control plane: check version
kubectl get node worker-03
# worker-03   Ready   v1.29.0

kubectl uncordon worker-03

Fix Stuck Drains

# Find what is blocking the drain
kubectl get pods -A --field-selector spec.nodeName=worker-03

# Check for standalone pods (no controller — drain refuses to evict these)
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o json | \
  jq '.items[] | select(.metadata.ownerReferences == null) | .metadata.namespace + "/" + .metadata.name'

# Force evict standalone pods (--force bypasses the safety check for controller-less pods)
kubectl drain worker-03 --ignore-daemonsets --delete-emptydir-data --force

# If a pod is stuck terminating (finalizer or termination grace period)
kubectl delete pod stuck-pod -n production --grace-period=0 --force

# Check for finalizers blocking deletion
kubectl get pod stuck-pod -n production -o jsonpath='{.metadata.finalizers}'
# Remove stuck finalizer (last resort)
kubectl patch pod stuck-pod -n production -p '{"metadata":{"finalizers":null}}'

Gotcha: --grace-period=0 --force on a pod does NOT kill the container immediately — it removes the pod from the API server. If the kubelet on that node is unreachable, the container keeps running. You will have a ghost container consuming resources until the node comes back. Verify with docker ps or crictl ps on the node after it recovers.
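One way to hunt for ghost containers after the node recovers is to diff what the container runtime reports against what the API server tracks. A sketch, assuming jq on the workstation and crictl on the node (the io.kubernetes.pod.name label is the standard CRI convention, but confirm it for your runtime):

```shell
# Sketch: pods running on the node but unknown to the API server
ghost_pods() {
  local node="$1"
  # Pod names as the container runtime on the node sees them
  ssh "${node}" 'crictl ps -o json' \
    | jq -r '.containers[].labels["io.kubernetes.pod.name"]' \
    | sort -u > /tmp/on-node
  # Pod names as the API server sees them
  kubectl get pods -A --field-selector "spec.nodeName=${node}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' \
    | sort -u > /tmp/in-api
  comm -23 /tmp/on-node /tmp/in-api   # on the node, not in the API: ghosts
}

# Usage: ghost_pods worker-03
```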

Rolling Maintenance Script

#!/usr/bin/env bash
set -euo pipefail

NODES=$(kubectl get nodes -l role=worker -o jsonpath='{.items[*].metadata.name}')

for node in ${NODES}; do
    echo "=== Maintaining ${node} ==="

    # Pre-flight: check PDB headroom
    DISRUPTIONS=$(kubectl get pdb -A -o jsonpath='{range .items[*]}{.status.disruptionsAllowed}{" "}{end}')
    echo "PDB headroom: ${DISRUPTIONS}"

    kubectl cordon "${node}"
    kubectl drain "${node}" --ignore-daemonsets --delete-emptydir-data --timeout=300s

    ssh "${node}" 'apt-get update && apt-get upgrade -y && reboot' || true

    echo "Waiting for ${node} to reboot and rejoin..."
    # Give the reboot time to start; without this pause the Ready check
    # below can pass before the node has actually gone down
    sleep 30
    until kubectl get node "${node}" 2>/dev/null | grep -q " Ready"; do
        sleep 10
    done

    kubectl uncordon "${node}"
    echo "=== ${node} complete ==="

    # Wait for pods to reschedule before next node
    sleep 60
done
echo "All nodes maintained."

Control Plane Node Maintenance

# For control plane nodes with etcd, extra care needed

# Check etcd cluster health first
ETCDCTL_API=3 etcdctl endpoint health --cluster \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Never drain more than one control plane node at a time
# Ensure etcd has quorum (2/3 or 3/5) before proceeding

Under the hood: etcd requires a strict majority for quorum: 2/3, 3/5, or 4/7 nodes. Losing quorum makes the entire cluster read-only — no new pods, no deployments, no config changes. With a 3-node control plane, losing one node is survivable; losing two is a cluster-down event. Always verify endpoint health --cluster before touching a control plane node.
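The quorum arithmetic is simply floor(n/2) + 1; a quick loop makes the failure budget explicit:

```shell
# Quorum for an n-member etcd cluster is floor(n/2) + 1; the number of
# member failures the cluster survives is n minus that quorum
for n in 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  echo "members=${n} quorum=${quorum} tolerated_failures=$(( n - quorum ))"
done
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
# members=7 quorum=4 tolerated_failures=3
```

This is why even-sized clusters buy nothing: 4 members need a quorum of 3 and tolerate only 1 failure, same as 3 members.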

Verify After Maintenance

# Check node is Ready and schedulable
kubectl get node worker-03

# Check node conditions
kubectl describe node worker-03 | grep -A10 "Conditions:"
# MemoryPressure    False
# DiskPressure      False
# PIDPressure       False
# Ready             True

# Verify DaemonSet pods are back
kubectl get pods -A --field-selector spec.nodeName=worker-03 | grep -i daemon

# Check cluster-wide pod health
kubectl get pods -A | grep -v Running | grep -v Completed
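The first two checks can be rolled into a single pass/fail helper for scripting. A sketch using standard kubectl jsonpath queries (the function name is my own):

```shell
# Sketch: exit 0 only if the node reports Ready and is schedulable
# (i.e. uncordon actually took effect)
node_ready_and_schedulable() {
  local node="$1" unsched ready
  # .spec.unschedulable is absent (empty) on a schedulable node
  unsched=$(kubectl get node "${node}" -o jsonpath='{.spec.unschedulable}')
  ready=$(kubectl get node "${node}" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  [ "${unsched}" != "true" ] && [ "${ready}" = "True" ]
}

# Usage: node_ready_and_schedulable worker-03 && echo "OK to move on"
```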