Tags: k8s, l1, topic-pack, node-maintenance
Portal | Level: L1: Foundations | Topics: Node Lifecycle & Maintenance | Domain: Kubernetes
Node Maintenance - Primer¶
Why This Matters¶
Kubernetes nodes need patching, upgrading, and occasional hardware replacement. Do it wrong and you drop production traffic, violate pod disruption budgets, or leave the cluster in a degraded state with unschedulable workloads. Node maintenance is one of the most common operational tasks in Kubernetes — and one of the most common sources of avoidable outages when done carelessly.
Core Concepts¶
1. Cordon, Drain, Uncordon — The Maintenance Lifecycle¶
Every node maintenance follows the same lifecycle: cordon, drain, do the work, uncordon:
# Step 1: Cordon — mark node as unschedulable (no new pods land here)
kubectl cordon worker-03
# Step 2: Drain — evict all pods gracefully
kubectl drain worker-03 --ignore-daemonsets --delete-emptydir-data --grace-period=120
# Step 3: Perform maintenance (OS patch, kubelet upgrade, hardware swap)
# Step 4: Uncordon — mark node as schedulable again
kubectl uncordon worker-03
Check node status at each step:
kubectl get node worker-03
A status of Ready,SchedulingDisabled means the node is cordoned (kubectl applies the node.kubernetes.io/unschedulable:NoSchedule taint). Existing pods keep running until drained.
Remember: The node maintenance mantra: "CDC" — Cordon, Drain, unCordon. Always in this order. Cordoning first prevents new pods from landing while you prepare the drain. Draining second evicts existing pods gracefully. Uncordoning last returns the node to service. Skipping the cordon step means new pods can land on the node between your drain and your maintenance — defeating the purpose.
2. Drain Flags That Matter¶
# Basic drain (will fail if there are pods not managed by a controller)
kubectl drain worker-03
# Production drain — handle DaemonSets, emptyDir, and local data
kubectl drain worker-03 \
--ignore-daemonsets \
--delete-emptydir-data \
--grace-period=120 \
--timeout=300s \
--force
# Dry run first — see what would be evicted
kubectl drain worker-03 --ignore-daemonsets --delete-emptydir-data --dry-run=client
| Flag | Purpose |
|---|---|
| `--ignore-daemonsets` | Skip DaemonSet pods (they run on every node by design) |
| `--delete-emptydir-data` | Allow eviction of pods using emptyDir volumes (data will be lost) |
| `--grace-period=N` | Seconds to wait for graceful pod shutdown |
| `--timeout=N` | Abort the drain if it takes longer than N seconds |
| `--force` | Evict pods not managed by a ReplicaSet/Job/DaemonSet/StatefulSet |
| `--pod-selector` | Only drain pods matching a label selector |
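When you script drains, building the flag list as a bash array keeps quoting sane and lets you toggle flags per environment. A minimal sketch using the same flag values as above (the `echo` prefix is only so the sketch runs without a cluster; drop it in practice):

```shell
# Build the production drain invocation as an array so each flag stays
# a separate, correctly quoted word.
node="worker-03"
drain_flags=(
  --ignore-daemonsets
  --delete-emptydir-data
  --grace-period=120
  --timeout=300s
)

# Dry-run first, then the real thing.
echo kubectl drain "${node}" "${drain_flags[@]}" --dry-run=client
echo kubectl drain "${node}" "${drain_flags[@]}"
```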
3. PodDisruptionBudgets (PDBs)¶
PDBs tell Kubernetes how many pods of a given set must remain available during voluntary disruptions (like drains):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-pdb
namespace: production
spec:
minAvailable: 2 # At least 2 pods must stay running
selector:
matchLabels:
app: api-server
---
# Alternative: use maxUnavailable instead of minAvailable
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: worker-pdb
spec:
maxUnavailable: 1 # At most 1 pod can be down at a time
selector:
matchLabels:
app: background-worker
# Check PDB status before draining
kubectl get pdb -A
# NAMESPACE    NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# production   api-pdb   2               N/A               1                     30d
# If ALLOWED DISRUPTIONS is 0, the drain will block until a pod becomes available
A drain that violates a PDB will hang (not fail) until the budget allows eviction. This is by design — it prevents you from accidentally taking down too many replicas.
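The arithmetic behind the ALLOWED DISRUPTIONS column is simple for the `minAvailable` case: healthy pods minus the budget, floored at zero. A simplified sketch (the helper name is mine, and the real disruption controller also handles percentage values, `maxUnavailable`, and unhealthy pods):

```shell
# Simplified sketch of the disruption controller's arithmetic for a PDB
# using minAvailable: allowed = currentHealthy - minAvailable, floored at 0.
# (Hypothetical helper; not part of kubectl.)
pdb_allowed_disruptions() {
  local current_healthy=$1 min_available=$2
  local allowed=$(( current_healthy - min_available ))
  if (( allowed < 0 )); then allowed=0; fi
  echo "${allowed}"
}

pdb_allowed_disruptions 3 2   # 3 healthy replicas, minAvailable: 2 -> 1
pdb_allowed_disruptions 2 2   # no headroom -> 0, so a drain would hang
```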
4. DaemonSet Implications¶
DaemonSets run one pod per node. During node maintenance:
- `--ignore-daemonsets` skips them during drain (they will be terminated when the node shuts down)
- DaemonSet pods automatically re-create when the node comes back
- If a DaemonSet uses `hostPath` volumes, data persists across pod restarts on the same node
# Check which DaemonSets are running on the target node
kubectl get pods -A --field-selector spec.nodeName=worker-03 | grep -i daemon
# Common DaemonSets you will see
# - kube-proxy
# - calico-node / cilium
# - fluent-bit / fluentd (logging)
# - node-exporter (monitoring)
Gotcha: A PDB with `minAvailable` equal to the total replica count (e.g., `minAvailable: 3` on a 3-replica Deployment) will block every drain indefinitely: there is no headroom for eviction. This is one of the most common "drain is stuck" root causes. Always set `minAvailable` to at least one less than the replica count, or use `maxUnavailable: 1` instead.
War story: A team ran a rolling OS upgrade script across 20 nodes without checking PDB headroom first. The script cordoned and drained 3 nodes simultaneously, but a critical service had `maxUnavailable: 1`. The second and third drains hung, the script stalled, and three nodes were cordoned but not maintained, reducing cluster capacity by 15% for hours until someone noticed.
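The pre-flight check that team skipped fits in a few lines. This sketch parses the column layout shown earlier for `kubectl get pdb -A` (the here-doc is canned sample output so the sketch runs without a cluster; in practice you would pipe the real command in):

```shell
# Flag any PDB whose ALLOWED DISRUPTIONS column is 0 before starting a drain.
# With 'kubectl get pdb -A', that is the 5th whitespace-separated field
# (NAMESPACE NAME MIN-AVAILABLE MAX-UNAVAILABLE ALLOWED-DISRUPTIONS AGE).
check_pdb_headroom() {
  awk 'NR > 1 && $5 == 0 { print "no headroom: " $2; bad = 1 } END { exit bad }'
}

# Canned sample standing in for: kubectl get pdb -A | check_pdb_headroom
check_pdb_headroom <<'EOF' || echo "drain would hang on the PDBs above"
NAMESPACE    NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
production   api-pdb      2               N/A               0                     30d
default      worker-pdb   N/A             1                 1                     12d
EOF
```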
5. Node Upgrades (kubelet + OS)¶
# On the node (after cordon + drain):
# Update kubelet and kubectl (unhold first if the packages are pinned)
apt-mark unhold kubelet kubectl
apt-get update && apt-get install -y kubelet=1.29.0-1.1 kubectl=1.29.0-1.1
apt-mark hold kubelet kubectl
# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
# Verify kubelet is running
systemctl status kubelet
journalctl -u kubelet --no-pager -n 50
# Back on the control plane — uncordon
kubectl uncordon worker-03
kubectl get nodes
For OS kernel upgrades:
# On the node
apt-get update && apt-get upgrade -y
# If kernel was updated:
reboot
# After reboot — verify node rejoins cluster
kubectl get nodes -w
# Then uncordon
kubectl uncordon worker-03
6. etcd Member Removal (Control Plane Maintenance)¶
For control plane nodes running etcd:
# List etcd members
ETCDCTL_API=3 etcdctl member list \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Remove a member (use the member ID from the list)
ETCDCTL_API=3 etcdctl member remove <MEMBER_ID> \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Verify cluster health after removal
ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://10.0.1.10:2379,https://10.0.1.11:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Never remove more than one etcd member at a time. A 3-member cluster can tolerate one failure. Removing two members loses quorum.
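The quorum rule behind that warning is plain arithmetic: an n-member etcd cluster needs floor(n/2)+1 members alive, so it tolerates floor((n-1)/2) failures. A quick sketch (the helper name is mine):

```shell
# Failure tolerance of an n-member etcd cluster: quorum is floor(n/2)+1,
# so the cluster survives floor((n-1)/2) member losses.
etcd_failure_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

etcd_failure_tolerance 3   # -> 1: one member can go; losing two breaks quorum
etcd_failure_tolerance 5   # -> 2
```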
7. Rolling Node Maintenance¶
For maintaining multiple nodes safely:
#!/usr/bin/env bash
set -euo pipefail

NODES=$(kubectl get nodes -l role=worker -o jsonpath='{.items[*].metadata.name}')

for node in ${NODES}; do
  echo "=== Maintaining ${node} ==="

  # Check PDB headroom before starting
  kubectl get pdb -A -o jsonpath='{range .items[*]}{.metadata.name}: allowed={.status.disruptionsAllowed}{"\n"}{end}'

  kubectl cordon "${node}"
  kubectl drain "${node}" --ignore-daemonsets --delete-emptydir-data --timeout=300s

  # Perform maintenance via SSH. The reboot drops the connection, so don't
  # let the resulting non-zero ssh exit kill the script under `set -e`.
  ssh "${node}" 'apt-get update && apt-get upgrade -y'
  ssh "${node}" 'reboot' || true

  # Wait for the node to go down and come back Ready
  echo "Waiting for ${node} to rejoin..."
  sleep 30
  until kubectl wait --for=condition=Ready "node/${node}" --timeout=10s 2>/dev/null; do
    sleep 10
  done

  kubectl uncordon "${node}"
  echo "=== ${node} complete ==="

  # Let pods reschedule before moving to the next node
  sleep 60
done
8. Troubleshooting Stuck Drains¶
# Find pods blocking the drain
kubectl get pods -A --field-selector spec.nodeName=worker-03
# Check for pods without controllers (standalone pods)
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o json | \
jq '.items[] | select(.metadata.ownerReferences == null) | .metadata.name'
# Check PDB status — is the drain blocked by a budget?
kubectl get pdb -A
# Force-delete a stuck pod (last resort)
kubectl delete pod stuck-pod -n production --grace-period=0 --force
# Check for finalizers blocking pod deletion
kubectl get pod stuck-pod -n production -o jsonpath='{.metadata.finalizers}'
One-liner: `kubectl drain <node> --dry-run=client --ignore-daemonsets --delete-emptydir-data`. Always dry-run first: it lists exactly which pods would be evicted and which would block the drain, without actually evicting anything. This 5-second check prevents hours of stuck-drain debugging.
Key Takeaway¶
Node maintenance follows a predictable lifecycle: cordon, drain, maintain, uncordon. The complexity comes from PDBs, DaemonSets, standalone pods, and stateful workloads. Always dry-run drains first, check PDB headroom, and never rush through multiple nodes without verifying pod rescheduling between each one.
Wiki Navigation¶
Related Content¶
- Case Study: DaemonSet Blocks Eviction (Case Study, L2) — Node Lifecycle & Maintenance
- Kubernetes Node Lifecycle (Topic Pack, L2) — Node Lifecycle & Maintenance
- Kubernetes Node Lifecycle Flashcards (CLI) (flashcard_deck, L1) — Node Lifecycle & Maintenance
- Kubernetes Ops (Production) (Topic Pack, L2) — Node Lifecycle & Maintenance
- Runbook: Node NotReady (Runbook, L1) — Node Lifecycle & Maintenance
- Skillcheck: Kubernetes Under the Covers (Assessment, L2) — Node Lifecycle & Maintenance