
Kubernetes Node Lifecycle & Cluster Upgrades

Advanced operational patterns for node maintenance, cluster upgrades, and recovery.

Node Lifecycle: The Full Cordon-Drain-Upgrade-Uncordon Pattern

Every node maintenance operation follows the same fundamental sequence; the devil is in the details.

Step 1: Pre-flight Checks

Before touching any node, understand what is running on it:

# What pods are on this node?
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o wide

# Are any PDBs going to block the drain?
kubectl get pdb -A -o wide
# Look for ALLOWED DISRUPTIONS = 0. That will block you.

# Check for pods using local storage (emptyDir, hostPath)
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o json | \
  jq -r '.items[] | select(any(.spec.volumes[]?; .emptyDir or .hostPath)) | "\(.metadata.namespace)/\(.metadata.name)"'
# (any() ensures each pod is listed once, even with multiple matching volumes)

# Check for bare pods (no controller) -- these will NOT be rescheduled
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o json | \
  jq -r '.items[] | select(.metadata.ownerReferences == null) | "\(.metadata.namespace)/\(.metadata.name)"'

Step 2: Cordon

Cordoning marks the node as unschedulable. Existing pods keep running. New pods will not land here.

kubectl cordon worker-03
kubectl get nodes worker-03
# STATUS should show: Ready,SchedulingDisabled
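Under the hood, cordon just sets `spec.unschedulable` on the Node object. The following patches are equivalent, which is useful when scripting against the API directly (a sketch; requires a live cluster):

```shell
# Equivalent to `kubectl cordon worker-03`
kubectl patch node worker-03 -p '{"spec":{"unschedulable":true}}'

# Equivalent to `kubectl uncordon worker-03`
kubectl patch node worker-03 -p '{"spec":{"unschedulable":false}}'
```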

Step 3: Drain

Draining cordons the node (if it is not already cordoned) and then evicts all pods except DaemonSet pods using the Eviction API, which respects PDBs:

kubectl drain worker-03 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=300s \
  --grace-period=60

If drain hangs, check what is blocking it:

# In another terminal:
kubectl get pods -A --field-selector spec.nodeName=worker-03
# Any pods still Running? Check their PDBs or finalizers.

kubectl get events -A --field-selector reason=EvictionFailed --sort-by='.lastTimestamp' | tail -10

Step 4: Perform Maintenance

Now the node is empty (except DaemonSets). Do your work: upgrade kubelet, patch OS, replace hardware.

Step 5: Uncordon

kubectl uncordon worker-03
kubectl get nodes worker-03
# STATUS should show: Ready (no SchedulingDisabled)

Pods will NOT automatically migrate back. The node will receive new pods as scheduling decisions are made. If you need to rebalance, use the descheduler or manually evict pods from overloaded nodes.
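If you want pods back on the freshly uncordoned node without deploying the descheduler, a rolling restart of the relevant workloads forces fresh scheduling decisions. A low-tech sketch (`myapp` and `production` are placeholder names):

```shell
# Recreate the pods; the scheduler will now consider the uncordoned node
kubectl rollout restart deployment/myapp -n production
kubectl rollout status deployment/myapp -n production --timeout=300s
```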

Cluster Upgrade Realities

The Version Skew Rule

Kubernetes enforces strict version skew policies:

- kubelet can be up to three minor versions behind the API server on current releases (two before Kubernetes 1.28), but never ahead
- kubectl can be one minor version ahead of or behind the API server
- Control plane components should run the same version; kube-controller-manager and kube-scheduler may trail kube-apiserver by at most one minor version

This means: always upgrade the control plane first, then workers.
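Before planning an upgrade, check the actual skew in your cluster; kubelet versions are reported in each node's status:

```shell
# API server (and client) version
kubectl version

# Kubelet version per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
```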

Control Plane Upgrade (kubeadm example)

# 1. Back up etcd BEFORE anything else
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-pre-upgrade-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the backup
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-pre-upgrade-*.db

# 2. Upgrade kubeadm (if the package is held, run `apt-mark unhold kubeadm` first)
apt-get update && apt-get install -y kubeadm=1.29.0-00
kubeadm version  # confirm

# 3. Plan the upgrade (dry run)
kubeadm upgrade plan

# 4. Apply the upgrade
kubeadm upgrade apply v1.29.0

# 5. Upgrade kubelet and kubectl on the control plane node
apt-get install -y kubelet=1.29.0-00 kubectl=1.29.0-00
systemctl daemon-reload && systemctl restart kubelet

# 6. Verify
kubectl get nodes  # control plane should show new version

Worker Node Rolling Upgrade

Upgrade workers one at a time. Never drain multiple workers simultaneously unless you are certain your workloads can tolerate the capacity reduction.

for node in worker-01 worker-02 worker-03; do
  echo "=== Upgrading $node ==="
  kubectl drain $node --ignore-daemonsets --delete-emptydir-data --timeout=300s

  # SSH to the node and upgrade (or use Ansible)
  ssh $node "apt-get update && apt-get install -y kubeadm=1.29.0-00 kubelet=1.29.0-00"
  ssh $node "kubeadm upgrade node"
  ssh $node "systemctl daemon-reload && systemctl restart kubelet"

  kubectl uncordon $node
  kubectl get nodes $node  # verify Ready + correct version

  echo "Waiting 60s for stabilization..."
  sleep 60
done

PDB Interactions During Drain

PodDisruptionBudgets are the most common reason drains get stuck.

How the eviction API uses PDBs:

1. Drain calls the Eviction API for each pod.
2. The API server checks whether evicting the pod would violate any PDB.
3. If it would, the eviction is rejected and drain retries with backoff.
4. The drain blocks until the PDB allows the eviction or the timeout expires.
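The eviction request itself is a small `policy/v1` Eviction object POSTed to the pod's `eviction` subresource; drain constructs one per pod. A hand-rolled equivalent (the pod name here is hypothetical):

```yaml
# Posted to /api/v1/namespaces/production/pods/myapp-7d4b9-abcde/eviction
apiVersion: policy/v1
kind: Eviction
metadata:
  name: myapp-7d4b9-abcde   # must match the pod being evicted (placeholder name)
  namespace: production
deleteOptions:
  gracePeriodSeconds: 60
```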

The stuck PDB scenario:

# Deployment has 2 replicas, PDB says minAvailable: 2
# Both replicas are on the node you are draining
# Evicting either one violates the PDB -> drain hangs forever

# Diagnosis: list PDBs with zero allowed disruptions
# (grepping `-o wide` output for "ALLOWED" only matches the header row)
kubectl get pdb -A -o json | \
  jq -r '.items[] | select(.status.disruptionsAllowed == 0) | "\(.metadata.namespace)/\(.metadata.name)"'

# Fix options:
# 1. Scale up the deployment first so replicas exist elsewhere
kubectl scale deployment myapp --replicas=3 -n production
# Wait for the new pod to be Ready, then drain again

# 2. Temporarily patch the PDB (risky, document why)
kubectl patch pdb myapp-pdb -n production -p '{"spec":{"minAvailable":1}}'
# Drain, then restore the PDB

# 3. Use --disable-eviction (last resort, bypasses PDBs entirely)
kubectl drain worker-03 --ignore-daemonsets --disable-eviction
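The stuck scenario above, as a manifest (names are placeholders): a two-replica Deployment whose PDB requires both replicas makes every eviction a violation.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
  namespace: production
spec:
  minAvailable: 2          # equals replicas, so disruptionsAllowed is always 0
  selector:
    matchLabels:
      app: myapp
# Safer defaults: minAvailable: 1 (with 2 replicas), or maxUnavailable: 1,
# so at least one voluntary disruption is always allowed
```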

DaemonSet Behavior During Maintenance

DaemonSet pods are NOT evicted during drain (that is what --ignore-daemonsets acknowledges). They keep running on the cordoned node.

When the node comes back after upgrade:

- DaemonSet pods that were running continue running
- If the kubelet restarts, DaemonSet pods restart automatically
- If a DaemonSet was updated while the node was down, the old pod is replaced with the new version

Gotcha: If you are upgrading a DaemonSet (like a CNI plugin or log shipper) as part of the cluster upgrade, the DaemonSet update strategy matters:

- RollingUpdate: pods update automatically as the DaemonSet spec changes
- OnDelete: pods only update when manually deleted -- you must delete the pod yourself
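A minimal DaemonSet sketch showing where the update strategy lives (names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-shipper          # placeholder name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-shipper
  updateStrategy:
    type: RollingUpdate      # or OnDelete: pods update only when you delete them
    rollingUpdate:
      maxUnavailable: 1      # update one node's pod at a time
  template:
    metadata:
      labels:
        app: log-shipper
    spec:
      containers:
        - name: shipper
          image: example.com/log-shipper:2.0   # placeholder image
```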

Rollback Strategies When Upgrade Fails

Node Fails to Come Back After Upgrade

# Node shows NotReady after kubelet restart
kubectl get nodes worker-03
kubectl describe node worker-03 | grep -A 10 Conditions

# Check kubelet logs on the node
ssh worker-03 "journalctl -u kubelet -l --no-pager | tail -50"

# Common causes:
# 1. Certificate issues -- kubelet can't auth to API server
# 2. CNI plugin incompatibility -- check /var/log/pods/ for CNI errors
# 3. Container runtime version mismatch

# Rollback kubelet to previous version
ssh worker-03 "apt-get install -y kubelet=1.28.5-00"
ssh worker-03 "systemctl daemon-reload && systemctl restart kubelet"

Control Plane Rollback via etcd Restore

This is the nuclear option. Only use it if the control plane is broken and you have a good etcd backup.

# Take the API server and etcd down: move the static pod manifests out
# while the kubelet is still running (so it tears the pods down), then stop it
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
mv /etc/kubernetes/manifests/etcd.yaml /tmp/
systemctl stop kubelet

# Restore etcd from backup (single-member etcd shown; multi-member clusters
# also need --name, --initial-cluster, and --initial-advertise-peer-urls)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-pre-upgrade.db \
  --data-dir=/var/lib/etcd-restored

# Replace etcd data directory
mv /var/lib/etcd /var/lib/etcd-broken
mv /var/lib/etcd-restored /var/lib/etcd

# Restore manifests and restart
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
mv /tmp/etcd.yaml /etc/kubernetes/manifests/
systemctl start kubelet

# Downgrade kubeadm, kubelet, kubectl to previous versions
apt-get install -y kubeadm=1.28.5-00 kubelet=1.28.5-00 kubectl=1.28.5-00
systemctl daemon-reload && systemctl restart kubelet

Real-World Gotchas

Local storage evacuation: Pods using emptyDir lose their data on drain. Pods using hostPath leave data on the node. If your workload needs data to survive a drain, use persistent volumes.
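To make the distinction concrete, here is a pod `volumes` section contrasting the three (a sketch; only the PVC-backed volume survives a reschedule, and only when backed by network-attached storage rather than a local PV):

```yaml
volumes:
  - name: cache              # emptyDir: deleted with the pod on drain
    emptyDir: {}
  - name: host-logs          # hostPath: data stays behind on the old node
    hostPath:
      path: /var/log/myapp   # placeholder path
  - name: data               # PVC: reattaches wherever the pod lands
    persistentVolumeClaim:
      claimName: myapp-data  # placeholder claim name
```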

Node NotReady after upgrade: The most common cause is a container runtime version incompatibility. Check that containerd/crio is compatible with the new kubelet version. Also check that the CNI plugin supports the new version.

Stuck finalizers blocking drain: Some operators add finalizers to pods. If the operator is broken, the finalizer never gets removed, and the pod never terminates.

# Find pods with finalizers
kubectl get pods -A --field-selector spec.nodeName=worker-03 -o json | \
  jq -r '.items[] | select(.metadata.finalizers != null) | "\(.metadata.namespace)/\(.metadata.name): \(.metadata.finalizers)"'

# Remove a stuck finalizer (understand what it does first)
kubectl patch pod stuck-pod -n production -p '{"metadata":{"finalizers":null}}' --type=merge

Version skew during upgrade window: While workers are at mixed versions, workloads that depend on features of the new kubelet or node version will fail on not-yet-upgraded nodes. Schedule those workloads only on upgraded nodes using node selectors or taints until the upgrade is complete.
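One way to pin workloads to upgraded nodes is a custom label applied as each worker finishes, plus a matching nodeSelector (a sketch; the label key, deployment, and namespace are placeholders):

```shell
# Label nodes as they complete the upgrade (pick your own label key)
kubectl label node worker-01 example.com/kubelet-upgraded=v1-29

# Pin the sensitive workload to upgraded nodes only
kubectl patch deployment myapp -n production -p \
  '{"spec":{"template":{"spec":{"nodeSelector":{"example.com/kubelet-upgraded":"v1-29"}}}}}'

# Remove the nodeSelector (and labels, if you like) once all workers are done
```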

etcd backup verification: An untested backup is not a backup. Periodically restore your etcd snapshots to a test cluster to verify they work. The worst time to discover your backup is corrupt is during a failed upgrade.