Portal | Level: L2: Operations | Topics: Node Lifecycle & Maintenance | Domain: Kubernetes

Kubernetes Node Lifecycle - Primer

Why This Matters

Nodes are where your pods actually run. When a node fails, gets patched, or needs an OS upgrade, every pod on it is affected. The Kubernetes model treats nodes as cattle, but workloads expect continuity.

The gap between theory and "this drain has been stuck for 45 minutes" is where this topic lives. Most node incidents come from three areas: nodes going NotReady, drains stuck on PodDisruptionBudgets, and DaemonSets blocking eviction.

Under the hood: The kubelet heartbeats to the API server by renewing a Lease object in the kube-node-lease namespace (default every 10 seconds) and by posting full NodeStatus updates (default every 5 minutes, or immediately when status changes). The node controller applies node-monitor-grace-period (default 40 seconds): if no heartbeat arrives in that window, the node is marked NotReady. Eviction then follows: current clusters taint the node node.kubernetes.io/not-ready:NoExecute and evict pods once their toleration expires (default 300 seconds, the taint-based successor to the legacy pod-eviction-timeout of 5 minutes). These timers directly determine your failover speed.
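These timers add up into a worst-case failover budget. A back-of-the-envelope sketch, assuming the default values quoted above:

```shell
# Rough worst-case time from node failure to pod eviction,
# using the default timer values.
GRACE_PERIOD=40          # node-monitor-grace-period: no heartbeat -> NotReady
EVICTION_DELAY=300       # default not-ready toleration / pod-eviction-timeout (5m)
TOTAL=$((GRACE_PERIOD + EVICTION_DELAY))
echo "worst-case detection-to-eviction: ${TOTAL}s"
```

In other words, a workload on a failed node can sit unrescheduled for well over five minutes with stock settings.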

Core Concepts

1. Node States and Conditions

The kubelet reports conditions via heartbeat:

Condition/Status     Meaning
Ready                Kubelet healthy, accepts pods
NotReady             Kubelet unhealthy or unreachable (Ready condition false or unknown)
SchedulingDisabled   Cordoned, no new pods (appears in the STATUS column; set via spec.unschedulable, not a kubelet condition)
MemoryPressure       Node low on memory
DiskPressure         Node low on disk
kubectl get nodes
kubectl describe node <name> | grep -A5 Conditions

When a node goes NotReady, pods are not evicted immediately: the node controller waits out the eviction delay (default 5m, enforced via the node.kubernetes.io/not-ready:NoExecute taint in current clusters) before evicting pods. During this window, pods may still be running but unreachable: a split-brain risk.
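With taint-based eviction, the window is tunable per pod via tolerationSeconds; the admission controller normally injects these tolerations with a 300-second default. A sketch of shortening it for faster failover (the 30-second value is illustrative, not a recommendation):

```yaml
# Pod spec fragment: evict this pod 30s after its node goes
# NotReady/unreachable instead of the default 300s.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30
```

Shorter windows mean faster failover but more churn during transient network blips; tune with care.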

2. Kubelet Registration

On startup, the kubelet registers the node with the API server (name, resources, labels, taints). If registration fails, the node never appears.

Debug clue: If a new node never shows up in kubectl get nodes, check the kubelet logs first: journalctl -u kubelet -f. The three most common registration failures are: (1) the kubelet cannot reach the API server (firewall, wrong API endpoint in the kubelet config), (2) TLS certificate issues (expired bootstrap token, clock skew causing certificate validation failure), and (3) hostname collision (two nodes registering with the same name; the second one fails silently).

systemctl status kubelet
journalctl -u kubelet -f

3. Taints and Tolerations

Analogy: Think of taints as a "No Trespassing" sign on a node, and tolerations as a permission slip that lets specific pods ignore the sign. The effect (NoSchedule, PreferNoSchedule, NoExecute) determines how aggressively the sign is enforced — from "please avoid" to "get out now."

Taints on nodes repel pods. Tolerations on pods let them schedule on tainted nodes.

kubectl taint nodes node1 maintenance=true:NoSchedule    # add the taint
kubectl taint nodes node1 maintenance=true:NoSchedule-   # trailing "-" removes it
Effect             New Pods   Existing Pods
NoSchedule         Blocked    Unaffected
PreferNoSchedule   Avoided    Unaffected
NoExecute          Blocked    Evicted
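The matching "permission slip" is a toleration in the pod spec; a pod carrying it can still schedule onto the tainted node. A minimal fragment, assuming the maintenance=true:NoSchedule taint from the example above:

```yaml
# Pod spec fragment: tolerate the maintenance taint set above
tolerations:
- key: maintenance
  operator: Equal
  value: "true"
  effect: NoSchedule
```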

4. Cordoning and Draining

Cordoning stops new pods. Draining evicts existing pods.

Remember: The drain workflow mnemonic: "CDC": Cordon (stop new pods), Drain (evict existing pods), unCordon (allow pods again). In practice kubectl drain cordons the node automatically as its first step, but cordoning explicitly lets you stop new scheduling before you are ready to evict.

kubectl cordon node1
kubectl drain node1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=300s
kubectl uncordon node1
Flag                     Purpose
--ignore-daemonsets      Skip DaemonSet pods
--delete-emptydir-data   Allow emptyDir deletion
--force                  Delete unmanaged pods
--timeout                Abort if drain takes too long
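In automation, the cordon/drain sequence is worth wrapping so a failed or timed-out drain does not leave the node half-maintained. A minimal bash sketch (drain_node is a hypothetical helper, not a kubectl subcommand; it assumes kubectl is on PATH):

```shell
# Hypothetical helper: cordon, then drain with a hard timeout;
# uncordon again if the drain fails so the node stays usable.
drain_node() {
  local node="$1"
  kubectl cordon "$node" || return 1
  if ! kubectl drain "$node" \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --timeout=300s; then
    echo "drain of ${node} failed or timed out; uncordoning" >&2
    kubectl uncordon "$node"
    return 1
  fi
}
```

Whether to uncordon on failure is a policy choice; some teams prefer to leave the node cordoned and page a human instead.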

5. PodDisruptionBudgets (PDBs)

PDBs declare how many pods must remain available during voluntary disruptions (drains).

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp

PDBs are the #1 cause of stuck drains. If you have 3 replicas and minAvailable: 3, no pod can ever be evicted. The drain hangs forever.

Gotcha: A PDB with minAvailable: 100% or maxUnavailable: 0 is a foot-gun that blocks all voluntary disruptions, including node upgrades, autoscaler scale-downs, and kubectl drain. Always audit PDBs before starting maintenance: kubectl get pdb -A -o wide and check the "Allowed Disruptions" column. Zero means drain will hang.

kubectl get pdb -A
kubectl describe pdb <name>
# "Allowed Disruptions: 0" means drain will block
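That audit can be scripted as a pre-maintenance gate. A sketch (pdbs_ok is a hypothetical helper; it reads status.disruptionsAllowed, the field behind the "Allowed Disruptions" column):

```shell
# Hypothetical pre-maintenance gate: fail if any PDB in the cluster
# currently allows zero disruptions, since a drain would hang on it.
pdbs_ok() {
  local blocked
  blocked=$(kubectl get pdb -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.disruptionsAllowed}{"\n"}{end}' \
    | awk '$2 == 0 {print $1}')
  if [ -n "$blocked" ]; then
    printf 'PDBs that will block drain:\n%s\n' "$blocked" >&2
    return 1
  fi
}
```

Run it at the start of any maintenance script and abort early instead of discovering a stuck drain 45 minutes in.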

6. DaemonSets During Drain

DaemonSets run one pod per node. Drain skips them with --ignore-daemonsets because they would just be recreated on the same node.

7. Node Upgrade Workflow

kubectl cordon node1
kubectl drain node1 \
  --ignore-daemonsets --delete-emptydir-data
# SSH to the node and upgrade the kubelet/kubectl packages, then:
systemctl daemon-reload && systemctl restart kubelet
# From a machine with cluster access, wait for the node to report Ready:
kubectl wait --for=condition=Ready node/node1 --timeout=300s
kubectl uncordon node1

In managed Kubernetes (EKS/GKE/AKS), upgrades often mean replacing the node entirely: drain, terminate, let autoscaler provision a new instance.

8. Node Auto-Repair

War story: A common production surprise: GKE auto-repair replaces a NotReady node by terminating the VM and creating a new one. If the node had local SSDs with ephemeral data (e.g., a caching tier), that data is gone. Auto-repair is a feature, not a backup strategy. Any workload on auto-repaired nodes must tolerate complete node replacement.

Cloud providers detect NotReady nodes and recreate them (GKE automatic, EKS via ASG health checks, AKS automatic). On bare metal, monitor NotReady duration and alert. Node Problem Detector surfaces hardware and kernel issues as node conditions.

What Experienced People Know

  • Always set --timeout on drain commands. A stuck drain with no timeout hangs automation forever.
  • PDBs with minAvailable equal to replica count are a time bomb. Use maxUnavailable: 1 instead.
  • Check PDBs before starting maintenance, not after.
  • Pods with long terminationGracePeriodSeconds can hold drain for up to that duration; when a restrictive PDB serializes evictions, those waits add up.
  • Local storage makes drain refuse unless you pass --delete-emptydir-data or --force.
  • In autoscaling clusters, cordoned nodes still count toward capacity. The autoscaler will not provision replacements until pods are unschedulable.
  • Force-deleting stuck pods should be a last resort. It can cause split brain if the pod still runs.
  • Test your drain procedure in staging with realistic PDBs, pod counts, and grace periods.
