Portal | Level: L2: Operations | Topics: Node Lifecycle & Maintenance | Domain: Kubernetes
Kubernetes Node Lifecycle - Primer¶
Why This Matters¶
Nodes are where your pods actually run. When a node fails, gets patched, or needs an OS upgrade, every pod on it is affected. The Kubernetes model treats nodes as cattle, but workloads expect continuity.
The gap between theory and "this drain has been stuck for 45 minutes" is where this topic lives. Most node incidents come from three areas: nodes going NotReady, drains stuck on PodDisruptionBudgets, and DaemonSets blocking eviction.
Under the hood: The kubelet sends heartbeats to the API server via `NodeStatus` updates (default every 10 seconds) and Lease objects (default every 10 seconds). The node controller uses a `node-monitor-grace-period` (default 40 seconds): if no heartbeat arrives in that window, the node is marked NotReady. After `pod-eviction-timeout` (default 5 minutes), pods are evicted. These timers directly affect your failover speed.
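These timers stack. A quick sanity check of the worst-case window from last heartbeat to eviction, assuming the stock defaults quoted above:

```shell
# Worst-case seconds from last kubelet heartbeat to pod eviction,
# using the default timer values (assumption: no custom tuning).
node_monitor_grace=40     # node marked NotReady after this
pod_eviction_timeout=300  # pods evicted this long after NotReady (5m)
echo "worst case: $((node_monitor_grace + pod_eviction_timeout))s"
```

Roughly 340 seconds with stock settings; shrinking either value buys faster failover at the cost of more false NotReady flaps during brief API-server hiccups.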
Core Concepts¶
1. Node States and Conditions¶
The kubelet reports conditions via heartbeat:
| Condition | Meaning |
|---|---|
| Ready | Kubelet healthy, accepts pods |
| NotReady | Kubelet unhealthy or unreachable |
| SchedulingDisabled | Cordoned, no new pods (shown in STATUS; set by cordon, not reported by the kubelet) |
| MemoryPressure | Node low on memory |
| DiskPressure | Node low on disk |
When a node goes NotReady, the node controller waits `pod-eviction-timeout` (default 5m) before evicting pods. During this window, pods may still be running but unreachable, which is a split-brain risk.
2. Kubelet Registration¶
On startup, the kubelet registers itself with the API server (name, resources, labels, taints). If registration fails, the node never appears. Common causes: the kubelet cannot reach the API server, certificate issues, or a hostname collision.

Debug clue: If a new node never appears in `kubectl get nodes`, check the kubelet logs first: `journalctl -u kubelet -f`. The three most common registration failures are: (1) the kubelet cannot reach the API server (firewall, wrong API endpoint in the kubelet config), (2) TLS certificate issues (expired bootstrap token, clock skew causing certificate validation to fail), and (3) hostname collision (two nodes registering with the same name; the second one fails silently).
3. Taints and Tolerations¶
Analogy: Think of taints as a "No Trespassing" sign on a node, and tolerations as a permission slip that lets specific pods ignore the sign. The effect (`NoSchedule`, `PreferNoSchedule`, `NoExecute`) determines how aggressively the sign is enforced, from "please avoid" to "get out now."
Taints on nodes repel pods. Tolerations on pods let them schedule on tainted nodes.
```shell
# Add a NoSchedule taint to node1
kubectl taint nodes node1 maintenance=true:NoSchedule
# Remove it again (note the trailing "-")
kubectl taint nodes node1 maintenance=true:NoSchedule-
```
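A pod can opt back in with a matching toleration. A minimal pod-spec fragment for the `maintenance=true:NoSchedule` taint above (the surrounding pod definition is omitted):

```yaml
# Pod-spec fragment: tolerate the maintenance taint.
tolerations:
- key: "maintenance"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
```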
| Effect | New Pods | Existing Pods |
|---|---|---|
| NoSchedule | Blocked | Unaffected |
| PreferNoSchedule | Avoided | Unaffected |
| NoExecute | Blocked | Evicted |
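For `NoExecute`, a toleration may also set `tolerationSeconds`, bounding how long an already-running pod survives after the taint appears. Kubernetes itself uses this for the built-in `node.kubernetes.io/unreachable` taint (default 300s added to pods); a sketch with a shorter window:

```yaml
# Pod-spec fragment: survive an "unreachable" NoExecute taint for 60s,
# then be evicted (the Kubernetes-injected default is 300s).
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60
```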
4. Cordoning and Draining¶
Cordoning stops new pods. Draining evicts existing pods.
Remember: The drain workflow mnemonic is "CDC": Cordon (stop new pods), Drain (evict existing pods), unCordon (allow pods again). `kubectl drain` cordons the node itself before evicting, but cordoning explicitly first lets you stop new scheduling immediately and verify the node's state before any evictions begin.
```shell
kubectl cordon node1
kubectl drain node1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=300s
kubectl uncordon node1
```
| Flag | Purpose |
|---|---|
| `--ignore-daemonsets` | Skip DaemonSet pods |
| `--delete-emptydir-data` | Allow eviction of pods using emptyDir (the data is deleted) |
| `--force` | Delete pods not managed by a controller |
| `--timeout` | Abort if the drain takes too long |
5. PodDisruptionBudgets (PDBs)¶
PDBs declare how many pods must remain available during voluntary disruptions (drains).
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
```
PDBs are the #1 cause of stuck drains. If you have 3 replicas and `minAvailable: 3`, no pod can ever be evicted: the drain hangs forever.
Gotcha: A PDB with `minAvailable: 100%` or `maxUnavailable: 0` is a foot-gun that blocks all voluntary disruptions, including node upgrades, autoscaler scale-downs, and `kubectl drain`. Always audit PDBs before starting maintenance: `kubectl get pdb -A -o wide` and check the "Allowed Disruptions" column. Zero means drain will hang.
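The "Allowed Disruptions" number is essentially the healthy pod count minus what the PDB requires. A minimal sketch of that check for a `minAvailable`-style PDB (numbers taken from the 3-replica example above):

```shell
# Voluntary-disruption check for a minAvailable PDB (sketch).
healthy=3        # currently healthy pods matched by the selector
min_available=3  # spec.minAvailable
allowed=$((healthy - min_available))
if [ "$allowed" -le 0 ]; then
  echo "eviction blocked"   # drain hangs on these pods
else
  echo "eviction allowed: $allowed disruption(s)"
fi
```

With `maxUnavailable: 1` instead, one eviction at a time remains possible whenever all replicas are healthy, which is why it is the safer default for maintenance.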
6. DaemonSets During Drain¶
DaemonSets run one pod per node. Drain skips them with `--ignore-daemonsets` because they would just be recreated on the same node.
7. Node Upgrade Workflow¶
```shell
kubectl cordon node1
kubectl drain node1 \
  --ignore-daemonsets --delete-emptydir-data
# SSH to node, upgrade kubelet/kubectl
systemctl daemon-reload && systemctl restart kubelet
# Wait for the node to report Ready
kubectl uncordon node1
```
In managed Kubernetes (EKS/GKE/AKS), upgrades often mean replacing the node entirely: drain, terminate, let autoscaler provision a new instance.
8. Node Auto-Repair¶
War story: A common production surprise: GKE auto-repair replaces a NotReady node by terminating the VM and creating a new one. If the node had local SSDs with ephemeral data (e.g., a caching tier), that data is gone. Auto-repair is a feature, not a backup strategy. Any workload on auto-repaired nodes must tolerate complete node replacement.
Cloud providers detect NotReady nodes and recreate them (GKE automatic, EKS via ASG health checks, AKS automatic). On bare metal, monitor NotReady duration and alert. Node Problem Detector surfaces hardware and kernel issues as node conditions.
What Experienced People Know¶
- Always set `--timeout` on drain commands. A stuck drain with no timeout hangs automation forever.
- PDBs with `minAvailable` equal to the replica count are a time bomb. Use `maxUnavailable: 1` instead.
- Check PDBs before starting maintenance, not after.
- Pods with long `terminationGracePeriodSeconds` hold the drain for that duration per pod.
- Local storage makes drain refuse unless you pass `--delete-emptydir-data` or `--force`.
- In autoscaling clusters, cordoned nodes still count toward capacity. The autoscaler will not provision replacements until pods are unschedulable.
- Force-deleting stuck pods should be a last resort. It can cause split brain if the pod is still running.
- Test your drain procedure in staging with realistic PDBs, pod counts, and grace periods.
Wiki Navigation¶
Prerequisites¶
- Kubernetes Ops (Production) (Topic Pack, L2)
Related Content¶
- Case Study: DaemonSet Blocks Eviction (Case Study, L2) — Node Lifecycle & Maintenance
- Kubernetes Node Lifecycle Flashcards (CLI) (flashcard_deck, L1) — Node Lifecycle & Maintenance
- Kubernetes Ops (Production) (Topic Pack, L2) — Node Lifecycle & Maintenance
- Node Maintenance (Topic Pack, L1) — Node Lifecycle & Maintenance
- Runbook: Node NotReady (Runbook, L1) — Node Lifecycle & Maintenance
- Skillcheck: Kubernetes Under the Covers (Assessment, L2) — Node Lifecycle & Maintenance
Pages that link here¶
- Anti-Primer: Kubernetes Node Lifecycle
- Certification Prep: CKA — Certified Kubernetes Administrator
- Certification Prep: CKS — Certified Kubernetes Security Specialist
- Incident Replay: DaemonSet Blocks Node Eviction
- Incident Replay: Node Drain Blocked by PDB
- Incident Replay: Node Pressure Evictions
- Kubernetes Node Lifecycle
- Kubernetes Under the Covers
- Master Curriculum: 40 Weeks
- Node Maintenance
- Production Readiness Review: Answer Key
- Production Readiness Review: Study Plans
- Runbook: Node NotReady
- Symptoms
- Thinking Out Loud: Kubernetes Node Lifecycle