Portal | Level: L2: Operations | Topics: HPA / Autoscaling | Domain: Kubernetes
Runbook: HPA Not Scaling¶
Symptoms¶
- HPA shows <unknown>/50% in TARGETS for CPU
- Pod count stays at minReplicas despite high load
- kubectl describe hpa shows "unable to get metrics"
Fast Triage¶
kubectl get hpa -n grokdevops
kubectl describe hpa grokdevops -n grokdevops
kubectl top pods -n grokdevops
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/grokdevops/pods
Likely Causes (ranked)¶
- metrics-server not installed — HPA needs metrics API
- No resource requests defined — HPA percentage-based scaling requires CPU requests
- metrics-server not ready — recently installed, needs ~60s
- Wrong HPA target — HPA points to wrong deployment name
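The last cause is worth a direct check. A minimal autoscaling/v2 manifest, assuming the `grokdevops` Deployment and namespace used throughout this runbook, shows where scaleTargetRef must match the Deployment name exactly:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: grokdevops
  namespace: grokdevops
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grokdevops        # must match the Deployment name exactly
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

If scaleTargetRef points at a name that does not exist, describe output will report that the target could not be found rather than a metrics error.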
Evidence Interpretation¶
What bad looks like:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
grokdevops Deployment/grokdevops <unknown>/50% 1 5 1 10m
<unknown> in TARGETS means the HPA cannot read metrics — usually metrics-server is missing or the Deployment has no CPU requests defined.
- kubectl describe hpa will show a Conditions section with messages like "unable to get metrics for resource cpu" or "missing request for cpu".
- Once metrics flow, the percentage shown is (actual CPU usage / CPU request) * 100 — it is relative to the request, not to node capacity.
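That percentage feeds the documented HPA formula desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A small Python sketch with illustrative numbers:

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float) -> int:
    """Core HPA scaling formula: ceil(current * actual / target)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# Pods request 100m CPU and currently use 80m -> 80% utilization.
# With a 50% target and 2 replicas, HPA wants ceil(2 * 80 / 50) = 4 pods.
print(desired_replicas(2, 80, 50))  # -> 4
```

Note the denominator: without a CPU request there is no utilization percentage at all, which is why cause 2 above breaks scaling entirely.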
Fix Steps¶
- Install metrics-server if missing:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# For k3s with self-signed certs:
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
- Ensure the Deployment has CPU requests defined:
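A sketch of the resources stanza in the Deployment's pod template; the container name and values here are illustrative, not taken from the actual manifest:

```yaml
spec:
  template:
    spec:
      containers:
      - name: grokdevops      # hypothetical container name
        resources:
          requests:
            cpu: 100m         # the denominator for HPA's CPU percentage
            memory: 128Mi
```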
- Wait ~60s for metrics to propagate, then check again:
kubectl get hpa -n grokdevops
Verification¶
kubectl get hpa -n grokdevops # TARGETS should show actual%/target%
kubectl top pods -n grokdevops # should return CPU/memory values
Cleanup¶
None needed.
Unknown Unknowns¶
- HPA percentage = (actual CPU / requested CPU) * 100. If you request 100m and use 80m, HPA sees 80% — not a percentage of node CPU.
- metrics-server scrapes kubelets every 15 seconds; there is always a delay before HPA sees current load.
- Scale-down cooldown is 5 minutes by default (--horizontal-pod-autoscaler-downscale-stabilization); don't assume scaling down is broken if it feels slow.
- Custom metrics (e.g., request rate) require a metrics adapter like Prometheus Adapter — the built-in HPA only handles CPU and memory natively.
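The scale-down stabilization behaviour can be mimicked in a few lines: the controller takes the highest recommendation seen during the window, so replicas only drop once every recent recommendation agrees. A simplified Python sketch of the idea, not the controller's actual code:

```python
from collections import deque

class DownscaleStabilizer:
    """Track the max replica recommendation over a sliding window (default 300s)."""
    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.history = deque()  # (timestamp, recommended_replicas)

    def recommend(self, now: float, desired: int) -> int:
        self.history.append((now, desired))
        # Drop entries older than the stabilization window.
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        # Scale-down only takes effect once every recent recommendation is low.
        return max(r for _, r in self.history)

s = DownscaleStabilizer(window_seconds=300)
print(s.recommend(0, 5))    # -> 5
print(s.recommend(60, 2))   # -> 5 (load dropped, but the window still holds 5)
print(s.recommend(301, 2))  # -> 2 (the old spike aged out of the window)
```

This is why a cluster that scaled up quickly appears "stuck" on the way down: it is waiting out the window, not malfunctioning.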
[!WARNING] HPA requires metrics-server, but a missing or broken metrics-server produces no error pods and no obvious failure — the HPA simply shows <unknown> in TARGETS and silently does nothing. Always verify metrics-server is running and healthy before debugging HPA logic.
Pitfalls¶
- Forgetting to install metrics-server — HPA silently fails with <unknown> instead of an error pod.
- Not defining resource requests — without a CPU request, HPA has no denominator for the percentage calculation and cannot function.
- Expecting instant scaling — HPA evaluates every 15s, metrics lag by 15s, and pods take time to start. Budget at least 60s before seeing new replicas.
See Also¶
- training/library/guides/observability.md (HPA section)
- training/interactive/runtime-labs/lab-runtime-02-hpa-live-scaling/
- training/interview-scenarios/02-hpa-not-scaling.md
- training/interactive/incidents/scenarios/hpa-not-scaling.sh
Wiki Navigation¶
Related Content¶
- Incident Simulator (18 scenarios) (CLI) (Exercise Set, L2) — HPA / Autoscaling
- Interview: HPA Not Scaling (Scenario, L2) — HPA / Autoscaling
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — HPA / Autoscaling
- Kubernetes Ops (Production) (Topic Pack, L2) — HPA / Autoscaling
- Lab: HPA Live Scaling (CLI) (Lab, L1) — HPA / Autoscaling
- Runbook: HPA Thrashing (Rapid Scale Up/Down) (Runbook, L2) — HPA / Autoscaling
- Skillcheck: Kubernetes (Assessment, L1) — HPA / Autoscaling
Pages that link here¶
- Decision Tree: Deployment Is Stuck
- Decision Tree: Latency Has Increased
- Decision Tree: Service Returning 5xx Errors
- Kubernetes - Skill Check
- Kubernetes Ops Domain
- Level 3: Production Kubernetes
- Operational Runbooks
- Runbook: HPA Thrashing (Rapid Scale Up/Down)
- Scenario: HPA Not Scaling Under Load
- Solution: Lab Runtime 02 -- HPA Live Scaling