# Solution: Lab Runtime 02 -- HPA Live Scaling
**SPOILER WARNING:** Try to solve it yourself first. Use hints progressively.
## Hint Ladder

- **Hint 1:** This lab is about understanding HPA behavior, not fixing a bug. Watch what happens to the pod count under load.
- **Hint 2:** Check `kubectl get hpa -n grokdevops`. Does the TARGETS column show actual CPU values or `<unknown>`? If `<unknown>`, metrics-server may be missing.
- **Hint 3:** HPA needs two things to work: (1) metrics-server installed and (2) CPU resource requests defined on the deployment. Check both.
- **Hint 4:** Stop the load generator with `./fix.sh` and wait ~5 minutes for the HPA to scale back down (the default downscale stabilization window is 300 seconds).
## Minimal Solution

```bash
# If the HPA shows <unknown>, install metrics-server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# On k3s (or other clusters with self-signed kubelet certs), allow insecure TLS:
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

# Wait for metrics to propagate (~60s), then check:
kubectl get hpa -n grokdevops
kubectl top pods -n grokdevops
```
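Once metrics are flowing, the TARGETS column shows a real utilization ratio instead of `<unknown>`. The output looks roughly like this (HPA name, deployment, and values are illustrative, not from the lab):

```
NAME      REFERENCE            TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
web-hpa   Deployment/web-app   cpu: 42%/50%   1         5         3          10m
```

Older kubectl versions print the target as `42%/50%` without the `cpu:` prefix; either way, a percentage on the left side means the metrics pipeline is working.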
## Explain

**Symptom:** The HPA shows `<unknown>/50%` for the CPU target, or the pod count stays at `minReplicas` despite high load.

**Evidence:** `kubectl describe hpa` shows "unable to get metrics" or "missing request for cpu".

**Root cause:** The HPA calculates desired replicas as `ceil(currentReplicas * (currentMetric / desiredMetric))`. For CPU percentage, `currentMetric = actual_cpu / requested_cpu * 100`. Without CPU requests there is no denominator, so the percentage is undefined; without metrics-server there is no `actual_cpu` value at all.

**Key insight:** HPA percentage-based scaling is relative to resource *requests*, not to node capacity. A pod requesting 100m CPU while using 200m CPU is at 200% utilization, not at 2% of node capacity.
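The formula and the 200% example above can be sketched in a few lines of Python (function name and the millicore inputs are illustrative, not part of the lab):

```python
import math

def desired_replicas(current_replicas: int,
                     actual_cpu_m: float,
                     requested_cpu_m: float,
                     target_percent: float) -> int:
    """Sketch of the HPA core formula:
    ceil(currentReplicas * currentMetric / desiredMetric).
    CPU utilization is measured against the pod's *request*."""
    current_percent = actual_cpu_m / requested_cpu_m * 100
    return math.ceil(current_replicas * current_percent / target_percent)

# Pod requests 100m but uses 200m -> 200% utilization.
# With a 50% target and 2 current replicas: ceil(2 * 200 / 50) = 8.
print(desired_replicas(2, actual_cpu_m=200, requested_cpu_m=100, target_percent=50))  # -> 8
```

Note this is only the core ratio: the real controller also applies a tolerance band (by default it skips scaling when the ratio is within ~10% of 1.0) and the downscale stabilization window mentioned in Hint 4.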
## Prevent

- Always define CPU resource requests on any workload targeted by an HPA
- Include metrics-server in cluster bootstrap (Ansible/Terraform)
- Monitor HPA status in dashboards (alert on `<unknown>` targets)
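For the first point, a minimal deployment fragment showing where the request lives (names and values are illustrative, not the lab's actual manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # illustrative name
  namespace: grokdevops
spec:
  template:
    spec:
      containers:
        - name: web
          image: web-app:latest
          resources:
            requests:
              cpu: 100m    # the HPA's denominator: 50% target = 50m actual per pod
            limits:
              cpu: 500m
```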