- k8s
- l2
- runbook
- hpa
- k8s-core

Level: L2: Operations | Topics: HPA / Autoscaling, Kubernetes Core | Domain: Kubernetes
# Runbook: HPA Thrashing (Rapid Scale Up/Down)
| Field | Value |
|---|---|
| Domain | Kubernetes |
| Alert | kube_horizontalpodautoscaler_status_current_replicas changing rapidly, or HPA events firing frequently |
| Severity | P3 |
| Est. Resolution Time | 20-40 minutes |
| Escalation Timeout | 45 minutes — page if not resolved |
| Last Tested | 2026-03-19 |
| Prerequisites | kubectl access, cluster-admin or namespace-admin, kubeconfig configured |
## Quick Assessment (30 seconds)

Run `kubectl get hpa <HPA_NAME> -n <NAMESPACE>` a few times in quick succession and check the output:

- **TARGETS metric is constantly fluctuating across refreshes** → the HPA is actively thrashing.
- **TARGETS metric is stable but the replica count changed recently** → this may be a one-time scale event, not thrashing. Monitor for 5 minutes before proceeding.

## Step 1: Observe HPA Status and Event History
Why: The HPA status and events show the exact scaling decisions being made and why. This is the ground truth for diagnosing thrashing.
```shell
# Get current HPA status
kubectl get hpa <HPA_NAME> -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE>

# Check HPA events specifically
kubectl get events -n <NAMESPACE> --field-selector involvedObject.name=<HPA_NAME> --sort-by='.lastTimestamp'
```
Example output showing a thrashing pattern:

```text
Name:            my-app-hpa
Namespace:       production
Reference:       Deployment/my-app
Metrics:         ( current / target )
  resource cpu on pods (as a percentage of request): 85% (850m) / 60%
Min replicas:    2
Max replicas:    20
Deployment pods: 8 available / 8 desired
Events:
  Normal  SuccessfulRescale  2m  horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  4m  horizontal-pod-autoscaler  New size: 7; reason: All metrics below target
  Normal  SuccessfulRescale  6m  horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  8m  horizontal-pod-autoscaler  New size: 7; reason: All metrics below target
```
## Step 2: Check Metrics Server Data
Why: HPA makes decisions based on metrics from the metrics server. If metrics are noisy, delayed, or stale, the HPA will make poor decisions. Confirm the metrics are real and current.
```shell
# Check if metrics-server is running
kubectl get pods -n kube-system | grep metrics-server

# Check current CPU/memory metrics directly
kubectl top pods -n <NAMESPACE> -l app=<APP_LABEL>

# Get the raw metrics the HPA is seeing
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/pods" | python3 -m json.tool | grep -A 5 '"name"'
```
- **If output shows `error: Metrics API not available`:** the metrics-server is down. Restart it with `kubectl rollout restart deployment/metrics-server -n kube-system`; the HPA cannot function without it.
- **If metrics fluctuate wildly on repeated `kubectl top` calls:** the application's CPU usage is genuinely bursty. Either the target utilization needs adjustment (Step 6) or the workload needs to be smoothed upstream.
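To turn "fluctuates wildly" into a number, you can compute the coefficient of variation (stddev/mean) over a handful of samples. This is an illustrative sketch, not part of the original runbook: the sample values are hypothetical millicore readings you would collect from repeated `kubectl top pods` calls.

```shell
# Hypothetical CPU readings (millicores) from five repeated `kubectl top pods` calls
samples="210 480 190 520 205"

# Mean and coefficient of variation via awk; a CV above ~0.3 suggests genuinely bursty load
echo "$samples" | awk '{
  n = NF
  for (i = 1; i <= n; i++) sum += $i
  mean = sum / n
  for (i = 1; i <= n; i++) ss += ($i - mean) ^ 2
  cv = sqrt(ss / n) / mean
  printf "mean=%.0fm cv=%.2f\n", mean, cv
}'
# → mean=321m cv=0.46
```

A CV this high means the raw signal oscillates far more than any reasonable stabilization window can hide, which points toward Step 6 (raising the target) rather than further window tuning.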
## Step 3: Check Scale Stabilization Window

Why: The `stabilizationWindowSeconds` setting in the HPA `behavior` block controls how long the HPA looks back over previous recommendations before scaling again. The controller default is 0 seconds for scale-up and 300 seconds for scale-down; if a window is explicitly set to 0, or the scale-down default has been overridden downward, the HPA will scale immediately in both directions, causing thrashing.
```shell
# Check the HPA spec for stabilization settings
kubectl get hpa <HPA_NAME> -n <NAMESPACE> -o yaml | grep -A 20 "behavior"
```
Example of a spec with no `behavior` block:

```yaml
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 60
  # No behavior block, so controller defaults apply
```
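For reference, these are the defaults the controller applies when no `behavior` block is set, per the Kubernetes HPA documentation (worth confirming against your control-plane version, as they can differ across releases):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 100
        periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0   # No damping on scale-up by default
    policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
    selectPolicy: Max
```

Note the asymmetry: scale-down is already damped by default, but scale-up reacts instantly, which is why an explicit scale-up window (Step 5) matters.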
## Step 4: Check Target Utilization vs Actual Load
Why: If the target CPU utilization percentage is set too low (e.g., 30%), even light load will keep triggering scale-ups, and the moment load drops even slightly, scale-downs begin — creating a perpetual oscillation.
```shell
# Check what CPU utilization % is currently observed
kubectl describe hpa <HPA_NAME> -n <NAMESPACE> | grep -E "cpu|utilization|target"

# Cross-reference with actual pod CPU requests
kubectl get deployment <DEPLOYMENT_NAME> -n <NAMESPACE> -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'

# Check CPU utilization over time (if Prometheus is available)
# Query: rate(container_cpu_usage_seconds_total{namespace="<NAMESPACE>",pod=~"<APP_LABEL>.*"}[5m])
```
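The oscillation falls directly out of the HPA's scaling formula from the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`. A sketch using the numbers from the Step 1 event log (7 replicas at 85% observed against a 60% target):

```shell
# HPA core formula: desired = ceil(current * metric / target)
# Values mirror the Step 1 example: 7 replicas, 85% observed CPU, 60% target
awk -v c=7 -v m=85 -v t=60 'BEGIN {
  d = c * m / t
  desired = int(d) + (d > int(d) ? 1 : 0)   # ceil()
  printf "desired=%d\n", desired
}'
# → desired=10
```

This matches the `New size: 10` events. The controller also applies a tolerance band (10% by default, via `--horizontal-pod-autoscaler-tolerance`), so the closer the target sits to steady-state utilization, the more often small load changes push the ratio outside that band and trigger a rescale.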
## Step 5: Adjust `stabilizationWindowSeconds`
Why: The stabilization window prevents the HPA from reacting to short-lived spikes. Adding a meaningful window (e.g., 180 seconds for scale-down, 60 seconds for scale-up) dramatically reduces thrashing without sacrificing responsiveness.
Add or modify the `behavior` block in the HPA spec:
```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min after a scale-up before scaling down
      policies:
        - type: Percent
          value: 25                    # Scale down no more than 25% at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60   # Wait 1 min before scaling up again
      policies:
        - type: Percent
          value: 100                   # Allow doubling replicas per minute
          periodSeconds: 60
```
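To see what the `Percent` scale-down policy above means in replica terms: with `value: 25` and `periodSeconds: 60`, the HPA may remove at most 25% of the current replicas per 60-second period. Illustrative arithmetic only (exact rounding is up to the controller); the replica count matches the Step 1 example:

```shell
# 25% of 8 current replicas = at most 2 removed per 60s period
awk -v c=8 -v p=25 'BEGIN {
  removed = int(c * p / 100)
  printf "max_removed=%d next_min=%d\n", removed, c - removed
}'
# → max_removed=2 next_min=6
```

So even if the metric recommends dropping straight from 8 to 2, the policy forces a gradual 8 → 6 → 5 → ... descent, giving the metric time to settle between steps.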
If `kubectl edit` is not available, apply the same change with `kubectl patch`:
```shell
kubectl patch hpa <HPA_NAME> -n <NAMESPACE> --type=merge -p '{
  "spec": {
    "behavior": {
      "scaleDown": {"stabilizationWindowSeconds": 300},
      "scaleUp": {"stabilizationWindowSeconds": 60}
    }
  }
}'
```
## Step 6: Tune `targetCPUUtilizationPercentage`
Why: A target utilization that is too close to steady-state CPU usage means the HPA will constantly be triggering — the pod is always near the threshold. A good target for most workloads is 50-70% of CPU, leaving headroom for bursts without constant scaling.
```shell
# For an HPA using the older autoscaling/v1 API (targetCPUUtilizationPercentage field):
kubectl patch hpa <HPA_NAME> -n <NAMESPACE> \
  -p '{"spec":{"targetCPUUtilizationPercentage":<NEW_TARGET_PERCENT>}}'

# For an HPA using autoscaling/v2 (metrics block):
kubectl edit hpa <HPA_NAME> -n <NAMESPACE>
# Change the averageUtilization value in spec.metrics[].resource.target
```
**Expected outcome:** After adjusting the target, the HPA stabilizes at a replica count that keeps actual utilization comfortably below the new target.

**If this fails:** The application may have fundamentally bursty load that no HPA tuning can smooth. Consider KEDA for event-driven autoscaling, or pre-warming pods on a cron schedule.
## Verification
```shell
# Watch HPA status for 5 minutes to confirm it has stabilized
watch -n 30 kubectl get hpa <HPA_NAME> -n <NAMESPACE>

# Confirm no new rapid scaling events
kubectl get events -n <NAMESPACE> --field-selector involvedObject.name=<HPA_NAME> --sort-by='.lastTimestamp'
```
**Success criteria:** TARGETS shows a stable percentage well below the threshold, and no new `SuccessfulRescale` events appear.

**If still broken:** Escalate; see below.
## Escalation
| Condition | Who to Page | What to Say |
|---|---|---|
| Not resolved in 45 min | SRE on-call | "Kubernetes HPA thrashing in …" |
| Data loss suspected | Platform Lead | "Data loss risk: stateful workload …" |
| Scope expanding beyond namespace | Platform team | "Multi-namespace impact: metrics-server instability causing HPA decisions across multiple namespaces" |
## Post-Incident
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
- Commit the updated HPA spec (with stabilization windows) to git — do not leave it only patched live
- Review whether the application's CPU requests are set accurately (HPA uses requests as the baseline)
- Add a Grafana dashboard panel showing HPA replica count over time to make thrashing visible before it becomes an incident
## Common Mistakes
- **Setting `targetCPUUtilizationPercentage` too low:** A value like 20-30% means the HPA considers the pod "overloaded" at very light CPU usage. Every small traffic variation triggers a scale-up, and the subsequent scale-down creates the thrashing pattern. The target should reflect the CPU utilization at which you actually want more pods; for most services, 50-70% is appropriate.
- **Missing `stabilizationWindowSeconds` configuration:** The scale-up stabilization window defaults to 0 seconds, so an HPA deployed without an explicit `behavior` block reacts to second-to-second CPU spikes. Engineers frequently omit the block and are then confused when scale-up thrashing occurs. Always configure stabilization windows explicitly, especially for scale-up.
- **Misconfigured CPU requests making HPA math wrong:** The HPA calculates utilization as `actual_cpu / requested_cpu * 100`. If `resources.requests.cpu` is set very low (e.g., 10m) but the pod actually uses 200m, the HPA sees 2000% utilization and reacts aggressively. Always ensure CPU requests reflect the pod's actual expected baseline CPU usage.
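The last mistake is easy to sanity-check with the utilization formula itself. A quick sketch using the hypothetical 10m-request / 200m-usage pod from the bullet above:

```shell
# Utilization as the HPA computes it: actual_cpu / requested_cpu * 100
# Hypothetical pod: requests.cpu=10m, actual usage 200m
awk -v actual_m=200 -v request_m=10 'BEGIN {
  printf "utilization=%d%%\n", actual_m / request_m * 100
}'
# → utilization=2000%
```

At 2000% against a 60% target, the scaling formula recommends jumping straight to `maxReplicas`; fixing the request to the true baseline (e.g., 200m) brings the same usage back to 100% and restores sane scaling math.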
## Cross-References
- Survival Guide: On-Call Survival Guide (pocket card version)
- Topic Pack: Kubernetes Topics (deep background)
- Related Runbook: oom-kill.md — if memory-based HPA is thrashing due to OOMKills during scale-up
- Related Runbook: deploy-stuck.md — if rapid scaling interferes with a rolling deployment
- Related Runbook: pod-crashloop.md — if newly scaled pods are crashing immediately
## Related Content
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — HPA / Autoscaling, Kubernetes Core
- Lab: HPA Live Scaling (CLI) (Lab, L1) — HPA / Autoscaling, Kubernetes Core
- Skillcheck: Kubernetes (Assessment, L1) — HPA / Autoscaling, Kubernetes Core
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Kubernetes Core
- Case Study: Alert Storm — Flapping Health Checks (Case Study, L2) — Kubernetes Core
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured (Case Study, L2) — Kubernetes Core
- Case Study: CrashLoopBackOff No Logs (Case Study, L1) — Kubernetes Core
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager (Case Study, L2) — Kubernetes Core
- Case Study: DaemonSet Blocks Eviction (Case Study, L2) — Kubernetes Core
- Case Study: Deployment Stuck — ImagePull Auth Failure, Vault Secret Rotation (Case Study, L2) — Kubernetes Core