Runbook: HPA Thrashing (Rapid Scale Up/Down)

Domain: Kubernetes
Alert: kube_horizontalpodautoscaler_status_current_replicas changing rapidly, or HPA events firing frequently
Severity: P3
Est. Resolution Time: 20-40 minutes
Escalation Timeout: 45 minutes — page if not resolved
Last Tested: 2026-03-19
Prerequisites: kubectl access, cluster-admin or namespace-admin, kubeconfig configured

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
kubectl get hpa -n <NAMESPACE>
If output shows: TARGETS metric constantly fluctuating (refresh a few times) → HPA is actively thrashing.
If output shows: TARGETS metric stable but replica count changed recently → This may be a one-time scale event, not thrashing — monitor for 5 minutes before proceeding.

Step 1: Observe HPA Status and Event History

Why: The HPA status and events show the exact scaling decisions being made and why. This is the ground truth for diagnosing thrashing.

# Get current HPA status
kubectl get hpa <HPA_NAME> -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE>

# Check HPA events specifically
kubectl get events -n <NAMESPACE> --field-selector involvedObject.name=<HPA_NAME> --sort-by='.lastTimestamp'
Expected output (thrashing HPA):
Name:                                                  my-app-hpa
Namespace:                                             production
Reference:                                             Deployment/my-app
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  85% (850m) / 60%
Min replicas:                                          2
Max replicas:                                          20
Deployment pods:                                       8 available / 8 desired

Events:
  Normal  SuccessfulRescale  2m   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  4m   horizontal-pod-autoscaler  New size: 7; reason: All metrics below target
  Normal  SuccessfulRescale  6m   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  8m   horizontal-pod-autoscaler  New size: 7; reason: All metrics below target
If events show alternating scale-up and scale-down every 2-5 minutes: This is thrashing — continue with Steps 2-6.
If events are infrequent (every 10+ minutes): This may not be thrashing — monitor and see if it stabilizes.
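A rough way to quantify "frequent" instead of eyeballing the event list (a sketch: the heredoc stands in for real describe/events output, and the threshold of 4 rescales per event window is an assumption, not an official cutoff):

```shell
# Count SuccessfulRescale events in the HPA event output. In practice,
# pipe `kubectl describe hpa <HPA_NAME> -n <NAMESPACE>` in place of the
# sample heredoc below.
count=$(grep -c 'SuccessfulRescale' <<'EOF'
Normal  SuccessfulRescale  2m   horizontal-pod-autoscaler  New size: 10
Normal  SuccessfulRescale  4m   horizontal-pod-autoscaler  New size: 7
Normal  SuccessfulRescale  6m   horizontal-pod-autoscaler  New size: 10
Normal  SuccessfulRescale  8m   horizontal-pod-autoscaler  New size: 7
EOF
)
if [ "$count" -ge 4 ]; then
  echo "thrashing: $count rescale events in window"
else
  echo "stable: $count rescale events in window"
fi
```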

Step 2: Check Metrics Server Data

Why: HPA makes decisions based on metrics from the metrics server. If metrics are noisy, delayed, or stale, the HPA will make poor decisions. Confirm the metrics are real and current.

# Check if metrics-server is running
kubectl get pods -n kube-system | grep metrics-server

# Check current CPU/memory metrics directly
kubectl top pods -n <NAMESPACE> -l app=<APP_LABEL>

# Get the raw metrics the HPA is seeing
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/pods" | python3 -m json.tool | grep -A 5 '"name"'
Expected output (metrics-server healthy):
NAME                    CPU(cores)   MEMORY(bytes)
my-app-5c4a2b1d9-abc    850m         245Mi
my-app-5c4a2b1d9-def    820m         238Mi
If kubectl top shows "error: Metrics API not available": The metrics-server is down — restart it with kubectl rollout restart deployment/metrics-server -n kube-system. HPA cannot function without it.
If metrics fluctuate wildly on repeated kubectl top calls: The application's CPU usage is genuinely bursty — the target utilization needs adjustment (Step 6) or the workload needs to be smoothed upstream.
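To tell "noisy metrics" from "genuinely bursty load", compare the min/max spread across repeated samples (a sketch; the millicore values in the heredoc are illustrative stand-ins for repeated kubectl top readings, not live data):

```shell
# Burstiness check: feed repeated per-pod CPU samples (millicores) from
# `kubectl top pods` into awk and compute the min-max spread.
spread=$(awk 'NR==1{min=$1;max=$1}{if($1<min)min=$1;if($1>max)max=$1}END{print max-min}' <<'EOF'
850
420
910
380
EOF
)
echo "min-max spread: ${spread}m"
# A spread of hundreds of millicores between samples taken seconds apart
# points to genuinely bursty load, not a broken metrics-server.
```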

Step 3: Check Scale Stabilization Window

Why: The stabilizationWindowSeconds setting in the HPA spec controls how long the HPA waits after a scale event before scaling again. If this is 0 or missing, the HPA will scale immediately in both directions, causing thrashing.

# Check the HPA spec for stabilization settings
kubectl get hpa <HPA_NAME> -n <NAMESPACE> -o yaml | grep -A 20 "behavior"
Expected output (missing stabilization — thrash-prone):
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 60
  # No behavior block — uses defaults
Default behavior: scale-up stabilization is 0 seconds (immediate); scale-down stabilization is 300 seconds (5 min).
If scale-up thrashing is occurring despite the 5-min scale-down window: The workload is genuinely oscillating — continue to Steps 4-6.
If the behavior block exists but stabilizationWindowSeconds is 0 or very low: This is the direct cause — fix it in Step 5.
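The oscillation follows directly from the HPA's documented scaling formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick sketch with the numbers from the describe output above (8 replicas at 85% against a 60% target):

```shell
# desired = ceil(current * currentUtil / targetUtil)
current=8; util=85; target=60
desired=$(( (current * util + target - 1) / target ))  # integer ceiling
echo "desired replicas: $desired"
# After scaling 8 -> 12, per-pod utilization falls to roughly
# 85 * 8 / 12 ~= 57%, just under the 60% target, so the next
# evaluation recommends scaling back down — that is the oscillation.
```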

Step 4: Check Target Utilization vs Actual Load

Why: If the target CPU utilization percentage is set too low (e.g., 30%), even light load will keep triggering scale-ups, and the moment load drops even slightly, scale-downs begin — creating a perpetual oscillation.

# Check what CPU utilization % is currently observed
kubectl describe hpa <HPA_NAME> -n <NAMESPACE> | grep -E "cpu|utilization|target"

# Cross-reference with actual pod CPU requests
kubectl get deployment <DEPLOYMENT_NAME> -n <NAMESPACE> -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'

# Check CPU utilization over time (if Prometheus is available)
# Query: rate(container_cpu_usage_seconds_total{namespace="<NAMESPACE>",pod=~"<APP_LABEL>.*"}[5m])
Expected output:
Metrics:                 ( current / target )
  resource cpu on pods:  85% (850m) / 60%
If current utilization is close to target (within 10-15 points): The HPA is constantly triggering because the margin is too thin. Raise the target to create headroom (Step 6).
If utilization swings between 20% and 120%: The workload is bursty — the scale-up target may be appropriate but the scale-down stabilization window needs extending (Step 5).
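The margin check can be made explicit (a sketch using the example numbers above, 85% observed against a 60% target; the 15-point "thin margin" threshold is an assumption from this step, not a Kubernetes default):

```shell
# How far is observed utilization from the target?
current=85; target=60
diff=$(( current - target ))
if [ "$diff" -gt 0 ]; then
  echo "over target by ${diff} points: scale-up will trigger"
elif [ "$((-diff))" -le 15 ]; then
  echo "within 15 points under target: thin margin, raise target (Step 6)"
else
  echo "comfortable headroom: $((-diff)) points under target"
fi
```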

Step 5: Adjust stabilizationWindowSeconds

Why: The stabilization window prevents the HPA from reacting to short-lived spikes. Adding a meaningful window (e.g., 180 seconds for scale-down, 60 seconds for scale-up) dramatically reduces thrashing without sacrificing responsiveness.

kubectl edit hpa <HPA_NAME> -n <NAMESPACE>
Add or modify the behavior block in the HPA spec:
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # Only scale down if metrics stay low for 5 min
      policies:
      - type: Percent
        value: 25                         # Scale down no more than 25% at a time
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60     # Ignore CPU spikes shorter than 1 min
      policies:
      - type: Percent
        value: 100                        # Allow doubling replicas per minute
        periodSeconds: 60
Expected outcome: HPA events should show scaling decisions 5+ minutes apart instead of every 2 minutes.
If this fails: Apply via patch if kubectl edit is not available:
kubectl patch hpa <HPA_NAME> -n <NAMESPACE> --type=merge -p '{
  "spec": {
    "behavior": {
      "scaleDown": {"stabilizationWindowSeconds": 300},
      "scaleUp": {"stabilizationWindowSeconds": 60}
    }
  }
}'

Step 6: Tune targetCPUUtilizationPercentage

Why: A target utilization that is too close to steady-state CPU usage means the HPA will constantly be triggering — the pod is always near the threshold. A good target for most workloads is 50-70% of CPU, leaving headroom for bursts without constant scaling.

# For HPA using the older autoscaling/v1 API (targetCPUUtilizationPercentage field):
kubectl patch hpa <HPA_NAME> -n <NAMESPACE> \
  -p '{"spec":{"targetCPUUtilizationPercentage":<NEW_TARGET_PERCENT>}}'

# For HPA using autoscaling/v2 (metrics block):
kubectl edit hpa <HPA_NAME> -n <NAMESPACE>
# Change the averageUtilization value in spec.metrics[].resource.target
Recommended target utilization values:
  • CPU-bound stateless services: 60-70%
  • Memory-bound services: 70-80% (memory-based HPA)
  • Low-latency/interactive services: 40-50% (to have surge capacity)
  • Batch jobs: 80-90% (cost-optimized, latency is less critical)
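For autoscaling/v2, a non-interactive alternative to kubectl edit is a merge patch (a sketch: the averageUtilization value of 65 is an example, and note that a JSON merge patch replaces the entire spec.metrics list, so include every metric the HPA should keep, not just CPU):

```shell
# Build and validate the patch payload before applying it.
PATCH='{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":65}}}]}}'
python3 -c 'import json,sys; json.loads(sys.argv[1])' "$PATCH" && echo "patch JSON ok"
# Then apply it (requires cluster access):
# kubectl patch hpa <HPA_NAME> -n <NAMESPACE> --type=merge -p "$PATCH"
```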

Expected outcome: After adjusting the target, the HPA stabilizes at a replica count that keeps actual utilization comfortably below the new target.
If this fails: The application may have fundamentally bursty load that no HPA tuning can smooth — consider using KEDA for event-driven autoscaling, or pre-warming pods on a cron schedule.

Verification

# Watch HPA status for 5 minutes to confirm it has stabilized
watch -n 30 kubectl get hpa <HPA_NAME> -n <NAMESPACE>

# Confirm no new rapid scaling events
kubectl get events -n <NAMESPACE> --field-selector involvedObject.name=<HPA_NAME> --sort-by='.lastTimestamp'
Success looks like: Replica count holds steady (or changes slowly, once per 5+ minutes), and TARGETS shows a stable percentage well below the threshold.
If still broken: Escalate — see below.
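The "holds steady" judgment can be scripted instead of eyeballed (a sketch: the sample list stands in for replica counts read every 30 seconds via kubectl get hpa <HPA_NAME> -n <NAMESPACE> -o jsonpath='{.status.currentReplicas}'):

```shell
# Stability check: did the replica count change at all across the window?
samples="8 8 8 8 8 8 8 8 8 8"
distinct=$(printf '%s\n' $samples | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "stable: one replica count across all samples"
else
  echo "still changing: $distinct distinct replica counts"
fi
```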

Escalation

  • Not resolved in 45 min → page SRE on-call: "Kubernetes HPA thrashing in <NAMESPACE>, HPA <HPA_NAME>, constant scale events, runbook exhausted"
  • Data loss suspected → page Platform Lead: "Data loss risk: stateful workload thrashing, in-flight requests being dropped during scale events"
  • Scope expanding beyond namespace → page Platform team: "Multi-namespace impact: metrics-server instability causing HPA decisions across multiple namespaces"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete
  • Commit the updated HPA spec (with stabilization windows) to git — do not leave it only patched live
  • Review whether the application's CPU requests are set accurately (HPA uses requests as the baseline)
  • Add a Grafana dashboard panel showing HPA replica count over time to make thrashing visible before it becomes an incident

Common Mistakes

  1. Setting targetCPUUtilizationPercentage too low: A value like 20-30% means the HPA considers the pod "overloaded" at very light CPU usage. Every small traffic variation will trigger a scale-up, and the subsequent scale-down creates the thrashing pattern. The target should reflect the CPU utilization at which you actually want more pods — for most services, 50-70% is appropriate.
  2. Missing stabilizationWindowSeconds configuration: The HPA behavior block with explicit stabilization windows is not configured by default for scale-up (default is 0 seconds). Engineers frequently deploy HPAs without a behavior block and are then confused when scale-up thrashing occurs. Always configure stabilization windows explicitly, especially for scale-up, so the HPA does not react to second-to-second CPU spikes.
  3. Misconfigured CPU requests making HPA math wrong: HPA calculates utilization as actual_cpu / requested_cpu * 100. If resources.requests.cpu is set very low (e.g., 10m) but the pod actually uses 200m, the HPA sees 2000% utilization and reacts aggressively. Always ensure CPU requests reflect the actual expected baseline CPU usage of the pod.
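The arithmetic in mistake 3 is easy to sanity-check (a sketch using the numbers from the example: 200m actual usage against a 10m request, with the 60% target from earlier steps):

```shell
# utilization = actual_cpu / requested_cpu * 100
usage_m=200; request_m=10
util=$(( usage_m * 100 / request_m ))
echo "HPA sees ${util}% utilization"
# Against a 60% target, the scaling formula recommends
# ceil(replicas * 2000 / 60), i.e. an immediate jump toward maxReplicas.
```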
