Runbook: HPA Thrashing (Rapid Scale Up/Down)

Domain: Kubernetes
Alert: kube_horizontalpodautoscaler_status_current_replicas changing rapidly, or HPA events firing frequently
Severity: P3
Est. Resolution Time: 20-40 minutes
Escalation Timeout: 45 minutes — page if not resolved
Last Tested: 2026-03-19
Prerequisites: kubectl access, cluster-admin or namespace-admin, kubeconfig configured

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
kubectl get hpa -n <NAMESPACE>
If output shows: TARGETS metric constantly fluctuating (refresh a few times) → HPA is actively thrashing.
If output shows: TARGETS metric stable but replica count changed recently → This may be a one-time scale event, not thrashing — monitor for 5 minutes before proceeding.

Step 1: Observe HPA Status and Event History

Why: The HPA status and events show the exact scaling decisions being made and why. This is the ground truth for diagnosing thrashing.

# Get current HPA status
kubectl get hpa <HPA_NAME> -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE>

# Check HPA events specifically
kubectl get events -n <NAMESPACE> --field-selector involvedObject.name=<HPA_NAME> --sort-by='.lastTimestamp'
Expected output (thrashing HPA):
Name:                                                  my-app-hpa
Namespace:                                             production
Reference:                                             Deployment/my-app
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  85% (850m) / 60%
Min replicas:                                          2
Max replicas:                                          20
Deployment pods:                                       8 available / 8 desired

Events:
  Normal  SuccessfulRescale  2m   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  4m   horizontal-pod-autoscaler  New size: 7; reason: All metrics below target
  Normal  SuccessfulRescale  6m   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  8m   horizontal-pod-autoscaler  New size: 7; reason: All metrics below target
If events show alternating scale-up and scale-down every 2-5 minutes: This is thrashing — continue with Steps 2-6.
If events are infrequent (every 10+ minutes): This may not be thrashing — monitor and see if it stabilizes.
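A rough way to quantify "frequent" instead of eyeballing the event list (a sketch: the heredoc stands in for real describe/events output, and the threshold of 4 rescales per event window is an assumption, not an official cutoff):

```shell
# Count SuccessfulRescale events in the HPA event output. In practice,
# pipe `kubectl describe hpa <HPA_NAME> -n <NAMESPACE>` in place of the
# sample heredoc below.
count=$(grep -c 'SuccessfulRescale' <<'EOF'
Normal  SuccessfulRescale  2m   horizontal-pod-autoscaler  New size: 10
Normal  SuccessfulRescale  4m   horizontal-pod-autoscaler  New size: 7
Normal  SuccessfulRescale  6m   horizontal-pod-autoscaler  New size: 10
Normal  SuccessfulRescale  8m   horizontal-pod-autoscaler  New size: 7
EOF
)
if [ "$count" -ge 4 ]; then
  echo "thrashing: $count rescale events in window"
else
  echo "stable: $count rescale events in window"
fi
```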

Step 2: Check Metrics Server Data

Why: HPA makes decisions based on metrics from the metrics server. If metrics are noisy, delayed, or stale, the HPA will make poor decisions. Confirm the metrics are real and current.

# Check if metrics-server is running
kubectl get pods -n kube-system | grep metrics-server

# Check current CPU/memory metrics directly
kubectl top pods -n <NAMESPACE> -l app=<APP_LABEL>

# Get the raw metrics the HPA is seeing
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/pods" | python3 -m json.tool | grep -A 5 '"name"'
Expected output (metrics-server healthy):
NAME                    CPU(cores)   MEMORY(bytes)
my-app-5c4a2b1d9-abc    850m         245Mi
my-app-5c4a2b1d9-def    820m         238Mi
If kubectl top shows "error: Metrics API not available": The metrics-server is down — restart it with kubectl rollout restart deployment/metrics-server -n kube-system. HPA cannot function without it.
If metrics fluctuate wildly on repeated kubectl top calls: The application's CPU usage is genuinely bursty — the target utilization needs adjustment (Step 6) or the workload needs to be smoothed upstream.
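To tell "noisy metrics" from "genuinely bursty load", compare the min/max spread across repeated samples (a sketch; the millicore values in the heredoc are illustrative stand-ins for repeated kubectl top readings, not live data):

```shell
# Burstiness check: feed repeated per-pod CPU samples (millicores) from
# `kubectl top pods` into awk and compute the min-max spread.
spread=$(awk 'NR==1{min=$1;max=$1}{if($1<min)min=$1;if($1>max)max=$1}END{print max-min}' <<'EOF'
850
420
910
380
EOF
)
echo "min-max spread: ${spread}m"
# A spread of hundreds of millicores between samples taken seconds apart
# points to genuinely bursty load, not a broken metrics-server.
```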

Step 3: Check Scale Stabilization Window

Why: The stabilizationWindowSeconds setting in the HPA spec controls how long the HPA waits after a scale event before scaling again. If this is 0 or missing, the HPA will scale immediately in both directions, causing thrashing.

# Check the HPA spec for stabilization settings
kubectl get hpa <HPA_NAME> -n <NAMESPACE> -o yaml | grep -A 20 "behavior"
Expected output (missing stabilization — thrash-prone):
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 60
  # No behavior block — uses defaults
Default behavior: scale-up stabilization is 0 seconds (immediate); scale-down stabilization is 300 seconds (5 min).
If scale-up thrashing is occurring despite the 5-min scale-down window: The workload is genuinely oscillating — continue to Steps 4-6.
If the behavior block exists but stabilizationWindowSeconds is 0 or very low: This is the direct cause — fix it in Step 5.
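The oscillation follows directly from the HPA's documented scaling formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick sketch with the numbers from the describe output above (8 replicas at 85% against a 60% target):

```shell
# desired = ceil(current * currentUtil / targetUtil)
current=8; util=85; target=60
desired=$(( (current * util + target - 1) / target ))  # integer ceiling
echo "desired replicas: $desired"
# After scaling 8 -> 12, per-pod utilization falls to roughly
# 85 * 8 / 12 ~= 57%, just under the 60% target, so the next
# evaluation recommends scaling back down — that is the oscillation.
```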

Step 4: Check Target Utilization vs Actual Load

Why: If the target CPU utilization percentage is set too low (e.g., 30%), even light load will keep triggering scale-ups, and the moment load drops even slightly, scale-downs begin — creating a perpetual oscillation.

# Check what CPU utilization % is currently observed
kubectl describe hpa <HPA_NAME> -n <NAMESPACE> | grep -E "cpu|utilization|target"

# Cross-reference with actual pod CPU requests
kubectl get deployment <DEPLOYMENT_NAME> -n <NAMESPACE> -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'

# Check CPU utilization over time (if Prometheus is available)
# Query: rate(container_cpu_usage_seconds_total{namespace="<NAMESPACE>",pod=~"<APP_LABEL>.*"}[5m])
Expected output:
Metrics:                 ( current / target )
  resource cpu on pods:  85% (850m) / 60%
If current utilization is close to target (within 10-15 points): The HPA is constantly triggering because the margin is too thin. Raise the target to create headroom (Step 6).
If utilization swings between 20% and 120%: The workload is bursty — the scale-up target may be appropriate but the scale-down stabilization window needs extending (Step 5).
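The margin check can be made explicit (a sketch using the example numbers above, 85% observed against a 60% target; the 15-point "thin margin" threshold is an assumption from this step, not a Kubernetes default):

```shell
# How far is observed utilization from the target?
current=85; target=60
diff=$(( current - target ))
if [ "$diff" -gt 0 ]; then
  echo "over target by ${diff} points: scale-up will trigger"
elif [ "$((-diff))" -le 15 ]; then
  echo "within 15 points under target: thin margin, raise target (Step 6)"
else
  echo "comfortable headroom: $((-diff)) points under target"
fi
```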

Step 5: Adjust stabilizationWindowSeconds

Why: The stabilization window prevents the HPA from reacting to short-lived spikes. Adding a meaningful window (e.g., 180 seconds for scale-down, 60 seconds for scale-up) dramatically reduces thrashing without sacrificing responsiveness.

kubectl edit hpa <HPA_NAME> -n <NAMESPACE>
Add or modify the behavior block in the HPA spec:
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # Only scale down if metrics stay low for 5 min
      policies:
      - type: Percent
        value: 25                         # Scale down no more than 25% at a time
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60     # Ignore CPU spikes shorter than 1 min
      policies:
      - type: Percent
        value: 100                        # Allow doubling replicas per minute
        periodSeconds: 60
Expected outcome: HPA events should show scaling decisions 5+ minutes apart instead of every 2 minutes.
If this fails: Apply via patch if kubectl edit is not available:
kubectl patch hpa <HPA_NAME> -n <NAMESPACE> --type=merge -p '{
  "spec": {
    "behavior": {
      "scaleDown": {"stabilizationWindowSeconds": 300},
      "scaleUp": {"stabilizationWindowSeconds": 60}
    }
  }
}'

Step 6: Tune targetCPUUtilizationPercentage

Why: A target utilization that is too close to steady-state CPU usage means the HPA will constantly be triggering — the pod is always near the threshold. A good target for most workloads is 50-70% of CPU, leaving headroom for bursts without constant scaling.

# For HPA using the older autoscaling/v1 API (targetCPUUtilizationPercentage field):
kubectl patch hpa <HPA_NAME> -n <NAMESPACE> \
  -p '{"spec":{"targetCPUUtilizationPercentage":<NEW_TARGET_PERCENT>}}'

# For HPA using autoscaling/v2 (metrics block):
kubectl edit hpa <HPA_NAME> -n <NAMESPACE>
# Change the averageUtilization value in spec.metrics[].resource.target
Recommended target utilization values:
  • CPU-bound stateless services: 60-70%
  • Memory-bound services: 70-80% (memory-based HPA)
  • Low-latency/interactive services: 40-50% (to have surge capacity)
  • Batch jobs: 80-90% (cost-optimized, latency is less critical)
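For autoscaling/v2, a non-interactive alternative to kubectl edit is a merge patch (a sketch: the averageUtilization value of 65 is an example, and note that a JSON merge patch replaces the entire spec.metrics list, so include every metric the HPA should keep, not just CPU):

```shell
# Build and validate the patch payload before applying it.
PATCH='{"spec":{"metrics":[{"type":"Resource","resource":{"name":"cpu","target":{"type":"Utilization","averageUtilization":65}}}]}}'
python3 -c 'import json,sys; json.loads(sys.argv[1])' "$PATCH" && echo "patch JSON ok"
# Then apply it (requires cluster access):
# kubectl patch hpa <HPA_NAME> -n <NAMESPACE> --type=merge -p "$PATCH"
```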

Expected outcome: After adjusting the target, the HPA stabilizes at a replica count that keeps actual utilization comfortably below the new target.
If this fails: The application may have fundamentally bursty load that no HPA tuning can smooth — consider using KEDA for event-driven autoscaling, or pre-warming pods on a cron schedule.

Verification

# Watch HPA status for 5 minutes to confirm it has stabilized
watch -n 30 kubectl get hpa <HPA_NAME> -n <NAMESPACE>

# Confirm no new rapid scaling events
kubectl get events -n <NAMESPACE> --field-selector involvedObject.name=<HPA_NAME> --sort-by='.lastTimestamp'
Success looks like: Replica count holds steady (or changes slowly, once per 5+ minutes), and TARGETS shows a stable percentage well below the threshold.
If still broken: Escalate — see below.
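The "holds steady" judgment can be scripted instead of eyeballed (a sketch: the sample list stands in for replica counts read every 30 seconds via kubectl get hpa <HPA_NAME> -n <NAMESPACE> -o jsonpath='{.status.currentReplicas}'):

```shell
# Stability check: did the replica count change at all across the window?
samples="8 8 8 8 8 8 8 8 8 8"
distinct=$(printf '%s\n' $samples | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "stable: one replica count across all samples"
else
  echo "still changing: $distinct distinct replica counts"
fi
```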

Escalation

  • Not resolved in 45 min → page SRE on-call: "Kubernetes HPA thrashing in <NAMESPACE>, HPA <HPA_NAME>, constant scale events, runbook exhausted"
  • Data loss suspected → page Platform Lead: "Data loss risk: stateful workload thrashing, in-flight requests being dropped during scale events"
  • Scope expanding beyond namespace → page Platform team: "Multi-namespace impact: metrics-server instability causing HPA decisions across multiple namespaces"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete
  • Commit the updated HPA spec (with stabilization windows) to git — do not leave it only patched live
  • Review whether the application's CPU requests are set accurately (HPA uses requests as the baseline)
  • Add a Grafana dashboard panel showing HPA replica count over time to make thrashing visible before it becomes an incident

Common Mistakes

  1. Setting targetCPUUtilizationPercentage too low: A value like 20-30% means the HPA considers the pod "overloaded" at very light CPU usage. Every small traffic variation will trigger a scale-up, and the subsequent scale-down creates the thrashing pattern. The target should reflect the CPU utilization at which you actually want more pods — for most services, 50-70% is appropriate.
  2. Missing stabilizationWindowSeconds configuration: The HPA behavior block with explicit stabilization windows is not configured by default for scale-up (default is 0 seconds). Engineers frequently deploy HPAs without a behavior block and are then confused when scale-up thrashing occurs. Always configure stabilization windows explicitly, especially for scale-up, so the HPA does not react to second-to-second CPU spikes.
  3. Misconfigured CPU requests making HPA math wrong: HPA calculates utilization as actual_cpu / requested_cpu * 100. If resources.requests.cpu is set very low (e.g., 10m) but the pod actually uses 200m, the HPA sees 2000% utilization and reacts aggressively. Always ensure CPU requests reflect the actual expected baseline CPU usage of the pod.
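The arithmetic in mistake 3 is easy to sanity-check (a sketch using the numbers from the example: 200m actual usage against a 10m request, with the 60% target from earlier steps):

```shell
# utilization = actual_cpu / requested_cpu * 100
usage_m=200; request_m=10
util=$(( usage_m * 100 / request_m ))
echo "HPA sees ${util}% utilization"
# Against a 60% target, the scaling formula recommends
# ceil(replicas * 2000 / 60), i.e. an immediate jump toward maxReplicas.
```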
