Incident Replay: Resource Quota Blocking Deployment

Setup

  • System context: Multi-tenant Kubernetes cluster with ResourceQuotas per namespace. A team's deployment scaling event is rejected by the admission controller.
  • Time: Wednesday 11:00 UTC
  • Your role: Platform engineer

Round 1: Alert Fires

[Pressure cue: "Frontend team reports their HPA cannot scale beyond 6 replicas. Traffic is spiking from a marketing campaign. Users are seeing slow page loads."]

What you see: kubectl get events -n frontend shows: "Error creating: pods is forbidden: exceeded quota: frontend-quota, requested: cpu=500m, used: cpu=3000m, limited: cpu=3000m." HPA wants 8 replicas but only 6 can run.

Choose your action:

  • A) Increase the ResourceQuota for the frontend namespace
  • B) Check the current resource usage vs quota limits
  • C) Reduce the CPU request per pod to fit more replicas
  • D) Remove the ResourceQuota entirely

[Result: kubectl describe quota frontend-quota -n frontend shows: cpu used=3000m/limited=3000m, memory used=3Gi/limited=4Gi. Each pod requests 500m CPU, so 6 pods consume the full 3000m; a 7th pod would push usage to 3500m, over the limit. The quota was set for a pre-HPA era when the deployment was fixed at 5 replicas. Proceed to Round 2.]

If you chose A:

[Result: Increasing the quota works, but you should understand current usage and the HPA target range first; a blind increase may over-provision the namespace.]

If you chose C:

[Result: Reducing requests would fit more replicas under the quota, but the pods genuinely use that CPU during traffic spikes; under-requesting leads to node overcommit and CPU starvation exactly when load peaks.]

If you chose D:

[Result: Removing the quota entirely defeats the purpose of multi-tenancy. Other namespaces could be starved.]
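The admission math behind the rejection is simple: the quota controller admits a pod only if current usage plus the pod's request stays within the hard limit. A short Python sketch (a hypothetical helper, not part of any kubectl tooling) makes the cutoff explicit:

```python
def pod_fits(used_millicores: int, requested_millicores: int, limit_millicores: int) -> bool:
    """ResourceQuota admission rule: a pod is admitted only if
    current usage plus its request stays within the hard limit."""
    return used_millicores + requested_millicores <= limit_millicores

# With 500m pods against a 3000m hard limit:
print(pod_fits(2500, 500, 3000))  # 6th pod: True (3000m <= 3000m)
print(pod_fits(3000, 500, 3000))  # 7th pod: False (3500m > 3000m)
```

This is why the namespace caps at exactly limit // per-pod-request replicas: 3000m / 500m = 6.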

Round 2: First Triage Data

[Pressure cue: "Marketing campaign is peaking. Frontend needs to scale to 10 replicas minimum. Current quota caps at 6."]

What you see: HPA max replicas is 10. Each pod needs 500m CPU and 500Mi memory. To support 10 replicas: 5000m CPU and 5Gi memory needed. Current quota: 3000m CPU and 4Gi memory.

Choose your action:

  • A) Increase quota to cpu=6000m, memory=6Gi (with headroom)
  • B) Increase quota to exactly 5000m/5Gi (no headroom)
  • C) Split the frontend into two namespaces with separate quotas
  • D) Temporarily remove the quota during the traffic spike

[Result: kubectl apply updated ResourceQuota with cpu=6000m, memory=6Gi. HPA immediately scales to 8 replicas, then 10 as traffic grows. Frontend response times improve. Proceed to Round 3.]

If you chose B:

[Result: Exact limits leave no room for rolling updates or temporary burst. During a rolling deploy, old and new pods coexist briefly, so the namespace needs capacity for up to maxSurge extra pods beyond steady state.]
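The coexistence cost is easy to quantify: during a rollout the namespace briefly holds the steady-state replicas plus up to maxSurge new pods, all of whose requests count against the quota. A sketch assuming maxSurge: 1 (the helper name below is for illustration only):

```python
def rollout_peak_cpu_m(replicas: int, per_pod_cpu_m: int, max_surge: int = 1) -> int:
    """Peak CPU requests during a rolling update: steady-state replicas
    plus up to max_surge new pods coexist until old pods terminate."""
    return (replicas + max_surge) * per_pod_cpu_m

# At 10 replicas x 500m, an exact 5000m quota blocks the surge pod:
print(rollout_peak_cpu_m(10, 500))  # 5500
```

With the Deployment default of maxSurge: 25% the surge on a 10-replica rollout is 3 pods, so sizing quota to the exact HPA ceiling stalls every deploy.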

If you chose C:

[Result: Splitting the namespace is architectural overkill for a capacity-planning issue; it adds quota and RBAC overhead without changing total demand.]

If you chose D:

[Result: No quota means the frontend could consume resources needed by other teams. Unsafe in multi-tenant clusters.]
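Option A's numbers fall out of the sizing rule documented later in Round 3: HPA maxReplicas times per-pod requests, plus headroom. A minimal sketch (assumed helper names; memory is computed in Mi, and the 6000Mi result was rounded up to the 6Gi applied here):

```python
import math

def quota_for_hpa(max_replicas: int, per_pod_cpu_m: int, per_pod_mem_mi: int,
                  headroom: float = 0.20) -> tuple[int, int]:
    """Size a namespace quota from the HPA ceiling plus a headroom
    fraction for rolling updates and temporary bursts."""
    cpu = math.ceil(max_replicas * per_pod_cpu_m * (1 + headroom))
    mem = math.ceil(max_replicas * per_pod_mem_mi * (1 + headroom))
    return cpu, mem

cpu_m, mem_mi = quota_for_hpa(max_replicas=10, per_pod_cpu_m=500, per_pod_mem_mi=500)
print(cpu_m, mem_mi)  # 6000 6000 -> cpu=6000m, memory=6000Mi (~5.9Gi)
```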

Round 3: Root Cause Identification

[Pressure cue: "Frontend is scaling. Why was the quota too low?"]

What you see: Root cause: The ResourceQuota was set during initial namespace creation when the deployment was a fixed 5 replicas with no HPA. When HPA was added later, nobody updated the quota to accommodate the scaling range. The quota review was not part of the HPA enablement checklist.

Choose your action:

  • A) Add quota review to the HPA enablement checklist
  • B) Set quotas based on HPA maxReplicas * per-pod-requests + 20% headroom
  • C) Add alerting when quota usage exceeds 80%
  • D) All of the above

[Result: Checklist updated, formula documented, alerting added. Future HPA-enabled services will have appropriate quotas from the start. Proceed to Round 4.]

If you chose A:

[Result: Process improvement but no formula for calculating the right quota.]

If you chose B:

[Result: Good formula but needs alerting to catch drift.]

If you chose C:

[Result: Alerting catches issues proactively but does not fix the process gap.]
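The 80% alert from option C reduces to a per-resource ratio check. In production this would typically live in a Prometheus rule over quota metrics, but the logic (with hypothetical names) is:

```python
def quota_alerts(usage: dict[str, int], limits: dict[str, int],
                 threshold: float = 0.80) -> list[str]:
    """Return the resources whose quota usage crosses the alert threshold."""
    return [r for r, used in usage.items()
            if limits.get(r) and used / limits[r] >= threshold]

# Round 1 state: cpu pinned at the hard limit, memory still comfortable.
print(quota_alerts({"cpu": 3000, "memory": 3072}, {"cpu": 3000, "memory": 4096}))  # ['cpu']
```

Firing at 80% rather than at the hard limit gives the platform team time to review before the admission controller starts rejecting pods.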

Round 4: Remediation

[Pressure cue: "Frontend scaled successfully. Campaign traffic handled. Close."]

Actions:

  1. Verify the HPA is scaling as expected: kubectl get hpa -n frontend
  2. Verify all pods are Running: kubectl get pods -n frontend
  3. Verify quota headroom: kubectl describe quota -n frontend
  4. Update the quota formula documentation
  5. Add quota usage monitoring with 80% threshold alerts

Damage Report

  • Total downtime: 0 (service ran at degraded capacity)
  • Blast radius: Frontend response times degraded by 3x for 20 minutes; user experience impacted during marketing campaign peak
  • Optimal resolution time: 8 minutes (check quota -> calculate need -> increase -> verify)
  • If every wrong choice was made: 60+ minutes with over-engineering and multi-tenant disruption

Cross-References