
Runbook: Cloud Capacity Limit Hit

Domain: Cloud/Terraform
Alert: Resource creation failing with a quota/limit error, or a scaling event failing
Severity: P1 (if blocking scaling during an incident), P2 (if blocking new deployments)
Est. Resolution Time: 30-120 minutes
Escalation Timeout: 30 minutes — page if not resolved (quota increases require human action with the cloud provider)
Last Tested: 2026-03-19
Prerequisites: Cloud provider CLI, cloud console access, ability to submit quota increase requests

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
# The quota error appears in the output of the operation that failed (Terraform
# apply, run-instances, a scaling activity), and it names the specific quota —
# e.g. "You have requested more vCPU capacity than your current vCPU limit"
grep -i "LimitExceeded\|QuotaExceeded\|quota" <path-to-terraform-or-deploy-log>
# Note: read-only calls such as `aws ec2 describe-instances` only fail with
# RequestLimitExceeded when the API is being throttled — that is rate limiting,
# not a capacity quota
echo "Read the exact error message — it names the quota that was hit"
If output shows "LimitExceeded" or "QuotaExceeded" with a specific resource type → You've confirmed the limit; note the quota name and proceed to Step 1.
If output shows a different error (insufficient permissions, wrong region, missing VPC) → This is a different problem, not a quota issue — check your Terraform/deployment config.
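The keyword check above can be exercised offline against a saved failure log. A minimal sketch — the log path and its contents are illustrative stand-ins, not real output from your environment:

```shell
# Sketch: extract the quota-related line from a saved deployment log.
# /tmp/deploy.log and its contents are illustrative stand-ins.
cat > /tmp/deploy.log <<'EOF'
Error: creating EC2 Instance: VcpuLimitExceeded: You have requested more
vCPU capacity than your current vCPU limit of 32 allows for the instance
bucket that the specified instance type belongs to.
EOF

# Grab the first line naming a limit/quota error code:
quota_line=$(grep -iE 'LimitExceeded|QuotaExceeded|quota' /tmp/deploy.log | head -1)
echo "$quota_line"
```

The extracted error code (here `VcpuLimitExceeded`) is what you carry into Step 1.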

Step 1: Identify the Exact Quota That Was Hit

Why: Cloud providers have hundreds of different quotas. Knowing the exact quota name allows you to check the current usage, find workarounds, and submit the correct increase request.

# Read the full error message carefully — it will name the quota:
# Examples:
#   AWS: "You have requested more instances (X) than your current instance limit (Y) allows"
#   AWS: "The maximum number of VPCs has been reached"
#   GCP: "Quota 'CPUS_ALL_REGIONS' exceeded. Limit: 24.0, got: 32.0"
#   Azure: "Operation could not be completed as it results in exceeding approved Total Regional vCPUs quota"

# AWS — list all current quotas and their limits for EC2:
aws service-quotas list-service-quotas --service-code ec2 \
  --output json | jq '.Quotas[] | {QuotaName, Value, Adjustable}' | head -50

# Find a specific quota by name (example: vCPU limit):
aws service-quotas list-service-quotas --service-code ec2 \
  --output json | jq '.Quotas[] | select(.QuotaName | contains("vCPU")) | {QuotaName, Value, QuotaCode}'

# GCP — check current quotas:
gcloud compute project-info describe --format=json | jq '.quotas[] | {metric, limit, usage}'

# Azure — check VM quota:
az vm list-usage --location <REGION> --output table | grep -i "vCPU\|cores"
Expected output:
AWS service-quotas output:
  {
    "QuotaName": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
    "Value": 32,
    "QuotaCode": "L-1216C47A",
    "Adjustable": true
  }

GCP quotas output:
  {"metric": "CPUS", "limit": 24, "usage": 24}   ← at the limit
If this fails: If aws service-quotas returns no results, the quota may be a region-specific limit. Add --region <REGION> to the command, or check the AWS console: Service Quotas → Amazon EC2.
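The jq filtering above can be tried against a saved quota dump. A sketch, where /tmp/quotas.json mimics the shape of the `list-service-quotas` response and the values are illustrative:

```shell
# Sketch: filter a saved quota dump for adjustable quotas matching a keyword.
cat > /tmp/quotas.json <<'EOF'
{"Quotas": [
  {"QuotaName": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
   "QuotaCode": "L-1216C47A", "Value": 32, "Adjustable": true},
  {"QuotaName": "EC2-VPC Elastic IPs",
   "QuotaCode": "L-0263D0A3", "Value": 5, "Adjustable": true}
]}
EOF

# Keep only adjustable quotas whose name matches the keyword:
match=$(jq -r '.Quotas[]
               | select(.Adjustable and (.QuotaName | test("Standard")))
               | "\(.QuotaCode) limit=\(.Value)"' /tmp/quotas.json)
echo "$match"
```

The `QuotaCode` this prints is the value you plug into `<QUOTA_CODE>` in later steps.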

Step 2: Check Current Usage vs. the Limit

Why: Before requesting a limit increase, confirm you are actually at the limit (not a configuration error). Also, understanding current usage helps you find resources to free up as an immediate workaround.

# AWS — check current EC2 instance usage by type:
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running,pending,stopping,stopped" \
  --output json | jq '.Reservations[].Instances[] | {InstanceType, State: .State.Name}' | \
  jq -s 'group_by(.InstanceType) | map({type: .[0].InstanceType, count: length})'

# Check a specific service quota's current usage vs limit:
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code <QUOTA_CODE> \
  --output json | jq '.Quota | {QuotaName, Value}'

# GCP — check vCPU usage in a region:
gcloud compute regions describe <REGION> \
  --format="json" | jq '.quotas[] | select(.metric=="CPUS") | {limit, usage}'

# Azure — check current vCPU usage:
az vm list-usage --location <REGION> \
  --output json | jq '.[] | select(.name.value | contains("vCPU")) | {name: .name.localizedValue, current: .currentValue, limit: .limit}'
Expected output:
AWS instance count per type:
  [{"type": "t3.medium", "count": 12}, {"type": "m5.large", "count": 20}]

AWS quota check:
  {"QuotaName": "Running On-Demand Standard instances", "Value": 32}
  → This quota is measured in vCPUs, not instances: multiply each instance
    count by that type's vCPU count (2 for both t3.medium and m5.large) and
    sum. If the total equals 32, you are at the limit.

GCP:
  {"limit": 24, "usage": 24}   ← confirmed at limit
If this fails: If you cannot determine current usage programmatically, check the cloud console quota dashboard: AWS → Service Quotas → EC2; GCP → IAM & Admin → Quotas; Azure → Subscriptions → Usage + quotas.
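Because the Standard-instance quota is counted in vCPUs, instance counts have to be converted. A minimal sketch of that conversion, run against sample JSON shaped like a `describe-instances` response (file and values are illustrative):

```shell
cat > /tmp/instances.json <<'EOF'
{"Reservations": [{"Instances": [
  {"InstanceType": "t3.medium", "State": {"Name": "running"},
   "CpuOptions": {"CoreCount": 1, "ThreadsPerCore": 2}},
  {"InstanceType": "m5.large", "State": {"Name": "running"},
   "CpuOptions": {"CoreCount": 1, "ThreadsPerCore": 2}},
  {"InstanceType": "m5.large", "State": {"Name": "stopped"},
   "CpuOptions": {"CoreCount": 1, "ThreadsPerCore": 2}}
]}]}
EOF

# Sum vCPUs (cores x threads) across running instances only — stopped
# instances do not count against the running-vCPU quota:
total=$(jq '[.Reservations[].Instances[]
             | select(.State.Name == "running")
             | .CpuOptions.CoreCount * .CpuOptions.ThreadsPerCore]
            | add' /tmp/instances.json)
echo "running vCPUs: $total"
```

Compare this total against the quota `Value` from the previous command to confirm you are actually at the limit.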

Step 3: Implement an Immediate Workaround (While Waiting for Quota Increase)

Why: Quota increase requests can take minutes to hours. If the limit is blocking an active incident (scaling needed NOW), a workaround is essential while the increase is being processed.

# Workaround Option A — Use a different AWS region:
# Some quotas are per-region. If us-east-1 is at limit, try us-west-2.
# Check quota in alternate region:
aws service-quotas list-service-quotas --service-code ec2 --region us-west-2 \
  --output json | jq '.Quotas[] | select(.QuotaName | contains("vCPU")) | {QuotaName, Value}'

# Workaround Option B — Use a different instance type:
# Quota limits are often per-instance-family (Standard, High Memory, etc.)
# Check if a different family has quota available:
aws service-quotas list-service-quotas --service-code ec2 \
  --output json | jq '.Quotas[] | select(.QuotaName | contains("Running On-Demand")) | {QuotaName, Value}'

# Workaround Option C — Terminate idle or unused resources to free up quota:
# Find stopped EC2 instances that can be terminated:
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=stopped" \
  --output json | jq '.Reservations[].Instances[] | {InstanceId, InstanceType, LaunchTime}'

# Terminate specific unused instances (CONFIRM BEFORE RUNNING):
# aws ec2 terminate-instances --instance-ids <INSTANCE_ID_1> <INSTANCE_ID_2>

# Workaround Option D — Right-size: use fewer, larger instances instead of many small ones
# (Depends on your workload — check if this is feasible)
echo "Document which workaround you used and its impact in the incident log"
Expected output:
Option A: quota in alternate region shows headroom (e.g., value: 96 vs 32 in the original region)
Option B: a different instance family has available quota
Option C: a list of stopped instances that can be terminated to free up quota
If this fails: If no workaround is available (all options exhausted), escalate immediately — the business may need to accept reduced capacity while waiting for the quota increase.
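Before picking a workaround, it helps to quantify the gap. A minimal arithmetic sketch — the numbers are placeholders for the values gathered in Steps 1 and 2:

```shell
limit=32    # approved quota limit (from Step 1) — placeholder value
usage=28    # current running vCPUs (from Step 2) — placeholder value
needed=8    # vCPUs the blocked scaling event needs — placeholder value

headroom=$((limit - usage))
shortfall=$((needed - headroom))
if [ "$shortfall" -le 0 ]; then
  echo "fits under the quota: $headroom vCPUs of headroom"
else
  echo "short by $shortfall vCPUs — free that much (Option C) or shift load (Option A/B)"
fi
```

The shortfall tells you how aggressive the workaround has to be: a 4-vCPU gap might be closed by terminating one idle instance, while a large gap points to a second region.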

Step 4: Submit a Quota Increase Request

Why: The quota limit must be raised to support long-term capacity. This requires action in the cloud provider's console and may take time — submit immediately even if a workaround is in place.

# AWS — request quota increase via CLI:
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code <QUOTA_CODE> \
  --desired-value <NEW_LIMIT>

# Or via the AWS Console (often faster for approvals):
# AWS Console → Service Quotas → Amazon EC2 → find the quota → Request quota increase
# Provide business justification in the reason field.

# Check status of submitted request:
aws service-quotas list-requested-service-quota-change-history --service-code ec2 \
  --output json | jq '.RequestedQuotas[] | {QuotaName, DesiredValue, Status, Created}'

# GCP — request quota increase:
# GCP Console → IAM & Admin → Quotas → find the quota → Edit Quotas
# Fill in the new value and business justification.

# Azure — request quota increase:
# Azure Portal → Subscriptions → Usage + quotas → Request increase
# Or: az support tickets create (for VM quota increases)

echo "Note the request ID and expected approval time — some increases are automatic, others require review"
Expected output:
AWS CLI:
  {
    "RequestedQuota": {
      "Id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "Status": "PENDING",
      "QuotaName": "Running On-Demand Standard instances",
      "DesiredValue": 64
    }
  }

Status will transition: PENDING → APPROVED (automatic for small increases) or CASE_OPENED (needs AWS review)
If this fails: If the AWS CLI request fails with "Cannot increase this quota", some quotas can only be increased via a support case. Go to AWS Support → Create case → Service limit increase.
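The status filtering can be rehearsed on a saved copy of the change history. A sketch, where /tmp/requests.json mimics the shape of the change-history response and the IDs/values are illustrative:

```shell
# Sketch: pick out requests that still need attention from saved change history.
cat > /tmp/requests.json <<'EOF'
{"RequestedQuotas": [
  {"Id": "abc123", "QuotaName": "Running On-Demand Standard instances",
   "DesiredValue": 64, "Status": "PENDING"},
  {"Id": "def456", "QuotaName": "EC2-VPC Elastic IPs",
   "DesiredValue": 10, "Status": "APPROVED"}
]}
EOF

# Keep only requests not yet approved (PENDING or opened as a support case):
pending=$(jq -r '.RequestedQuotas[]
                 | select(.Status == "PENDING" or .Status == "CASE_OPENED")
                 | "\(.Id) \(.QuotaName) -> \(.DesiredValue)"' /tmp/requests.json)
echo "$pending"
```

Paste the surviving request IDs into the incident log so the next responder can pick up the follow-up.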

Step 5: Monitor and Unblock the Scaling Event

Why: Once quota is increased (or a workaround is in place), you need to confirm the blocked operation can now proceed.

# After quota increase is approved (or workaround in place), retry the failing operation:
# Terraform:
terraform apply -target=<RESOURCE_TYPE>.<RESOURCE_NAME>

# Kubernetes HPA scale:
kubectl get hpa -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE> | grep -A5 "Events"

# ASG (Auto Scaling Group):
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name <ASG_NAME> \
  --output json | jq '.Activities[0] | {ActivityId, Description, StatusCode, StatusMessage}'

# Verify the new resources were created (count running/pending instances):
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running,pending" \
  --output json | jq '[.Reservations[].Instances[]] | length'
Expected output:
Terraform apply: "Apply complete! Resources: X added."
Kubernetes HPA: events show "scaled up" rather than "failed to scale"
ASG activity: {"StatusCode": "Successful", "StatusMessage": "..."}
Instance count has increased as expected.
If this fails: If the operation still fails after quota increase approval, the increase may not have propagated yet (AWS can take a few minutes). Wait 5-10 minutes and retry. If it still fails, verify the new quota value is reflected: aws service-quotas get-service-quota --service-code ec2 --quota-code <QUOTA_CODE>.
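Since a quota increase can take a few minutes to propagate, a small retry loop saves manual polling. A sketch — `retry` and the `flaky` demo command are illustrative helpers; substitute your real `terraform apply` or scaling command:

```shell
# Retry a command up to N times with a fixed pause between attempts:
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for i in $(seq 1 "$attempts"); do
    "$@" && return 0
    echo "attempt $i/$attempts failed; sleeping ${delay}s" >&2
    sleep "$delay"
  done
  return 1
}

# Demo with a stand-in command that fails twice, then succeeds:
rm -f /tmp/tries
flaky() { echo x >> /tmp/tries; [ "$(wc -l < /tmp/tries)" -ge 3 ]; }
retry 5 0 flaky && echo "succeeded on attempt $(wc -l < /tmp/tries | tr -d ' ')"
```

In practice something like `retry 6 60 terraform apply ...` covers the 5-10 minute propagation window mentioned above.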

Step 6: Document the Limit and Add Prevention

Why: Hitting a quota limit silently in a scaling event is a reliability risk. Document the limit and add visibility so you can act before the next incident.

# Add the quota limit as a Terraform variable or output for visibility:
# In your Terraform code (example — add to outputs.tf):
# output "ec2_vcpu_quota_limit" {
#   value       = 64  # current approved limit
#   description = "Current approved vCPU limit in us-east-1 — submit increase request if usage approaches this"
# }

# Add a CloudWatch alarm (AWS) to alert when usage approaches the limit.
# AWS/Usage reports the absolute running-vCPU count, so set --threshold to
# 80% of your approved limit (e.g. 51 for a limit of 64):
aws cloudwatch put-metric-alarm \
  --alarm-name "EC2-vCPU-Usage-High" \
  --alarm-description "EC2 vCPU usage approaching quota limit" \
  --namespace "AWS/Usage" \
  --metric-name "ResourceCount" \
  --dimensions Name=Service,Value=EC2 Name=Resource,Value=vCPU Name=Type,Value=Resource Name=Class,Value=Standard/OnDemand \
  --statistic Average \
  --period 300 \
  --threshold 51 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions <SNS_TOPIC_ARN>

echo "Document the limit in team wiki and set an alert at 80% of quota"
Expected output:
CloudWatch alarm created in "OK" state; it will trigger when the running vCPU count reaches the configured threshold (80% of the quota limit).
If this fails: If CloudWatch metric is not available for this quota type, use a Lambda function to periodically check quota usage and publish a custom metric.
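The 80% threshold has to be recomputed whenever the approved limit changes. A one-line arithmetic sketch (the limit value is a placeholder):

```shell
limit=64                           # current approved quota — placeholder value
threshold=$(( limit * 80 / 100 ))  # integer 80% of the limit, for --threshold
echo "alert at $threshold of $limit vCPUs"
```

Keeping this calculation next to the alarm definition (e.g. in Terraform) prevents the threshold drifting out of sync after the next quota increase.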

Verification

# Confirm the issue is resolved
aws service-quotas get-service-quota --service-code ec2 --quota-code <QUOTA_CODE> \
  --output json | jq '.Quota | {QuotaName, Value}'
# Verify the new limit is higher than what you need
Success looks like: Quota increase approved and the new limit value is visible. The previously-failing scaling event or Terraform apply now completes successfully. If still broken: Escalate — see below.

Escalation

Condition → Who to Page → What to Say

  • Not resolved in 30 min (blocking incident) → Platform/Infra on-call → "P1: Quota limit blocking scaling during active incident — need emergency workaround or escalation to cloud provider"
  • Quota increase rejected → Platform/Infra on-call + Engineering Manager → "Cloud provider rejected quota increase request — need management escalation or alternative architecture decision"
  • Security incident → Security on-call → "Security incident: resource creation by unexpected actor is exhausting quota — possible crypto-jacking"
  • No workaround available → Engineering Manager → "Quota limit reached with no workaround — service cannot scale, customer impact possible"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete
  • Set quota usage alerts at 70-80% of current limits for all critical quotas
  • Pre-request quota increases proactively for expected growth before hitting limits
  • Document all quota limits and approved values in team wiki
  • Review whether any unused resources can be terminated to free up headroom

Common Mistakes

  1. Submitting a quota increase request without a workaround: Quota increase approvals take time (minutes to hours). If capacity is needed now, implement a workaround (different region, instance type, cleanup) while waiting — don't just wait for the increase.
  2. Not checking all quotas: A vCPU limit is separate from an instance count limit, EIP limit, VPC limit, and security group limit. Fixing one quota may reveal you are also at another limit — check all related quotas together.
  3. Forgetting that quotas are per-region on AWS: Hitting the vCPU limit in us-east-1 does not affect us-west-2. If you have a multi-region deployment, check each region separately.
  4. Not documenting the limit after the incident: Teams regularly re-hit the same quota because the limit was not documented. Add it to your wiki and set an alert at 80% of the limit.
  5. Terminating instances to free up quota without checking if they are in use: A "stopped" instance may be stopped for a reason (DR standby, scheduled maintenance). Confirm before terminating.

Cross-References

  • Topic Pack: training/library/topic-packs/cloud-terraform/ (deep background on cloud quotas and capacity planning)
  • Related Runbook: terraform-state-lock.md — if the terraform apply is stuck due to a lock in addition to quota errors
  • Related Runbook: drift-detection.md — if quota exhaustion led to partial resource creation and resulting drift
  • Related Runbook: ../kubernetes/hpa_not_scaling.md — if the quota limit is preventing Kubernetes HPA from scaling
