---
tags:
  - cloud
  - l2
  - runbook
  - cloud-deep-dive
  - terraform
---

Portal | Level: L2: Operations | Topics: Cloud Deep Dive, Terraform | Domain: Cloud
# Runbook: Cloud Capacity Limit Hit
| Field | Value |
|---|---|
| Domain | Cloud/Terraform |
| Alert | Resource creation failing with quota/limit error, or scaling event failing |
| Severity | P1 (if blocking scaling during incident), P2 (if blocking new deployments) |
| Est. Resolution Time | 30-120 minutes |
| Escalation Timeout | 30 minutes — page if not resolved (quota increases require human action with the cloud provider) |
| Last Tested | 2026-03-19 |
| Prerequisites | Cloud provider CLI, cloud console access, ability to submit quota increase requests |
## Quick Assessment (30 seconds)

```bash
# Run this first — it tells you the scope of the problem.
# For AWS — check what failed and what the limit is:
aws ec2 describe-instances 2>&1 | grep -i "LimitExceeded\|RequestLimitExceeded\|quota"

# Or check the terraform/deployment error message directly. The error will
# name the specific quota — e.g. "You have requested more vCPU capacity than
# your current vCPU limit".
echo "Read the exact error message — it names the quota that was hit"
```
## Step 1: Identify the Exact Quota That Was Hit
Why: Cloud providers have hundreds of different quotas. Knowing the exact quota name allows you to check the current usage, find workarounds, and submit the correct increase request.
```bash
# Read the full error message carefully — it will name the quota. Examples:
#   AWS:   "You have requested more instances (X) than your current instance limit (Y) allows"
#   AWS:   "The maximum number of VPCs has been reached"
#   GCP:   "Quota 'CPUS_ALL_REGIONS' exceeded. Limit: 24.0, got: 32.0"
#   Azure: "Operation could not be completed as it results in exceeding approved Total Regional vCPUs quota"

# AWS — list all current quotas and their limits for EC2:
aws service-quotas list-service-quotas --service-code ec2 \
  --output json | jq '.Quotas[] | {QuotaName, Value, Adjustable}' | head -50

# Find a specific quota by name (example: vCPU limit):
aws service-quotas list-service-quotas --service-code ec2 \
  --output json | jq '.Quotas[] | select(.QuotaName | contains("vCPU")) | {QuotaName, Value, QuotaCode}'

# GCP — check current quotas:
gcloud compute project-info describe --format=json | jq '.quotas[] | {metric, limit, usage}'

# Azure — check VM quota:
az vm list-usage --location <REGION> --output table | grep -i "vCPU\|cores"
```
AWS service-quotas output:
{
"QuotaName": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
"Value": 32,
"QuotaCode": "L-1216C47A",
"Adjustable": true
}
GCP quotas output:
{"metric": "CPUS", "limit": 24, "usage": 24} ← at the limit
aws service-quotas returns no results, the quota may be a region-specific limit. Add --region <REGION> to the command, or check the AWS console: Service Quotas → Amazon EC2.
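Since many quotas are per-region, it can save time to sweep the same quota code across every region you deploy to. A minimal sketch of such a loop — the region list is illustrative, `<QUOTA_CODE>` is a placeholder, and the real CLI call is left commented out so the skeleton runs without credentials:

```shell
# check_regions: print a header per region; the real query is commented out.
check_regions() {
  for region in us-east-1 us-west-2 eu-west-1; do
    echo "--- $region ---"
    # Real query (requires aws CLI, jq, and credentials):
    # aws service-quotas get-service-quota --service-code ec2 \
    #   --quota-code <QUOTA_CODE> --region "$region" \
    #   --output json | jq '.Quota.Value'
  done
}

check_regions
```

Uncomment the `aws` call once you have the quota code from the error message; any region that prints a higher value than your exhausted region is a workaround candidate for Step 3.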
## Step 2: Check Current Usage vs. the Limit
Why: Before requesting a limit increase, confirm you are actually at the limit (not a configuration error). Also, understanding current usage helps you find resources to free up as an immediate workaround.
```bash
# AWS — check current EC2 instance usage by type:
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running,pending,stopping,stopped" \
  --output json | jq '.Reservations[].Instances[] | {InstanceType, State: .State.Name}' | \
  jq -s 'group_by(.InstanceType) | map({type: .[0].InstanceType, count: length})'

# Check a specific service quota's current limit:
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code <QUOTA_CODE> \
  --output json | jq '.Quota | {QuotaName, Value}'

# GCP — check vCPU usage in a region:
gcloud compute regions describe <REGION> \
  --format="json" | jq '.quotas[] | select(.metric=="CPUS") | {limit, usage}'

# Azure — check current vCPU usage:
az vm list-usage --location <REGION> \
  --output json | jq '.[] | select(.name.value | contains("vCPU")) | {name: .name.localizedValue, current: .currentValue, limit: .limit}'
```
AWS instance count per type:

```json
[{"type": "t3.medium", "count": 12}, {"type": "m5.large", "count": 20}]
```

AWS quota check:

```json
{"QuotaName": "Running On-Demand Standard instances", "Value": 32}
```

→ If your total running instance vCPUs equals 32, you are at the limit.

GCP:

```json
{"limit": 24, "usage": 24}   ← confirmed at limit
```
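To turn the raw usage and limit numbers from the queries above into a quick at-the-limit check, a small helper can compute the percentage and warn near the threshold. A sketch — `quota_headroom` is a hypothetical name, and the numbers are fed in by hand from the CLI output:

```shell
# Usage: quota_headroom USAGE LIMIT
# Prints the integer percentage used and warns at 80% or above.
quota_headroom() {
  usage=$1; limit=$2
  pct=$(( usage * 100 / limit ))
  if [ "$pct" -ge 80 ]; then
    echo "WARN: ${pct}% of quota used (${usage}/${limit}), request an increase now"
  else
    echo "OK: ${pct}% of quota used (${usage}/${limit})"
  fi
}

quota_headroom 24 24   # the GCP example above: at the limit
quota_headroom 20 64
```

Anything in the WARN range means you should proceed to Step 4 (request an increase) even if today's operation succeeded.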
## Step 3: Implement an Immediate Workaround (While Waiting for Quota Increase)
Why: Quota increase requests can take minutes to hours. If the limit is blocking an active incident (scaling needed NOW), a workaround is essential while the increase is being processed.
```bash
# Workaround Option A — use a different AWS region:
# Some quotas are per-region. If us-east-1 is at limit, try us-west-2.
# Check quota in alternate region:
aws service-quotas list-service-quotas --service-code ec2 --region us-west-2 \
  --output json | jq '.Quotas[] | select(.QuotaName | contains("vCPU")) | {QuotaName, Value}'

# Workaround Option B — use a different instance type:
# Quota limits are often per-instance-family (Standard, High Memory, etc.).
# Check if a different family has quota available:
aws service-quotas list-service-quotas --service-code ec2 \
  --output json | jq '.Quotas[] | select(.QuotaName | contains("Running On-Demand")) | {QuotaName, Value}'

# Workaround Option C — terminate idle or unused resources to free up quota.
# Find stopped EC2 instances that can be terminated:
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=stopped" \
  --output json | jq '.Reservations[].Instances[] | {InstanceId, InstanceType, LaunchTime}'

# Terminate specific unused instances (CONFIRM BEFORE RUNNING):
# aws ec2 terminate-instances --instance-ids <INSTANCE_ID_1> <INSTANCE_ID_2>

# Workaround Option D — right-size: use fewer, larger instances instead of
# many small ones (depends on your workload — check if this is feasible).

echo "Document which workaround you used and its impact in the incident log"
```
Expected results:

- Option A: quota in alternate region shows headroom (e.g., value 96 vs 32 in the original region)
- Option B: a different instance family has available quota
- Option C: a list of stopped instances that can be terminated to free up quota
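Option C is the destructive path, so a guard rail helps under incident pressure: print the terminate command by default, and only execute when explicitly confirmed. A hedged sketch — the function name and the `CONFIRM=yes` convention are made up for illustration, and the instance IDs are placeholders:

```shell
# Print the terminate command unless CONFIRM=yes is set in the environment.
terminate_stopped() {
  cmd="aws ec2 terminate-instances --instance-ids $*"
  if [ "$CONFIRM" = "yes" ]; then
    echo "EXECUTING: $cmd"
    # $cmd    # uncomment to actually run the command
  else
    echo "DRY RUN: $cmd"
  fi
}

terminate_stopped i-0abc123 i-0def456
```

Run once without `CONFIRM` to eyeball the exact command (and paste it into the incident log), then re-run with `CONFIRM=yes` after a teammate has acknowledged the instance list.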
## Step 4: Submit a Quota Increase Request
Why: The quota limit must be raised to support long-term capacity. This requires action in the cloud provider's console and may take time — submit immediately even if a workaround is in place.
```bash
# AWS — request quota increase via CLI:
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code <QUOTA_CODE> \
  --desired-value <NEW_LIMIT>

# Or via the AWS Console (often faster for approvals):
#   AWS Console → Service Quotas → Amazon EC2 → find the quota → Request quota increase
#   Provide business justification in the reason field.

# Check status of submitted requests:
aws service-quotas list-requested-service-quota-change-history --service-code ec2 \
  --output json | jq '.RequestedQuotas[] | {QuotaName, DesiredValue, Status, Created}'

# GCP — request quota increase:
#   GCP Console → IAM & Admin → Quotas → find the quota → Edit Quotas
#   Fill in the new value and business justification.

# Azure — request quota increase:
#   Azure Portal → Subscriptions → Usage + quotas → Request increase
#   Or: az support tickets create (for VM quota increases)

echo "Note the request ID and expected approval time — some increases are automatic, others require review"
```
AWS CLI output:

```json
{
  "RequestedQuota": {
    "Id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "Status": "PENDING",
    "QuotaName": "Running On-Demand Standard instances",
    "DesiredValue": 64
  }
}
```

Status will transition: PENDING → APPROVED (automatic for small increases) or CASE_OPENED (needs AWS review).
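If the request lands in CASE_OPENED you may be waiting a while, so a small poll wrapper saves repeated manual checks. A sketch with the real CLI call left commented out and replaced by a mock so the skeleton runs without credentials — `poll_status` and `<REQUEST_ID>` are illustrative:

```shell
# Return the current status of a quota-increase request.
poll_status() {
  # Real query (requires aws CLI and jq); "$1" is the request ID:
  # aws service-quotas get-requested-service-quota-change \
  #   --request-id "$1" --output json | jq -r '.RequestedQuota.Status'
  echo "PENDING"   # mocked for illustration
}

status=$(poll_status "<REQUEST_ID>")
echo "Quota request status: $status"
# A real loop would be: while [ "$(poll_status <REQUEST_ID>)" = "PENDING" ]; do sleep 60; done
```

Drop this into a terminal during the incident rather than refreshing the console; the loop exits as soon as the status changes, at which point retry the blocked operation (Step 5).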
## Step 5: Monitor and Unblock the Scaling Event
Why: Once quota is increased (or a workaround is in place), you need to confirm the blocked operation can now proceed.
```bash
# After the quota increase is approved (or a workaround is in place), retry
# the failing operation.

# Terraform:
terraform apply -target=<RESOURCE_TYPE>.<RESOURCE_NAME>

# Kubernetes HPA scale:
kubectl get hpa -n <NAMESPACE>
kubectl describe hpa <HPA_NAME> -n <NAMESPACE> | grep -A5 "Events"

# ASG (Auto Scaling Group):
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name <ASG_NAME> \
  --output json | jq '.Activities[0] | {ActivityId, Description, StatusCode, StatusMessage}'

# Verify the new resources were created:
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running,pending" \
  --output json | jq '.Reservations | length'
```
Expected results:

- Terraform apply: "Apply complete! Resources: X added."
- Kubernetes HPA: events show "scaled up" rather than "failed to scale"
- ASG activity: `{"StatusCode": "Successful", "StatusMessage": "..."}`
- Instance count has increased as expected.

To confirm the new limit is active, run `aws service-quotas get-service-quota --service-code ec2 --quota-code <QUOTA_CODE>`.
## Step 6: Document the Limit and Add Prevention
Why: Hitting a quota limit silently in a scaling event is a reliability risk. Document the limit and add visibility so you can act before the next incident.
```bash
# Add the quota limit as a Terraform output for visibility
# (example — add to outputs.tf):
#
# output "ec2_vcpu_quota_limit" {
#   value       = 64 # current approved limit
#   description = "Current approved vCPU limit in us-east-1 — submit increase request if usage approaches this"
# }

# Add a CloudWatch alarm (AWS) to alert when usage approaches the limit.
# NOTE: the threshold below is an absolute resource count, not a percentage;
# for a percentage-of-quota alarm, use metric math with the SERVICE_QUOTA() function.
aws cloudwatch put-metric-alarm \
  --alarm-name "EC2-vCPU-Usage-High" \
  --alarm-description "EC2 vCPU usage approaching quota limit" \
  --namespace "AWS/Usage" \
  --metric-name "ResourceCount" \
  --dimensions Name=Service,Value=EC2 Name=Resource,Value=vCPU Name=Type,Value=Resource \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions <SNS_TOPIC_ARN>

echo "Document the limit in team wiki and set an alert at 80% of quota"
```
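The alarm can be complemented by a scheduled check that exits nonzero when usage crosses the threshold, so a cron job or CI step can page. A sketch with illustrative hardcoded numbers — in practice `usage` and `limit` would come from the Step 2 queries, and `check_quota` is a hypothetical name:

```shell
# Usage: check_quota USAGE LIMIT THRESHOLD_PCT
# Returns nonzero (suitable for cron/CI alerting) when at or over the threshold.
check_quota() {
  pct=$(( $1 * 100 / $2 ))
  if [ "$pct" -ge "$3" ]; then
    echo "ALERT: vCPU usage at ${pct}%, threshold ${3}%"
    return 1
  fi
  echo "OK: vCPU usage at ${pct}%"
}

check_quota 40 64 80 || exit 1   # illustrative values: below threshold
```

Wiring the function's input to the real `aws service-quotas` / `gcloud` / `az` queries turns this into the 80% early-warning check the echo above recommends.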
## Verification

```bash
# Confirm the issue is resolved — verify the new limit is higher than what you need:
aws service-quotas get-service-quota --service-code ec2 --quota-code <QUOTA_CODE> \
  --output json | jq '.Quota | {QuotaName, Value}'
```
## Escalation
| Condition | Who to Page | What to Say |
|---|---|---|
| Not resolved in 30 min (blocking incident) | Platform/Infra on-call | "P1: Quota limit blocking scaling during active incident — need emergency workaround or escalation to cloud provider" |
| Quota increase rejected | Platform/Infra on-call + Engineering Manager | "Cloud provider rejected quota increase request — need management escalation or alternative architecture decision" |
| Security incident | Security on-call | "Security incident: resource creation by unexpected actor is exhausting quota — possible crypto-jacking" |
| No workaround available | Engineering Manager | "Quota limit reached with no workaround — service cannot scale, customer impact possible" |
## Post-Incident
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
- Set quota usage alerts at 70-80% of current limits for all critical quotas
- Pre-request quota increases proactively for expected growth before hitting limits
- Document all quota limits and approved values in team wiki
- Review whether any unused resources can be terminated to free up headroom
## Common Mistakes
- Submitting a quota increase request without a workaround: Quota increase approvals take time (minutes to hours). If capacity is needed now, implement a workaround (different region, instance type, cleanup) while waiting — don't just wait for the increase.
- Not checking all quotas: A vCPU limit is separate from an instance count limit, EIP limit, VPC limit, and security group limit. Fixing one quota may reveal you are also at another limit — check all related quotas together.
- Forgetting that quotas are per-region on AWS: Hitting the vCPU limit in us-east-1 does not affect us-west-2. If you have a multi-region deployment, check each region separately.
- Not documenting the limit after the incident: Teams regularly re-hit the same quota because the limit was not documented. Add it to your wiki and set an alert at 80% of the limit.
- Terminating instances to free up quota without checking if they are in use: A "stopped" instance may be stopped for a reason (DR standby, scheduled maintenance). Confirm before terminating.
## Cross-References

- Topic Pack: training/library/topic-packs/cloud-terraform/ (deep background on cloud quotas and capacity planning)
- Related Runbook: terraform-state-lock.md — if the terraform apply is stuck due to a lock in addition to quota errors
- Related Runbook: drift-detection.md — if quota exhaustion led to partial resource creation and resulting drift
- Related Runbook: ../kubernetes/hpa_not_scaling.md — if the quota limit is preventing Kubernetes HPA from scaling