Solution

Triage

  1. Check deployment status:
    kubectl get deployment notification-service -n staging
    kubectl describe deployment notification-service -n staging
    
  2. Check the ReplicaSet events (this is where quota errors appear):
    kubectl get rs -n staging -l app=notification-service
    kubectl describe rs <replicaset-name> -n staging
    
  3. Inspect the ResourceQuota:
    kubectl describe quota -n staging
    
  4. Check if the pod spec has resource requests/limits:
    kubectl get deployment notification-service -n staging -o jsonpath='{.spec.template.spec.containers[*].resources}'
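The describe output from step 3 can also be reduced to just the two maps that matter. A sketch, assuming the quota object is named staging-quota (as in the admission error quoted below):

```shell
# Print only the used and hard maps from the quota status.
kubectl get resourcequota staging-quota -n staging \
  -o jsonpath='{.status.used}{"\n"}{.status.hard}{"\n"}'
```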
    

Root Cause

The staging namespace has a ResourceQuota that sets hard limits on CPU and memory requests, and the existing workloads already consume most of it. The new deployment requests 3 replicas at 500m CPU and 512Mi memory each (1500m CPU and 1536Mi memory in total), but the remaining headroom is only 400m CPU and 512Mi memory. The CPU headroom alone is insufficient for even a single replica.

The ReplicaSet controller attempts to create pods but the admission controller rejects them with: exceeded quota: staging-quota, requested: requests.cpu=500m,requests.memory=512Mi, used: requests.cpu=3600m,requests.memory=7680Mi, limited: requests.cpu=4000m,requests.memory=8Gi.
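The arithmetic can be sanity-checked with a quick shell sketch. The values below are hardcoded from this incident's admission error; in practice, read them from `kubectl describe quota`:

```shell
# Headroom check using the figures from the admission error above.
# Values are illustrative; substitute your own quota numbers.
replicas=3
cpu_per_pod_m=500          # requests.cpu per replica, in millicores
cpu_hard_m=4000            # quota hard limit (requests.cpu=4000m)
cpu_used_m=3600            # already consumed by other workloads

needed_m=$((replicas * cpu_per_pod_m))
headroom_m=$((cpu_hard_m - cpu_used_m))
echo "needed=${needed_m}m headroom=${headroom_m}m"

if [ "$needed_m" -gt "$headroom_m" ]; then
  echo "insufficient quota: shrink requests, scale down neighbors, or raise the quota"
fi
```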

Fix

Option 1: Increase the quota (if capacity allows):

kubectl patch resourcequota staging-quota -n staging -p '{"spec":{"hard":{"requests.cpu":"6","requests.memory":"12Gi"}}}'
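If the quota is managed declaratively (as the Common Traps section warns it should be), make the same change in the manifest and apply that instead of patching live. A sketch of the patched quota, assuming it lives in your IaC repo:

```yaml
# Sketch of staging-quota after the increase; values mirror the patch above.
# Keep this in version control so the next Helm deploy or GitOps sync
# does not revert the change.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "6"
    requests.memory: 12Gi
```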

Option 2: Reduce the new deployment's resource requests:

resources:
  requests:
    cpu: 100m      # reduced from 500m
    memory: 128Mi  # reduced from 512Mi
  limits:
    cpu: 500m
    memory: 512Mi
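The same change can be made imperatively; this triggers a rollout with the reduced requests (values are the ones from the snippet above, same caveat about updating IaC afterwards):

```shell
# Imperative equivalent of editing the manifest; starts a new rollout.
kubectl set resources deployment notification-service -n staging \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=512Mi
```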

Option 3: Free up quota by scaling down idle workloads:

kubectl scale deployment old-test-app -n staging --replicas=0

Option 4: Clean up completed/evicted pods (these count against object-count quotas such as count/pods):

kubectl delete pods -n staging --field-selector status.phase=Succeeded
kubectl delete pods -n staging --field-selector status.phase=Failed

After any fix, verify pods are being created:

kubectl get pods -n staging -l app=notification-service -w

Rollback / Safety

  • Increasing quota is safe but ensure the cluster has physical capacity to back it.
  • Reducing resource requests can lead to OOM kills or CPU starvation if the application needs more than requested.
  • Scaling down other workloads in staging should be coordinated with their owners.

Common Traps

  • Looking at the Deployment events instead of the ReplicaSet. Quota rejections surface as FailedCreate events on the ReplicaSet, not the Deployment, so kubectl describe deployment often looks clean.
  • Forgetting that ResourceQuota requires resource specs. If a quota covers requests.cpu, every container in the namespace must specify requests.cpu (or have it defaulted by a LimitRange); pods without it are rejected at admission.
  • Not checking for object count quotas. Quotas can also limit count/pods, count/services, etc. The error message will tell you which resource is exceeded.
  • Assuming all quotas treat terminal pods the same. Compute quotas (requests.cpu, requests.memory) only count pods in a non-terminal state, but object-count quotas such as count/pods charge Succeeded and Failed pods until they are deleted.
  • Editing the live quota without updating IaC. The next Helm deploy or GitOps sync will revert your change.
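For the first trap above, the rejection events can be pulled directly instead of hunting through describe output. A sketch using standard event field selectors:

```shell
# List only events whose involved object is a ReplicaSet, oldest first.
kubectl get events -n staging \
  --field-selector involvedObject.kind=ReplicaSet \
  --sort-by=.lastTimestamp
```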