FinOps & Cost Optimization Cheat Sheet¶
Remember: CPU and memory limits behave differently. A container exceeding its CPU limit is throttled (slowed down); one exceeding its memory limit is OOMKilled (terminated). The asymmetry is crucial: a CPU-limited pod still works, just slower, while a memory-limited pod dies. Mnemonic: "CPU throttles, memory kills." Set memory limits conservatively and monitor for OOMKill events.
Resource Requests vs Limits¶
- `requests`: guaranteed resources (used for scheduling)
- `limits`: maximum allowed (pod throttled or killed if exceeded)
- CPU: throttled when exceeding limit (not killed)
- Memory: OOMKilled when exceeding limit
```yaml
resources:
  requests:
    cpu: "250m"       # 0.25 CPU cores guaranteed
    memory: "256Mi"   # 256 MiB guaranteed
  limits:
    cpu: "500m"       # Throttled above this
    memory: "512Mi"   # OOMKilled above this
```
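How requests relate to limits also determines the pod's QoS class, which controls eviction order under node memory pressure. A minimal sketch of the `Guaranteed` class (values are illustrative):

```yaml
# Guaranteed QoS: requests == limits for every container in the pod.
# Such pods are evicted last under node memory pressure.
# (requests != limits → Burstable; nothing set → BestEffort, evicted first)
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```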
Right-Sizing with VPA¶
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"        # Off = recommend only, Auto = apply
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
```
```shell
# Check VPA recommendations
kubectl describe vpa my-app-vpa
# Look for:
#   Target:          cpu=250m, memory=512Mi  ← recommended values
#   Lower Bound:     cpu=100m, memory=256Mi  ← minimum reasonable
#   Upper Bound:     cpu=1,    memory=2Gi    ← maximum reasonable
#   Uncapped Target: cpu=350m, memory=768Mi  ← ignoring min/maxAllowed caps
```
Gotcha: VPA `updateMode: "Auto"` restarts pods to apply new resource requests; it cannot change resources on a running pod. In Auto mode that means periodic pod restarts, potentially during peak traffic. Most teams use `updateMode: "Off"` (recommend only) and apply changes during maintenance windows. Also: VPA and HPA must not both target the same Deployment on CPU; they will fight each other.
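To pull just the recommended values for pasting into a manifest, a jsonpath query against the VPA status works; this sketch assumes `status.recommendation` has been populated and reads the first container's target:

```shell
kubectl get vpa my-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```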
Namespace Quotas¶
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    persistentvolumeclaims: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - default:           # Default limit
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:    # Default request
        cpu: "100m"
        memory: "128Mi"
      type: Container
```
Default trap: ResourceQuota only counts resources that pods actually request; once a quota covers `requests.cpu`/`requests.memory`, pods that omit those requests are rejected outright. Always pair ResourceQuota with a LimitRange that sets `defaultRequest`: every pod then gets a request by default, is admitted, and counts against quota even if the developer forgets to set one.
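A LimitRange can also cap per-container sizes so no single pod grabs most of the team quota. A sketch for the same namespace (the bounds are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: size-caps
  namespace: team-a
spec:
  limits:
    - max:               # Largest container allowed in this namespace
        cpu: "2"
        memory: "4Gi"
      min:               # Smallest container allowed
        cpu: "50m"
        memory: "64Mi"
      type: Container
```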
Spot/Preemptible Instances¶
```yaml
# Karpenter NodePool for spot
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5a.large", "m5a.xlarge"]
  disruption:
    consolidationPolicy: WhenUnderutilized
```
```yaml
# Prefer spot nodes, but spread replicas across zones
# so a reclaim in one zone doesn't take out every replica
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: karpenter.sh/capacity-type
                operator: In
                values: ["spot"]
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
```
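Spot reclaims and Karpenter consolidation both drain nodes, so it's worth pairing the above with a PodDisruptionBudget that keeps a floor of replicas up during drains (name and selector are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # Keep at least 2 replicas up during node drains
  selector:
    matchLabels:
      app: my-app
```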
Cost Monitoring PromQL¶
```promql
# CPU waste: cores requested but unused
sum(kube_pod_container_resource_requests{resource="cpu"})
  - sum(rate(container_cpu_usage_seconds_total[5m]))

# Memory overprovisioning ratio (> 2 means significant waste)
sum(kube_pod_container_resource_requests{resource="memory"})
  / sum(container_memory_working_set_bytes)

# Nodes running no pods (DaemonSets usually keep this at 0)
count(kube_node_info) - count(sum by (node) (kube_pod_info))

# Namespace cost allocation (relative, by CPU requested)
sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
```
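To catch the "CPU throttles, memory kills" failure modes from the top of this sheet, these cAdvisor/kube-state-metrics series are worth alerting on (the threshold is illustrative):

```promql
# Fraction of CPU periods throttled per container (> 0.25 is a red flag)
rate(container_cpu_cfs_throttled_periods_total[5m])
  / rate(container_cpu_cfs_periods_total[5m])

# Containers whose last termination was an OOMKill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
```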
Quick Wins Checklist¶
- [ ] Delete idle Deployments (0 traffic for 7+ days)
- [ ] Right-size requests based on VPA or actual usage
- [ ] Use spot instances for stateless, fault-tolerant workloads
- [ ] Set ResourceQuotas per namespace
- [ ] Use LimitRange for default requests (prevent bare pods)
- [ ] Enable cluster autoscaler / Karpenter consolidation
- [ ] Delete unused PVCs and snapshots
- [ ] Right-size PVCs (can't shrink, but can migrate)
- [ ] Use StorageClass with appropriate IOPS tier
- [ ] Schedule dev/staging shutdown outside business hours
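For the dev/staging shutdown item, the simplest lever is scaling Deployments to zero on a schedule via a CronJob or CI scheduler (namespace name and replica count are illustrative; with GitOps, let the controller reconcile replicas back up instead):

```shell
# Evening: scale everything in the staging namespace down
kubectl scale deployment --all --replicas=0 -n staging
# Morning: scale back up (or re-apply manifests / let GitOps reconcile)
kubectl scale deployment --all --replicas=2 -n staging
```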
Cost Allocation Strategy¶
1. Tag everything: team, env, service, cost-center
2. Enforce tags via policy engine (Kyverno/OPA)
3. ResourceQuotas per team namespace
4. Showback reports → Chargeback
5. Review monthly, optimize quarterly
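Step 2 can be enforced with a Kyverno policy along these lines; label names match step 1, the kinds list is illustrative, and `Audit` is a safe starting mode before switching to `Enforce`:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Audit   # Switch to Enforce once teams have migrated
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet"]
      validate:
        message: "team, env and cost-center labels are required"
        pattern:
          metadata:
            labels:
              team: "?*"
              env: "?*"
              cost-center: "?*"
```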
Cloud Provider Savings¶
| Strategy | Savings | Commitment |
|---|---|---|
| On-demand | 0% | None |
| Spot/Preemptible | 60-90% | Can be reclaimed |
| 1yr Reserved/Savings Plan | 30-40% | 1 year |
| 3yr Reserved/Savings Plan | 50-60% | 3 years |