FinOps & Cost Optimization Cheat Sheet

Remember: CPU and memory limits behave differently. A container exceeding its CPU limit is throttled (slowed down); a container exceeding its memory limit is OOMKilled (terminated). This asymmetry is crucial: a CPU-limited pod keeps working, just slower, while a memory-limited pod dies. Mnemonic: "CPU throttles, memory kills." Set memory limits conservatively and monitor for OOMKill events.
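One way to surface recent OOMKills is to scan each pod's container lastState. A sketch, assuming kubectl access and jq installed (field names per the Kubernetes Pod API):

```shell
# List namespace/pod: container for every container whose last
# termination reason was OOMKilled (read-only)
kubectl get pods -A -o json \
  | jq -r '.items[]
      | . as $p
      | .status.containerStatuses[]?
      | select(.lastState.terminated.reason? == "OOMKilled")
      | "\($p.metadata.namespace)/\($p.metadata.name): \(.name)"'
```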

Resource Requests vs Limits

requests: guaranteed resources (used for scheduling)
limits:   maximum allowed (pod killed/throttled if exceeded)

CPU:    throttled when exceeding limit (not killed)
Memory: OOMKilled when exceeding limit
resources:
  requests:
    cpu: "250m"       # 0.25 CPU cores guaranteed
    memory: "256Mi"   # 256 MiB guaranteed
  limits:
    cpu: "500m"       # Throttled above this
    memory: "512Mi"   # OOMKilled above this
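Worth knowing for cost/reliability trade-offs: when every container in a pod sets requests equal to limits, the pod gets the Guaranteed QoS class and is evicted last under node memory pressure. A sketch for a critical workload:

```yaml
# Guaranteed QoS: requests == limits for every container.
# Costs more (no headroom is shared), but these pods are
# evicted last when a node runs out of memory.
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```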

Right-Sizing with VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Off = recommend only, Auto = apply
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
# Check VPA recommendations
kubectl describe vpa my-app-vpa

# Look for:
#   Target:     cpu=250m, memory=512Mi    ← recommended request
#   Lower:      cpu=100m, memory=256Mi    ← below this, likely underprovisioned
#   Upper:      cpu=1,    memory=2Gi      ← above this, likely wasted
#   Uncapped:   cpu=350m, memory=768Mi    ← target ignoring min/maxAllowed caps

Gotcha: VPA with updateMode: "Auto" applies new requests by evicting and recreating pods — it cannot change resources on a running pod. Auto mode therefore causes periodic pod restarts, including during traffic. Most teams use updateMode: "Off" (recommend only) and apply changes during maintenance windows. Also: VPA and HPA must not both act on the same Deployment's CPU/memory — VPA rewrites the requests that HPA's utilization targets are computed from, so they fight each other.
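If you do run VPA in Auto mode, its evictions respect PodDisruptionBudgets, so a PDB caps how much of a workload can be restarted at once. A minimal sketch (name and selector are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 80%        # keep at least 80% of pods up during evictions
  selector:
    matchLabels:
      app: my-app          # must match the Deployment's pod labels
```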

Namespace Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    persistentvolumeclaims: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - default:           # Default limit
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:     # Default request
      cpu: "100m"
      memory: "128Mi"
    type: Container

Default trap: when a ResourceQuota tracks requests.cpu or requests.memory, the API server rejects any pod in that namespace that omits those requests — developers who forget them get confusing admission errors. Always pair ResourceQuota with a LimitRange that sets defaultRequest: pods without explicit values then receive the defaults, are admitted, and correctly consume quota.
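To audit a namespace for containers that still lack an explicit CPU request (and are therefore relying on LimitRange defaults, or will be rejected by quota), a sketch assuming kubectl and jq:

```shell
# Print pods in team-a where any container has no CPU request
kubectl get pods -n team-a -o json \
  | jq -r '.items[]
      | select(any(.spec.containers[]; (.resources.requests // {}) | has("cpu") | not))
      | .metadata.name'
```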

Spot/Preemptible Instances

# Karpenter NodePool for spot
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.large", "m5.xlarge", "m5a.large", "m5a.xlarge"]
  disruption:
    consolidationPolicy: WhenUnderutilized
# Pod spec: prefer (not require) spot nodes and spread across zones
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: karpenter.sh/capacity-type
            operator: In
            values: ["spot"]
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
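If your spot nodes carry a taint (many teams taint them so only opted-in workloads land there), pods also need a matching toleration. The taint key below is an example, not a Karpenter default — match whatever taint your NodePool applies:

```yaml
# Pod spec fragment — only needed if spot nodes are tainted
spec:
  tolerations:
  - key: "spot"            # example key; match your NodePool's taint
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```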

Cost Monitoring PromQL

# CPU cost waste: requested but unused
sum(kube_pod_container_resource_requests{resource="cpu"})
- sum(rate(container_cpu_usage_seconds_total[5m]))

# Memory overprovisioning
sum(kube_pod_container_resource_requests{resource="memory"})
/ sum(container_memory_working_set_bytes)
# > 2x means significant waste

# Idle nodes (rough signal: DaemonSet pods count too, so truly empty nodes are rare)
count(kube_node_info) - count(sum by(node)(kube_pod_info))

# Namespace cost allocation (relative)
sum by(namespace)(kube_pod_container_resource_requests{resource="cpu"})
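The relative allocation above can be turned into rough dollars by multiplying by a per-core price. The $0.03/core-hour below is a placeholder — substitute your blended rate:

```promql
# Approximate hourly CPU cost per namespace ($0.03/core-hour is a placeholder)
sum by(namespace)(kube_pod_container_resource_requests{resource="cpu"}) * 0.03
```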

Quick Wins Checklist

[ ] Delete idle Deployments (0 traffic for 7+ days)
[ ] Right-size requests based on VPA or actual usage
[ ] Use spot instances for stateless, fault-tolerant workloads
[ ] Set ResourceQuotas per namespace
[ ] Use LimitRange for default requests (prevent bare pods)
[ ] Enable cluster autoscaler / Karpenter consolidation
[ ] Delete unused PVCs and snapshots
[ ] Right-size PVCs (can't shrink, but can migrate)
[ ] Use StorageClass with appropriate IOPS tier
[ ] Schedule dev/staging shutdown outside business hours
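The dev/staging shutdown in the last item can be automated with a CronJob that scales Deployments to zero overnight. A sketch — the schedule, namespace, and ServiceAccount are illustrative, and the account needs RBAC to patch Deployments:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: dev
spec:
  schedule: "0 19 * * 1-5"       # 19:00 Mon-Fri; pair with a morning scale-up job
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # needs RBAC: patch deployments in this namespace
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "dev"]
```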

Cost Allocation Strategy

1. Tag everything: team, env, service, cost-center
2. Enforce tags via policy engine (Kyverno/OPA)
3. ResourceQuotas per team namespace
4. Showback reports → Chargeback
5. Review monthly, optimize quarterly
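Step 2 can be sketched as a Kyverno ClusterPolicy. The required label names here are examples — align them with your cost-allocation scheme:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce   # reject non-compliant resources (start with Audit)
  rules:
  - name: check-cost-labels
    match:
      any:
      - resources:
          kinds: ["Deployment", "StatefulSet"]
    validate:
      message: "team and cost-center labels are required"
      pattern:
        metadata:
          labels:
            team: "?*"               # any non-empty value
            cost-center: "?*"
```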

Cloud Provider Savings

Strategy                    Savings   Commitment
On-demand                   0%        None
Spot/Preemptible            60-90%    None (can be reclaimed)
1yr Reserved/Savings Plan   30-40%    1 year
3yr Reserved/Savings Plan   50-60%    3 years
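To see what the table means in practice, a back-of-the-envelope blend — all numbers hypothetical ($0.096/hr is roughly an m5.large on-demand rate; the 70% spot discount is mid-range in the band above):

```shell
# Monthly cost of 100 nodes: 70% of the fleet on spot at a 70% discount
awk 'BEGIN {
  od = 0.096           # hypothetical on-demand $/hr per node
  hours = 730          # hours per month
  nodes = 100
  spot_frac = 0.7      # share of fleet on spot
  discount = 0.7       # spot discount vs on-demand
  cost = nodes * hours * od * ((1 - spot_frac) + spot_frac * (1 - discount))
  printf "$%.2f/month (vs $%.2f all on-demand)\n", cost, nodes * hours * od
}'
# → $3574.08/month (vs $7008.00 all on-demand)
```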