FinOps
10 cards: 🟢 3 easy | 🟡 4 medium | 🔴 3 hard
🟢 Easy (3)
1. What are the three phases of the FinOps framework?
Inform (visibility into who spends what and why), Optimize (right-size, use commitments, eliminate waste), and Operate (continuous governance, budgets, alerts).
Name origin: FinOps = Financial Operations, formalized by the FinOps Foundation (now a Linux Foundation project), which was founded in 2019.
Ref: https://www.finops.org/framework/
2. What is the single biggest cost optimization opportunity in Kubernetes?
The gap between resource requests and actual usage. The scheduler reserves node capacity according to requests, so you pay for what pods request, not what they use: if a pod requests 500m CPU but uses only 100m, you are paying for 400m of idle CPU.
Example: 100 pods each requesting 500m CPU but using 100m waste 40 cores (100 × 400m), which is 5-10 nodes of idle paid capacity at 4-8 cores per node.
Gotcha: setting requests too low overpacks nodes, causing CPU contention and memory-pressure evictions or OOMKills. Use VPA recommendations, not raw usage minimums.
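The arithmetic in the example can be checked with a few lines of Python. The node sizes (4 and 8 allocatable cores) are assumptions chosen to show where the "5-10 nodes" range comes from; substitute your own instance shapes.

```python
def idle_cores(pods: int, request_millicores: int, usage_millicores: int) -> float:
    """Cores reserved by requests but not used (1 core = 1000 millicores)."""
    return pods * (request_millicores - usage_millicores) / 1000


def idle_nodes(cores: float, cores_per_node: int) -> float:
    """How many nodes' worth of capacity the idle cores represent."""
    return cores / cores_per_node


waste = idle_cores(pods=100, request_millicores=500, usage_millicores=100)
print(waste)                  # 40.0
print(idle_nodes(waste, 8))   # 5.0  (assumed 8-core nodes)
print(idle_nodes(waste, 4))   # 10.0 (assumed 4-core nodes)
```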
3. What are spot/preemptible instances and how much can they save?
Spot instances cost 60-90% less than on-demand pricing but can be reclaimed by the cloud provider on short notice: 2 minutes on AWS, about 30 seconds on GCP and Azure. They are safe for stateless web apps, batch jobs, dev/staging environments, and CI/CD runners.
🟡 Medium (4)
1. Describe the right-sizing process using VPA (Vertical Pod Autoscaler).
1. Deploy VPA in "Off" (recommendation-only) mode.
2. Collect 7 days of usage data.
3. Set requests to VPA's "target" recommendation.
4. Set limits to 2-3x the request (or remove CPU limits for bursty workloads).
5. Monitor for OOMKills or throttling.
6. Repeat quarterly.
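A minimal Python sketch of steps 1, 3 and 4, assuming the VPA CRD (autoscaling.k8s.io/v1) is installed and the workload is a Deployment. The helper names vpa_off_manifest and resources_from_target are hypothetical. The target values passed in would come from the VPA object's status.recommendation.containerRecommendations[].target after the observation period.

```python
import json


def vpa_off_manifest(deployment: str, namespace: str = "default") -> dict:
    """Step 1: a recommendation-only VPA (updateMode "Off") for a Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            # "Off" computes recommendations but never evicts or updates pods.
            "updatePolicy": {"updateMode": "Off"},
        },
    }


def resources_from_target(cpu_millicores: int, memory_mib: int,
                          limit_factor: float = 2.0, cpu_limit: bool = True) -> dict:
    """Steps 3-4: requests = VPA target, limits = limit_factor x request.

    Pass cpu_limit=False for bursty workloads to leave CPU unlimited.
    """
    requests = {"cpu": f"{cpu_millicores}m", "memory": f"{memory_mib}Mi"}
    limits = {"memory": f"{int(memory_mib * limit_factor)}Mi"}
    if cpu_limit:
        limits["cpu"] = f"{int(cpu_millicores * limit_factor)}m"
    return {"requests": requests, "limits": limits}


# Hypothetical target of 120m CPU / 256Mi memory for a bursty web container.
print(json.dumps(vpa_off_manifest("web"), indent=2))
print(json.dumps(resources_from_target(120, 256, cpu_limit=False), indent=2))
```

kubectl apply -f accepts JSON as well as YAML, so the printed manifest can be saved and applied as-is. If you write the manifest in YAML instead, quote "Off": YAML 1.1 parsers may read an unquoted Off as the boolean false.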
2. What workloads are safe and unsafe for spot instances?
Safe: stateless web apps with multiple replicas, retryable batch jobs, dev/staging environments, CI/CD runners.
Unsafe: single-replica databases, stateful workloads without graceful shutdown, long-running jobs that cannot checkpoint.
Remember: SAFE for spot = Stateless, Auto-scaling, Fault-tolerant, Ephemeral.
Gotcha: Even safe workloads need graceful shutdown handling — use SIGTERM handlers and preStop hooks.
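On the application side, graceful shutdown usually means catching SIGTERM, which Kubernetes sends to the container's main process after any preStop hook has run, and checkpointing before exit. The sketch below is a hypothetical Python batch worker. GracefulWorker and its JSON checkpoint file are illustrative, not a library API. The worker finishes the current item, saves its position, and resumes from that position on the next run. It assumes a POSIX host and assumes the whole shutdown fits inside the pod's terminationGracePeriodSeconds, which in turn must fit inside the provider's reclaim notice.

```python
import json
import signal


class GracefulWorker:
    """Processes a list of items, stopping cleanly and checkpointing on SIGTERM."""

    def __init__(self, checkpoint_path: str):
        self.checkpoint_path = checkpoint_path
        self.stop_requested = False
        # Must be called from the main thread; Python only installs handlers there.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler trivial: just ask the main loop to stop.
        self.stop_requested = True

    def load_checkpoint(self) -> int:
        try:
            with open(self.checkpoint_path) as f:
                return json.load(f)["next_index"]
        except FileNotFoundError:
            return 0  # first run: start from the beginning

    def save_checkpoint(self, next_index: int) -> None:
        with open(self.checkpoint_path, "w") as f:
            json.dump({"next_index": next_index}, f)

    def run(self, items, process) -> int:
        """Process items from the last checkpoint; return the next unprocessed index."""
        i = self.load_checkpoint()
        while i < len(items) and not self.stop_requested:
            process(items[i])  # the current item always finishes
            i += 1
        self.save_checkpoint(i)
        return i
```

A retried Job or rescheduled pod builds a new GracefulWorker on the same checkpoint path (for example, one on a PersistentVolume) and resumes where the interrupted run stopped.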
3. How does cross-AZ traffic drive cloud costs and how can you reduce it?
In AWS, cross-AZ traffic is billed at $0.01/GB in each direction, so every GB crossing zones effectively costs $0.02. Reduce it by enabling topology-aware routing (service.kubernetes.io/topology-mode: Auto) so Services prefer same-zone backends, using VPC endpoints for AWS services, and pulling container images from ECR in the same region.
4. How do you find and eliminate wasted storage in Kubernetes?
Find unbound PVCs with kubectl get pvc -A | grep -v Bound. Find PVCs that no pod mounts by comparing PVC names against the persistentVolumeClaim.claimName references in pod specs. Use tiered storage: SSD for databases, HDD for logs and archives, and object storage (S3/GCS) for long-term backups.
🔴 Hard (3)
1. How does Karpenter differ from Cluster Autoscaler and what makes it more cost-effective?
Karpenter provisions individual nodes to fit the requirements of pending pods rather than scaling pre-defined node groups. It can mix spot and on-demand capacity, choose from many instance types, consolidate underutilized nodes (consolidationPolicy: WhenUnderutilized in the v1beta1 API, WhenEmptyOrUnderutilized in v1), and expire nodes after a set lifetime (expireAfter) to keep them fresh. The result is tighter bin-packing and less idle capacity.
2. How do Pod Disruption Budgets protect applications on spot instances?
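A PodDisruptionBudget (PDB) sets a minAvailable or maxUnavailable constraint that Kubernetes enforces on evictions made through the Eviction API, that is, on voluntary disruptions. A PDB cannot stop the cloud provider from reclaiming a spot instance. It helps when an interruption handler (Karpenter's interruption handling or the AWS Node Termination Handler) cordons and drains the node after the reclaim notice. The drain evicts pods through the Eviction API, so replicas go down within the budget instead of all at once. PDBs likewise limit how many replicas Karpenter consolidation or node upgrades evict simultaneously. If the budget blocks the drain, the node is still terminated when the notice expires, so also spread replicas across nodes and zones.
3. Name five cost visibility tools and their primary use cases.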