K8s HPA
11 cards — 🟢 3 easy | 🟡 5 medium | 🔴 3 hard
🟢 Easy (3)
1. What must be set on pods for HPA CPU-based scaling to work?
Show answer
Resource requests (resources.requests.cpu) must be defined. HPA computes utilization as currentUsage / request, so without requests the metric is undefined and HPA cannot function.

2. What is the default HPA scale-down stabilization window?
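The requests requirement from the first easy card can be sketched as a minimal Deployment container spec (the name web and the nginx image are placeholders; the essential field is resources.requests.cpu, which HPA uses as the denominator for utilization):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: app
          image: nginx:1.25        # placeholder image
          resources:
            requests:
              cpu: 200m            # HPA: utilization = usage / 200m
            limits:
              cpu: 500m
```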
Show answer
300 seconds (5 minutes). The controller looks back over this window and picks the highest (most conservative) replica-count recommendation to prevent flapping.

3. How do you verify that metrics-server is running and providing data?
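A sketch of where that 300-second default lives in an autoscaling/v2 HPA (names web and web-hpa are placeholders; shortening the window makes scale-down react faster but flap more easily):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the default; lower with care
```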
Show answer
Run kubectl top nodes and kubectl top pods. If they return metrics, the server is working. Also check kubectl get apiservices | grep metrics and kubectl -n kube-system get pods -l k8s-app=metrics-server.

🟡 Medium (5)
1. What formula does the HPA use to compute the desired replica count?
Show answer
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)). For example, if current CPU is 90% and the target is 70%, the scale factor is 90/70 ≈ 1.29, so replicas increase by roughly 29%, rounded up.

2. What are the four metric types supported by HPA v2, and when would you use each?
Show answer
Resource (CPU/memory from metrics-server), Pods (per-pod app metrics like RPS via a custom metrics adapter), External (cloud-service metrics like SQS queue depth via an external metrics adapter), and Object (metrics from a specific Kubernetes object, such as Ingress RPS, via a custom metrics adapter).

3. Why should VPA and HPA not target the same metric?
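All four metric types from that card can appear in the metrics section of one HPA v2 spec. A sketch (the metric names, target values, and the Ingress name are invented for illustration; Pods, External, and Object metrics require a metrics adapter to be installed):

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second     # hypothetical app metric
      target:
        type: AverageValue
        averageValue: "100"
  - type: External
    external:
      metric:
        name: sqs_queue_depth              # hypothetical cloud metric
      target:
        type: AverageValue
        averageValue: "30"
  - type: Object
    object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: web-ingress                  # placeholder name
      metric:
        name: requests_per_second
      target:
        type: Value
        value: "2000"
```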
Show answer
They will conflict. HPA adjusts the replica count based on per-pod utilization, while VPA adjusts resource requests on individual pods. If both act on CPU, HPA might scale out while VPA simultaneously changes the request denominator, causing an unstable feedback loop.

4. Why is memory generally a poor primary metric for HPA scaling?
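One common way to avoid that conflict is to let HPA own CPU-driven replica scaling and restrict VPA to memory. A sketch, assuming the VPA CRDs and controller are installed (names are placeholders):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]   # keep CPU out of VPA's hands
```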
Show answer
Many runtimes (JVM, Python) allocate memory and never release it to the OS even after load drops. Memory utilization stays high regardless of current demand, so HPA never scales down. CPU is preferred because it correlates better with active request load.

5. When HPA is configured with multiple metrics, how does it decide the replica count?
Show answer
It evaluates each metric independently and takes the maximum desired replica count across all metrics. The most demanding metric wins. This means combining metrics with very different response characteristics can lead to unexpected scaling behavior.

🔴 Hard (3)
1. Explain the selectPolicy field in HPA behavior and how Max vs Min affect scaling aggressiveness.
Show answer
selectPolicy determines which policy to apply when multiple scaling policies are defined. Max picks whichever policy allows the largest change (most aggressive scaling); Min picks the smallest change (most conservative); Disabled prevents scaling in that direction entirely. For example, with both a Percent (100%) policy and a Pods (5) policy on scaleUp and selectPolicy: Max, the HPA uses whichever allows adding more pods.

2. How can PodDisruptionBudget conflict with HPA scale-down, and what is the best practice to avoid it?
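The Percent(100%)/Pods(5) example from that card maps onto the spec.behavior section of an HPA like this (a sketch; the periods and scale-down values are illustrative):

```yaml
behavior:
  scaleUp:
    selectPolicy: Max          # apply whichever policy adds more pods
    policies:
      - type: Percent
        value: 100             # allow doubling per period
        periodSeconds: 60
      - type: Pods
        value: 5               # or adding 5 pods per period
        periodSeconds: 60
  scaleDown:
    selectPolicy: Min          # take the smaller (gentler) decrease
    policies:
      - type: Percent
        value: 10
        periodSeconds: 60
```

At 4 replicas, Percent(100%) allows +4 while Pods allows +5, so Max picks the Pods policy; at 10 replicas, Percent allows +10 and wins instead.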
Show answer
A PDB enforces minAvailable during voluntary disruptions. HPA sets the desired replica count, but if scaling down would violate PDB constraints during node drains or spot terminations, evictions are blocked. The HPA controller itself does not check PDBs. Best practice: set HPA minReplicas to at least the minimum the PDB requires to be available.

3. Why can HPA not scale to zero, and what are the alternatives?
Show answer
With zero replicas there are no pods producing metrics to compute utilization from, and minReplicas must be at least 1 unless the alpha HPAScaleToZero feature gate is enabled (and even then an object or external metric is required). Alternatives: KEDA (event-driven autoscaling with scale-to-zero) or Knative (request-driven scale-to-zero for serverless-style workloads).
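The PDB/HPA alignment best practice from the hard card above can be sketched as a pair of manifests (names and thresholds are placeholders): with minAvailable: 2, keep HPA minReplicas at 2 or higher so scale-down never targets fewer pods than the PDB needs.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2              # at least 2 pods during voluntary disruptions
  selector:
    matchLabels:
      app: web
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2               # >= PDB minAvailable
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```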