K8s HPA
11 cards — 🟢 3 easy | 🟡 5 medium | 🔴 3 hard
🟢 Easy (3)
1. What must be set on pods for HPA CPU-based scaling to work?
Show answer
Resource requests (resources.requests.cpu) must be defined. HPA computes utilization as currentUsage / request, so without requests the metric is undefined and HPA cannot function.

2. What is the default HPA scale-down stabilization window?
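The requests requirement from the first easy card can be sketched as a minimal Deployment container spec (the name web and the nginx image are placeholders; the essential field is resources.requests.cpu, which HPA uses as the denominator for utilization):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: app
          image: nginx:1.25        # placeholder image
          resources:
            requests:
              cpu: 200m            # HPA: utilization = usage / 200m
            limits:
              cpu: 500m
```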
Show answer
300 seconds (5 minutes). The controller looks back over this window and picks the highest (most conservative) replica-count recommendation to prevent flapping.

3. How do you verify that metrics-server is running and providing data?
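A sketch of where that 300-second default lives in an autoscaling/v2 HPA (names web and web-hpa are placeholders; shortening the window makes scale-down react faster but flap more easily):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the default; lower with care
```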
Show answer
Run kubectl top nodes and kubectl top pods. If they return metrics, the server is working. Also check kubectl get apiservices | grep metrics and kubectl -n kube-system get pods -l k8s-app=metrics-server.

🟡 Medium (5)
1. What formula does the HPA use to compute the desired replica count?
Show answer
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)). For example, if current CPU is 90% and the target is 70%, the scale factor is 90/70 ≈ 1.29, so replicas increase by roughly 29%, rounded up.

2. What are the four metric types supported by HPA v2, and when would you use each?
Show answer
Resource (CPU/memory from metrics-server), Pods (per-pod app metrics like RPS via a custom metrics adapter), External (cloud-service metrics like SQS queue depth via an external metrics adapter), and Object (metrics from a specific Kubernetes object, such as Ingress RPS, via a custom metrics adapter).

3. Why should VPA and HPA not target the same metric?
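All four metric types from that card can appear in the metrics section of one HPA v2 spec. A sketch (the metric names, target values, and the Ingress name are invented for illustration; Pods, External, and Object metrics require a metrics adapter to be installed):

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second     # hypothetical app metric
      target:
        type: AverageValue
        averageValue: "100"
  - type: External
    external:
      metric:
        name: sqs_queue_depth              # hypothetical cloud metric
      target:
        type: AverageValue
        averageValue: "30"
  - type: Object
    object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: web-ingress                  # placeholder name
      metric:
        name: requests_per_second
      target:
        type: Value
        value: "2000"
```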
Show answer
They will conflict. HPA adjusts the replica count based on per-pod utilization, while VPA adjusts resource requests on individual pods. If both act on CPU, HPA might scale out while VPA simultaneously changes the request denominator, causing an unstable feedback loop.

4. Why is memory generally a poor primary metric for HPA scaling?
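One common way to avoid that conflict is to let HPA own CPU-driven replica scaling and restrict VPA to memory. A sketch, assuming the VPA CRDs and controller are installed (names are placeholders):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]   # keep CPU out of VPA's hands
```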
Show answer
Many runtimes (JVM, Python) allocate memory and never release it to the OS even after load drops. Memory utilization stays high regardless of current demand, so HPA never scales down. CPU is preferred because it correlates better with active request load.

5. When HPA is configured with multiple metrics, how does it decide the replica count?
Show answer
It evaluates each metric independently and takes the maximum desired replica count across all metrics. The most demanding metric wins. This means combining metrics with very different response characteristics can lead to unexpected scaling behavior.

🔴 Hard (3)
1. Explain the selectPolicy field in HPA behavior and how Max vs Min affect scaling aggressiveness.
Show answer
selectPolicy determines which policy to apply when multiple scaling policies are defined. Max picks whichever policy allows the largest change (most aggressive scaling); Min picks the smallest change (most conservative); Disabled prevents scaling in that direction entirely. For example, with both a Percent (100%) policy and a Pods (5) policy on scaleUp and selectPolicy: Max, the HPA uses whichever allows adding more pods.

2. How can PodDisruptionBudget conflict with HPA scale-down, and what is the best practice to avoid it?
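The Percent(100%)/Pods(5) example from that card maps onto the spec.behavior section of an HPA like this (a sketch; the periods and scale-down values are illustrative):

```yaml
behavior:
  scaleUp:
    selectPolicy: Max          # apply whichever policy adds more pods
    policies:
      - type: Percent
        value: 100             # allow doubling per period
        periodSeconds: 60
      - type: Pods
        value: 5               # or adding 5 pods per period
        periodSeconds: 60
  scaleDown:
    selectPolicy: Min          # take the smaller (gentler) decrease
    policies:
      - type: Percent
        value: 10
        periodSeconds: 60
```

At 4 replicas, Percent(100%) allows +4 while Pods allows +5, so Max picks the Pods policy; at 10 replicas, Percent allows +10 and wins instead.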
Show answer
A PDB enforces minAvailable during voluntary disruptions. HPA sets the desired replica count, but if scaling down would violate PDB constraints during node drains or spot terminations, evictions are blocked. The HPA controller itself does not check PDBs. Best practice: set HPA minReplicas to at least the minimum the PDB requires to be available.

3. Why can HPA not scale to zero, and what are the alternatives?
Show answer
With zero replicas there are no pods producing metrics to compute utilization from, and minReplicas must be at least 1 unless the alpha HPAScaleToZero feature gate is enabled (and even then an object or external metric is required). Alternatives: KEDA (event-driven autoscaling with scale-to-zero) or Knative (request-driven scale-to-zero for serverless-style workloads).
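The PDB/HPA alignment best practice from the hard card above can be sketched as a pair of manifests (names and thresholds are placeholders): with minAvailable: 2, keep HPA minReplicas at 2 or higher so scale-down never targets fewer pods than the PDB needs.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2              # at least 2 pods during voluntary disruptions
  selector:
    matchLabels:
      app: web
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2               # >= PDB minAvailable
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```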