
Quiz: Kubernetes Pods & Scheduling


7 questions

L1 (4 questions)

1. kubectl get hpa shows `<unknown>` for CPU. What's wrong?

Answer: Metrics server is not installed, or the deployment doesn't have resource requests defined. HPA needs requests to calculate percentage utilization.
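A minimal sketch of both halves of the fix — the names, image, and numbers are illustrative, not from the quiz:

```yaml
# Hypothetical Deployment fragment: the HPA can only compute a CPU
# percentage if every container declares a CPU request.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                         # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: app
          image: example/app:latest # placeholder image
          resources:
            requests:
              cpu: 200m             # utilization % = actual usage / this value
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # target: (usage / requests) * 100
```

Without the `requests.cpu` line, the utilization fraction has no denominator, which is why the HPA reports no value even when metrics-server is running.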

2. kubectl describe pod shows Pending with event 'Insufficient cpu'. Node CPU usage averages 40%. Why won't the scheduler place the pod?

Answer: The scheduler uses resource requests, not actual usage, for placement decisions. If nodes have 40% actual CPU usage but 95% of CPU is already requested by existing pods, there is no room for new requests even though actual usage is low. Check with: kubectl describe node — look at 'Allocated resources' vs 'Capacity'. The fix depends on the cause:
1. Pods are over-requesting (requests >> actual usage) — right-size requests based on actual consumption.
2. Cluster genuinely needs more nodes — add capacity.
3. Low-priority pods can be evicted — use PriorityClasses to let the scheduler preempt less important pods.

Never set requests to 0 as a workaround — this creates BestEffort QoS pods that are first to be evicted under pressure.
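Option 3 above can be sketched like this — the class name, value, image, and request figure are hypothetical:

```yaml
# Hypothetical PriorityClass: pods referencing it may preempt
# lower-priority pods when no node has free requested capacity.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service          # illustrative name
value: 100000                     # higher value wins during preemption
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "May preempt lower-priority workloads under request pressure."
---
# Reference it from the pod spec, with requests right-sized to
# observed consumption rather than peak guesses:
apiVersion: v1
kind: Pod
metadata:
  name: important-pod             # illustrative name
spec:
  priorityClassName: critical-service
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      resources:
        requests:
          cpu: 250m               # based on measured usage, not a guess
```

Note that preemption evicts victims at schedule time; it does not change how much the cluster can actually run, so options 1 and 2 remain the structural fixes.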

3. Your readinessProbe fails intermittently. Kubernetes removes the pod from the Service endpoints, causing traffic drops. The pod is actually healthy — the probe endpoint is just slow under load. How do you fix this?

Answer: Tune the probe parameters:
1. Increase timeoutSeconds (default 1s is too aggressive for a loaded app — try 3-5s).
2. Increase failureThreshold (default 3 — increase to 5 so transient slowness doesn't trigger removal).
3. Use a dedicated lightweight health endpoint that doesn't share resources with the main request path.
4. Consider a separate startupProbe with generous timeouts for slow-starting apps — this prevents readinessProbe from running during startup.

Do NOT remove the readinessProbe entirely — that means traffic hits pods that genuinely aren't ready. The goal is to distinguish 'temporarily slow' from 'genuinely unhealthy'.
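The tuned parameters might look like this container fragment — the endpoint path, port, and exact values are illustrative:

```yaml
# Hypothetical container fragment combining fixes 1-4.
containers:
  - name: app
    image: example/app:latest     # placeholder image
    startupProbe:                 # runs first; other probes wait until it passes
      httpGet: {path: /healthz, port: 8080}
      periodSeconds: 5
      failureThreshold: 30        # allows up to 30 * 5s = 150s to start
    readinessProbe:
      httpGet: {path: /healthz, port: 8080}  # dedicated lightweight endpoint
      periodSeconds: 10
      timeoutSeconds: 5           # default 1s is too tight under load
      failureThreshold: 5         # default 3; tolerate transient slowness
```

With these numbers, a pod must fail 5 consecutive checks (roughly 50 seconds of sustained slowness) before it is pulled from the Service endpoints, rather than 3 seconds of bad luck.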

4. You create a StatefulSet named 'db' with 3 replicas but forget to create a headless Service. Pod db-0 boots successfully but cannot resolve db-1.db.default.svc.cluster.local. Why?

Answer: StatefulSets require a headless Service (clusterIP: None) to provide stable DNS names for each pod. Without it, Kubernetes does not create the individual pod DNS records (pod-name.service-name.namespace.svc.cluster.local). The StatefulSet controller still creates pods with stable names (db-0, db-1, db-2), but DNS resolution fails because no Service owns the DNS registration. Create a headless Service with the same name specified in the StatefulSet's serviceName field. The Service selector must match the StatefulSet's pod labels. After creating the Service, existing pods will get DNS records without restart.
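The missing piece can be sketched as follows — the label, port, and image are assumptions; the names `db` and `serviceName: db` come from the question:

```yaml
# Hypothetical headless Service: its name must equal the StatefulSet's
# serviceName, and its selector must match the pod template labels.
apiVersion: v1
kind: Service
metadata:
  name: db                # matches spec.serviceName below
spec:
  clusterIP: None         # headless: per-pod DNS records instead of one VIP
  selector:
    app: db               # must match the StatefulSet's pod labels
  ports:
    - port: 5432          # illustrative port
---
# Corresponding StatefulSet fragment:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db         # ties pod DNS records to the Service above
  replicas: 3
  selector:
    matchLabels: {app: db}
  template:
    metadata:
      labels: {app: db}
    spec:
      containers:
        - name: db
          image: example/db:latest  # placeholder image
```

Once the Service exists, db-0 can resolve db-1.db.default.svc.cluster.local because the Service now owns those per-pod records.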

L2 (3 questions)

1. Pod A has requests=limits (CPU 500m, memory 256Mi). Pod B has requests (CPU 100m, memory 128Mi) but no limits. Pod C has no requests or limits. The node runs out of memory. Which pod is killed first and why?

Answer: Pod C is killed first — it has QoS class BestEffort (no requests, no limits), which receives the least protection from eviction. Pod B is killed next — it has QoS class Burstable (requests set, limits differ or missing). Pod A is killed last — it has QoS class Guaranteed (requests == limits for all containers), which receives the most protection. Within the same QoS class, Kubernetes kills pods that exceed their requests by the largest margin. Guaranteed pods are only killed if the system itself needs memory (kernel, kubelet). This is why production workloads should set requests=limits for predictable behavior and eviction protection.
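The three `resources` blocks from the question, side by side, determine the QoS class each pod is assigned:

```yaml
# Pod A — Guaranteed: requests == limits for every container.
resources:
  requests: {cpu: 500m, memory: 256Mi}
  limits:   {cpu: 500m, memory: 256Mi}
---
# Pod B — Burstable: requests set, limits absent (or different).
resources:
  requests: {cpu: 100m, memory: 128Mi}
---
# Pod C — BestEffort: no requests, no limits; first eviction target.
resources: {}
```

You can confirm the assigned class with kubectl get pod -o jsonpath='{.status.qosClass}'.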

2. You have a PodDisruptionBudget with minAvailable=2 on a 3-pod Deployment. Cluster autoscaler wants to drain a node that runs 2 of your pods. What happens?

Answer: The drain is blocked. PDB with minAvailable=2 means at least 2 pods must be available at all times. Draining a node with 2 of your 3 pods would leave only 1 running (while the drained pods restart elsewhere), violating the PDB. The cluster autoscaler will skip this node or wait. Fix options:
1. Change PDB to maxUnavailable=1 (equivalent intent, but autoscaler can proceed if pods are spread correctly).
2. Ensure pods are spread across nodes using topology spread constraints or pod anti-affinity so no single node holds more than 1 pod.
3. Increase replicas to 4+ so losing 2 from one node still satisfies minAvailable=2.

PDBs protect availability but can block infrastructure operations if not paired with good pod distribution.
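Options 1 and 2 above can be sketched together — the names and labels are illustrative:

```yaml
# Hypothetical PDB: allow one voluntary disruption at a time
# (same intent as minAvailable=2 at 3 replicas).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb             # illustrative name
spec:
  maxUnavailable: 1
  selector:
    matchLabels: {app: web}
---
# In the Deployment's pod template: spread replicas across nodes
# so a single drain never takes out 2 of the 3 pods at once.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule   # refuse placements that bunch pods up
    labelSelector:
      matchLabels: {app: web}
```

With at most one pod per node at risk, the drain evicts one pod, waits for it to become Ready elsewhere, and the PDB is never violated.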

3. Your Deployment rolling update is stuck: 2 new pods are Pending (insufficient resources), 2 old pods are still Running. maxSurge=2, maxUnavailable=0. Why can't the update make progress?

Answer: With maxUnavailable=0, Kubernetes cannot terminate any old pod until a new pod is Ready. With maxSurge=2, it created 2 new pods (total 6 desired: 4 old + 2 new), but those 2 are Pending because the cluster lacks capacity for 6 simultaneous pods. Deadlock: old pods can't be removed (maxUnavailable=0) and new pods can't start (no capacity). Fix options:
1. Set maxUnavailable=1 — allows killing an old pod to free resources for a new one.
2. Add cluster capacity.
3. Reduce maxSurge to 1 and set maxUnavailable=1 for a slower but resource-efficient rollout.

The combination maxSurge>0 with maxUnavailable=0 is dangerous when cluster capacity is tight.
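Fix option 3 as a strategy fragment — the Deployment name is illustrative:

```yaml
# Hypothetical strategy: trades rollout speed for headroom, avoiding
# the maxSurge>0 / maxUnavailable=0 deadlock on a capacity-tight cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most 1 extra pod beyond the desired count
      maxUnavailable: 1     # may kill 1 old pod to free capacity for a new one
  # (selector and pod template omitted)
```

The rollout now needs headroom for only 5 pods at peak instead of 6, and can always make progress by trading one old pod for one new pod.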