Lab 17: Performance Tuning

Field	Value
Tier	4 — Advanced
Estimated Time	90 minutes
Prerequisites	k3s cluster
Auto-Grade	Yes

Scenario

The product team reports that page load times have increased from 200ms to over 3 seconds in the past month. Users are churning, and the CEO has escalated this to a P1. The application hasn't changed: the same code runs on the same infrastructure. The problem is resource configuration, not code.

Your investigation needs to find and fix five performance bottlenecks hidden in the Kubernetes deployment configuration: CPU requests and limits set too low, causing throttling; memory limits too close to actual usage, causing OOM pressure; no HorizontalPodAutoscaler to absorb peak traffic; no PodDisruptionBudget to protect availability during rolling updates; and an overly aggressive liveness probe that causes unnecessary restarts.

Objectives

Fix CPU requests/limits that cause throttling (requests too low, limits too tight)
Fix memory limits that cause OOM pressure (increase headroom to 2x average usage)
Add HPA to handle traffic spikes (min 2, max 10, CPU target 60%)
Add PodDisruptionBudget (minAvailable: 1) to prevent update-caused downtime
Fix the liveness probe timing (increase period and failure threshold)
Verify no pods are in CrashLoopBackOff or being throttled
Document findings in /tmp/lab-perf/tuning-report.txt
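
The CrashLoopBackOff check in the objectives can be scripted. The sketch below is one possible approach, not the grader's method: `crashloop_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# List pods whose STATUS column is CrashLoopBackOff.
# Reads `kubectl get pods --no-headers` output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
crashloop_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against the lab namespace (requires cluster access):
#   kubectl get pods -n lab-perf --no-headers | crashloop_pods
```

Empty output means no pod is currently crash-looping. Throttling has to be checked separately.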

Setup

./setup.sh

The script deploys a poorly configured application into the `lab-perf` namespace.

Hints

Hint 1: Detecting CPU throttling

Check `kubectl top pods -n lab-perf`. If a container's CPU usage sits at or near its limit, it is very likely being throttled. Raise the CPU limit, and set the request close to typical usage so the scheduler reserves enough CPU for the pod.
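
`kubectl top` only shows usage, so throttling has to be inferred from it. The kernel's CFS counters in the container's `cpu.stat` file give more direct evidence. The sketch below assumes the container image ships `cat` and that the node uses cgroup v2, where the file is typically `/sys/fs/cgroup/cpu.stat` (on cgroup v1 it is usually `/sys/fs/cgroup/cpu/cpu.stat`). `throttle_pct` is a hypothetical helper name and `<pod-name>` is a placeholder.

```shell
# Print the percentage of CFS periods in which the container was throttled.
# Reads cpu.stat content on stdin. Both cgroup v1 and v2 expose the
# nr_periods and nr_throttled counters that this relies on.
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%d\n", (throttled * 100) / periods
      else print 0
    }'
}

# Usage (requires cluster access; path assumes cgroup v2):
#   kubectl exec -n lab-perf <pod-name> -- cat /sys/fs/cgroup/cpu.stat | throttle_pct
```

A value that stays well above zero under normal load suggests the CPU limit is too tight.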

Hint 2: Memory headroom

Rule of thumb: set memory limits to 1.5x to 2x the average usage (the objectives for this lab ask for 2x). Check current usage with `kubectl top pods -n lab-perf` and compare it to the limits in the deployment spec.
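
The headroom rule is simple arithmetic, sketched below. `mem_limit_mi` is a hypothetical helper, and the 180Mi average in the usage line is an invented figure for illustration, not a measurement from this lab.

```shell
# Compute a memory limit in Mi from average usage in Mi.
# The multiplier is given in tenths to stay within integer arithmetic:
# 20 means 2.0x (the default), 15 means 1.5x. The result is rounded up.
mem_limit_mi() {
  avg_mi="$1"
  factor_tenths="${2:-20}"
  echo "$(( (avg_mi * factor_tenths + 9) / 10 ))Mi"
}

# Usage (requires cluster access; 180 is a made-up average):
#   kubectl set resources deployment app -n lab-perf \
#     --limits=memory="$(mem_limit_mi 180)"
```

Substitute the average reported by `kubectl top pods` for the example figure.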

Hint 3: HPA configuration

kubectl autoscale deployment app -n lab-perf --min=2 --max=10 --cpu-percent=60
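
To see what a 60% CPU target means in practice, the sketch below applies the scaling formula from the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`, clamped to the min and max. Utilization is measured against the pod's CPU request, so the HPA needs CPU requests to be set. This is a simplification: `hpa_desired` is a hypothetical helper, and it ignores the controller's tolerance band and stabilization behavior.

```shell
# Estimate the replica count an HPA would ask for.
# Args: current replicas, current average CPU utilization (% of request),
#       then optional target % (default 60), min (default 2), max (default 10).
hpa_desired() {
  current="$1"
  usage_pct="$2"
  target="${3:-60}"
  min="${4:-2}"
  max="${5:-10}"
  # Integer ceiling of current * usage_pct / target.
  desired=$(( (current * usage_pct + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

For example, `hpa_desired 2 90` yields 3: two replicas running at 90% of their request need a third to bring the average back toward 60%.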

Hint 4: PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: lab-perf
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: webapp
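
A PodDisruptionBudget with `minAvailable` limits voluntary evictions such as `kubectl drain`. The number of evictions it permits is the healthy pod count minus `minAvailable`, and never less than zero. The sketch below shows that arithmetic. `pdb_allowed` is a hypothetical helper, not part of kubectl.

```shell
# Compute how many voluntary disruptions a minAvailable-style PDB permits.
# Args: number of healthy pods, then optional minAvailable (default 1).
pdb_allowed() {
  healthy="$1"
  min_available="${2:-1}"
  allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}

# Compare against the live value (requires cluster access):
#   kubectl get pdb app-pdb -n lab-perf
```

With a single replica the result is 0, so every eviction is blocked. Pairing `minAvailable: 1` with at least two replicas leaves room for one pod to be evicted at a time.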

Hint 5: Probe tuning

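A liveness probe with `periodSeconds: 3` and `failureThreshold: 1` restarts the container after a single failed check, so any brief hiccup triggers a restart. Use `periodSeconds: 10` and `failureThreshold: 3` (the Kubernetes defaults) for production workloads, so a container is restarted only after three consecutive failures.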
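
The difference between the two probe settings in Hint 5 can be put in numbers. As a rough approximation, a container must stay unhealthy for about `periodSeconds * failureThreshold` seconds before the kubelet restarts it. The sketch below ignores `timeoutSeconds` and `initialDelaySeconds`, and `probe_restart_window` is a hypothetical helper name.

```shell
# Approximate seconds of continuous failure before a liveness restart.
# Args: periodSeconds, failureThreshold.
probe_restart_window() {
  period="$1"
  threshold="$2"
  echo "$(( period * threshold ))"
}
```

`probe_restart_window 3 1` gives 3 and `probe_restart_window 10 3` gives 30. The tuned probe therefore tolerates a stall of a few seconds that would restart the container under the original settings.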

Grading

./grade.sh

Solution

See the solution/ directory for optimized deployment manifests.