Lab 16: Chaos Engineering
| Field | Value |
|---|---|
| Tier | 4 — Advanced |
| Estimated Time | 90 minutes |
| Prerequisites | k3s cluster, Helm |
| Auto-Grade | Yes |
Scenario
Your team claims the application stack is "highly available," but nobody has tested what happens when things actually fail. The VP of Engineering wants proof — not promises — that the system can handle node failures, pod crashes, network partitions, and resource exhaustion. You have been tasked with designing and executing chaos experiments that stress-test the resilience of the stack.
The application is a three-tier architecture (frontend, API, database) deployed with multiple replicas and health checks. Your job is to inject five different failure modes, observe the system's behavior, verify it self-heals (or document that it does not), and write a resilience report with recommendations.
Objectives
- Deploy a resilient 3-tier app stack with multiple replicas and probes
- Experiment 1: Kill a random API pod and verify traffic shifts to surviving pods
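- Experiment 2: Inject CPU stress on a node and observe whether pods are evicted or rescheduled (CPU is a compressible resource, so the kubelet throttles it rather than evicting pods for it; slower responses are the more likely outcome)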
- Experiment 3: Add network latency to the database and measure API response degradation
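- Experiment 4: Fill a container's ephemeral storage and verify the pod is evicted and replaced (the kubelet evicts a pod once one of its containers exceeds its `ephemeral-storage` limit)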
- Experiment 5: Simulate a DNS failure and verify retries work
- Write a resilience report to `/tmp/lab-chaos/resilience-report.txt`
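Experiments 1, 3 and 5 share one measurement: send a steady stream of requests before, during, and after the fault, then compare success rate and latency. The Python sketch below does that with only the standard library. It is an illustration, not part of the lab materials, and the URL you pass in, the sample count, and the retry and backoff values are all assumptions. A DNS failure surfaces from `urllib` as `URLError`, a subclass of `OSError`, so the same retry loop also covers Experiment 5 from the client's side: the `retried` count shows how many requests needed more than one attempt.

```python
import statistics
import sys
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    ok: bool          # True if the request eventually succeeded
    latency_s: float  # wall-clock time including any retries
    attempts: int     # number of tries used


def http_get(url: str, timeout: float = 2.0) -> None:
    """Fetch url; urllib raises an OSError subclass on DNS, connection, or HTTP 4xx/5xx errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(fetch: Callable[[], None], retries: int = 3, backoff_s: float = 0.2) -> Sample:
    """Call fetch(), retrying with exponential backoff whenever it raises OSError."""
    start = time.monotonic()
    for attempt in range(1, retries + 1):
        try:
            fetch()
            return Sample(True, time.monotonic() - start, attempt)
        except OSError:
            if attempt < retries:
                time.sleep(backoff_s * 2 ** (attempt - 1))
    return Sample(False, time.monotonic() - start, retries)


def summarize(samples: List[Sample]) -> dict:
    """Success rate, retry count, and p50/p95 latency of the successful requests."""
    ok = sorted(s.latency_s for s in samples if s.ok)
    result = {
        "requests": len(samples),
        "success_rate": len(ok) / len(samples) if samples else 0.0,
        "retried": sum(1 for s in samples if s.attempts > 1),
    }
    if len(ok) >= 2:  # quantiles() needs at least two data points on older Pythons
        cuts = statistics.quantiles(ok, n=100, method="inclusive")
        result["p50_s"], result["p95_s"] = cuts[49], cuts[94]
    return result


def run(url: str, count: int = 50, interval_s: float = 0.1) -> dict:
    """Probe url `count` times; run once per phase (baseline, fault, recovery)."""
    samples = []
    for _ in range(count):
        samples.append(probe(lambda: http_get(url)))
        time.sleep(interval_s)
    return summarize(samples)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(run(sys.argv[1]))
```

For example, save it as a hypothetical `probe.py` and run `python probe.py http://localhost:8080/` while `kubectl port-forward -n lab-chaos svc/api 8080:80` is active; the service name `api` and both ports are placeholders for whatever the stack actually exposes. Comparing the baseline `p95_s` with the value measured while latency is injected gives the degradation asked for in Experiment 3.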
Setup
The setup step deploys the target application stack in the `lab-chaos` namespace.
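A failed experiment only means something if the stack was healthy before the fault. The sketch below is a baseline check, assuming the three tiers run as Deployments or StatefulSets: it reads `kubectl get deployments,statefulsets -n lab-chaos -o json` and lists every workload that has fewer ready replicas than desired. The function names are illustrative, not part of the lab.

```python
import json
import subprocess
from typing import Dict, List


def not_ready(workloads: Dict) -> List[str]:
    """Return 'Kind/name ready/desired' for each workload below its desired replica count."""
    problems = []
    for item in workloads.get("items", []):
        kind = item.get("kind", "?")
        name = item["metadata"]["name"]
        desired = item.get("spec", {}).get("replicas", 1)
        # readyReplicas may be omitted from status when it is zero
        ready = item.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            problems.append(f"{kind}/{name} {ready}/{desired}")
    return problems


def check_namespace(namespace: str = "lab-chaos") -> List[str]:
    """Query Deployments and StatefulSets in the namespace through kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "deployments,statefulsets", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return not_ready(json.loads(out))
```

An empty list from `check_namespace()` means every workload reports all of its replicas ready, which is a reasonable gate before starting each experiment.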
Hints
Hint 1: Killing pods
`kubectl delete pod`
Hint 2: CPU stress
Run a stress container: `kubectl run stress --image=alpine --restart=Never -n lab-chaos -- sh -c "apk add stress-ng && stress-ng --cpu 4 --timeout 60s"`. The `--restart=Never` flag keeps the pod from rerunning the stress load after it exits. Watch node resource usage with `kubectl top nodes`.
Hint 3: Network latency injection
Use `tc` inside a container (the container needs the `NET_ADMIN` capability and an image that includes `tc`): `kubectl exec`
Hint 4: Ephemeral storage exhaustion
`kubectl exec`
Hint 5: Resilience report structure
For each experiment: hypothesis, method, observation, result (pass/fail), recommendation. Include a summary of overall system resilience.
Grading
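This is not the lab's auto-grader, whose checks are not shown here. Before submitting, a minimal self-check can confirm that `/tmp/lab-chaos/resilience-report.txt` exists and follows the structure from Hint 5. The sketch assumes a layout that Hint 5 does not prescribe: each experiment starts on a line beginning `Experiment N`, each field is written as a `Field:` label (for example `Result: PASS`), and the overall summary starts with `Summary:`. Adjust the patterns if your report is laid out differently.

```python
import re
from pathlib import Path
from typing import Dict, List

REPORT = Path("/tmp/lab-chaos/resilience-report.txt")
FIELDS = ("hypothesis", "method", "observation", "result", "recommendation")


def split_experiments(text: str) -> Dict[int, str]:
    """Map experiment number to the text between its 'Experiment N' line and the next one."""
    # With one capture group, re.split returns [preamble, n1, body1, n2, body2, ...]
    parts = re.split(r"(?im)^\s*experiment\s+(\d+)\b", text)
    return {int(num): body for num, body in zip(parts[1::2], parts[2::2])}


def missing_fields(text: str, experiments: int = 5) -> List[str]:
    """List every required section or 'Field:' label the report lacks."""
    sections = split_experiments(text)
    problems = []
    for n in range(1, experiments + 1):
        body = sections.get(n)
        if body is None:
            problems.append(f"experiment {n}: section missing")
            continue
        for field in FIELDS:
            if not re.search(rf"(?im)^\s*{field}\s*:", body):
                problems.append(f"experiment {n}: no '{field}:' line")
    if not re.search(r"(?im)^\s*summary\s*:", text):
        problems.append("no 'Summary:' line")
    return problems


def check_report(path: Path = REPORT) -> List[str]:
    """An empty list means the report exists and has the expected structure."""
    if not path.is_file():
        return [f"{path} does not exist"]
    return missing_fields(path.read_text())
```

Passing this check only shows the report is structurally complete; it says nothing about whether your observations and recommendations are sound.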
Solution
See the solution/ directory for experiment scripts and report template.
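The reference scripts in `solution/` are not reproduced here. As an independent sketch of Experiment 1 (not the lab's solution), the Python below deletes one random ready pod that matches a label selector and measures how long the controller (a ReplicaSet, if the API is a Deployment) takes to restore the ready count. Every kubectl call goes through a swappable `run` function so the logic can be exercised without a cluster; the function names are hypothetical.

```python
import json
import random
import subprocess
import time
from typing import Callable, List, Optional

# A runner takes kubectl arguments and returns stdout; swappable for testing.
Runner = Callable[[List[str]], str]


def kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments; raise CalledProcessError on failure."""
    return subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    ).stdout


def ready_pods(ns: str, selector: str, run: Runner = kubectl) -> List[str]:
    """Names of non-terminating pods matching selector whose Ready condition is True."""
    data = json.loads(run(["get", "pods", "-n", ns, "-l", selector, "-o", "json"]))
    names = []
    for pod in data.get("items", []):
        if pod["metadata"].get("deletionTimestamp"):
            continue  # a pod that is being deleted does not count as a survivor
        conditions = pod.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            names.append(pod["metadata"]["name"])
    return names


def kill_random_pod(ns: str, selector: str, *, run: Runner = kubectl,
                    rng: Optional[random.Random] = None,
                    timeout_s: float = 120.0, poll_s: float = 2.0) -> float:
    """Delete one random ready pod; return seconds until the ready count is restored."""
    rng = rng or random.Random()
    before = ready_pods(ns, selector, run)
    if len(before) < 2:
        raise RuntimeError(f"need at least 2 ready pods to test failover, found {len(before)}")
    victim = rng.choice(before)
    start = time.monotonic()
    run(["delete", "pod", victim, "-n", ns, "--wait=false"])
    while time.monotonic() - start < timeout_s:
        current = ready_pods(ns, selector, run)
        if victim not in current and len(current) >= len(before):
            return time.monotonic() - start
        time.sleep(poll_s)
    raise TimeoutError(f"ready pod count not restored within {timeout_s}s")
```

For example, `kill_random_pod("lab-chaos", "app=api")`; the `app=api` selector is a placeholder, so check the real labels with `kubectl get pods -n lab-chaos --show-labels` first. Keep a request loop running against the API at the same time to see whether any requests fail while the pod is gone.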