Lab 19: Multi-Cluster

| Field | Value |
|---|---|
| Tier | 4 - Advanced |
| Estimated Time | 90 minutes |
| Prerequisites | 2x k3s clusters (or simulated via namespaces) |
| Auto-Grade | Yes |

Scenario

Your company is expanding to a second region for disaster recovery. The business requires that if the primary cluster goes down, traffic automatically fails over to the secondary cluster within 60 seconds. You need to deploy the same application to both clusters (simulated as two namespaces), configure health checking between them, set up a failover mechanism, and verify it works by simulating a primary cluster failure.

In this lab, we simulate two clusters using two namespaces: lab-cluster-primary and lab-cluster-secondary. A "router" pod in namespace lab-cluster-router simulates DNS-based failover by health-checking both clusters and routing to the healthy one.

Objectives

  • Deploy the application in lab-cluster-primary (3 replicas, service)
  • Deploy the same application in lab-cluster-secondary (3 replicas, service)
  • Deploy a health-check router in lab-cluster-router
  • Router correctly routes to primary when both clusters are healthy
  • Simulate primary failure (scale to 0) and verify failover to secondary
  • Restore primary and verify traffic returns to primary
  • Write a failover report to /tmp/lab-multicluster/failover-report.txt
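The first two objectives could be approached along the following lines. This is a minimal sketch, not the lab's reference solution: the Deployment and Service name `app` and port 80 come from the hints, while the image (`nginx:stable`), the `KUBECTL` override, and the `deploy_app` helper are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: the image and helper name are assumptions, not part of the lab.
KUBECTL="${KUBECTL:-kubectl}"            # set KUBECTL=echo to preview the commands
APP_IMAGE="${APP_IMAGE:-nginx:stable}"   # hypothetical image serving HTTP on port 80

# deploy_app NAMESPACE: create a 3-replica Deployment named "app" and a
# ClusterIP Service of the same name on port 80 in the given namespace.
deploy_app() {
  ns="$1"
  $KUBECTL create deployment app --image="$APP_IMAGE" --replicas=3 -n "$ns" &&
  $KUBECTL expose deployment app --port=80 --target-port=80 -n "$ns"
}

# Usage (the namespaces themselves are created by ./setup.sh):
#   deploy_app lab-cluster-primary
#   deploy_app lab-cluster-secondary
```

Using the same helper for both namespaces keeps the two "clusters" identical, which makes it easier to attribute any difference in behavior to the router rather than to the application.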

Setup

./setup.sh

This script creates the three namespaces (lab-cluster-primary, lab-cluster-secondary, and lab-cluster-router) and deploys a router skeleton.

Hints

Hint 1: Simulating clusters with namespaces
Each namespace acts as an independent "cluster." Cross-namespace service access uses `<service>.<namespace>.svc.cluster.local`, for example `app.lab-cluster-primary.svc.cluster.local`.

Hint 2: Health check script
The router can run a loop that checks each cluster's health endpoint:
if wget -qO- --timeout=3 http://app.lab-cluster-primary.svc:80/ >/dev/null 2>&1; then
  echo "primary"
else
  echo "secondary"
fi
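The check above reports "secondary" whenever the primary is down, without confirming that the secondary is actually reachable. A slightly fuller sketch is shown here, under stated assumptions: the secondary Service is also named `app`, and the 5-second interval, the `none` state, and the function names `is_healthy`, `pick_backend`, and `route_loop` are illustrative choices, not the lab's required design.

```shell
#!/bin/sh
# Sketch only: the URLs follow the hint; interval and function names are assumptions.
PRIMARY_URL="${PRIMARY_URL:-http://app.lab-cluster-primary.svc:80/}"
SECONDARY_URL="${SECONDARY_URL:-http://app.lab-cluster-secondary.svc:80/}"
CHECK_INTERVAL="${CHECK_INTERVAL:-5}"   # seconds; keep well below the 60 s target

# is_healthy URL: succeed if the URL answers within 3 seconds.
is_healthy() {
  wget -qO- --timeout=3 "$1" >/dev/null 2>&1
}

# pick_backend: print the active backend, always preferring primary,
# so traffic returns to primary as soon as it is healthy again.
pick_backend() {
  if is_healthy "$PRIMARY_URL"; then
    echo "primary"
  elif is_healthy "$SECONDARY_URL"; then
    echo "secondary"
  else
    echo "none"
  fi
}

# route_loop: log each change of the active backend with a UTC timestamp.
route_loop() {
  last=""
  while true; do
    active="$(pick_backend)"
    if [ "$active" != "$last" ]; then
      echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) active=$active"
      last="$active"
    fi
    sleep "$CHECK_INTERVAL"
  done
}
```

Calling `route_loop` as the router container's command would log one line per backend change. With these values, an outage should be noticed after roughly the check interval plus the request timeout in the worst case, which leaves a wide margin under the 60-second requirement.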

Hint 3: Simulating failure
`kubectl scale deployment app --replicas=0 -n lab-cluster-primary` simulates a primary cluster outage. The router should detect this within its check interval.

Hint 4: Failover verification
After scaling primary to 0, the router should return responses from the secondary. Check logs or responses to verify that the active backend changed.

Hint 5: Failback
Scale primary back to 3: `kubectl scale deployment app --replicas=3 -n lab-cluster-primary`. The router should prefer primary when it is healthy again.
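
The failure, verification, and failback steps can be tied together in a timed drill that also produces the report file. The sketch below rests on several assumptions that the lab does not confirm: a router Deployment named `router` whose logs contain lines of the form `active=primary`, a 120-second allowance for failback, and a simple `key=value` report layout. The grader may expect a different format, so treat `current_backend`, `wait_for_backend`, and `run_drill` as hypothetical helpers to adapt.

```shell
#!/bin/sh
# Sketch only: router name, log format, and report layout are assumptions.
KUBECTL="${KUBECTL:-kubectl}"
REPORT="${REPORT:-/tmp/lab-multicluster/failover-report.txt}"

# current_backend: hypothetical helper that prints the backend named in the
# most recent "active=..." line of the router logs.
current_backend() {
  $KUBECTL logs deployment/router -n lab-cluster-router --tail=20 2>/dev/null |
    sed -n 's/.*active=\([a-z]*\).*/\1/p' | tail -n 1
}

# wait_for_backend EXPECTED LIMIT: poll once per second until the router
# reports EXPECTED; print the elapsed seconds, or fail after LIMIT seconds.
wait_for_backend() {
  expected="$1"; limit="$2"; start="$(date +%s)"
  while [ "$(current_backend)" != "$expected" ]; do
    elapsed=$(( $(date +%s) - start ))
    [ "$elapsed" -ge "$limit" ] && return 1
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# run_drill: scale primary down, time the failover, restore primary,
# time the failback, then write both timings to the report file.
run_drill() {
  mkdir -p "$(dirname "$REPORT")"
  $KUBECTL scale deployment app --replicas=0 -n lab-cluster-primary
  failover="$(wait_for_backend secondary 60)" || failover="timeout"
  $KUBECTL scale deployment app --replicas=3 -n lab-cluster-primary
  failback="$(wait_for_backend primary 120)" || failback="timeout"
  {
    echo "failover_seconds=$failover"
    echo "failback_seconds=$failback"
  } > "$REPORT"
}
```

Running `run_drill` once both deployments are healthy would record how long each transition took. A `failover_seconds` value of `timeout` means the router did not switch within the 60-second requirement.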

Grading

./grade.sh

Solution

See the `solution/` directory for the complete multi-cluster configuration.