
Multi-Cluster & Federation - Exercises & Reference

Why Multi-Cluster?

| Reason | Example |
| --- | --- |
| High availability | Survive full cluster failure |
| Geographic distribution | Serve users from nearby region |
| Isolation | Separate dev/staging/prod |
| Regulatory compliance | Data must stay in specific region |
| Blast radius | Limit impact of bad deployments |

Multi-Cluster Patterns

Pattern 1: Independent Clusters (Most Common)

[Cluster: dev]     [Cluster: staging]     [Cluster: prod]
     |                    |                     |
     +-------- GitOps (ArgoCD) deploys to all ---+

Each cluster is fully independent. ArgoCD or Flux manages deployments to all clusters from a single Git repo.

Pattern 2: Active-Passive (DR)

[Primary Cluster] ---replication---> [Standby Cluster]
       |                                    |
   DNS (active)                        DNS (passive)

All traffic goes to the primary. The standby keeps its config and data in sync through replication, and DNS fails over to it when the primary goes down.

Pattern 3: Active-Active (Multi-Region)

[Cluster: us-east] <----> [Cluster: eu-west]
        |                        |
    [Global LB / DNS routing]

Both clusters serve traffic. A global load balancer (or geo-aware DNS) routes each request by geography.

Tools for Multi-Cluster

| Tool | What it does |
| --- | --- |
| ArgoCD | Deploy to multiple clusters from one control plane |
| Flux | GitOps with multi-cluster support via Kustomization |
| Submariner | Cross-cluster pod networking and service discovery |
| Cilium ClusterMesh | Multi-cluster networking with Cilium CNI |
| Liqo | Virtual node abstraction across clusters |
| Karmada | Kubernetes federation API |
| kubefed (deprecated) | Earlier Kubernetes federation project (Federation v2), now archived |

Exercise 1: Multi-Cluster with ArgoCD [I]

Register a second cluster with ArgoCD and deploy to both.

# Register a cluster
argocd cluster add <context-name>

# List registered clusters
argocd cluster list

# Create an Application targeting the second cluster
# (--dest-server must match the SERVER value shown by `argocd cluster list`)
argocd app create grokdevops-staging \
  --repo https://github.com/your-org/grokdevops.git \
  --path devops/k8s \
  --dest-server https://staging-cluster:6443 \
  --dest-namespace grokdevops

Verify: after a sync, the Application reports Synced and Healthy, and its pods are running in the staging cluster.
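
An Application created this way has no automated sync policy, so nothing is deployed until it is synced. A minimal sketch of the sync-and-verify step, assuming the argocd CLI is already logged in. The function name `sync_and_verify` and the default timeout are illustrative, not part of the exercise.

```shell
# Sync an Argo CD Application, wait for it to become healthy, then show its status.
# Assumes the argocd CLI is already logged in; the function name and default
# timeout are illustrative.
sync_and_verify() (
  app="$1"
  timeout="${2:-120}"
  argocd app sync "$app" || exit 1
  argocd app wait "$app" --health --timeout "$timeout" || exit 1
  argocd app get "$app"
)

# Usage: sync_and_verify grokdevops-staging
```

The function returns non-zero as soon as the sync or the health wait fails, so it can gate later steps in a script.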


Exercise 2: ApplicationSet for Multi-Cluster [I]

Use an ArgoCD ApplicationSet to deploy to every registered cluster that carries the label environment: production.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: grokdevops-multi
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: 'grokdevops-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/grokdevops.git
        targetRevision: main
        path: devops/helm/grokdevops
      destination:
        server: '{{server}}'
        namespace: grokdevops
      syncPolicy:
        automated:
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
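
The cluster generator's selector matches labels on Argo CD's cluster Secrets (the Secrets labeled argocd.argoproj.io/secret-type=cluster), so a registered cluster is only picked up once its Secret carries environment=production. A minimal sketch of that labeling step, assuming Argo CD runs in the argocd namespace as in the manifest above. The function name is illustrative and the Secret name is whatever your installation generated.

```shell
# Label an Argo CD cluster Secret so the ApplicationSet selector matches it.
# Assumes Argo CD is installed in the "argocd" namespace.
# The function name is illustrative; find the Secret name with:
#   kubectl -n argocd get secrets -l argocd.argoproj.io/secret-type=cluster
label_cluster_for_prod() (
  secret="$1"
  if [ -z "$secret" ]; then
    echo "usage: label_cluster_for_prod <secret-name>" >&2
    exit 2
  fi
  kubectl -n argocd label secret "$secret" environment=production --overwrite
)
```

After labeling, `kubectl -n argocd get applications` should list one grokdevops-<cluster name> Application per matching cluster.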

Exercise 3: Cross-Cluster Service Discovery [H]

Set up DNS-based service discovery across clusters.

# Option 1: ExternalDNS with split-horizon
# Each cluster registers its services in Route53/CloudDNS

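# Option 2: CoreDNS with the forward plugin
# Forward queries for the remote cluster's DNS zone to that cluster's DNS.
# This assumes the remote cluster uses remote-cluster.local as its cluster domain.
# The coredns-custom ConfigMap is only read where the Corefile imports it
# (for example on AKS); elsewhere, add the server block to the main Corefile.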
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  remote-cluster.server: |
    remote-cluster.local:53 {
      forward . 10.20.0.10  # Remote cluster's DNS endpoint; must be routable from this cluster
      cache 30
    }
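
To check that the forward rule works, resolve a remote service name from inside the local cluster. A minimal sketch, assuming the remote-cluster.local zone from the ConfigMap above. The function name, pod name, and busybox image tag are illustrative.

```shell
# Resolve a remote service name from inside this cluster using a throwaway pod.
# Assumes the remote zone is "remote-cluster.local" as in the ConfigMap above;
# the function name, pod name, and busybox image tag are illustrative.
check_remote_dns() (
  svc="$1"
  ns="$2"
  fqdn="${svc}.${ns}.svc.remote-cluster.local"
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup "$fqdn"
)

# Usage: check_remote_dns grokdevops grokdevops
```

A successful lookup only proves name resolution. Reaching the returned address still depends on cross-cluster routing.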

Exercise 4: Multi-Cluster Monitoring [I]

Set up Prometheus federation to aggregate metrics from multiple clusters.

# On the central Prometheus: federate from remote clusters
scrape_configs:
  - job_name: 'federate-staging'
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job="grokdevops"}'
        - 'up'
    static_configs:
      - targets:
          - staging-prometheus:9090
        labels:
          cluster: staging

  - job_name: 'federate-prod'
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job="grokdevops"}'
        - 'up'
    static_configs:
      - targets:
          - prod-prometheus:9090
        labels:
          cluster: prod

Alternative: federation suits a small, selected set of series. For scalable multi-cluster metrics, use Thanos or Cortex.
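
Once the central Prometheus is scraping, the cluster label set in the scrape config can be used to confirm that each remote cluster's series arrive. A minimal sketch using the Prometheus HTTP API. The central server URL is a hypothetical default and the function names are illustrative.

```shell
# Ask the central Prometheus which federated targets are up for one cluster.
# PROM_URL is a hypothetical default; point it at your central Prometheus.
PROM_URL="${PROM_URL:-http://central-prometheus:9090}"

# Build the PromQL selector for a given cluster label value.
cluster_up_query() {
  printf 'up{cluster="%s"}' "$1"
}

# Run an instant query through the Prometheus HTTP API.
query_cluster_up() {
  curl -sG "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=$(cluster_up_query "$1")"
}

# Usage: query_cluster_up staging
```

An empty result for a cluster usually means its federate job is failing or its match[] selectors match nothing.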


Exercise 5: kubeconfig Multi-Cluster Management [E]

Manage multiple cluster contexts efficiently.

# List all contexts
kubectl config get-contexts

# Switch context
kubectl config use-context production

# Run command against specific context
kubectl --context=staging get pods -n grokdevops

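# Merge kubeconfigs (back up first; the flattened file embeds credentials)
cp ~/.kube/config ~/.kube/config.bak
KUBECONFIG=~/.kube/config:~/.kube/staging-config kubectl config view --flatten > ~/.kube/merged
mv ~/.kube/merged ~/.kube/config
chmod 600 ~/.kube/config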

# Use kubectx for fast switching
kubectx            # List contexts
kubectx staging    # Switch to staging
kubectx -          # Switch to previous context

Exercise 6: Failover Drill [H]

Practice failing over from primary to secondary cluster.

# 1. Verify both clusters are healthy
kubectl --context=primary get nodes
kubectl --context=secondary get nodes

# 2. Verify app is running on both
kubectl --context=primary get pods -n grokdevops
kubectl --context=secondary get pods -n grokdevops

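# 3. Simulate primary failure (scale to 0)
# Note: if Argo CD manages this Deployment with selfHeal enabled, it may revert
# the change; disable automated sync for the duration of the drill.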
kubectl --context=primary scale deployment grokdevops -n grokdevops --replicas=0

# 4. Update DNS to point to secondary
# (or verify global LB health check switches)

# 5. Verify secondary is serving traffic
curl -v https://app.example.com/health

# 6. Restore primary (use the replica count it had before the drill; 3 is assumed here)
kubectl --context=primary scale deployment grokdevops -n grokdevops --replicas=3

# 7. Verify primary is healthy before switching DNS back
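
Steps 5 and 7 both come down to waiting until a health endpoint answers. A minimal sketch of such a polling loop with curl. The function name, attempt count, and delay are illustrative defaults.

```shell
# Poll a health endpoint until it returns HTTP 200 or the attempts run out.
# The function name, attempt count, and delay are illustrative defaults.
wait_for_healthy() (
  url="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      exit 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts" >&2
  exit 1
)

# Usage: wait_for_healthy https://app.example.com/health 30 5
```

Timing how long this takes after step 3 gives a rough measure of the failover time the drill is meant to expose.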

Exercise 7: Cluster Comparison [E]

Compare configurations across clusters to detect drift.

# Compare deployments
diff <(kubectl --context=prod get deploy -n grokdevops -o yaml) \
     <(kubectl --context=staging get deploy -n grokdevops -o yaml)

# Compare Helm releases
diff <(helm --kube-context=prod list -n grokdevops -o json) \
     <(helm --kube-context=staging list -n grokdevops -o json)

# Compare versions
for ctx in prod staging dev; do
  echo "=== $ctx ==="
  kubectl --context="$ctx" get deploy grokdevops -n grokdevops \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
  echo
done
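
The version loop above prints images for a human to compare. A minimal sketch that turns the same query into a pass/fail check, so it can run in a script or CI job. The function name is illustrative and the contexts are passed as arguments.

```shell
# Report whether all contexts run the same image for a Deployment.
# Exits 0 when every context matches the first one, 1 on any mismatch.
# The function name is illustrative; contexts are passed as arguments.
check_image_drift() (
  deploy="$1"
  ns="$2"
  shift 2
  first=""
  drift=0
  for ctx in "$@"; do
    image="$(kubectl --context="$ctx" get deploy "$deploy" -n "$ns" \
      -o jsonpath='{.spec.template.spec.containers[0].image}')"
    echo "$ctx: $image"
    [ -n "$first" ] || first="$image"
    [ "$image" = "$first" ] || drift=1
  done
  exit "$drift"
)

# Usage: check_image_drift grokdevops grokdevops prod staging dev
```

Comparing a single field avoids the noise a full YAML diff picks up from per-cluster metadata such as uid, resourceVersion, and status.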

Multi-Cluster Pitfalls

  1. Config drift — Use GitOps to keep all clusters in sync
  2. Certificate mismatch — Each cluster has its own CA. Cross-cluster mTLS needs shared trust
  3. Network complexity — Cross-cluster traffic may need VPN, peering, or Submariner
  4. Data consistency — Databases can't easily span clusters. Use replication or managed services
  5. Monitoring gaps — Each cluster's Prometheus is independent. Federation or Thanos needed
  6. Context confusion — Always verify which context you're using before running commands
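
For pitfall 6, the context check can be built into the command itself. A minimal sketch of a wrapper that refuses to run unless the current context is the one you name. The function name `kubectl_in` is illustrative and it wraps kubectl rather than replacing it.

```shell
# Refuse to run a kubectl command unless the current context is the expected one.
# The function name is illustrative; it wraps kubectl rather than replacing it.
kubectl_in() (
  expected="$1"
  shift
  current="$(kubectl config current-context)"
  if [ "$current" != "$expected" ]; then
    echo "refusing: current context is '$current', expected '$expected'" >&2
    exit 1
  fi
  kubectl "$@"
)

# Usage: kubectl_in staging get pods -n grokdevops
```

Passing --context explicitly on every command, as in the exercises above, achieves the same goal without depending on the current context at all.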