
GitOps: The Repo Is the Truth


Topics: GitOps principles, ArgoCD architecture, CI/CD pipeline evolution, Kubernetes reconciliation, Kustomize/Helm, drift detection, RBAC, multi-cluster management
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (Git and Kubernetes concepts explained as we go)


The Mission

Your team has been deploying to Kubernetes the old-fashioned way: a GitHub Actions workflow runs kubectl apply at the end of CI. It works — until it doesn't. Last Tuesday, someone ran kubectl scale deploy/payments --replicas=5 during a traffic spike. The CI pipeline ran two hours later and silently scaled it back to 3. The payments service buckled. Nobody noticed for 40 minutes because the deploy "succeeded."

Your job: migrate from push-based CI deploys to ArgoCD-driven GitOps. By the end of this lesson, you'll understand why the old model breaks, how the new model works at every layer, and how to build it so drift can never sneak past you again.


Part 1: The Archaeology — How We Got Here

Before we touch ArgoCD, let's trace the history. The migration you're doing isn't arbitrary — it's the result of a decade of deployment evolution, and each step solved the previous step's problem.

The Timeline

| Era | How deploys worked | What broke |
|---|---|---|
| ~2010 | SSH into server, git pull, restart | "It works on my machine." No rollback. |
| ~2013 | Capistrano/Fabric scripts | Scripts diverge across teams. Credentials everywhere. |
| ~2015 | Jenkins pipeline → kubectl apply | CI has cluster creds. Drift accumulates silently. |
| ~2017 | GitOps: controller inside the cluster pulls from Git | You're reading this lesson. |

Name Origin: The term "GitOps" was coined by Alexis Richardson, CEO of Weaveworks, in a 2017 blog post titled "GitOps — Operations by Pull Request." The name stuck because it captured the core idea in two syllables: Git is your operations source of truth. Weaveworks built Flux, the first GitOps controller, before the term even existed — the tool came first, then they named the pattern.

Trivia: Weaveworks, the company that invented GitOps and built Flux, shut down in February 2024 after failing to secure funding. The movement they started continued to thrive through ArgoCD and the CNCF-hosted Flux project. The company died; the idea didn't.

Push vs Pull: The Architectural Shift

Here's the old model — what you're migrating away from:

Push model (traditional CI/CD):
  Developer → git push → CI pipeline → builds image → kubectl apply → Cluster
                                              CI needs cluster credentials
                                              No drift detection
                                              No self-healing

And the new model:

Pull model (GitOps):
  Developer → git push → CI pipeline → builds image → updates manifest repo
                                              ArgoCD (inside cluster) polls repo
                                              Compares desired vs live state
                                              Applies diff to cluster

Three things changed:

  1. Credentials flipped. In push, CI needs cluster credentials — every CI runner is an attack surface. In pull, the controller lives inside the cluster. Credentials never leave.
  2. Drift detection appeared. Push can't see manual changes. Pull continuously compares desired state (Git) to actual state (cluster) and flags — or fixes — the gap.
  3. Git became the audit log. Every deploy is a commit. Every rollback is a revert. git log is your change history.

Mental Model: Think of push-based CI as a mail carrier who drops off a package and leaves. If someone moves the package, the carrier doesn't know and doesn't care. GitOps is a security guard who checks the room every 3 minutes and puts everything back where it belongs.


Flashcard Check #1

| Question | Answer (cover and test yourself) |
|---|---|
| Who coined the term "GitOps" and when? | Alexis Richardson, CEO of Weaveworks, in 2017. |
| In push-based CI/CD, who holds cluster credentials? | The CI runner (Jenkins, GitHub Actions, etc.). |
| In pull-based GitOps, who holds cluster credentials? | The controller running inside the cluster (ArgoCD). |
| What are the four GitOps principles? | Declarative, Versioned and immutable, Pulled automatically, Continuously reconciled. |

Remember: The mnemonic for the four principles is DVPC: Declarative, Versioned, Pulled, Continuously reconciled. If any one is missing, it's not GitOps. CI pushing kubectl apply is declarative and versioned, but not pulled or continuously reconciled.


Part 2: ArgoCD Architecture — What's Actually Running

Time to look under the hood. When you install ArgoCD, five core components land in the argocd namespace (newer releases add a couple more, such as the ApplicationSet controller). Each has a specific job:

┌───────────────────────────────────────────────────────────────┐
│                       argocd namespace                        │
│                                                               │
│  ┌──────────────────┐    ┌─────────────────────────────────┐  │
│  │ argocd-server    │    │ argocd-application-controller   │  │
│  │ (API + Web UI)   │    │ (the brain — diffs & syncs)     │  │
│  └────────┬─────────┘    └───────────────┬─────────────────┘  │
│           │                              │                    │
│  ┌────────┴─────────┐    ┌───────────────┴─────────────────┐  │
│  │ argocd-dex-server│    │ argocd-repo-server              │  │
│  │ (SSO/OIDC)       │    │ (clones repos, renders          │  │
│  │                  │    │  Helm/Kustomize/YAML)           │  │
│  └──────────────────┘    └───────────────┬─────────────────┘  │
│                                          │                    │
│                            ┌─────────────┴──────────────┐     │
│                            │ Redis (caching layer)      │     │
│                            └────────────────────────────┘     │
└───────────────────────────────────────────────────────────────┘

| Component | What it does | What breaks if it dies |
|---|---|---|
| argocd-server | Serves the web UI and API. Handles auth, RBAC. | No UI, no CLI access. Apps keep running. |
| argocd-application-controller | The reconciliation brain. Diffs desired (Git) vs live (cluster). | No syncs, no drift detection. Existing apps still run. |
| argocd-repo-server | Clones Git repos, renders Helm charts/Kustomize overlays into plain YAML. | Can't render new manifests. Syncs stall. |
| argocd-dex-server | SSO/OIDC authentication provider. | Can't log in via SSO. Admin password still works. |
| Redis | Caches repo state, app state, RBAC data. | Slower performance. Controller re-fetches everything. |

Under the Hood: The application controller uses the same watch-and-reconcile loop as every other Kubernetes controller. It's the same pattern the kube-controller-manager uses to manage Deployments, the same pattern kubelet uses to manage pods. ArgoCD didn't invent this — it leveraged a pattern that Kubernetes was already built on. If you've read the what-happens-when-you-kubectl-apply lesson, this is the same control loop operating one layer up.

Name Origin: "Argo" comes from the ship Argo in Greek mythology — the vessel that carried Jason and the Argonauts on their quest. The Argo project family (ArgoCD, Argo Workflows, Argo Rollouts, Argo Events) was created at Applatix, later acquired by Intuit (the TurboTax company). ArgoCD was open-sourced in 2018 and became a CNCF graduated project in 2022.

Trivia: One of the original motivations for creating ArgoCD at Intuit was that Flux (the existing GitOps tool) lacked a graphical interface. Intuit engineers wanted a visual resource tree showing sync status. The ArgoCD UI became one of its most distinguishing features and a major driver of adoption.


Part 3: The Application Resource — Your First ArgoCD Manifest

An ArgoCD Application is a CRD (Custom Resource Definition) that answers three questions:

  1. Where is the desired state? (Git repo, branch, path)
  2. Where should it be deployed? (cluster, namespace)
  3. How should syncing behave? (automatic vs manual, prune, self-heal)

Here's the Application manifest for your payments service migration:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io   # cascade-delete on app deletion
spec:
  project: default
  source:
    repoURL: https://github.com/acme-corp/gitops-manifests.git
    targetRevision: main
    path: apps/payments/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true          # delete resources removed from Git
      selfHeal: true       # revert manual kubectl changes
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Let's break this down field by field:

| Field | What it controls | What happens if you get it wrong |
|---|---|---|
| targetRevision: main | Which branch/tag to track | HEAD in prod means every merge deploys immediately |
| path | Directory in the repo containing manifests | Wrong path = ArgoCD syncs nothing or the wrong app |
| prune: true | Delete cluster resources removed from Git | Rename a directory = ArgoCD deletes everything in it |
| selfHeal: true | Revert manual kubectl changes | Someone's emergency hotfix gets silently reverted |
| finalizers | Delete managed resources when the Application is deleted | Without it, deleting the app orphans resources in the cluster |

Gotcha: prune: true is the most dangerous setting in ArgoCD. A team renamed a Helm chart directory in their GitOps repo. ArgoCD treated every resource from the old path as orphaned and pruned them all — including a PostgreSQL StatefulSet with 500GB of data. The PVCs were deleted. Recovery required restoring from a 6-hour-old backup. Protect critical resources with argocd.argoproj.io/sync-options: Prune=false.


Part 4: The Reconciliation Loop — How ArgoCD Actually Works

This is the core of GitOps and the part most people hand-wave past. Let's trace exactly what happens every 3 minutes (the default polling interval):

Every 3 minutes:
  1. Application controller checks: "which apps need sync?"
  2. Repo server clones the Git repo (or uses cached version)
  3. Repo server renders manifests:
     - Plain YAML? Pass through.
     - Helm chart? helm template with values files
     - Kustomize? kustomize build on the overlay path
  4. Controller compares rendered manifests to live cluster state
  5. Comparison result:
     ├─ Synced: desired == live. Do nothing.
     ├─ OutOfSync: desired != live.
     │   ├─ If automated sync: apply the diff.
     │   └─ If manual sync: flag it, wait for human.
     └─ Unknown: can't reach cluster or repo. Alert.

The comparison isn't a naive YAML diff. ArgoCD normalizes both sides — stripping server-generated fields like creationTimestamp, resourceVersion, and kubectl.kubernetes.io/last-applied-configuration. Getting this normalization right was one of ArgoCD's hardest engineering challenges, and edge cases in drift detection remain the most common source of bug reports.
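That normalize-then-diff step is easy to model. A toy Python sketch (illustrative only; the field names are a small subset, and the real comparison recurses through nested objects):

```python
# Toy model of ArgoCD's compare step: strip server-generated fields,
# then diff desired (Git) against live (cluster). Shallow for brevity.
SERVER_FIELDS = {"creationTimestamp", "resourceVersion", "uid"}

def normalize(manifest: dict) -> dict:
    """Drop fields the API server injects so they don't cause false diffs."""
    return {k: v for k, v in manifest.items() if k not in SERVER_FIELDS}

def compare(desired: dict, live: dict) -> str:
    return "Synced" if normalize(desired) == normalize(live) else "OutOfSync"

desired = {"kind": "Deployment", "replicas": 3}
live = {"kind": "Deployment", "replicas": 5,
        "resourceVersion": "812", "creationTimestamp": "2024-01-01"}

print(compare(desired, live))   # OutOfSync: replicas genuinely differ
live["replicas"] = 3
print(compare(desired, live))   # Synced: only server-generated fields differ
```

With automated sync enabled, an OutOfSync result is what triggers applying the Git side to the cluster.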

Under the Hood: The 3-minute polling interval is a deliberate design choice, not a limitation. ArgoCD polls Git rather than relying on webhooks for simplicity and reliability — webhooks can fail silently, be blocked by firewalls, or get rate-limited. You can configure webhooks for near-instant sync, but the polling ensures ArgoCD catches changes even when webhooks fail. Configure the interval in the argocd-cm ConfigMap: timeout.reconciliation: 180s.
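A minimal sketch of that setting in the argocd-cm ConfigMap (the value shown is the default):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  timeout.reconciliation: 180s   # default; lower it for faster drift detection
```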

The Diff in Action

Let's see what happens when someone runs kubectl scale directly:

# Someone scales the payments service manually during a traffic spike
kubectl -n payments scale deploy/payments --replicas=5

# ArgoCD sees the drift within 3 minutes
argocd app diff payments-service

Output:

===== apps/Deployment payments/payments =====
--- desired (Git)
+++ live (cluster)
@@ -1,5 +1,5 @@
 spec:
-  replicas: 3
+  replicas: 5

With selfHeal: true, ArgoCD reverts this within one reconciliation cycle. The payments service goes back to 3 replicas. If that's not what you want, you need to update Git — not the cluster.

War Story: This is exactly the incident from your mission. A team had selfHeal: true enabled. During Black Friday, an SRE scaled a service from 3 to 8 replicas. ArgoCD scaled it back to 3 within minutes. The SRE scaled it up again. ArgoCD scaled it back. This loop repeated four times before someone realized the "random scaling failures" were ArgoCD doing its job. The fix: commit the scale change to Git, or use an HPA (Horizontal Pod Autoscaler) and tell ArgoCD to ignore the replicas field.

# Tell ArgoCD to ignore HPA-managed fields
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas

Flashcard Check #2

| Question | Answer |
|---|---|
| How often does ArgoCD poll Git by default? | Every 3 minutes (180 seconds). |
| What does selfHeal: true do? | Reverts any manual changes to cluster state that diverge from Git. |
| What is the ArgoCD sync status when desired state equals live state? | Synced. |
| Why does ArgoCD normalize manifests before comparing? | To strip server-generated fields (creationTimestamp, resourceVersion, etc.) that would cause false diffs. |
| How do you prevent ArgoCD from fighting with an HPA? | Use ignoreDifferences with a jsonPointer to /spec/replicas. |

Part 5: Imperative vs Declarative — The Real Comparison

Let's make the old-vs-new contrast concrete with your payments service migration.

The Old Way: Push-Based CI

# .github/workflows/deploy.yml — the pipeline you're replacing
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t ghcr.io/acme-corp/payments:${{ github.sha }} .
      - name: Push image
        run: docker push ghcr.io/acme-corp/payments:${{ github.sha }}
      - name: Deploy to cluster
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          kubectl --kubeconfig=/tmp/kubeconfig \
            set image deploy/payments \
            payments=ghcr.io/acme-corp/payments:${{ github.sha }} \
            -n payments

Problems with this approach:

| Problem | Why it hurts |
|---|---|
| CI runner has KUBECONFIG secret | Compromised runner = cluster access |
| No drift detection | kubectl edit changes persist until next CI run |
| Rollback means re-running an old pipeline | Slow, error-prone, assumes old image still exists |
| No audit trail beyond CI logs | "Who deployed what when?" requires digging through pipeline logs |
| No health verification | Pipeline exits 0 as soon as kubectl set image returns |

The New Way: GitOps with ArgoCD

# .github/workflows/ci.yml — CI still builds, but does NOT deploy
name: CI
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t ghcr.io/acme-corp/payments:${{ github.sha }} .
      - name: Push image
        run: docker push ghcr.io/acme-corp/payments:${{ github.sha }}
      - name: Update GitOps repo
        run: |
          git clone https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/acme-corp/gitops-manifests.git
          cd gitops-manifests
          cd apps/payments/overlays/production
          kustomize edit set image ghcr.io/acme-corp/payments=ghcr.io/acme-corp/payments:${{ github.sha }}
          git add .
          git commit -m "deploy: payments ${{ github.sha }}"
          git push

CI pushes the intent to a Git repo. ArgoCD picks it up and applies it. The pipeline never touches the cluster.

Interview Bridge: "What is the difference between GitOps and CI/CD?" is an increasingly common interview question. The key: in CI/CD, the pipeline pushes changes to the cluster (push model). In GitOps, an agent in the cluster pulls desired state from Git and continuously reconciles (pull model). CI never needs cluster credentials, and git log becomes the audit trail.


Part 6: Sync Policies, Waves, and Health Checks

Sync Waves — Ordering Your Deploy

When ArgoCD syncs, it doesn't apply everything at once. Sync waves let you control ordering:

Wave -2: Namespace, RBAC, ServiceAccount
Wave -1: ConfigMaps, Secrets
Wave  0: Database migration Job (PreSync hook)
Wave  1: Application Deployment
Wave  2: Ingress, HPA, PodDisruptionBudget

Annotate resources to assign them to waves:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"    # applies before wave 0

ArgoCD waits for all resources in wave N to be healthy before starting wave N+1. This is where health checks become critical.
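The wave mechanics reduce to sort-then-batch. A toy Python sketch (the resource names are made up; real ArgoCD reads the sync-wave annotation and runs health checks between batches):

```python
from itertools import groupby

# Toy model of sync-wave ordering: batch resources by wave, ascending.
resources = [
    {"name": "payments-deploy", "wave": 1},
    {"name": "payments-ns", "wave": -2},
    {"name": "db-migrate-job", "wave": 0},
    {"name": "payments-ingress", "wave": 2},
    {"name": "app-config", "wave": -1},
]

def sync_order(resources):
    ordered = sorted(resources, key=lambda r: r["wave"])
    # Each wave applies together; the next wave waits for health.
    return [[r["name"] for r in batch]
            for _, batch in groupby(ordered, key=lambda r: r["wave"])]

for wave in sync_order(resources):
    print(wave)   # ['payments-ns'] first, ['payments-ingress'] last
```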

Health Checks — What "Healthy" Means

ArgoCD has built-in health checks for common resources:

| Resource | Healthy when |
|---|---|
| Deployment | availableReplicas == desiredReplicas |
| StatefulSet | readyReplicas == replicas |
| Job | succeeded >= 1 |
| PVC | phase == Bound |
| Ingress | status.loadBalancer.ingress has at least one entry |

Gotcha: If your Deployment has no readiness probe, ArgoCD considers it healthy as soon as the pod is Running — even if the app hasn't finished initializing. A wave-1 app can start connecting to a wave-0 database that isn't ready yet. Always define readiness probes on ArgoCD-managed Deployments.
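A minimal readiness probe sketch for an ArgoCD-managed Deployment (the /healthz path and port 8080 are placeholders; use your app's real health endpoint):

```yaml
containers:
  - name: payments
    image: ghcr.io/acme-corp/payments:v2.1.0
    readinessProbe:
      httpGet:
        path: /healthz    # placeholder endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```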

Sync Hooks — Running Jobs at Deploy Time

Database migrations are the classic use case. Run them before the app deploys:

apiVersion: batch/v1
kind: Job
metadata:
  name: payments-db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: "-1"
spec:
  activeDeadlineSeconds: 300
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ghcr.io/acme-corp/payments:v2.1.0
          command: ["python", "manage.py", "migrate"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: payments-db-credentials
                  key: url

| Hook phase | When it runs | Use case |
|---|---|---|
| PreSync | Before main sync | DB migrations, schema validation |
| Sync | During main sync (with resources) | Rare — most things are just resources |
| PostSync | After all resources are healthy | Smoke tests, notifications |
| SyncFail | When sync fails | Cleanup, alerting |

Gotcha: Migrations must be idempotent. If a sync fails and retries, the PreSync hook runs again. A migration that tries to CREATE TABLE without IF NOT EXISTS will fail on retry and block all future deploys. Use a migration framework (Alembic, Flyway, Liquibase) that tracks which migrations have already run.
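The tracking trick those frameworks use fits in a few lines: record each applied migration ID, skip anything already recorded. A minimal Python sketch against SQLite (real frameworks add locking, ordering guarantees, and checksums):

```python
import sqlite3

# Minimal migration tracker: re-running is a no-op, so a retried
# PreSync hook cannot fail on already-applied migrations.
MIGRATIONS = [
    ("001_create_payments", "CREATE TABLE payments (id INTEGER PRIMARY KEY)"),
    ("002_add_amount", "ALTER TABLE payments ADD COLUMN amount INTEGER"),
]

def migrate(db: sqlite3.Connection) -> list[str]:
    db.execute("CREATE TABLE IF NOT EXISTS schema_migrations (id TEXT PRIMARY KEY)")
    applied = {row[0] for row in db.execute("SELECT id FROM schema_migrations")}
    ran = []
    for mig_id, sql in MIGRATIONS:
        if mig_id in applied:
            continue              # idempotent: skip what already ran
        db.execute(sql)
        db.execute("INSERT INTO schema_migrations VALUES (?)", (mig_id,))
        ran.append(mig_id)
    db.commit()
    return ran

db = sqlite3.connect(":memory:")
print(migrate(db))   # ['001_create_payments', '002_add_amount']
print(migrate(db))   # [] — safe to retry
```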


Part 7: App of Apps — Bootstrapping an Entire Cluster

One Application per service is manageable. Thirty Applications across three clusters is not. The App of Apps pattern solves this: one root Application points to a directory of Application manifests.

gitops-manifests/
├── root-app.yaml              ← you apply this once, manually
├── apps/
│   ├── payments.yaml          ← Application for payments service
│   ├── users.yaml             ← Application for users service
│   ├── monitoring.yaml        ← Application for Prometheus stack
│   ├── ingress-nginx.yaml     ← Application for ingress controller
│   └── cert-manager.yaml      ← Application for TLS certificates
└── apps/payments/
    ├── base/
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   └── kustomization.yaml
    └── overlays/
        ├── dev/
        ├── staging/
        └── production/

The root app:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/acme-corp/gitops-manifests.git
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Bootstrap:

# One command to rule them all
kubectl apply -f root-app.yaml
# ArgoCD syncs root-app → creates child Applications → each child syncs its workloads

Trivia: The App of Apps pattern was discovered, not designed. Users noticed that an ArgoCD Application can manage any Kubernetes resource — including other Application CRDs. The community started using this to bootstrap entire clusters from a single commit. The Argo team later formalized it in the documentation.

ApplicationSet — When App of Apps Isn't Enough

For multi-cluster deployments, ApplicationSet generates Applications from templates:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments-all-clusters
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: prod-us-east
            url: https://prod-us-east.example.com
            env: prod
          - cluster: prod-eu-west
            url: https://prod-eu-west.example.com
            env: prod
          - cluster: staging
            url: https://staging.example.com
            env: staging
  template:
    metadata:
      name: "payments-{{cluster}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/acme-corp/gitops-manifests.git
        targetRevision: main
        path: "apps/payments/overlays/{{env}}"
      destination:
        server: "{{url}}"
        namespace: payments
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

One manifest, three clusters. Add a new cluster by adding an element to the list. Remove one and its Application — and all its resources — get pruned.
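Under the hood, the generator/template split is plain string substitution: each generator element fills the {{...}} placeholders to yield one Application. A toy Python sketch of that expansion (simplified to a few fields):

```python
# Toy model of ApplicationSet expansion: one Application per generator element.
elements = [
    {"cluster": "prod-us-east", "url": "https://prod-us-east.example.com", "env": "prod"},
    {"cluster": "staging", "url": "https://staging.example.com", "env": "staging"},
]

template = {
    "name": "payments-{{cluster}}",
    "path": "apps/payments/overlays/{{env}}",
    "server": "{{url}}",
}

def render(template: dict, element: dict) -> dict:
    out = {}
    for key, value in template.items():
        for param, sub in element.items():
            value = value.replace("{{" + param + "}}", sub)
        out[key] = value
    return out

apps = [render(template, e) for e in elements]
print(apps[0]["name"])   # payments-prod-us-east
```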


Part 8: Kustomize and Helm Integration

ArgoCD doesn't care how you write manifests. It supports both major templating tools.

Kustomize — Overlays for Environments

Your repo structure for Kustomize:

apps/payments/
├── base/
│   ├── deployment.yaml       # replicas: 2, image: ghcr.io/acme-corp/payments:latest
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml    # replicas: 1, dev resources
    ├── staging/
    │   └── kustomization.yaml    # replicas: 2, staging DB
    └── production/
        ├── kustomization.yaml    # replicas: 5, prod DB, resource limits
        └── patches/
            └── replicas.yaml

ArgoCD source config for Kustomize:

source:
  repoURL: https://github.com/acme-corp/gitops-manifests.git
  targetRevision: main
  path: apps/payments/overlays/production

ArgoCD automatically detects the kustomization.yaml and runs kustomize build.

Helm — Charts with Values Files

source:
  repoURL: https://charts.bitnami.com/bitnami
  chart: postgresql
  targetRevision: 13.2.0
  helm:
    valueFiles:
      - values-prod.yaml
    parameters:
      - name: auth.postgresPassword
        value: "$POSTGRES_PASSWORD"     # use External Secrets instead

Gotcha: When ArgoCD manages a Helm chart, helm ls won't show it. ArgoCD renders the chart via helm template and manages the raw manifests — it doesn't create a Helm release. Running helm upgrade manually alongside ArgoCD creates a fight where both try to manage the same resources. One owner per release, always.


Part 9: RBAC in ArgoCD — Who Can Deploy What

ArgoCD uses Casbin policies (stored in the argocd-rbac-cm ConfigMap) to control access. This is separate from Kubernetes RBAC — ArgoCD has its own permission model.

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Payments team: can view and sync their apps
    p, role:payments-team, applications, get, payments/*, allow
    p, role:payments-team, applications, sync, payments/*, allow

    # Platform team: can sync anything, manage clusters
    p, role:platform-admin, applications, *, */*, allow
    p, role:platform-admin, clusters, *, *, allow
    p, role:platform-admin, repositories, *, *, allow

    # Bind SSO groups to roles
    g, acme-corp:payments-engineers, role:payments-team
    g, acme-corp:platform-team, role:platform-admin
  scopes: "[groups]"

The format is Casbin policy syntax: p, SUBJECT, RESOURCE, ACTION, OBJECT, EFFECT.
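The OBJECT column uses glob patterns such as payments/*. A toy Python sketch of how a policy row is matched (real Casbin supports richer matchers; this handles only * globs):

```python
from fnmatch import fnmatch

# Toy policy check: (role, resource, action-pattern, object-pattern) rows.
POLICY = [
    ("role:payments-team", "applications", "get", "payments/*"),
    ("role:payments-team", "applications", "sync", "payments/*"),
    ("role:platform-admin", "applications", "*", "*/*"),
]

def allowed(role: str, resource: str, action: str, obj: str) -> bool:
    return any(
        r == role and res == resource
        and fnmatch(action, act) and fnmatch(obj, pattern)
        for r, res, act, pattern in POLICY
    )

print(allowed("role:payments-team", "applications", "sync", "payments/payments-service"))   # True
print(allowed("role:payments-team", "applications", "delete", "payments/payments-service")) # False
```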

AppProject adds another layer — it restricts which repos, clusters, and namespaces an Application can reference:

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/acme-corp/gitops-manifests
    - https://github.com/acme-corp/shared-charts
  destinations:
    - namespace: payments-*
      server: https://kubernetes.default.svc
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace

Mental Model: Think of RBAC as who can do what, and AppProject as what apps are allowed to touch. RBAC says "the payments team can sync apps." AppProject says "apps in the payments project can only deploy to the payments-* namespace." Both constraints must pass.


Flashcard Check #3

Question Answer
What is the App of Apps pattern? A root Application that manages a directory of child Application manifests, bootstrapping an entire cluster from one commit.
What does argocd.argoproj.io/sync-wave: "-1" mean? This resource syncs before wave 0 resources.
Why can't you use helm ls with ArgoCD-managed charts? ArgoCD renders charts via helm template and manages raw manifests — no Helm release metadata is created.
What is an AppProject? A CRD that restricts which repos, clusters, and namespaces a set of Applications can access — the multi-tenancy boundary.
What does hook-delete-policy: BeforeHookCreation do? Deletes the old hook resource before creating the new one on the next sync, preventing "already exists" errors.

Part 10: The GitOps Workflow vs Traditional CI/CD

Let's walk through the complete lifecycle side by side.

Scenario: Deploy a New Feature

Traditional CI/CD:

1. Developer merges PR to app repo
2. CI builds image → ghcr.io/acme-corp/payments:abc123
3. CI pushes image to registry
4. CI runs: kubectl set image deploy/payments payments=...abc123
5. kubectl returns 0 (image updated, not verified healthy)
6. Developer assumes it worked

GitOps:

1. Developer merges PR to app repo
2. CI builds image → ghcr.io/acme-corp/payments:abc123
3. CI pushes image to registry
4. CI commits new image tag to gitops-manifests repo
5. ArgoCD detects change within 3 minutes (or instantly via webhook)
6. ArgoCD renders manifests, diffs against live state
7. ArgoCD applies diff, monitors health checks
8. ArgoCD marks app as Synced + Healthy (or Degraded if probes fail)
9. Team sees status in ArgoCD UI and Slack notification

Scenario: Rollback

Traditional CI/CD:

1. Find the last good commit SHA
2. Re-run the CI pipeline for that SHA
3. Hope the old image still exists in the registry
4. Wait for CI to finish (build + test + deploy again)

GitOps:

# Option 1: Git revert
cd gitops-manifests
git revert HEAD
git push

# Option 2: ArgoCD CLI
argocd app rollback payments-service 3    # rollback to revision 3

# Option 3: ArgoCD UI → click "History" → click "Rollback"

Rollback in GitOps is a git revert — seconds, not minutes.


Part 11: Multi-Cluster Management

Your company has three clusters: dev, staging, prod. Here's how ArgoCD manages all three from a single installation.

Register Clusters

# ArgoCD runs in the management cluster
# Register external clusters
argocd cluster add prod-us-east --name prod-us-east
argocd cluster add staging --name staging

# Verify
argocd cluster list

ArgoCD stores cluster credentials as Secrets in the argocd namespace. The in-cluster destination (the cluster ArgoCD itself runs in) is always available as https://kubernetes.default.svc.

Hub-Spoke vs Instance-per-Cluster

| Model | How it works | Good for | Risk |
|---|---|---|---|
| Hub-spoke | One ArgoCD manages all clusters | Centralized visibility, single RBAC | ArgoCD is a single point of failure |
| Instance-per-cluster | Each cluster runs its own ArgoCD | Isolation, blast radius containment | More operational overhead |

Most organizations start hub-spoke and split when they hit scale limits or compliance boundaries.


Part 12: Secrets — The Hard Part

GitOps says "everything in Git." Secrets say "not me."

Every GitOps team hits this tension. The solutions:

| Approach | How it works | Tradeoffs |
|---|---|---|
| Sealed Secrets | Encrypt secrets with a cluster-side key; commit ciphertext to Git | Simple. Key rotation is manual. Secrets are cluster-specific. |
| SOPS + age | Encrypt values in YAML files; decrypt at apply time | Works with any Git workflow. Requires key management. |
| External Secrets Operator | CRD references a secret in Vault/AWS SM; controller fetches it | Secrets never touch Git. Adds a dependency on the external store. |
| Vault Agent Injector | Vault sidecar injects secrets into pod at runtime | Pod-level injection. Tightest integration with Vault. |

Gotcha: Even with External Secrets Operator, ArgoCD's diff view may show the resulting Secret as "OutOfSync" because the live Secret (populated by ESO) differs from the ExternalSecret CRD that ArgoCD manages. Use ignoreDifferences on Secret resources managed by ESO.
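A sketch of that exception on the Application spec (assumes ESO writes the fetched values into the Secret's data field):

```yaml
spec:
  ignoreDifferences:
    - kind: Secret
      jsonPointers:
        - /data    # populated by External Secrets Operator at runtime, not from Git
```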


Exercises

Exercise 1: Read the Diff (2 minutes)

ArgoCD shows this diff for your payments service:

--- desired (Git)
+++ live (cluster)
@@ -4,7 +4,7 @@
 spec:
   replicas: 3
   template:
     spec:
       containers:
       - name: payments
-        image: ghcr.io/acme-corp/payments:v2.1.0
+        image: ghcr.io/acme-corp/payments:v2.0.9

Questions:
  1. Is the cluster ahead of or behind Git?
  2. What likely happened?
  3. Should you sync to Git, or update Git to match the cluster?

Answer:
  1. The cluster is *behind* Git — running an older image (v2.0.9 vs v2.1.0).
  2. A sync likely failed partway through, or someone manually rolled back the image.
  3. Check whether v2.1.0 was intentionally deployed and whether it caused issues. If v2.1.0 is the desired version, sync. If v2.0.9 was a deliberate rollback, update Git to v2.0.9 and investigate why v2.1.0 failed.

Exercise 2: Write an Application Manifest (10 minutes)

Create an ArgoCD Application for a service called user-api with these requirements:
  - Git repo: https://github.com/acme-corp/gitops-manifests.git
  - Path: apps/user-api/overlays/staging
  - Branch: main
  - Namespace: user-api
  - Automatic sync with self-heal but without prune (staging, not prod)
  - Auto-create the namespace

Solution
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-api-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/acme-corp/gitops-manifests.git
    targetRevision: main
    path: apps/user-api/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: user-api
  syncPolicy:
    automated:
      selfHeal: true
      prune: false
    syncOptions:
      - CreateNamespace=true

Exercise 3: Design a Sync Wave Strategy (15 minutes)

You're deploying a full-stack app with:
  - A PostgreSQL StatefulSet
  - A Redis Deployment
  - A database migration Job
  - The application Deployment
  - An Ingress
  - An HPA

Design the sync wave ordering. Which hook type does the migration need? What happens if you put everything in wave 0?

Solution
Wave -3: Namespace, ServiceAccount, RBAC
Wave -2: ConfigMaps, Secrets (ExternalSecret CRDs)
Wave -1: PostgreSQL StatefulSet, Redis Deployment
Wave  0: Database migration (PreSync hook, not a wave — it runs before any sync)
Wave  1: Application Deployment
Wave  2: Ingress, HPA, PodDisruptionBudget
The migration should be a `PreSync` hook with `hook-delete-policy: BeforeHookCreation` and `activeDeadlineSeconds: 300`. If everything is wave 0, ArgoCD applies all resources simultaneously. The migration could run before PostgreSQL is ready. The app could start before the migration finishes. The HPA could conflict with ArgoCD on the replicas field. Order matters.

Cheat Sheet

| Task | Command / Config |
|---|---|
| Install ArgoCD | kubectl create ns argocd && kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml |
| Get admin password | argocd admin initial-password -n argocd |
| Port-forward UI | kubectl port-forward svc/argocd-server -n argocd 8080:443 |
| List all apps | argocd app list |
| Check app status | argocd app get <app> |
| View drift diff | argocd app diff <app> |
| Force sync | argocd app sync <app> |
| Force refresh (skip poll) | argocd app get <app> --refresh |
| Rollback | argocd app rollback <app> <revision> |
| Add external cluster | argocd cluster add <context-name> |
| Check controller logs | kubectl -n argocd logs deploy/argocd-application-controller -f |
| Check repo-server logs | kubectl -n argocd logs deploy/argocd-repo-server -f |

| Concept | Key Point |
|---|---|
| Four GitOps principles | Declarative, Versioned, Pulled, Continuously reconciled (DVPC) |
| Default poll interval | 3 minutes (timeout.reconciliation in argocd-cm) |
| selfHeal | Reverts manual kubectl changes to match Git |
| prune | Deletes resources removed from Git (dangerous — protect StatefulSets) |
| App of Apps | One root Application manages a directory of child Applications |
| ApplicationSet | Template-based generation of Applications for multi-cluster |
| ignoreDifferences | Tells ArgoCD to stop fighting controllers (HPA, webhooks) |
| Sync waves | Lower numbers sync first; ArgoCD waits for health between waves |

Takeaways

  • GitOps is an architecture, not a tool. The shift from push to pull changes who holds credentials, how you audit deploys, and how you recover from failures. ArgoCD and Flux are implementations — the principles are what matter.

  • The reconciliation loop is borrowed from Kubernetes itself. ArgoCD uses the same watch-and-react pattern as every Kubernetes controller. Understanding it once unlocks understanding everywhere.

  • selfHeal: true is a feature and a footgun. It prevents drift, but it also reverts your emergency hotfixes. Your team needs a documented process for how to make changes during incidents (answer: commit to Git, even in an emergency).

  • Secrets are the unsolved problem. Every approach involves tradeoffs. Pick one (Sealed Secrets, SOPS, External Secrets Operator), enforce it consistently, and don't let "just this once" creep in.

  • App of Apps turns cluster bootstrapping into a single commit. One kubectl apply of the root app and ArgoCD builds the entire platform. This is how you make clusters reproducible.

  • prune: true is the most dangerous flag in ArgoCD. Understand the blast radius. Protect StatefulSets and PVCs with Prune=false annotations. Test directory renames in staging first.