GitOps Footguns

Mistakes that break your deployment pipeline, cause drift, or make GitOps harder than manual deploys.

1. Committing secrets to the GitOps repo

Your entire cluster state is in Git. Including that Secret YAML with the base64-encoded database password. Base64 is not encryption. Anyone with repo access can decode it.

Fix: Use Sealed Secrets, SOPS, or External Secrets Operator. Never commit plain Secrets to Git. Use a secrets manager (Vault, AWS Secrets Manager) as the source of truth, and commit only encrypted values or references to it.
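As a minimal sketch of the External Secrets Operator approach, the manifest below is the kind of object that would be committed in place of a Secret. It assumes the operator is installed and that a ClusterSecretStore already points at your secrets manager; the store name, namespace, remote key path, and property are hypothetical placeholders.

```yaml
# Safe to commit: this references the secret, it does not contain it.
apiVersion: external-secrets.io/v1beta1   # assumption: use the API version your operator serves
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: my-app                # hypothetical namespace
spec:
  refreshInterval: 1h              # how often the operator re-reads the backend
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend            # hypothetical store, configured separately
  target:
    name: db-credentials           # name of the Secret the operator creates in-cluster
  data:
    - secretKey: password          # key inside the generated Secret
      remoteRef:
        key: prod/my-app/db        # hypothetical path in the secrets manager
        property: password
```

The generated Secret exists only in the cluster, so repo access alone no longer exposes the password.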

2. ArgoCD auto-sync with no rollback

You enable auto-sync on all apps. A broken manifest merges to main. ArgoCD immediately applies it. The broken config is now in production. Auto-sync means the broken state is the "desired" state: ArgoCD won't roll back, because Git says this is correct.

Fix: Use manual sync for production. Or keep auto-sync but set selfHeal: false, so that an emergency fix applied directly to the cluster is not immediately reverted while you repair Git. Protect main with PR reviews and CI validation.
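A sketch of what the second option might look like on an Application follows. The application name, repository URL, path, and namespace are hypothetical; only the syncPolicy section is the point.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-production          # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/gitops.git   # hypothetical repo
    targetRevision: main
    path: apps/my-app/overlays/production         # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: false       # do not delete resources automatically
      selfHeal: false    # do not revert manual changes made in the cluster
```

For fully manual sync, omit the automated block altogether; ArgoCD then only reports OutOfSync and waits for someone to trigger the sync.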

3. Sync wave ordering disasters

You deploy a database migration in sync wave 1 and the app in sync wave 2. But you forget to add a Job hook, so the migration and app deploy simultaneously. The app starts before the migration finishes and crashes on the missing column.

Fix: Use sync waves AND sync hooks correctly. Migrations should be PreSync hooks with sync-wave: "-1". Verify ordering with dry-run.
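The fix above can be sketched as a migration Job annotated as a PreSync hook. The image and command are hypothetical stand-ins for whatever migration tool you use; the annotations are the relevant part.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync                           # run before the main sync phase
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation  # remove the previous Job before re-running
    argocd.argoproj.io/sync-wave: "-1"                         # order among other PreSync hooks
spec:
  backoffLimit: 0            # fail the sync rather than retrying a half-applied migration
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/my-app:1.4.2   # hypothetical image
          command: ["/app/migrate", "up"]            # hypothetical command
```

Because the hook must complete successfully before the sync phase begins, the app's Deployment is not applied until the migration has finished.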

4. Out-of-band changes triggering constant sync

Someone runs kubectl edit to hotfix production. With self-heal enabled, ArgoCD detects the drift and reverts to the Git state. The hotfix disappears. The person re-applies. ArgoCD reverts again. This loops until someone realizes what's happening.

Fix: Educate the team: all changes go through Git. For emergency hotfixes, commit to Git first. Use ArgoCD notifications to alert on drift. Document the emergency change process.
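One way to alert on drift is a custom trigger in ArgoCD notifications. The sketch below assumes the notifications controller is running and that a Slack service is already configured; the trigger name, template name, and channel are hypothetical. Merge the data keys into your existing ConfigMap rather than replacing it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Hypothetical custom trigger: fires when an app no longer matches Git.
  trigger.on-out-of-sync: |
    - when: app.status.sync.status == 'OutOfSync'
      send: [app-out-of-sync]
  # Hypothetical message template used by the trigger above.
  template.app-out-of-sync: |
    message: |
      Application {{.app.metadata.name}} has drifted from Git (sync status: {{.app.status.sync.status}}).
# To subscribe an Application, annotate it (channel name is hypothetical):
#   notifications.argoproj.io/subscribe.on-out-of-sync.slack: prod-alerts
```

With the alert in place, a manual hotfix becomes visible to the team right away instead of surfacing only after several revert cycles.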

5. Kustomize overlays that diverge silently

Your base has 10 resources. The production overlay patches 3 of them. Someone adds a new resource to base but forgets to configure it in the production overlay. It deploys with dev defaults in production.

Fix: Use ArgoCD diffs to review every change before sync. Require PR reviews that compare rendered manifests, not just Kustomize patches. Test overlay rendering in CI.
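Rendering the overlay in CI could look like the following workflow. It is only a sketch and assumes GitHub Actions, a runner with kubectl available, and a repository layout of base/ and overlays/production/; none of these come from the original setup.

```yaml
name: render-overlays
on:
  pull_request:
    paths:
      - "base/**"
      - "overlays/**"
jobs:
  render:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so the target branch is available locally
      - name: Render production overlay from the PR
        run: kubectl kustomize overlays/production > /tmp/after.yaml
      - name: Render production overlay from the target branch
        run: |
          git worktree add /tmp/base "origin/${{ github.base_ref }}"
          kubectl kustomize /tmp/base/overlays/production > /tmp/before.yaml
      - name: Show the rendered diff for reviewers
        run: diff -u /tmp/before.yaml /tmp/after.yaml || true   # a non-empty diff is expected
```

A resource added to base shows up in this diff as a complete new object with its defaults, which is much harder to miss than an overlay patch that was never written.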

6. Helm values.yaml conflict between ArgoCD and manual helm

You manage a Helm release with ArgoCD. Someone also runs helm upgrade manually. The two fight over the release state. Helm history gets confused. ArgoCD shows "OutOfSync" permanently.

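Fix: One owner per release. If ArgoCD manages it, no manual helm commands. ArgoCD renders charts with helm template and applies the output itself, so it does not share Helm's release history. Delete the manual Helm release and let ArgoCD take over. Never mix management approaches.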
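A sketch of an ArgoCD-owned chart, with values kept in Git next to the chart, might look like this. The repository URL, chart path, release name, and values file name are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-chart-production        # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/gitops.git   # hypothetical repo
    targetRevision: main
    path: charts/my-chart                         # hypothetical chart directory
    helm:
      releaseName: my-chart
      valueFiles:
        - values-production.yaml   # resolved relative to the chart path
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
```

Any change to the values then goes through a commit to values-production.yaml, never through helm upgrade --set on someone's laptop.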

7. App-of-Apps with no health checks

You use the App-of-Apps pattern. The parent app creates child apps. A child app fails to sync. The parent still shows "Healthy" because it only checks that the Application CR exists, not that it's synced.

Fix: Add custom health checks for Application CRs. Monitor child app sync status independently. Set up notifications for OutOfSync states.
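A custom health check for Application CRs can be added through argocd-cm. The Lua below is a sketch that propagates each child's own health status to the parent; merge the key into your existing ConfigMap rather than replacing it. Note that it reports health, not sync status, so OutOfSync children still need separate monitoring.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Health assessment for child Application resources, written in Lua.
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs
```

With this in place, a Degraded child makes the parent Degraded instead of leaving it green.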

8. Git repo as single point of failure

Your entire deployment pipeline depends on Git. GitHub has an outage. You can't deploy, can't roll back, can't do anything. Your "highly available" infrastructure is blocked by a single SaaS dependency.

Fix: Have a break-glass procedure for deploying without Git. Keep a local mirror. Document how to kubectl apply directly in emergencies. Practice the emergency procedure.

9. Too many apps in one repo

You put 200 microservices in one monorepo. A change to any file triggers ArgoCD to re-evaluate all 200 apps. Sync takes 10 minutes. A broken manifest in one app blocks CI for all 200.

Fix: Split repos by team or domain. Use ArgoCD ApplicationSets for dynamic app generation. Configure ArgoCD path filters so only relevant apps re-sync.
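The last two points can be sketched together: an ApplicationSet that generates one Application per directory, each annotated so that it is refreshed only when files under its own path change. The repository URL and the services/* layout are hypothetical, and how the annotation is honored depends on your ArgoCD version and whether webhooks are configured.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://example.com/org/gitops.git   # hypothetical repo
        revision: HEAD
        directories:
          - path: services/*        # one Application per subdirectory
  template:
    metadata:
      name: '{{path.basename}}'
      annotations:
        # "." is relative to the app's own source path
        argocd.argoproj.io/manifest-generate-paths: .
    spec:
      project: default
      source:
        repoURL: https://example.com/org/gitops.git
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
```

Adding a service then means adding a directory, not writing another Application manifest by hand.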

10. Image tag updates without promotion gates

Your CI builds an image, updates the tag in Git, and ArgoCD auto-syncs to production. There's no staging deployment, no smoke test, no approval. A broken image goes straight to prod because the pipeline has no gates.

Fix: Use environment promotion: CI → dev → staging → prod. Require manual promotion (or automated tests) between stages. Use image updater tools that create PRs instead of direct commits.
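One simple way to make promotion explicit is to pin the image tag per environment in each Kustomize overlay, so that moving a build to production is its own reviewed change. The sketch below assumes a Kustomize layout with overlays/staging and overlays/production; the image name and tag are hypothetical.

```yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: registry.example.com/my-app   # hypothetical image
    newTag: "1.4.1"                     # changed only by a promotion PR, after staging has passed
```

CI updates the tag in the dev or staging overlay automatically, while the production overlay changes only through a pull request that a person or an automated test gate approves.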