ArgoCD & GitOps Footguns

Mistakes that cause outages, data loss, or broken deployments with ArgoCD and GitOps.


1. Enabling prune: true Without Understanding the Blast Radius

You enable automated pruning. A developer refactors the GitOps repo, renames a directory, and pushes. ArgoCD sees the old resources as orphaned and deletes them from the cluster — including a StatefulSet with PVCs that are not automatically recreated.

The deletion happens without any confirmation prompt, typically within about 3 minutes of the push (ArgoCD's default polling interval), or sooner if a Git webhook triggers the refresh.

Fix: Do not enable prune: true on an Application that manages StatefulSets, PVCs, or Namespaces unless those resources carry the argocd.argoproj.io/sync-options: Prune=false annotation. Always preview a rename with argocd app sync --dry-run --prune before it reaches the tracked branch. Consider leaving automated pruning disabled in production and pruning manually after reviewing the diff.
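
As a minimal sketch of that annotation, the manifest below protects a single PVC from pruning. The name postgres-data, the namespace, and the storage size are hypothetical placeholders rather than values from a real setup.

```yaml
# Hypothetical PVC; only the annotation is the point of this example.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data        # assumed name
  namespace: my-app          # assumed namespace
  annotations:
    # ArgoCD skips this resource when pruning, even if it disappears from Git.
    argocd.argoproj.io/sync-options: Prune=false
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
```

With this in place, a resource that vanishes from Git leaves the Application OutOfSync instead of being deleted, so someone has to look at it.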

War story: A widely-reported ArgoCD incident involved a team that renamed a Helm chart directory in their GitOps repo. ArgoCD treated every resource from the old path as orphaned and pruned them all — including a PostgreSQL StatefulSet with 500GB of data. The PVCs were deleted. Recovery required restoring from a backup that was 6 hours old.


2. Using targetRevision: HEAD in Production

Your GitOps repo's main branch is your production truth. A developer accidentally merges a broken PR at 11pm. Automated sync picks up the commit and applies the broken manifests to production within about 3 minutes. There is no promotion gate: whatever is at HEAD gets deployed.

Fix: Tag releases explicitly. Use targetRevision: v1.5.0 for production Applications. Promotion means bumping the tag in the Application spec (or in the Kustomize overlay), which itself requires a commit. This creates an explicit promotion step that can be gated.
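
A pinned production Application might look like the sketch below. The repository URL, path, project name, and namespaces are assumptions for illustration; only the targetRevision line carries the point.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-prod                    # assumed name
  namespace: argocd
spec:
  project: team-a                      # assumed project
  source:
    repoURL: https://github.com/myorg/gitops
    targetRevision: v1.5.0             # a Git tag, not HEAD or a branch
    path: apps/my-app/overlays/prod    # assumed path
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app                  # assumed namespace
```

Promoting v1.6.0 then becomes a one-line change to targetRevision, which can go through review like any other commit.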


3. Forgetting Finalizers = Resources Leak on App Deletion

You delete an Application that lacks the resources-finalizer.argocd.argoproj.io finalizer (for example with kubectl delete). ArgoCD removes the Application object but leaves all the Kubernetes resources it managed (Deployments, Services, Ingresses) orphaned in the namespace. You think you've cleaned up; the cluster disagrees.

Fix: Always include the finalizer in Application manifests:

metadata:
  finalizers:
    - resources-finalizer.argocd.argoproj.io
If you're deleting an Application and want to keep the resources, delete with --cascade=false: argocd app delete my-app --cascade=false.


4. Storing Secrets in Git (Even Encrypted "Enough")

Teams start with Sealed Secrets or SOPS and gradually stop caring about key rotation. Eventually someone commits a .env file directly. Or the Sealed Secrets private key is stored in the same repo. Now your GitOps repo is a credentials dump.

Fix: Never store raw secrets in Git. Use a secrets operator pattern: Sealed Secrets, SOPS + age, or External Secrets Operator pulling from Vault/AWS Secrets Manager. The GitOps repo contains only sealed or encrypted values or references, and the decryption key lives outside the repo (in the cluster or in the external secret store). Rotate the sealed-secrets key annually.


5. Self-Heal Fighting a Mutating Admission Webhook

An admission webhook or another controller modifies a resource after ArgoCD applies it (e.g., injecting a sidecar, defaulting resource limits, annotating with a timestamp). ArgoCD sees the live object differ from Git and reverts it, then the webhook modifies it again. This creates a sync loop that generates hundreds of events per hour, hammers the API server, and triggers alerting noise.

Fix: Use ignoreDifferences to tell ArgoCD to ignore fields mutated by external controllers:

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/template/metadata/annotations/kubectl.kubernetes.io~1last-applied-configuration
    - group: ""
      kind: ServiceAccount
      jsonPointers:
        - /secrets


6. AppProject Too Permissive (or Missing Entirely)

You deploy everything in the default project. Any team with ArgoCD access can create an Application pointing at any repo and any namespace. A mistake (or a compromised account) can deploy arbitrary workloads anywhere in the cluster.

Fix: Create a dedicated AppProject per team. Set sourceRepos, destinations, and clusterResourceWhitelist explicitly. The principle of least privilege applies: a team's project should not be able to deploy to namespaces owned by other teams, and definitely not be able to create ClusterRoles.
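
A restrictive project could be sketched as follows. The team name, repository URL, and namespace are hypothetical; the three fields named above are the ones doing the work.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a                 # assumed team name
  namespace: argocd
spec:
  description: Workloads owned by team A
  # Applications in this project may only pull from this repo.
  sourceRepos:
    - https://github.com/myorg/team-a-gitops
  # Applications in this project may only deploy here.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a
  # Empty list: no cluster-scoped kinds (ClusterRole, Namespace, CRDs) allowed.
  clusterResourceWhitelist: []
```

Because Namespace is itself cluster-scoped, a project this strict assumes the team-a namespace is created by a separate, more privileged project.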


7. Running Sync Hooks Without Idempotency

Your PreSync Job runs a database migration. The sync fails for an unrelated reason and is retried. The migration Job runs again. It tries to create a column that already exists and throws an error, blocking all future deploys.

Fix: Migrations must be idempotent. Use IF NOT EXISTS in DDL, or use a migration framework (Flyway, Liquibase, Alembic) that tracks which migrations have already run. Set the argocd.argoproj.io/hook-delete-policy: BeforeHookCreation annotation so the previous Job is deleted before the next sync attempt re-creates it with a clean slate.
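
The hook annotations from this fix can be sketched on a migration Job as below. The Job name, image, and the Alembic command are assumptions; substitute whatever migration tool you actually run.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate             # assumed name
  annotations:
    # Run before the rest of the sync.
    argocd.argoproj.io/hook: PreSync
    # Delete any previous instance of this Job before creating a new one.
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/my-app-migrations:v1.5.0   # hypothetical image
          # Alembic records applied revisions, so a re-run is a no-op.
          command: ["alembic", "upgrade", "head"]
```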


8. Not Setting Resource Limits in Hook Jobs

A PreSync migration Job gets scheduled and sits pending because the cluster has no headroom. The sync times out. The Job is not cleaned up (wrong hook-delete-policy). On the next sync, ArgoCD tries to create the hook Job but it already exists — sync fails with "already exists" error. Now you have a stuck sync that requires manual Job deletion to unblock.

Fix: Set resource requests/limits on all hook Jobs. Set hook-delete-policy: BeforeHookCreation. Set activeDeadlineSeconds on the Job so it times out cleanly:

spec:
  activeDeadlineSeconds: 300
  backoffLimit: 2


9. Ignoring the ArgoCD Upgrade Path

You skip three minor versions when upgrading ArgoCD (e.g., 2.4 → 2.8). CRD schemas changed. Application resources with old fields fail validation. The application controller crashes on startup. The cluster is now unmanaged.

Fix: Upgrade ArgoCD one minor version at a time. Read the upgrade notes for each version. Always upgrade in a staging cluster first. Before upgrading: kubectl -n argocd get app -o yaml > apps-backup.yaml.


10. Using ArgoCD for Secret Management (Not Just Config)

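Teams route secrets through ArgoCD because "it's all GitOps." Secret values can then surface in rendered manifests, in repo-server logs, and in ArgoCD's manifest cache. ArgoCD masks the data of Secret objects in its UI, but values templated into other resources (environment variables, ConfigMaps) are shown in plain text, so anyone with ArgoCD UI access may be able to read your database passwords.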

Fix: Keep secrets out of ArgoCD's management. Use External Secrets Operator or Vault Agent Injector. The secret resource in Git is a reference (an ExternalSecret custom resource), not the actual data. ArgoCD manages the ExternalSecret manifest; the secrets operator fetches and injects the actual value.
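
A reference-only manifest might look like the sketch below. The store name vault-backend, the remote key, and the resource names are hypothetical, and the apiVersion depends on which External Secrets Operator release is installed.

```yaml
# Safe to commit: contains a pointer to the secret, never its value.
apiVersion: external-secrets.io/v1beta1   # assumption: check the version your ESO serves
kind: ExternalSecret
metadata:
  name: my-app-db              # assumed name
  namespace: my-app            # assumed namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: my-app-db            # Secret that ESO creates in the cluster
  data:
    - secretKey: password      # key inside the resulting Secret
      remoteRef:
        key: my-app/db         # hypothetical path in the external store
        property: password
```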

Gotcha: Even with External Secrets Operator, ArgoCD's diff view may show the resolved Secret as "OutOfSync" because the live cluster Secret (populated by ESO) differs from what ArgoCD expects. Use ignoreDifferences on Secret resources managed by ESO, or configure ArgoCD to track only the ExternalSecret resource rather than the resulting Secret.


11. Sync Waves Without Health Checks

You set wave 0 for your database and wave 1 for your application. But your database Deployment has no readiness probe. ArgoCD considers it "Healthy" as soon as the Pod is Running, even if the DB process hasn't finished initializing. The wave-1 application starts before the DB is ready and fails with connection errors.

Fix: Every Deployment managed by ArgoCD must have a readiness probe. ArgoCD's health check for a Deployment is availableReplicas == desiredReplicas, and a pod counts as available only when its readiness probe passes.
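
As an illustration, a wave-0 database Deployment with a readiness probe could be sketched like this. The names and image tag are assumptions, and environment variables, credentials, and storage are omitted for brevity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db                     # assumed name
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # wave 1 waits until this is Healthy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16   # assumed image; env and volumes omitted
          readinessProbe:
            # The pod is only counted as available once Postgres accepts connections.
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 5
            periodSeconds: 5
```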


12. Forgetting That ApplicationSet Generates Apps at Write Time

You add a new cluster to a git-generator ApplicationSet. The ApplicationSet controller immediately renders and creates an Application for that cluster. If the cluster credentials aren't registered in ArgoCD yet, the Application shows as Unknown. The reverse direction is more dangerous: removing a cluster from the generator list deletes its generated Application, and because generated Applications carry the resources finalizer unless preserveResourcesOnDeletion is set, all of its resources are deleted with it.

Fix: When using ApplicationSet with cluster or git generators, test in a non-prod instance first. Use syncPolicy.automated in the ApplicationSet template conservatively. Keep pruning disabled, and consider setting syncPolicy.preserveResourcesOnDeletion: true on the ApplicationSet, until you've verified the generator logic is correct.
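
A conservative starting point might look like the sketch below, which uses a cluster generator. The label selector, repository URL, path, and project are hypothetical. The ApplicationSet-level preserveResourcesOnDeletion setting keeps workloads in place when a generated Application is removed.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app                 # assumed name
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: staging       # assumed label on the cluster Secret
  syncPolicy:
    # Deleting a generated Application leaves its resources in the cluster.
    preserveResourcesOnDeletion: true
  template:
    metadata:
      name: 'my-app-{{name}}'
    spec:
      project: team-a          # assumed project
      source:
        repoURL: https://github.com/myorg/gitops
        targetRevision: v1.5.0
        path: apps/my-app      # assumed path
      destination:
        server: '{{server}}'
        namespace: my-app
      # No syncPolicy.automated yet: sync manually until the generator is trusted.
```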


13. Bootstrapping Without a Secret for Repo Credentials

You apply the root App of Apps but forgot to pre-create the Secret for the private Git repo. ArgoCD can't clone the repo; all child Applications fail to render. They show as Unknown with an authentication error. Debugging this requires reading repo-server logs, not just the App status in the UI.

Fix: Bootstrap order matters. Pre-create all repository credential secrets before applying the root Application:

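# Create the repository credential Secret (quote the token so it is passed intact)
kubectl -n argocd create secret generic my-gitops-repo \
  --from-literal=type=git \
  --from-literal=url=https://github.com/myorg/gitops \
  --from-literal=username=git \
  --from-literal=password="${GITHUB_TOKEN}"
# Label it so ArgoCD recognizes it as a repository credential
kubectl -n argocd label secret my-gitops-repo \
  argocd.argoproj.io/secret-type=repository
# Then apply root-app.yaml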