Scenario: GitOps Drift Causing Outage
The Prompt
"ArgoCD shows our production app as 'OutOfSync' with a 'Degraded' health status. Someone apparently made manual changes to production, and now ArgoCD's self-heal is fighting with an HPA. Pods are scaling up and down constantly."
Initial Report
Alert: "grokdevops deployment has been flapping between 3 and 10 replicas for the last 30 minutes. ArgoCD shows OutOfSync. Error rate is 15%."
Constraints
- Time pressure: Error rate is elevated due to the scaling flap.
- ArgoCD has selfHeal enabled: Any manual fix may be reverted by ArgoCD.
Observable Evidence
- ArgoCD UI shows the app as OutOfSync with a diff on spec.replicas
- HPA is trying to scale to 8 replicas based on CPU
- ArgoCD keeps reverting replicas to 3 (the value in Git)
- Pods are being created and terminated in a loop
Expected Investigation Path
# 1. Check ArgoCD app status
argocd app get grokdevops
# 2. View the diff
argocd app diff grokdevops
# Shows: spec.replicas: 3 (Git) vs 8 (live, set by HPA)
# 3. Check HPA
kubectl get hpa -n grokdevops
# Shows: desired=8, current=3 (being reverted)
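# 4. IMMEDIATE FIX: Tell ArgoCD to ignore replicas
# Note: a merge patch overwrites any existing ignoreDifferences entries on the Application
argocd app patch grokdevops --type merge --patch '{"spec":{"ignoreDifferences":[{"group":"apps","kind":"Deployment","jsonPointers":["/spec/replicas"]}]}}'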
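# 5. PERMANENT FIX: Update the Application in Git
# Add to the Application spec:
# spec:
#   ignoreDifferences:
#   - group: apps
#     kind: Deployment
#     jsonPointers:
#     - /spec/replicas
# Also drop spec.replicas from the Deployment manifest in Git so later syncs do not reset the HPA-managed count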
# 6. Verify stabilization
kubectl get pods -n grokdevops -w
argocd app get grokdevops # Should be Synced now
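Watching pods by eye in step 6 can be backed by a quick check that the replica count has stopped flapping. The sketch below is one possible way to do that, not part of the original runbook: `replicas_stable` is a hypothetical helper name, and the sampling loop (left commented out because it needs cluster access) assumes the Deployment is named grokdevops in the grokdevops namespace.

```shell
# replicas_stable: reads one replica count per line on stdin and
# succeeds only if every non-empty sample is the same value.
replicas_stable() {
  awk 'NF { seen[$1] = 1 } END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Example sampling loop (assumed names, requires a live cluster):
# for i in 1 2 3 4 5 6; do
#   kubectl get deployment grokdevops -n grokdevops -o jsonpath='{.spec.replicas}{"\n"}'
#   sleep 30
# done | replicas_stable && echo "replica count steady"
```

A steady count over a few minutes suggests the tug-of-war has ended. The HPA can still change the count legitimately as load shifts, so a failing check is a prompt to look closer rather than proof that the loop persists.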
Root Cause
HPA manages spec.replicas dynamically. With selfHeal enabled, ArgoCD sees the HPA-set value as drift from Git and reverts it to the committed value. This creates an infinite loop: HPA scales up -> ArgoCD reverts -> HPA scales up.
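The loop can be illustrated with a toy model. This is only a sketch of the interaction, not real controller code: the `simulate` function and its variables are hypothetical, and the values 3 and 8 are taken from the scenario. Each tick lets the HPA reconcile first and ArgoCD self-heal second, counting how often spec.replicas is rewritten.

```shell
git_replicas=3   # value committed in Git
hpa_desired=8    # value the HPA wants, based on CPU

# simulate <yes|no>: "yes" means /spec/replicas is covered by ignoreDifferences.
simulate() {
  ignore_replicas=$1
  live=$git_replicas
  changes=0
  for tick in 1 2 3 4 5; do
    # HPA sets the replica count it wants.
    if [ "$live" -ne "$hpa_desired" ]; then
      live=$hpa_desired
      changes=$((changes + 1))
    fi
    # ArgoCD self-heal reverts drift unless the field is ignored.
    if [ "$ignore_replicas" != "yes" ] && [ "$live" -ne "$git_replicas" ]; then
      live=$git_replicas
      changes=$((changes + 1))
    fi
  done
  echo "$changes"
}

simulate no    # prints 10: two rewrites per tick, the flap never settles
simulate yes   # prints 1: the HPA scales once and nothing reverts it
```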
What a Strong Answer Includes
- Quick identification of the HPA vs ArgoCD conflict
- Knowledge of ignoreDifferences to resolve sync loops
- Understanding that this is a common GitOps pitfall (not a bug)
- The fix should be made in Git (not just via argocd CLI) to be durable
- Mention of other sync loop causes: server-side defaults, mutating webhooks
- Post-incident: audit all ArgoCD-managed deployments that also have HPAs
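The post-incident audit in the last point could start from a list of every Deployment that an HPA targets. The following is a minimal sketch under stated assumptions: `hpa_deployment_targets` is a hypothetical helper, the kubectl pipeline is left commented out because it needs cluster access, and cross-checking the result against ArgoCD-managed resources is left as a manual step.

```shell
# hpa_deployment_targets: reads "namespace kind name" lines on stdin and
# prints "namespace/name" for each HPA whose scale target is a Deployment.
hpa_deployment_targets() {
  awk '$2 == "Deployment" { print $1 "/" $3 }'
}

# Example input from a live cluster (assumed usage):
# kubectl get hpa --all-namespaces \
#   -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.spec.scaleTargetRef.kind}{" "}{.spec.scaleTargetRef.name}{"\n"}{end}' \
#   | hpa_deployment_targets
```

Each Deployment in the output that is also rendered by an ArgoCD Application is a candidate for the same ignoreDifferences entry, or for dropping spec.replicas from its manifest in Git.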