
Level: L2: Operations | Topics: GitOps | Domain: DevOps & Tooling

Scenario: GitOps Drift Causing Outage

The Prompt

"ArgoCD shows our production app as 'OutOfSync' with a 'Degraded' health status. Someone apparently made manual changes to production, and now ArgoCD's self-heal is fighting with an HPA. Pods are scaling up and down constantly."

Initial Report

Alert: "grokdevops deployment has been flapping between 3 and 10 replicas for the last 30 minutes. ArgoCD shows OutOfSync. Error rate is 15%."

Constraints

  • Time pressure: Error rate is elevated due to the scaling flap.
  • ArgoCD has selfHeal enabled: Any manual fix may be reverted by ArgoCD.

Observable Evidence

  • ArgoCD UI shows the app as OutOfSync with a diff on spec.replicas
  • HPA is trying to scale to 8 replicas based on CPU
  • ArgoCD keeps reverting replicas to 3 (the value in Git)
  • Pods are being created and terminated in a loop

Expected Investigation Path

# 1. Check ArgoCD app status
argocd app get grokdevops

# 2. View the diff
argocd app diff grokdevops
# Shows: spec.replicas: 3 (Git) vs 8 (live, set by HPA)

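# 3. Check the HPA
kubectl get hpa -n grokdevops
# REPLICAS keeps dropping back to 3 while CPU stays above target.
# `kubectl describe hpa -n grokdevops` shows the desired count (8) and the rescale events.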

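# 4. IMMEDIATE FIX: Tell ArgoCD to ignore replicas
# `argocd app set` has no flag for ignoreDifferences, so patch the Application spec instead.
# A merge patch replaces the whole ignoreDifferences list; include any existing entries.
argocd app patch grokdevops --type merge --patch '{"spec":{"ignoreDifferences":[{"group":"apps","kind":"Deployment","jsonPointers":["/spec/replicas"]}]}}'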

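# 5. PERMANENT FIX: Update the Application in Git
# Add to the Application spec:
#   spec:
#     ignoreDifferences:
#       - group: apps
#         kind: Deployment
#         jsonPointers:
#           - /spec/replicas
# Note: ignoreDifferences only affects diffing. A sync triggered by any other
# change still applies replicas: 3 unless the replicas field is removed from the
# Deployment manifest in Git or the RespectIgnoreDifferences=true sync option is set.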

# 6. Verify stabilization
kubectl get pods -n grokdevops -w
argocd app get grokdevops  # Should be Synced now
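
For orientation, here is a minimal sketch of how the Application manifest from step 5 might look once committed. Only the app name, the target namespace, selfHeal, and the ignoreDifferences entry come from the scenario. The repoURL, path, project, and the argocd namespace are placeholders assumed for illustration, and the RespectIgnoreDifferences sync option is an optional addition that keeps a sync from overwriting the ignored field.

```yaml
# Sketch only: repoURL, path, project, and metadata.namespace are assumed placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grokdevops
  namespace: argocd            # assumption: default ArgoCD install namespace
spec:
  project: default             # assumption
  source:
    repoURL: https://example.com/org/grokdevops-deploy.git  # placeholder
    targetRevision: HEAD
    path: k8s                  # placeholder
  destination:
    server: https://kubernetes.default.svc
    namespace: grokdevops
  syncPolicy:
    automated:
      selfHeal: true           # as stated in the scenario constraints
    syncOptions:
      - RespectIgnoreDifferences=true   # sync leaves ignored fields untouched
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
```

With this in place, the HPA is the only writer of spec.replicas: ArgoCD neither reports the field as drift nor resets it during a sync.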

Root Cause

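The HPA manages spec.replicas dynamically, but the Deployment manifest in Git pins it to 3. With selfHeal enabled, ArgoCD treats every HPA-set value as drift from Git and reverts it, and because CPU is still above target the HPA scales up again. The loop repeats indefinitely: HPA scales up -> ArgoCD reverts -> HPA scales up.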

What a Strong Answer Includes

  • Quick identification of the HPA vs ArgoCD conflict
  • Knowledge of ignoreDifferences to resolve sync loops
  • Understanding that this is a common GitOps pitfall (not a bug)
  • The fix should be made in Git (not just via argocd CLI) to be durable
  • Mention of other sync loop causes: server-side defaults, mutating webhooks
  • Post-incident: audit all ArgoCD-managed deployments that also have HPAs
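
The post-incident audit in the last bullet could start from a list of every Deployment that an HPA scales. The following is a rough sketch, not a complete audit: the function names are hypothetical, and it only finds HPA targets. Whether each one is ArgoCD-managed and has a matching ignoreDifferences entry still has to be checked per application.

```shell
#!/usr/bin/env bash
# Sketch: find Deployments scaled by an HPA so each can be reviewed for a
# hardcoded spec.replicas in Git or a missing ignoreDifferences entry.
# Function names are hypothetical helpers, not part of kubectl or argocd.

# Reads whitespace-separated rows "namespace hpa-name target-kind target-name"
# on stdin and prints a unique "namespace/deployment" line for each HPA that
# targets a Deployment.
hpa_deployment_targets() {
  awk '$3 == "Deployment" { print $1 "/" $4 }' | sort -u
}

# Produces those rows from a live cluster (requires kubectl access).
list_hpa_rows() {
  kubectl get hpa --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,KIND:.spec.scaleTargetRef.kind,TARGET:.spec.scaleTargetRef.name'
}

# Usage against a cluster (not executed when this file is sourced):
#   list_hpa_rows | hpa_deployment_targets
```

Each namespace/deployment pair in the output is a candidate for the same conflict. Running `argocd app diff` on the owning application shows whether spec.replicas currently appears as drift.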
