ArgoCD & GitOps - Street-Level Ops

Real-world patterns and gotchas from production GitOps operations.

Quick Diagnosis Commands

# Application status
argocd app get <app-name>

# Sync status for all apps
argocd app list

# View diff between Git and live
argocd app diff <app-name>

# Force refresh from Git (don't wait for poll interval)
argocd app get <app-name> --refresh

# View sync history
argocd app history <app-name>

# Check ArgoCD controller logs (the controller runs as a StatefulSet in recent installs)
kubectl logs -n argocd statefulset/argocd-application-controller --tail=100

# Check repo server (manifest rendering)
kubectl logs -n argocd deploy/argocd-repo-server --tail=100

Gotcha: Sync Loops with HPA

The HPA changes spec.replicas, ArgoCD sees the drift from Git and reverts it, the HPA scales again, and the loop repeats - especially with automated sync and selfHeal enabled.

Fix: Tell ArgoCD to ignore the replicas field by adding ignoreDifferences to the Application spec:

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas

Gotcha: CRDs Must Exist Before Custom Resources

If your app creates a CRD and a custom resource in the same sync, the CR will fail because the CRD isn't registered yet.

Fix: Use sync waves:

# CRD: wave -1
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"

# Custom Resource: wave 1
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"

Gotcha: Helm Hooks

ArgoCD handles Helm hooks differently than helm install: it maps pre-install/pre-upgrade hooks to its PreSync phase and post-install/post-upgrade hooks to PostSync.

Pitfall: If a hook Job never completes, the sync hangs. Always set activeDeadlineSeconds (so the sync fails instead of hanging) and ttlSecondsAfterFinished (so finished Jobs get cleaned up).
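A sketch of a hook Job with both safeguards (the Job name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    "helm.sh/hook": pre-upgrade            # ArgoCD runs this as a PreSync hook
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  activeDeadlineSeconds: 300     # fail the sync after 5 minutes instead of hanging
  ttlSecondsAfterFinished: 600   # garbage-collect the finished Job
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ghcr.io/org/app:v1.2.3
          command: ["./migrate"]
```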

Pattern: Repo Structure for Multi-Env

# Recommended: separate directories per environment
config-repo/
  base/
    deployment.yaml
    service.yaml
    kustomization.yaml
  overlays/
    dev/
      kustomization.yaml
      patches/
    staging/
      kustomization.yaml
      patches/
    prod/
      kustomization.yaml
      patches/
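Each environment then gets its own ArgoCD Application pointing at the matching overlay. A sketch for prod (repo URL, names, and namespaces are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/config-repo.git
    targetRevision: main
    path: overlays/prod       # each env's Application targets its own overlay
  destination:
    server: https://kubernetes.default.svc
    namespace: app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```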

Pattern: Image Tag Promotion

Don't deploy the latest tag. Promote specific, immutable image tags through environments:

# CI builds and pushes image
docker build -t ghcr.io/org/app:v1.2.3 .
docker push ghcr.io/org/app:v1.2.3

# Promote to dev (automated by Image Updater or CI)
# Update overlays/dev/kustomization.yaml:
#   images:
#     - name: ghcr.io/org/app
#       newTag: v1.2.3

# Promote to prod (manual PR)
# Same change in overlays/prod/kustomization.yaml
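The promotion step itself is just a one-line change to the overlay's kustomization.yaml. A minimal sketch with sed (paths and tags are illustrative; `kustomize edit set image` is the more robust tool if kustomize is installed):

```shell
# Set up an example dev overlay (stands in for the real repo checkout)
mkdir -p overlays/dev
cat > overlays/dev/kustomization.yaml <<'EOF'
images:
  - name: ghcr.io/org/app
    newTag: v1.2.2
EOF

# Promote: bump newTag to the release being promoted
sed -i 's|newTag: .*|newTag: v1.2.3|' overlays/dev/kustomization.yaml
grep newTag overlays/dev/kustomization.yaml
```

Commit and push that change; ArgoCD picks it up on the next poll or refresh.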

Pattern: Notifications

ArgoCD can notify on sync status changes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [slack-notification]
  template.slack-notification: |
    message: |
      Application {{.app.metadata.name}} sync {{.app.status.operationState.phase}}.
      {{.app.status.operationState.message}}
  service.slack: |
    token: $slack-token
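The trigger fires only for Applications that subscribe to it; the subscription annotation also names the target channel. A sketch on the Application's metadata (the channel name is illustrative):

```yaml
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-failed.slack: deployments
```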

Emergency: Force Sync

When ArgoCD won't sync due to validation errors but you need to push a fix:

# Force apply: deletes and recreates resources that can't be updated in place (use sparingly)
argocd app sync <app-name> --force

# Replace resources instead of patching
argocd app sync <app-name> --replace

# Sync specific resources only
argocd app sync <app-name> --resource :Service:my-service

Emergency: ArgoCD Itself Is Down

# Check all ArgoCD pods
kubectl get pods -n argocd

# Common issue: repo-server OOM (large repos)
kubectl describe pod -n argocd -l app.kubernetes.io/name=argocd-repo-server

# Restart a specific component (note: the application controller runs as a StatefulSet)
kubectl rollout restart statefulset argocd-application-controller -n argocd

# Reassurance: your apps keep running even if ArgoCD is down
# ArgoCD is control plane only - it doesn't proxy traffic

Pattern: Progressive Delivery with Argo Rollouts

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: grokdevops
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 30
        - pause: {duration: 5m}
        - setWeight: 60
        - pause: {duration: 5m}
      canaryService: grokdevops-canary
      stableService: grokdevops-stable

This gives you automated canary with pause gates, managed entirely through Git.