ArgoCD & GitOps - Street-Level Ops¶
Real-world patterns and gotchas from production GitOps operations.
Quick Diagnosis Commands¶
```bash
# Application status
argocd app get <app-name>

# Sync status for all apps
argocd app list

# View diff between Git and live
argocd app diff <app-name>

# Force refresh from Git (don't wait for the poll interval)
argocd app get <app-name> --refresh

# View sync history
argocd app history <app-name>

# Check controller logs (the application controller is a
# StatefulSet in recent ArgoCD versions)
kubectl logs -n argocd statefulset/argocd-application-controller --tail=100

# Check repo server (manifest rendering)
kubectl logs -n argocd deploy/argocd-repo-server --tail=100
```
Gotcha: Sync Loops with HPA¶
The HPA updates `spec.replicas`, ArgoCD sees the drift and reverts it, the HPA scales again — an endless loop.

Fix: tell ArgoCD to ignore the `replicas` field via `ignoreDifferences` in the Application spec.
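A minimal sketch of the `ignoreDifferences` entry (the app name is illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app            # illustrative name
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # let the HPA own this field
```

Note that `ignoreDifferences` only affects diffing by default; add the `RespectIgnoreDifferences=true` sync option if you also want syncs to leave the field alone.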
Gotcha: CRDs Must Exist Before Custom Resources¶
If your app creates a CRD and a custom resource in the same sync, the CR will fail because the CRD isn't registered yet.
Fix: Use sync waves:
```yaml
# CRD: wave -1 (applied first)
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
---
# Custom Resource: wave 1 (applied after the CRD is registered)
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```
Gotcha: Helm Hooks¶
ArgoCD handles Helm hooks differently than helm install. Pre-install/pre-upgrade hooks become ArgoCD PreSync hooks. Post-install/post-upgrade become PostSync.
Pitfall: If a hook Job doesn't complete, the sync hangs. Always set ttlSecondsAfterFinished and activeDeadlineSeconds.
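A sketch of a hook Job with both safeguards in place (the name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                  # placeholder name
  annotations:
    "helm.sh/hook": pre-upgrade     # becomes a PreSync hook in ArgoCD
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  activeDeadlineSeconds: 300        # kill the Job if it runs longer than 5m
  ttlSecondsAfterFinished: 600      # garbage-collect the finished Job
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: ghcr.io/org/app:v1.2.3   # placeholder image
          command: ["./migrate"]          # placeholder command
```

With `activeDeadlineSeconds` set, a stuck hook fails the sync instead of hanging it forever.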
Pattern: Repo Structure for Multi-Env¶
```
# Recommended: separate directories per environment
config-repo/
  base/
    deployment.yaml
    service.yaml
    kustomization.yaml
  overlays/
    dev/
      kustomization.yaml
      patches/
    staging/
      kustomization.yaml
      patches/
    prod/
      kustomization.yaml
      patches/
```
Pattern: Image Tag Promotion¶
Don't use the `latest` tag. Promote specific image tags through environments:

```bash
# CI builds and pushes the image
docker build -t ghcr.io/org/app:v1.2.3 .
docker push ghcr.io/org/app:v1.2.3

# Promote to dev (automated by Image Updater or CI)
# Update overlays/dev/kustomization.yaml:
#   images:
#     - name: ghcr.io/org/app
#       newTag: v1.2.3

# Promote to prod (manual PR)
# Same change in overlays/prod/kustomization.yaml
```
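The promoted overlay file would look roughly like this (paths and the image name are assumptions based on the repo layout above):

```yaml
# overlays/dev/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: ghcr.io/org/app
    newTag: v1.2.3   # this line is the only diff a promotion PR should carry
```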
Pattern: Notifications¶
ArgoCD can notify on sync status changes:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [slack-notification]
  template.slack-notification: |
    message: |
      Application {{.app.metadata.name}} sync {{.app.status.operationState.phase}}.
      {{.app.status.operationState.message}}
  service.slack: |
    token: $slack-token
    channel: deployments
```
Emergency: Force Sync¶
When ArgoCD won't sync due to validation errors but you need to push a fix:
```bash
# Force apply - deletes and recreates conflicting resources (use sparingly)
argocd app sync <app-name> --force

# Replace resources instead of patching
argocd app sync <app-name> --replace

# Sync specific resources only (format is GROUP:KIND:NAME)
argocd app sync <app-name> --resource :Service:my-service
```
Emergency: ArgoCD Itself Is Down¶
```bash
# Check all ArgoCD pods
kubectl get pods -n argocd

# Common issue: repo-server OOM (large repos)
kubectl describe pod -n argocd -l app.kubernetes.io/name=argocd-repo-server

# Restart a specific component (the application controller is a
# StatefulSet in recent ArgoCD versions)
kubectl rollout restart statefulset argocd-application-controller -n argocd

# Remember: your apps keep running even if ArgoCD is down.
# ArgoCD is control plane only - it doesn't proxy traffic.
```
Pattern: Progressive Delivery with Argo Rollouts¶
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: grokdevops
spec:
  strategy:
    canary:
      canaryService: grokdevops-canary
      stableService: grokdevops-stable
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 30
        - pause: {duration: 5m}
        - setWeight: 60
        - pause: {duration: 5m}
```
This gives you automated canary with pause gates, managed entirely through Git.