
Answer Key: The Deploy That Didn't Deploy

The System

A notification microservice that sends emails and SMS messages, managed by ArgoCD (GitOps) and deployed to Kubernetes via Kustomize.

[ArgoCD] --watches--> [Git repo (kustomization.yaml)]
                            |
                       image: notification-service:latest
                            |
                       [Kubernetes Deployment (3 replicas)]
                            |
                  [notification-service pods]
                       /         \
              [Email Gateway]  [SMS Gateway (legacy)]
                                    |
                              (v2.5.0 adds sms_provider_v2
                               but v2.5.0 never deployed)

  • CI pipeline: builds the Docker image and pushes it to the registry with the latest tag.
  • ArgoCD: syncs Git manifests to the cluster.
  • The gap: nobody told Kubernetes to actually pull the new image.

What's Broken

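Root cause: The deployment uses the latest tag with imagePullPolicy: IfNotPresent. (Kubernetes defaults an omitted policy to Always for :latest, so IfNotPresent here was either set explicitly or fixed when the object was first created with a different tag.) When the CI pipeline pushes a new image to registry.corp.io/notification-service:latest, the tag is updated in the registry, but: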

  1. The Kustomize overlay still says newTag: latest — the manifest has not changed
  2. ArgoCD compares the manifest in Git with the manifest in the cluster, sees they are identical (both say latest), and reports "Synced, 0 updated"
  3. Kubernetes does not pull the image: the pod template is unchanged, so no new pods are created, and with imagePullPolicy: IfNotPresent any container that does restart reuses the old digest in the node's local cache
  4. The pods continue running v2.3.1 (sha256:3e7a9f2b) instead of v2.5.0 (sha256:9c4d8e1a)
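
Step 3 depends on how the kubelet interprets imagePullPolicy. The function below is a simplified model of that decision, not kubelet code; will_pull and its two arguments are hypothetical names used only for illustration.

```shell
# Simplified model of the kubelet's pull decision (illustrative only).
# Usage: will_pull POLICY CACHED
#   POLICY: Always | IfNotPresent | Never
#   CACHED: yes if the node already holds an image for this tag, else no
will_pull() {
  policy="$1"
  cached="$2"
  case "$policy" in
    Always)
      echo "pull" ;;        # the registry is consulted on every container start
    IfNotPresent)
      if [ "$cached" = "yes" ]; then echo "use-cache"; else echo "pull"; fi ;;
    Never)
      echo "use-cache" ;;   # only a local image is used; start fails if it is missing
    *)
      echo "unknown policy: $policy" >&2
      return 1 ;;
  esac
}

# The incident's situation: latest is already cached on the node.
will_pull IfNotPresent yes   # prints: use-cache
```

In every branch the decision is made only when a container starts, so an unchanged pod template, which starts no new containers, never reaches it.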

The operational impact: the sms_provider_v2 feature flag check returns not_found because v2.3.1 does not know about that flag. The app falls back to the legacy SMS gateway, which has a 35% failure rate (2,104 / 5,995).
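
The 35% figure can be reproduced from the two counts quoted above. This is a small sketch; failure_rate is a hypothetical helper name, and it assumes the total is non-zero.

```shell
# Percentage of failed sends, rounded to one decimal place.
# Usage: failure_rate FAILED TOTAL   (TOTAL must be non-zero)
failure_rate() {
  awk -v failed="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * failed / total }'
}

failure_rate 2104 5995   # prints: 35.1
```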

Key clue: Pod image digest (3e7a9f2b) does not match CI push digest (9c4d8e1a), and app_build_info reports version 2.3.1 (built October 8) while CI pushed 2.5.0 (November 19).
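
The digest comparison behind this clue can be scripted. The sketch below assumes the pod's imageID has the form repository@sha256:..., possibly with a runtime prefix; digest_matches and CI_DIGEST are hypothetical names, not part of the exercise.

```shell
# Succeeds when a pod's imageID carries the expected digest.
# Usage: digest_matches EXPECTED_DIGEST IMAGE_ID
digest_matches() {
  expected="$1"
  image_id="$2"
  running="${image_id##*@}"   # keep only the text after the last '@'
  [ "$running" = "$expected" ]
}

# Example wiring (CI_DIGEST is a hypothetical variable holding the digest CI pushed):
# image_id=$(kubectl get pods -n comms -l app=notification-service \
#   -o jsonpath='{.items[0].status.containerStatuses[0].imageID}')
# digest_matches "$CI_DIGEST" "$image_id" || echo "stale image: $image_id"
```

Comparing digests rather than tags is the point: the tag is latest on both sides, so only the digest exposes the mismatch.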

The Fix

Immediate (force the image pull)

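# Patch the pull policy to Always. Changing the pod template also triggers a rollout,
# so new pods start and pull the current latest image.
# Note: if ArgoCD self-heal is enabled, it may revert this live patch; commit it to Git too.
kubectl patch deployment notification-service -n comms \
  --type='json' -p='[{"op":"add","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'

# If the policy is already Always, a restart alone forces the pull.
# With IfNotPresent, a restart reuses the image cached on each node.
kubectl rollout restart deployment notification-service -n comms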

Permanent (stop using latest)

  1. Change the Kustomize overlay to use immutable tags:

    images:
      - name: registry.corp.io/notification-service
        newTag: v2.5.0   # Immutable version tag, not 'latest'
    

  2. Update the CI pipeline to push each build under its version tag and commit that tag to the Git repo so ArgoCD detects the change:

    # CI step: update kustomization.yaml after build
    - name: Update deployment tag
      script: |
        cd k8s/overlays/prod
        kustomize edit set image registry.corp.io/notification-service:${CI_COMMIT_TAG}
        git commit -am "deploy: notification-service ${CI_COMMIT_TAG}"
        git push
    

  3. Set imagePullPolicy: Always as a safety net (or use digest-based references).
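
For the digest-based option in step 3, one possible approach is to have CI build a name@digest reference and pass it to kustomize edit set image, which accepts that form. The sketch below only constructs and validates the reference; pinned_ref and IMAGE_DIGEST are hypothetical names, and how the build step obtains the digest is left as an assumption.

```shell
# Build an image reference pinned to a full sha256 digest.
# Usage: pinned_ref IMAGE_NAME DIGEST
pinned_ref() {
  name="$1"
  digest="$2"
  # Refuse anything that is not a complete 64-hex-character sha256 digest.
  if ! printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "not a full sha256 digest: $digest" >&2
    return 1
  fi
  printf '%s@%s\n' "$name" "$digest"
}

# Example wiring in CI (IMAGE_DIGEST is a hypothetical variable set by the build step):
# kustomize edit set image "$(pinned_ref registry.corp.io/notification-service "$IMAGE_DIGEST")"
```

A digest reference changes the manifest on every build, so ArgoCD sees a diff and Kubernetes rolls out new pods without relying on the pull policy.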

Verification

# Confirm new version is running
kubectl exec -n comms deploy/notification-service -- env | grep VERSION

# Check build info metric
curl -s http://notification-service.comms:8080/metrics | grep app_build_info

# Verify image digest matches CI push (one line per pod, so all replicas are checked)
kubectl get pods -n comms -l app=notification-service \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].imageID}{"\n"}{end}'

# Check SMS provider v2 is active
curl -s http://notification-service.comms:8080/metrics | grep sms_provider_v2
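
To turn the build info check into a pass/fail step, the version can be extracted from the metric instead of read by eye. This sketch assumes app_build_info is exposed in Prometheus text format with a label named version; that label name and the build_version helper are assumptions, not confirmed by the exercise.

```shell
# Print the version label from the app_build_info series read on stdin.
# Assumes a Prometheus text-format line such as:
#   app_build_info{version="2.5.0"} 1
build_version() {
  sed -n 's/^app_build_info{.*version="\([^"]*\)".*/\1/p' | head -n 1
}

# Example wiring:
# curl -s http://notification-service.comms:8080/metrics | build_version
```

Comparing the printed value against the version CI pushed gives a check that does not depend on ArgoCD's sync status.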

Artifact Decoder

| Artifact | What It Revealed | What Was Misleading |
|---|---|---|
| CLI Output | Image digest in pods differs from CI push: stale image | ArgoCD says "Synced" and "Healthy", so everything looks green |
| Metrics | app_build_info shows v2.3.1 from October, not v2.5.0; SMS failure rate is 35% | Email metrics look fine, masking the SMS degradation |
| IaC Snippet | newTag: latest in Kustomize, the root of the problem | The Kustomize config looks simple and correct at first glance |
| Log Lines | CI pushed v2.5.0 on Nov 19; ArgoCD synced with "0 updated" on Nov 20, so the deploy was a no-op | ArgoCD "Sync succeeded" log makes it look like the deploy worked |

Skills Demonstrated

  • Understanding the latest tag anti-pattern and imagePullPolicy behavior
  • Recognizing the gap between GitOps sync status and actual deployed state
  • Correlating application version metrics with expected deployment versions
  • Understanding the difference between image tags and image digests
  • Tracing the full CI/CD pipeline from build through deploy

Prerequisite Topic Packs