Answer Key: The Deploy That Didn't Deploy
The System
A notification microservice that sends emails and SMS messages, managed by ArgoCD (GitOps) and deployed to Kubernetes via Kustomize.
    [ArgoCD] --watches--> [Git repo (kustomization.yaml)]
                                   |
                  image: notification-service:latest
                                   |
                  [Kubernetes Deployment (3 replicas)]
                                   |
                      [notification-service pods]
                          /                 \
              [Email Gateway]       [SMS Gateway (legacy)]
                                             |
                               (v2.5.0 adds sms_provider_v2
                                but v2.5.0 never deployed)
CI pipeline: builds the Docker image and pushes it to the registry with the latest tag.
ArgoCD: syncs Git manifests to cluster.
The gap: nobody told Kubernetes to actually pull the new image.
What's Broken
Root cause: The deployment uses the latest tag with imagePullPolicy: IfNotPresent. (Kubernetes only defaults the policy to Always for a latest tag when the field is omitted, so IfNotPresent here means the policy was set explicitly or was fixed when the object was first created with a different tag.) When the CI pipeline pushes a new image to registry.corp.io/notification-service:latest, the tag is updated in the registry, but:
- The Kustomize overlay still says newTag: latest, so the manifest has not changed
- ArgoCD compares the manifest in Git with the manifest in the cluster, sees they are identical (both say latest), and reports "Synced, 0 updated"
- Kubernetes does not pull the image: nothing in the pod template changed, so no rollout is triggered, and because imagePullPolicy is IfNotPresent, a node that already has latest cached keeps using the old digest
- The pods continue running v2.3.1 (sha256:3e7a9f2b) instead of v2.5.0 (sha256:9c4d8e1a)
The operational impact: the sms_provider_v2 feature flag check returns not_found because v2.3.1 does not know about that flag. The app falls back to the legacy SMS gateway, which has a 35% failure rate (2,104 / 5,995).
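The 35% figure can be reproduced from the two counts quoted above. This is only a quick arithmetic check; the function name failure_rate is made up for illustration.

```shell
# Failure rate as a percentage: failed / total * 100, rounded to one decimal.
failure_rate() {
  awk -v f="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", f / t * 100 }'
}

# Counts from the legacy SMS gateway metrics in this scenario.
failure_rate 2104 5995
```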
Key clue: Pod image digest (3e7a9f2b) does not match CI push digest (9c4d8e1a), and app_build_info reports version 2.3.1 (built October 8) while CI pushed 2.5.0 (November 19).
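The digest comparison behind this clue can be scripted. The following is a minimal sketch, assuming the running imageID contains the digest after an "@" (the usual form of status.containerStatuses[].imageID); the helper name digest_matches is hypothetical, and the digests are the shortened values quoted in this answer key.

```shell
# Succeeds (exit 0) only when the running imageID contains the expected digest.
digest_matches() {
  running="$1"   # e.g. registry.corp.io/notification-service@sha256:3e7a9f2b
  expected="$2"  # e.g. sha256:9c4d8e1a
  case "$running" in
    *"@$expected"*) return 0 ;;
    *) return 1 ;;
  esac
}

# In the cluster, the running value would come from the kubectl jsonpath
# query on .status.containerStatuses[0].imageID; a literal stands in here.
if digest_matches "registry.corp.io/notification-service@sha256:3e7a9f2b" "sha256:9c4d8e1a"; then
  echo "up to date"
else
  echo "stale image"
fi
```

With the values from this incident the check reports a stale image, which is the signal that ArgoCD's "Synced" status alone does not give.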
The Fix
Immediate (force the image pull)
# Set the pull policy to Always; the pod template change triggers a rollout that re-pulls latest
kubectl patch deployment notification-service -n comms \
  --type='json' -p='[{"op":"add","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'
# If the policy is already Always, a restart alone re-pulls the image
# (with IfNotPresent, a restart can reuse the stale cached image)
kubectl rollout restart deployment notification-service -n comms
Permanent (stop using latest)
- Change the Kustomize overlay to use immutable tags instead of latest.
- Update the CI pipeline to commit the new tag to the Git repo so ArgoCD detects the change.
- Set imagePullPolicy: Always as a safety net (or use digest-based references).
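The first two steps might look roughly like the sketch below. It assumes the overlay has an images entry with a newTag field, as described above; the helper name set_image_tag, the overlay layout, and the commit message are illustrative, not taken from the real repository.

```shell
# Hypothetical CI step: pin the overlay to the version that was just built.
# Rewrites the newTag line of a kustomization.yaml in place.
set_image_tag() {
  file="$1"
  tag="$2"
  # Replace whatever follows "newTag:" while keeping the indentation.
  sed -i.bak -E "s|^([[:space:]]*newTag:[[:space:]]*).*$|\1${tag}|" "$file" && rm -f "$file.bak"
}

# Example overlay (assumed layout) that still points at the mutable tag.
overlay="$(mktemp)"
cat > "$overlay" <<'EOF'
images:
  - name: registry.corp.io/notification-service
    newTag: latest
EOF

set_image_tag "$overlay" "2.5.0"
cat "$overlay"

# In CI this would be followed by a commit and push, for example:
#   git commit -am "notification-service: deploy 2.5.0" && git push
```

Because the tag in Git now changes on every release, ArgoCD sees a real manifest diff and the Deployment's pod template changes, which triggers a rollout. The kustomize CLI's `kustomize edit set image` command is an alternative to the sed rewrite.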
Verification
# Confirm new version is running
kubectl exec -n comms deploy/notification-service -- env | grep VERSION
# Check build info metric
curl -s http://notification-service.comms:8080/metrics | grep app_build_info
# Verify image digest matches CI push
kubectl get pods -n comms -l app=notification-service \
-o jsonpath='{.items[0].status.containerStatuses[0].imageID}'
# Check SMS provider v2 is active
curl -s http://notification-service.comms:8080/metrics | grep sms_provider_v2
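The build info check can be turned into a pass/fail comparison. This is a sketch only: it assumes app_build_info exposes the version as a label named version in the Prometheus text format, which this answer key does not confirm, and build_version is a hypothetical helper.

```shell
# Extract the version label from an app_build_info sample line read on stdin.
# Assumed exposition format: app_build_info{version="2.5.0",...} 1
build_version() {
  sed -n -E 's/^app_build_info\{.*version="([^"]+)".*$/\1/p'
}

expected="2.5.0"
# A sample line stands in for the curl output; in the cluster this would be
# piped from the /metrics endpoint queried above.
actual="$(echo 'app_build_info{version="2.3.1"} 1' | build_version)"

if [ "$actual" = "$expected" ]; then
  echo "running $expected"
else
  echo "still on $actual"
fi
```

Against the stale pods from this incident, the comparison reports that the service is still on 2.3.1.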
Artifact Decoder
| Artifact | What It Revealed | What Was Misleading |
|---|---|---|
| CLI Output | Image digest in pods differs from CI push: stale image | ArgoCD says "Synced" and "Healthy"; everything looks green |
| Metrics | app_build_info shows v2.3.1 from October, not v2.5.0; SMS failure rate is 35% | Email metrics look fine, masking the SMS degradation |
| IaC Snippet | newTag: latest in Kustomize, the root of the problem | The Kustomize config looks simple and correct at first glance |
| Log Lines | CI pushed v2.5.0 on Nov 19; ArgoCD synced with "0 updated" on Nov 20, so the deploy was a no-op | ArgoCD "Sync succeeded" log makes it look like the deploy worked |
Skills Demonstrated
- Understanding the latest tag anti-pattern and imagePullPolicy behavior
- Recognizing the gap between GitOps sync status and actual deployed state
- Correlating application version metrics with expected deployment versions
- Understanding the difference between image tags and image digests
- Tracing the full CI/CD pipeline from build through deploy