The Git Deploy That Deployed Nothing
Category: The Mystery | Domains: ci-cd, git | Read time: ~5 min
Setting the Scene
We had a fairly standard GitOps-style deploy pipeline: push to main, GitHub Actions builds the image, tags it with the output of git describe --tags --always, pushes to ECR, and updates a Kubernetes deployment manifest with the new tag. It had worked reliably for eight months and about 400 deploys. Then one Wednesday, we shipped a critical bugfix for a payment processing race condition, the pipeline went green, and... nothing changed.
The bug was still there. In production. I checked the pipeline -- all green. I checked ECR -- new image was there. I checked the deployment -- it had rolled out successfully. But the code running in the pods was the previous version.
What Happened
I started with the obvious: kubectl describe deployment payments-svc showed the image tag v2.14.3-12-g3a8b2c1. That looked right -- it should be 12 commits after tag v2.14.3. I exec'd into a running pod and checked the binary's embedded version string. It said v2.14.3-8-gf991e02. That was the previous deploy. The image tag said one thing; the binary inside said another.
I pulled the image from ECR and inspected it locally. docker run --rm payments-svc:v2.14.3-12-g3a8b2c1 --version returned v2.14.3-8-gf991e02. The image was tagged with the new version string, but the binary inside was compiled from the old commit.
I went into the GitHub Actions logs and found the problem. Our workflow started with actions/checkout@v3, which by default does a shallow clone with fetch-depth: 1. The build step ran git describe --tags --always to generate the version string that gets embedded in the binary. But git describe has to walk commit history back to a tag, and a depth-1 clone does not carry that history. It resolved against the nearest tag it could see in the shallow checkout, which happened to be a stale reference.
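The shallow-clone behavior is easy to reproduce outside CI. The following is a minimal sketch, not the pipeline from this incident: the function name, the throwaway repository, and the v1.0.0 tag are all made up for illustration. It clones the same repository twice, once in full and once with depth 1, and runs the same git describe command in each.

```shell
# Hypothetical helper: create an empty commit with a throwaway identity.
demo_commit() {
  git -C "$1" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$2"
}

# Hypothetical demo: one tagged commit, two later commits, then compare
# `git describe` in a full clone and in a depth-1 clone of the same repo.
demo_shallow_describe() {
  local work
  work="$(mktemp -d)" || return 1

  git init -q "$work/origin" 2>/dev/null
  demo_commit "$work/origin" "release"
  git -C "$work/origin" tag v1.0.0
  demo_commit "$work/origin" "fix 1"
  demo_commit "$work/origin" "fix 2"

  # file:// forces the normal transport so that --depth is honored locally.
  git clone -q "file://$work/origin" "$work/full"
  git clone -q --depth 1 "file://$work/origin" "$work/shallow"

  echo "full:    $(git -C "$work/full" describe --tags --always)"
  echo "shallow: $(git -C "$work/shallow" describe --tags --always)"

  rm -rf "$work"
}
```

Calling demo_shallow_describe should print a tag-relative string of the form v1.0.0-2-g<sha> for the full clone, while the shallow clone falls back to a bare abbreviated SHA because the tagged commit was never downloaded. Either way, the two checkouts of the same commit disagree about its version.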
Here's where it got really sneaky: the image tag in ECR was generated by a separate step that used git describe after a git fetch --tags (which did pull the correct tag info). So the ECR tag was correct (v2.14.3-12-g3a8b2c1), but the binary inside was compiled with the version from the shallow clone (v2.14.3-8-gf991e02). The Kubernetes rollout saw a new image tag, pulled the "new" image, and deployed it. But the image content was built from the wrong commit.
The Moment of Truth
The real problem wasn't just the version string -- it was the build cache. Our Dockerfile used a multi-stage build, and Docker's layer caching on the GitHub Actions runner had cached the compilation layer from the previous build. Since the shallow clone didn't change enough context for Docker to invalidate the cache, it reused the old compiled binary. The image tag was new, but every layer inside was old.
I added fetch-depth: 0 to the checkout step and --no-cache to the Docker build. The next deploy actually deployed the new code.
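For build steps that do not go through actions/checkout, roughly the same effect as fetch-depth: 0 can be had from the shell. This is a sketch under assumptions rather than the change described above: ensure_full_history is a hypothetical helper name, and the docker build line in the comment only illustrates where --no-cache would go.

```shell
# Hypothetical CI helper: if the checkout is shallow, fetch the full history
# and all tags before any step that calls `git describe`.
ensure_full_history() {
  local repo
  repo="${1:-.}"
  if [ "$(git -C "$repo" rev-parse --is-shallow-repository)" = "true" ]; then
    git -C "$repo" fetch -q --unshallow --tags
  fi
}

# Illustrative usage in a build step (not executed here):
#   ensure_full_history .
#   docker build --no-cache -t "payments-svc:$(git describe --tags --always)" .
```

Note that --no-cache trades build time for certainty. It rebuilds every layer on every run, which is a blunt fix; it may be worth revisiting once the version string no longer depends on clone depth.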
The Aftermath
The payment race condition had been "fixed" and "deployed" for 18 hours before we realized it was still live. Fortunately, the impact was limited -- the bug caused duplicate idempotency keys, not duplicate charges. We added a post-deploy smoke test that hit a /version endpoint and compared the response to the expected git SHA. We also switched from git describe to the literal GITHUB_SHA for both tagging and version embedding.
The Lessons
- Shallow clones can surprise you: git describe, git log, and other history-dependent commands behave differently with shallow clones. Use fetch-depth: 0 or explicitly fetch what you need.
- Verify what you deployed, post-deploy: A green pipeline means the pipeline succeeded, not that the right code is running. Add a smoke test that confirms the deployed version matches the expected commit.
- Smoke test after every deploy: Hit a health or version endpoint. Compare SHA. If it doesn't match, roll back automatically. This takes 5 lines in your pipeline and saves hours of debugging.
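The smoke test from the last two lessons can be sketched in a few lines of shell. This is an illustration under assumptions, not the check the team actually shipped: it assumes the /version endpoint returns the commit SHA as plain text, and sha_matches, verify_deploy, and the URL in the usage comment are hypothetical names.

```shell
# Hypothetical comparison: succeed when one SHA is a prefix of the other,
# so an abbreviated SHA from the service still matches the full one.
sha_matches() {
  local expected actual
  expected="$1"
  actual="$2"
  # Refuse empty or very short values so a blank response never "matches".
  [ "${#expected}" -ge 7 ] && [ "${#actual}" -ge 7 ] || return 1
  case "$expected" in
    "$actual"*) return 0 ;;
  esac
  case "$actual" in
    "$expected"*) return 0 ;;
  esac
  return 1
}

# Hypothetical post-deploy gate: fetch the running version and compare it.
# Assumes the endpoint body is just the SHA, with no JSON wrapper.
verify_deploy() {
  local url expected actual
  url="$1"
  expected="$2"
  actual="$(curl -fsS --max-time 10 "$url")" || return 1
  sha_matches "$expected" "$actual"
}

# Illustrative usage in a pipeline step (not executed here):
#   verify_deploy "https://payments.example.internal/version" "$GITHUB_SHA" \
#     || kubectl rollout undo deployment/payments-svc
```

Rejecting short or empty values is deliberate: a service that returns nothing should fail the check, not pass it by accident.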
What I'd Do Differently
Embed the git SHA at compile time using ldflags or build args, not git describe. Use GITHUB_SHA as the single source of truth for both the image tag and the version string. And add a mandatory post-deploy verification step that blocks the pipeline until the running version matches the expected SHA.
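One way to keep a single source of truth is to derive both the linker flag and the image tag from the same validated value. The sketch below assumes a Go service with a string variable named main.version and a Dockerfile that declares ARG GIT_SHA. Neither is confirmed by this write-up, and version_ldflags is a hypothetical helper.

```shell
# Hypothetical helper: print the Go linker flag that stamps the commit into
# the binary. Rejects anything that is not a full 40-character hex SHA, so a
# `git describe` style string can never slip in.
version_ldflags() {
  local sha
  sha="$1"
  case "$sha" in
    ""|*[!0-9a-f]*) return 1 ;;
  esac
  [ "${#sha}" -eq 40 ] || return 1
  printf '%s\n' "-X main.version=$sha"
}

# Illustrative usage (not executed here); the same variable feeds both the
# embedded version and the image tag:
#   go build -ldflags "$(version_ldflags "$GITHUB_SHA")" ./...
#   docker build --build-arg GIT_SHA="$GITHUB_SHA" -t "payments-svc:$GITHUB_SHA" .
```

Because the helper fails on anything but a full SHA, a misconfigured step stops the build instead of producing an image whose tag and contents can drift apart.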
The Quote
"The pipeline was green, the image was pushed, the rollout was complete, and we'd deployed absolutely nothing."
Cross-References
- Topic Packs: CI/CD Pipelines & Patterns, Git Advanced, GitHub Actions, Container Images