Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured

Domains: devops_tooling | networking | kubernetes_ops
Level: L3
Estimated time: 45 min

Initial Alert

No alert fires initially. The incident is discovered 2 hours after a canary deployment when a customer reports unexpected behavior:

Customer support ticket #48291:
"Since about 2 hours ago, the product search returns different results
depending on when I refresh. Sometimes I see the new UI, sometimes the old one.
The new UI is showing wrong prices on some products."

Investigation reveals:

WARNING: product-search canary metrics look healthy — 0% error rate, p99 < 100ms
INFO: Canary deployment product-search-v4.2.0 — 10% traffic split active
ANOMALY: product-search-v4.2.0 serving 10% of traffic, but cart-service-v3.1.0 also receiving 10% of product-search traffic

Observable Symptoms

  • The canary deployment product-search-v4.2.0 is running alongside the stable product-search-v4.1.0.
  • Canary health metrics (error rate, latency) look perfect — 0% errors, sub-100ms latency.
  • 10% of traffic is being routed to the canary, as intended.
  • However, some users see incorrect prices — products show wholesale prices instead of retail prices.
  • The wholesale pricing issue only affects users hitting the canary.
  • The canary's product-search v4.2.0 code does not modify pricing logic.
  • The Ingress controller logs show 10% of traffic going to a backend called product-search-canary.
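The symptoms above rest on backend names and traffic percentages, neither of which shows which workload actually answered a request. A minimal sketch of one way to check that, assuming request records that carry a hypothetical `workload` field (for example, taken from a response header or a pod label joined onto access logs; this scenario does not specify such a field), and using synthetic sample data:

```python
from collections import Counter

# Workloads that should be answering product-search requests.
# The names come from this incident; the record format is an assumption.
EXPECTED_WORKLOADS = {"product-search-v4.1.0", "product-search-v4.2.0"}


def traffic_share(records):
    """Return {workload: fraction of requests} for the given records."""
    counts = Counter(r["workload"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {workload: n / total for workload, n in counts.items()}


def unexpected_workloads(records, expected=EXPECTED_WORKLOADS):
    """Return the traffic share of every workload that is not expected."""
    share = traffic_share(records)
    return {w: s for w, s in share.items() if w not in expected}


# Synthetic sample: 10 product-search requests, one answered by cart-service.
sample = (
    [{"workload": "product-search-v4.1.0"}] * 8
    + [{"workload": "product-search-v4.2.0"}]
    + [{"workload": "cart-service-v3.1.0"}]
)
print(unexpected_workloads(sample))
```

Any non-empty result means a workload outside the expected set is serving product-search traffic, even when the split percentages look correct.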

The Misleading Signal

The canary looks healthy by every metric the deployment pipeline checks: zero errors, sub-100ms p99 latency, and the intended 10% traffic split. Those metrics measure whether requests succeed quickly, not which service answered them, so the wrong prices read as an application bug in v4.2.0, and engineers start reviewing the v4.2.0 diff for pricing-related changes. The deployment automation sees a healthy canary and has no reason to roll back, so the incident is investigated as a code bug rather than a routing problem.
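To illustrate why the signal misleads, here is a minimal sketch of a canary gate, not the pipeline actually used in this incident. The `CanaryStats` type, its `serving_workload` field, and the 1% error threshold are hypothetical; only the p99 bound of 100 ms comes from the metrics quoted above. A gate built only on error rate and latency passes even when a different service answers the canary's share of traffic, while an added identity check fails it:

```python
from dataclasses import dataclass


@dataclass
class CanaryStats:
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p99_ms: float          # 99th percentile latency in milliseconds
    serving_workload: str  # hypothetical: identity of the workload that responded


def metrics_gate(stats, max_error_rate=0.01, max_p99_ms=100.0):
    """Pass when error rate and latency are within bounds (assumed thresholds)."""
    return stats.error_rate <= max_error_rate and stats.p99_ms < max_p99_ms


def identity_gate(stats, expected_prefix="product-search-"):
    """Pass only when the responding workload is the service under test."""
    return stats.serving_workload.startswith(expected_prefix)


def canary_healthy(stats):
    """Require both good metrics and the right responder."""
    return metrics_gate(stats) and identity_gate(stats)


# Illustrative values: perfect metrics, wrong responder.
misrouted = CanaryStats(error_rate=0.0, p99_ms=80.0,
                        serving_workload="cart-service-v3.1.0")
print(metrics_gate(misrouted))    # True: the metrics alone look healthy
print(canary_healthy(misrouted))  # False: the identity check catches the misroute
```

The design point is that success-rate and latency gates answer "did something respond well?", so a routing check has to be a separate, explicit condition.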