Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured¶
Phase 1: DevOps Tooling Investigation (Dead End)¶
Check the canary deployment status:
$ kubectl get deployment -n prod -l app=product-search
NAME READY UP-TO-DATE AVAILABLE AGE
product-search-stable 9/9 9 9 14d
product-search-canary 1/1 1 1 2h
$ kubectl get canary product-search -n prod
NAME STATUS WEIGHT LASTTRANSITIONTIME
product-search Progressing 10 2026-03-19T14:30:00Z
Canary is at 10% weight, progressing normally. Check the Flagger canary analysis:
$ kubectl describe canary product-search -n prod | grep -A10 "Status"
Status:
Canary Weight: 10
Failed Checks: 0
Phase: Progressing
Conditions:
- Type: Promoted
Status: False
- Message: "Canary analysis passed: error-rate 0.00% < 1%, latency p99 82ms < 500ms"
Zero failed checks. All metrics pass. Review the v4.2.0 code diff:
$ git diff v4.1.0..v4.2.0 -- src/
# Changes:
# - Updated search algorithm (relevance scoring)
# - Added new filter for category-based search
# - Updated UI template for search results page
# No pricing-related changes
No pricing changes in the code. But users on the canary see wholesale prices. The canary code is not the problem.
Check what the canary is actually serving:
$ kubectl exec product-search-canary-7c6d5e4f3-a1b2c -n prod -- \
curl -s http://localhost:8080/v1/products/12345 | jq '.price'
14.99
The canary pod returns the correct retail price when called directly. The pricing issue is not in the canary application.
The Pivot¶
If the canary's application is returning correct prices, but users hitting the canary see wrong prices, the problem must be in the routing — traffic is going to the wrong backend. Check the Ingress configuration:
$ kubectl get ingress product-search -n prod -o yaml | grep -A20 "rules"
spec:
rules:
- host: www.example.com
http:
paths:
- path: /v1/products
pathType: Prefix
backend:
service:
name: product-search-primary
port:
number: 8080
- path: /v1/products
pathType: Prefix
backend:
service:
name: product-search-canary
port:
number: 8080
Two backends for the same path in the primary Ingress is suspicious, but with ingress-nginx the weight split is driven by annotations on a separate canary Ingress. Check the canary-specific Ingress:
$ kubectl get ingress product-search-canary -n prod -o yaml
metadata:
annotations:
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
rules:
- host: www.example.com
http:
paths:
- path: /v1/products
pathType: Prefix
backend:
service:
name: product-search-canary
port:
number: 8080
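The canary-weight annotation tells the ingress-nginx controller to route roughly that percentage of requests to the canary backend. A minimal sketch of the idea, as a per-request random split (not ingress-nginx's actual implementation):

```python
import random

def pick_backend(canary_weight: int, rng: random.Random) -> str:
    # Send roughly canary_weight percent of requests to the canary backend.
    return "canary" if rng.randrange(100) < canary_weight else "primary"

rng = random.Random(1)
n = 10_000
canary_share = sum(pick_backend(10, rng) == "canary" for _ in range(n)) / n
print(f"canary share: {canary_share:.3f}")  # close to 0.100
```

With canary-weight: "10", about one request in ten lands on the canary Service, which is why only a subset of users saw the wrong prices.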
The canary Ingress looks correct. But check what Service product-search-canary actually points to:
$ kubectl get service product-search-canary -n prod -o yaml | grep -A5 "selector"
selector:
app: product-search
track: canary
$ kubectl get endpoints product-search-canary -n prod
NAME ENDPOINTS AGE
product-search-canary 10.244.3.42:8080 2h
$ kubectl get pods -n prod -l "app=product-search,track=canary" -o wide
NAME READY STATUS RESTARTS IP
product-search-canary-7c6d5e4f3-a1b2c 1/1 Running 0 10.244.3.42
The canary Service points to the correct canary pod (10.244.3.42). But check whether any other Service's endpoints include that same pod IP:
$ kubectl get endpoints -n prod | grep "10.244.3.42"
product-search-canary 10.244.3.42:8080 2h
wholesale-pricing-api 10.244.3.42:8080 2h
The canary pod's IP (10.244.3.42) is ALSO listed as an endpoint for wholesale-pricing-api. That should not be possible unless the pod matches the wholesale-pricing-api Service's selector.
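A pod IP appearing in two unrelated endpoint sets is the key anomaly here. The check amounts to inverting the Service-to-endpoints mapping and flagging any IP owned by more than one Service; a sketch (the Service names and canary IP come from the investigation, the second pricing IP is an assumption):

```python
from collections import defaultdict

# Hypothetical snapshot of Service -> endpoint IPs; the real
# wholesale pricing pod's IP (10.244.5.17) is assumed for illustration.
service_endpoints = {
    "product-search-canary": {"10.244.3.42"},
    "wholesale-pricing-api": {"10.244.3.42", "10.244.5.17"},
    "product-search-primary": {"10.244.1.10", "10.244.1.11"},
}

# Invert the mapping: an IP owned by more than one Service is contaminated.
owners = defaultdict(set)
for svc, ips in service_endpoints.items():
    for ip in ips:
        owners[ip].add(svc)

shared = {ip: svcs for ip, svcs in owners.items() if len(svcs) > 1}
print(shared)  # only 10.244.3.42 is claimed by two Services
```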
Phase 2: Networking Investigation (Root Cause)¶
Check the wholesale-pricing-api Service:
$ kubectl get service wholesale-pricing-api -n prod -o yaml | grep -A5 "selector"
selector:
app: product-search
tier: pricing
$ kubectl get pods -n prod -l "app=product-search,tier=pricing"
NAME READY STATUS RESTARTS
product-search-canary-7c6d5e4f3-a1b2c 1/1 Running 0
wholesale-pricing-api-8d7e6f5a4-j9k8l 1/1 Running 0
The canary pod matches the wholesale-pricing-api selector. Check the pod's labels:
$ kubectl get pod product-search-canary-7c6d5e4f3-a1b2c -n prod --show-labels
NAME LABELS
product-search-canary-7c6d5e4f3-a1b2c app=product-search,tier=pricing,version=v4.2.0,track=canary
The canary pod has app=product-search AND tier=pricing. The wholesale-pricing-api Service selects on app=product-search,tier=pricing. The canary pod inadvertently became an endpoint for the wholesale pricing API.
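Kubernetes equality-based selectors do a subset match: a pod is selected if every key/value pair in the selector is present on the pod, regardless of any extra labels the pod carries. A minimal sketch of that semantics, using the labels seen above:

```python
def selects(selector: dict, labels: dict) -> bool:
    # Equality-based selection: every selector key/value pair
    # must be present on the pod with the same value.
    return all(labels.get(k) == v for k, v in selector.items())

canary_pod = {"app": "product-search", "tier": "pricing",
              "version": "v4.2.0", "track": "canary"}
stable_pod = {"app": "product-search", "tier": "frontend",
              "version": "v4.1.0", "track": "stable"}
wholesale_selector = {"app": "product-search", "tier": "pricing"}

print(selects(wholesale_selector, canary_pod))  # True: the canary pod is enrolled
print(selects(wholesale_selector, stable_pod))  # False
```

The extra version and track labels are irrelevant to the match; only app and tier decide membership in the wholesale pricing endpoints.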
But why does the canary pod have tier=pricing? Check the canary deployment template:
$ kubectl get deployment product-search-canary -n prod -o yaml | grep -A10 "template:" | grep -A10 "labels"
labels:
app: product-search
tier: pricing
version: v4.2.0
track: canary
So the canary pod template itself carries tier: pricing. Compare with the stable deployment:
$ kubectl get deployment product-search-stable -n prod -o yaml | grep -A10 "template:" | grep -A10 "labels"
labels:
app: product-search
tier: frontend
version: v4.1.0
track: stable
The stable deployment has tier: frontend, while the canary template has tier: pricing, a copy-paste error from the wholesale pricing deployment template. This means:
- 10% of product search traffic goes to the canary pod via the canary Ingress (works correctly).
- The canary pod is ALSO serving wholesale pricing API requests because it matches the wholesale Service selector.
- Some user-facing requests reach the canary pod through the wholesale pricing API's endpoint rather than the product search endpoint, and those requests are answered with wholesale prices.
The "wrong prices" are not produced by the canary's product search code: they are wholesale pricing API responses leaking into product search traffic through endpoint contamination.
Domain Bridge: Why This Crossed Domains¶
Key insight: The symptom appeared as a canary deployment showing wrong data (devops_tooling), the root cause was a label mismatch causing the canary pod to join an unrelated Service's endpoints (networking), and the fix requires correcting the Kubernetes deployment template labels (kubernetes_ops). This crossing is common because:
- Kubernetes Service selectors use label matching, and one pod can match multiple Services.
- Canary deployments create pods with modified labels, so a copy-paste error can silently enroll a pod in an unrelated Service.
- Standard canary metrics (error rate, latency) do not detect endpoint contamination, because the pod serves its requests correctly; they are simply the wrong requests.
Root Cause¶
The canary deployment template had tier: pricing instead of tier: frontend due to a copy-paste error from the wholesale pricing deployment. This caused the canary pod to match the wholesale-pricing-api Service selector, making it an endpoint for both the canary product search AND the wholesale pricing API. When customers hit the canary via the pricing API's endpoint (load-balanced alongside the real pricing pods), they received wholesale prices instead of retail prices.
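A sketch of the fix, assuming a standard Deployment manifest for the canary (only the pod template labels change):

```yaml
# product-search-canary Deployment: pod template labels after the fix
spec:
  template:
    metadata:
      labels:
        app: product-search
        tier: frontend   # was "pricing", copy-pasted from the wholesale template
        version: v4.2.0
        track: canary
```

Rolling this change out replaces the canary pod with one that no longer matches the wholesale-pricing-api selector; confirm with kubectl get endpoints wholesale-pricing-api -n prod that the canary pod's IP has dropped out of that endpoint list.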