Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured

Phase 1: DevOps Tooling Investigation (Dead End)

Check the canary deployment status:

$ kubectl get deployment -n prod -l app=product-search
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
product-search-stable           9/9     9            9           14d
product-search-canary           1/1     1            1           2h

$ kubectl get canary product-search -n prod
NAME             STATUS      WEIGHT   LASTTRANSITIONTIME
product-search   Progressing   10     2026-03-19T14:30:00Z

Canary is at 10% weight, progressing normally. Check the Flagger canary analysis:

$ kubectl describe canary product-search -n prod | grep -A10 "Status"
Status:
  Canary Weight: 10
  Failed Checks: 0
  Phase: Progressing
  Conditions:
    - Type: Promoted
      Status: False
      Message: "Canary analysis passed: error-rate 0.00% < 1%, latency p99 82ms < 500ms"
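Flagger's analysis gate is, at its core, a set of threshold comparisons on the queried metrics. A minimal sketch of that check, using the threshold values from the status message above (illustrative only, not Flagger's actual implementation):

```python
def analysis_passes(metrics: dict, thresholds: dict) -> bool:
    """Pass only if every observed metric stays below its threshold."""
    return all(metrics[name] < limit for name, limit in thresholds.items())

# Values taken from the canary status message above.
observed = {"error_rate_pct": 0.00, "latency_p99_ms": 82}
limits = {"error_rate_pct": 1.0, "latency_p99_ms": 500}

print(analysis_passes(observed, limits))  # True: the canary keeps progressing
```

Note what this gate can and cannot see: it only compares aggregate metrics, so a pod that answers every request quickly and without errors passes, even when it is answering the wrong requests.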

Zero failed checks. All metrics pass. Review the v4.2.0 code diff:

$ git diff v4.1.0..v4.2.0 -- src/
# Changes:
# - Updated search algorithm (relevance scoring)
# - Added new filter for category-based search
# - Updated UI template for search results page
# No pricing-related changes

No pricing changes in the code. But users on the canary see wholesale prices. The canary code is not the problem.

Check what the canary is actually serving:

$ kubectl exec product-search-canary-7c6d5e4f3-a1b2c -n prod -- \
    curl -s http://localhost:8080/v1/products/12345 | jq '.price'
14.99

The canary pod returns the correct retail price when called directly. The pricing issue is not in the canary application.

The Pivot

If the canary's application is returning correct prices, but users hitting the canary see wrong prices, the problem must be in the routing — traffic is going to the wrong backend. Check the Ingress configuration:

$ kubectl get ingress product-search -n prod -o yaml | grep -A20 "rules"
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /v1/products
        pathType: Prefix
        backend:
          service:
            name: product-search-primary
            port:
              number: 8080
      - path: /v1/products
        pathType: Prefix
        backend:
          service:
            name: product-search-canary
            port:
              number: 8080

Two backends registered for the same path. On its own that is ambiguous: with the NGINX ingress controller, the weight split is driven by the canary annotation on a separate Ingress object. Check the canary-specific Ingress:

$ kubectl get ingress product-search-canary -n prod -o yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /v1/products
        pathType: Prefix
        backend:
          service:
            name: product-search-canary
            port:
              number: 8080
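
The canary-weight: "10" annotation tells the ingress controller to send roughly 10% of requests to the canary backend. A toy sketch of that routing decision (the real controller implements this internally; the function and backend names here just mirror the transcript):

```python
import random

def pick_backend(canary_weight: int, rng: random.Random) -> str:
    # canary_weight percent of requests go to the canary, the rest to primary.
    if rng.randint(0, 99) < canary_weight:
        return "product-search-canary"
    return "product-search-primary"

rng = random.Random(42)
hits = sum(pick_backend(10, rng) == "product-search-canary" for _ in range(10_000))
print(hits / 10_000)  # roughly 0.10
```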

The canary Ingress looks correct. But check what the product-search-canary Service actually points to:

$ kubectl get service product-search-canary -n prod -o yaml | grep -A5 "selector"
  selector:
    app: product-search-canary

$ kubectl get endpoints product-search-canary -n prod
NAME                       ENDPOINTS           AGE
product-search-canary      10.244.3.42:8080    2h

$ kubectl get pods -n prod -l app=product-search-canary -o wide
NAME                                       READY   STATUS    RESTARTS   IP
product-search-canary-7c6d5e4f3-a1b2c      1/1     Running   0          10.244.3.42

The canary Service points to the correct canary pod (10.244.3.42). But wait — check if there is another Service or endpoint interfering:

$ kubectl get endpoints -n prod | grep "10.244.3.42"
product-search-canary      10.244.3.42:8080    2h
wholesale-pricing-api      10.244.3.42:8080    2h

The canary pod's IP (10.244.3.42) is ALSO listed as an endpoint for wholesale-pricing-api. That should not be possible unless the pod matches the wholesale-pricing-api Service's selector.

Phase 2: Networking Investigation (Root Cause)

Check the wholesale-pricing-api Service:

$ kubectl get service wholesale-pricing-api -n prod -o yaml | grep -A5 "selector"
  selector:
    app: product-search
    tier: pricing

$ kubectl get pods -n prod -l "app=product-search,tier=pricing"
NAME                                       READY   STATUS    RESTARTS
product-search-canary-7c6d5e4f3-a1b2c      1/1     Running   0
wholesale-pricing-api-8d7e6f5a4-j9k8l      1/1     Running   0

The canary pod matches the wholesale-pricing-api selector. Check the pod's labels:

$ kubectl get pod product-search-canary-7c6d5e4f3-a1b2c -n prod --show-labels
NAME                                       LABELS
product-search-canary-7c6d5e4f3-a1b2c      app=product-search,tier=pricing,version=v4.2.0,track=canary

The canary pod has app=product-search AND tier=pricing. The wholesale-pricing-api Service selects on app=product-search,tier=pricing. The canary pod inadvertently became an endpoint for the wholesale pricing API.
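Service selection is plain subset matching on labels, which is why one stray label is enough. A minimal sketch using the wholesale selector and the pod labels from the transcript above:

```python
def matches(selector: dict, pod_labels: dict) -> bool:
    """A Service selects a pod when every selector key/value pair
    appears in the pod's labels; extra pod labels are ignored."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

wholesale_selector = {"app": "product-search", "tier": "pricing"}

canary_pod = {"app": "product-search", "tier": "pricing",
              "version": "v4.2.0", "track": "canary"}
stable_pod = {"app": "product-search", "tier": "frontend",
              "version": "v4.1.0", "track": "stable"}

print(matches(wholesale_selector, canary_pod))  # True: joins wholesale endpoints
print(matches(wholesale_selector, stable_pod))  # False: tier differs
```

The extra version and track labels on the canary pod are irrelevant to the wholesale Service; only the two selected keys matter.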

But why does the canary pod have tier=pricing? Check the canary deployment template:

$ kubectl get deployment product-search-canary -n prod -o yaml | grep -A10 "template:" | grep -A10 "labels"
      labels:
        app: product-search
        tier: pricing
        version: v4.2.0
        track: canary

So tier: pricing is baked into the canary deployment's pod template. Where did it come from? Compare with the stable deployment:

$ kubectl get deployment product-search-stable -n prod -o yaml | grep -A10 "template:" | grep -A10 "labels"
      labels:
        app: product-search
        tier: frontend
        version: v4.1.0
        track: stable

The stable deployment has tier: frontend. The canary deployment template has tier: pricing — a copy-paste error from the wholesale pricing deployment template. This means:

  1. 10% of product search traffic goes to the canary pod via the canary Ingress (this part works as designed).
  2. The canary pod is ALSO registered as an endpoint of the wholesale pricing API, because it matches that Service's selector.
  3. Requests load-balanced through the wholesale pricing Service can land on the canary pod, so wholesale pricing responses leak into the product search flow and users see wholesale prices instead of retail prices.

The "wrong prices" are not from the canary's product search — they are from the wholesale pricing API responses leaking into the product search traffic due to endpoint contamination.

Domain Bridge: Why This Crossed Domains

Key insight: the symptom surfaced as a canary deployment serving wrong data (devops_tooling); the root cause was a label mismatch that made the canary pod join an unrelated Service's endpoints (networking); and the fix is correcting the labels in the Kubernetes deployment template (kubernetes_ops). This class of failure is common because Service selectors use plain label matching, and a single pod can match any number of Services. Canary deployments stamp out pods with modified labels, so one copy-paste error is enough to put a pod behind an unrelated Service. Standard canary metrics (error rate, latency) cannot catch endpoint contamination, because the pod serves every request correctly; they are just the wrong requests.
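One way to catch this class of error before it ships is an audit that flags any pod selected by more than one Service. A hedged sketch over plain dicts (in a real cluster you would pull selectors and labels from the API server; the product-search-canary selector below is a simplified stand-in, not the exact one from the transcript):

```python
def matches(selector: dict, labels: dict) -> bool:
    # Empty selectors select nothing; otherwise subset-match on labels.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())

def contaminated_pods(services: dict, pods: dict) -> dict:
    """Return {pod_name: [service_names]} for pods selected by 2+ Services."""
    result = {}
    for pod_name, labels in pods.items():
        owners = [svc for svc, sel in services.items() if matches(sel, labels)]
        if len(owners) > 1:
            result[pod_name] = owners
    return result

services = {
    "wholesale-pricing-api": {"app": "product-search", "tier": "pricing"},
    "product-search-canary": {"app": "product-search", "track": "canary"},  # hypothetical selector
}
pods = {
    "product-search-canary-7c6d5e4f3-a1b2c":
        {"app": "product-search", "tier": "pricing",
         "version": "v4.2.0", "track": "canary"},
}
print(contaminated_pods(services, pods))
```

Run as a CI check or admission policy, this would have flagged the canary pod the moment its template gained the stray tier label.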

Root Cause

The canary deployment template had tier: pricing instead of tier: frontend due to a copy-paste error from the wholesale pricing deployment. This caused the canary pod to match the wholesale-pricing-api Service selector, making it an endpoint for both the canary product search AND the wholesale pricing API. When customers hit the canary via the pricing API's endpoint (load-balanced alongside the real pricing pods), they received wholesale prices instead of retail prices.
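
The immediate fix is a one-line label change in the canary deployment's pod template. A sketch of the corrected fragment, with the labels taken from the transcript above:

```yaml
# product-search-canary Deployment: corrected pod template labels
spec:
  template:
    metadata:
      labels:
        app: product-search
        tier: frontend   # was: pricing (copy-pasted from the wholesale template)
        version: v4.2.0
        track: canary
```

Applying this change removes the canary pod from the wholesale-pricing-api endpoints on the next rollout, while the canary Ingress weight split continues to work as before.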