
Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured

Immediate Fix (Kubernetes Ops — Domain C)

The fix is to correct the canary Deployment's pod-template labels so the canary pods no longer match the wholesale Service's selector, then verify endpoint isolation.
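The root cause is Kubernetes label-selector semantics: a Service selects every pod whose labels are a superset of the selector, so extra labels like track: canary do not prevent a match. A minimal sketch of that matching rule (label values taken from this incident):

```python
def matches_selector(pod_labels: dict, selector: dict) -> bool:
    """A Service's selector matches a pod when every selector
    key/value pair is present in the pod's labels (subset match)."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

# The wholesale Service selects on app + tier.
wholesale_selector = {"app": "product-search", "tier": "pricing"}

# Mislabeled canary pod: its extra labels do not prevent a match.
bad_canary = {"app": "product-search", "tier": "pricing", "track": "canary"}
fixed_canary = {"app": "product-search", "tier": "frontend", "track": "canary"}

print(matches_selector(bad_canary, wholesale_selector))    # True  -> wrong routing
print(matches_selector(fixed_canary, wholesale_selector))  # False -> isolated
```

This is why changing a single label value is enough to pull the canary pod out of the wholesale Service's endpoints.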

Step 1: Fix the canary pod labels

$ kubectl patch deployment product-search-canary -n prod --type=json \
    -p='[{"op":"replace","path":"/spec/template/metadata/labels/tier","value":"frontend"}]'
deployment.apps/product-search-canary patched
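The --type=json patch above is an RFC 6902 JSON Patch: the replace op walks the /-separated path and overwrites an existing leaf value. A hand-rolled illustration of that semantics (not the real kubectl implementation):

```python
def json_patch_replace(doc: dict, path: str, value):
    """Apply an RFC 6902 'replace' op: navigate the /-separated
    path and overwrite the existing leaf value in place."""
    parts = path.strip("/").split("/")
    node = doc
    for key in parts[:-1]:
        node = node[key]
    if parts[-1] not in node:
        # 'replace' requires the target to already exist ('add' does not).
        raise KeyError(f"no existing value at {path}")
    node[parts[-1]] = value
    return doc

deployment = {"spec": {"template": {"metadata": {"labels": {"tier": "pricing"}}}}}
json_patch_replace(deployment, "/spec/template/metadata/labels/tier", "frontend")
print(deployment["spec"]["template"]["metadata"]["labels"]["tier"])  # frontend
```

Because the patch touches the pod template, the Deployment controller rolls out a replacement pod, which is why Step 2 waits on rollout status.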

Step 2: Verify the canary pod is removed from the wholesale Service

# Wait for the pod to be recreated with new labels
$ kubectl rollout status deployment/product-search-canary -n prod
deployment "product-search-canary" successfully rolled out

$ kubectl get endpoints wholesale-pricing-api -n prod
NAME                       ENDPOINTS                               AGE
wholesale-pricing-api      10.244.3.51:8080                        47d

# Only the actual wholesale pricing pod is in the endpoints now
$ kubectl get pods -n prod -l "app=product-search,tier=pricing" -o wide
NAME                                       READY   IP
wholesale-pricing-api-8d7e6f5a4-j9k8l      1/1     10.244.3.51

The canary pod is no longer an endpoint for the wholesale pricing API.
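Endpoint isolation can also be checked programmatically: collect each Service's endpoint IPs and flag any IP that backs more than one Service. A sketch, assuming the endpoint data has already been fetched (e.g. parsed from kubectl get endpoints -o json; IPs below are illustrative):

```python
from collections import defaultdict

def find_overlapping_ips(endpoints: dict) -> dict:
    """Map each pod IP to the Services exposing it; return only
    IPs that appear in more than one Service's endpoints."""
    seen = defaultdict(list)
    for service, ips in endpoints.items():
        for ip in ips:
            seen[ip].append(service)
    return {ip: svcs for ip, svcs in seen.items() if len(svcs) > 1}

# Before the fix: the canary pod IP appeared in both Services.
before = {
    "product-search-canary": ["10.244.3.52"],
    "wholesale-pricing-api": ["10.244.3.51", "10.244.3.52"],
}
print(find_overlapping_ips(before))
# {'10.244.3.52': ['product-search-canary', 'wholesale-pricing-api']}

after = {
    "product-search-canary": ["10.244.3.52"],
    "wholesale-pricing-api": ["10.244.3.51"],
}
print(find_overlapping_ips(after))  # {}
```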

Step 3: Fix the deployment template in source control

In the Helm template for the canary (or the equivalent Kustomize overlay):

# devops/helm/grokdevops/templates/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-canary
spec:
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
        tier: frontend          # MUST match stable, not pricing
        track: canary
        version: {{ .Values.canary.version }}

Step 4: Add label validation to the CI pipeline

# .github/workflows/canary-deploy.yml
- name: Validate canary labels
  run: |
    # Note: these must be rendered manifests (e.g. via `helm template`);
    # raw Helm templates contain {{ }} and are not valid YAML for yq.
    TIER=$(yq '.spec.template.metadata.labels.tier' canary-deployment.yaml)
    STABLE_TIER=$(yq '.spec.template.metadata.labels.tier' deployment.yaml)
    if [ "$TIER" != "$STABLE_TIER" ]; then
      echo "ERROR: Canary tier label ($TIER) does not match stable ($STABLE_TIER)"
      exit 1
    fi
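The same invariant can be enforced more broadly than a single key: every canary label except track and version must equal the stable deployment's. A sketch of that comparison (label values illustrative):

```python
def labels_diverge(stable: dict, canary: dict,
                   allowed=frozenset({"track", "version"})) -> dict:
    """Return labels that differ between stable and canary,
    ignoring keys the canary is allowed to change."""
    keys = (set(stable) | set(canary)) - allowed
    return {k: (stable.get(k), canary.get(k))
            for k in keys if stable.get(k) != canary.get(k)}

stable = {"app": "product-search", "tier": "frontend"}
bad = {"app": "product-search", "tier": "pricing",
       "track": "canary", "version": "v4.2.0"}
good = {"app": "product-search", "tier": "frontend",
        "track": "canary", "version": "v4.2.0"}

print(labels_diverge(stable, bad))   # {'tier': ('frontend', 'pricing')}
print(labels_diverge(stable, good))  # {}
```

Failing CI on any non-empty diff would have caught this misconfiguration before deploy.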

Verification

Domain A (DevOps Tooling) — Canary metrics and behavior correct

# Test the canary endpoint directly
$ for i in $(seq 1 10); do
    curl -s -H "Host: www.example.com" http://ingress-controller/v1/products/12345 | jq '.price'
done
14.99
14.99
14.99
14.99
14.99
14.99
14.99
14.99
14.99
14.99
# All responses show retail price (14.99), not wholesale (7.50)

Domain B (Networking) — Endpoint isolation verified

$ kubectl get endpoints -n prod | grep product-search
product-search-primary     10.244.3.31:8080,10.244.3.32:8080,...   14d
product-search-canary      10.244.3.52:8080                        2h

$ kubectl get endpoints -n prod | grep wholesale
wholesale-pricing-api      10.244.3.51:8080                        47d

No endpoint overlap between services.

Domain C (Kubernetes) — Labels correct

$ kubectl get pods -n prod -l track=canary --show-labels
NAME                                       LABELS
product-search-canary-9e8f7a6b5-l4m3n      app=product-search,tier=frontend,track=canary,version=v4.2.0

tier=frontend (correct), not tier=pricing.

Prevention

  • Monitoring: Add an endpoint-overlap detector that alerts when a pod IP appears in more than one Service's endpoints unexpectedly. kubectl get endpoints makes overlapping IPs easy to spot by eye; for automated alerting, a Prometheus rule along these lines works. The per-IP metric is an assumption: stock kube-state-metrics does not expose endpoint IPs, so it would have to come from a recording rule or a custom exporter.
- alert: PodInMultipleServices
  expr: |
    # Assumed metric: one series per (service, pod_ip) endpoint pair,
    # produced by a recording rule or custom exporter.
    count by (pod_ip) (endpoint_pod_service_info) > 1
  for: 5m
  labels:
    severity: warning
  • Runbook: Canary deployment templates must be generated from the stable deployment template, not copied from other deployments. All labels except track and version must match the stable deployment exactly.

  • Architecture: Use a canary deployment tool (Flagger, Argo Rollouts) that automatically generates the canary deployment from the stable one, ensuring label consistency. Add an OPA/Gatekeeper policy that prevents pods from matching Service selectors they are not intended for.
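The admission-policy idea above (in a real cluster this would be Rego for OPA/Gatekeeper) reduces to a simple check at pod-admission time: does the pod match the selector of any Service it is not intended to back? A Python sketch of that logic, reusing Kubernetes subset matching (Service names and selectors illustrative):

```python
def violating_matches(pod_labels: dict, intended: set,
                      selectors: dict) -> list:
    """Return Services whose selectors the pod matches even though
    it is not intended to back them (Kubernetes subset matching)."""
    def matches(selector):
        return all(pod_labels.get(k) == v for k, v in selector.items())
    return [svc for svc, sel in selectors.items()
            if svc not in intended and matches(sel)]

selectors = {
    "product-search-canary": {"app": "product-search", "track": "canary"},
    "wholesale-pricing-api": {"app": "product-search", "tier": "pricing"},
}
# The mislabeled canary pod from this incident:
pod = {"app": "product-search", "tier": "pricing", "track": "canary"}
print(violating_matches(pod, {"product-search-canary"}, selectors))
# ['wholesale-pricing-api'] -> admission would be denied
```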