Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured¶
Immediate Fix (Kubernetes Ops — Domain C)¶
The fix is to correct the canary Deployment's pod labels so its pods stop matching the wholesale Service's selector, then verify endpoint isolation.
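The root cause follows directly from how Service selectors work: a Service routes to every pod whose labels contain all of the selector's key/value pairs, regardless of which Deployment created the pod. A minimal sketch of that subset-matching rule, using illustrative label sets based on this incident:

```python
# Sketch of Kubernetes Service selector semantics: a Service targets every pod
# whose labels contain ALL of the selector's key/value pairs (subset match).
# The label dicts below are illustrative, modeled on this incident.

def matches_selector(pod_labels: dict, selector: dict) -> bool:
    """True if every selector key/value pair is present in the pod's labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

wholesale_selector = {"app": "product-search", "tier": "pricing"}

# The mislabeled canary pod carried tier=pricing, so it matched the selector.
bad_canary = {"app": "product-search", "tier": "pricing", "track": "canary"}
# After the patch, tier=frontend no longer satisfies the selector.
fixed_canary = {"app": "product-search", "tier": "frontend", "track": "canary"}

print(matches_selector(bad_canary, wholesale_selector))    # True  -> wrong routing
print(matches_selector(fixed_canary, wholesale_selector))  # False -> isolated
```

Extra labels on the pod (like track: canary) never exclude it from a Service; only a mismatched or missing selector key does.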
Step 1: Fix the canary pod labels¶
$ kubectl patch deployment product-search-canary -n prod --type=json \
-p='[{"op":"replace","path":"/spec/template/metadata/labels/tier","value":"frontend"}]'
deployment.apps/product-search-canary patched
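The --type=json patch above is an RFC 6902 JSON Patch. To make the mechanics concrete, here is a minimal sketch of the replace op applied to a plain dict shaped like the Deployment's pod template (kubectl actually sends the patch to the API server; this simplified version handles object keys only, with no array indices or ~0/~1 escaping):

```python
# Minimal sketch of the RFC 6902 "replace" op used in Step 1, applied to a
# dict for illustration. Handles object keys only (no array indices, no
# ~0/~1 path escaping), unlike a full JSON Patch implementation.

def json_patch_replace(doc: dict, path: str, value):
    """Apply a single replace op; the target path must already exist."""
    keys = path.strip("/").split("/")
    target = doc
    for k in keys[:-1]:
        target = target[k]
    if keys[-1] not in target:
        raise KeyError(f"replace target {path} does not exist")
    target[keys[-1]] = value
    return doc

deployment = {
    "spec": {"template": {"metadata": {"labels": {
        "app": "product-search", "tier": "pricing", "track": "canary"}}}}
}
json_patch_replace(deployment, "/spec/template/metadata/labels/tier", "frontend")
print(deployment["spec"]["template"]["metadata"]["labels"]["tier"])  # frontend
```

Because the patch edits the pod template, the Deployment controller rolls out a replacement pod with the corrected labels, which is why Step 2 waits on rollout status.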
Step 2: Verify the canary pod is removed from the wholesale Service¶
# Wait for the pod to be recreated with new labels
$ kubectl rollout status deployment/product-search-canary -n prod
deployment "product-search-canary" successfully rolled out
$ kubectl get endpoints wholesale-pricing-api -n prod
NAME ENDPOINTS AGE
wholesale-pricing-api 10.244.3.51:8080 47d
# Only the actual wholesale pricing pod is in the endpoints now
$ kubectl get pods -n prod -l "app=product-search,tier=pricing" -o wide
NAME READY IP
wholesale-pricing-api-8d7e6f5a4-j9k8l 1/1 10.244.3.51
The canary pod is no longer an endpoint for the wholesale pricing API.
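The manual endpoint check in Step 2 can be expressed as code: flag any pod IP that appears in the endpoints of more than one Service. The data below is illustrative (the pre-fix canary IP is hypothetical); in practice the mapping would be built from kubectl get endpoints -o json:

```python
# Sketch of the endpoint-isolation check: flag any pod IP backing more than
# one Service. Service names and IPs are illustrative, modeled on Step 2.
from collections import defaultdict

def find_overlapping_ips(endpoints: dict) -> dict:
    """Map each pod IP that backs multiple Services to those Service names."""
    services_by_ip = defaultdict(list)
    for service, ips in endpoints.items():
        for ip in ips:
            services_by_ip[ip].append(service)
    return {ip: svcs for ip, svcs in services_by_ip.items() if len(svcs) > 1}

# Before the fix: the mislabeled canary pod (hypothetical IP) backed both.
before = {
    "wholesale-pricing-api": ["10.244.3.51", "10.244.3.42"],
    "product-search-canary": ["10.244.3.42"],
}
# After the fix: each Service has its own pods.
after = {
    "wholesale-pricing-api": ["10.244.3.51"],
    "product-search-canary": ["10.244.3.52"],
}

print(find_overlapping_ips(before))
# {'10.244.3.42': ['wholesale-pricing-api', 'product-search-canary']}
print(find_overlapping_ips(after))   # {} -> isolation verified
```

The same logic is what the PodInMultipleServices alert in the Prevention section approximates at the metrics level.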
Step 3: Fix the deployment template in source control¶
In the Helm values or Kustomize overlay for the canary:
# devops/helm/grokdevops/templates/canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-canary
spec:
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
        tier: frontend   # MUST match stable, not pricing
        track: canary
        version: {{ .Values.canary.version }}
Step 4: Add label validation to the CI pipeline¶
# .github/workflows/canary-deploy.yml
- name: Validate canary labels
  run: |
    TIER=$(yq '.spec.template.metadata.labels.tier' canary-deployment.yaml)
    STABLE_TIER=$(yq '.spec.template.metadata.labels.tier' deployment.yaml)
    if [ "$TIER" != "$STABLE_TIER" ]; then
      echo "ERROR: Canary tier label ($TIER) does not match stable ($STABLE_TIER)"
      exit 1
    fi
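The yq check above only compares the tier label. A stricter version of the same gate, sketched below, compares every pod-template label and allows only an explicit set of keys (track, version) to diverge; label sets are illustrative:

```python
# Sketch extending the CI check: compare ALL pod-template labels, not just
# tier. Only the keys in ALLOWED_DIVERGENT may differ between stable and
# canary. Label dicts are illustrative.

ALLOWED_DIVERGENT = {"track", "version"}

def label_mismatches(stable: dict, canary: dict) -> list:
    """Return keys outside the allowed set whose values differ or are missing."""
    keys = (set(stable) | set(canary)) - ALLOWED_DIVERGENT
    return sorted(k for k in keys if stable.get(k) != canary.get(k))

stable_labels = {"app": "product-search", "tier": "frontend"}
bad_canary = {"app": "product-search", "tier": "pricing",
              "track": "canary", "version": "v4.2.0"}
good_canary = {"app": "product-search", "tier": "frontend",
               "track": "canary", "version": "v4.2.0"}

print(label_mismatches(stable_labels, bad_canary))   # ['tier'] -> fail the build
print(label_mismatches(stable_labels, good_canary))  # []       -> pass
```

An allow-list of divergent keys fails closed: any new label added to only one of the two templates is flagged, rather than silently shipping another selector mismatch.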
Verification¶
Domain A (DevOps Tooling) — Canary metrics and behavior correct¶
# Test the canary endpoint directly
$ for i in $(seq 1 10); do
    curl -s -H "Host: www.example.com" http://ingress-controller/v1/products/12345 | jq '.price'
  done
14.99
14.99
14.99
14.99
14.99
14.99
14.99
14.99
14.99
14.99
# All responses show retail price (14.99), not wholesale (7.50)
Domain B (Networking) — Endpoint isolation verified¶
$ kubectl get endpoints -n prod | grep product-search
product-search-primary 10.244.3.31:8080,10.244.3.32:8080,... 14d
product-search-canary 10.244.3.52:8080 2h
$ kubectl get endpoints -n prod | grep wholesale
wholesale-pricing-api 10.244.3.51:8080 47d
No endpoint overlap between services.
Domain C (Kubernetes) — Labels correct¶
$ kubectl get pods -n prod -l track=canary --show-labels
NAME LABELS
product-search-canary-9e8f7a6b5-l4m3n app=product-search,tier=frontend,track=canary,version=v4.2.0
tier=frontend (correct), not tier=pricing.
Prevention¶
- Monitoring: Add an endpoint overlap detector that alerts when a pod IP unexpectedly appears in the endpoints of more than one Service. Check kubectl get endpoints output for IP addresses that back multiple Services.
- alert: PodInMultipleServices
  expr: |
    count by (pod_ip) (kube_endpoint_address_available) > 1
      unless count by (pod_ip) (kube_endpoint_address_available{service=~".*-canary|.*-primary"}) > 1
  for: 5m
  labels:
    severity: warning
- Runbook: Canary deployment templates must be generated from the stable deployment template, not copied from other deployments. All labels except track and version must match the stable deployment exactly.
- Architecture: Use a canary deployment tool (Flagger, Argo Rollouts) that automatically generates the canary deployment from the stable one, ensuring label consistency. Add an OPA/Gatekeeper policy that prevents pods from matching Service selectors they are not intended for.
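The "generate, don't copy" rule can be sketched in a few lines: derive the canary pod-template labels from the stable ones, overriding only track and version. This mirrors the behavior of tools like Flagger and Argo Rollouts in spirit, not their actual implementation; names and labels are illustrative:

```python
# Sketch of the "generate, don't copy" rule: canary labels are derived from
# the stable template, with only track and version overridden. Illustrative
# only -- real canary controllers do this against the live Deployment spec.

def canary_labels(stable_labels: dict, version: str) -> dict:
    """Stable labels with only track/version overridden; nothing hand-copied."""
    labels = dict(stable_labels)      # inherit everything, including tier
    labels["track"] = "canary"
    labels["version"] = version
    return labels

stable = {"app": "product-search", "tier": "frontend", "track": "stable"}
print(canary_labels(stable, "v4.2.0"))
# {'app': 'product-search', 'tier': 'frontend', 'track': 'canary', 'version': 'v4.2.0'}
```

Because tier is inherited rather than typed by hand, the class of mistake behind this incident (tier: pricing pasted from another deployment) cannot occur.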