Solution¶
Triage¶
- Check the Service and its selector:
- Check the endpoints:
- List pods with their labels:
- Compare the selector label key/value with the actual pod labels.
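Assuming the Service is named `user-svc` (the name comes from the root cause below; the namespace is an assumption), the triage steps above might look like:

```shell
# Inspect the Service and its selector
kubectl get service user-svc -o wide
kubectl get service user-svc -o jsonpath='{.spec.selector}'

# Check the endpoints -- an empty ENDPOINTS column means no pods match the selector
kubectl get endpoints user-svc

# List pods with their labels, then compare against the selector above
kubectl get pods --show-labels
```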
Root Cause¶
The Service selector uses app: user-svc but the pods have the label app: user-service. This mismatch occurred when the junior engineer recreated the Service manifest from memory and used a shortened label value. Since no pods match the selector, the Endpoints object has empty subsets, and the Service has nothing to route traffic to.
The Service itself is functional (has a ClusterIP, port is configured), but with no backends, any connection to the ClusterIP is immediately refused.
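As an illustration of the mismatch, a minimal sketch of the two sides (the selector and label values are from the description above; all other fields, including the ports, are placeholders):

```yaml
# Service as recreated from memory
apiVersion: v1
kind: Service
metadata:
  name: user-svc
spec:
  selector:
    app: user-svc        # shortened value -- matches no pods
  ports:
    - port: 80           # placeholder ports
      targetPort: 8080
---
# Pod template labels on the actual pods
metadata:
  labels:
    app: user-service    # the label the selector should have used
```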
Fix¶
- Update the Service selector to match the pod labels, either by patching the Service directly or by editing the manifest and re-applying it.
- Verify that the endpoints are now populated. The expected output should show pod IPs and ports.
- Test connectivity.
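A sketch of the fix steps, assuming the Service is named `user-svc` and serves on port 80 (the port, namespace, and manifest filename are assumptions):

```shell
# Patch the selector in place...
kubectl patch service user-svc -p '{"spec":{"selector":{"app":"user-service"}}}'

# ...or fix the selector in the manifest and re-apply it
kubectl apply -f user-svc-service.yaml   # hypothetical filename

# Verify endpoints are now populated (pod IPs and ports should appear)
kubectl get endpoints user-svc

# Test connectivity from a throwaway pod
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://user-svc:80/
```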
Rollback / Safety¶
- Changing a Service selector is non-disruptive; it takes effect immediately.
- If the wrong pods are selected, traffic could be routed to the wrong backend. Verify pod labels are unique per service.
- Update the source manifest (Helm chart, Kustomize, etc.) to prevent the fix from being overwritten on next deploy.
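One way to confirm the new selector matches only the intended pods before applying it (label value taken from the fix above):

```shell
# Dry-run the selector as a label query: only user-service pods should be listed
kubectl get pods -l app=user-service --show-labels
```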
Common Traps¶
- Checking only the Service, not the Endpoints. Always check `kubectl get endpoints` first when debugging service connectivity.
- Assuming the pods are the problem when they show Running/Ready. If the pods are healthy, the issue is almost always in the Service selector or the readiness probes.
- Label key typos vs. value typos. Both cause mismatches: `App: user-service` (capital A) does not match `app: user-service`.
- Multiple labels in the selector. Selector labels are ANDed together, so all of them must match; a single mismatch on any key-value pair means no match.
- Not checking `targetPort`. Even with correct selectors, if `targetPort` does not match the container's listening port, connections will be refused despite endpoints being populated.
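To cross-check `targetPort` against the container's listening port (the pod name below is a placeholder):

```shell
# Service side: the port traffic is forwarded to
kubectl get service user-svc -o jsonpath='{.spec.ports[*].targetPort}'

# Pod side: the port(s) the container actually declares
kubectl get pod <user-service-pod> \
  -o jsonpath='{.spec.containers[*].ports[*].containerPort}'
```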