Kubernetes Services & Ingress - Street Ops¶
What experienced Kubernetes operators know about networking — the things that come up in interviews and that matter when traffic isn't reaching your pods.
Service Not Routing to Pods¶
This is the #1 networking issue. You create a service, you create a deployment, but the service returns connection refused or times out.
Step 1: Check Endpoints¶
If endpoints are <none>, the service selector doesn't match any running pods.
# Check the service selector
kubectl get svc api-server -n production -o jsonpath='{.spec.selector}'
# Output: {"app":"api-server"}
# Check pod labels
kubectl get pods -n production --show-labels | grep api
# Compare — do the labels match?
Common mismatches:
- Service selector says app: api-server, pod label says app: api (typo)
- Service selector says app: api-server, but there's also a version: v2 that doesn't match
- Pods exist but none are in Ready state (readiness probe failing)
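A minimal matching pair looks like this (names and ports are illustrative) — the service's spec.selector must be a subset of the pod template's labels:

```yaml
# Hypothetical example: the Service selector must appear verbatim in the pod labels.
apiVersion: v1
kind: Service
metadata:
  name: api-server
spec:
  selector:
    app: api-server        # must match the pod template labels below
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server    # extra labels here are fine; missing ones are not
        version: v2
    spec:
      containers:
      - name: api
        image: example/api:latest
        ports:
        - containerPort: 8000
```

Note that extra labels on the pod (like version: v2) don't break matching; the selector only has to be a subset of the labels.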
Step 2: Check Pod Readiness¶
Unready pods are excluded from endpoints.
kubectl get pods -n production -l app=api-server
# Look for pods with 0/1 READY
kubectl describe pod api-server-abc123 -n production
# Look for readiness probe failures in Events
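If the probe is the culprit, check what it's actually probing. A typical readiness probe in the container spec looks like this (path and port are assumptions — match them to your app):

```yaml
# Sketch: a readiness probe that must succeed before the pod joins endpoints.
readinessProbe:
  httpGet:
    path: /healthz      # must return 2xx/3xx; a 404 here keeps the pod unready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```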
Step 3: Check Port Mismatch¶
# Service targetPort must match what the container is listening on
kubectl get svc api-server -n production -o jsonpath='{.spec.ports[*].targetPort}'
# Returns: 8000
# Verify the container actually listens on 8000
kubectl exec -it api-server-abc123 -n production -- ss -tlnp
# Or: netstat -tlnp
Step 4: Test Connectivity¶
# From another pod in the same namespace
kubectl run debug --image=busybox:1.36 --restart=Never --rm -it -- wget -qO- http://api-server:80
# From a different namespace
kubectl run debug --image=busybox:1.36 -n default --restart=Never --rm -it -- wget -qO- http://api-server.production.svc.cluster.local:80
Debugging 502/503 from Ingress¶
An ingress returning 502 (Bad Gateway) means the ingress controller tried to talk to a backend but the connection failed or the response was invalid. 503 (Service Unavailable) means there is no healthy backend to route to at all.
NGINX Ingress 502 Debugging¶
# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100
# Look for:
# "upstream prematurely closed connection" — backend crashed or timed out
# "no live upstreams" — no healthy endpoints
# "connect() failed (111: Connection refused)" — wrong port or pod not listening
# Check the ingress resource
kubectl describe ingress api-ingress -n production
# Look at: Rules, Backend, and Annotations
# Verify the backend service has endpoints
kubectl get endpoints api-v1 -n production
Common 502 Causes¶
- Pod not ready: Readiness probe failing, so endpoints are empty
- Port mismatch: Ingress backend port doesn't match the service port
- Container crash: Pod is in CrashLoopBackOff, service has no ready endpoints
- Timeout: Backend is too slow, ingress proxy times out
# Increase backend timeout (nginx ingress)
# Add annotation to the Ingress resource:
# nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
# nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
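In context, those annotations sit in the Ingress metadata (name, namespace, and values are examples):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
```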
Common 503 Causes¶
- No backend defined: Ingress rule has no matching service
- Service doesn't exist: Typo in service name
- Rate limiting: Some ingress controllers return 503 when rate limits are exceeded
DNS Resolution Debugging¶
The Standard DNS Debug Flow¶
# 1. Can the pod resolve anything?
kubectl exec -it debug-pod -- nslookup kubernetes.default
# If this fails → CoreDNS is broken
# 2. Can it resolve cluster services?
kubectl exec -it debug-pod -- nslookup api-server.production.svc.cluster.local
# If this fails → service doesn't exist or CoreDNS can't find it
# 3. Can it resolve external names?
kubectl exec -it debug-pod -- nslookup google.com
# If this fails → CoreDNS upstream forwarding is broken
Check CoreDNS Health¶
# Are CoreDNS pods running?
kubectl get pods -n kube-system -l k8s-app=kube-dns
# CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
# Common error: "plugin/forward: no nameservers found"
# Means CoreDNS can't reach upstream DNS (check node /etc/resolv.conf)
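Upstream forwarding is configured in the CoreDNS Corefile (the coredns ConfigMap in kube-system). A trimmed default stanza typically looks like this — your cluster's may differ:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```

The `forward . /etc/resolv.conf` line is what sends non-cluster names to the node's upstream resolvers, which is why a broken node resolv.conf surfaces as a CoreDNS failure.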
Check Pod DNS Configuration¶
# See what resolv.conf the pod has
kubectl exec -it debug-pod -- cat /etc/resolv.conf
# Expected:
# nameserver 10.96.0.10 (CoreDNS service IP)
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
If the nameserver IP is wrong or missing, check the kubelet --cluster-dns flag.
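On most modern clusters that flag maps to the clusterDNS field in the kubelet config file (the path is a common default, not universal):

```yaml
# /var/lib/kubelet/config.yaml — common but not universal location
clusterDNS:
- 10.96.0.10           # must match the kube-dns/CoreDNS service ClusterIP
clusterDomain: cluster.local
```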
Dedicated DNS Debug Pod¶
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
--restart=Never -- sleep 3600
kubectl exec -it dnsutils -- nslookup api-server.production.svc.cluster.local
kubectl exec -it dnsutils -- dig api-server.production.svc.cluster.local +short
kubectl exec -it dnsutils -- dig @10.96.0.10 api-server.production.svc.cluster.local
NodePort Not Accessible¶
You created a NodePort service but can't reach it from outside.
Checklist¶
# 1. Is the service actually NodePort?
kubectl get svc api-server -n production
# TYPE should be NodePort, PORTS should show something like 80:30080/TCP
# 2. Is the node reachable on that port?
# From your machine:
curl http://<node-external-ip>:30080
# 3. Is there a firewall/security group blocking?
# AWS: check the security group attached to the node instances
# GCP: check the firewall rules for the node network
# On-prem: check iptables on the node
# On the node itself, verify kube-proxy is listening
ss -tlnp | grep 30080
# Check iptables rules for the NodePort
iptables-save | grep 30080
Common causes:
- Cloud security groups don't allow the NodePort range (30000-32767)
- Node has a host firewall (UFW, firewalld) blocking the port
- kube-proxy is not running on the node
- The service has externalTrafficPolicy: Local and there are no pods on the node you're hitting
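The externalTrafficPolicy trap is worth spelling out. A sketch (names and ports are examples):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  type: NodePort
  # Local: preserves the client source IP, but nodes without a ready pod drop
  # the traffic. Cluster (the default): any node forwards, at the cost of an
  # extra hop and SNAT.
  externalTrafficPolicy: Cluster
  selector:
    app: api-server
  ports:
  - port: 80
    targetPort: 8000
    nodePort: 30080
```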
LoadBalancer Stuck in Pending¶
Cloud Provider Issues¶
# Check events on the service
kubectl describe svc api-server -n production
# Look for events from the cloud controller
# Common errors:
# "Error creating load balancer" — IAM permissions, quota exceeded
# "Error syncing load balancer" — subnet issues, tag conflicts
AWS: Check that the cloud controller manager has the right IAM role. Check that subnets are tagged with kubernetes.io/cluster/<cluster-name>.
GCP: Check that the project has the Compute Engine API enabled and quotas aren't exhausted.
On-prem (bare metal): You need MetalLB or a similar load-balancer implementation. Without a cloud provider, there's nothing to provision a load balancer, so the service's EXTERNAL-IP stays <pending> forever.
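A minimal MetalLB layer-2 setup looks roughly like this (the address range is a placeholder — use addresses on your node subnet that nothing else allocates):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250   # placeholder range on the node LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
```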
Ingress TLS Not Working¶
HTTPS isn't working on your ingress. Traffic falls back to HTTP or returns a certificate error.
Check the TLS Secret¶
# Does the secret exist in the right namespace?
kubectl get secret api-tls-secret -n production
# Is it the right type?
kubectl get secret api-tls-secret -n production -o jsonpath='{.type}'
# Should be: kubernetes.io/tls
# Does it have the right keys?
kubectl get secret api-tls-secret -n production -o jsonpath='{.data}' | jq 'keys'
# Should have: tls.crt and tls.key
Verify the Certificate¶
# Decode and check the cert
kubectl get secret api-tls-secret -n production -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -text -noout
# Check:
# - Subject/SAN matches the hostname in the ingress rule
# - Not expired (check Not Before / Not After)
# - Issued by a trusted CA (or you need to trust the CA)
Common TLS Issues¶
- Secret in wrong namespace: TLS secret must be in the same namespace as the Ingress
- Hostname mismatch: Certificate is for *.example.com but the ingress host is api.staging.example.com (a wildcard matches only one label, so *.example.com covers api.example.com but not api.staging.example.com)
- Certificate chain incomplete: Missing intermediate CA certificates in tls.crt
- cert-manager not issuing: Check cert-manager logs and the Certificate/CertificateRequest resources
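To see the hostname check in action, you can generate a throwaway self-signed cert and inspect it the same way you'd inspect one decoded from a secret (the hostname is just the example from above):

```shell
# Generate a short-lived self-signed cert with a SAN (requires OpenSSL >= 1.1.1).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=api.example.com" \
  -addext "subjectAltName=DNS:api.example.com"

# The same checks you'd run on a cert pulled from the cluster:
openssl x509 -in tls.crt -noout -subject -dates    # expiry window
openssl x509 -in tls.crt -noout -ext subjectAltName  # must cover the ingress host
```

From there, `kubectl create secret tls api-tls-secret --cert=tls.crt --key=tls.key` produces a secret with the expected kubernetes.io/tls type and tls.crt/tls.key keys.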
# If using cert-manager
kubectl get certificate -n production
kubectl describe certificate api-cert -n production
kubectl get certificaterequest -n production
kubectl logs -n cert-manager -l app=cert-manager --tail=50
Network Policy Blocking Traffic¶
After applying a network policy, some traffic stops working.
Identify What's Blocked¶
# List all network policies in the namespace
kubectl get networkpolicy -n production
# Inspect a specific policy
kubectl describe networkpolicy allow-api -n production
Test Connectivity¶
# From the source pod, try to reach the destination
kubectl exec -it frontend-pod -n production -- wget -qO- --timeout=5 http://api-server:80
# If it hangs/times out → network policy is blocking
# Verify without network policies (temporarily delete to test)
# DON'T do this in production — use a staging namespace
Common Network Policy Mistakes¶
- Default deny without DNS exception: You block all egress but forget to allow port 53. Every pod in the namespace can't resolve DNS.
# Always include this in your egress rules:
egress:
- to:
  - namespaceSelector: {}
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
- Namespace selector mismatch: The namespace you're selecting from doesn't have the label you're matching on.
# Check namespace labels
kubectl get namespace production --show-labels
# If the policy uses namespaceSelector: {matchLabels: {env: production}}
# the namespace needs that label
kubectl label namespace production env=production
- AND vs OR confusion: A single from entry with both podSelector and namespaceSelector is AND'd. Separate from entries are OR'd. Getting this wrong either blocks too much or allows too much.
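Putting the pieces together, a default-deny egress policy that still permits DNS might look like this (policy and namespace names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress-allow-dns
  namespace: production
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```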
Service Mesh vs Ingress¶
When people ask "should we use a service mesh?" they're usually conflating it with ingress.
Ingress controller — handles north-south traffic (external to cluster). Terminates TLS, routes by host/path, load balances to services.
Service mesh (Istio, Linkerd, Cilium Service Mesh) — handles east-west traffic (service to service within the cluster). Adds mutual TLS, retries, circuit breaking, observability, traffic policies between services.
You need an ingress controller. You might need a service mesh. Don't add a mesh just for mTLS — simpler alternatives exist (cert-manager + application-level TLS, network policies).
Canary Deployments with Ingress¶
NGINX Ingress Canary¶
# Primary ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-primary
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-stable
            port:
              number: 80
---
# Canary ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-canary
            port:
              number: 80
10% of traffic goes to the canary. Increase the weight gradually, monitoring error rates.
Gateway API Traffic Splitting¶
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-canary-route
spec:
  parentRefs:
  - name: production-gw
  hostnames:
  - api.example.com
  rules:
  - backendRefs:
    - name: api-stable
      port: 80
      weight: 90
    - name: api-canary
      port: 80
      weight: 10
Debugging kube-proxy iptables Rules¶
When service routing isn't working and you've confirmed endpoints exist, the problem might be in kube-proxy's iptables rules.
# On the node, dump all kube-proxy rules
iptables-save | grep -c "KUBE-"
# Thousands of rules is normal
# Find rules for a specific service (by ClusterIP)
iptables-save | grep "10.96.45.12"
# Find rules by service name (kube-proxy adds comments)
iptables-save | grep "api-server"
# Check if kube-proxy is running and healthy
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20
IPVS Debugging¶
# List all virtual servers (services)
ipvsadm -Ln
# Show a specific service
ipvsadm -Ln -t 10.96.45.12:80
# Check connection tracking
ipvsadm -Lnc | head -20
# If IPVS entries are stale, restart kube-proxy
kubectl rollout restart daemonset kube-proxy -n kube-system
External DNS for Automatic DNS Records¶
ExternalDNS watches Services and Ingresses and creates DNS records in your DNS provider (Route53, CloudFlare, Google Cloud DNS, etc.).
# Annotate a service to get an automatic DNS record
apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  # ...
# Check ExternalDNS logs
kubectl logs -n external-dns -l app=external-dns --tail=50
# Common issues:
# - IAM permissions to modify Route53
# - Domain filter too restrictive
# - TXT ownership records conflicting
Ingress with ExternalDNS¶
ExternalDNS can also read hostnames from Ingress resources:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-server
            port:
              number: 80
ExternalDNS creates an A record (for LoadBalancer IP) or CNAME (for LoadBalancer hostname) pointing api.example.com to the ingress controller's external address.