
Kubernetes Services & Ingress - Street Ops

What experienced Kubernetes operators know about networking that gets asked in interviews and matters when traffic isn't reaching your pods.


Service Not Routing to Pods

This is the #1 networking issue: you create a service and a deployment, but connections to the service are refused or time out.

Step 1: Check Endpoints

kubectl get endpoints api-server -n production

If endpoints are <none>, the service selector doesn't match any running pods.

# Check the service selector
kubectl get svc api-server -n production -o jsonpath='{.spec.selector}'
# Output: {"app":"api-server"}

# Check pod labels
kubectl get pods -n production --show-labels | grep api

# Compare — do the labels match?

Common mismatches:

  - Service selector says app: api-server, pod label says app: api (typo)
  - Service selector says app: api-server, but there's also a version: v2 that doesn't match
  - Pods exist but none are in Ready state (readiness probe failing)

Step 2: Check Pod Readiness

Unready pods are excluded from endpoints.

kubectl get pods -n production -l app=api-server
# Look for pods with 0/1 READY

kubectl describe pod api-server-abc123 -n production
# Look for readiness probe failures in Events
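If the probe itself is misconfigured, the pod never becomes Ready and never joins the endpoints. A minimal sketch of a readiness probe in the deployment's pod template; the path, port, and timings here are assumptions, not taken from the examples above:

```yaml
# Hypothetical container spec fragment — adjust path/port to your app's health endpoint
containers:
  - name: api-server
    image: example/api-server:latest   # placeholder image
    ports:
      - containerPort: 8000
    readinessProbe:
      httpGet:
        path: /healthz            # assumed health endpoint
        port: 8000
      initialDelaySeconds: 5      # give the app time to boot
      periodSeconds: 10
      failureThreshold: 3         # 3 consecutive failures -> pod marked NotReady
```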

Step 3: Check Port Mismatch

# Service targetPort must match what the container is listening on
kubectl get svc api-server -n production -o jsonpath='{.spec.ports[*].targetPort}'
# Returns: 8000

# Verify the container actually listens on 8000
kubectl exec -it api-server-abc123 -n production -- ss -tlnp
# Or: netstat -tlnp
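The port/targetPort relationship trips people up. A sketch of a correctly wired service; the names and ports mirror the examples above but treat them as assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  selector:
    app: api-server       # must match the pod labels exactly
  ports:
    - port: 80            # the port clients hit (http://api-server:80)
      targetPort: 8000    # the port the container actually listens on
```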

Step 4: Test Connectivity

# From another pod in the same namespace
kubectl run debug -n production --image=busybox:1.36 --restart=Never --rm -it -- wget -qO- http://api-server:80

# From a different namespace
kubectl run debug --image=busybox:1.36 -n default --restart=Never --rm -it -- wget -qO- http://api-server.production.svc.cluster.local:80

Debugging 502/503 from Ingress

An ingress returning 502 (Bad Gateway) means the ingress controller couldn't get a valid response from the backend: connection refused, reset, or a malformed reply. A 503 (Service Unavailable) means no backend is available at all.

NGINX Ingress 502 Debugging

# Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100

# Look for:
# "upstream prematurely closed connection" — backend crashed or timed out
# "no live upstreams" — no healthy endpoints
# "connect() failed (111: Connection refused)" — wrong port or pod not listening

# Check the ingress resource
kubectl describe ingress api-ingress -n production
# Look at: Rules, Backend, and Annotations

# Verify the backend service has endpoints
kubectl get endpoints api-v1 -n production

Common 502 Causes

  1. Pod not ready: Readiness probe failing, so endpoints are empty
  2. Port mismatch: Ingress backend port doesn't match the service port
  3. Container crash: Pod is in CrashLoopBackOff, service has no ready endpoints
  4. Timeout: Backend is too slow, ingress proxy times out

# Increase backend timeout (nginx ingress)
# Add annotation to the Ingress resource:
#   nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
#   nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
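As a concrete sketch, those annotations sit in the Ingress metadata; the resource names here are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    # NGINX ingress proxy timeouts, in seconds
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
spec:
  ingressClassName: nginx
  # rules omitted
```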

Common 503 Causes

  1. No backend defined: Ingress rule has no matching service
  2. Service doesn't exist: Typo in service name
  3. Rate limiting: Some ingress controllers return 503 when rate limits are exceeded

DNS Resolution Debugging

The Standard DNS Debug Flow

# 1. Can the pod resolve anything?
kubectl exec -it debug-pod -- nslookup kubernetes.default
# If this fails → CoreDNS is broken

# 2. Can it resolve cluster services?
kubectl exec -it debug-pod -- nslookup api-server.production.svc.cluster.local
# If this fails → service doesn't exist or CoreDNS can't find it

# 3. Can it resolve external names?
kubectl exec -it debug-pod -- nslookup google.com
# If this fails → CoreDNS upstream forwarding is broken

Check CoreDNS Health

# Are CoreDNS pods running?
kubectl get pods -n kube-system -l k8s-app=kube-dns

# CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

# Common error: "plugin/forward: no nameservers found"
# Means CoreDNS can't reach upstream DNS (check node /etc/resolv.conf)
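The upstream forwarding lives in the coredns ConfigMap. A sketch of the kubeadm-style default (verify against your own cluster; the exact plugin list varies by distribution):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf   # upstream resolvers; replace with explicit IPs if the node resolv.conf is unusable
        cache 30
        loop
        reload
    }
```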

Check Pod DNS Configuration

# See what resolv.conf the pod has
kubectl exec -it debug-pod -- cat /etc/resolv.conf

# Expected:
# nameserver 10.96.0.10      (CoreDNS service IP)
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

If the nameserver IP is wrong or missing, check the kubelet's --cluster-dns flag (or clusterDNS in the kubelet config file).
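Individual pods can also override DNS settings via dnsConfig, which is worth ruling out when only one workload misbehaves. A hedged sketch; the nameserver IP and search domain are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns
spec:
  dnsPolicy: "None"          # ignore cluster DNS defaults entirely
  dnsConfig:
    nameservers:
      - 10.96.0.10           # assumed CoreDNS service IP
    searches:
      - production.svc.cluster.local
    options:
      - name: ndots
        value: "2"           # fewer search-list expansions than the default 5
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
```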

Dedicated DNS Debug Pod

kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never -- sleep 3600

kubectl exec -it dnsutils -- nslookup api-server.production.svc.cluster.local
kubectl exec -it dnsutils -- dig api-server.production.svc.cluster.local +short
kubectl exec -it dnsutils -- dig @10.96.0.10 api-server.production.svc.cluster.local

NodePort Not Accessible

You created a NodePort service but can't reach it from outside.

Checklist

# 1. Is the service actually NodePort?
kubectl get svc api-server -n production
# TYPE should be NodePort, PORTS should show something like 80:30080/TCP

# 2. Is the node reachable on that port?
# From your machine:
curl http://<node-external-ip>:30080

# 3. Is there a firewall/security group blocking?
# AWS: check the security group attached to the node instances
# GCP: check the firewall rules for the node network
# On-prem: check iptables on the node

# On the node itself, check whether the NodePort shows up
# (in iptables mode, kube-proxy may not hold an open listening socket; check the iptables rules instead)
ss -tlnp | grep 30080

# Check iptables rules for the NodePort
iptables-save | grep 30080

Common causes:

  - Cloud security groups don't allow the NodePort range (30000-32767)
  - Node has a host firewall (UFW, firewalld) blocking the port
  - kube-proxy is not running on the node
  - The service has externalTrafficPolicy: Local and there are no pods on the node you're hitting
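For reference, a NodePort service with the relevant fields spelled out; the port values are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  type: NodePort
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8000
      nodePort: 30080              # must fall in the 30000-32767 range
  # externalTrafficPolicy: Local   # preserves client source IPs, but only
  #                                # nodes running a pod answer on the NodePort
```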


LoadBalancer Stuck in Pending

kubectl get svc api-server -n production
# EXTERNAL-IP shows <pending> for more than a few minutes

Cloud Provider Issues

# Check events on the service
kubectl describe svc api-server -n production
# Look for events from the cloud controller

# Common errors:
# "Error creating load balancer" — IAM permissions, quota exceeded
# "Error syncing load balancer" — subnet issues, tag conflicts

AWS: Check that the cloud controller manager has the right IAM role. Check that subnets are tagged with kubernetes.io/cluster/<cluster-name>.

GCP: Check that the project has the Compute Engine API enabled and quotas aren't exhausted.

On-prem (bare metal): You need MetalLB or a similar load-balancer implementation. Without a cloud provider, nothing provisions the load balancer, and EXTERNAL-IP stays <pending> forever.

# Quick MetalLB check
kubectl get pods -n metallb-system
kubectl get ipaddresspool -A
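A minimal MetalLB layer-2 setup looks roughly like this; the address range is an assumption, pick one your network actually routes to the nodes:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # assumed free range on the node LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```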

Ingress TLS Not Working

HTTPS isn't working on your ingress. Traffic falls back to HTTP or returns a certificate error.

Check the TLS Secret

# Does the secret exist in the right namespace?
kubectl get secret api-tls-secret -n production

# Is it the right type?
kubectl get secret api-tls-secret -n production -o jsonpath='{.type}'
# Should be: kubernetes.io/tls

# Does it have the right keys?
kubectl get secret api-tls-secret -n production -o jsonpath='{.data}' | jq 'keys'
# Should have: tls.crt and tls.key

Verify the Certificate

# Decode and check the cert
kubectl get secret api-tls-secret -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -text -noout

# Check:
# - Subject/SAN matches the hostname in the ingress rule
# - Not expired (check Not Before / Not After)
# - Issued by a trusted CA (or you need to trust the CA)

Common TLS Issues

  1. Secret in wrong namespace: TLS secret must be in the same namespace as the Ingress
  2. Hostname mismatch: Certificate is for *.example.com but ingress host is api.staging.example.com
  3. Certificate chain incomplete: Missing intermediate CA certificates in tls.crt
  4. cert-manager not issuing: Check cert-manager logs and Certificate/CertificateRequest resources

# If using cert-manager
kubectl get certificate -n production
kubectl describe certificate api-cert -n production
kubectl get certificaterequest -n production
kubectl logs -n cert-manager -l app=cert-manager --tail=50
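For reference, the TLS wiring on the Ingress side; the secretName must point at a kubernetes.io/tls secret in the same namespace (names mirror the examples above):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-secret   # must live in this namespace
  rules:
    - host: api.example.com        # must match the cert's Subject/SAN
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-server
                port:
                  number: 80
```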

Network Policy Blocking Traffic

After applying a network policy, some traffic stops working.

Identify What's Blocked

# List all network policies in the namespace
kubectl get networkpolicy -n production

# Inspect a specific policy
kubectl describe networkpolicy allow-api -n production

Test Connectivity

# From the source pod, try to reach the destination
kubectl exec -it frontend-pod -n production -- wget -qO- --timeout=5 http://api-server:80
# If it hangs/times out → network policy is blocking

# Verify without network policies (temporarily delete to test)
# DON'T do this in production — use a staging namespace

Common Network Policy Mistakes

  1. Default deny without DNS exception: You block all egress but forget to allow port 53. Every pod in the namespace can't resolve DNS.

# Always include this in your egress rules:
egress:
  - to:
      - namespaceSelector: {}
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53

  2. Namespace selector mismatch: The namespace you're selecting from doesn't have the label you're matching on.

# Check namespace labels
kubectl get namespace production --show-labels
# If the policy uses namespaceSelector: {matchLabels: {env: production}}
# the namespace needs that label
kubectl label namespace production env=production

  3. AND vs OR confusion: A single from entry with both podSelector and namespaceSelector is AND'd. Separate entries are OR'd. Getting this wrong either blocks too much or allows too much.
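The AND/OR distinction in YAML; both sketches assume a hypothetical frontend-to-api ingress policy:

```yaml
# AND: traffic must come from a pod labeled app=frontend
# that is IN a namespace labeled env=production
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            env: production
        podSelector:
          matchLabels:
            app: frontend
---
# OR: traffic from ANY pod in an env=production namespace,
# OR from any app=frontend pod in this policy's own namespace
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            env: production
      - podSelector:
          matchLabels:
            app: frontend
```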

Service Mesh vs Ingress

When people ask "should we use a service mesh?" they're usually conflating it with ingress.

Ingress controller — handles north-south traffic (external to cluster). Terminates TLS, routes by host/path, load balances to services.

Service mesh (Istio, Linkerd, Cilium Service Mesh) — handles east-west traffic (service to service within the cluster). Adds mutual TLS, retries, circuit breaking, observability, traffic policies between services.

You need an ingress controller. You might need a service mesh. Don't add a mesh just for mTLS — simpler alternatives exist (cert-manager + application-level TLS, network policies).


Canary Deployments with Ingress

NGINX Ingress Canary

# Primary ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-primary
  namespace: production
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-stable
                port:
                  number: 80
---
# Canary ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-canary
                port:
                  number: 80

10% of traffic goes to the canary. Increase the weight gradually, monitoring error rates.

Gateway API Traffic Splitting

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-canary-route
spec:
  parentRefs:
    - name: production-gw
  hostnames:
    - api.example.com
  rules:
    - backendRefs:
        - name: api-stable
          port: 80
          weight: 90
        - name: api-canary
          port: 80
          weight: 10

Debugging kube-proxy iptables Rules

When service routing isn't working and you've confirmed endpoints exist, the problem might be in kube-proxy's iptables rules.

# On the node, dump all kube-proxy rules
iptables-save | grep -c "KUBE-"
# Thousands of rules are normal on a busy cluster

# Find rules for a specific service (by ClusterIP)
iptables-save | grep "10.96.45.12"

# Find rules by service name (kube-proxy adds comments)
iptables-save | grep "api-server"

# Check if kube-proxy is running and healthy
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20

IPVS Debugging

# List all virtual servers (services)
ipvsadm -Ln

# Show a specific service
ipvsadm -Ln -t 10.96.45.12:80

# Check connection tracking
ipvsadm -Lnc | head -20

# If IPVS entries are stale, restart kube-proxy
kubectl rollout restart daemonset kube-proxy -n kube-system

External DNS for Automatic DNS Records

ExternalDNS watches Services and Ingresses and creates DNS records in your DNS provider (Route53, Cloudflare, Google Cloud DNS, etc.).

# Annotate a service to get an automatic DNS record
apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  # ...

# Check ExternalDNS logs
kubectl logs -n external-dns -l app=external-dns --tail=50

# Common issues:
# - IAM permissions to modify Route53
# - Domain filter too restrictive
# - TXT ownership records conflicting

Ingress with ExternalDNS

ExternalDNS can also read hostnames from Ingress resources:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-server
                port:
                  number: 80

ExternalDNS creates an A record (for LoadBalancer IP) or CNAME (for LoadBalancer hostname) pointing api.example.com to the ingress controller's external address.