Kubernetes Services: How Traffic Finds Your Pod
- lesson
- kubernetes-services
- kube-proxy
- iptables
- ipvs
- dns
- coredns
- ingress
- gateway-api
- load-balancing
- tls-termination
- debugging
Topics: Kubernetes services, kube-proxy, iptables, IPVS, DNS, CoreDNS, Ingress, Gateway API, load balancing, TLS termination, debugging
Level: L1–L2 (Foundations to Operations)
Time: 75–90 minutes
Prerequisites: None (networking fundamentals explained inline)
The Mission¶
It's 2pm on a Tuesday. You just deployed a new microservice — a payments API. The pod is running. The health check passes. But when you curl the service:
$ kubectl exec -it debug-pod -- curl http://payments.production.svc.cluster.local/health
curl: (7) Failed to connect to payments.production.svc.cluster.local port 80: Connection refused
The pod is healthy. The service exists. But traffic is not arriving. Somewhere between your curl command and the container, the packet is getting lost — or worse, actively rejected.
This lesson traces how traffic moves from a client inside the cluster all the way to a container's listening socket. You will learn every layer it passes through, how each layer can break, and how to diagnose each one. By the end, you will solve this mission — and know exactly where to look the next time traffic goes missing.
Part 1: What a Service Actually Is¶
Before we debug, we need to understand what we are debugging. A Kubernetes Service is not a load balancer. It is not a proxy process. It is a set of iptables rules (or IPVS entries) programmed into every node in your cluster.
When you create a Service, here is what actually happens — step by step:
- The API server stores the Service object in etcd
- A controller creates an Endpoints (or EndpointSlice) object listing the IPs of all pods matching the Service's label selector
- kube-proxy on every node detects the new Service and Endpoints
- kube-proxy programs iptables rules (or IPVS entries) that intercept packets destined for the Service's ClusterIP and rewrite them to a backend pod IP
- CoreDNS detects the new Service and creates a DNS A record: `payments.production.svc.cluster.local` pointing to the ClusterIP
No daemon is listening on the ClusterIP. No process is proxying traffic. The ClusterIP is a virtual IP — it exists only as a destination address in iptables rules.
Under the Hood: The ClusterIP is allocated from a reserved CIDR range (typically `10.96.0.0/12` or `10.43.0.0/16`, depending on your distribution). This range is never routed — no network interface has an address in it. Packets destined for a ClusterIP are intercepted by netfilter DNAT rules before they leave the sending node. The packet's destination is rewritten from the ClusterIP to a real pod IP, and only then does it get routed across the cluster network.

Mental Model: Think of a ClusterIP like a phone extension number at a big company. The extension number is not a real phone line — the PBX (iptables) intercepts the call and redirects it to a real desk phone (pod IP). If the PBX goes down, the extension stops working even though the desk phones are fine.
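All of that machinery is driven by a small API object. For reference, a minimal manifest for the payments Service (a sketch using the lesson's names; port 80 fronting containerPort 8000 matches the endpoints shown later in Part 5):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments
  namespace: production
spec:
  selector:
    app: payments          # must match the pod labels exactly
  ports:
    - port: 80             # the port the ClusterIP answers on
      targetPort: 8000     # the port the container actually listens on
```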
Part 2: The DNS Resolution Chain¶
When your curl command runs inside a pod, it doesn't start by sending a TCP packet to the
ClusterIP. It starts by looking up the name payments.production.svc.cluster.local.
Here is the full chain:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐
│ Your Pod │───▶│ CoreDNS │───▶│ Service DB │───▶│ ClusterIP │
│ /etc/resolv │ │ (kube-dns │ │ (watches │ │ returned │
│ .conf │ │ service) │ │ API for │ │ 10.96.x.y │
│ nameserver │ │ 10.96.0.10 │ │ changes) │ │ │
│ 10.96.0.10 │ │ │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘ └────────────┘
Let's look at each step.
Step 1: The pod reads /etc/resolv.conf¶
Every pod gets a /etc/resolv.conf injected by the kubelet:
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Three things matter here:
| Field | What it does |
|---|---|
| `nameserver 10.96.0.10` | All DNS queries go to CoreDNS (this is the ClusterIP of the kube-dns service) |
| `search ...` | Short names get each search domain appended before trying the bare name |
| `ndots:5` | If the name has fewer than 5 dots, try search domains first |
Step 2: The ndots trap¶
Your pod resolves payments. That name has zero dots, which is less than 5. So the resolver
tries, in order:
- `payments.production.svc.cluster.local` — match (CoreDNS returns 10.96.45.12)
That worked on the first try. But what if you were resolving an external name like
api.stripe.com? That has 2 dots — still less than 5:
- `api.stripe.com.production.svc.cluster.local` — NXDOMAIN
- `api.stripe.com.svc.cluster.local` — NXDOMAIN
- `api.stripe.com.cluster.local` — NXDOMAIN
- `api.stripe.com.` — success
Four DNS queries instead of one. At scale — thousands of pods making external calls — this 4x amplification can overwhelm CoreDNS.
Gotcha: The default `ndots:5` is tuned for in-cluster service discovery where short names are common. If your pods make heavy external DNS queries, either lower ndots to 2 (`dnsConfig.options`) or use fully qualified names with a trailing dot: `api.stripe.com.` — the trailing dot tells the resolver "this is already a complete name, don't search."
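The search-list behavior is easy to get wrong from memory, so here is a tiny Python sketch of the expansion logic (a simplification of glibc's resolver, assuming the resolv.conf shown earlier):

```python
# Minimal sketch of resolver search-list expansion. Ignores timeouts,
# rotation, and IPv6 specifics; search domains match the lesson's pod.
def query_order(name,
                search=("production.svc.cluster.local",
                        "svc.cluster.local",
                        "cluster.local"),
                ndots=5):
    """Return the DNS names tried, in order, for one application lookup."""
    if name.endswith("."):              # trailing dot: already fully qualified
        return [name]
    searched = [f"{name}.{d}" for d in search]
    if name.count(".") >= ndots:        # enough dots: try the bare name first
        return [name + "."] + searched
    return searched + [name + "."]      # otherwise: search domains first

print(query_order("payments"))          # first candidate is the in-cluster match
print(query_order("api.stripe.com"))    # external name: 4 queries
print(query_order("api.stripe.com."))   # trailing dot: exactly 1 query
```

Running it shows why `payments` costs one useful query while `api.stripe.com` costs four, and why the trailing dot collapses the cost back to one.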
Step 3: CoreDNS returns the ClusterIP¶
CoreDNS watches the Kubernetes API for Service objects. When it sees
payments.production.svc.cluster.local, it returns the ClusterIP — not the pod IPs (unless
the service is headless).
# Verify DNS resolution from a debug pod
kubectl exec -it debug-pod -- nslookup payments.production.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: payments.production.svc.cluster.local
Address: 10.96.45.12
Trivia: CoreDNS replaced kube-dns as the default cluster DNS in Kubernetes 1.13 (December 2018). kube-dns was actually three containers in a pod — dnsmasq for caching, a Go sidecar for health checks, and a Go binary for the Kubernetes plugin. CoreDNS collapsed all of that into a single Go binary with a plugin architecture. The label `k8s-app=kube-dns` persisted for backward compatibility — even though the pods are running CoreDNS. You can verify with `kubectl get pods -n kube-system -l k8s-app=kube-dns`.
Flashcard Check #1: DNS and Services¶
Cover the answers. Test yourself.
| Question | Answer |
|---|---|
| What does `ndots:5` mean? | If the hostname has fewer than 5 dots, try appending each search domain before querying the bare name |
| Where does a pod's DNS config come from? | The kubelet injects `/etc/resolv.conf` at pod creation time |
| What IP does the `nameserver` line in a pod point to? | The ClusterIP of the kube-dns service (CoreDNS) |
| What does CoreDNS return for a regular (non-headless) Service? | The Service's ClusterIP |
| What label do CoreDNS pods use, and why is it confusing? | `k8s-app=kube-dns` — the label kept its old name after CoreDNS replaced kube-dns for backward compatibility |
Part 3: How kube-proxy Programs iptables¶
Now your pod has the ClusterIP (10.96.45.12). It sends a TCP SYN packet to
10.96.45.12:80. What happens next is pure netfilter magic.
kube-proxy runs as a DaemonSet on every node. When it sees a new Service with endpoints, it writes iptables rules that do DNAT — destination network address translation. The packet's destination is rewritten from the Service ClusterIP to a real pod IP, before the packet leaves the node.
Here is a simplified view of what those rules look like. On a real node, run:
# Dump all kube-proxy rules (there will be thousands)
iptables-save | grep -c "KUBE-"
# Output: 4237
# Find the rules for a specific service (by ClusterIP)
iptables-save | grep "10.96.45.12"
Here is the actual chain of rules for a Service with two backend pods:
# Step 1: Packet arrives at PREROUTING or OUTPUT chain
-A KUBE-SERVICES -d 10.96.45.12/32 -p tcp -m tcp --dport 80 \
-j KUBE-SVC-XYZABC123
# Step 2: KUBE-SVC chain does random load balancing
-A KUBE-SVC-XYZABC123 -m statistic --mode random --probability 0.50000 \
-j KUBE-SEP-POD1HASH
-A KUBE-SVC-XYZABC123 \
-j KUBE-SEP-POD2HASH
# Step 3: KUBE-SEP chains do the actual DNAT
-A KUBE-SEP-POD1HASH -p tcp -j DNAT --to-destination 10.244.1.15:8000
-A KUBE-SEP-POD2HASH -p tcp -j DNAT --to-destination 10.244.2.23:8000
Reading those rules line by line:
| Rule | What it does |
|---|---|
| `KUBE-SERVICES` | Catches all packets going to any ClusterIP and jumps to the right service chain |
| `KUBE-SVC-*` | Picks a backend pod randomly. With 2 pods: 50/50. With 3: first gets 33%, second gets 50% of remaining, third gets the rest |
| `KUBE-SEP-*` | Rewrites the destination IP from 10.96.45.12 → 10.244.1.15 (a real pod IP) |
After DNAT, the packet has a real destination IP and gets routed through the CNI network to the target pod's node.
Under the Hood: The `--probability` values in iptables create weighted random selection. For 3 backends: the first rule has probability 0.33333 (1/3), the second has 0.50000 (1/2 of remaining), and the third has no probability (catches everything left). This is stateless — there is no least-connections or round-robin. Every packet is an independent coin flip. The conntrack module ensures that once a TCP connection is established, all subsequent packets in that connection go to the same backend.

Remember: kube-proxy does not proxy traffic. Despite its name, in iptables mode it only programs rules and gets out of the way. The kernel handles everything at wire speed. In the early days of Kubernetes (before 1.2), kube-proxy actually was a userspace proxy — packets went from kernel to kube-proxy process and back. That was slow, so they moved to iptables mode and kept the name.
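To see why the cascading probabilities come out even, here is a small Python simulation of the rule walk (a sketch of the selection scheme, not kube-proxy's actual code):

```python
# Simulate kube-proxy's cascading --probability rules for one Service.
# Rule i matches 1/(backends remaining) of the traffic that reaches it;
# the last rule has no probability match and catches everything left.
import random

def pick_backend(backends, rng):
    n = len(backends)
    for i, backend in enumerate(backends):
        remaining = n - i
        if remaining == 1 or rng.random() < 1.0 / remaining:
            return backend

rng = random.Random(42)                      # fixed seed for reproducibility
counts = {"pod1": 0, "pod2": 0, "pod3": 0}
for _ in range(30_000):
    counts[pick_backend(list(counts), rng)] += 1
print(counts)                                # each backend lands near 10,000
```

The cascade of conditional probabilities (1/3, then 1/2 of the remainder, then the rest) is exactly a uniform split, which is why each pod ends up with roughly a third of new connections.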
Part 4: IPVS Mode — The Other Path¶
If you have hundreds or thousands of services, iptables mode becomes a problem. Every Service adds rules that are evaluated linearly. At 5,000+ Services, the rule sync itself can take seconds, during which new connections may fail.
IPVS (IP Virtual Server) is the alternative. It uses kernel hash tables for O(1) lookup regardless of how many services exist.
# Check which mode kube-proxy is using
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# mode: "" ← empty string means iptables (default)
# mode: "ipvs" ← IPVS mode
# In IPVS mode, inspect virtual servers
ipvsadm -Ln
# IP Virtual Server version 1.2.1 (size=4096)
# Prot LocalAddress:Port Scheduler Flags
# -> RemoteAddress:Port Forward Weight ActiveConn InActConn
# TCP 10.96.45.12:80 rr
# -> 10.244.1.15:8000 Masq 1 3 0
# -> 10.244.2.23:8000 Masq 1 2 0
| Feature | iptables mode | IPVS mode |
|---|---|---|
| Lookup complexity | O(n) — linear scan | O(1) — hash table |
| Load balancing | Random only | Round-robin, least-connections, weighted, source hash |
| Connection draining | No | Yes |
| Performance at scale | Degrades past ~5,000 services | Constant |
| Debugging | `iptables-save \| grep` | `ipvsadm -Ln` |
| Kernel modules needed | Always present | `ip_vs`, `ip_vs_rr`, `ip_vs_wrr`, `ip_vs_sh`, `nf_conntrack` |
Gotcha: Switching from iptables to IPVS mode is not a live toggle. You need to edit the kube-proxy ConfigMap, ensure the IPVS kernel modules are loaded on every node, then restart kube-proxy. The old iptables rules linger until you clean them up. On managed clusters (EKS, GKE), you may not have the option — check your provider's docs.
Part 5: Solving the Mission¶
Let's go back to our broken payments service. Time to work the diagnostic ladder.
Step 1: Does the Service have endpoints?¶
kubectl get endpoints payments -n production
# NAME       ENDPOINTS   AGE
# payments   <none>      1h

`<none>`. The service has no endpoints. This means zero pods match the Service's label selector. This is the #1 cause of "connection refused" on a Kubernetes service.
Mental Model: No endpoints = no DNAT rules = nowhere to send the packet. For an endpoint-less Service, kube-proxy installs a REJECT rule instead, so the kernel actively refuses the connection. Nothing is listening on that virtual IP (nothing ever is — it is virtual). Connection refused.
Step 2: Compare selectors and labels¶
# What is the Service selecting?
kubectl get svc payments -n production -o jsonpath='{.spec.selector}'
# {"app":"payments-api"}
# What labels do the pods have?
kubectl get pods -n production --show-labels | grep payment
# payments-7d8f5b-x4k2j 1/1 Running app=payments,version=v2
Found it. The Service selector says app: payments-api. The pod label says app: payments.
One word difference. No endpoints. No traffic.
Step 3: Fix it¶
# Option A: Fix the Service selector
kubectl patch svc payments -n production -p '{"spec":{"selector":{"app":"payments"}}}'
# Option B: Fix the pod labels (via the Deployment)
kubectl patch deployment payments -n production \
-p '{"spec":{"template":{"metadata":{"labels":{"app":"payments-api"}}}}}'
After fixing:
kubectl get endpoints payments -n production
# NAME ENDPOINTS AGE
# payments 10.244.1.15:8000,10.244.2.23:8000 15m
kubectl exec -it debug-pod -- curl http://payments.production.svc.cluster.local/health
# {"status": "ok"}
War Story: At a mid-size fintech company, a deployment pipeline generated Service manifests from a template. The template used `app: {{ .name }}` for the selector but `app: {{ .name }}-svc` for the Service metadata name. For months, nobody noticed because the labels happened to match. Then someone renamed a service, the template generated mismatched selectors, and the payment processing service silently dropped 100% of traffic for 23 minutes during business hours. The fix was a CI check that runs `kubectl get endpoints` after every deployment and fails if any service has zero endpoints.
Flashcard Check #2: kube-proxy and Endpoints¶
| Question | Answer |
|---|---|
| What does kube-proxy actually do in iptables mode? | Programs iptables DNAT rules on every node to redirect ClusterIP traffic to pod IPs. It does not proxy packets. |
| What does `kubectl get endpoints <svc>` showing `<none>` mean? | The Service's label selector doesn't match any running, ready pods |
| How does iptables mode select a backend when there are 3 pods? | Random selection using `--probability` rules: 1/3 chance for first, 1/2 of remaining for second, rest for third |
| Name two advantages of IPVS over iptables mode | O(1) lookup (vs O(n)), multiple load-balancing algorithms (round-robin, least-connections, etc.) |
| What happens when a packet reaches a ClusterIP but the Service has no endpoints? | No DNAT rule matches, the packet hits the virtual IP where nothing listens, and the kernel refuses the connection (connection refused) |
Part 6: Service Types — Four Ways to Expose¶
Not all services are internal. Here is how the four types build on each other:
┌───────────────────┐
│ ExternalName │ DNS alias, no proxying
└───────────────────┘
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ ClusterIP │ ◀─│ NodePort │ ◀─│ LoadBalancer │
│ Internal VIP │ │ ClusterIP + │ │ NodePort + │
│ iptables/ │ │ port on every │ │ cloud LB │
│ IPVS rules │ │ node │ │ provisioned │
└───────────────┘ └───────────────┘ └───────────────┘
Each type includes everything to its left. A LoadBalancer service also has a NodePort and a ClusterIP.
ClusterIP — internal only¶
The default. Everything we have covered so far. Accessible only from inside the cluster.
NodePort — every node listens¶
Opens a port (30000–32767) on every node in the cluster. Traffic arriving at
<any-node-ip>:30080 gets forwarded to the Service, which then forwards to a pod.
kubectl get svc payments -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# payments NodePort 10.96.45.12 <none> 80:30080/TCP 1h
Gotcha: NodePort opens the port on all nodes, even nodes that don't have a matching pod. If `externalTrafficPolicy` is `Cluster` (the default), the node will forward traffic to a pod on another node — adding a network hop and losing the original source IP via SNAT. If you set `externalTrafficPolicy: Local`, the node only forwards to local pods — but if there are no local pods, traffic is dropped. Neither option is perfect.
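The policy is a single field on the Service spec. A fragment, assuming you prefer source-IP preservation over even spreading:

```yaml
spec:
  type: NodePort
  externalTrafficPolicy: Local   # keep the client source IP; only nodes with local pods serve traffic
```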
LoadBalancer — cloud provider does the work¶
Asks the cloud provider (AWS, GCP, Azure) to provision an actual load balancer. The LB gets a public IP and forwards traffic to the NodePorts.
kubectl get svc payments -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# payments LoadBalancer 10.96.45.12 203.0.113.42 80:30080/TCP 1h
On bare metal with no cloud provider, EXTERNAL-IP stays <pending> forever. You need
MetalLB or similar to assign IPs from a local pool.
ExternalName — just a DNS trick¶
Maps a Service name to an external DNS name via CNAME. No iptables rules, no proxying.
apiVersion: v1
kind: Service
metadata:
name: external-db
spec:
type: ExternalName
externalName: mydb.us-east-1.rds.amazonaws.com
Pods that resolve external-db.default.svc.cluster.local get a CNAME to the RDS endpoint.
Useful for migrating from in-cluster to external databases without changing application code.
Headless Services — DNS returns pod IPs¶
Set clusterIP: None and DNS skips the virtual IP entirely. Instead, it returns A records
for every pod matching the selector.
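Declaring one is a single-field change. A minimal sketch matching the `db-headless` lookup below (the `app: db` selector and port 5432 are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None        # headless: no VIP, DNS returns the pod IPs directly
  selector:
    app: db              # assumed label on the database pods
  ports:
    - port: 5432
```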
kubectl exec -it debug-pod -- nslookup db-headless.default.svc.cluster.local
# Name: db-headless.default.svc.cluster.local
# Address: 10.244.1.5
# Address: 10.244.2.8
# Address: 10.244.3.12
Headless services are essential for StatefulSets. Each pod gets a stable DNS name:
postgres-0.db-headless.default.svc.cluster.local. The client connects to a specific
replica, not a random one.
Remember: Regular services: DNS returns one VIP, iptables picks the pod. Headless services: DNS returns all pod IPs, the client picks.
Part 7: Ingress — HTTP Routing into the Cluster¶
Services handle L4 (TCP/UDP). Ingress handles L7 (HTTP/HTTPS). An Ingress resource defines routing rules — but it does nothing by itself. You need an Ingress controller to implement them.
The Architecture¶
Internet
│
▼
┌─────────────────┐
│ Cloud LB / IP │ Single entry point
└────────┬────────┘
│
▼
┌─────────────────┐
│ Ingress │ Nginx/Traefik/HAProxy pod(s)
│ Controller │ Watches Ingress resources via API
│ (nginx pod) │ Generates nginx.conf dynamically
└────────┬────────┘
│
┌─────────┼──────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Svc A │ │ Svc B │ │ Svc C │ Backend Services
│(api) │ │(web) │ │(admin) │
└───┬────┘ └───┬────┘ └───┬────┘
▼ ▼ ▼
Pods Pods Pods
The Ingress controller is just pods running nginx (or traefik, or HAProxy). It watches the Kubernetes API for Ingress resources and regenerates its routing config whenever something changes. It is exposed via a single LoadBalancer or NodePort service — one entry point for all your HTTP traffic.
Path-based vs Host-based Routing¶
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
namespace: production
spec:
ingressClassName: nginx
rules:
# Host-based: different domains → different services
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-server
port:
number: 80
# Path-based: same domain, different paths → different services
- host: www.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-server
port:
number: 80
- path: /
pathType: Prefix
backend:
service:
name: web-frontend
port:
number: 80
Gotcha: Path type `Prefix` with path `/api` also matches `/api-docs`, `/api2`, and `/api-anything`. This is character-level prefix matching in nginx-ingress, not path-segment matching. If you want `/api` to match only `/api` and `/api/*`, use `/api/` with a trailing slash, or use `pathType: Exact` for the base path. Gateway API's `PathPrefix` is segment-aware by specification — one of the reasons it was created.
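The difference is easy to demonstrate in a few lines of Python (a sketch of the two matching styles, ignoring percent-encoding and query strings):

```python
def char_prefix(path, prefix):
    """nginx-style: plain character-level prefix comparison."""
    return path.startswith(prefix)

def segment_prefix(path, prefix):
    """Gateway API style: the prefix must match whole path segments."""
    p = path.rstrip("/").split("/")
    q = prefix.rstrip("/").split("/")
    return p[:len(q)] == q

for path in ("/api", "/api/users", "/api-docs"):
    print(path, char_prefix(path, "/api"), segment_prefix(path, "/api"))
# /api-docs matches as a character prefix but not as a path segment
```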
TLS Termination¶
The Ingress controller can terminate TLS so your backend services don't need to handle certificates:
spec:
tls:
- hosts:
- api.example.com
secretName: api-tls-secret # Must be in the same namespace
rules:
- host: api.example.com
# ...
# Verify the TLS secret exists and is valid
kubectl get secret api-tls-secret -n production -o jsonpath='{.type}'
# Should output: kubernetes.io/tls
# Check the certificate's expiry and SANs
kubectl get secret api-tls-secret -n production \
-o jsonpath='{.data.tls\.crt}' | base64 -d | \
openssl x509 -text -noout | grep -A1 "Not After\|Subject Alternative"
Interview Bridge: "Explain TLS termination" is a common interview question. The answer: the TLS handshake happens at the Ingress controller (or load balancer), which decrypts the traffic and forwards plaintext HTTP to the backend. This offloads CPU-intensive crypto from your application pods and lets you manage certificates in one place. The tradeoff: traffic between the ingress controller and your pods is unencrypted inside the cluster network. For zero-trust environments, you add mTLS between services (service mesh) or use the `backend-protocol: "HTTPS"` annotation to re-encrypt.
Popular Controllers¶
| Controller | Strengths | Best for |
|---|---|---|
| ingress-nginx | Most widely deployed, rich annotation set | General purpose, battle-tested |
| Traefik | Auto Let's Encrypt, middleware chains, built-in dashboard | Smaller clusters, edge deployments |
| HAProxy Ingress | Raw performance, TCP/UDP support, fine-grained rate limiting | High-throughput, non-HTTP workloads |
| AWS Load Balancer Controller | Provisions native ALBs, no in-cluster proxy | AWS-native, direct ALB-to-pod routing |
Part 8: Gateway API — The Future¶
Gateway API reached GA (v1.0) in October 2023. It does not replace Ingress — both coexist. But new features are only being added to Gateway API.
The key difference: separation of concerns.
GatewayClass → "What implementation?" (infra team picks the controller)
Gateway → "What listeners?" (platform team configures ports, TLS)
HTTPRoute → "What routing rules?" (app team defines paths, backends)
Ingress mashes all three roles into a single resource with annotations. Gateway API splits them across resources with RBAC boundaries. The app developer can define routing without needing permission to change TLS certificates or listener ports.
# App developer only needs permission to create HTTPRoutes
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: payments-route
namespace: production
spec:
parentRefs:
- name: production-gateway
namespace: infra
hostnames:
- payments.example.com
rules:
- matches:
- path:
type: PathPrefix
value: /v2
backendRefs:
- name: payments-v2
port: 80
weight: 90
- name: payments-v2-canary
port: 80
weight: 10
Notice the weight field — traffic splitting for canary deployments is built into the
spec, not hacked via controller-specific annotations.
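The weighted selection behaves like a proportional random choice. A small Python sketch using the 90/10 weights from the example (illustrative only, not any controller's implementation):

```python
# Weight-proportional backend selection, as HTTPRoute backendRefs define it.
import random

rng = random.Random(7)                       # fixed seed for reproducibility
names = ["payments-v2", "payments-v2-canary"]
weights = [90, 10]
counts = dict.fromkeys(names, 0)
for _ in range(10_000):
    counts[rng.choices(names, weights=weights)[0]] += 1
print(counts)                                # roughly a 9:1 split
```

Dialing the canary's weight up over successive deploys shifts traffic gradually, with no annotation changes and no second Ingress resource.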
Flashcard Check #3: Ingress and Gateway API¶
| Question | Answer |
|---|---|
| What does an Ingress controller actually run as? | Pods (usually a Deployment) running nginx/traefik/HAProxy that watch the API for Ingress resources |
| What happens if you create an Ingress resource but no Ingress controller is installed? | Nothing. The Ingress resource sits inert in etcd. No routing happens. |
| What is the difference between path-based and host-based routing? | Host-based: different domains go to different services. Path-based: same domain, different URL paths go to different services. |
| How does Gateway API separate concerns vs Ingress? | Three resources (GatewayClass, Gateway, HTTPRoute) with different RBAC — infra, platform, and app teams each own their layer |
| How does TLS termination work at an Ingress controller? | The controller handles the TLS handshake and decryption, then forwards plaintext HTTP to backend pods |
Part 9: EndpointSlices and Session Affinity¶
Two features worth knowing about that affect how traffic is distributed.
EndpointSlices¶
The original Endpoints resource stored all pod IPs in a single object. For a service with 1,000 pods, that is one large object that changes every time a pod scales up, down, or restarts. Every change triggers a full object sync to every node.
EndpointSlices split this into chunks of 100 (by default). Only the affected slice gets updated and synced.
# See EndpointSlices for a service
kubectl get endpointslices -l kubernetes.io/service-name=payments -n production
# NAME ADDRESSTYPE PORTS ENDPOINTS AGE
# payments-abc12 IPv4 8000 10.244.1.15,... 1h
# payments-def34 IPv4 8000 10.244.2.23,... 1h
You rarely interact with EndpointSlices directly. Just know they exist and that they replaced the old Endpoints resource for scalability.
Session Affinity¶
By default, each request can hit any backend pod. If your app stores session state in memory (not recommended, but common), you can pin clients:
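A sketch of ClientIP affinity on the Service (10800 seconds, 3 hours, is the API default):

```yaml
spec:
  sessionAffinity: ClientIP        # pin each client source IP to one backend
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # affinity expires after 3 idle hours
```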
Gotcha: Kubernetes session affinity is source-IP based. If all your traffic comes through a reverse proxy or NAT gateway, all clients share one source IP, and all traffic goes to one pod. For proper session affinity behind a proxy, use cookie-based affinity at the Ingress level:
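With ingress-nginx, cookie affinity is enabled per Ingress via annotations. A sketch (the cookie name and max-age are arbitrary choices):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"   # 48 hours
```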
Part 10: The Full Debugging Playbook¶
When traffic is not reaching your pod, work this ladder. Each step eliminates one layer.
Step 1: Does the Service exist? kubectl get svc <name> -n <ns>
Step 2: Does it have endpoints? kubectl get endpoints <name> -n <ns>
Step 3: Are the pods Ready? kubectl get pods -n <ns> -l <selector>
Step 4: Does the port match? Compare targetPort with container listen port
Step 5: Is DNS resolving? kubectl exec debug -- nslookup <svc>.<ns>
Step 6: Can you reach the pod directly? kubectl exec debug -- curl <pod-ip>:<port>
Step 7: Is a NetworkPolicy blocking? kubectl get networkpolicy -n <ns>
Step 8: Is kube-proxy running? kubectl get pods -n kube-system -l k8s-app=kube-proxy
Step-by-step for our mission¶
# Step 1: Service exists
kubectl get svc payments -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# payments ClusterIP 10.96.45.12 <none> 80/TCP 1h
# ✓ Exists
# Step 2: Endpoints
kubectl get endpoints payments -n production
# NAME ENDPOINTS AGE
# payments <none> 1h
# ✗ No endpoints! This is the problem.
# Step 3: Why no endpoints? Check selector vs labels
kubectl get svc payments -n production -o jsonpath='{.spec.selector}'
# {"app":"payments-api"}
kubectl get pods -n production --show-labels | grep pay
# payments-7d8f5b-x4k2j 1/1 Running app=payments
# ← Mismatch: selector wants "payments-api", pod has "payments"
# Step 4: Also check targetPort while we're here
kubectl get svc payments -n production -o jsonpath='{.spec.ports[0].targetPort}'
# 8000
kubectl exec -it payments-7d8f5b-x4k2j -n production -- ss -tlnp
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# LISTEN 0 128 0.0.0.0:8000 0.0.0.0:*
# ✓ Pod is listening on 8000, targetPort is 8000. Match.
# Step 5: Fix the selector
kubectl patch svc payments -n production -p '{"spec":{"selector":{"app":"payments"}}}'
# Step 6: Verify endpoints appeared
kubectl get endpoints payments -n production
# NAME ENDPOINTS AGE
# payments 10.244.1.15:8000,10.244.2.23:8000 1h
# ✓ Endpoints populated
# Step 7: Verify connectivity
kubectl exec -it debug-pod -- curl http://payments.production.svc.cluster.local/health
# {"status": "ok"}
War Story: A SaaS company migrated from Docker Compose to Kubernetes. Their CI pipeline deployed 40 services. Service number 37 — a notification worker — had `nodePort: 30037` hardcoded in every environment, including production. During a cluster expansion, a new monitoring agent was configured to use port 30037 on the host. The NodePort and the host port collided silently — kube-proxy's iptables rules won the race condition on some nodes but not others. Traffic to the notification service was intermittently routed to the monitoring agent. It took three days to diagnose because the failure was non-deterministic and the monitoring agent happened to return HTTP 200 on its health endpoint. The fix: stop hardcoding NodePorts. Let Kubernetes allocate them, or better yet, use an Ingress controller.
Exercises¶
Exercise 1: Find the Broken Service (2 minutes)¶
A service called cache-redis in namespace data returns connection refused. The pod
is running and healthy. Describe the exact sequence of commands you would run, in order,
to diagnose the problem.
Solution
# 1. Check endpoints
kubectl get endpoints cache-redis -n data
# If <none>, check selectors:
# 2. Compare selector and labels
kubectl get svc cache-redis -n data -o jsonpath='{.spec.selector}'
kubectl get pods -n data --show-labels | grep redis
# 3. If selectors match but endpoints still empty, check readiness
kubectl get pods -n data -l app=cache-redis
# Look for pods not in Ready state
# 4. If endpoints exist, check port match
kubectl get svc cache-redis -n data -o jsonpath='{.spec.ports[0].targetPort}'
kubectl exec -it <redis-pod> -n data -- ss -tlnp
Exercise 2: Trace the DNS Chain (5 minutes)¶
From inside a debug pod, determine:
1. What nameserver your pod is using
2. What search domains are configured
3. How many DNS queries are generated when you resolve api.stripe.com
4. How to make it generate only one query
Hints
- Check `/etc/resolv.conf` inside the pod
- Count the search domains plus one (for the bare query)
- A trailing dot on a hostname means "this is fully qualified"

Solution
kubectl exec -it debug-pod -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
# api.stripe.com has 2 dots (< 5), so:
# Query 1: api.stripe.com.default.svc.cluster.local → NXDOMAIN
# Query 2: api.stripe.com.svc.cluster.local → NXDOMAIN
# Query 3: api.stripe.com.cluster.local → NXDOMAIN
# Query 4: api.stripe.com. → success
# Total: 4 queries
# To make it one query: use a trailing dot
kubectl exec -it debug-pod -- nslookup api.stripe.com.
# Or lower ndots in the pod spec:
# dnsConfig:
# options:
# - name: ndots
# value: "2"
Exercise 3: Design the Ingress (10 minutes)¶
You have three services: api (port 80), web (port 3000), and admin (port 8080).
Write an Ingress resource that:
- Routes api.example.com/* to api
- Routes www.example.com/* to web
- Routes www.example.com/admin/* to admin
- Terminates TLS using a secret called wildcard-tls
- Forces HTTPS redirect
Solution
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
namespace: production
annotations:
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.example.com
- www.example.com
secretName: wildcard-tls
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api
port:
number: 80
- host: www.example.com
http:
paths:
- path: /admin
pathType: Prefix
backend:
service:
name: admin
port:
number: 8080
- path: /
pathType: Prefix
backend:
service:
name: web
port:
number: 3000
Cheat Sheet¶
Service Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl get endpoints <svc> -n <ns>` | Which pod IPs the service routes to (empty = broken selector) |
| `kubectl get svc <svc> -n <ns> -o jsonpath='{.spec.selector}'` | What labels the service is looking for |
| `kubectl get pods -n <ns> --show-labels` | What labels the pods actually have |
| `kubectl exec debug -- nslookup <svc>.<ns>.svc.cluster.local` | Whether DNS resolves the service name |
| `kubectl exec debug -- curl -v <pod-ip>:<port>` | Whether the pod is reachable directly (bypasses the service layer) |
| `kubectl exec <pod> -- ss -tlnp` | What ports the container is listening on |
kube-proxy Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl get cm kube-proxy -n kube-system -o yaml \| grep mode` | iptables or IPVS mode |
| `iptables-save \| grep <cluster-ip>` | iptables rules for a specific service (run on the node) |
| `iptables-save \| grep -c KUBE-` | Total number of kube-proxy rules |
| `ipvsadm -Ln` | IPVS virtual servers and backends (run on the node, IPVS mode only) |
DNS Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl exec debug -- cat /etc/resolv.conf` | Pod's nameserver, search domains, ndots value |
| `kubectl get pods -n kube-system -l k8s-app=kube-dns` | Whether CoreDNS is running |
| `kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20` | CoreDNS errors |
| `kubectl exec debug -- nslookup kubernetes.default` | Basic DNS health test |
Ingress Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl get ingress -A` | All ingress resources and their hosts |
| `kubectl describe ingress <name> -n <ns>` | Routing rules, backends, and any errors |
| `kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50` | Ingress controller errors |
| `kubectl get ingressclass` | Available ingress classes and which is default |
Takeaways¶
- A Kubernetes Service is iptables rules, not a proxy. The ClusterIP is virtual — netfilter DNAT rewrites the destination to a real pod IP at the kernel level. No daemon listens on the ClusterIP.
- "Connection refused" + healthy pod = check endpoints first. A service with no endpoints means the label selector does not match the pod labels. This is the #1 cause of broken services in Kubernetes.
- DNS adds an invisible layer of complexity. The default `ndots:5` causes extra lookups for external names. CoreDNS is a single point of failure for all service discovery. If CoreDNS is down, nothing can find anything.
- Ingress is an API resource, not infrastructure. Without a running Ingress controller, Ingress resources do nothing. The controller (nginx, traefik, etc.) is the actual HTTP router.
- Gateway API separates who configures what. Infra team picks the controller (GatewayClass), platform team configures listeners and TLS (Gateway), app team defines routing (HTTPRoute). This is the future of Kubernetes ingress.
- Debug systematically, not randomly. The diagnostic ladder — service exists, endpoints exist, pods ready, ports match, DNS resolves, pod reachable directly, no NetworkPolicy blocking — eliminates one layer at a time instead of guessing.
Related Lessons¶
- Connection Refused — The full "connection refused" differential diagnosis across every layer of the stack (bare metal to Kubernetes)
- iptables: Following a Packet Through the Chains — Deep dive into how netfilter processes packets through PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING
- The Load Balancer Lied — When health checks pass but the app is broken: L4 vs L7 checks, connection draining, graceful shutdown
- What Happens When You `kubectl apply` — The end-to-end trace from YAML to running pod: API server, etcd, scheduler, kubelet, container runtime