Kubernetes Services: How Traffic Finds Your Pod
- lesson
- kubernetes-services
- kube-proxy
- iptables
- ipvs
- dns
- coredns
- ingress
- gateway-api
- load-balancing
- tls-termination
- debugging
Topics: Kubernetes services, kube-proxy, iptables, IPVS, DNS, CoreDNS, Ingress, Gateway API, load balancing, TLS termination, debugging
Level: L1–L2 (Foundations to Operations)
Time: 75–90 minutes
Prerequisites: None (networking fundamentals explained inline)
The Mission¶
It's 2pm on a Tuesday. You just deployed a new microservice — a payments API. The pod is running. The health check passes. But when you curl the service:
$ kubectl exec -it debug-pod -- curl http://payments.production.svc.cluster.local/health
curl: (7) Failed to connect to payments.production.svc.cluster.local port 80: Connection refused
The pod is healthy. The service exists. But traffic is not arriving. Somewhere between your curl command and the container, the packet is getting lost — or worse, actively rejected.
This lesson traces how traffic moves from a client inside the cluster all the way to a container's listening socket. You will learn every layer it passes through, how each layer can break, and how to diagnose each one. By the end, you will solve this mission — and know exactly where to look the next time traffic goes missing.
Part 1: What a Service Actually Is¶
Before we debug, we need to understand what we are debugging. A Kubernetes Service is not a load balancer. It is not a proxy process. It is a set of iptables rules (or IPVS entries) programmed into every node in your cluster.
When you create a Service, here is what actually happens — step by step:
- The API server stores the Service object in etcd
- A controller creates an Endpoints (or EndpointSlice) object listing the IPs of all pods matching the Service's label selector
- kube-proxy on every node detects the new Service and Endpoints
- kube-proxy programs iptables rules (or IPVS entries) that intercept packets destined for the Service's ClusterIP and rewrite them to a backend pod IP
- CoreDNS detects the new Service and creates a DNS A record: `payments.production.svc.cluster.local` pointing to the ClusterIP
No daemon is listening on the ClusterIP. No process is proxying traffic. The ClusterIP is a virtual IP — it exists only as a destination address in iptables rules.
Under the Hood: The ClusterIP is allocated from a reserved CIDR range (typically `10.96.0.0/12` or `10.43.0.0/16`, depending on your distribution). This range is never routed — no network interface has an address in it. Packets destined for a ClusterIP are intercepted by netfilter DNAT rules before they leave the sending node. The packet's destination is rewritten from the ClusterIP to a real pod IP, and only then does it get routed across the cluster network.

Mental Model: Think of a ClusterIP like a phone extension number at a big company. The extension number is not a real phone line — the PBX (iptables) intercepts the call and redirects it to a real desk phone (pod IP). If the PBX goes down, the extension stops working even though the desk phones are fine.
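All of that machinery is driven by a small API object. For reference, a minimal manifest for the payments Service (a sketch using the lesson's names; port 80 fronting containerPort 8000 matches the endpoints shown later in Part 5):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments
  namespace: production
spec:
  selector:
    app: payments          # must match the pod labels exactly
  ports:
    - port: 80             # the port the ClusterIP answers on
      targetPort: 8000     # the port the container actually listens on
```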
Part 2: The DNS Resolution Chain¶
When your curl command runs inside a pod, it doesn't start by sending a TCP packet to the
ClusterIP. It starts by looking up the name payments.production.svc.cluster.local.
Here is the full chain:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐
│ Your Pod │───▶│ CoreDNS │───▶│ Service DB │───▶│ ClusterIP │
│ /etc/resolv │ │ (kube-dns │ │ (watches │ │ returned │
│ .conf │ │ service) │ │ API for │ │ 10.96.x.y │
│ nameserver │ │ 10.96.0.10 │ │ changes) │ │ │
│ 10.96.0.10 │ │ │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘ └────────────┘
Let's look at each step.
Step 1: The pod reads /etc/resolv.conf¶
Every pod gets a /etc/resolv.conf injected by the kubelet:
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Three things matter here:
| Field | What it does |
|---|---|
| `nameserver 10.96.0.10` | All DNS queries go to CoreDNS (this is the ClusterIP of the kube-dns service) |
| `search ...` | Short names get each search domain appended before trying the bare name |
| `ndots:5` | If the name has fewer than 5 dots, try search domains first |
Step 2: The ndots trap¶
Your pod resolves payments. That name has zero dots, which is less than 5. So the resolver
tries, in order:
- `payments.production.svc.cluster.local` — match (CoreDNS returns 10.96.45.12)
That worked on the first try. But what if you were resolving an external name like
api.stripe.com? That has 2 dots — still less than 5:
- `api.stripe.com.production.svc.cluster.local` — NXDOMAIN
- `api.stripe.com.svc.cluster.local` — NXDOMAIN
- `api.stripe.com.cluster.local` — NXDOMAIN
- `api.stripe.com.` — success
Four DNS queries instead of one. At scale — thousands of pods making external calls — this 4x amplification can overwhelm CoreDNS.
Gotcha: The default `ndots:5` is tuned for in-cluster service discovery where short names are common. If your pods make heavy external DNS queries, either lower ndots to 2 (`dnsConfig.options`) or use fully qualified names with a trailing dot: `api.stripe.com.` — the trailing dot tells the resolver "this is already a complete name, don't search."
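The search-list behavior is easy to get wrong from memory, so here is a tiny Python sketch of the expansion logic (a simplification of glibc's resolver, assuming the resolv.conf shown earlier):

```python
# Minimal sketch of resolver search-list expansion. Ignores timeouts,
# rotation, and IPv6 specifics; search domains match the lesson's pod.
def query_order(name,
                search=("production.svc.cluster.local",
                        "svc.cluster.local",
                        "cluster.local"),
                ndots=5):
    """Return the DNS names tried, in order, for one application lookup."""
    if name.endswith("."):              # trailing dot: already fully qualified
        return [name]
    searched = [f"{name}.{d}" for d in search]
    if name.count(".") >= ndots:        # enough dots: try the bare name first
        return [name + "."] + searched
    return searched + [name + "."]      # otherwise: search domains first

print(query_order("payments"))          # first candidate is the in-cluster match
print(query_order("api.stripe.com"))    # external name: 4 queries
print(query_order("api.stripe.com."))   # trailing dot: exactly 1 query
```

Running it shows why `payments` costs one useful query while `api.stripe.com` costs four, and why the trailing dot collapses the cost back to one.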
Step 3: CoreDNS returns the ClusterIP¶
CoreDNS watches the Kubernetes API for Service objects. When it sees
payments.production.svc.cluster.local, it returns the ClusterIP — not the pod IPs (unless
the service is headless).
# Verify DNS resolution from a debug pod
kubectl exec -it debug-pod -- nslookup payments.production.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: payments.production.svc.cluster.local
Address: 10.96.45.12
Trivia: CoreDNS replaced kube-dns as the default cluster DNS in Kubernetes 1.13 (December 2018). kube-dns was actually three containers in a pod — dnsmasq for caching, a Go sidecar for health checks, and a Go binary for the Kubernetes plugin. CoreDNS collapsed all of that into a single Go binary with a plugin architecture. The label `k8s-app=kube-dns` persisted for backward compatibility — even though the pods are running CoreDNS. You can verify with `kubectl get pods -n kube-system -l k8s-app=kube-dns`.
Flashcard Check #1: DNS and Services¶
Cover the answers. Test yourself.
| Question | Answer |
|---|---|
| What does `ndots:5` mean? | If the hostname has fewer than 5 dots, try appending each search domain before querying the bare name |
| Where does a pod's DNS config come from? | The kubelet injects `/etc/resolv.conf` at pod creation time |
| What IP does the `nameserver` line in a pod point to? | The ClusterIP of the kube-dns service (CoreDNS) |
| What does CoreDNS return for a regular (non-headless) Service? | The Service's ClusterIP |
| What label do CoreDNS pods use, and why is it confusing? | `k8s-app=kube-dns` — the label kept its old name after CoreDNS replaced kube-dns for backward compatibility |
Part 3: How kube-proxy Programs iptables¶
Now your pod has the ClusterIP (10.96.45.12). It sends a TCP SYN packet to
10.96.45.12:80. What happens next is pure netfilter magic.
kube-proxy runs as a DaemonSet on every node. When it sees a new Service with endpoints, it writes iptables rules that do DNAT — destination network address translation. The packet's destination is rewritten from the Service ClusterIP to a real pod IP, before the packet leaves the node.
Here is a simplified view of what those rules look like. On a real node, run:
# Dump all kube-proxy rules (there will be thousands)
iptables-save | grep -c "KUBE-"
# Output: 4237
# Find the rules for a specific service (by ClusterIP)
iptables-save | grep "10.96.45.12"
Here is the actual chain of rules for a Service with two backend pods:
# Step 1: Packet arrives at PREROUTING or OUTPUT chain
-A KUBE-SERVICES -d 10.96.45.12/32 -p tcp -m tcp --dport 80 \
-j KUBE-SVC-XYZABC123
# Step 2: KUBE-SVC chain does random load balancing
-A KUBE-SVC-XYZABC123 -m statistic --mode random --probability 0.50000 \
-j KUBE-SEP-POD1HASH
-A KUBE-SVC-XYZABC123 \
-j KUBE-SEP-POD2HASH
# Step 3: KUBE-SEP chains do the actual DNAT
-A KUBE-SEP-POD1HASH -p tcp -j DNAT --to-destination 10.244.1.15:8000
-A KUBE-SEP-POD2HASH -p tcp -j DNAT --to-destination 10.244.2.23:8000
Reading those rules line by line:
| Rule | What it does |
|---|---|
| `KUBE-SERVICES` | Catches all packets going to any ClusterIP and jumps to the right service chain |
| `KUBE-SVC-*` | Picks a backend pod randomly. With 2 pods: 50/50. With 3: first gets 33%, second gets 50% of remaining, third gets the rest |
| `KUBE-SEP-*` | Rewrites the destination IP from 10.96.45.12 → 10.244.1.15 (a real pod IP) |
After DNAT, the packet has a real destination IP and gets routed through the CNI network to the target pod's node.
Under the Hood: The `--probability` values in iptables create weighted random selection. For 3 backends: the first rule has probability 0.33333 (1/3), the second has 0.50000 (1/2 of remaining), and the third has no probability (catches everything left). This is stateless — there is no least-connections or round-robin. Every packet is an independent coin flip. The conntrack module ensures that once a TCP connection is established, all subsequent packets in that connection go to the same backend.

Remember: kube-proxy does not proxy traffic. Despite its name, in iptables mode it only programs rules and gets out of the way. The kernel handles everything at wire speed. In the early days of Kubernetes (before 1.2), kube-proxy actually was a userspace proxy — packets went from kernel to kube-proxy process and back. That was slow, so they moved to iptables mode and kept the name.
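To see why the cascading probabilities come out even, here is a small Python simulation of the rule walk (a sketch of the selection scheme, not kube-proxy's actual code):

```python
# Simulate kube-proxy's cascading --probability rules for one Service.
# Rule i matches 1/(backends remaining) of the traffic that reaches it;
# the last rule has no probability match and catches everything left.
import random

def pick_backend(backends, rng):
    n = len(backends)
    for i, backend in enumerate(backends):
        remaining = n - i
        if remaining == 1 or rng.random() < 1.0 / remaining:
            return backend

rng = random.Random(42)                      # fixed seed for reproducibility
counts = {"pod1": 0, "pod2": 0, "pod3": 0}
for _ in range(30_000):
    counts[pick_backend(list(counts), rng)] += 1
print(counts)                                # each backend lands near 10,000
```

The cascade of conditional probabilities (1/3, then 1/2 of the remainder, then the rest) is exactly a uniform split, which is why each pod ends up with roughly a third of new connections.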
Part 4: IPVS Mode — The Other Path¶
If you have hundreds or thousands of services, iptables mode becomes a problem. Every Service adds rules that are evaluated linearly. At 5,000+ Services, the rule sync itself can take seconds, during which new connections may fail.
IPVS (IP Virtual Server) is the alternative. It uses kernel hash tables for O(1) lookup regardless of how many services exist.
# Check which mode kube-proxy is using
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# mode: "" ← empty string means iptables (default)
# mode: "ipvs" ← IPVS mode
# In IPVS mode, inspect virtual servers
ipvsadm -Ln
# IP Virtual Server version 1.2.1 (size=4096)
# Prot LocalAddress:Port Scheduler Flags
# -> RemoteAddress:Port Forward Weight ActiveConn InActConn
# TCP 10.96.45.12:80 rr
# -> 10.244.1.15:8000 Masq 1 3 0
# -> 10.244.2.23:8000 Masq 1 2 0
| Feature | iptables mode | IPVS mode |
|---|---|---|
| Lookup complexity | O(n) — linear scan | O(1) — hash table |
| Load balancing | Random only | Round-robin, least-connections, weighted, source hash |
| Connection draining | No | Yes |
| Performance at scale | Degrades past ~5,000 services | Constant |
| Debugging | `iptables-save \| grep` | `ipvsadm -Ln` |
| Kernel modules needed | Always present | `ip_vs`, `ip_vs_rr`, `ip_vs_wrr`, `ip_vs_sh`, `nf_conntrack` |
Gotcha: Switching from iptables to IPVS mode is not a live toggle. You need to edit the kube-proxy ConfigMap, ensure the IPVS kernel modules are loaded on every node, then restart kube-proxy. The old iptables rules linger until you clean them up. On managed clusters (EKS, GKE), you may not have the option — check your provider's docs.
Part 5: Solving the Mission¶
Let's go back to our broken payments service. Time to work the diagnostic ladder.
Step 1: Does the Service have endpoints?¶
kubectl get endpoints payments -n production
# NAME       ENDPOINTS   AGE
# payments   <none>      1h

`<none>`. The service has no endpoints. This means zero pods match the Service's label selector. This is the #1 cause of "connection refused" on a Kubernetes service.
Mental Model: No endpoints = no DNAT rules = nowhere to send the packet. For an endpoint-less Service, kube-proxy installs a REJECT rule instead, so the kernel actively refuses the connection. Nothing is listening on that virtual IP (nothing ever is — it is virtual). Connection refused.
Step 2: Compare selectors and labels¶
# What is the Service selecting?
kubectl get svc payments -n production -o jsonpath='{.spec.selector}'
# {"app":"payments-api"}
# What labels do the pods have?
kubectl get pods -n production --show-labels | grep payment
# payments-7d8f5b-x4k2j 1/1 Running app=payments,version=v2
Found it. The Service selector says app: payments-api. The pod label says app: payments.
One word difference. No endpoints. No traffic.
Step 3: Fix it¶
# Option A: Fix the Service selector
kubectl patch svc payments -n production -p '{"spec":{"selector":{"app":"payments"}}}'
# Option B: Fix the pod labels (via the Deployment)
kubectl patch deployment payments -n production \
-p '{"spec":{"template":{"metadata":{"labels":{"app":"payments-api"}}}}}'
After fixing:
kubectl get endpoints payments -n production
# NAME ENDPOINTS AGE
# payments 10.244.1.15:8000,10.244.2.23:8000 15m
kubectl exec -it debug-pod -- curl http://payments.production.svc.cluster.local/health
# {"status": "ok"}
War Story: At a mid-size fintech company, a deployment pipeline generated Service manifests from a template. The template used `app: {{ .name }}` for the selector but `app: {{ .name }}-svc` for the Service metadata name. For months, nobody noticed because the labels happened to match. Then someone renamed a service, the template generated mismatched selectors, and the payment processing service silently dropped 100% of traffic for 23 minutes during business hours. The fix was a CI check that runs `kubectl get endpoints` after every deployment and fails if any service has zero endpoints.
Flashcard Check #2: kube-proxy and Endpoints¶
| Question | Answer |
|---|---|
| What does kube-proxy actually do in iptables mode? | Programs iptables DNAT rules on every node to redirect ClusterIP traffic to pod IPs. It does not proxy packets. |
| What does `kubectl get endpoints <svc>` showing `<none>` mean? | The Service's label selector doesn't match any running, ready pods |
| How does iptables mode select a backend when there are 3 pods? | Random selection using `--probability` rules: 1/3 chance for first, 1/2 of remaining for second, rest for third |
| Name two advantages of IPVS over iptables mode | O(1) lookup (vs O(n)), multiple load-balancing algorithms (round-robin, least-connections, etc.) |
| What happens when a packet reaches a ClusterIP but the Service has no endpoints? | No DNAT rule matches, the packet hits the virtual IP where nothing listens, and the kernel refuses the connection (connection refused) |
Part 6: Service Types — Four Ways to Expose¶
Not all services are internal. Here is how the four types build on each other:
┌───────────────────┐
│ ExternalName │ DNS alias, no proxying
└───────────────────┘
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ ClusterIP │ ◀─│ NodePort │ ◀─│ LoadBalancer │
│ Internal VIP │ │ ClusterIP + │ │ NodePort + │
│ iptables/ │ │ port on every │ │ cloud LB │
│ IPVS rules │ │ node │ │ provisioned │
└───────────────┘ └───────────────┘ └───────────────┘
Each type includes everything to its left. A LoadBalancer service also has a NodePort and a ClusterIP.
ClusterIP — internal only¶
The default. Everything we have covered so far. Accessible only from inside the cluster.
NodePort — every node listens¶
Opens a port (30000–32767) on every node in the cluster. Traffic arriving at
<any-node-ip>:30080 gets forwarded to the Service, which then forwards to a pod.
kubectl get svc payments -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# payments NodePort 10.96.45.12 <none> 80:30080/TCP 1h
Gotcha: NodePort opens the port on all nodes, even nodes that don't have a matching pod. If `externalTrafficPolicy` is `Cluster` (the default), the node will forward traffic to a pod on another node — adding a network hop and losing the original source IP via SNAT. If you set `externalTrafficPolicy: Local`, the node only forwards to local pods — but if there are no local pods, traffic is dropped. Neither option is perfect.
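The policy is a single field on the Service spec. A fragment, assuming you prefer source-IP preservation over even spreading:

```yaml
spec:
  type: NodePort
  externalTrafficPolicy: Local   # keep the client source IP; only nodes with local pods serve traffic
```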
LoadBalancer — cloud provider does the work¶
Asks the cloud provider (AWS, GCP, Azure) to provision an actual load balancer. The LB gets a public IP and forwards traffic to the NodePorts.
kubectl get svc payments -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# payments LoadBalancer 10.96.45.12 203.0.113.42 80:30080/TCP 1h
On bare metal with no cloud provider, EXTERNAL-IP stays <pending> forever. You need
MetalLB or similar to assign IPs from a local pool.
ExternalName — just a DNS trick¶
Maps a Service name to an external DNS name via CNAME. No iptables rules, no proxying.
apiVersion: v1
kind: Service
metadata:
name: external-db
spec:
type: ExternalName
externalName: mydb.us-east-1.rds.amazonaws.com
Pods that resolve external-db.default.svc.cluster.local get a CNAME to the RDS endpoint.
Useful for migrating from in-cluster to external databases without changing application code.
Headless Services — DNS returns pod IPs¶
Set clusterIP: None and DNS skips the virtual IP entirely. Instead, it returns A records
for every pod matching the selector.
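Declaring one is a single-field change. A minimal sketch matching the `db-headless` lookup below (the `app: db` selector and port 5432 are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None        # headless: no VIP, DNS returns the pod IPs directly
  selector:
    app: db              # assumed label on the database pods
  ports:
    - port: 5432
```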
kubectl exec -it debug-pod -- nslookup db-headless.default.svc.cluster.local
# Name: db-headless.default.svc.cluster.local
# Address: 10.244.1.5
# Address: 10.244.2.8
# Address: 10.244.3.12
Headless services are essential for StatefulSets. Each pod gets a stable DNS name:
postgres-0.db-headless.default.svc.cluster.local. The client connects to a specific
replica, not a random one.
Remember: Regular services: DNS returns one VIP, iptables picks the pod. Headless services: DNS returns all pod IPs, the client picks.
Part 7: Ingress — HTTP Routing into the Cluster¶
Services handle L4 (TCP/UDP). Ingress handles L7 (HTTP/HTTPS). An Ingress resource defines routing rules — but it does nothing by itself. You need an Ingress controller to implement them.
The Architecture¶
Internet
│
▼
┌─────────────────┐
│ Cloud LB / IP │ Single entry point
└────────┬────────┘
│
▼
┌─────────────────┐
│ Ingress │ Nginx/Traefik/HAProxy pod(s)
│ Controller │ Watches Ingress resources via API
│ (nginx pod) │ Generates nginx.conf dynamically
└────────┬────────┘
│
┌─────────┼──────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Svc A │ │ Svc B │ │ Svc C │ Backend Services
│(api) │ │(web) │ │(admin) │
└───┬────┘ └───┬────┘ └───┬────┘
▼ ▼ ▼
Pods Pods Pods
The Ingress controller is just pods running nginx (or traefik, or HAProxy). It watches the Kubernetes API for Ingress resources and regenerates its routing config whenever something changes. It is exposed via a single LoadBalancer or NodePort service — one entry point for all your HTTP traffic.
Path-based vs Host-based Routing¶
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
namespace: production
spec:
ingressClassName: nginx
rules:
# Host-based: different domains → different services
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-server
port:
number: 80
# Path-based: same domain, different paths → different services
- host: www.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-server
port:
number: 80
- path: /
pathType: Prefix
backend:
service:
name: web-frontend
port:
number: 80
Gotcha: Path type `Prefix` with path `/api` also matches `/api-docs`, `/api2`, and `/api-anything`. This is character-level prefix matching in nginx-ingress, not path-segment matching. If you want `/api` to match only `/api` and `/api/*`, use `/api/` with a trailing slash, or use `pathType: Exact` for the base path. Gateway API's `PathPrefix` is segment-aware by specification — one of the reasons it was created.
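The difference is easy to demonstrate in a few lines of Python (a sketch of the two matching styles, ignoring percent-encoding and query strings):

```python
def char_prefix(path, prefix):
    """nginx-style: plain character-level prefix comparison."""
    return path.startswith(prefix)

def segment_prefix(path, prefix):
    """Gateway API style: the prefix must match whole path segments."""
    p = path.rstrip("/").split("/")
    q = prefix.rstrip("/").split("/")
    return p[:len(q)] == q

for path in ("/api", "/api/users", "/api-docs"):
    print(path, char_prefix(path, "/api"), segment_prefix(path, "/api"))
# /api-docs matches as a character prefix but not as a path segment
```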
TLS Termination¶
The Ingress controller can terminate TLS so your backend services don't need to handle certificates:
spec:
tls:
- hosts:
- api.example.com
secretName: api-tls-secret # Must be in the same namespace
rules:
- host: api.example.com
# ...
# Verify the TLS secret exists and is valid
kubectl get secret api-tls-secret -n production -o jsonpath='{.type}'
# Should output: kubernetes.io/tls
# Check the certificate's expiry and SANs
kubectl get secret api-tls-secret -n production \
-o jsonpath='{.data.tls\.crt}' | base64 -d | \
openssl x509 -text -noout | grep -A1 "Not After\|Subject Alternative"
Interview Bridge: "Explain TLS termination" is a common interview question. The answer: the TLS handshake happens at the Ingress controller (or load balancer), which decrypts the traffic and forwards plaintext HTTP to the backend. This offloads CPU-intensive crypto from your application pods and lets you manage certificates in one place. The tradeoff: traffic between the ingress controller and your pods is unencrypted inside the cluster network. For zero-trust environments, you add mTLS between services (service mesh) or use the `backend-protocol: "HTTPS"` annotation to re-encrypt.
Popular Controllers¶
| Controller | Strengths | Best for |
|---|---|---|
| ingress-nginx | Most widely deployed, rich annotation set | General purpose, battle-tested |
| Traefik | Auto Let's Encrypt, middleware chains, built-in dashboard | Smaller clusters, edge deployments |
| HAProxy Ingress | Raw performance, TCP/UDP support, fine-grained rate limiting | High-throughput, non-HTTP workloads |
| AWS Load Balancer Controller | Provisions native ALBs, no in-cluster proxy | AWS-native, direct ALB-to-pod routing |
Part 8: Gateway API — The Future¶
Gateway API reached GA (v1.0) in October 2023. It does not replace Ingress — both coexist. But new features are only being added to Gateway API.
The key difference: separation of concerns.
GatewayClass → "What implementation?" (infra team picks the controller)
Gateway → "What listeners?" (platform team configures ports, TLS)
HTTPRoute → "What routing rules?" (app team defines paths, backends)
Ingress mashes all three roles into a single resource with annotations. Gateway API splits them across resources with RBAC boundaries. The app developer can define routing without needing permission to change TLS certificates or listener ports.
# App developer only needs permission to create HTTPRoutes
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: payments-route
namespace: production
spec:
parentRefs:
- name: production-gateway
namespace: infra
hostnames:
- payments.example.com
rules:
- matches:
- path:
type: PathPrefix
value: /v2
backendRefs:
- name: payments-v2
port: 80
weight: 90
- name: payments-v2-canary
port: 80
weight: 10
Notice the weight field — traffic splitting for canary deployments is built into the
spec, not hacked via controller-specific annotations.
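The weighted selection behaves like a proportional random choice. A small Python sketch using the 90/10 weights from the example (illustrative only, not any controller's implementation):

```python
# Weight-proportional backend selection, as HTTPRoute backendRefs define it.
import random

rng = random.Random(7)                       # fixed seed for reproducibility
names = ["payments-v2", "payments-v2-canary"]
weights = [90, 10]
counts = dict.fromkeys(names, 0)
for _ in range(10_000):
    counts[rng.choices(names, weights=weights)[0]] += 1
print(counts)                                # roughly a 9:1 split
```

Dialing the canary's weight up over successive deploys shifts traffic gradually, with no annotation changes and no second Ingress resource.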
Flashcard Check #3: Ingress and Gateway API¶
| Question | Answer |
|---|---|
| What does an Ingress controller actually run as? | Pods (usually a Deployment) running nginx/traefik/HAProxy that watch the API for Ingress resources |
| What happens if you create an Ingress resource but no Ingress controller is installed? | Nothing. The Ingress resource sits inert in etcd. No routing happens. |
| What is the difference between path-based and host-based routing? | Host-based: different domains go to different services. Path-based: same domain, different URL paths go to different services. |
| How does Gateway API separate concerns vs Ingress? | Three resources (GatewayClass, Gateway, HTTPRoute) with different RBAC — infra, platform, and app teams each own their layer |
| How does TLS termination work at an Ingress controller? | The controller handles the TLS handshake and decryption, then forwards plaintext HTTP to backend pods |
Part 9: EndpointSlices and Session Affinity¶
Two features worth knowing about that affect how traffic is distributed.
EndpointSlices¶
The original Endpoints resource stored all pod IPs in a single object. For a service with 1,000 pods, that is one large object that changes every time a pod scales up, down, or restarts. Every change triggers a full object sync to every node.
EndpointSlices split this into chunks of 100 (by default). Only the affected slice gets updated and synced.
# See EndpointSlices for a service
kubectl get endpointslices -l kubernetes.io/service-name=payments -n production
# NAME ADDRESSTYPE PORTS ENDPOINTS AGE
# payments-abc12 IPv4 8000 10.244.1.15,... 1h
# payments-def34 IPv4 8000 10.244.2.23,... 1h
You rarely interact with EndpointSlices directly. Just know they exist and that they replaced the old Endpoints resource for scalability.
Session Affinity¶
By default, each request can hit any backend pod. If your app stores session state in memory (not recommended, but common), you can pin clients:
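A sketch of ClientIP affinity on the Service (10800 seconds, 3 hours, is the API default):

```yaml
spec:
  sessionAffinity: ClientIP        # pin each client source IP to one backend
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # affinity expires after 3 idle hours
```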
Gotcha: Kubernetes session affinity is source-IP based. If all your traffic comes through a reverse proxy or NAT gateway, all clients share one source IP, and all traffic goes to one pod. For proper session affinity behind a proxy, use cookie-based affinity at the Ingress level:
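With ingress-nginx, cookie affinity is enabled per Ingress via annotations. A sketch (the cookie name and max-age are arbitrary choices):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"   # 48 hours
```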
Part 10: The Full Debugging Playbook¶
When traffic is not reaching your pod, work this ladder. Each step eliminates one layer.
Step 1: Does the Service exist? kubectl get svc <name> -n <ns>
Step 2: Does it have endpoints? kubectl get endpoints <name> -n <ns>
Step 3: Are the pods Ready? kubectl get pods -n <ns> -l <selector>
Step 4: Does the port match? Compare targetPort with container listen port
Step 5: Is DNS resolving? kubectl exec debug -- nslookup <svc>.<ns>
Step 6: Can you reach the pod directly? kubectl exec debug -- curl <pod-ip>:<port>
Step 7: Is a NetworkPolicy blocking? kubectl get networkpolicy -n <ns>
Step 8: Is kube-proxy running? kubectl get pods -n kube-system -l k8s-app=kube-proxy
Step-by-step for our mission¶
# Step 1: Service exists
kubectl get svc payments -n production
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# payments ClusterIP 10.96.45.12 <none> 80/TCP 1h
# ✓ Exists
# Step 2: Endpoints
kubectl get endpoints payments -n production
# NAME ENDPOINTS AGE
# payments <none> 1h
# ✗ No endpoints! This is the problem.
# Step 3: Why no endpoints? Check selector vs labels
kubectl get svc payments -n production -o jsonpath='{.spec.selector}'
# {"app":"payments-api"}
kubectl get pods -n production --show-labels | grep pay
# payments-7d8f5b-x4k2j 1/1 Running app=payments
# ← Mismatch: selector wants "payments-api", pod has "payments"
# Step 4: Also check targetPort while we're here
kubectl get svc payments -n production -o jsonpath='{.spec.ports[0].targetPort}'
# 8000
kubectl exec -it payments-7d8f5b-x4k2j -n production -- ss -tlnp
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# LISTEN 0 128 0.0.0.0:8000 0.0.0.0:*
# ✓ Pod is listening on 8000, targetPort is 8000. Match.
# Step 5: Fix the selector
kubectl patch svc payments -n production -p '{"spec":{"selector":{"app":"payments"}}}'
# Step 6: Verify endpoints appeared
kubectl get endpoints payments -n production
# NAME ENDPOINTS AGE
# payments 10.244.1.15:8000,10.244.2.23:8000 1h
# ✓ Endpoints populated
# Step 7: Verify connectivity
kubectl exec -it debug-pod -- curl http://payments.production.svc.cluster.local/health
# {"status": "ok"}
War Story: A SaaS company migrated from Docker Compose to Kubernetes. Their CI pipeline deployed 40 services. Service number 37 — a notification worker — had `nodePort: 30037` hardcoded in every environment, including production. During a cluster expansion, a new monitoring agent was configured to use port 30037 on the host. The NodePort and the host port collided silently — kube-proxy's iptables rules won the race condition on some nodes but not others. Traffic to the notification service was intermittently routed to the monitoring agent. It took three days to diagnose because the failure was non-deterministic and the monitoring agent happened to return HTTP 200 on its health endpoint. The fix: stop hardcoding NodePorts. Let Kubernetes allocate them, or better yet, use an Ingress controller.
Exercises¶
Exercise 1: Find the Broken Service (2 minutes)¶
A service called cache-redis in namespace data returns connection refused. The pod
is running and healthy. Describe the exact sequence of commands you would run, in order,
to diagnose the problem.
Solution
# 1. Check endpoints
kubectl get endpoints cache-redis -n data
# If <none>, check selectors:
# 2. Compare selector and labels
kubectl get svc cache-redis -n data -o jsonpath='{.spec.selector}'
kubectl get pods -n data --show-labels | grep redis
# 3. If selectors match but endpoints still empty, check readiness
kubectl get pods -n data -l app=cache-redis
# Look for pods not in Ready state
# 4. If endpoints exist, check port match
kubectl get svc cache-redis -n data -o jsonpath='{.spec.ports[0].targetPort}'
kubectl exec -it <redis-pod> -n data -- ss -tlnp
Exercise 2: Trace the DNS Chain (5 minutes)¶
From inside a debug pod, determine:
1. What nameserver your pod is using
2. What search domains are configured
3. How many DNS queries are generated when you resolve api.stripe.com
4. How to make it generate only one query
Hints
- Check `/etc/resolv.conf` inside the pod
- Count the search domains plus one (for the bare query)
- A trailing dot on a hostname means "this is fully qualified"

Solution
kubectl exec -it debug-pod -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
# api.stripe.com has 2 dots (< 5), so:
# Query 1: api.stripe.com.default.svc.cluster.local → NXDOMAIN
# Query 2: api.stripe.com.svc.cluster.local → NXDOMAIN
# Query 3: api.stripe.com.cluster.local → NXDOMAIN
# Query 4: api.stripe.com. → success
# Total: 4 queries
# To make it one query: use a trailing dot
kubectl exec -it debug-pod -- nslookup api.stripe.com.
# Or lower ndots in the pod spec:
# dnsConfig:
# options:
# - name: ndots
# value: "2"
Exercise 3: Design the Ingress (10 minutes)¶
You have three services: api (port 80), web (port 3000), and admin (port 8080).
Write an Ingress resource that:
- Routes api.example.com/* to api
- Routes www.example.com/* to web
- Routes www.example.com/admin/* to admin
- Terminates TLS using a secret called wildcard-tls
- Forces HTTPS redirect
Solution
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
namespace: production
annotations:
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.example.com
- www.example.com
secretName: wildcard-tls
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api
port:
number: 80
- host: www.example.com
http:
paths:
- path: /admin
pathType: Prefix
backend:
service:
name: admin
port:
number: 8080
- path: /
pathType: Prefix
backend:
service:
name: web
port:
number: 3000
Cheat Sheet¶
Service Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl get endpoints <svc> -n <ns>` | Which pod IPs the service routes to (empty = broken selector) |
| `kubectl get svc <svc> -n <ns> -o jsonpath='{.spec.selector}'` | What labels the service is looking for |
| `kubectl get pods -n <ns> --show-labels` | What labels the pods actually have |
| `kubectl exec debug -- nslookup <svc>.<ns>.svc.cluster.local` | Whether DNS resolves the service name |
| `kubectl exec debug -- curl -v <pod-ip>:<port>` | Whether the pod is reachable directly (bypasses the service layer) |
| `kubectl exec <pod> -- ss -tlnp` | What ports the container is listening on |
kube-proxy Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl get cm kube-proxy -n kube-system -o yaml \| grep mode` | iptables or IPVS mode |
| `iptables-save \| grep <cluster-ip>` | iptables rules for a specific service (run on the node) |
| `iptables-save \| grep -c KUBE-` | Total number of kube-proxy rules |
| `ipvsadm -Ln` | IPVS virtual servers and backends (run on the node, IPVS mode only) |
DNS Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl exec debug -- cat /etc/resolv.conf` | Pod's nameserver, search domains, ndots value |
| `kubectl get pods -n kube-system -l k8s-app=kube-dns` | Whether CoreDNS is running |
| `kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20` | CoreDNS errors |
| `kubectl exec debug -- nslookup kubernetes.default` | Basic DNS health test |
Ingress Debugging¶
| Command | What it tells you |
|---|---|
| `kubectl get ingress -A` | All ingress resources and their hosts |
| `kubectl describe ingress <name> -n <ns>` | Routing rules, backends, and any errors |
| `kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50` | Ingress controller errors |
| `kubectl get ingressclass` | Available ingress classes and which is default |
Takeaways¶
- A Kubernetes Service is iptables rules, not a proxy. The ClusterIP is virtual — netfilter DNAT rewrites the destination to a real pod IP at the kernel level. No daemon listens on the ClusterIP.
- "Connection refused" + healthy pod = check endpoints first. A service with no endpoints means the label selector does not match the pod labels. This is the #1 cause of broken services in Kubernetes.
- DNS adds an invisible layer of complexity. The default `ndots:5` causes extra lookups for external names. CoreDNS is a single point of failure for all service discovery. If CoreDNS is down, nothing can find anything.
- Ingress is an API resource, not infrastructure. Without a running Ingress controller, Ingress resources do nothing. The controller (nginx, traefik, etc.) is the actual HTTP router.
- Gateway API separates who configures what. Infra team picks the controller (GatewayClass), platform team configures listeners and TLS (Gateway), app team defines routing (HTTPRoute). This is the future of Kubernetes ingress.
- Debug systematically, not randomly. The diagnostic ladder — service exists, endpoints exist, pods ready, ports match, DNS resolves, pod reachable directly, no NetworkPolicy blocking — eliminates one layer at a time instead of guessing.
Related Lessons¶
- Connection Refused — The full "connection refused" differential diagnosis across every layer of the stack (bare metal to Kubernetes)
- iptables: Following a Packet Through the Chains — Deep dive into how netfilter processes packets through PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING
- The Load Balancer Lied — When health checks pass but the app is broken: L4 vs L7 checks, connection draining, graceful shutdown
- What Happens When You `kubectl apply` — The end-to-end trace from YAML to running pod: API server, etcd, scheduler, kubelet, container runtime