
Portal | Level: L1: Foundations | Topics: TCP/IP, DNS, Linux Networking Tools | Domain: Networking

Networking Drills

Remember: The network debugging order: DNS -> Routing -> Firewall -> Application. Most connectivity issues are DNS (wrong name, stale cache, CoreDNS down) or firewall (security group, NetworkPolicy, iptables). Mnemonic: "DRFA". Rule out name resolution first, then the path, then packet filtering, and only then the application itself.

Gotcha: In Kubernetes, `curl: connection refused` and `curl: connection timed out` mean very different things. Refused = the target host is reachable but nothing is listening on that port (check whether the process is running). Timed out = packets are being dropped (check NetworkPolicy, security groups, or routing). Never debug the application when the problem is the network.

Under the hood: Kubernetes Services are implemented by kube-proxy writing iptables/IPVS rules on every node. When you curl svc-name:port, the kernel intercepts the packet at the ClusterIP and DNATs it to a randomly chosen backend pod IP. If the endpoints are empty (label mismatch), kube-proxy installs a REJECT rule instead, so the connection is refused rather than timing out.

Drill 1: DNS Resolution

Difficulty: Easy

Q: A pod can't resolve backend-svc. Walk through the DNS resolution chain in Kubernetes.

Answer
Pod → /etc/resolv.conf → CoreDNS (kube-dns service at 10.96.0.10)
  1. Try: backend-svc.same-namespace.svc.cluster.local
  2. Try: backend-svc.svc.cluster.local
  3. Try: backend-svc.cluster.local
  4. Try: backend-svc (upstream DNS)
# Debug DNS
kubectl exec -it test-pod -- nslookup backend-svc
kubectl exec -it test-pod -- cat /etc/resolv.conf

# Check CoreDNS is running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20

# Fully qualified name with a trailing dot skips the search chain (one lookup)
backend-svc.production.svc.cluster.local.
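
The search chain is driven by the pod's /etc/resolv.conf, which kubelet writes. A typical one looks like this (nameserver IP and namespace are illustrative):

```
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

`ndots:5` means any name with fewer than five dots is tried against each search suffix before being sent upstream as-is, which is why short names resolve but each lookup can cost several queries.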

Drill 2: Service Types

Difficulty: Easy

Q: Explain ClusterIP, NodePort, LoadBalancer, and ExternalName. When do you use each?

Answer

| Type | Access | Port | Use Case |
|------|--------|------|----------|
| **ClusterIP** | Internal only | ClusterIP:port | Default. Service-to-service. |
| **NodePort** | External via node IP | NodeIP:30000-32767 | Dev, bare metal without LB |
| **LoadBalancer** | External via cloud LB | LB IP:port | Production external access |
| **ExternalName** | DNS CNAME | N/A | Alias to an external service |
# ClusterIP (default)
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080

# ExternalName (DNS alias)
spec:
  type: ExternalName
  externalName: mydb.rds.amazonaws.com
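
For completeness, a NodePort spec matching the table row above (the explicit nodePort value is illustrative; one is auto-assigned from 30000-32767 if omitted):

```
# NodePort: reachable at <any-node-ip>:30080 from outside the cluster
spec:
  type: NodePort
  ports:
  - port: 80          # ClusterIP port (internal)
    targetPort: 8080  # container port
    nodePort: 30080   # optional; auto-assigned if omitted
```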
In production: use Ingress (L7 routing) in front of ClusterIP services, not LoadBalancer per service.

Drill 3: NetworkPolicy

Difficulty: Medium

Q: Write a NetworkPolicy that allows the api pods to receive traffic only from frontend pods on port 8080, and blocks everything else.

Answer
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
Key rules:
- If no NetworkPolicy selects a pod, all traffic is allowed (default allow)
- Once any policy selects a pod, everything not explicitly allowed is denied
- `policyTypes: [Ingress]` means only ingress is restricted; egress is still open
- Add `policyTypes: [Ingress, Egress]` with egress rules for full lockdown
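
The full-lockdown variant mentioned above, sketched as YAML (the db selector and port 5432 are hypothetical, just to show the shape):

```
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  # ingress rules as before; with Egress in policyTypes, any outbound
  # traffic not matched below is dropped
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: db        # hypothetical: only allow reaching the database pods
    ports:
    - protocol: TCP
      port: 5432
```

In practice you would also need a DNS egress rule (port 53), as in Drill 9, or the api pods can no longer resolve names.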

Drill 4: Debug Connectivity

Difficulty: Medium

Q: Pod A can't reach Pod B on port 8080. Walk through the debugging steps.

Answer
# 1. Verify Pod B is running and has an IP
kubectl get pod pod-b -o wide

# 2. Check the Service endpoints
kubectl get endpoints svc-b
# Empty endpoints = label selector doesn't match pods

# 3. Test from Pod A
kubectl exec pod-a -- curl -v pod-b-svc:8080
kubectl exec pod-a -- nslookup pod-b-svc

# 4. Test direct pod IP (bypass Service)
kubectl exec pod-a -- curl -v <pod-b-ip>:8080

# 5. Check NetworkPolicy
kubectl get networkpolicy -n <ns>
kubectl describe networkpolicy -n <ns>

# 6. Check if the port is actually listening in Pod B
kubectl exec pod-b -- ss -tlnp | grep 8080

# 7. Check container logs
kubectl logs pod-b --tail=20
Common causes:
- Wrong label selector on the Service (empty endpoints)
- NetworkPolicy blocking traffic
- App listening on localhost (127.0.0.1) instead of 0.0.0.0
- Wrong port (containerPort vs targetPort)

Drill 5: Ingress

Difficulty: Medium

Q: Write an Ingress that routes /api to api-svc:8080 and / to frontend-svc:80 with TLS.

Answer
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-svc
            port:
              number: 80
Path note: ingress-nginx matches the longest prefix first, so `/api` wins over `/` regardless of listing order; for controllers that honor list order, put more specific paths first.

Drill 6: TCP/IP Fundamentals

Difficulty: Easy

Q: A service is unreachable. You run curl -v and see "Connection refused" vs "Connection timed out." What's the difference?

Answer

**Connection refused** (RST):
- The host is reachable but nothing is listening on that port
- Got a TCP RST packet back
- Debug: check if the process is running, check the port

**Connection timed out**:
- Packets are being dropped (no response at all)
- Firewall, security group, NetworkPolicy, or bad routing
- Debug: check firewall rules, security groups, NetworkPolicy, routing tables

**Connection reset by peer**:
- Connection was established, then forcibly closed
- Backend crashed, overloaded, or TLS mismatch
- Debug: check backend logs, connection limits
# Quick checks
curl -v http://service:8080              # HTTP-level test
nc -zv service 8080                       # TCP-level test
traceroute service                        # Routing test
kubectl exec pod -- ss -tlnp             # What's listening
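
The refused case is easy to reproduce locally without a cluster. A minimal sketch, assuming nothing listens on localhost port 9 (the usual state):

```shell
# SYN to a closed port gets an immediate RST: curl fails fast with exit code 7
curl -s --max-time 2 http://127.0.0.1:9/
echo "exit code: $?"   # exit code: 7 ("couldn't connect")

# A timeout would instead hang for the full --max-time and exit with code 28,
# because dropped packets produce no reply at all.
```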

Drill 7: CIDR and Subnetting

Difficulty: Medium

Q: How many usable IPs are in a /24? A /16? What CIDR would you use for a subnet with 500 hosts?

Answer
/32 = 1 IP (single host)
/24 = 256 IPs, 254 usable (network + broadcast reserved)
/20 = 4,096 IPs
/16 = 65,536 IPs

For 500 hosts: a /23 (512 IPs, 510 usable) fits; use a /22 (1,024 IPs) for growth room.
Common Kubernetes CIDR ranges:
- Pod network: `10.244.0.0/16` (64K pods)
- Service network: `10.96.0.0/12` (1M services)
- Node network: `10.0.0.0/16` (VPC)

Quick math: `2^(32-prefix) = total IPs`
- `/24` → `2^8 = 256`
- `/20` → `2^12 = 4096`
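
The quick math can be checked with plain shell arithmetic (POSIX `$(( ))`, using a left shift for the power of two):

```shell
# Total IPs in a /prefix = 2^(32 - prefix); usable = total minus network + broadcast
prefix=24
total=$((1 << (32 - prefix)))
echo "/$prefix: $total total, $((total - 2)) usable"    # /24: 256 total, 254 usable

# Smallest prefix that fits N hosts: grow host bits until 2^bits - 2 >= N
hosts=500
bits=1
while [ $(( (1 << bits) - 2 )) -lt "$hosts" ]; do bits=$((bits + 1)); done
echo "$hosts hosts fit in a /$((32 - bits))"            # 500 hosts fit in a /23
```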

Drill 8: kube-proxy Modes

Difficulty: Hard

Q: What are the kube-proxy modes and how do they affect Service routing?

Answer

| Mode | How | Performance | Features |
|------|-----|-------------|----------|
| **iptables** (default) | Writes iptables rules per Service/endpoint | Good for <1000 services | Random load balancing |
| **IPVS** | Linux Virtual Server in kernel | Better for >1000 services | Round-robin, least-conn, etc. |
| **nftables** | Modern iptables replacement | Similar to iptables | K8s 1.29+ |
# Check current mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode

# IPVS enables more load balancing algorithms:
# rr (round-robin), lc (least connections), sh (source hashing)
iptables mode with 10K+ services causes slow rule updates and high CPU. Switch to IPVS for large clusters.
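
Switching modes is a KubeProxyConfiguration change; a fragment of the kube-proxy ConfigMap's config file (scheduler choice illustrative):

```
# KubeProxyConfiguration fields in the kube-proxy ConfigMap
mode: "ipvs"
ipvs:
  scheduler: "lc"   # least connections; default is "rr" (round-robin)
```

Nodes need the ip_vs kernel modules loaded; if IPVS is unavailable, kube-proxy falls back to iptables mode.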

Drill 9: Default Deny NetworkPolicy

Difficulty: Easy

Q: Write NetworkPolicies that block all ingress and egress for a namespace, then allow DNS egress back out.

Answer
# Block everything
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}    # Matches all pods
  policyTypes:
  - Ingress
  - Egress
---
# Allow DNS (required for service discovery)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:           # omitting "to" matches any destination; only these ports are allowed
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
Start with deny-all, then add specific allow rules per service. This is the "zero trust" approach.

Drill 10: Headless Service

Difficulty: Medium

Q: What is a headless Service? When and why would you use one?

Answer

A headless Service has `clusterIP: None`. Instead of load-balancing to one pod, DNS returns **all pod IPs**.
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
# Normal Service: returns one virtual IP
nslookup web-svc       # -> 10.96.1.100

# Headless Service: returns all pod IPs
nslookup postgres      # -> 10.244.1.5, 10.244.2.8, 10.244.3.12
Use cases:
- **StatefulSets**: each pod needs a stable DNS name (`postgres-0.postgres.ns.svc`)
- **Client-side load balancing**: the app picks which pod to connect to
- **Service discovery**: the client needs to know all endpoints
