Kubernetes Networking — Trivia & Interesting Facts

Surprising, historical, and little-known facts about Kubernetes networking.


Kubernetes networking has exactly three fundamental rules

The Kubernetes networking model has three non-negotiable requirements: (1) every pod gets its own IP address, (2) any pod can communicate with any other pod without NAT, (3) the IP a pod sees for itself is the same IP others see. These three rules seem simple but were revolutionary — Docker's default networking used NAT and port mapping, which made service discovery painful. Kubernetes delegated the implementation to CNI plugins while enforcing the model.


kube-proxy does not actually proxy anything in most clusters

Despite its name, kube-proxy does not sit in the data path. In its default iptables mode (since Kubernetes 1.2), it programs iptables rules that redirect traffic at the kernel level. In ipvs mode (GA since 1.11), it uses the kernel's IPVS load balancer. Actual packets never touch the kube-proxy process. The name is a holdover from Kubernetes 1.0 when it was literally a userspace TCP proxy — a design that was too slow and was replaced within months.
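The mode is selected in the KubeProxyConfiguration that kube-proxy reads at startup (in kubeadm clusters, from the kube-proxy ConfigMap in kube-system). A minimal sketch switching to IPVS mode, assuming defaults for everything else; the scheduler choice is illustrative:

```yaml
# Minimal KubeProxyConfiguration fragment selecting IPVS mode.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; IPVS also offers least-connection and others
```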


A single Kubernetes Service creates dozens of iptables rules

In iptables mode, each Service generates one rule per endpoint (pod) in the KUBE-SVC and KUBE-SEP chains, plus probability-based load balancing rules. A cluster with 5,000 Services averaging 3 endpoints each creates approximately 45,000 iptables rules. At this scale, rule updates take seconds and packets must traverse long chains, adding latency. This is why large clusters switch to IPVS mode, which uses hash tables and handles 100,000+ rules efficiently.
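The probability rules are what make random load balancing work in a linear rule list: each rule's probability is adjusted so every endpoint gets an equal share. An illustrative sketch of the chain shapes for one Service with three endpoints (chain suffixes and the pod IP are made up):

```
# First endpoint matches 1/3 of the time, second matches 1/2 of the rest,
# and the last rule catches everything that falls through.
-A KUBE-SVC-XYZ -m statistic --mode random --probability 0.33333 -j KUBE-SEP-A
-A KUBE-SVC-XYZ -m statistic --mode random --probability 0.50000 -j KUBE-SEP-B
-A KUBE-SVC-XYZ -j KUBE-SEP-C
-A KUBE-SEP-A -p tcp -j DNAT --to-destination 10.244.1.5:8080
```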


NetworkPolicy defaults to "allow all" — most clusters have zero network segmentation

Out of the box, every pod can talk to every other pod in a Kubernetes cluster. NetworkPolicies are opt-in: a pod is unrestricted until at least one policy selects it, after which only explicitly allowed traffic passes, and multiple policies combine additively as a union of allow rules. A 2023 survey by Isovalent found that fewer than 30% of production clusters had any NetworkPolicies deployed, which means a compromised pod in most clusters can freely probe every service, database, and control plane component.
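A common first step toward segmentation is a per-namespace default-deny policy. A minimal sketch, with an illustrative namespace name:

```yaml
# Deny all ingress and egress for every pod in the namespace;
# traffic must then be re-allowed by additional, additive policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production   # illustrative namespace
spec:
  podSelector: {}         # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```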


CoreDNS replaced kube-dns in Kubernetes 1.11 and most people did not notice

The switch from kube-dns (a combination of dnsmasq, SkyDNS, and a sidecar) to CoreDNS happened in 2018. CoreDNS is a single Go binary with a plugin architecture, replacing three separate processes. The migration was seamless because CoreDNS maintained identical DNS record formats. The only visible difference: CoreDNS configuration is a Corefile ConfigMap instead of command-line flags, which is dramatically more readable.
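For a sense of that readability, a typical Corefile close to what kubeadm ships by default, showing the plugin-chain style (values are the common defaults, not a prescription):

```
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```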


DNS lookups in Kubernetes generate 4-10x more queries than you expect

When a pod resolves my-service, the resolver appends search domains from /etc/resolv.conf before trying the bare name, because Kubernetes sets ndots:5 and short names rarely contain five dots. A typical pod has search domains for <namespace>.svc.cluster.local, svc.cluster.local, cluster.local, and the node's domain, and each candidate name is queried for both A and AAAA records, so a single lookup can generate eight or more queries. At scale, this amplification can overwhelm CoreDNS. The fixes are lowering ndots (e.g. ndots:2) or using fully qualified names with a trailing dot, which bypasses search-domain expansion entirely.
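The ndots tweak is applied per pod through dnsConfig. A minimal sketch; the pod and image names are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-tuned           # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # illustrative image
  dnsConfig:
    options:
      - name: ndots
        value: "2"   # names with 2+ dots are tried as absolute first,
                     # skipping search-domain expansion for most FQDNs
```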


Cilium replaces kube-proxy, iptables, and NetworkPolicy with eBPF

Cilium, created by Isovalent (acquired by Cisco in 2023), implements Kubernetes networking entirely in eBPF — programs that run in the Linux kernel without modifying kernel source. Cilium replaces iptables for service routing, kube-proxy for load balancing, and traditional CNI plugins for pod networking. Benchmarks show 40-60% latency reduction compared to iptables at scale, because eBPF uses hash maps instead of linear rule chains.


Pod CIDR exhaustion is a real scaling limit that surprises teams

Each node is assigned a pod CIDR (typically a /24, giving 256 addresses). With a cluster CIDR of /16, you can have at most 2^(24-16) = 256 nodes. Teams that start with a small CIDR range hit this limit and discover that changing the cluster CIDR requires rebuilding the cluster. AWS EKS, GKE, and AKS all have different defaults and limits; secondary CIDR ranges or IPv6 are the usual escape hatches.


Headless Services expose individual pod IPs via DNS

Setting clusterIP: None on a Service creates a "headless" Service that returns the IP addresses of all backing pods in DNS A records instead of a single virtual IP. This is essential for stateful workloads (databases, Kafka brokers) where clients need to connect to specific pods. The DNS records update as pods come and go, providing a lightweight service discovery mechanism without any load balancing.
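A minimal headless Service sketch; the names and port assume a hypothetical Kafka deployment:

```yaml
# Headless Service: DNS returns one A record per ready pod
# instead of a single virtual ClusterIP.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  clusterIP: None      # makes the Service headless
  selector:
    app: kafka
  ports:
    - port: 9092
```

Paired with a StatefulSet, each pod additionally gets a stable per-pod DNS name of the form kafka-0.kafka-headless.<namespace>.svc.cluster.local.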


LoadBalancer Services cost real money — one per Service

Each type: LoadBalancer Service in a cloud environment provisions an actual cloud load balancer (AWS NLB/ALB, GCP TCP LB, Azure LB). At $15-20/month each, a cluster with 50 LoadBalancer Services costs $750-1000/month just in load balancer fees. This is why Ingress controllers and Gateway API exist — they consolidate all external traffic through a single LoadBalancer, routing by hostname/path to internal Services.
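The consolidation looks like this in practice: one Ingress (hence one cloud load balancer) routing by hostname to multiple internal Services. Hostnames, Service names, and the ingress class below are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge
spec:
  ingressClassName: nginx        # assumes an nginx ingress controller
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api        # ClusterIP Service, no cloud LB needed
                port:
                  number: 80
    - host: shop.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: shop
                port:
                  number: 80
```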


The Container Network Interface (CNI) specification is remarkably simple

The entire CNI specification fits in a few pages. A CNI plugin is just an executable that accepts a JSON configuration and network namespace path, sets up networking for a container, and returns the assigned IP. The simplicity is deliberate — it allows dozens of implementations (Calico, Cilium, Flannel, Weave, Antrea) to coexist with radically different networking approaches while maintaining a common interface.
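For a sense of that simplicity, here is roughly what a network configuration for the reference bridge plugin looks like (CNI configs are JSON by spec; the network name and subnet are illustrative):

```json
{
  "cniVersion": "1.0.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24"
  }
}
```

The runtime invokes the plugin executable with this JSON on stdin plus the container's network namespace path, and the plugin prints the assigned addresses back as JSON.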


Kubernetes does not natively support multi-network pods — but Multus does

By default, a pod gets exactly one network interface (plus loopback). Multus CNI, a CNCF Sandbox project, allows pods to attach to multiple networks — essential for telco/NFV workloads that need separate management, data, and control plane networks. Multus acts as a "meta-plugin" that delegates to other CNI plugins for each additional interface.
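An attachment is declared as a NetworkAttachmentDefinition that wraps an ordinary CNI config; a minimal sketch delegating to the macvlan plugin, where the master interface, subnet, and names are assumptions:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: data-plane        # illustrative name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": { "type": "host-local", "subnet": "192.168.10.0/24" }
    }
```

A pod then requests the extra interface with the annotation k8s.v1.cni.cncf.io/networks: data-plane, and comes up with a net1 interface alongside its default eth0.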