Load Balancing — Trivia & Interesting Facts

Surprising, historical, and little-known facts about load balancing technology and operations.


Round-robin DNS was the Internet's first "load balancer"

Before dedicated load balancers existed, the only way to distribute traffic was round-robin DNS — returning multiple A records for a domain and relying on clients to pick one. This was how early high-traffic websites (including the original Yahoo!) distributed load. The approach had no health checking, no session persistence, and relied on DNS TTLs for failover. It was terrible, but it was free and it worked well enough.
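The mechanism can be sketched in a few lines. A round-robin DNS server rotates the order of the A records it returns on each query, and most clients simply connect to the first address — so the rotation alone spreads connections. The hostname and addresses below are made up for illustration:

```python
import itertools

# Hypothetical A records a resolver might return for one hostname.
a_records = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

def rotated_answers(records):
    """Yield the record list in a different rotation per 'query',
    emulating a round-robin DNS server."""
    for i in itertools.count():
        offset = i % len(records)
        yield records[offset:] + records[:offset]

answers = rotated_answers(a_records)
# Most clients just take the first address in the answer, so rotation
# alone distributes connections -- with no health checks at all.
picks = [next(answers)[0] for _ in range(6)]
print(picks)  # each server appears twice across six queries
```

Note everything that is missing here: if `192.0.2.11` goes down, one third of clients still get it until the operator edits the zone and every cached TTL expires.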


The F5 BIG-IP became a billion-dollar product because HTTP was too complex for routers

In the late 1990s, F5 Networks realized that Layer 4 load balancing (distributing TCP connections) wasn't enough — you needed to inspect HTTP headers, manage cookies, and terminate SSL to make intelligent routing decisions. The BIG-IP's ability to do Layer 7 load balancing, SSL offloading, and cookie-based persistence made it the dominant hardware load balancer for 15 years and spawned a cottage industry of iRules scripting.


Consistent hashing was invented at MIT for web caching, not load balancing

David Karger and his MIT colleagues introduced consistent hashing in 1997 to solve a distributed web-caching problem: with naive modulo hashing, adding or removing a cache server would invalidate almost every cached object. Consistent hashing ensures that adding a server remaps only ~1/n of the keys. It was later adopted for load balancing and data partitioning (it underpins systems like Amazon's Dynamo and memcached client hash rings). Maglev, Google's network load balancer, uses a variation that provides a more even distribution than basic consistent hashing.
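A minimal ring sketch shows the ~1/n property. The server names, the 100-virtual-nodes-per-server figure, and the use of MD5 as the ring hash are all illustrative choices, not Karger's exact construction:

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    # Stable hash onto a 32-bit ring (illustrative choice of hash).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

def build_ring(servers, vnodes=100):
    # Several virtual nodes per server smooth out the distribution.
    ring = sorted((h(f"{s}#{v}"), s) for s in servers for v in range(vnodes))
    points = [p for p, _ in ring]
    return ring, points

def lookup(ring, points, key):
    # A key maps to the first server point clockwise from its hash.
    i = bisect_right(points, h(key)) % len(ring)
    return ring[i][1]

servers = ["cache-a", "cache-b", "cache-c", "cache-d"]
ring, points = build_ring(servers)
keys = [f"object-{i}" for i in range(10_000)]
before = {k: lookup(ring, points, k) for k in keys}

# Add a fifth server: only ~1/5 of keys should move, all of them to it.
ring2, points2 = build_ring(servers + ["cache-e"])
moved = sum(1 for k in keys if lookup(ring2, points2, k) != before[k])
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 20%
```

With modulo hashing (`h(key) % len(servers)`) the same experiment remaps roughly 80% of the keys — which is exactly the cache-invalidation disaster the algorithm was invented to avoid.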


The "power of two random choices" is surprisingly close to optimal

The "power of two choices" algorithm — pick two random backend servers and send the request to the one with fewer active connections — provides an exponential improvement over simple random selection. With pure random assignment, the most loaded server ends up with O(log n / log log n) extra connections; with two choices, that drops to O(log log n). First analyzed by Azar, Broder, Karlin, and Upfal in 1994 and popularized by Michael Mitzenmacher's 2001 survey, the technique is used in modern proxies such as Envoy and nginx because it delivers near-optimal distribution with minimal state.
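A quick simulation makes the gap concrete. The pool size of 1,000 and the fixed seed are arbitrary choices for a reproducible illustration:

```python
import random

def assign(n_servers, n_requests, choices):
    """Distribute requests; return the load on the busiest server."""
    random.seed(42)  # reproducible illustration
    load = [0] * n_servers
    for _ in range(n_requests):
        # Sample `choices` distinct servers, send to the least loaded.
        candidates = random.sample(range(n_servers), choices)
        target = min(candidates, key=lambda s: load[s])
        load[target] += 1
    return max(load)

n = 1000
worst_random = assign(n, n, choices=1)  # plain random selection
worst_two = assign(n, n, choices=2)     # power of two choices
print(worst_random, worst_two)
```

With 1,000 requests over 1,000 servers, plain random typically leaves some server with 5–6 requests, while two choices keeps the maximum around 2–3 — and sampling a third choice buys almost nothing more, which is why two is the sweet spot.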


Health checks that are too aggressive can cause cascading failures

If a load balancer's health check interval is too short and threshold too low (e.g., 1-second checks, fail after 2 misses), a brief CPU spike on a backend can cause it to miss two checks and get removed. The remaining backends get more traffic, become slower, start failing health checks themselves, and the cascade continues until every backend is marked unhealthy. This "thundering herd of health checks" pattern has caused many major outages.
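The arithmetic of the cascade fits in a toy model. The capacity and traffic numbers below are assumed for illustration: each backend can absorb a fixed request rate before its latency grows enough to miss aggressive health checks.

```python
def survivors(n, total_traffic, capacity):
    """Eject backends whose traffic share exceeds capacity until the
    pool stabilizes; returns how many backends remain healthy."""
    n -= 1  # one backend removed after a transient CPU spike
    while n > 0 and total_traffic / n > capacity:
        # Every remaining backend is now over capacity, so it too
        # starts failing health checks and gets ejected.
        n -= 1
    return n

# 10 backends, 1000 req/s total. With only 10% headroom per backend,
# one spurious ejection collapses the whole pool:
print(survivors(10, 1000, capacity=110))  # 0
# With 50% headroom, the pool stabilizes after losing one backend:
print(survivors(10, 1000, capacity=150))  # 9
```

The defenses are all about damping: longer check intervals, higher failure thresholds, and "panic" or fail-open modes (as in Envoy) that stop ejecting backends once too large a fraction of the pool is marked unhealthy.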


DSR (Direct Server Return) makes the load balancer invisible on the return path

In DSR mode, the load balancer handles only incoming traffic — the backend servers respond directly to the client, bypassing the load balancer entirely. This reduces the load balancer's bandwidth requirements by 90%+ (since responses are typically much larger than requests). DSR was critical in the pre-cloud era when hardware load balancers were bandwidth-limited, and it's still used for high-throughput services like video streaming.
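The bandwidth saving follows directly from the request/response asymmetry. The byte counts below are assumed, illustrative figures for a single HTTP exchange:

```python
# Assumed sizes for one HTTP exchange (illustrative, not measured):
request_bytes = 800        # request line + headers + small body
response_bytes = 20_000    # typical page or API payload

# Proxy mode: both directions traverse the load balancer.
proxy_bytes = request_bytes + response_bytes
# DSR mode: only the request does; the backend replies straight
# to the client with the VIP as its source address.
dsr_bytes = request_bytes

saving = 1 - dsr_bytes / proxy_bytes
print(f"{saving:.0%} of load-balancer bandwidth avoided")
```

With these numbers the load balancer carries under 4% of the bytes. The trade-off: because it never sees the response, a DSR load balancer cannot terminate TLS, rewrite headers, or do any Layer 7 processing.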


Kubernetes Services use iptables rules, and this doesn't scale

The default Kubernetes kube-proxy implementation creates iptables rules for every Service endpoint. For a cluster with 5,000 Services and 10 endpoints each, this means 50,000+ iptables rules that are evaluated linearly for the first packet of every connection (conntrack short-circuits the rest). At scale, iptables rule updates can take seconds and cause traffic disruption. This scaling problem drove the development of IPVS mode (a kernel-level L4 load balancer with hash-table lookups) and eBPF-based alternatives like Cilium.
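The difference between the two data paths is just linear scan versus hash lookup, which a toy model can show. The Service count and the `10.96.x.x` ClusterIP layout are assumed for illustration; one rule per Service is modeled here (per-endpoint DNAT rules are where the 50,000+ figure comes from):

```python
# Toy comparison: linear iptables-style matching vs an IPVS-style
# hash table, for the same 5,000 Services.
services = 5000
rules = [(f"10.96.{i // 256}.{i % 256}", 80, i) for i in range(services)]

def iptables_match(dst_ip, dst_port):
    """Linear scan: the worst-placed Service costs one comparison per rule."""
    for checked, (ip, port, svc) in enumerate(rules, start=1):
        if (ip, port) == (dst_ip, dst_port):
            return svc, checked
    return None, len(rules)

# IPVS keeps a hash table keyed on (address, port): ~O(1) per connection.
ipvs_table = {(ip, port): svc for ip, port, svc in rules}

worst_ip, worst_port, worst_svc = rules[-1]
svc, comparisons = iptables_match(worst_ip, worst_port)
print(comparisons)  # 5000 comparisons -- vs a single hash lookup
```

The same asymmetry applies to updates: kube-proxy's iptables mode rewrites the whole rule set on change, while IPVS (and eBPF maps) can update a single entry in place.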


The "sticky sessions" antipattern is still everywhere

Session affinity (sending the same user to the same backend) was essential when applications stored session state in server memory. Modern best practice is stateless backends with external session storage (Redis, database), but sticky sessions persist in production because migrating session state is hard. Every load balancer still supports cookie-based and source-IP-based affinity, and disabling it almost always breaks something the first time.
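The affinity property itself is simple to sketch. Real load balancers usually encode the chosen backend directly in the cookie they set; the hash-based variant below (with made-up backend names) shows the same behavior with less bookkeeping:

```python
import hashlib
from typing import Optional

backends = ["app-1", "app-2", "app-3"]

def pick_backend(session_cookie: Optional[str]) -> str:
    """Cookie-based affinity sketch: the same session always lands on
    the same backend -- and loses its in-memory state if that backend
    dies or the pool changes."""
    if session_cookie is None:
        # First request: no affinity yet. A real LB would pick any
        # backend and set the cookie in the response.
        return backends[0]
    digest = hashlib.sha256(session_cookie.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

print(pick_backend("sess-42") == pick_backend("sess-42"))  # True: sticky
```

The last comment in the docstring is the whole antipattern: stickiness turns a stateless routing layer into an implicit, fragile session store.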


Global server load balancing (GSLB) is just fancy DNS

Despite its impressive name, GSLB typically works by returning different DNS answers based on the client's geographic location, server health, and current load. There is nothing magical about it — it's DNS-based traffic steering with health checks. This means GSLB inherits all of DNS's limitations: TTL caching, resolver behavior, and the inability to redirect mid-session. True anycast-based global load balancing (as used by Cloudflare and Google) is fundamentally different and doesn't rely on DNS steering.
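Stripped of the branding, a GSLB decision is a lookup plus a health filter. The regions, VIPs, and country-to-region mapping below are invented for illustration:

```python
# Toy GSLB resolver: choose a DNS answer from geography plus health.
pops = {
    "us-east":  {"vip": "198.51.100.10", "healthy": True},
    "eu-west":  {"vip": "198.51.100.20", "healthy": False},  # failed checks
    "ap-south": {"vip": "198.51.100.30", "healthy": True},
}
nearest = {"US": "us-east", "DE": "eu-west", "IN": "ap-south"}

def resolve(client_country: str, ttl: int = 30):
    region = nearest.get(client_country, "us-east")
    if not pops[region]["healthy"]:
        # Steer away from an unhealthy region -- but only for NEW
        # lookups; clients holding a cached answer keep hitting the
        # dead VIP until their resolver's TTL expires.
        region = next(r for r, p in pops.items() if p["healthy"])
    return pops[region]["vip"], ttl

print(resolve("DE"))  # German client steered to a healthy region
```

Every limitation in the paragraph above shows up here: failover latency is bounded by the TTL, resolvers may ignore or clamp it, and an in-flight TCP session to the dead VIP cannot be redirected at all.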


The "thundering herd" problem at startup has a name: cold start

When a backend server first joins a load balancer pool, it has empty caches, cold JIT compilers, and uninitialized connection pools. Sending it an equal share of traffic immediately can overwhelm it. Slow start (gradually ramping up traffic to new backends) was added to most load balancers specifically for this reason. HAProxy, Envoy, and AWS ALB all support slow start, but many operators don't enable it because they don't know it exists.
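The general shape of slow start is a weight ramp; exact curves and defaults differ per load balancer, so the linear ramp, 30-second window, and 10% floor below are illustrative assumptions:

```python
def effective_weight(base_weight: float, age_s: float,
                     window_s: float = 30.0, min_factor: float = 0.1) -> float:
    """Ramp a newly added backend's weight up linearly over `window_s`
    seconds, starting from a small floor so it still warms its caches."""
    if age_s >= window_s:
        return base_weight
    return base_weight * max(min_factor, age_s / window_s)

# A backend with nominal weight 100, at various ages in the pool:
for age in (0, 15, 30, 60):
    print(age, effective_weight(100, age))
```

At age 0 the backend receives a tenth of its fair share, reaching full weight at the end of the window — by which point its caches, JIT, and connection pools have warmed up under partial load instead of a full blast.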