Load Testing — Trivia & Interesting Facts

Surprising, historical, and little-known facts about load testing.


The first load testing tools were just shell scripts running curl in a loop

Before commercial and open-source load testing tools existed, engineers tested web application capacity with shell scripts running wget or curl in parallel loops. The approach was crude but effective at finding basic capacity limits. ApacheBench (ab), bundled with the Apache HTTP Server since 1996, was one of the first purpose-built tools and is still used for quick-and-dirty load tests today.
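The curl-in-a-loop spirit is easy to recreate with nothing but a standard library. A minimal Python sketch of the same crude capacity probe: N parallel workers hammering one URL and counting successes. The throwaway local `http.server` target exists only to keep the demo self-contained; a real probe would point at the system under test.

```python
import http.server
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Throwaway local target so the demo is self-contained.
class Quiet(http.server.SimpleHTTPRequestHandler):
    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Quiet)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

def hit(_):
    """One 'curl': fetch the URL, report whether it returned 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(hit, range(100)))
elapsed = time.perf_counter() - start
print(f"{sum(results)}/100 requests ok in {elapsed:.2f}s")
server.shutdown()
```

Like its shell-script ancestors, this measures only throughput and success rate, with none of the latency-percentile rigor that later tools added.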


JMeter was created at Apache in 1998 and is still the most widely used load testing tool

Apache JMeter was created by Stefano Mazzocchi in 1998 as a Java-based load testing tool. Despite being over 25 years old and having a notoriously clunky GUI, JMeter remains the most widely used open-source load testing tool in the world. Its longevity is attributed to its extensive protocol support, plugin ecosystem, and the fact that millions of engineers already know how to use it.


Coordinated omission is a load testing bug that makes results look 10-100x better than reality

Gil Tene (CTO of Azul Systems) identified and named "coordinated omission", a systematic measurement error in most load testing tools: when a response is slow, the tool waits for it before issuing the next request, so the requests it should have sent during the stall (exactly the ones that would have recorded long waits) are never sent and never measured. His 2013 presentations showed that this bug existed in virtually every major load testing tool, and that real-world latencies were often 10-100x worse than reported.
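The effect is easy to reproduce in a toy simulation (all numbers below are illustrative, not from any real tool): a closed-loop client that waits for each response before sending the next, with latency recorded two ways, from the actual send time (what most tools do) and from the intended send time on the schedule (the correction Tene advocates).

```python
def simulate(target_interval_s, n_requests, service_time_fn):
    """Closed-loop client: the next request starts only after the
    previous response returns, so a stall silently lowers the rate."""
    naive, corrected = [], []
    clock = 0.0
    for i in range(n_requests):
        intended_start = i * target_interval_s     # the schedule we meant to follow
        start = max(clock, intended_start)          # but we may still be waiting
        finish = start + service_time_fn(i)
        naive.append(finish - start)                # what most tools record
        corrected.append(finish - intended_start)   # what a real user would see
        clock = finish
    return naive, corrected

# 100 req/s schedule, 10 ms service time, but request 50 stalls for 2 s.
svc = lambda i: 2.0 if i == 50 else 0.01
naive, corrected = simulate(0.01, 100, svc)
# The naive view records a single slow sample; the corrected view shows
# that every request scheduled behind the stall also waited ~2 seconds.
```

In this run the naive data contains one sample over a second, while the corrected data contains fifty: the tail of the distribution was 50x larger than the tool would have reported.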


Netflix's load testing sends real traffic to shadow services, not synthetic requests

Netflix's load testing approach, called "replay traffic testing," captures real production traffic and replays it against new service versions. This produces far more realistic results than synthetic load tests because real traffic patterns are complex and bursty in ways that are nearly impossible to simulate. The technique requires careful handling to avoid side effects (like sending duplicate emails).


The C10K problem was considered extreme in 1999; now C10M is the target

Dan Kegel's 1999 paper "The C10K Problem" asked how to handle 10,000 concurrent connections on a single server, considered ambitious at the time. The solutions that emerged, non-blocking I/O with readiness notification via epoll and kqueue (and, much later, completion-based io_uring), became the foundation of modern high-performance servers. By 2015, the C10M problem (10 million connections) had become the new frontier, driven by mobile apps and IoT devices maintaining persistent connections.
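Python wraps these kernel primitives portably in its `selectors` module (epoll on Linux, kqueue on BSD/macOS, a fallback elsewhere). A minimal sketch of the readiness-notification model, using a socketpair in place of thousands of client connections:

```python
import selectors
import socket

# One loop watches many sockets and only services the ones the kernel
# reports as ready -- the model behind epoll/kqueue that answered C10K.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
for s in (a, b):
    s.setblocking(False)               # never park a thread on one socket
    sel.register(s, selectors.EVENT_READ)

b.send(b"ping")                        # data becomes readable on `a`
ready = [key.fileobj for key, _ in sel.select(timeout=1)]
data = a.recv(4) if a in ready else None

sel.close()
a.close()
b.close()
```

The point is that one thread can multiplex an arbitrary number of registered sockets, instead of dedicating a thread (and its stack) to each connection.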


Gatling was created because its author was frustrated with JMeter's architecture

Stéphane Landelle created Gatling in 2012 specifically because JMeter's thread-per-user model consumed too much memory for large-scale tests. Gatling is built on Akka actors and Scala, allowing a single machine to simulate orders of magnitude more users than JMeter. The tool was named after the Gatling gun: rapid-fire requests. Gatling's DSL-based test scripts are also far more readable than JMeter's XML configuration.


k6 was acquired by Grafana Labs because load testing and observability are inseparable

k6, a modern load testing tool written in Go with JavaScript test scripts, was acquired by Grafana Labs in 2021. The acquisition reflected a fundamental insight: load testing produces metrics that only make sense in the context of system observability. By integrating k6 with Grafana, Prometheus, and Loki, teams can see both the load test results and the system's internal behavior in a single dashboard.


Amazon found that every 100ms of latency costs 1% in sales

Amazon's famous 2006 finding that every 100 milliseconds of added page load time reduced sales by 1% became one of the most cited statistics in web performance. Google found similar results: a 500ms delay in search results caused a 20% drop in traffic. These findings made load testing and performance optimization direct business metrics rather than purely technical concerns.


Load testing production is controversial but increasingly common

Traditionally, load testing was done in staging environments that didn't match production. Companies like Netflix, Google, and Amazon now load test production directly using techniques like traffic shifting, dark launching, and shadow testing. The argument is simple: staging environments lie. They have different data, different scale, and different network characteristics. Only production tells you how production will behave.


The "hockey stick" traffic pattern breaks more systems than gradual ramps

Gradual traffic ramps (used in most load tests) give systems time to scale up and warm caches. Real-world traffic spikes — a viral tweet, a product launch, a breaking news event — look like hockey sticks: flat, then nearly vertical. These sudden spikes expose auto-scaling lag, cold cache performance, connection pool exhaustion, and thundering herd problems that gradual ramp tests completely miss.
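The difference is visible in how the two load shapes are defined. A tiny sketch (illustrative numbers, not from any specific incident) of a linear ramp versus a hockey-stick spike, each as a requests-per-second function of elapsed time:

```python
def ramp(t, peak=1000, duration=60):
    """Linear ramp: load grows smoothly toward peak over the test."""
    return peak * t / duration

def hockey_stick(t, baseline=50, peak=1000, spike_at=40):
    """Flat baseline, then a near-vertical jump to peak at spike_at."""
    return baseline if t < spike_at else peak

# The ramp hands the system its load in small increments; the spike
# delivers the full peak in one step, with no time for auto-scaling,
# cache warm-up, or connection pool growth.
```

A test plan built only from the first function will never exercise the failure modes the second one triggers.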


Locust was created to make load testing "code-first" and Pythonic

Locust, created by Jonatan Heyman in 2011, took the radical position that load test scenarios should be plain Python code, not XML configurations or GUI click-throughs. Users define behavior as Python classes, and Locust handles the distributed execution. The name "Locust" references a swarm — many small agents creating massive aggregate load. Its simplicity made it enormously popular in the Python ecosystem.
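A minimal locustfile in that code-first style. The class name, endpoints, and task weights below are made-up placeholders, and the file is a scenario definition run via the `locust` CLI against a target host rather than executed directly:

```python
from locust import HttpUser, task, between

class ShopperUser(HttpUser):
    # Each simulated user pauses 1-5 seconds between tasks.
    wait_time = between(1, 5)

    @task(3)  # weight 3: browsing is three times as common as checkout
    def browse(self):
        self.client.get("/products")  # hypothetical endpoint

    @task(1)
    def add_to_cart(self):
        self.client.post("/cart", json={"item_id": 42})  # hypothetical
```

Because scenarios are ordinary Python classes, user behavior can branch, share helpers, and live in version control next to the application code, which is exactly the point Locust was making.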