Portal | Level: L1: Foundations | Topics: Load Testing | Domain: DevOps & Tooling

Load Testing — Primer

Why This Matters

Deploying code without load testing is flying blind. You discover your breaking point in production, during a real incident, with real customers. Load testing lets you find the breaking point on your schedule, in a controlled environment, with time to fix it. Beyond finding limits, it validates SLOs before launch, catches performance regressions in CI, and gives you confidence when traffic spikes.

Test Type Taxonomy

Different test types answer different questions. Use the wrong type and you get the wrong answer.

| Test type | Pattern | Duration | Goal |
|---|---|---|---|
| Load test | Constant or ramping to target load | 10–30 min | Validate behavior at expected traffic |
| Stress test | Ramp past expected limits | 30–60 min | Find the breaking point |
| Soak test | Sustained expected load | 4–24 hours | Find memory leaks, connection pool exhaustion |
| Spike test | Instant jump to 10x normal | 5–10 min | Validate autoscaling, circuit breakers |
| Breakpoint test | Slow ramp until failure | 60–120 min | Determine exact capacity ceiling |

Don't conflate them. A 5-minute load test won't catch a memory leak that manifests after 2 hours.
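
To make the duration column enforceable in CI, here is a small helper (hypothetical, not part of k6) that sums a k6-style stages array, so a "soak" script can be checked to really run for hours:

```javascript
// Hypothetical helper: total length of a k6-style `stages` array, in seconds.
function totalDurationSeconds(stages) {
  const unitSeconds = { s: 1, m: 60, h: 3600 };
  return stages.reduce((sum, stage) => {
    const match = /^(\d+)([smh])$/.exec(stage.duration);
    if (!match) throw new Error(`unsupported duration: ${stage.duration}`);
    return sum + Number(match[1]) * unitSeconds[match[2]];
  }, 0);
}

// A spike-test profile: instant jump, short hold, instant drop.
const spike = [
  { duration: '10s', target: 500 },
  { duration: '5m', target: 500 },
  { duration: '10s', target: 0 },
];
console.log(totalDurationSeconds(spike)); // 320
```

A CI step could assert the soak script's total is at least 4 hours before letting it merge.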


k6

Who made it: k6 was created by Load Impact (a Swedish performance testing company, founded 2010), which rebranded to Grafana k6 after Grafana Labs acquired them in 2021. The tool was open-sourced in 2017 and intentionally avoided JMeter's UI-heavy approach in favor of code-as-config.

k6 is the best tool for most teams. It's written in Go, scripted in JavaScript (ES6), produces clean metrics, and integrates natively with Grafana.

Basic Script Structure

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // ramp up to 50 VUs over 2 minutes
    { duration: '5m', target: 50 },   // stay at 50 VUs
    { duration: '2m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95th percentile under 500ms
    http_req_failed: ['rate<0.01'],    // error rate under 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/health');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);  // think time between requests
}

VUs vs Arrival Rate

This is one of the most important distinctions in k6.

VU-based (default): k6 maintains N virtual users, each executing the script loop as fast as possible. If your script has sleep(1), throughput ≈ N VUs / (response_time + sleep_time). Throughput varies with response time.
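
That closed-model arithmetic is worth internalizing; a quick sketch in plain JavaScript (not a k6 API):

```javascript
// Closed-model (VU-based) throughput: each VU runs one iteration at a time,
// so iterations/sec ≈ VUs / (response time + think time).
function closedModelRps(vus, responseTimeSec, sleepSec) {
  return vus / (responseTimeSec + sleepSec);
}

// 50 VUs, 0.5s responses, sleep(1): about 33 RPS.
console.log(closedModelRps(50, 0.5, 1).toFixed(1)); // 33.3

// If the server degrades to 4s responses, the SAME script drops to 10 RPS:
// the test backs off exactly when you most want to keep the pressure on.
console.log(closedModelRps(50, 4, 1)); // 10
```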

Arrival rate (open model): k6 sends a fixed number of iterations per time unit regardless of response time. This models real user behavior — if your server slows down, users don't slow down.

export const options = {
  scenarios: {
    constant_arrival_rate: {
      executor: 'constant-arrival-rate',
      rate: 100,           // 100 iterations per second
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 50, // how many VUs to pre-allocate
      maxVUs: 200,         // cap on VUs k6 can spin up
    },
  },
};
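
Sizing preAllocatedVUs is an application of Little's Law: the executor needs roughly the arrival rate times the expected iteration duration worth of VUs. The helper and headroom factor below are assumptions for illustration, not k6 APIs:

```javascript
// Little's Law sizing sketch: concurrent VUs needed ≈ rate × iteration duration.
// `headroom` pads for latency variance so k6 doesn't drop iterations.
function vusNeeded(ratePerSec, iterationSeconds, headroom = 2) {
  return Math.ceil(ratePerSec * iterationSeconds * headroom);
}

// 100 iterations/s, each ~0.4s (response + think time), 2x headroom:
console.log(vusNeeded(100, 0.4)); // 80 — size preAllocatedVUs near this, maxVUs above it
```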

Use arrival rate when: modeling real traffic (external users don't wait for your server), testing autoscaling and overload behavior (a slowing server lets the backlog build, which shows up as dropped iterations), or comparing scenarios fairly at a fixed request rate.

Use VU-based when: modeling a fixed number of concurrent sessions (e.g., 50 logged-in users browsing), testing with expensive setup (login, authentication flow).

Checks vs Thresholds

These are different things. Confusing them causes misconfigured tests.

Checks are assertions on individual responses. They do NOT fail the test. They generate pass/fail counters in the output.

check(res, {
  'status 200': (r) => r.status === 200,
  'body contains id': (r) => r.json('id') !== undefined,
});

Thresholds are SLO-level assertions on aggregated metrics. They DO fail the test (exit code 99).

export const options = {
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
    'checks{check:status 200}': ['rate>0.99'],  // threshold on a specific check (k6 tags the checks metric with the check name)
  },
};

Custom Metrics

import http from 'k6/http';
import { Counter, Gauge, Rate, Trend } from 'k6/metrics';

const cartErrors = new Counter('cart_errors');
const activeUsers = new Gauge('active_users');
const checkoutSuccess = new Rate('checkout_success_rate');
const checkoutDuration = new Trend('checkout_duration');

export default function () {
  const payload = JSON.stringify({ cart_id: 'test-123' });
  const params = { headers: { 'Content-Type': 'application/json' } };

  const start = Date.now();
  const res = http.post('https://api.example.com/api/checkout', payload, params);

  checkoutDuration.add(Date.now() - start);
  checkoutSuccess.add(res.status === 200);

  if (res.status !== 200) {
    cartErrors.add(1);
  }
}

Scenarios (Multiple Workloads)

export const options = {
  scenarios: {
    browse: {
      executor: 'constant-arrival-rate',
      rate: 200,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 100,
      exec: 'browsing',
    },
    checkout: {
      executor: 'constant-arrival-rate',
      rate: 10,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 20,
      exec: 'checkout',
    },
  },
};

export function browsing() { /* ... */ }
export function checkout() { /* ... */ }

k6 Output to InfluxDB + Grafana

# Run with InfluxDB output
k6 run --out influxdb=http://localhost:8086/k6 script.js

# Or with environment variable
K6_OUT=influxdb=http://localhost:8086/k6 k6 run script.js

docker-compose for the Grafana stack:

version: '3'
services:
  influxdb:
    image: influxdb:1.8
    ports: ["8086:8086"]
    environment:
      INFLUXDB_DB: k6

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  grafana_data:

Import the k6 dashboard (ID 2587) from grafana.com. It includes VUs, RPS, p95/p99, error rate, and data sent/received.

k6 Output to Prometheus (Remote Write)

# Requires k6 v0.42+
k6 run --out experimental-prometheus-rw script.js

# With config
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true \
k6 run --out experimental-prometheus-rw script.js

Running k6 in CI (GitHub Actions)

- name: Run load test
  uses: grafana/k6-action@v0.3.1
  with:
    filename: tests/load/api.js
    flags: --out json=results.json
  env:
    K6_CLOUD_TOKEN: ${{ secrets.K6_CLOUD_TOKEN }}

- name: Upload results
  uses: actions/upload-artifact@v3
  with:
    name: k6-results
    path: results.json

Locust

Python-based load testing. Slower than k6 but excellent for complex user flows with Python logic.

from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)   # random wait between requests

    def on_start(self):
        """Called once per user at startup — use for login."""
        self.client.post("/login", json={
            "username": "testuser",
            "password": "testpass",
        })

    @task(3)   # weight: 3x more likely than weight-1 tasks
    def browse_products(self):
        with self.client.get("/products", catch_response=True) as resp:
            if resp.elapsed.total_seconds() > 2:
                resp.failure("Response too slow")

    @task(1)
    def checkout(self):
        self.client.post("/cart/checkout", json={"cart_id": "test-123"})

# Headless mode
locust -f locustfile.py --headless -u 100 -r 10 --run-time 5m \
  --host https://api.example.com \
  --html report.html --csv results

# -u: peak users, -r: spawn rate (users/second)

Distributed Locust:

# Master node
locust -f locustfile.py --master --expect-workers 4

# Worker nodes (run on separate machines)
locust -f locustfile.py --worker --master-host=192.168.1.10

Gatling

JVM-based, Scala DSL, excellent HTML reports. Best choice when your team already uses Java/Scala or needs the rich report format.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://api.example.com")
    .acceptHeader("application/json")

  val scn = scenario("Browse and Search")
    .exec(
      http("Home page")
        .get("/")
        .check(status.is(200))
    )
    .pause(1)
    .exec(
      http("Search products")
        .get("/products?q=widget")
        .check(
          status.is(200),
          jsonPath("$.count").gt("0")
        )
    )

  setUp(
    scn.inject(
      rampUsers(50).during(2.minutes),
      constantUsersPerSec(100).during(5.minutes),
      rampUsersPerSec(100).to(0).during(2.minutes)
    )
  ).protocols(httpProtocol)
   .assertions(
     global.responseTime.percentile3.lt(500),  // p95 < 500ms
     global.successfulRequests.percent.gt(99)
   )
}

Run: mvn gatling:test or gradle gatlingRun


Interpreting Results

The Metrics That Matter

| Metric | What it means | Target |
|---|---|---|
| p50 (median) | Half of requests faster than this | < 200ms |
| p95 | 95% of requests faster than this | < 500ms |
| p99 | 99% of requests faster than this | < 1000ms |
| p99.9 | Only 1 in 1000 requests slower than this | Varies |
| Error rate | % of failed requests | < 0.1% |
| Throughput (RPS) | Requests per second | Match SLO |
| Active VUs | Concurrent virtual users | As designed |

Why p95/p99, not average: Averages hide outliers. A service averaging 100ms but with 5% of requests taking 5000ms is broken for 1 in 20 users. p99 is what your worst-case users experience.

Remember: "Averages lie, percentiles don't." In load testing, always report p50, p95, and p99. The gap between p50 and p99 reveals how consistent your service is — a small gap means predictable performance; a large gap means some users are having a terrible experience.
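
The mnemonic is easy to demonstrate with fabricated numbers; the percentile function below uses the simple nearest-rank method (one of several conventions, not how any particular tool computes it):

```javascript
// Nearest-rank percentile: value at or below which p% of samples fall.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// 90 fast requests (100ms) plus 10 slow outliers (5000ms):
const latencies = [...Array(90).fill(100), ...Array(10).fill(5000)];
const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;

console.log(avg);                       // 590 — looks merely mediocre
console.log(percentile(latencies, 50)); // 100 — median hides the tail entirely
console.log(percentile(latencies, 95)); // 5000 — 1 in 10 users waits 5 seconds
```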

The Shape of a Breaking System

When you're approaching capacity limits, you see these patterns in order:

  1. Latency climbs — requests start queuing
  2. Error rate rises — timeouts and 503s appear
  3. Throughput plateaus or drops — server can't process more requests
  4. Connection errors — socket limits, accept queue overflow

         |   throughput
RPS/lat  |   ____
         |  /    \___
latency  |         ___________
         |_________________________________________
                          ↑ breaking point

Warm-Up Period

Always throw away the first 1–2 minutes of results. JVM JIT compilation, connection pool establishment, DNS cache, CDN warming — all of these affect early results. A spike at the start of a test is almost always warm-up, not a real problem.

export const options = {
  stages: [
    { duration: '2m', target: 50 },  // warm-up: ramp to target
    { duration: '10m', target: 50 }, // test: stable at target
    { duration: '1m', target: 0 },   // cool-down
  ],
  // k6 Cloud: tag the ramp-up so results exclude it
};
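
If you post-process raw results (for example from k6 run --out json), discarding the warm-up is a one-line filter. The sample shape below is hypothetical, chosen just for illustration:

```javascript
// Drop all samples recorded during the warm-up window before computing stats.
// Sample shape is an assumption: { ts: epochMillis, ms: latencyMillis }.
function dropWarmup(samples, warmupSeconds) {
  if (samples.length === 0) return [];
  const cutoff = samples[0].ts + warmupSeconds * 1000;
  return samples.filter((s) => s.ts >= cutoff);
}

const samples = [
  { ts: 0, ms: 900 },       // JIT compilation, cold caches
  { ts: 60_000, ms: 700 },  // still warming
  { ts: 120_000, ms: 110 },
  { ts: 180_000, ms: 120 },
];

console.log(dropWarmup(samples, 120).map((s) => s.ms)); // [ 110, 120 ]
```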

Coordinated Omission Problem

This is a subtle but important issue that makes most load testers report better latency than actually exists.

The problem: If your test uses VU-based execution with sleep(1), and the server slows down (say each request takes 5 seconds instead of 0.5 seconds), your VUs are stuck waiting — so you're sending fewer requests per second. The test automatically backs off. You think you're testing at 100 RPS but you're actually testing at 20 RPS. The slow requests happened, but you didn't measure all the responses a real user would have experienced.

What actually happens in production: Real users don't back off because your server is slow. They keep sending requests. The queue grows. Latency compounds.

Fun fact: The term "coordinated omission" was coined by Gil Tene (Azul Systems) in his talk "How NOT to Measure Latency." He showed that most benchmarking tools systematically underreport latency, sometimes by orders of magnitude, because they slow their request rate when the server slows down — exactly the opposite of what real users do.

The fix: Use arrival-rate executors in k6 (constant-arrival-rate, ramping-arrival-rate). These maintain the target rate regardless of response time. When the server slows, VUs back up, you hit maxVUs, and k6 reports dropped iterations — which is what actually happens to users.

// This executor is immune to coordinated omission
export const options = {
  scenarios: {
    load: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 100,
      maxVUs: 500,
    },
  },
};
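
A toy simulation (plain JavaScript, fabricated numbers) makes the distortion concrete: a server stalls for 5 seconds while a closed loop with one VU and an open schedule at 10 requests/second both measure it.

```javascript
// Toy model: server stalls for the first `stallMs`, then serves in `serviceMs`.
const stallMs = 5000, serviceMs = 100, windowMs = 10000;

// Closed loop, 1 VU: the stalled request blocks the loop, so ONE 5000ms sample
// is recorded, then normal 100ms samples fill the rest of the window.
const closed = [stallMs];
for (let t = stallMs; t + serviceMs <= windowMs; t += serviceMs) closed.push(serviceMs);

// Open model, 10 req/s: requests keep arriving during the stall and queue up,
// so every request that arrived mid-stall records its full waiting time.
const open = [];
for (let t = 0; t < windowMs; t += 100) {
  open.push(t < stallMs ? stallMs - t : serviceMs);
}

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
console.log(Math.round(mean(closed))); // 196ms — looks almost healthy
console.log(Math.round(mean(open)));   // 1325ms — what arriving users actually saw
```

The closed loop reports a mean roughly 7x lower than what real arrivals experienced, purely because it stopped sending while the server was slow.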

Testing in Production Safely

Traffic Shadowing

Mirror production traffic to your new service without affecting users. Tools: GoReplay, Envoy's request mirroring filter.

# Envoy mirror filter
routes:
  - match:
      prefix: "/api/v2/"
    route:
      cluster: production-v2
      request_mirror_policies:
        - cluster: production-v3-shadow
          runtime_fraction:
            default_value:
              numerator: 10      # mirror 10% of traffic
              denominator: HUNDRED

Production Load Tests with Rate Limits

When you must run against production:

  1. Use a dedicated test account that's excluded from business metrics
  2. Cap rate: never exceed 10–20% of current traffic
  3. Have a kill switch: a threshold with abortOnFail: true, or a feature flag that disables the test traffic
  4. Run during off-peak hours
  5. Coordinate with on-call — treat it as a planned incident
  6. Tag test traffic with a header (X-Load-Test: true) so it can be filtered from metrics
export const options = {
  scenarios: {
    prod_test: {
      executor: 'constant-arrival-rate',
      rate: 5,          // 5 RPS max — conservative
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 10, // arrival-rate executors require pre-allocated VUs
      maxVUs: 20,
    },
  },
  thresholds: {
    // abortOnFail stops the test as soon as the error rate exceeds 5%
    http_req_failed: [{ threshold: 'rate<0.05', abortOnFail: true }],
  },
};

export default function () {
  const params = {
    headers: {
      'X-Load-Test': 'true',
      'Authorization': `Bearer ${__ENV.TEST_TOKEN}`,
    },
  };
  http.get('https://api.production.example.com/products', params);
}

Quick Reference

k6 CLI Options

k6 run script.js                         # basic run
k6 run -u 50 -d 5m script.js            # 50 VUs for 5 minutes (overrides options)
k6 run --vus 100 --iterations 1000 script.js  # 1000 total iterations
k6 run --out json=results.json script.js # write results to JSON
k6 run --out csv=results.csv script.js   # write results to CSV
k6 run -e BASE_URL=https://staging.example.com script.js  # pass env vars
k6 cloud script.js                       # run on k6 Cloud

Locust CLI Options

locust -f locustfile.py -u 100 -r 10 --run-time 5m --headless
locust -f locustfile.py --web-port 8089  # web UI at localhost:8089
locust -f locustfile.py --csv=results    # write CSV

Useful k6 Modules

import http from 'k6/http';          // HTTP requests
import { check, sleep, group } from 'k6';  // assertions, timing, grouping
import { SharedArray } from 'k6/data';    // shared test data (loaded once)
import { Counter, Rate, Trend } from 'k6/metrics';  // custom metrics
import exec from 'k6/execution';          // VU and scenario info

Common Threshold Patterns

thresholds: {
  // Latency SLOs (a JS object key can appear only once, so all
  // http_req_duration expressions share a single array)
  http_req_duration: [
    'p(99)<2000',
    // Abort on threshold breach (exit code 99 AND stop the test early)
    { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '30s' },
  ],

  // Error rate SLO
  http_req_failed: ['rate<0.01'],

  // Check pass rate
  'checks{scenario:checkout}': ['rate>0.99'],

  // Custom metric threshold
  checkout_duration: ['p(95)<3000'],
}
