
Load Testing Footguns

Common mistakes that produce misleading results, skip real problems, or accidentally harm production.


1. Using VU-based load tests to model open arrival systems

You have 100 VUs with sleep(1). Each VU does one request per second — but only if the server responds in < 1 second. When the server slows down to 3 seconds per response, your "100 RPS" test is actually sending ~25 RPS. The test automatically backs off. You think your server handles 100 RPS fine. It doesn't.
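The back-off is simple arithmetic: in a closed model, each VU starts a new iteration only after the previous response arrives. A quick sketch in plain JavaScript (not a k6 script):

```javascript
// In a closed (VU-based) model, effective throughput is bounded by
// VU count / (response time + sleep time), because each VU waits for
// the response before sleeping and starting the next iteration.
function closedModelRps(vus, responseTimeSec, sleepSec) {
  return vus / (responseTimeSec + sleepSec);
}

console.log(closedModelRps(100, 0.0, 1)); // 100 RPS only with instant responses
console.log(closedModelRps(100, 3.0, 1)); // 25 RPS once responses take 3s
```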

Fix: Use constant-arrival-rate or ramping-arrival-rate executors in k6 for any test modeling real user traffic. Reserve VU-based tests for "N concurrent session" modeling where each session has a defined flow.
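In k6, an open-model scenario looks something like this (the rate, duration, and VU counts are illustrative):

```javascript
export const options = {
  scenarios: {
    open_model: {
      executor: 'constant-arrival-rate',
      rate: 100,            // start 100 iterations per timeUnit...
      timeUnit: '1s',       // ...i.e. 100/s, regardless of response time
      duration: '10m',
      preAllocatedVUs: 200, // headroom so slow responses don't starve the rate
      maxVUs: 500,
    },
  },
};
```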


2. Not throwing away warm-up data

You see a big latency spike in the first 2 minutes of your test and include it in p95/p99 calculations. Your report says p99 is 2 seconds. Actually it's 400ms — the spike was JVM JIT compilation, DNS cache priming, connection pool establishment, and CDN warming.

Fix: Always structure tests with a warm-up stage (ramp-up period) and exclude it from SLO measurement. Either split the warm-up into its own scenario with its own tags, or check the timestamps on your results and filter out the first N minutes before computing percentiles.

export const options = {
  stages: [
    { duration: '3m', target: 100 },  // warm-up: exclude from reporting
    { duration: '10m', target: 100 }, // test window: this is your data
    { duration: '2m', target: 0 },    // cool-down
  ],
};

3. Treating checks as test assertions

You write checks for every response and assume a passing test means checks passed. But checks only feed the checks metric (a rate) — they don't fail the test. You can have 100% of checks failing and k6 still exits 0.

Fix: Always add thresholds for your checks. If checks are important, make them thresholds.

export const options = {
  thresholds: {
    checks: ['rate>0.99'],                    // 99%+ of checks must pass
    'checks{check:payment created}': ['rate>0.999'],  // one specific check: 99.9%
  },
};
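For a per-check threshold to match anything, a check with that exact name has to exist; k6 emits each check's name as the value of the check tag on the checks metric, and that tag is what the threshold filters on. A sketch (the URL and payload are placeholders):

```javascript
import { check } from 'k6';
import http from 'k6/http';

export default function () {
  const res = http.post(`${__ENV.BASE_URL}/payments`, '{}', {
    headers: { 'Content-Type': 'application/json' },
  });
  // The check's name, "payment created", becomes the value of the
  // "check" tag on the checks metric, so a per-check threshold can
  // target it specifically.
  check(res, { 'payment created': (r) => r.status === 201 });
}
```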

4. Testing a single endpoint in isolation

Each endpoint passes when tested on its own. But in production, a single user flow hits 8 endpoints in sequence, and you never tested the interaction: the /cart endpoint hammers Redis, the /checkout endpoint calls 3 microservices in parallel, and the /inventory check does a full table scan under load.

Fix: Write realistic user journey scenarios. Use group() to organize them.

import http from 'k6/http';
import { group } from 'k6';

const BASE = __ENV.BASE_URL;  // e.g. your staging host

export default function () {
  group('browse', () => {
    http.get(`${BASE}/products`);
    http.get(`${BASE}/products/123`);
  });
  group('add to cart', () => {
    http.post(`${BASE}/cart`, JSON.stringify({ product_id: '123' }), {
      headers: { 'Content-Type': 'application/json' },
    });
  });
  group('checkout', () => {
    http.post(`${BASE}/checkout`);
  });
}

5. Not setting timeouts

Your test runs with no HTTP timeout. The server starts queuing under load. Requests pile up. k6 VUs sit waiting for responses that take 60 seconds each. Your throughput crashes to zero. You think "the server failed" but actually the test itself degraded — VUs are stuck waiting instead of sending new requests.

Fix: Set explicit timeouts that reflect your SLO, not whatever the server happens to do.

import http from 'k6/http';

const url = `${__ENV.BASE_URL}/api/orders`;  // example endpoint
const params = {
  timeout: '10s',  // fail the request if no response within 10s (k6's default is 60s)
  headers: { 'Content-Type': 'application/json' },
};

const res = http.get(url, params);

6. Ignoring the coordinated omission problem

You run a VU-based test at a nominal 100 RPS. At 80% of the breaking point, the server slows down and your VUs back off while waiting for responses, so you never actually reach 100 RPS. Worse, each VU only measures the requests it actually sent — the requests that should have been sent during the slowdown are never issued, so they never appear in your latency sample. This is coordinated omission, and it makes the system look healthier than it is: the percentiles stay flattering precisely because the test stopped applying pressure when the server struggled.

Fix: Use arrival-rate executors. They maintain the request rate regardless of server speed. When the server slows, requests queue. k6 reports dropped iterations — which is what would actually happen to real users.
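You can make those dropped iterations a hard failure instead of a footnote. Under an arrival-rate executor, iterations that k6 cannot start on schedule are counted in the dropped_iterations metric, and a threshold can abort or fail the run on it (the tolerance here is illustrative):

```javascript
export const options = {
  thresholds: {
    // Emitted by arrival-rate executors: iterations k6 could not start
    // on schedule because every allocated VU was busy waiting.
    dropped_iterations: ['count<10'],  // illustrative tolerance
  },
};
```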


7. Running load tests in the same region as the server

You run k6 on the same machine as your server (localhost), or in the same AWS AZ. Network latency is < 1ms. Your application looks fast. In production, users are in different regions with 50–200ms base latency. Your CDN or load balancer adds more. Your p99 latency result of "20ms" is meaningless.

Fix: Run load tests from the region where your users are. Use k6 Cloud, distributed Locust workers, or EC2 instances in the appropriate regions. Always specify the regions in your test report.


8. Not correlating load test results with server metrics

You see p95 latency of 800ms and error rate of 2%. But why? Is it CPU saturation? Database connection pool exhausted? Downstream service timing out? GC pauses? Without correlating the load test timeline with server-side metrics, you don't know what to fix.

Fix: Always run load tests with server-side metrics visible in parallel. Use Grafana dashboards that show CPU, memory, DB connections, GC activity, and downstream latency during the test window. The moment latency starts climbing, look at what changed on the server side.
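One practical trick: stamp every metric the run emits with a test id via k6's global tags, so you can line up the exact test window against server-side dashboards. The tag name and value here are just a convention, not anything k6 prescribes:

```javascript
export const options = {
  // Applied to every metric sample this run emits; filter your metrics
  // backend or Grafana panels on testid to isolate this test window.
  tags: {
    testid: 'checkout-2024-06-01-a',  // hypothetical run identifier
  },
};
```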


9. Hardcoding test data

All 100 VUs send requests for the same user ID, same product ID, same search query. The database query plan is cached. The CDN serves cached responses. Your "load test" is actually a CDN cache hit rate test. Production traffic has thousands of unique users with unique data patterns.

Fix: Use SharedArray with realistic test data. Generate user IDs, product IDs, search queries, and other inputs from a pre-built dataset that approximates your production data distribution.

import http from 'k6/http';
import { SharedArray } from 'k6/data';

const users = new SharedArray('users', () => JSON.parse(open('./data/test-users.json')));

export default function () {
  // __VU is 1-based; mix in __ITER if you also want variation across iterations
  const user = users[__VU % users.length];
  http.get(`${__ENV.BASE_URL}/api/users/${user.id}/orders`);
}

10. Accidentally load testing production with no rate limit

You run a test against your staging URL but the environment variable is wrong, or staging points to production. You send 500 RPS to production with no kill switch. Or you're doing a controlled production test but forget to cap the rate and the test script ramps to 1000 VUs.

Fix: Always add an abortOnFail threshold for error rate. Add an explicit cap to max VUs. Validate the target URL at test start.

import http from 'k6/http';

export const options = {
  scenarios: {
    load: {
      executor: 'ramping-arrival-rate',  // maxVUs only applies to arrival-rate executors
      startRate: 10,
      timeUnit: '1s',
      stages: [{ duration: '10m', target: 100 }],
      preAllocatedVUs: 20,
      maxVUs: 50,  // hard cap: the test can't silently scale past this
    },
  },
  thresholds: {
    http_req_failed: [
      { threshold: 'rate<0.10', abortOnFail: true, delayAbortEval: '10s' },
    ],
  },
};

export function setup() {
  // Validate we're not hitting production accidentally
  const res = http.get(`${__ENV.BASE_URL}/health`);
  const body = res.json();
  if (body.environment === 'production') {
    throw new Error('ABORT: BASE_URL points to production');
  }
}

11. Only measuring average latency

Average latency is almost useless for capacity planning. A service with average 100ms but p99 of 5000ms is broken for 1% of users. Average hides bimodal distributions (fast path and slow path). A single 10-second request mixed with 999 100ms requests gives an average of ~110ms — seemingly fine, but one user had a terrible experience.

Fix: Always report p50, p95, p99, and p99.9. Set thresholds on p95 and p99, not the average. If you must report one number, report p95.
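The arithmetic from the example above, in plain JavaScript with nearest-rank percentiles (not a k6 script):

```javascript
// 999 requests at 100ms plus one 10-second outlier.
const latencies = Array(999).fill(100).concat([10000]).sort((a, b) => a - b);

// Nearest-rank percentile on a sorted array.
function percentile(sorted, p) {
  return sorted[Math.ceil((p / 100) * sorted.length) - 1];
}

const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;
console.log(mean);                      // 109.9, looks healthy
console.log(percentile(latencies, 50)); // 100
console.log(percentile(latencies, 99)); // 100
console.log(Math.max(...latencies));    // 10000, one user waited 10 seconds
```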


12. Not running a soak test before launch

You run a 10-minute load test. Everything looks great. You launch. Six hours later, memory has climbed to 95%, response times are degrading, and the service is falling over. The 10-minute test didn't exercise the memory leak (a small leak in a handler, compounding over time) or the connection pool leak (connections not returned to pool on certain error paths).

Fix: Run at least one soak test (4–8 hours at expected load) before any significant launch. Monitor RSS memory, open file descriptors, database connection count, and thread count over the duration.

# Watch for memory leaks during the soak test
while true; do
  date +%T
  ps -o pid,rss,vsz,cmd -p "$(pgrep -d, -f 'uvicorn|gunicorn|node')" | tail -n +2
  sleep 60
done

13. Ignoring DNS and connection overhead

Your test reuses HTTP connections (keep-alive). Production uses service mesh with mTLS — new certificate validation on each connection. Or production uses a different DNS TTL, causing more DNS lookups. Your test shows 50ms latency; production shows 200ms.

Fix: Configure your test to match production connection behavior. If production uses short DNS TTLs, disable DNS caching in your test. If production always uses new connections, disable keep-alive.

export const options = {
  // Disable keep-alive to model connection-per-request services.
  // noConnectionReuse is a global option; k6 does not support setting it
  // per scenario. Use noVUConnectionReuse: true instead if VUs should
  // reuse connections within an iteration but not across iterations.
  noConnectionReuse: true,
  // Shorten k6's DNS cache to mimic short production TTLs:
  dns: { ttl: '5s', select: 'random' },
  scenarios: {
    fresh_connections: {
      executor: 'constant-arrival-rate',
      rate: 50,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 100,
    },
  },
};