HAProxy & Nginx Load Balancing Footguns¶
Mistakes that drop connections, route traffic to dead servers, or turn your load balancer into a bottleneck.
1. No health checks on backend servers¶
You define backend servers without check. HAProxy happily sends traffic to a server that crashed 20 minutes ago. Clients see connection timeouts. You do not find out until someone complains.
Fix: Always enable health checks. Use application-level checks (option httpchk GET /health), not just TCP connect. Set fall 3 rise 2 so one flaky check does not flap the server.
Under the hood: TCP health checks only verify that the port is open; the kernel accepts the connection even if the application is deadlocked or returning errors. HTTP health checks (option httpchk) make a real HTTP request and check the response code, which catches application-level failures that TCP checks miss. The inter 5s fall 3 rise 2 settings mean: check every 5 seconds, mark the server down after 3 consecutive failures, and mark it back up after 2 consecutive successes.
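Putting these pieces together, a minimal backend stanza might look like this (the backend name, server names, and addresses are placeholders):

```
backend app
    option httpchk GET /health
    http-check expect status 200
    default-server inter 5s fall 3 rise 2
    server app1 10.0.1.10:8080 check
    server app2 10.0.1.11:8080 check
```

The default-server line applies the check timing to every server, so you do not repeat inter/fall/rise on each one.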
2. Restarting backends without draining connections¶
You deploy by restarting the application process. 150 active HTTP connections get an RST. Users see "connection reset" errors. Your monitoring shows a spike of 5xx responses on every deploy.
Fix: Drain the server first: echo "set server backend/server1 state drain" | socat stdio /var/run/haproxy.sock. Wait for active connections to finish. Restart the backend. Re-enable. Automate this in your deploy script.
Debug clue: Seeing a spike of 502/503 errors on every deploy? Check haproxy_backend_connection_errors_total in your metrics. If it correlates exactly with deploy timestamps, you're killing connections instead of draining them. The fix is in the deploy script, not the LB config.
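A drain-before-restart deploy step could be sketched like this. The socket path and the server name (backend/server1) are assumptions; the script falls back to printing the commands when no socket exists, so you can test it outside production:

```shell
#!/bin/sh
SOCK=/var/run/haproxy.sock
SRV="backend/server1"

hap() {
  # Send a command to the HAProxy runtime socket; fall back to a dry-run
  # echo when the socket is absent (e.g. when testing this script locally).
  if [ -S "$SOCK" ]; then echo "$1" | socat stdio "$SOCK"; else echo "dry-run: $1"; fi
}

hap "set server $SRV state drain"   # stop new connections, keep existing ones
sleep "${DRAIN_WAIT:-0}"            # in a real deploy, poll active sessions instead
# ... restart the backend process here (systemctl restart app, etc.) ...
hap "set server $SRV state ready"   # put the server back in rotation
```

A fixed sleep is the simplified version; a production script would poll "show servers conn" on the socket until the server's session count reaches zero.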
3. Nginx upstream DNS cached at startup¶
You use a hostname in proxy_pass and Nginx resolves it once at config load. The backend IP changes (container rescheduled, DNS failover). Nginx keeps sending traffic to the old IP. The stale IP may be unreachable or serving a different service.
Fix: Use a resolver directive and a variable: set $backend "http://hostname:port"; proxy_pass $backend;. Using a variable makes Nginx re-resolve the hostname at runtime, honoring the DNS TTL (or the resolver's valid= override), instead of pinning the IP from config load.
Default trap: Nginx (open source) resolves upstream hostnames only at config load time and caches the result forever. Nginx Plus has dynamic resolution built in. For open-source Nginx, the resolver + variable workaround is mandatory for any upstream that can change IPs (containers, cloud services, DNS failover).
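The workaround looks like this in a server block (the backend hostname and resolver address are placeholders for your environment):

```
resolver 10.0.0.2 valid=10s;              # e.g. your VPC or kube-dns resolver
location / {
    set $backend "http://backend.internal:8080";
    proxy_pass $backend;                  # variable forces runtime DNS resolution
}
```

The valid=10s caps how long Nginx trusts a DNS answer, regardless of the record's own TTL.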
4. Timeout too short for file uploads¶
Your default proxy_read_timeout is 30 seconds. A user uploads a 500 MB file. The backend takes 45 seconds to process it. Nginx returns 504 Gateway Timeout. The user retries and the upload starts over.
Fix: Set longer timeouts per-location for upload endpoints. location /upload { proxy_read_timeout 300s; client_max_body_size 1g; }. Keep the default short for everything else.
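Scoped to a location block, the fix might look like this (the upstream name is a placeholder):

```
location /upload {
    client_max_body_size 1g;
    proxy_read_timeout   300s;   # allow slow backend processing of large uploads
    proxy_send_timeout   300s;   # streaming the body upstream can be slow too
    proxy_pass http://app_upstream;
}
```

Because the directives sit inside /upload, every other location keeps the short defaults.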
5. HAProxy maxconn set globally but not per-server¶
Global maxconn 50000 but no per-server limit. One slow backend accumulates connections while others are idle. The slow server hits its own connection limit and starts rejecting. Meanwhile, HAProxy has capacity but all connections are queued behind the slow server.
Fix: Set maxconn per server based on what the backend can handle. server app1 10.0.1.10:8080 check maxconn 200. Use balance leastconn for backends with variable response times.
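A sketch of the per-server limits (names and addresses are placeholders):

```
backend app
    balance leastconn                       # favor servers with fewer active conns
    server app1 10.0.1.10:8080 check maxconn 200
    server app2 10.0.1.11:8080 check maxconn 200
```

Requests beyond a server's maxconn wait in HAProxy's queue (bounded by timeout queue) instead of piling onto a slow server.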
6. Missing X-Forwarded-For header¶
Your reverse proxy does not set X-Forwarded-For. Backend access logs show the proxy IP for every request. Your rate limiter throttles everyone because all requests appear to come from one IP. Your geolocation service puts all users in the datacenter's city.
Fix: Always set proxy headers: proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; in Nginx. In HAProxy, use option forwardfor. Backend must read the header and trust only the proxy IP.
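The Nginx side could be sketched as (upstream name is a placeholder):

```
location / {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP       $remote_addr;
    proxy_set_header Host            $host;
    proxy_pass http://app_upstream;
}
```

$proxy_add_x_forwarded_for appends the client IP to any existing X-Forwarded-For header rather than overwriting it, which matters behind chained proxies; the backend should still only trust the entry added by your own proxy.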
7. Health check that always returns 200¶
Your /health endpoint returns 200 if the process is running. The database connection pool is exhausted. Every real request returns 500. The load balancer thinks all servers are healthy and keeps sending traffic. The failure is invisible at the LB layer.
Fix: Health check endpoints should verify actual service readiness: database connectivity, critical dependency availability, disk space. Return 503 if the service cannot handle requests, even if the process is running.
Gotcha: Don't make health checks too thorough — a health check that queries the database on every call adds load and can itself become a source of failure. Check connection pool status (are connections available?) rather than running a query. Cache the health check result for 5-10 seconds to prevent the health check from becoming a DDoS on your dependencies.
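As an illustration of "check pool status, cache the result," here is a minimal sketch in Python; the pool object and its available() method are hypothetical stand-ins for your real connection pool:

```python
import time

class HealthCheck:
    """Readiness check that inspects pool state and caches the result."""

    def __init__(self, pool, ttl=5.0):
        self.pool = pool          # hypothetical pool exposing available() -> int
        self.ttl = ttl            # seconds to serve the cached verdict
        self._cached = None       # (timestamp, status_code)

    def status(self):
        now = time.monotonic()
        if self._cached and now - self._cached[0] < self.ttl:
            return self._cached[1]   # cached verdict: no load on dependencies
        # Check whether connections are available rather than running a query
        code = 200 if self.pool.available() > 0 else 503
        self._cached = (now, code)
        return code
```

Returning 503 when the pool is exhausted lets the load balancer pull the server out of rotation before users see the 500s.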
8. TLS termination without renewing certs on the LB¶
You terminate TLS at the load balancer. The cert expires. You renewed it on the backend servers but forgot the LB has its own copy. All HTTPS traffic gets cert warnings. Your monitoring checks TLS on the backend directly and shows green.
Fix: Monitor certificate expiry on the load balancer itself, not just the backend. Use cert-manager or automation to deploy certs to the LB. Alert 30 days before expiry on every TLS endpoint.
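A simple expiry alert can be built on openssl x509 -checkend; the cert path below is an assumption, and it must point at the cert the LB serves, not the backend's copy:

```shell
#!/bin/sh
cert_ok_for_days() {
  # exit 0 if the cert in $1 is still valid $2 days from now
  openssl x509 -in "$1" -noout -checkend $(( $2 * 86400 ))
}

CERT=/etc/haproxy/certs/site.pem   # placeholder: the LB's own cert bundle
if [ -f "$CERT" ] && ! cert_ok_for_days "$CERT" 30; then
  echo "ALERT: $CERT expires within 30 days" >&2
fi
```

Checking the file on the LB catches the "renewed everywhere except here" case; a TLS probe against the LB's public endpoint catches it from the outside as well.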
9. Round-robin across backends with different capacity¶
You have three backends: a 32-core server and two 4-core servers. HAProxy round-robins equally. The small servers get overwhelmed while the large server is at 30% utilization. Response times are inconsistent.
Fix: Use weight to match server capacity. server big 10.0.1.10:8080 check weight 80 and server small 10.0.1.11:8080 check weight 10. Or use balance leastconn to distribute based on current load.
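The weighted version of the three-server backend (names and addresses are placeholders; weights roughly track the 8:1 core ratio):

```
backend app
    balance roundrobin
    server big    10.0.1.10:8080 check weight 80   # 32-core server
    server small1 10.0.1.11:8080 check weight 10   # 4-core server
    server small2 10.0.1.12:8080 check weight 10   # 4-core server
```

With these weights, the large server receives 80 of every 100 requests instead of an equal third.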
10. Testing config changes by restarting the LB in production¶
You edit the HAProxy config, run systemctl restart haproxy. There is a syntax error. HAProxy does not start. All traffic to your service drops. You scramble to fix the typo while the site is down.
Fix: Always validate the config first: haproxy -c -f /etc/haproxy/haproxy.cfg or nginx -t. Use reload instead of restart — reload is hitless if the config is valid, and the old process keeps running if the new config is invalid.
Remember:
- nginx -s reload = graceful: old workers finish their current requests, new workers pick up the new config.
- systemctl restart nginx = stop then start, with a brief gap where nothing is listening.
- haproxy -sf $(cat /var/run/haproxy.pid) = start a new process and soft-stop the old one (seamless).
Always reload, never restart, in production.
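The validate-then-reload ordering can be wrapped in a small helper so a deploy script cannot skip the check; safe_reload is a hypothetical name, and the usage lines show the commands from this section:

```shell
#!/bin/sh
safe_reload() {
  # $1 = validate command, $2 = reload command; reload runs only if validation passes
  if $1; then
    $2
  else
    echo "config invalid; old config still running" >&2
    return 1
  fi
}
# Usage:
#   safe_reload "haproxy -c -f /etc/haproxy/haproxy.cfg" "systemctl reload haproxy"
#   safe_reload "nginx -t" "nginx -s reload"
```

Because the reload command never runs on a failed validation, a typo in the config leaves the old process serving traffic untouched.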