Portal | Level: L2: Operations | Topics: Load Balancing | Domain: Networking
HAProxy & Nginx for Ops - Primer¶
Why This Matters¶
Every production service that handles real traffic sits behind a load balancer or reverse proxy. HAProxy and Nginx are the two workhorses of this layer. HAProxy is purpose-built for high-performance load balancing with deep health checking and connection management. Nginx started as a web server but became the dominant reverse proxy for its simplicity and flexibility. Understanding both — when to use which, how to configure them for ops scenarios, and how to debug them under load — is essential for anyone running production infrastructure.
This is not about serving static files or configuring PHP-FPM. This is about using HAProxy and Nginx as infrastructure components: load balancing, health checks, connection draining, TLS termination, rate limiting, and routing traffic during deployments.
Core Concepts¶
1. HAProxy Architecture¶
                    ┌─────────────┐
Clients ──────────> │  Frontend   │  (bind address, ACLs, routing rules)
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Backend   │  (server pool, health checks, balancing)
                    └──┬───┬───┬──┘
                       │   │   │
                    ┌──▼┐┌─▼─┐┌▼──┐
                    │S1 ││S2 ││S3 │  (backend servers)
                    └───┘└───┘└───┘
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy.sock mode 660 level admin

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    retries 3
    option redispatch          # Retry on a different server if one fails

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/
    redirect scheme https code 301 if !{ ssl_fc }
    default_backend app_servers

    # ACL-based routing
    acl is_api   path_beg /api/
    acl is_admin path_beg /admin/
    use_backend api_servers   if is_api
    use_backend admin_servers if is_admin

backend app_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server app1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server app2 10.0.1.11:8080 check inter 5s fall 3 rise 2
    server app3 10.0.1.12:8080 check inter 5s fall 3 rise 2

backend api_servers
    balance leastconn
    option httpchk GET /health
    server api1 10.0.2.10:8080 check
    server api2 10.0.2.11:8080 check

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:securepassword
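Before reloading a config like the one above, validate it so a typo never takes the load balancer down. A minimal sketch, assuming the standard config path and a systemd-managed service:

```shell
# Validate-then-reload wrapper for HAProxy.
# The config path and the use of systemd are assumptions.
reload_haproxy() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"
    # -c parses the config and exits non-zero on errors, without starting anything;
    # reload only proceeds if the check passes
    haproxy -c -f "$cfg" && systemctl reload haproxy
}

# Usage:
#   reload_haproxy                      # default path
#   reload_haproxy /tmp/candidate.cfg   # test an alternate config
```

A `systemctl reload` (rather than `restart`) lets HAProxy hand off listening sockets to new workers without dropping established connections.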
2. HAProxy Health Checks¶
| Check Type | Config | Use Case |
|---|---|---|
| TCP connect | `check` | Basic port check |
| HTTP check | `option httpchk GET /health` | Application-level health |
| Custom expect | `http-check expect string "ok"` | Verify response body |
| Agent check | `agent-check agent-port 8081` | App reports its own weight |
# Health check parameters
server app1 10.0.1.10:8080 check inter 5s fall 3 rise 2
#                          │     │        │      │
#                          │     │        │      └ Healthy after 2 consecutive passes
#                          │     │        └ Unhealthy after 3 consecutive failures
#                          │     └ Check interval
#                          └ Enable health checking
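The live result of these checks can be read from the stats socket configured in the `global` section. A sketch, assuming the `/var/run/haproxy.sock` path from the config above and that `socat` is installed:

```shell
# List per-server health from HAProxy's runtime API.
# "show stat" emits CSV; field 1 is the proxy name, field 2 the server name,
# and field 18 the check status (UP, DOWN, MAINT, ...).
SOCK=/var/run/haproxy.sock

server_status() {
    echo "show stat" | socat stdio "$SOCK" \
        | awk -F, '$2 != "FRONTEND" && $2 != "BACKEND" { print $1, $2, $18 }'
}

# Usage:
#   server_status    # e.g. "app_servers app1 UP"
```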
3. HAProxy Connection Draining¶
When you need to take a server out of rotation without dropping active connections:
# Via the stats socket (runtime API)
echo "set server app_servers/app1 state drain" | socat stdio /var/run/haproxy.sock
# Server stops accepting NEW connections but finishes existing ones
# Check server state
echo "show servers state" | socat stdio /var/run/haproxy.sock
# Fully disable (for maintenance)
echo "set server app_servers/app1 state maint" | socat stdio /var/run/haproxy.sock
# Re-enable
echo "set server app_servers/app1 state ready" | socat stdio /var/run/haproxy.sock
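The drain-then-maint sequence above can be scripted so maintenance only starts once active sessions have actually finished. A sketch, assuming the socket path and backend/server naming from the config earlier; the CSV field positions (`scur` is field 5 of `show stat`) are standard HAProxy output:

```shell
# Drain a server, wait for its current sessions to reach zero, then put it
# into maintenance. Argument format: backend/server, e.g. app_servers/app1.
SOCK=/var/run/haproxy.sock

drain_and_wait() {
    srv="$1"
    be="${srv%/*}"   # backend name
    sv="${srv#*/}"   # server name
    echo "set server $srv state drain" | socat stdio "$SOCK"
    # Poll current sessions (scur, CSV field 5) until existing connections drain
    while :; do
        scur=$(echo "show stat" | socat stdio "$SOCK" \
            | awk -F, -v b="$be" -v s="$sv" '$1 == b && $2 == s { print $5 }')
        [ "${scur:-0}" -eq 0 ] && break
        sleep 2
    done
    echo "set server $srv state maint" | socat stdio "$SOCK"
}

# Usage:
#   drain_and_wait app_servers/app1   # then restart the backend, then "state ready"
```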
4. HAProxy Stick Tables¶
Stick tables provide server affinity and real-time traffic tracking:
frontend http_front
    bind *:80

    # Track request rates per source IP
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src

    # Rate limit: deny if more than 100 requests per 10 seconds
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }

backend app_servers
    balance roundrobin
    # Session persistence via cookie
    cookie SERVERID insert indirect nocache
    server app1 10.0.1.10:8080 check cookie s1
    server app2 10.0.1.11:8080 check cookie s2
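Stick-table contents are also visible through the runtime API, which is useful when debugging why a client is being rate limited. A sketch, assuming the socket path and the `http_front` table name from the config above:

```shell
# Inspect and clear stick-table entries at runtime.
SOCK=/var/run/haproxy.sock

# Dump all tracked entries (source IP, request rate, expiry)
show_table() {
    echo "show table http_front" | socat stdio "$SOCK"
}

# Remove a single tracked key, e.g. after a false-positive rate limit
clear_entry() {
    echo "clear table http_front key $1" | socat stdio "$SOCK"
}

# Usage:
#   show_table
#   clear_entry 203.0.113.7
```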
5. Nginx as Reverse Proxy¶
# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    multi_accept on;
}

http {
    # Upstream definition
    upstream app_backend {
        least_conn;
        server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
        keepalive 32;              # Keep connections to backends alive
    }

    upstream api_backend {
        server 10.0.2.10:8080;
        server 10.0.2.11:8080;
    }

    server {
        listen 80;
        server_name app.example.com;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name app.example.com;

        ssl_certificate     /etc/nginx/certs/app.crt;
        ssl_certificate_key /etc/nginx/certs/app.key;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 1d;

        location / {
            proxy_pass http://app_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_set_header Connection "";    # Enable keepalive to upstream
        }

        location /api/ {
            proxy_pass http://api_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
        }
    }
}
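As with HAProxy, always test the config before reloading. A minimal sketch, assuming nginx is on the PATH and running under its default config location:

```shell
# Safe nginx reload: parse the config first, then reload workers gracefully.
reload_nginx() {
    # -t parses the full config and exits non-zero on any error;
    # -s reload spawns new workers and lets old ones finish in-flight requests
    nginx -t && nginx -s reload
}

# Usage:
#   reload_nginx
```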
6. Nginx Rate Limiting¶
http {
    # Define rate limit zones
    limit_req_zone  $binary_remote_addr zone=general:10m rate=10r/s;
    limit_req_zone  $binary_remote_addr zone=api:10m     rate=50r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Return 429 when a limit is hit (the default for limit_req/limit_conn is 503)
    limit_req_status  429;
    limit_conn_status 429;

    server {
        # Apply rate limiting
        location / {
            limit_req zone=general burst=20 nodelay;
            limit_conn addr 10;            # Max 10 concurrent connections per IP
            proxy_pass http://app_backend;
        }

        location /api/ {
            limit_req zone=api burst=100 nodelay;
            proxy_pass http://api_backend;
        }

        # Custom rate limit error response
        error_page 429 = @rate_limited;
        location @rate_limited {
            default_type application/json;
            add_header Retry-After 10 always;   # "always" so the header is sent on 4xx
            return 429 '{"error": "rate_limited", "retry_after": 10}';
        }
    }
}
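A rate limit is easy to misconfigure silently, so smoke-test it by firing a burst of requests and counting denials. A sketch; the URL and counts are placeholders, and note that nginx answers `limit_req` denials with 503 unless `limit_req_status` changes it, so count whichever status your config actually emits:

```shell
# Fire n requests at a URL and count how many returned a given status code.
count_status() {
    url="$1"; want="$2"; n="${3:-50}"; hits=0
    for _ in $(seq "$n"); do
        # -w '%{http_code}' prints only the response status
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
        [ "$code" = "$want" ] && hits=$((hits + 1))
    done
    echo "$hits/$n requests returned $want"
}

# Usage (placeholder URL):
#   count_status http://localhost/api/ 429 200
```

At `rate=50r/s` with `burst=100`, a tight loop of 200 requests should see roughly the tail of them denied; zero denials suggests the zone or status directive is not applying where you think it is.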
7. Blue-Green and Canary Routing¶
# HAProxy: blue-green with weight shifting
backend app_servers
    balance roundrobin
    server blue  10.0.1.10:8080 check weight 100
    server green 10.0.1.20:8080 check weight 0

# Shift traffic to green:
#   echo "set weight app_servers/green 10"  | socat stdio /var/run/haproxy.sock
#   echo "set weight app_servers/green 50"  | socat stdio /var/run/haproxy.sock
#   echo "set weight app_servers/green 100" | socat stdio /var/run/haproxy.sock
#   echo "set weight app_servers/blue 0"    | socat stdio /var/run/haproxy.sock

# Nginx: canary routing with split_clients
split_clients $remote_addr $app_backend {
    10%  canary_backend;
    *    stable_backend;
}

upstream stable_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

upstream canary_backend {
    server 10.0.1.20:8080;
}

server {
    location / {
        proxy_pass http://$app_backend;
    }
}
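The HAProxy weight-shift commands above can be wrapped in a small staged rollout script, pausing between steps so you can watch error rates before committing more traffic. A sketch, assuming the socket path and the `app_servers` backend with `blue`/`green` servers from the config above:

```shell
# Staged blue-green weight shift via the HAProxy runtime API.
SOCK=/var/run/haproxy.sock

set_weight() {
    # $1 = backend/server (e.g. app_servers/green), $2 = new weight
    echo "set weight $1 $2" | socat stdio "$SOCK"
}

shift_to_green() {
    for w in 10 50 100; do
        set_weight app_servers/green "$w"
        echo "green at weight $w; check dashboards, Ctrl-C to abort"
        sleep "${STEP_DELAY:-60}"      # pause between stages (override via env)
    done
    set_weight app_servers/blue 0      # green now takes all traffic
}

# Usage:
#   STEP_DELAY=120 shift_to_green
```

Because weights are runtime state, a HAProxy reload reverts to the weights in the config file; once green is stable, update the config to match.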
8. TLS Termination Patterns¶
Pattern 1: TLS at LB (most common)
Client --[TLS]--> HAProxy/Nginx --[HTTP]--> Backend
Pro: Simple, one place for cert management
Con: Backend traffic is unencrypted
Pattern 2: TLS passthrough
Client --[TLS]--> HAProxy (mode tcp) --[TLS]--> Backend
Pro: End-to-end encryption, LB can't read traffic
Con: No L7 routing, backend manages certs
Pattern 3: TLS re-encryption
Client --[TLS]--> HAProxy/Nginx --[TLS]--> Backend
Pro: End-to-end encryption with L7 routing at LB
Con: Double TLS overhead, two cert sets to manage
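Whichever pattern you run, verify which certificate the load balancer actually presents. A sketch using `openssl s_client`; the hostname is a placeholder:

```shell
# Print the subject and expiry of the certificate served on port 443.
check_cert() {
    host="$1"
    # -servername sends SNI so the LB selects the right certificate;
    # the empty stdin makes s_client exit after the handshake
    echo | openssl s_client -connect "$host:443" -servername "$host" 2>/dev/null \
        | openssl x509 -noout -subject -enddate
}

# Usage (placeholder host):
#   check_cert app.example.com
```

With TLS passthrough this shows the backend's certificate; with termination or re-encryption at the LB it shows the LB's, which is a quick way to confirm which pattern is actually in effect.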
Common Pitfalls¶
- No health checks — Without active health checks, the LB sends traffic to dead backends until TCP timeouts expire. Users see 30-second delays, not instant errors.
- Forgetting forwarding headers — Without `X-Forwarded-For`, backend access logs show the LB's IP for every request. Rate limiting, geolocation, and audit trails break.
- Too-short timeouts for long-running requests — A 30-second `proxy_read_timeout` kills file uploads and report generation. Set timeouts per location.
- Session affinity hiding failures — Sticky sessions mean one bad server affects only a subset of users. Those users suffer while your overall health metrics look fine.
- Not draining connections during deploys — You restart a backend server and 200 active connections get an RST. Users see errors during every deployment.
- Health check endpoint that returns 200 when the app is broken — Your `/health` endpoint checks whether the web framework is running but not whether the database is reachable. The LB thinks the server is healthy while it returns 500s to real requests.
Wiki Navigation¶
Prerequisites¶
- Networking Deep Dive (Topic Pack, L1)
Related Content¶
- API Gateways & Ingress (Topic Pack, L2) — Load Balancing
- Load Balancing Flashcards (CLI) (flashcard_deck, L1) — Load Balancing
- Nginx & Web Servers (Topic Pack, L1) — Load Balancing
- Runbook: Load Balancer Health Check Failure (Runbook, L2) — Load Balancing
Pages that link here¶
- API Gateways & Ingress
- Anti-Primer: Load Balancing
- Certification Prep: AWS SAA — Solutions Architect Associate
- Comparison: Ingress Controllers
- HAProxy & Nginx for Ops
- Master Curriculum: 40 Weeks
- Nginx & Web Servers
- Production Readiness Review: Study Plans
- Runbook: Load Balancer Health Check Failure
- Thinking Out Loud: Load Balancing