# Nginx: The Swiss Army Server
Topics: Nginx architecture, configuration hierarchy, location matching, reverse proxy, load balancing, TLS termination, rate limiting, caching, access control, logging, Nginx vs Apache
Level: L1-L2 (Foundations to Operations)
Time: 60-90 minutes
Prerequisites: None (basic Linux command line helps)
## The Mission
It's Tuesday at 3pm. Your monitoring fires: 502 Bad Gateway errors on checkout.example.com, intermittent, affecting roughly 20% of requests. The app servers are running. The health endpoint returns 200. But users are getting errors.
You're the person with the Nginx config access. Time to figure out what's going on.
We'll diagnose this incident step by step -- and along the way, you'll learn how Nginx actually works, from its process model to its most dangerous configuration traps. By the end, you'll understand Nginx well enough to configure it from scratch and debug it at 3am.
Name Origin: Nginx is pronounced "engine-X," not "N-G-I-N-X." Igor Sysoev started building it in 2002 to solve the C10K problem -- handling 10,000 concurrent connections on a single server -- for Rambler, Russia's second-largest website at the time. He released it publicly in 2004. F5 Networks acquired Nginx, Inc. in 2019 for $670 million, which is a remarkable arc for a side project by a single developer.
## Part 1: How Nginx Actually Works

Before we debug anything, you need a mental model of what's running on the server.

### The Master-Worker Architecture
┌─────────────────┐
Clients ──────>│ Master Process │ reads config, manages workers
└────────┬────────┘
│ fork()
┌──────────────┼──────────────┐
v v v
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker N │
│ (event │ │ (event │ │ (event │
│ loop) │ │ loop) │ │ loop) │
└─────────┘ └─────────┘ └─────────┘
The master process runs as root (to bind ports 80/443). It reads the config, forks worker processes, and manages their lifecycle. Workers run as an unprivileged user (www-data or nginx) and handle all the actual connections.
Each worker runs a single-threaded event loop using epoll (Linux) or kqueue (BSD). One worker can handle thousands of connections simultaneously because it never blocks on I/O -- it registers interest in events and moves on. When data arrives, the kernel notifies the worker.
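You can see the shape of this model with Python's stdlib `selectors` module (which wraps epoll on Linux and kqueue on BSD). This is a toy single-threaded server, not how Nginx is implemented, but the structure is the same: register interest, wait for events, dispatch.

```python
# Toy event-loop server in the style of an Nginx worker: one thread,
# non-blocking sockets, callbacks dispatched when the kernel says a
# socket is ready. Illustrative sketch only.
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    conn, _addr = server.accept()
    conn.setblocking(False)               # never block on one client
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)                # kernel said this is readable
    if data:
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(conn)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))             # ephemeral port for the demo
server.listen(128)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

def run_once(timeout=1.0):
    # One turn of the loop: wait for any ready socket, run its callback.
    # A real worker runs this forever for thousands of sockets.
    for key, _mask in sel.select(timeout):
        key.data(key.fileobj)
```

One thread, many sockets: `sel.select()` returns as soon as *any* registered socket is ready, so a slow connection never stalls the others.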
Under the Hood: Apache's classic `prefork` model spawns one process per connection. At 10,000 connections, that's 10,000 processes, each consuming ~10MB of RAM = 100GB. Nginx handles the same load with 4 workers consuming ~50MB total. The trade-off: Nginx can't run application code (like PHP) inside the worker -- it must proxy to a separate process. Apache can embed the interpreter (mod_php). This is why Nginx dominates as a reverse proxy while Apache still appears in shared hosting environments.

### Worker Tuning -- The One Knob
# /etc/nginx/nginx.conf
worker_processes auto; # one per CPU core (auto detects)
worker_rlimit_nofile 65535; # max open files per worker
events {
worker_connections 4096; # max connections per worker
multi_accept on; # accept all pending connections at once
}
Total capacity = worker_processes x worker_connections. With 4 cores and 4096 connections per worker: 16,384 simultaneous connections. Each proxied connection uses two file descriptors (client-side + backend-side), so worker_rlimit_nofile should be at least 2x worker_connections.
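As a sanity check, the same arithmetic in code (illustrative only; real capacity also depends on memory and backend limits):

```python
# Worked example of the sizing math above. Assumes every connection is
# proxied, so each one holds two file descriptors; a pure static-file
# server needs fewer.
def sizing(worker_processes, worker_connections):
    total_connections = worker_processes * worker_connections
    min_rlimit_nofile = 2 * worker_connections  # client fd + backend fd
    return total_connections, min_rlimit_nofile

print(sizing(worker_processes=4, worker_connections=4096))  # (16384, 8192)
```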
Trivia: Nginx's process model has essentially one tuning knob: `worker_processes auto`. Compare that to Apache's MPM configuration (prefork vs worker vs event, each with MaxRequestWorkers, ServerLimit, ThreadsPerChild, MinSpareThreads, MaxSpareThreads...). This simplicity is a feature, not a limitation.

## Part 2: Back to the 502s -- First Steps
OK, model in hand. Let's diagnose.
# Step 1: Is Nginx running?
systemctl status nginx
# Active: active (running). Good.
# Step 2: What does the error log say?
tail -50 /var/log/nginx/error.log
You see lines like:
2026/03/23 15:02:14 [error] 1234#0: *5678 connect() failed
(111: Connection refused) while connecting to upstream,
client: 203.0.113.50, server: checkout.example.com,
request: "GET /api/cart HTTP/2.0",
upstream: "http://10.0.1.12:8080/api/cart"
Connection refused to 10.0.1.12:8080. That's one of your backend servers. Let's check if it's listening:
# Step 3: Can we reach the backend directly?
curl -v http://10.0.1.10:8080/health # 200 OK
curl -v http://10.0.1.11:8080/health # 200 OK
curl -v http://10.0.1.12:8080/health # Connection refused!
Server 12 is down. But Nginx is still sending traffic to it. Why? Because open-source Nginx only does passive health checks.
## Flashcard Check #1
Q: What is the difference between active and passive health checks?
A: Active checks probe backends on a schedule (every N seconds). Passive checks only notice failures when real user traffic gets errors. Nginx open source uses passive checks only -- it marks a server down after `max_fails` failures in `fail_timeout` seconds. Nginx Plus adds active health checks. HAProxy supports both out of the box.
Q: What does a 502 Bad Gateway mean in Nginx?
A: Nginx is running fine, but it cannot get a valid response from the upstream backend. Common causes: backend process down, connection refused, backend crashed mid-response. The answer is almost always in
/var/log/nginx/error.log.
## Part 3: The Configuration Hierarchy
Let's look at the config that's routing to the dead backend. Nginx config is hierarchical:
main (global)
├── events { }
├── http { } # All HTTP traffic
│ ├── upstream backend { } # Server pools
│ ├── server { } # Virtual host (domain)
│ │ ├── location / { } # URI matching
│ │ └── location /api/ { }
│ └── server { } # Another domain
└── stream { } # TCP/UDP proxy (non-HTTP)
Directives inherit downward. A setting in http {} applies to all server {} blocks unless overridden. A setting in server {} applies to all its location {} blocks unless overridden.
http {
gzip on; # applies everywhere
server {
gzip off; # overrides for this server
location /api {
gzip on; # overrides again for this location
}
}
}
Gotcha: The `add_header` directive does not follow this inheritance pattern. If you add any `add_header` in a `location` block, all `add_header` directives from parent contexts are dropped for that location. This has silently removed security headers (HSTS, CSP, X-Frame-Options) in countless production configs. You must repeat every header in every location that defines any.
Here's the upstream block for our checkout service:

upstream checkout_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

No max_fails. No fail_timeout. No keepalive. This is the default "round-robin with no safety net" configuration. Let's fix it.
## Part 4: Upstream Blocks and Load Balancing

### Load Balancing Algorithms

| Algorithm | Directive | Best For |
|---|---|---|
| Round-robin | (default) | Stateless services, uniform backends |
| Least connections | `least_conn;` | Long-lived requests, uneven response times |
| IP hash | `ip_hash;` | Sticky sessions (same client -> same backend) |
| Generic hash | `hash $key;` | Cache affinity, consistent routing |
| Random | `random two least_conn;` | Large pools, two-choice power |
Mental Model: Round-robin is a dumb waiter distributing plates evenly regardless of who's still eating. Least-connections is a smart waiter who gives the next plate to whoever has the fewest. IP-hash is a host who assigns each guest to a specific table for the whole meal.
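The difference between the two waiters is easy to see in a toy scheduler (a sketch, not Nginx's implementation -- Nginx also applies server weights and failure state):

```python
# Toy backend picker: round-robin vs least-connections.
from itertools import count

class Balancer:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # in-flight requests per backend
        self._rr = count()

    def round_robin(self):
        backends = list(self.active)
        # Ignores load entirely: just cycles through the list.
        return backends[next(self._rr) % len(backends)]

    def least_conn(self):
        # Fewest in-flight requests wins.
        return min(self.active, key=self.active.get)
```

With `active = {"a": 5, "b": 1, "c": 3}`, `least_conn()` picks `b`, while round-robin would blindly hand the next request to whichever backend is next in its cycle.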
### The Fixed Upstream Block
upstream checkout_backend {
least_conn;
server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.12:8080 max_fails=3 fail_timeout=30s backup;
keepalive 32;
}
What changed:
| Directive | What It Does |
|---|---|
| `least_conn` | Routes to the server with fewest active connections |
| `max_fails=3` | Marks a server down after 3 failed requests |
| `fail_timeout=30s` | Window for counting failures; also how long the server stays "down" |
| `backup` | Only receives traffic when all non-backup servers are down |
| `keepalive 32` | Maintains 32 idle persistent connections to backends per worker |
The keepalive directive requires two companion settings in the location block:
location / {
proxy_pass http://checkout_backend;
proxy_http_version 1.1;
proxy_set_header Connection ""; # clear the Connection header for keepalive
}
Under the Hood: Without `keepalive`, Nginx opens a new TCP connection to the backend for every request. At 1000 req/s, that's 1000 TCP handshakes per second, each costing ~1ms on a LAN. With `keepalive 32`, connections are reused. Monitor with `ss -s` -- look at connection churn.
### Nginx Plus vs Open Source

| Feature | Open Source | Nginx Plus |
|---|---|---|
| Active health checks | No | Yes (`health_check` directive) |
| Session persistence | `ip_hash` only | Cookies, learn, route |
| Live dashboard | `stub_status` (basic) | Full API + dashboard |
| Dynamic reconfiguration | Reload required | Runtime API |
| DNS re-resolution | Manual workaround | Built-in |
| Price | Free | ~$2,500/year/instance |
For most teams, open-source Nginx + the passive health check workaround is sufficient. If you need active health checks without paying, HAProxy is the common alternative.
## Part 5: The Location Matching Algorithm

While we're in the config, let's understand how Nginx decides which location block handles a request. This is the most misunderstood part of Nginx configuration.

### The Decision Tree
Request URI arrives (e.g., /api/v2/users)
│
├─ 1. Check all EXACT matches (=)
│ Match found? ──> USE IT. Stop.
│
├─ 2. Find the LONGEST PREFIX match
│ │
│ ├─ Is it a ^~ prefix?
│ │ Yes ──> USE IT. Stop. (skip regex)
│ │
│ └─ Remember it, keep going...
│
├─ 3. Check REGEX locations (~ and ~*) in config order
│ First match found? ──> USE IT. Stop.
│
└─ 4. No regex matched? ──> Use the remembered longest prefix from step 2.
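The decision tree can be written out as a small function. This is a simplified model (no nested locations, no merge behavior), but it reproduces the selection order:

```python
# Toy model of Nginx's location selection:
# exact > ^~ prefix > first-matching regex > longest plain prefix.
import re

def pick_location(uri, locations):
    """locations: list of (modifier, pattern) tuples in config order.
    modifier is one of '=', '^~', '~', '~*', or '' (plain prefix)."""
    # 1. Exact matches win outright.
    for mod, pat in locations:
        if mod == "=" and uri == pat:
            return (mod, pat)
    # 2. Remember the longest prefix match (plain or ^~).
    best = None
    for mod, pat in locations:
        if mod in ("", "^~") and uri.startswith(pat):
            if best is None or len(pat) > len(best[1]):
                best = (mod, pat)
    # 2b. A ^~ prefix short-circuits regex evaluation.
    if best and best[0] == "^~":
        return best
    # 3. Regexes in config order; first match wins.
    for mod, pat in locations:
        if mod == "~" and re.search(pat, uri):
            return (mod, pat)
        if mod == "~*" and re.search(pat, uri, re.IGNORECASE):
            return (mod, pat)
    # 4. No regex matched: fall back to the remembered prefix.
    return best

locs = [("=", "/health"), ("^~", "/static/"), ("~", r"\.php$"),
        ("~*", r"\.(jpg|png|gif)$"), ("", "/api/"), ("", "/")]
print(pick_location("/app/index.php", locs))  # the regex wins over the plain prefix
```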
### A Concrete Example
location = /health { return 200 "ok"; } # A: exact
location ^~ /static/ { root /var/www; } # B: prefix, skip regex
location ~ \.php$ { proxy_pass http://php; } # C: regex
location ~* \.(jpg|png|gif)$ { expires 30d; } # D: case-insensitive regex
location /api/ { proxy_pass http://backend; } # E: prefix
location / { proxy_pass http://frontend; } # F: catch-all prefix
| Request | Matches | Winner | Why |
|---|---|---|---|
| `/health` | A, F | A | Exact match, highest priority |
| `/static/logo.png` | B, D, F | B | `^~` prefix stops regex evaluation |
| `/app/index.php` | C, F | C | Regex beats plain prefix |
| `/api/users` | E, F | E | Longest prefix (no regex matches) |
| `/about` | F | F | Only the catch-all matches |
| `/static/style.css` | B, F | B | `^~` prefix, no regex can override |
Remember: The matching priority mnemonic is "E-P-R-P": Exact (`=`), Preferential prefix (`^~`), Regex (`~`/`~*`), Plain prefix (longest). When in doubt, use `=` for health checks and `^~` for static file directories.

Interview Bridge: "Explain Nginx location matching order" is a common DevOps interview question. The key insight interviewers want: regex locations can override prefix locations (unless `^~` is used), and regexes are evaluated in config file order (first match wins), not longest match.
## Flashcard Check #2

Q: You have location /api/ { } and location ~ ^/api { }. A request to /api/users arrives. Which wins?
A: The regex `~ ^/api` wins. Regex locations override plain prefix locations. To force the prefix to win, change it to `location ^~ /api/ { }`.
Q: You add location = /api/health { }. Does it override the regex?
A: Yes. Exact matches (`=`) have the highest priority and are checked first.
## Part 6: The proxy_pass Trailing Slash Trap

This is the single most common Nginx misconfiguration. One character changes everything.

### Without Trailing Slash: Pass-Through
location /app/ {
proxy_pass http://backend;
}
# Request: GET /app/page
# Backend receives: GET /app/page
# The full URI passes through unchanged.
### With Trailing Slash: URI Rewriting
location /app/ {
proxy_pass http://backend/;
}
# Request: GET /app/page
# Backend receives: GET /page
# The location prefix (/app/) is stripped!
### With a Path: Substitution
location /app/ {
proxy_pass http://backend/v2/;
}
# Request: GET /app/page
# Backend receives: GET /v2/page
# /app/ is replaced with /v2/
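All three cases follow one rule, which this little helper makes explicit (a model of the mapping, not Nginx source): if `proxy_pass` has a URI part, the matched location prefix is cut off and the URI part is glued on.

```python
# Model of proxy_pass URI rewriting for prefix locations.
def backend_uri(request_uri, location_prefix, proxy_pass_uri=None):
    if proxy_pass_uri is None:
        # proxy_pass http://back;  (no URI part) -> pass through unchanged
        return request_uri
    # proxy_pass http://back/  or  http://back/v2/  -> strip and substitute
    return proxy_pass_uri + request_uri[len(location_prefix):]

print(backend_uri("/app/page", "/app/"))          # /app/page
print(backend_uri("/app/page", "/app/", "/"))     # /page
print(backend_uri("/app/page", "/app/", "/v2/"))  # /v2/page
```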
Remember: No slash = No change. Slash = Substitute. The trailing `/` on `proxy_pass` activates URI rewriting -- it strips the `location` prefix from the request path.

War Story: A team spent two days debugging why their API returned HTML error pages instead of JSON. The root cause: `proxy_pass http://backend/` (trailing slash) stripped the `/api/v2/` prefix. The backend received bare paths like `/users` instead of `/api/v2/users`, hit its catch-all 404 handler, and returned an HTML error page. One character. Two days.

Gotcha: The "off-by-slash" vulnerability (presented by Orange Tsai at Black Hat) exploits this exact confusion. A missing trailing slash on the `location` combined with a trailing slash on `proxy_pass` can enable path traversal: `GET /api../internal/secret` resolves to the internal endpoint. This is a common finding in security audits. Always ensure your `location` directive ends with `/` when the `proxy_pass` URL does.
## Part 7: TLS Configuration -- The Modern Profile
Our checkout service needs HTTPS. Here's a production-ready TLS configuration following Mozilla's "modern" profile (TLS 1.2+, strong ciphers only):
server {
listen 443 ssl http2;
server_name checkout.example.com;
# Certificates (Let's Encrypt via certbot)
ssl_certificate /etc/letsencrypt/live/checkout.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/checkout.example.com/privkey.pem;
# Protocol and ciphers -- modern profile
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off; # let TLS 1.3 handle cipher negotiation
# OCSP stapling -- server fetches revocation status so clients don't have to
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
# Session resumption -- avoid repeated handshakes
ssl_session_cache shared:SSL:10m; # 10MB shared cache across workers
ssl_session_timeout 1d;
ssl_session_tickets off; # off for forward secrecy
# HSTS -- tell browsers to always use HTTPS
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
}
# HTTP -> HTTPS redirect
server {
listen 80;
server_name checkout.example.com;
return 301 https://$host$request_uri;
}
| Directive | Why |
|---|---|
| `ssl_protocols TLSv1.2 TLSv1.3` | TLS 1.0/1.1 formally deprecated by RFC 8996 (2021) |
| `ssl_prefer_server_ciphers off` | TLS 1.3 clients negotiate ciphers better than servers |
| `ssl_stapling on` | Server includes OCSP response in handshake -- faster, more private |
| `ssl_session_tickets off` | Session tickets break forward secrecy unless keys are rotated |
| HSTS `max-age=63072000` | Two years. Once set, browsers refuse plain HTTP. Start with a short max-age when testing. |
Gotcha: HSTS is hard to undo. If you set a long `max-age` and then need to revert to HTTP (unlikely but possible), browsers that cached the HSTS header will refuse to connect. Start with `max-age=300` (5 minutes) during testing, increase to the full value once confirmed.
### Testing Your TLS Config
# Check what cipher and protocol are negotiated
openssl s_client -connect checkout.example.com:443 -servername checkout.example.com </dev/null 2>/dev/null | grep -E "Protocol|Cipher"
# Check certificate expiry
echo | openssl s_client -connect checkout.example.com:443 2>/dev/null | openssl x509 -noout -dates
# Full scan with Mozilla's SSL config test
# https://ssl-config.mozilla.org/ (config generator)
# https://www.ssllabs.com/ssltest/ (grade your config)
## Part 8: Rate Limiting -- The Leaky Bucket

Your checkout endpoint is getting hammered by a bot. Time for rate limiting. Nginx implements the leaky bucket algorithm.

### How the Leaky Bucket Works
Imagine a bucket with a hole in the bottom. Water (requests) pours in from the top. Water leaks out the bottom at a constant rate. If the bucket overflows, excess water (requests) is rejected.
Requests arrive (variable rate)
│ │ │ │ │ │
v v v v v v
┌──────────────────┐
│ Burst Buffer │ capacity = burst parameter
│ ┌──────────┐ │
│ │ queued │ │
│ │ requests │ │
│ └────┬─────┘ │
│ │ │
└────────┼──────────┘
│ leaks at constant rate
v
Processed at: rate parameter (e.g., 10r/s)
Overflow? ──> 429 Too Many Requests
### The Config
http {
# Zone definition: 10MB shared memory, 10 requests/second per client IP
limit_req_zone $binary_remote_addr zone=checkout:10m rate=10r/s;
server {
location /api/checkout {
limit_req zone=checkout burst=20 nodelay;
limit_req_status 429;
proxy_pass http://checkout_backend;
}
}
}
### The Math

- `rate=10r/s`: one request allowed every 100ms (1000ms / 10).
- `burst=20`: the bucket holds 20 extra requests.
- `nodelay`: burst requests are processed immediately instead of being queued at the leak rate.
Without nodelay, requests arriving faster than the rate are delayed (queued) until a slot opens. With nodelay, they're processed immediately up to the burst limit, then rejected. For APIs, nodelay is almost always what you want.
| Scenario (rate=10r/s, burst=20) | Result |
|---|---|
| 10 requests in one second | All 10 served immediately |
| 30 requests in one second | 20 served (burst), 10 rejected with 429 |
| 1 request/second for 20 seconds, then 25 at once | 20 served from burst buffer, 5 rejected |
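You can reproduce those scenarios with a toy model of the accounting. This simplifies Nginx's actual bookkeeping (which tracks "excess" in milli-requests per zone entry), but the behavior matches the table:

```python
# Simplified limit_req accounting with nodelay semantics: the bucket
# drains continuously at `rate`; a request is rejected once the excess
# reaches `burst`.
class LeakyBucket:
    def __init__(self, rate, burst):
        self.rate = rate          # drain rate, requests per second
        self.burst = burst        # overflow capacity
        self.excess = 0.0         # requests currently "in the bucket"
        self.last = 0.0           # timestamp of the previous arrival

    def allow(self, now):
        # Drain the bucket for the time elapsed since the last arrival.
        self.excess = max(0.0, self.excess - (now - self.last) * self.rate)
        self.last = now
        if self.excess >= self.burst:
            return False          # overflow -> 429 Too Many Requests
        self.excess += 1
        return True
```

`LeakyBucket(10, 20)` serves 20 of 30 simultaneous requests and rejects the rest, matching the second row of the table.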
Gotcha: The `10m` in the zone definition is 10 megabytes of shared memory for tracking client IPs. Each `$binary_remote_addr` entry uses ~128 bytes, so 10MB tracks about 80,000 unique IPs. If you're behind a CDN, all traffic may appear to come from one IP -- use `$http_x_forwarded_for` or a request header instead.
## Part 9: Proxy Caching
Backend responses that don't change often? Cache them at the Nginx layer to avoid hitting the backend at all.
http {
# Define cache: 1GB max, inactive entries purged after 60 min
proxy_cache_path /var/cache/nginx levels=1:2
keys_zone=app_cache:10m
max_size=1g
inactive=60m
use_temp_path=off;
server {
location /api/products {
proxy_cache app_cache;
proxy_cache_valid 200 302 10m; # cache 200/302 for 10 min
proxy_cache_valid 404 1m; # cache 404 for 1 min
proxy_cache_use_stale error timeout updating http_500 http_502;
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://checkout_backend;
}
}
}
The proxy_cache_use_stale directive is the safety net: if the backend is down or slow, Nginx serves the stale cached version instead of returning an error. The X-Cache-Status header tells you whether each response was a HIT, MISS, STALE, or BYPASS -- invaluable for debugging.
Trivia: Even 1-second caching (microcaching) dramatically reduces backend load during traffic spikes. If 500 users hit the same product page in one second, the backend handles 1 request, the cache handles 499. Enable with `proxy_cache_valid 200 1s` and `proxy_cache_lock on` (prevents a stampede of requests all trying to populate the cache simultaneously).
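A toy cache model shows why microcaching works (illustrative only; real cache keys, TTL handling, and locking are far richer in Nginx):

```python
# 1-second TTL cache: repeated requests within the window skip the backend.
class MicroCache:
    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self.store = {}           # key -> (response, stored_at)
        self.backend_hits = 0

    def get(self, key, now, fetch=lambda k: f"response for {k}"):
        entry = self.store.get(key)
        if entry and now - entry[1] < self.ttl:
            return entry[0]                    # HIT: served from cache
        self.backend_hits += 1                 # MISS: one backend request
        self.store[key] = (fetch(key), now)
        return self.store[key][0]

cache = MicroCache()
for i in range(500):                           # 500 users within one second
    cache.get("/product/42", now=i / 500)
print(cache.backend_hits)  # 1
```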
## Part 10: Access Control

Multiple layers, from IP-based to external authentication.

### IP Allow/Deny
location /admin/ {
allow 10.0.0.0/8; # internal network
allow 192.168.1.0/24; # office network
deny all; # everyone else
proxy_pass http://admin_backend;
}
Rules are evaluated top to bottom; first match wins.
### Basic Auth
location /staging/ {
auth_basic "Staging Environment";
auth_basic_user_file /etc/nginx/.htpasswd;
proxy_pass http://staging_backend;
}
### External Auth (auth_request)
For OAuth, JWT, or custom auth -- delegate to an external service:
location /api/ {
auth_request /auth;
auth_request_set $auth_user $upstream_http_x_auth_user;
proxy_set_header X-Auth-User $auth_user;
proxy_pass http://backend;
}
location = /auth {
internal;
proxy_pass http://auth-service:9000/validate;
proxy_pass_request_body off;
proxy_set_header Content-Length "";
proxy_set_header X-Original-URI $request_uri;
}
Nginx makes a subrequest to /auth before every request to /api/. If the auth service returns 200, the request proceeds. If it returns 401 or 403, Nginx returns that status to the client. The internal directive prevents external clients from hitting the /auth location directly.
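The auth service itself can be anything that speaks HTTP and returns 200/401. Here's a minimal stand-in using only Python's stdlib -- the hardcoded token and header names are illustrative, and a real service would validate JWTs or sessions:

```python
# Toy auth service compatible with the auth_request setup above.
from http.server import BaseHTTPRequestHandler, HTTPServer

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Toy check only -- stand-in for real token/session validation.
        if self.headers.get("Authorization") == "Bearer letmein":
            self.send_response(200)
            # Read by auth_request_set via $upstream_http_x_auth_user.
            self.send_header("X-Auth-User", "alice")
            self.end_headers()
        else:
            self.send_response(401)
            self.end_headers()

    def log_message(self, *args):   # keep demo output quiet
        pass

# To run it for real:
# HTTPServer(("127.0.0.1", 9000), AuthHandler).serve_forever()
```

Any 2xx from this handler lets the original request through; the 401 is relayed to the client by Nginx.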
## Part 11: Logging for Debugging
The default Nginx log format is fine for basic use, but in production you need request timing and upstream information.
log_format detailed '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time uct=$upstream_connect_time '
'urt=$upstream_response_time ucs=$upstream_cache_status';
access_log /var/log/nginx/access.log detailed;
error_log /var/log/nginx/error.log warn;
| Variable | What It Tells You |
|---|---|
$request_time |
Total time from first client byte to last byte sent back |
$upstream_connect_time |
Time to establish connection to backend |
$upstream_response_time |
Time from connection to full response from backend |
$upstream_cache_status |
HIT, MISS, STALE, BYPASS, EXPIRED |
If $request_time is high but $upstream_response_time is low, the bottleneck is between Nginx and the client (slow client, network). If $upstream_response_time is high, the backend is slow.
# Quick log analysis: find slow requests (pull request_time from the rt= field)
awk 'match($0, /rt=[0-9.]+/) { t = substr($0, RSTART+3, RLENGTH-3); if (t+0 > 2.0) print }' /var/log/nginx/access.log | tail -20
# Status code distribution
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# Requests per minute
awk '{print $4}' /var/log/nginx/access.log | cut -d: -f1-3 | uniq -c | sort -rn | head
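The same triage rule can be scripted. This parser assumes the `detailed` log format defined above (the `rt=`/`urt=` key-value pairs):

```python
# Classify slow requests: high rt with low urt -> slow client or network;
# high urt -> slow backend.
import re

PAIR = re.compile(r'\b(rt|urt)=([\d.]+)')

def diagnose(line, threshold=2.0):
    fields = dict(PAIR.findall(line))
    rt = float(fields.get("rt", 0))
    urt = float(fields.get("urt", 0))
    if rt < threshold:
        return "ok"
    return "slow backend" if urt >= threshold else "slow client or network"

print(diagnose('rt=3.204 uct=0.001 urt=3.180 ucs=MISS'))  # slow backend
```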
## Part 12: The DNS Caching War Story

War Story: A team ran Nginx in front of a service that scaled via Kubernetes. The upstream block used the service hostname: `server checkout-svc.default.svc.cluster.local:8080`. Everything worked -- until a rolling deployment happened. Kubernetes assigned new pod IPs. The DNS record updated. But Nginx had resolved the hostname at config load and cached the old IPs. Traffic kept going to the old, now-dead pods. 502s everywhere.

The fix required two changes:
# BAD: DNS resolved once at startup, cached forever
upstream backend {
server checkout-svc.default.svc.cluster.local:8080;
}
# GOOD: Force runtime DNS re-resolution
resolver 10.96.0.10 valid=30s; # kube-dns IP
server {
location / {
set $backend "http://checkout-svc.default.svc.cluster.local:8080";
proxy_pass $backend;
}
}
The `set $variable` trick forces Nginx to re-resolve the hostname on each request using the `resolver` directive. Without the variable, Nginx treats the hostname as a static IP resolved at config load. This hits every Nginx installation in dynamic environments -- Kubernetes, Docker, AWS ECS, any system where backend IPs change. Nginx Plus handles this natively; open-source Nginx requires the workaround.
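The underlying difference is easy to model: a resolve-once cache versus a TTL-bounded one. Here `lookup` is a stand-in for a real DNS query:

```python
# TTL-bounded resolver cache, like `resolver ... valid=30s`.
class TTLResolver:
    def __init__(self, lookup, valid=30.0):
        self.lookup = lookup      # stand-in for a real DNS query
        self.valid = valid        # seconds an answer stays fresh
        self.cache = {}           # name -> (ip, resolved_at)

    def resolve(self, name, now):
        hit = self.cache.get(name)
        if hit and now - hit[1] < self.valid:
            return hit[0]         # still fresh, reuse cached answer
        ip = self.lookup(name)    # expired or first ask: query again
        self.cache[name] = (ip, now)
        return ip
```

A static hostname in an `upstream` block behaves like `valid=float('inf')`: the first answer is reused forever, which is exactly the failure mode above.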
## Part 13: Nginx vs Apache -- When It Matters
| Dimension | Nginx | Apache |
|---|---|---|
| Architecture | Event-driven, non-blocking | Process/thread per connection (event MPM available) |
| Memory per connection | ~2.5 KB | ~10 MB (prefork), ~2 MB (event MPM) |
| Static files | Extremely fast (sendfile, async I/O) | Fast, but more overhead |
| .htaccess | Not supported | Per-directory config overrides |
| Embedded scripting | No (proxy to PHP-FPM, etc.) | Yes (mod_php, mod_perl) |
| Reverse proxy | Primary use case | Possible but less common |
| Configuration reload | Zero-downtime (SIGHUP) | Graceful restart |
| Market share (2024) | ~34% of active sites | ~29% of active sites |
Mental Model: Nginx is a traffic cop -- it directs requests efficiently but doesn't do the work itself. Apache is a Swiss Army knife that can embed the work inside itself. In modern architectures where the application runs as a separate process (Node.js, Go, Python), Nginx's traffic-cop model is a perfect fit. Apache's embedded model is better for traditional shared hosting where each user's PHP runs inside the web server.
## Flashcard Check #3
Q: What happens when you run nginx -s reload?
A: The master process re-reads the config, forks new worker processes with the new config, and tells old workers to finish their current requests and exit. Zero downtime. Under the hood, it sends SIGHUP to the master process.
Q: Nginx open source only supports _____ health checks. What fills the blank?
A: Passive. It only notices backend failures when real requests fail. Active health checks (probing backends on a schedule) require Nginx Plus or an alternative like HAProxy.
Q: What does proxy_cache_use_stale error timeout http_502 do?
A: When the backend returns an error, times out, or returns 502, Nginx serves a stale (expired) cached response instead of passing the error to the client. It's a safety net that keeps your site up during backend outages.
Q: You're behind a CDN. Rate limiting on $binary_remote_addr limits all users to 10r/s total. Why?
A: The CDN's IP is the remote address for all requests. All users appear to come from the same IP. Use a forwarded header (`$http_x_forwarded_for`) or a CDN-specific header instead.
## Exercises
Exercise 1 (2 min): Write an Nginx location block that exact-matches /health and returns 200 OK with the body "healthy", with no access log.
Solution
location = /health {
    access_log off;
    return 200 "healthy";
}

The `=` ensures only `/health` matches (not `/health/check`). `access_log off` prevents health check probes from filling your logs.

Exercise 2 (5 min): You have this config. A request to /static/images/logo.png arrives. Which location handles it? What if you remove the `^~`?
location ^~ /static/ { root /var/www; }
location ~* \.(png|jpg|gif)$ { expires 30d; proxy_pass http://backend; }
location / { proxy_pass http://frontend; }
Solution
With `^~`: the `/static/` prefix wins because `^~` prevents regex evaluation. Nginx serves the file from `/var/www/static/images/logo.png`. Without `^~`: the regex `~* \.(png|jpg|gif)$` wins because regex locations override plain prefix locations. The request is proxied to the backend with a 30-day cache header -- probably not what you want for local static files.

Exercise 3 (10 min): Write a complete Nginx config for a service at api.example.com that: (a) redirects HTTP to HTTPS, (b) uses least_conn across 3 backends with passive health checks, (c) rate limits to 20 requests/second per IP with a burst of 40, and (d) returns a custom JSON error for 429s.
Solution
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=20r/s;
upstream api_backend {
least_conn;
server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
keepalive 32;
}
server {
listen 80;
server_name api.example.com;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
server_name api.example.com;
ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
location / {
limit_req zone=api_limit burst=40 nodelay;
limit_req_status 429;
proxy_pass http://api_backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
error_page 429 = @rate_limited;
location @rate_limited {
default_type application/json;
return 429 '{"error": "rate_limited", "message": "Too many requests", "retry_after": 1}';
}
}
## Cheat Sheet

### Essential Commands
| Task | Command |
|---|---|
| Test config syntax | nginx -t |
| Safe reload | nginx -t && nginx -s reload |
| Dump full merged config | nginx -T |
| Show version and modules | nginx -V |
| Error log (live) | tail -f /var/log/nginx/error.log |
| Check listening ports | ss -tlnp \| grep nginx |
| Show active connections | curl localhost/nginx_status (requires stub_status) |
### proxy_pass Cheat

| Config | Request `/app/page` | Backend Sees |
|---|---|---|
| `proxy_pass http://back` | `/app/page` | `/app/page` |
| `proxy_pass http://back/` | `/app/page` | `/page` |
| `proxy_pass http://back/v2/` | `/app/page` | `/v2/page` |
### Location Priority (High to Low)
= exact location = /health
^~ prefix (final) location ^~ /static/
~ regex (cs) location ~ \.php$
~* regex (ci) location ~* \.(jpg|png)$
prefix location /api/
catch-all location /
### Rate Limiting Quick Reference
# Zone: 10MB memory, 10 req/s per IP
limit_req_zone $binary_remote_addr zone=name:10m rate=10r/s;
# Apply: burst of 20, process immediately
limit_req zone=name burst=20 nodelay;
## Takeaways

- Nginx is event-driven, not thread-per-connection. Workers use epoll to handle thousands of connections in one thread. This is why it dominates as a reverse proxy.
- The trailing slash on proxy_pass rewrites the URI. No slash = pass-through. Slash = strip and substitute. Get this wrong and your backend receives mangled paths. Every Nginx operator gets burned by this at least once.
- Open-source Nginx only has passive health checks. It won't notice a dead backend until real users hit it. Set `max_fails` and `fail_timeout`, or switch to HAProxy for active checks.
- DNS is resolved at config load and cached forever. In dynamic environments (Kubernetes, Docker), use `resolver` + a `set $variable` in `proxy_pass` to force runtime re-resolution.
- Location matching is not top-to-bottom. Exact (`=`) beats preferential prefix (`^~`) beats regex (`~`) beats plain prefix. Regex locations are checked in config order; first match wins. Use `^~` to protect prefix locations from regex override.
- Always `nginx -t` before `nginx -s reload`. Test validates syntax. Reload is zero-downtime. Restart drops connections. Never restart in production unless you're upgrading the binary.
## Related Lessons
- The Nginx Config That Broke Everything -- focused deep-dive on the six most common config traps
- Deploy a Web App From Nothing -- Nginx as the reverse proxy layer in a full deployment
- Connection Refused -- what to check when Nginx can't reach the backend
- What Happens When Your Certificate Expires -- TLS lifecycle and Nginx cert management
- The Load Balancer Lied -- when health checks pass but users see errors
- Kubernetes Services: How Traffic Finds Your Pod -- Nginx Ingress Controller in Kubernetes