# Nginx: The Swiss Army Server
Topics: Nginx architecture, configuration hierarchy, location matching, reverse proxy, load balancing, TLS termination, rate limiting, caching, access control, logging, Nginx vs Apache
Level: L1-L2 (Foundations to Operations)
Time: 60-90 minutes
Prerequisites: None (basic Linux command line helps)
## The Mission
It's Tuesday at 3pm. Your monitoring fires: 502 Bad Gateway errors on checkout.example.com, intermittent, affecting roughly 20% of requests. The app servers are running. The health endpoint returns 200. But users are getting errors.
You're the person with the Nginx config access. Time to figure out what's going on.
We'll diagnose this incident step by step -- and along the way, you'll learn how Nginx actually works, from its process model to its most dangerous configuration traps. By the end, you'll understand Nginx well enough to configure it from scratch and debug it at 3am.
Name Origin: Nginx is pronounced "engine-X," not "N-G-I-N-X." Igor Sysoev started building it in 2002 to solve the C10K problem -- handling 10,000 concurrent connections on a single server -- for Rambler, Russia's second-largest website at the time. He released it publicly in 2004. F5 Networks acquired Nginx, Inc. in 2019 for $670 million, which is a remarkable arc for a side project by a single developer.
## Part 1: How Nginx Actually Works

Before we debug anything, you need a mental model of what's running on the server.

### The Master-Worker Architecture
┌─────────────────┐
Clients ──────>│ Master Process │ reads config, manages workers
└────────┬────────┘
│ fork()
┌──────────────┼──────────────┐
v v v
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker N │
│ (event │ │ (event │ │ (event │
│ loop) │ │ loop) │ │ loop) │
└─────────┘ └─────────┘ └─────────┘
The master process runs as root (to bind ports 80/443). It reads the config, forks worker processes, and manages their lifecycle. Workers run as an unprivileged user (www-data or nginx) and handle all the actual connections.
Each worker runs a single-threaded event loop using epoll (Linux) or kqueue (BSD). One worker can handle thousands of connections simultaneously because it never blocks on I/O -- it registers interest in events and moves on. When data arrives, the kernel notifies the worker.
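You can see the shape of this model with Python's stdlib `selectors` module (which wraps epoll on Linux and kqueue on BSD). This is a toy single-threaded server, not how Nginx is implemented, but the structure is the same: register interest, wait for events, dispatch.

```python
# Toy event-loop server in the style of an Nginx worker: one thread,
# non-blocking sockets, callbacks dispatched when the kernel says a
# socket is ready. Illustrative sketch only.
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    conn, _addr = server.accept()
    conn.setblocking(False)               # never block on one client
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)                # kernel said this is readable
    if data:
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(conn)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))             # ephemeral port for the demo
server.listen(128)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

def run_once(timeout=1.0):
    # One turn of the loop: wait for any ready socket, run its callback.
    # A real worker runs this forever for thousands of sockets.
    for key, _mask in sel.select(timeout):
        key.data(key.fileobj)
```

One thread, many sockets: `sel.select()` returns as soon as *any* registered socket is ready, so a slow connection never stalls the others.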
Under the Hood: Apache's classic `prefork` model spawns one process per connection. At 10,000 connections, that's 10,000 processes, each consuming ~10MB of RAM = 100GB. Nginx handles the same load with 4 workers consuming ~50MB total. The trade-off: Nginx can't run application code (like PHP) inside the worker -- it must proxy to a separate process. Apache can embed the interpreter (mod_php). This is why Nginx dominates as a reverse proxy while Apache still appears in shared hosting environments.

### Worker Tuning -- The One Knob
# /etc/nginx/nginx.conf
worker_processes auto; # one per CPU core (auto detects)
worker_rlimit_nofile 65535; # max open files per worker
events {
worker_connections 4096; # max connections per worker
multi_accept on; # accept all pending connections at once
}
Total capacity = worker_processes x worker_connections. With 4 cores and 4096 connections per worker: 16,384 simultaneous connections. Each proxied connection uses two file descriptors (client-side + backend-side), so worker_rlimit_nofile should be at least 2x worker_connections.
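As a sanity check, the same arithmetic in code (illustrative only; real capacity also depends on memory and backend limits):

```python
# Worked example of the sizing math above. Assumes every connection is
# proxied, so each one holds two file descriptors; a pure static-file
# server needs fewer.
def sizing(worker_processes, worker_connections):
    total_connections = worker_processes * worker_connections
    min_rlimit_nofile = 2 * worker_connections  # client fd + backend fd
    return total_connections, min_rlimit_nofile

print(sizing(worker_processes=4, worker_connections=4096))  # (16384, 8192)
```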
Trivia: Nginx's process model has essentially one tuning knob: `worker_processes auto`. Compare that to Apache's MPM configuration (prefork vs worker vs event, each with MaxRequestWorkers, ServerLimit, ThreadsPerChild, MinSpareThreads, MaxSpareThreads...). This simplicity is a feature, not a limitation.

## Part 2: Back to the 502s -- First Steps
OK, model in hand. Let's diagnose.
# Step 1: Is Nginx running?
systemctl status nginx
# Active: active (running). Good.
# Step 2: What does the error log say?
tail -50 /var/log/nginx/error.log
You see lines like:
2026/03/23 15:02:14 [error] 1234#0: *5678 connect() failed
(111: Connection refused) while connecting to upstream,
client: 203.0.113.50, server: checkout.example.com,
request: "GET /api/cart HTTP/2.0",
upstream: "http://10.0.1.12:8080/api/cart"
Connection refused to 10.0.1.12:8080. That's one of your backend servers. Let's check if it's listening:
# Step 3: Can we reach the backend directly?
curl -v http://10.0.1.10:8080/health # 200 OK
curl -v http://10.0.1.11:8080/health # 200 OK
curl -v http://10.0.1.12:8080/health # Connection refused!
Server 12 is down. But Nginx is still sending traffic to it. Why? Because open-source Nginx only does passive health checks.
## Flashcard Check #1
Q: What is the difference between active and passive health checks?
A: Active checks probe backends on a schedule (every N seconds). Passive checks only notice failures when real user traffic gets errors. Nginx open source uses passive checks only -- it marks a server down after `max_fails` failures in `fail_timeout` seconds. Nginx Plus adds active health checks. HAProxy supports both out of the box.
Q: What does a 502 Bad Gateway mean in Nginx?
A: Nginx is running fine, but it cannot get a valid response from the upstream backend. Common causes: backend process down, connection refused, backend crashed mid-response. The answer is almost always in
/var/log/nginx/error.log.
## Part 3: The Configuration Hierarchy
Let's look at the config that's routing to the dead backend. Nginx config is hierarchical:
main (global)
├── events { }
├── http { } # All HTTP traffic
│ ├── upstream backend { } # Server pools
│ ├── server { } # Virtual host (domain)
│ │ ├── location / { } # URI matching
│ │ └── location /api/ { }
│ └── server { } # Another domain
└── stream { } # TCP/UDP proxy (non-HTTP)
Directives inherit downward. A setting in http {} applies to all server {} blocks unless overridden. A setting in server {} applies to all its location {} blocks unless overridden.
http {
gzip on; # applies everywhere
server {
gzip off; # overrides for this server
location /api {
gzip on; # overrides again for this location
}
}
}
Gotcha: The `add_header` directive does not follow this inheritance pattern. If you add any `add_header` in a `location` block, all `add_header` directives from parent contexts are dropped for that location. This has silently removed security headers (HSTS, CSP, X-Frame-Options) in countless production configs. You must repeat every header in every location that defines any.
Here's the upstream block for our checkout service:

upstream checkout_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

No max_fails. No fail_timeout. No keepalive. This is the default "round-robin with no safety net" configuration. Let's fix it.
## Part 4: Upstream Blocks and Load Balancing

### Load Balancing Algorithms

| Algorithm | Directive | Best For |
|---|---|---|
| Round-robin | (default) | Stateless services, uniform backends |
| Least connections | `least_conn;` | Long-lived requests, uneven response times |
| IP hash | `ip_hash;` | Sticky sessions (same client -> same backend) |
| Generic hash | `hash $key;` | Cache affinity, consistent routing |
| Random | `random two least_conn;` | Large pools, two-choice power |
Mental Model: Round-robin is a dumb waiter distributing plates evenly regardless of who's still eating. Least-connections is a smart waiter who gives the next plate to whoever has the fewest. IP-hash is a host who assigns each guest to a specific table for the whole meal.
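The difference between the two waiters is easy to see in a toy scheduler (a sketch, not Nginx's implementation -- Nginx also applies server weights and failure state):

```python
# Toy backend picker: round-robin vs least-connections.
from itertools import count

class Balancer:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # in-flight requests per backend
        self._rr = count()

    def round_robin(self):
        backends = list(self.active)
        # Ignores load entirely: just cycles through the list.
        return backends[next(self._rr) % len(backends)]

    def least_conn(self):
        # Fewest in-flight requests wins.
        return min(self.active, key=self.active.get)
```

With `active = {"a": 5, "b": 1, "c": 3}`, `least_conn()` picks `b`, while round-robin would blindly hand the next request to whichever backend is next in its cycle.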
### The Fixed Upstream Block
upstream checkout_backend {
least_conn;
server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.12:8080 max_fails=3 fail_timeout=30s backup;
keepalive 32;
}
What changed:
| Directive | What It Does |
|---|---|
| `least_conn` | Routes to the server with fewest active connections |
| `max_fails=3` | Marks a server down after 3 failed requests |
| `fail_timeout=30s` | Window for counting failures; also how long the server stays "down" |
| `backup` | Only receives traffic when all non-backup servers are down |
| `keepalive 32` | Maintains 32 idle persistent connections to backends per worker |
The keepalive directive requires two companion settings in the location block:
location / {
proxy_pass http://checkout_backend;
proxy_http_version 1.1;
proxy_set_header Connection ""; # clear the Connection header for keepalive
}
Under the Hood: Without `keepalive`, Nginx opens a new TCP connection to the backend for every request. At 1000 req/s, that's 1000 TCP handshakes per second, each costing ~1ms on a LAN. With `keepalive 32`, connections are reused. Monitor with `ss -s` -- look at connection churn.
### Nginx Plus vs Open Source

| Feature | Open Source | Nginx Plus |
|---|---|---|
| Active health checks | No | Yes (`health_check` directive) |
| Session persistence | `ip_hash` only | Cookies, learn, route |
| Live dashboard | `stub_status` (basic) | Full API + dashboard |
| Dynamic reconfiguration | Reload required | Runtime API |
| DNS re-resolution | Manual workaround | Built-in |
| Price | Free | ~$2,500/year/instance |
For most teams, open-source Nginx + the passive health check workaround is sufficient. If you need active health checks without paying, HAProxy is the common alternative.
## Part 5: The Location Matching Algorithm

While we're in the config, let's understand how Nginx decides which location block handles a request. This is the most misunderstood part of Nginx configuration.

### The Decision Tree
Request URI arrives (e.g., /api/v2/users)
│
├─ 1. Check all EXACT matches (=)
│ Match found? ──> USE IT. Stop.
│
├─ 2. Find the LONGEST PREFIX match
│ │
│ ├─ Is it a ^~ prefix?
│ │ Yes ──> USE IT. Stop. (skip regex)
│ │
│ └─ Remember it, keep going...
│
├─ 3. Check REGEX locations (~ and ~*) in config order
│ First match found? ──> USE IT. Stop.
│
└─ 4. No regex matched? ──> Use the remembered longest prefix from step 2.
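The decision tree can be written out as a small function. This is a simplified model (no nested locations, no merge behavior), but it reproduces the selection order:

```python
# Toy model of Nginx's location selection:
# exact > ^~ prefix > first-matching regex > longest plain prefix.
import re

def pick_location(uri, locations):
    """locations: list of (modifier, pattern) tuples in config order.
    modifier is one of '=', '^~', '~', '~*', or '' (plain prefix)."""
    # 1. Exact matches win outright.
    for mod, pat in locations:
        if mod == "=" and uri == pat:
            return (mod, pat)
    # 2. Remember the longest prefix match (plain or ^~).
    best = None
    for mod, pat in locations:
        if mod in ("", "^~") and uri.startswith(pat):
            if best is None or len(pat) > len(best[1]):
                best = (mod, pat)
    # 2b. A ^~ prefix short-circuits regex evaluation.
    if best and best[0] == "^~":
        return best
    # 3. Regexes in config order; first match wins.
    for mod, pat in locations:
        if mod == "~" and re.search(pat, uri):
            return (mod, pat)
        if mod == "~*" and re.search(pat, uri, re.IGNORECASE):
            return (mod, pat)
    # 4. No regex matched: fall back to the remembered prefix.
    return best

locs = [("=", "/health"), ("^~", "/static/"), ("~", r"\.php$"),
        ("~*", r"\.(jpg|png|gif)$"), ("", "/api/"), ("", "/")]
print(pick_location("/app/index.php", locs))  # the regex wins over the plain prefix
```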
### A Concrete Example
location = /health { return 200 "ok"; } # A: exact
location ^~ /static/ { root /var/www; } # B: prefix, skip regex
location ~ \.php$ { proxy_pass http://php; } # C: regex
location ~* \.(jpg|png|gif)$ { expires 30d; } # D: case-insensitive regex
location /api/ { proxy_pass http://backend; } # E: prefix
location / { proxy_pass http://frontend; } # F: catch-all prefix
| Request | Matches | Winner | Why |
|---|---|---|---|
| `/health` | A, F | A | Exact match, highest priority |
| `/static/logo.png` | B, D, F | B | `^~` prefix stops regex evaluation |
| `/app/index.php` | C, F | C | Regex beats plain prefix |
| `/api/users` | E, F | E | Longest prefix (no regex matches) |
| `/about` | F | F | Only the catch-all matches |
| `/static/style.css` | B, F | B | `^~` prefix, no regex can override |
Remember: The matching priority mnemonic is "E-P-R-P": Exact (`=`), Preferential prefix (`^~`), Regex (`~`/`~*`), Plain prefix (longest). When in doubt, use `=` for health checks and `^~` for static file directories.

Interview Bridge: "Explain Nginx location matching order" is a common DevOps interview question. The key insight interviewers want: regex locations can override prefix locations (unless `^~` is used), and regexes are evaluated in config file order (first match wins), not longest match.
## Flashcard Check #2

Q: You have location /api/ { } and location ~ ^/api { }. A request to /api/users arrives. Which wins?
A: The regex `~ ^/api` wins. Regex locations override plain prefix locations. To force the prefix to win, change it to `location ^~ /api/ { }`.
Q: You add location = /api/health { }. Does it override the regex?
A: Yes. Exact matches (`=`) have the highest priority and are checked first.
## Part 6: The proxy_pass Trailing Slash Trap

This is the single most common Nginx misconfiguration. One character changes everything.

### Without Trailing Slash: Pass-Through
location /app/ {
proxy_pass http://backend;
}
# Request: GET /app/page
# Backend receives: GET /app/page
# The full URI passes through unchanged.
### With Trailing Slash: URI Rewriting
location /app/ {
proxy_pass http://backend/;
}
# Request: GET /app/page
# Backend receives: GET /page
# The location prefix (/app/) is stripped!
### With a Path: Substitution
location /app/ {
proxy_pass http://backend/v2/;
}
# Request: GET /app/page
# Backend receives: GET /v2/page
# /app/ is replaced with /v2/
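All three cases follow one rule, which this little helper makes explicit (a model of the mapping, not Nginx source): if `proxy_pass` has a URI part, the matched location prefix is cut off and the URI part is glued on.

```python
# Model of proxy_pass URI rewriting for prefix locations.
def backend_uri(request_uri, location_prefix, proxy_pass_uri=None):
    if proxy_pass_uri is None:
        # proxy_pass http://back;  (no URI part) -> pass through unchanged
        return request_uri
    # proxy_pass http://back/  or  http://back/v2/  -> strip and substitute
    return proxy_pass_uri + request_uri[len(location_prefix):]

print(backend_uri("/app/page", "/app/"))          # /app/page
print(backend_uri("/app/page", "/app/", "/"))     # /page
print(backend_uri("/app/page", "/app/", "/v2/"))  # /v2/page
```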
Remember: No slash = No change. Slash = Substitute. The trailing `/` on `proxy_pass` activates URI rewriting -- it strips the `location` prefix from the request path.

War Story: A team spent two days debugging why their API returned HTML error pages instead of JSON. The root cause: `proxy_pass http://backend/` (trailing slash) stripped the `/api/v2/` prefix. The backend received bare paths like `/users` instead of `/api/v2/users`, hit its catch-all 404 handler, and returned an HTML error page. One character. Two days.

Gotcha: The "off-by-slash" vulnerability (presented by Orange Tsai at Black Hat) exploits this exact confusion. A missing trailing slash on the `location` combined with a trailing slash on `proxy_pass` can enable path traversal: `GET /api../internal/secret` resolves to the internal endpoint. This is a common finding in security audits. Always ensure your `location` directive ends with `/` when the `proxy_pass` URL does.
## Part 7: TLS Configuration -- The Modern Profile
Our checkout service needs HTTPS. Here's a production-ready TLS configuration following Mozilla's "modern" profile (TLS 1.2+, strong ciphers only):
server {
listen 443 ssl http2;
server_name checkout.example.com;
# Certificates (Let's Encrypt via certbot)
ssl_certificate /etc/letsencrypt/live/checkout.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/checkout.example.com/privkey.pem;
# Protocol and ciphers -- modern profile
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off; # let TLS 1.3 handle cipher negotiation
# OCSP stapling -- server fetches revocation status so clients don't have to
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
# Session resumption -- avoid repeated handshakes
ssl_session_cache shared:SSL:10m; # 10MB shared cache across workers
ssl_session_timeout 1d;
ssl_session_tickets off; # off for forward secrecy
# HSTS -- tell browsers to always use HTTPS
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
}
# HTTP -> HTTPS redirect
server {
listen 80;
server_name checkout.example.com;
return 301 https://$host$request_uri;
}
| Directive | Why |
|---|---|
| `ssl_protocols TLSv1.2 TLSv1.3` | TLS 1.0/1.1 formally deprecated by RFC 8996 (2021) |
| `ssl_prefer_server_ciphers off` | TLS 1.3 clients negotiate ciphers better than servers |
| `ssl_stapling on` | Server includes OCSP response in handshake -- faster, more private |
| `ssl_session_tickets off` | Session tickets break forward secrecy unless keys are rotated |
| HSTS `max-age=63072000` | Two years. Once set, browsers refuse plain HTTP. Start with a short max-age when testing. |
Gotcha: HSTS is hard to undo. If you set a long `max-age` and then need to revert to HTTP (unlikely but possible), browsers that cached the HSTS header will refuse to connect. Start with `max-age=300` (5 minutes) during testing, increase to the full value once confirmed.
### Testing Your TLS Config
# Check what cipher and protocol are negotiated
openssl s_client -connect checkout.example.com:443 -servername checkout.example.com </dev/null 2>/dev/null | grep -E "Protocol|Cipher"
# Check certificate expiry
echo | openssl s_client -connect checkout.example.com:443 2>/dev/null | openssl x509 -noout -dates
# Full scan with Mozilla's SSL config test
# https://ssl-config.mozilla.org/ (config generator)
# https://www.ssllabs.com/ssltest/ (grade your config)
## Part 8: Rate Limiting -- The Leaky Bucket

Your checkout endpoint is getting hammered by a bot. Time for rate limiting. Nginx implements the leaky bucket algorithm.

### How the Leaky Bucket Works
Imagine a bucket with a hole in the bottom. Water (requests) pours in from the top. Water leaks out the bottom at a constant rate. If the bucket overflows, excess water (requests) is rejected.
Requests arrive (variable rate)
│ │ │ │ │ │
v v v v v v
┌──────────────────┐
│ Burst Buffer │ capacity = burst parameter
│ ┌──────────┐ │
│ │ queued │ │
│ │ requests │ │
│ └────┬─────┘ │
│ │ │
└────────┼──────────┘
│ leaks at constant rate
v
Processed at: rate parameter (e.g., 10r/s)
Overflow? ──> 429 Too Many Requests
### The Config
http {
# Zone definition: 10MB shared memory, 10 requests/second per client IP
limit_req_zone $binary_remote_addr zone=checkout:10m rate=10r/s;
server {
location /api/checkout {
limit_req zone=checkout burst=20 nodelay;
limit_req_status 429;
proxy_pass http://checkout_backend;
}
}
}
### The Math

- `rate=10r/s`: one request allowed every 100ms (1000ms / 10).
- `burst=20`: the bucket holds 20 extra requests.
- `nodelay`: burst requests are processed immediately instead of being queued at the leak rate.
Without nodelay, requests arriving faster than the rate are delayed (queued) until a slot opens. With nodelay, they're processed immediately up to the burst limit, then rejected. For APIs, nodelay is almost always what you want.
| Scenario (rate=10r/s, burst=20) | Result |
|---|---|
| 10 requests in one second | All 10 served immediately |
| 30 requests in one second | 20 served (burst), 10 rejected with 429 |
| 1 request/second for 20 seconds, then 25 at once | 20 served from burst buffer, 5 rejected |
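You can reproduce those scenarios with a toy model of the accounting. This simplifies Nginx's actual bookkeeping (which tracks "excess" in milli-requests per zone entry), but the behavior matches the table:

```python
# Simplified limit_req accounting with nodelay semantics: the bucket
# drains continuously at `rate`; a request is rejected once the excess
# reaches `burst`.
class LeakyBucket:
    def __init__(self, rate, burst):
        self.rate = rate          # drain rate, requests per second
        self.burst = burst        # overflow capacity
        self.excess = 0.0         # requests currently "in the bucket"
        self.last = 0.0           # timestamp of the previous arrival

    def allow(self, now):
        # Drain the bucket for the time elapsed since the last arrival.
        self.excess = max(0.0, self.excess - (now - self.last) * self.rate)
        self.last = now
        if self.excess >= self.burst:
            return False          # overflow -> 429 Too Many Requests
        self.excess += 1
        return True
```

`LeakyBucket(10, 20)` serves 20 of 30 simultaneous requests and rejects the rest, matching the second row of the table.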
Gotcha: The `10m` in the zone definition is 10 megabytes of shared memory for tracking client IPs. Each `$binary_remote_addr` entry uses ~128 bytes, so 10MB tracks about 80,000 unique IPs. If you're behind a CDN, all traffic may appear to come from one IP -- use `$http_x_forwarded_for` or a request header instead.
## Part 9: Proxy Caching
Backend responses that don't change often? Cache them at the Nginx layer to avoid hitting the backend at all.
http {
# Define cache: 1GB max, inactive entries purged after 60 min
proxy_cache_path /var/cache/nginx levels=1:2
keys_zone=app_cache:10m
max_size=1g
inactive=60m
use_temp_path=off;
server {
location /api/products {
proxy_cache app_cache;
proxy_cache_valid 200 302 10m; # cache 200/302 for 10 min
proxy_cache_valid 404 1m; # cache 404 for 1 min
proxy_cache_use_stale error timeout updating http_500 http_502;
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://checkout_backend;
}
}
}
The proxy_cache_use_stale directive is the safety net: if the backend is down or slow, Nginx serves the stale cached version instead of returning an error. The X-Cache-Status header tells you whether each response was a HIT, MISS, STALE, or BYPASS -- invaluable for debugging.
Trivia: Even 1-second caching (microcaching) dramatically reduces backend load during traffic spikes. If 500 users hit the same product page in one second, the backend handles 1 request, the cache handles 499. Enable with `proxy_cache_valid 200 1s` and `proxy_cache_lock on` (prevents a stampede of requests all trying to populate the cache simultaneously).
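A toy cache model shows why microcaching works (illustrative only; real cache keys, TTL handling, and locking are far richer in Nginx):

```python
# 1-second TTL cache: repeated requests within the window skip the backend.
class MicroCache:
    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self.store = {}           # key -> (response, stored_at)
        self.backend_hits = 0

    def get(self, key, now, fetch=lambda k: f"response for {k}"):
        entry = self.store.get(key)
        if entry and now - entry[1] < self.ttl:
            return entry[0]                    # HIT: served from cache
        self.backend_hits += 1                 # MISS: one backend request
        self.store[key] = (fetch(key), now)
        return self.store[key][0]

cache = MicroCache()
for i in range(500):                           # 500 users within one second
    cache.get("/product/42", now=i / 500)
print(cache.backend_hits)  # 1
```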
## Part 10: Access Control

Multiple layers, from IP-based to external authentication.

### IP Allow/Deny
location /admin/ {
allow 10.0.0.0/8; # internal network
allow 192.168.1.0/24; # office network
deny all; # everyone else
proxy_pass http://admin_backend;
}
Rules are evaluated top to bottom; first match wins.
### Basic Auth
location /staging/ {
auth_basic "Staging Environment";
auth_basic_user_file /etc/nginx/.htpasswd;
proxy_pass http://staging_backend;
}
### External Auth (auth_request)
For OAuth, JWT, or custom auth -- delegate to an external service:
location /api/ {
auth_request /auth;
auth_request_set $auth_user $upstream_http_x_auth_user;
proxy_set_header X-Auth-User $auth_user;
proxy_pass http://backend;
}
location = /auth {
internal;
proxy_pass http://auth-service:9000/validate;
proxy_pass_request_body off;
proxy_set_header Content-Length "";
proxy_set_header X-Original-URI $request_uri;
}
Nginx makes a subrequest to /auth before every request to /api/. If the auth service returns 200, the request proceeds. If it returns 401 or 403, Nginx returns that status to the client. The internal directive prevents external clients from hitting the /auth location directly.
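The auth service itself can be anything that speaks HTTP and returns 200/401. Here's a minimal stand-in using only Python's stdlib -- the hardcoded token and header names are illustrative, and a real service would validate JWTs or sessions:

```python
# Toy auth service compatible with the auth_request setup above.
from http.server import BaseHTTPRequestHandler, HTTPServer

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Toy check only -- stand-in for real token/session validation.
        if self.headers.get("Authorization") == "Bearer letmein":
            self.send_response(200)
            # Read by auth_request_set via $upstream_http_x_auth_user.
            self.send_header("X-Auth-User", "alice")
            self.end_headers()
        else:
            self.send_response(401)
            self.end_headers()

    def log_message(self, *args):   # keep demo output quiet
        pass

# To run it for real:
# HTTPServer(("127.0.0.1", 9000), AuthHandler).serve_forever()
```

Any 2xx from this handler lets the original request through; the 401 is relayed to the client by Nginx.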
## Part 11: Logging for Debugging
The default Nginx log format is fine for basic use, but in production you need request timing and upstream information.
log_format detailed '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time uct=$upstream_connect_time '
'urt=$upstream_response_time ucs=$upstream_cache_status';
access_log /var/log/nginx/access.log detailed;
error_log /var/log/nginx/error.log warn;
| Variable | What It Tells You |
|---|---|
$request_time |
Total time from first client byte to last byte sent back |
$upstream_connect_time |
Time to establish connection to backend |
$upstream_response_time |
Time from connection to full response from backend |
$upstream_cache_status |
HIT, MISS, STALE, BYPASS, EXPIRED |
If $request_time is high but $upstream_response_time is low, the bottleneck is between Nginx and the client (slow client, network). If $upstream_response_time is high, the backend is slow.
# Quick log analysis: find slow requests (pull request_time from the rt= field)
awk 'match($0, /rt=[0-9.]+/) { t = substr($0, RSTART+3, RLENGTH-3); if (t+0 > 2.0) print }' /var/log/nginx/access.log | tail -20
# Status code distribution
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# Requests per minute
awk '{print $4}' /var/log/nginx/access.log | cut -d: -f1-3 | uniq -c | sort -rn | head
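The same triage rule can be scripted. This parser assumes the `detailed` log format defined above (the `rt=`/`urt=` key-value pairs):

```python
# Classify slow requests: high rt with low urt -> slow client or network;
# high urt -> slow backend.
import re

PAIR = re.compile(r'\b(rt|urt)=([\d.]+)')

def diagnose(line, threshold=2.0):
    fields = dict(PAIR.findall(line))
    rt = float(fields.get("rt", 0))
    urt = float(fields.get("urt", 0))
    if rt < threshold:
        return "ok"
    return "slow backend" if urt >= threshold else "slow client or network"

print(diagnose('rt=3.204 uct=0.001 urt=3.180 ucs=MISS'))  # slow backend
```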
## Part 12: The DNS Caching War Story

War Story: A team ran Nginx in front of a service that scaled via Kubernetes. The upstream block used the service hostname: `server checkout-svc.default.svc.cluster.local:8080`. Everything worked -- until a rolling deployment happened. Kubernetes assigned new pod IPs. The DNS record updated. But Nginx had resolved the hostname at config load and cached the old IPs. Traffic kept going to the old, now-dead pods. 502s everywhere.

The fix required two changes:
# BAD: DNS resolved once at startup, cached forever
upstream backend {
server checkout-svc.default.svc.cluster.local:8080;
}
# GOOD: Force runtime DNS re-resolution
resolver 10.96.0.10 valid=30s; # kube-dns IP
server {
location / {
set $backend "http://checkout-svc.default.svc.cluster.local:8080";
proxy_pass $backend;
}
}
The `set $variable` trick forces Nginx to re-resolve the hostname on each request using the `resolver` directive. Without the variable, Nginx treats the hostname as a static IP resolved at config load. This hits every Nginx installation in dynamic environments -- Kubernetes, Docker, AWS ECS, any system where backend IPs change. Nginx Plus handles this natively; open-source Nginx requires the workaround.
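The underlying difference is easy to model: a resolve-once cache versus a TTL-bounded one. Here `lookup` is a stand-in for a real DNS query:

```python
# TTL-bounded resolver cache, like `resolver ... valid=30s`.
class TTLResolver:
    def __init__(self, lookup, valid=30.0):
        self.lookup = lookup      # stand-in for a real DNS query
        self.valid = valid        # seconds an answer stays fresh
        self.cache = {}           # name -> (ip, resolved_at)

    def resolve(self, name, now):
        hit = self.cache.get(name)
        if hit and now - hit[1] < self.valid:
            return hit[0]         # still fresh, reuse cached answer
        ip = self.lookup(name)    # expired or first ask: query again
        self.cache[name] = (ip, now)
        return ip
```

A static hostname in an `upstream` block behaves like `valid=float('inf')`: the first answer is reused forever, which is exactly the failure mode above.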
## Part 13: Nginx vs Apache -- When It Matters
| Dimension | Nginx | Apache |
|---|---|---|
| Architecture | Event-driven, non-blocking | Process/thread per connection (event MPM available) |
| Memory per connection | ~2.5 KB | ~10 MB (prefork), ~2 MB (event MPM) |
| Static files | Extremely fast (sendfile, async I/O) | Fast, but more overhead |
| .htaccess | Not supported | Per-directory config overrides |
| Embedded scripting | No (proxy to PHP-FPM, etc.) | Yes (mod_php, mod_perl) |
| Reverse proxy | Primary use case | Possible but less common |
| Configuration reload | Zero-downtime (SIGHUP) | Graceful restart |
| Market share (2024) | ~34% of active sites | ~29% of active sites |
Mental Model: Nginx is a traffic cop -- it directs requests efficiently but doesn't do the work itself. Apache is a Swiss Army knife that can embed the work inside itself. In modern architectures where the application runs as a separate process (Node.js, Go, Python), Nginx's traffic-cop model is a perfect fit. Apache's embedded model is better for traditional shared hosting where each user's PHP runs inside the web server.
## Flashcard Check #3
Q: What happens when you run nginx -s reload?
A: The master process re-reads the config, forks new worker processes with the new config, and tells old workers to finish their current requests and exit. Zero downtime. Under the hood, it sends SIGHUP to the master process.
Q: Nginx open source only supports _____ health checks. What fills the blank?
A: Passive. It only notices backend failures when real requests fail. Active health checks (probing backends on a schedule) require Nginx Plus or an alternative like HAProxy.
Q: What does proxy_cache_use_stale error timeout http_502 do?
A: When the backend returns an error, times out, or returns 502, Nginx serves a stale (expired) cached response instead of passing the error to the client. It's a safety net that keeps your site up during backend outages.
Q: You're behind a CDN. Rate limiting on $binary_remote_addr limits all users to 10r/s total. Why?
A: The CDN's IP is the remote address for all requests. All users appear to come from the same IP. Use a forwarded header (`$http_x_forwarded_for`) or a CDN-specific header instead.
## Exercises
Exercise 1 (2 min): Write an Nginx location block that exact-matches /health and returns 200 OK with the body "healthy", with no access log.
Solution
location = /health {
    access_log off;
    return 200 "healthy";
}

The `=` ensures only `/health` matches (not `/health/check`). `access_log off` prevents health check probes from filling your logs.

Exercise 2 (5 min): You have this config. A request to /static/images/logo.png arrives. Which location handles it? What if you remove the `^~`?
location ^~ /static/ { root /var/www; }
location ~* \.(png|jpg|gif)$ { expires 30d; proxy_pass http://backend; }
location / { proxy_pass http://frontend; }
Solution
With `^~`: the `/static/` prefix wins because `^~` prevents regex evaluation. Nginx serves the file from `/var/www/static/images/logo.png`. Without `^~`: the regex `~* \.(png|jpg|gif)$` wins because regex locations override plain prefix locations. The request is proxied to the backend with a 30-day cache header -- probably not what you want for local static files.

Exercise 3 (10 min): Write a complete Nginx config for a service at api.example.com that: (a) redirects HTTP to HTTPS, (b) uses least_conn across 3 backends with passive health checks, (c) rate limits to 20 requests/second per IP with a burst of 40, and (d) returns a custom JSON error for 429s.
Solution
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=20r/s;
upstream api_backend {
least_conn;
server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
keepalive 32;
}
server {
listen 80;
server_name api.example.com;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
server_name api.example.com;
ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
location / {
limit_req zone=api_limit burst=40 nodelay;
limit_req_status 429;
proxy_pass http://api_backend;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
error_page 429 = @rate_limited;
location @rate_limited {
default_type application/json;
return 429 '{"error": "rate_limited", "message": "Too many requests", "retry_after": 1}';
}
}
## Cheat Sheet

### Essential Commands
| Task | Command |
|---|---|
| Test config syntax | nginx -t |
| Safe reload | nginx -t && nginx -s reload |
| Dump full merged config | nginx -T |
| Show version and modules | nginx -V |
| Error log (live) | tail -f /var/log/nginx/error.log |
| Check listening ports | ss -tlnp \| grep nginx |
| Show active connections | curl localhost/nginx_status (requires stub_status) |
### proxy_pass Cheat

| Config | Request `/app/page` | Backend Sees |
|---|---|---|
| `proxy_pass http://back` | `/app/page` | `/app/page` |
| `proxy_pass http://back/` | `/app/page` | `/page` |
| `proxy_pass http://back/v2/` | `/app/page` | `/v2/page` |
### Location Priority (High to Low)
= exact location = /health
^~ prefix (final) location ^~ /static/
~ regex (cs) location ~ \.php$
~* regex (ci) location ~* \.(jpg|png)$
prefix location /api/
catch-all location /
### Rate Limiting Quick Reference
# Zone: 10MB memory, 10 req/s per IP
limit_req_zone $binary_remote_addr zone=name:10m rate=10r/s;
# Apply: burst of 20, process immediately
limit_req zone=name burst=20 nodelay;
## Takeaways

- Nginx is event-driven, not thread-per-connection. Workers use epoll to handle thousands of connections in one thread. This is why it dominates as a reverse proxy.
- The trailing slash on proxy_pass rewrites the URI. No slash = pass-through. Slash = strip and substitute. Get this wrong and your backend receives mangled paths. Every Nginx operator gets burned by this at least once.
- Open-source Nginx only has passive health checks. It won't notice a dead backend until real users hit it. Set `max_fails` and `fail_timeout`, or switch to HAProxy for active checks.
- DNS is resolved at config load and cached forever. In dynamic environments (Kubernetes, Docker), use `resolver` + a `set $variable` in `proxy_pass` to force runtime re-resolution.
- Location matching is not top-to-bottom. Exact (`=`) beats preferential prefix (`^~`) beats regex (`~`) beats plain prefix. Regex locations are checked in config order; first match wins. Use `^~` to protect prefix locations from regex override.
- Always `nginx -t` before `nginx -s reload`. Test validates syntax. Reload is zero-downtime. Restart drops connections. Never restart in production unless you're upgrading the binary.
## Related Lessons
- The Nginx Config That Broke Everything -- focused deep-dive on the six most common config traps
- Deploy a Web App From Nothing -- Nginx as the reverse proxy layer in a full deployment
- Connection Refused -- what to check when Nginx can't reach the backend
- What Happens When Your Certificate Expires -- TLS lifecycle and Nginx cert management
- The Load Balancer Lied -- when health checks pass but users see errors
- Kubernetes Services: How Traffic Finds Your Pod -- Nginx Ingress Controller in Kubernetes