Nginx & Web Servers - Street-Level Ops

Quick Diagnosis Commands

When Nginx is misbehaving, start here:

# 1. Is Nginx running?
systemctl status nginx
ps aux | grep nginx

# 2. Test config syntax (ALWAYS do this before reload)
nginx -t

# 3. Check error log (real-time)
tail -f /var/log/nginx/error.log

# 4. Check access log for status codes
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# 5. Check which ports Nginx is listening on
ss -tlnp | grep nginx

# 6. Show current connections
ss -an | grep :80 | awk '{print $1}' | sort | uniq -c

# 7. Check upstream backend health
curl -I http://localhost/health

# 8. Show compiled-in modules
nginx -V 2>&1 | tr ' ' '\n' | grep module

Pattern: Debugging 502 Bad Gateway

502 means Nginx could not get a valid response from the upstream: the connection was refused, the upstream died mid-response, or it returned something Nginx could not parse. Systematic debugging:

# 1. Is the backend process running?
systemctl status myapp
ss -tlnp | grep 8080

# 2. Can Nginx reach the backend directly?
curl -v http://127.0.0.1:8080/

# 3. Check Nginx error log for the specific error
tail -100 /var/log/nginx/error.log | grep 502
# Common messages:
#   "upstream prematurely closed connection"
#   "connect() failed (111: Connection refused)"
#   "no live upstreams"

# 4. Check backend logs
journalctl -u myapp --since "5 minutes ago"

# 5. Check if SELinux is blocking the proxy connection
# (common on RHEL/CentOS)
getenforce
ausearch -m avc --ts recent | grep nginx
setsebool -P httpd_can_network_connect 1   # if SELinux is the issue

# 6. Check file descriptor limits
cat /proc/$(cat /var/run/nginx.pid)/limits | grep "Max open"
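If the worker limit is low, raise it in both places. A sketch; the 65535 value is illustrative, size it for your traffic:

```nginx
# /etc/nginx/nginx.conf (main context)
worker_rlimit_nofile 65535;

# systemd drop-in, e.g. /etc/systemd/system/nginx.service.d/limits.conf:
# [Service]
# LimitNOFILE=65535
# then: systemctl daemon-reload && systemctl restart nginx
```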

Common 502 Causes

Backend is down                     -> restart backend
Backend is slow (timeout)           -> increase proxy_read_timeout
SELinux blocking connections        -> setsebool httpd_can_network_connect
Socket permissions wrong            -> check unix socket owner/perms
Backend crashing under load         -> check backend logs, memory
Upstream keepalive misconfigured    -> ensure proxy_http_version 1.1
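For the keepalive case in the table above, a minimal sketch of a correctly configured upstream (server name and port are placeholders):

```nginx
upstream backend {
    server 127.0.0.1:8080;
    keepalive 32;                       # idle keepalive connections cached per worker
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the default "Connection: close"
    }
}
```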

Pattern: Debugging 504 Gateway Timeout

504 means Nginx timed out waiting for the upstream to respond.

# Check current timeout settings
grep -r 'proxy_.*timeout' /etc/nginx/

# Increase timeouts (in the relevant location block)
# proxy_connect_timeout 60s;    # time to establish connection
# proxy_send_timeout 60s;       # time between successive writes
# proxy_read_timeout 300s;      # time between successive reads (key one)

If you find yourself increasing proxy_read_timeout past 60 seconds, the real fix is probably making the backend faster, not waiting longer.

Debug clue: a 504 that appears only on some requests (not all) usually means one of several upstream backends is slow. Log $upstream_addr and $upstream_response_time in your log format and correlate the 504s with a specific upstream address; the slow backend will stand out.
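A quick way to do that correlation, assuming a combined-style format where $status is field 9 and $upstream_addr is logged as the last field (adjust the field numbers to your log_format). The printf lines are synthetic sample data; on a real box, replace the printf with `cat /var/log/nginx/access.log`:

```shell
# Count 504s per upstream address; the slow backend dominates the output.
printf '%s\n' \
  '1.2.3.4 - - [10/Oct/2025:12:00:01 +0000] "GET /a HTTP/1.1" 504 0 "-" "-" 10.0.0.2:8080' \
  '1.2.3.4 - - [10/Oct/2025:12:00:02 +0000] "GET /b HTTP/1.1" 200 5 "-" "-" 10.0.0.1:8080' \
  '1.2.3.4 - - [10/Oct/2025:12:00:03 +0000] "GET /c HTTP/1.1" 504 0 "-" "-" 10.0.0.2:8080' |
awk '$9 == 504 {print $NF}' | sort | uniq -c | sort -rn
```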


Pattern: Reload vs Restart

# RELOAD: graceful, zero-downtime
# - Master process reads new config
# - Spawns new workers with new config
# - Old workers finish existing requests, then exit
nginx -s reload
systemctl reload nginx

# RESTART: drops all connections
# - Process stops, then starts fresh
# - Active connections are killed
systemctl restart nginx

# REOPEN: rotate log files without reload
nginx -s reopen

Always prefer reload. The only time you need restart is when changing something a reload cannot apply (rare: loading new dynamic modules, upgrading the nginx binary).


Gotcha: The "if" Directive Is Evil

if inside a location block creates an implicit nested location, and directives from the parent may not apply:

# BROKEN: may cause unexpected behavior
location / {
    set $redirect 0;
    if ($http_x_forwarded_proto != "https") {
        set $redirect 1;
    }
    if ($redirect = 1) {
        return 301 https://$host$request_uri;
    }
    proxy_pass http://backend;  # interacts unpredictably with the ifs above
}

# CORRECT: use map + return
map $http_x_forwarded_proto $redirect_to_https {
    default 0;
    "http"  1;
}

server {
    if ($redirect_to_https) {
        return 301 https://$host$request_uri;
    }
    # ...
}

Safe uses of if: in server context generally, and inside a location only with return or rewrite ... last (the official "If is Evil" guidance). For anything else, use map or try_files.


Gotcha: Location Matching Order Surprises

# Quiz: which location handles /static/image.jpg?

location / {                     # prefix (lowest priority)
    proxy_pass http://backend;
}

location /static/ {              # prefix
    root /var/www;
}

location ~ \.(jpg|png)$ {       # regex
    expires 30d;
}

Answer: the regex ~ \.(jpg|png)$ wins, because regex locations beat prefix locations (unless the prefix uses ^~).

# Fix: use ^~ to force prefix match over regex
location ^~ /static/ {
    root /var/www;
}

Gotcha: proxy_pass Trailing Slash

# Scenario 1: NO trailing slash
location /app/ {
    proxy_pass http://backend;
}
# /app/page -> backend receives: /app/page

# Scenario 2: WITH trailing slash
location /app/ {
    proxy_pass http://backend/;
}
# /app/page -> backend receives: /page  (path stripped!)

# Scenario 3: With a different path
location /app/ {
    proxy_pass http://backend/v2/;
}
# /app/page -> backend receives: /v2/page

A single / completely changes routing behavior. This is the source of countless misrouted requests.

War story: A team spent two days debugging why their API returned HTML error pages. The root cause: proxy_pass http://backend/ (trailing slash) stripped the /api/v2/ prefix, so the backend received bare paths like /users instead of /api/v2/users and returned its default 404 page. One character.


Pattern: Log Analysis

# Top 20 URLs by request count
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Response codes per hour
awk '{split($4, t, ":"); print t[1] ":" t[2], $9}' /var/log/nginx/access.log | sort | uniq -c

# Slow requests (assumes $request_time is the LAST field in your log format;
# adjust $NF if it is not)
awk '$NF > 2.0 {print $NF, $7}' /var/log/nginx/access.log | sort -rn | head -20

# 5xx errors with full details
awk '$9 ~ /^5/' /var/log/nginx/access.log | tail -50

# Requests per second (rough; access-log timestamps arrive in order)
awk '{print $4}' /var/log/nginx/access.log | uniq -c | sort -rn | head -10

# Cache hit ratio (if X-Cache-Status header is logged)
awk '{print $NF}' /var/log/nginx/access.log | sort | uniq -c

Custom Log Format for Better Analysis

log_format detailed '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time $upstream_response_time '
                    '$upstream_cache_status';

access_log /var/log/nginx/access.log detailed;
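With the detailed format above, $request_time lands third from the end, which makes rough percentiles a one-liner. A sketch; swap the $(NF-2) index if your format differs:

```shell
# Rough p95 of request time: sort numerically, pick the 95th-percentile row.
awk '{print $(NF-2)}' /var/log/nginx/access.log | sort -n |
awk '{a[NR] = $1} END {i = int(NR * 0.95); if (i < 1) i = 1; print "p95:", a[i]}'
```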

Gotcha: Buffer Size Misconfigs

# Default proxy buffers are often too small for apps that set large headers
# (e.g., big cookies, long JWT tokens)

# Symptoms: 502 errors with "upstream sent too big header" in error log
# This is one of the most common Nginx 502 causes in apps that use OAuth/JWT

# Fix:
proxy_buffer_size 16k;           # for response headers
proxy_buffers 4 32k;             # for response body
proxy_busy_buffers_size 64k;     # max size for busy buffers

# For large client request headers:
large_client_header_buffers 4 16k;

# For large client request bodies (file uploads):
client_max_body_size 100m;

Pattern: Rate Limiting in Practice

# Define zone in http context
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/s;

# Apply in location
location /login {
    limit_req zone=login burst=10 nodelay;
    limit_req_status 429;
    proxy_pass http://backend;
}

# Test rate limiting
for i in $(seq 1 20); do
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost/login
done
# Expect: the first requests return 200 (limit + burst), the rest 429

Pattern: Quick SSL Setup with Let's Encrypt

# Install certbot
apt install certbot python3-certbot-nginx   # Debian/Ubuntu
yum install certbot python3-certbot-nginx   # RHEL/CentOS (requires EPEL)

# Obtain and auto-configure SSL
certbot --nginx -d example.com -d www.example.com

# Test auto-renewal
certbot renew --dry-run

# Manual renewal
certbot renew

# Check certificate expiry
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates
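For monitoring, openssl's -checkend flag turns the expiry check into an exit code (2592000 seconds = 30 days; example.com is a placeholder):

```shell
# Exit 0 if the cert is still valid 30 days from now, nonzero otherwise.
echo | openssl s_client -connect example.com:443 2>/dev/null |
openssl x509 -noout -checkend 2592000 && echo "cert OK" || echo "cert expires within 30 days"
```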

Gotcha: add_header Does Not Inherit

server {
    add_header X-Frame-Options "SAMEORIGIN";

    location /api/ {
        add_header X-Api-Version "v2";
        # X-Frame-Options is NOT set here!
        # add_header in a child block replaces ALL parent add_header directives
        proxy_pass http://backend;
    }
}

If you use add_header in a location block, all add_header directives from parent contexts are dropped. You must repeat them.

# Fix: repeat all headers, or use the headers-more module
location /api/ {
    add_header X-Frame-Options "SAMEORIGIN";
    add_header X-Api-Version "v2";
    proxy_pass http://backend;
}