API Gateways & Ingress - Street-Level Ops¶
Quick Diagnosis Commands¶
# ── Ingress Controller Status ──
kubectl get pods -n ingress-nginx # Nginx ingress pods
kubectl get pods -n traefik # Traefik pods
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=50
# ── Ingress Resources ──
kubectl get ingress -A # All ingress resources
kubectl describe ingress myapp-ingress -n production # Full details + events
kubectl get ingress myapp-ingress -n production -o yaml # Raw YAML
# ── TLS/Certs ──
kubectl get certificates -A # cert-manager certs
kubectl describe certificate api-tls -n production # Cert status
kubectl get secret api-tls -n production -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -noout -dates # Cert expiry
# ── Backend Health ──
kubectl get endpoints myapp-service -n production # Backend IPs
kubectl get svc myapp-service -n production # Service details
# ── External Testing ──
curl -v https://api.example.com/health # Full connection details
curl -s -o /dev/null -w "%{http_code}" https://api.example.com/ # Just status code
curl -H "Host: api.example.com" http://INGRESS_IP/health # Test specific host header
# ── Ingress Controller Config ──
# Nginx: dump generated nginx.conf
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
cat /etc/nginx/nginx.conf | less
# Traefik: check routers
kubectl exec -n traefik deploy/traefik -- \
wget -qO- http://localhost:8080/api/http/routers | python3 -m json.tool
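The cert-expiry pipeline above can be wrapped in a small helper that prints days remaining (a sketch; requires GNU `date`, and the secret/namespace names in the usage comment are the examples from above):

```shell
#!/bin/sh
# days_left: read an openssl "notAfter=..." line on stdin and print the
# number of whole days until that certificate expires (GNU date required).
days_left() {
  notafter=$(sed -n 's/^notAfter=//p')
  expiry=$(date -d "$notafter" +%s)
  now=$(date +%s)
  echo $(( (expiry - now) / 86400 ))
}

# Usage (names from the examples above):
# kubectl get secret api-tls -n production -o jsonpath='{.data.tls\.crt}' \
#   | base64 -d | openssl x509 -noout -enddate | days_left
```

Handy in a cron job that alerts when the number drops below, say, 14.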
Gotcha: 502 Bad Gateway After Deploy¶
You deploy a new version and get 502s. The ingress controller can't reach the backend.
# Step 1: Check if endpoints exist
kubectl get endpoints myapp-service -n production
# If ENDPOINTS is <none>, the service selector doesn't match any running pods
# Step 2: Check if pods are ready
kubectl get pods -n production -l app=myapp
# If pods are in CrashLoopBackOff or not Ready, the endpoint won't register
# Step 3: Check if the port matches
kubectl get svc myapp-service -n production -o yaml | grep -A5 ports
kubectl get pods -n production -l app=myapp -o yaml | grep -A5 containerPort
# The service targetPort must match the container's listening port
# Step 4: Check ingress controller logs
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=100 | \
grep -i "upstream\|502\|error"
# Step 5: Test from inside the cluster
kubectl run debug --rm -it --image=curlimages/curl -- \
curl -v http://myapp-service.production.svc:8080/health
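Step 1 catches most 502s; a tiny helper makes it scriptable (a sketch against the default `kubectl get endpoints` table output):

```shell
#!/bin/sh
# has_endpoints: read `kubectl get endpoints` table output on stdin and
# fail if the ENDPOINTS column is "<none>" (selector matches no ready pod).
has_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { exit 1 }'
}

# Usage (names from the examples above):
# kubectl get endpoints myapp-service -n production | has_endpoints \
#   || echo "no endpoints -- check the selector and pod readiness"
```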
Gotcha: Annotation Typos Fail Silently¶
You add nginx.ingress.kubernetes.io/proxy-body-siz: "50m" (missing the 'e'). No error, no warning. The annotation is simply ignored and the default body size (1m) applies. Large file uploads fail.
Debug clue: After applying an ingress annotation change, always diff the generated nginx.conf inside the controller pod. If the annotation had no effect, it either has a typo or conflicts with another annotation. The generated config is the only source of truth for what nginx is actually doing.
# Check for unrecognized annotations
kubectl get ingress myapp-ingress -n production -o yaml | grep "annotations" -A 50
# Compare against known annotations
# https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
# Validate by checking the generated config
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
grep -A10 "server_name api.example.com" /etc/nginx/nginx.conf
# If your annotation isn't reflected in the config, it's misspelled or invalid
# Prevention: use a CI linter or admission webhook that validates annotations
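One way to catch typos before they ship is a crude allow-list check (a sketch; the `known` list below is a small subset of the real annotation set — extend it from the official docs page linked above):

```shell
#!/bin/sh
# check_annotations: read ingress YAML on stdin and flag any
# nginx.ingress.kubernetes.io/* key that is not in the allow-list.
check_annotations() {
  # Partial list -- extend from the official annotations documentation.
  known=" proxy-body-size proxy-read-timeout proxy-send-timeout rewrite-target ssl-redirect canary canary-weight canary-by-header canary-by-header-value "
  grep -o 'nginx\.ingress\.kubernetes\.io/[a-z-]*' | sort -u | while read -r key; do
    suffix=${key#nginx.ingress.kubernetes.io/}
    case "$known" in
      *" $suffix "*) ;;
      *) echo "UNRECOGNIZED: $key" ;;
    esac
  done
}

# Usage:
# kubectl get ingress myapp-ingress -n production -o yaml | check_annotations
```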
Pattern: Debugging Ingress Routing Step by Step¶
# 1. Verify DNS resolves to the ingress controller's external IP
dig +short api.example.com
kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# These should match
# 2. Verify the ingress resource exists and has the correct rules
kubectl describe ingress myapp-ingress -n production
# Check: host, path, backend service name, port
# 3. Verify the service exists and has endpoints
kubectl get svc myapp-service -n production
kubectl get endpoints myapp-service -n production
# Endpoints should list pod IPs
# 4. Verify the backend is responding
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
curl -s http://POD_IP:PORT/health
# Replace POD_IP and PORT with values from endpoints
# 5. Check ingress controller logs for the specific request
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=200 | \
grep "api.example.com"
# 6. Test with verbose curl
curl -vvv -H "Host: api.example.com" https://INGRESS_IP/v1/endpoint
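The six checks reduce to a rough decision tree. Sketched as a helper you feed the values just gathered (the branching logic is illustrative, not exhaustive):

```shell
#!/bin/sh
# where_broken DNS_IP LB_IP ENDPOINTS -- rough triage of which layer to
# inspect next, using values gathered in steps 1-3 above.
where_broken() {
  dns=$1 lb=$2 eps=$3
  if [ "$dns" != "$lb" ]; then
    echo "dns: $dns does not point at the ingress LB ($lb)"
  elif [ -z "$eps" ] || [ "$eps" = "<none>" ]; then
    echo "service: no endpoints -- check the selector and pod readiness"
  else
    echo "ingress: DNS and endpoints look fine -- check rules and controller logs"
  fi
}

# Usage:
# where_broken "$(dig +short api.example.com)" \
#   "$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
#        -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" \
#   "$(kubectl get endpoints myapp-service -n production \
#        -o jsonpath='{.subsets[*].addresses[*].ip}')"
```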
Pattern: cert-manager Troubleshooting¶
Certificates not issuing? Follow the chain:
# 1. Check Certificate resource
kubectl get certificate -n production
# READY should be True
# 2. If not ready, check the CertificateRequest
kubectl get certificaterequest -n production
kubectl describe certificaterequest <name> -n production
# 3. Check the Order (ACME)
kubectl get orders -n production
kubectl describe order <name> -n production
# 4. Check the Challenge (HTTP-01 or DNS-01)
kubectl get challenges -n production
kubectl describe challenge <name> -n production
# Common issues:
# - HTTP-01: ingress controller can't route /.well-known/acme-challenge/
# - DNS-01: DNS provider credentials incorrect
# 5. Check cert-manager logs
kubectl logs -n cert-manager deploy/cert-manager --tail=100 | grep -i error
# 6. Force renewal
kubectl delete secret api-tls -n production
# cert-manager will re-issue the certificate
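For step 1, a helper that surfaces only the broken certificates (a sketch against the default table output, where READY is the second column):

```shell
#!/bin/sh
# certs_not_ready: read `kubectl get certificate` table output on stdin
# and print the names whose READY column is not "True".
certs_not_ready() {
  awk 'NR > 1 && $2 != "True" { print $1 }'
}

# Usage:
# kubectl get certificate -n production | certs_not_ready
```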
Pattern: Canary Deployments via Ingress Annotations¶
# Primary ingress (stable)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-stable
            port:
              number: 8080
---
# Canary ingress (new version, 10% traffic)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 8080
# Gradually increase canary weight
kubectl annotate ingress myapp-canary -n production \
nginx.ingress.kubernetes.io/canary-weight="25" --overwrite
# Route specific header to canary (for testing)
# nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
# nginx.ingress.kubernetes.io/canary-by-header-value: "true"
# Then: curl -H "X-Canary: true" https://api.example.com/
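Ramping the weight by hand gets tedious. A sketch of an automated ramp — the doubling schedule, the sleep interval, and the idea of watching dashboards between steps are all assumptions to adapt:

```shell
#!/bin/sh
# ramp_weights START: print a doubling canary weight schedule, capped at 100.
ramp_weights() {
  w=$1
  while [ "$w" -lt 100 ]; do
    echo "$w"
    w=$((w * 2))
  done
  echo 100
}

# Usage: apply each weight, then watch error rates before the next step.
# ramp_weights 10 | while read -r w; do
#   kubectl annotate ingress myapp-canary -n production \
#     nginx.ingress.kubernetes.io/canary-weight="$w" --overwrite
#   sleep 300   # watch dashboards/alerts here; abort on regressions
# done
```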
Gotcha: Default Backend Confusion¶
Requests that don't match any ingress rule go to the default backend. If you haven't configured one, users see a generic 404 page from nginx.
# Check what the default backend is
kubectl get deploy -n ingress-nginx ingress-nginx-controller -o yaml | \
grep -A3 default-backend
# Create a custom default backend
kubectl create deployment default-backend -n ingress-nginx \
--image=your-custom-404-image:latest
kubectl expose deployment default-backend -n ingress-nginx --port=8080
# Configure ingress controller to use it
# In helm values:
# defaultBackend:
#   enabled: true
#   image:
#     repository: your-custom-404-image
Pattern: Connection Draining During Deploys¶
Prevent 502s during rolling updates:
# In your deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: myapp
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
# The sleep gives the ingress controller time to update
# its backend list before the pod starts terminating
---
# In your ingress annotations (nginx)
metadata:
  annotations:
    nginx.ingress.kubernetes.io/upstream-hash-by: ""
# An empty value turns off consistent hashing, so requests rebalance
# across all ready pods during the update (only relevant if you set
# upstream-hash-by or cookie affinity in the first place)
# Verify smooth rollout
kubectl rollout status deployment/myapp -n production
# Watch for 502s during the rollout
while true; do
curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/health
sleep 0.5
done
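To turn the watch loop into a number you can compare across rollouts, pipe its output through a small tally (a sketch):

```shell
#!/bin/sh
# count_errors: read one HTTP status code per line (as the curl loop
# above prints them) and summarize the non-2xx responses.
count_errors() {
  awk '$1 !~ /^2/ { bad++ } { total++ } END { printf "%d/%d non-2xx\n", bad, total }'
}

# Usage: run the curl loop for the duration of the rollout, then
# pipe its collected output through count_errors.
```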
Gotcha: TLS Passthrough Misconfiguration¶
You configure TLS passthrough but the backend doesn't actually handle TLS. Or you enable passthrough on a port that's also doing TLS termination.
# TLS passthrough: the ingress controller does NOT decrypt
# The backend MUST handle TLS itself
# You CANNOT inspect or route based on HTTP path (it's encrypted)
# Verify passthrough is working
openssl s_client -connect api.example.com:443 -servername api.example.com
# The certificate should be from the BACKEND, not the ingress controller
# If you see the ingress controller's cert, passthrough isn't active
# Check the annotation:
kubectl get ingress -n production -o yaml | grep ssl-passthrough
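A concrete way to verify passthrough: compare the certificate fingerprint seen through the ingress with the one the backend serves directly (a sketch; hostnames, namespace, and ports are the examples from above):

```shell
#!/bin/sh
# cert_fingerprint: read a PEM certificate on stdin and print its SHA-256
# fingerprint. With passthrough active, the fingerprint seen through the
# ingress must equal the backend's own certificate.
cert_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Via the ingress:
# openssl s_client -connect api.example.com:443 -servername api.example.com \
#   </dev/null 2>/dev/null | openssl x509 | cert_fingerprint
# Directly from the backend (port-forward first; POD_NAME is a placeholder):
# kubectl port-forward -n production pod/POD_NAME 8443:8443 &
# openssl s_client -connect localhost:8443 </dev/null 2>/dev/null \
#   | openssl x509 | cert_fingerprint
```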
Pattern: Custom Error Pages¶
# nginx-ingress: custom error pages
metadata:
  annotations:
    nginx.ingress.kubernetes.io/custom-http-errors: "404,502,503"
    nginx.ingress.kubernetes.io/default-backend: custom-error-pages
# Create a simple error page service
# The service receives the error code in the X-Code header
# and the original URI in the X-Original-URI header
# Test custom error pages
curl -s -o /dev/null -w "%{http_code}" https://api.example.com/nonexistent-path
# Should get your custom 404, not nginx's default
Gotcha: Rate Limit Scope Mistakes¶
Rate limiting per-IP sounds right until all your traffic comes from the same source IP (a corporate proxy, a CDN, or a NAT gateway). Every user behind that IP shares the limit.
# Check what IP the ingress sees
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=10 | \
awk '{print $1}'
# If all requests show the same IP, you're limiting the proxy, not the users
# Fix: Use X-Forwarded-For header for the real client IP
# In ConfigMap:
# use-forwarded-headers: "true"
# compute-full-forwarded-for: "true"
# Set trusted proxy CIDR:
# proxy-real-ip-cidr: "10.0.0.0/8,172.16.0.0/12"
Under the hood: When running behind a cloud load balancer, the ingress controller sees the LB's IP as the client.
use-forwarded-headers tells nginx to trust X-Forwarded-For from upstream, but only from IPs in proxy-real-ip-cidr. Setting this CIDR too wide lets attackers spoof their source IP by injecting a fake X-Forwarded-For header.
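To see at a glance whether one IP dominates (i.e. you're rate-limiting a proxy), tally client IPs from the access log (a sketch assuming the default log format with the client IP in field 1):

```shell
#!/bin/sh
# uniq_clients: read access-log lines on stdin and print a request count
# per client IP (field 1), busiest first. One IP dominating usually means
# you are seeing a proxy/LB address, not real clients.
uniq_clients() {
  awk '{ seen[$1]++ } END { for (ip in seen) printf "%7d %s\n", seen[ip], ip }' | sort -rn
}

# Usage:
# kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=1000 \
#   | uniq_clients
```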