
API Gateways & Ingress - Street-Level Ops

Quick Diagnosis Commands

# ── Ingress Controller Status ──
kubectl get pods -n ingress-nginx                    # Nginx ingress pods
kubectl get pods -n traefik                          # Traefik pods
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=50

# ── Ingress Resources ──
kubectl get ingress -A                               # All ingress resources
kubectl describe ingress myapp-ingress -n production # Full details + events
kubectl get ingress myapp-ingress -n production -o yaml  # Raw YAML

# ── TLS/Certs ──
kubectl get certificates -A                          # cert-manager certs
kubectl describe certificate api-tls -n production   # Cert status
kubectl get secret api-tls -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates            # Cert expiry
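The expiry dates above can be turned into a "days remaining" number, which is handier for alerting scripts. A sketch assuming GNU date; the demo cert below is a throwaway self-signed one so the arithmetic is visible without cluster access:

```shell
# Sketch: days until a PEM cert (on stdin) expires. Assumes GNU date.
days_left() {
  local end
  end=$(openssl x509 -noout -enddate | cut -d= -f2)  # e.g. "Jan  1 12:00:00 2026 GMT"
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Real use:
#   kubectl get secret api-tls -n production -o jsonpath='{.data.tls\.crt}' \
#     | base64 -d | days_left
key=$(mktemp); crt=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$key" -out "$crt" \
  -days 30 -subj "/CN=demo" 2>/dev/null
days_left < "$crt"    # roughly 29-30 for a 30-day cert
rm -f "$key" "$crt"
```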

# ── Backend Health ──
kubectl get endpoints myapp-service -n production    # Backend IPs
kubectl get svc myapp-service -n production          # Service details

# ── External Testing ──
curl -v https://api.example.com/health               # Full connection details
curl -s -o /dev/null -w "%{http_code}" https://api.example.com/  # Just status code
curl -H "Host: api.example.com" http://INGRESS_IP/health  # Test specific host header

# ── Ingress Controller Config ──
# Nginx: dump generated nginx.conf
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  cat /etc/nginx/nginx.conf | less

# Traefik: check routers
kubectl exec -n traefik deploy/traefik -- \
  wget -qO- http://localhost:8080/api/http/routers | python3 -m json.tool

Gotcha: 502 Bad Gateway After Deploy

You deploy a new version and get 502s. The ingress controller can't reach the backend.

# Step 1: Check if endpoints exist
kubectl get endpoints myapp-service -n production
# If ENDPOINTS is <none>, the service selector doesn't match any running pods

# Step 2: Check if pods are ready
kubectl get pods -n production -l app=myapp
# If pods are in CrashLoopBackOff or not Ready, the endpoint won't register

# Step 3: Check if the port matches
kubectl get svc myapp-service -n production -o yaml | grep -A5 ports
kubectl get pods -n production -l app=myapp -o yaml | grep -A5 containerPort
# The service targetPort must match the container's listening port

# Step 4: Check ingress controller logs
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=100 | \
  grep -i "upstream\|502\|error"

# Step 5: Test from inside the cluster
kubectl run debug --rm -it --image=curlimages/curl -- \
  curl -v http://myapp-service.production.svc:8080/health
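The step-1 check can be wrapped in a small helper for scripts. The function name and sample lines below are illustrative, not part of kubectl:

```shell
# Hypothetical helper: classify one line of
#   kubectl get endpoints <svc> -n <ns> --no-headers
# An ENDPOINTS column of <none> means the selector matches no ready pods.
check_endpoints() {
  if printf '%s\n' "$1" | grep -q '<none>'; then
    echo "EMPTY: selector matches no ready pods (expect 502s)"
  else
    echo "OK: backends registered"
  fi
}

check_endpoints "myapp-service   <none>                         5m"
# → EMPTY: selector matches no ready pods (expect 502s)
check_endpoints "myapp-service   10.1.2.3:8080,10.1.2.4:8080   5m"
# → OK: backends registered
```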

Gotcha: Annotation Typos Fail Silently

You add nginx.ingress.kubernetes.io/proxy-body-siz: "50m" (missing the 'e'). No error, no warning. The annotation is simply ignored and the default body size (1m) applies. Large file uploads fail.

Debug clue: After applying an ingress annotation change, always diff the generated nginx.conf inside the controller pod. If the annotation had no effect, it either has a typo or conflicts with another annotation. The generated config is the only source of truth for what nginx is actually doing.

# Check for unrecognized annotations
kubectl get ingress myapp-ingress -n production -o yaml | grep "annotations" -A 50

# Compare against known annotations
# https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/

# Validate by checking the generated config
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  grep -A10 "server_name api.example.com" /etc/nginx/nginx.conf
# If your annotation isn't reflected in the config, it's misspelled or invalid
# Prevention: use a CI linter or admission webhook that validates annotations
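One way to act on the "diff the generated nginx.conf" advice: dump the config before and after applying the change, then diff the two dumps. The stand-in files below make the diff step visible without a cluster; in a real cluster the dumps come from the kubectl exec shown above:

```shell
# In-cluster workflow (dump, change, dump, diff):
#   kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
#     cat /etc/nginx/nginx.conf > /tmp/nginx.before
#   kubectl apply -f ingress.yaml        # apply the annotation change
#   kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
#     cat /etc/nginx/nginx.conf > /tmp/nginx.after
#   diff /tmp/nginx.before /tmp/nginx.after
# Stand-in demo: a working proxy-body-size annotation changes the
# client_max_body_size directive; no diff at all means it was ignored.
before=$(mktemp); after=$(mktemp)
printf 'client_max_body_size 1m;\n'  > "$before"
printf 'client_max_body_size 50m;\n' > "$after"
diff "$before" "$after" || true
rm -f "$before" "$after"
```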

Pattern: Debugging Ingress Routing Step by Step

# 1. Verify DNS resolves to the ingress controller's external IP
dig +short api.example.com
kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# These should match (on clouds where the LB gets a hostname instead of an
# IP, e.g. AWS ELB, use .ingress[0].hostname and compare the CNAME chain)

# 2. Verify the ingress resource exists and has the correct rules
kubectl describe ingress myapp-ingress -n production
# Check: host, path, backend service name, port

# 3. Verify the service exists and has endpoints
kubectl get svc myapp-service -n production
kubectl get endpoints myapp-service -n production
# Endpoints should list pod IPs

# 4. Verify the backend is responding
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  curl -s http://POD_IP:PORT/health
# Replace POD_IP and PORT with values from endpoints

# 5. Check ingress controller logs for the specific request
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=200 | \
  grep "api.example.com"

# 6. Test with verbose curl
curl -vvv -H "Host: api.example.com" https://INGRESS_IP/v1/endpoint
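The step-1 comparison can be scripted so a DNS/LB mismatch fails loudly. A minimal sketch; the function name is illustrative:

```shell
# Hypothetical helper: compare the DNS answer with the LB IP reported
# on the ingress-nginx Service (step 1 above).
same_ip() {
  if [ -n "$1" ] && [ "$1" = "$2" ]; then
    echo "MATCH: $1"
  else
    echo "MISMATCH: DNS=$1 LB=$2"
  fi
}

# Real use:
#   same_ip "$(dig +short api.example.com | head -1)" \
#     "$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
#          -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
same_ip 203.0.113.10 203.0.113.10   # → MATCH: 203.0.113.10
same_ip 203.0.113.10 198.51.100.7   # → MISMATCH: DNS=203.0.113.10 LB=198.51.100.7
```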

Pattern: cert-manager Troubleshooting

Certificates not issuing? Follow the chain:

# 1. Check Certificate resource
kubectl get certificate -n production
# READY should be True

# 2. If not ready, check the CertificateRequest
kubectl get certificaterequest -n production
kubectl describe certificaterequest <name> -n production

# 3. Check the Order (ACME)
kubectl get orders -n production
kubectl describe order <name> -n production

# 4. Check the Challenge (HTTP-01 or DNS-01)
kubectl get challenges -n production
kubectl describe challenge <name> -n production
# Common issues:
# - HTTP-01: ingress controller can't route /.well-known/acme-challenge/
# - DNS-01: DNS provider credentials incorrect

# 5. Check cert-manager logs
kubectl logs -n cert-manager deploy/cert-manager --tail=100 | grep -i error

# 6. Force renewal
kubectl delete secret api-tls -n production
# cert-manager watches the secret and re-issues the certificate
# (with the cmctl CLI installed, `cmctl renew api-tls -n production` is cleaner)
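For a stuck HTTP-01 challenge (step 4), you can reproduce the exact fetch the ACME server performs. The token comes from `kubectl describe challenge`; the helper name below is illustrative:

```shell
# Hypothetical helper: build the URL the ACME server fetches for HTTP-01.
# cert-manager's solver pod must answer on this path through your ingress,
# on plain port 80.
challenge_url() {
  echo "http://$1/.well-known/acme-challenge/$2"
}

challenge_url api.example.com TOKEN_FROM_DESCRIBE
# → http://api.example.com/.well-known/acme-challenge/TOKEN_FROM_DESCRIBE
# Then: curl -i "$(challenge_url api.example.com TOKEN_FROM_DESCRIBE)"
# A 404 here means the ingress isn't routing the challenge path.
```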

Pattern: Canary Deployments via Ingress Annotations

# Primary ingress (stable)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable
  namespace: production
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-stable
                port:
                  number: 8080

---
# Canary ingress (new version, 10% traffic)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 8080
# Gradually increase canary weight
kubectl annotate ingress myapp-canary -n production \
  nginx.ingress.kubernetes.io/canary-weight="25" --overwrite

# Route specific header to canary (for testing)
# nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
# nginx.ingress.kubernetes.io/canary-by-header-value: "true"
# Then: curl -H "X-Canary: true" https://api.example.com/
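To confirm the weight split, sample the endpoint repeatedly and count which version answered. This assumes each version reports its name (e.g. at a /version path, which your app may not expose); the counting pipeline is demonstrated on stubbed responses:

```shell
# Count occurrences of each response body, most frequent first.
count_versions() {
  sort | uniq -c | sort -rn
}

# Real use (assumes each version returns its own name):
#   for i in $(seq 100); do curl -s https://api.example.com/version; echo; done \
#     | count_versions
# With canary-weight "10", expect roughly a 90/10 split over 100 samples.
printf 'stable\nstable\nstable\ncanary\n' | count_versions
```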

Gotcha: Default Backend Confusion

Requests that don't match any ingress rule go to the default backend. If you haven't configured one, users see a generic 404 page from nginx.

# Check what the default backend is
kubectl get deploy -n ingress-nginx ingress-nginx-controller -o yaml | \
  grep -A3 default-backend

# Create a custom default backend
kubectl create deployment default-backend -n ingress-nginx \
  --image=your-custom-404-image:latest
kubectl expose deployment default-backend -n ingress-nginx --port=8080

# Configure ingress controller to use it
# In helm values:
# defaultBackend:
#   enabled: true
#   image:
#     repository: your-custom-404-image

Pattern: Connection Draining During Deploys

Prevent 502s during rolling updates:

# In your deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: myapp
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
          # The sleep gives the ingress controller time to update
          # its backend list before the pod starts terminating

---
# In your ingress annotations (nginx): if you use cookie-based session
# affinity (nginx.ingress.kubernetes.io/affinity: "cookie"), it can pin
# clients to terminating pods during a rollout. Disable it by removing
# the annotation; setting it to an empty value does not turn it off.
# Verify smooth rollout
kubectl rollout status deployment/myapp -n production
# Watch for 502s during the rollout
while true; do
  curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/health
  sleep 0.5
done

Gotcha: TLS Passthrough Misconfiguration

You configure TLS passthrough but the backend doesn't actually handle TLS. Or you enable passthrough on a port that's also doing TLS termination.

# TLS passthrough: the ingress controller does NOT decrypt
# The backend MUST handle TLS itself
# You CANNOT inspect or route based on HTTP path (it's encrypted)

# Verify passthrough is working
openssl s_client -connect api.example.com:443 -servername api.example.com
# The certificate should be from the BACKEND, not the ingress controller

# If you see the ingress controller's cert, passthrough isn't active
# Check the annotation:
kubectl get ingress -n production -o yaml | grep ssl-passthrough
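The s_client check above can be reduced to a one-line subject comparison. A sketch, demonstrated on a throwaway self-signed cert; in the cluster the PEM comes from s_client:

```shell
# Print the subject of whichever certificate is actually served,
# so you can compare it with the backend's own cert.
cert_subject() {
  openssl x509 -noout -subject   # reads PEM on stdin
}

# Real use (passthrough active means this prints the BACKEND's subject):
#   echo | openssl s_client -connect api.example.com:443 \
#     -servername api.example.com 2>/dev/null \
#     | openssl x509 -outform pem | cert_subject
key=$(mktemp); crt=$(mktemp)
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$key" -out "$crt" \
  -days 1 -subj "/CN=backend.internal" 2>/dev/null
cert_subject < "$crt"    # subject line contains backend.internal
rm -f "$key" "$crt"
```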

Pattern: Custom Error Pages

# nginx-ingress: custom error pages
metadata:
  annotations:
    nginx.ingress.kubernetes.io/custom-http-errors: "404,502,503"
    nginx.ingress.kubernetes.io/default-backend: custom-error-pages
# Create a simple error page service
# The service receives the error code in the X-Code header
# and the original URI in the X-Original-URI header

# Test custom error pages
curl -s -o /dev/null -w "%{http_code}" https://api.example.com/nonexistent-path
# Should get your custom 404, not nginx's default

Gotcha: Rate Limit Scope Mistakes

Rate limiting per-IP sounds right until all your traffic comes from the same source IP (a corporate proxy, a CDN, or a NAT gateway). Every user behind that IP shares the limit.

# Check what IP the ingress sees
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=10 | \
  awk '{print $1}'
# If all requests show the same IP, you're limiting the proxy, not the users

# Fix: Use X-Forwarded-For header for the real client IP
# In ConfigMap:
# use-forwarded-headers: "true"
# compute-full-forwarded-for: "true"
# Set trusted proxy CIDR:
# proxy-real-ip-cidr: "10.0.0.0/8,172.16.0.0/12"

Under the hood: When running behind a cloud load balancer, the ingress controller sees the LB's IP as the client. use-forwarded-headers tells nginx to trust X-Forwarded-For from upstream, but only from IPs in proxy-real-ip-cidr. Setting this CIDR too wide lets attackers spoof their source IP by injecting a fake X-Forwarded-For header.
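The spoofing risk can be sketched with a deliberately naive rule. This is NOT nginx's exact algorithm (nginx walks the X-Forwarded-For list from the right, discarding entries inside proxy-real-ip-cidr), but it shows why a too-wide trusted CIDR lets a prepended fake entry win:

```shell
# Naive sketch: take the left-most X-Forwarded-For entry as the client.
real_client_ip() {
  # expects the header value, e.g. "203.0.113.9, 10.0.0.5"
  echo "$1" | cut -d, -f1 | tr -d ' '
}

real_client_ip "203.0.113.9, 10.0.0.5"           # → 203.0.113.9
real_client_ip "6.6.6.6, 203.0.113.9, 10.0.0.5"  # → 6.6.6.6 (spoofed entry wins)
```

With a correctly narrow proxy-real-ip-cidr, nginx stops at the first entry that is not a trusted proxy, so the attacker's prepended value is never reached.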