API Gateways & Ingress Footguns

  1. Annotation typos that fail silently. You type nginx.ingress.kubernetes.io/proxy-body-siz: "50m" instead of proxy-body-size. No error, no warning, no event. The annotation is ignored and the default 1MB limit applies. Users report file uploads fail. You spend an hour checking application code before realizing the ingress annotation is misspelled.

Fix: Verify annotations took effect by checking the generated proxy config: kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- grep "client_max_body_size" /etc/nginx/nginx.conf. Add annotation validation to your CI pipeline. Consider OPA Gatekeeper to reject unknown annotations.
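For reference, a minimal Ingress with the annotation spelled correctly (resource and service names are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: upload-api                # assumed name
  annotations:
    # Correct key — compare with the typo'd proxy-body-siz above
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  ingressClassName: nginx
  rules:
  - host: upload.example.com      # assumed host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: upload-svc      # assumed service
            port:
              number: 80
```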

  1. Default backend returning generic nginx 404 to users. Requests that don't match any ingress rule fall through to the default backend. Without a custom one, users see a bare nginx "404 Not Found" page. This looks unprofessional and leaks that you're running nginx. Health checkers and scanners hitting random paths generate noise alerts.

Fix: Deploy a custom default backend that returns branded error pages and proper JSON responses for API requests. Configure it via the ingress controller's default-backend setting. Monitor default backend hit rate — a spike indicates misconfigured routes.
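If you install ingress-nginx via Helm, the chart can deploy the default backend for you; a sketch of the relevant values (the image coordinates are assumptions pointing at your own error-page image):

```yaml
# values.yaml fragment for the ingress-nginx Helm chart
defaultBackend:
  enabled: true
  image:
    registry: registry.example.com   # assumed: your registry
    image: branded-error-pages       # assumed: serves branded HTML + JSON errors
    tag: "1.0.0"
```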

  1. TLS passthrough when you actually need termination (or vice versa). You enable the nginx.ingress.kubernetes.io/ssl-passthrough: "true" annotation (which also requires the controller's --enable-ssl-passthrough flag) because you want end-to-end encryption. But now the ingress controller can't read HTTP headers, so path-based routing stops working. All requests go to the default backend.

Fix: Use TLS termination (the default) for most cases. If you need backend encryption, use re-encryption (terminate at ingress, new TLS to backend) rather than passthrough. Only use passthrough when the backend must negotiate its own TLS. Document which mode each ingress uses.
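With ingress-nginx, re-encryption is a single annotation — terminate client TLS at the ingress, then open a new TLS connection to the backend:

```yaml
metadata:
  annotations:
    # Ingress terminates client TLS, then speaks HTTPS to the pod.
    # Headers remain readable, so path-based routing keeps working.
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
```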

  1. Rate limits scoped to the wrong identifier. You set rate limiting per source IP. But all traffic arrives through a CDN, corporate proxy, or NAT gateway — so every user behind that IP shares the same rate limit. One heavy user exhausts the quota for thousands of others.

Fix: Configure the ingress controller to use X-Forwarded-For for the real client IP (use-forwarded-headers: true). Set proxy-real-ip-cidr to trust only your CDN/proxy IPs. For API traffic, consider rate limiting by API key or JWT claim using a gateway plugin.
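For ingress-nginx these are controller ConfigMap keys rather than per-Ingress annotations; a sketch (the CIDR is an assumption — substitute your CDN/proxy egress range):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Trust X-Forwarded-For from upstream proxies
  use-forwarded-headers: "true"
  # Only trust forwarded headers from these source IPs
  proxy-real-ip-cidr: "203.0.113.0/24"   # assumed: your CDN/proxy range
```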

  1. No connection draining during deploys — intermittent 502s. You do a rolling deployment. The ingress controller has the old pod in its backend list. Traffic is sent to the terminating pod. The pod closes the connection. Users get 502 Bad Gateway errors for 5-30 seconds during every deploy.

Fix: Add a preStop lifecycle hook with a sleep (10-15 seconds) to your pod spec. This gives the ingress controller time to update its endpoint list before the pod actually terminates. Set terminationGracePeriodSeconds higher than the sleep. Test by hitting the endpoint in a loop during deploys and counting 5xx responses.
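A pod spec fragment illustrating the pattern (image name assumed; note the grace period exceeds the sleep):

```yaml
spec:
  terminationGracePeriodSeconds: 30      # must exceed the preStop sleep
  containers:
  - name: app
    image: registry.example.com/app:1.0  # assumed image
    lifecycle:
      preStop:
        exec:
          # Keep serving while the controller drops this endpoint
          command: ["sleep", "15"]
```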

  1. Ingress controller as a single point of failure. You run one ingress controller replica. It crashes, gets OOM-killed, or is evicted during a node drain. All external traffic to all services drops to zero until the pod is rescheduled and ready.

Fix: Run at least 2-3 ingress controller replicas with a PodDisruptionBudget (minAvailable: 1). Spread across nodes with anti-affinity rules. Use a HorizontalPodAutoscaler to scale under load. Monitor ingress controller pod health as a top-priority alert.
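A PodDisruptionBudget sketch for ingress-nginx (label selector matches the standard chart labels; adjust if yours differ):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  # Node drains must leave at least one controller serving traffic
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
```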

  1. Ignoring ingress controller resource limits. The ingress controller runs with default resource requests. Traffic grows. The controller starts consuming 2GB of RAM for worker processes and connection tracking. It gets OOM-killed. Every service behind it goes down simultaneously.

Fix: Set explicit resource requests and limits on the ingress controller based on measured usage. Monitor its CPU and memory. The ingress controller is one of the most critical pods in your cluster — size it generously and autoscale it. A starved ingress controller is a cluster-wide outage.
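A starting point for the controller container, assuming the numbers come from your own measurements rather than these placeholders:

```yaml
resources:
  requests:
    cpu: 500m          # assumed: size from observed steady-state usage
    memory: 1Gi
  limits:
    memory: 2Gi        # headroom above the observed peak
    # No CPU limit: throttling a latency-sensitive proxy hurts more than it helps
```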

  1. cert-manager HTTP-01 challenges blocked by the ingress itself. cert-manager needs to serve a challenge token at /.well-known/acme-challenge/. But your ingress has a catch-all rule, an auth requirement, or a redirect that prevents Let's Encrypt from reaching the challenge endpoint. Certificate issuance fails silently and the old cert expires.

Fix: Ensure cert-manager's solver ingress can serve the challenge path without auth, rate limiting, or redirects. Check kubectl get challenges and kubectl describe challenge when certs don't issue. Set up alerts for certificates expiring within 14 days.
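A ClusterIssuer sketch that pins the HTTP-01 solver to a specific ingress class so the challenge ingress is handled predictably (email and names are assumptions; `ingressClassName` in the solver requires a reasonably recent cert-manager):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com            # assumed contact
    privateKeySecretRef:
      name: letsencrypt-prod-key      # assumed secret name
    solvers:
    - http01:
        ingress:
          # Solver ingress is created under this class; keep it free of
          # auth, rate limits, and redirects on /.well-known/acme-challenge/
          ingressClassName: nginx
```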

  1. Mixing ingress controllers without specifying ingressClassName. You have both nginx and traefik installed. An ingress resource without ingressClassName gets picked up by both controllers — or neither, depending on default class configuration. Routing is unpredictable and debugging is a nightmare.

Fix: Always specify ingressClassName in every Ingress resource. Set one controller as the default using the ingressclass.kubernetes.io/is-default-class: "true" annotation on its IngressClass. Audit existing ingress resources to ensure they specify a class.
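The two pieces together — an IngressClass marked as the cluster default, and an Ingress that names its class explicitly anyway:

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
  annotations:
    # Ingresses without ingressClassName fall back to this class
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: k8s.io/ingress-nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api                     # assumed name
spec:
  ingressClassName: nginx       # always explicit, even with a default set
  rules:
  - host: api.example.com       # assumed host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-svc       # assumed service
            port:
              number: 80
```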

  1. No monitoring on the ingress layer. Your application has great monitoring. But you have zero visibility into the ingress controller: no request rate, no latency, no error rate, no connection count. When users report slowness, you can't tell if it's the ingress or the backend.

Fix: Enable Prometheus metrics on the ingress controller (most support it natively). Create dashboards for: request rate by host/path, error rate (4xx/5xx) by backend, request latency (p50/p95/p99), active connections, and TLS certificate expiry. The ingress controller is the front door to your cluster — monitor it accordingly.
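If you run the Prometheus Operator, a ServiceMonitor sketch for ingress-nginx (assumes the chart was installed with metrics enabled, e.g. controller.metrics.enabled=true, which exposes a port named "metrics"):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  endpoints:
  - port: metrics        # port name from the ingress-nginx chart
    interval: 30s
```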