API Gateways: The Front Door to Your Microservices
- lesson
- api-gateways
- load-balancing
- reverse-proxies
- kubernetes-networking
- rate-limiting
- authentication
- observability
Topics: API gateways, load balancing, reverse proxies, Kubernetes networking, rate limiting, authentication, observability
Level: L1–L2 (Foundations to Operations)
Time: 75–90 minutes
Prerequisites: None (everything is explained from scratch)
The Mission¶
You're building a platform with twelve microservices. The mobile app, the web frontend, and three partner integrations all need to talk to them. Right now every service handles its own authentication, its own rate limiting, its own CORS headers. Some services check JWTs. Some use API keys. One still has HTTP basic auth because "we'll fix it later."
A partner integration is hammering your payment service with retries. Your auth service went down for 90 seconds last week and took six other services with it because each one validates tokens independently. Deploying a new version of the catalog service caused a blip of 502 errors because there's no canary mechanism.
Your team lead says: "We need a gateway." But what does that actually mean? And how is it different from the load balancer you already have?
Gateway vs. Load Balancer vs. Reverse Proxy¶
These three terms get used interchangeably, which causes confusion. They're related but different.
┌─────────────────────────────────────────────────┐
│ What each layer cares about │
├──────────────────┬──────────────────────────────┤
│ Reverse Proxy │ Sits in front of backends. │
│ │ Clients talk to the proxy, │
│ │ proxy talks to backends. │
│ │ Hides backend topology. │
├──────────────────┼──────────────────────────────┤
│ Load Balancer │ Distributes traffic across │
│ │ multiple backends. Cares │
│ │ about health checks and │
│ │ even distribution. │
├──────────────────┼──────────────────────────────┤
│ API Gateway │ Reverse proxy + business │
│ │ policies: auth, rate │
│ │ limits, transforms, canary │
│ │ routing, observability. │
└──────────────────┴──────────────────────────────┘
Every API gateway is a reverse proxy. Most API gateways include load balancing. Not every reverse proxy is an API gateway — nginx serving static files and forwarding to a backend is a reverse proxy, but it's not managing API keys or doing request transformation.
Mental Model: Think of it as a stack of responsibilities. A reverse proxy handles where traffic goes. A load balancer handles how it's distributed. An API gateway handles whether the request is allowed and what happens to it along the way. In practice, a single piece of software (Kong, Traefik, Envoy) often does all three.
| Feature | nginx (reverse proxy) | HAProxy (LB) | Kong (gateway) | AWS API Gateway |
|---|---|---|---|---|
| Route by host/path | Yes | Yes | Yes | Yes |
| Health checks | Basic | Advanced | Advanced | Managed |
| Rate limiting | Module | Basic | Plugin (Redis-backed) | Built-in |
| JWT validation | Module | No | Plugin | Built-in (Cognito) |
| Request transform | Limited | No | Plugin | Mapping templates |
| Canary routing | Weight-based | Weight-based | Plugin + CRD | Canary stage |
| Dashboard | No | Stats page | Kong Manager | CloudWatch |
Why API Gateways Exist: Cross-Cutting Concerns¶
Here's the problem gateways solve in one sentence: things every service needs, but no single service should own.
These are called cross-cutting concerns:
- Authentication — is this request from a legitimate caller?
- Rate limiting — is this caller sending too many requests?
- TLS termination — decrypt HTTPS so backends run plain HTTP
- Request/response transformation — add headers, strip fields, version translation
- Observability — access logs, metrics, distributed tracing
- Circuit breaking — stop sending traffic to a failing backend
- Canary releases — send 5% of traffic to the new version
Without a gateway, every service implements these independently. That means twelve implementations of JWT validation, twelve rate limiters with different configurations, and twelve places to get it wrong.
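To make "implement once at the edge" concrete, here's a minimal in-process sketch in Python (illustrative only, not any real gateway's code): a single `handle` function applies authentication and rate limiting before dispatching to backend handlers, so no service re-implements them. The key name, limit, and backend map are invented for the example.

```python
import time

# Illustrative sketch: every cross-cutting check lives in ONE gateway
# function instead of being re-implemented in twelve services. The key,
# limit, and backend names below are invented for the example.
VALID_KEYS = {"partner-abc"}
LIMIT_PER_MINUTE = 100
_counters = {}   # api key -> (window_number, request_count)

def handle(request, backends):
    """Apply edge policy (auth, rate limit), then dispatch to a backend."""
    key = request.get("api_key")
    if key not in VALID_KEYS:                       # authentication, done once
        return {"status": 401}
    window = int(time.time() // 60)                 # fixed-window rate limit
    start, count = _counters.get(key, (window, 0))
    if start != window:
        start, count = window, 0
    if count >= LIMIT_PER_MINUTE:
        return {"status": 429}
    _counters[key] = (start, count + 1)
    service = request["path"].strip("/").split("/")[0]   # route by first path segment
    if service not in backends:
        return {"status": 404}
    return backends[service](request)               # backend sees a vetted request

backends = {"catalog": lambda req: {"status": 200, "body": "items"}}
print(handle({"api_key": "partner-abc", "path": "/catalog/items"}, backends)["status"])  # 200
print(handle({"api_key": "nope", "path": "/catalog/items"}, backends)["status"])         # 401
```

Twelve services, one place to get auth and limits right (or wrong).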
Trivia: The API gateway pattern predates microservices. Enterprise Service Buses (ESBs) in the early 2000s performed similar routing and transformation. Chris Richardson formalized the modern API gateway pattern around 2015 as part of the microservices architecture movement. The name changed, but the problem — centralized edge policy — is as old as multi-service architectures.
The Gateway Landscape¶
Let's look at what's out there. Each gateway has a personality.
Kong¶
Born from a failed API marketplace called Mashape (2010). The team pivoted and open-sourced their gateway in 2015. Built on top of nginx and OpenResty (nginx + Lua). Plugin-driven — you bolt on rate limiting, auth, logging as separate plugins.
# kong.yaml — declarative configuration
_format_version: "3.0"
services:
  - name: catalog-service
    url: http://catalog.default.svc:8080
    routes:
      - name: catalog-route
        paths:
          - /api/v1/catalog
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: redis
          redis_host: redis.default.svc
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: correlation-id
        config:
          header_name: X-Request-ID
          generator: uuid
Kong's model: define a service (the backend), attach routes (how traffic reaches it), attach plugins (what happens along the way). Plugins execute in a specific order — auth runs before rate limiting, rate limiting before proxying.
Traefik¶
Created by Emile Vauge in 2015. The key innovation: automatic service discovery. Traefik watches container orchestrators (Docker, Kubernetes, Consul) and configures routes without you writing config files. New container spins up with the right labels? Traefik routes to it within seconds.
# Traefik IngressRoute (Kubernetes CRD)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: catalog-route
  namespace: production
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/v1/catalog`)
      kind: Rule
      services:
        - name: catalog-service
          port: 8080
          weight: 90
        - name: catalog-canary
          port: 8080
          weight: 10
      middlewares:
        - name: rate-limit
        - name: auth-forward
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: production
spec:
  rateLimit:
    average: 50
    burst: 100
    period: 1s
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: auth-forward
  namespace: production
spec:
  forwardAuth:
    address: http://auth-service.auth.svc:8080/verify
    authResponseHeaders:
      - X-User-ID
      - X-User-Email
Notice how Traefik's canary is just weighted services in the route definition. No separate canary ingress resource. No annotations. The CRDs are explicit and readable.
AWS API Gateway¶
Fully managed. You don't run it, don't scale it, don't patch it. Two flavors: REST API (feature-rich, more expensive) and HTTP API (cheaper, fewer features, lower latency).
Gotcha: AWS API Gateway has a 29-second integration timeout by default. If your backend takes longer, the gateway returns a 504. For years this was a hard limit with no workaround, and it forced countless teams to redesign synchronous APIs into async patterns with SQS or Step Functions. (AWS now allows raising it for Regional REST APIs via a service quota request, at the cost of reduced account-level throttling; HTTP APIs remain capped.) Check this limit before committing to AWS API Gateway for any workload that might have long-running requests.
Apache APISIX¶
The newer entrant. Built on nginx + etcd, fully dynamic configuration without reloads. Strong in the APAC market. Key differentiator: its plugin system runs Lua, Go, Python, or Wasm, so you're not locked into one language for custom logic.
Ambassador / Emissary-Ingress¶
Built on Envoy, designed specifically for Kubernetes. Uses CRDs extensively. Good if you're already in the Envoy ecosystem or headed toward a service mesh (Istio uses Envoy too). The project was rebranded from Ambassador to Emissary-Ingress in 2021.
Trivia: Envoy was created by Matt Klein at Lyft in 2015 specifically because configuring nginx for a dynamic microservices environment was unmanageable. Envoy introduced xDS APIs for dynamic configuration — the proxy reconfigures without reloading. This innovation became the foundation of the entire service mesh movement.
Flashcard Check #1¶
Cover the answers and test yourself.
| Question | Answer |
|---|---|
| What's the difference between a reverse proxy and an API gateway? | A reverse proxy routes traffic to backends. An API gateway adds policy enforcement: auth, rate limiting, transforms, observability. |
| Name three cross-cutting concerns gateways handle. | Auth, rate limiting, TLS termination (also: transforms, observability, circuit breaking, canary routing). |
| Why does AWS API Gateway's 29-second timeout matter? | Long-running sync requests get 504'd with little recourse. Forces async redesign. |
| What does Kong use under the hood? | nginx + OpenResty (nginx + Lua scripting). |
| How does Traefik differ from traditional gateways? | Auto-discovers services from orchestrators. No manual config files needed for new services. |
Rate Limiting: Three Algorithms You Should Know¶
Rate limiting at the gateway protects your backends from abuse, runaway clients, and your own frontend's retry storms. But "rate limiting" is not one thing — the algorithm determines how it behaves under load.
Fixed Window¶
Divide time into fixed windows (e.g., one-minute blocks). Count requests per window. Reset at the boundary.
Window: 12:00:00 – 12:01:00 Limit: 100 requests
├────── 73 requests ──────┤ ← OK
Window: 12:01:00 – 12:02:00
├───── 100 requests ──────┤ ← OK
└── 101st request → 429 Too Many Requests
The problem: A burst at the boundary. If a client sends 100 requests at 12:00:59 and 100 more at 12:01:01, they've sent 200 requests in 2 seconds while staying within the 100/minute limit for each window. This is the boundary burst problem.
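A minimal fixed-window counter makes the boundary burst easy to demonstrate (illustrative Python, not any gateway's actual implementation):

```python
class FixedWindowLimiter:
    """Fixed window counter: `limit` requests per `window` seconds."""
    def __init__(self, limit, window=60):
        self.limit, self.window = limit, window
        self.window_start, self.count = None, 0

    def allow(self, now):
        w = int(now // self.window)
        if w != self.window_start:          # crossed a boundary: counter resets
            self.window_start, self.count = w, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

limiter = FixedWindowLimiter(limit=100)
# 100 requests at 12:00:59 (t=59s) and 100 more at 12:01:01 (t=61s):
burst1 = sum(limiter.allow(59.0) for _ in range(100))
burst2 = sum(limiter.allow(61.0) for _ in range(100))
print(burst1 + burst2)   # 200, twice the "per-minute" limit in two seconds
```

Both bursts are accepted because each falls in its own window.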
Sliding Window¶
Keeps a weighted count across the current and previous window. The previous window's count is weighted by the fraction of it that still falls inside the sliding window. Thirty seconds into the current minute, that fraction is 0.5, so the effective rate is (prev_window_count * 0.5) + current_window_count. This smooths out the boundary burst.
Previous window (12:00–12:01): 80 requests
Current window (12:01–12:02): at 12:01:30 (halfway through)
Weighted count: (80 × 0.5) + current_count = 40 + current_count
Effective remaining: 100 - 40 = 60 more allowed
Production gateways mix these algorithm families: Kong's enterprise rate-limiting-advanced plugin offers a sliding window (the open-source rate-limiting plugin counts fixed windows), nginx's limit_req implements a leaky bucket, and Envoy's local rate limiter uses a token bucket.
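Here's the weighted-count idea as a small Python sketch (illustrative; real implementations keep the counters in Redis or shared memory):

```python
class SlidingWindowLimiter:
    """Weighted count over the previous and current fixed windows."""
    def __init__(self, limit, window=60):
        self.limit, self.window = limit, window
        self.curr_win, self.curr, self.prev = None, 0, 0

    def allow(self, now):
        w = int(now // self.window)
        if w != self.curr_win:
            # the just-finished window becomes "previous"; older ones drop to 0
            self.prev = self.curr if self.curr_win is not None and w == self.curr_win + 1 else 0
            self.curr_win, self.curr = w, 0
        elapsed = (now % self.window) / self.window      # fraction into current window
        weighted = self.prev * (1 - elapsed) + self.curr
        if weighted >= self.limit:
            return False
        self.curr += 1
        return True

limiter = SlidingWindowLimiter(limit=100)
for _ in range(80):                 # 80 requests land in the previous window
    limiter.allow(30.0)
# 30 s into the next window: weighted count = 80 * 0.5 = 40, so 60 remain
allowed = sum(limiter.allow(90.0) for _ in range(100))
print(allowed)   # 60
```

This matches the worked example above: with 80 requests in the previous window, only 60 of the next 100 are admitted.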
Token Bucket¶
A bucket holds tokens. It's refilled at a fixed rate. Each request consumes a token. If the bucket is empty, the request is rejected (or queued). The bucket size controls burst capacity.
Bucket capacity: 10 tokens
Refill rate: 2 tokens/second
t=0: bucket=10 → burst of 10 requests OK → bucket=0
t=1: bucket=2 → 2 requests OK → bucket=0
t=2: bucket=2 → 3 requests: 2 OK, 1 rejected
t=5: bucket=6 → idle period refilled the bucket
Token bucket is elegant because it naturally allows bursts (up to the bucket size) while enforcing a long-term average rate. AWS API Gateway uses token bucket. So do most cloud provider rate limiters.
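The refill-and-consume logic fits in a few lines. This Python sketch replays the trace above (illustrative, with an explicit clock instead of wall time):

```python
class TokenBucket:
    """Refill `refill_rate` tokens/second up to `capacity`; one token per request."""
    def __init__(self, capacity, refill_rate):
        self.capacity, self.refill_rate = capacity, refill_rate
        self.tokens, self.last = float(capacity), 0.0

    def allow(self, now):
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, refill_rate=2)
print(sum(bucket.allow(0) for _ in range(10)))  # 10 (initial burst drains the bucket)
print(sum(bucket.allow(1) for _ in range(2)))   # 2  (one second refilled 2 tokens)
print(sum(bucket.allow(2) for _ in range(3)))   # 2  (3 requests: 2 OK, 1 rejected)
print(sum(bucket.allow(5) for _ in range(10)))  # 6  (3 idle seconds refilled 6)
```

Note how the idle period between t=2 and t=5 rebuilds burst capacity, up to but never beyond the bucket size.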
Trivia: Token bucket and its cousin the leaky bucket come from 1980s network traffic shaping; the leaky bucket was described in 1986 in the context of ATM (Asynchronous Transfer Mode) networks. These decades-old algorithms are still the standard in modern gateways.
Interview Bridge: "Explain the difference between fixed window and token bucket rate limiting" is a common system design interview question. The key insight: fixed window has boundary burst problems, token bucket allows controlled bursts while enforcing average rate.
Authentication at the Gateway¶
This is where gateways save the most engineering time. Instead of every service validating credentials, the gateway does it once and passes identity downstream.
The Pattern¶
Client → Gateway → Auth Check → Backend
│
├─ Valid token?
│ ├─ Yes → forward request + inject X-User-ID header
│ └─ No → return 401, never hits backend
│
└─ Rate limit check (per user, not per IP)
├─ Under limit → forward
└─ Over limit → return 429
The backend trusts the gateway. If a request arrives with X-User-ID: 42, the backend knows
the gateway already validated the token. The backend never sees raw credentials.
JWT Validation at the Gateway¶
JWTs (JSON Web Tokens) are the most common gateway auth pattern. The gateway validates the signature, checks expiration, and extracts claims — without calling an external auth service.
JWT structure: header.payload.signature
│ │ │
│ │ └─ HMAC or RSA signature
│ └─ {"sub": "user-42", "exp": 1735689600, "role": "admin"}
└─ {"alg": "RS256", "typ": "JWT"}
Kong JWT plugin config:
plugins:
  - name: jwt
    config:
      claims_to_verify:
        - exp              # reject expired tokens
      header_names:
        - Authorization
      run_on_preflight: false
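For intuition, here's what the gateway's JWT check boils down to, sketched in Python with only the standard library. It uses HS256 (HMAC) because that's verifiable with stdlib `hmac`; gateways commonly validate RS256 against a JWKS key set instead. All names here are illustrative, and a real validator would also check the header's `alg` field.

```python
import base64, hashlib, hmac, json, time

def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def validate_jwt(token: str, secret: bytes) -> dict:
    """Validate an HS256 JWT the way a gateway would: verify the
    signature, then the exp claim, then return the payload."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")            # gateway returns 401
    payload = json.loads(b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")            # gateway returns 401
    return payload

# Build a token to validate (normally the auth server does this)
secret = b"shared-secret"
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "user-42", "exp": int(time.time()) + 3600}).encode())
sig = b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
claims = validate_jwt(f"{header}.{payload}.{sig}", secret)
print(claims["sub"])   # user-42
```

No network call is needed, which is why JWT validation at the gateway is fast: it's pure crypto plus a clock check.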
OAuth2 / OIDC at the Gateway¶
For more complex flows (user login, refresh tokens, third-party identity providers), gateways can act as an OAuth2 resource server or even handle the full OIDC flow:
# Traefik ForwardAuth to an OAuth2 proxy
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-auth
spec:
  forwardAuth:
    address: http://oauth2-proxy.auth.svc:4180/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
      - X-Auth-Request-User
      - X-Auth-Request-Email
      - X-Auth-Request-Groups
API Key Authentication¶
Simpler than JWT but still useful for service-to-service or partner integrations:
# Kong API key plugin
plugins:
  - name: key-auth
    config:
      key_names:
        - X-API-Key
        - apikey             # query parameter fallback
      hide_credentials: true # strip the key before forwarding to backend
Gotcha: `hide_credentials: true` matters. Without it, the API key is forwarded to your backend in the request headers. If your backend logs request headers (many do by default), you're logging secrets. This has caused real credential leaks.
Request and Response Transformation¶
Gateways can modify requests before they reach backends, and modify responses before they reach clients. This is powerful for API versioning, migration, and compatibility.
# Kong request-transformer plugin
plugins:
  - name: request-transformer
    config:
      add:
        headers:
          - "X-Request-ID:$(uuid)"
          - "X-Gateway-Version:v2"
      remove:
        headers:
          - "X-Internal-Debug"
      rename:
        headers:
          - "X-Old-Header:X-New-Header"
Use cases that come up in real life:
- Header injection: Add request IDs, user identity, feature flags
- Header stripping: Remove internal headers before they leak to clients
- Path rewriting: /api/v2/catalog at the gateway becomes /catalog at the backend
- Response filtering: Remove internal fields from API responses
Under the Hood: Request transformation happens in the gateway's processing pipeline. Kong executes plugins in phases: certificate → rewrite → access → header_filter → body_filter → log. Auth plugins and request transforms both run in the `access` phase (auth first), while response transforms run in `header_filter` and `body_filter`. Understanding this pipeline order matters when plugins interact: an earlier plugin can't see a header that a later plugin adds.
Canary Releases via Gateway¶
A canary release sends a small percentage of traffic to a new version. If errors spike, you roll back. If metrics look good, you increase the percentage. The gateway is the natural place to do this because it already controls traffic routing.
Canary with nginx-ingress annotations¶
# Stable version — receives all non-canary traffic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catalog-stable
  namespace: production
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/catalog
            pathType: Prefix
            backend:
              service:
                name: catalog-stable
                port:
                  number: 8080
---
# Canary version — receives 10% of traffic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catalog-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/catalog
            pathType: Prefix
            backend:
              service:
                name: catalog-canary
                port:
                  number: 8080
Ramp up the canary:
# Increase to 25%
kubectl annotate ingress catalog-canary -n production \
nginx.ingress.kubernetes.io/canary-weight="25" --overwrite
# Route a specific header to canary (for internal testing)
# Add these annotations instead of canary-weight:
# nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
# nginx.ingress.kubernetes.io/canary-by-header-value: "true"
# Then: curl -H "X-Canary: true" https://api.example.com/v1/catalog
Canary with Gateway API (native traffic splitting)¶
The Kubernetes Gateway API has canary built into the spec — no annotations needed:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: catalog-route
  namespace: production
spec:
  parentRefs:
    - name: main-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/catalog
      backendRefs:
        - name: catalog-stable
          port: 8080
          weight: 90
        - name: catalog-canary
          port: 8080
          weight: 10
That's it. No separate ingress resources, no annotations. Change the weights, apply, done. This is one of the reasons the Gateway API exists.
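Under the hood, weighted backend selection is just a weighted random choice per request. A Python sketch of a 90/10 split (illustrative; real gateways also factor in health checks and session affinity):

```python
import random

def pick_backend(backends, rng=random):
    """Weighted random choice, like a gateway's 90/10 canary split."""
    names = [b["name"] for b in backends]
    weights = [b["weight"] for b in backends]
    return rng.choices(names, weights=weights, k=1)[0]

backends = [
    {"name": "catalog-stable", "weight": 90},
    {"name": "catalog-canary", "weight": 10},
]
counts = {"catalog-stable": 0, "catalog-canary": 0}
rng = random.Random(42)              # seeded so the split is reproducible
for _ in range(10_000):
    counts[pick_backend(backends, rng)] += 1
print(counts["catalog-canary"] / 10_000)   # close to 0.10
```

Changing the weights in the HTTPRoute is exactly this: changing the probabilities of the per-request draw.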
Circuit Breaking: Stop Hitting a Dead Service¶
When a backend is failing, the worst thing you can do is keep sending it traffic. Circuit breaking detects failures and stops routing to the failing backend, giving it time to recover.
Circuit states:
┌──────────┐ failures > threshold ┌──────────┐
│ CLOSED │ ─────────────────────────→ │ OPEN │
│ (normal) │ │ (reject) │
└──────────┘ └────┬─────┘
↑ │
│ timeout expires │
│ ┌──────────────┐ │
└─────────│ HALF-OPEN │←────────────┘
success │ (test probe) │
└──────────────┘
In the closed state, all requests pass through. When failures exceed a threshold, the circuit opens — requests are immediately rejected (503) without hitting the backend. After a timeout, the circuit goes half-open: a single probe request is sent. If it succeeds, the circuit closes. If it fails, it reopens.
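The state machine above fits naturally into a small class. This Python sketch (illustrative, with an explicit clock) fails fast while open and uses a single probe to decide whether to close again:

```python
class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    Open -> Half-Open after `reset_timeout`; one probe decides."""
    def __init__(self, threshold=5, reset_timeout=30.0):
        self.threshold, self.reset_timeout = threshold, reset_timeout
        self.failures, self.state, self.opened_at = 0, "closed", 0.0

    def call(self, fn, now):
        if self.state == "open":
            if now - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open")   # fail fast: gateway returns 503
            self.state = "half-open"                 # timeout expired: allow one probe
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state, self.opened_at = "open", now
            raise
        self.failures = 0
        self.state = "closed"                        # probe (or normal call) succeeded
        return result

cb = CircuitBreaker(threshold=3, reset_timeout=30)
def failing(): raise ConnectionError("backend down")
for t in range(3):
    try: cb.call(failing, now=t)
    except ConnectionError: pass
print(cb.state)                                      # open
try: cb.call(lambda: "ok", now=5)                    # still open: rejected instantly
except RuntimeError as e: print(e)                   # circuit open
print(cb.call(lambda: "ok", now=40))                 # half-open probe succeeds -> ok
print(cb.state)                                      # closed
```

The rejected call at t=5 is the whole point: the failing backend never sees it, and the caller gets an answer in microseconds instead of a timeout.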
Envoy (used by Ambassador/Emissary and Istio) has the most sophisticated circuit breaking:
# Envoy circuit breaker via Istio DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: catalog-circuit-breaker
spec:
  host: catalog-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 50
        http2MaxRequests: 200
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
That config says: if a backend returns five 5xx errors in a row, eject it for 30 seconds. Never eject more than 50% of backends (so you don't route all traffic to zero servers).
Remember: Circuit breaking is about protecting the system, not the failed service. Without it, a slow or failing backend causes requests to queue up at the gateway, consuming connections and memory until the gateway itself becomes the bottleneck. The circuit breaker fails fast so the gateway stays healthy.
Flashcard Check #2¶
| Question | Answer |
|---|---|
| What's the boundary burst problem in fixed window rate limiting? | A client can send 2x the limit in a short period by timing requests at the window boundary. |
| Why is token bucket good for APIs? | It allows controlled bursts (up to bucket size) while enforcing an average rate over time. |
| What does hide_credentials: true do in Kong's key-auth plugin? | Strips the API key from headers before forwarding to the backend — prevents logging secrets. |
| Name the three circuit breaker states. | Closed (normal), Open (rejecting), Half-Open (testing with a probe). |
| Why do canary releases at the gateway instead of in the deployment? | The gateway already controls traffic routing, and you can shift percentages without touching the deployment. |
Observability: Seeing Through the Gateway¶
The gateway sees every request that enters your system. That makes it the best place to collect three things:
Access Logs¶
Every gateway produces access logs. The key fields you need:
# nginx-ingress log format (default)
10.0.5.23 - user42 [23/Mar/2026:14:22:31 +0000] "GET /v1/catalog/items HTTP/2.0" 200 1847
0.043 0.041 "https://web.example.com/browse" "Mozilla/5.0..." "req-id-abc123"
│ │ │ │ │ │
│ │ │ │ │ └─ request ID
│ │ │ │ └─ upstream response time
│ │ │ └─ request time (gateway total)
│ │ └─ upstream response body size
│ └─ HTTP status code
└─ client IP
The difference between request_time and upstream_response_time tells you how much
latency the gateway itself adds. If request_time is 200ms and upstream is 180ms, the gateway
added 20ms. If upstream is 5ms and request_time is 200ms, something in the gateway pipeline
(auth check? rate limit lookup? TLS handshake?) is slow.
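That subtraction is worth automating. Here's a sketch that pulls the two timing fields out of a log line in the format shown above (the regex is an assumption tied to this example format; real log formats vary per controller):

```python
import re

# Assumed field order, from the example above:
# ... "METHOD path HTTP/x" status bytes request_time upstream_time "referer" ...
LOG_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \d+\s+'
    r'(?P<request_time>[\d.]+) (?P<upstream_time>[\d.]+)'
)

def gateway_overhead(line: str) -> float:
    """Seconds of latency the gateway itself added to this request."""
    m = LOG_RE.search(line)
    return float(m["request_time"]) - float(m["upstream_time"])

line = ('10.0.5.23 - user42 [23/Mar/2026:14:22:31 +0000] '
        '"GET /v1/catalog/items HTTP/2.0" 200 1847 0.043 0.041 '
        '"https://web.example.com/browse" "Mozilla/5.0..." "req-id-abc123"')
print(round(gateway_overhead(line), 3))   # 0.002, i.e. the gateway added 2 ms
```

Run this over a log sample and plot the distribution; a growing tail means the gateway pipeline, not your backends, is slowing down.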
Metrics (Prometheus)¶
Most gateways expose Prometheus metrics natively. The four golden signals at the gateway:
# Request rate by service and status code
nginx_ingress_controller_requests{service="catalog-stable",status="200"} 45231
nginx_ingress_controller_requests{service="catalog-stable",status="502"} 3
# Latency histogram
nginx_ingress_controller_request_duration_seconds_bucket{service="catalog-stable",le="0.1"} 42000
nginx_ingress_controller_request_duration_seconds_bucket{service="catalog-stable",le="0.5"} 44800
nginx_ingress_controller_request_duration_seconds_bucket{service="catalog-stable",le="1.0"} 45100
# Active connections
nginx_ingress_controller_nginx_process_connections{state="active"} 847
# Bytes transferred
nginx_ingress_controller_bytes_sent{service="catalog-stable"} 89234567
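Prometheus's histogram_quantile() estimates a quantile by linear interpolation inside the bucket where the target rank falls. A Python sketch of the same calculation, using the bucket values above (and assuming, for simplicity, that the le="1.0" bucket captures all observations):

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (le, count) buckets, the way
    Prometheus's histogram_quantile() does: find the bucket containing
    the target rank and interpolate linearly within it."""
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            # interpolate within [prev_le, le]
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-1][0]

# From the metrics above: le -> cumulative request count
buckets = [(0.1, 42000), (0.5, 44800), (1.0, 45100)]
p99 = histogram_quantile(0.99, buckets)
print(round(p99, 3))   # 0.478
```

So even though most requests finish under 100ms, the p99 estimate lands around 478ms, inside the wide 0.1–0.5s bucket. This is also why bucket boundaries matter: the estimate can never be more precise than the bucket it falls in.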
Distributed Tracing¶
The gateway is where trace context begins. It generates a trace ID, injects it as a header, and every downstream service propagates it:
Gateway generates: X-Request-ID: abc-123, traceparent: 00-traceid-spanid-01
│
├─→ catalog-service (reads traceparent, creates child span)
│ │
│ └─→ inventory-service (child span of catalog)
│
└─→ auth-service (separate child span from gateway)
Kong, Traefik, and Envoy all support OpenTelemetry trace propagation. The gateway span becomes the root of every trace, giving you end-to-end latency from the client's perspective.
Kubernetes Ingress vs. Gateway API¶
If you're running on Kubernetes, you have two ways to define gateway routing: the old Ingress resource and the newer Gateway API. This matters because they have fundamentally different design philosophies.
Ingress: The Original (2015)¶
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catalog-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"        # nginx-specific
    nginx.ingress.kubernetes.io/canary-weight: "10"   # nginx-specific
    nginx.ingress.kubernetes.io/limit-rps: "50"       # nginx-specific
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/catalog
            pathType: Prefix
            backend:
              service:
                name: catalog-service
                port:
                  number: 8080
The problem: the spec part is portable across controllers. The annotations part is not.
Every controller has its own annotation namespace, its own syntax, its own quirks.
Switching from nginx-ingress to Traefik means rewriting every annotation.
Gotcha: Ingress annotations are stringly-typed and fail silently. A typo like `nginx.ingress.kubernetes.io/rewrit-target` (missing an 'e') is ignored without error. Use `kubectl describe ingress` and check for annotation validation warnings. Better yet, check the generated nginx config inside the controller pod to confirm your annotation took effect.
Gateway API: The Successor (2023 GA)¶
# Cluster operator creates the Gateway (infra concern)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
  namespace: gateway-infra
spec:
  gatewayClassName: traefik   # or kong, nginx, istio, cilium...
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls
      allowedRoutes:
        namespaces:
          from: All
---
# App team creates the HTTPRoute (app concern)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: catalog-route
  namespace: production
spec:
  parentRefs:
    - name: main-gateway
      namespace: gateway-infra
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/catalog
        - headers:
            - name: X-API-Version
              value: "2"
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Routed-By
                value: gateway-api
      backendRefs:
        - name: catalog-stable
          port: 8080
          weight: 90
        - name: catalog-canary
          port: 8080
          weight: 10
The key difference: role separation. The Gateway resource is created by the platform team (they control TLS, ports, allowed namespaces). The HTTPRoute is created by the app team (they control paths, headers, backends, weights). Neither team needs to touch the other's resources.
Trivia: The Kubernetes Gateway API was proposed in 2019 and took four years to reach GA (v1.0, October 2023). The long timeline reflected the difficulty of designing an API that could satisfy dozens of gateway implementations while remaining simple enough to be useful. Both Ingress and Gateway API coexist today, but new features are only added to Gateway API.
| Capability | Ingress | Gateway API |
|---|---|---|
| Traffic splitting (canary) | Annotations (controller-specific) | Native weight field |
| Header-based routing | Annotations (if supported) | Native headers match |
| Request transforms | Annotations (if supported) | Native filters |
| Role separation | No | Gateway (infra) vs. HTTPRoute (app) |
| Cross-namespace routing | No | Yes, with ReferenceGrants |
| Portable across controllers | Spec yes, annotations no | Fully portable |
War Story: When the Gateway Becomes the Bottleneck¶
War Story: A team ran a single Kong gateway pod in Kubernetes with default resource limits (256MB RAM, 250m CPU). Their platform handled 200 req/s comfortably. Then they added the `response-ratelimiting` plugin with Redis lookups and the `jwt` plugin, which needed to fetch JWKS keys. Under load, each request now required two network round-trips in the gateway pipeline. At 500 req/s during a product launch, the Kong pod's latency jumped from 5ms to 800ms. Request queues backed up. The gateway started returning 504 timeouts. The backend services were fine — p99 latency under 50ms — but users saw 5-second page loads because every request waited in the gateway queue. The fix: (1) bump Kong to 3 replicas with 1GB RAM and 1 CPU each, (2) cache JWT validation results for 60 seconds, (3) move rate limit checks to a local in-memory counter with periodic Redis sync instead of per-request Redis calls. Gateway latency dropped back to 8ms.
The lesson: the gateway is on the critical path of every request. A slow plugin, an
undersized pod, or a network call in the hot path turns your gateway into the bottleneck.
Monitor gateway latency separately from backend latency. If request_time - upstream_time is
growing, the gateway is the problem.
How to avoid this:
# Monitor gateway pod resources
kubectl top pods -n kong-system
# Check gateway-specific latency (not just total request time)
# In Prometheus:
# histogram_quantile(0.99, rate(kong_latency_bucket{type="kong"}[5m]))
# This shows gateway processing time, excluding upstream
# Run at least 2-3 gateway replicas with anti-affinity
# Set a PodDisruptionBudget so drains don't take all replicas down
# Gateway pod disruption budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kong-pdb
  namespace: kong-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: kong-gateway
Gateway Anti-Patterns¶
Knowing what NOT to do is half the battle.
Anti-pattern #1: Business Logic in the Gateway¶
The gateway should enforce policy (auth, rate limits, routing). It should not contain business logic. If your gateway plugin is checking inventory levels, calculating prices, or validating business rules, you've turned the gateway into a monolith at the edge.
The test: if a developer who knows nothing about your business domain can understand every gateway rule, you're fine. If they need to understand your product catalog to configure the gateway, business logic has leaked in.
Anti-pattern #2: Single Point of Failure¶
One gateway replica. No PDB. No HPA. A node drain takes it down and every service goes dark.
# The fix: always run at least two replicas, autoscaled
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
  namespace: gateway-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
Anti-pattern #3: No Connection Draining During Deploys¶
You roll out a new gateway version. Active connections get dropped. Users get 502s for 15 seconds.
# Fix: preStop hook gives time for connection draining
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: gateway
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
Anti-pattern #4: Rate Limiting by Source IP Behind a CDN¶
All traffic arrives from the CDN's IP range. Your per-IP rate limit applies to the CDN, not the user. One abusive user exhausts the limit for everyone.
# Fix: configure the gateway to trust X-Forwarded-For from known proxies
# Kong handles this in kong.conf (nginx real_ip under the hood), not a plugin:
real_ip_header = X-Forwarded-For
real_ip_recursive = on
# Only list ranges you control: CDN, load balancers, internal network
trusted_ips = 173.245.48.0/20, 103.21.244.0/22, 10.0.0.0/8   # Cloudflare + internal
Gotcha: Setting `trusted_ips` too wide lets attackers spoof their source IP by injecting a fake `X-Forwarded-For` header. Only trust CIDR ranges you control (your CDN, your load balancers, your internal network).
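The safe way to consume X-Forwarded-For is to walk it right to left, skipping hops you trust, and take the first untrusted address as the real client. A Python sketch with the stdlib `ipaddress` module (CIDR ranges copied from the config above):

```python
import ipaddress

TRUSTED = [ipaddress.ip_network(n) for n in
           ("173.245.48.0/20", "103.21.244.0/22", "10.0.0.0/8")]

def is_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)

def real_client_ip(remote_addr: str, xff_header: str) -> str:
    """Walk X-Forwarded-For right-to-left, skipping trusted proxies;
    the first untrusted hop is the real client. Never trust the
    leftmost entry blindly, because the client can forge it."""
    if not is_trusted(remote_addr):
        return remote_addr            # direct connection: the header is untrustworthy
    hops = [h.strip() for h in xff_header.split(",")]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0]                    # every hop trusted: fall back to leftmost

# Attacker injects a fake entry, then Cloudflare appends the real IP:
print(real_client_ip("173.245.48.10", "1.2.3.4, 203.0.113.7, 173.245.48.99"))
```

The forged leftmost entry (1.2.3.4) is ignored; the rightmost untrusted hop (203.0.113.7) is what you rate-limit on.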
Flashcard Check #3¶
| Question | Answer |
|---|---|
| What's the key design difference between Ingress and Gateway API? | Gateway API separates infrastructure (Gateway) from application routing (HTTPRoute), enabling role-based ownership. |
| Why should the gateway not contain business logic? | It becomes a monolith at the edge. Gateway config should be domain-agnostic (auth, routing, rate limits). |
| What's the risk of rate limiting by IP behind a CDN? | All users share the CDN's IP. One abusive user exhausts the limit for everyone. |
| Why add a preStop sleep to gateway pods? | Gives the load balancer time to stop sending traffic before the pod terminates, preventing 502s during deploys. |
| What does gateway request_time - upstream_time tell you? | How much latency the gateway itself adds. If this grows, the gateway pipeline (plugins, auth, TLS) is the bottleneck. |
Exercises¶
Exercise 1: Read a Gateway Config (5 minutes)¶
Here's a Kong declarative config. Answer the questions below.
services:
  - name: orders-api
    url: http://orders.production.svc:8080
    routes:
      - name: orders-route
        paths: ["/api/v1/orders"]
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 200
          policy: local
      - name: jwt
      - name: request-transformer
        config:
          add:
            headers: ["X-Gateway: kong-prod"]
- A request comes in to `/api/v1/orders/123`. What path does the backend see? (Hint: `strip_path`)
- The rate limit policy is `local`. What happens if Kong has 3 replicas?
- Is the JWT plugin configured to check token expiration?
Answers
1. `/123` — `strip_path: true` removes the matched path prefix `/api/v1/orders`.
2. Each replica tracks its own counter. A client gets 200/min per replica, so effectively 600/min total. Use `policy: redis` for a shared counter.
3. No — the JWT plugin is configured with no `claims_to_verify`. Expired tokens will be accepted as valid. Add `claims_to_verify: [exp]`.
Exercise 2: Design a Gateway Strategy (15 minutes)¶
You're deploying a new API for partner integrations. Requirements:
- Partners authenticate with API keys
- Each partner gets a different rate limit (free tier: 100/min, paid: 1000/min)
- You want to canary new versions with 5% traffic
- You need to strip the X-Internal-Debug header from responses
Sketch the gateway configuration (any gateway, any format). What resources do you need? What plugins/middlewares/filters? Where do rate limit tiers get configured?
Discussion points
- API keys map to consumers/groups in Kong (or equivalent in your gateway)
- Rate limit tiers are configured per consumer group, not per route
- Canary: two backend services with weighted routing (Gateway API) or canary annotation (nginx-ingress)
- Response transform to strip headers (Kong: `response-transformer` plugin with `remove.headers`)
- Consider: where do you store API keys? Gateway database? External secret store?
Exercise 3: Diagnose the Bottleneck (10 minutes)¶
Your gateway metrics show:
```
request_time_p99:   1.2s
upstream_time_p99:  0.08s
active_connections: 4,847
gateway_cpu:        92%
gateway_memory:     1.8GB / 2GB
```
- Where is the bottleneck?
- What's your immediate fix?
- What's your long-term fix?
Answer
1. The gateway. `request_time - upstream_time = 1.12s` of gateway processing. Backends are fast (80ms p99), and CPU at 92% with memory near its limit confirms gateway saturation.
2. Immediate: scale gateway replicas horizontally (`kubectl scale` or an HPA) and increase the memory limit.
3. Long-term: profile which plugins are expensive. Check for per-request network calls (Redis, external auth). Add caching. Consider whether all plugins are needed on all routes.
Cheat Sheet¶
| What | Command / Config | Notes |
|---|---|---|
| List all ingress resources | `kubectl get ingress -A` | Check ADDRESS column for external IP |
| Check ingress controller logs | `kubectl logs -n <ns> deploy/<controller> --tail=100` | Filter for 502, upstream, error |
| Verify backend endpoints | `kubectl get endpoints <svc> -n <ns>` | `<none>` = no healthy pods |
| Check generated nginx config | `kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- cat /etc/nginx/nginx.conf` | Source of truth for what nginx is doing |
| Test with specific Host header | `curl -H "Host: api.example.com" http://<INGRESS_IP>/path` | Bypasses DNS for testing |
| Check TLS cert from gateway | `openssl s_client -connect api.example.com:443 -servername api.example.com` | Verify cert is correct and not expired |
| Kong declarative config reload | `kong reload` or `deck sync` | `deck diff` to preview changes |
| Traefik router list | `kubectl exec -n traefik deploy/traefik -- wget -qO- http://localhost:8080/api/http/routers` | Traefik dashboard API |
| Gateway API routes | `kubectl get httproutes -A` | Check `parentRefs` and `backendRefs` |
| Token bucket capacity | `burst_size / refill_rate` = seconds of burst | Tune burst for expected traffic spikes |
Rate Limiting Algorithm Quick Reference:
| Algorithm | Burst handling | Memory | Accuracy | Used by |
|---|---|---|---|---|
| Fixed window | Poor (boundary burst) | Low | Approximate | Simple impls |
| Sliding window | Good | Medium | Good | nginx limit_req, Kong |
| Token bucket | Excellent (configurable) | Low | Good | AWS API GW, Envoy |
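The token bucket row maps directly to code. Below is a minimal sketch — not any particular gateway's implementation — with the clock passed in explicitly so the refill math is easy to follow. It also demonstrates the cheat-sheet formula: `capacity / refill_rate` is the seconds of sustained burst the bucket absorbs.

```python
class TokenBucket:
    """Minimal token bucket: bursts up to `capacity` requests, refills at
    `refill_rate` tokens per second. Illustrative sketch only."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity      # start full: the whole burst is available
        self.last = 0.0             # timestamp of the previous call

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                # caller would return 429

bucket = TokenBucket(capacity=10, refill_rate=5)  # 10-deep burst, 5 req/s steady
# A burst of 15 requests at t=0: the first 10 drain the bucket, 5 are rejected.
burst = sum(bucket.allow(0.0) for _ in range(15))
# One second later, refill has restored 5 tokens — the steady rate.
steady = sum(bucket.allow(1.0) for _ in range(15))
print(burst, steady)  # 10 5
```

Note the burst drains in well under `capacity / refill_rate = 2` seconds, after which the client is throttled to the 5 req/s refill rate — exactly the "excellent, configurable" burst handling the table describes.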
Takeaways¶
- Gateways centralize cross-cutting concerns — auth, rate limiting, transforms, observability — so individual services don't each implement them (badly).
- Gateway vs. LB vs. reverse proxy is a spectrum of responsibility, not three different products. Most gateway software does all three.
- Rate limiting algorithm choice matters. Token bucket allows bursts; fixed window has boundary problems; sliding window is the common middle ground.
- The gateway is on the critical path. Monitor it separately. An undersized gateway with expensive plugins becomes the bottleneck faster than you'd expect.
- Gateway API is replacing Ingress in Kubernetes. Its role separation (infra team owns Gateway, app team owns HTTPRoute) solves real organizational problems.
- Never put business logic in the gateway. Auth, routing, rate limits: yes. Inventory checks, pricing calculations: no.
Related Lessons¶
- Envoy: The Proxy That's Everywhere — deep dive into the proxy that powers many gateways
- The Service Mesh Tax — when a gateway isn't enough and you need mesh
- Connection Refused — systematic debugging when traffic doesn't reach your backend
- Kubernetes Services: How Traffic Finds Your Pod — what happens below the gateway layer
- The Cascading Timeout — what happens when circuit breaking is missing
- What Happens When You Click a Link — the full request path, gateway included
- nginx: The Swiss Army Server — the reverse proxy that started it all
- The Load Balancer Lied — when load balancing goes wrong