
API Gateways: The Front Door to Your Microservices

  • lesson
  • api-gateways
  • load-balancing
  • reverse-proxies
  • kubernetes-networking
  • rate-limiting
  • authentication
  • observability

Topics: API gateways, load balancing, reverse proxies, Kubernetes networking, rate limiting, authentication, observability
Level: L1–L2 (Foundations to Operations)
Time: 75–90 minutes
Prerequisites: None (everything is explained from scratch)


The Mission

You're building a platform with twelve microservices. The mobile app, the web frontend, and three partner integrations all need to talk to them. Right now every service handles its own authentication, its own rate limiting, its own CORS headers. Some services check JWTs. Some use API keys. One still has HTTP basic auth because "we'll fix it later."

A partner integration is hammering your payment service with retries. Your auth service went down for 90 seconds last week and took six other services with it because each one validates tokens independently. Deploying a new version of the catalog service caused a blip of 502 errors because there's no canary mechanism.

Your team lead says: "We need a gateway." But what does that actually mean? And how is it different from the load balancer you already have?


Gateway vs. Load Balancer vs. Reverse Proxy

These three terms get used interchangeably, which causes confusion. They're related but different.

                        ┌─────────────────────────────────────────────────┐
                        │          What each layer cares about            │
                        ├──────────────────┬──────────────────────────────┤
                        │  Reverse Proxy   │  Sits in front of backends. │
                        │                  │  Clients talk to the proxy, │
                        │                  │  proxy talks to backends.   │
                        │                  │  Hides backend topology.    │
                        ├──────────────────┼──────────────────────────────┤
                        │  Load Balancer   │  Distributes traffic across │
                        │                  │  multiple backends. Cares   │
                        │                  │  about health checks and    │
                        │                  │  even distribution.         │
                        ├──────────────────┼──────────────────────────────┤
                        │  API Gateway     │  Reverse proxy + business   │
                        │                  │  policies: auth, rate       │
                        │                  │  limits, transforms, canary │
                        │                  │  routing, observability.    │
                        └──────────────────┴──────────────────────────────┘

Every API gateway is a reverse proxy. Most API gateways include load balancing. Not every reverse proxy is an API gateway — nginx serving static files and forwarding to a backend is a reverse proxy, but it's not managing API keys or doing request transformation.

Mental Model: Think of it as a stack of responsibilities. A reverse proxy handles where traffic goes. A load balancer handles how it's distributed. An API gateway handles whether the request is allowed and what happens to it along the way. In practice, a single piece of software (Kong, Traefik, Envoy) often does all three.

| Feature            | nginx (reverse proxy) | HAProxy (LB) | Kong (gateway)        | AWS API Gateway    |
|--------------------|-----------------------|--------------|-----------------------|--------------------|
| Route by host/path | Yes                   | Yes          | Yes                   | Yes                |
| Health checks      | Basic                 | Advanced     | Advanced              | Managed            |
| Rate limiting      | Module                | Basic        | Plugin (Redis-backed) | Built-in           |
| JWT validation     | Module                | No           | Plugin                | Built-in (Cognito) |
| Request transform  | Limited               | No           | Plugin                | Mapping templates  |
| Canary routing     | Weight-based          | Weight-based | Plugin + CRD          | Canary stage       |
| Dashboard          | No                    | Stats page   | Kong Manager          | CloudWatch         |

Why API Gateways Exist: Cross-Cutting Concerns

Here's the problem gateways solve in one sentence: things every service needs, but no single service should own.

These are called cross-cutting concerns:

  • Authentication — is this request from a legitimate caller?
  • Rate limiting — is this caller sending too many requests?
  • TLS termination — decrypt HTTPS so backends run plain HTTP
  • Request/response transformation — add headers, strip fields, version translation
  • Observability — access logs, metrics, distributed tracing
  • Circuit breaking — stop sending traffic to a failing backend
  • Canary releases — send 5% of traffic to the new version

Without a gateway, every service implements these independently. That means twelve implementations of JWT validation, twelve rate limiters with different configurations, and twelve places to get it wrong.

Trivia: The API gateway pattern predates microservices. Enterprise Service Buses (ESBs) in the early 2000s performed similar routing and transformation. Chris Richardson formalized the modern API gateway pattern around 2015 as part of the microservices architecture movement. The name changed, but the problem — centralized edge policy — is as old as multi-service architectures.


The Gateway Landscape

Let's look at what's out there. Each gateway has a personality.

Kong

Born from a failed API marketplace called Mashape (2010). The team pivoted and open-sourced their gateway in 2015. Built on top of nginx and OpenResty (nginx + Lua). Plugin-driven — you bolt on rate limiting, auth, logging as separate plugins.

# kong.yaml — declarative configuration
_format_version: "3.0"

services:
  - name: catalog-service
    url: http://catalog.default.svc:8080
    routes:
      - name: catalog-route
        paths:
          - /api/v1/catalog
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: redis
          redis_host: redis.default.svc
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: correlation-id
        config:
          header_name: X-Request-ID
          generator: uuid

Kong's model: define a service (the backend), attach routes (how traffic reaches it), attach plugins (what happens along the way). Plugins execute in a specific order — auth runs before rate limiting, rate limiting before proxying.

Traefik

Created by Emile Vauge in 2015. The key innovation: automatic service discovery. Traefik watches container orchestrators (Docker, Kubernetes, Consul) and configures routes without you writing config files. New container spins up with the right labels? Traefik routes to it within seconds.

# Traefik IngressRoute (Kubernetes CRD)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: catalog-route
  namespace: production
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/v1/catalog`)
      kind: Rule
      services:
        - name: catalog-service
          port: 8080
          weight: 90
        - name: catalog-canary
          port: 8080
          weight: 10
      middlewares:
        - name: rate-limit
        - name: auth-forward
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: production
spec:
  rateLimit:
    average: 50
    burst: 100
    period: 1s
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: auth-forward
  namespace: production
spec:
  forwardAuth:
    address: http://auth-service.auth.svc:8080/verify
    authResponseHeaders:
      - X-User-ID
      - X-User-Email

Notice how Traefik's canary is just weighted services in the route definition. No separate canary ingress resource. No annotations. The CRDs are explicit and readable.

AWS API Gateway

Fully managed. You don't run it, don't scale it, don't patch it. Two flavors: REST API (feature-rich, more expensive) and HTTP API (cheaper, fewer features, lower latency).

Gotcha: AWS API Gateway has a default 29-second integration timeout. If your backend takes longer, the gateway returns a 504. For years this limit was hard; AWS now allows raising it for Regional REST APIs (at the cost of reduced account-level throttle quota), but it has already forced countless teams to redesign synchronous APIs into async patterns with SQS or Step Functions. Check this limit before committing to AWS API Gateway for any workload that might have long-running requests.

Apache APISIX

The newer entrant. Built on nginx + etcd, fully dynamic configuration without reloads. Strong in the APAC market. Key differentiator: its plugin system runs Lua, Go, Python, or Wasm, so you're not locked into one language for custom logic.

Ambassador / Emissary-Ingress

Built on Envoy, designed specifically for Kubernetes. Uses CRDs extensively. Good if you're already in the Envoy ecosystem or headed toward a service mesh (Istio uses Envoy too). The project was rebranded from Ambassador to Emissary-Ingress in 2021.

Trivia: Envoy was created by Matt Klein at Lyft in 2015 specifically because configuring nginx for a dynamic microservices environment was unmanageable. Envoy introduced xDS APIs for dynamic configuration — the proxy reconfigures without reloading. This innovation became the foundation of the entire service mesh movement.


Flashcard Check #1

Cover the answers and test yourself.

Q: What's the difference between a reverse proxy and an API gateway?
A: A reverse proxy routes traffic to backends. An API gateway adds policy enforcement: auth, rate limiting, transforms, observability.

Q: Name three cross-cutting concerns gateways handle.
A: Auth, rate limiting, TLS termination (also: transforms, observability, circuit breaking, canary routing).

Q: Why does AWS API Gateway's 29-second timeout matter?
A: Long-running sync requests get 504'd with no workaround. Forces async redesign.

Q: What does Kong use under the hood?
A: nginx + OpenResty (nginx + Lua scripting).

Q: How does Traefik differ from traditional gateways?
A: Auto-discovers services from orchestrators. No manual config files needed for new services.

Rate Limiting: Three Algorithms You Should Know

Rate limiting at the gateway protects your backends from abuse, runaway clients, and your own frontend's retry storms. But "rate limiting" is not one thing — the algorithm determines how it behaves under load.

Fixed Window

Divide time into fixed windows (e.g., one-minute blocks). Count requests per window. Reset at the boundary.

Window: 12:00:00 – 12:01:00   Limit: 100 requests
        ├────── 73 requests ──────┤  ← OK
Window: 12:01:00 – 12:02:00
        ├───── 100 requests ──────┤  ← OK
        └── 101st request → 429 Too Many Requests

The problem: A burst at the boundary. If a client sends 100 requests at 12:00:59 and 100 more at 12:01:01, they've sent 200 requests in 2 seconds while staying within the 100/minute limit for each window. This is the boundary burst problem.
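
A fixed-window counter fits in a few lines. This Python sketch (class and method names are illustrative, not any gateway's actual implementation) also demonstrates the boundary burst:

```python
import time

class FixedWindowLimiter:
    """Counts requests per fixed time window; the counter resets at the boundary."""
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window)   # which window does this request fall in?
        if window != self.current_window:  # boundary crossed: reset the counter
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# Boundary burst: 100 requests at t=59s, 100 more at t=61s -- all allowed,
# even though that's 200 requests in 2 seconds.
burst = sum(limiter.allow(now=59) for _ in range(100)) + \
        sum(limiter.allow(now=61) for _ in range(100))
print(burst)  # 200
```

All 200 requests pass because each window's counter stays at its limit of 100.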

Sliding Window

Keeps a weighted count across the current and previous window. If you're 30 seconds into the current minute, the rate is: (prev_window_count * 0.5) + current_window_count. This smooths out the boundary burst.

Previous window (12:00–12:01): 80 requests
Current window (12:01–12:02):  at 12:01:30 (halfway through)
Weighted count: (80 × 0.5) + current_count = 40 + current_count
Effective remaining: 100 - 40 = 60 more allowed

Most production gateways use sliding window. Kong's rate-limiting plugin, nginx's limit_req, and Envoy's rate limiter all use variations of this approach.
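
The weighted count above reduces to one formula. A hedged sketch (real gateways track these counters per client key, usually in shared storage like Redis):

```python
def sliding_window_allow(prev_count, curr_count, elapsed_fraction, limit):
    """Weighted count across the previous and current window.
    elapsed_fraction: how far into the current window we are (0.0 to 1.0).
    The previous window's count is weighted by the fraction still 'covered' by it."""
    weighted = prev_count * (1.0 - elapsed_fraction) + curr_count
    return weighted < limit

# 30s into the current minute: the previous window is weighted by 0.5
print(sliding_window_allow(prev_count=80, curr_count=59, elapsed_fraction=0.5, limit=100))  # True  (40 + 59 = 99)
print(sliding_window_allow(prev_count=80, curr_count=60, elapsed_fraction=0.5, limit=100))  # False (40 + 60 = 100)
```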

Token Bucket

A bucket holds tokens. It's refilled at a fixed rate. Each request consumes a token. If the bucket is empty, the request is rejected (or queued). The bucket size controls burst capacity.

Bucket capacity: 10 tokens
Refill rate: 2 tokens/second

t=0:  bucket=10  →  burst of 10 requests OK  →  bucket=0
t=1:  bucket=2   →  2 requests OK             →  bucket=0
t=2:  bucket=2   →  3 requests: 2 OK, 1 rejected
t=5:  bucket=6   →  idle period refilled the bucket

Token bucket is elegant because it naturally allows bursts (up to the bucket size) while enforcing a long-term average rate. AWS API Gateway uses token bucket. So do most cloud provider rate limiters.
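
A minimal token bucket in Python, matching the timeline above (illustrative only; production implementations persist the token count and last-refill timestamp per client, typically in Redis):

```python
class TokenBucket:
    """Refills at `rate` tokens/second up to `capacity`; each request costs one token."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # bucket starts full
        self.last = 0.0

    def allow(self, now):
        # Lazy refill: add tokens for the time elapsed, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, rate=2)
# t=0: a burst of 10 is allowed, the 11th request is rejected
results = [bucket.allow(now=0) for _ in range(11)]
print(results.count(True))  # 10
# t=5: the idle period refilled the bucket (capped at capacity)
print(bucket.allow(now=5))  # True
```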

Trivia: The token bucket algorithm was first described in 1986 for ATM (Asynchronous Transfer Mode) network traffic shaping. These 40-year-old algorithms are still the standard in modern gateways.

Interview Bridge: "Explain the difference between fixed window and token bucket rate limiting" is a common system design interview question. The key insight: fixed window has boundary burst problems, token bucket allows controlled bursts while enforcing average rate.


Authentication at the Gateway

This is where gateways save the most engineering time. Instead of every service validating credentials, the gateway does it once and passes identity downstream.

The Pattern

Client → Gateway → Auth Check → Backend
           ├─ Valid token?
           │   ├─ Yes → forward request + inject X-User-ID header
           │   └─ No  → return 401, never hits backend
           └─ Rate limit check (per user, not per IP)
               ├─ Under limit → forward
               └─ Over limit → return 429

The backend trusts the gateway. If a request arrives with X-User-ID: 42, the backend knows the gateway already validated the token. The backend never sees raw credentials.

JWT Validation at the Gateway

JWTs (JSON Web Tokens) are the most common gateway auth pattern. The gateway validates the signature, checks expiration, and extracts claims — without calling an external auth service.

JWT structure:    header.payload.signature
                     │       │        │
                     │       │        └─ HMAC or RSA signature
                     │       └─ {"sub": "user-42", "exp": 1735689600, "role": "admin"}
                     └─ {"alg": "RS256", "typ": "JWT"}

Kong JWT plugin config:

plugins:
  - name: jwt
    config:
      claims_to_verify:
        - exp          # reject expired tokens
      header_names:
        - Authorization
      run_on_preflight: false
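
To make the validation step concrete, here is a minimal HS256 check using only the Python standard library. This illustrates what a gateway does conceptually; it is not Kong's implementation, and production tokens typically use RS256 with keys fetched from a JWKS endpoint, which requires a crypto library:

```python
import base64, hashlib, hmac, json, time

def b64url_decode(s):
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def b64url_encode(b):
    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()

def verify_jwt(token, secret, now=None):
    """Minimal HS256 check: signature first, then the exp claim."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None                      # bad signature -> gateway returns 401
    claims = json.loads(b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        return None                      # expired -> gateway returns 401
    return claims                        # gateway would now inject e.g. X-User-ID

# Build a token to test against (normally issued by the auth server)
secret = b"demo-secret"
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "user-42", "exp": 9999999999}).encode())
sig = b64url_encode(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
token = f"{header}.{payload}.{sig}"

print(verify_jwt(token, secret)["sub"])     # user-42
print(verify_jwt(token, b"wrong-secret"))   # None
```

Note the order: signature before claims. A token with a valid exp but a forged signature must never be trusted.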

OAuth2 / OIDC at the Gateway

For more complex flows (user login, refresh tokens, third-party identity providers), gateways can act as an OAuth2 resource server or even handle the full OIDC flow:

# Traefik ForwardAuth to an OAuth2 proxy
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-auth
spec:
  forwardAuth:
    address: http://oauth2-proxy.auth.svc:4180/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
      - X-Auth-Request-User
      - X-Auth-Request-Email
      - X-Auth-Request-Groups

API Key Authentication

Simpler than JWT but still useful for service-to-service or partner integrations:

# Kong API key plugin
plugins:
  - name: key-auth
    config:
      key_names:
        - X-API-Key
        - apikey        # query parameter fallback
      hide_credentials: true  # strip the key before forwarding to backend

Gotcha: hide_credentials: true matters. Without it, the API key is forwarded to your backend in the request headers. If your backend logs request headers (many do by default), you're logging secrets. This has caused real credential leaks.


Request and Response Transformation

Gateways can modify requests before they reach backends, and modify responses before they reach clients. This is powerful for API versioning, migration, and compatibility.

# Kong request-transformer plugin
plugins:
  - name: request-transformer
    config:
      add:
        headers:
          - "X-Request-ID:$(uuid)"
          - "X-Gateway-Version:v2"
      remove:
        headers:
          - "X-Internal-Debug"
      rename:
        headers:
          - "X-Old-Header:X-New-Header"

Use cases that come up in real life:

  • Header injection: add request IDs, user identity, feature flags
  • Header stripping: remove internal headers before they leak to clients
  • Path rewriting: /api/v2/catalog at the gateway becomes /catalog at the backend
  • Response filtering: remove internal fields from API responses

Under the Hood: Request transformation happens in the gateway's processing pipeline. Kong executes plugins in phases: certificate → rewrite → access → response → log. Auth plugins run in the access phase; request transforms also run during access, ordered after auth by plugin priority, while response transforms run in the response phase. Understanding this ordering matters when plugins interact — a plugin can't see a header that another plugin adds later in the pipeline.


Canary Releases via Gateway

A canary release sends a small percentage of traffic to a new version. If errors spike, you roll back. If metrics look good, you increase the percentage. The gateway is the natural place to do this because it already controls traffic routing.

Canary with nginx-ingress annotations

# Stable version — receives all non-canary traffic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catalog-stable
  namespace: production
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/catalog
            pathType: Prefix
            backend:
              service:
                name: catalog-stable
                port:
                  number: 8080
---
# Canary version — receives 10% of traffic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catalog-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/catalog
            pathType: Prefix
            backend:
              service:
                name: catalog-canary
                port:
                  number: 8080

Ramp up the canary:

# Increase to 25%
kubectl annotate ingress catalog-canary -n production \
  nginx.ingress.kubernetes.io/canary-weight="25" --overwrite

# Route a specific header to canary (for internal testing)
# Add these annotations instead of canary-weight:
#   nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
#   nginx.ingress.kubernetes.io/canary-by-header-value: "true"
# Then: curl -H "X-Canary: true" https://api.example.com/v1/catalog

Canary with Gateway API (native traffic splitting)

The Kubernetes Gateway API has canary built into the spec — no annotations needed:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: catalog-route
  namespace: production
spec:
  parentRefs:
    - name: main-gateway
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/catalog
      backendRefs:
        - name: catalog-stable
          port: 8080
          weight: 90
        - name: catalog-canary
          port: 8080
          weight: 10

That's it. No separate ingress resources, no annotations. Change the weights, apply, done. This is one of the reasons the Gateway API exists.
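
Under the hood, weighted backendRefs amount to a weighted random pick per request. A sketch of the idea (illustrative, not any controller's actual code):

```python
import random
from collections import Counter

def pick_backend(backends, rng=random):
    """backends: list of (name, weight) pairs, as in HTTPRoute backendRefs."""
    names, weights = zip(*backends)
    return rng.choices(names, weights=weights, k=1)[0]

backends = [("catalog-stable", 90), ("catalog-canary", 10)]
counts = Counter(pick_backend(backends) for _ in range(10_000))
# Roughly a 90/10 split; exact numbers vary per run
print(counts)
```

Changing the canary percentage is just changing the weights — no resource needs to be created or deleted.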


Circuit Breaking: Stop Hitting a Dead Service

When a backend is failing, the worst thing you can do is keep sending it traffic. Circuit breaking detects failures and stops routing to the failing backend, giving it time to recover.

Circuit states:
┌──────────┐    failures > threshold    ┌──────────┐
│  CLOSED  │ ─────────────────────────→ │   OPEN   │
│ (normal) │                            │ (reject) │
└──────────┘                            └────┬─────┘
      ↑                                      │
      │           timeout expires            │
      │         ┌──────────────┐             │
      └─────────│  HALF-OPEN   │←────────────┘
   success      │ (test probe) │
                └──────────────┘

In the closed state, all requests pass through. When failures exceed a threshold, the circuit opens — requests are immediately rejected (503) without hitting the backend. After a timeout, the circuit goes half-open: a single probe request is sent. If it succeeds, the circuit closes. If it fails, it reopens.
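
The state machine above can be sketched in a few lines of Python (an illustration with made-up thresholds; real implementations like Envoy's track statistics per endpoint):

```python
class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    Open -> Half-Open after `reset_timeout`; the probe result decides the rest."""
    def __init__(self, threshold=5, reset_timeout=30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self, now):
        if self.state == "open":
            if now - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # timeout expired: let one probe through
                return True
            return False                    # fail fast: 503 without touching the backend
        return True                         # closed or half-open: pass through

    def record(self, success, now):
        if success:
            self.failures = 0
            self.state = "closed"
        else:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"
                self.opened_at = now

cb = CircuitBreaker(threshold=5, reset_timeout=30.0)
for _ in range(5):
    cb.record(success=False, now=0)         # five consecutive failures
print(cb.state, cb.allow_request(now=10))   # open False
print(cb.allow_request(now=31), cb.state)   # True half-open
```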

Envoy (used by Ambassador/Emissary and Istio) has the most sophisticated circuit breaking:

# Envoy circuit breaker via Istio DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: catalog-circuit-breaker
spec:
  host: catalog-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 50
        http2MaxRequests: 200
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

That config says: if a backend returns five 5xx errors in a row, eject it for 30 seconds. Never eject more than 50% of backends (so you don't route all traffic to zero servers).

Remember: Circuit breaking is about protecting the system, not the failed service. Without it, a slow or failing backend causes requests to queue up at the gateway, consuming connections and memory until the gateway itself becomes the bottleneck. The circuit breaker fails fast so the gateway stays healthy.


Flashcard Check #2

Q: What's the boundary burst problem in fixed window rate limiting?
A: A client can send 2x the limit in a short period by timing requests at the window boundary.

Q: Why is token bucket good for APIs?
A: It allows controlled bursts (up to bucket size) while enforcing an average rate over time.

Q: What does hide_credentials: true do in Kong's key-auth plugin?
A: Strips the API key from headers before forwarding to the backend — prevents logging secrets.

Q: Name the three circuit breaker states.
A: Closed (normal), Open (rejecting), Half-Open (testing with a probe).

Q: Why do canary releases at the gateway instead of in the deployment?
A: The gateway already controls traffic routing, and you can shift percentages without touching the deployment.

Observability: Seeing Through the Gateway

The gateway sees every request that enters your system. That makes it the best place to collect three things:

Access Logs

Every gateway produces access logs. The key fields you need:

# nginx-ingress log format (default)
10.0.5.23 - user42 [23/Mar/2026:14:22:31 +0000] "GET /v1/catalog/items HTTP/2.0" 200 1847
  0.043 0.041 "https://web.example.com/browse" "Mozilla/5.0..." "req-id-abc123"

Field by field:
  10.0.5.23        client IP
  user42           authenticated remote user
  200              HTTP status code
  1847             response body size (bytes)
  0.043            request time (gateway total)
  0.041            upstream response time
  "https://..."    referer
  "Mozilla/..."    user agent
  "req-id-abc123"  request ID

The difference between request_time and upstream_response_time tells you how much latency the gateway itself adds. If request_time is 200ms and upstream is 180ms, the gateway added 20ms. If upstream is 5ms and request_time is 200ms, something in the gateway pipeline (auth check? rate limit lookup? TLS handshake?) is slow.
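
Extracting that gateway overhead from a log line is a small parsing exercise. This sketch assumes the simplified line format shown above, with request time and upstream time following the body size — adjust the regex to your actual log_format:

```python
import re

LINE = ('10.0.5.23 - user42 [23/Mar/2026:14:22:31 +0000] '
        '"GET /v1/catalog/items HTTP/2.0" 200 1847 0.043 0.041 '
        '"https://web.example.com/browse" "Mozilla/5.0..." "req-id-abc123"')

# Matches: closing quote of the request, status, body size, then the two timings.
m = re.search(r'" \d{3} \d+ (?P<request>[\d.]+) (?P<upstream>[\d.]+) ', LINE)
gateway_overhead = float(m["request"]) - float(m["upstream"])
print(f"{gateway_overhead * 1000:.0f}ms added by the gateway")  # 2ms
```

In practice you would aggregate this difference across many lines (or use the gateway's Prometheus metrics) rather than eyeball single requests.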

Metrics (Prometheus)

Most gateways expose Prometheus metrics natively. The four golden signals at the gateway:

# Request rate by service and status code
nginx_ingress_controller_requests{service="catalog-stable",status="200"} 45231
nginx_ingress_controller_requests{service="catalog-stable",status="502"} 3

# Latency histogram
nginx_ingress_controller_request_duration_seconds_bucket{service="catalog-stable",le="0.1"} 42000
nginx_ingress_controller_request_duration_seconds_bucket{service="catalog-stable",le="0.5"} 44800
nginx_ingress_controller_request_duration_seconds_bucket{service="catalog-stable",le="1.0"} 45100

# Active connections
nginx_ingress_controller_nginx_process_connections{state="active"} 847

# Bytes transferred
nginx_ingress_controller_bytes_sent{service="catalog-stable"} 89234567

Distributed Tracing

The gateway is where trace context begins. It generates a trace ID, injects it as a header, and every downstream service propagates it:

Gateway generates: X-Request-ID: abc-123, traceparent: 00-traceid-spanid-01
    ├─→ catalog-service (reads traceparent, creates child span)
    │       │
    │       └─→ inventory-service (child span of catalog)
    └─→ auth-service (separate child span from gateway)

Kong, Traefik, and Envoy all support OpenTelemetry trace propagation. The gateway span becomes the root of every trace, giving you end-to-end latency from the client's perspective.
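
Generating and propagating the W3C traceparent header is simple enough to sketch (illustrative; real gateways and OpenTelemetry SDKs also handle sampling decisions and header validation):

```python
import secrets

def new_traceparent():
    """W3C trace context header: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)   # 32 hex chars, shared by the whole trace
    span_id = secrets.token_hex(8)     # 16 hex chars, unique per span
    return f"00-{trace_id}-{span_id}-01"   # flags 01 = sampled

def child_traceparent(parent):
    """A downstream service keeps the trace id but mints a new span id."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

root = new_traceparent()        # generated at the gateway
child = child_traceparent(root) # generated by catalog-service
print(root.split("-")[1] == child.split("-")[1])  # True: same trace id
```

Every service in the chain repeats the child step, which is what stitches the spans into one end-to-end trace.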


Kubernetes Ingress vs. Gateway API

If you're running on Kubernetes, you have two ways to define gateway routing: the old Ingress resource and the newer Gateway API. This matters because they have fundamentally different design philosophies.

Ingress: The Original (2015)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catalog-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"          # nginx-specific
    nginx.ingress.kubernetes.io/canary-weight: "10"     # nginx-specific
    nginx.ingress.kubernetes.io/limit-rps: "50"         # nginx-specific
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/catalog
            pathType: Prefix
            backend:
              service:
                name: catalog-service
                port:
                  number: 8080

The problem: the spec part is portable across controllers. The annotations part is not. Every controller has its own annotation namespace, its own syntax, its own quirks. Switching from nginx-ingress to Traefik means rewriting every annotation.

Gotcha: Ingress annotations are stringly-typed and fail silently. A typo like nginx.ingress.kubernetes.io/rewrit-target (missing an 'e') is ignored without error. Use kubectl describe ingress and check for annotation validation warnings. Better yet, check the generated nginx config inside the controller pod to confirm your annotation took effect.

Gateway API: The Successor (2023 GA)

# Cluster operator creates the Gateway (infra concern)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
  namespace: gateway-infra
spec:
  gatewayClassName: traefik    # or kong, nginx, istio, cilium...
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls
      allowedRoutes:
        namespaces:
          from: All
---
# App team creates the HTTPRoute (app concern)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: catalog-route
  namespace: production
spec:
  parentRefs:
    - name: main-gateway
      namespace: gateway-infra
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/catalog
        - headers:
            - name: X-API-Version
              value: "2"
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Routed-By
                value: gateway-api
      backendRefs:
        - name: catalog-stable
          port: 8080
          weight: 90
        - name: catalog-canary
          port: 8080
          weight: 10

The key difference: role separation. The Gateway resource is created by the platform team (they control TLS, ports, allowed namespaces). The HTTPRoute is created by the app team (they control paths, headers, backends, weights). Neither team needs to touch the other's resources.

Trivia: The Kubernetes Gateway API was proposed in 2019 and took four years to reach GA (v1.0, October 2023). The long timeline reflected the difficulty of designing an API that could satisfy dozens of gateway implementations while remaining simple enough to be useful. Both Ingress and Gateway API coexist today, but new features are only added to Gateway API.

| Capability                  | Ingress                           | Gateway API                         |
|-----------------------------|-----------------------------------|-------------------------------------|
| Traffic splitting (canary)  | Annotations (controller-specific) | Native weight field                 |
| Header-based routing        | Annotations (if supported)        | Native headers match                |
| Request transforms          | Annotations (if supported)        | Native filters                      |
| Role separation             | No                                | Gateway (infra) vs. HTTPRoute (app) |
| Cross-namespace routing     | No                                | Yes, with ReferenceGrants           |
| Portable across controllers | Spec yes, annotations no          | Fully portable                      |

War Story: When the Gateway Becomes the Bottleneck

War Story: A team ran a single Kong gateway pod in Kubernetes with default resource limits (256MB RAM, 250m CPU). Their platform handled 200 req/s comfortably. Then they added the response-ratelimiting plugin with Redis lookups and the jwt plugin, which needed to fetch JWKS keys. Under load, each request now required two network round-trips in the gateway pipeline.

At 500 req/s during a product launch, the Kong pod's latency jumped from 5ms to 800ms. Request queues backed up. The gateway started returning 504 timeouts. The backend services were fine — p99 latency under 50ms — but users saw 5-second page loads because every request waited in the gateway queue.

The fix:

  1. Bump Kong to 3 replicas with 1GB RAM and 1 CPU each.
  2. Cache JWT validation results for 60 seconds.
  3. Move rate limit checks to a local in-memory counter with periodic Redis sync instead of per-request Redis calls.

Gateway latency dropped back to 8ms.

The lesson: the gateway is on the critical path of every request. A slow plugin, an undersized pod, or a network call in the hot path turns your gateway into the bottleneck. Monitor gateway latency separately from backend latency. If request_time - upstream_time is growing, the gateway is the problem.

How to avoid this:

# Monitor gateway pod resources
kubectl top pods -n kong-system

# Check gateway-specific latency (not just total request time)
# In Prometheus:
#   histogram_quantile(0.99, rate(kong_latency_bucket{type="kong"}[5m]))
# This shows gateway processing time, excluding upstream

# Run at least 2-3 gateway replicas with anti-affinity
# Set a PodDisruptionBudget so drains don't take all replicas down
# Gateway pod disruption budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kong-pdb
  namespace: kong-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: kong-gateway

Gateway Anti-Patterns

Knowing what NOT to do is half the battle.

Anti-pattern #1: Business Logic in the Gateway

The gateway should enforce policy (auth, rate limits, routing). It should not contain business logic. If your gateway plugin is checking inventory levels, calculating prices, or validating business rules, you've turned the gateway into a monolith at the edge.

The test: if a developer who knows nothing about your business domain can understand every gateway rule, you're fine. If they need to understand your product catalog to configure the gateway, business logic has leaked in.

Anti-pattern #2: Single Point of Failure

One gateway replica. No PDB. No HPA. A node drain takes it down and every service goes dark.

# The fix: always run at least two replicas behind an HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
  namespace: gateway-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

Anti-pattern #3: No Connection Draining During Deploys

You roll out a new gateway version. Active connections get dropped. Users get 502s for 15 seconds.

# Fix: preStop hook gives time for connection draining
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: gateway
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]

Anti-pattern #4: Rate Limiting by Source IP Behind a CDN

All traffic arrives from the CDN's IP range. Your per-IP rate limit applies to the CDN, not the user. One abusive user exhausts the limit for everyone.

# Fix: configure the gateway to trust X-Forwarded-For from known proxies
# Kong: real_ip_header and trusted_ips in kong.conf
real_ip_header = X-Forwarded-For
real_ip_recursive = on
trusted_ips = 173.245.48.0/20, 103.21.244.0/22, 10.0.0.0/8
# 173.245.48.0/20, 103.21.244.0/22: Cloudflare ranges; 10.0.0.0/8: internal

Gotcha: Setting trusted_ips too wide lets attackers spoof their source IP by injecting a fake X-Forwarded-For header. Only trust CIDR ranges you control (your CDN, your load balancers, your internal network).
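To see why the trusted range matters, here is a minimal sketch of the resolution logic a gateway applies (simplified; real implementations like ngx_http_realip_module handle more edge cases, and the IP ranges here are just the examples from the config above):

```python
import ipaddress

# Trusted proxy ranges: your CDN and internal load balancers only.
TRUSTED = [ipaddress.ip_network(c) for c in
           ["173.245.48.0/20", "103.21.244.0/22", "10.0.0.0/8"]]

def is_trusted(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)

def client_ip(peer_ip, xff_header):
    """Walk X-Forwarded-For right to left, skipping trusted hops.

    The first untrusted address is taken as the real client. If the
    direct peer itself is untrusted, the header cannot be believed:
    anyone can inject a fake X-Forwarded-For.
    """
    if not xff_header or not is_trusted(peer_ip):
        return peer_ip
    hops = [h.strip() for h in xff_header.split(",")]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0]  # every hop trusted: take the leftmost

# Spoofed header from an untrusted peer is ignored:
print(client_ip("203.0.113.9", "1.2.3.4"))                   # 203.0.113.9
# Request via Cloudflare resolves to the real client:
print(client_ip("173.245.48.1", "198.51.100.7, 10.0.0.5"))   # 198.51.100.7
```

Rate-limit on the result of this resolution, never on the raw peer IP, and never on an X-Forwarded-For value you haven't validated against the trusted ranges.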


Flashcard Check #3

| Question | Answer |
| --- | --- |
| What's the key design difference between Ingress and Gateway API? | Gateway API separates infrastructure (Gateway) from application routing (HTTPRoute), enabling role-based ownership. |
| Why should the gateway not contain business logic? | It becomes a monolith at the edge. Gateway config should be domain-agnostic (auth, routing, rate limits). |
| What's the risk of rate limiting by IP behind a CDN? | All users share the CDN's IP. One abusive user exhausts the limit for everyone. |
| Why add a preStop sleep to gateway pods? | Gives the load balancer time to stop sending traffic before the pod terminates, preventing 502s during deploys. |
| What does gateway `request_time - upstream_time` tell you? | How much latency the gateway itself adds. If this grows, the gateway pipeline (plugins, auth, TLS) is the bottleneck. |

Exercises

Exercise 1: Read a Gateway Config (5 minutes)

Here's a Kong declarative config. Answer the questions below.

services:
  - name: orders-api
    url: http://orders.production.svc:8080
    routes:
      - name: orders-route
        paths: ["/api/v1/orders"]
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 200
          policy: local
      - name: jwt
      - name: request-transformer
        config:
          add:
            headers: ["X-Gateway: kong-prod"]
  1. A request comes in to /api/v1/orders/123. What path does the backend see? (Hint: strip_path)
  2. The rate limit policy is local. What happens if Kong has 3 replicas?
  3. Is the JWT plugin configured to check token expiration?
Answers

  1. `/123` — `strip_path: true` removes the matched path prefix `/api/v1/orders`.
  2. Each replica tracks its own counter. A client gets 200/min per replica, so effectively 600/min total. Use `policy: redis` for a shared counter.
  3. No — the JWT plugin is configured with no `claims_to_verify`. Expired tokens will be accepted as valid. Add `claims_to_verify: [exp]`.
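The `strip_path` behavior from answer 1 can be sketched in a few lines (simplified — Kong's real matcher also handles regex paths and trailing-slash normalization):

```python
def upstream_path(path, prefix, strip_path=True):
    """Mimic prefix routing with strip_path: forward the request path
    minus the matched prefix. Returns None when the route doesn't match."""
    if not path.startswith(prefix):
        return None              # no route match
    if not strip_path:
        return path              # backend sees the full original path
    stripped = path[len(prefix):]
    return stripped or "/"       # an empty remainder is forwarded as "/"

print(upstream_path("/api/v1/orders/123", "/api/v1/orders"))  # /123
```

This is why backends behind a stripping gateway must not hard-code the public prefix into their own route tables.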

Exercise 2: Design a Gateway Strategy (15 minutes)

You're deploying a new API for partner integrations. Requirements:

  - Partners authenticate with API keys
  - Each partner gets a different rate limit (free tier: 100/min, paid: 1000/min)
  - You want to canary new versions with 5% traffic
  - You need to strip the X-Internal-Debug header from responses

Sketch the gateway configuration (any gateway, any format). What resources do you need? What plugins/middlewares/filters? Where do rate limit tiers get configured?

Discussion points

  - API keys map to consumers/groups in Kong (or equivalent in your gateway)
  - Rate limit tiers are configured per consumer group, not per route
  - Canary: two backend services with weighted routing (Gateway API) or canary annotation (nginx-ingress)
  - Response transform to strip headers (Kong: `response-transformer` plugin with `remove.headers`)
  - Consider: where do you store API keys? Gateway database? External secret store?
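The weighted-canary split works the same way regardless of gateway: each request is assigned to a backend with probability proportional to its weight. A sketch (backend names are hypothetical; real gateways do this per-request in the proxy):

```python
import random

def pick_backend(weights, rng=random):
    """Weighted random choice, as in Gateway API HTTPRoute
    backendRefs weights or nginx-ingress canary-weight."""
    total = sum(weights.values())
    r = rng.uniform(0, total)
    for name, w in weights.items():
        if r < w:
            return name
        r -= w
    return name  # guard against float rounding at the boundary

weights = {"orders-v1": 95, "orders-v2-canary": 5}
counts = {"orders-v1": 0, "orders-v2-canary": 0}
rng = random.Random(42)  # seeded for a repeatable demo
for _ in range(10_000):
    counts[pick_backend(weights, rng)] += 1
print(counts)  # roughly 5% of requests land on the canary
```

Note the split is per-request, not per-user: without session affinity, one user can bounce between versions, which matters if v2 changes response shapes.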

Exercise 3: Diagnose the Bottleneck (10 minutes)

Your gateway metrics show:

request_time_p99:    1.2s
upstream_time_p99:   0.08s
active_connections:  4,847
gateway_cpu:         92%
gateway_memory:      1.8GB / 2GB

  1. Where is the bottleneck?
  2. What's your immediate fix?
  3. What's your long-term fix?
Answers

  1. The gateway. `request_time - upstream_time = 1.12s` of gateway processing. Backends are fast (80ms p99). CPU at 92% and memory near the limit confirm gateway saturation.
  2. Immediate: scale gateway replicas horizontally (`kubectl scale` or HPA). Increase the memory limit.
  3. Long-term: profile which plugins are expensive. Check for per-request network calls (Redis, external auth). Add caching. Consider whether all plugins are needed on all routes.
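The overhead arithmetic, with one caveat worth knowing: percentiles don't strictly subtract (the p99 of the gateway's own processing isn't exactly p99(request) minus p99(upstream)), but the difference is a useful first-order signal:

```python
# Gateway-added latency: time spent in the gateway pipeline itself.
request_time_p99 = 1.20    # seconds, end-to-end as measured at the gateway
upstream_time_p99 = 0.08   # seconds, backend processing only
gateway_overhead = request_time_p99 - upstream_time_p99
# Approximation only: p99s of different distributions don't subtract exactly.
print(f"{gateway_overhead:.2f}s")  # 1.12s attributable to the gateway pipeline
```

When this number grows while upstream time stays flat, profile the gateway, not the backends.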

Cheat Sheet

| What | Command / Config | Notes |
| --- | --- | --- |
| List all ingress resources | `kubectl get ingress -A` | Check ADDRESS column for external IP |
| Check ingress controller logs | `kubectl logs -n <ns> deploy/<controller> --tail=100` | Filter for `502`, `upstream`, `error` |
| Verify backend endpoints | `kubectl get endpoints <svc> -n <ns>` | `<none>` = no healthy pods |
| Check generated nginx config | `kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- cat /etc/nginx/nginx.conf` | Source of truth for what nginx is doing |
| Test with specific Host header | `curl -H "Host: api.example.com" http://<INGRESS_IP>/path` | Bypasses DNS for testing |
| Check TLS cert from gateway | `openssl s_client -connect api.example.com:443 -servername api.example.com` | Verify cert is correct and not expired |
| Kong declarative config reload | `kong reload` or `deck sync` | `deck diff` to preview changes |
| Traefik router list | `kubectl exec -n traefik deploy/traefik -- wget -qO- http://localhost:8080/api/http/routers` | Traefik dashboard API |
| Gateway API routes | `kubectl get httproutes -A` | Check `parentRefs` and `backendRefs` |
| Token bucket capacity | `burst_size / refill_rate` = seconds of burst | Tune burst for expected traffic spikes |

Rate Limiting Algorithm Quick Reference:

| Algorithm | Burst handling | Memory | Accuracy | Used by |
| --- | --- | --- | --- | --- |
| Fixed window | Poor (boundary burst) | Low | Approximate | Simple impls, Kong's basic `rate-limiting` plugin |
| Sliding window | Good | Medium | Good | Kong `rate-limiting-advanced`, Cloudflare |
| Leaky bucket | Smooths bursts (queues or drops) | Low | Good | nginx `limit_req` |
| Token bucket | Excellent (configurable) | Low | Good | AWS API GW, Envoy |
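The token-bucket row connects to the cheat-sheet formula `burst_size / refill_rate = seconds of burst`. A minimal sketch (not any specific gateway's implementation; the fake clock just makes the demo deterministic):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter.

    capacity = burst size; refill_rate = sustained tokens per second.
    A full bucket absorbs capacity / refill_rate seconds of burst.
    """
    def __init__(self, capacity, refill_rate, now=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def allow(self):
        # Lazily refill based on elapsed time, capped at capacity.
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill_rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Demo: 10-token burst capacity, 5 tokens/sec sustained rate.
clock = [0.0]
bucket = TokenBucket(10, 5, now=lambda: clock[0])
burst = sum(bucket.allow() for _ in range(15))
print(burst)           # 10 — the burst drains the bucket; 5 requests rejected
clock[0] = 1.0         # one second later: 5 tokens have refilled
print(bucket.allow())  # True
```

The lazy-refill trick is why the memory column says "Low": one float for tokens and one timestamp per key, no per-request history.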

Takeaways

  • Gateways centralize cross-cutting concerns — auth, rate limiting, transforms, observability — so individual services don't each implement them (badly).
  • Gateway vs. LB vs. reverse proxy is a spectrum of responsibility, not three different products. Most gateway software does all three.
  • Rate limiting algorithm choice matters. Token bucket allows bursts; fixed window has boundary problems; sliding window is the common middle ground.
  • The gateway is on the critical path. Monitor it separately. An undersized gateway with expensive plugins becomes the bottleneck faster than you'd expect.
  • Gateway API is replacing Ingress in Kubernetes. Its role separation (infra team owns Gateway, app team owns HTTPRoute) solves real organizational problems.
  • Never put business logic in the gateway. Auth, routing, rate limits: yes. Inventory checks, pricing calculations: no.