Envoy: The Proxy That's Everywhere


Topics: Envoy architecture, xDS APIs, service mesh data plane, L7 proxying, circuit breaking, observability, Istio, load balancing, WASM extensibility
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None (networking and Kubernetes concepts explained as we go)


The Mission

You're the new platform engineer on a team running 80 microservices in Kubernetes. One of them — the checkout service — is throwing intermittent 503 errors. Your team's Grafana dashboard shows the error rate spiking every few minutes, but the checkout application logs show nothing. No stack traces. No errors. The app thinks everything is fine.

Your tech lead says: "Check the Envoy sidecar."

You stare at your terminal. You've heard the word "Envoy" in meetings. You know it's "the proxy." But you have no idea what it actually does, how to read its logs, or why it would be returning 503s that the application doesn't know about.

This lesson traces a single HTTP request from a client, through an Envoy sidecar, to an upstream service, and back — building up every piece of Envoy's architecture along the way. By the end, you'll understand why Envoy exists, how it works, how to read its admin interface, and how to diagnose those 503s.


Why Envoy Exists (The L7 Proxy Gap)

Before Envoy, the proxy landscape looked like this:

| Proxy | Strength | Limitation |
|---|---|---|
| HAProxy | Blazing fast L4/L7, battle-tested | Config file reload for changes, limited protocol support beyond HTTP |
| Nginx | HTTP serving + reverse proxy, huge ecosystem | Config reload for changes, extension requires C modules or Lua |
| F5 / hardware LBs | Enterprise features, SSL offloading | Expensive, proprietary, slow to change |

These tools worked for years. But around 2015, companies like Lyft were decomposing monoliths into hundreds of microservices, and three problems appeared that existing proxies couldn't solve:

  1. Dynamic configuration. Services scale up and down constantly. Reloading a config file every time a pod appears or disappears doesn't work at scale.
  2. Deep protocol understanding. L4 proxies see TCP connections. They can't inspect HTTP headers, apply per-route retry policies, or emit per-service metrics. Nginx could do some of this, but extending it required C modules.
  3. Universal observability. Every service needed metrics, access logs, and distributed tracing — and every team was solving it differently (or not at all).

Name Origin: Envoy was created by Matt Klein at Lyft in 2015–2016. The name "Envoy" means a messenger or diplomatic representative — fitting for a proxy that acts as an intermediary for service-to-service communication. Klein wrote it in C++ specifically because Lyft's services were a mix of languages (Python, Go, Java) and he needed the proxy to be language-agnostic. The initial open-source release was September 2016. Envoy joined the CNCF as an incubating project in September 2017 and graduated in November 2018 — one of the fastest CNCF graduations at the time.

Trivia: The choice of C++ over Go or Rust was deliberate: C++ gives deterministic memory management without garbage collector pauses. In a proxy handling millions of requests per second, even a few milliseconds of GC pause causes visible latency spikes. Envoy uses tcmalloc by default and explicitly manages object lifetimes. This makes Envoy harder to contribute to but more predictable in production.


Envoy Architecture: The Four Primitives

Envoy models all traffic through four hierarchical building blocks. Learn these and you understand 80% of Envoy's configuration.

Downstream client
  └── Listener (0.0.0.0:8080)
        └── Filter chain
              ├── Network filters (TCP proxy / HTTP connection manager)
              └── HTTP filters (rate limit, JWT auth, WASM, router)
                    └── Route match → Cluster → Endpoint
                                                  └── Upstream service

Listeners

A listener is a network address where Envoy accepts connections. Think of it as Envoy opening a door at a specific IP and port.

listeners:
- name: http_listener
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8080

Each listener has one or more filter chains — the pipeline that processes every connection arriving at that door.

Filter Chains

A filter chain is an ordered list of filters that process a connection. Network filters handle TCP-level concerns. HTTP filters handle request-level concerns.

The most important HTTP filter is the router — it's always the last filter in the chain. The router looks at the route table, finds the matching cluster, and sends the request upstream.

Other filters run before the router: rate limiting, JWT authentication, ext_authz (external authorization), WASM custom logic. Order matters — filters execute left to right.

Routes

Routes match incoming requests to upstream clusters. They match on path prefix, exact path, regex, headers, or query parameters. Routes are evaluated in order — first match wins.

Gotcha: First-match-wins is a silent footgun. If you put a catch-all prefix route (/) before a more specific route (/api/admin/), the specific route is dead code. No error, no warning — traffic just goes to the wrong place. Always order routes from most specific to least specific.
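The footgun is easy to demonstrate. Here's a tiny Python sketch of first-match-wins prefix evaluation (hypothetical routes, not Envoy's actual matcher code):

```python
# First-match-wins: routes are checked in order; the first prefix hit
# handles the request. Route names here are illustrative.
routes = [
    ("/", "default_service"),          # catch-all placed first: bug!
    ("/api/admin/", "admin_service"),  # unreachable dead code
]

def match(path, route_table):
    for prefix, cluster in route_table:
        if path.startswith(prefix):
            return cluster
    return None

print(match("/api/admin/users", routes))  # default_service: wrong place

# Reorder most-specific first and the same request routes correctly.
routes.sort(key=lambda r: len(r[0]), reverse=True)
print(match("/api/admin/users", routes))  # admin_service
```

Envoy gives you no warning in the first case; the only symptom is traffic landing on the wrong cluster.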

Clusters

A cluster is a logical group of upstream services. It holds the load-balancing algorithm, circuit breaker thresholds, health check configuration, and TLS settings. A cluster named checkout-v2 might have three pods behind it.

Endpoints

Endpoints are the actual IP:port pairs inside a cluster. They can be set statically or discovered dynamically via EDS (Endpoint Discovery Service).

Putting It Together: A Static Config Example

Here's a minimal but complete Envoy static configuration. This is the kind of file you'd use for a standalone Envoy (not managed by Istio):

static_resources:
  listeners:
  - name: main_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_routes
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match:
                  prefix: "/api/checkout"
                route:
                  cluster: checkout_service
                  timeout: 15s
                  retry_policy:
                    retry_on: "5xx,connect-failure"
                    num_retries: 2
                    per_try_timeout: 5s
              - match:
                  prefix: "/"
                route:
                  cluster: default_service
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog

  clusters:
  - name: checkout_service
    type: STRICT_DNS
    connect_timeout: 5s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: checkout_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: checkout.default.svc.cluster.local
                port_value: 8080
    circuit_breakers:
      thresholds:
      - max_connections: 2048
        max_pending_requests: 512
        max_requests: 2048
        max_retries: 3

  - name: default_service
    type: STRICT_DNS
    connect_timeout: 5s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: default_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: frontend.default.svc.cluster.local
                port_value: 3000

Let's trace a request through this config:

  1. A request arrives at 0.0.0.0:10000 → the main_listener accepts it
  2. The HTTP connection manager filter chain processes the connection
  3. The router filter checks the route table: /api/checkout/order matches prefix /api/checkout → routes to checkout_service cluster
  4. The cluster has one endpoint: checkout.default.svc.cluster.local:8080
  5. Envoy opens a connection to that endpoint (or reuses a pooled one) and forwards the request
  6. The response flows back through the filter chain to the client

That retry policy on the route? If the upstream returns a 5xx or the connection fails, Envoy automatically retries up to 2 times, with each attempt capped at 5 seconds. The application never knows.


Flashcard Check #1

Cover the answers and test yourself:

| Question | Answer |
|---|---|
| What are Envoy's four main architectural primitives? | Listeners, filter chains (with routes), clusters, and endpoints |
| What does "first match wins" mean in Envoy routing? | Routes are evaluated in order; the first matching route handles the request |
| Why does an Envoy cluster hold circuit breaker thresholds? | Circuit breaking is per-upstream-group, not per-route — the cluster is the unit of upstream traffic management |
| What is the last HTTP filter in every Envoy filter chain? | The router filter — it looks up the route and sends the request to the matched cluster |

The xDS Revolution: Dynamic Configuration

The static config above works for a standalone proxy. But in a Kubernetes cluster with services scaling up and down every minute, you can't hand-edit YAML every time a pod appears.

This is where xDS changes everything. xDS is a family of gRPC-based APIs that let a management plane push configuration to Envoy dynamically — no restart, no config reload, no connection drops.

| API | Full Name | What It Manages |
|---|---|---|
| LDS | Listener Discovery Service | Listeners and filter chains |
| RDS | Route Discovery Service | Route configurations |
| CDS | Cluster Discovery Service | Upstream cluster definitions |
| EDS | Endpoint Discovery Service | Cluster member IP:port pairs |
| SDS | Secret Discovery Service | TLS certificates and keys |
| ADS | Aggregated Discovery Service | All of the above on one ordered stream |

Remember: The xDS family mnemonic: LRCESA — Listeners, Routes, Clusters, Endpoints, Secrets, Aggregated. Data flows top-down: listeners contain routes, routes point to clusters, clusters resolve to endpoints. SDS handles certs separately. ADS wraps everything into a single ordered stream.

Why ADS Matters

With separate xDS streams, a race condition exists: CDS might deliver a new cluster definition before EDS delivers its endpoints. For a brief moment, Envoy knows the cluster exists but has no endpoints to send traffic to — requests fail with a 503 (no healthy upstream, response flag UH). ADS delivers all resource types on one ordered stream so Envoy applies updates atomically.

Mental Model: Think of xDS like a podcast with separate episode feeds for different topics. If you subscribe to each feed independently, episodes arrive out of order — you might hear about a character before their introduction episode drops. ADS is a single master feed that guarantees correct episode order.

Why This Is Revolutionary

Before xDS, dynamic proxy configuration meant one of:

  • Templating a config file and reloading the proxy (risk of dropped connections)
  • DNS-based discovery (slow TTL propagation, no per-route config)
  • Custom API integrations per proxy vendor

xDS is vendor-neutral. Today, not just Envoy but also gRPC's built-in load balancing, Cilium, and other data plane projects implement xDS. A control plane that speaks xDS can manage multiple proxy implementations. This turned Envoy from "Lyft's proxy" into the universal data plane of the cloud-native ecosystem.

Trivia: When Lyft open-sourced Envoy in 2016, xDS was Envoy-specific. As adoption grew, the CNCF formalized xDS v3 as an independent vendor-neutral API standard. It's now one of the most widely implemented infrastructure APIs in the cloud-native world.


Envoy as a Sidecar: The Istio Data Plane

In a service mesh like Istio, you don't write Envoy config at all. Instead:

  1. You label a Kubernetes namespace with istio-injection=enabled
  2. A mutating admission webhook injects an Envoy sidecar container into every new pod
  3. An init container sets up iptables rules to redirect all inbound and outbound traffic through the sidecar
  4. istiod (the Istio control plane) watches Kubernetes Services, Endpoints, and Istio CRDs (VirtualService, DestinationRule, etc.)
  5. istiod translates all of this into Envoy xDS configuration and pushes it to every sidecar over a persistent gRPC stream

Developer applies VirtualService
        ↓
Kubernetes API Server
        ↓
istiod (watches, converts to xDS)
       ↓  ↓  ↓  (gRPC streams to every sidecar)
  ┌────┴──┴──┴────────────────────┐
  │  Pod A          Pod B         │
  │  ┌─────────┐   ┌─────────┐   │
  │  │ app     │   │ app     │   │
  │  │ envoy ←─│───│→ envoy  │   │
  │  └─────────┘   └─────────┘   │
  └───────────────────────────────┘

Name Origin: istiod consolidated three former Istio components in Istio 1.5 (2020): Pilot (xDS config distribution), Citadel (certificate and identity management), and Galley (config validation). The name "Istio" is Greek for "sail" — continuing the Kubernetes nautical naming theme.

Every major service mesh uses Envoy as its data plane: Istio, AWS App Mesh, Consul Connect, Kuma, and Gloo Mesh. The control planes differ wildly. The traffic-handling binary is the same Envoy executable. When you debug a service mesh in production, you are always reading Envoy stats and logs — regardless of which control plane brand is on the slide deck.


Envoy as an Ingress Gateway

The same Envoy binary also runs at the cluster edge. In Istio, the IngressGateway is a standalone Envoy deployment (not a sidecar) that handles external traffic:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: app-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: app-tls-cert
    hosts:
    - app.example.com

This tells the ingress gateway Envoy to listen on port 443, terminate TLS using the app-tls-cert Kubernetes Secret, and accept requests for app.example.com. A paired VirtualService then routes those requests to internal services.

Key operational difference: A sidecar failure affects one pod. An ingress gateway failure affects all external traffic. Gateway deployments need HPA (autoscaling), pod disruption budgets, and careful rolling update strategy. Contour, Ambassador, and Gloo are all Envoy-based ingress controllers that build on this same pattern.

Under the Hood: Envoy supports hot restart: a new process takes over listening sockets from the old one without dropping connections. The old process passes file descriptors via Unix domain sockets using the SCM_RIGHTS mechanism in sendmsg(). The new process accepts new connections while the old process drains in-flight requests. This is why Envoy can upgrade without a maintenance window — a feature Klein designed as first-class at Lyft because their previous proxy required downtime windows for upgrades.
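The SCM_RIGHTS mechanism isn't Envoy-specific — any process can pass open file descriptors over a Unix domain socket. A minimal Python sketch (3.9+, using a pipe as a stand-in for a listening socket):

```python
import os
import socket

# Envoy's hot restart hands listening sockets from the old process to the
# new one over a Unix domain socket via SCM_RIGHTS. Same kernel mechanism
# here, with a pipe standing in for a listener and a socketpair standing
# in for the two Envoy processes.
old_proc, new_proc = socket.socketpair()

r, w = os.pipe()                          # stand-in for a listening socket
socket.send_fds(old_proc, [b"x"], [r])    # "old Envoy" ships the fd across

# "New Envoy" receives a duplicate of the same open descriptor.
msg, fds, flags, addr = socket.recv_fds(new_proc, 1024, 1)
os.write(w, b"hello")
print(os.read(fds[0], 5))                 # reads from the received fd: b'hello'
```

The received descriptor refers to the same open file description, which is exactly why the new Envoy process can accept connections on sockets the old process opened.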


The Admin Interface: Your Debugging Swiss Army Knife

Every Envoy instance exposes a local admin interface. Default port is 9901; in Istio sidecars it's 15000.

# Full running config (listeners, clusters, routes, endpoints)
curl -s localhost:15000/config_dump | python3 -m json.tool | less

# All clusters and their endpoint health
curl -s "localhost:15000/clusters?format=json" | python3 -m json.tool

# Active listeners
curl -s localhost:15000/listeners

# All stats (counters, gauges, histograms)
curl -s localhost:15000/stats

# Prometheus-format stats (for scraping)
curl -s "localhost:15000/stats?format=prometheus"

# Is Envoy ready to serve traffic?
curl -s localhost:15000/ready

# Current log levels
curl -s localhost:15000/logging

# Temporarily set connection-level logging to debug
curl -X POST "localhost:15000/logging?connection=debug"

# Drain inbound listeners (used during graceful shutdown)
curl -X POST "localhost:15000/drain_listeners?inboundonly"

Gotcha: The admin interface exposes config_dump, which can contain secrets (TLS private keys in static configs). It also accepts POST requests that modify runtime behavior (log levels, drain commands). Never expose the admin port beyond localhost.

Remember: Istio sidecars use port 15000, not Envoy's default 9901. If curl localhost:9901 returns nothing in an Istio pod, try 15000.
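The /stats endpoint returns flat "name: value" text, which is awkward to threshold-check by eye. A small parser makes it scriptable — a sketch only, and the sample lines below are illustrative, not real output from your mesh:

```python
# Parse Envoy's plain-text /stats output ("name: value" per line) into a
# dict so scripts can alert on it. SAMPLE mimics the output shape.
SAMPLE = """\
cluster.checkout_service.upstream_rq_active: 1019
cluster.checkout_service.upstream_rq_pending_overflow: 847
cluster.checkout_service.upstream_cx_active: 512
"""

def parse_stats(text):
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(": ")
        if value.isdigit():
            stats[name] = int(value)
    return stats

stats = parse_stats(SAMPLE)

# Any non-zero overflow counter means a circuit breaker is shedding traffic.
shedding = {k: v for k, v in stats.items()
            if k.endswith("upstream_rq_pending_overflow") and v > 0}
print(shedding)
```

In practice you'd feed this with `curl -s localhost:15000/stats` output instead of the sample string.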


Access Logs and Response Flags

Envoy's access logs are the single fastest diagnostic tool for network issues in a mesh. Each log entry includes response flags — short codes that tell you exactly why a request failed.

| Flag | Meaning | What to check |
|---|---|---|
| UF | Upstream connection failure | Pod crashed? Network policy? Wrong port? |
| UO | Upstream overflow (circuit breaker) | upstream_rq_pending_overflow stat, raise thresholds |
| NR | No route found | Missing route, wrong Host header, VirtualService misconfiguration |
| URX | Upstream retry exhausted | All retry attempts failed |
| UT | Upstream request timeout | Upstream too slow, or timeout too tight |
| RL | Rate limited | Rate limit policy triggered |
| DC | Downstream connection terminated | Client closed before response (usually not your bug) |
| LH | Local service failed health check | Envoy health check configuration |

# Count 503s by response flag
grep " 503 " /var/log/envoy/access.log \
  | awk '{print $NF}' \
  | sort | uniq -c | sort -rn

# In an Istio sidecar (JSON logs)
kubectl logs deploy/checkout -c istio-proxy \
  | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        r = json.loads(line)
        if r.get('response_code') == 503:
            print(r.get('response_flags'), r.get('upstream_cluster'), r.get('path'))
    except ValueError:  # skip non-JSON lines
        pass
"

Remember: The three most common 503 response flags: UF, UO, NR. Mnemonic: "Upstream Failed, Upstream Overflowed, No Route." When you see a 503, the response flag immediately tells you the failure category without reading a single line of application logs.
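The mnemonic translates directly into triage tooling. A convenience sketch that mirrors the table above (the mapping is from this lesson, not an Envoy API):

```python
# Map the three common 503 response flags to a failure category and the
# first thing to check. Mirrors the response-flag table above.
FLAG_TRIAGE = {
    "UF": ("upstream connection failure", "upstream_cx_connect_fail"),
    "UO": ("circuit breaker overflow", "upstream_rq_pending_overflow"),
    "NR": ("no route matched", "route config / Host header"),
}

def triage_503(flag):
    category, check = FLAG_TRIAGE.get(flag, ("unknown flag", "the flag docs"))
    return f"{flag}: {category} -> check {check}"

print(triage_503("UO"))
# UO: circuit breaker overflow -> check upstream_rq_pending_overflow
```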


Flashcard Check #2

| Question | Answer |
|---|---|
| What port does the Envoy admin interface use in Istio sidecars? | 15000 (not the default 9901) |
| What does the response flag UO mean? | Upstream overflow — the circuit breaker threshold was exceeded |
| What is the fastest way to root-cause Envoy 503 errors? | Check the response flag in access logs — it categorizes the failure immediately |
| What xDS API delivers TLS certificates to Envoy dynamically? | SDS (Secret Discovery Service) |

Circuit Breaking: Why the App Doesn't See the Error

Back to our mission. The checkout service logs show nothing because Envoy is rejecting requests before they reach the application. Here's how.

Envoy implements circuit breaking at the cluster level with four thresholds:

| Threshold | What it limits | Default |
|---|---|---|
| max_connections | Active TCP connections to upstream | 1024 |
| max_pending_requests | Requests queued waiting for a connection | 1024 |
| max_requests | Active requests in flight | 1024 |
| max_retries | Concurrent retry attempts | 3 |

When any threshold is exceeded, Envoy returns 503 with response flag UO immediately. The request never reaches the upstream pod. That's why the application logs are clean — it never knew the request existed.

The defaults look generous. They're not. In a mesh where 40 sidecars fan out to the same checkout service, the aggregate concurrent requests from all callers can easily exceed 1024 per-sidecar during a traffic spike. The fix is profiling your actual traffic:

# Check current active connections to a cluster
curl -s localhost:15000/stats | grep "cluster.checkout_service.upstream_cx_active"

# Check pending requests
curl -s localhost:15000/stats | grep "cluster.checkout_service.upstream_rq_pending_active"

# Check if circuit breaker has tripped (any non-zero = you're shedding traffic)
curl -s localhost:15000/stats | grep upstream_rq_pending_overflow

Tuning formula:

  • max_connections = observed P99 active connections x 2
  • max_pending_requests = observed P99 pending x 1.5 (intentionally tight to shed early)
  • max_requests = observed P99 concurrent requests x 2
  • max_retries = max_requests x 0.15
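Wrapped in a helper, the formula is one function call away from your observed P99s. A sketch — the observed numbers below are made up for illustration:

```python
# Turn the tuning formula into a helper: feed observed P99s, get circuit
# breaker thresholds back. Inputs here are illustrative, not real traffic.
def circuit_breaker_thresholds(p99_connections, p99_pending, p99_requests):
    max_requests = round(p99_requests * 2)
    return {
        "max_connections": round(p99_connections * 2),
        "max_pending_requests": round(p99_pending * 1.5),  # tight: shed early
        "max_requests": max_requests,
        "max_retries": round(max_requests * 0.15),
    }

print(circuit_breaker_thresholds(p99_connections=600,
                                 p99_pending=200,
                                 p99_requests=900))
# {'max_connections': 1200, 'max_pending_requests': 300,
#  'max_requests': 1800, 'max_retries': 270}
```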

Under the Hood: Circuit breakers are per-cluster, not per-route. If two routes share the same upstream cluster, they share the same breaker budget. A traffic spike on one route trips the breaker and starves the other. Split critical routes into separate clusters if they need independent protection.

Outlier Detection: The Passive Companion

While circuit breaking caps total load, outlier detection ejects individual misbehaving endpoints from the pool. If one pod in the checkout cluster starts returning 5xx errors, Envoy ejects it for a configurable period — the remaining healthy pods absorb the traffic.

outlier_detection:
  consecutive_5xx: 10             # eject after 10 consecutive 5xx
  interval: 30s                   # evaluation window
  base_ejection_time: 30s         # minimum ejection duration
  max_ejection_percent: 50        # NEVER eject more than half the pool
  enforcing_consecutive_5xx: 100  # 100% enforcement (0 = observe only)

Gotcha: With max_ejection_percent set to 100, a rolling restart that briefly returns 5xx can trigger cascading ejections until the entire pool is empty. Every request becomes a 503. Cap max_ejection_percent at 50 — never eject more than half the pool.


Retry Policies: The Double-Edged Sword

Envoy's retry logic is powerful and dangerous. A well-configured retry policy masks transient failures. A poorly configured one creates a retry storm that makes a bad situation worse.

Three timeout boundaries you must understand:

┌─────────────────────────────── route.timeout (30s) ───────────────────────────────┐
│                                                                                    │
│  ┌── per_try_timeout (10s) ──┐  ┌── per_try_timeout (10s) ──┐  ┌── per_try ──┐  │
│  │   attempt 1               │  │   attempt 2 (retry)       │  │  attempt 3   │  │
│  └───────────────────────────┘  └───────────────────────────┘  └─────────────┘  │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘

The safe formula: per_try_timeout = route_timeout / (num_retries + 1)

War Story: A team configured retry_on: 5xx with num_retries: 3 but forgot to set per_try_timeout. When a downstream database slowed down, the checkout service started returning 503s after 25 seconds. Each Envoy sidecar retried 3 times, each attempt inheriting the full 30-second route timeout. One slow request consumed 90+ seconds of upstream capacity. Multiply by hundreds of concurrent requests: the retry amplification tripled upstream load, which made the database slower, which triggered more retries. The team had to disable retries entirely to break the feedback loop. The fix was two lines of config — adding per_try_timeout: 8s and retry_host_predicate: previous_hosts (to avoid retrying the same broken pod).

Remember: Retry amplification is multiplicative across hops. If Service A retries 3x to Service B, which retries 3x to Service C, a single failure at C generates up to 9 requests. In deep call chains, retry budgets must decrease at each hop.
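Both rules above are simple arithmetic, and worth sanity-checking in a calculator before shipping a retry policy. A sketch with the numbers from this section:

```python
# Two quick calculations from the retry discussion: the safe per-try
# timeout, and worst-case retry amplification across a call chain.
def per_try_timeout(route_timeout_s, num_retries):
    # Safe formula: each attempt gets an equal slice of the route timeout.
    return route_timeout_s / (num_retries + 1)

def worst_case_requests(attempts_per_hop):
    # Amplification is multiplicative: if A makes up to 3 attempts to B
    # and B makes up to 3 attempts to C, one client call can hit C 9 times.
    total = 1
    for attempts in attempts_per_hop:
        total *= attempts
    return total

print(per_try_timeout(30, 2))       # 10.0 -> set per_try_timeout: 10s
print(worst_case_requests([3, 3]))  # 9 requests reach the deepest service
```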


Rate Limiting and Health Checking

Rate Limiting

Envoy supports two modes:

  • Local rate limiting — per-Envoy-instance token bucket. Simple, no external dependencies, but each sidecar enforces independently. 100 sidecars with a 10 rps local limit = 1000 rps aggregate.
  • Global rate limiting — calls an external gRPC rate limit service (like Lyft's ratelimit). Shared state across all instances. Accurate aggregate limits but adds a network hop per request.
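Local rate limiting is essentially a per-instance token bucket. A minimal sketch (illustrative parameters, not Envoy's implementation) that also shows why N independent sidecars multiply the aggregate limit:

```python
import time

# A token bucket: tokens refill at `rate` per second up to `burst`; each
# allowed request spends one token. Each Envoy instance enforcing its own
# bucket is why 100 sidecars at 10 rps local = ~1000 rps aggregate.
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, burst=5)
results = [bucket.allow() for _ in range(8)]   # 8 back-to-back requests
print(results.count(True))                     # the burst of 5 passes
```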

Health Checking

Envoy supports both active and passive health checks:

  • Active: Envoy probes endpoints on a schedule (HTTP GET to /health, TCP connect, or gRPC health). Failed probes mark the endpoint unhealthy.
  • Passive: Outlier detection (described above) monitors real traffic patterns. No probe overhead, but failures are only detected after users are affected.

Best practice: use both. Active health checks catch down endpoints before traffic hits them. Outlier detection catches endpoints that are up but misbehaving (returning errors, timing out).


Observability: Metrics, Tracing, and Logs

Envoy emits rich telemetry without any application code changes. This is one of its most valuable properties.

Built-in Prometheus Metrics

Envoy exposes thousands of counters, gauges, and histograms. Key ones to monitor:

# Scrape in Prometheus format
curl -s "localhost:15000/stats?format=prometheus" | head -50

# Key stats to watch:
# upstream_rq_total          — total requests to a cluster
# upstream_rq_5xx            — 5xx responses from upstream
# upstream_rq_retry          — retry count (should be near zero normally)
# upstream_rq_pending_overflow — circuit breaker trips (ANY non-zero = shedding traffic)
# upstream_cx_connect_fail   — failed TCP connections
# upstream_rq_timeout        — request timeouts
# outlier_detection.ejections_active — currently ejected endpoints

Distributed Tracing

Envoy generates trace spans for every request and propagates context headers. Supported formats: B3 (Zipkin), Jaeger, AWS X-Ray, and OpenTelemetry.

The critical detail: Envoy propagates headers between hops, but your application must forward them when making downstream calls. If your checkout service calls a payment service, it must copy the x-b3-traceid, x-b3-spanid, x-b3-parentspanid, and x-b3-sampled headers from the incoming request to the outgoing one. If it doesn't, the trace breaks at that hop.

Gotcha: Envoy can't trace through your application logic automatically. It injects headers on ingress, but if your app doesn't forward them, traces break. This is the #1 reason teams get incomplete distributed traces in an Istio mesh.
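The application-side fix is mechanical: copy the B3 headers from the incoming request to every outgoing call. A sketch (the header names are the real B3 set; the request dicts are illustrative):

```python
# The app's job in B3 tracing: copy trace-context headers from the
# incoming request onto each outgoing request, so Envoy can stitch the
# spans into one trace.
B3_HEADERS = ["x-b3-traceid", "x-b3-spanid",
              "x-b3-parentspanid", "x-b3-sampled"]

def propagate_trace_headers(incoming_headers, outgoing_headers):
    out = dict(outgoing_headers)
    for name in B3_HEADERS:
        if name in incoming_headers:
            out[name] = incoming_headers[name]
    return out

incoming = {"x-b3-traceid": "463ac35c9f6413ad", "x-b3-sampled": "1"}
outgoing = propagate_trace_headers(incoming,
                                   {"content-type": "application/json"})
print(sorted(outgoing))
# ['content-type', 'x-b3-sampled', 'x-b3-traceid']
```

Most tracing libraries (OpenTelemetry SDKs, for example) do this automatically once wired into your HTTP client; the point is that something in the app must do it.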


The ext_authz Filter: External Authorization

The ext_authz filter calls an external gRPC or HTTP service to make authorization decisions for every request. The external service sees the request headers (and optionally the body) and returns allow or deny.

Client → Envoy → ext_authz service → "allow" → upstream service
                                    → "deny"  → 403 Forbidden

This is how Envoy integrates with external policy engines (OPA/Gatekeeper, custom auth services, API key validation). The filter runs before the router, so denied requests never reach the upstream.

Use cases: JWT validation, API key checks, RBAC enforcement, request signing verification. The key advantage over application-level auth: it's enforced uniformly across all services without each team implementing their own middleware.
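The decision logic an ext_authz service implements can be tiny. A toy allow/deny check in that shape (the API-key set and header name are hypothetical, and a real service would speak Envoy's gRPC/HTTP check protocol):

```python
# Toy ext_authz-style decision: inspect request headers, return allow or
# deny. On deny, Envoy answers 403 and the upstream never sees the request.
VALID_KEYS = {"key-abc", "key-def"}   # hypothetical API keys

def check(headers):
    if headers.get("x-api-key") in VALID_KEYS:
        return (True, 200)    # allow: Envoy forwards upstream
    return (False, 403)       # deny: Envoy returns 403 Forbidden

print(check({"x-api-key": "key-abc"}))  # (True, 200)
print(check({}))                        # (False, 403)
```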


Envoy vs Nginx vs HAProxy

All three are excellent production proxies. The difference is what they were designed for:

| Capability | Envoy | Nginx | HAProxy |
|---|---|---|---|
| Primary design | Dynamic service mesh proxy | Web server + reverse proxy | High-performance TCP/HTTP LB |
| Config changes | xDS API (no restart) | Config reload (brief hiccup) | Config reload or runtime API |
| Protocol support | HTTP/1.1, HTTP/2, gRPC, WebSocket, MongoDB, Redis, Thrift | HTTP/1.1, HTTP/2, gRPC (via module), WebSocket | HTTP/1.1, HTTP/2, TCP, WebSocket |
| Observability | Built-in: per-route metrics, distributed tracing, access logs | Basic: access logs, stub_status, plus_status | Built-in: stats page, CSV stats, Prometheus exporter |
| Extensibility | WASM filters (sandboxed, hot-reloadable) | Lua, C modules, njs (JavaScript) | Lua, SPOE (external process) |
| Circuit breaking | Native, per-cluster | Via third-party modules | Via maxconn + stick tables |
| Service mesh data plane | Universal (Istio, Consul, App Mesh) | Not designed for this | Not designed for this |
| Performance (raw throughput) | High (C++) | Very high (C) | Highest (C, optimized for this) |

Mental Model: HAProxy is a racing car — raw performance, does one thing brilliantly. Nginx is an SUV — does many things well, huge ecosystem. Envoy is a self-driving car — designed from scratch for a world where routes change every second and you need the car to report its own telemetry.

When to use each:

  • Envoy: Service mesh, dynamic environments, Kubernetes, gRPC-heavy architectures
  • Nginx: Traditional web serving, static sites, simpler reverse proxy setups
  • HAProxy: Maximum throughput L4/L7 balancing, database connection pooling, latency-critical paths


Flashcard Check #3

| Question | Answer |
|---|---|
| What is the safe formula for per_try_timeout? | route_timeout / (num_retries + 1) |
| Why do Envoy's default circuit breaker thresholds often cause problems? | Defaults (1024 each) are per-sidecar; aggregate load from many callers in a mesh easily exceeds them |
| What does ADS solve that separate xDS streams don't? | Ordering — it prevents the race condition where a cluster is delivered before its endpoints |
| What must applications do for distributed tracing to work through an Istio mesh? | Forward trace context headers (x-b3-traceid, etc.) between incoming and outgoing requests |
| Why is max_ejection_percent: 100 dangerous? | During a rolling restart, cascading ejections can empty the entire pool |

WASM Filters: Custom Logic Without Rebuilding Envoy

WebAssembly (WASM) filters run custom code inside the Envoy process in a sandboxed VM. They can inspect/modify headers and bodies, emit custom metrics, and call external services.

Key properties:

  • Hot-reloadable: loaded and unloaded without restarting Envoy
  • Sandboxed: a crashing WASM module is isolated — it doesn't crash the proxy
  • Portable: the Proxy-WASM ABI is shared across Envoy, NGINX, and Apache APISIX
  • Polyglot: write in Go, Rust, AssemblyScript, or any language that compiles to WASM

Under the Hood: WASM filters run inside a V8 or wasmtime sandbox. The overhead is roughly 2–5x slower than native C++ filters, but the safety isolation and hot-reload capability make them the preferred extension path. Lua filters and C++ extension points are legacy.

Gotcha: Configure WASM filters with fail_open: true on non-security-critical paths. A VM crash with fail_close returns 500 to every request hitting that filter. With fail_open, the request passes through unfiltered — degraded but functional.


Solving the Mission

Let's go back to those intermittent 503s on the checkout service. Here's the diagnostic sequence:

# Step 1: Check the response flag on those 503s
kubectl logs deploy/checkout -c istio-proxy | grep '"response_code":503'
# → response_flags: "UO"

# Step 2: UO = upstream overflow = circuit breaker tripped. Check the counter.
kubectl exec deploy/checkout -c istio-proxy -- \
  curl -s localhost:15000/stats | grep upstream_rq_pending_overflow
# → cluster.payment-service.upstream_rq_pending_overflow: 847

# Step 3: 847 tripped requests. Check the active request count.
kubectl exec deploy/checkout -c istio-proxy -- \
  curl -s localhost:15000/stats | grep "cluster.payment-service.upstream_rq_active"
# → cluster.payment-service.upstream_rq_active: 1019

# Step 4: 1019 active requests against a default max of 1024. That's the problem.
# The payment service is slow, requests pile up, and the breaker trips.

The fix: raise max_requests to match actual capacity (after confirming the payment service can handle it), and set proper timeouts so slow requests don't pile up indefinitely:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 4096
      http:
        http1MaxPendingRequests: 1024
        http2MaxRequests: 4096
    outlierDetection:
      consecutive5xxErrors: 10
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

Mystery solved. The application was fine. Envoy was protecting it — a little too aggressively.


Exercises

Exercise 1: Read the Admin Interface (2 minutes)

If you have access to a Kubernetes cluster with Istio:

kubectl exec -it deploy/any-service -c istio-proxy -- curl -s localhost:15000/stats | head -30

What to look for: Gauge the config size with `curl -s localhost:15000/clusters | wc -l` (the output has multiple lines per cluster, so treat this as an upper bound). In a large mesh you might see hundreds of clusters — that's why the Sidecar resource (config scoping) matters.

Exercise 2: Identify the Failure (5 minutes)

An Envoy access log contains this entry:

[2026-03-23T14:05:12.003Z] "GET /api/orders HTTP/1.1" 503 UO 0 81 0 - "10.244.1.15" "python-requests/2.28.1" "abc-123" "orders.prod.svc.cluster.local" "-"

What caused the 503? What stat would you check? What config would you change?

Answer: The `UO` flag means upstream overflow — the circuit breaker tripped. Check `upstream_rq_pending_overflow` for the orders cluster. Raise `max_pending_requests` and/or `max_requests` in the DestinationRule after profiling actual traffic.

Exercise 3: Design a Retry Policy (10 minutes)

Your service has a 30-second route timeout and you want 2 retries for 5xx errors. Calculate the per_try_timeout. Then explain what happens if you forget to set it and the upstream averages 25 seconds per response.

Answer: `per_try_timeout` = 30s / (2 + 1) = 10s. Without it, each retry inherits the full 30-second timeout. Three attempts at 25 seconds each = 75 seconds of upstream capacity per request. Under load, this retry amplification triples the request volume to the already-slow upstream, making it slower, triggering more retries — a classic retry storm.

Cheat Sheet

| What | Command / Config |
|---|---|
| Dump full config | curl -s localhost:15000/config_dump \| python3 -m json.tool |
| List clusters + health | curl -s "localhost:15000/clusters?format=json" |
| Check circuit breaker trips | curl -s localhost:15000/stats \| grep upstream_rq_pending_overflow |
| Active requests to cluster | curl -s localhost:15000/stats \| grep upstream_rq_active |
| Prometheus metrics | curl -s "localhost:15000/stats?format=prometheus" |
| Set debug logging | curl -X POST "localhost:15000/logging?connection=debug" |
| Drain listeners | curl -X POST "localhost:15000/drain_listeners?inboundonly" |
| Readiness check | curl -s localhost:15000/ready |

| Response Flag | Meaning | First Thing to Check |
|---|---|---|
| UF | Upstream failure | Pod running? Port correct? Network policy? |
| UO | Upstream overflow | Raise circuit breaker thresholds |
| NR | No route | Host header, VirtualService, route ordering |
| URX | Retry exhausted | All retries failed — check upstream health |
| UT | Upstream timeout | Timeout too tight or upstream too slow |

| xDS API | Manages | Mnemonic |
|---|---|---|
| LDS | Listeners | Listening at the door |
| RDS | Routes | Routing the request |
| CDS | Clusters | Clustering the backends |
| EDS | Endpoints | Endpoint IP:port pairs |
| SDS | Secrets | Secret TLS certs |
| ADS | All (ordered) | All together, atomically |

Takeaways

  • Envoy is the universal data plane. Every major service mesh is just a control plane on top of the same Envoy binary.
  • The xDS API family is what makes Envoy different from Nginx and HAProxy: fully dynamic configuration without restarts or connection drops.
  • Response flags in access logs (UF, UO, NR) are the fastest path to root cause — faster than application logs, which may not even see the failure.
  • Circuit breaker defaults (1024) are almost never right. Profile real traffic and tune proactively, especially before load tests.
  • Retries without per_try_timeout are a ticking time bomb. The safe formula: route_timeout / (num_retries + 1).
  • The admin interface at port 15000 (Istio) is your debugging Swiss Army knife — but never expose it outside the pod.