

Envoy Proxy — Primer

Why Envoy Matters

Envoy is the universal data plane of the cloud-native ecosystem. It is a high-performance L7 proxy written in C++ that runs as a sidecar alongside every service in a mesh, or as a standalone ingress/egress gateway. Most major service meshes — Istio, AWS App Mesh, Consul Connect, Kuma — use Envoy as their data plane. Understanding Envoy means understanding how traffic actually flows in modern microservice architectures.

The shift from L4 load balancers to L7 proxies like Envoy is architectural. L4 devices see IP/TCP and make routing decisions on ports. Envoy understands HTTP/1.1, HTTP/2, gRPC, WebSocket, MongoDB, Redis, and more at the protocol level. It can inspect headers, rewrite paths, apply per-service circuit breakers, emit per-route metrics, and enforce mutual TLS — none of which is possible at L4.

On-call engineers who understand Envoy can diagnose 503 UF/UO/NR response flags, read config_dump output, tune circuit breaker thresholds before load tests, and perform zero-downtime configuration changes via xDS. Engineers who do not understand it are stuck reading opaque logs and guessing.


Fun fact: Envoy was created by Matt Klein at Lyft in 2016 and donated to the CNCF, where it graduated in 2018. The name "Envoy" means a messenger or representative — fitting for a proxy that acts as an intermediary for service-to-service communication. Envoy was written in C++ for performance; its hot-restart capability was a key differentiator that made it suitable for sidecar deployment in production.

Architecture

Envoy models traffic through five hierarchical primitives:

Listeners define where Envoy accepts connections (address + port). Each listener has a filter chain.

Filter Chains are ordered pipelines of network and HTTP filters that process a connection. The terminal filter in an HTTP filter chain is the router filter, which consults the route table.

Routes match incoming requests (by header, prefix, exact path, regex) to a named cluster. Routes are evaluated in order — first match wins.

Clusters are logical upstream service groups. A cluster holds the load-balancing policy, circuit breaker thresholds, and health-check configuration. Each cluster resolves to a set of endpoints.

Endpoints are the actual IP:port pairs behind a cluster, supplied either statically or via EDS (Endpoint Discovery Service).

Downstream request
  └── Listener (bind address)
        └── Filter chain
              ├── Network filters (TCP proxy / HTTP connection manager)
              └── HTTP filters (rate limit, JWT, WASM, router)
                    └── Route match → Cluster → Endpoint
                                                  └── Upstream request
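The hierarchy above can be sketched as a nested config. This is a minimal illustration, not a literal Envoy bootstrap — the field names mirror Envoy's static-resources YAML shape, but the service name "backend", the addresses, and the ports are invented for the example:

```python
# Illustrative only: nesting mirrors Envoy's static bootstrap shape,
# listener -> filter chain -> route -> cluster -> endpoint.
bootstrap = {
    "static_resources": {
        "listeners": [{
            "name": "ingress",
            "address": {"socket_address": {"address": "0.0.0.0", "port_value": 8080}},
            "filter_chains": [{
                "filters": [{
                    "name": "envoy.filters.network.http_connection_manager",
                    "typed_config": {
                        "route_config": {
                            "virtual_hosts": [{
                                "name": "default",
                                "domains": ["*"],
                                "routes": [{
                                    "match": {"prefix": "/"},       # first match wins
                                    "route": {"cluster": "backend"},
                                }],
                            }],
                        },
                    },
                }],
            }],
        }],
        "clusters": [{
            "name": "backend",
            "connect_timeout": "1s",
            "load_assignment": {    # static endpoints; EDS would supply these dynamically
                "cluster_name": "backend",
                "endpoints": [{"lb_endpoints": [
                    {"endpoint": {"address": {"socket_address": {
                        "address": "10.0.0.5", "port_value": 9000}}}},
                ]}],
            },
        }],
    },
}

# Walk the hierarchy: the route inside the listener names the cluster
# that ultimately resolves to an endpoint.
route = bootstrap["static_resources"]["listeners"][0]["filter_chains"][0] \
    ["filters"][0]["typed_config"]["route_config"]["virtual_hosts"][0]["routes"][0]
print(route["route"]["cluster"])  # -> backend
```

Tracing a request is exactly this walk: find the listener, run its filter chain, match a route, pick the cluster, pick an endpoint.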

xDS APIs

Envoy's configuration can be delivered dynamically via the xDS (Discovery Service) API family. A management plane (like Istio's istiod) pushes configuration to Envoy over gRPC streams — no restart required.

API    Full name                        Manages
LDS    Listener Discovery Service      Listeners and filter chains
RDS    Route Discovery Service         Route configurations
CDS    Cluster Discovery Service       Upstream cluster definitions
EDS    Endpoint Discovery Service      Cluster member endpoints
SDS    Secret Discovery Service        TLS certificates and keys
ADS    Aggregated Discovery Service    All of the above on one stream (ordering-safe)

ADS is the recommended choice in production. When CDS, EDS, and LDS/RDS arrive on separate streams they can momentarily disagree (cluster added before endpoints arrive), causing brief 503s. ADS sequences updates atomically.
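The ordering hazard can be shown with a toy model — this is not the real xDS protocol, just a sketch of why a cluster that is known before its endpoints arrive produces 503s, and why an atomic ordered batch (the ADS guarantee) avoids the gap. The `Proxy` class and cluster name "v2" are invented for the example:

```python
# Toy model of the CDS/EDS ordering gap; not the real xDS wire protocol.
class Proxy:
    def __init__(self):
        self.clusters = {}          # cluster name -> list of endpoints

    def apply(self, *updates):      # ADS-style: one atomic, ordered batch
        for kind, name, payload in updates:
            if kind == "CDS":
                self.clusters.setdefault(name, [])
            elif kind == "EDS":
                self.clusters[name] = payload

    def request(self, cluster):
        endpoints = self.clusters.get(cluster)
        if not endpoints:
            return 503              # "no healthy upstream"
        return 200

p = Proxy()
p.apply(("CDS", "v2", None))        # cluster known, endpoints not yet delivered
in_flight = p.request("v2")         # -> 503 during the gap
p.apply(("EDS", "v2", ["10.0.0.7:9000"]))
after = p.request("v2")             # -> 200 once endpoints arrive
```

With ADS, both updates would arrive in one ordered stream — `p.apply(("CDS", "v2", None), ("EDS", "v2", [...]))` — so no request ever observes the half-applied state.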

Remember: The xDS API mnemonic: "LRCESA" — Listener, Route, Cluster, Endpoint, Secret, Aggregated. The data flows top-down: Listeners contain Routes, Routes point to Clusters, Clusters resolve to Endpoints. SDS handles certificates separately. ADS wraps all of them into a single ordered stream to prevent inconsistency.


Load Balancing

Envoy supports multiple load-balancing policies per cluster:

  • Round-robin — default; distributes requests evenly by cycling through endpoints.
  • Least-request — sends each new request to the endpoint with the fewest active requests. Better for mixed workloads with variable request latency.
  • Random — picks an endpoint uniformly at random. Lower overhead than round-robin under very high concurrency.
  • Ring hash — consistent hashing based on a header (typically session cookie or user ID). Stickiness without explicit sessions; useful for caching layers.
  • Maglev — Google's variant of consistent hashing with more even distribution and faster table rebuilds.
  • Zone-aware routing — biases traffic toward endpoints in the same availability zone as the proxy. Reduces cross-AZ data transfer costs and latency. Falls back to cross-zone when local zone capacity is insufficient.
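
Ring hash is the least obvious policy in the list, so here is a minimal consistent-hash ring in its spirit. The virtual-node count and the md5-based hash function are illustrative choices, not Envoy's actual implementation, and the endpoint addresses are invented:

```python
import hashlib

# Minimal consistent-hash ring sketch (illustrative, not Envoy's internals).
def _h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(endpoints, vnodes=100):
    # Each endpoint gets many points on the ring so load spreads evenly.
    return sorted((_h(f"{ep}#{i}"), ep) for ep in endpoints for i in range(vnodes))

def pick(ring, hash_key: str):
    h = _h(hash_key)
    for point, ep in ring:          # first ring point at or after the key's hash
        if point >= h:
            return ep
    return ring[0][1]               # wrap around past the top of the ring

ring = build_ring(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
# Same hash key (e.g., a user-ID header) -> same endpoint: stickiness
# without server-side sessions, and only ~1/N of keys move when an
# endpoint is added or removed.
assert pick(ring, "user-42") == pick(ring, "user-42")
```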

Gotcha: Zone-aware routing can cause unbalanced load if your endpoints are unevenly distributed across zones. If zone A has 10 pods and zone B has 2 pods, proxies in zone A send almost all traffic locally — zone A pods are underloaded while zone B pods are overloaded. Envoy has a min_cluster_size threshold: if a zone has too few endpoints, it falls back to cross-zone routing. Tune this for your topology.
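
The back-of-envelope arithmetic for that gotcha, using the pod counts from the example and an assumed 1000 req/s originating in each zone:

```python
# If zone-aware routing keeps all traffic local and each zone originates
# 1000 req/s, per-pod load diverges with the pod counts from the gotcha.
reqs_per_zone = 1000
pods = {"zone-a": 10, "zone-b": 2}

per_pod = {zone: reqs_per_zone / count for zone, count in pods.items()}
# zone-a pods: 100 req/s each; zone-b pods: 500 req/s each -> 5x hotter
```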


Observability

Envoy emits rich telemetry without application code changes.

Stats: Envoy exposes thousands of counters, gauges, and histograms on its admin port (localhost:9901/stats in the standard examples; Istio sidecars expose them at localhost:15090/stats). Key counters: upstream_cx_active, upstream_rq_total, upstream_rq_5xx, upstream_rq_retry, circuit_breakers.default.cx_open.

Access logs: Configurable per listener. The default format includes response flags (e.g., UF, UO, NR), upstream cluster, duration, bytes, and response code. JSON format makes downstream log parsing trivial.

Distributed tracing: Envoy propagates and emits trace spans for B3 (Zipkin), Jaeger, AWS X-Ray, and OpenTelemetry. It generates a new span per hop and propagates trace context headers (x-b3-traceid, x-b3-spanid, etc.) so services that don't instrument their own code still appear in traces.

Response flags are shorthand for why a response was not clean:

  • UF — upstream connection failure
  • UO — upstream overflow (circuit breaker open)
  • NR — no route found
  • URX — upstream retry exhausted
  • RL — rate limited
  • DC — downstream connection terminated (client closed before response)

Remember: The most common Envoy 503 response flags: "UF, UO, NR." Mnemonic: "Upstream Failed, Upstream Overflowed, No Route." When you see a 503 in access logs, the response flag immediately tells you the category of failure without reading upstream logs.
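
A triage helper along these lines is a common first on-call tool. This sketch assumes the access-log format exposes %RESPONSE_FLAGS% as a comma-separated field; the flag meanings match the list above:

```python
# Map the response-flag field of an access-log line to human-readable causes.
FLAG_MEANINGS = {
    "UF": "upstream connection failure",
    "UO": "upstream overflow (circuit breaker open)",
    "NR": "no route found",
    "URX": "upstream retry exhausted",
    "RL": "rate limited",
    "DC": "downstream connection terminated",
}

def triage(flags_field: str):
    flags = [f for f in flags_field.split(",") if f]
    return [(f, FLAG_MEANINGS.get(f, "unknown flag")) for f in flags]

print(triage("UO"))  # -> [('UO', 'upstream overflow (circuit breaker open)')]
```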


Circuit Breaking

Envoy implements circuit breaking at the cluster level, not the application level. Thresholds are configured per cluster (and optionally per priority):

  • max_connections — maximum active TCP connections to upstream
  • max_pending_requests — maximum queued requests waiting for a connection
  • max_requests — maximum active requests in flight
  • max_retries — maximum concurrent retries

When a threshold is exceeded, Envoy returns 503 with response flag UO (upstream overflow) rather than queuing more requests. This prevents cascade failures. Default thresholds (max_connections: 1024, max_pending_requests: 1024, max_requests: 1024) are often too high for fine-grained isolation and too low for high-throughput services — always tune to your traffic profile.
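
The max_requests threshold behaves like an admission counter: admit while in-flight requests are below the limit, otherwise fail fast. A toy sketch (the class and tiny threshold are invented for illustration; real thresholds live in the cluster config):

```python
# Toy admission check modeled on cluster-level circuit breaking:
# reject with 503/UO instead of queueing once max_requests is reached.
class CircuitBreaker:
    def __init__(self, max_requests=1024):   # 1024 is the stated default
        self.max_requests = max_requests
        self.active = 0

    def try_acquire(self):
        if self.active >= self.max_requests:
            return (503, "UO")               # shed load immediately
        self.active += 1
        return (200, "-")

    def release(self):
        self.active -= 1

cb = CircuitBreaker(max_requests=2)
results = [cb.try_acquire() for _ in range(3)]
# first two requests admitted, third rejected with flag UO
```

Failing fast at the proxy is what stops a slow upstream from consuming every connection and thread in every caller — the cascade-failure prevention the paragraph describes.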

Outlier detection is the companion mechanism: it passively monitors for unhealthy hosts (consecutive 5xx, consecutive gateway errors, consecutive local-origin errors, success-rate or failure-percentage deviation) and ejects them from the load-balancing pool for a configurable ejection period. Unlike active health checks, outlier detection reacts to live traffic patterns.
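
The simplest trigger, consecutive 5xx, reduces to a per-host streak counter. A sketch (class name and the streak-reset behavior are illustrative; the threshold of 5 mirrors Envoy's documented consecutive_5xx default):

```python
# Sketch of consecutive-5xx outlier ejection: after N consecutive 5xx
# responses, a host is removed from the LB pool for the ejection period.
class OutlierDetector:
    def __init__(self, consecutive_5xx=5):
        self.limit = consecutive_5xx
        self.streak = {}            # host -> consecutive 5xx count
        self.ejected = set()

    def observe(self, host, status):
        if 500 <= status < 600:
            self.streak[host] = self.streak.get(host, 0) + 1
            if self.streak[host] >= self.limit:
                self.ejected.add(host)   # ejected for the configured period
        else:
            self.streak[host] = 0        # any success resets the streak

od = OutlierDetector(consecutive_5xx=3)
for _ in range(3):
    od.observe("10.0.0.9:80", 503)
# host is now ejected from the pool
```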


Traffic Management

Retries: Envoy can retry on conditions such as 5xx, gateway-error, connect-failure, retriable-4xx, reset, and refused-stream. Retries should always be paired with a per-try timeout shorter than the route timeout to avoid cascading slowdowns. The retry_on: retriable-status-codes policy lets you specify exact codes (e.g., 503 only).

Timeouts: Three timeout boundaries matter:

  1. connect_timeout — how long Envoy waits to establish a TCP connection to upstream
  2. route.timeout — total time budget for the entire request (including retries)
  3. route.retry_policy.per_try_timeout — per-attempt budget within the overall route timeout
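
The relationship between the last two budgets is worth checking explicitly: retries only help if every attempt fits inside the route timeout. A back-of-envelope check with illustrative values (in seconds):

```python
# Worst-case wall time if every attempt runs to its per-try timeout.
route_timeout = 5.0
per_try_timeout = 2.0
num_retries = 2                     # 1 original attempt + 2 retries = 3 tries

worst_case = per_try_timeout * (num_retries + 1)
fits = worst_case <= route_timeout  # 6.0 > 5.0 -> the last retry gets cut short
```

Here the third attempt would be truncated by the route timeout, so the configured retry budget is partly wasted; shrink per_try_timeout or extend route.timeout until the budgets nest.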

Rate limiting: Envoy supports both local (in-process, token bucket) and global (via an external gRPC rate limit service like Lyft's ratelimit) rate limiting. Local rate limiting is per-Envoy-instance; global is shared state across all instances.
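
The local variant's token bucket is simple enough to sketch directly. This is an illustrative model, not the filter's actual implementation; the burst size and fill rate are invented:

```python
# Minimal token bucket in the spirit of local rate limiting: tokens refill
# at fill_rate per second up to max_tokens; each request consumes one.
class TokenBucket:
    def __init__(self, max_tokens, fill_rate):
        self.max_tokens = max_tokens
        self.fill_rate = fill_rate
        self.tokens = float(max_tokens)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        self.tokens = min(self.max_tokens,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # the filter would answer 429 with flag RL

tb = TokenBucket(max_tokens=2, fill_rate=1)   # burst of 2, 1 req/s sustained
burst = [tb.allow(0.0) for _ in range(3)]     # [True, True, False]
later = tb.allow(1.0)                          # one token refilled -> True
```

Because this state is per-Envoy-instance, the effective limit scales with replica count — the reason the paragraph distinguishes local from global rate limiting.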

Traffic shifting: Routes support weighted cluster assignments. Splitting 90% to v1 and 10% to v2 requires only a configuration change — no DNS update, no new load balancer. This is the mechanism behind Istio's VirtualService traffic shifting.
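
Weighted selection itself is just a roll against cumulative weights. A sketch of the 90/10 split described above — the cluster names are invented, and the weights mirror what a weighted_clusters stanza would carry:

```python
import random

# Weighted cluster choice for a 90/10 canary split (illustrative names).
weights = {"backend-v1": 90, "backend-v2": 10}

def pick_cluster(rng):
    roll = rng.uniform(0, sum(weights.values()))
    acc = 0
    for name, w in weights.items():
        acc += w
        if roll < acc:
            return name
    return name                     # guard against float edge cases

rng = random.Random(0)              # seeded for reproducibility
sample = [pick_cluster(rng) for _ in range(10_000)]
share_v2 = sample.count("backend-v2") / len(sample)   # close to 0.10
```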

Header manipulation: Envoy can add, remove, or rewrite request and response headers at the route, virtual-host, or cluster level. Useful for adding x-envoy-upstream-service-time, stripping internal headers before forwarding to upstream, or injecting debug headers.


HTTP/2 and gRPC

Envoy was designed with HTTP/2 first. It can:

  • Terminate HTTP/2 from downstream and proxy as HTTP/1.1 to upstream (or vice versa)
  • Bridge gRPC (HTTP/2) to gRPC-Web (HTTP/1.1) for browser clients
  • Provide gRPC transcoding (gRPC → REST JSON) without application changes
  • Apply per-stream flow control, header compression (HPACK), and stream multiplexing

For gRPC specifically, Envoy understands trailer-based status codes, handles grpc-timeout header propagation, and emits per-method stats (cluster.<name>.grpc.<service>.<method>.success).


WASM Filter Extensibility

Envoy supports WebAssembly (WASM) filters that run inside the proxy process in a sandboxed VM. WASM filters can inspect and modify request/response headers and bodies, emit custom metrics, and call external services. They replace the older Lua filter and C++ extension points for custom logic.

Key properties: WASM filters are loaded and unloaded without restarting Envoy. A crashing WASM module is isolated from the proxy — the filter fails open or closed depending on configuration, rather than crashing the proxy. Filters can be written in Go, Rust, AssemblyScript, or any language that compiles to WASM.

Under the hood: WASM filters run inside a V8 or wasmtime sandbox within the Envoy process. They communicate with the proxy through a well-defined ABI (proxy-wasm). The overhead is roughly 2-5x slower than native C++ filters, but the safety isolation and hot-reload capability make them the preferred choice for custom logic. The proxy-wasm spec is shared across Envoy, NGINX, and Apache APISIX, making filters portable.


Envoy as Ingress Gateway vs Sidecar

In a sidecar deployment (e.g., Istio or Consul Connect), Envoy runs as a container injected alongside every pod. It intercepts all inbound and outbound traffic via iptables rules. The control plane manages configuration for every sidecar collectively.

As an ingress gateway, Envoy runs as a dedicated deployment at the edge of the cluster (or service mesh). It handles external-to-mesh traffic: TLS termination, routing to internal services, rate limiting, and authentication. Istio's IngressGateway and Contour are both Envoy-based ingress controllers.

Operational difference: sidecar failures affect only the co-located pod. Ingress gateway failures affect all external traffic. Gateway deployments need HPA, pod disruption budgets, and careful rolling update strategy.


Hot Restart and Draining

Envoy supports hot restart: a new Envoy process starts, receives configuration from the old process via a Unix domain socket, takes over the listening sockets, and the old process drains and exits. External connections see no interruption.

The drain sequence:

  1. New process starts and signals readiness
  2. Old process enters draining mode: stops accepting new connections on shared sockets
  3. Old process waits for in-flight requests to complete (drain timeout, default 600s)
  4. Old process exits

In Kubernetes, rolling updates achieve a similar effect. The critical detail: set a preStop hook and terminationGracePeriodSeconds long enough for Envoy to drain active connections. Without it, the pod is killed mid-request and clients see TCP RSTs.
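
The grace-period arithmetic is worth making explicit. These numbers are illustrative, not recommendations — the point is only that the kubelet must not SIGKILL the pod before draining finishes:

```python
# terminationGracePeriodSeconds must cover the preStop delay plus the
# time allowed for in-flight requests to drain, with some margin.
pre_stop_sleep = 5          # seconds for endpoints to stop sending new traffic
drain_timeout = 30          # seconds allowed for in-flight requests
safety_margin = 5

required_grace = pre_stop_sleep + drain_timeout + safety_margin
# terminationGracePeriodSeconds should be >= 40 for these settings
```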


Key Takeaways

  • Envoy is the universal data plane; most service meshes are essentially control planes layered on top of it.
  • The xDS API family allows zero-restart, ordered configuration delivery. ADS is safest.
  • Circuit breaker defaults are almost never right for your workload — tune them proactively.
  • Response flags (UF, UO, NR, URX) in access logs identify root cause faster than upstream logs.
  • Retries without per-try timeouts cause retry storms under load.
  • WASM filters are the correct extensibility path; Lua and C++ extension points are legacy.
  • Hot restart requires explicit drain configuration in Kubernetes to avoid mid-request kills.
