Portal | Level: L2 | Domain: Kubernetes

Envoy Proxy — Street-Level Ops

Admin Interface

Envoy exposes a local admin interface (default port 9901, or 15000 in Istio sidecars). Never expose this externally — it allows runtime config changes and log-level overrides.

Default trap: Istio sidecars use port 15000 for admin, not Envoy's default 9901. If you're debugging an Istio mesh and curl localhost:9901 returns nothing, try 15000.
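Probing both ports by hand gets old. A small helper, a sketch of my own (not part of Envoy or Istio), that tries the candidate admin ports' /ready endpoint in order:

```python
import urllib.request
import urllib.error

def find_admin_port(candidates=(9901, 15000), probe=None, timeout=1.0):
    """Return the first candidate port whose /ready endpoint answers 200.

    `probe` is injectable for testing; the default does a real HTTP GET
    against localhost, so only call this next to a running Envoy.
    """
    if probe is None:
        def probe(port):
            try:
                url = f"http://localhost:{port}/ready"
                with urllib.request.urlopen(url, timeout=timeout) as r:
                    return r.status == 200
            except (urllib.error.URLError, OSError):
                return False
    for port in candidates:
        if probe(port):
            return port
    return None
```

Run it from inside the pod (or via kubectl exec) so localhost is the proxy's network namespace.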

# Dump the full running configuration (listeners, clusters, routes, endpoints)
curl -s localhost:15000/config_dump | python3 -m json.tool | less

# Dump only clusters with their current endpoint health status
curl -s "localhost:15000/clusters?format=json" | python3 -m json.tool

# List all active listeners
curl -s localhost:15000/listeners

# Dump all stats (counters, gauges, histograms)
curl -s localhost:15000/stats

# Filter stats to circuit breaker state
curl -s localhost:15000/stats | grep circuit_breaker

# Filter stats to upstream retry counters
curl -s localhost:15000/stats | grep upstream_rq_retry

# Filter stats to 5xx rates
curl -s localhost:15000/stats | grep upstream_rq_5xx

# Stats in Prometheus exposition format (for scraping)
curl -s "localhost:15000/stats?format=prometheus"

# Check current log levels
curl -s localhost:15000/logging

# Set the "connection" component logger to debug (reverts on restart)
curl -X POST "localhost:15000/logging?connection=debug"

# Reset all log levels to warning
curl -X POST "localhost:15000/logging?level=warning"

# Healthcheck endpoint (useful in init containers)
curl -s localhost:15000/ready

Reading config_dump

/config_dump returns a large JSON blob. Key sections:

# Extract only static listeners
curl -s localhost:15000/config_dump \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
for c in d['configs']:
    if c['@type'].endswith('ListenersConfigDump'):
        print(json.dumps(c, indent=2))
"

# Extract cluster names and their load assignment
curl -s "localhost:15000/clusters?format=json" \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
for c in d.get('cluster_statuses', []):
    print(c['name'], '—', len(c.get('host_statuses', [])), 'hosts')
"

The config_dump endpoint can return tens of MB in large meshes. Pipe through python3 -m json.tool | grep -A5 '"name"' to locate a specific cluster or listener without paging through the whole blob.
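Before grepping blindly, it can help to see which sections the dump even contains. A sketch of my own (function name is mine) that maps each section's @type suffix to how many top-level keys it carries:

```python
import json

def summarize_config_dump(dump):
    """Map each config_dump section (by its @type suffix) to a key count.

    Gives a cheap overview of a multi-MB dump before diving into any
    one section. Expects the dict shape returned by /config_dump.
    """
    summary = {}
    for section in dump.get("configs", []):
        kind = section.get("@type", "").rsplit(".", 1)[-1]
        # Count top-level entries in the section besides @type itself.
        summary[kind] = sum(1 for k in section if k != "@type")
    return summary
```

Feed it with json.load(sys.stdin) from a curl pipe, the same way as the extraction one-liners below.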


Diagnosing 503s with Response Flags

Access log response flags are the fastest path to root cause:

Flag | Meaning                             | Common cause
UF   | Upstream connection failure         | Upstream pod crashed, network policy, wrong port
UO   | Upstream overflow (circuit breaker) | Circuit breaker thresholds too low, traffic spike
NR   | No route match                      | Missing route, wrong Host header, VirtualService misconfiguration
URX  | Retry exhausted                     | Upstream returning 5xx, retry budget exceeded
UT   | Upstream request timeout            | Upstream too slow, timeout too tight
RL   | Rate limited                        | Rate limit policy triggered
DC   | Downstream connection terminated    | Client closed before response (usually not an Envoy bug)
LH   | Local service health check failed   | Envoy health check misconfigured

Debug clue: Response flags are the single fastest path to root-causing Envoy 503s. Skip the application logs and start with grep " 503 " access.log | awk '{print $6}' | sort | uniq -c | sort -rn — in the default text format, the response flags are the sixth whitespace-separated field, right after the response code. If UO dominates, raise circuit breaker thresholds. If NR dominates, check your route config.

# Count 503s by response flag in an Envoy access log
# (default text format: flags are field 6, after the response code)
grep " 503 " /var/log/envoy/access.log \
  | awk '{print $6}' \
  | sort | uniq -c | sort -rn

# Istio sidecar access log (JSON format)
kubectl logs <pod> -c istio-proxy \
  | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        r = json.loads(line)
        # response_code may be a number or a string depending on log config
        if str(r.get('response_code')) == '503':
            print(r.get('response_flags'), r.get('upstream_cluster'), r.get('path'))
    except json.JSONDecodeError:
        pass
"

Checking Circuit Breaker State

# Is the circuit breaker open right now?
curl -s localhost:15000/stats | grep "circuit_breakers\|cx_open\|rq_open\|rq_pending_open"

# Upstream overflow counter (increments each time UO is returned)
curl -s localhost:15000/stats | grep upstream_rq_pending_overflow

# Active connections to a specific cluster
curl -s localhost:15000/stats | grep "cluster.my-service.upstream_cx_active"

# Active requests to a specific cluster
curl -s localhost:15000/stats | grep "cluster.my-service.upstream_rq_active"

If upstream_rq_pending_overflow is incrementing rapidly, max_pending_requests is too low for your traffic volume.
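To put a number on "incrementing rapidly", take two /stats snapshots a few seconds apart and compute the rate. A sketch with helper names of my own (the `cluster.<name>.<stat>: <value>` line format is what /stats actually emits):

```python
def stat_value(stats_text, name):
    """Extract one counter/gauge from the plain-text /stats output."""
    for line in stats_text.splitlines():
        if line.startswith(name + ":"):
            return int(line.split(":", 1)[1].strip())
    return None

def overflow_rate(sample_a, sample_b, cluster, interval_s):
    """Overflow increments per second between two /stats snapshots."""
    name = f"cluster.{cluster}.upstream_rq_pending_overflow"
    a, b = stat_value(sample_a, name), stat_value(sample_b, name)
    if a is None or b is None:
        return None
    return (b - a) / interval_s
```

A sustained nonzero rate means requests are being shed right now, not just that overflow happened at some point since the proxy started.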

Under the hood: Envoy circuit breakers are per-cluster, not per-route. If two routes share the same upstream cluster, they share the same circuit breaker budget. A traffic spike on one route can trip the breaker and starve the other route. Split critical routes into separate clusters if they need independent protection.
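To find which clusters are shared, count routes per upstream cluster in a route config. A sketch of my own that walks the `virtual_hosts` -> `routes` -> `route.cluster` shape found in a RoutesConfigDump entry (weighted-cluster routes are skipped for simplicity):

```python
from collections import defaultdict

def routes_per_cluster(route_config):
    """Count how many routes point at each upstream cluster.

    Any cluster with count > 1 is shared: those routes draw on a
    single circuit-breaker budget.
    """
    counts = defaultdict(int)
    for vh in route_config.get("virtual_hosts", []):
        for r in vh.get("routes", []):
            cluster = r.get("route", {}).get("cluster")
            if cluster:
                counts[cluster] += 1
    return dict(counts)
```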


Cluster Health and Endpoint Status

# Show all endpoints and their health status
curl -s "localhost:15000/clusters?format=json" | python3 -m json.tool \
  | grep -A10 '"address"'

# Count healthy vs unhealthy endpoints per cluster
curl -s "localhost:15000/clusters?format=json" \
  | python3 -c "
import sys, json
d = json.load(sys.stdin)
for c in d.get('cluster_statuses', []):
    hosts = c.get('host_statuses', [])
    healthy = sum(1 for h in hosts
                  if h.get('health_status', {}).get('eds_health_status') == 'HEALTHY'
                  and not h.get('health_status', {}).get('failed_active_health_check'))
    print(f\"{c['name']}: {healthy}/{len(hosts)} healthy\")
"

Access Log Format Patterns

Envoy's default text access log format:

[%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
%RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT%
%DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%
"%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
"%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"

For structured JSON logging (recommended for log aggregation):

typed_config:
  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
  log_format:
    json_format:
      start_time: "%START_TIME%"
      method: "%REQ(:METHOD)%"
      path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
      response_code: "%RESPONSE_CODE%"
      response_flags: "%RESPONSE_FLAGS%"
      duration_ms: "%DURATION%"
      upstream_cluster: "%UPSTREAM_CLUSTER%"
      upstream_host: "%UPSTREAM_HOST%"
      request_id: "%REQ(X-REQUEST-ID)%"

Header-Based Routing Debug

# Send request with specific header to test routing rules
curl -H "x-env: canary" http://my-service/api/v1/health

# Force trace sampling for this request (requires tracing to be configured)
curl -H "x-envoy-force-trace: true" http://my-service/api/v1/health

# Check which cluster Envoy routed to by inspecting response headers
curl -v -H "x-debug: 1" http://my-service/api/ 2>&1 | grep -i "x-envoy\|server\|via"

# Test timeout behavior: request that takes longer than route timeout
curl --max-time 30 http://my-service/slow-endpoint -v

Circuit Breaker Tuning

Start with real traffic metrics before setting thresholds:

# Active connections right now; these stats are instantaneous gauges,
# so sample repeatedly to estimate P99 for max_connections
curl -s localhost:15000/stats | grep "upstream_cx_active"

# Pending requests right now (sample over time for max_pending_requests)
curl -s localhost:15000/stats | grep "upstream_rq_pending_active"

# Concurrent requests right now (sample over time for max_requests)
curl -s localhost:15000/stats | grep "upstream_rq_active"

Recommended tuning formula:
- max_connections = observed P99 active connections * 2
- max_pending_requests = observed P99 pending * 1.5 (intentionally tight to shed early)
- max_requests = observed P99 concurrent requests * 2
- max_retries = max_requests * retry rate (usually 0.1–0.2)
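Those rules of thumb are easy to wrap in a small calculator; a sketch of my own (function and parameter names are mine, and the multipliers are the heuristics above, not Envoy defaults):

```python
import math

def circuit_breaker_thresholds(p99_cx, p99_pending, p99_rq, retry_rate=0.1):
    """Derive circuit breaker thresholds from observed P99 traffic values.

    Rounds up so a fractional observation never produces a threshold
    below the observed load.
    """
    max_requests = math.ceil(p99_rq * 2)
    return {
        "max_connections": math.ceil(p99_cx * 2),
        "max_pending_requests": math.ceil(p99_pending * 1.5),
        "max_requests": max_requests,
        "max_retries": max(1, math.ceil(max_requests * retry_rate)),
    }
```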


Outlier Detection Tuning

Default outlier detection settings eject an endpoint after 5 consecutive 5xx errors, with a 30-second base ejection time. That is often too aggressive for flapping services:

outlier_detection:
  consecutive_5xx: 10           # raise from default 5
  interval: 30s                 # evaluation window
  base_ejection_time: 30s       # start with 30s ejection
  max_ejection_percent: 50      # never eject more than half the pool
  consecutive_gateway_failure: 5
  enforcing_consecutive_5xx: 100  # 100% enforcement (vs 0 = detection only)

Set enforcing_consecutive_5xx: 0 during initial rollout to observe ejections without acting on them.

Scale note: With max_ejection_percent: 50, if you have only 2 endpoints, one 5xx burst ejects half your backend. Set max_ejection_percent proportional to your fleet size, and never let it go above 50% on small pools.
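The small-pool math is just a floor. An illustrative helper of my own (note that recent Envoy versions may still eject at least one host even when the floor works out to zero, so treat this as the configured ceiling, not a hard guarantee):

```python
import math

def max_ejectable(pool_size, max_ejection_percent):
    """Configured ceiling on simultaneously ejected endpoints.

    floor(pool_size * percent / 100): on a 2-host pool with 50%,
    a single ejection already removes half the backend.
    """
    return math.floor(pool_size * max_ejection_percent / 100)
```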