Portal | Level: L3: Advanced | Topics: Service Mesh | Domain: Kubernetes

Service Mesh Drills

Remember: A service mesh adds three capabilities to your cluster: mTLS (encryption between services), Observability (automatic metrics, traces, and access logs for every request), and Traffic management (retries, timeouts, circuit breaking, canary deploys). Mnemonic: "MOT" — Mutual TLS, Observability, Traffic. The sidecar proxy (Envoy) intercepts all traffic transparently — no application code changes needed.

Gotcha: The Istio sidecar needs to be running before your application starts making requests. If your app starts faster than the sidecar, outbound requests fail with "connection refused." Fix: set holdApplicationUntilProxyStarts: true in the Istio mesh config, or add an init container that waits for the sidecar to be ready.
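The mesh-wide fix can be sketched as an IstioOperator overlay (the `holdApplicationUntilProxyStarts` field is standard Istio meshConfig; how you apply it depends on your install method):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      # Delay app container start until Envoy is ready to accept traffic
      holdApplicationUntilProxyStarts: true
```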

Drill 1: Enable Sidecar Injection

Difficulty: Easy

Q: How do you enable automatic Istio sidecar injection for the production namespace?

Answer
kubectl label namespace production istio-injection=enabled

# Verify
kubectl get namespace production --show-labels

# Restart existing pods to get sidecars
kubectl rollout restart deployment -n production
Pods created after labeling get automatic injection. Existing pods need a restart.
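The inverse is sometimes needed too: a workload in a labeled namespace that must not get a sidecar. The annotation is standard Istio; the deployment name here is a hypothetical example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-batch-job      # hypothetical workload that can't run a sidecar
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-batch-job
  template:
    metadata:
      labels:
        app: legacy-batch-job
      annotations:
        sidecar.istio.io/inject: "false"   # opt out of automatic injection
    spec:
      containers:
      - name: worker
        image: legacy-batch-job:latest     # assumed image
```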

Drill 2: Diagnose 503 After Mesh Enable

Difficulty: Medium

Q: After enabling Istio, all requests return 503. Pods show 2/2 Ready. App logs show no incoming traffic. What do you check?

Answer
# 1. Check istio-proxy logs
kubectl logs deploy/my-app -n production -c istio-proxy --tail=50
# Look for: "upstream connect error or disconnect/reset before headers"

# 2. Run Istio analysis
istioctl analyze -n production

# 3. Check Service port naming — MOST COMMON CAUSE
kubectl get svc my-app -n production -o yaml | grep -A5 ports:
# Port must be named with a protocol prefix: http-web, grpc-api, tcp-db
# NOT just "web" or "api" (newer Istio also honors the Service `appProtocol` field)

# 4. Fix
kubectl patch svc my-app -n production --type=json \
  -p='[{"op":"replace","path":"/spec/ports/0/name","value":"http-web"}]'

# 5. Check mTLS mode
kubectl get peerauthentication -A
# STRICT mode blocks non-mesh clients
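If the 503s come from non-mesh clients hitting a STRICT policy, switching to PERMISSIVE is the usual migration step while sidecars roll out. A sketch for the namespace in this drill:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    # Accept both plaintext and mTLS until every client is in the mesh
    mode: PERMISSIVE
```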

Drill 3: Canary Deployment with Traffic Splitting

Difficulty: Medium

Q: Route 90% of traffic to v1 and 10% to v2 of my-app using Istio.

Answer
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
  namespace: production
spec:
  host: my-app
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
  namespace: production
spec:
  hosts:
  - my-app
  http:
  - route:
    - destination:
        host: my-app
        subset: v1
      weight: 90
    - destination:
        host: my-app
        subset: v2
      weight: 10
Prerequisites:
- Both Deployments must have `version: v1` / `version: v2` labels on their pod templates
- Both sets of pods must be selected by the same Service
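For the subsets above to receive traffic, each Deployment's pod template must carry the matching `version` label. A minimal sketch for the v2 side (deployment name, `app` label, and image tag are assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v2
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      version: v2
  template:
    metadata:
      labels:
        app: my-app      # matched by the Service selector
        version: v2      # matched by the DestinationRule subset
    spec:
      containers:
      - name: my-app
        image: my-app:v2   # assumed image tag
```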

Drill 4: Header-Based Routing

Difficulty: Medium

Q: Route requests with header x-env: canary to the v2 subset, all other traffic to v1.

Answer
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts: ["my-app"]
  http:
  - match:
    - headers:
        x-env:
          exact: canary
    route:
    - destination:
        host: my-app
        subset: v2
  - route:
    - destination:
        host: my-app
        subset: v1
Order matters: specific matches first, default route last.
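One way to sanity-check the routing, assuming an in-cluster client pod named `curl-test` and a hypothetical `/version` endpoint, is to compare responses with and without the header:

```shell
# With the header: should be served by the v2 subset
kubectl exec curl-test -- curl -s -H "x-env: canary" http://my-app/version

# Without the header: should fall through to the default v1 route
kubectl exec curl-test -- curl -s http://my-app/version
```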

Drill 5: Circuit Breaker

Difficulty: Medium

Q: Configure outlier detection to eject endpoints that return 3+ consecutive 5xx errors, checked every 30 seconds, ejected for 1 minute.

Answer
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
`maxEjectionPercent: 50` ensures at least half the endpoints stay active even during widespread failures.
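To confirm ejections are actually happening, you can read Envoy's outlier-detection stats directly from the sidecar (a sketch; assumes the deployment name from the drill):

```shell
# Dump Envoy stats via the pilot-agent admin helper and filter for outlier detection
kubectl exec deploy/my-app -c istio-proxy -- \
  pilot-agent request GET stats | grep outlier_detection
# Non-zero ejections_enforced_* counters mean endpoints have been ejected
```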

Drill 6: Fault Injection for Testing

Difficulty: Easy

Q: Inject a 5-second delay into 10% of requests and return 503 for 5% of requests to test resilience.

Answer
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts: ["my-app"]
  http:
  - fault:
      delay:
        percentage:
          value: 10.0
        fixedDelay: 5s
      abort:
        percentage:
          value: 5.0
        httpStatus: 503
    route:
    - destination:
        host: my-app
Use this to test:
- Timeout handling in upstream services
- Retry logic
- Circuit breaker behavior
- User-facing error handling
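The drill's rule affects a slice of all production traffic. A safer pattern, sketched under the assumption that your test client can set a custom header (`x-chaos-test` is a hypothetical name), is to scope the fault so only tagged requests are delayed:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts: ["my-app"]
  http:
  - match:
    - headers:
        x-chaos-test:        # hypothetical test-traffic marker
          exact: "true"
    fault:
      delay:
        percentage:
          value: 100.0       # delay every tagged request
        fixedDelay: 5s
    route:
    - destination:
        host: my-app
  - route:                   # untagged traffic is untouched
    - destination:
        host: my-app
```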

Drill 7: Debug Proxy Configuration

Difficulty: Hard

Q: Traffic to backend-svc returns 404 even though the Service exists. How do you debug the Envoy proxy config?

Answer
# 1. Check proxy sync status
istioctl proxy-status
# Look for SYNCED status. If not synced, config hasn't propagated.

# 2. Check routes in the sidecar
istioctl proxy-config routes deploy/my-app -n production
# Look for backend-svc in the route table

# 3. Check clusters (upstream endpoints)
istioctl proxy-config clusters deploy/my-app -n production | grep backend-svc

# 4. Check endpoints
istioctl proxy-config endpoints deploy/my-app -n production | grep backend-svc
# Are there any endpoints? Are they HEALTHY?

# 5. Check listeners
istioctl proxy-config listeners deploy/my-app -n production

# 6. Full config dump
istioctl proxy-config all deploy/my-app -n production -o json
Common causes of 404:
- VirtualService host doesn't match the request Host header
- Service in a different namespace referenced without its full FQDN
- A Sidecar resource restricting egress

Drill 8: mTLS Verification

Difficulty: Medium

Q: How do you verify that mTLS is actually enabled between two services?

Answer
# 1. Check PeerAuthentication policy
kubectl get peerauthentication -A

# 2. Check the effective mTLS mode for a workload
istioctl x describe pod <pod-name> -n production
# (older releases used `istioctl authn tls-check`, removed in Istio 1.5)

# 3. Check the proxy config for mTLS settings
istioctl proxy-config clusters deploy/my-app -n production -o json | \
  jq '.[] | select(.name | contains("backend-svc")) | .transportSocket'

# 4. Verify with Kiali dashboard (if installed)
# Shows lock icon on edges between services

# 5. Check istio-proxy logs for TLS handshake
kubectl logs deploy/my-app -c istio-proxy | grep -i tls
mTLS modes:
- `STRICT` — only accept mTLS traffic
- `PERMISSIVE` — accept both plaintext and mTLS (migration mode)
- `DISABLE` — no mTLS
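If you want to see the certificates Envoy is actually holding rather than infer from logs, `istioctl proxy-config secret` dumps the workload's cert chain (assumes an istioctl version matching your mesh):

```shell
# List the identity and root certificates loaded into the sidecar
istioctl proxy-config secret deploy/my-app -n production
# A valid ROOTCA entry plus a short-lived "default" cert means identities are issued
```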

Drill 9: Retry Configuration

Difficulty: Easy

Q: Configure Istio to retry failed requests to backend-svc up to 3 times with a 2-second timeout per attempt.

Answer
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-svc
spec:
  hosts: ["backend-svc"]
  http:
  - timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: gateway-error,connect-failure,refused-stream,5xx
    route:
    - destination:
        host: backend-svc
`retryOn` options:
- `5xx` — retry on any 5xx response
- `gateway-error` — 502, 503, 504 only
- `connect-failure` — connection failed
- `refused-stream` — REFUSED_STREAM error
- `retriable-4xx` — retry on 409
- `reset` — connection reset

Budget check: 3 attempts × 2s `perTryTimeout` = 6s worst case, which fits inside the 10s overall `timeout`; keep that inequality true when tuning either value.

Drill 10: Sidecar Resource for Namespace Isolation

Difficulty: Hard

Q: Limit the payment namespace to only communicate with payment and database namespaces. Block all other egress.

Answer
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: payment
spec:
  egress:
  - hosts:
    - "./*"                      # Same namespace
    - "database/*"               # Database namespace
    - "istio-system/*"           # Required for mesh function
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY          # Block unknown destinations
This configures the Envoy sidecar in every pod in the `payment` namespace to only know about services in `payment`, `database`, and `istio-system`. Requests to other namespaces will fail.

Benefits:
- Security: limits blast radius
- Performance: smaller Envoy config (less memory, faster config pushes)
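With `REGISTRY_ONLY`, external destinations must also be declared explicitly or they are blocked. A sketch, assuming a hypothetical external payments API the namespace needs to reach:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-payments-api   # hypothetical external dependency
  namespace: payment
spec:
  hosts:
  - api.example-payments.com    # assumed external hostname
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
```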
