Service Mesh

10 cards: 🟢 3 easy | 🟡 4 medium | 🔴 3 hard

🟢 Easy (3)

1. What is a service mesh and what are its two main components?

A service mesh is a dedicated infrastructure layer for managing service-to-service communication. Its two main components are the data plane (sidecar proxies intercepting all traffic between pods) and the control plane (configures proxies, issues certificates, collects telemetry).

Remember: "Service mesh = infrastructure layer for service-to-service communication." It handles mTLS, retries, circuit breaking, and observability.

Name origin: The term "service mesh" was coined by Buoyant (Linkerd creators) in 2017. William Morgan wrote the original blog post defining it.

Fun fact: The concept was inspired by Twitter's Finagle RPC library, which handled similar concerns in Twitter's JVM microservices.

2. How does sidecar injection work in Istio and Linkerd?

Both use a Kubernetes mutating admission webhook to automatically inject a proxy container into pods. In Istio, you label the namespace (kubectl label namespace myapp istio-injection=enabled). In Linkerd, you annotate it (linkerd.io/inject=enabled). Each pod then has two containers: the app and the proxy.
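As a minimal sketch, the label and annotation from the answer can also be declared as Namespace manifests, one per mesh. The namespace name `myapp-linkerd` is a hypothetical placeholder, and the sketch assumes each namespace is enrolled in only one mesh.

```yaml
# Istio: automatic sidecar injection is switched on by a namespace *label*.
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    istio-injection: enabled
---
# Linkerd: injection is switched on by a namespace *annotation* instead.
# "myapp-linkerd" is a hypothetical namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: myapp-linkerd
  annotations:
    linkerd.io/inject: enabled
```

Because the webhook only acts when a Pod is created, Pods that already exist keep running without a proxy until they are recreated, for example with `kubectl rollout restart deployment -n myapp`.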

Under the hood: The mutating webhook intercepts Pod creation API calls and patches the Pod spec to add the proxy container before the Pod is scheduled.

3. What four signals does a service mesh provide for free, without application instrumentation?

Request rate (RPS per service/route), error rate (4xx/5xx percentages), latency (p50/p95/p99 per route), and topology (live service dependency graph).

Remember: "Sidecar = proxy container next to your app." It intercepts all network traffic transparently. Your app doesn't need code changes.

Example: Envoy is the most common sidecar proxy, used by Istio and others.

Name origin: The four golden signals come from Google's SRE book (2016): latency, traffic, errors, saturation. A mesh gives you three of four automatically.

🟡 Medium (4)

1. How does mutual TLS (mTLS) work in a service mesh?

The control plane acts as a Certificate Authority. Each proxy gets a short-lived TLS certificate tied to its ServiceAccount identity. Every connection between services is encrypted and mutually authenticated. Certificates rotate automatically (24h default in Istio). No application code changes required.

2. How do you implement a canary deployment in Istio using VirtualService and DestinationRule?

Define subsets in a DestinationRule (v1 and v2 with label selectors), then create a VirtualService that routes traffic by weight (e.g., 90% to subset v1, 10% to subset v2). Adjust weights to shift traffic gradually.
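A minimal sketch of the 90/10 split, assuming a hypothetical Service named `reviews` whose Pods carry a `version: v1` or `version: v2` label. It uses the `networking.istio.io/v1beta1` API; recent Istio releases also serve `networking.istio.io/v1`.

```yaml
# DestinationRule: name the two versions as subsets, selected by Pod label.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews            # hypothetical Service name
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
# VirtualService: send 90% of requests to v1 and 10% to the v2 canary.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
```

Shifting traffic is then a matter of editing the two `weight` values, which should add up to 100.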

Remember: "Control plane = brain, Data plane = muscle." Control plane configures the proxies; data plane (sidecars) handles actual traffic.

Example: Start at 95/5 (5% canary), monitor error rate, then 90/10, 75/25, 50/50, 0/100. Shift back to 100/0 if errors spike.

3. What are the key differences between Istio and Linkerd in terms of proxy technology and resource overhead?

Istio uses Envoy (C++) as its sidecar proxy with ~100MB per sidecar overhead. Linkerd uses linkerd2-proxy (Rust) with ~20MB per sidecar. Istio offers more features and configuration options; Linkerd is simpler and more opinionated with lower resource usage.

Remember: "mTLS in mesh = zero-trust networking." Every service proves its identity. No plaintext between services.

Gotcha: mTLS adds latency — measure the overhead before enabling mesh-wide.

Number anchor: Envoy (Istio) uses ~100MB per sidecar. linkerd2-proxy (Rust) uses ~20MB. For a cluster with 1000 pods, that's 100GB vs 20GB of overhead.

4. What is the difference between Permissive and Strict mTLS modes in Istio?

Permissive mode (the default) accepts both plaintext and mTLS connections, allowing gradual migration. Strict mode requires mTLS for all connections and rejects plaintext traffic. You enforce strict mode with a PeerAuthentication resource setting mtls.mode to STRICT.
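A minimal sketch of enforcing strict mode mesh-wide, assuming Istio's root namespace is the default `istio-system`. A PeerAuthentication in the root namespace with no workload selector (conventionally named `default`) applies to every workload in the mesh.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so the policy is mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plaintext; PERMISSIVE accepts both
```

To migrate gradually, the same resource can instead be created in a single application namespace, first with `mode: PERMISSIVE` and then `STRICT`, which scopes it to that namespace only.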

Remember: "Permissive = migration mode (accepts both). Strict = enforced (mTLS only)." Always start permissive, then switch to strict.

🔴 Hard (3)

1. How do you configure retries and timeouts in an Istio VirtualService, and what is the retryOn field used for?

In the VirtualService http route, set timeout (e.g., 5s), retries.attempts (e.g., 3), retries.perTryTimeout (e.g., 2s), and retries.retryOn (e.g., "5xx,reset,connect-failure") to specify which response codes or conditions trigger a retry. This prevents wasting retries on non-transient errors.
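A minimal sketch using the example values from the answer, assuming a hypothetical Service named `ratings`. Note that `timeout` and `retries` sit on the HTTP route, alongside `route`, not inside the destination.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
    - ratings                  # hypothetical Service name
  http:
    - route:
        - destination:
            host: ratings
      timeout: 5s              # overall budget for the request, retries included
      retries:
        attempts: 3            # maximum number of retries
        perTryTimeout: 2s      # budget for each individual attempt
        retryOn: "5xx,reset,connect-failure"   # only retry transient failures
```

The overall `timeout` bounds the whole request, retries included. With a 5s budget and a 2s `perTryTimeout`, only about two full attempts fit, so if every retry should get a chance, the overall timeout has to exceed `perTryTimeout` multiplied by the number of tries.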

Gotcha: Retries can amplify load during an outage. Always set retries.retryOn to specific error codes (5xx, reset) — never retry on 4xx (client errors).

2. How does Linkerd implement traffic splitting, and how does it differ from Istio's approach?

Linkerd uses the SMI (Service Mesh Interface) TrafficSplit CRD (provided in recent Linkerd releases by the linkerd-smi extension), which specifies a root service and weighted backends (e.g., my-service-v1 at 900m, my-service-v2 at 100m). Istio uses its own VirtualService and DestinationRule CRDs with subset-based routing. Linkerd's approach follows a vendor-neutral standard; Istio's is more feature-rich but Istio-specific, so the configuration does not carry over to other meshes.
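A minimal sketch of the 900m/100m split, assuming the SMI `split.smi-spec.io/v1alpha1` schema (which writes weights as Kubernetes quantities such as `900m`; later SMI versions use plain integers) and hypothetical Services `my-service`, `my-service-v1`, and `my-service-v2` in the same namespace.

```yaml
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: my-service-split       # hypothetical name
spec:
  # Root (apex) service that clients address; its traffic is split below.
  service: my-service
  backends:
    - service: my-service-v1
      weight: 900m             # relative weight, about 90% of traffic
    - service: my-service-v2
      weight: 100m             # relative weight, about 10% of traffic
```

Unlike Istio's subsets, each backend here is a separate Kubernetes Service, so v1 and v2 each need their own Service selecting the matching Pods.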

Name origin: SMI = Service Mesh Interface. A CNCF project that defines vendor-neutral CRDs for traffic management across different meshes.

3. Why can Istio's port naming requirement cause silent failures, and what must you do to avoid it?

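Istio decides how to proxy a Service port from its name prefix (e.g., http-web, grpc-api) or, on newer Kubernetes versions, from the port's appProtocol field. Without either, Istio falls back to automatic protocol detection, and traffic it cannot identify as HTTP or HTTP/2 is treated as opaque TCP, which means L7 features like retries, traffic splitting, and observability silently stop working for those ports. Always name ports with the correct protocol prefix (or set appProtocol).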
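A minimal sketch of a Service that declares its protocols explicitly, assuming a hypothetical `web` workload. The first two ports use the name-prefix convention; the third uses the `appProtocol` field instead.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                    # hypothetical Service
spec:
  selector:
    app: web
  ports:
    - name: http-web           # "http" prefix: Istio treats this port as HTTP
      port: 80
      targetPort: 8080
    - name: grpc-api           # "grpc" prefix: Istio treats this port as gRPC
      port: 9090
      targetPort: 9090
    - name: metrics            # no protocol prefix in the name...
      port: 9100
      appProtocol: http        # ...so the protocol is declared here instead
```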

Gotcha: This is one of the most common gotchas when adopting Istio. Unnamed ports can silently lose L7 features. Add `name: http-web` or `name: grpc-api` to every Service port.