
Portal | Level: L3: Advanced | Topics: Service Mesh, Kubernetes Networking | Domain: Kubernetes

Service Mesh (Istio / Linkerd) - Primer

Why This Matters

As microservices multiply, every team rediscovers the same problems: retries, timeouts, mutual TLS, canary routing, and observability. A service mesh moves this cross-cutting logic out of application code and into the infrastructure layer. Understanding service meshes is critical for anyone operating production Kubernetes at scale.

Core Concepts

What Is a Service Mesh?

A service mesh is a dedicated infrastructure layer for managing service-to-service communication. It works by injecting a sidecar proxy (usually Envoy) alongside every pod, forming a data plane. A control plane configures those proxies to enforce policies.

[Service A] <---> [Sidecar Proxy] <--- mesh ---> [Sidecar Proxy] <---> [Service B]
                                    |
                              [Control Plane]
                         (Istiod / Linkerd control)

Data Plane vs Control Plane

| Component | What it does | Examples |
| --- | --- | --- |
| Data plane | Proxies that intercept all traffic between pods | Envoy (Istio), linkerd2-proxy |
| Control plane | Configures proxies, issues certificates, collects telemetry | Istiod, Linkerd destination/identity |

Sidecar Injection

Both Istio and Linkerd automatically inject a proxy container into pods. This is done via a Kubernetes mutating admission webhook.

# Istio: label namespace for auto-injection
kubectl label namespace myapp istio-injection=enabled

# Linkerd: annotate namespace for auto-injection
kubectl annotate namespace myapp linkerd.io/inject=enabled

After injection, each pod has two containers: your app and the proxy.
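
To confirm injection took effect, list the pod's containers (the pod and namespace names below are placeholders):

```shell
# An injected pod shows the app container plus the proxy
# ("istio-proxy" for Istio, "linkerd-proxy" for Linkerd).
kubectl get pod my-pod -n myapp \
  -o jsonpath='{.spec.containers[*].name}'
```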

Mutual TLS (mTLS)

mTLS is often the killer feature: a service mesh gives you encryption and workload identity between all services without changing application code.

Fun fact: The term "service mesh" was coined by William Morgan, co-founder of Buoyant (the company behind Linkerd), in a 2017 blog post. Linkerd (originally written in Scala on the JVM) was the first service mesh, inspired by Twitter's Finagle library. Istio (Google, IBM, Lyft) launched the same year with Envoy as its data plane, and the "mesh wars" began.

How It Works

  1. Control plane acts as a Certificate Authority (CA)
  2. Each proxy gets a short-lived TLS certificate (identity = ServiceAccount)
  3. Every connection is encrypted and mutually authenticated
  4. Certificates rotate automatically (24h default in Istio)

Modes

| Mode | Behavior |
| --- | --- |
| Permissive (default in Istio) | Accept both plaintext and mTLS |
| Strict | Require mTLS; reject plaintext |

# Istio: enforce strict mTLS namespace-wide
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: myapp
spec:
  mtls:
    mode: STRICT
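
During a migration, a single workload can be exempted per port using Istio's portLevelMtls field. A hedged sketch (the selector label and port number are illustrative):

```yaml
# Keep STRICT namespace-wide, but let one legacy workload accept
# plaintext on a single port while clients are migrated.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-exception
  namespace: myapp
spec:
  selector:
    matchLabels:
      app: legacy-service   # placeholder workload label
  portLevelMtls:
    8080:
      mode: PERMISSIVE      # plaintext still accepted on this port only
```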

Traffic Management

Virtual Services & Destination Rules (Istio)

# Route 90% of traffic to v1, 10% to v2 (canary)
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
          weight: 90
        - destination:
            host: my-service
            subset: v2
          weight: 10
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
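
To check whether Istio actually accepted routing configuration like this, two read-only commands are useful (the pod name is a placeholder):

```shell
# Lint mesh configuration in the namespace (flags missing subsets,
# conflicting rules, unnamed ports, etc.)
istioctl analyze -n myapp

# Dump the routes Envoy received for a pod, to verify the 90/10
# weights reached the data plane
istioctl proxy-config routes my-service-pod -n myapp
```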

Traffic Splits (Linkerd)

Linkerd's classic traffic splitting uses the SMI (Service Mesh Interface) TrafficSplit CRD (newer releases also support the Gateway API HTTPRoute):

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: my-service
spec:
  service: my-service
  backends:
    - service: my-service-v1
      weight: 900
    - service: my-service-v2
      weight: 100
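
TrafficSplit weights are relative rather than percentages: only their ratio matters. A minimal Python sketch of the arithmetic (the service names are illustrative):

```python
def split_percentages(weights):
    """Convert relative TrafficSplit backend weights to percentages."""
    total = sum(weights.values())
    return {backend: 100 * w / total for backend, w in weights.items()}

# A 900 : 100 ratio yields the 90/10 canary split shown above
print(split_percentages({"my-service-v1": 900, "my-service-v2": 100}))
# {'my-service-v1': 90.0, 'my-service-v2': 10.0}
```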

Retries & Timeouts

# Istio: timeout + retries
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      route:
        - destination:
            host: my-service
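
The interaction between timeout, attempts, and perTryTimeout is easy to misread. A simplified Python sketch of the worst-case client wait, assuming attempts are retries in addition to the initial try and ignoring Envoy's retry backoff:

```python
def worst_case_wait(timeout_s, attempts, per_try_timeout_s):
    """Initial try plus `attempts` retries, each bounded by
    perTryTimeout; the route-level timeout caps the total."""
    return min(timeout_s, (1 + attempts) * per_try_timeout_s)

# With timeout=5s, attempts=3, perTryTimeout=2s: up to 8s of tries,
# but the client never waits past the 5s route timeout.
print(worst_case_wait(5.0, 3, 2.0))  # 5.0
```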

Observability

A mesh gives you golden signals for free (no instrumentation required):

| Signal | What you get |
| --- | --- |
| Request rate | RPS per service/route |
| Error rate | 4xx/5xx percentages |
| Latency | p50/p95/p99 per route |
| Topology | Live service dependency graph |

# Istio: Kiali dashboard
istioctl dashboard kiali

# Linkerd: built-in dashboard
linkerd viz dashboard

# Linkerd: CLI stats
linkerd viz stat deploy -n myapp

Istio vs Linkerd

| Aspect | Istio | Linkerd |
| --- | --- | --- |
| Proxy | Envoy (C++) | linkerd2-proxy (Rust) |
| Resource overhead | Higher (~100 MB per sidecar) | Lower (~20 MB per sidecar) |
| Complexity | More features, more config | Simpler, opinionated |
| Traffic management | VirtualService / DestinationRule | SMI TrafficSplit |
| mTLS | Automatic, configurable | Always on |
| Best for | Complex routing, multi-cluster | Simplicity, low overhead |

Under the hood: Istio's sidecar injection works via a Kubernetes mutating admission webhook. When a pod is created in a labeled namespace, the webhook intercepts the API request and adds an istio-proxy container and an istio-init init container (which sets up iptables rules to redirect all traffic through the proxy). This is why restarting pods after enabling injection is required — pods created before the webhook was active have no sidecar.

Gotcha: A common Istio debugging trap: port naming. Istio requires service ports to be named with a protocol prefix like http-web or grpc-api. If a port is unnamed or uses a non-standard prefix, Istio treats it as opaque TCP — no L7 routing, no retries, no metrics by route. This silent fallback to TCP causes "my VirtualService rules have no effect" incidents that waste hours.
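
A Service manifest illustrating the naming convention (port numbers and names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: myapp
spec:
  selector:
    app: my-service
  ports:
    - name: http-web   # "http-" prefix: L7 routing, retries, route metrics
      port: 80
      targetPort: 8080
    - name: grpc-api   # "grpc-" prefix: treated as gRPC
      port: 9090
```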

Common Pitfalls

  1. Sidecar not injected — Namespace not labeled/annotated. Pods created before injection was enabled need restart.
  2. Port naming — Istio requires ports named with protocol prefix (e.g., http-web, grpc-api). Unnamed ports are treated as TCP.
  3. Resource overhead — Each sidecar uses CPU/memory. Budget for it in resource requests.
  4. Init container ordering — The mesh init container must run before app init containers that need network.
  5. Health check bypass — Kubelet health checks may bypass the proxy. Istio rewrites probes automatically; Linkerd does not intercept them.

Interview tip: When asked "when would you use a service mesh?", the strongest answer focuses on the inflection point: "When you have enough services that implementing retries, timeouts, mTLS, and observability in each one becomes a maintenance burden." For 3-5 services, a mesh is overkill. For 20+, the cross-cutting concerns justify the operational overhead of sidecar management.

Installation Quick Reference

# Istio (istioctl)
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled

# Linkerd (CLI)
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -
linkerd check
