
Portal | Level: L3: Advanced | Topics: Service Mesh, Kubernetes Networking | Domain: Kubernetes

Service Mesh (Istio / Linkerd) - Primer

Why This Matters

As microservices multiply, every team rediscovers the same problems: retries, timeouts, mutual TLS, canary routing, and observability. A service mesh moves this cross-cutting logic out of application code and into the infrastructure layer. Understanding service meshes is critical for anyone operating production Kubernetes at scale.

Core Concepts

What Is a Service Mesh?

A service mesh is a dedicated infrastructure layer for managing service-to-service communication. It works by injecting a sidecar proxy (usually Envoy) alongside every pod, forming a data plane. A control plane configures those proxies to enforce policies.

[Service A] <---> [Sidecar Proxy] <--- mesh ---> [Sidecar Proxy] <---> [Service B]
                                    |
                              [Control Plane]
                         (Istiod / Linkerd control)

Data Plane vs Control Plane

| Component | What it does | Examples |
| --- | --- | --- |
| Data plane | Proxies that intercept all traffic between pods | Envoy (Istio), linkerd2-proxy |
| Control plane | Configures proxies, issues certificates, collects telemetry | Istiod, Linkerd destination/identity |

Sidecar Injection

Both Istio and Linkerd automatically inject a proxy container into pods. This is done via a Kubernetes mutating admission webhook.

# Istio: label namespace for auto-injection
kubectl label namespace myapp istio-injection=enabled

# Linkerd: annotate namespace for auto-injection
kubectl annotate namespace myapp linkerd.io/inject=enabled

After injection, each pod has two containers: your app and the proxy.
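
To confirm injection took effect, list the pod's containers (the pod and namespace names below are placeholders):

```shell
# An injected pod shows the app container plus the proxy
# ("istio-proxy" for Istio, "linkerd-proxy" for Linkerd).
kubectl get pod my-pod -n myapp \
  -o jsonpath='{.spec.containers[*].name}'
```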

Mutual TLS (mTLS)

mTLS is often the killer feature: a service mesh gives you encryption and workload identity between all services without changing application code.

Fun fact: The term "service mesh" was coined by William Morgan, co-founder of Buoyant (the company behind Linkerd), in a 2017 blog post. Linkerd (originally written in Scala on the JVM) was the first service mesh, inspired by Twitter's Finagle library. Istio (Google, IBM, Lyft) launched the same year with Envoy as its data plane, and the "mesh wars" began.

How It Works

  1. Control plane acts as a Certificate Authority (CA)
  2. Each proxy gets a short-lived TLS certificate (identity = ServiceAccount)
  3. Every connection is encrypted and mutually authenticated
  4. Certificates rotate automatically (24h default in Istio)

Modes

| Mode | Behavior |
| --- | --- |
| Permissive (default in Istio) | Accept both plaintext and mTLS |
| Strict | Require mTLS; reject plaintext |

# Istio: enforce strict mTLS namespace-wide
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: myapp
spec:
  mtls:
    mode: STRICT
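
During a migration, a single workload can be exempted per port using Istio's portLevelMtls field. A hedged sketch (the selector label and port number are illustrative):

```yaml
# Keep STRICT namespace-wide, but let one legacy workload accept
# plaintext on a single port while clients are migrated.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-exception
  namespace: myapp
spec:
  selector:
    matchLabels:
      app: legacy-service   # placeholder workload label
  portLevelMtls:
    8080:
      mode: PERMISSIVE      # plaintext still accepted on this port only
```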

Traffic Management

Virtual Services & Destination Rules (Istio)

# Route 90% of traffic to v1, 10% to v2 (canary)
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
          weight: 90
        - destination:
            host: my-service
            subset: v2
          weight: 10
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
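
To check whether Istio actually accepted routing configuration like this, two read-only commands are useful (the pod name is a placeholder):

```shell
# Lint mesh configuration in the namespace (flags missing subsets,
# conflicting rules, unnamed ports, etc.)
istioctl analyze -n myapp

# Dump the routes Envoy received for a pod, to verify the 90/10
# weights reached the data plane
istioctl proxy-config routes my-service-pod -n myapp
```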

Traffic Splits (Linkerd)

Linkerd's classic traffic splitting uses the SMI (Service Mesh Interface) TrafficSplit CRD (newer releases also support the Gateway API HTTPRoute):

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: my-service
spec:
  service: my-service
  backends:
    - service: my-service-v1
      weight: 900
    - service: my-service-v2
      weight: 100
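
TrafficSplit weights are relative rather than percentages: only their ratio matters. A minimal Python sketch of the arithmetic (the service names are illustrative):

```python
def split_percentages(weights):
    """Convert relative TrafficSplit backend weights to percentages."""
    total = sum(weights.values())
    return {backend: 100 * w / total for backend, w in weights.items()}

# A 900 : 100 ratio yields the 90/10 canary split shown above
print(split_percentages({"my-service-v1": 900, "my-service-v2": 100}))
# {'my-service-v1': 90.0, 'my-service-v2': 10.0}
```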

Retries & Timeouts

# Istio: timeout + retries
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      route:
        - destination:
            host: my-service
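
The interaction between timeout, attempts, and perTryTimeout is easy to misread. A simplified Python sketch of the worst-case client wait, assuming attempts are retries in addition to the initial try and ignoring Envoy's retry backoff:

```python
def worst_case_wait(timeout_s, attempts, per_try_timeout_s):
    """Initial try plus `attempts` retries, each bounded by
    perTryTimeout; the route-level timeout caps the total."""
    return min(timeout_s, (1 + attempts) * per_try_timeout_s)

# With timeout=5s, attempts=3, perTryTimeout=2s: up to 8s of tries,
# but the client never waits past the 5s route timeout.
print(worst_case_wait(5.0, 3, 2.0))  # 5.0
```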

Observability

A mesh gives you golden signals for free (no instrumentation required):

| Signal | What you get |
| --- | --- |
| Request rate | RPS per service/route |
| Error rate | 4xx/5xx percentages |
| Latency | p50/p95/p99 per route |
| Topology | Live service dependency graph |

# Istio: Kiali dashboard
istioctl dashboard kiali

# Linkerd: built-in dashboard
linkerd viz dashboard

# Linkerd: CLI stats
linkerd viz stat deploy -n myapp

Istio vs Linkerd

| Aspect | Istio | Linkerd |
| --- | --- | --- |
| Proxy | Envoy (C++) | linkerd2-proxy (Rust) |
| Resource overhead | Higher (~100 MB per sidecar) | Lower (~20 MB per sidecar) |
| Complexity | More features, more config | Simpler, opinionated |
| Traffic management | VirtualService / DestinationRule | SMI TrafficSplit |
| mTLS | Automatic, configurable | Always on |
| Best for | Complex routing, multi-cluster | Simplicity, low overhead |

Under the hood: Istio's sidecar injection works via a Kubernetes mutating admission webhook. When a pod is created in a labeled namespace, the webhook intercepts the API request and adds an istio-proxy container and an istio-init init container (which sets up iptables rules to redirect all traffic through the proxy). This is why restarting pods after enabling injection is required — pods created before the webhook was active have no sidecar.

Gotcha: A common Istio debugging trap: port naming. Istio requires service ports to be named with a protocol prefix like http-web or grpc-api. If a port is unnamed or uses a non-standard prefix, Istio treats it as opaque TCP — no L7 routing, no retries, no metrics by route. This silent fallback to TCP causes "my VirtualService rules have no effect" incidents that waste hours.
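
A Service manifest illustrating the naming convention (port numbers and names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: myapp
spec:
  selector:
    app: my-service
  ports:
    - name: http-web   # "http-" prefix: L7 routing, retries, route metrics
      port: 80
      targetPort: 8080
    - name: grpc-api   # "grpc-" prefix: treated as gRPC
      port: 9090
```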

Common Pitfalls

  1. Sidecar not injected — Namespace not labeled/annotated. Pods created before injection was enabled need restart.
  2. Port naming — Istio requires ports named with protocol prefix (e.g., http-web, grpc-api). Unnamed ports are treated as TCP.
  3. Resource overhead — Each sidecar uses CPU/memory. Budget for it in resource requests.
  4. Init container ordering — The mesh init container must run before app init containers that need network.
  5. Health check bypass — Kubelet health checks may bypass the proxy. Istio rewrites probes automatically; Linkerd does not intercept them.

Interview tip: When asked "when would you use a service mesh?", the strongest answer focuses on the inflection point: "When you have enough services that implementing retries, timeouts, mTLS, and observability in each one becomes a maintenance burden." For 3-5 services, a mesh is overkill. For 20+, the cross-cutting concerns justify the operational overhead of sidecar management.

Installation Quick Reference

# Istio (istioctl)
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled

# Linkerd (CLI)
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -
linkerd check
