
Mental Model: Bulkhead

Category: Architecture & Design
Origin: Naval architecture (watertight compartments); applied to software by Michael Nygard in Release It! (2007)
One-liner: Partition resources by consumer or use case so that one consumer's misbehavior cannot exhaust resources needed by others.

The Model

A ship's hull is divided into watertight compartments — bulkheads. If one compartment floods, the others remain intact. The ship can lose a section and still float. Without bulkheads, any breach anywhere sinks the whole vessel. Naval engineers don't assume the hull is perfect; they design for partial failure.

The software bulkhead applies the same principle: instead of sharing a single pool of resources (threads, connections, memory, file descriptors) across all consumers or use cases, you partition those resources into isolated pools. If one consumer exhausts its pool — due to a slow dependency, a traffic spike, or a bug — other consumers continue operating normally, drawing from their own pool. The blast radius of a failure is bounded by the partition.

There are two primary forms. Thread pool bulkheads assign separate thread pools to different downstream dependencies. If database calls are slow and consume all threads in the database pool, HTTP client calls to a payment service continue on their own pool, unaffected. Connection pool bulkheads assign separate connection pools to different consumer types or tenants. A batch processing job that opens hundreds of database connections doesn't starve the real-time API of its connections.
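As a minimal sketch of the thread pool form (the pool sizes and the `query_db` / `charge_card` stand-ins are hypothetical, not from any particular codebase), each downstream dependency gets its own executor:

```python
from concurrent.futures import ThreadPoolExecutor

# One executor per downstream dependency — hypothetical sizes.
# Exhausting db_pool's 20 workers cannot touch payments_pool's 10.
db_pool = ThreadPoolExecutor(max_workers=20, thread_name_prefix="db")
payments_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix="payments")

def query_db(order_id: str) -> str:
    return f"order-{order_id}"      # stand-in for a (possibly slow) DB call

def charge_card(order_id: str) -> str:
    return f"charged-{order_id}"    # stand-in for a payment-service call

# Each call type is submitted to its own pool; a backlog of slow DB
# futures queues only in db_pool, never in payments_pool.
db_future = db_pool.submit(query_db, "42")
pay_future = payments_pool.submit(charge_card, "42")
result = (db_future.result(), pay_future.result())
```

The isolation comes entirely from the two separate executors; everything else is ordinary `concurrent.futures` usage.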

In Kubernetes, bulkheads manifest as resource quotas and limits at the namespace or pod level. A namespace for batch jobs has its own CPU and memory quota. A spike in batch job CPU usage doesn't steal CPU from the API serving namespace. Without namespace-level quotas, a misbehaving batch job can consume the entire node's resources, causing evictions of unrelated pods. Kubernetes ResourceQuotas and LimitRanges are bulkhead implementations.

The core trade-off is resource utilization versus isolation. Shared pools achieve higher average utilization — if one consumer is idle, another can use its allocation. Partitioned pools may leave capacity sitting idle. The bulkhead pattern accepts this cost deliberately: the guarantee of isolation is worth the overhead of over-provisioning. The appropriate degree of partitioning is a product of how much one consumer's failure is allowed to affect others, and how much spare capacity you can afford.
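The trade-off can be made concrete with a toy model, using the same illustrative numbers as the diagram below (a 100-slot shared pool versus a 40/50/10 partition, with a batch spike claiming 97 slots):

```python
import threading

def claim(sem: threading.Semaphore, n: int) -> int:
    """Try to claim n slots without blocking; return how many succeeded."""
    got = 0
    for _ in range(n):
        if sem.acquire(blocking=False):
            got += 1
    return got

# Shared pool: a batch spike of 97 leaves only 3 slots for the API.
shared = threading.BoundedSemaphore(100)
batch_got = claim(shared, 97)       # batch claims 97 of 100
api_got = claim(shared, 40)         # API wants 40, gets the 3 leftovers

# Partitioned pools: the same spike saturates only the batch pool.
api_pool = threading.BoundedSemaphore(40)
batch_pool = threading.BoundedSemaphore(50)
batch_got = claim(batch_pool, 97)   # capped at the batch pool's 50 slots
api_got = claim(api_pool, 40)       # API still gets its full 40
```

In the shared case the API is starved to 3 slots; in the partitioned case batch work is shed at 50 while the API keeps its guaranteed 40 — isolation purchased with the 47 batch requests that are rejected instead of absorbed.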

Visual

WITHOUT BULKHEADS (shared pool):
┌────────────────────────────────────────────────────────┐
│              Shared Thread Pool (100 threads)          │
│  [T][T][T][T][T][T][T][T][T][T][T][T][T][T][T]...     │
└────────────────────────────────────────────────────────┘
       ▲                    ▲                  ▲
  API requests          Batch jobs        Admin tasks
       │                    │
       │          Batch jobs slow; consume 97 threads
  API requests queue up → timeout → service appears down

WITH BULKHEADS (partitioned pools):
┌──────────────────┐  ┌──────────────────┐  ┌──────────────┐
│  API Pool        │  │  Batch Pool      │  │  Admin Pool  │
│  (40 threads)    │  │  (50 threads)    │  │  (10 threads)│
│  [T][T]...[T]    │  │  [T][T]...[T]    │  │  [T]...[T]   │
└──────────────────┘  └──────────────────┘  └──────────────┘
       ▲                    ▲                      ▲
  API requests          Batch jobs            Admin tasks
       │                    │
       │          Batch pool saturated → batch jobs queue/shed
  API pool unaffected → API continues serving normally

KUBERNETES RESOURCE QUOTA BULKHEAD:
┌─────────────────────────────────────────────────────────────┐
│  Namespace: api-serving                                     │
│  ResourceQuota: cpu=8, memory=16Gi, pods=20                 │
│  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │
│  Namespace: batch-processing                                │
│  ResourceQuota: cpu=16, memory=32Gi, pods=50                │
│  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │
│  Namespace: monitoring                                      │
│  ResourceQuota: cpu=2, memory=8Gi, pods=10                  │
└─────────────────────────────────────────────────────────────┘
Batch namespace saturated → api-serving namespace unaffected

When to Reach for This

  • A service has multiple distinct consumer types (real-time API, batch jobs, admin tools) that share underlying resources — a slow batch job should not degrade real-time API latency
  • You have a multi-tenant service where one tenant's high usage should not affect others' experience
  • You're designing connection pool sizing and have multiple downstream dependencies — a slow database shouldn't exhaust the connection pool used for cache lookups
  • Kubernetes workloads from different teams or functions run in the same cluster — namespace-level ResourceQuotas prevent any one team from consuming disproportionate cluster resources
  • You've experienced a failure where one slow dependency caused thread exhaustion that cascaded to unrelated functionality
  • You need to guarantee SLA tiers: premium customers get a dedicated resource pool; free-tier customers share a constrained pool

When NOT to Use This

  • Resources are so scarce that partitioning leaves any pool too small to be useful — a 10-thread pool split 5/5 may leave both consumers under-resourced when demand is even slightly uneven
  • The consumers are so similar in behavior that there's no meaningful isolation benefit — partitioning two identical workloads just creates two pools that both fill up at the same time
  • Over-partitioning creates operational complexity that exceeds the reliability benefit — managing 20 thread pools with different configurations is itself a failure mode (wrong sizing, forgotten pools)
  • Applying bulkheads without addressing the root cause of the resource exhaustion — a memory leak or infinite loop will fill any pool; bulkheads reduce blast radius but don't fix bugs

Applied Examples

Example 1: Kubernetes namespace ResourceQuota bulkheads

A platform team runs API workloads and ML training jobs in the same Kubernetes cluster. Without quotas, a data scientist submits a training job that requests 64 CPUs — the API pods get preempted due to resource pressure.

# api-serving namespace quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: api-serving-quota
  namespace: api-serving
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "20"
---
# batch-ml namespace quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-ml-quota
  namespace: batch-ml
spec:
  hard:
    requests.cpu: "32"
    requests.memory: 64Gi
    limits.cpu: "64"
    limits.memory: 128Gi
    pods: "50"

With these quotas, the ML training job can consume its entire namespace allocation, but the api-serving namespace has its own guaranteed resources. The API pods are not evicted. The Kubernetes scheduler enforces the bulkhead at the admission level — pods that would exceed the quota are rejected with a clear error rather than scheduling and consuming resources on the node.

Example 2: Thread pool bulkheads for microservice dependencies

A Go service calls two downstream services: an inventory database (sometimes slow) and a product catalog cache (always fast). Without bulkheads, slow database calls consume the goroutine pool for both.

// Two separate concurrency limits — bulkheads — implemented as buffered
// channel semaphores. Each dependency gets its own budget of slots.
var (
    // Inventory DB: up to 20 concurrent calls, sized for expected
    // concurrency plus tolerance for slowness.
    inventorySlots = make(chan struct{}, 20)

    // Catalog cache: fast calls, so a 50-slot budget sustains high throughput.
    catalogSlots = make(chan struct{}, 50)
)

// tryAcquire claims a slot without blocking; false means the bulkhead is
// full and the caller should shed load rather than queue behind slow work.
func tryAcquire(slots chan struct{}) bool {
    select {
    case slots <- struct{}{}:
        return true
    default:
        return false
    }
}

func GetProductDetails(productID string) (*ProductDetails, error) {
    // Inventory data is required: reject immediately when its bulkhead is full.
    if !tryAcquire(inventorySlots) {
        return nil, errors.New("inventory bulkhead full")
    }

    var (
        inventory *Inventory
        catalog   *CatalogEntry
        invErr    error
        wg        sync.WaitGroup
    )

    wg.Add(1)
    go func() {
        defer wg.Done()
        defer func() { <-inventorySlots }() // release the slot
        inventory, invErr = inventoryDB.Lookup(productID)
    }()

    // Catalog runs under its own bulkhead — inventory slowness cannot
    // occupy catalog slots. Catalog data is optional, so a full catalog
    // bulkhead means degraded results, not an error.
    if tryAcquire(catalogSlots) {
        wg.Add(1)
        go func() {
            defer wg.Done()
            defer func() { <-catalogSlots }()
            catalog, _ = catalogCache.Get(productID)
        }()
    }

    wg.Wait()
    if invErr != nil {
        return nil, fmt.Errorf("inventory lookup failed: %w", invErr)
    }
    return merge(inventory, catalog), nil // nil catalog → degraded data
}

When the inventory database becomes slow, the inventory pool saturates. New inventory work is rejected immediately with a pool-full error (shedding load) rather than queueing behind slow calls. The catalog pool is completely unaffected, and fallback behavior can be tuned independently per pool.

Example 3: Tenant-level bulkheads in a SaaS API

A multi-tenant SaaS platform serves both free-tier and enterprise customers from the same API fleet. A free-tier user runs an automated script that fires 500 requests/second, saturating the shared connection pool. Enterprise customers experience degraded latency.

The fix: assign separate connection pools — and separate API worker pools — per customer tier:

import threading
from queue import Queue, Full


class BulkheadFullError(Exception):
    """Raised when a tier's worker pool cannot accept more work."""

class TieredWorkerPool:
    """Bulkhead: separate worker capacity per customer tier."""

    POOL_SIZES = {
        "enterprise": 80,   # 80 workers dedicated to enterprise
        "pro":        60,   # 60 workers for pro tier
        "free":       20,   # 20 workers max for free tier (hard cap)
    }

    def __init__(self):
        self.queues = {
            tier: Queue(maxsize=size * 2)  # queue = 2x workers
            for tier, size in self.POOL_SIZES.items()
        }
        self.workers = {}
        for tier, size in self.POOL_SIZES.items():
            self.workers[tier] = [
                threading.Thread(
                    target=self._worker,
                    args=(self.queues[tier],),
                    daemon=True,
                    name=f"worker-{tier}-{i}"
                )
                for i in range(size)
            ]
            for w in self.workers[tier]:
                w.start()

    def submit(self, tier: str, task):
        queue = self.queues.get(tier, self.queues["free"])
        try:
            queue.put_nowait(task)
        except Full:
            raise BulkheadFullError(f"Tier '{tier}' worker pool is at capacity")

    def _worker(self, queue: Queue):
        while True:
            task = queue.get()
            try:
                task()
            finally:
                queue.task_done()

When the free-tier script saturates the free pool (20 workers, 40-item queue), subsequent free-tier requests get BulkheadFullError immediately — not a timeout. Enterprise workers are unaffected; they draw from their own 80-worker pool. The platform team can tune each pool independently without touching the others.
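The fast-rejection behavior rests on `Queue.put_nowait` raising `Full` the moment a bounded queue is at capacity. A standalone sketch (the 2-slot queue is illustrative):

```python
from queue import Queue, Full

# A bounded queue refuses work immediately instead of letting callers wait.
free_queue = Queue(maxsize=2)   # hypothetical tiny free-tier queue
accepted, rejected = 0, 0
for task_id in range(5):
    try:
        free_queue.put_nowait(task_id)
        accepted += 1
    except Full:
        rejected += 1
# accepted == 2, rejected == 3
```

This is what turns saturation into a fast, explicit error that callers can handle, rather than a timeout they discover thirty seconds later.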

The Junior vs Senior Gap

Junior: Creates one connection pool for the database and configures it once globally.
Senior: Creates separate connection pools for read-heavy queries, write operations, and admin tasks.

Junior: Runs all Kubernetes workloads in the default namespace without resource quotas.
Senior: Namespaces workloads by function with ResourceQuotas; makes quota violations immediately visible.

Junior: Investigates why the API is slow; discovers a batch job is consuming all DB connections.
Senior: Proactively separates batch and API connection pools before the incident occurs.

Junior: Sizes thread pools by "what the docs suggested."
Senior: Sizes thread pools based on measured concurrency needs plus headroom; uses metrics to right-size over time.

Junior: Treats resource exhaustion as a capacity problem (add more servers).
Senior: Treats resource exhaustion as a partitioning problem: who is consuming what, and should they be isolated?

Junior: Defines bulkheads after an incident.
Senior: Defines bulkhead boundaries during service design, as part of the failure mode analysis.

Connections

  • Complements: Circuit Breaker — bulkheads limit the resources any one consumer can exhaust; circuit breakers stop calls to a failing dependency before resources are consumed at all. Both defend against cascading failure, at different layers.
  • Complements: 12-Factor App — the 12-factor process model and port binding describe how to run services; bulkheads describe how much of the shared infrastructure each process type is allowed to consume.
  • Tensions: Idempotency — bulkheads shed requests rather than queue them, and shed requests only recover if clients retry them safely. Ensure clients understand when to retry a rejected request versus treat it as a permanent failure.
  • Topic Packs: kubernetes
  • Case Studies: node-pressure-evictions (node pressure evictions are the Kubernetes-level consequence of missing bulkheads — pods that share a node without resource limits allow one workload to starve others until the kubelet evicts them), resource-quota-blocking-deploy (ResourceQuotas that are too tight block legitimate deploys; bulkhead sizing must be calibrated, not just installed)