Pattern: Cache Stampede¶
ID: FP-010 Family: Thundering Herd Frequency: Common Blast Radius: Multi-Service Detection Difficulty: Moderate
The Shape¶
A cached item expires or is evicted. Simultaneously, multiple callers (goroutines, threads, pods) find the cache miss and all query the underlying data source to repopulate the cache. The data source receives N identical requests at once, where N is the number of concurrent requesters. The cache existed to protect the data source from exactly this load; its momentary absence triggers the overload.
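The shape is easy to reproduce in-process. A minimal sketch (names like `cache` and `fetch_from_backend` are illustrative, not from any real codebase): N threads discover the same miss together, and a naive read-through cache sends all N to the backend.

```python
import threading
import time

cache = {}
backend_calls = 0
count_lock = threading.Lock()

def fetch_from_backend(key):
    """Stand-in for the expensive data-source query."""
    global backend_calls
    with count_lock:
        backend_calls += 1
    time.sleep(0.05)  # repopulation is slow enough for everyone to miss
    return f"value-for-{key}"

def get(key):
    # Naive read-through cache: every concurrent miss hits the backend.
    if key not in cache:
        cache[key] = fetch_from_backend(key)
    return cache[key]

N = 20
barrier = threading.Barrier(N)

def caller():
    barrier.wait()  # all N callers discover the miss at the same instant
    get("catalog")

threads = [threading.Thread(target=caller) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"backend calls: {backend_calls}")  # approaches N, not 1
```

Because each fetch takes 50ms, every thread checks the cache before any thread has repopulated it, so the backend call count approaches N.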
How You'll See It¶
In Kubernetes¶
Cache pod (Redis, Memcached) restarts or flushes. All application pods simultaneously hit the database backend to repopulate their caches. Database CPU spikes to 100%; application pods experience elevated latency; in severe cases, pods OOMKill or enter CrashLoopBackOff due to connection pool exhaustion (see FP-002).
In Linux/Infrastructure¶
A large in-memory dataset expires from a local application cache (e.g., a Python dict with TTL). Multiple request handler threads simultaneously query the database. Normally the cache absorbs 99% of reads; 1% reach the DB. During the stampede, 100% hit the DB for 500ms until the cache is repopulated.
In CI/CD¶
Build cache (e.g., Docker layer cache, Gradle cache) is invalidated by a base image update. 50 parallel builds all simultaneously pull the base image from the registry. Registry bandwidth is saturated for 5 minutes.
The Tell¶
Cache hit rate drops to near zero for a brief period. Data source (database, API) request rate spikes sharply at the same moment. Duration of the overload is proportional to the time to repopulate the cache, not the time to serve a single request.
Common Misdiagnosis¶
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Database overload | Cache stampede | Cache eviction/restart timestamp matches DB spike timestamp |
| Traffic spike | Cache miss amplification | Actual user traffic flat; DB spike is a multiple of normal cache-miss rate |
| Slow query | Simultaneous identical queries | pg_stat_activity shows dozens of identical queries at the same instant |
The Fix (Generic)¶
- Immediate: Manually repopulate the cache with a single process; block concurrent repopulation (mutex/lock).
- Short-term: Implement "probabilistic early expiry" (each reader refreshes the item with a probability that rises as the TTL nears expiry, so one caller refreshes it before mass expiry); add a mutex around cache-miss repopulation (the single-flight pattern).
- Long-term: Use a singleflight or coalescing cache; stagger TTLs (add `random(0, TTL*0.1)` to each cached item's TTL to prevent synchronized expiry).
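The single-flight fix can be sketched as a lock around repopulation with a double-check after acquiring it, so the backend sees one request per miss regardless of how many callers pile up. This is a minimal in-process sketch; names are illustrative, and a production version would use per-key locks (or Go's `singleflight` package) rather than one global mutex.

```python
import threading
import time

cache = {}
backend_calls = 0
count_lock = threading.Lock()
repopulate_lock = threading.Lock()  # serializes cache repopulation

def fetch_from_backend(key):
    """Stand-in for the expensive data-source query."""
    global backend_calls
    with count_lock:
        backend_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with repopulate_lock:
        # Double-check: another caller may have repopulated while we waited.
        if key not in cache:
            cache[key] = fetch_from_backend(key)
    return cache[key]

N = 20
barrier = threading.Barrier(N)

def caller():
    barrier.wait()  # all N callers discover the miss together
    get("catalog")

threads = [threading.Thread(target=caller) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"backend calls: {backend_calls}")  # 1, regardless of N
```

The waiting callers block on the mutex instead of hitting the backend; when the first fetch completes, they find the key populated and return immediately.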
Real-World Examples¶
- Example 1: Redis was flushed for maintenance. 200 application pods simultaneously queried PostgreSQL for a product catalog (10,000 rows): 200 identical full-table reads at once, one per pod. Postgres CPU hit 100%; pods timed out waiting for responses.
- Example 2: CDN cache expired for a popular article at exactly midnight (round-number TTL). 1,200 simultaneous readers triggered 1,200 origin requests. Origin rate limit kicked in; 90% of readers received 429s.
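Example 2's round-number midnight expiry is exactly what the TTL-jitter fix prevents. A minimal sketch of `random(0, TTL*0.1)` jitter (function name is illustrative; the seed is fixed only to make the example reproducible):

```python
import random

def jittered_ttl(base_ttl):
    # Spread expiries over [base_ttl, base_ttl * 1.1) instead of a single instant.
    return base_ttl + random.uniform(0, base_ttl * 0.1)

random.seed(7)  # fixed seed for a reproducible example only
ttls = [jittered_ttl(3600) for _ in range(5)]
print(ttls)  # five distinct expiries between 3600s and 3960s
```

Items cached at the same moment now expire at different times, so repopulation load trickles in rather than arriving as one synchronized wave.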
War Story¶
We scheduled the Redis flush for 2am "off-peak." At 2:00:00, the cache was cleared. By 2:00:01, our Datadog dashboard showed database connections at 847 (max 1000). By 2:00:03, connection pool exhausted; apps returning 503. The "quiet" time had about 200 background jobs all running simultaneously, all warming their own caches. We didn't anticipate that "off-peak" for user traffic didn't mean "off-peak" for background jobs. Fix: pre-warm the cache with a single script before flushing the old one, and use a read-through cache with a mutex.
Cross-References¶
- Topic Packs: distributed-systems, database-ops
- Footguns: distributed-systems/footguns.md — "Cache stampede on cold start"
- Related Patterns: FP-009 (retry storm — same thundering herd mechanism), FP-011 (restart avalanche — same shape in K8s)