Pattern: Dual-Write Divergence
ID: FP-016 | Family: Split Brain | Frequency: Common | Blast Radius: Multi-Service | Detection Difficulty: Actively Misleading
The Shape
An application writes the same logical update to two separate stores (e.g., a database and a cache, or two databases in different regions) in a non-atomic sequence. If the second write fails or is delayed, the two stores hold different values for the same logical entity, and a read returns a different answer depending on which store serves it. The system appears to work, with no errors thrown, but silently serves inconsistent data.
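A minimal sketch of the shape, using in-memory stand-ins rather than real clients. `FlakyCache`, `CacheUnavailable`, and `save_profile_dual_write` are hypothetical names invented for this illustration; a plain dict plays the database.

```python
import logging

logger = logging.getLogger("dual_write_demo")


class CacheUnavailable(Exception):
    """Stand-in for a cache client timeout (hypothetical, not a real library error)."""


class FlakyCache:
    """In-memory cache whose writes can be forced to fail."""

    def __init__(self):
        self.data = {}
        self.fail_writes = False

    def set(self, key, value):
        if self.fail_writes:
            raise CacheUnavailable("simulated timeout")
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def save_profile_dual_write(db, cache, user_id, profile):
    """Anti-pattern: two independent writes, second error swallowed."""
    db[user_id] = profile                # write 1: system of record
    try:
        cache.set(user_id, profile)      # write 2: not atomic with write 1
    except CacheUnavailable:
        # Logged, but the caller never learns the stores now disagree.
        logger.warning("cache write failed for %s", user_id)


db = {}
cache = FlakyCache()
save_profile_dual_write(db, cache, "u1", {"name": "old"})

cache.fail_writes = True                 # simulate the cache outage
save_profile_dual_write(db, cache, "u1", {"name": "new"})

print(db["u1"])          # {'name': 'new'}
print(cache.get("u1"))   # {'name': 'old'}
```

Neither call raises, yet the two stores now return different values for the same key.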
How You'll See It
In Kubernetes
A user profile is updated in Postgres, and the update is then written to the Redis cache. The network call to Redis times out, and the error is swallowed (logged but not returned to the user). Subsequent reads from the Redis cache return the old profile, so the user sees their update revert. The application logs show no error, only a warning about the Redis write.
In Linux/Infrastructure
The application writes to a local database and also sends an event to a message queue. The broker is temporarily unavailable, so the local write succeeds but the event is never sent. Downstream consumers never learn about the change, and system state diverges.
In Networking
A DNS record is updated on the primary DNS server. The zone transfer to the secondary fails (network issue between the DNS servers). The primary returns the new record; the secondary returns the old one. Every query answered by the secondary (about 50% if queries split evenly between the two servers) gets the wrong answer.
The Tell
Two reads of the same entity from different sources return different values. Application logs show a warning for the second write (the ignored error) near the time the divergence began. The divergence is permanent (it does not self-heal) unless something reconciles the stores, such as a reconciliation job or cache expiry.
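Checking for the tell can be as simple as comparing the two stores key by key. The sketch below assumes both stores can be read as plain mappings; `find_divergence` is a hypothetical helper, not part of any library.

```python
def find_divergence(source_of_truth, replica):
    """Return {key: (truth_value, replica_value)} for keys whose values differ.

    Keys missing from the replica are skipped: for a cache, a miss is
    harmless because the next read falls through to the source of truth.
    """
    diverged = {}
    for key, truth_value in source_of_truth.items():
        if key in replica and replica[key] != truth_value:
            diverged[key] = (truth_value, replica[key])
    return diverged


db = {"u1": {"name": "new"}, "u2": {"name": "bob"}, "u3": {"name": "eve"}}
cache = {"u1": {"name": "old"}, "u2": {"name": "bob"}}   # u3 is only a miss

stale = find_divergence(db, cache)
print(stale)   # {'u1': ({'name': 'new'}, {'name': 'old'})}
```

Run periodically with an alert on a non-empty result, the same comparison could serve as a basic reconciliation job. For stores where a missing key also counts as divergence (a search index, for example), the `key in replica` guard would need to go.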
Common Misdiagnosis
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Caching bug | Non-atomic dual-write failure | Manual check: DB and cache hold different values for the same key |
| Eventual consistency | Silent divergence | With eventual consistency, stores converge; dual-write failure is permanent without intervention |
| User error | Data was correctly written to one store | DB shows correct value; cache shows stale value |
The Fix (Generic)
- Immediate: Identify diverged records; manually sync them; flush/invalidate the stale cache entries.
- Short-term: Implement "cache-aside" pattern (read from DB, populate cache on miss) rather than dual-write; or use an outbox pattern (write to DB, let a background process populate cache from the DB change).
- Long-term: Never write to two systems without a transactional guarantee or at-least-once delivery; use the outbox pattern for cross-system consistency; add periodic reconciliation jobs that compare and alert on divergence.
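The outbox pattern mentioned above can be sketched with the standard-library `sqlite3` module standing in for the real database. The table names, `save_profile`, `relay_outbox`, and the `publish` callback are all assumptions made for this illustration. The point is that the data row and its outbox event commit in the same transaction, and a separate relay delivers events at least once.

```python
import json
import sqlite3


def init_schema(conn):
    conn.execute("CREATE TABLE profiles (user_id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " delivered INTEGER NOT NULL DEFAULT 0)"
    )


def save_profile(conn, user_id, profile):
    """Write the row and its outbox event in one transaction."""
    body = json.dumps(profile)
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT OR REPLACE INTO profiles (user_id, body) VALUES (?, ?)",
            (user_id, body),
        )
        conn.execute(
            "INSERT INTO outbox (user_id, body) VALUES (?, ?)", (user_id, body)
        )


def relay_outbox(conn, publish):
    """Deliver pending events in order; stop at the first failure."""
    rows = conn.execute(
        "SELECT id, user_id, body FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for event_id, user_id, body in rows:
        try:
            publish(user_id, json.loads(body))
        except Exception:
            break  # leave the event pending; the next run retries it
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (event_id,))
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
init_schema(conn)
cache = {}


def broken_publish(user_id, profile):
    raise ConnectionError("simulated cache outage")


save_profile(conn, "u1", {"name": "new"})
print(relay_outbox(conn, broken_publish))     # 0
print(relay_outbox(conn, cache.__setitem__))  # 1
print(cache)                                  # {'u1': {'name': 'new'}}
```

Because an event can be published and then fail to be marked delivered, the relay may deliver it more than once, so `publish` has to be idempotent. Overwriting a cache key, as here, is.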
Real-World Examples
- Example 1: E-commerce inventory: write to Postgres + write to Redis. Redis write fails silently. Product page (reads Redis) shows 10 in stock; checkout (reads Postgres) shows 2. Overselling occurs until cache expires.
- Example 2: User preference service: write to Postgres + Elasticsearch. Elasticsearch cluster slow during deploy. 3,000 records diverged between Postgres and Elasticsearch. Search returned stale preferences for 2 hours.
War Story
We got a bug report: "I saved my profile and it reverted." We looked at the code: it wrote to Postgres (success), then to Redis (timeout, swallowed). We had literally `except RedisError: logger.warning("Redis write failed")`. The user saw their Postgres write succeed, but every subsequent request hit Redis and got stale data. The Postgres write was correct. The user refreshed 10 times, each time seeing the old value, convinced the save was broken. We switched to cache-invalidate-on-write (delete the Redis key; next read repopulates from Postgres). The write failure became a cache miss, not a divergence.
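A rough sketch of what invalidate-on-write with a cache-aside read path can look like, again with in-memory stand-ins; this is not the code from the incident. `FlakyCache`, `CacheUnavailable`, `save_profile`, and `load_profile` are hypothetical names. One caveat the sketch makes explicit: a failed delete would still leave a stale entry, so the delete is not wrapped in a swallow-and-log handler, and a TTL on cache entries is a common backstop.

```python
class CacheUnavailable(Exception):
    """Stand-in for a cache client error (hypothetical)."""


class FlakyCache:
    """In-memory cache whose set() can be forced to fail."""

    def __init__(self):
        self.data = {}
        self.fail_sets = False

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        if self.fail_sets:
            raise CacheUnavailable("simulated timeout")
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


def save_profile(db, cache, user_id, profile):
    """Write to the system of record, then invalidate instead of updating."""
    db[user_id] = profile
    # Deliberately not swallowed: if a delete fails, the stale entry survives,
    # so the caller has to see the error (or retry).
    cache.delete(user_id)


def load_profile(db, cache, user_id):
    """Cache-aside read: serve from cache, fall back to the DB on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    profile = db[user_id]
    try:
        cache.set(user_id, profile)   # best effort; failure here is only a miss
    except CacheUnavailable:
        pass
    return profile


db, cache = {}, FlakyCache()
save_profile(db, cache, "u1", {"name": "old"})
load_profile(db, cache, "u1")            # first read populates the cache

cache.fail_sets = True                   # cache writes start timing out
save_profile(db, cache, "u1", {"name": "new"})
print(load_profile(db, cache, "u1"))     # {'name': 'new'}
print(cache.get("u1"))                   # None
```

With cache writes failing, the read still returns the fresh value from the database, and the cache simply stays empty for that key.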
Cross-References
- Topic Packs: distributed-systems, database-ops
- Footguns: distributed-systems/footguns.md — "Non-atomic read-modify-write across network"
- Related Patterns: FP-015 (stale leader — cluster-level dual-write), FP-025 (untested backup — another "believed consistent, wasn't" pattern)