Pattern: Timeout Assumed = Not Executed
ID: FP-018 | Family: Split Brain | Frequency: Very Common | Blast Radius: Single Service to Multi-Service | Detection Difficulty: Actively Misleading
The Shape
A client sends a request and receives a timeout (or network error). The client assumes the request was never processed and retries. In fact, the request was received and executed by the server; the response was lost in transit. The server now executes the same request twice. For non-idempotent operations (creating a record, charging a card, sending an email, deducting inventory), this causes duplicate execution — two charges, two records, two emails.
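The failure can be reproduced in a few lines. The following is a minimal sketch, not a real payment stack: `PaymentServer`, `LossyNetwork`, and `naive_client` are hypothetical names, and the "network" simply drops the first response after the server has already done the work.

```python
class PaymentServer:
    """Toy server that executes every request it receives."""

    def __init__(self):
        self.charges = []

    def charge(self, user, amount):
        self.charges.append((user, amount))
        return "ok"


class LossyNetwork:
    """Delivers every request, but loses the first `drop_responses` responses."""

    def __init__(self, server, drop_responses):
        self.server = server
        self.drop_responses = drop_responses

    def call(self, user, amount):
        result = self.server.charge(user, amount)  # the server-side work happens
        if self.drop_responses > 0:
            self.drop_responses -= 1
            raise TimeoutError("no response before the deadline")
        return result


def naive_client(network, user, amount, max_attempts=3):
    for _ in range(max_attempts):
        try:
            return network.call(user, amount)
        except TimeoutError:
            continue  # wrong assumption: "timeout means it never ran"
    raise TimeoutError("gave up after retries")


server = PaymentServer()
network = LossyNetwork(server, drop_responses=1)
naive_client(network, "alice", 50)
print(server.charges)  # [('alice', 50), ('alice', 50)]
```

One user action produced two charges, and nothing in the client's view of the world distinguishes this from a request that never arrived.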
How You'll See It
In Kubernetes
A pod sends an HTTP POST to an upstream service; the upstream is slow. The pod's HTTP client times out and the pod retries the POST. The upstream executed both POSTs, so the database now holds two records. Neither the client timeout nor a readiness probe timeout indicates that the upstream failed; it was just slow.
In Linux/Infrastructure
A deployment script runs terraform apply and the network connection drops mid-apply. The script is re-run. Resources were created in the first run, but the state was never written, so the Terraform state reflects only a partial apply. The second run may try to create the same resources again and fail with "resource already exists."
In Databases
A transaction is committed on the server; the commit acknowledgment is lost before
reaching the client. Client retries the transaction. If the transaction isn't idempotent
(e.g., INSERT without ON CONFLICT DO NOTHING), the retry creates a duplicate row.
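A minimal sketch of a retry-safe insert, using Python's built-in `sqlite3` module (this needs SQLite 3.24 or later for the `ON CONFLICT` clause). The `orders` table and the `request_id` column are assumptions made for illustration; the point is that the client supplies a stable identifier and the database enforces uniqueness on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (request_id TEXT PRIMARY KEY, item TEXT NOT NULL)"
)


def place_order(conn, request_id, item):
    """Safe to call again after a lost commit acknowledgment."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO orders (request_id, item) VALUES (?, ?) "
            "ON CONFLICT (request_id) DO NOTHING",
            (request_id, item),
        )


place_order(conn, "req-123", "widget")  # committed, but the ack was "lost"
place_order(conn, "req-123", "widget")  # client retry with the same request_id

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)  # 1
```

Without the `ON CONFLICT` clause the retry would raise an integrity error here; without the primary key on `request_id` it would silently insert a second row.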
In CI/CD
A CI pipeline step that provisions infrastructure times out because the cloud API is slow to respond. The re-run creates a second copy of the provisioned resource (a second ELB, a second RDS instance) alongside the first. Costs double, and one of the two resources is orphaned.
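One way to make a provisioning step safe to re-run is to give the resource a deterministic name and look it up before creating it. The sketch below uses `FakeCloud`, a hypothetical in-memory stand-in rather than any real cloud SDK, and the name `lb-build-4711` is invented for the example.

```python
class FakeCloud:
    """Hypothetical stand-in for a cloud API; not a real SDK."""

    def __init__(self):
        self.resources = {}  # resource id -> name
        self._next_id = 1

    def create(self, name):
        resource_id = f"res-{self._next_id}"
        self._next_id += 1
        self.resources[resource_id] = name
        return resource_id

    def find_by_name(self, name):
        return [rid for rid, n in self.resources.items() if n == name]


def ensure_resource(cloud, name):
    """Create the resource only if no resource with this name exists yet."""
    existing = cloud.find_by_name(name)
    if existing:
        return existing[0]  # adopt the resource left behind by the earlier run
    return cloud.create(name)


cloud = FakeCloud()
name = "lb-build-4711"  # deterministic: derived from the pipeline, not random

cloud.create(name)  # first run: created, but the step timed out before the id came back
adopted = ensure_resource(cloud, name)  # re-run finds the existing resource

print(len(cloud.resources))  # 1
```

Check-then-create is still racy if two runs execute concurrently, so this only covers the sequential re-run case described above. Where the provider's API accepts a client-supplied token for deduplication, that is the stronger option.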
The Tell
Duplicate records in the database, duplicate charges, duplicate emails — all originating from the same logical user action. Request logs on the server show the request executed twice; client logs show it was sent twice (retry after timeout). The first execution and the retry are differentiated only by timestamp (seconds apart).
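A rough way to surface candidates from a request log is to group by the fields that identify the logical action and flag neighbours that fall within a short window. This is a sketch under assumptions: the record fields (`id`, `user`, `action`, `amount`, `ts`) and the 60-second window are invented for illustration and would need to match your own log schema and client timeout.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_suspect_duplicates(requests, window=timedelta(seconds=60)):
    """Return (earlier_id, later_id) pairs that match on user, action, and amount
    and arrived within `window` of each other."""
    by_action = defaultdict(list)
    for r in requests:
        by_action[(r["user"], r["action"], r["amount"])].append(r)

    pairs = []
    for group in by_action.values():
        group.sort(key=lambda r: r["ts"])
        for earlier, later in zip(group, group[1:]):
            if later["ts"] - earlier["ts"] <= window:
                pairs.append((earlier["id"], later["id"]))
    return pairs


log = [
    {"id": "a1", "user": "u1", "action": "charge", "amount": 50,
     "ts": datetime(2024, 1, 1, 12, 0, 0)},
    {"id": "a2", "user": "u1", "action": "charge", "amount": 50,
     "ts": datetime(2024, 1, 1, 12, 0, 32)},
    {"id": "b1", "user": "u2", "action": "charge", "amount": 50,
     "ts": datetime(2024, 1, 1, 12, 0, 5)},
]

suspects = find_suspect_duplicates(log)
print(suspects)  # [('a1', 'a2')]
```

The output is a list of suspects, not a verdict: a user can legitimately buy the same thing twice, so each pair still needs to be checked against client-side retry logs.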
Common Misdiagnosis
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Bug in application logic | Timeout + retry duplicates | Server received request twice; client retried after timeout |
| User double-clicked | Single user action, two server executions | Server-side logs show two requests from same session within seconds |
| Race condition | Sequential retry of a non-idempotent operation | Second request arrives roughly one client timeout (plus any retry backoff) after the first |
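
The timing column of the table can be turned into a quick heuristic: a gap close to the client's timeout points at a retry, while a sub-second gap points elsewhere. The function name and the 5-second slack below are assumptions chosen for illustration, not a calibrated rule.

```python
def looks_like_timeout_retry(first_ts, second_ts, client_timeout_s, slack_s=5.0):
    """True when the second request arrived about one client timeout after the first.

    Timestamps are in seconds. `slack_s` allows for retry backoff and network delay.
    """
    gap = second_ts - first_ts
    return client_timeout_s <= gap <= client_timeout_s + slack_s


# Two requests 32 seconds apart, with a 30-second client timeout: consistent with a retry.
print(looks_like_timeout_retry(0.0, 32.0, client_timeout_s=30.0))  # True

# Two requests 0.4 seconds apart: more likely a double-click or a concurrency issue.
print(looks_like_timeout_retry(0.0, 0.4, client_timeout_s=30.0))  # False
```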
The Fix (Generic)
- Immediate: Identify and deduplicate the affected records (using business-logic criteria: same user, same timestamp window, same amount).
- Short-term: Make the endpoint idempotent: accept a client-generated idempotency_key header; store keys in a short-lived table; return the cached response if the key was already processed.
- Long-term: Design all state-changing APIs as idempotent from the start; use INSERT ... ON CONFLICT DO NOTHING in databases; adopt the "at-least-once delivery + idempotent consumer" pattern for all critical operations.
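The short-term fix can be sketched as a wrapper around the non-idempotent handler. This is a minimal in-memory illustration, not a production design: `IdempotentEndpoint`, `charge_card`, and the 24-hour TTL are assumptions, and a real service would keep the keys in a shared store and record the key atomically with the side effect.

```python
import time
import uuid


class IdempotentEndpoint:
    """Wraps a non-idempotent handler and deduplicates on a client-supplied key."""

    def __init__(self, handler, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._handler = handler
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # idempotency key -> (expires_at, cached response)

    def handle(self, idempotency_key, payload):
        now = self._clock()
        # Drop expired keys so the table stays short-lived.
        self._seen = {k: v for k, v in self._seen.items() if v[0] > now}
        if idempotency_key in self._seen:
            return self._seen[idempotency_key][1]  # replay the cached response
        response = self._handler(payload)
        self._seen[idempotency_key] = (now + self._ttl, response)
        return response


charges = []


def charge_card(payload):
    """Hypothetical non-idempotent operation: every call charges again."""
    charges.append(payload["amount"])
    return {"status": "charged", "charge_number": len(charges)}


endpoint = IdempotentEndpoint(charge_card)

# The client generates the key once per logical action and reuses it on every retry.
key = str(uuid.uuid4())
first = endpoint.handle(key, {"amount": 50})
retry = endpoint.handle(key, {"amount": 50})  # retry after a "timeout"

print(charges)         # [50]
print(first == retry)  # True
```

The key must be created outside the retry loop. A client that generates a fresh key per HTTP attempt gets no protection, because the server sees each retry as a new request.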
Real-World Examples
- Example 1: Payment service timeout at 30s. Network was slow (45s actual response time). Client retried. User was charged twice. Business impact: immediate refund + trust damage.
- Example 2: Email notification service: POST request to send welcome email timed out after 10s. Retry sent. Email service received both; user got two "Welcome!" emails. Both had been processed; the timeout was in the HTTP response, not the send.
War Story
Users were getting double-charged. We reviewed the payment code: no bugs. We reviewed the payment processor logs: two calls, 32 seconds apart, same amount, same card, and nothing tying them together because we weren't sending idempotency keys. Our client had a 30s timeout; the processor was slow that day. The "failed" first request had actually succeeded. Adding idempotency keys (the client generates a UUID per payment attempt and reuses it on retries; the processor deduplicates on it) reduced double-charges to zero. The pattern was in Stripe's documentation the whole time; we just hadn't read it.
Cross-References
- Topic Packs: distributed-systems, database-ops
- Footguns: distributed-systems/footguns.md — "Timeout as 'request not executed'"
- Related Patterns: FP-009 (retry storm — same retry mechanism at scale), FP-016 (dual-write divergence — another non-atomic write problem)