Pattern: Timeout Assumed = Not Executed

ID: FP-018 | Family: Split Brain | Frequency: Very Common | Blast Radius: Single Service to Multi-Service | Detection Difficulty: Actively Misleading

The Shape

A client sends a request and receives a timeout (or network error). The client assumes the request was never processed and retries. In fact, the server received and executed the request; only the response was lost in transit or arrived after the client had given up. The retry makes the server execute the same request a second time. For non-idempotent operations (creating a record, charging a card, sending an email, deducting inventory), this causes duplicate execution: two charges, two records, two emails.
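The sequence above can be reproduced in a few lines. This is a minimal simulation, not a real service: `PaymentServer`, `LostResponse`, and `naive_client` are hypothetical names invented for illustration, and the "timeout" is modeled as an exception raised after the side effect has already happened.

```python
class LostResponse(Exception):
    """Stands in for a client-side timeout: the reply never arrived."""


class PaymentServer:
    """Hypothetical server that performs the charge before replying."""

    def __init__(self):
        self.charges = []               # every charge actually executed
        self.drop_next_response = False

    def charge(self, user, amount):
        self.charges.append((user, amount))   # side effect happens first
        if self.drop_next_response:
            self.drop_next_response = False
            raise LostResponse("response lost in transit")
        return "ok"


def naive_client(server, user, amount, retries=1):
    """Treats a timeout as 'not executed' and simply resends."""
    for _ in range(retries + 1):
        try:
            return server.charge(user, amount)
        except LostResponse:
            continue                    # wrong assumption: nothing happened
    raise LostResponse("gave up")


server = PaymentServer()
server.drop_next_response = True        # the first reply will be lost
result = naive_client(server, "alice", 50)
print(result, server.charges)           # ok [('alice', 50), ('alice', 50)]
```

The client sees a single successful call, while the server has recorded two charges. That mismatch is the whole pattern.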

How You'll See It

In Kubernetes

A pod sends an HTTP POST to an upstream service, and the upstream is slow. The pod's HTTP client times out and retries the POST. The upstream executes both POSTs, so there are now two records in the database. The client timeout did not indicate failure; the upstream was just slow.

In Linux/Infrastructure

A deployment script runs terraform apply, and the network connection drops mid-apply. The script is re-run. Terraform state reflects only a partial apply: some resources were created in the first run, but the state recording them was never written. The second run may try to create the same resources again and fail with "resource already exists."

In Databases

A transaction commits on the server, but the commit acknowledgment is lost before it reaches the client. The client retries the transaction. If the transaction isn't idempotent (e.g., an INSERT without ON CONFLICT DO NOTHING on a unique key), the retry creates a duplicate row.
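A small sketch of the idempotent variant, using Python's built-in `sqlite3` module as a stand-in for the real database. The `orders` table and `insert_order` helper are hypothetical, and the example assumes the bundled SQLite is version 3.24 or newer, which is when `ON CONFLICT ... DO NOTHING` was added. The key point is that the client supplies the primary key, so the retry collides with the first insert instead of creating a new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER)")


def insert_order(order_id, amount):
    """Insert once; a repeat with the same order_id is silently ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
            "ON CONFLICT (order_id) DO NOTHING",
            (order_id, amount),
        )


insert_order("order-123", 50)   # first attempt: commit ack is "lost"
insert_order("order-123", 50)   # client retry with the same key

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)                # 1
```

This only works if the key is generated by the client before the first attempt. A server-generated auto-increment ID would give each retry a fresh key and defeat the constraint.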

In CI/CD

A CI pipeline step that provisions infrastructure times out because the cloud API responds slowly. The re-run creates a second copy of the provisioned resource (a second ELB, a second RDS instance) alongside the first. Costs double, and one resource is orphaned.
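One common way to make a provisioning step safe to re-run is to give the resource a deterministic name and look it up before creating it. The sketch below illustrates the idea against `FakeCloud`, a hypothetical in-memory stand-in and not a real cloud SDK; the resource name `ci-build-42-db` is likewise made up.

```python
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud API; not a real SDK."""

    def __init__(self):
        self.resources = {}   # resource_id -> name
        self._next_id = 0

    def find_by_name(self, name):
        return [rid for rid, n in self.resources.items() if n == name]

    def create(self, name):
        self._next_id += 1
        rid = f"res-{self._next_id}"
        self.resources[rid] = name
        return rid


def ensure_resource(cloud, name):
    """Return the existing resource with this name, or create it once."""
    existing = cloud.find_by_name(name)
    if existing:
        return existing[0]
    return cloud.create(name)


cloud = FakeCloud()
first = ensure_resource(cloud, "ci-build-42-db")    # original run (timed out)
second = ensure_resource(cloud, "ci-build-42-db")   # pipeline re-run
print(first, second, len(cloud.resources))          # res-1 res-1 1
```

Look-then-create protects sequential re-runs but is still racy if two runs overlap. Where the provider's API accepts a client-supplied idempotency token, that is generally the more robust option.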

The Tell

Duplicate records in the database, duplicate charges, duplicate emails, all originating from the same logical user action. Server request logs show the request executed twice; client logs show it sent twice (a retry after a timeout). Only the timestamps, seconds apart, distinguish the first execution from the retry.
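A rough way to hunt for this signature in server-side request logs is to group requests by logical action and flag consecutive pairs whose gap is close to the client timeout. The record shape (`user`, `action`, `amount`, `ts`), the `find_retry_duplicates` helper, and the `slack_s` tolerance are all assumptions for illustration; real logs would need their own grouping key.

```python
from collections import defaultdict


def find_retry_duplicates(requests, timeout_s, slack_s=5.0):
    """Flag consecutive identical requests spaced about one timeout apart.

    Returns (key, first_ts, second_ts) tuples where the gap lies between
    timeout_s and timeout_s + slack_s.
    """
    by_key = defaultdict(list)
    for req in requests:
        key = (req["user"], req["action"], req["amount"])
        by_key[key].append(req["ts"])

    suspects = []
    for key, stamps in by_key.items():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            if timeout_s <= later - earlier <= timeout_s + slack_s:
                suspects.append((key, earlier, later))
    return suspects


log = [
    {"user": "alice", "action": "charge", "amount": 50, "ts": 100.0},
    {"user": "alice", "action": "charge", "amount": 50, "ts": 132.0},
    {"user": "bob", "action": "charge", "amount": 20, "ts": 105.0},
]
suspects = find_retry_duplicates(log, timeout_s=30.0)
print(suspects)   # [(('alice', 'charge', 50), 100.0, 132.0)]
```

A gap that clusters just above the configured timeout points at retry-after-timeout. A user double-click or a race would typically show much smaller, irregular gaps.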

Common Misdiagnosis

| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Bug in application logic | Timeout + retry duplicates | Server received the request twice; client retried after a timeout |
| User double-clicked | Single user action, two server executions | Server-side logs show two requests from the same session within seconds |
| Race condition | Sequential retry of a non-idempotent operation | Second request arrives roughly `timeout_duration` after the first (plus any retry backoff) |

The Fix (Generic)

Immediate: Identify and deduplicate the affected records (using business-logic criteria: same user, same timestamp window, same amount).
Short-term: Make the endpoint idempotent: accept a client-generated idempotency_key header; store keys in a short-lived table; return the cached response if the key was already processed.
Long-term: Design all state-changing APIs as idempotent from the start; use INSERT ... ON CONFLICT DO NOTHING in databases; adopt the "at-least-once delivery + idempotent consumer" pattern for all critical operations.
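The short-term fix can be sketched as a handler that remembers each idempotency key and replays the stored response on a repeat. This is an in-memory illustration only: `IdempotentHandler`, its `ttl_s` default, and the response shape are hypothetical, and a production version would likely need to persist the key in the same transaction as the side effect and handle concurrent requests carrying the same key.

```python
import time


class IdempotentHandler:
    """Executes each idempotency key at most once within its TTL."""

    def __init__(self, ttl_s=24 * 3600, clock=time.monotonic):
        self._seen = {}        # idempotency_key -> (expires_at, response)
        self._ttl_s = ttl_s
        self._clock = clock
        self.executions = 0    # how many times the side effect really ran

    def handle(self, idempotency_key, payload):
        now = self._clock()
        # Drop expired keys so the store stays short-lived.
        self._seen = {k: v for k, v in self._seen.items() if v[0] > now}
        if idempotency_key in self._seen:
            return self._seen[idempotency_key][1]   # replay cached response
        response = self._execute(payload)
        self._seen[idempotency_key] = (now + self._ttl_s, response)
        return response

    def _execute(self, payload):
        self.executions += 1   # stands in for the real side effect
        return {"status": "charged", "amount": payload["amount"]}


handler = IdempotentHandler()
first = handler.handle("key-1", {"amount": 50})
retry = handler.handle("key-1", {"amount": 50})   # same key: no second charge
print(first == retry, handler.executions)         # True 1
```

Returning the cached response, rather than an error, matters: the retrying client finally receives the answer it missed the first time.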

Real-World Examples

Example 1: The payment client's timeout was 30s, but a slow network pushed the actual response time to 45s. The client retried, and the user was charged twice. Business impact: an immediate refund plus trust damage.
Example 2: A POST to the email notification service to send a welcome email timed out after 10s, and the client retried. The email service received and processed both requests, so the user got two "Welcome!" emails. The timeout hit the HTTP response, not the send.

War Story

Users were getting double-charged. We reviewed the payment code: no bugs. We reviewed the payment processor logs: two calls, 32 seconds apart, same amount, same card, and nothing to tie them together because we weren't using idempotency keys. Our client had a 30s timeout; the processor was slow that day. The "failed" first request had actually succeeded. Adding idempotency keys (the client generates a UUID per payment attempt; the processor deduplicates on it) reduced double-charges to zero. The pattern was in Stripe's documentation the whole time; we just hadn't read it.
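The client side of that fix is easy to get subtly wrong: the key has to be generated once per payment attempt, outside the retry loop, so that every retry carries the same key. A minimal sketch follows, where `FlakyProcessor` is a hypothetical stand-in for the processor (it deduplicates on the key and loses its first reply) and `pay` is an illustrative helper, not any real client library's API.

```python
import uuid


class FlakyProcessor:
    """Hypothetical processor: dedupes on the key, loses its first reply."""

    def __init__(self):
        self.charges = {}          # idempotency_key -> amount
        self._fail_first = True

    def charge(self, idempotency_key, amount):
        # setdefault records the charge only the first time a key is seen.
        self.charges.setdefault(idempotency_key, amount)
        if self._fail_first:
            self._fail_first = False
            raise TimeoutError("no response within client timeout")
        return "charged"


def pay(processor, amount, max_attempts=3):
    key = str(uuid.uuid4())        # generated once, outside the retry loop
    for _ in range(max_attempts):
        try:
            return processor.charge(key, amount)
        except TimeoutError:
            continue               # safe to resend: same key, same charge
    raise TimeoutError("payment status unknown after retries")


processor = FlakyProcessor()
status = pay(processor, 50)
print(status, len(processor.charges))   # charged 1
```

Generating the UUID inside the loop would hand the processor a new key on every retry and bring the double charge straight back.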

Cross-References

Topic Packs: distributed-systems, database-ops
Footguns: distributed-systems/footguns.md — "Timeout as 'request not executed'"
Related Patterns: FP-009 (retry storm — same retry mechanism at scale), FP-016 (dual-write divergence — another non-atomic write problem)