Mental Model: Event Sourcing

Category: Architecture & Design

Origin: Domain-Driven Design community; Greg Young is credited with formalizing the pattern (~2010); informed by Martin Fowler's earlier writings on Event Sourcing

One-liner: Instead of storing the current state of an entity, store every event that caused the state to change — the current state is always derivable by replaying the event log.

The Model

Most databases store the current state of the world: a row in the orders table reflects what an order looks like right now. When an order is shipped, you update the status column from pending to shipped. The previous state — pending — is gone. You know what things are; you don't know what happened. Event sourcing inverts this: instead of storing the current state, you store every event that changed the state. The current state is not stored directly; it's computed by replaying events from the beginning (or from a snapshot).

The event log for an order might look like:

  - OrderPlaced { orderId: 123, items: [...], total: $47.50 } at T=0
  - PaymentConfirmed { orderId: 123, transactionId: "txn_abc" } at T+2s
  - ShipmentDispatched { orderId: 123, trackingId: "FEDEX-999" } at T+3d
  - OrderDelivered { orderId: 123 } at T+5d

To know the current state of order 123, replay these four events. The result is an Order object in Delivered state. You also know every transition it went through, who triggered it, and when. This history is the source of truth — it cannot be changed retroactively (events are immutable and append-only).
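Replay is just a left fold of an apply function over the event log. The sketch below shows this for the four-event stream above; the event classes and dict-based state are illustrative stand-ins, not a fixed API:

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical event classes mirroring the order stream above
@dataclass(frozen=True)
class OrderPlaced:
    order_id: int
    total: float

@dataclass(frozen=True)
class PaymentConfirmed:
    order_id: int
    transaction_id: str

@dataclass(frozen=True)
class ShipmentDispatched:
    order_id: int
    tracking_id: str

@dataclass(frozen=True)
class OrderDelivered:
    order_id: int

def apply(state, event):
    """One replay step: fold the next event into the derived state."""
    if isinstance(event, OrderPlaced):
        return {"order_id": event.order_id, "status": "pending", "total": event.total}
    if isinstance(event, PaymentConfirmed):
        return {**state, "status": "paid"}
    if isinstance(event, ShipmentDispatched):
        return {**state, "status": "shipped", "tracking_id": event.tracking_id}
    if isinstance(event, OrderDelivered):
        return {**state, "status": "delivered"}
    return state

events = [
    OrderPlaced(123, 47.50),
    PaymentConfirmed(123, "txn_abc"),
    ShipmentDispatched(123, "FEDEX-999"),
    OrderDelivered(123),
]

# Current state = left fold of apply over the event log
current = reduce(apply, events, None)
print(current["status"])  # delivered
```

Nothing about the derived state is stored here; dropping the last event and replaying again yields the order as it was while still in transit.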

The critical insight is that an append-only event log is the most durable and auditable form of storage. Traditional databases tell you the answer; event sourcing tells you the derivation. This matters enormously for audit trails (financial transactions, compliance, medical records), for debugging (replay the exact sequence of events that led to a bug), and for building new projections of old data (add a new read model and populate it by replaying historical events). Kafka, NATS JetStream, and EventStoreDB are infrastructure built for this pattern.

Event sourcing pairs naturally with CQRS (Command Query Responsibility Segregation): the write side appends events; the read side maintains one or more projections — materialized views built by consuming the event stream. A projection is an eventually consistent read model optimized for specific query patterns. You can have many projections from the same event stream, each shaped for a different consumer, without the event log caring.
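Two projections from one stream can be sketched in a few lines; the in-memory event list and field names below are illustrative stand-ins for a real Kafka or EventStoreDB subscription:

```python
from collections import defaultdict

# One event stream, consumed by two independently shaped read models
events = [
    {"type": "OrderPlaced", "order_id": 123, "total": 47.50},
    {"type": "ShipmentDispatched", "order_id": 123, "tracking": "FEDEX-999"},
    {"type": "OrderPlaced", "order_id": 124, "total": 12.00},
]

# Projection 1: status lookup (serves GET /orders/{id})
status = {}
# Projection 2: revenue aggregation (serves analytics)
revenue = defaultdict(float)

for event in events:
    if event["type"] == "OrderPlaced":
        status[event["order_id"]] = "pending"
        revenue["total"] += event["total"]
    elif event["type"] == "ShipmentDispatched":
        status[event["order_id"]] = "shipped"

print(status)            # {123: 'shipped', 124: 'pending'}
print(revenue["total"])  # 59.5
```

Each projection ignores event types it doesn't care about; adding a third read model means adding a third consumer loop, not touching the write side.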

The boundary condition where event sourcing creates problems is queries. You cannot efficiently query "all orders where status = shipped and total > $100" against an event log — that requires a projection. Every query pattern you need requires a maintained projection, which adds operational complexity. Event log storage also grows without bound — a system with billions of events and no retention strategy accumulates significant storage costs. Snapshots (periodic point-in-time state captures) mitigate replay latency, but add another artifact to manage. Event sourcing is overkill for simple CRUD applications where history has no business value.
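The snapshot mitigation can be sketched as follows; the `Snapshot` class, dict-based state, and `load` helper are illustrative assumptions, not a library API:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    state: dict
    version: int  # index of the last event folded into `state`

def apply(state, event):
    # Trivial event application for the sketch: each event overwrites keys
    return {**state, **event}

def load(snapshot, events):
    """Rebuild current state from a snapshot plus only the events after it."""
    state = dict(snapshot.state)
    for event in events[snapshot.version + 1:]:
        state = apply(state, event)
    return state

events = [{"status": "pending"}, {"status": "paid"}, {"status": "shipped"}]
snap = Snapshot(state={"status": "paid"}, version=1)  # captured after event 1
print(load(snap, events))  # {'status': 'shipped'}
```

With a snapshot every N events, replay cost is bounded by N regardless of how old the aggregate is — at the price of persisting and versioning the snapshot artifact itself.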

Visual

TRADITIONAL (MUTABLE STATE):
┌────────────────────────────────────────┐
│ orders table                           │
│                                        │
│   id  | status    | total | updated_at │
│   123 | delivered | 47.50 | 2026-03-15 │
│                                        │
│   (history gone)                       │
└────────────────────────────────────────┘

EVENT SOURCING (APPEND-ONLY EVENT LOG):
┌────────────────────────────────────────────────────────────┐
│ order_events (Kafka topic / EventStoreDB stream)           │
│                                                            │
│  T=0    OrderPlaced        { id:123, items:[...], $47.50 } │
│  T+2s   PaymentConfirmed   { id:123, txn:"txn_abc" }       │
│  T+3d   ShipmentDispatched { id:123, tracking:"FEDEX-999" }│
│  T+5d   OrderDelivered     { id:123 }                      │
└────────────────────────────────────────────────────────────┘
                   │ replay / project
                   ▼
┌────────────────────────────────┐  ┌─────────────────────────┐
│ Projection: order-status       │  │ Projection: analytics   │
│ (optimized for status lookup)  │  │ (optimized for revenue  │
│                                │  │  aggregation)           │
│   id  | status    | tracking   │  │  ...                    │
│   123 | delivered | FEDEX-999  │  │                         │
└────────────────────────────────┘  └─────────────────────────┘

CQRS + EVENT SOURCING FLOW:
  Client                  Command Side            Event Store
    │                         │                        │
    ├── POST /orders/ship ───►│                        │
    │                         ├── validate             │
    │                         ├── append event ───────►│ ShipmentDispatched
    │                         │◄───────────────────────┤ (offset: 42)
    │◄── 202 Accepted ────────┤                        │
    │                                                  │
    │                    Query Side                    │
    │                         │◄── consume events ─────┘
    │                         ├── update projection
    │◄── answer GET /orders/123 ─┤

SNAPSHOT OPTIMIZATION (avoids full replay for old aggregates):
  events[0..999]  Snapshot(state_at_999)  events[1000..now]
  Current state = apply(events[1000..now], starting_from: Snapshot)
flowchart LR
    subgraph Write Side
        CMD["Command\nPOST /orders/ship"] --> VAL[Validate]
        VAL --> ES[(Event Store\nappend-only log)]
    end

    subgraph Event Log
        E1["OrderPlaced"] --> E2["PaymentConfirmed"]
        E2 --> E3["ShipmentDispatched"]
        E3 --> E4["OrderDelivered"]
    end

    subgraph Read Side
        ES -->|consume events| P1["Projection:\norder-status"]
        ES -->|consume events| P2["Projection:\nanalytics"]
    end

When to Reach for This

  • You need a full, immutable audit trail — financial systems, medical records, compliance-heavy workflows where "what happened and when" is as important as "what is the current state"
  • You want to support temporal queries: "what was the state of this order on March 5th at 2 PM?" — impractical with mutable state alone, trivial with event sourcing
  • You're building in a domain where business events are the primary language (e.g., "OrderPlaced", "PaymentFailed", "ItemReturned") and the state model naturally follows
  • You need to support multiple read models of the same data without coupling their schemas to the write model — add a new projection without migrating the write side
  • You're building a distributed system where Change Data Capture (CDC) feeds are a natural integration pattern — Debezium + Kafka is event sourcing at the infrastructure level
  • You anticipate that the business rules for deriving state may change, and you want to retroactively apply new rules to historical events
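The temporal-query bullet above reduces to replaying only the events at or before a cutoff timestamp. A minimal sketch, with illustrative data rather than a real event store:

```python
from datetime import datetime

# Events stored in append order: (occurred_at, resulting status)
events = [
    (datetime(2026, 3, 1, 9, 0),  "pending"),
    (datetime(2026, 3, 4, 10, 0), "paid"),
    (datetime(2026, 3, 6, 16, 0), "shipped"),
]

def state_as_of(events, cutoff):
    """Answer 'what was the state at time T?' by replaying up to T."""
    state = None
    for occurred_at, status in events:
        if occurred_at > cutoff:
            break  # append order means everything after is also too late
        state = status
    return state

print(state_as_of(events, datetime(2026, 3, 5, 14, 0)))  # paid
```

The same mechanism powers retroactive rule changes: replay historical events through new derivation logic instead of a timestamp filter.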

When NOT to Use This

  • Simple CRUD applications where history has no business value — a user profile that just needs to be updated doesn't benefit from storing every field change; the complexity cost is not justified
  • Systems with extremely high event volume and no data retention budget — an IoT sensor emitting 100 events/second for 10,000 devices generates 86 billion events per day; you need a tiered retention and snapshotting strategy before adopting event sourcing
  • Teams unfamiliar with eventually consistent read models — projections are eventually consistent; if your application requires immediate read-after-write consistency in the same request, you need to design carefully or choose a different pattern
  • Queries are the primary access pattern and the query shapes are complex and varied — every query shape requires a maintained projection; a SQL database with good indexing is simpler for query-heavy workloads

Applied Examples

Example 1: Financial account ledger with Kafka

A fintech product tracks user balances. Using mutable state, a balance is a single number. With event sourcing, the balance is derived from a ledger of events:

# Events are plain data classes — immutable, serialized to Kafka
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass(frozen=True)
class MoneyDeposited:
    account_id: str
    amount: Decimal
    currency: str
    reference_id: str  # idempotency key
    occurred_at: datetime

@dataclass(frozen=True)
class MoneyWithdrawn:
    account_id: str
    amount: Decimal
    currency: str
    reference_id: str
    occurred_at: datetime

# Aggregate — reconstructed by replaying events
class Account:
    def __init__(self, account_id: str):
        self.account_id = account_id
        self.balance = Decimal("0.00")
        self.version = 0

    def apply(self, event) -> None:
        if isinstance(event, MoneyDeposited):
            self.balance += event.amount
        elif isinstance(event, MoneyWithdrawn):
            self.balance -= event.amount
        self.version += 1

def load_account(account_id: str, event_store) -> Account:
    account = Account(account_id)
    events = event_store.load_stream(f"account-{account_id}")
    for event in events:
        account.apply(event)
    return account

The event log is stored in Kafka (or EventStoreDB). To know the balance, replay events. To audit a disputed transaction, inspect the raw event stream. To build a "monthly statement" projection, consume the Kafka topic and aggregate events per calendar month.
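The monthly-statement projection might look like this sketch, where an in-memory list stands in for the Kafka topic and events are reduced to tuples for brevity:

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Stand-ins for the MoneyDeposited / MoneyWithdrawn events of Example 1
events = [
    ("deposit",  Decimal("100.00"), datetime(2026, 1, 10)),
    ("withdraw", Decimal("30.00"),  datetime(2026, 1, 20)),
    ("deposit",  Decimal("50.00"),  datetime(2026, 2, 3)),
]

# Projection: net movement per calendar month
statement = defaultdict(Decimal)
for kind, amount, occurred_at in events:
    month = occurred_at.strftime("%Y-%m")
    statement[month] += amount if kind == "deposit" else -amount

print(dict(statement))  # {'2026-01': Decimal('70.00'), '2026-02': Decimal('50.00')}
```

Because the ledger is the source of truth, this projection can be added months after launch and backfilled simply by consuming the topic from offset 0.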

Example 2: Change Data Capture as event sourcing at the infrastructure layer

A team has an existing PostgreSQL database for their orders service. They need to feed order state changes to a downstream analytics system and a fulfillment system. Rather than dual-writing from the application, they use Debezium (CDC) to capture every row change as a Kafka event:

// Debezium connector config
{
  "name": "orders-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.dbname": "orders",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_publication",
    "topic.prefix": "orders-db"
  }
}

Every INSERT, UPDATE, and DELETE on the orders table emits an event to Kafka topic orders-db.public.orders. Downstream consumers (analytics, fulfillment) consume this topic and build their own projections. The application team didn't change a single line of application code — the event stream is derived from the database's built-in WAL (Write-Ahead Log). This is the pragmatic, incremental version of event sourcing: the application uses mutable state, but the integration layer is event-sourced.
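A downstream consumer might project those CDC events like the sketch below. The envelope is trimmed to the `op` / `before` / `after` fields of Debezium's change-event payload (schema and source metadata omitted); the `project` helper and in-memory table are illustrative:

```python
import json

# A trimmed Debezium change event for an UPDATE on public.orders.
# "op" is c/u/d/r for create/update/delete/snapshot-read.
raw = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 123, "status": "pending"},
        "after":  {"id": 123, "status": "shipped"},
    }
})

def project(event_json, table):
    """Apply one CDC event to a projection keyed by primary key."""
    payload = json.loads(event_json)["payload"]
    if payload["op"] in ("c", "u", "r"):
        row = payload["after"]
        table[row["id"]] = row        # upsert the new row image
    elif payload["op"] == "d":
        table.pop(payload["before"]["id"], None)  # tombstone: drop the row

orders = {}
project(raw, orders)
print(orders[123]["status"])  # shipped
```

The fulfillment and analytics teams each run their own variant of this loop against the same topic, which is exactly the multiple-projections property of event sourcing.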

The Junior vs Senior Gap

  Junior: Stores current state and adds an updated_at column for auditing.
  Senior: Stores events as the source of truth; current state is derived, not stored.

  Junior: Queries the event log directly with complex WHERE clauses.
  Senior: Maintains purpose-built projections for each query pattern; never queries the raw event log for read operations.

  Junior: Replays all events from offset 0 on every request.
  Senior: Snapshots aggregates periodically; replays only events since the last snapshot.

  Junior: Assumes event sourcing means no schema changes ever.
  Senior: Designs event versioning and upcasting from day one; event schemas evolve, and old events must be readable by new code.

  Junior: Treats event sourcing as a storage format choice.
  Senior: Treats event sourcing as a design philosophy that shapes how business logic is expressed (commands, events, aggregates).

  Junior: Adopts event sourcing for a simple user settings service.
  Senior: Applies event sourcing selectively to domains where history and auditability have genuine business value.

Connections

  • Complements: Idempotency — in at-least-once delivery systems, the same event may be delivered multiple times, so event consumers must be idempotent: applying the same event twice produces the same state as applying it once. This is fundamental to building reliable event-sourced systems.
  • Complements: Circuit Breaker — when event consumers call downstream services to populate projections, circuit breakers prevent projection-building from failing catastrophically when a downstream dependency is unavailable; the consumer pauses and resumes rather than crashing.
  • Tensions: 12-Factor App — event sourcing relies on a durable, persistent event store that is inherently stateful, and a naive reading of 12-factor's "stateless processes" doesn't accommodate event log consumers that must track their offset position. That state must be externalized to the broker or a dedicated store, not held in process memory.
  • Topic Packs: kafka, distributed-systems
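The idempotency connection can be sketched with the reference_id field from Example 1: the consumer records applied IDs and skips redeliveries. A minimal in-memory sketch — a production system would persist the dedupe set atomically with the projection:

```python
from decimal import Decimal

class AccountProjection:
    """Balance projection that tolerates at-least-once delivery."""
    def __init__(self):
        self.balance = Decimal("0.00")
        self.applied = set()  # reference_ids already folded in

    def apply(self, event):
        if event["reference_id"] in self.applied:
            return  # duplicate delivery — already applied, skip
        self.applied.add(event["reference_id"])
        delta = event["amount"] if event["kind"] == "deposit" else -event["amount"]
        self.balance += delta

acct = AccountProjection()
deposit = {"kind": "deposit", "amount": Decimal("25.00"), "reference_id": "txn_1"}
acct.apply(deposit)
acct.apply(deposit)  # redelivered by the broker — ignored
print(acct.balance)  # 25.00
```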