# Kafka — Trivia & Interesting Facts
Surprising, historical, and little-known facts about Apache Kafka.
## Kafka was named after the author Franz Kafka
Jay Kreps, one of Kafka's creators at LinkedIn, named it after the Czech writer Franz Kafka because "it is a system optimized for writing" and Kreps liked Kafka's work. He has admitted the name has no deep technical significance — it just sounded good for a messaging system.
## LinkedIn processes over 7 trillion messages per day through Kafka
As of the early 2020s, LinkedIn's Kafka deployment handled over 7 trillion messages per day across multiple data centers. This makes LinkedIn's internal deployment one of the largest message streaming systems ever built. Kafka was originally created at LinkedIn in 2010 specifically to handle their growing data pipeline needs.
## Kafka's append-only log design was inspired by database write-ahead logs
Jay Kreps, Neha Narkhede, and Jun Rao designed Kafka's core abstraction — an append-only, immutable log — by taking the write-ahead log concept from databases and making it the primary data structure rather than a recovery mechanism. This insight — that the log itself is the fundamental data structure — was detailed in Kreps' influential 2013 blog post "The Log."
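The core abstraction is simple enough to sketch in a few lines. This is an illustrative model, not Kafka's actual API: records are only ever appended, each record is addressed by a monotonically increasing offset, and consumers read forward from an offset of their choosing.

```python
# Minimal sketch of an append-only log, the abstraction "The Log" describes.
# Class and method names here are illustrative, not Kafka's API.

class AppendOnlyLog:
    def __init__(self):
        self._records = []  # records are never mutated or deleted in place

    def append(self, record):
        """Append a record; its offset is simply its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """A consumer reads forward from an offset; the log stays immutable."""
        return self._records[offset:]

log = AppendOnlyLog()
log.append("user_signed_up")   # offset 0
log.append("user_clicked")     # offset 1
print(log.read_from(1))        # a consumer resuming at offset 1
```

Because the log is immutable and offset-addressed, many independent consumers can read the same data at their own pace, which is exactly what distinguishes Kafka from a traditional destructive-read queue.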
## Removing ZooKeeper from Kafka took over 4 years
KIP-500, the proposal to remove Kafka's dependency on Apache ZooKeeper, was submitted in 2019. The replacement, KRaft (Kafka Raft), reached production readiness in Apache Kafka 3.3 (2022), and ZooKeeper was deprecated in 3.5 (2023). The migration was one of the most complex architectural changes in Kafka's history, requiring a complete rewrite of the metadata management layer.
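In KRaft mode, the `zookeeper.connect` setting disappears and brokers instead point at a Raft quorum of controllers. A minimal sketch of the relevant `server.properties` entries for a combined broker/controller node, with placeholder host and port values:

```properties
# KRaft mode: this node acts as both broker and controller
process.roles=broker,controller
node.id=1

# Raft quorum of controllers (placeholder single-node quorum)
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
```

Cluster metadata that ZooKeeper once held is itself stored in an internal Kafka log replicated via Raft, which is why the change required rewriting the metadata layer rather than swapping one external store for another.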
## A single Kafka broker can handle over 2 million messages per second
LinkedIn engineers have published benchmarks showing a single Kafka broker handling over 2 million messages per second with message sizes of 100 bytes. This throughput is achieved because Kafka uses sequential disk I/O, OS page cache, zero-copy transfers via sendfile(), and batching — bypassing the JVM heap entirely for data transfer.
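Batching is the easiest of these techniques to illustrate. The sketch below is a hypothetical producer-side accumulator (the class and threshold are not Kafka's API): records are grouped until a batch fills, so the broker performs fewer, larger sequential writes instead of one write per record.

```python
# Illustrative sketch of producer-side batching. Each flushed batch stands in
# for one large sequential disk write on the broker.

class Batcher:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []    # records waiting to be sent
        self.flushed = []   # completed batches (stand-ins for disk writes)

    def send(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed.append(list(self.buffer))
            self.buffer.clear()

b = Batcher(batch_size=3)
for i in range(7):
    b.send(i)
b.flush()  # flush the partial final batch
print(b.flushed)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The real producer exposes this trade-off through the `batch.size` and `linger.ms` settings: larger batches and a small linger raise throughput at the cost of a few milliseconds of latency.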
## Kafka's consumer group rebalancing has been called "the worst part of Kafka"
Consumer group rebalancing — the process of redistributing partitions among consumers when the group membership changes — has been Kafka's most complained-about feature. During a rebalance, all consumers stop processing (a "stop-the-world" pause). This motivated years of work on incremental cooperative rebalancing (KIP-429), which finally made rebalances less disruptive starting in Kafka 2.4 (2019).
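With the Java client, consumers opt in to the cooperative protocol through the assignor setting; `CooperativeStickyAssignor` is the real class name, shown here as a properties-style fragment:

```properties
# Incremental cooperative rebalancing (KIP-429): consumers keep processing
# the partitions they retain instead of stopping the whole group
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```

Under the cooperative protocol, only the partitions that actually move between consumers are revoked, so most of the group keeps consuming through a rebalance.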
## Confluent was valued at over $8 billion built primarily around Kafka
Confluent, founded in 2014 by the three creators of Kafka (Kreps, Narkhede, Rao), went public in 2021 with a valuation exceeding $8 billion. This makes Kafka one of the most commercially successful open-source projects ever created, with a single company built primarily around operational tooling and a managed cloud service for the project.
## Exactly-once semantics in Kafka was considered impossible for years
The distributed systems community long held that exactly-once message delivery was impossible in practice (only at-least-once or at-most-once). Kafka 0.11 (2017) introduced "exactly-once semantics" through idempotent producers and transactional writes. The feature was met with skepticism, but it works by combining per-producer sequence numbers (so the broker can discard duplicate retries) with transactions that commit writes atomically across partitions.
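The idempotence half of the mechanism is straightforward to sketch. In this hypothetical model (the `Broker` class and its methods are illustrative, not Kafka's API), the broker remembers the highest sequence number appended per producer, so a network retry of an already-appended write is acknowledged but not duplicated:

```python
# Sketch of the idempotent-producer mechanism: dedup by (producer id, seq).

class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def produce(self, producer_id, seq, record):
        # A retry re-sends the same (producer_id, seq); append it only once.
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate retry: acknowledged, not re-appended
        self.log.append(record)
        self.last_seq[producer_id] = seq
        return True

broker = Broker()
broker.produce("p1", 0, "order-created")
broker.produce("p1", 1, "order-paid")
broker.produce("p1", 1, "order-paid")  # network retry of the same write
print(broker.log)  # ['order-created', 'order-paid']
```

In the real client this behavior is enabled with `enable.idempotence=true`, and cross-partition atomicity additionally requires a `transactional.id` and the transactional producer API.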
## Kafka topics can retain data for years, not just hours
Unlike traditional message queues that delete messages after consumption, Kafka can retain data indefinitely. Many organizations use Kafka as a long-term data store with retention periods measured in months or years. Uber's Kafka deployment reportedly retained certain topics for over a year, using tiered storage to move older data to cheaper object storage.
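Retention is controlled per topic. A sketch of the relevant topic-level settings (these are real Kafka configuration names, though tiered storage also requires broker-side setup and is only available in newer versions):

```properties
# Retain records indefinitely: disable both time- and size-based deletion
retention.ms=-1
retention.bytes=-1

# Tiered storage (KIP-405): offload older log segments to remote object storage
remote.storage.enable=true
```

With time-based retention disabled, the log only shrinks if compaction or size limits are configured, which is what makes the "Kafka as long-term store" pattern possible.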