Elasticsearch — Trivia & Interesting Facts¶
Surprising, historical, and little-known facts about Elasticsearch.
Elasticsearch was created because Shay Banon's wife needed a recipe search engine¶
Shay Banon began building a search engine for his wife's cooking recipes in 2004, using Lucene. That first iteration was called Compass. He later rewrote it from scratch as a distributed system and released Elasticsearch in February 2010. Within a few years, it had become one of the most widely used enterprise search engines in the world.
Elasticsearch's default of 1 replica shard leaves single-node clusters permanently yellow¶
New Elasticsearch users often start with a single-node cluster, not realizing that the default of 1 replica means every index expects at least 2 nodes. A replica is never allocated to the same node as its primary, so with only one node the cluster stays "yellow" (replicas unassigned). Because no second copy exists anywhere, losing that one node means losing the data. This default has been the subject of hundreds of Stack Overflow questions.
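The yellow status described above can be modeled in a few lines. This is a toy sketch, not Elasticsearch's allocator: the function `cluster_health` is hypothetical, and it assumes the only allocation rule is that copies of the same shard must sit on different nodes. The `number_of_shards` and `number_of_replicas` keys are real index settings; the values are illustrative.

```python
def cluster_health(data_nodes: int, replicas_per_primary: int) -> str:
    """Toy model of the green/yellow/red status for a single index.

    Assumption: the only allocation rule is that every copy of a shard
    (the primary plus each replica) must live on a different node.
    """
    if data_nodes < 1:
        return "red"  # no node can hold even the primary
    copies_needed = 1 + replicas_per_primary
    # Primaries are assigned, but some replicas may have nowhere to go.
    return "green" if data_nodes >= copies_needed else "yellow"


# Settings body a single-node development cluster might use so that
# the index can reach green (at the cost of having no redundancy).
single_node_settings = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0}
}

print(cluster_health(data_nodes=1, replicas_per_primary=1))  # yellow
print(cluster_health(data_nodes=2, replicas_per_primary=1))  # green
print(cluster_health(data_nodes=1, replicas_per_primary=0))  # green
```

Setting replicas to 0 only changes the reported color. The data still lives on one node, so backups or a second node are still needed for durability.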
The Elastic Stack processes over 4 trillion events per day across its user base¶
Elastic has reported that their users collectively ingest over 4 trillion events per day through the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats). Some individual customers ingest over 100 TB per day into their Elasticsearch clusters, requiring hundreds of nodes.
Elasticsearch changed its license from Apache 2.0, and Amazon forked it in response¶
In January 2021, Elastic announced that Elasticsearch would move from Apache 2.0 to a dual license under the Server Side Public License (SSPL) and the Elastic License, starting with version 7.11. The stated aim was to prevent Amazon from offering Elasticsearch as a managed service (Amazon Elasticsearch Service) without contributing back. Amazon responded by forking the last Apache-licensed release, 7.10.2, and introduced the fork as "OpenSearch" in April 2021.
Lucene's inverted index works like the index at the back of a book¶
Apache Lucene, the search library underlying Elasticsearch, uses an inverted index. This is the same concept as the index at the back of a book, which maps words to the pages where they appear; Lucene maps terms to the documents that contain them. Doug Cutting started Lucene in 1999 and named it after his wife's middle name. The inverted index data structure dates back to the 1950s and early information retrieval research.
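A minimal sketch of the idea, assuming whitespace tokenization and lowercase normalization. Lucene's real index adds analyzers, compressed postings lists, and scoring, none of which are modeled here; the function names and sample recipes are illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents containing every query term (AND query)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


recipes = {
    1: "Tomato basil soup",
    2: "Basil pesto pasta",
    3: "Tomato pasta bake",
}
index = build_inverted_index(recipes)
print(sorted(search(index, "basil")))         # [1, 2]
print(sorted(search(index, "tomato pasta")))  # [3]
```

The lookup never scans the documents themselves. It only touches the postings for the query terms, which is why the structure scales to large collections.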
Split-brain was a notorious failure mode in Elasticsearch clusters before version 7.0¶
Older Elasticsearch clusters were vulnerable to split-brain: a network partition could leave more than one node believing it was the master. The minimum_master_nodes setting, which was supposed to be set to a majority of the master-eligible nodes, was frequently misconfigured; it was replaced by automatically managed voting configurations in 7.0. During a split-brain, the two halves of a cluster could each accept conflicting writes, so the data silently diverged and search results became inconsistent.
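The majority rule behind minimum_master_nodes is simple arithmetic. The sketch below is a simplified model that assumes a partition can elect a master whenever it contains at least the configured number of master-eligible nodes. The function names are hypothetical and this is not Elasticsearch's election code.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def masters_after_partition(partition_sizes: list[int],
                            minimum_master_nodes: int) -> int:
    """Count how many sides of a network partition could elect a master."""
    return sum(1 for size in partition_sizes if size >= minimum_master_nodes)


nodes = 3
print(quorum(nodes))  # 2

# Three master-eligible nodes split into groups of 2 and 1.
# Misconfigured threshold of 1: both sides elect a master (split-brain).
print(masters_after_partition([2, 1], minimum_master_nodes=1))  # 2
# Threshold set to the quorum: only the majority side elects a master.
print(masters_after_partition([2, 1], minimum_master_nodes=quorum(nodes)))  # 1
```

Because two disjoint groups cannot both hold a majority, a correctly set threshold allows at most one master per partition event.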
Elasticsearch stores every field twice by default¶
By default, Elasticsearch writes each field to the inverted index (for searching) and also keeps the original JSON document in the stored _source field (for retrieval). Many field types are additionally written to doc values for sorting and aggregations. As a result, an index can take as much as 2-3x the raw data size on disk, depending on mappings and compression. Disabling _source saves space but makes reindexing impossible without the original data source, which has bitten many organizations during version upgrades.
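For reference, this is roughly what an index-creation body with _source turned off looks like, written as a Python dictionary. The `_source.enabled` mapping option is real; the field name `title` is a hypothetical example, and the body is only printed here rather than sent to a cluster.

```python
import json

# Index-creation body that disables the stored _source document.
# Assumption: "title" is an illustrative field, not from any real mapping.
mapping_body = {
    "mappings": {
        "_source": {"enabled": False},
        "properties": {"title": {"type": "text"}},
    }
}

print(json.dumps(mapping_body, indent=2))
```

With a mapping like this, searches still work, but the original documents cannot be returned or reindexed from the cluster itself.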
The 5-shard default was changed in version 7.0 after years of debate¶
For years, Elasticsearch defaulted to 5 primary shards per index, which was excessive for small indexes and led to "oversharding": clusters carrying thousands of tiny shards, each with its own memory and bookkeeping overhead, which slowed the whole cluster down. Elastic changed the default to 1 primary shard in version 7.0 (2019), and published guidance recommending shard sizes between 10 GB and 50 GB.
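The sizing guidance translates into a quick back-of-the-envelope calculation. The helper below is a hypothetical sketch; the 30 GB default target is an assumption chosen from the middle of the 10 GB to 50 GB range, not an official figure.

```python
import math


def suggest_primary_shards(index_size_gb: float,
                           target_shard_gb: float = 30.0) -> int:
    """Estimate a primary shard count so each shard is near the target size.

    Assumption: index size is known up front and spread evenly over shards.
    """
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and target positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


print(suggest_primary_shards(2))        # 1
print(suggest_primary_shards(300))      # 10
print(suggest_primary_shards(300, 50))  # 6

# Under the old default, a 2 GB index was split into 5 shards of 0.4 GB each.
print(2 / 5)  # 0.4
```

A 2 GB index fits comfortably in one shard, which is why the old default of 5 produced so many undersized shards.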
Elasticsearch can search 1 billion documents in under 100 milliseconds¶
Well-tuned Elasticsearch clusters routinely achieve sub-100ms search latencies across billions of documents. This is possible because the cost of looking up a term in each shard's inverted index barely grows with the number of documents, and because searches are parallelized across shards and nodes, with a coordinating node merging the per-shard results. Some organizations, like eBay, have run Elasticsearch clusters indexing over 250 billion documents.
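The parallel search pattern is often called scatter-gather. The sketch below assumes each shard is just an in-memory dictionary mapping a term to pre-scored `(score, doc_id)` pairs, and it uses threads to stand in for separate nodes. The function names and data are illustrative, and real Elasticsearch scoring and networking are not modeled.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Each shard maps a term to a list of (score, doc_id) pairs.
Shard = dict[str, list[tuple[float, str]]]


def search_shard(shard: Shard, term: str, k: int) -> list[tuple[float, str]]:
    """Return this shard's local top-k hits for the term."""
    return heapq.nlargest(k, shard.get(term, []))


def scatter_gather(shards: list[Shard], term: str, k: int) -> list[tuple[float, str]]:
    """Query all shards in parallel, then merge the local top-k lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    # The global top-k is guaranteed to be among the per-shard top-k hits.
    return heapq.nlargest(k, (hit for part in partials for hit in part))


shards = [
    {"laptop": [(0.9, "a1"), (0.4, "a2")]},
    {"laptop": [(0.7, "b1")]},
    {"phone": [(0.8, "c1")]},
]
print(scatter_gather(shards, "laptop", k=2))  # [(0.9, 'a1'), (0.7, 'b1')]
```

Each shard only has to return k hits, so the merge step stays cheap no matter how many documents the shards hold.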