# Ceph — Trivia & Interesting Facts
Surprising, historical, and little-known facts about Ceph distributed storage.
## Ceph started as a PhD thesis at UC Santa Cruz
Sage Weil created Ceph as part of his PhD research at the University of California, Santa Cruz, starting in 2003. His dissertation, "Ceph: Reliable, Scalable, and High-Performance Distributed Storage," was published in 2007. Ceph is one of the few PhD projects that became a production-grade infrastructure standard.
## The name Ceph comes from "cephalopod"
Ceph is short for cephalopod — the class of animals that includes octopuses and squids. Sage Weil chose the name because cephalopods have distributed nervous systems (each arm can act independently), which mirrors Ceph's distributed architecture. The Ceph logo is an octopus for this reason.
## CRUSH is what makes Ceph unique — and it replaced lookup tables
Ceph's CRUSH (Controlled Replication Under Scalable Hashing) algorithm computes data placement algorithmically rather than using lookup tables. This means any node can calculate where any piece of data lives without consulting a central directory. The algorithm was novel when published in 2006 and remains Ceph's key architectural innovation.
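The core idea — that placement is computed, not looked up — can be illustrated with a toy rendezvous-hash sketch. This is not the real CRUSH algorithm (which adds weighted buckets, failure domains, and replica placement rules); it only shows how every node can independently derive the same placement from nothing but the object name and the device list:

```python
import hashlib

def placement(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Toy highest-random-weight (rendezvous) placement.

    Every client that runs this function computes the identical OSD list
    for a given object, with no central directory to consult.
    Illustrative only -- real CRUSH layers weights, bucket types, and
    failure-domain rules on top of this basic idea.
    """
    scored = sorted(
        osds,
        key=lambda osd: hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest(),
        reverse=True,
    )
    return scored[:replicas]

osds = [f"osd.{i}" for i in range(10)]
# Any node evaluating this expression arrives at the same answer:
print(placement("my-object", osds))
```

Because the mapping is a pure function of its inputs, adding or removing a device changes only the placements that involve that device — the same property that lets Ceph rebalance incrementally rather than reshuffling everything.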
## Red Hat acquired Inktank (Ceph's company) for $175 million in 2014
Sage Weil founded Inktank Storage in 2012 to commercialize Ceph. Red Hat acquired Inktank for $175 million just two years later, making Ceph the foundation of Red Hat's storage strategy. Weil continued leading Ceph development at Red Hat for years afterward.
## Ceph provides block, file, AND object storage from one cluster
Ceph is unusual in offering all three major storage interfaces — RBD (block), CephFS (file), and RGW (object/S3-compatible) — from a single unified storage cluster. Most storage systems specialize in one or two of these. This "three-in-one" capability was a deliberate design goal from the beginning.
## The BlueStore backend replaced FileStore after years of performance issues
Ceph originally stored objects on top of a local filesystem (XFS or btrfs) via FileStore. This added overhead because Ceph's write-ahead journal duplicated the filesystem's own journaling, so every write was effectively performed twice. BlueStore, introduced in the Luminous release (2017), writes directly to raw block devices, eliminating the double-write penalty and roughly doubling write throughput on many workloads.
## CERN runs one of the largest Ceph clusters in the world
CERN, the European Organization for Nuclear Research, operates Ceph clusters totaling over 50 petabytes to store data from the Large Hadron Collider. Their deployment has pushed Ceph to scales few other organizations have attempted and has driven improvements to Ceph's performance at extreme scale.
## Ceph releases are named alphabetically after marine creatures
Each major Ceph release is named after a marine creature in alphabetical order: Argonaut, Bobtail, Cuttlefish, Dumpling, Emperor, Firefly, Giant, Hammer, Infernalis, Jewel, Kraken, Luminous, Mimic, Nautilus, Octopus, Pacific, Quincy, Reef, and Squid. The nautical naming convention ties back to the cephalopod origin.
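The alphabetical pattern is strict enough to check mechanically — the nineteen names listed above run A through S with no letter skipped:

```python
# Release names as listed above, Argonaut (2012) through Squid.
releases = [
    "Argonaut", "Bobtail", "Cuttlefish", "Dumpling", "Emperor",
    "Firefly", "Giant", "Hammer", "Infernalis", "Jewel", "Kraken",
    "Luminous", "Mimic", "Nautilus", "Octopus", "Pacific",
    "Quincy", "Reef", "Squid",
]

# First letters form an unbroken run from "A" to "S":
assert [r[0] for r in releases] == [chr(ord("A") + i) for i in range(len(releases))]
assert releases == sorted(releases)
```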
## A single misconfigured OSD can degrade an entire cluster
One of Ceph's operational challenges is that a single slow or misconfigured OSD (Object Storage Daemon) can cause latency spikes across the entire cluster, because CRUSH spreads every client's data across many OSDs. The "slow OSD" problem has been the subject of extensive engineering effort, including OSD heartbeat checks and the reporting of slow operations in cluster health warnings.
## Ceph monitors use Paxos consensus, not Raft
Ceph's monitor daemons use a variant of the Paxos consensus algorithm rather than the now more popular Raft. This design decision dates from the mid-2000s, when Paxos was the dominant consensus protocol. By the time Raft was published in 2014, Ceph had already been running Paxos for years, and the cost of switching was prohibitive.