
Storage Ops — Trivia & Interesting Facts

Surprising, historical, and little-known facts about storage operations.


The first hard drive (IBM 350, 1956) stored 3.75 MB and weighed over a ton

IBM's 350 Disk Storage Unit, part of the RAMAC system, used fifty 24-inch platters to store 3.75 megabytes of data. It weighed over 2,000 pounds, was the size of two refrigerators, and could be leased for $3,200 per month (about $35,000 in 2024 dollars). Today, a microSD card smaller than a fingernail stores 1 TB, a roughly 267,000-fold improvement in capacity alone.
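The capacity ratio is quick to verify with back-of-envelope arithmetic (decimal units assumed throughout):

```python
# Capacity ratio between a 1 TB microSD card and the IBM 350 (decimal units).
ibm_350_bytes = 3.75e6    # 3.75 MB
microsd_bytes = 1e12      # 1 TB
ratio = microsd_bytes / ibm_350_bytes
print(f"{ratio:,.0f}x")   # ~266,667x
```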


ZFS was designed by Jeff Bonwick at Sun Microsystems to be "the last word in filesystems"

ZFS (Zettabyte File System) was created by Jeff Bonwick and his team at Sun Microsystems, first shipping in Solaris 10 in 2005. Bonwick designed it to address every storage problem simultaneously: volume management, RAID, data integrity verification, compression, deduplication, and snapshots. The name "ZFS" was meant to imply it was the final filesystem anyone would ever need — the Z being the last letter of the alphabet.


Ceph was started as a PhD project and now runs some of the world's largest storage systems

Sage Weil developed Ceph as his PhD thesis at UC Santa Cruz, defended in 2007. It was designed from the start for exabyte-scale distributed storage. Ceph now runs the storage backends for CERN (hundreds of petabytes for Large Hadron Collider data), Bloomberg, and numerous cloud providers. Red Hat acquired Inktank (Weil's Ceph company) in 2014 for $175 million.


Thin provisioning can lead to catastrophic "space panics" if not monitored

Thin provisioning allows administrators to allocate more virtual storage than physically exists, betting that not all of it will be used simultaneously. When actual usage exceeds physical capacity (the dreaded "out of space" condition, or "space panic"), virtual machines crash, databases are corrupted, and the recovery process can take days. This failure mode has caused numerous production outages at organizations that did not monitor thin pool utilization.
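The monitoring that prevents this is conceptually simple. A minimal sketch, assuming hypothetical pool sizes and an illustrative 80% alert threshold (the function name and all figures are invented for illustration, not taken from any particular tool):

```python
# Illustrative thin-pool health check (all names and numbers hypothetical).
def pool_status(physical_gb, allocated_virtual_gb, used_gb, alert_pct=80.0):
    """Summarize a thin pool: real utilization, overcommit ratio, alert flag."""
    used_pct = 100.0 * used_gb / physical_gb
    overcommit = allocated_virtual_gb / physical_gb  # e.g. 3.0 means 3:1
    return {"used_pct": used_pct,
            "overcommit_ratio": overcommit,
            "alert": used_pct >= alert_pct}

# A 10 TB pool with 30 TB provisioned virtually and 8.5 TB actually used:
print(pool_status(10_000, 30_000, 8_500))
```

The key point is that the alert must fire on physical usage, not virtual allocation; the 3:1 overcommit itself is fine until the real blocks run out.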


The IOPS metric that storage vendors quote is almost always misleading

Storage vendors advertise IOPS (Input/Output Operations Per Second) using ideal conditions: 100% random 4KB reads from cache. Real-world workloads mix reads and writes, use variable block sizes, and often exceed cache capacity. A drive rated at 100,000 IOPS in the vendor's benchmark might deliver 10,000 IOPS under a realistic database workload. The gap between benchmark IOPS and production IOPS is one of storage operations' dirty secrets.
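One way to see why benchmark IOPS collapse under real workloads is a simple latency-weighted model. The per-operation latencies below are illustrative assumptions, not measurements from any drive:

```python
# Toy model: effective IOPS from a weighted average per-op latency.
def effective_iops(read_frac, read_latency_us, write_latency_us):
    """Effective IOPS at queue depth 1, given assumed per-op latencies."""
    avg_latency_us = (read_frac * read_latency_us
                      + (1 - read_frac) * write_latency_us)
    return 1_000_000 / avg_latency_us

# Vendor-style benchmark: 100% cached reads at an assumed 10 us each.
print(round(effective_iops(1.0, 10, 10)))    # 100000
# Assumed realistic 70/30 mix with cache misses: 80 us reads, 150 us writes.
print(round(effective_iops(0.7, 80, 150)))   # 9901
```

With these made-up but plausible latencies, the same device drops from 100,000 to roughly 10,000 IOPS, matching the order-of-magnitude gap described above.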


LVM snapshots use copy-on-write and can destroy performance if you don't know how they work

LVM snapshots work by copying original blocks to a separate area (the COW, or copy-on-write, space) the first time they are modified. The first write to each chunk of a snapshotted volume therefore becomes three I/O operations: read the original block, write it to the snapshot area, then write the new data. This can reduce write performance by 50-80%. An LVM snapshot that fills its allocated COW space is automatically invalidated, losing the snapshot entirely.
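The penalty can be modeled roughly: only the first write to each chunk pays the triple cost, so overhead is worst on a cold snapshot and fades as hot chunks get copied. The write counts below are invented for illustration:

```python
# Rough model of LVM snapshot COW write amplification (illustrative numbers).
def cow_io_ops(first_writes, repeat_writes):
    """Physical I/Os under an active snapshot.

    The first write to a chunk costs 3 I/Os (read the original, copy it to
    the COW area, write the new data); a write to an already-copied chunk
    costs 1 I/O.
    """
    return 3 * first_writes + 1 * repeat_writes

# Cold snapshot: 1000 writes, all to untouched chunks, triple the I/O.
print(cow_io_ops(1000, 0))    # 3000
# Warmed up: only 100 of 1000 writes hit chunks not yet copied.
print(cow_io_ops(100, 900))   # 1200
```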


Storage area networks (SANs) are being replaced by software-defined storage

Enterprise SANs from EMC and NetApp dominated the 2000s, joined by Pure Storage in the 2010s, with hardware appliances costing $100,000 to well over $1,000,000. The shift to software-defined storage (Ceph, VMware vSAN, MinIO) running on commodity hardware has disrupted this market. VMware vSAN alone captured significant market share by eliminating the need for dedicated storage hardware. The traditional SAN market has been declining since approximately 2018.


The "3-2-1" backup rule's "1 offsite" requirement created the cloud storage market

The 3-2-1 rule calls for three copies of your data on two different media, with one copy offsite. That offsite requirement was one of the original drivers of cloud storage adoption. Before AWS S3 launched in 2006, offsite backup meant shipping tapes to Iron Mountain or maintaining a secondary datacenter. S3's pay-per-use model made offsite backup accessible to organizations of any size. Many companies' first AWS service was S3, used purely for backup before they adopted any other cloud services.


NVMe over Fabrics extends NVMe performance across the network

NVMe-oF (NVMe over Fabrics), ratified in 2016, lets servers access remote NVMe storage with near-local latency over RDMA or TCP networks. It can eliminate the need for local storage in servers entirely: compute nodes can boot from and run on shared NVMe storage pools. The added latency is typically under 10 microseconds, compared with hundreds of microseconds for iSCSI.
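The latency comparison is easiest to see as a budget. The local-device figure below is an assumed round number for illustration; only the overhead figures come from the paragraph above:

```python
# Latency budgets in microseconds; local NVMe figure is an assumption.
local_nvme_us = 20                    # assumed flash read latency
nvme_of_total = local_nvme_us + 10    # fabric adds under ~10 us
iscsi_total = local_nvme_us + 300     # iSCSI adds hundreds of us
print(nvme_of_total, iscsi_total)     # 30 320
```

Under these assumptions the fabric costs about 50% extra latency, while iSCSI costs more than 15x, which is why NVMe-oF makes diskless compute nodes practical.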


Storage capacity planning is a dark art because growth is rarely linear

Storage growth patterns are highly unpredictable. A machine learning team might suddenly need 50 TB for a new training dataset. A compliance requirement might mandate keeping data for 7 years instead of 3. Log storage can spike 10x during an incident. Experienced storage operators plan for 40-60% headroom rather than trying to predict exact growth, because the predictions are almost always wrong.
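The headroom approach can be sketched as a simple re-planning trigger. Every number and the function itself are hypothetical illustrations of the idea, not a real planning tool:

```python
# Headroom-based capacity planning sketch (all figures hypothetical).
def months_until_replan(capacity_tb, used_tb, growth_tb_per_month,
                        headroom_frac=0.5):
    """Months until usage consumes the planned headroom above current use.

    Rather than forecast exact growth, keep `headroom_frac` of current
    usage free and re-plan (buy, tier, or delete) when the buffer runs out.
    """
    target_tb = min(capacity_tb, used_tb * (1 + headroom_frac))
    return max(0.0, (target_tb - used_tb) / growth_tb_per_month)

# 100 TB array, 60 TB used, growing ~5 TB/month, 50% headroom policy:
print(months_until_replan(100, 60, 5))  # 6.0
```

The point of the buffer is that any of the surprises above (a sudden 50 TB dataset, a 10x log spike) arrives with months of slack rather than days.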


The dd command has bricked more storage systems than most people realize

The Unix dd command (sometimes called "disk destroyer") writes raw data to devices without any safety checks. A typo in the of= parameter, writing to /dev/sda instead of /dev/sdb, instantly overwrites the partition table and the beginning of the filesystem with no confirmation prompt. The number of production databases destroyed by dd typos is unknown but widely feared, which is why careful operators verify targets with lsblk before pressing Enter, and why recovery-oriented tools like GNU ddrescue add logging and resume support.
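A common defensive pattern is to refuse to raw-write any device that is currently mounted. This is a minimal sketch of that check: the helper and the sample mounts table are hypothetical, and a real script would read /proc/mounts instead of a string:

```python
# Refuse to raw-write a device that appears in the mounts table.
# (Hypothetical helper; a real script would read /proc/mounts.)
def is_mounted(device: str, mounts_text: str) -> bool:
    """True if `device` or one of its partitions appears as a mount source."""
    for line in mounts_text.splitlines():
        fields = line.split()
        source = fields[0] if fields else ""
        if source == device or source.startswith(device):
            return True
    return False

sample_mounts = """/dev/sda1 / ext4 rw 0 0
/dev/sda2 /home ext4 rw 0 0"""

print(is_mounted("/dev/sda", sample_mounts))  # True  -> abort the write
print(is_mounted("/dev/sdb", sample_mounts))  # False -> proceed with care
```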