Disk & Storage Ops — Trivia & Interesting Facts¶
Surprising, historical, and little-known facts about RAID, disk failures, and storage operations.
RAID was invented at Berkeley in 1988 and the "I" originally stood for "Inexpensive"¶
David Patterson, Garth Gibson, and Randy Katz published "A Case for Redundant Arrays of Inexpensive Disks (RAID)" at UC Berkeley in 1988. The storage industry later quietly changed "Inexpensive" to "Independent" because enterprise disk arrays were anything but cheap. Patterson also co-created the RISC architecture — the man had a talent for naming things the industry would later rebrand.
RAID 5 is considered dangerous for drives larger than 2 TB¶
With modern large-capacity drives, the probability of encountering an Unrecoverable Read Error (URE) during a RAID 5 rebuild is alarmingly high. At a typical consumer URE rate of 1 in 10^14 bits, rebuilding a 4 TB drive means reading ~32 trillion bits, which carries roughly a one-in-four chance of hitting a URE; and since a rebuild must read every surviving drive in the array, the overall risk is several times higher still. This is why RAID 6 (dual parity) became the minimum recommendation for large drives, and why many storage engineers now consider RAID 5 obsolete.
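The arithmetic behind that claim is worth spelling out. A minimal sketch, assuming the spec-sheet URE rate of 10^-14 per bit and an illustrative 6-drive array (both assumptions, not measurements):

```python
import math

URE_RATE = 1e-14                       # typical consumer spec: 1 URE per 1e14 bits read
bits_per_drive = 4 * 1e12 * 8          # 4 TB drive ≈ 3.2e13 bits

# Probability of at least one URE while reading one drive's worth of data:
# p = 1 - (1 - r)^n, computed stably with log1p/expm1
p_one_drive = -math.expm1(bits_per_drive * math.log1p(-URE_RATE))

# A RAID 5 rebuild must read *every* surviving drive,
# e.g. 5 survivors in a 6-drive array
surviving = 5
p_rebuild = 1 - (1 - p_one_drive) ** surviving

print(f"P(URE reading one 4 TB drive): {p_one_drive:.0%}")   # ~27%
print(f"P(URE during 6-drive rebuild): {p_rebuild:.0%}")     # ~80%
```

With 20 TB drives the per-drive figure alone exceeds 75%, which is why dual parity stopped being optional.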
Google's famous 2007 disk failure study shattered manufacturer MTBF claims¶
Google's paper "Failure Trends in a Large Disk Drive Population" (2007) analyzed 100,000+ drives and found that annual failure rates were 2-4x higher than manufacturer specifications. The study also found that SMART attributes were poor predictors of failure — drives failed without warning far more often than expected. This paper changed how the entire industry thought about disk reliability.
Bathtub curve failures are real, but the "infant mortality" phase is worse than you think¶
The classic bathtub curve for disk failures (high early failures, low mid-life, rising late-life) was confirmed by Backblaze's data from 200,000+ drives. But the infant mortality period is particularly nasty: roughly 5% of drives fail in the first 18 months. This is why burn-in testing — running new drives under load for 24-72 hours before deployment — is standard practice in datacenters.
Backblaze publishes drive failure data that the rest of the industry won't¶
Since 2013, Backblaze has published quarterly hard drive reliability reports covering their fleet of 250,000+ drives. This data is unprecedented — no other company shares failure rates by make and model. The reports have revealed dramatic reliability differences between models (some Seagate 3TB drives had 25%+ annual failure rates) and have become essential reading for anyone buying drives at scale.
A RAID rebuild can take days on modern large drives¶
Rebuilding a failed drive in a RAID array with 16 TB or 20 TB drives can take 24-72 hours under production load. During this entire window, the array is degraded and vulnerable to a second failure. This rebuild-time problem is one of the primary drivers behind the shift to erasure coding (used by Ceph, MinIO, and cloud object storage), which can reconstruct lost data from any k of its n coded chunks, spreading rebuild reads across many drives rather than funneling them through a single parity group.
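The back-of-the-envelope math is simple. A sketch, assuming an illustrative ~100 MB/s effective throughput under production load (idle arrays rebuild faster; contended ones slower):

```python
def rebuild_hours(capacity_tb: float, effective_mb_per_s: float) -> float:
    """Hours to sequentially read/write one drive's full capacity."""
    capacity_mb = capacity_tb * 1e6        # 1 TB = 1e6 MB (decimal, as drives are sold)
    return capacity_mb / effective_mb_per_s / 3600

# A 16 TB drive rebuilt at ~100 MB/s effective throughput:
print(f"{rebuild_hours(16, 100):.0f} hours")   # 44 hours, nearly two days degraded
```

At 20 TB the same throughput gives ~56 hours, and any background I/O pushes the window longer.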
The "RAID is not a backup" mantra exists because people kept learning it the hard way¶
RAID protects against drive failure but not against accidental deletion, filesystem corruption, ransomware, or controller failure. A corrupted write goes to all mirrors simultaneously. A bad RAID controller can trash the entire array. The number of companies that discovered this distinction during an actual data loss event is distressingly large, which is why "RAID is not a backup" is practically a religious commandment in ops.
Write hole is a RAID corruption bug that has existed for 35 years¶
The "RAID write hole" occurs when a power failure happens mid-write to a RAID 5/6 stripe: the data blocks are updated but the parity block is not (or vice versa), leaving the stripe internally inconsistent. Normal reads don't verify parity, so nothing complains; but if a drive later fails, reconstruction from that stale parity silently returns corrupt data. Hardware RAID controllers with battery-backed cache mitigate this. Linux software RAID (md) remained vulnerable until a dedicated write journal was added (mdadm's `--write-journal`, later joined by the Partial Parity Log); the older write-intent bitmap only speeds up resync and does not close the hole.
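The failure mode is easy to demonstrate with XOR parity. A toy sketch (two data blocks plus parity; nothing like a real RAID implementation): power is lost after the data block is rewritten but before parity is updated, and a later reconstruction silently returns garbage.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)          # consistent stripe: parity = d0 XOR d1

# Power fails mid-write: d0 lands on disk, the parity update never does.
d0 = b"CCCC"                  # new data written
# parity still reflects the old d0 -- the "write hole"

# Later, the drive holding d1 dies; RAID reconstructs it from d0 and parity:
reconstructed = xor(d0, parity)
print(reconstructed)          # b'@@@@' -- not the original b'BBBB', and no error raised
```

Nothing in the read path flags the mismatch; the array simply hands back wrong bytes, which is why the fix has to happen at write time (journal, battery-backed cache, or copy-on-write).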
Bit rot is real, measurable, and worse than most people think¶
CERN published a study in 2007 finding silent data corruption (bit rot) at rates of roughly 1 bit flip per 10 TB per year on their storage systems. ZFS was designed specifically to detect this with end-to-end checksums on every block; one of its creators, Jeff Bonwick, called bit rot "the silent killer of data." Most traditional RAID systems have no protection against bit rot because they don't checksum data at the block level.
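The ZFS-style defense is conceptually simple: store a checksum alongside every block pointer and verify it on every read. A minimal sketch of the idea (SHA-256 here for illustration; this is not ZFS's on-disk format, which defaults to the fletcher4 checksum):

```python
import hashlib

class ChecksummedStore:
    """Toy block store that detects silent corruption on read."""
    def __init__(self):
        self.blocks = {}                      # block_id -> (data, checksum)

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = (data, hashlib.sha256(data).digest())

    def read(self, block_id: int) -> bytes:
        data, checksum = self.blocks[block_id]
        if hashlib.sha256(data).digest() != checksum:
            raise IOError(f"bit rot detected in block {block_id}")
        return data

store = ChecksummedStore()
store.write(0, b"important data")
assert store.read(0) == b"important data"     # clean read passes verification

# Simulate a bit flip on the media, bypassing the write path:
data, checksum = store.blocks[0]
store.blocks[0] = (bytes([data[0] ^ 0x01]) + data[1:], checksum)
# store.read(0) now raises IOError instead of silently returning corrupt data
```

A traditional RAID array in the same situation would return the flipped bytes without complaint, because parity is only consulted during reconstruction or an explicit scrub.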
Enterprise SSDs changed the RAID calculus entirely¶
SSDs fail differently than HDDs: they tend to fail suddenly and completely rather than developing bad sectors gradually. An SSD either works or it doesn't. This makes RAID rebuilds much faster (no bad-sector reallocation delays) but also means failures are less predictable. The shift to NVMe SSDs also eliminated the RAID controller bottleneck — many modern systems use software-defined storage with direct-attached NVMe instead of hardware RAID.
The Dell PERC controller has a "patrol read" feature most admins never enable¶
Dell PowerEdge RAID Controllers (PERC) include a background "patrol read" that scans all drives for media errors before they cause problems during an actual read. When enabled, it runs continuously at low priority and can detect failing sectors early. Despite being available for over 15 years, many admins don't know it exists or leave it disabled to avoid the minor performance impact.
NetApp's WAFL filesystem was designed to never overwrite data in place¶
NetApp's Write Anywhere File Layout (WAFL), designed by Dave Hitz and James Lau in the early 1990s, never overwrites existing data blocks. Every write goes to a new location, and the old blocks remain until garbage collection. This copy-on-write approach eliminated the RAID write hole entirely and enabled instant snapshots — ideas that ZFS and Btrfs would later adopt.
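The copy-on-write idea fits in a few lines. A toy model (nothing like WAFL's actual on-disk layout): a file is a map from logical block numbers to physical locations, writes always allocate fresh blocks, and a snapshot is just a copy of the map, so the snapshot's view is frozen essentially for free.

```python
class CowVolume:
    """Toy copy-on-write volume: writes never overwrite data in place."""
    def __init__(self):
        self.storage = []        # append-only physical block space
        self.block_map = {}      # logical block -> index into storage

    def write(self, logical: int, data: bytes) -> None:
        self.storage.append(data)                 # always a new location
        self.block_map[logical] = len(self.storage) - 1

    def read(self, logical: int, block_map=None) -> bytes:
        m = block_map if block_map is not None else self.block_map
        return self.storage[m[logical]]

    def snapshot(self) -> dict:
        return dict(self.block_map)               # instant: just copy the map

vol = CowVolume()
vol.write(0, b"v1")
snap = vol.snapshot()
vol.write(0, b"v2")                               # old block remains untouched
print(vol.read(0))                                # b'v2' -- live view
print(vol.read(0, snap))                          # b'v1' -- snapshot still intact
```

Because no write ever lands on a block a snapshot references, there is no window where data and parity can disagree, which is how this design sidesteps the write hole.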