
RAID: Why Your Disks Will Fail

  • lesson
  • raid-0/1/5/6/10
  • rebuild-times
  • write-hole
  • ure-probability
  • smart-monitoring
  • l2

Topics: RAID 0/1/5/6/10, rebuild times, write hole, URE probability, SMART monitoring
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: None


The Mission

The monitoring alert says: RAID array degraded - disk /dev/sdb failed. Your data is still accessible (RAID is doing its job), but you're one disk failure away from catastrophe. The rebuild will take 22 hours on these 8TB drives. During that time, every remaining disk is under maximum stress. If another one fails...

This lesson covers RAID levels, what happens during failure and rebuild, and why modern storage is moving away from traditional RAID.


RAID Levels: What You Actually Need to Know

Level    Min disks  Capacity      Reads      Writes              Tolerated failures            Best for
───────────────────────────────────────────────────────────────────────────────────────────────────────
RAID 0   2          N × disk      Fast       Fast                0 (any failure = total loss)  Scratch space, temp data
RAID 1   2          50%           Fast       Moderate            1 disk                        OS drives, small critical data
RAID 5   3          (N-1) × disk  Fast       Slow (parity calc)  1 disk                        Dangerous for large drives
RAID 6   4          (N-2) × disk  Fast       Slower              2 disks                       Large arrays where 1-failure tolerance isn't enough
RAID 10  4          50%           Very fast  Fast                1 per mirror pair             Databases, high-performance
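The capacity and fault-tolerance columns can be sanity-checked with a short script. This is a sketch; the `raid_summary` helper is an illustrative name, not part of any RAID tool:

```python
# Sketch: usable capacity and guaranteed fault tolerance per RAID level.
# `raid_summary` is an illustrative helper for this lesson.

def raid_summary(level: str, disks: int, disk_tb: float):
    """Return (usable capacity in TB, guaranteed tolerated failures)."""
    if level == "raid0":
        return disks * disk_tb, 0          # striping only, no redundancy
    if level == "raid1":
        return disk_tb, disks - 1          # every disk mirrors the same data
    if level == "raid5":
        return (disks - 1) * disk_tb, 1    # one disk's worth of parity
    if level == "raid6":
        return (disks - 2) * disk_tb, 2    # two disks' worth of parity
    if level == "raid10":
        return disks / 2 * disk_tb, 1      # 1 guaranteed; more only if failures
                                           # land in different mirror pairs
    raise ValueError(f"unknown level: {level}")

# Example: 6 x 8 TB drives, as in the exercises below
for level in ("raid5", "raid6", "raid10"):
    usable, tolerated = raid_summary(level, 6, 8)
    print(f"{level}: {usable:.0f} TB usable, survives {tolerated} failure(s)")
# → raid5: 40 TB usable, survives 1 failure(s)
# → raid6: 32 TB usable, survives 2 failure(s)
# → raid10: 24 TB usable, survives 1 failure(s)
```

Note the RAID 10 caveat: it *can* survive up to N/2 failures, but only one failure is guaranteed survivable, because a second failure in the same mirror pair loses data.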

Name Origin: RAID was invented in 1988 at UC Berkeley by David Patterson, Garth Gibson, and Randy Katz. The original acronym: "Redundant Array of Inexpensive Disks." The industry later changed "Inexpensive" to "Independent" because enterprise disk vendors didn't like their products being called "inexpensive."


The RAID 5 Problem: Why It's Dangerous on Large Drives

RAID 5 uses parity to tolerate one disk failure. When a disk fails, the array reads all remaining disks to reconstruct the missing data. But:

Every disk has an Unrecoverable Read Error (URE) rate. Enterprise drives: ~1 in 10^15 bits. Consumer drives: ~1 in 10^14 bits.

10TB drive = 10^13 bytes = 8 × 10^13 bits ≈ 10^14 bits
URE rate (consumer): 1 per 10^14 bits read

During rebuild, reading every bit of every remaining disk:
3 remaining drives × 10TB each = 30TB read ≈ 2.4 × 10^14 bits
Expected UREs at 1 per 10^14 bits: ~2.4
P(at least one URE) = 1 - (1 - 10^-14)^(2.4 × 10^14) ≈ 1 - e^-2.4 ≈ 90%

By the spec-sheet numbers, the rebuild is more likely to FAIL than to succeed, and a failed rebuild means the array cannot be reconstructed: total data loss.

In practice drives usually beat their quoted URE rate, and errors cluster rather than arriving uniformly, so real-world rebuild failure rates are lower than the naive math suggests. Even so, at enterprise rates (1 in 10^15) the same 30TB rebuild still carries a ~20% failure probability, and the risk grows with drive size. RAID 5 with large drives is playing Russian roulette during every rebuild.
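The calculation generalizes to any array size and URE rate. A sketch, treating the quoted URE rate as an independent per-bit probability (a pessimistic simplification, since real errors cluster); `ure_failure_probability` is an illustrative name:

```python
import math

# Sketch: probability of hitting at least one URE during a RAID 5 rebuild.
# Model: each bit read fails independently with probability `ure_rate`,
# so P(no URE over n bits) = (1 - p)^n ≈ e^(-p*n).

def ure_failure_probability(surviving_disks: int, disk_tb: float,
                            ure_rate: float) -> float:
    bits_read = surviving_disks * disk_tb * 1e12 * 8   # TB -> bytes -> bits
    return 1 - math.exp(-ure_rate * bits_read)

# 4-drive RAID 5 of 10 TB disks: 3 survivors must be read in full.
print(f"{ure_failure_probability(3, 10, 1e-14):.0%}")  # consumer drives
# → 91%
print(f"{ure_failure_probability(3, 10, 1e-15):.0%}")  # enterprise drives
# → 21%
```

This is also exactly the calculation exercise 2 asks for, with 14 TB drives substituted in.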

Trivia: Google published a disk failure study in 2007 analyzing 100,000+ drives. Key findings: annual failure rates were 2-4x higher than manufacturer specs, and SMART attributes were poor predictors of failure. Backblaze publishes quarterly failure data for 250,000+ drives — the only company that does this publicly. Some Seagate 3TB models showed 25%+ annual failure rates.


Rebuild Times: The Vulnerability Window

Drive size    Estimated rebuild time
───────────────────────────────────
2 TB          ~6 hours
4 TB          ~11 hours
8 TB          ~22 hours
16 TB         ~55 hours

During rebuild, the array is:

  • Running at degraded performance (parity calculations on every read)
  • Under maximum disk I/O stress (reading every sector)
  • Vulnerable to a second failure (which = total data loss in RAID 5)

Production workloads slow rebuilds further — a busy database on a degrading array can double these times.
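The table's first three rows are consistent with a sustained rebuild rate of roughly 100 MB/s. That rate is an assumption for this sketch; actual throughput depends on the controller, the drives, and competing workload:

```python
# Sketch: naive rebuild-time estimate at an assumed sustained rate.
REBUILD_MB_PER_S = 100  # assumption; varies with hardware and load

def rebuild_hours(disk_tb: float, mb_per_s: float = REBUILD_MB_PER_S) -> float:
    # TB -> MB gives transfer size; divide by rate for seconds, then hours.
    return disk_tb * 1e6 / mb_per_s / 3600

for tb in (2, 4, 8, 16):
    print(f"{tb:>2} TB: ~{rebuild_hours(tb):.0f} h")
# →  2 TB: ~6 h
# →  4 TB: ~11 h
# →  8 TB: ~22 h
# → 16 TB: ~44 h
```

The purely linear estimate gives ~44 hours for 16 TB; the table's ~55 hours builds in extra margin, since very large drives rarely sustain full sequential speed under a production workload.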


SMART Monitoring: Early Warning

SMART (Self-Monitoring, Analysis, and Reporting Technology) is built into every modern drive. Most of its attributes are noise, but four correlate strongly with imminent failure:

sudo smartctl -A /dev/sda | grep -E "Reallocated|Reported_Uncorrect|Current_Pending|Offline_Uncorrectable"
# → 5   Reallocated_Sector_Ct    0        ← bad sectors remapped
# → 187 Reported_Uncorrect       0        ← uncorrectable errors
# → 197 Current_Pending_Sector   0        ← sectors waiting to be remapped
# → 198 Offline_Uncorrectable    0        ← sectors that can't be read offline

Remember the Backblaze rule: "5, 187, 197, 198 — the four horsemen of disk failure." Any non-zero value warrants investigation. Multiple non-zero values = replace the drive.

# Quick health check
sudo smartctl -H /dev/sda
# → SMART overall-health self-assessment test result: PASSED

# Enable automatic monitoring
sudo systemctl enable --now smartd
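If you want the four horsemen checked from a script, the attribute table printed by smartctl -A is easy to parse. A sketch; the sample output below is illustrative, and in practice you would feed in the real command's stdout:

```python
# Sketch: flag non-zero "four horsemen" SMART attributes (IDs 5, 187, 197, 198).
# In production, feed in subprocess.run(["smartctl", "-A", "/dev/sda"],
# capture_output=True, text=True).stdout instead of `sample`.

FOUR_HORSEMEN = {5, 187, 197, 198}

def failing_attributes(smartctl_output: str) -> dict:
    """Map attribute name -> raw value, for horsemen with non-zero RAW_VALUE."""
    bad = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit() and int(fields[0]) in FOUR_HORSEMEN:
            raw = int(fields[-1])          # RAW_VALUE is the last column
            if raw != 0:
                bad[fields[1]] = raw
    return bad

sample = """\
  5 Reallocated_Sector_Ct   0x0033 100 100 005 Pre-fail Always - 0
187 Reported_Uncorrect      0x0032 100 100 000 Old_age  Always - 12
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 3
"""
print(failing_attributes(sample))
# → {'Reported_Uncorrect': 12, 'Current_Pending_Sector': 3}
```

Two non-zero horsemen, as here, means the drive should be replaced, not watched.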

Beyond RAID: Why the Industry Is Moving On

Traditional RAID is being replaced by:

  • ZFS — filesystem-level redundancy with checksums, self-healing, compression
  • Ceph — distributed storage across many nodes (not many disks in one box)
  • Cloud block storage — EBS, Persistent Disks (redundancy handled by the provider)
  • Erasure coding — like RAID across a cluster, with tunable redundancy

Trivia: ZFS was designed by Sun Microsystems specifically to solve the "bit rot" problem — data corruption from random bit flips. CERN measured ~1 bit flip per 10TB per year. Traditional RAID doesn't detect bit rot (RAID guarantees the data is there, not that it's correct). ZFS checksums every block and can detect AND repair corruption.
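The checksum idea is simple to demonstrate: store a hash alongside each block and verify it on every read. This is a sketch of the principle only, not ZFS's actual on-disk layout (ZFS stores checksums in parent blocks, forming a Merkle tree):

```python
import hashlib

# Sketch: per-block checksums catch silent corruption that RAID alone misses.

def store(block: bytes):
    """Write path: keep a SHA-256 checksum next to the data."""
    return block, hashlib.sha256(block).digest()

def read(block: bytes, checksum: bytes) -> bytes:
    """Read path: verify before returning; mismatch means bit rot."""
    if hashlib.sha256(block).digest() != checksum:
        raise IOError("checksum mismatch: bit rot detected")
    return block

data, cksum = store(b"important payroll data")
rotted = bytes([data[0] ^ 0x01]) + data[1:]   # a single flipped bit

read(data, cksum)                             # verifies cleanly
try:
    read(rotted, cksum)
except IOError as e:
    print(e)
# → checksum mismatch: bit rot detected
```

With redundancy available (a mirror or parity), the read path can go one step further and rewrite the bad block from a good copy, which is ZFS's self-healing.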


Flashcard Check

Q1: RAID 5 on 10TB drives — why is rebuild dangerous?

During rebuild, every bit of every remaining disk is read. With a URE rate of 1 in 10^14 bits and 30TB (about 2.4 × 10^14 bits) to read, the spec-sheet math makes at least one read error more likely than not. Failed rebuild = total data loss.

Q2: RAID 10 vs RAID 5 — which for a database?

RAID 10. Better write performance (no parity calculation), faster rebuilds (only mirror needs copying, not the entire array), and can survive multiple failures if they're in different mirror pairs.

Q3: SMART attributes 5, 187, 197, 198 are non-zero. What do you do?

Plan immediate drive replacement. These are the four predictors of imminent failure. Don't wait for the RAID degradation alert.

Q4: 16TB drive rebuild takes how long?

~55 hours (2+ days). During which the array is vulnerable to a second failure. Production workloads can double this. RAID 6 or RAID 10 is essential at this size.


Cheat Sheet

Task              Command
───────────────────────────────────────────────────────────
Array status      cat /proc/mdstat (Linux software RAID)
SMART health      sudo smartctl -H /dev/sda
SMART attributes  sudo smartctl -A /dev/sda
Rebuild progress  cat /proc/mdstat (shows percentage)
Add spare         sudo mdadm --manage /dev/md0 --add /dev/sdc
Mark failed       sudo mdadm --manage /dev/md0 --fail /dev/sdb
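The rebuild-progress line in /proc/mdstat is easy to scrape in a monitoring script. A sketch; the sample content is illustrative, and in practice you would read the real file:

```python
import re

# Sketch: pull the rebuild/recovery percentage out of /proc/mdstat.
# In production you'd pass open("/proc/mdstat").read() instead of `sample`.

sample = """\
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      15627786240 blocks level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [===>.................]  recovery = 18.7% (975312896/5209262080) finish=612.4min speed=115200K/sec
"""

def recovery_percent(mdstat: str):
    """Return the rebuild percentage, or None if no rebuild is running."""
    m = re.search(r"(?:recovery|resync)\s*=\s*([\d.]+)%", mdstat)
    return float(m.group(1)) if m else None

print(recovery_percent(sample))
# → 18.7
```

A None result means no rebuild or resync is in progress, which is the state you want your alerting to confirm after a disk swap completes.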

Takeaways

  1. RAID 5 is dangerous on large drives. URE probability during rebuild makes data loss likely. Use RAID 6 or RAID 10 for drives >4TB.

  2. Rebuild time is the vulnerability window. 22 hours on 8TB drives. During rebuild, another failure = total loss. Plan for this.

  3. Monitor SMART before RAID alerts. The four horsemen (5, 187, 197, 198) predict failure before it happens. Replace proactively.

  4. The industry is moving beyond RAID. ZFS, Ceph, and cloud storage solve problems RAID can't: bit rot detection, distributed redundancy, and elastic capacity.

  5. RAID is not backup. RAID protects against disk failure. It does NOT protect against accidental deletion, ransomware, or bad data. You still need backups.


Exercises

  1. Check SMART health on a local drive. Run sudo smartctl -H /dev/sda (adjust the device name for your system). Then run sudo smartctl -A /dev/sda and find the four critical attributes: Reallocated_Sector_Ct (5), Reported_Uncorrect (187), Current_Pending_Sector (197), and Offline_Uncorrectable (198). Record their current values. If any are non-zero, research what that means for your specific drive.

  2. Calculate URE risk for a RAID 5 rebuild. Given a 4-drive RAID 5 array with 14TB consumer drives (URE rate: 1 in 10^14 bits), calculate: (a) how many bytes must be read during rebuild (3 remaining drives x 14TB), (b) the total bits read, and (c) the probability of hitting at least one URE. Repeat the calculation for enterprise drives (URE rate: 1 in 10^15 bits). Write down why RAID 6 or RAID 10 is recommended at this drive size.

  3. Create and degrade a software RAID array (loopback devices). Create two 100MB files as virtual disks: dd if=/dev/zero of=/tmp/disk1 bs=1M count=100 and same for disk2. Set up loop devices: sudo losetup /dev/loop10 /tmp/disk1 and loop11. Create a RAID 1 array: sudo mdadm --create /dev/md99 --level=1 --raid-devices=2 /dev/loop10 /dev/loop11. Check status with cat /proc/mdstat. Mark one disk as failed: sudo mdadm --manage /dev/md99 --fail /dev/loop11. Observe the degraded state. Clean up with sudo mdadm --stop /dev/md99 and sudo losetup -d /dev/loop10 /dev/loop11.

  4. Compare RAID levels for a scenario. You have 6 x 8TB drives for a file server. Calculate usable capacity, fault tolerance, and estimated rebuild time for RAID 5, RAID 6, and RAID 10. Write a one-paragraph recommendation stating which level you would choose and why, considering the 22-hour rebuild window for 8TB drives.


Related Lessons

  • The Disk That Filled Up — when the RAID array's filesystem fills
  • The Backup Nobody Tested — RAID is not a substitute for backups
  • What Happens When You Press Power — RAID controller initialization during boot