Linux Data Hoarding¶
33 cards — 🟢 12 easy | 🟡 14 medium | 🔴 7 hard
🟢 Easy (12)¶
1. What does JBOD stand for, and why do data hoarders prefer it over traditional RAID?
Show answer
JBOD = Just a Bunch of Disks. Each drive is an independent filesystem combined via a union filesystem (mergerfs). Advantages over RAID: mix any drive sizes, add one drive at a time, any single drive is readable on its own, and a failed drive only loses the files physically on that drive.
Mnemonic: "JBOD = Just Buy One Drive" — you scale one drive at a time, not matching sets.
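A minimal sketch of the mergerfs wiring in fstab, assuming data drives mounted at /mnt/disk1, /mnt/disk2, etc. (paths, pool name, and option values are illustrative, not the only sane choices):

```
# /etc/fstab — pool all /mnt/disk* branches into one mount (illustrative)
/mnt/disk* /mnt/storage fuse.mergerfs allow_other,cache.files=off,category.create=mfs,moveonenospc=true,minfreespace=50G,fsname=mergerfs 0 0
```

category.create=mfs sends new files to the branch with the most free space; moveonenospc retries on another branch if a write hits a full disk.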
2. What is SnapRAID and how does it differ from real-time RAID?
Show answer
SnapRAID is a snapshot parity tool created by Andrea Mazzoleni (2011, GPLv3). Unlike real-time RAID, SnapRAID computes parity on a schedule (typically daily via cron). Between syncs, newly added files have NO parity protection.
This makes it ideal for write-once-read-many workloads (media libraries) but wrong for databases or VMs with high write churn.
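A minimal snapraid.conf sketch for a two-data-drive array (drive labels and paths are illustrative):

```
# Illustrative snapraid.conf — adjust paths to your mounts
parity /mnt/parity1/snapraid.parity

# Content files belong on at least two different physical drives
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content

# Data drives
data d1 /mnt/disk1/
data d2 /mnt/disk2/
```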
3. What is the most critical rule about the parity drive in SnapRAID?
Show answer
The parity drive must be at least as large as the largest data drive. If your biggest data drive is 12TB, the parity drive must be >= 12TB. SnapRAID stores one parity block per data block, so it needs capacity equal to the largest single drive.
4. What is snapraid-runner and why use it instead of bare cron?
Show answer
snapraid-runner (by Chronial) is a Python wrapper that adds safety checks to automated SnapRAID syncs. Key feature: it runs `snapraid diff` first and aborts if deletions exceed a configurable threshold (deletethreshold). This prevents parity destruction when a drive fails to mount and SnapRAID sees all its files as "deleted."
GitHub: github.com/Chronial/snapraid-runner
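A sketch of the relevant part of snapraid-runner.conf, assuming the stock INI layout (paths and the threshold value are illustrative):

```
[snapraid]
executable = /usr/bin/snapraid
config = /etc/snapraid.conf
; Abort the sync if more than 40 files were deleted since the last sync
deletethreshold = 40
```

A threshold of -1 disables the check; a failed mount typically shows up as thousands of "deletions," so even a generous threshold catches it.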
5. Why is ext4 the default filesystem choice for SnapRAID data drives?
Show answer
ext4 is boring, reliable, and universally supported. It has mature fsck recovery tools, works on every Linux distro, and has decades of battle testing. It lacks checksums and snapshots, but SnapRAID provides checksumming and parity externally.
Mnemonic: "ext4 = Toyota Corolla" — not exciting, but it starts every morning.
6. What are BorgBackup's key features for data hoarding backups?
Show answer
BorgBackup (2015, BSD-3-Clause, Python+C): content-defined chunking deduplication, compression (lz4/zstd/zlib/lzma), AES-256-CTR encryption, and append-only repos (ransomware protection).
Key commands: `borg init`, `borg create`, `borg prune`, `borg check`.
Choose borg when: local/SFTP targets, want max compression, need append-only repos.
7. What makes restic different from BorgBackup?
Show answer
restic (2015, BSD-2-Clause, Go): single static binary, native multi-backend support (S3, B2, Azure, GCS, SFTP, rclone), always-on AES-256 encryption, lock-free concurrent backups.
Choose restic when: backing up to cloud storage (S3/B2), want zero dependencies, need multi-platform support.
8. What did Google's study reveal about SMART's ability to predict drive failures?
Show answer
Google's 2007 study of 100,000+ drives found that 36% of failed drives showed ZERO SMART warnings beforehand. SMART catches gradual degradation (sector reallocation) but misses sudden failures (head crashes, PCB failures, firmware bugs).
Lesson: Use SMART as an early warning system, not a crystal ball. Always have parity + backups as additional layers.
9. What is par2 and what problem does it solve?
Show answer
par2 (Parchive v2) uses Reed-Solomon error correction to create recovery blocks for files. Originally created for Usenet transfers (2001-2002, by Tobias Rieper, Stefan Wehlus, and Howard Fukada).
Use case: `par2 create -r10 archive.par2 *.tar.gz` creates 10% redundancy. If files are damaged, `par2 repair archive.par2` can reconstruct them. Ideal for cold storage archives and long-term preservation.
10. What is the advantage of using /dev/disk/by-id/ instead of /dev/sdX in fstab?
Show answer
/dev/sdX device names (sda, sdb, sdc) can change between reboots depending on detection order. /dev/disk/by-id/ paths contain the drive model and serial number, creating stable identifiers that survive reboots, cable swaps, and controller changes.
Example: /dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_SERIAL-part1
Alternative: /dev/disk/by-uuid/ (filesystem UUID, also stable).
11. What is the "Perfect Media Server" stack and who created it?
Show answer
The Perfect Media Server stack was popularized by Alex Kretzschmar (ironicbadger, host of the Self-Hosted podcast). It combines: mergerfs (union filesystem), SnapRAID (snapshot parity), ext4/XFS (per-drive filesystems), Docker (containers for media apps), rclone (cloud backup), and smartd (monitoring).
Site: perfectmediaserver.com
12. What are the core tools in the *arr stack and what does each do?
Show answer
Sonarr: TV show management + download automation (monitors RSS, renames, organizes)
Radarr: Movie management (Sonarr fork, same pattern for movies)
Lidarr: Music management (same architecture)
Prowlarr: Indexer management (feeds search results to Sonarr/Radarr/Lidarr)
All run as Docker containers pointed at your mergerfs mount. They handle media lifecycle; the storage stack handles durability.
🟡 Medium (14)¶
1. What does snapraid sync do, and when should you run it?
Show answer
`snapraid sync` reads all data drives, computes parity blocks, and writes them to the parity drive(s). Run it daily via cron. Between syncs, newly added files have zero parity protection.
Always run `snapraid diff` first (or use snapraid-runner) to check for unexpected mass deletions before syncing.
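A typical nightly schedule might look like this (the install path and time are illustrative; running snapraid-runner rather than bare `snapraid sync` gets the deletethreshold safety check):

```
# /etc/cron.d/snapraid — nightly sync at 03:00 (paths illustrative)
0 3 * * * root /usr/bin/python3 /opt/snapraid-runner/snapraid-runner.py -c /opt/snapraid-runner/snapraid-runner.conf
```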
2. What does snapraid scrub do and what is the default scrub percentage?
Show answer
`snapraid scrub` verifies data integrity by reading data blocks and comparing checksums. The default scrub verifies ~8% of data per run. Running weekly, this covers all data roughly once every 12 weeks.
Use `snapraid scrub -p 100` for a full verification, but only when needed — it reads every block on every drive.
3. How do you recover files from a failed drive using SnapRAID?
Show answer
1. Mount the replacement drive at the same mount point
2. Run: `snapraid fix -d d3` (replace d3 with the failed drive label)
3. Verify: `snapraid check -d d3`
4. Update parity: `snapraid sync`
SnapRAID reconstructs files from parity data on the remaining drives. Expected time: 4-12 hours for a full drive.
4. When should you choose XFS over ext4 for data drives?
Show answer
Choose XFS when storing predominantly large files (4K video, ISOs, disk images). XFS has a better allocator for large sequential I/O and supports reflink copies (cp --reflink). XFS scales to 8 EiB volumes.
Downside: XFS cannot be shrunk (only grown). Historically fragile on power loss, but v5 format (default since 2014) fixed most issues.
5. What is rclone's crypt overlay and why is it important for offsite backups?
Show answer
rclone's crypt overlay provides client-side encryption as a transparent layer on any of rclone's 70+ storage backends. Data is encrypted locally before upload. The crypt remote wraps another remote, so `rclone sync /data b2-crypt:backups/` encrypts on-the-fly.
IMPORTANT: rclone is a sync/transfer tool, NOT a backup tool. It has no versioning or deduplication. Pair with borg/restic for those features.
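A sketch of the wrapping in rclone.conf (remote names, bucket, and credentials are illustrative; passwords must be stored in obscured form, e.g. via `rclone obscure`, or entered through `rclone config`):

```
[b2]
type = b2
# account and key omitted here — set via `rclone config`

[b2-crypt]
type = crypt
remote = b2:my-bucket/backups
password = OBSCURED_PASSWORD_FROM_RCLONE_OBSCURE
```

Anything synced to `b2-crypt:` is encrypted client-side before it touches the `b2` backend.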
6. Which 5 SMART attributes are most important to monitor for drive health?
Show answer
The big five: (1) Reallocated_Sector_Ct (#5) — bad sectors remapped, >0 is a concern; (2) Reported_Uncorrect (#187) — uncorrectable errors; (3) Command_Timeout (#188) — controller communication failures; (4) Current_Pending_Sector (#197) — sectors awaiting reallocation; (5) Offline_Uncorrectable (#198) — sectors failing offline tests.
Check: `smartctl -A /dev/sdX`
7. What is drive burn-in and why do data hoarders do it before trusting new drives?
Show answer
Burn-in tests a new drive before putting data on it, catching infant mortality failures (drives that fail in the first weeks). Steps:
1. `smartctl -t long /dev/sdX` (SMART extended test, 8-24h)
2. `badblocks -wsv -b 4096 /dev/sdX` (destructive write+read test)
3. Check SMART attributes after — any non-zero Reallocated/Pending sectors = RMA
Drives passing burn-in are statistically more reliable.
8. How do jdupes, fdupes, and rdfind compare for duplicate detection?
Show answer
jdupes: fastest (7x faster than fdupes), C, supports hardlink/softlink/delete modes, hash-based, by Jody Bruchon.
fdupes: the original (1999, by Adrian Lopez), simpler interface, MD5 + byte-by-byte comparison.
rdfind: C++, ranking-based dedup, O(N log N) time.
All match only 100% identical files (no fuzzy matching). jdupes is the standard choice for data hoarding.
9. How do you spin down idle drives to save power, and which tools handle it?
Show answer
Two approaches:
1. hdparm -S 242 /dev/sdX (values 1-240 count in units of 5 seconds; 241-251 count in units of 30 minutes, so 242 = 1 hour standby timeout)
2. hd-idle -i 600 /dev/sdX (600 seconds idle before spindown, more reliable for USB/some SATA)
Spindown reduces power (~8W active vs ~0.5W standby per drive) and wear on infrequently accessed archive drives. Caveat: frequent spin-up/down cycles also cause wear — balance idle timeout with access patterns.
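The -S encoding is easy to misread, so here is a small shell sketch of the decoding rules from the hdparm man page (the helper name is mine, not hdparm's):

```shell
#!/bin/sh
# Decode an hdparm -S value into a standby timeout in seconds.
# 0 = disabled, 1-240 = value * 5 s, 241-251 = (value - 240) * 30 min.
# Values 252-255 are special cases and not handled here.
hdparm_standby_seconds() {
    v=$1
    if [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $(( v * 5 ))
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 30 * 60 ))
    else
        echo 0
    fi
}

hdparm_standby_seconds 242   # 3600 (1 hour)
hdparm_standby_seconds 120   # 600 (10 minutes)
```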
10. Why is "backup on the same drive as source" not actually a backup?
Show answer
A backup on the same physical drive protects against accidental deletion (maybe) but not against drive failure, ransomware affecting the whole filesystem, fire, or theft. Common mistakes: borg repo in /mnt/disk1/backups backing up /mnt/disk1/data, or restic repo on the same mergerfs pool.
The 3-2-1 rule requires: 3 copies, 2 different media types, 1 offsite. Same-drive copies satisfy none of these.
11. Why are noatime and nofail essential mount options for data hoarding drives?
Show answer
noatime: prevents updating access time metadata on every read, eliminating unnecessary writes. Critical for media streaming (constant reads) and SnapRAID (fewer changes to sync).
nofail: if a drive fails or disconnects, boot continues instead of hanging. You get a degraded array instead of an unbootable server.
Always use: defaults,noatime,nofail for data drives.
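Put together, a data-drive entry in fstab might look like this (the by-id path and mount point are illustrative):

```
# /etc/fstab — one SnapRAID data drive (illustrative serial and mount point)
/dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_SERIAL-part1 /mnt/disk1 ext4 defaults,noatime,nofail 0 2
```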
12. Why must SnapRAID content files be stored on multiple different drives?
Show answer
Content files are the checksum database — they store hashes for every file in the array. If all content file copies are lost, SnapRAID cannot verify data integrity or perform recovery. Store at least 2 copies on different physical drives (e.g., one on a data drive, one on /var).
Losing content files while also losing a data drive = unrecoverable data loss.
13. What is the correct relationship between SnapRAID parity and backups?
Show answer
SnapRAID parity is NOT backup. Parity protects against disk failure (hardware). Backup protects against deletion, ransomware, fire, and theft (everything else). You need both.
Parity = survive a drive dying
Backup = survive rm -rf, ransomware, house fire
Minimum: SnapRAID for parity + borg/restic to a separate drive + rclone to offsite cloud.
14. What are the steps to add a new drive to an existing data hoarding array?
Show answer
1. Burn-in (SMART long test + badblocks): ~24 hours
2. Format: mkfs.ext4 -L diskN /dev/sdX
3. Mount: add to fstab with noatime,nofail, mount at /mnt/diskN
4. Add to mergerfs (live: xattr on .mergerfs control file, or edit fstab and remount)
5. Add to snapraid.conf (data dN /mnt/diskN/, plus content file)
6. Run snapraid sync
mergerfs automatically starts using the new drive for new files based on create policy (e.g., mfs = most free space).
🔴 Hard (7)¶
1. How many parity levels does SnapRAID support, and what does each level provide?
Show answer
SnapRAID supports 1 through 6 parity levels (configured as parity, 2-parity, 3-parity, etc.). Each level adds tolerance for one additional simultaneous disk failure:
1-parity = survive 1 failure (like RAID5)
2-parity = survive 2 failures (like RAID6)
3-parity = survive 3 failures (like RAID-Z3)
4-6 parity = for very large arrays (20+ disks)
Rule of thumb: 1 parity per 4 data disks for home use.
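In snapraid.conf, each extra level gets its own directive pointing at a dedicated drive (paths are illustrative):

```
# Two parity levels on two dedicated drives (illustrative paths)
parity   /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/snapraid.2-parity
```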
2. Why should you never use btrfs RAID5 or RAID6 for data you care about?
Show answer
btrfs RAID5/6 has an unfixed write hole bug. If power is lost during a write, parity can become inconsistent with data. On the next scrub, btrfs may "fix" good data with bad parity — making corruption worse. The kernel now warns when creating RAID5/6 profiles.
Safe btrfs options: RAID1 (mirroring) or RAID10. For parity protection, use SnapRAID externally instead.
3. What are the pros and cons of ZFS for data hoarding compared to the JBOD+mergerfs+SnapRAID approach?
Show answer
ZFS pros: CoW, built-in checksums, RAID-Z levels, ARC cache, zfs send/recv for backups, proven track record.
ZFS cons: Memory hungry (~1GB per TB rule of thumb), cannot easily add single drives to an existing pool, kernel module not in mainline Linux (out-of-tree due to licensing), no fsck equivalent (a badly corrupted pool may be unrecoverable).
Key: ZFS is a different paradigm — it replaces mergerfs+SnapRAID entirely. Don't mix them.
4. What happens if a drive fails to mount before running snapraid sync?
Show answer
If a data drive doesn't mount, its mount point is empty. SnapRAID sees all files on that drive as "deleted" and syncs parity accordingly — destroying the parity protection for those files. This is the most dangerous SnapRAID failure mode.
Prevention: (1) Use snapraid-runner with deletethreshold, (2) Check `df -h /mnt/disk*` before sync, (3) Write a pre-sync script that verifies all drives are mounted.
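A pre-sync guard along the lines of point (3) might look like this sketch (mount points and function names are illustrative; it reads /proc/mounts directly so it needs no extra tools):

```shell
#!/bin/sh
# Guard against the empty-mount-point trap: verify every expected data
# drive is really mounted before letting snapraid sync run.
is_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

check_all_mounted() {
    for mnt in "$@"; do
        if ! is_mounted "$mnt"; then
            echo "ERROR: $mnt is not mounted, refusing to sync" >&2
            return 1
        fi
    done
}

# Typical pre-sync usage (mount points illustrative):
# check_all_mounted /mnt/disk1 /mnt/disk2 /mnt/disk3 && snapraid sync
```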
5. What is the circular dependency problem with encryption key storage?
Show answer
If your encryption keys (LUKS, borg repokey, rclone crypt password) are stored only on the encrypted system itself, losing that system means losing access to all backups too.
Common traps: passphrase in a password manager, password manager backup on the encrypted drive; borg repokey stored only in the repo; rclone.conf with crypt passwords on the encrypted volume.
Fix: Store keys in 2+ independent locations (printed paper in safe + password manager with separate backup).
6. What is SnapRAID split parity (v11.0+) and when would you use it?
Show answer
Split parity allows a single parity level to span multiple smaller drives using comma-separated paths: `parity /mnt/p1/snap.parity,/mnt/p2/snap.parity`. The next file starts growing when the previous one fills up.
Use case: you have two 8TB drives but your largest data drive is 12TB. Split parity combines them into one 16TB parity target, satisfying the "parity >= largest data drive" requirement.
7. What does Backblaze's public drive data tell us about failure rates?
Show answer
Backblaze publishes quarterly stats from 290,000+ drives. Key findings:
- 2024 AFR (Annualized Failure Rate): 1.57% overall
- 2025 AFR: dropped to 1.36%
- Failure rates vary dramatically by model and age
- High-capacity drives (20TB+) show lower AFR (~0.77% in Q4 2024)
- Some 12TB models exceed 5% AFR
Data is publicly downloadable at backblaze.com/cloud-storage/resources/hard-drive-test-data