Linux Data Hoarding¶
Managing large data collections on Linux — from media libraries to log archives to dataset mirrors — requires understanding filesystems, RAID, deduplication, and backup strategies. This topic covers the tools and techniques for storing, organizing, and protecting large volumes of data without drowning in disk usage.
Why this matters¶
Whether you are archiving years of application logs, mirroring package repositories, or managing a home media server, the principles are the same: choose the right filesystem, automate integrity checks, and plan for growth before you run out of space at 3 AM.
Prerequisites¶
Familiarity with Linux filesystems and basic storage concepts (see Linux Ops Storage).
Key concepts covered¶
- Filesystem selection: ZFS vs btrfs vs ext4/XFS for large-volume workloads
- Deduplication and compression: inline vs offline approaches, and their trade-offs in CPU and RAM usage
- Integrity verification: checksums, scrubs, and SMART monitoring
- Backup strategy: 3-2-1 rule, snapshot-based backups, and offsite replication
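
The integrity-verification idea above can be sketched with plain coreutils: generate a checksum manifest while the data is known-good, then re-verify it periodically (e.g. from a cron job) to catch silent corruption. This is a minimal illustration assuming GNU `sha256sum`; the paths and filenames are hypothetical.

```shell
set -eu

# Hypothetical archive directory; in practice this would be your data volume.
archive=$(mktemp -d)
printf 'important data\n' > "$archive/dataset.csv"

# Build the checksum manifest once, when the data is known-good.
(cd "$archive" && sha256sum dataset.csv > MANIFEST.sha256)

# Later, verify nothing has silently rotted. sha256sum -c exits non-zero
# (and prints FAILED) if any file no longer matches its recorded hash.
verify_out=$(cd "$archive" && sha256sum -c MANIFEST.sha256)
echo "$verify_out"

rm -rf "$archive"
```

Filesystems like ZFS and btrfs do this continuously at the block level (scrubs); a manifest like this is the portable fallback for ext4/XFS volumes, which is covered in the Street Ops guide.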
Contents¶
Start with the Primer for foundational concepts, then apply them in Street Ops, and learn what not to do in Footguns & Pitfalls.
| # | File | What it covers |
|---|---|---|
| 1 | Primer | Filesystem choices, RAID levels, ZFS/btrfs snapshots, and capacity planning |
| 2 | Street Ops | Practical workflows for rsync, deduplication, integrity checking, and archival |
| 3 | Footguns & Pitfalls | Silent data rot, RAID misconceptions, and backup strategies that fail when you need them |