Btrfs: subvolume, snapshot, reflink, CoW¶
Mental model¶
Btrfs is a copy-on-write (CoW) filesystem: by default, data is never overwritten in place. That one design choice is what makes snapshots, reflinks, checksums, and dynamic subvolumes cheap to provide.
What it looks like¶
"Btrfs is like LVM + filesystem combined." People see btrfs subvolume and think partition, see btrfs snapshot and think backup.
What it really is¶
- CoW (copy-on-write): when you modify a block, Btrfs writes the new version to a new location and updates the pointer. The old block stays untouched until nothing references it. This is the foundation for everything else.
- Subvolume: an independently mountable tree within the same filesystem. Not a partition — no fixed size. All subvolumes share the same underlying storage pool. You can mount subvolumes at different paths or with different mount options.
- Snapshot: a subvolume created as a CoW copy of another subvolume. At creation it shares all data blocks with the source — zero extra space used. Only diverged blocks consume additional space over time. Snapshots are not recursive (nested subvolumes appear as empty directories).
- Reflink: a CoW file copy (cp --reflink). Both files share data blocks until one is modified; then only the changed blocks are duplicated. The copy is instant and space-efficient.
- B-tree structure: all metadata (directory entries, extent references, checksums) is stored in B-trees, and file data lives in extents that those trees point to. Checksums are computed for every data and metadata block (except data written with CoW disabled), enabling detection of silent corruption (bitrot).
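The reflink behaviour described above can be tried with a minimal sketch like the following. It assumes GNU coreutils cp and uses a scratch directory with hypothetical file names. --reflink=auto is used instead of --reflink=always so the script also runs on filesystems without reflink support, where it silently falls back to an ordinary copy.

```shell
#!/bin/sh
# Sketch: a reflink copy behaves like an independent file even though,
# on Btrfs, it initially shares its data blocks with the source.
# Assumption: GNU coreutils cp. File names below are illustrative only.
set -eu
workdir=$(mktemp -d)
printf 'original\n' > "$workdir/source.txt"

# Clone the file; on Btrfs no data blocks are duplicated at this point.
cp --reflink=auto "$workdir/source.txt" "$workdir/clone.txt"

# Rewrite the clone; the source keeps pointing at its old blocks.
printf 'changed\n' > "$workdir/clone.txt"

source_content=$(cat "$workdir/source.txt")
clone_content=$(cat "$workdir/clone.txt")
echo "source: $source_content"
echo "clone:  $clone_content"

rm -rf "$workdir"
```

Modifying the clone leaves the source untouched, which is the same guarantee a full copy gives, just without paying for the duplicate blocks up front.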
Why it seems confusing¶
- Subvolumes look like directories but behave like mount points. They have no fixed size yet can be mounted independently.
- Snapshots look like backups but live on the same device (same failure domain). They are fast undo, not disaster recovery.
- CoW means writes go to new locations, so "overwrite" does not mean what it means on ext4. This affects sync, fragmentation, and database workloads (write amplification).
- Btrfs RAID is different from mdadm RAID: the filesystem itself manages the redundancy, with different maturity levels per profile (RAID-5/6 historically unstable).
What actually matters¶
- Snapshots are cheap: create them before risky operations (upgrades, config changes). Rollback is instant.
- Subvolumes are your layout tool: use separate subvolumes for /, /home, and /var/log so you can snapshot and manage them independently.
- CoW + databases: databases that do heavy random writes (MySQL, PostgreSQL) suffer from CoW fragmentation. Use chattr +C (nodatacow) on their data directories before any files are created, or keep them in a dedicated subvolume with +C set. Disabling CoW also disables data checksums and compression for those files.
- Scrub detects bitrot: btrfs scrub start /mnt reads all data and metadata and verifies the checksums. With a redundant profile (for example RAID-1), it can auto-repair from the good copy.
- Send/receive for incremental backup: btrfs send serializes a read-only snapshot, or with -p only its differences from a parent snapshot. Combined with btrfs receive, this gives efficient incremental backups to another Btrfs filesystem.
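As an illustration of the "snapshot before risky operations" habit, here is a minimal sketch of a helper that takes a read-only, timestamped snapshot. The function name, its arguments, and the naming scheme are hypothetical; only the btrfs subvolume snapshot -r call comes from the btrfs tooling itself.

```shell
#!/bin/sh
# Sketch: take a read-only, timestamped snapshot before a risky change.
# Assumptions (hypothetical): the function name, the label argument, and
# the <label>-YYYYmmdd-HHMMSS naming scheme are illustrative only.
snapshot_before_change() {
    src=$1        # subvolume to snapshot, e.g. /mnt/@
    dest_dir=$2   # directory on the same Btrfs filesystem
    label=$3      # short tag such as "upgrade"
    stamp=$(date +%Y%m%d-%H%M%S)
    # -r makes the snapshot read-only, which btrfs send requires.
    btrfs subvolume snapshot -r "$src" "$dest_dir/$label-$stamp"
}
```

A call such as `snapshot_before_change /mnt/@ /mnt upgrade` (run as root) would then leave a snapshot whose name records both the reason and the time, which makes later cleanup easier.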
Common mistakes¶
- Treating snapshots as backups. They share the same device — disk failure loses both original and snapshots.
- Letting snapshots accumulate. Each snapshot pins old data blocks, consuming space that df doesn't attribute clearly. Use btrfs filesystem du or compsize to see actual usage.
- Using Btrfs RAID-5/6 in production without understanding the current stability status (check the status page in the Btrfs documentation before choosing).
- Running a database on a CoW filesystem without nodatacow and wondering why performance degrades and fragmentation grows.
- Forgetting that snapshots are not recursive: nested subvolumes inside a snapshotted subvolume will appear as empty directories.
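One way to keep snapshots from accumulating is a small retention step. The sketch below is a dry run under stated assumptions: snapshot names end in a sortable date (a hypothetical scheme such as @_snap_2024-05-01), names arrive one per line on stdin, and the helper names are made up for illustration. It only prints the delete commands; nothing is removed.

```shell
#!/bin/sh
# Sketch: choose snapshots to prune, keeping the newest $keep.
# Assumptions (hypothetical): names end in a sortable date and are
# passed one per line on stdin; helper names are illustrative.
prune_candidates() {
    keep=$1
    # Sort newest first, then skip the first $keep lines.
    sort -r | tail -n "+$((keep + 1))"
}

# Dry run: print the delete commands instead of executing them.
print_prune_commands() {
    keep=$1
    dir=$2
    prune_candidates "$keep" | while IFS= read -r name; do
        echo "btrfs subvolume delete $dir/$name"
    done
}
```

For example, `printf '%s\n' @_snap_2024-01-01 @_snap_2024-02-01 @_snap_2024-03-01 | print_prune_commands 2 /mnt` prints a single delete command for the oldest snapshot. Reviewing the printed commands before running them is a deliberate choice, since deleting the wrong subvolume is not reversible.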
Small examples¶
# Create subvolumes
btrfs subvolume create /mnt/@home
btrfs subvolume create /mnt/@var_log
# Mount subvolume at a specific path (the same subvol= option works in /etc/fstab)
mount -o subvol=@home /dev/sda1 /home
# Snapshot before an upgrade (instant, zero extra space)
btrfs subvolume snapshot /mnt/@ /mnt/@_before_upgrade
# Read-only snapshot (for send/receive backup)
btrfs subvolume snapshot -r /mnt/@ /mnt/@_snap_readonly
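# Rollback: move the broken subvolume aside, then rename the snapshot into place
# (assumes /mnt is the top-level subvolume; remount or reboot afterwards)
mv /mnt/@ /mnt/@_broken
mv /mnt/@_before_upgrade /mnt/@
btrfs subvolume delete /mnt/@_broken   # once the rollback is confirmed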
# Reflink copy (instant, shares blocks)
cp --reflink=always big_file.img big_file_copy.img
# Incremental send/receive to backup drive
# (both snapshots must be read-only; @_snap_old must already exist under /backup/)
btrfs send -p /mnt/@_snap_old /mnt/@_snap_new | \
btrfs receive /backup/
# Disable CoW for database directory
mkdir /mnt/@/pgdata
chattr +C /mnt/@/pgdata # must be set before files are created
# Scrub: verify checksums, detect bitrot
btrfs scrub start /mnt
btrfs scrub status /mnt
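Because chattr +C only affects files created after the flag is set, it can be worth checking that the attribute is actually present on the database directory. The following is a minimal sketch, assuming lsattr from e2fsprogs (which prints the flag field first, then the path); has_nocow is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether the No_COW attribute (C) is set on a directory.
# Assumption: lsattr prints the flag field first and the path second;
# has_nocow is a hypothetical helper name.
has_nocow() {
    flags=$(lsattr -d "$1" | awk '{print $1}')
    case "$flags" in
        *C*) return 0 ;;   # uppercase C = No_COW (lowercase c is compression)
        *)   return 1 ;;
    esac
}
```

A typical check would be `has_nocow /mnt/@/pgdata && echo "CoW disabled" || echo "CoW still enabled"`.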
One-line summary¶
Btrfs is a CoW filesystem where subvolumes replace partitions, snapshots are free CoW clones of subvolumes, and reflinks are free CoW clones of files — all powered by never overwriting data in place.