Level: L1: Foundations | Topics: rsync, Linux Fundamentals | Domain: Linux
rsync - Primer
Why This Matters
Who made it: rsync was created by Andrew Tridgell and Paul Mackerras in 1996. Tridgell is also the creator of Samba, the SMB/CIFS file-sharing suite for Unix-like systems. The rsync algorithm -- splitting files into blocks and computing rolling checksums to find differences -- was the subject of Tridgell's PhD thesis at the Australian National University.
rsync is the workhorse of file synchronization in production Linux environments. Whether you are migrating terabytes between datacenters, maintaining incremental backups that go back 90 days, deploying a static site, or keeping a fleet of servers in sync, rsync is the tool you reach for. It has been battle-tested since 1996 and is packaged for virtually every Unix-like system. When the destination already holds an older copy, rsync transfers only the differences, which makes repeated syncs dramatically faster than with naive copy tools. Mastering rsync means understanding trailing-slash semantics, delete behavior, exclusion patterns, and resumable transfers -- the details that separate a clean sync from a data-loss incident.
Core Concepts
The Delta-Transfer Algorithm
rsync's defining feature is its delta-transfer algorithm. Instead of copying entire files, it splits files into fixed-size blocks, uses rolling checksums to find the blocks the destination already has, and transfers only the data that differs. For a 2 GB database dump where 50 MB changed, rsync sends roughly 50 MB (plus a small amount of checksum traffic) instead of 2 GB.
The algorithm works in three phases:
- File list generation -- the sender builds a list of files to transfer, including metadata (size, mtime, permissions).
- Checksum matching -- for each file that might differ, the receiver computes block checksums and sends them to the sender.
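- Delta transmission -- the sender slides a rolling checksum across its own version of the file, one byte at a time, looking for blocks that match the receiver's checksums. It sends references for matching blocks and literal data for everything else, and the receiver reassembles the new file from the two.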
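New files have nothing to compare against, so rsync sends them in full. Files whose size and modification time already match are skipped before any checksumming happens (see Checksum vs Timestamp Comparison). The delta algorithm also costs CPU time and disk reads on both sides, so it pays off mainly when files are large, changes are small, and the network is the bottleneck; for that reason rsync defaults to --whole-file (-W) when both source and destination are local paths.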
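To make the "rolling" part concrete, here is a minimal bash sketch of the weak checksum described in Tridgell and Mackerras's technical report on the rsync algorithm. It is an illustration, not rsync's code: real rsync pairs this weak checksum with a strong per-block hash and adds details omitted here. The sketch computes two running sums, `a` (sum of the bytes in the window) and `b` (a position-weighted sum), both mod 2^16, then slides an 8-byte window one byte at a time using only the byte that leaves and the byte that enters, and checks every rolled value against a full recomputation. The window size and sample string are arbitrary choices for the example.

```bash
#!/usr/bin/env bash
# Sketch of the weak rolling checksum from the rsync technical report:
#   a = sum of the window's bytes, b = position-weighted sum, both mod 2^16.
# Illustration only; real rsync also uses a strong hash per block.
set -euo pipefail

M=65536   # 2^16

# Fill the global BYTES array with the byte values of an ASCII string.
to_bytes() {
  local s=$1 i
  BYTES=()
  for (( i = 0; i < ${#s}; i++ )); do
    BYTES+=( "$(printf '%d' "'${s:i:1}")" )
  done
}

# Print "a b" for the window BYTES[k..l], computed from scratch.
full_checksum() {
  local k=$1 l=$2 i a=0 b=0
  for (( i = k; i <= l; i++ )); do
    a=$(( (a + BYTES[i]) % M ))
    b=$(( (b + (l - i + 1) * BYTES[i]) % M ))
  done
  echo "$a $b"
}

to_bytes "the quick brown fox jumps over the lazy dog"
WIN=8
read -r a b < <(full_checksum 0 $(( WIN - 1 )))

for (( k = 0; k + WIN < ${#BYTES[@]}; k++ )); do
  l=$(( k + WIN - 1 ))
  # Slide one byte right in O(1): drop BYTES[k], add BYTES[l+1].
  # Adding M before the final % M keeps results non-negative in bash.
  a=$(( ((a - BYTES[k] + BYTES[l + 1]) % M + M) % M ))
  b=$(( ((b - WIN * BYTES[k] + a) % M + M) % M ))
  read -r fa fb < <(full_checksum $(( k + 1 )) $(( l + 1 )))
  if (( a != fa || b != fb )); then
    echo "mismatch at offset $(( k + 1 ))" >&2
    exit 1
  fi
done
echo "rolling update matched full recomputation at every offset"
echo "final window: a=$a b=$b s=$(( a + M * b ))"
```

The point of the rolling form is cost: recomputing a window from scratch is proportional to the block size, while the update touches only two bytes, which is what lets the sender test a candidate match at every byte offset.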
Archive Mode (-a)
The -a flag is shorthand for -rlptgoD:
| Flag | Meaning |
|---|---|
| -r | Recurse into directories |
| -l | Copy symlinks as symlinks |
| -p | Preserve permissions |
| -t | Preserve modification times |
| -g | Preserve group ownership |
| -o | Preserve owner (requires root) |
| -D | Preserve device files and special files |
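Archive mode is the correct default for almost all server-to-server syncs. Without it, you lose ownership and permissions, which breaks applications that depend on file metadata, and you lose modification times, so later runs can no longer skip unchanged files by comparing size and mtime.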
What -a does not include:
- -H (hard links) -- must add explicitly if your source uses hard links
- -A (ACLs) -- add if your filesystem uses POSIX ACLs
- -X (extended attributes) -- add for SELinux contexts or xattrs
- -S (sparse files) -- add for disk images or sparse databases
Visibility: -v, -P, and --progress
```bash
# Verbose: list each file as it transfers
rsync -av source/ dest/

# Progress per file (bytes transferred, percentage, speed)
rsync -avP source/ dest/
# -P is shorthand for --partial --progress,
# so you also get resumable transfers for free

# Itemized changes: show exactly what changed per file
rsync -avv --itemize-changes source/ dest/
```
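The itemize-changes output prefixes each path with a compact 11-character code of the form YXcstpoguax. Y is the update type (> for a file received, < for a file sent, c for a local change or creation, . for no data transfer), and deletions appear as *deleting. X is the file type (f=file, d=directory, L=symlink). The remaining positions flag which attributes changed: c=checksum, s=size, t=modification time, p=permissions, o=owner, g=group, a=ACL, x=extended attributes. A dot means "unchanged" and a plus means the item is new; for example, >f.st...... marks a received file whose size and timestamp changed. This is invaluable for auditing what rsync actually did.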
--delete and Its Variants
--delete makes the destination a mirror of the source: files that exist on the destination but not the source are removed. Without it, rsync only adds and updates -- it never removes.
The timing variants:
| Flag | When files are deleted | Use case |
|---|---|---|
| --delete-before | Before transfer starts | Frees disk space first; slow for large trees |
| --delete-during | As each directory is processed | Default when --delete is used; good balance |
| --delete-delay | Found during the transfer, removed after it completes | Like --delete-after, but more efficient |
| --delete-after | After all transfers complete | Safest: destination has both old and new files during sync |
Critical safety rule: always run --delete with --dry-run first. There is no undo.
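War story: An engineer tried to empty /var/www/ with rsync -av --delete /dev/null/ /var/www/. /dev/null is a character device, not a directory, so it is not an "empty source" at all, and the command did not behave the way the engineer expected. The underlying hazard is real, though: whenever the source of a --delete sync is empty or points at the wrong place, rsync faithfully deletes everything in the destination that the source lacks. To empty a directory deliberately, use rm -rf /var/www/* or sync from an empty temporary directory with --delete. Always dry-run --delete operations.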
```bash
# Preview what would be deleted
rsync -avn --delete source/ dest/

# Then execute
rsync -av --delete source/ dest/
```
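Exclude and Include Patterns
rsync evaluates --exclude and --include rules in the order they appear on the command line. The first matching rule wins.
```bash
# Exclude logs and temp files
rsync -av --exclude='*.log' --exclude='/tmp/' source/ dest/

# Include only .conf files, exclude everything else.
# '*/' lets rsync descend into subdirectories; without it only top-level
# .conf files are copied. -m (--prune-empty-dirs) drops dirs left empty.
rsync -av -m --include='*/' --include='*.conf' --exclude='*' source/ dest/

# Complex pattern: include .py but exclude __pycache__.
# The __pycache__ exclude must come before '*/', or '*/' would match it first.
rsync -av \
  --exclude='__pycache__/' \
  --include='*/' \
  --include='*.py' \
  --exclude='*' \
  source/ dest/
```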
Key pattern rules:
- Patterns starting with / are anchored to the transfer root.
- Patterns ending with / match only directories.
- * matches any run of characters within a single path component (it does not cross /).
- ** matches anything, including slashes.
- Rules can be read from a file: --exclude-from=exclude-list.txt (one pattern per line).
The order trap: if you put --exclude='*' before --include='*.conf', the exclude catches everything first. Include rules that should override excludes must come first.
Remote Shell: -e for SSH
rsync's default transport is SSH. The -e flag lets you customize the SSH command:
```bash
# Default: host:path syntax already uses ssh, no -e needed
rsync -av source/ user@host:/dest/

# Custom SSH port
rsync -av -e 'ssh -p 2222' source/ user@host:/dest/

# Specific identity file
rsync -av -e 'ssh -i ~/.ssh/deploy_key' source/ user@host:/dest/

# Disable strict host key checking (CI pipelines).
# This removes protection against man-in-the-middle attacks;
# pre-populating known_hosts is the safer option.
rsync -av -e 'ssh -o StrictHostKeyChecking=no' source/ user@host:/dest/

# SSH with compression (alternative to rsync -z; don't use both)
rsync -av -e 'ssh -C' source/ user@host:/dest/
```
Dry Run (-n / --dry-run)
The most important safety flag. It shows what rsync would do without actually doing it, for example rsync -avn --delete source/ dest/.
Always use dry-run before any destructive operation (--delete, first-time large syncs, production deploys). The output is nearly identical to a real run -- same file list, same itemized changes -- but nothing is modified on disk.
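Checksum vs Timestamp Comparison
By default, rsync decides whether to transfer a file with a "quick check" of modification time and size. If both match, the file is skipped. This is fast but imperfect: a file whose content changed while its size and mtime stayed the same is silently skipped, and a file whose mtime changed without any content change is re-examined for nothing. Common causes:
- NFS may report different timestamps for the same file.
- FAT (vfat) filesystems have 2-second timestamp granularity (--modify-window=1 is the usual fix for this case).
- touch can set mtimes without changing content.
- Build systems may regenerate identical files with new timestamps.
--checksum (-c) makes rsync compare full-file checksums instead, for example rsync -avc source/ dest/. (Older releases use MD4 or MD5 depending on version; rsync 3.2+ can negotiate faster algorithms such as xxHash when both sides support them.)
Trade-off: --checksum reads every file whose size matches on both sides to compute checksums, even if timestamps already match. For large datasets, this can be dramatically slower. Use it when timestamps are unreliable, not as a default.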
Partial Transfers: --partial and --partial-dir
When a transfer is interrupted (network drop, killed process), rsync normally deletes the partially transferred file and starts over next time. This is painful for large files over slow links.
```bash
# Keep partial files for resumption
rsync -avP source/ dest/
# -P = --partial --progress

# Keep partials in a hidden directory (cleaner)
rsync -av --partial-dir=.rsync-partial source/ dest/
```
With --partial, the incomplete file stays at the destination. On the next run, rsync uses it as the basis for the delta-transfer algorithm, so data already received does not have to be sent again. With --partial-dir, partials are stored in a separate directory and moved into place only when complete -- safer because you never see incomplete files in the destination tree.
Bandwidth Limiting (--bwlimit)
Prevent rsync from saturating your network link:
```bash
# Limit to 10 MiB/s
rsync -av --bwlimit=10m source/ dest/

# Limit to 5000 KiB/s
rsync -av --bwlimit=5000 source/ dest/
```
The default unit is KiB/s. Append m for MiB/s. This is essential for production transfers over shared links. Without it, a large rsync can starve other traffic and trigger alerts.
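Because the units are easy to misread, it can help to do the arithmetic before launching a long transfer. The helper below is hypothetical (it is not part of rsync) and deliberately narrow: it accepts an integer with no suffix (KiB/s) or a binary k/m/g suffix in either case, and rejects anything else rather than guessing at other forms rsync may accept.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of rsync): convert a --bwlimit value to bytes/s
# and estimate how long a transfer would take at that cap.
set -euo pipefail

bwlimit_to_bytes() {
  local value=$1 number suffix
  number=${value%[kKmMgG]}          # strip one trailing suffix letter, if any
  suffix=${value#"$number"}
  if [[ ! $number =~ ^[0-9]+$ ]]; then
    echo "unsupported --bwlimit value: $value" >&2
    return 1
  fi
  case $suffix in
    ''|k|K) echo $(( number * 1024 )) ;;               # no suffix means KiB/s
    m|M)    echo $(( number * 1024 * 1024 )) ;;
    g|G)    echo $(( number * 1024 * 1024 * 1024 )) ;;
  esac
}

# Seconds needed to move SIZE_BYTES at the given --bwlimit value (rounded up).
transfer_seconds() {
  local size_bytes=$1 rate
  rate=$(bwlimit_to_bytes "$2")
  echo $(( (size_bytes + rate - 1) / rate ))
}

size=$(( 50 * 1024 * 1024 * 1024 ))   # a 50 GiB dataset
secs=$(transfer_seconds "$size" 10m)
printf '50 GiB at --bwlimit=10m: %d s (about %d min)\n' "$secs" $(( secs / 60 ))
```

At 10m (10,485,760 bytes/s), 50 GiB works out to 5120 seconds, a little under an hour and a half, before any protocol overhead.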
Incremental Backups with --link-dest
--link-dest creates space-efficient incremental backups using hard links. Unchanged files are hard-linked to the previous backup, so they consume no additional data blocks (only directory entries):
```bash
# Today's backup links unchanged files to yesterday's
rsync -av --link-dest=/backups/2025-01-14 \
  /data/ /backups/2025-01-15/
```
Each backup directory appears to be a full backup (you can ls it and see every file), but unchanged files share inodes. A changed file is stored again in full, not as a delta, so a 100 GB dataset in which about 1 GB worth of files changes per day costs roughly 1 GB per daily backup.
A relative --link-dest path is evaluated relative to the destination directory, not the source or the current directory. This is a common source of confusion.
--backup and --backup-dir
When rsync overwrites or deletes a file, --backup preserves the old version:
```bash
# Save old versions in a backup directory
rsync -av --backup --backup-dir=/backups/$(date +%F) \
  --delete source/ dest/
```
Combined with --delete, this gives you a recycle bin: deleted files end up in the backup directory instead of being lost.
Compression (-z)
Useful for compressible data over slow links. Wasteful for already-compressed data (images, videos, .gz, .zip). SSH does not compress by default (OpenSSH's Compression option defaults to no), but on a fast LAN the network is rarely the bottleneck, so -z mostly adds CPU overhead with no benefit. As of rsync 3.2.x, you can use --compress-choice=zstd for better compression performance, provided both sides were built with zstd support.
Trailing Slash Semantics
This is the single most confusing aspect of rsync and the cause of the most common mistakes:
```bash
# WITH trailing slash: sync contents of source into dest
rsync -av source/ dest/
# Result: dest/file1, dest/file2

# WITHOUT trailing slash: sync source directory itself into dest
rsync -av source dest/
# Result: dest/source/file1, dest/source/file2
```
The trailing slash on the source means "the contents of this directory." No trailing slash means "this directory and its contents." For directory syncs, the trailing slash on the destination does not matter (but is good practice for clarity).
Think of source/ as source/* in shell glob terms (except that rsync also copies dotfiles, which * skips), and source as the directory itself.
Mnemonic: "Slash means Stuff, no slash means the Sack." A trailing slash says "the stuff inside." No trailing slash says "the whole sack (directory) and its contents." Getting this wrong is the most common rsync mistake in production scripts.
Gotcha: The trailing slash rule applies only to the source; for directory syncs, the destination's trailing slash has no effect on behavior. However, always include it for readability -- rsync -av src/ dest/ makes the intent clear.
Production Examples
Local Directory Sync
```bash
# Mirror /app/current to /app/staging
rsync -av --delete /app/current/ /app/staging/

# Same, but without --delete: files that exist only on staging
# (such as extra configs) are preserved
rsync -av /app/current/ /app/staging/
```
Remote Sync Over SSH
```bash
# Push local build to production server
rsync -avz --delete \
  -e 'ssh -i ~/.ssh/deploy_key -p 22' \
  --exclude='.git/' \
  --exclude='node_modules/' \
  --exclude='.env' \
  ./dist/ deploy@prod.example.com:/var/www/app/

# Pull remote logs to local for analysis
rsync -avz --partial \
  ops@loghost.example.com:/var/log/app/ \
  ./logs/
```
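Incremental Backup with Hard Links
```bash
#!/bin/bash
# Daily incremental backup using --link-dest
set -e
shopt -s nullglob

BACKUP_ROOT=/backups/daily
DATE=$(date +%F)
DEST="${BACKUP_ROOT}/${DATE}"

# Newest earlier backup: YYYY-MM-DD names sort chronologically; skip today's
LINK_OPT=()
for dir in "${BACKUP_ROOT}"/20*/; do
  dir=${dir%/}
  if [[ "$dir" != "$DEST" ]]; then
    LINK_OPT=(--link-dest="$dir")
  fi
done

rsync -av --delete \
  "${LINK_OPT[@]}" \
  --exclude='.cache/' \
  --exclude='*.tmp' \
  /data/ "${DEST}/"

# Prune backups older than 30 days by name, not mtime (rsync -a copies
# /data's mtime onto each backup directory). Assumes GNU date.
CUTOFF=$(date -d '30 days ago' +%F)
for dir in "${BACKUP_ROOT}"/20*/; do
  if [[ "$(basename "$dir")" < "$CUTOFF" ]]; then
    rm -rf -- "$dir"
  fi
done
```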
Mirror with --delete
```bash
# Full mirror: destination will be an exact copy of source
# (minus .git/, which --delete-excluded also removes from /mirror/)
# ALWAYS dry-run first
rsync -avn --delete --delete-excluded \
  --exclude='.git/' \
  /source/ /mirror/

# If the dry-run looks correct, execute
rsync -av --delete --delete-excluded \
  --exclude='.git/' \
  /source/ /mirror/
```
Bandwidth-Limited Transfer
```bash
# Transfer a large dataset overnight without saturating the link.
# No -z: the .tar.gz is already compressed. -P = --partial --progress.
rsync -avP --bwlimit=50m \
  /exports/dataset-2025.tar.gz \
  analyst@remote.example.com:/imports/
```
Quick Reference
| Task | Command |
|---|---|
| Basic local sync | rsync -av source/ dest/ |
| Remote push | rsync -avz source/ user@host:/dest/ |
| Remote pull | rsync -avz user@host:/source/ dest/ |
| Mirror with delete | rsync -av --delete source/ dest/ |
| Dry run | rsync -avn --delete source/ dest/ |
| Resume large file | rsync -avP source/ dest/ |
| Limit bandwidth | rsync -av --bwlimit=10m source/ dest/ |
| Incremental backup | rsync -av --link-dest=../prev source/ dest/ |
| Exclude patterns | rsync -av --exclude='*.log' source/ dest/ |
| Exclude from file | rsync -av --exclude-from=excl.txt source/ dest/ |
| Custom SSH port | rsync -av -e 'ssh -p 2222' source/ user@host:/dest/ |
| Checksum compare | rsync -avc source/ dest/ |
| Itemized changes | rsync -av --itemize-changes source/ dest/ |
| Compress transfer | rsync -avz source/ user@host:/dest/ |
| Preserve hard links | rsync -avH source/ dest/ |
| Preserve ACLs + xattrs | rsync -avAX source/ dest/ |
| Backup replaced files | rsync -av --backup --backup-dir=/bak source/ dest/ |
Key Flags Cheat Sheet
| Flag | Long form | Meaning |
|---|---|---|
| -a | --archive | -rlptgoD (the standard set) |
| -v | --verbose | List files as they transfer |
| -z | --compress | Compress data during transfer |
| -P | --partial --progress | Keep partial files + show progress |
| -n | --dry-run | Show what would happen |
| -c | --checksum | Compare by checksum, not mtime+size |
| -e | --rsh | Specify remote shell command |
| -H | --hard-links | Preserve hard links |
| -A | --acls | Preserve ACLs |
| -X | --xattrs | Preserve extended attributes |
| -S | --sparse | Handle sparse files efficiently |
| -u | --update | Skip files that are newer on receiver |
| -R | --relative | Use relative path names |
| | --delete | Delete extraneous files from dest |
| | --exclude | Exclude files matching pattern |
| | --include | Include files matching pattern |
| | --link-dest | Hard-link to files in DIR when unchanged |
| | --bwlimit | Limit socket I/O bandwidth |
| | --backup | Make backups of replaced files |
| | --backup-dir | Directory to put backups in |
| | --partial-dir | Put partially transferred files in DIR |
| | --itemize-changes | Output a change-summary for all updates |
Wiki Navigation
Prerequisites
- Linux Ops (Topic Pack, L0)
Related Content
- /proc Filesystem (Topic Pack, L2) — Linux Fundamentals
- Advanced Bash for Ops (Topic Pack, L1) — Linux Fundamentals
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Linux Fundamentals
- Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Linux Fundamentals
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption (Case Study, L2) — Linux Fundamentals
- Case Study: Container Vuln Scanner False Positive Blocks Deploy (Case Study, L2) — Linux Fundamentals
- Case Study: Disk Full Root Services Down (Case Study, L1) — Linux Fundamentals
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Linux Fundamentals
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP (Case Study, L2) — Linux Fundamentals
- Case Study: Inode Exhaustion (Case Study, L1) — Linux Fundamentals