
Portal | Level: L1: Foundations | Topics: rsync, Linux Fundamentals | Domain: Linux

rsync - Primer

Why This Matters

Who made it: rsync was created by Andrew Tridgell and Paul Mackerras in 1996. Tridgell is also the creator of Samba (SMB/CIFS for Linux). The rsync algorithm -- splitting files into blocks and computing rolling checksums to find differences -- was the subject of Tridgell's PhD thesis at the Australian National University.

rsync is the workhorse of file synchronization in production Linux environments. Whether you are migrating terabytes between datacenters, maintaining incremental backups that go back 90 days, deploying a static site, or keeping a fleet of servers in sync, rsync is the tool you reach for. It has been battle-tested since 1996 and is packaged for virtually every Unix-like system. It skips files that are already up to date and, over a network, sends only the changed parts of the rest, which makes repeated syncs dramatically faster than naive copy tools. Mastering rsync means understanding trailing-slash semantics, delete behavior, exclusion patterns, and resumable transfers -- the details that separate a clean sync from a data-loss incident.

Core Concepts

The Delta-Transfer Algorithm

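rsync's defining feature is its delta-transfer algorithm. Instead of copying entire files, the receiver splits its existing copy into fixed-size blocks and sends a cheap rolling checksum plus a stronger checksum for each block. The sender then searches its own version for those blocks and transmits only the data that does not match. For a 2 GB database dump where 50 MB changed in a few contiguous regions, rsync sends roughly 50 MB plus a small checksum overhead instead of 2 GB.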

The algorithm works in three phases:

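  1. File list generation -- the sender builds a list of source files with their metadata (size, mtime, permissions), and the receiver compares it against the destination to decide which files need updating.
  2. Checksum matching -- for each file that needs updating, the receiver splits its existing copy into blocks, computes a weak rolling checksum and a strong checksum for each block, and sends them to the sender.
  3. Delta transmission -- the sender slides the rolling checksum across its own file, looking for blocks the receiver already has. It sends references for matching blocks and literal bytes for everything else, and the receiver rebuilds the file from its old copy plus that data.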

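Files that do not yet exist on the destination have no basis copy to compare against, so they are sent whole. rsync also skips the delta algorithm by default when both paths are local (--whole-file is implied), because reading both copies costs about as much as copying. The checksumming overhead pays off when the network is the bottleneck, files are large, and changes are small relative to file size.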

Archive Mode (-a)

The -a flag is shorthand for -rlptgoD:

Flag Meaning
-r Recurse into directories
-l Copy symlinks as symlinks
-p Preserve permissions
-t Preserve modification times
-g Preserve group ownership
-o Preserve owner (requires root)
-D Preserve device files and special files

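Archive mode is the correct default for almost all server-to-server syncs. Without it, rsync does not preserve modification times, ownership, or exact permissions. That breaks applications that depend on file metadata. It also defeats the size-and-mtime quick check, so later runs re-examine every file.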

What -a does not include:

  • -H (hard links) -- must add explicitly if your source uses hard links
  • -A (ACLs) -- add if your filesystem uses POSIX ACLs
  • -X (extended attributes) -- add for SELinux contexts or xattrs
  • -S (sparse files) -- add for disk images or sparse databases

Visibility: -v, -P, and --progress

# Verbose: list each file as it transfers
rsync -av source/ dest/

# Progress per file (bytes transferred, percentage, speed)
rsync -avP source/ dest/

# -P is shorthand for --partial --progress
# So you also get resumable transfers for free

# Itemized changes: show exactly what changed per file
rsync -a --itemize-changes source/ dest/

The itemize-changes output uses a compact notation:

>f.st...... file.txt

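The first character is the update type: > means the file is received, c means a local change such as a new directory, and . means only attributes may change. The second character is the file type (f file, d directory, L symlink). The remaining positions flag changed attributes: c checksum, s size, t modification time, p permissions, o owner, g group, and so on. A + in every attribute slot marks a newly created item. This is invaluable for auditing what rsync actually did.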

--delete and Its Variants

--delete makes the destination a mirror of the source: files that exist on the destination but not the source are removed. Without it, rsync only adds and updates -- it never removes.

The main timing variants are:

Flag | When files are deleted | Use case
--delete-before | Before the transfer starts | Frees disk space first; needs an extra scan, slow for large trees
--delete-during | As each directory is processed | Default when --delete is used; good balance
--delete-delay | Found during the transfer, removed after it | Deletions wait until the end without a second scan
--delete-after | After all transfers complete | Destination keeps old and new files until the sync finishes

Critical safety rule: always run --delete with --dry-run first. There is no undo.

Gotcha: The classic --delete disaster is a source that is unexpectedly empty, such as a mount point whose filesystem failed to mount. rsync faithfully mirrors the empty source and removes everything in the destination. Syncing from a deliberately created empty directory (rsync -a --delete empty/ /var/www/) is a legitimate way to clear a directory, as is rm -rf /var/www/*, but only when that is what you intend. Always dry-run --delete operations.

# Preview what would be deleted
rsync -avn --delete source/ dest/

# Then execute
rsync -av --delete source/ dest/

Exclude and Include Patterns

rsync evaluates --exclude and --include rules in the order they appear on the command line. The first matching rule wins.

# Exclude logs and temp files
rsync -av --exclude='*.log' --exclude='/tmp/' source/ dest/

# Include only top-level .conf files; '*' also excludes subdirectories, so rsync never descends
rsync -av --include='*.conf' --exclude='*' source/ dest/

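# Complex pattern: include .py files at any depth but exclude __pycache__.
# The __pycache__ exclude must precede the '*/' include, because the first match wins.
# Add -m (--prune-empty-dirs) to skip directories that contain no .py files.
rsync -av \
  --exclude='__pycache__/' \
  --include='*/' \
  --include='*.py' \
  --exclude='*' \
  source/ dest/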

Key pattern rules:

  • Patterns starting with / are anchored to the transfer root.
  • Patterns ending with / match only directories.
  • * matches any run of characters within a single path component (it never matches /).
  • ** matches anything, including slashes.
  • Rules from a file: --exclude-from=exclude-list.txt (one pattern per line).

The order trap: if you put --exclude='*' before --include='*.conf', the exclude catches everything first. Include rules that should override excludes must come first.

Remote Shell: -e for SSH

rsync's default transport is SSH. The -e flag lets you customize the SSH command:

# Default: host:path syntax already uses ssh, so no -e is needed
rsync -av source/ user@host:/dest/

# Custom SSH port
rsync -av -e 'ssh -p 2222' source/ user@host:/dest/

# Specific identity file
rsync -av -e 'ssh -i ~/.ssh/deploy_key' source/ user@host:/dest/

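# Disable strict host key checking (CI pipelines). Insecure: this accepts any host key;
# prefer a pre-populated known_hosts file, or StrictHostKeyChecking=accept-new (OpenSSH 7.6+)
rsync -av -e 'ssh -o StrictHostKeyChecking=no' source/ user@host:/dest/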

# SSH with compression (alternative to rsync -z)
rsync -av -e 'ssh -C' source/ user@host:/dest/

Dry Run (-n / --dry-run)

The most important safety flag. Shows what rsync would do without actually doing it:

rsync -avn --delete source/ dest/

Always use dry-run before any destructive operation (--delete, first-time large syncs, production deploys). The output is nearly identical to a real run -- same file list, same itemized changes -- but nothing is modified on disk.

Checksum vs Timestamp Comparison

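By default, rsync decides whether to transfer a file with a quick check of modification time and size. If both match, the file is skipped. This is fast but imperfect:

  • A file edited in place without changing its size, then given its old mtime back (for example by touch -r or a tool that preserves timestamps), is silently skipped.
  • NFS may report timestamps that differ from the local copy for the same content, causing needless transfers.
  • FAT/exFAT filesystems have 2-second timestamp granularity; --modify-window=1 tolerates the mismatch.
  • touch and build systems can give identical content new timestamps, which also triggers needless transfers.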

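--checksum (-c) makes rsync compare whole-file checksums instead. rsync 3.0 and 3.1 use MD5; rsync 3.2+ negotiates a faster hash such as xxHash when both sides support it: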

# Slow but accurate: compare by content, not timestamp
rsync -avc source/ dest/

Trade-off: --checksum reads every file on both sides to compute checksums, even if timestamps already match. For large datasets, this can be dramatically slower. Use it when timestamps are unreliable, not as a default.

Partial Transfers: --partial and --partial-dir

When a transfer is interrupted (network drop, killed process), rsync normally deletes the partially transferred file and starts over next time. This is painful for large files over slow links.

# Keep partial files for resumption
rsync -avP source/ dest/
# -P = --partial --progress

# Keep partials in a hidden directory (cleaner)
rsync -av --partial-dir=.rsync-partial source/ dest/

With --partial, the incomplete file stays at the destination. On the next network transfer, rsync uses it as the basis for the delta algorithm, so data that already arrived does not have to be sent again. With --partial-dir, partials are stored in a separate directory and moved into place only when complete -- safer because you never see incomplete files in the destination tree.

Bandwidth Limiting (--bwlimit)

Prevent rsync from saturating your network link:

# Limit to 10 MiB/s
rsync -av --bwlimit=10m source/ dest/

# Limit to 5000 KiB/s
rsync -av --bwlimit=5000 source/ dest/

The default unit is KiB/s. Append m for MiB/s. This is essential for production transfers over shared links. Without it, a large rsync can starve other traffic and trigger alerts.

Incremental Backups: --link-dest

--link-dest creates space-efficient incremental backups using hard links. Files unchanged since the previous backup are hard-linked to it, so they consume no additional space for file data (only a directory entry each):

# Today's backup links unchanged files to yesterday's
rsync -av --link-dest=/backups/2025-01-14 \
  /data/ /backups/2025-01-15/

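Each backup directory appears to be a full backup (you can ls it and see every file), but unchanged files share inodes. A 100 GB dataset that changes 1 GB per day costs roughly 1 GB per daily backup, as long as those changes are confined to whole files. --link-dest works per file, so a 1-byte change to a 10 GB file stores a complete new 10 GB copy.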

A relative --link-dest path is evaluated relative to the destination directory, not the source or the current working directory. This is a common source of confusion; absolute paths avoid it.

--backup and --backup-dir

When rsync overwrites or deletes a file, --backup preserves the old version:

# Save old versions in a backup directory
rsync -av --backup --backup-dir=/backups/$(date +%F) \
  --delete source/ dest/

Combined with --delete, this gives you a recycle bin: deleted files end up in the backup directory instead of being lost.

Compression (-z)

# Compress during transfer
rsync -avz source/ user@host:/dest/

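Useful for compressible data over slow links. Wasteful for already-compressed data (images, videos, .gz, .zip), although by default rsync skips compression for common compressed file suffixes (see --skip-compress). OpenSSH does not compress the channel by default, but on a fast LAN -z mostly adds CPU overhead, and it should not be combined with ssh -C. As of rsync 3.2.x, you can use --compress-choice=zstd for better compression performance if both ends support it.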

Trailing Slash Semantics

This is the single most confusing aspect of rsync and the cause of the most common mistakes:

# WITH trailing slash: sync contents of source into dest
rsync -av source/ dest/
# Result: dest/file1, dest/file2

# WITHOUT trailing slash: sync source directory itself into dest
rsync -av source dest/
# Result: dest/source/file1, dest/source/file2

The trailing slash on the source means "the contents of this directory." No trailing slash means "this directory and its contents." For a directory source, the trailing slash on the destination does not change the result (but is good practice for clarity).

Think of source/ as source/* in shell glob terms (except that rsync also copies dotfiles, which * skips), and source as the directory itself.

Remember: Mnemonic: "Slash means Stuff, no slash means the Sack." A trailing slash says "the stuff inside." No trailing slash says "the whole sack (directory) and its contents." This is the single most common rsync mistake in production scripts.

Gotcha: The trailing slash rule applies only to the source. When the source is a directory, a trailing slash on the destination has no effect on behavior. However, always include it for readability -- rsync -av src/ dest/ makes the intent clear.

Production Examples

Local Directory Sync

# Mirror /app/current to /app/staging
rsync -av --delete /app/current/ /app/staging/

# Same, but without --delete: files that exist only on staging (such as extra configs) survive
rsync -av /app/current/ /app/staging/

Remote Sync Over SSH

# Push local build to production server
rsync -avz --delete \
  -e 'ssh -i ~/.ssh/deploy_key -p 22' \
  --exclude='.git/' \
  --exclude='node_modules/' \
  --exclude='.env' \
  ./dist/ deploy@prod.example.com:/var/www/app/

# Pull remote logs to local for analysis
rsync -avz --partial \
  ops@loghost.example.com:/var/log/app/ \
  ./logs/
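
Incremental Backup Script

#!/bin/bash
# Daily incremental backup using --link-dest
set -eo pipefail

BACKUP_ROOT=/backups/daily
DATE=$(date +%F)
DEST="${BACKUP_ROOT}/${DATE}"

# Newest previous backup (YYYY-MM-DD names sort by date);
# skip today's directory in case the script is re-run
LATEST=$(ls -1d "${BACKUP_ROOT}"/20* 2>/dev/null | grep -v "/${DATE}\$" | tail -n 1 || true)

# Array keeps the option intact and expands to nothing on the first run
LINK_OPT=()
if [ -n "$LATEST" ]; then
  LINK_OPT=(--link-dest="$LATEST")
fi

rsync -av --delete \
  "${LINK_OPT[@]}" \
  --exclude='.cache/' \
  --exclude='*.tmp' \
  /data/ "${DEST}/"

# Prune backups older than 30 days by directory name, not -mtime:
# rsync -a copies the mtime of /data onto each backup directory
CUTOFF=$(date -d '30 days ago' +%F)   # GNU date
for dir in "${BACKUP_ROOT}"/20*; do
  [ -d "$dir" ] || continue
  if [[ "$(basename "$dir")" < "$CUTOFF" ]]; then
    rm -rf -- "$dir"
  fi
done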

Mirror with --delete

# Mirror: destination becomes an exact copy of source, except .git/
# (--delete-excluded also removes any .git/ that already exists on the mirror)
# ALWAYS dry-run first
rsync -avn --delete --delete-excluded \
  --exclude='.git/' \
  /source/ /mirror/

# If the dry-run looks correct, execute
rsync -av --delete --delete-excluded \
  --exclude='.git/' \
  /source/ /mirror/

Bandwidth-Limited Transfer

# Transfer a large dataset overnight without saturating the link
# (no -z: the archive is already gzip-compressed)
rsync -av --partial --bwlimit=50m \
  --progress \
  /exports/dataset-2025.tar.gz \
  analyst@remote.example.com:/imports/

Quick Reference

Task Command
Basic local sync rsync -av source/ dest/
Remote push rsync -avz source/ user@host:/dest/
Remote pull rsync -avz user@host:/source/ dest/
Mirror with delete rsync -av --delete source/ dest/
Dry run rsync -avn --delete source/ dest/
Resume large file rsync -avP source/ dest/
Limit bandwidth rsync -av --bwlimit=10m source/ dest/
Incremental backup rsync -av --link-dest=../prev source/ dest/
Exclude patterns rsync -av --exclude='*.log' source/ dest/
Exclude from file rsync -av --exclude-from=excl.txt source/ dest/
Custom SSH port rsync -av -e 'ssh -p 2222' source/ user@host:/dest/
Checksum compare rsync -avc source/ dest/
Itemized changes rsync -av --itemize-changes source/ dest/
Compress transfer rsync -avz source/ user@host:/dest/
Preserve hard links rsync -avH source/ dest/
Preserve ACLs + xattrs rsync -avAX source/ dest/
Backup replaced files rsync -av --backup --backup-dir=/bak source/ dest/

Key Flags Cheat Sheet

Flag Long form Meaning
-a --archive -rlptgoD (the standard set)
-v --verbose List files as they transfer
-z --compress Compress data during transfer
-P --partial --progress Keep partial files + show progress
-n --dry-run Show what would happen
-c --checksum Compare by checksum, not mtime+size
-e --rsh Specify remote shell command
-H --hard-links Preserve hard links
-A --acls Preserve ACLs
-X --xattrs Preserve extended attributes
-S --sparse Handle sparse files efficiently
-u --update Skip files that are newer on receiver
-R --relative Use relative path names
--delete Delete extraneous files from dest
--exclude Exclude files matching pattern
--include Include files matching pattern
--link-dest Hard-link to files in DIR when unchanged
--bwlimit Limit socket I/O bandwidth
--backup Make backups of replaced files
--backup-dir Directory to put backups in
--partial-dir Put partially transferred files in DIR
--itemize-changes Output a change-summary for all updates
