rsync - Street Ops¶
What experienced operators know about rsync that the man page buries in 4000 lines of options.
Quick Diagnosis Commands¶
Checking What Would Change (Dry Run)¶
# Preview all changes, including deletions
rsync -avn --delete --itemize-changes source/ dest/
# Same, but only show files that differ
rsync -avn --delete --itemize-changes source/ dest/ | grep -v '^\.'
# Count how many files would transfer (plain -avn output includes a
# header and summary lines, so count itemized file lines instead)
rsync -avn --itemize-changes source/ dest/ | grep -c '^>f'
# Show only files that would be deleted
rsync -avn --delete source/ dest/ | grep '^deleting'
The --itemize-changes output is your best friend for understanding exactly what rsync plans to do:
>f..t...... config.yml # file, timestamp changed
>f.s....... app.jar # file, size changed
>f..tp..... deploy.sh # file, timestamp + permissions changed
*deleting old-release/ # directory will be removed
cd+++++++++ new-feature/ # new directory being created
>f+++++++++ new-feature/app # new file being created
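Those single-letter codes can be filtered mechanically. A small sketch -- the sample lines mirror the output above; in practice you would pipe rsync's output straight into grep:

```shell
# Filter a saved --itemize-changes log by change type
log=$(mktemp)
cat > "$log" <<'EOF'
>f..t...... config.yml
>f.s....... app.jar
*deleting old-release/
cd+++++++++ new-feature/
>f+++++++++ new-feature/app
EOF
grep '^>f' "$log"            # files that would transfer
grep '^\*deleting' "$log"    # pending deletions
grep '^cd+' "$log"           # new directories
rm -f "$log"
```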
Verifying Sync Completeness¶
# After sync, verify source and dest match
rsync -avnc source/ dest/
# -c forces checksum comparison
# If output is empty, they match
# Compare file counts
find source/ -type f | wc -l
find dest/ -type f | wc -l
# Compare total sizes
du -sh source/ dest/
# Deep verification: generate checksums on both sides
# (run find from inside each tree so the recorded paths match)
(cd source/ && find . -type f -exec md5sum {} + | sort) > /tmp/src.md5
(cd dest/ && md5sum -c /tmp/src.md5)
Monitoring Transfer Speed¶
# Real-time progress with overall stats
rsync -av --progress --stats source/ dest/
# Human-readable summary at the end
rsync -av --stats source/ dest/ 2>&1 | tail -20
# Monitor bandwidth usage during transfer (separate terminal)
watch -n1 'grep eth0 /proc/net/dev'   # substitute your interface name
# Use pv for pipeline monitoring
tar cf - source/ | pv | ssh user@host 'tar xf - -C /dest/'
# (Alternative to rsync when you want visual throughput)
Common Scenarios¶
Server Migration¶
Moving an application from old server to new server with minimal downtime:
# Phase 1: Initial bulk sync (while app is still running on old server)
# -H keeps hard links, -A/-X keep ACLs and xattrs, --numeric-ids avoids
# UID/GID remapping -- all of these matter for a full-system copy
rsync -aHAXzP --delete --numeric-ids \
--exclude='/dev/' \
--exclude='/proc/' \
--exclude='/sys/' \
--exclude='/run/' \
--exclude='/var/run/' \
--exclude='/tmp/' \
--exclude='*.pid' \
--exclude='*.sock' \
-e 'ssh -i ~/.ssh/migration_key' \
/ newserver:/
# Phase 2: Stop the application, do a final delta sync
# (This is fast because phase 1 already transferred most data)
systemctl stop myapp
rsync -aHAXz --delete --numeric-ids \
--exclude='/dev/' \
--exclude='/proc/' \
--exclude='/sys/' \
--exclude='/run/' \
--exclude='/var/run/' \
--exclude='/tmp/' \
-e 'ssh -i ~/.ssh/migration_key' \
/ newserver:/
# Phase 3: Verify critical paths
rsync -avnc /etc/ newserver:/etc/
rsync -avnc /var/lib/myapp/ newserver:/var/lib/myapp/
# Phase 4: Cut over DNS / load balancer to new server
For application-level migration (not full OS):
# Sync the application directory
rsync -avz --delete \
--exclude='.env' \
--exclude='logs/' \
--exclude='*.pid' \
--exclude='node_modules/' \
/opt/myapp/ newserver:/opt/myapp/
# Sync the data directory separately (might be large)
rsync -avzP --bwlimit=100m \
/data/myapp/ newserver:/data/myapp/
Incremental Backup Rotation (Daily/Weekly/Monthly with --link-dest)¶
A production backup scheme that maintains daily, weekly, and monthly snapshots using hard links for space efficiency:
#!/bin/bash
# incremental-backup.sh -- production backup with rotation
set -euo pipefail
SOURCE="/data"
BACKUP_ROOT="/backups"
DATE=$(date +%F)
DAY_OF_WEEK=$(date +%u) # 1=Monday, 7=Sunday
DAY_OF_MONTH=$(date +%d)
DAILY_DIR="${BACKUP_ROOT}/daily/${DATE}"
mkdir -p "${BACKUP_ROOT}"/{daily,weekly,monthly}
LATEST_DAILY=$(ls -1d "${BACKUP_ROOT}"/daily/20* 2>/dev/null | tail -1 || true)
# --- Daily backup with --link-dest ---
LINK_OPT=""
if [ -n "$LATEST_DAILY" ] && [ "$LATEST_DAILY" != "$DAILY_DIR" ]; then
LINK_OPT="--link-dest=${LATEST_DAILY}"
fi
rsync -a --delete \
$LINK_OPT \
--exclude-from=/etc/backup-excludes.txt \
"${SOURCE}/" "${DAILY_DIR}/"
echo "$(date -Iseconds) daily backup complete: ${DAILY_DIR}" >> /var/log/backup.log
# --- Weekly snapshot (Sunday) ---
if [ "$DAY_OF_WEEK" -eq 7 ]; then
WEEKLY_DIR="${BACKUP_ROOT}/weekly/$(date +%G-W%V)"
cp -al "${DAILY_DIR}" "${WEEKLY_DIR}"
echo "$(date -Iseconds) weekly snapshot: ${WEEKLY_DIR}" >> /var/log/backup.log
fi
# --- Monthly snapshot (1st of month) ---
if [ "$DAY_OF_MONTH" -eq "01" ]; then
MONTHLY_DIR="${BACKUP_ROOT}/monthly/$(date +%Y-%m)"
cp -al "${DAILY_DIR}" "${MONTHLY_DIR}"
echo "$(date -Iseconds) monthly snapshot: ${MONTHLY_DIR}" >> /var/log/backup.log
fi
# --- Prune old backups ---
find "${BACKUP_ROOT}/daily/" -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
find "${BACKUP_ROOT}/weekly/" -maxdepth 1 -type d -mtime +90 -exec rm -rf {} +
find "${BACKUP_ROOT}/monthly/" -maxdepth 1 -type d -mtime +365 -exec rm -rf {} +
The cp -al command creates a hard-linked copy without duplicating any file data: only new directory entries are written, so even multi-terabyte trees snapshot in seconds. Each weekly/monthly snapshot is a complete directory tree that shares inodes with the daily backup.
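A quick way to see the sharing in action -- a sketch using throwaway paths under mktemp (stat -c is GNU coreutils syntax):

```shell
# Demonstrate that cp -al snapshots share inodes
tmp=$(mktemp -d)
mkdir "$tmp/daily"
echo "payload" > "$tmp/daily/file.txt"
cp -al "$tmp/daily" "$tmp/weekly"    # hard-linked snapshot
stat -c '%h %i' "$tmp/daily/file.txt" "$tmp/weekly/file.txt"
# link count is 2 and the inode numbers match -- same data blocks on disk
rm -rf "$tmp"
```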
Syncing Config Across a Fleet¶
#!/bin/bash
# sync-config.sh -- push config to all servers
set -euo pipefail
CONFIG_DIR="/etc/myapp"
SERVERS_FILE="/etc/fleet-servers.txt" # one hostname per line
FAILED=""
while IFS= read -r server; do
echo "--- Syncing to ${server} ---"
if rsync -avz --delete \
--exclude='local-overrides.conf' \
--timeout=30 \
-e 'ssh -o ConnectTimeout=10 -o BatchMode=yes' \
"${CONFIG_DIR}/" "${server}:${CONFIG_DIR}/"; then
echo "OK: ${server}"
else
echo "FAIL: ${server}"
FAILED="${FAILED} ${server}"
fi
done < "$SERVERS_FILE"
if [ -n "$FAILED" ]; then
echo "FAILED SERVERS:${FAILED}" >&2
exit 1
fi
For larger fleets (50+ servers), parallelize with GNU parallel or xargs:
# Parallel sync to fleet (8 at a time)
xargs -P8 -I{} rsync -avz --delete \
--timeout=30 \
-e 'ssh -o ConnectTimeout=10 -o BatchMode=yes' \
/etc/myapp/ {}:/etc/myapp/ \
< /etc/fleet-servers.txt
Deploying Static Sites¶
#!/bin/bash
# deploy-static.sh -- deploy a built static site
set -euo pipefail
BUILD_DIR="./build"
DEPLOY_HOST="web@cdn-origin.example.com"
DEPLOY_PATH="/var/www/site"
# Verify build exists
if [ ! -d "$BUILD_DIR" ] || [ ! -f "${BUILD_DIR}/index.html" ]; then
echo "ERROR: Build directory missing or incomplete" >&2
exit 1
fi
# Dry run first
echo "=== Dry run ==="
rsync -avn --delete \
--exclude='.htaccess' \
--exclude='uploads/' \
"${BUILD_DIR}/" "${DEPLOY_HOST}:${DEPLOY_PATH}/"
read -rp "Deploy? [y/N] " confirm
if [ "$confirm" != "y" ]; then
echo "Aborted."
exit 0
fi
# Deploy with delete, keeping uploads and .htaccess
rsync -avz --delete \
--exclude='.htaccess' \
--exclude='uploads/' \
"${BUILD_DIR}/" "${DEPLOY_HOST}:${DEPLOY_PATH}/"
echo "Deploy complete. Invalidate CDN cache if needed."
Replicating Large Datasets¶
# Initial transfer of a large dataset (terabytes)
# Use --partial-dir so incomplete files don't pollute destination
# Use --bwlimit to avoid saturating the link
rsync -av \
--partial-dir=.rsync-tmp \
--bwlimit=200m \
--progress \
--stats \
--timeout=600 \
/exports/dataset-2025/ remote:/imports/dataset-2025/
# If interrupted, just re-run the same command -- it resumes
# Check completion after:
rsync -avnc /exports/dataset-2025/ remote:/imports/dataset-2025/
For extremely large datasets, consider splitting the sync:
# Sync in directory chunks to make progress visible
for dir in /exports/dataset-2025/*/; do
dirname=$(basename "$dir")
echo "=== Syncing ${dirname} ==="
rsync -av --partial-dir=.rsync-tmp \
--bwlimit=200m \
"${dir}" "remote:/imports/dataset-2025/${dirname}/"
done
Disaster Recovery Data Movement¶
#!/bin/bash
# dr-sync.sh -- sync critical data to DR site
set -euo pipefail
DR_HOST="dr-backup@dr-site.example.com"
LOCKFILE="/var/run/dr-sync.lock"
LOGFILE="/var/log/dr-sync.log"
# Ensure single instance
exec 200>"$LOCKFILE"
if ! flock -n 200; then
echo "$(date -Iseconds) DR sync already running, skipping" >> "$LOGFILE"
exit 0
fi
log() { echo "$(date -Iseconds) $*" >> "$LOGFILE"; }
log "START dr-sync"
# Critical databases (small, must be consistent)
log "syncing database dumps"
rsync -az --delete \
--timeout=120 \
/backups/db-dumps/ "${DR_HOST}:/dr/db-dumps/"
# Application data (large, can tolerate eventual consistency)
log "syncing application data"
rsync -az --delete \
--partial-dir=.rsync-partial \
--bwlimit=100m \
--timeout=600 \
/data/app/ "${DR_HOST}:/dr/app-data/"
# Config and secrets (small, critical)
log "syncing config"
rsync -az --delete \
/etc/myapp/ "${DR_HOST}:/dr/config/"
log "END dr-sync"
Syncing to/from S3-Compatible Storage (rclone Comparison)¶
rsync does not natively speak S3. For S3-compatible storage, use rclone, which provides rsync-like semantics:
# rsync to local/remote filesystem
rsync -av --delete /data/ remote:/data/
# Equivalent with rclone to S3
rclone sync /data/ s3remote:my-bucket/data/ --progress
# rclone from S3 to local
rclone sync s3remote:my-bucket/data/ /data/ --progress
# rclone with bandwidth limit (same concept as rsync --bwlimit)
rclone sync /data/ s3remote:my-bucket/data/ --bwlimit 50M
# rclone dry run (same concept as rsync -n)
rclone sync /data/ s3remote:my-bucket/data/ --dry-run
Key differences from rsync:
| Feature | rsync | rclone |
|---|---|---|
| Delta transfer | Yes (block-level) | No (whole-file only) |
| S3/GCS/Azure support | No | Yes |
| SSH transport | Built-in | Via SFTP backend |
| Checksum comparison | MD5 | Backend-specific (S3: ETag/MD5) |
| Permissions/ownership | Full POSIX | N/A for object storage |
| Bandwidth limiting | --bwlimit | --bwlimit |
Use rsync for filesystem-to-filesystem. Use rclone for anything involving object storage.
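rclone also has an analogue of the post-sync verification pass (rsync -avnc): rclone check compares source and destination without transferring. A sketch, assuming the same s3remote remote configured as in the examples above:

```shell
# Verify a completed rclone sync (nothing is transferred)
rclone check /data/ s3remote:my-bucket/data/ --one-way
# --one-way reports only files missing or differing on the destination
```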
Operational Patterns¶
rsync in Cron with Locking (flock)¶
Never run rsync from cron without file locking. If a sync takes longer than the cron interval, you get overlapping processes fighting over the same destination:
# /etc/cron.d/rsync-backup
# Run every hour, skip if previous run is still going
# (cron entries must be a single line -- no backslash continuations)
0 * * * * root flock -n /var/run/rsync-backup.lock rsync -a --delete /data/ /backups/hourly/ >> /var/log/rsync-backup.log 2>&1
With a wrapper script for better control:
#!/bin/bash
# /usr/local/bin/rsync-cron.sh
set -euo pipefail
LOCKFILE="/var/run/rsync-${1:-default}.lock"
LOGFILE="/var/log/rsync-${1:-default}.log"
MAX_RUNTIME=3600 # kill after 1 hour
exec 200>"$LOCKFILE"
if ! flock -n 200; then
echo "$(date -Iseconds) SKIP: previous run still active" >> "$LOGFILE"
exit 0
fi
# Timeout protection (|| captures the exit code -- without it, set -e
# would abort the script before we could log the failure)
EXIT_CODE=0
timeout "$MAX_RUNTIME" rsync -a --delete \
--exclude-from=/etc/rsync-excludes.txt \
"$2" "$3" \
>> "$LOGFILE" 2>&1 || EXIT_CODE=$?
if [ $EXIT_CODE -eq 124 ]; then
echo "$(date -Iseconds) TIMEOUT: sync killed after ${MAX_RUNTIME}s" >> "$LOGFILE"
elif [ $EXIT_CODE -ne 0 ]; then
echo "$(date -Iseconds) ERROR: rsync exited with code ${EXIT_CODE}" >> "$LOGFILE"
else
echo "$(date -Iseconds) OK: sync complete" >> "$LOGFILE"
fi
rsync Daemon Mode¶
For high-frequency syncs or when SSH overhead is undesirable, rsync can run as a daemon listening on port 873:
# /etc/rsyncd.conf
uid = nobody
gid = nogroup
use chroot = yes
max connections = 10
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
[data]
path = /data/shared
comment = Shared data
read only = no
auth users = syncuser
secrets file = /etc/rsyncd.secrets
hosts allow = 10.0.0.0/8
[backups]
path = /backups
comment = Backup target
read only = no
auth users = backupuser
secrets file = /etc/rsyncd.secrets
hosts allow = 10.0.1.0/24
# Start daemon
rsync --daemon --config=/etc/rsyncd.conf
# Client connects using double-colon syntax (daemon mode, not SSH)
rsync -av /data/ syncuser@backupserver::data/
# Or with rsync:// URL
rsync -av /data/ rsync://syncuser@backupserver/data/
# The secrets file format: username:password
echo "syncuser:s3cretP4ss" > /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets
Daemon mode is faster than SSH (no encryption overhead), but data travels in cleartext. Use it only on trusted networks. For untrusted networks, tunnel through SSH:
# rsync daemon over SSH tunnel (a high local port avoids needing root
# to bind 873 locally)
ssh -N -L 8730:localhost:873 backupserver &
rsync -av /data/ rsync://localhost:8730/data/
rsync Over SSH with Key Auth¶
Production rsync over SSH requires proper key setup with restrictions:
# On the destination server, restrict the deploy key in authorized_keys:
# ~/.ssh/authorized_keys
command="rsync --server --daemon .",no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-ed25519 AAAA... deploy@source
# Or restrict to specific rsync receive commands:
command="rsync --server -vlogDtpre.iLsfxCIvu . /var/www/app/",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAA... deploy@ci
For a more flexible approach, use rrsync (restricted rsync), which ships with rsync:
# authorized_keys with rrsync -- limits to a specific directory
command="/usr/bin/rrsync /var/www/app",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAA... deploy@ci
# The client syncs normally -- rrsync enforces the path restriction
rsync -avz ./dist/ deploy@prod:/
# rrsync rewrites this to /var/www/app/ on the server side
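rrsync also accepts a -ro flag for read-only endpoints -- useful when a key should only ever pull, never push:

```shell
# authorized_keys: this key may read /var/www/app but never write to it
command="/usr/bin/rrsync -ro /var/www/app",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAA... deploy@ci
```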
Wrapper Scripts with Logging and Error Handling¶
A production-grade rsync wrapper:
#!/bin/bash
# /usr/local/bin/managed-rsync.sh
# Production rsync wrapper with logging, alerting, and error handling
set -euo pipefail
# --- Configuration ---
SCRIPT_NAME=$(basename "$0")
LOG_DIR="/var/log/rsync"
ALERT_EMAIL="ops@example.com"
ALERT_ON_FAILURE=true
MAX_RETRIES=3
RETRY_DELAY=60
# --- Argument parsing ---
SOURCE="${1:?Usage: $SCRIPT_NAME SOURCE DEST [LABEL]}"
DEST="${2:?Usage: $SCRIPT_NAME SOURCE DEST [LABEL]}"
LABEL="${3:-$(echo "$SOURCE" | tr '/' '-' | sed 's/^-//')}"
LOGFILE="${LOG_DIR}/${LABEL}.log"
LOCKFILE="/var/run/rsync-${LABEL}.lock"
mkdir -p "$LOG_DIR"
log() {
echo "$(date -Iseconds) [$LABEL] $*" | tee -a "$LOGFILE"
}
alert() {
if [ "$ALERT_ON_FAILURE" = true ]; then
echo "$*" | mail -s "rsync FAILED: ${LABEL}" "$ALERT_EMAIL" 2>/dev/null || true
fi
}
# --- Locking ---
exec 200>"$LOCKFILE"
if ! flock -n 200; then
log "SKIP: previous sync still running"
exit 0
fi
# --- Transfer with retries ---
log "START: ${SOURCE} -> ${DEST}"
START_TIME=$(date +%s)
for attempt in $(seq 1 "$MAX_RETRIES"); do
log "Attempt ${attempt}/${MAX_RETRIES}"
RC=0
rsync -az --delete \
--partial-dir=.rsync-partial \
--timeout=300 \
--stats \
--log-file="${LOGFILE}" \
--exclude-from=/etc/rsync-excludes.txt \
"${SOURCE}" "${DEST}" || RC=$?
# (|| RC=$? captures rsync's real status; checking $? after the old
# "if rsync; then ... fi" always saw 0, the if-statement's status)
if [ "$RC" -eq 0 ]; then
END_TIME=$(date +%s)
DURATION=$(( END_TIME - START_TIME ))
log "OK: completed in ${DURATION}s"
exit 0
fi
log "WARN: attempt ${attempt} failed with exit code ${RC}"
if [ "$attempt" -lt "$MAX_RETRIES" ]; then
log "Retrying in ${RETRY_DELAY}s..."
sleep "$RETRY_DELAY"
fi
done
END_TIME=$(date +%s)
DURATION=$(( END_TIME - START_TIME ))
log "FAIL: all ${MAX_RETRIES} attempts failed after ${DURATION}s"
alert "rsync ${LABEL} failed after ${MAX_RETRIES} attempts (${DURATION}s). Check ${LOGFILE}"
exit 1
rsync Exit Codes Reference¶
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | None |
| 1 | Syntax or usage error | Fix command |
| 2 | Protocol incompatibility | Version mismatch between client/server |
| 3 | Errors selecting I/O files | Check permissions |
| 5 | Error starting client-server protocol | Daemon config issue |
| 10 | Error in socket I/O | Network issue, retry |
| 11 | Error in file I/O | Disk full, permissions |
| 12 | Error in rsync protocol data stream | Network corruption, retry |
| 13 | Errors with program diagnostics | Check rsync version |
| 14 | Error in IPC code | Internal error |
| 20 | Received SIGUSR1 or SIGINT | Killed by user/timeout |
| 21 | waitpid() error | System issue |
| 22 | Error allocating memory | OOM, reduce file count |
| 23 | Partial transfer (some files not transferred) | Check per-file errors in log |
| 24 | Partial transfer (vanished source files) | Source changed during sync |
| 25 | --max-delete limit reached | Increase limit or investigate |
| 30 | Timeout in data send/receive | Network issue, increase --timeout |
| 35 | Timeout waiting for daemon connection | Daemon down or firewall |
Exit code 24 is common in production (files being actively written to during sync) and is usually safe to ignore. Exit code 23 requires investigation -- some files failed to transfer.
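In wrapper scripts it is common to normalize the status so that 24 never pages anyone. A sketch (the function name is ours):

```shell
# Downgrade exit code 24 (vanished source files) to success; pass every
# other status through unchanged
normalize_rsync_rc() {
    rc=$1
    if [ "$rc" -eq 24 ]; then
        echo "warning: source files vanished during sync" >&2
        rc=0
    fi
    return "$rc"
}
# Usage: rsync -a src/ dst/; normalize_rsync_rc $? || exit $?
```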