backup-restore --- Portal | Level: L1: Foundations | Topics: Backup & Restore | Domain: Security

Backup & Restore Primer

Why This Matters

Backups are your last line of defense against data loss — ransomware, human error, hardware failure, or bad deployments. But a backup you have never restored is just a hope. The discipline is not in taking backups; it is in testing restores, meeting recovery targets, and knowing exactly what is and is not protected.

Core Concepts

The 3-2-1 Rule

The foundational backup strategy:

3 copies of your data (1 primary + 2 backups)
2 different storage media/types (e.g., disk + object storage)
1 offsite copy (different region, different provider, or physical location)

Modern variant (3-2-1-1-0): add 1 air-gapped/immutable copy and 0 errors in restore testing.
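To make the rule checkable rather than aspirational, here is a minimal POSIX shell sketch that audits a copy inventory against the 3-2-1-1 part of the rule. The inventory format (one line per copy: name, media type, onsite/offsite, mutable/immutable) and the `check_321` function are hypothetical. The final 0 cannot come from an inventory at all; it only comes from actually testing restores.

```shell
# check_321: audit a hypothetical copy inventory read from stdin, one copy per line:
#   <name> <media> <onsite|offsite> <mutable|immutable>
# Prints the counts; returns 0 only if there are at least 3 copies on at least
# 2 media types, with at least 1 offsite and at least 1 immutable copy.
check_321() {
  awk '
    NF == 0 || $1 ~ /^#/ { next }          # skip blank lines and comments
    { copies++; media[$2] = 1 }            # count copies and distinct media
    $3 == "offsite"   { offsite++ }
    $4 == "immutable" { immutable++ }
    END {
      n_media = 0
      for (m in media) n_media++
      printf "copies=%d media=%d offsite=%d immutable=%d\n", copies, n_media, offsite, immutable
      ok = copies >= 3 && n_media >= 2 && offsite >= 1 && immutable >= 1
      if (ok) exit 0
      exit 1
    }'
}

# Example: primary disk, a NAS (also disk), and a locked object-storage bucket.
printf '%s\n' \
  'primary disk   onsite  mutable' \
  'nas     disk   onsite  mutable' \
  's3      object offsite immutable' | check_321
# prints: copies=3 media=2 offsite=1 immutable=1
```

Counting distinct media rather than copies is what catches the common case of three copies that all live on the same kind of storage.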

RPO and RTO

Metric	Definition	Example
RPO (Recovery Point Objective)	Maximum acceptable data loss	RPO = 1 hour means you can lose at most 1 hour of data
RTO (Recovery Time Objective)	Maximum acceptable downtime	RTO = 4 hours means service must be back within 4 hours

RPO determines backup frequency. RTO determines restore speed and automation level. Both are business decisions, not technical ones.

Remember: "RPO = how much data you can lose. RTO = how long you can be down." Mnemonic: RPO has a P for Point (point in time you recover to). RTO has a T for Time (time to recover). An RPO of 0 requires synchronous replication. An RTO of 0 requires active-active architecture. Both are expensive — define what the business actually needs.
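Because a backup captures data as of the moment it starts, the worst case is a failure just before the next backup finishes: you fall back to the previous recovery point and lose one full interval plus one backup duration. The sketch below applies that rule to check a schedule against an RPO; `worst_case_loss` and `meets_rpo` are hypothetical helper names, and all values are whole minutes.

```shell
# worst_case_loss INTERVAL_MIN DURATION_MIN
# Assumes a backup starts every INTERVAL minutes, captures data as of its
# start, and takes DURATION minutes to complete.
worst_case_loss() {
  echo $(( $1 + $2 ))
}

# meets_rpo INTERVAL_MIN DURATION_MIN RPO_MIN
# Prints a verdict; returns 0 if the worst-case loss fits inside the RPO.
meets_rpo() {
  local loss
  loss=$(worst_case_loss "$1" "$2")
  if [ "$loss" -le "$3" ]; then
    echo "OK: worst-case loss ${loss}m <= RPO ${3}m"
  else
    echo "GAP: worst-case loss ${loss}m > RPO ${3}m"
    return 1
  fi
}

# meets_rpo 1440 30 60   prints "GAP: worst-case loss 1470m > RPO 60m"
# meets_rpo 30 10 60     prints "OK: worst-case loss 40m <= RPO 60m"
```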

Gotcha: The most dangerous backup assumption: "we have backups" without ever testing a restore. At least 30% of backup restores fail due to corruption, missing dependencies, or changed schemas. Test restores quarterly at minimum. For databases, test that the restored data is actually queryable, not just that the file was copied.

Backup Types

Type	What It Captures	Backup Speed	Storage per Backup
Full	Everything	Slow	Large
Incremental	Changes since last backup (any type)	Fast	Small
Differential	Changes since last full backup	Medium	Medium
Snapshot	Point-in-time filesystem/volume state	Very fast	Varies
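The speed column describes taking backups; restores trade the other way. An incremental chain needs the last full plus every incremental since, while a differential needs only the last full plus the newest differential. This shell sketch, built around a hypothetical `restore_chain` helper, computes which backups a restore to the newest point requires.

```shell
# restore_chain TYPE...   (backup types, oldest first: full | diff | incr)
# Prints the 1-based positions of the backups needed to restore the newest
# point: the last full, the newest differential after it (if any), then
# every incremental taken after that. Other type names are ignored.
restore_chain() {
  printf '%s\n' "$@" | awk '
    { type[NR] = $1 }
    $1 == "full" { full = NR; diff = 0 }     # a new full starts a new chain
    $1 == "diff" { diff = NR }               # the newest differential wins
    END {
      if (!full) { print "no full backup in chain" > "/dev/stderr"; exit 1 }
      chain = full; start = full
      if (diff) { chain = chain " " diff; start = diff }
      for (i = start + 1; i <= NR; i++)
        if (type[i] == "incr") chain = chain " " i
      print chain
    }'
}

# restore_chain full incr incr incr   prints "1 2 3 4"
# restore_chain full diff diff diff   prints "1 4"
```

A long incremental chain also means one corrupted link breaks every restore point after it, which is one reason schedules mix in periodic fulls or differentials.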

Backup Tools

Borg Backup

Name origin: BorgBackup is named after the Borg from Star Trek — the cybernetic collective that assimilates everything. Fitting for a deduplicating backup tool that absorbs data efficiently. It was forked from Attic in 2015. The key innovation is content-defined chunking with a rolling hash (Buzhash), which means small changes to a large file only result in a few new chunks being stored, not a full copy.

Deduplicating, compressed, encrypted backup tool. Excellent for server-side backups:

# Initialize a repository
borg init --encryption=repokey /backup/repo

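# Create a backup named after today's date, e.g. daily-2024-01-15
# (a live /var/lib/postgresql is only consistent if the database is stopped
# or snapshotted first; see Database Backups)
borg create /backup/repo::daily-{now:%Y-%m-%d} /etc /var/lib/postgresql /home \
    --exclude '*.tmp' --exclude '/home/*/.cache'

# List archives
borg list /backup/repo

# Restore a specific archive (borg extract writes into the current directory)
mkdir -p /restore && cd /restore
borg extract /backup/repo::daily-2024-01-15

# Prune old backups (keep 7 daily, 4 weekly, 6 monthly)
borg prune /backup/repo --keep-daily=7 --keep-weekly=4 --keep-monthly=6

# Borg 1.2+: disk space is only freed by running compact after prune
borg compact /backup/repo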

Key features: deduplication of content-defined chunks, compression (lz4, zstd, and others), and authenticated encryption.

Restic

Similar to Borg (deduplicating, encrypted), but with built-in support for cloud and remote backends (S3, GCS, Azure Blob Storage, SFTP, and others):

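# Initialize a repo on S3 (AWS credentials come from AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY; the repo password from RESTIC_PASSWORD or a prompt)
restic init --repo s3:s3.amazonaws.com/my-backup-bucket

# Backup (a live /var/lib/postgresql is only consistent if the database is
# stopped or snapshotted first; see Database Backups)
restic backup /etc /var/lib/postgresql \
    --repo s3:s3.amazonaws.com/my-backup-bucket \
    --exclude-caches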

# List snapshots
restic snapshots --repo s3:s3.amazonaws.com/my-backup-bucket

# Restore
restic restore latest --target /restore/ \
    --repo s3:s3.amazonaws.com/my-backup-bucket

# Forget and prune
restic forget --keep-daily 7 --keep-weekly 4 --prune \
    --repo s3:s3.amazonaws.com/my-backup-bucket

Velero (Kubernetes)

Name origin: Velero is Spanish for "sailboat", a nod to Heptio's nautical branding (Heptio was the company founded by Kubernetes co-creators Joe Beda and Craig McLuckie). VMware acquired Heptio in 2018. Velero was originally called "Ark" but was renamed to avoid trademark conflicts. It backs up both Kubernetes resource definitions (YAML) and persistent volume data: resource manifests go to object storage (S3, GCS, Azure Blob), and volume data is captured with provider snapshots or file-system backup.

Backup and restore for Kubernetes cluster resources and persistent volumes:

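# Install Velero with the AWS provider (the AWS plugin ships as a separate
# image; pin the plugin version that matches your Velero release)
velero install --provider aws --plugins velero/velero-plugin-for-aws:<version> \
    --bucket my-velero-bucket \
    --secret-file ./credentials --backup-location-config region=us-east-1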

# Create a backup of a namespace
velero backup create staging-backup --include-namespaces staging

# Create a scheduled backup
velero schedule create daily-backup --schedule="0 2 * * *" \
    --include-namespaces production --ttl 720h

# Restore from backup
velero restore create --from-backup staging-backup

# Restore to a different namespace
velero restore create --from-backup staging-backup \
    --namespace-mappings staging:staging-restored

# List backups
velero backup get

Snapshot Strategies

Snapshots (LVM/Cloud)

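# LVM snapshot: crash-consistent, point-in-time view of the volume
# (quiesce the database first for application consistency; --size is the
# copy-on-write space, and the snapshot becomes invalid if it fills up)
lvcreate --size 10G --snapshot --name db-snap /dev/vg0/db-data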

# AWS EBS snapshot
aws ec2 create-snapshot --volume-id vol-abc123 --description "db-daily"

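Snapshots alone are not backups: an LVM snapshot lives on the same disks as its origin volume, and a cloud snapshot stays in the same provider account and region. Copy snapshots to another region or account, or back up the snapshot contents elsewhere.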

Database Backups

Gotcha: Copying database files while the database is running (e.g., cp /var/lib/postgresql/ while PostgreSQL is active) produces a corrupted backup. The database has in-flight transactions, dirty buffers, and WAL entries that are not on disk yet. You need either: (1) a logical dump (pg_dump) that reads consistent data through the database engine, (2) a filesystem snapshot taken while the database is quiesced (FLUSH TABLES WITH READ LOCK for MySQL), or (3) continuous WAL archiving for point-in-time recovery. Never cp a live database directory.

Databases need application-consistent backups, not just file copies:

# PostgreSQL logical backup
pg_dump -Fc mydb > mydb_$(date +%Y%m%d).dump

# PostgreSQL restore
pg_restore -d mydb mydb_20240115.dump

# MySQL
mysqldump --single-transaction --all-databases > full_$(date +%Y%m%d).sql

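# Point-in-time recovery (PostgreSQL WAL archiving), set in postgresql.conf.
# PITR replays archived WAL on top of a base backup (e.g. from pg_basebackup).
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'   # never overwrite an archived segment
# Used at recovery time (PostgreSQL 12+: set here and create recovery.signal)
restore_command = 'cp /backup/wal/%f %p'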

Ransomware-Resilient Backup Design

Ransomware encrypts your data and demands payment. Your backups are the primary defense — but only if the attacker cannot reach them.

Immutable Backups

Storage that cannot be modified or deleted for a defined retention period:

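# AWS S3 Object Lock (compliance mode: no user, not even the root account, can
# delete or overwrite a locked object version until its retention expires).
# Requires a versioned bucket with Object Lock enabled, e.g. one created with:
#   aws s3api create-bucket --bucket my-backup-bucket --object-lock-enabled-for-bucket
aws s3api put-object-lock-configuration \
    --bucket my-backup-bucket \
    --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":30}}}'

# Restic writing to the Object Lock bucket above
restic backup /data --repo s3:s3.amazonaws.com/my-backup-bucket
# Each uploaded object version is locked for 30 days: a delete only adds a
# delete marker, so the locked version stays recoverable

# Borg with append-only repository (remote server restricts to append)
# In ~/.ssh/authorized_keys on backup server:
# command="borg serve --append-only --restrict-to-path /backup/repo",restrict ssh-rsa AAAA...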

Air-Gapped Copies

A copy the attacker cannot reach, either physically disconnected from the network or isolated behind a separate credential chain:

Tape: LTO tape stored offsite (still used in enterprise)
Offline disk: USB drive rotated weekly, stored in a safe
Cloud with separate credentials: Different provider, different account, different auth chain
Pull-based backup: Backup server pulls data from production (production can't write to backup)

Retention That Survives Encryption

Ransomware may sit dormant for weeks before activating. Your retention must outlast the dwell time:

Keep at least 30 days of daily backups
Keep at least 3 months of weekly backups
Test restoration from oldest available backup, not just latest
Monitor backup sizes — sudden size changes may indicate encrypted data being backed up
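One way to automate the size check is to compare each run's size with the recent average. Encrypted data neither compresses nor deduplicates, so a ransomware-encrypted dataset tends to appear as a sudden jump in new data, while a mass deletion appears as a drop. The `size_anomaly` helper and its input (one size per line, oldest first) are assumptions; feed it whatever per-run size your backup tool reports and tune the threshold to your normal churn.

```shell
# size_anomaly THRESHOLD_PCT < sizes
# Reads one backup size per line on stdin, oldest first. Compares the newest
# size with the average of all earlier runs and returns 1 with a warning if
# it deviates (up or down) by more than THRESHOLD_PCT percent.
size_anomaly() {
  awk -v pct="$1" '
    NF { size[++n] = $1 }
    END {
      if (n < 2) { print "not enough history"; exit 0 }
      for (i = 1; i < n; i++) sum += size[i]
      avg = sum / (n - 1)
      if (avg == 0) { print "ANOMALY: earlier runs report zero bytes"; exit 1 }
      change = (size[n] - avg) * 100 / avg      # percent change vs average
      dev = change < 0 ? -change : change
      if (dev > pct + 0) {
        printf "ANOMALY: latest %.0f vs average %.0f (%+.1f%%)\n", size[n], avg, change
        exit 1
      }
      printf "ok: latest %.0f vs average %.0f (%+.1f%%)\n", size[n], avg, change
    }'
}
```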

The Ransomware Backup Checklist

Backups are immutable (S3 Object Lock, append-only, WORM storage)
At least one copy is air-gapped or on a separate credential chain
Retention covers 30+ days (beyond typical ransomware dwell time)
Backup credentials are separate from production credentials
Restore tested monthly from a backup older than 7 days
Backup integrity monitoring alerts on anomalies (size, duration, error rate)
Backup network is segmented from production network
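For the "older than 7 days" restore test, something has to choose which archive to restore. A sketch, assuming archive names end in a `YYYY-MM-DD` date and that GNU `date` is available; `pick_restore_candidate` is a hypothetical helper.

```shell
# pick_restore_candidate MIN_AGE_DAYS TODAY < archive-names
# Reads archive names (assumed to end in -YYYY-MM-DD) one per line on stdin and
# prints the newest one at least MIN_AGE_DAYS older than TODAY (YYYY-MM-DD).
# Requires GNU date for -d.
pick_restore_candidate() {
  local cutoff
  cutoff=$(date -u -d "$2 $1 days ago" +%Y-%m-%d) || return 2
  # ISO dates sort lexically, so string comparison orders them by date.
  awk -v cutoff="$cutoff" '
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/) {
      d = substr($0, RSTART, RLENGTH)
      if (d <= cutoff && d > best) { best = d; name = $0 }
    }
    END { if (name == "") exit 1; print name }'
}
```

For example, `borg list --short /backup/repo | pick_restore_candidate 7 "$(date -u +%F)"` would select the newest archive that is at least a week old, provided the archive names follow the assumed pattern.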

Restore Testing

The most important backup practice. Automate a monthly restore to a temp location, validate critical files exist, and check data integrity. Track results. A backup that fails restore is not a backup.
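A minimal sketch of the integrity half of a restore test, assuming GNU coreutils: record a SHA-256 manifest of the source at backup time, then re-hash the restored tree against it. `make_manifest` and `verify_restore` are hypothetical names. This proves the files came back byte-for-byte; for databases, still load the restore and run real queries against it.

```shell
# make_manifest DIR: print a SHA-256 line for every file under DIR, with paths
# relative to DIR, in a stable order. Store the output outside DIR.
make_manifest() {
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}

# verify_restore MANIFEST DIR: re-hash the restored tree under DIR; fails if
# any listed file is missing or its content differs from the manifest.
verify_restore() {
  [ -s "$1" ] || { echo "missing or empty manifest: $1" >&2; return 1; }
  (cd "$2" && sha256sum --quiet -c -) < "$1"
}

# At backup time:   make_manifest /etc > /backup/manifests/etc.sha256
# In the drill:     verify_restore /backup/manifests/etc.sha256 /restore/etc
```

The manifest redirect is resolved before the subshell changes directory, so a relative manifest path still works.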

Common Pitfalls

No restore testing: The backup works until you need it — then you discover it does not
Backing up the container, not the data: Containers are ephemeral; back up persistent volumes
Ignoring RPO/RTO: Daily backups with a 1-hour RPO requirement is a gap, not a strategy
Single-region backups: Provider outage takes your data and your backups
No encryption: Backup media is a theft target — encrypt at rest and in transit
No monitoring: Backup jobs fail silently; alert on missed or failed backups
Snapshot-only strategy: Snapshots in the same provider are not offsite copies
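For the "No monitoring" pitfall, a freshness check alerts when the newest backup is older than the schedule allows, which also catches jobs that stopped running without ever reporting an error. This sketch assumes each successful run leaves a file in a single directory and uses GNU `find -printf`; `check_freshness` and that layout are hypothetical.

```shell
# check_freshness DIR MAX_AGE_HOURS
# Returns 2 if DIR holds no backup files, 1 if the newest is older than
# MAX_AGE_HOURS, and 0 otherwise, printing a one-line status each time.
check_freshness() {
  local newest age_h
  newest=$(find "$1" -maxdepth 1 -type f -printf '%T@\n' | sort -n | tail -n 1)
  if [ -z "$newest" ]; then
    echo "CRITICAL: no backups found in $1"
    return 2
  fi
  # %T@ is the mtime in seconds with a fractional part; drop the fraction.
  age_h=$(( ($(date +%s) - ${newest%.*}) / 3600 ))
  if [ "$age_h" -gt "$2" ]; then
    echo "CRITICAL: newest backup in $1 is ${age_h}h old (limit ${2}h)"
    return 1
  fi
  echo "OK: newest backup in $1 is ${age_h}h old"
}
```

Run it from cron or a monitoring agent with a limit slightly above the backup interval, such as 26 hours for a daily job, so a single late run does not page anyone.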
