Portal | Level: L1: Foundations | Topics: tar & Compression, Linux Fundamentals | Domain: Linux
tar & Compression - Primer¶
Why This Matters¶
Every backup, every deployment artifact, every log archive, every Docker build context, every file transfer between servers involves tar, compression, or both. These are not optional skills. You will use tar and compression tools daily in operations.
tar (tape archive) bundles files into a single stream. Compression tools reduce the size. They are separate concerns that work together: tar handles structure (filenames, permissions, ownership, directory hierarchy), and compression handles size. Understanding this separation explains why the flags work the way they do.
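That separation is visible on the command line: `tar czf` is shorthand for piping tar's output through gzip. A minimal sketch (using throwaway `demo/` and `restore/` directories for illustration):

```shell
# tar handles structure, gzip handles size: two tools, one pipeline.
mkdir -p demo/data && echo "hello" > demo/data/file.txt

# Create: tar writes the archive to stdout (-f -), gzip compresses the stream.
tar cf - -C demo data | gzip > demo.tar.gz

# Extract: gzip decompresses to stdout, tar reads the stream from stdin.
mkdir -p restore
gzip -dc demo.tar.gz | tar xf - -C restore

cat restore/data/file.txt    # -> hello
```

The `-z` flag simply runs this pipeline for you; that is why any compressor with a stdin/stdout interface can be swapped in.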
Name origin:
tar stands for tape archive. It appeared in Unix Version 7 in 1979; its original purpose was writing file trees to magnetic tape drives, hence the name. The format's design (512-byte blocks, sequential access) still reflects its tape heritage. GNU tar descends from pdtar, a public-domain reimplementation by John Gilmore (who later co-founded the EFF); the GNU version added compression integration and long filename support.
tar Fundamentals¶
Core Flags¶
tar has three mutually exclusive modes:
| Flag | Operation | Mnemonic |
|---|---|---|
| `-c` | Create an archive | Create |
| `-x` | Extract from an archive | eXtract |
| `-t` | List contents | lisT |
Common modifiers:
| Flag | Purpose |
|---|---|
| `-f FILE` | Read/write FILE (not stdin/stdout) |
| `-v` | Verbose output |
| `-z` | Filter through gzip |
| `-j` | Filter through bzip2 |
| `-J` | Filter through xz |
| `--zstd` | Filter through zstd |
Creating Archives¶
tar czf backup.tar.gz /var/data/ # gzip (most common)
tar cjf backup.tar.bz2 /var/data/ # bzip2 (smaller, slower)
tar cJf backup.tar.xz /var/data/ # xz (smallest, slowest)
tar --zstd -cf backup.tar.zst /var/data/ # zstd (modern, fast, good ratio)
tar cf backup.tar /var/data/ # uncompressed
# Multiple sources
tar czf backup.tar.gz /var/data/ /etc/nginx/ /home/deploy/.bashrc
Flag order matters for -f: it must be immediately followed by the filename.
Remember: Mnemonic for tar flags: Create, eXtract, lisT (the three modes). Then `-f` for File, `-v` for Verbose, and `-z`/`-j`/`-J` for compression (z=gzip, j=bzip2, capital J=xz). "Create eXtract lisT" = CXT. The classic invocation `tar czf` reads as "create, gzip, file."
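The ordering pitfall is easy to reproduce. A sketch: with dash-style options, whatever follows `-f` in the cluster is taken as the archive name.

```shell
mkdir -p src && echo "x" > src/a.txt

tar -czf good.tar.gz src    # correct: the filename immediately follows -f
tar -cfz src                # wrong order: -f consumes "z" as the archive name
ls z                        # an uncompressed archive literally named "z"
```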
Extracting Archives¶
tar xzf backup.tar.gz # Extract gzip
tar xzf backup.tar.gz -C /var/restore/ # Extract to specific directory
tar xjf backup.tar.bz2 # Extract bzip2
tar xJf backup.tar.xz # Extract xz
# GNU tar auto-detects compression on extraction
tar xf backup.tar.gz # Works regardless of compression method
tar xf backup.tar.xz # Same command for any format
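The auto-detection is easy to verify in a sketch: create with `-z`, extract with no compression flag at all.

```shell
mkdir -p data && echo "v1" > data/app.conf
tar czf backup.tar.gz data      # explicitly gzip-compressed on creation

mkdir -p out
tar xf backup.tar.gz -C out     # no -z needed: tar sniffs the gzip magic bytes
cat out/data/app.conf           # -> v1
```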
Listing Contents¶
tar tzf backup.tar.gz # Quick listing
tar tzvf backup.tar.gz # Detailed (permissions, size, date)
tar tzf backup.tar.gz | grep nginx.conf # Search for a file
Excluding Files¶
tar czf backup.tar.gz --exclude='node_modules' --exclude='*.log' \
--exclude='.git' /var/app/
# Exclude from file
tar czf backup.tar.gz --exclude-from=excludes.txt /var/app/
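A sketch of `--exclude` in action on a throwaway tree; listing the archive afterwards confirms the junk never went in:

```shell
mkdir -p app/node_modules app/src
echo "lib"   > app/node_modules/dep.js
echo "code"  > app/src/main.js
echo "noise" > app/debug.log

tar czf app.tar.gz --exclude='node_modules' --exclude='*.log' app
tar tzf app.tar.gz    # lists app/ and app/src/main.js; no node_modules, no logs
```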
Extracting Specific Files¶
# Extract one file
tar xzf backup.tar.gz var/data/config.yaml
# Extract matching a pattern (GNU tar)
tar xzf backup.tar.gz --wildcards '*.conf'
# Strip leading directory components
tar xzf backup.tar.gz --strip-components=1
# Archive: myapp-v2.1/bin/app, myapp-v2.1/etc/config
# Extracts: bin/app, etc/config
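A worked sketch of the release-archive pattern above, using a local `deploy/` directory as the extraction target:

```shell
mkdir -p myapp-v2.1/bin && echo "binary" > myapp-v2.1/bin/app
tar czf release.tar.gz myapp-v2.1

mkdir -p deploy
tar xzf release.tar.gz -C deploy --strip-components=1
ls deploy/bin/app    # the myapp-v2.1/ prefix was stripped on extraction
```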
Changing Directory with -C¶
# Create with relative paths (avoids absolute path issues)
tar czf /backups/data.tar.gz -C /var/data .
# Extract to a target directory
tar xzf backup.tar.gz -C /var/restore/
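The difference the relative form makes shows up in the listing. A sketch with a local `var-data/` stand-in for `/var/data`:

```shell
mkdir -p var-data/logs && echo "entry" > var-data/logs/app.log

# Relative paths: members are stored as ./logs/app.log, not var-data/logs/app.log
tar czf data.tar.gz -C var-data .
tar tzf data.tar.gz

mkdir -p restore
tar xzf data.tar.gz -C restore
cat restore/logs/app.log    # -> entry
```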
Incremental Archives¶
# Full backup with snapshot file
tar czf full.tar.gz --listed-incremental=/var/backups/snapshot.snar /var/data/
# Subsequent runs create incrementals (only changed files)
tar czf incr-1.tar.gz --listed-incremental=/var/backups/snapshot.snar /var/data/
# Simpler: files newer than a date
tar czf incremental.tar.gz --newer='2026-03-18' /var/data/
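A minimal sketch of the snapshot mechanism on a throwaway directory; the second run captures only what changed:

```shell
mkdir -p data && echo "a" > data/a.txt && echo "b" > data/b.txt

# Run 1: the snapshot file does not exist yet, so this is a full (level 0) backup.
tar czf full.tar.gz --listed-incremental=snapshot.snar data

# Change one file, then run again against the same snapshot.
echo "changed" > data/a.txt
tar czf incr.tar.gz --listed-incremental=snapshot.snar data

tar tzf incr.tar.gz    # the directory entry plus a.txt; b.txt is not re-archived
```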
Compression Tools¶
gzip / gunzip — The Universal Default¶
Fast, reasonable compression, available everywhere.
gzip access.log # Compress (deletes original!)
gzip -k access.log # Compress, keep original
gunzip access.log.gz # Decompress
gzip -9 access.log # Best compression (slower)
gzip -1 access.log # Fastest (larger output)
zcat access.log.gz # View without decompressing
zgrep "ERROR" access.log.gz # Search without decompressing
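A quick round trip showing the read-compressed workflow (a sketch with a tiny fabricated log):

```shell
printf 'INFO start\nERROR disk full\nINFO done\n' > access.log
gzip access.log                # replaces it with access.log.gz

zcat access.log.gz | wc -l     # 3: stream the contents, nothing written to disk
zgrep ERROR access.log.gz      # prints: ERROR disk full
```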
bzip2 / bunzip2 — Better Ratio, Slower¶
Better compression than gzip but significantly slower and more memory-hungry.
bzip2 access.log # Compress (deletes original)
bzip2 -k access.log # Keep original
bunzip2 access.log.bz2 # Decompress
xz / unxz — Best Ratio, Slowest¶
Best compression ratio. Very slow to compress. High memory usage.
xz access.log # Compress (deletes original)
xz -k access.log # Keep original
xz -T 0 access.log # Use all CPU cores
xz --memlimit=512MiB file # Limit memory (important on shared servers)
zstd — Modern, Fast, Excellent¶
The best general-purpose choice. Near-gzip speed with better-than-bzip2 compression. Excellent decompression speed. Native threading.
Who made it: Zstandard (zstd) was created by Yann Collet at Facebook in 2015. Collet also created LZ4 and xxHash. Zstd was designed to replace both gzip (for general use) and snappy (for speed). It is now used by the Linux kernel for compressed firmware, by LLVM for debug info, by Docker for image layers, and by Meta internally for nearly everything. It was standardized as RFC 8478 in 2018.
zstd access.log # Compress (keeps original by default)
zstd --rm access.log # Remove original after compression
zstd -d access.log.zst # Decompress
zstd -19 access.log # High compression (level 1-19)
zstd -T0 access.log # Use all cores
zstd --adapt access.log # Auto-adjust level based on I/O speed
lz4 — Fastest¶
Fastest compression and decompression. Lower ratio. Ideal when speed matters more than size.
zip / unzip — Windows Compatibility¶
Not the best at anything, but universal. Windows users can open zip files natively.
zip -r backup.zip /var/data/ # Create
unzip backup.zip -d /var/restore/ # Extract
unzip -l backup.zip # List contents
Compression Comparison¶
| Tool | Ratio | Compress Speed | Decompress Speed | Memory | Best For |
|---|---|---|---|---|---|
| lz4 | Low | Fastest | Fastest | Low | Real-time, local transfers |
| gzip | Good | Fast | Fast | Low | General purpose, compatibility |
| zstd | Very good | Fast | Very fast | Medium | Modern default, best overall |
| bzip2 | Very good | Slow | Moderate | Medium | Legacy (prefer zstd) |
| xz | Best | Very slow | Moderate | High | Archival, distro packages |
Real numbers on a 1GB log file (approximate):
| Tool | Size | Compress | Decompress |
|---|---|---|---|
| gzip -6 | ~180 MB | ~12s | ~3s |
| bzip2 -6 | ~140 MB | ~45s | ~15s |
| xz -6 | ~110 MB | ~120s | ~5s |
| zstd -3 | ~160 MB | ~3s | ~1s |
| zstd -19 | ~120 MB | ~90s | ~1s |
| lz4 | ~350 MB | ~1s | ~0.5s |
zstd at default level (-3) compresses nearly as well as gzip while being 4x faster.
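The exact numbers vary with hardware and data, but a sketch like this reproduces the size comparison locally (the zstd and lz4 lines assume those packages are installed and are skipped otherwise):

```shell
# Generate ~20 MB of highly compressible sample data (repeated log lines).
yes '192.168.0.1 - GET /index.html 200 "Mozilla/5.0"' | head -n 400000 > sample.log

gzip -6 -k sample.log                                  # keep the original for reuse
command -v zstd >/dev/null && zstd -q -3 sample.log    # skipped if zstd is absent
command -v lz4  >/dev/null && lz4 -q sample.log        # skipped if lz4 is absent

ls -l sample.log*    # compare the output sizes side by side
```

Prefix each compression line with `time` to reproduce the speed comparison as well.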
Parallel Compression¶
Standard compression tools use one core. Parallel versions use all cores.
# pigz — parallel gzip (drop-in replacement)
tar -I pigz -cf backup.tar.gz /var/data/
# pbzip2 — parallel bzip2
tar -I pbzip2 -cf backup.tar.bz2 /var/data/
# xz with threads
tar -I "xz -T0" -cf backup.tar.xz /var/data/
# zstd with threads
tar -I "zstd -T0" -cf backup.tar.zst /var/data/
Install: apt install pigz pbzip2 or yum install pigz pbzip2.
On an 8-core system compressing 10GB: single-core gzip takes ~2min, pigz takes ~20s, zstd -T0 takes ~8s.
Backup Patterns with tar¶
Full Backup with Verification¶
BACKUP="/backups/app-$(date +%Y%m%d).tar.gz"
tar czf "$BACKUP" --exclude='*.tmp' --exclude='cache/*' /var/data/
tar tzf "$BACKUP" > /dev/null && echo "Verified" || echo "CORRUPTED"
sha256sum "$BACKUP" > "${BACKUP}.sha256"
Incremental Backup with Snapshot¶
SNAPSHOT="/var/backups/snapshot.snar"
DATE=$(date +%Y%m%d-%H%M%S)
if [ ! -f "$SNAPSHOT" ]; then
tar czf "/backups/full-${DATE}.tar.gz" \
--listed-incremental="$SNAPSHOT" /var/data/
else
tar czf "/backups/incr-${DATE}.tar.gz" \
--listed-incremental="$SNAPSHOT" /var/data/
fi
# Restore: full first, then each incremental in order
tar xzf full-*.tar.gz -C /var/restore/ --listed-incremental=/dev/null
tar xzf incr-1.tar.gz -C /var/restore/ --listed-incremental=/dev/null
tar xzf incr-2.tar.gz -C /var/restore/ --listed-incremental=/dev/null
Key Takeaways¶
- tar bundles files; compression tools shrink them. They are separate concerns combined with flags or pipes.
- `-c` create, `-x` extract, `-t` list. `-f` names the file. `-z`/`-j`/`-J`/`--zstd` select compression.
- Modern GNU tar auto-detects compression on extraction: `tar xf` works for any format.
- zstd is the modern default: better ratio than gzip at higher speed. Use it unless compatibility requires gzip.
- Parallel compression (pigz, pbzip2, zstd -T0) cuts times by 4-8x on multi-core systems.
- Default trap: `gzip` deletes the original file after compression by default, which catches people off guard; use `gzip -k` to keep it. `zstd` does the opposite and keeps the original by default. When scripting, always be explicit: `gzip -k` or `zstd --rm` to avoid surprises.
- `--strip-components` controls extraction paths; list contents first (`tar tf`) to spot tar bombs that dump files into the current directory.
- `--exclude` patterns keep junk out of archives (node_modules, .git, logs, cache).
- Always verify archives after creation: `tar tf archive.tar.gz > /dev/null`.
- zip for Windows compatibility; tar + zstd or tar + gzip for Linux-to-Linux.
Wiki Navigation¶
Related Content¶
- /proc Filesystem (Topic Pack, L2) — Linux Fundamentals
- Advanced Bash for Ops (Topic Pack, L1) — Linux Fundamentals
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Linux Fundamentals
- Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Linux Fundamentals
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption (Case Study, L2) — Linux Fundamentals
- Case Study: Container Vuln Scanner False Positive Blocks Deploy (Case Study, L2) — Linux Fundamentals
- Case Study: Disk Full Root Services Down (Case Study, L1) — Linux Fundamentals
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Linux Fundamentals
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP (Case Study, L2) — Linux Fundamentals
- Case Study: Inode Exhaustion (Case Study, L1) — Linux Fundamentals