Portal | Level: L2: Operations | Topics: Storage (SAN/NAS/DAS), Filesystems & Storage, RAID | Domain: Datacenter & Hardware
Storage Operations - Primer¶
Why This Matters¶
Storage is where data lives. Everything else — compute, networking, orchestration — exists to serve data. When storage fails, data can be lost or corrupted. When storage is slow, everything is slow. Understanding storage at the operational level — LVM, filesystems, RAID, NFS, iSCSI, and distributed storage — is essential for anyone managing production infrastructure.
Storage Layers¶
Application
↓ writes to path
Filesystem (ext4, XFS, ZFS, BTRFS)
↓ translates to blocks
Volume Manager (LVM, ZFS zpool)
↓ maps logical to physical
Block Device (partition, RAID array, iSCSI LUN, NVMe)
↓ reads/writes sectors
Physical Disk (HDD, SSD, NVMe)
Understanding this stack is critical for debugging. A "slow write" could be at any layer.
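When debugging, it helps to walk this stack top-down for the affected path. A minimal sketch (/data is an example mount point; substitute the path you are investigating):

```shell
# Trace which layers back a given path
df -h /data                               # filesystem usage and backing device
findmnt -T /data                          # mount source, fs type, mount options
lsblk -s "$(findmnt -no SOURCE /data)"    # walk from the backing device down to the physical disks
```

lsblk -s inverts the tree, so an LV shows its VG, PVs, and underlying disks in one view.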
LVM (Logical Volume Manager)¶
Name origin: LVM was first implemented for HP-UX in 1990 and reimplemented for Linux by Heinz Mauelshagen in 1998. The Linux implementation (LVM2) uses the device-mapper kernel framework. The terms PV, VG, and LV come directly from the original HP-UX design — a rare case where enterprise Unix terminology survived unchanged into Linux.
Remember: Mnemonic for the LVM stack: PVG — Physical volumes (raw disks) pour into Volume Groups (the pool), which you carve into Logical volumes (the slices you mount). Think of it like a swimming pool: PVs are the water sources, VGs are the pool, LVs are the lanes.
LVM adds a flexible abstraction between physical disks and filesystems.
Core Concepts¶
Physical Volumes (PVs) → Actual disks or partitions
└─→ Volume Groups (VGs) → Pool of storage from one or more PVs
└─→ Logical Volumes (LVs) → Virtual partitions carved from the VG
└─→ Filesystem → ext4/XFS mounted on the LV
Essential LVM Commands¶
# View current layout
pvs # Physical volumes
vgs # Volume groups
lvs # Logical volumes
lsblk # Block device tree view
# Create from scratch
pvcreate /dev/sdb /dev/sdc # Initialize disks
vgcreate data_vg /dev/sdb /dev/sdc # Create volume group
lvcreate -L 100G -n app_data data_vg # Create 100G logical volume
mkfs.xfs /dev/data_vg/app_data # Format
mount /dev/data_vg/app_data /data # Mount
# Extend a volume (online, no downtime)
lvextend -L +50G /dev/data_vg/app_data # Add 50G to LV
xfs_growfs /data # Grow XFS filesystem
# or for ext4:
resize2fs /dev/data_vg/app_data # Grow ext4 filesystem
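The two-step extend-then-grow sequence can also be collapsed into a single command: lvextend's -r (--resizefs) flag grows the filesystem on the LV in the same operation, picking the right tool for ext4 or XFS.

```shell
# One-step alternative: extend the LV and grow its filesystem together
lvextend -r -L +50G /dev/data_vg/app_data
```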
LVM Snapshots¶
# Create a snapshot (for backups or testing)
lvcreate -s -L 10G -n app_snap /dev/data_vg/app_data
# Mount the snapshot read-only
mount -o ro /dev/data_vg/app_snap /mnt/snapshot
# Restore from snapshot (destructive!)
lvconvert --merge /dev/data_vg/app_snap
# If the origin is mounted, the merge is deferred until it is unmounted and reactivated
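A common real-world use of snapshots is a consistent backup: snapshot, mount read-only, archive, then drop the snapshot. A sketch using the names from the example above (the tar destination is a placeholder; the nouuid option applies when the origin filesystem is XFS, which otherwise refuses to mount a duplicate UUID):

```shell
# Snapshot-based backup: capture a point-in-time view, archive it, drop it
lvcreate -s -L 10G -n app_snap /dev/data_vg/app_data
mount -o ro,nouuid /dev/data_vg/app_snap /mnt/snapshot
tar -czf /backup/app_data.tar.gz -C /mnt/snapshot .
umount /mnt/snapshot
lvremove -f /dev/data_vg/app_snap   # snapshots slow origin writes; don't keep them around
```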
Filesystems¶
Filesystem Comparison¶
| Feature | ext4 | XFS | BTRFS | ZFS |
|---|---|---|---|---|
| Max size | 1 EB | 8 EB | 16 EB | 256 ZB |
| Online shrink | No (offline only) | No | Yes | No |
| Online grow | Yes | Yes | Yes | Yes |
| Snapshots | No (use LVM) | No (use LVM) | Built-in | Built-in |
| Checksums | Metadata only | Metadata only | Data + metadata | Data + metadata |
| Compression | No | No | Yes (zstd, lzo) | Yes (lz4, zstd) |
| Dedup | No | No | Yes (offline) | Yes (inline) |
| Best for | General, small files | Large files, DBs | Flexible storage | Data integrity |
Filesystem Health¶
# Check filesystem usage
df -hT # Usage with filesystem type
df -i # Inode usage (can exhaust before space)
# Check filesystem health
xfs_repair -n /dev/sda1 # XFS dry-run check
e2fsck -n /dev/sda1 # ext4 dry-run check (must be unmounted)
btrfs scrub start /data # BTRFS online integrity check
# Find what's eating disk space
du -xsh /* | sort -rh | head # Top directories
find / -xdev -type f -size +1G # Files over 1 GB
RAID¶
RAID Levels for Ops¶
| Level | Drives | Capacity | Fault Tolerance | Use Case |
|---|---|---|---|---|
| RAID 0 | 2+ | N × size | None | Temp/scratch only |
| RAID 1 | 2 | 1 × size | 1 drive | Boot drives, OS |
| RAID 5 | 3+ | (N-1) × size | 1 drive | General storage |
| RAID 6 | 4+ | (N-2) × size | 2 drives | Large arrays |
| RAID 10 | 4+ | N/2 × size | 1 per mirror pair | Databases, high IOPS |
Gotcha: RAID 5 with large drives (4TB+) is increasingly dangerous. Rebuild times on big drives can exceed 24 hours, and during that window a second drive failure kills the array. With modern drive sizes, RAID 6 or RAID 10 is strongly preferred. The probability of an Unrecoverable Read Error (URE) during a multi-terabyte rebuild is non-trivial — the spec rate of 1 URE per 10^14 bits means roughly one error per 12.5 TB read.
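The arithmetic behind that warning can be sketched: rebuilding a 4-drive RAID 5 of 4 TB disks means reading the three surviving drives in full, and the expected URE count follows from the spec rate. The numbers below are illustrative, not a vendor figure:

```shell
# Expected UREs during a RAID 5 rebuild (4 drives x 4 TB, consumer URE rate)
awk 'BEGIN {
  bits = 3 * 4e12 * 8           # bits read: 3 surviving drives of 4 TB each
  rate = 1e-14                  # spec URE rate: 1 error per 1e14 bits
  expected = bits * rate
  p = 1 - exp(-expected)        # Poisson approximation: P(at least one URE)
  printf "expected UREs: %.2f, P(at least one): %.0f%%\n", expected, p * 100
}'
```

At roughly a 60% chance of hitting at least one URE per rebuild, the array's odds are closer to a coin flip than a safety net.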
Fun fact: RAID was coined by David Patterson, Garth Gibson, and Randy Katz at UC Berkeley in their 1988 paper. The "I" originally stood for "Inexpensive" — the idea was to replace one expensive mainframe disk with many cheap commodity drives. The industry later rebranded it to "Independent" because vendors did not want to call their products "inexpensive."
Hardware RAID (MegaRAID)¶
# Install MegaCLI or storcli
storcli64 /c0 show # Controller overview
storcli64 /c0/v0 show # Virtual drive (RAID array) details
storcli64 /c0/e0/s0 show all # Physical drive details
# Check for degraded arrays
storcli64 /c0/v0 show | grep -i state
# Optimal = healthy, Degraded = disk failed, Offline = critical
# Check rebuild progress (rebuilds are tracked per physical drive)
storcli64 /c0/eall/sall show rebuild
Software RAID (mdadm)¶
# Create RAID 1
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
# Check status
cat /proc/mdstat
mdadm --detail /dev/md0
# Replace a failed disk
mdadm /dev/md0 --fail /dev/sdc
mdadm /dev/md0 --remove /dev/sdc
# Physically replace disk, then:
mdadm /dev/md0 --add /dev/sdd
# Monitor rebuild:
watch cat /proc/mdstat
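One step the sequence above leaves implicit: the array definition should be recorded so it reassembles at boot. A sketch (the config path varies by distro: /etc/mdadm/mdadm.conf on Debian/Ubuntu, /etc/mdadm.conf on RHEL):

```shell
# Persist the array so it assembles automatically at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u    # Debian/Ubuntu; on RHEL use: dracut -f
```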
Network Storage¶
NFS¶
# Server: export a directory
# no_root_squash lets remote root act as root on the export; use only on trusted networks
echo '/data/shared 10.0.1.0/24(rw,sync,no_subtree_check,no_root_squash)' >> /etc/exports
exportfs -ra
# Client: mount
mount -t nfs nfs-server:/data/shared /mnt/shared
# Persistent mount (fstab)
# nfs-server:/data/shared /mnt/shared nfs defaults,hard,timeo=600 0 0
# ('intr' was dropped from the options: it has been ignored since kernel 2.6.25)
# Troubleshooting
showmount -e nfs-server # List exports
nfsstat -c # Client statistics
nfsstat -s # Server statistics
mount | grep nfs # Current NFS mounts
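When a mount hangs, it helps to confirm the server's RPC services are reachable before blaming the export. A sketch (nfs-server is the placeholder hostname used above):

```shell
# Verify the server is answering before debugging exports
rpcinfo -p nfs-server                 # registered RPC services (mountd, nfs, ...)
rpcinfo -t nfs-server nfs 4           # is NFSv4 responding over TCP?
timeout 5 bash -c '</dev/tcp/nfs-server/2049' && echo "port 2049 open"
```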
iSCSI¶
# Discover targets
iscsiadm -m discovery -t sendtargets -p 10.0.1.100
# Login to target
iscsiadm -m node -T iqn.2024-01.com.example:storage -p 10.0.1.100 --login
# Check connected sessions
iscsiadm -m session
# The iSCSI LUN appears as a local block device
lsblk # Look for new sdX device
# Persistent login (survives reboot)
iscsiadm -m node -T iqn.2024-01.com.example:storage -p 10.0.1.100 \
--op update -n node.startup -v automatic
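One gotcha worth guarding against: an iSCSI-backed filesystem listed in fstab must wait for the network, or boot can hang. A sketch of a safer entry (the UUID is a placeholder; prefer UUID or /dev/disk/by-path over raw sdX names, which are not stable across reboots):

```shell
# /etc/fstab entry for an iSCSI-backed filesystem
# UUID=<your-uuid> /mnt/iscsi xfs defaults,_netdev,nofail 0 0
```

_netdev delays the mount until the network is up; nofail keeps a missing LUN from dropping boot into emergency mode.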
Distributed Storage¶
Portworx¶
Portworx provides software-defined storage for Kubernetes:
# Check Portworx cluster status
pxctl status
pxctl cluster list
# Volume operations
pxctl volume list
pxctl volume inspect <vol-id>
pxctl volume create mydata --size 100 --repl 3   # size in GiB, 3-way replication
# Check alerts
pxctl alerts show
MinIO (S3-Compatible Object Storage)¶
# Check MinIO status
mc admin info myminio
# Bucket operations
mc ls myminio/
mc mb myminio/backups
mc cp /data/backup.tar.gz myminio/backups/
# Check disk health
mc admin heal myminio/ --dry-run
SMART Monitoring¶
Debug clue: When df -h shows space available but writes fail with "No space left on device," check df -i for inode exhaustion. Millions of tiny files (e.g., session files, mail queue) can consume all inodes while barely using any disk space. The fix is either cleaning up the files or reformatting with more inodes (mkfs.ext4 -N).
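Once df -i confirms inode exhaustion, the next question is where the files are. A sketch that counts files per directory (pointed at /var here as an example; aim it at the affected filesystem):

```shell
# Count files per directory to find the inode hog
find /var -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head
```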
SMART data predicts disk failures before they happen:
# Check disk health
smartctl -a /dev/sda
# Key attributes to watch
smartctl -A /dev/sda | grep -E 'Reallocated|Pending|Uncorrectable|Wear_Leveling'
# Reallocated_Sector_Ct > 0 → disk is remapping bad sectors
# Current_Pending_Sector > 0 → sectors waiting to be remapped
# Offline_Uncorrectable > 0 → unrecoverable errors found
# Run a self-test
smartctl -t short /dev/sda # ~2 minutes
smartctl -t long /dev/sda # Hours, thorough
# Check test results
smartctl -l selftest /dev/sda
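Manual self-tests are easy to forget; smartd can schedule them and alert on failures. A sketch for /etc/smartd.conf (the mail address is a placeholder; the -s regex fields encode test type/month/day/day-of-week/hour, with S = short and L = long):

```shell
# /etc/smartd.conf sketch: monitor all drives, short self-test nightly
# at 02:00, long test every Saturday at 03:00, mail on trouble
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m ops@example.com
```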
Storage Performance¶
# Quick I/O benchmark
dd if=/dev/zero of=/data/testfile bs=1M count=1024 oflag=direct
dd if=/data/testfile of=/dev/null bs=1M iflag=direct
# Proper benchmark with fio
fio --name=randwrite --ioengine=libaio --direct=1 --bs=4k \
--size=1G --numjobs=4 --rw=randwrite --group_reporting \
--filename=/data/fiotest
# Monitor I/O in real-time
iostat -xz 1 # Extended I/O stats, 1-second interval
iotop -oPa # Per-process I/O (accumulated)
# Check I/O scheduler
cat /sys/block/sda/queue/scheduler
# For SSDs, use 'none' or 'mq-deadline'
# For HDDs, use 'bfq' or 'mq-deadline'
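Changing the scheduler at runtime is a one-liner, but it does not survive a reboot; a udev rule makes it stick. A sketch (the rule file name is arbitrary; this example sets 'none' for non-rotational disks):

```shell
# Switch at runtime (lost on reboot)
echo mq-deadline > /sys/block/sda/queue/scheduler
# Persist with a udev rule, e.g. /etc/udev/rules.d/60-ioscheduler.rules:
# ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
```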
Wiki Navigation¶
Prerequisites¶
- Linux Ops (Topic Pack, L0)
Next Steps¶
- Ceph Storage (Topic Pack, L2)
- Disaster Recovery & Backup Engineering (Topic Pack, L2)
- S3-Compatible Object Storage (Topic Pack, L1)
Related Content¶
- Case Study: NVMe Drive Disappeared (Case Study, L2) — Filesystems & Storage, Storage (SAN/NAS/DAS)
- Deep Dive: RAID and Storage Internals (deep_dive, L2) — RAID, Storage (SAN/NAS/DAS)
- Disk & Storage Ops (Topic Pack, L1) — Filesystems & Storage, RAID
- Case Study: Backup Job Failing — iSCSI Target Unreachable, VLAN Misconfigured (Case Study, L2) — Storage (SAN/NAS/DAS)
- Case Study: Database Replication Lag — Root Cause Is RAID Degradation (Case Study, L2) — RAID
- Case Study: Disk Full Root Services Down (Case Study, L1) — Filesystems & Storage
- Case Study: HBA Firmware Mismatch (Case Study, L2) — Storage (SAN/NAS/DAS)
- Case Study: OS Install Fails RAID Controller (Case Study, L2) — RAID
- Case Study: RAID Degraded Rebuild Latency (Case Study, L2) — RAID
- Case Study: Runaway Logs Fill Disk (Case Study, L1) — Filesystems & Storage
Pages that link here¶
- Anti-Primer: Storage Ops
- Ceph Storage
- Disaster Recovery & Backup Engineering
- Incident Replay: HBA Firmware Mismatch
- Incident Replay: NVMe Drive Disappeared
- Incident Replay: RAID Degraded — Rebuild Latency
- Linux Filesystem Internals
- Master Curriculum: 40 Weeks
- NVMe Drive Disappeared After Reboot
- OS Installation Cannot See Disks
- RAID Degraded Rebuild Latency
- RAID and Storage Internals
- Runbook: Disk Full
- S3-Compatible Object Storage
- Storage Operations