Portal | Level: L2: Operations | Topics: Storage (SAN/NAS/DAS), Filesystems & Storage, RAID | Domain: Datacenter & Hardware

Storage Operations - Primer

Why This Matters

Storage is where data lives. Everything else — compute, networking, orchestration — exists to serve data. When storage fails, data is lost or corrupted. When storage is slow, everything is slow. Understanding storage at the operational level — LVM, filesystems, RAID, NFS, iSCSI, and distributed storage — is essential for anyone managing production infrastructure.

Storage Layers

Application
  ↓ writes to path
Filesystem (ext4, XFS, ZFS, BTRFS)
  ↓ translates to blocks
Volume Manager (LVM, ZFS zpool)
  ↓ maps logical to physical
Block Device (partition, RAID array, iSCSI LUN, NVMe)
  ↓ reads/writes sectors
Physical Disk (HDD, SSD, NVMe)

Understanding this stack is critical for debugging. A "slow write" could be at any layer.
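To see the stack for a concrete path, the util-linux tools can walk it from filesystem down to disk. A minimal sketch (the `trace_storage` helper name is ours, not a standard tool):

```shell
# trace_storage: map a path to its filesystem, backing device, and
# device ancestry. findmnt resolves the mount; lsblk -s prints the
# inverse tree (e.g. LV -> VG/PV -> disk).
trace_storage() {
    findmnt -n -o TARGET,SOURCE,FSTYPE -T "$1"
    src=$(findmnt -n -o SOURCE -T "$1")
    lsblk -s "$src" 2>/dev/null || true   # fails harmlessly on non-block sources
}
trace_storage /var
```

Run it against the path that is "slow" before blaming any single layer.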

LVM (Logical Volume Manager)

Name origin: LVM was first implemented for HP-UX in 1990 and ported to Linux by Heinz Mauelshagen in 1998. The Linux implementation (LVM2) uses the device-mapper kernel framework. The terms PV, VG, and LV come directly from the original HP-UX design — a rare case where enterprise Unix terminology survived unchanged into Linux.

Remember: Mnemonic for the LVM stack: PV → VG → LV. Physical Volumes (raw disks) pour into Volume Groups (the pool), which you carve into Logical Volumes (the slices you mount). Think of it like a swimming pool: PVs are the water sources, the VG is the pool, LVs are the lanes.

LVM adds a flexible abstraction between physical disks and filesystems.

Core Concepts

Physical Volumes (PVs)     → Actual disks or partitions
  └─→ Volume Groups (VGs)  → Pool of storage from one or more PVs
       └─→ Logical Volumes (LVs) → Virtual partitions carved from the VG
            └─→ Filesystem        → ext4/XFS mounted on the LV

Essential LVM Commands

# View current layout
pvs               # Physical volumes
vgs               # Volume groups
lvs               # Logical volumes
lsblk             # Block device tree view

# Create from scratch
pvcreate /dev/sdb /dev/sdc                    # Initialize disks
vgcreate data_vg /dev/sdb /dev/sdc            # Create volume group
lvcreate -L 100G -n app_data data_vg          # Create 100G logical volume
mkfs.xfs /dev/data_vg/app_data                # Format
mount /dev/data_vg/app_data /data             # Mount

# Extend a volume (online, no downtime)
lvextend -L +50G /dev/data_vg/app_data        # Add 50G to LV
xfs_growfs /data                              # Grow XFS filesystem
# or for ext4:
resize2fs /dev/data_vg/app_data               # Grow ext4 filesystem
# or do both steps at once:
lvextend -r -L +50G /dev/data_vg/app_data     # -r grows the filesystem too

LVM Snapshots

# Create a snapshot (for backups or testing)
lvcreate -s -L 10G -n app_snap /dev/data_vg/app_data

# Mount the snapshot read-only
mount -o ro /dev/data_vg/app_snap /mnt/snapshot

# Restore from snapshot (destructive!)
lvconvert --merge /dev/data_vg/app_snap
# Requires unmount and reactivation

Filesystems

Filesystem Comparison

Feature        | ext4                 | XFS              | BTRFS            | ZFS
Max size       | 1 EB                 | 8 EB             | 16 EB            | 256 ZB
Online shrink  | No (offline only)    | No               | Yes              | No
Online grow    | Yes                  | Yes              | Yes              | Yes
Snapshots      | No (use LVM)         | No (use LVM)     | Built-in         | Built-in
Checksums      | Metadata only        | Metadata only    | Data + metadata  | Data + metadata
Compression    | No                   | No               | Yes (zstd, lzo)  | Yes (lz4, zstd)
Dedup          | No                   | No               | Yes (offline)    | Yes (inline)
Best for       | General, small files | Large files, DBs | Flexible storage | Data integrity

Filesystem Health

# Check filesystem usage
df -hT                         # Usage with filesystem type
df -i                          # Inode usage (can exhaust before space)

# Check filesystem health
xfs_repair -n /dev/sda1        # XFS dry-run check
e2fsck -n /dev/sda1            # ext4 dry-run check (must be unmounted)
btrfs scrub start /data        # BTRFS online integrity check

# Find what's eating disk space
du -xsh /* | sort -rh | head   # Top directories
find / -xdev -type f -size +1G # Files over 1 GB

RAID

RAID Levels for Ops

Level   | Drives | Capacity     | Fault Tolerance    | Use Case
RAID 0  | 2+     | N × size     | None               | Temp/scratch only
RAID 1  | 2      | 1 × size     | 1 drive            | Boot drives, OS
RAID 5  | 3+     | (N-1) × size | 1 drive            | General storage
RAID 6  | 4+     | (N-2) × size | 2 drives           | Large arrays
RAID 10 | 4+     | N/2 × size   | 1 per mirror pair  | Databases, high IOPS

Gotcha: RAID 5 with large drives (4TB+) is increasingly dangerous. Rebuild times on big drives can exceed 24 hours, and during that window a second drive failure kills the array. With modern drive sizes, RAID 6 or RAID 10 is strongly preferred. The probability of an Unrecoverable Read Error (URE) during a multi-terabyte rebuild is non-trivial — the spec rate of 1 URE per 10^14 bits means roughly one error per 12.5 TB read.
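The arithmetic above is easy to sketch. A back-of-envelope helper (the function name and the 7×8TB example array are our assumptions; the 1-per-10^14-bits spec rate is from the gotcha above):

```shell
# expected_ures: expected unrecoverable read errors when reading `tb`
# terabytes at a spec rate of 1 URE per 1e14 bits
expected_ures() {
    LC_ALL=C awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 * 8 / 1e14 }'
}
expected_ures 12.5   # 1.00 -> one expected URE per 12.5 TB read, as above
expected_ures 48     # rebuilding a hypothetical 7-drive RAID 5 of 8 TB disks reads ~48 TB
```

With ~3.8 expected UREs across a 48 TB rebuild, a RAID 5 rebuild on consumer-grade drives is statistically likely to hit one, which is exactly why RAID 6 keeps a second parity drive in reserve.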

Fun fact: RAID was coined by David Patterson, Garth Gibson, and Randy Katz at UC Berkeley in their 1988 paper. The "I" originally stood for "Inexpensive" — the idea was to replace one expensive mainframe disk with many cheap commodity drives. The industry later rebranded it to "Independent" because vendors did not want to call their products "inexpensive."

Hardware RAID (MegaRAID)

# Install MegaCLI or storcli
storcli64 /c0 show                    # Controller overview
storcli64 /c0/v0 show                 # Virtual drive (RAID array) details
storcli64 /c0/e0/s0 show all          # Physical drive details

# Check for degraded arrays
storcli64 /c0/v0 show | grep -i state
# Optimal = healthy, Degraded = disk failed, Offline = critical

# Check rebuild progress
storcli64 /c0/v0 show rebuild

Software RAID (mdadm)

# Create RAID 1
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Check status
cat /proc/mdstat
mdadm --detail /dev/md0

# Replace a failed disk
mdadm /dev/md0 --fail /dev/sdc
mdadm /dev/md0 --remove /dev/sdc
# Physically replace disk, then:
mdadm /dev/md0 --add /dev/sdd
# Monitor rebuild:
watch cat /proc/mdstat
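For unattended monitoring, a cron-friendly check of /proc/mdstat is a common pattern. A minimal sketch (the `check_md_degraded` name is ours; the file argument is parameterized so it can be pointed at a sample):

```shell
# check_md_degraded: print any degraded arrays from an mdstat-format file.
# Member health renders as e.g. [UU] (healthy) or [U_] (one member down);
# any '_' inside the brackets means a missing or failed member.
check_md_degraded() {
    grep -E '\[[U_]*_[U_]*\]' "$1" 2>/dev/null
}
check_md_degraded /proc/mdstat || true   # no output -> all arrays healthy
```

Run from cron and mail any output; `mdadm --monitor` offers the same idea as a built-in daemon.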

Network Storage

NFS

# Server: export a directory
echo '/data/shared 10.0.1.0/24(rw,sync,no_subtree_check,no_root_squash)' >> /etc/exports
exportfs -ra

# Client: mount
mount -t nfs nfs-server:/data/shared /mnt/shared

# Persistent mount (fstab); note 'intr' is accepted but ignored on modern kernels
# nfs-server:/data/shared /mnt/shared nfs defaults,hard,timeo=600 0 0

# Troubleshooting
showmount -e nfs-server        # List exports
nfsstat -c                     # Client statistics
nfsstat -s                     # Server statistics
mount | grep nfs               # Current NFS mounts
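A hung NFS mount can hang any process that touches it, including your health check. One defensive pattern is to wrap the probe in a timeout; a sketch (the `nfs_alive` helper name and the 5-second budget are our choices):

```shell
# nfs_alive: probe a mount point without risking a hang.
# stat -f reads filesystem-level metadata, which requires a round trip
# to the NFS server; timeout kills the probe if the server is gone.
nfs_alive() {
    timeout 5 stat -f "$1" > /dev/null 2>&1
}
nfs_alive /mnt/shared && echo "NFS mount responsive" || echo "NFS mount stale or absent"
```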

iSCSI

# Discover targets
iscsiadm -m discovery -t sendtargets -p 10.0.1.100

# Login to target
iscsiadm -m node -T iqn.2024-01.com.example:storage -p 10.0.1.100 --login

# Check connected sessions
iscsiadm -m session

# The iSCSI LUN appears as a local block device
lsblk   # Look for new sdX device

# Persistent login (survives reboot)
iscsiadm -m node -T iqn.2024-01.com.example:storage -p 10.0.1.100 \
    --op update -n node.startup -v automatic

Distributed Storage

Portworx

Portworx provides software-defined storage for Kubernetes:

# Check Portworx cluster status
pxctl status
pxctl cluster list

# Volume operations
pxctl volume list
pxctl volume inspect <vol-id>
pxctl volume create mydata --size 100 --repl 3

# Check alerts
pxctl alerts show

MinIO (S3-Compatible Object Storage)

# Check MinIO status
mc admin info myminio

# Bucket operations
mc ls myminio/
mc mb myminio/backups
mc cp /data/backup.tar.gz myminio/backups/

# Check disk health
mc admin heal myminio/ --dry-run

SMART Monitoring

Debug clue: When df -h shows space available but writes fail with "No space left on device," check df -i for inode exhaustion. Millions of tiny files (e.g., session files, mail queue) can consume all inodes while barely using any disk space. The fix is either cleaning up the files or reformatting with more inodes (mkfs.ext4 -N <inode-count>).
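To find which directory is hoarding the inodes, counting files per subtree works without any special tooling. A sketch (the `top_inode_dirs` helper name is ours; pair it with `df -i`):

```shell
# top_inode_dirs: list subdirectories of $1 by file count, busiest first.
# find enumerates every entry (each one costs an inode); -xdev keeps the
# count on one filesystem.
top_inode_dirs() {
    for d in "$1"/*/; do
        printf '%7d %s\n' "$(find "$d" -xdev 2>/dev/null | wc -l)" "$d"
    done | sort -rn | head
}
top_inode_dirs /var   # e.g. spot a mail queue or session dir with millions of files
```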

SMART data predicts disk failures before they happen:

# Check disk health
smartctl -a /dev/sda

# Key attributes to watch
smartctl -A /dev/sda | grep -E 'Reallocated|Pending|Uncorrectable|Wear_Leveling'
# Reallocated_Sector_Ct > 0  → disk is remapping bad sectors
# Current_Pending_Sector > 0 → sectors waiting to be remapped
# Offline_Uncorrectable > 0  → unrecoverable errors found

# Run a self-test
smartctl -t short /dev/sda     # ~2 minutes
smartctl -t long /dev/sda      # Hours, thorough

# Check test results
smartctl -l selftest /dev/sda

Storage Performance

# Quick I/O benchmark
dd if=/dev/zero of=/data/testfile bs=1M count=1024 oflag=direct
dd if=/data/testfile of=/dev/null bs=1M iflag=direct

# Proper benchmark with fio
fio --name=randwrite --ioengine=libaio --direct=1 --bs=4k \
    --size=1G --numjobs=4 --rw=randwrite --group_reporting \
    --filename=/data/fiotest

# Monitor I/O in real-time
iostat -xz 1                   # Extended I/O stats, 1-second interval
iotop -oPa                     # Per-process I/O (accumulated)

# Check I/O scheduler
cat /sys/block/sda/queue/scheduler
# For SSDs, use 'none' or 'mq-deadline'
# For HDDs, use 'bfq' or 'mq-deadline'
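Writing to /sys/block/.../queue/scheduler does not survive a reboot; the usual way to persist the choice is a udev rule. A sketch only — the file name and KERNEL match are assumptions to adapt to your devices:

```
# /etc/udev/rules.d/60-ioscheduler.rules (example; adjust the KERNEL match)
# rotational=0 -> SSD, rotational=1 -> spinning disk
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
```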
