Disk Troubleshooting

11 cards — 🟢 3 easy | 🟡 5 medium | 🔴 3 hard

🟢 Easy (3)

1. How do you display the block device hierarchy including disk types, sizes, and mount points?

lsblk shows all block devices in a tree format. It displays the relationship between physical disks, partitions, LVM logical volumes, and their mount points. Add -f to also show filesystem types and UUIDs.

2. What does the blkid command show, and why is it important for troubleshooting?

blkid shows filesystem UUIDs, types (ext4, xfs, etc.), and labels for all block devices. It is critical for troubleshooting because /etc/fstab should use UUIDs (not /dev/sdX names which can change between reboots), and blkid helps you identify which device corresponds to which UUID.

3. What is the difference between df and du, and when would you use each for disk troubleshooting?

df -h shows filesystem-level usage (total, used, available, mount point) -- use it to identify which filesystem is full. du -sh /path/* shows directory-level usage -- use it to drill down and find which directories or files are consuming the most space. They complement each other: df for the overview, du for the details.
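The df-then-du workflow can be sketched as a small filter. This is a minimal sketch, not a standard tool: the helper name `full_filesystems` and the 90% threshold are illustrative assumptions. It reads `df -P` output (POSIX format, one filesystem per line) on stdin and assumes mount points contain no spaces.

```shell
# Hypothetical helper: print mount points whose usage is at or above a limit.
# Reads `df -P` output on stdin; the limit (in percent) is the first argument.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5                 # Capacity column, e.g. "95%"
        sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, $5
    }'
}

# Typical use (output depends on the machine; sort -h assumes GNU coreutils):
#   df -P | full_filesystems 90
#   du -xsh /var/* 2>/dev/null | sort -h | tail -n 5
```

The first command narrows the problem to a filesystem; the second drills into one directory on it, with -x keeping du from crossing into other filesystems.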

🟡 Medium (5)

1. How do you use iostat to diagnose disk I/O performance problems?

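iostat -xz 1 prints extended per-device I/O statistics every second (-x adds the extended columns, -z hides devices with no activity). The first report shows averages since boot, so judge by the later ones. Key columns: %util (share of time the device was busy; sustained values above roughly 70% on a spinning disk suggest saturation), await (average I/O latency in ms, including queue time, which is what applications feel; recent sysstat versions split it into r_await and w_await), r/s and w/s (read and write operations per second), and rrqm/s and wrqm/s (read and write requests merged per second). On NVMe, %util is misleading because the device services many requests in parallel.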
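Because the column layout of iostat -x differs between sysstat versions, a filter is safer if it finds %util by header name rather than by position. A minimal sketch under that assumption; the helper name `busy_devices` and the threshold are hypothetical.

```shell
# Hypothetical helper: print devices whose %util is at or above a limit.
# Reads `iostat -x` output on stdin; the limit (in percent) is the first argument.
busy_devices() {
    awk -v limit="$1" '
        $1 ~ /^Device/ {                 # header row: remember where %util sits
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # data rows: only lines wide enough to contain the %util column
        col && NF >= col && $col + 0 >= limit { print $1, $col }
    '
}

# Typical use (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C iostat -xz 1 3 | busy_devices 70
```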

2. How do you identify which process is generating the most disk I/O?

Use iotop -oP (usually as root) to show per-process I/O in real time. The -o flag lists only processes that are actually performing I/O, and -P aggregates threads into processes. The output shows read and write bandwidth per process, helping you pinpoint which application is causing disk pressure.
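When iotop is not installed, the kernel's per-process counters in /proc/<pid>/io give a rougher view of the same information. This is a different technique from iotop, sketched under the assumption of a Linux /proc; the helper name `io_bytes` is hypothetical, and reading another user's process normally requires root.

```shell
# Hypothetical helper: print "read_bytes write_bytes" for one process.
# Reads the content of /proc/<pid>/io on stdin.
io_bytes() {
    awk '$1 == "read_bytes:"  { r = $2 }
         $1 == "write_bytes:" { w = $2 }
         END { print r, w }'
}

# Typical use (1234 is a placeholder PID):
#   io_bytes < /proc/1234/io
```

The counters are cumulative since the process started, so sample twice and compare to estimate a rate.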

3. A server fails to boot because of an /etc/fstab entry. What are the likely causes and how do you fix it?

Common causes: a wrong UUID (disk replaced or reformatted), a missing device, or a typo in the mount point or filesystem type. Fix: boot into single-user mode or from rescue media, remount the root filesystem read-write if necessary (mount -o remount,rw /), and edit /etc/fstab to correct the entry, using blkid to verify UUIDs. For non-essential filesystems, add the nofail mount option so a missing device does not block boot. Always test with mount -a after editing fstab.
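The UUID check can be sketched as a comparison between fstab and what blkid reports. A minimal sketch: the helper name `missing_fstab_uuids` is hypothetical, and the second argument is assumed to be a file with one known UUID per line.

```shell
# Hypothetical helper: print UUIDs referenced in an fstab file that do not
# appear in a list of known UUIDs (one per line).
#   $1 = fstab file, $2 = file with known UUIDs
missing_fstab_uuids() {
    awk 'FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: known UUIDs
         $1 ~ /^UUID=/ {                               # second file: fstab entries
             uuid = $1
             sub(/^UUID=/, "", uuid)
             gsub(/"/, "", uuid)
             if (!(uuid in known)) print uuid
         }' "$2" "$1"
}

# Typical use from a rescue shell (as root):
#   blkid -s UUID -o value > /tmp/known-uuids
#   missing_fstab_uuids /etc/fstab /tmp/known-uuids
```

Any UUID it prints belongs to an fstab entry with no matching device, which is a likely candidate for the boot failure.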

4. Why do /dev/sdX device names change between reboots, and what should you use instead?

/dev/sdX names are assigned by the kernel based on device detection order, which can change when disks are added, removed, or detected in a different sequence. Use UUIDs (from blkid) or filesystem labels in /etc/fstab and scripts. UUIDs are tied to the filesystem and persist regardless of detection order.

5. How do you quickly check the status of LVM physical volumes, volume groups, and logical volumes?

Use pvs (Physical Volume summary), vgs (Volume Group summary), and lvs (Logical Volume summary). These give a one-line-per-item overview of your LVM configuration including sizes, free space, and status. For detailed info, use pvdisplay, vgdisplay, and lvdisplay.

🔴 Hard (3)

1. What is the Device Mapper in Linux, and which storage technologies depend on it?

Device Mapper is a kernel framework that provides a generic way to create virtual block devices mapped onto real ones. It underpins LVM (logical volume management), dm-crypt (disk encryption, typically set up with LUKS), and dm-multipath (redundant storage paths). Understanding Device Mapper helps when debugging why a device is not appearing or performing as expected: dmsetup ls lists the mapped devices and dmsetup table shows how each one maps onto the underlying devices.
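One way to see which technology owns a given dm device is the UUID that each tool assigns when it creates the mapping. The sketch below relies on the conventional prefixes (LVM-, CRYPT-, mpath-), which is an assumption about how the userspace tools name things rather than a kernel guarantee; the helper name `dm_kind` is hypothetical.

```shell
# Hypothetical helper: guess the subsystem behind a device-mapper UUID
# (as found in /sys/block/dm-*/dm/uuid) from its conventional prefix.
dm_kind() {
    case $1 in
        LVM-*)   echo lvm ;;
        CRYPT-*) echo dm-crypt ;;
        mpath-*) echo multipath ;;
        *)       echo unknown ;;
    esac
}

# Typical use on a live system:
#   for d in /sys/block/dm-*; do
#       echo "$(cat "$d/dm/name"): $(dm_kind "$(cat "$d/dm/uuid")")"
#   done
```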

2. You need to replace a failed disk in a Linux software RAID array. What is the general procedure?

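1) Identify the failed disk with mdadm --detail /dev/mdX or cat /proc/mdstat. 2) If the kernel has not already marked it faulty, mark it failed: mdadm --manage /dev/mdX --fail /dev/sdY. 3) Remove it from the array: mdadm --manage /dev/mdX --remove /dev/sdY. 4) Physically replace the disk. 5) Partition the new disk to match the array layout. 6) Add the new member: mdadm --manage /dev/mdX --add /dev/sdZ (use the partition, for example /dev/sdZ1, if the array is built on partitions). 7) Monitor the rebuild: watch cat /proc/mdstat. The array runs degraded until the rebuild completes.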
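The identification step can be sketched as a filter over /proc/mdstat. In that file each array's status line ends with a map such as [UU] or [_U], where an underscore marks a missing or failed member. The helper name `degraded_arrays` is hypothetical, and the device names in the comments are the same placeholders used above.

```shell
# Hypothetical helper: print the names of md arrays with a missing member.
# Reads the content of /proc/mdstat on stdin.
degraded_arrays() {
    awk '/^md/ { name = $1 }                    # e.g. "md0 : active raid1 ..."
         /\[[U_]*_[U_]*\]/ { print name }       # status map containing "_"
    '
}

# Typical use, followed by the replacement commands (placeholders, run as root):
#   degraded_arrays < /proc/mdstat
#   mdadm --manage /dev/mdX --fail   /dev/sdY
#   mdadm --manage /dev/mdX --remove /dev/sdY
#   mdadm --manage /dev/mdX --add    /dev/sdZ
```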

3. How do you check and change the I/O scheduler for a block device, and which scheduler is best for NVMe vs spinning disk?

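Check: cat /sys/block/sda/queue/scheduler (the active scheduler is shown in square brackets). Change: echo mq-deadline > /sys/block/sda/queue/scheduler as root; on older kernels without blk-mq the equivalent name is deadline. The change does not survive a reboot unless you persist it, for example with a udev rule. For spinning disks, mq-deadline (or deadline) keeps latency bounded for database workloads. For NVMe, none (no scheduler) is usually the right choice, since the device has its own internal parallelism and scheduling. For high-throughput workloads, raising /sys/block/sda/queue/nr_requests increases the queue depth.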
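The scheduler file lists every available scheduler and marks the active one with square brackets, so a script has to extract the bracketed name. A minimal sketch; the helper name `active_scheduler` is hypothetical and sda is the same example device as above.

```shell
# Hypothetical helper: print the active I/O scheduler.
# Reads the content of /sys/block/<dev>/queue/scheduler on stdin,
# e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Typical use:
#   active_scheduler < /sys/block/sda/queue/scheduler
#   echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
```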