
Pattern: Device Name Confusion

ID: FP-048
Family: Human Error Amplifier
Frequency: Common
Blast Radius: Single Host
Detection Difficulty: Obvious (but irreversible)

The Shape

Linux device names such as /dev/sdb and /dev/sdc are not persistent. The kernel assigns them at boot in the order it detects disks, so adding a disk can shift existing names. An engineer relies on a name observed before the change: for example, expecting a newly added disk at /dev/sdc because /dev/sdb has always been the data disk, and running fdisk /dev/sdc or mkfs.ext4 /dev/sdc1 after a reboot. If the new disk was detected first and took /dev/sdb, the data disk is now /dev/sdc. The command executes on the wrong disk, and the data that used to live on /dev/sdb is destroyed.
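To see the stable-to-volatile mapping on a running host, the symlinks under /dev/disk/by-id/ can be resolved to the kernel names they currently point at. A minimal Python sketch, assuming udev populates /dev/disk/by-id/ as it does on most distributions; `by_id_map` is a hypothetical helper, and the directory is a parameter so the logic can also be exercised against a scratch directory.

```python
import os
from pathlib import Path


def by_id_map(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map each stable by-id link name to the kernel device it resolves to now.

    The link names (built by udev from identifiers such as bus, model and
    serial or WWN) stay the same across reboots; the resolved targets
    (/dev/sda, /dev/sdb, ...) may not.
    """
    mapping = {}
    for link in sorted(Path(by_id_dir).iterdir()):
        if link.is_symlink():
            # realpath follows relative targets such as "../../sdb"
            mapping[link.name] = os.path.realpath(link)
    return mapping
```

Printing `by_id_map()` before and after a reboot or disk addition should show the same keys; any value that changed is a name that shifted underneath a stable identifier.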

How You'll See It

In Linux/Infrastructure

# Engineer expects the new disk at /dev/sdc (before the reboot, /dev/sdb was the data disk)
$ fdisk /dev/sdc
# After the reboot the new disk took /dev/sdb, so the new partition table lands on the data disk

After a new disk was added for expansion, the system rebooted. The new disk was detected first and assigned /dev/sdb, so the data disk is now /dev/sdc. fdisk rewrites the data disk's partition table, and the data on it is lost.

In Kubernetes

A local PersistentVolume references a device path directly: its local.path field points to /dev/sdb, the data disk. Node maintenance causes disks to be reattached in a different order, and after the reboot a different disk is /dev/sdb. Kubernetes mounts the wrong disk, and the application starts writing to a device that may be empty or carry the wrong filesystem.
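One way to catch this before it bites is to lint PersistentVolume manifests. The hypothetical Python check below takes a manifest already parsed into a dict (for example by a YAML loader, not shown) and flags `spec.local.path` when it names a /dev device outside the /dev/disk/by-* directories. The prefix list is an assumption, not an exhaustive catalogue.

```python
from typing import Optional

# udev-maintained persistent link directories (assumed list; extend as needed).
STABLE_PREFIXES = (
    "/dev/disk/by-id/",
    "/dev/disk/by-uuid/",
    "/dev/disk/by-path/",
    "/dev/disk/by-partuuid/",
    "/dev/disk/by-label/",
    "/dev/disk/by-partlabel/",
)


def unstable_local_path(pv: dict) -> Optional[str]:
    """Return spec.local.path if it is a kernel-assigned /dev name, else None.

    Directory paths outside /dev (a pre-mounted filesystem) are not flagged.
    """
    path = pv.get("spec", {}).get("local", {}).get("path")
    if path and path.startswith("/dev/") and not path.startswith(STABLE_PREFIXES):
        return path
    return None
```

A manifest with `local.path: /dev/sdb` is flagged; one pointing at a /dev/disk/by-id/ link, or at a mounted directory such as /mnt/disks/ssd1, passes.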

In Datacenter

A server technician connects a new SAS disk. The order of SAS enumeration places the new disk at sdb, shifting the data disk to sdc. An automated provisioning script (written with hardcoded device paths) runs and partitions the wrong device.
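A provisioning script can avoid the hardcoded path by selecting its target from an identifier recorded for the intended disk. The sketch below assumes util-linux lsblk with JSON output (-J) and a serial number taken from an inventory record or the drive's label; it returns /dev/<name> only when exactly one whole disk carries that serial. `device_for_serial` and `current_lsblk_json` are hypothetical helper names.

```python
import json
import subprocess


def device_for_serial(lsblk_json: str, serial: str) -> str:
    """Return /dev/<name> for the single whole disk whose SERIAL matches."""
    disks = json.loads(lsblk_json)["blockdevices"]
    matches = [d["name"] for d in disks if d.get("serial") == serial]
    if len(matches) != 1:
        # Zero or several matches: refuse rather than guess a target.
        raise RuntimeError(f"expected exactly one disk with serial {serial!r}, found {matches}")
    return "/dev/" + matches[0]


def current_lsblk_json() -> str:
    # -d: whole disks only (no partitions); -J: JSON output
    return subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,SERIAL"],
        check=True, capture_output=True, text=True,
    ).stdout
```

The script would call `device_for_serial(current_lsblk_json(), expected_serial)` immediately before partitioning, so the kernel name is looked up at the moment it is used rather than fixed when the script was written.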

The Tell

A kernel device path such as /dev/sdb was used in a command after a system change (reboot, disk added, disk replaced), and data was lost on a disk that was never meant to be modified. lsblk -o NAME,SERIAL,UUID or ls -la /dev/disk/by-id/ shows the current mapping between kernel names and stable identifiers.
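If a name-to-serial snapshot was saved before the change, comparing it with the current one shows which names now point at different hardware and where each disk went. A sketch, assuming both snapshots are dicts of kernel name to serial built from `lsblk -d -o NAME,SERIAL` output (for example by a hypothetical pre-change inventory step); `shifted_names` and `moved_disks` are illustrative names.

```python
from __future__ import annotations


def shifted_names(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Names present in both snapshots whose serial changed: {name: (old, new)}."""
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }


def moved_disks(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """For each serial seen before, report (old_name, new_name) if its name changed.

    new_name is None when the disk no longer appears at all.
    """
    name_now = {serial: name for name, serial in after.items()}
    return {
        serial: (name, name_now.get(serial))
        for name, serial in before.items()
        if name_now.get(serial) != name
    }
```

With `before = {"sda": "OS001", "sdb": "DATA456"}` and `after = {"sda": "OS001", "sdb": "NEW123", "sdc": "DATA456"}`, the first function reports that sdb now holds NEW123, and the second that DATA456 moved from sdb to sdc.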

Common Misdiagnosis

| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Hardware failure | Wrong device modified | Device is healthy; data was overwritten by a command on the wrong device |
| Command error | Device name changed | Command was correct; device name was wrong after reboot/change |

The Fix (Generic)

  1. Immediate: Stop all writes to the disk; attempt data recovery (testdisk, photorec) before any further access.
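  2. Short-term: Always address disks through persistent identifiers: /dev/disk/by-id/ (derived from the drive's model and serial or WWN, present even on a blank disk), /dev/disk/by-uuid/ (filesystem UUID, present only once a filesystem exists), or /dev/disk/by-path/ (physical bus location). Run lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,UUID before any disk operation.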
  3. Long-term: Never hardcode device names in scripts or fstab; in /etc/fstab use UUID= or a filesystem label (LABEL=data-disk); validate device identity before any destructive operation with lsblk -o NAME,SIZE,SERIAL,MODEL.
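fstab is a common place for a hardcoded name to survive unnoticed until the next reboot. The hypothetical Python lint below flags any /etc/fstab entry whose device field is a kernel name rather than UUID=, LABEL=, PARTUUID=, PARTLABEL= or a /dev/disk/by-* link. Treating /dev/mapper/ names as stable is an assumption, on the grounds that LVM and device-mapper names come from configuration rather than probe order.

```python
from __future__ import annotations

# Device-field forms that do not depend on probe order. /dev/mapper/ is
# included on the assumption that LVM/dm names come from configuration.
STABLE_SPEC_PREFIXES = (
    "UUID=", "LABEL=", "PARTUUID=", "PARTLABEL=",
    "/dev/disk/by-", "/dev/mapper/",
)


def volatile_fstab_entries(fstab_text: str) -> list[tuple[int, str]]:
    """Return (line_number, device_field) for entries naming a kernel device such as /dev/sdb1."""
    findings = []
    for lineno, line in enumerate(fstab_text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        spec = fields[0]
        # Pseudo filesystems (tmpfs, proc) do not start with /dev/ and are skipped.
        if spec.startswith("/dev/") and not spec.startswith(STABLE_SPEC_PREFIXES):
            findings.append((lineno, spec))
    return findings
```

Run against the contents of /etc/fstab, any non-empty result is an entry that depends on detection order.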

Real-World Examples

  • Example 1: An admin added a new SSD to speed up the OS and rebooted. The new SSD became /dev/sda and the original OS disk became /dev/sdb. The admin ran dd if=/dev/zero of=/dev/sdb intending to wipe the new (empty) SSD, and wiped the production OS disk instead.
  • Example 2: An automated provisioning script ran mkfs.ext4 /dev/sdb1. On a previous server in the fleet, /dev/sdb was an empty disk; on this server it held data. The script ran without checking, and the data was gone.
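The missing check in Example 2 can be sketched as a guard that refuses to format when lsblk reports partitions, filesystem signatures or mounts on the target. This assumes util-linux lsblk JSON output from `lsblk -J -o NAME,FSTYPE,MOUNTPOINT /dev/sdb` (checking the whole disk also covers /dev/sdb1); `reasons_not_blank` is a hypothetical name. lsblk's FSTYPE relies on udev/blkid data, so an empty result is a guard, not proof that the disk holds nothing.

```python
from __future__ import annotations

import json


def reasons_not_blank(lsblk_json: str) -> list[str]:
    """Reasons a device is unsafe to format, from `lsblk -J -o NAME,FSTYPE,MOUNTPOINT <dev>`.

    An empty list means lsblk saw no partitions, filesystem signatures or
    mounts; it is a guard, not proof that the disk holds no data.
    """
    reasons = []
    stack = list(json.loads(lsblk_json)["blockdevices"])
    while stack:
        dev = stack.pop()
        children = dev.get("children", [])
        if children:
            reasons.append(f"{dev['name']}: has partitions")
        if dev.get("fstype"):
            reasons.append(f"{dev['name']}: filesystem signature {dev['fstype']}")
        if dev.get("mountpoint"):
            reasons.append(f"{dev['name']}: mounted at {dev['mountpoint']}")
        stack.extend(children)  # walk partitions and their children too
    return reasons
```

A provisioning script would abort and print the reasons whenever the list is non-empty, instead of running mkfs.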

War Story

We were expanding storage on a running server, and the new disk was hot-plugged. I ran lsblk immediately afterwards, noted that /dev/sdb was the new empty disk (3TB, no partitions), and started partitioning. When I finished and tried to mount the new filesystem, I found an existing ext4 filesystem from the production data disk. The new disk was /dev/sdc; the hot-plug had reordered names so that /dev/sdb was the production data disk, and I had repartitioned it. testdisk recovered the partition table, with no data loss because I had only repartitioned, not formatted. Lesson: always run lsblk -o NAME,SIZE,SERIAL,MODEL and verify that the serial number matches the one on the new disk's physical label.

Cross-References