Symptoms: I/O Latency During Degraded RAID Rebuild

  • Production database server db-prod-07 (Dell PowerEdge R740) is experiencing severe I/O latency spikes.
  • The application team reports that query response times intermittently rise from ~5 ms to 200-800 ms.
  • Monitoring shows disk I/O wait at 40-60%, up from a baseline of 3-5%.
  • The server has a RAID-6 array (8x 2TB SAS drives) managed by md (Linux software RAID).
  • One drive (/dev/sdd) was replaced yesterday after a failure alert; the array is currently rebuilding.
  • While the rebuild is running, some queries time out entirely, causing application-level 500 errors.
  • The issue started approximately 18 hours ago, shortly after the replacement drive was hot-swapped.
  • No other infrastructure changes were made in the past 72 hours.
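
To correlate the latency spikes with rebuild activity, it can help to sample the rebuild progress that md reports in /proc/mdstat alongside the kernel's rebuild speed limits. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, it assumes the usual `recovery = N% ... finish=Nmin speed=NK/sec` progress line, and it assumes the standard `speed_limit_min` and `speed_limit_max` files under /proc/sys/dev/raid are readable on this host.

```python
import re
from pathlib import Path

# Matches the progress line md prints while an array is rebuilding, e.g.
#   [==>......]  recovery = 12.6% (.../...) finish=412.3min speed=69000K/sec
# The exact spacing is an assumption based on typical /proc/mdstat output.
_PROGRESS = re.compile(
    r"(?P<action>recovery|resync)\s*=\s*(?P<pct>[\d.]+)%"
    r".*?finish=(?P<finish>[\d.]+)min\s+speed=(?P<speed>\d+)K/sec"
)


def parse_rebuild_progress(mdstat_text):
    """Return rebuild progress from /proc/mdstat text, or None if no rebuild is shown."""
    match = _PROGRESS.search(mdstat_text)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "percent": float(match.group("pct")),
        "finish_min": float(match.group("finish")),
        "speed_k_per_sec": int(match.group("speed")),
    }


def read_speed_limits(base="/proc/sys/dev/raid"):
    """Read the md rebuild speed limits; a value is None if its file is unreadable."""
    limits = {}
    for name in ("speed_limit_min", "speed_limit_max"):
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        print("rebuild:", parse_rebuild_progress(mdstat.read_text()))
        print("limits:", read_speed_limits())
    else:
        print("/proc/mdstat not found; is the md driver loaded?")
```

Logging this output at a fixed interval next to the I/O wait metric would show whether the 200-800 ms spikes line up with periods of high rebuild speed.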