Symptoms: RAID Degraded Rebuild Latency¶
- Production database server
db-prod-07(Dell PowerEdge R740) is experiencing severe I/O latency spikes. - Application team reports query response times have increased from ~5ms to 200-800ms intermittently.
- Monitoring shows disk I/O wait at 40-60%, up from a baseline of 3-5%.
- The server has a RAID-6 array (8x 2TB SAS drives) managed by
md(Linux software RAID). - One drive (
/dev/sdd) was replaced yesterday after a failure alert; the array is currently rebuilding. - During rebuild windows, some queries time out entirely, causing application-level 500 errors.
- The issue started approximately 18 hours ago, shortly after the replacement drive was hot-swapped.
- No other infrastructure changes were made in the past 72 hours.