Grading Checklist: RAID Degraded Rebuild Latency
A good response must include:
- Checked `/proc/mdstat` to confirm array state, rebuild progress, and estimated completion time
- Reviewed `dmesg`/`smartctl` for signs of additional drive degradation
- Identified the rebuild speed limits (`/proc/sys/dev/raid/speed_limit_min` and `speed_limit_max`) and recommended tuning
- Considered adjusting `stripe_cache_size` to balance rebuild speed vs. application I/O
- Checked the I/O scheduler in use and evaluated whether a change would help
- Assessed the risk of a second drive failure during rebuild (RAID-6 tolerates it, RAID-5 does not)
- Proposed a plan to reduce production I/O load during rebuild (read replicas, failover, traffic shifting)
- Verified the replacement drive is healthy using `smartctl`
- Mentioned monitoring for URE (Unrecoverable Read Errors) during rebuild
- Documented a rollback or escalation plan if the rebuild fails or another drive drops
- Considered whether ionice or cgroup I/O throttling could help prioritize application I/O
- Communicated timeline and risk to stakeholders (DB team, application owners)
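
As an illustration of the first checklist item, the following is a minimal sketch of how a response might extract array state, rebuild progress, and the estimated completion time from `/proc/mdstat`. The sample text is illustrative rather than captured from a real host, the function name `parse_mdstat` and its return shape are hypothetical, and the sketch assumes a single md array using the usual `recovery = N% ... finish=Nmin speed=NK/sec` progress line.

```python
import re

# Illustrative sample only (not captured from a real system): a 4-disk RAID-5
# array running with 3 active members while the 4th is being rebuilt.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      2929890816 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  8.5% (83287808/976630272) finish=142.3min speed=104567K/sec

unused devices: <none>
"""

# Member summary such as "[4/3] [UUU_]": total devices / active devices,
# followed by one flag per slot ("U" = up, "_" = missing or rebuilding).
STATUS_RE = re.compile(
    r"\[(?P<total>\d+)/(?P<active>\d+)\]\s+\[(?P<flags>[U_]+)\]"
)

# Progress line printed while a recovery, resync, reshape, or check is running.
PROGRESS_RE = re.compile(
    r"(?P<action>recovery|resync|reshape|check)\s*=\s*(?P<percent>[\d.]+)%"
    r".*?finish=(?P<finish_min>[\d.]+)min"
    r"\s+speed=(?P<speed_kps>\d+)K/sec"
)


def parse_mdstat(text):
    """Return degraded state and rebuild progress for the first array found.

    Hypothetical helper: it inspects only the first status and progress
    lines, so a host with several arrays would need per-array parsing.
    """
    result = {"degraded": False, "member_flags": None, "rebuild": None}

    status = STATUS_RE.search(text)
    if status:
        result["degraded"] = int(status["active"]) < int(status["total"])
        result["member_flags"] = status["flags"]

    progress = PROGRESS_RE.search(text)
    if progress:
        result["rebuild"] = {
            "action": progress["action"],
            "percent": float(progress["percent"]),
            "finish_min": float(progress["finish_min"]),
            "speed_kps": int(progress["speed_kps"]),
        }
    return result


info = parse_mdstat(SAMPLE_MDSTAT)
print(info)
```

On a live system the same function could be fed the contents of `/proc/mdstat` (for example `parse_mdstat(open("/proc/mdstat").read())`). The reported `finish` value is the kernel's estimate at the current rebuild speed, so it is likely to shift as production I/O load or the speed limits change.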