GrokDevOps Wiki

Grading Checklist: RAID Degraded Rebuild Latency

grokdatum/grokdevops

L2 case-study datacenter raid

A good response must include:

Checked /proc/mdstat to confirm array state, rebuild progress, and estimated completion time
Reviewed dmesg and smartctl output for signs of degradation on the remaining drives
Identified the rebuild speed limits (/proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max) and recommended whether to raise or lower them for the current workload
Considered adjusting stripe_cache_size (md RAID-4/5/6 only) to balance rebuild speed against application I/O
Checked the I/O scheduler in use and evaluated whether a change would help
Assessed the risk of a second drive failure during rebuild (RAID-6 tolerates it, RAID-5 does not)
Proposed a plan to reduce production I/O load during rebuild (read replicas, failover, traffic shifting)
Verified the replacement drive is healthy using smartctl
Mentioned monitoring for unrecoverable read errors (UREs) during the rebuild
Documented a rollback or escalation plan if the rebuild fails or another drive drops
Considered whether ionice or cgroup I/O throttling could help prioritize application I/O
Communicated timeline and risk to stakeholders (DB team, application owners)
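
The first few checks above can be sketched in Python. This is a minimal illustration, not a reference tool: it parses /proc/mdstat-style text for array state and rebuild progress, reads the two speed-limit tunables, and gives a rough URE risk estimate. The sample text is made up for illustration, the function names are hypothetical, and the default rate of 1e-14 errors per bit is an assumed datasheet-style figure; the independent-bit model is a coarse approximation, so treat the result as a talking point rather than a prediction.

```python
import math
import re
from pathlib import Path

# Illustrative sample in the format of /proc/mdstat (not taken from a real host).
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      2929890816 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [==>..................]  recovery = 12.6% (123456789/976630272) finish=127.5min speed=111111K/sec

unused devices: <none>
"""

# "md0 : active raid5 ..." (the RAID level is absent for inactive arrays)
ARRAY_RE = re.compile(r"^(md\S+)\s*:\s*(\w+)(?:\s+\([\w-]+\))?\s+(raid\d+)?")
# "[4/3] [UUU_]" means 4 member slots, 3 currently active
MEMBERS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# "recovery = 12.6% (...) finish=127.5min speed=111111K/sec"
PROGRESS_RE = re.compile(
    r"(recovery|resync|reshape|check)\s*=\s*([\d.]+)%.*?"
    r"finish=([\d.]+)min\s+speed=(\d+)K/sec"
)


def parse_mdstat(text):
    """Return {array_name: info} with state, level, degraded flag, rebuild progress."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = {"state": m.group(2), "level": m.group(3),
                       "degraded": False, "rebuild": None}
            arrays[m.group(1)] = current
            continue
        if current is None:
            continue
        m = MEMBERS_RE.search(line)
        if m:
            # Degraded when fewer members are active than the array expects.
            current["degraded"] = int(m.group(2)) < int(m.group(1))
        m = PROGRESS_RE.search(line)
        if m:
            current["rebuild"] = {
                "action": m.group(1),
                "percent": float(m.group(2)),
                "finish_min": float(m.group(3)),
                "speed_kb_s": int(m.group(4)),
            }
    return arrays


def read_speed_limits(base=Path("/proc/sys/dev/raid")):
    """Read speed_limit_min and speed_limit_max (KB/s per device) from procfs."""
    return {name: int((base / name).read_text())
            for name in ("speed_limit_min", "speed_limit_max")}


def p_ure_during_rebuild(bytes_read, ure_per_bit=1e-14):
    """Rough chance of at least one URE while reading bytes_read bytes.

    Assumes independent bit errors at ure_per_bit (an assumed, datasheet-style
    rate): P = 1 - (1 - p)^bits, computed in a numerically stable form.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info)
    # Example: three surviving 4 TB members must be read in full to rebuild RAID-5.
    print(f"URE risk estimate: {p_ure_during_rebuild(3 * 4e12):.0%}")
```

On a live host, `parse_mdstat(Path("/proc/mdstat").read_text())` and `read_speed_limits()` would replace the sample input; both only read state and change nothing.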

March 17, 2026 22:39:44 March 5, 2026 18:35:55
