Grading Rubric

| Criterion | Strong (3) | Adequate (2) | Weak (1) |
| --- | --- | --- | --- |
| Identified misleading symptom | Checked disk I/O after ruling out database-level blockers; recognized I/O saturation within 10 min | Investigated PostgreSQL settings and queries first, then checked I/O | Spent extended time on recovery parameters, vacuum, or query tuning |
| Found root cause in linux_ops domain | Identified RAID degradation via /proc/mdstat and SMART failures | Found the disk I/O issue but not the RAID degradation | Assumed the disk was just slow (aging hardware) without checking RAID |
| Remediated in datacenter domain | Replaced the failed disk, rebuilt RAID, verified SMART on new disk | Identified the need for disk replacement but did not guide the rebuild | Tried to tune PostgreSQL or increase replica resources instead |
| Cross-domain thinking | Explained the full chain: disk failure -> RAID degradation -> I/O bottleneck -> WAL replay stall -> replication lag | Acknowledged the hardware/database connection but missed the RAID detail | Treated it as a single-domain database or Kubernetes issue |
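
The "Strong" root-cause criterion relies on reading /proc/mdstat to spot a degraded array. As a rough illustration of what that check looks for, the sketch below parses mdstat-style text and reports any array whose active member count is below the expected count, or whose status flags contain `_`. The function name `find_degraded_arrays` and the `SAMPLE` text are hypothetical, written for this example and not taken from the scenario; real mdstat output varies by kernel version and RAID level, so treat this as a starting point only.

```python
import re
from typing import Dict, List

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0](F)"
ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# Status fragment, e.g. "[2/1] [_U]" = 2 members expected, 1 active, first slot missing
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Hypothetical sample resembling a degraded two-disk RAID1 (illustrative only)
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      976630464 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def find_degraded_arrays(mdstat_text: str) -> List[Dict[str, object]]:
    """Return one record per array that appears degraded in mdstat-style text."""
    degraded: List[Dict[str, object]] = []
    current = None  # name of the array whose status line we are waiting for
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            # Degraded when a member is missing or any slot is flagged "_"
            if active < expected or "_" in flags:
                degraded.append(
                    {"array": current, "expected": expected,
                     "active": active, "flags": flags}
                )
            current = None
    return degraded


if __name__ == "__main__":
    for record in find_degraded_arrays(SAMPLE):
        print(record)
```

On a real host the text would come from reading /proc/mdstat. A degraded result should still be confirmed against SMART data for the affected disk before remediation, as the rubric's "Strong" column expects.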

Prerequisite Topic Packs

  • database-ops — needed for Domain A investigation (PostgreSQL replication, WAL, pg_stat_replication)
  • k8s-storage — needed for Domain A (PVC, local storage, node storage)
  • disk-and-storage-ops — needed for Domain B root cause (iostat, RAID, SMART)
  • disk-and-storage-ops — needed for Domain C remediation (RAID rebuild, disk replacement)
  • linux-performance — needed for Domain B (I/O analysis, disk utilization)