Grading Rubric

| Criterion | Strong (3) | Adequate (2) | Weak (1) |
|---|---|---|---|
| Identified misleading symptom | Quickly found disk consumers with du; recognized Loki + verbose logging as the cause, not a generic disk issue | Found the large directories but took time to connect Loki retention to the problem | Focused on logrotate, tmp files, or core dumps; missed the Loki storage angle |
| Found root cause in observability domain | Identified both Loki retention disabled and event-processor DEBUG logging | Found one of the two issues but not both | Assumed it was purely a Linux disk management problem |
| Remediated in devops_tooling domain | Updated Helm values for both Loki retention and log level; cleaned up stale data | Fixed one component via Helm but cleaned up the other manually | Manually deleted files without fixing the underlying configuration |
| Cross-domain thinking | Explained how observability infrastructure competes for node resources and how Helm config drives Loki behavior | Acknowledged multiple systems were involved | Treated it as a single-domain disk space issue |

Prerequisite Topic Packs

  • disk-and-storage-ops — needed for Domain A investigation (df, du, disk usage analysis)
  • linux-logging — needed for Domain A investigation (container logs, log rotation)
  • log-pipelines — needed for Domain B root cause (Loki architecture, retention, ingester storage)
  • helm — needed for Domain C remediation (Helm values, upgrades)
  • k8s-node-lifecycle — needed for understanding DiskPressure taint and pod eviction
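
As a reference point for the disk usage analysis named in the disk-and-storage-ops pack, the following is a minimal sketch of ranking the immediate subdirectories of a path by size, roughly what a per-directory du summary provides. The function names `dir_size` and `largest_subdirs` are hypothetical, and the sketch sums apparent file sizes rather than allocated disk blocks, so its numbers may differ somewhat from what du reports.

```python
import os


def dir_size(path):
    """Sum apparent file sizes (bytes) under path, without following symlinks."""
    total = 0
    # os.walk does not follow directory symlinks by default;
    # unreadable directories are skipped silently via onerror.
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda err: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def largest_subdirs(root, top_n=5):
    """Return up to top_n (name, size_in_bytes) pairs, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]
```

Calling `largest_subdirs` on a node's data directory (the exact path depends on the cluster and is not specified here) surfaces which components dominate disk usage. That ranking is the starting point the rubric expects before connecting the usage to Loki retention and log verbosity.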