Quiz: Monitoring Migration

4 questions

L0 (1 question)

1. What are three key differences between legacy monitoring (Nagios/Zabbix) and modern monitoring (Prometheus)?

Answer:

1. Data model: legacy is check-based (pass/fail); Prometheus is metric-based (continuous values with labels).
2. Collection: legacy systems run active checks against hosts or accept pushed (passive) results; Prometheus pulls by scraping HTTP endpoints.
3. Querying: legacy uses static thresholds; Prometheus uses PromQL for dynamic, dimensional queries.

A fourth difference: Prometheus is configured as code (YAML files), while legacy systems often rely on a GUI or custom config files.
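As a rough illustration of the pull model and YAML-based configuration, a minimal `prometheus.yml` might look like the sketch below. The hostname, port, interval, and `env` label are placeholder assumptions, not values from this quiz.

```yaml
# Minimal sketch of a Prometheus scrape configuration.
# All targets and labels here are hypothetical examples.
global:
  scrape_interval: 30s          # how often Prometheus pulls each target

scrape_configs:
  - job_name: node              # becomes the job="node" label on every series
    static_configs:
      - targets:
          - web-01.example.internal:9100   # assumed node_exporter address
        labels:
          env: prod             # extra dimension usable in PromQL queries
```

Prometheus scrapes `/metrics` on each target by default, and the labels attached here are what make dimensional queries such as filtering by `env` possible.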

L1 (1 question)

1. What is the Nagios check_disk equivalent in Prometheus and how would you write the alert rule?

Answer:

Nagios check_disk maps to two node_exporter metrics: node_filesystem_avail_bytes and node_filesystem_size_bytes. Alert rule: expr: (node_filesystem_avail_bytes{mountpoint='/'} / node_filesystem_size_bytes{mountpoint='/'}) * 100 < 20, with for: 5m and a severity: warning label. This fires when free space on / stays below 20% for 5 minutes.
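Written out as a rule file, the alert described above could look roughly like this. The group name, alert name, and annotation text are illustrative assumptions; the expression, duration, and severity come from the answer.

```yaml
# Sketch of a Prometheus alerting rule file for the check_disk equivalent.
# Group name, alert name, and annotation wording are hypothetical.
groups:
  - name: disk
    rules:
      - alert: RootFilesystemLowSpace
        # Percentage of free space on / as reported by node_exporter
        expr: |
          (node_filesystem_avail_bytes{mountpoint="/"}
            / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20
        for: 5m                 # condition must hold for 5 minutes before firing
        labels:
          severity: warning     # severity is a label, not a top-level rule field
        annotations:
          summary: "Less than 20% free on / ({{ $labels.instance }})"
```

A file like this can be validated with `promtool check rules <file>` before Prometheus loads it.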

L2 (1 question)

1. Why must monitoring migrations include a parallel-run phase, and how long should it last?

Answer:

Parallel running (both old and new systems active simultaneously) is essential for comparing alert fidelity: does the new system catch the same incidents as the old one? Run for at least 4-8 weeks to cover enough incident variety. During this phase, tune Prometheus thresholds, train the team on PromQL and Grafana, and track any incidents the legacy system catches that Prometheus misses (gaps to fix before cutover).

L3 (1 question)

1. During a Nagios-to-Prometheus migration, how do you handle custom NRPE plugins and network SNMP monitoring?

Answer:

Custom NRPE plugins cannot be ported directly. Options: write a custom Prometheus exporter (expose metrics at /metrics), use the Pushgateway for batch jobs that run periodically, or use the script_exporter to wrap existing scripts. For SNMP network devices: deploy snmp_exporter with a module configuration (snmp.yml, generated from the device MIBs) that maps OIDs to Prometheus metrics. Do not leave network devices unmonitored during the transition; this is a common gap.
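For the SNMP part, a scrape job along the following lines is the usual pattern: Prometheus scrapes the snmp_exporter and passes the device address as a parameter. This is a sketch that assumes the exporter listens on 127.0.0.1:9116 and that the `if_mib` module exists in its snmp.yml; the device address is a documentation placeholder.

```yaml
# Sketch of a scrape job that reaches an SNMP device through snmp_exporter.
# The device IP, module name, and exporter address are assumptions.
scrape_configs:
  - job_name: snmp
    metrics_path: /snmp
    params:
      module: [if_mib]            # module defined in the exporter's snmp.yml
    static_configs:
      - targets:
          - 192.0.2.10            # the SNMP device, not the exporter
    relabel_configs:
      # Pass the device address to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the device address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter itself
      - target_label: __address__
        replacement: 127.0.0.1:9116
```

Depending on the snmp_exporter version, an additional `auth` parameter naming the credentials to use may also be needed.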