Monitoring Migration
10 cards — 🟢 3 easy | 🟡 4 medium | 🔴 3 hard
🟢 Easy (3)
1. What is the fundamental difference between Nagios-style and Prometheus-style monitoring?
Nagios uses check-based monitoring: each check returns a discrete state (OK, WARNING, CRITICAL, or UNKNOWN), collected through active polling or passive (pushed) results. Prometheus uses metric-based monitoring (continuous numerical values) with a pull/scrape model, multi-dimensional labels, and PromQL for dynamic queries.
2. What is node_exporter, and what type of metrics does it provide?
node_exporter is a Prometheus exporter that runs on each host (port 9100) and exposes OS-level metrics: CPU, memory, disk, network, and I/O. It replaces Nagios checks like check_disk, check_load, and check_mem.
3. What is the blackbox_exporter used for in a Prometheus deployment?
The blackbox_exporter performs endpoint probes (HTTP, TCP, ICMP, DNS) from outside the target, replacing Nagios checks like check_http, check_tcp, and check_ping. It exposes metrics like probe_success and probe_http_status_code.
🟡 Medium (4)
1. What are the five phases of a monitoring migration, and why is the parallel run phase the longest?
Assessment (2-4 weeks), Foundation (2-4 weeks), Parallel Run (4-8 weeks), Cutover (1-2 weeks), Decommission (2-4 weeks). The parallel run is longest because both systems must monitor simultaneously to compare alert fidelity, tune Prometheus thresholds, train the team, and build confidence in the new system before committing.
2. How do you translate a Nagios "check_disk -w 20% -c 10%" check to a Prometheus alerting rule?
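Use PromQL: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20 with "for: 5m" and severity: warning for the -w threshold, plus a second rule with the same expression compared against < 10 and severity: critical for the -c threshold. Prometheus re-evaluates the percentage of free space at every rule evaluation rather than running a point-in-time check script.
3. In a Zabbix-to-Prometheus migration, what replaces Zabbix triggers, templates, and host groups?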
Zabbix triggers become Prometheus alerting rules (PromQL expressions). Templates become recording rules plus Grafana dashboards. Host groups become labels (job, environment, team), which provide multi-dimensional grouping.
4. Should you migrate historical metric data from the legacy system to Prometheus? Why or why not?
Generally no: the data models are fundamentally different (check results vs time series), making direct migration impractical. Options: accept the data break, keep the legacy system read-only for N months for historical queries, or export key metrics to CSV for compliance/reporting.
🔴 Hard (3)
1. How should custom Nagios NRPE plugins be handled during a migration to Prometheus?
Custom NRPE plugins (typically shell scripts checking specific things) should NOT be ported directly. They need to become either custom Prometheus exporters (that expose metrics on an HTTP endpoint) or Pushgateway jobs (for batch/cron jobs that run and push results). This transforms point-in-time checks into continuous metric collection.
2. Why is "big-bang cutover" dangerous in a monitoring migration, and what should you do instead?
Turning off Nagios and enabling Prometheus on the same day risks missing incidents because the new system has untested coverage gaps. Instead, run both systems in parallel for at least 4 weeks, comparing alert fidelity to ensure Prometheus catches the same incidents Nagios does, then gradually cut over alerting before decommissioning the legacy system.
3. What challenge does network device monitoring present during a monitoring migration?