Monitoring Migration — Trivia & Interesting Facts¶

Surprising, historical, and little-known facts about monitoring migration.

Most monitoring migrations take 6-18 months longer than planned¶

Industry surveys consistently find that monitoring platform migrations are among the most underestimated infrastructure projects. A typical migration from a legacy system (Nagios, Zabbix) to a modern stack (Prometheus/Grafana, Datadog) takes 12-24 months, while most teams initially estimate 3-6 months. The primary delay is not technical migration but replicating institutional knowledge encoded in thousands of alert rules and dashboards.

Datadog grew from $100M to $2B in annual revenue in just 4 years¶

Datadog's explosive growth from 2019-2023 was largely fueled by organizations migrating from self-hosted monitoring to SaaS. The migration wave was so strong that Datadog went from $100 million to over $2 billion in annual recurring revenue in approximately four years. This growth demonstrated that many organizations preferred paying premium SaaS prices over the operational burden of self-hosted monitoring.

Running two monitoring systems in parallel is the norm during migration¶

The recommended practice during monitoring migration is to run the old and new systems simultaneously for months. This "dual-running" phase typically doubles monitoring infrastructure costs and operational complexity. However, organizations that skip this phase consistently report missed alerts during cutover — catching monitoring gaps requires weeks of parallel comparison.

Dashboard migration is the most emotionally charged part of the process¶

When teams migrate monitoring platforms, the fiercest resistance often comes from dashboard migration. Engineers develop deep attachment to dashboards they have customized over years. Studies of monitoring migrations at large organizations found that fewer than 50% of existing dashboards were actively used, but teams resisted deleting any of them. The migration became an opportunity to audit and rationalize dashboard sprawl.

StatsD was created at Etsy in 2011 and its metric format became ubiquitous¶

Erik Kastner at Etsy created StatsD in 2011 as a simple UDP-based daemon for application metrics. Its simplicity — just send a text string like page.views:1|c to a UDP port — made it trivially easy to instrument code. The StatsD format became so widespread that virtually every modern monitoring system can ingest it, making StatsD metrics the easiest to migrate between platforms.

Prometheus remote write became the universal migration bridge¶

Prometheus's remote write protocol, originally designed for long-term storage backends, became the de facto data bridge during monitoring migrations. Organizations migrating from Prometheus to Thanos, Cortex, VictoriaMetrics, Grafana Mimir, or even commercial platforms like Datadog and New Relic all use remote write as the migration path. This single protocol has enabled more monitoring migrations than any other technology.

Alert migration has a "silent failure" problem that is nearly impossible to detect¶

When migrating alert rules between platforms, subtle differences in query semantics, evaluation intervals, and threshold behavior can cause alerts to become silently less effective. A Prometheus alert that evaluated every 15 seconds might become a 1-minute evaluation in a new platform, potentially missing brief spikes. These differences are extremely difficult to detect without a real incident exposing the gap.

The cost of monitoring vendor lock-in was estimated at $1 million per year for mid-size companies¶

A 2022 analysis by Grafana Labs estimated that mid-size organizations (500-2,000 engineers) spend an average of $1 million per year on monitoring platform costs, with an additional $500K-1M in switching costs if they want to migrate. This vendor lock-in, created by proprietary query languages, custom integrations, and dashboard formats, has been a primary driver of the open-source monitoring movement.

OpenTelemetry was created specifically to solve the monitoring migration problem¶

OpenTelemetry (2019) merged the OpenTracing and OpenCensus projects with the explicit goal of creating a vendor-neutral telemetry standard. By standardizing how applications generate metrics, logs, and traces, OpenTelemetry makes future monitoring migrations dramatically easier — teams can switch backends without re-instrumenting applications. This was a direct response to years of painful monitoring migrations across the industry.