Datadog¶
18 cards — 🟢 5 easy | 🟡 11 medium | 🔴 2 hard
🟢 Easy (5)¶
1. What is a Datadog agent?
Show answer
Software that runs on a host; its purpose is to collect data from the host (metrics, logs, traces, etc.) and send it to Datadog.
Remember: Datadog Agent collects metrics, traces, and logs from the host. Runs as a service (systemd) or DaemonSet in Kubernetes. Configurable via datadog.yaml.
2. What is a host in regards to Datadog?
Show answer
Any physical or virtual instance that is monitored with Datadog. A few examples:
- Cloud instance, virtual machine
- Bare-metal node
- Platform- or service-specific nodes, such as a Kubernetes node
Basically, any device or location that has the Datadog Agent installed and running.
Remember: consistent tagging (env, service, version) across metrics, traces, and logs is what makes Datadog powerful. Without consistent tags, correlation is impossible.
3. What dashboard types does Datadog offer and when would you use each?
Show answer
Timeboards: all widgets share the same time scope, useful for troubleshooting and correlation (synchronized zoom/pan). Screenboards: free-form layout with widgets at independent time scopes, useful for status boards and executive views. Widgets include: timeseries, query value, top list, heatmap, distribution, log stream, service map, SLO summary. Template variables allow filtering dashboards by tag values.
Example: a good service dashboard: request rate, error rate, latency (p50/p95/p99), saturation (CPU/memory), and deployment markers. The RED method in one view.
4. How do Datadog integrations work and name five common ones.
Show answer
Integrations connect Datadog to external services and technologies. Types: 1) Agent-based: the agent runs a check (e.g., postgres, nginx, redis checks); 2) API/webhook-based: services push data to Datadog (e.g., AWS, GCP, PagerDuty); 3) Library-based: instrumentation SDKs (e.g., ddtrace for APM). Common integrations: AWS (CloudWatch metrics), Kubernetes (pod/node metrics), PostgreSQL, Nginx, Redis. Each integration provides out-of-the-box dashboards and monitors.
Remember: consistent tagging (env, service, version) across metrics, traces, and logs is what makes Datadog powerful. Without consistent tags, correlation is impossible.
5. What are Datadog tagging best practices?
Show answer
Tags are key:value pairs that enable filtering, grouping, and aggregation. Best practices: 1) Use consistent naming (env:production, service:api, team:platform); 2) Tag at the source (agent config, cloud provider tags auto-imported); 3) Keep cardinality reasonable (avoid per-request or per-user tags on metrics); 4) Use reserved tags: env, service, version (Unified Service Tagging) for correlation across metrics, traces, and logs; 5) Document your tagging convention.
Remember: tags are key:value pairs (env:production, service:web-api). They enable filtering, grouping, and aggregation across all Datadog data. Tag everything consistently.
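Tagging at the source can be sketched as a Kubernetes container spec fragment using the standard Unified Service Tagging environment variables (the values here are illustrative):

```yaml
# Pod spec fragment (illustrative) — Unified Service Tagging via DD_* env vars.
# The tracing library and agent pick these up and apply them everywhere.
env:
  - name: DD_ENV
    value: "production"
  - name: DD_SERVICE
    value: "web-api"
  - name: DD_VERSION
    value: "1.4.2"
```

With these three reserved tags set once, metrics, traces, and logs from the same deployment correlate automatically.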
🟡 Medium (11)¶
1. Describe at least three use cases for using something like Datadog. Can be as specific as you would like
Show answer
* Monitoring instance/server downtime
* Detecting anomalies and alerting when they occur
* Tracking service request or response latency
Remember: Datadog's three pillars: metrics (numeric time-series), traces (request flows across services), and logs (text events). Correlate all three for fast incident resolution.
Gotcha: Datadog pricing is per host, per million log events, and per indexed span. Monitor your Datadog usage to avoid surprise bills — observability tools can be expensive at scale.
2. What can you tell about Datadog integrations?
Show answer
- Datadog has many supported integrations with different services, platforms, etc.
- Each integration includes information on how to apply it, how to use it, and what configuration options it supports
Remember: consistent tagging (env, service, version) across metrics, traces, and logs is what makes Datadog powerful. Without consistent tags, correlation is impossible.
3. What are the components of a Datadog agent?
Show answer
* Collector: collects data from the host on which the agent is installed; the default collection interval is 15 seconds.
* Forwarder: responsible for sending the data to Datadog over HTTPS
Remember: Datadog Agent collects metrics, traces, and logs from the host. Runs as a service (systemd) or DaemonSet in Kubernetes. Configurable via datadog.yaml.
4. When opening some of the integrations windows/pages, there is a section called "Monitors". What can be found there?
Show answer
Usually it lists recommended monitors: conditions and anomaly types that Datadog suggests monitoring and tracking for that integration.
Remember: Datadog's three pillars: metrics (numeric time-series), traces (request flows across services), and logs (text events). Correlate all three for fast incident resolution.
Gotcha: Datadog pricing is per host, per million log events, and per indexed span. Monitor your Datadog usage to avoid surprise bills — observability tools can be expensive at scale.
5. What ways are there to collect or send data to Datadog?
Show answer
* The Datadog Agent installed on the device or location you would like to monitor
* The Datadog API
* Built-in integrations
Remember: consistent tagging (env, service, version) across metrics, traces, and logs is what makes Datadog powerful. Without consistent tags, correlation is impossible.
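As a sketch of the API route, the payload below matches the shape of Datadog's v1 metrics endpoint (`POST https://api.datadoghq.com/api/v1/series` with a `DD-API-KEY` header); the metric name and tag values are made up:

```python
import json
import time

def build_series_payload(metric, value, tags):
    """Build the JSON body for Datadog's v1 metrics API.
    Would be POSTed to /api/v1/series with a DD-API-KEY header."""
    return {
        "series": [{
            "metric": metric,                      # e.g. shop.checkout.latency
            "points": [[int(time.time()), value]], # [unix_ts, value] pairs
            "type": "gauge",
            "tags": tags,
        }]
    }

# Illustrative metric name and tags
payload = build_series_payload("shop.checkout.latency", 0.42,
                               ["env:production", "service:web-api"])
print(json.dumps(payload)[:80])
```

This only constructs the body; actually sending it requires an API key and any HTTP client.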
6. What are Datadog tags?
Show answer
Datadog tags mark data with identifying properties. For example, you might tag production data with "environment: production" while tagging data from a staging or dev environment with "environment: staging".
Remember: tags are key:value pairs (env:production, service:web-api). They enable filtering, grouping, and aggregation across all Datadog data. Tag everything consistently.
7. Describe the Datadog Agent architecture and its main processes.
Show answer
The Datadog Agent runs as a service on hosts and consists of: 1) Core Agent: collects system metrics, handles check scheduling, and manages the event pipeline; 2) Trace Agent (APM): receives traces from instrumented applications and forwards them to Datadog; 3) Process Agent: collects live process and container data; 4) Log Agent: tails log files, listens on TCP/UDP, and ships logs. All communicate with Datadog via HTTPS on port 443.
Remember: Datadog Agent collects metrics, traces, and logs from the host. Runs as a service (systemd) or DaemonSet in Kubernetes. Configurable via datadog.yaml.
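A minimal `datadog.yaml` fragment (illustrative values) showing how the sub-agents above are switched on:

```yaml
# datadog.yaml fragment (illustrative) — one Agent, several sub-processes
api_key: "<YOUR_API_KEY>"
apm_config:
  enabled: true        # Trace Agent, listens on localhost:8126
logs_enabled: true     # Log Agent: tail files, listen on TCP/UDP
process_config:
  process_collection:
    enabled: true      # Process Agent: live process/container data
```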
8. How do you submit custom metrics to Datadog?
Show answer
Methods: 1) DogStatsD: a StatsD-compatible UDP server bundled with the agent (send from app code via client libraries); 2) Agent checks: Python scripts in the agent that collect and submit metrics; 3) Datadog API: POST metrics directly via REST API. Metric types: count, gauge, rate, histogram, distribution. Custom metrics are billed per unique metric name + tag combination, so tag cardinality matters.
Remember: consistent tagging (env, service, version) across metrics, traces, and logs is what makes Datadog powerful. Without consistent tags, correlation is impossible.
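In practice you would use an official DogStatsD client library; the stdlib sketch below just shows the text wire format the agent listens for over UDP on port 8125:

```python
import socket

def dogstatsd_datagram(name, value, metric_type, tags=None, sample_rate=None):
    """Format a DogStatsD datagram: <name>:<value>|<type>[|@<rate>][|#<tags>].
    metric_type: 'c' count, 'g' gauge, 'h' histogram, 'd' distribution."""
    parts = [f"{name}:{value}|{metric_type}"]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)

def send(datagram, host="127.0.0.1", port=8125):
    # UDP is fire-and-forget: this succeeds even if no agent is listening
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(datagram.encode("utf-8"), (host, port))
    sock.close()

msg = dogstatsd_datagram("page.views", 1, "c", tags=["env:production"])
print(msg)  # page.views:1|c|#env:production
```

Because it is UDP, metric submission never blocks or fails the application, which is why DogStatsD is the usual path for high-volume custom metrics.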
9. What types of monitors does Datadog support and how do alerts work?
Show answer
Monitor types: 1) Metric: threshold or anomaly on numeric metrics; 2) Service Check: agent or integration health status; 3) Log: alert on log patterns or counts; 4) APM: latency, error rate, or throughput thresholds; 5) Composite: combine multiple monitors with boolean logic; 6) Forecast: predict future metric values. Alerts flow through notification channels (Slack, PagerDuty, email). Monitors support warn/alert thresholds, recovery conditions, and no-data handling.
Remember: good alerting follows the RED method for services (Rate, Errors, Duration) and USE method for resources (Utilization, Saturation, Errors).
Remember: Datadog monitors are alert rules. Types: metric, anomaly, forecast, outlier, log, APM, composite. Each can notify Slack, PagerDuty, email, webhooks.
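A metric monitor can be sketched as the JSON body that would be sent to the monitors API; the name, thresholds, and notification handles here are illustrative:

```python
# Illustrative metric-monitor definition, shaped like the body POSTed to
# Datadog's /api/v1/monitor endpoint (names and thresholds are made up).
monitor = {
    "name": "High CPU on production hosts",
    "type": "metric alert",
    # Query grammar: <agg>(<window>):<metric query> <comparator> <threshold>
    "query": "avg(last_5m):avg:system.cpu.user{env:production} by {host} > 90",
    "message": "CPU above 90% for 5 minutes. @slack-ops @pagerduty",
    "options": {
        "thresholds": {"critical": 90, "warning": 80},
        "notify_no_data": True,
        "no_data_timeframe": 10,  # minutes of silence before a no-data alert
    },
}
print(monitor["query"])
```

Note how warn/alert thresholds, no-data handling, and notification handles (in the message) all live in one definition.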
10. How do you define and track SLOs in Datadog?
Show answer
Datadog SLOs track service reliability against targets. Types: 1) Metric-based: percentage of time a metric meets a threshold (e.g., latency p99 < 500ms); 2) Monitor-based: percentage of time a monitor is in OK state. Configure a target (e.g., 99.9%), time window (7d, 30d, 90d), and error budget. SLO widgets on dashboards show remaining error budget. Alerts can fire when error budget is being consumed too fast.
Remember: consistent tagging (env, service, version) across metrics, traces, and logs is what makes Datadog powerful. Without consistent tags, correlation is impossible.
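The error-budget arithmetic behind an SLO target is simple enough to sketch directly; a 99.9% target over 30 days allows roughly 43 minutes of downtime:

```python
def error_budget_minutes(target_pct, window_days):
    """Allowed downtime (in minutes) for an SLO target over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - target_pct / 100)

budget = error_budget_minutes(99.9, 30)
print(round(budget, 1))  # 43.2 minutes over a 30-day window
```

Burn-rate alerts fire when this budget is being consumed faster than the window allows, rather than waiting for the budget to be exhausted.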
11. What is Datadog Synthetic Monitoring?
Show answer
Synthetic monitoring runs automated tests against your services from global locations. Types: 1) API tests: HTTP, SSL, DNS, TCP, gRPC checks to verify availability and response correctness; 2) Browser tests: record and replay user journeys in a headless browser to catch UI regressions; 3) Multistep API tests: chain multiple API calls. Tests run on a schedule and alert on failures. They provide uptime tracking and help catch issues before users do.
Remember: Datadog = SaaS monitoring platform. Metrics, traces, logs, and security in one pane. Agent-based collection, cloud-native integrations.
🔴 Hard (2)¶
1. How does Datadog APM work and what is a trace vs a span?
Show answer
A trace represents a single request flowing through a distributed system. Each trace is composed of spans, where each span represents a unit of work (e.g., an HTTP handler, a database query, an external call). Spans have start time, duration, tags, and parent-child relationships. The tracing library instruments your code, sends spans to the Trace Agent (localhost:8126), which forwards them to Datadog. Traces enable latency analysis, error tracking, and service dependency mapping.
Remember: Datadog APM instruments your code to trace requests across services. Shows latency, error rates, and dependency maps. Supports auto-instrumentation for Python, Java, Go, etc.
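The trace/span relationship can be sketched with a toy data model (this is not the ddtrace API, just an illustration of the fields a span carries and how parent links form the trace tree):

```python
import time
import itertools

_ids = itertools.count(1)

class Span:
    """Toy span: one unit of work with a parent link, tags, and a duration."""
    def __init__(self, name, trace_id, parent_id=None, tags=None):
        self.span_id = next(_ids)
        self.trace_id = trace_id    # shared by every span in one request
        self.parent_id = parent_id  # parent-child links form the trace tree
        self.name = name
        self.tags = tags or {}
        self.start = time.time()
        self.duration = None

    def finish(self):
        self.duration = time.time() - self.start

# One request: an HTTP handler span with a child database-query span
root = Span("web.request", trace_id=1, tags={"service": "web-api"})
db = Span("postgres.query", trace_id=1, parent_id=root.span_id)
db.finish()
root.finish()
print(db.parent_id == root.span_id, db.trace_id == root.trace_id)  # True True
```

In real instrumentation the tracing library manages IDs and context propagation for you and ships finished spans to the Trace Agent.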
2. What are Datadog log pipelines and how do they process logs?
Show answer
Log pipelines are ordered sets of processors that parse and enrich raw log data. Key processors: 1) Grok Parser: extract structured fields from unstructured log lines; 2) Date Remapper: set the official log timestamp; 3) Status Remapper: set log severity; 4) Attribute Remapper: rename or transform fields; 5) Category Processor: classify logs by rules. Pipelines are matched to logs by filter queries. They transform raw text into structured, searchable, queryable data.
Remember: Datadog Log Management: collect -> parse (pipelines) -> index (include/exclude) -> analyze. Use facets and saved views for fast investigation. Archive to S3 for long-term retention.
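A Grok parser is essentially named-pattern extraction; the regex sketch below mimics parsing an nginx-style access log line plus a status remap (the log format and field names are illustrative):

```python
import re

# Sample unstructured access-log line (illustrative format)
LINE = '10.0.0.5 - - [07/Mar/2024:12:00:01 +0000] "GET /health HTTP/1.1" 200 17'

# Named groups play the role of Grok patterns: each becomes a log attribute
PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

attrs = PATTERN.match(LINE).groupdict()
# Status-remapper equivalent: derive severity from the HTTP status code
attrs["severity"] = "error" if attrs["status"].startswith("5") else "info"
print(attrs["method"], attrs["status"], attrs["severity"])  # GET 200 info
```

The real pipeline would also apply a Date Remapper to `timestamp` so searches use the log's own time rather than ingestion time.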