Tempo

10 cards: 🟢 3 easy | 🟡 4 medium | 🔴 3 hard

🟢 Easy (3)

1. What question do distributed traces answer that metrics and logs cannot?

Where in the system is a problem happening? Traces show the request flow across services, revealing which service or hop introduced latency or errors.

Remember: "Tempo = Grafana's tracing backend." It stores traces in object storage (S3/GCS), making it cost-effective at scale.

See also: Tempo pairs with Grafana for visualization, Loki for logs, and Mimir for metrics.

2. What is a span in distributed tracing?

A span represents a single unit of work within a trace, with a name, start time, duration, and parent span reference. A trace is a tree of spans showing the full request lifecycle.

Remember: "Trace = tree of spans." Every span except the root has a parent, and each one carries a start time, duration, and metadata. The root span represents the entire request.
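A minimal sketch of the "tree of spans" idea, assuming a simplified span record. The `Span` class, its field names, and the sample data are hypothetical and do not follow any particular tracing library's data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span record for illustration only.
    span_id: str
    name: str
    start_ms: int
    duration_ms: int
    parent_id: Optional[str] = None  # None marks the root span
    attributes: dict = field(default_factory=dict)


def build_tree(spans):
    """Group spans by parent ID; the key None holds the root span."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    return children


def render(children, parent_id=None, depth=0):
    """Return indented lines showing the span tree, ordered by start time."""
    lines = []
    for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


spans = [
    Span("a", "GET /checkout", 0, 120),
    Span("b", "auth.verify", 5, 20, parent_id="a"),
    Span("c", "db.query", 30, 80, parent_id="a"),
]
tree_lines = render(build_tree(spans))
print("\n".join(tree_lines))
```

The root span covers the whole request, and the indented children show where the time inside it was spent.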

3. What is a trace ID and how is it used?

A trace ID is a globally unique identifier that ties all spans of a single request together across services. It is propagated in HTTP headers (e.g., traceparent) so each service can attach its spans to the same trace.
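A sketch of how a service might read and forward the `traceparent` header, assuming the W3C Trace Context version `00` layout (`version-traceid-parentid-flags`). The helper names `parse_traceparent` and `outgoing_traceparent` are hypothetical; in practice an instrumentation library such as OpenTelemetry does this for you.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace or parent IDs are invalid.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def outgoing_traceparent(trace_id, sampled):
    """Build the header for a downstream call: same trace ID, fresh span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}", span_id


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
trace_id, parent_id, sampled = parse_traceparent(incoming)
header, new_span_id = outgoing_traceparent(trace_id, sampled)
print(header)
```

The trace ID stays constant across hops while each service contributes a new span ID, which is what lets the backend stitch the spans into one trace.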

🟡 Medium (4)

1. How does Tempo differ from Jaeger in its storage approach?

Tempo stores traces in object storage (S3, GCS) without requiring a separate indexing database, making it cheaper and simpler to operate at scale. Jaeger typically requires Elasticsearch or Cassandra for indexing.

Remember: "Trace ID = correlation key." Propagate it in HTTP headers (traceparent in W3C format) so all services link their spans to the same trace.

Gotcha: If any service drops the trace header, the services downstream of it start a new trace, so the original trace appears broken or incomplete.

2. What is trace sampling and why is it necessary?

Sampling means collecting only a fraction of traces (e.g., 1% or 10%). It is necessary because capturing every trace at high throughput generates enormous volumes of data. Head-based sampling decides when the trace starts, before its outcome is known; tail-based sampling decides after the trace completes, so it can keep the interesting traces, such as those with errors or high latency.
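A minimal sketch of head-based sampling, assuming the decision is derived from the trace ID so that every service reaches the same verdict for the same trace. The function name `head_sample` and the use of the low 64 bits are illustrative choices, not the exact algorithm of any specific SDK.

```python
import random


def head_sample(trace_id_hex, rate):
    """Decide at trace start whether to keep a trace, using only its ID."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Treat the low 64 bits of the trace ID as a uniform number in [0, 2**64).
    bucket = int(trace_id_hex, 16) & (2**64 - 1)
    return bucket < rate * 2**64


# Simulate 10,000 random 128-bit trace IDs and keep roughly 10% of them.
rng = random.Random(42)
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
kept = sum(head_sample(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, it is cheap and consistent, but it cannot know whether the trace will later turn out to be slow or failing.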

3. What is TraceQL and what does it enable?

TraceQL is Tempo's query language for searching traces by span attributes, duration, status, and resource fields. It allows queries like { span.http.status_code >= 500 && duration > 2s } to find slow, erroring requests.
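A sketch of how the example query could be sent to Tempo over HTTP. It assumes a Tempo instance reachable at `http://localhost:3200` and a search endpoint at `/api/search` that accepts the TraceQL expression in a `q` parameter plus a `limit`; check both against the HTTP API documentation for your Tempo version. The helper `traceql_search_url` is hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def traceql_search_url(base_url, query, limit=20):
    """Build a search URL carrying a URL-encoded TraceQL query."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


QUERY = "{ span.http.status_code >= 500 && duration > 2s }"
# Assumed address of a local Tempo instance; adjust for your deployment.
url = traceql_search_url("http://localhost:3200", QUERY)
print(url)
```

URL-encoding matters here because the query contains braces, spaces, and `&&`, which would otherwise be misread as parameter separators.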

4. How does Tempo integrate with Grafana and Loki for end-to-end observability?

Grafana links metrics, logs, and traces: a Prometheus alert can link to Loki logs via labels, and Loki log lines containing trace IDs become clickable links to Tempo traces, enabling drill-down from symptom to root cause.

🔴 Hard (3)

1. What are the main components in Tempo's architecture?

Distributor (receives spans from instrumented apps), Ingester (batches and writes to backend), Compactor (merges blocks in object storage), Querier (reads traces from storage), and Query Frontend (caches and splits queries).
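A toy, in-memory model of how these components hand data to each other. Everything here is a simplification for illustration: the class and function names are hypothetical, a plain list stands in for object storage, a hash-modulo stands in for Tempo's real sharding, and the Query Frontend is omitted.

```python
import hashlib
from collections import defaultdict


class Ingester:
    def __init__(self, store, flush_after=2):
        self.store = store             # list of blocks standing in for object storage
        self.flush_after = flush_after
        self.live = defaultdict(list)  # trace_id -> spans not yet flushed

    def push(self, trace_id, span):
        self.live[trace_id].append(span)
        if len(self.live) >= self.flush_after:
            self.flush()

    def flush(self):
        """Write all buffered traces to the store as one block."""
        if self.live:
            self.store.append(dict(self.live))  # one block = {trace_id: [spans]}
            self.live.clear()


class Distributor:
    def __init__(self, ingesters):
        self.ingesters = ingesters

    def receive(self, trace_id, span):
        # Hash the trace ID so every span of a trace reaches the same ingester.
        digest = hashlib.sha256(trace_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.ingesters)
        self.ingesters[index].push(trace_id, span)


def compact(store):
    """Compactor role: merge all blocks into one, concatenating spans per trace."""
    merged = defaultdict(list)
    for block in store:
        for trace_id, spans in block.items():
            merged[trace_id].extend(spans)
    store[:] = [dict(merged)]


def query(trace_id, ingesters, store):
    """Querier role: collect spans from flushed blocks and live ingester data."""
    spans = []
    for block in store:
        spans.extend(block.get(trace_id, []))
    for ingester in ingesters:
        spans.extend(ingester.live.get(trace_id, []))
    return spans


store = []
ingesters = [Ingester(store), Ingester(store)]
distributor = Distributor(ingesters)
incoming = [("t1", "GET /cart"), ("t2", "GET /home"), ("t1", "db.query"), ("t3", "POST /pay")]
for trace_id, span in incoming:
    distributor.receive(trace_id, span)
for ingester in ingesters:
    ingester.flush()
compact(store)
print(query("t1", ingesters, store))
```

The point of the split is that the write path (distributor, ingester) and the read path (querier) only meet at the storage layer, which the compactor tidies in the background.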

Remember: "Sampling reduces volume, not visibility." Head sampling decides at trace start; tail sampling decides after trace completes (keeps errors/slow traces).

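Gotcha: Head sampling may drop interesting traces because it decides before the outcome is known. Tail sampling avoids that but is more complex to run, since all spans of a trace must be buffered until the decision is made.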

2. What is the advantage of tail-based sampling over head-based sampling?

Tail-based sampling makes the keep/drop decision after the trace is complete, so it can retain traces with errors, high latency, or other interesting attributes. Head-based sampling decides at the start and may discard important traces by chance.
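A minimal sketch of a tail-based decision, assuming all spans of each trace have already been collected. The span dictionaries, the `tail_sample` function, and the 1000 ms threshold are hypothetical choices for illustration.

```python
from collections import defaultdict


def tail_sample(spans, latency_threshold_ms=1000):
    """Return the IDs of traces worth keeping, decided after all spans arrived."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    kept = []
    for trace_id, trace_spans in by_trace.items():
        has_error = any(s["status"] == "error" for s in trace_spans)
        start = min(s["start_ms"] for s in trace_spans)
        end = max(s["start_ms"] + s["duration_ms"] for s in trace_spans)
        # Keep the trace if any span failed or the whole request was slow.
        if has_error or end - start > latency_threshold_ms:
            kept.append(trace_id)
    return kept


collected = [
    {"trace_id": "t1", "status": "ok", "start_ms": 0, "duration_ms": 80},
    {"trace_id": "t2", "status": "ok", "start_ms": 0, "duration_ms": 40},
    {"trace_id": "t2", "status": "error", "start_ms": 10, "duration_ms": 20},
    {"trace_id": "t3", "status": "ok", "start_ms": 0, "duration_ms": 2500},
]
kept_traces = tail_sample(collected)
print(kept_traces)
```

The fast, successful trace `t1` is dropped, while the failing `t2` and the slow `t3` are kept, which a head-based decision could not guarantee.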

3. How do exemplars bridge the gap between metrics and traces?

Exemplars attach a trace ID to a specific metric data point (e.g., a histogram bucket observation). In Grafana, clicking an exemplar on a latency graph jumps directly to the Tempo trace for that request, connecting aggregate metrics to individual request detail.
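A sketch of what an exemplar looks like on the wire, assuming the OpenMetrics text format, where the exemplar follows a `#` after the sample value. The helper names and the metric name are hypothetical; a metrics client library would normally produce this line for you.

```python
import re


def bucket_line_with_exemplar(metric, le, count, trace_id, observed_value):
    """Format one histogram bucket sample with an exemplar, OpenMetrics text style."""
    return (
        f'{metric}_bucket{{le="{le}"}} {count} '
        f'# {{trace_id="{trace_id}"}} {observed_value}'
    )


def exemplar_trace_id(line):
    """Pull the trace ID back out of a sample line, or return None if absent."""
    match = re.search(r'# \{trace_id="([^"]+)"\}', line)
    return match.group(1) if match else None


line = bucket_line_with_exemplar(
    "http_request_duration_seconds",
    "0.5",
    17,
    "4bf92f3577b34da6a3ce929d0e0e4736",
    0.43,
)
print(line)
print(exemplar_trace_id(line))
```

The bucket count is still an aggregate, but the exemplar pins one concrete request (0.43 s here) to it, and its trace ID is the link into the tracing backend.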