Certification Prep: PCA — Prometheus Certified Associate

Metadata

| Field | Value |
| --- | --- |
| Issuer | CNCF (Cloud Native Computing Foundation) |
| Exam Code | PCA |
| Format | Multiple-choice |
| Duration | 90 minutes |
| Passing Score | 75% |
| Cost | $250 USD |
| Retake Policy | One free retake included |
| Prometheus Version | Current stable (check CNCF site) |
| Wiki Coverage | ~85% |

Exam Domains & Wiki Mapping

Observability Concepts (18%)

| Objective | Topic Pack | Coverage |
| --- | --- | --- |
| Explain the three pillars of observability (metrics, logs, traces) | monitoring-fundamentals, observability-deep-dive | ✅ Full |
| Understand the difference between monitoring and observability | monitoring-fundamentals, observability-deep-dive | ✅ Full |
| Describe metrics types (counter, gauge, histogram, summary) | prometheus-deep-dive | ✅ Full |
| Explain the pull-based model vs push-based model | monitoring-fundamentals, prometheus-deep-dive | ✅ Full |
| Understand SLIs, SLOs, and error budgets | postmortem-slo, slo-tooling | ✅ Full |
| Describe the role of metrics in incident response | incident-triage, monitoring-fundamentals | ✅ Full |

Prometheus Fundamentals (20%)

| Objective | Topic Pack | Coverage |
| --- | --- | --- |
| Understand Prometheus architecture (server, TSDB, scrape, rules, alerting) | prometheus-deep-dive | ✅ Full |
| Configure Prometheus via prometheus.yml | prometheus-deep-dive | ✅ Full |
| Understand scrape configuration (targets, intervals, relabeling) | prometheus-deep-dive | ✅ Full |
| Describe service discovery mechanisms (static, file, Kubernetes, DNS, Consul) | prometheus-deep-dive | ✅ Full |
| Understand the Prometheus data model (metric name, labels, timestamp, value) | prometheus-deep-dive | ✅ Full |
| Explain storage and retention (local TSDB, remote write, remote read) | prometheus-deep-dive | ✅ Full |
| Understand Prometheus high availability (federation, Thanos, Cortex) | prometheus-deep-dive | ⚠️ Partial |
| Configure recording rules for performance optimization | prometheus-deep-dive | ✅ Full |

PromQL (28%)

| Objective | Topic Pack | Coverage |
| --- | --- | --- |
| Write basic PromQL queries (instant vectors, range vectors) | prometheus-deep-dive | ✅ Full |
| Use label matchers (=, !=, =~, !~) | prometheus-deep-dive | ✅ Full |
| Apply aggregation operators (sum, avg, min, max, count, topk, bottomk) | prometheus-deep-dive | ✅ Full |
| Use by and without clauses for grouping | prometheus-deep-dive | ✅ Full |
| Apply rate functions (rate, irate, increase) | prometheus-deep-dive | ✅ Full |
| Use histogram functions (histogram_quantile) | prometheus-deep-dive | ✅ Full |
| Understand offset modifier and subquery syntax | prometheus-deep-dive | ⚠️ Partial |
| Apply binary operators and vector matching (on, ignoring, group_left, group_right) | prometheus-deep-dive | ⚠️ Partial |
| Use predict_linear, deriv, delta, idelta for trend analysis | prometheus-deep-dive | ⚠️ Partial |
| Write queries for common patterns (error rate, latency percentiles, saturation) | prometheus-deep-dive, slo-tooling | ✅ Full |

Instrumentation and Exporters (16%)

| Objective | Topic Pack | Coverage |
| --- | --- | --- |
| Describe client library instrumentation (Go, Python, Java) | prometheus-deep-dive | ⚠️ Partial |
| Understand the difference between direct and exporter-based instrumentation | prometheus-deep-dive | ✅ Full |
| Use common exporters (node_exporter, blackbox_exporter, mysqld_exporter) | prometheus-deep-dive, monitoring-fundamentals | ✅ Full |
| Understand the exposition format (OpenMetrics, Prometheus text format) | prometheus-deep-dive | ⚠️ Partial |
| Apply naming conventions for metrics (_total, _seconds, _bytes, _info) | prometheus-deep-dive | ✅ Full |
| Understand pushgateway use cases and limitations | prometheus-deep-dive | ✅ Full |
| Describe Kubernetes metrics sources (kube-state-metrics, metrics-server, cAdvisor) | prometheus-deep-dive, k8s-ops (HPA) | ✅ Full |

Dashboarding and Visualization (8%)

| Objective | Topic Pack | Coverage |
| --- | --- | --- |
| Use Grafana to visualize Prometheus metrics | grafana | ✅ Full |
| Create dashboards with panels, variables, and time ranges | grafana | ✅ Full |
| Use Prometheus expression browser for ad-hoc queries | prometheus-deep-dive | ✅ Full |
| Understand dashboard best practices (USE method, RED method) | monitoring-fundamentals, observability-deep-dive | ✅ Full |
| Configure Grafana data sources for Prometheus | grafana | ✅ Full |

Alerting and Alertmanager (10%)

| Objective | Topic Pack | Coverage |
| --- | --- | --- |
| Write alerting rules in Prometheus | alerting-rules, prometheus-deep-dive | ✅ Full |
| Configure Alertmanager (routing, receivers, grouping, inhibition, silences) | alerting-rules | ✅ Full |
| Understand alert routing trees and match logic | alerting-rules | ✅ Full |
| Configure notification channels (email, Slack, PagerDuty, webhook) | alerting-rules | ⚠️ Partial |
| Use for duration in alerting rules to avoid flapping | alerting-rules | ✅ Full |
| Describe Alertmanager high availability (clustering) | alerting-rules | ⚠️ Partial |
| Understand alert lifecycle (pending, firing, resolved) | alerting-rules, incident-triage | ✅ Full |

Study Plan

Phase 1: Foundations (Weeks 1–2)

Goal: Solid understanding of observability concepts, Prometheus architecture, and basic PromQL.

  • Week 1: Observability and Prometheus fundamentals
  • Read: monitoring-fundamentals — monitoring vs observability, metrics types
  • Read: observability-deep-dive — three pillars, USE/RED methods
  • Read: prometheus-deep-dive — architecture, scraping, data model, TSDB
  • Read: postmortem-slo — SLIs, SLOs, error budgets
  • Practice: Install Prometheus, scrape node_exporter, explore the expression browser
  • Practice: Configure static targets and file-based service discovery
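For the file-based service discovery practice item, a minimal sketch of generating a target file may help. Prometheus file_sd reads JSON (or YAML) files containing a list of target groups, each with `targets` and optional `labels`. The helper name `write_file_sd`, the `env` label, and the output path are illustrative assumptions; `localhost:9100` is the default node_exporter address.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """Write target groups as file_sd JSON, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it into place, so a
    # reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_path, path)


# One target group: a local node_exporter with a hypothetical "env" label.
groups = [
    {"targets": ["localhost:9100"], "labels": {"env": "lab"}},
]
```

Calling `write_file_sd("targets/node.json", groups)` (the directory must already exist) produces a file that a scrape job can reference through `file_sd_configs` with a matching `files:` pattern. Prometheus picks up changes to such files without a restart.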

  • Week 2: PromQL foundations
  • Read: prometheus-deep-dive — PromQL section in depth
  • Practice: Write queries for instant vectors and range vectors
  • Practice: Use rate() on counter metrics, and understand why rate() is usually preferred over increase() in alerting rules
  • Practice: Aggregation with sum by(), avg by(), topk()
  • Practice: Calculate error rates: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
  • Practice: Calculate latency percentiles: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
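To make the latency percentile query less of a black box, here is a simplified sketch of how histogram_quantile estimates a quantile from classic cumulative buckets: it finds the bucket containing the target rank and interpolates linearly inside it. The function and sample bucket values are illustrative assumptions, and the sketch only handles 0 < q <= 1; it is not the Prometheus implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (le, cumulative_count) pairs.

    Simplified sketch: assumes 0 < q <= 1 and cumulative, non-decreasing counts.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    # A usable histogram needs at least two buckets, ending in le="+Inf".
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        # The quantile falls above the highest finite bucket boundary.
        return buckets[-2][0]
    lower = buckets[index - 1][0] if index > 0 else 0.0
    below = buckets[index - 1][1] if index > 0 else 0.0
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical per-second bucket rates for one service, keyed by le.
sample_buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
p95 = histogram_quantile(0.95, sample_buckets)
p99 = histogram_quantile(0.99, sample_buckets)
```

With these sample buckets the 95th percentile lands inside the 0.5 to 1.0 bucket, while the 99th percentile falls into the +Inf bucket and is reported as the highest finite boundary. That is why bucket boundaries need to cover the latencies you care about.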

Phase 2: Deep Dive (Weeks 3–4)

Goal: Advanced PromQL, instrumentation, alerting, and dashboarding.

  • Week 3: Advanced PromQL and instrumentation
  • Read: prometheus-deep-dive — binary operators, vector matching, subqueries
  • Practice: Use on(), ignoring(), group_left() for joining metrics
  • Practice: Use predict_linear() for capacity planning queries
  • Practice: Recording rules — pre-compute expensive queries
  • Study: Client library instrumentation patterns (counters for events, gauges for current state, histograms for latency)
  • Study: Metric naming conventions (_total for counters, _seconds for duration, _bytes for size)
  • Study: OpenMetrics format and exposition format details
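As a study aid for the exposition format, the sketch below renders one metric family in the Prometheus text format by hand: a `# HELP` line, a `# TYPE` line, then one sample per label set. The function names and sample values are illustrative assumptions; in practice a client library produces this output for you.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family; samples is a list of (labels, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render_metric(
    "http_requests_total",
    "counter",
    "Total HTTP requests.",
    [({"method": "get", "code": "200"}, 1027)],
)
```

The OpenMetrics format is closely related but stricter; one visible difference is that an OpenMetrics exposition must end with a `# EOF` line.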

  • Week 4: Alerting, Alertmanager, and Grafana
  • Read: alerting-rules — writing alert rules, for duration, labels, annotations
  • Read: grafana — dashboards, panels, variables, data sources
  • Practice: Write alerting rules for high error rate, high latency, disk almost full
  • Practice: Configure Alertmanager: routing tree, grouping, inhibition, silences
  • Practice: Build a Grafana dashboard with USE method panels (utilization, saturation, errors)
  • Practice: Use Grafana template variables for multi-service dashboards
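For the routing tree practice, it can help to trace alerts through a toy model. The sketch below mimics the Alertmanager matching rules: an alert descends into the first matching child route, stops at that sibling unless `continue` is set, and falls back to the current node when no child matches. It is a simplification that supports only equality matchers through a hypothetical `match` dictionary; the receiver names are made up.

```python
def route_matches(route, labels):
    """True if every equality matcher on the route matches the alert labels."""
    return all(labels.get(key) == val for key, val in route.get("match", {}).items())


def find_receivers(route, labels, inherited=None):
    """Return the receivers an alert reaches, walking the tree depth-first."""
    # A child without its own receiver inherits the parent's.
    receiver = route.get("receiver", inherited)
    matched = []
    for child in route.get("routes", []):
        if route_matches(child, labels):
            matched.extend(find_receivers(child, labels, receiver))
            # Stop at the first matching sibling unless it sets continue.
            if not child.get("continue", False):
                break
    # If no child matched, the current node handles the alert.
    return matched or [receiver]


tree = {
    "receiver": "default",
    "routes": [
        {
            "match": {"severity": "critical"},
            "receiver": "pagerduty",
            "routes": [{"match": {"team": "db"}, "receiver": "db-oncall"}],
        },
        {"match": {"team": "frontend"}, "receiver": "frontend-slack"},
    ],
}
```

Note how a critical frontend alert goes to `pagerduty` and never reaches `frontend-slack`: the first matching sibling wins because it does not set `continue`.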

Phase 3: Exam Simulation (Week 5)

Goal: Timed practice and gap remediation.

  • Take practice exams (PromQL exercises from promlabs.com are excellent)
  • Focus on PromQL (28% of exam) — this is the highest-weighted domain
  • Drill: given a metric name and labels, write the correct PromQL query
  • Drill: given a PromQL query, predict the output
  • Review: vector matching rules (on, ignoring, group_left, group_right)
  • Review: counter vs gauge vs histogram vs summary — when to use each
  • Review: Alertmanager routing tree — trace an alert through the config to the receiver
  • Study the wiki gaps: Thanos/Cortex HA patterns, OpenMetrics format, client library details
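When reviewing vector matching, a small model of a many-to-one join can make `on()` and `group_left()` more concrete. The sketch below imitates `many * on(instance) group_left(version) one`, treating each instant vector as a list of (labels, value) pairs. The function name and sample series are illustrative assumptions, and real PromQL handles more cases (for example `ignoring()` and comparison operators).

```python
def join_group_left(many, one, on, include):
    """Sketch of `many * on(<on>) group_left(<include>) one`."""
    index = {}
    for labels, value in one:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            # PromQL also rejects duplicate matches on the "one" side.
            raise ValueError("duplicate series on the 'one' side")
        index[key] = (labels, value)

    result = []
    for labels, value in many:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # series without a match are dropped from the result
        one_labels, one_value = index[key]
        out = dict(labels)  # result keeps the labels of the "many" side
        for name in include:
            if name in one_labels:
                out[name] = one_labels[name]  # copied in by group_left(...)
        result.append((out, value * one_value))
    return result


# Hypothetical request rates per path, joined with a build-info style series.
request_rates = [
    ({"instance": "a", "path": "/"}, 3.0),
    ({"instance": "a", "path": "/api"}, 5.0),
    ({"instance": "b", "path": "/"}, 2.0),
]
build_info = [({"instance": "a", "version": "1.2"}, 1.0)]
joined = join_group_left(request_rates, build_info, on=["instance"], include=["version"])
```

Both series for instance `a` gain a `version` label, while instance `b` disappears because nothing on the right-hand side matches it.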

Gap Analysis

| Gap | Exam Weight | Recommended External Resource |
| --- | --- | --- |
| Advanced vector matching (group_left, group_right in depth) | Medium (within 28%) | PromLabs blog: vector matching explained |
| Prometheus HA (Thanos, Cortex architecture) | Low (within 20%) | Thanos documentation, Cortex architecture docs |
| Client library instrumentation (Go, Python, Java examples) | Medium (within 16%) | Prometheus client_golang, client_python documentation |
| OpenMetrics exposition format vs Prometheus text format | Low (within 16%) | OpenMetrics specification on GitHub |
| predict_linear, deriv, delta functions in depth | Low (within 28%) | Prometheus documentation: functions reference |
| Alertmanager clustering and HA | Low (within 10%) | Alertmanager HA documentation |
| Notification integrations (webhook configuration details) | Low (within 10%) | Alertmanager configuration reference |

Exam-Day Strategy

Time Management

  • ~60 questions in 90 minutes = ~1.5 min per question
  • PromQL questions may take longer — budget 2 min for those
  • Flag conceptual questions you can revisit quickly at the end
  • Don't get bogged down on obscure function syntax — flag and move on
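The budget above can be checked with a little arithmetic. This sketch assumes the number of questions per domain roughly tracks the published weights, which the exam does not guarantee; the constants come from the figures in this guide.

```python
TOTAL_QUESTIONS = 60
TOTAL_MINUTES = 90
PROMQL_WEIGHT = 0.28          # PromQL domain weight from the exam outline
PROMQL_MINUTES_EACH = 2.0     # suggested budget for PromQL questions

# Assumption: question counts are proportional to domain weights.
promql_questions = round(TOTAL_QUESTIONS * PROMQL_WEIGHT)
promql_minutes = promql_questions * PROMQL_MINUTES_EACH

other_questions = TOTAL_QUESTIONS - promql_questions
other_minutes_each = (TOTAL_MINUTES - promql_minutes) / other_questions

print(f"PromQL: {promql_questions} questions, {promql_minutes:.0f} minutes")
print(f"Other: {other_questions} questions, {other_minutes_each:.2f} minutes each")
```

Under that assumption, spending 2 minutes on each PromQL question leaves a little under the 1.5 minute average for everything else, so the conceptual questions need to go quickly.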

Question Triage

  1. Read the full question and all options
  2. Identify the domain: is this PromQL, architecture, alerting, or instrumentation?
  3. For PromQL questions: mentally trace through the query step by step
  4. For architecture questions: recall the Prometheus component diagram
  5. For alerting questions: think about the routing tree and alert lifecycle

Common Traps

  • rate() vs irate(): rate is per-second average over window, irate uses last two data points only (more volatile)
  • rate() is only meaningful on counters; for gauges use delta() or deriv() instead
  • increase() is syntactic sugar for rate() multiplied by the window in seconds; it extrapolates and can return non-integer values for counters
  • Counter resets: rate(), irate(), and increase() handle counter resets automatically, provided you apply rate() before sum(); summing first hides the resets of individual series
  • histogram_quantile requires a le label (from histogram buckets) — summary quantiles are pre-calculated and cannot be aggregated
  • Histogram vs Summary: histograms are aggregatable (recommended), summaries are not
  • Recording rules use record: not alert: — and they produce new time series, not alerts
  • Alertmanager group_by determines which alerts are batched together — not which alerts fire
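The rate() and irate() traps above are easier to remember after tracing them on a few samples. The sketch below is a deliberately simplified model: it treats any drop in value as a counter reset and divides by the time between the first and last sample, without the window-boundary extrapolation that Prometheus applies. The function names and sample data are illustrative assumptions.

```python
def counter_increase(samples):
    """Total increase over (timestamp, value) samples, treating drops as resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so the whole
        # current value counts as increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average increase over the sampled span (no extrapolation)."""
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span


def simple_irate(samples):
    """Per-second increase using only the last two samples."""
    return simple_rate(samples[-2:])


# Scrapes every 15 seconds; the counter resets between t=30 and t=45.
samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]
naive_delta = samples[-1][1] - samples[0][1]
window_rate = simple_rate(samples)
instant_rate = simple_irate(samples)
```

The naive difference between last and first value is negative because of the reset, while the reset-aware increase is 100 over 60 seconds. The irate-style value only reflects the final 15 seconds, which is why it reacts faster and is noisier.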

PromQL Mental Model

When facing a PromQL question, think in this order:

  1. What metric am I starting with? (counter, gauge, histogram?)
  2. Do I need a rate? (yes for counters, no for gauges)
  3. What time window? (typically [5m] for rate, but exam may specify)
  4. Do I need to aggregate? (sum, avg, max, and by which labels?)
  5. Do I need a quantile? (histogram_quantile for latency)
  6. Does the answer need labels? (check if by or without is needed)

If You're Stuck

  • Counter metrics end in _total by convention; if you see _total, think rate()
  • Duration metrics end in _seconds; if you see _bucket, think histogram_quantile
  • up metric: 1 = the last scrape of the target succeeded, 0 = the scrape failed (target down or unreachable)
  • Eliminate answers that use incorrect function/metric combinations
  • Flag and return — fresh eyes often see the answer immediately

Cross-References