Certification Prep: PCA — Prometheus Certified Associate
Metadata
| Field | Value |
|---|---|
| Issuer | CNCF (Cloud Native Computing Foundation) |
| Exam Code | PCA |
| Format | Multiple-choice |
| Duration | 90 minutes |
| Passing Score | 75% |
| Cost | $250 USD |
| Retake Policy | One free retake included |
| Prometheus Version | Current stable (check CNCF site) |
| Wiki Coverage | ~85% |
Exam Domains & Wiki Mapping
Observability Concepts (18%)
| Objective | Topic Pack | Coverage |
|---|---|---|
| Explain the three pillars of observability (metrics, logs, traces) | monitoring-fundamentals, observability-deep-dive | ✅ Full |
| Understand the difference between monitoring and observability | monitoring-fundamentals, observability-deep-dive | ✅ Full |
| Describe metrics types (counter, gauge, histogram, summary) | prometheus-deep-dive | ✅ Full |
| Explain the pull-based model vs push-based model | monitoring-fundamentals, prometheus-deep-dive | ✅ Full |
| Understand SLIs, SLOs, and error budgets | postmortem-slo, slo-tooling | ✅ Full |
| Describe the role of metrics in incident response | incident-triage, monitoring-fundamentals | ✅ Full |
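
Error budgets come down to simple arithmetic that is worth being able to do quickly. The sketch below assumes a time-based availability SLO, where the budget is the fraction of the window in which the service may be unavailable; the function names are illustrative and not part of any Prometheus tooling.

```python
def error_budget_minutes(slo: float, window_days: float) -> float:
    """Allowed 'bad' minutes for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    # The error budget is whatever the SLO does not promise: (1 - SLO) of the window.
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, window_days: float, bad_minutes: float) -> float:
    """Fraction of the error budget left; negative once the budget is overspent."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999, 30), 1))    # 43.2 minutes per 30 days
    print(round(budget_remaining(0.999, 30, 10.8), 2))  # 0.75 of the budget left
```

Tightening the SLO to 99.99% shrinks the same 30-day budget tenfold, to about 4.3 minutes.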
Prometheus Fundamentals (20%)
| Objective | Topic Pack | Coverage |
|---|---|---|
| Understand Prometheus architecture (server, TSDB, scrape, rules, alerting) | prometheus-deep-dive | ✅ Full |
| Configure Prometheus via `prometheus.yml` | prometheus-deep-dive | ✅ Full |
| Understand scrape configuration (targets, intervals, relabeling) | prometheus-deep-dive | ✅ Full |
| Describe service discovery mechanisms (static, file, Kubernetes, DNS, Consul) | prometheus-deep-dive | ✅ Full |
| Understand the Prometheus data model (metric name, labels, timestamp, value) | prometheus-deep-dive | ✅ Full |
| Explain storage and retention (local TSDB, remote write, remote read) | prometheus-deep-dive | ✅ Full |
| Understand Prometheus high availability (federation, Thanos, Cortex) | prometheus-deep-dive | ⚠️ Partial |
| Configure recording rules for performance optimization | prometheus-deep-dive | ✅ Full |
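
The data-model objective is easier to internalize with a toy model: a series is identified by its metric name plus its complete label set, and every distinct combination of label values is a separate series. The sketch below only illustrates identity and cardinality; `series_id` is a hypothetical helper, and this is not how the TSDB actually stores data.

```python
def series_id(name: str, labels: dict) -> str:
    """Render a series identity in PromQL-style notation with labels sorted.

    Prometheus stores the metric name as the special label __name__, so two
    samples belong to the same series only if the name and all labels match.
    """
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{body}}}"


observed = [
    ("http_requests_total", {"method": "GET", "status": "200"}),
    # Same labels in a different order: still the same series.
    ("http_requests_total", {"status": "200", "method": "GET"}),
    # One new label value: a brand-new series (cardinality grows).
    ("http_requests_total", {"method": "GET", "status": "500"}),
]

unique_series = {series_id(name, labels) for name, labels in observed}

if __name__ == "__main__":
    print(len(unique_series))       # 2
    print(series_id(*observed[0]))  # http_requests_total{method="GET",status="200"}
```

This is why labels with unbounded values (user IDs, request IDs) are a classic cardinality mistake: every new value creates another series.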
PromQL (28%)
| Objective | Topic Pack | Coverage |
|---|---|---|
| Write basic PromQL queries (instant vectors, range vectors) | prometheus-deep-dive | ✅ Full |
| Use label matchers (`=`, `!=`, `=~`, `!~`) | prometheus-deep-dive | ✅ Full |
| Apply aggregation operators (`sum`, `avg`, `min`, `max`, `count`, `topk`, `bottomk`) | prometheus-deep-dive | ✅ Full |
| Use `by` and `without` clauses for grouping | prometheus-deep-dive | ✅ Full |
| Apply rate functions (`rate`, `irate`, `increase`) | prometheus-deep-dive | ✅ Full |
| Use histogram functions (`histogram_quantile`) | prometheus-deep-dive | ✅ Full |
| Understand the `offset` modifier and subquery syntax | prometheus-deep-dive | ⚠️ Partial |
| Apply binary operators and vector matching (`on`, `ignoring`, `group_left`, `group_right`) | prometheus-deep-dive | ⚠️ Partial |
| Use `predict_linear`, `deriv`, `delta`, `idelta` for trend analysis | prometheus-deep-dive | ⚠️ Partial |
| Write queries for common patterns (error rate, latency percentiles, saturation) | prometheus-deep-dive, slo-tooling | ✅ Full |
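
A frequent label-matcher trap is that PromQL regex matchers are fully anchored: `status=~"5.."` matches `503` but not `5030`. The sketch below models the four operators in Python, using `re.fullmatch` as a stand-in for Go's RE2 engine (the two agree on simple patterns like these, not on every feature); `matches` is a hypothetical helper.

```python
import re


def matches(op: str, value: str, pattern: str) -> bool:
    """Evaluate one PromQL-style label matcher against a label value.

    Pass "" for a label the series does not have: Prometheus treats a
    missing label as having the empty value.
    """
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        # Fully anchored, as if the pattern were wrapped in ^(?:...)$.
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op!r}")


if __name__ == "__main__":
    print(matches("=~", "503", "5.."))   # True
    print(matches("=~", "5030", "5.."))  # False: the regex must match the whole value
    print(matches("=~", "503", "5"))     # False: no implicit substring match
    print(matches("=", "", ""))          # True: a missing label equals ""
```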
Instrumentation and Exporters (16%)
| Objective | Topic Pack | Coverage |
|---|---|---|
| Describe client library instrumentation (Go, Python, Java) | prometheus-deep-dive | ⚠️ Partial |
| Understand the difference between direct and exporter-based instrumentation | prometheus-deep-dive | ✅ Full |
| Use common exporters (node_exporter, blackbox_exporter, mysqld_exporter) | prometheus-deep-dive, monitoring-fundamentals | ✅ Full |
| Understand the exposition format (OpenMetrics, Prometheus text format) | prometheus-deep-dive | ⚠️ Partial |
| Apply naming conventions for metrics (`_total`, `_seconds`, `_bytes`, `_info`) | prometheus-deep-dive | ✅ Full |
| Understand pushgateway use cases and limitations | prometheus-deep-dive | ✅ Full |
| Describe Kubernetes metrics sources (kube-state-metrics, metrics-server, cAdvisor) | prometheus-deep-dive, k8s-ops (HPA) | ✅ Full |
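
The Prometheus text exposition format is line-based: `# HELP` and `# TYPE` lines, then one `name{labels} value` line per series, with every line (including the last) ending in a newline. The sketch below renders one counter family and escapes backslashes, double quotes, and newlines in label values. `render_counter` is a hypothetical helper, not a client library API (in practice a client library produces this output), and it assumes the help text contains no characters that need escaping. OpenMetrics differs in details; for example, it terminates the exposition with `# EOF`.

```python
def _escape_label_value(value: str) -> str:
    # Escape backslashes first so the escapes added for quotes/newlines are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list) -> str:
    """Render one counter family; `samples` is a list of (labels_dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        body = ",".join(
            f'{key}="{_escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        series = f"{name}{{{body}}}" if body else name
        lines.append(f"{series} {value}")
    # Every line, including the last one, ends with "\n".
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "http_requests_total",
        "Total HTTP requests.",
        [({"method": "GET", "status": "200"}, 1027), ({"method": "POST", "status": "500"}, 3)],
    ), end="")
```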
Dashboarding and Visualization (8%)
| Objective | Topic Pack | Coverage |
|---|---|---|
| Use Grafana to visualize Prometheus metrics | grafana | ✅ Full |
| Create dashboards with panels, variables, and time ranges | grafana | ✅ Full |
| Use Prometheus expression browser for ad-hoc queries | prometheus-deep-dive | ✅ Full |
| Understand dashboard best practices (USE method, RED method) | monitoring-fundamentals, observability-deep-dive | ✅ Full |
| Configure Grafana data sources for Prometheus | grafana | ✅ Full |
Alerting and Alertmanager (10%)
| Objective | Topic Pack | Coverage |
|---|---|---|
| Write alerting rules in Prometheus | alerting-rules, prometheus-deep-dive | ✅ Full |
| Configure Alertmanager (routing, receivers, grouping, inhibition, silences) | alerting-rules | ✅ Full |
| Understand alert routing trees and match logic | alerting-rules | ✅ Full |
| Configure notification channels (email, Slack, PagerDuty, webhook) | alerting-rules | ⚠️ Partial |
| Use the `for` duration in alerting rules to avoid flapping | alerting-rules | ✅ Full |
| Describe Alertmanager high availability (clustering) | alerting-rules | ⚠️ Partial |
| Understand alert lifecycle (pending, firing, resolved) | alerting-rules, incident-triage | ✅ Full |
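
The alert lifecycle and the `for` clause interact in a way exam questions like to probe: an alert whose expression starts returning results becomes pending, turns firing only after it has stayed active for the whole `for` duration, and a single evaluation where the condition is false resets the clock. The sketch below models one alert instance evaluated at explicit timestamps. `AlertInstance` is a hypothetical class, and it omits Prometheus details such as `keep_firing_for` and the resending of resolved alerts to Alertmanager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertInstance:
    for_seconds: float                    # the rule's `for:` duration
    active_since: Optional[float] = None  # when the condition first became true
    state: str = "inactive"

    def evaluate(self, now: float, condition_true: bool) -> str:
        """Advance the state machine by one rule evaluation at time `now`."""
        if not condition_true:
            # A firing alert resolves; a pending one silently drops back.
            self.state = "resolved" if self.state == "firing" else "inactive"
            self.active_since = None
            return self.state
        if self.active_since is None:
            self.active_since = now
        held_for = now - self.active_since
        self.state = "firing" if held_for >= self.for_seconds else "pending"
        return self.state


if __name__ == "__main__":
    rule = AlertInstance(for_seconds=300)  # e.g. `for: 5m`, evaluated every 60s
    timeline = [(0, True), (60, True), (300, True), (360, False)]
    print([rule.evaluate(t, ok) for t, ok in timeline])
    # ['pending', 'pending', 'firing', 'resolved']
```

With `for_seconds=0`, the first true evaluation goes straight to firing, which is why short spikes page immediately when a rule has no `for` clause.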
Study Plan
Phase 1: Foundations (Weeks 1–2)
Goal: Solid understanding of observability concepts, Prometheus architecture, and basic PromQL.
- Week 1: Observability and Prometheus fundamentals
  - Read: monitoring-fundamentals (monitoring vs observability, metric types)
  - Read: observability-deep-dive (three pillars, USE/RED methods)
  - Read: prometheus-deep-dive (architecture, scraping, data model, TSDB)
  - Read: postmortem-slo (SLIs, SLOs, error budgets)
  - Practice: Install Prometheus, scrape node_exporter, and explore the expression browser
  - Practice: Configure static targets and file-based service discovery
- Week 2: PromQL foundations
  - Read: prometheus-deep-dive (PromQL section in depth)
  - Practice: Write queries that return instant vectors and range vectors
  - Practice: Use `rate()` on counter metrics, and understand why `rate()` is preferred in recording and alerting rules while `increase()` is mainly for human-readable output
  - Practice: Aggregate with `sum by ()`, `avg by ()`, and `topk()`
  - Practice: Calculate error rates: `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))`
  - Practice: Calculate latency percentiles: `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`
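
To see what `rate()`, `irate()`, and counter-reset handling compute, here is a deliberately simplified Python model over raw `(timestamp_seconds, value)` samples from one counter. Like Prometheus, it treats any decrease as a counter reset, but it skips the extrapolation to the window edges that the real `rate()` and `increase()` perform, so its numbers will not exactly match Prometheus output. The function names are hypothetical.

```python
def counter_increase(samples: list) -> float:
    """Total increase of a counter across samples, compensating for resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again at 0,
        # so the whole new value counts as increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: list) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")  # PromQL returns no result here
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed


def simple_irate(samples: list) -> float:
    """Per-second increase using only the last two samples, like irate()."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return counter_increase(samples[-2:]) / (samples[-1][0] - samples[-2][0])


if __name__ == "__main__":
    # Scraped every 15s; the counter resets between t=30 and t=45.
    window = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 90)]
    print(counter_increase(window))  # 150.0
    print(simple_rate(window))       # 2.5
    print(simple_irate(window))      # 4.0
```

The gap between 2.5 and 4.0 is the volatility the exam expects you to recognize: `irate()` reacts only to the most recent pair of samples, which suits fast-moving graphs but makes alerts noisy.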
Phase 2: Deep Dive (Weeks 3–4)
Goal: Advanced PromQL, instrumentation, alerting, and dashboarding.
- Week 3: Advanced PromQL and instrumentation
  - Read: prometheus-deep-dive (binary operators, vector matching, subqueries)
  - Practice: Use `on()`, `ignoring()`, and `group_left()` to join metrics
  - Practice: Use `predict_linear()` for capacity-planning queries
  - Practice: Write recording rules that pre-compute expensive queries
  - Study: Client library instrumentation patterns (counters for events, gauges for current state, histograms for latency)
  - Study: Metric naming conventions (`_total` for counters, `_seconds` for durations, `_bytes` for sizes)
  - Study: OpenMetrics and Prometheus text exposition format details
- Week 4: Alerting, Alertmanager, and Grafana
  - Read: alerting-rules (writing alert rules, `for` duration, labels, annotations)
  - Read: grafana (dashboards, panels, variables, data sources)
  - Practice: Write alerting rules for high error rate, high latency, and disk almost full
  - Practice: Configure Alertmanager: routing tree, grouping, inhibition, silences
  - Practice: Build a Grafana dashboard with USE method panels (utilization, saturation, errors)
  - Practice: Use Grafana template variables for multi-service dashboards
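
Vector matching is one of the listed gaps, so here is a small model of a many-to-one join such as `rate(http_requests_total[5m]) * on(instance) group_left(version) app_build_info`, where `app_build_info` is a hypothetical info-style metric whose value is always 1. Each left-hand series is matched to the single right-hand series with the same `on` labels, the values are multiplied, and labels named in `group_left(...)` are copied from the right-hand side. Unmatched left-hand series are dropped, and duplicate right-hand matches are an error, as in PromQL. The function name and data are illustrative.

```python
def join_group_left(left: list, right: list, on: list, include: list) -> list:
    """Model `left * on(<on>) group_left(<include>) right` over (labels, value) pairs."""

    def key(labels: dict) -> tuple:
        return tuple(labels.get(name, "") for name in on)

    one_side = {}
    for labels, value in right:
        k = key(labels)
        if k in one_side:
            # PromQL rejects this too: the "one" side must be unique per match key.
            raise ValueError(f"many-to-many matching not allowed for {k}")
        one_side[k] = (labels, value)

    result = []
    for labels, value in left:
        match = one_side.get(key(labels))
        if match is None:
            continue  # unmatched series vanish from the result
        r_labels, r_value = match
        out_labels = dict(labels)
        for name in include:
            if r_labels.get(name):
                out_labels[name] = r_labels[name]  # copy the label from the "one" side
            else:
                out_labels.pop(name, None)
        result.append((out_labels, value * r_value))
    return result


if __name__ == "__main__":
    requests = [
        ({"instance": "a", "method": "GET"}, 2.0),
        ({"instance": "a", "method": "POST"}, 0.5),
        ({"instance": "b", "method": "GET"}, 1.0),  # no build info for "b"
    ]
    build_info = [({"instance": "a", "version": "1.4.2"}, 1.0)]
    for labels, value in join_group_left(requests, build_info, on=["instance"], include=["version"]):
        print(labels, value)
```

Both `instance="a"` series gain `version="1.4.2"`, while `instance="b"` disappears; `group_right` is the mirror image, with the "many" side on the right.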
Phase 3: Exam Simulation (Week 5)
Goal: Timed practice and gap remediation.
- Take practice exams (PromQL exercises from promlabs.com are excellent)
- Focus on PromQL (28% of exam) — this is the highest-weighted domain
- Drill: given a metric name and labels, write the correct PromQL query
- Drill: given a PromQL query, predict the output
- Review: vector matching rules (`on`, `ignoring`, `group_left`, `group_right`)
- Review: counter vs gauge vs histogram vs summary, and when to use each
- Review: Alertmanager routing tree — trace an alert through the config to the receiver
- Study the wiki gaps: Thanos/Cortex HA patterns, OpenMetrics format, client library details
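
For the routing-tree drill, it helps to know the walk Alertmanager performs: an alert enters at the root route, child routes are checked in order, and after the first matching child the remaining siblings are skipped unless that child sets `continue: true`. If no child matches, the current node handles the alert, and a route without its own receiver inherits its parent's. The sketch below encodes that walk with equality matchers only (real Alertmanager also supports regex and negative matchers). The receiver names are hypothetical, and the field is spelled `continue_` because `continue` is a Python keyword.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    receiver: Optional[str] = None
    matchers: dict = field(default_factory=dict)  # label -> required value
    continue_: bool = False                       # Alertmanager's `continue`
    routes: list = field(default_factory=list)    # child routes, checked in order


def route_alert(route: Route, labels: dict, parent_receiver: Optional[str] = None) -> list:
    """Return the receivers an alert with `labels` is delivered to."""
    if any(labels.get(name, "") != value for name, value in route.matchers.items()):
        return []
    receiver = route.receiver or parent_receiver  # receivers are inherited
    delivered = []
    for child in route.routes:
        found = route_alert(child, labels, receiver)
        delivered.extend(found)
        if found and not child.continue_:
            break  # the first matching child wins unless it sets `continue`
    return delivered or [receiver]  # no child matched: this node handles it


if __name__ == "__main__":
    root = Route(receiver="default-email", routes=[
        Route(receiver="pagerduty-oncall", matchers={"severity": "critical"}, continue_=True),
        Route(receiver="db-team-slack", matchers={"team": "db"}),
        Route(receiver="web-team-slack", matchers={"team": "web"}),
    ])
    print(route_alert(root, {"alertname": "DiskFull", "team": "db", "severity": "critical"}))
    # ['pagerduty-oncall', 'db-team-slack']
    print(route_alert(root, {"alertname": "HighLatency", "team": "ops"}))
    # ['default-email']
```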
Gap Analysis
| Gap | Exam Weight | Recommended External Resource |
|---|---|---|
| Advanced vector matching (`group_left`, `group_right` in depth) | Medium (within 28%) | PromLabs blog: vector matching explained |
| Prometheus HA (Thanos, Cortex architecture) | Low (within 20%) | Thanos documentation, Cortex architecture docs |
| Client library instrumentation (Go, Python, Java examples) | Medium (within 16%) | Prometheus client_golang, client_python documentation |
| OpenMetrics exposition format vs Prometheus text format | Low (within 16%) | OpenMetrics specification on GitHub |
| `predict_linear`, `deriv`, `delta` functions in depth | Low (within 28%) | Prometheus documentation: functions reference |
| Alertmanager clustering and HA | Low (within 10%) | Alertmanager HA documentation |
| Notification integrations (webhook configuration details) | Low (within 10%) | Alertmanager configuration reference |
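
`predict_linear()` and `deriv()` both fit a least-squares line through the samples in the range vector: `deriv()` returns the slope per second, and `predict_linear(v, t)` extrapolates that line `t` seconds past the evaluation time. The Python model below assumes raw `(timestamp_seconds, value)` samples and an explicit evaluation time `now`; the function names mirror PromQL for readability but are not a Prometheus API. A commonly seen alert built on this idea is `predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0`, which asks whether the filesystem fills within four hours if the current trend holds.

```python
def _fit(samples: list, now: float) -> tuple:
    """Least-squares slope and intercept, with x measured relative to `now`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t - now for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # fitted value at x = 0, i.e. at `now`
    return slope, intercept


def deriv(samples: list) -> float:
    """Per-second slope of the fitted line (meant for gauges, unlike rate())."""
    return _fit(samples, samples[-1][0])[0]


def predict_linear(samples: list, now: float, seconds_ahead: float) -> float:
    """Value the fitted line reaches `seconds_ahead` seconds after `now`."""
    slope, intercept = _fit(samples, now)
    return intercept + slope * seconds_ahead


if __name__ == "__main__":
    # Free disk space in GB, sampled every 15 minutes, falling 10 GB per hour.
    free_gb = [(0, 50.0), (900, 47.5), (1800, 45.0), (2700, 42.5), (3600, 40.0)]
    print(round(deriv(free_gb) * 3600, 6))  # -10.0 GB per hour
    print(predict_linear(free_gb, now=3600, seconds_ahead=4 * 3600))  # close to 0: full in ~4h
```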
Exam-Day Strategy
Time Management
- ~60 questions in 90 minutes = ~1.5 min per question
- PromQL questions may take longer — budget 2 min for those
- Flag conceptual questions you can revisit quickly at the end
- Don't get bogged down on obscure function syntax — flag and move on
Question Triage
- Read the full question and all options
- Identify the domain: is this PromQL, architecture, alerting, or instrumentation?
- For PromQL questions: mentally trace through the query step by step
- For architecture questions: recall the Prometheus component diagram
- For alerting questions: think about the routing tree and alert lifecycle
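Common Traps
- `rate()` vs `irate()`: `rate()` is the per-second average over the whole window; `irate()` uses only the last two samples in the window, so it is more volatile
- `rate()` is for counters only: on a gauge, every decrease is misread as a counter reset. Use `deriv()` or `delta()` for gauges
- `increase()` is syntactic sugar for `rate()` multiplied by the window length in seconds; it extrapolates to the window edges, so it can return non-integer values even for integer counters
- Counter resets: `rate()`, `irate()`, and `increase()` handle counter resets automatically, so queries do not need to correct for them
- `histogram_quantile()` requires the `le` label from histogram buckets, so keep `le` in any `sum by` applied before it. Summary quantiles are pre-calculated on the client and cannot be meaningfully aggregated
- Histogram vs summary: histograms are aggregatable across instances (generally recommended); summaries are not
- Recording rules use `record:`, not `alert:`, and they produce new time series, not alerts
- Alertmanager `group_by` determines which alerts are batched together into one notification, not which alerts fire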
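
The `histogram_quantile()` traps make more sense once the computation is visible. The sketch below models the classic-histogram algorithm: buckets are cumulative `(le, count)` pairs ending in `+Inf`; the function finds the bucket that contains the requested rank and interpolates linearly inside it, assuming observations are spread evenly within the bucket, and a rank that falls in the `+Inf` bucket returns the highest finite bound. The `merge` helper shows why `sum by (le)` works: adding counts per `le` across instances yields a valid combined histogram, which has no equivalent for pre-computed summary quantiles. Both function names are hypothetical, and edge cases such as negative bucket bounds and native histograms are omitted.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets sorted by le."""
    if not 0.0 < q <= 1.0:
        raise ValueError("this sketch supports 0 < q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets ending with le=+Inf")
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return math.nan
    rank = q * total
    idx = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if idx == len(buckets) - 1:
        return buckets[-2][0]  # rank lands in +Inf: report the highest finite bound
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / (count - below)


def merge(*histograms: list) -> list:
    """Sum cumulative counts per le, the equivalent of `sum by (le)`."""
    totals = {}
    for hist in histograms:
        for le, count in hist:
            totals[le] = totals.get(le, 0) + count
    return sorted(totals.items())


if __name__ == "__main__":
    api_a = [(0.1, 30), (0.25, 50), (0.5, 60), (1.0, 62), (math.inf, 63)]
    api_b = [(0.1, 20), (0.25, 30), (0.5, 35), (1.0, 37), (math.inf, 37)]
    merged = merge(api_a, api_b)
    print(round(histogram_quantile(0.9, merged), 4))  # 0.4167
    print(histogram_quantile(0.5, merged))            # 0.1
    print(histogram_quantile(0.999, merged))          # 1.0 (rank falls in +Inf)
```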
PromQL Mental Model
When facing a PromQL question, think in this order:
1. What metric am I starting with? (counter, gauge, histogram?)
2. Do I need a rate? (yes for counters, no for gauges)
3. What time window? (typically [5m] for rate, but exam may specify)
4. Do I need to aggregate? (sum, avg, max — and by which labels?)
5. Do I need a quantile? (histogram_quantile for latency)
6. Does the answer need labels? (check if by or without is needed)
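
The checklist can be encoded as a tiny query builder, which is a handy way to self-check drill answers. `build_query` is a hypothetical helper; it assumes classic histograms (a `_bucket` series with an `le` label) and a single aggregation operator, and it covers only the patterns in the checklist.

```python
from typing import Optional, Sequence


def build_query(metric: str, kind: str, window: str = "5m",
                by: Sequence[str] = (), agg: str = "sum",
                quantile: Optional[float] = None) -> str:
    """Assemble a PromQL expression following the mental-model checklist."""
    # Steps 1-3: counters and histogram buckets need a rate over a window; gauges do not.
    if kind == "gauge":
        expr = metric
    elif kind == "counter":
        expr = f"rate({metric}[{window}])"
    elif kind == "histogram":
        expr = f"rate({metric}_bucket[{window}])"
    else:
        raise ValueError(f"unsupported metric kind: {kind!r}")

    # Step 5: quantiles come from buckets, so `le` must survive the aggregation.
    if quantile is not None:
        if kind != "histogram":
            raise ValueError("histogram_quantile needs a histogram")
        grouping = ", ".join(["le", *by])
        return f"histogram_quantile({quantile}, sum by ({grouping}) ({expr}))"

    # Steps 4 and 6: aggregate, keeping only the labels the answer needs.
    if by:
        expr = f"{agg} by ({', '.join(by)}) ({expr})"
    return expr


if __name__ == "__main__":
    print(build_query("http_requests_total", "counter", by=["service"]))
    # sum by (service) (rate(http_requests_total[5m]))
    print(build_query("http_request_duration_seconds", "histogram",
                      by=["service"], quantile=0.99))
    # histogram_quantile(0.99, sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))
```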
If You're Stuck
- By convention, counter names end in `_total`: if you see `_total`, think `rate()`
- Duration metrics end in `_seconds`: if you see `_bucket`, think `histogram_quantile()`
- `up` metric: 1 = the last scrape of the target succeeded, 0 = the scrape failed (target down or unreachable)
- Eliminate answers that use incorrect function/metric combinations
- Flag and return; fresh eyes often see the answer immediately
Cross-References
- Learning Paths: Observability Path
- Skill Checks: skillchecks/
- Deep Dives: prometheus-deep-dive, observability-deep-dive
- Alerting: alerting-rules
- Grafana: grafana
- SLO Tooling: slo-tooling
- Related Logs Stack: loki — Grafana Loki for log correlation
- Production Readiness: production-readiness/