Certification Prep: PCA — Prometheus Certified Associate¶

Metadata¶

Field	Value
Issuer	CNCF (Cloud Native Computing Foundation)
Exam Code	PCA
Format	Multiple-choice
Duration	90 minutes
Passing Score	75%
Cost	$250 USD
Retake Policy	One free retake included
Prometheus Version	Current stable (check CNCF site)
Wiki Coverage	~85%

Exam Domains & Wiki Mapping¶

Observability Concepts (18%)¶

Objective	Topic Pack	Coverage
Explain the three pillars of observability (metrics, logs, traces)	monitoring-fundamentals, observability-deep-dive	✅ Full
Understand the difference between monitoring and observability	monitoring-fundamentals, observability-deep-dive	✅ Full
Describe metrics types (counter, gauge, histogram, summary)	prometheus-deep-dive	✅ Full
Explain the pull-based model vs push-based model	monitoring-fundamentals, prometheus-deep-dive	✅ Full
Understand SLIs, SLOs, and error budgets	postmortem-slo, slo-tooling	✅ Full
Describe the role of metrics in incident response	incident-triage, monitoring-fundamentals	✅ Full

Prometheus Fundamentals (20%)¶

Objective	Topic Pack	Coverage
Understand Prometheus architecture (server, TSDB, scrape, rules, alerting)	prometheus-deep-dive	✅ Full
Configure Prometheus via `prometheus.yml`	prometheus-deep-dive	✅ Full
Understand scrape configuration (targets, intervals, relabeling)	prometheus-deep-dive	✅ Full
Describe service discovery mechanisms (static, file, Kubernetes, DNS, Consul)	prometheus-deep-dive	✅ Full
Understand the Prometheus data model (metric name, labels, timestamp, value)	prometheus-deep-dive	✅ Full
Explain storage and retention (local TSDB, remote write, remote read)	prometheus-deep-dive	✅ Full
Understand Prometheus high availability (federation, Thanos, Cortex)	prometheus-deep-dive	⚠️ Partial
Configure recording rules for performance optimization	prometheus-deep-dive	✅ Full

PromQL (28%)¶

Objective	Topic Pack	Coverage
Write basic PromQL queries (instant vectors, range vectors)	prometheus-deep-dive	✅ Full
Use label matchers (`=`, `!=`, `=~`, `!~`)	prometheus-deep-dive	✅ Full
Apply aggregation operators (`sum`, `avg`, `min`, `max`, `count`, `topk`, `bottomk`)	prometheus-deep-dive	✅ Full
Use `by` and `without` clauses for grouping	prometheus-deep-dive	✅ Full
Apply rate functions (`rate`, `irate`, `increase`)	prometheus-deep-dive	✅ Full
Use histogram functions (`histogram_quantile`)	prometheus-deep-dive	✅ Full
Understand offset modifier and subquery syntax	prometheus-deep-dive	⚠️ Partial
Apply binary operators and vector matching (`on`, `ignoring`, `group_left`, `group_right`)	prometheus-deep-dive	⚠️ Partial
Use `predict_linear`, `deriv`, `delta`, `idelta` for trend analysis	prometheus-deep-dive	⚠️ Partial
Write queries for common patterns (error rate, latency percentiles, saturation)	prometheus-deep-dive, slo-tooling	✅ Full

Instrumentation and Exporters (16%)¶

Objective	Topic Pack	Coverage
Describe client library instrumentation (Go, Python, Java)	prometheus-deep-dive	⚠️ Partial
Understand the difference between direct and exporter-based instrumentation	prometheus-deep-dive	✅ Full
Use common exporters (node_exporter, blackbox_exporter, mysqld_exporter)	prometheus-deep-dive, monitoring-fundamentals	✅ Full
Understand the exposition format (OpenMetrics, Prometheus text format)	prometheus-deep-dive	⚠️ Partial
Apply naming conventions for metrics (`_total`, `_seconds`, `_bytes`, `_info`)	prometheus-deep-dive	✅ Full
Understand pushgateway use cases and limitations	prometheus-deep-dive	✅ Full
Describe Kubernetes metrics sources (kube-state-metrics, metrics-server, cAdvisor)	prometheus-deep-dive, k8s-ops (HPA)	✅ Full

Dashboarding and Visualization (8%)¶

Objective	Topic Pack	Coverage
Use Grafana to visualize Prometheus metrics	grafana	✅ Full
Create dashboards with panels, variables, and time ranges	grafana	✅ Full
Use Prometheus expression browser for ad-hoc queries	prometheus-deep-dive	✅ Full
Understand dashboard best practices (USE method, RED method)	monitoring-fundamentals, observability-deep-dive	✅ Full
Configure Grafana data sources for Prometheus	grafana	✅ Full

Alerting and Alertmanager (10%)¶

Objective	Topic Pack	Coverage
Write alerting rules in Prometheus	alerting-rules, prometheus-deep-dive	✅ Full
Configure Alertmanager (routing, receivers, grouping, inhibition, silences)	alerting-rules	✅ Full
Understand alert routing trees and match logic	alerting-rules	✅ Full
Configure notification channels (email, Slack, PagerDuty, webhook)	alerting-rules	⚠️ Partial
Use `for` duration in alerting rules to avoid flapping	alerting-rules	✅ Full
Describe Alertmanager high availability (clustering)	alerting-rules	⚠️ Partial
Understand alert lifecycle (pending, firing, resolved)	alerting-rules, incident-triage	✅ Full

Study Plan¶

Phase 1: Foundations (Weeks 1–2)¶

Goal: Solid understanding of observability concepts, Prometheus architecture, and basic PromQL.

Week 1: Observability and Prometheus fundamentals
Read: monitoring-fundamentals — monitoring vs observability, metrics types
Read: observability-deep-dive — three pillars, USE/RED methods
Read: prometheus-deep-dive — architecture, scraping, data model, TSDB
Read: postmortem-slo — SLIs, SLOs, error budgets
Practice: Install Prometheus, scrape node_exporter, explore the expression browser
Practice: Configure static targets and file-based service discovery
Week 2: PromQL foundations
Read: prometheus-deep-dive — PromQL section in depth
Practice: Write queries for instant vectors and range vectors
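Practice: Use rate() on counter metrics, and understand why rate() is preferred over irate() for alerting (irate() looks at only the last two samples, so it is too volatile for a stable alert condition)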
Practice: Aggregation with sum by(), avg by(), topk()
Practice: Calculate error rates: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
Practice: Calculate latency percentiles: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
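
The latency-percentile query above depends on linear interpolation inside cumulative `le` buckets. The following is a minimal Python sketch of that idea, not the Prometheus implementation: it assumes classic (non-native) histograms with well-formed, monotonic buckets, and the bucket values in the usage note are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    buckets must include the +Inf bucket, as Prometheus histograms do.
    """
    buckets = sorted(buckets, key=lambda b: b[0])
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies in the +Inf bucket: the best available
                # answer is the highest finite bucket boundary.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With hypothetical buckets `[(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)]`, the 0.95 quantile lands inside the 0.5 to 1.0 bucket and is interpolated between those bounds. This is why the accuracy of a histogram percentile depends on how well the bucket boundaries match the real latency distribution.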

Phase 2: Deep Dive (Weeks 3–4)¶

Goal: Advanced PromQL, instrumentation, alerting, and dashboarding.

Week 3: Advanced PromQL and instrumentation
Read: prometheus-deep-dive — binary operators, vector matching, subqueries
Practice: Use on(), ignoring(), group_left() for joining metrics
Practice: Use predict_linear() for capacity planning queries
Practice: Recording rules — pre-compute expensive queries
Study: Client library instrumentation patterns (counters for events, gauges for current state, histograms for latency)
Study: Metric naming conventions (_total for counters, _seconds for duration, _bytes for size)
Study: OpenMetrics format and exposition format details
Week 4: Alerting, Alertmanager, and Grafana
Read: alerting-rules — writing alert rules, for duration, labels, annotations
Read: grafana — dashboards, panels, variables, data sources
Practice: Write alerting rules for high error rate, high latency, disk almost full
Practice: Configure Alertmanager: routing tree, grouping, inhibition, silences
Practice: Build a Grafana dashboard with USE method panels (utilization, saturation, errors)
Practice: Use Grafana template variables for multi-service dashboards
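
The effect of the for duration on an alerting rule can be sketched as a small state machine. This Python sketch is illustrative only and assumes one evaluation per fixed interval. Prometheus itself reports rule states as inactive, pending, and firing; the "resolved" label here stands for the point where a previously firing alert stops, which is when Alertmanager would send a resolved notification.

```python
def alert_states(condition_by_tick, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_by_tick: list of booleans, True when the alert expression
    returned a result at that evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for tick, condition in enumerate(condition_by_tick):
        now = tick * eval_interval_s
        if condition:
            if active_since is None:
                active_since = now
            held = now - active_since
            # The alert fires only after the condition has held for for_s.
            states.append("firing" if held >= for_s else "pending")
        else:
            was_firing = bool(states) and states[-1] == "firing"
            active_since = None  # any false evaluation restarts the timer
            states.append("resolved" if was_firing else "inactive")
    return states
```

A condition that flips on and off at every evaluation, for example `[True, False, True, False]` with a 60-second interval and a 120-second for, never gets past pending, which is how for suppresses flapping alerts.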

Phase 3: Exam Simulation (Week 5)¶

Goal: Timed practice and gap remediation.

Take practice exams (PromQL exercises from promlabs.com are excellent)
Focus on PromQL (28% of exam) — this is the highest-weighted domain
Drill: given a metric name and labels, write the correct PromQL query
Drill: given a PromQL query, predict the output
Review: vector matching rules (on, ignoring, group_left, group_right)
Review: counter vs gauge vs histogram vs summary — when to use each
Review: Alertmanager routing tree — trace an alert through the config to the receiver
Study the wiki gaps: Thanos/Cortex HA patterns, OpenMetrics format, client library details
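
For the routing-tree review item, tracing an alert can be practiced with a small model of the matching walk. This is a simplified Python sketch, not Alertmanager code: it supports only equality matchers, expects every route to name its own receiver (real child routes can inherit one), and all route and receiver names are hypothetical.

```python
def match_routes(route, labels):
    """Return the routes that would handle an alert with these labels."""
    matchers = route.get("matchers", {})
    if any(labels.get(name) != value for name, value in matchers.items()):
        return []
    matched = []
    for child in route.get("routes", []):
        hits = match_routes(child, labels)
        matched.extend(hits)
        # Stop at the first matching child unless it sets continue.
        if hits and not child.get("continue", False):
            break
    if not matched:
        # No child matched, so the current node handles the alert.
        matched.append(route)
    return matched


def receivers_for(route, labels):
    return [r["receiver"] for r in match_routes(route, labels)]


# Hypothetical routing tree for practice.
TREE = {
    "receiver": "default-email",
    "routes": [
        {"matchers": {"severity": "critical"}, "receiver": "pagerduty",
         "continue": True},
        {"matchers": {"team": "db"}, "receiver": "db-slack",
         "routes": [
             {"matchers": {"severity": "warning"}, "receiver": "db-email"},
         ]},
    ],
}
```

With this tree, an alert labeled `severity="critical", team="db"` reaches both `pagerduty` and `db-slack` because the first route sets continue, while an alert with only `team="web"` falls back to `default-email` at the root.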

Gap Analysis¶

Gap	Exam Weight	Recommended External Resource
Advanced vector matching (`group_left`, `group_right` in depth)	Medium (within 28%)	PromLabs blog — vector matching explained
Prometheus HA (Thanos, Cortex architecture)	Low (within 20%)	Thanos documentation, Cortex architecture docs
Client library instrumentation (Go, Python, Java examples)	Medium (within 16%)	Prometheus client_golang, client_python documentation
OpenMetrics exposition format vs Prometheus text format	Low (within 16%)	OpenMetrics specification on GitHub
`predict_linear`, `deriv`, `delta` functions in depth	Low (within 28%)	Prometheus documentation — functions reference
Alertmanager clustering and HA	Low (within 10%)	Alertmanager HA documentation
Notification integrations (webhook configuration details)	Low (within 10%)	Alertmanager configuration reference

Exam-Day Strategy¶

Time Management¶

~60 questions in 90 minutes = ~1.5 min per question
PromQL questions may take longer — budget 2 min for those
Flag conceptual questions you can revisit quickly at the end
Don't get bogged down on obscure function syntax — flag and move on

Question Triage¶

Read the full question and all options
Identify the domain: is this PromQL, architecture, alerting, or instrumentation?
For PromQL questions: mentally trace through the query step by step
For architecture questions: recall the Prometheus component diagram
For alerting questions: think about the routing tree and alert lifecycle

Common Traps¶

rate() vs irate(): rate is per-second average over window, irate uses last two data points only (more volatile)
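rate() is only meaningful on counters: PromQL will evaluate it on a gauge, but the result is wrong because every decrease is treated as a counter reset. Use delta() or deriv() for gauges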
increase() is syntactic sugar for rate() * window — it extrapolates and can return non-integer values for counters
Counter resets: rate() handles counter resets automatically, provided rate() is applied before any aggregation. Write sum(rate(...)), never rate(sum(...))
histogram_quantile requires a le label (from histogram buckets) — summary quantiles are pre-calculated and cannot be aggregated
Histogram vs Summary: histograms are aggregatable (recommended), summaries are not
Recording rules use record: not alert: — and they produce new time series, not alerts
Alertmanager group_by determines which alerts are batched together — not which alerts fire

PromQL Mental Model¶

When facing a PromQL question, think in this order:
1. What metric am I starting with? (counter, gauge, histogram?)
2. Do I need a rate? (yes for counters, no for gauges)
3. What time window? (typically [5m] for rate, but exam may specify)
4. Do I need to aggregate? (sum, avg, max, and by which labels?)
5. Do I need a quantile? (histogram_quantile for latency)
6. Does the answer need labels? (check if by or without is needed)
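
As a drill aid, the checklist above can be turned into a tiny query builder. This Python sketch only assembles a query string by walking the same steps; the function name, its parameters, and the metric names in the usage note are illustrative assumptions, and it covers only the simple patterns named in the checklist.

```python
def build_query(metric, kind, window="5m", agg=None, by=(), quantile=0.99):
    """Assemble a PromQL string for a counter, gauge, or histogram."""
    # Step 1: histograms are queried through their _bucket series.
    selector = f"{metric}_bucket" if kind == "histogram" else metric
    # Steps 2-3: counters and buckets need rate() over a window; gauges do not.
    if kind in ("counter", "histogram"):
        expr = f"rate({selector}[{window}])"
    else:
        expr = selector
    # Steps 4 and 6: aggregate, keeping only the labels the answer needs.
    labels = list(by)
    if kind == "histogram":
        agg = agg or "sum"
        if "le" not in labels:
            labels.append("le")  # histogram_quantile needs le preserved
    if agg:
        if labels:
            expr = f"{agg} by ({', '.join(labels)}) ({expr})"
        else:
            expr = f"{agg}({expr})"
    # Step 5: wrap bucket rates in histogram_quantile.
    if kind == "histogram":
        expr = f"histogram_quantile({quantile}, {expr})"
    return expr
```

For example, `build_query("http_requests_total", "counter", agg="sum", by=("job",))` returns `sum by (job) (rate(http_requests_total[5m]))`. Writing the query by hand first and then comparing it with the builder's output is one way to check that no step was skipped.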

If You're Stuck¶

By convention, counter names end in _total: if you see _total, think rate()
Duration metrics end in _seconds: if you see _bucket, think histogram_quantile
up metric: 1 = the last scrape of the target succeeded, 0 = the scrape failed
Eliminate answers that use incorrect function/metric combinations
Flag and return — fresh eyes often see the answer immediately

Cross-References¶

Learning Paths: Observability Path
Skill Checks: skillchecks/
Deep Dives: prometheus-deep-dive, observability-deep-dive
Alerting: alerting-rules
Grafana: grafana
SLO Tooling: slo-tooling
Related Logs Stack: loki — Grafana Loki for log correlation
Production Readiness: production-readiness/