Portal | Level: L2 | Domain: Security
Open Policy Agent — Street-Level Ops¶
Core OPA Commands¶
# Evaluate a query against a policy directory and an input file
opa eval \
  --data ./policies/ \
  --input input.json \
  'data.authz.allow'
# Evaluate with inline input via stdin — --input expects a file path, so pipe instead
echo '{"user": {"role": "admin"}, "method": "GET"}' \
  | opa eval \
      --data ./policies/authz.rego \
      --stdin-input \
      'data.authz.allow'
# Run all tests in the policies directory
opa test ./policies/
# Verbose test output (shows individual test names)
opa test -v ./policies/
# Coverage report — which lines are exercised by tests?
opa test --coverage ./policies/ | jq '.files'
# Strict syntax and type check (catches unresolved references)
opa check --strict ./policies/
# Auto-format all Rego files (modifies in place)
opa fmt --write ./policies/
# Benchmark a query (useful before deploying a policy to production)
opa bench --data ./policies/ 'data.authz.allow'
# Start OPA as a REST server (for local dev/testing)
opa run --server --addr 0.0.0.0:8181 ./policies/
# Query the running server
curl -s -X POST http://localhost:8181/v1/data/authz/allow \
-H 'Content-Type: application/json' \
-d '{"input": {"user": {"role": "admin"}, "method": "GET"}}' | jq .
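The eval and curl examples above assume a package named authz. A minimal sketch of what policies/authz.rego might contain (the rule logic is illustrative, not from this page; pre-1.0 Rego syntax to match the rest of the snippets here):

```rego
package authz

# Deny unless some allow rule fires; separate allow bodies are OR'd.
default allow = false

allow {
    input.user.role == "admin"
}

allow {
    input.user.role == "editor"
    input.method == "GET"
}
```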
Gatekeeper Commands¶
# List all installed ConstraintTemplates
kubectl get constrainttemplates
# List all Constraint instances (across all types)
kubectl get constraints
# Describe a specific constraint (see parameters and match scope)
kubectl describe k8srequiredlabels require-team-label
# Check audit violations for a specific constraint
kubectl get k8srequiredlabels require-team-label -o json \
| jq '.status.violations'
# Check audit violations across ALL constraints at once
kubectl get constraint -o json \
| jq '.items[].status.violations'
# See all violations with the constraint name included
kubectl get constraint -o json \
| jq '.items[] | {name: .metadata.name, violations: .status.violations}'
# Watch Gatekeeper controller logs (admission decisions, errors)
kubectl -n gatekeeper-system logs -l control-plane=controller-manager -f
# Watch audit controller logs
kubectl -n gatekeeper-system logs -l control-plane=audit-controller -f
# Check Gatekeeper webhook configuration
kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration -o yaml
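The k8srequiredlabels commands above assume a ConstraintTemplate/Constraint pair along these lines (a sketch in the style of the upstream Gatekeeper examples; verify apiVersions against your installed release):

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("missing required label: %v", [required])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team"]
```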
Bundle Operations¶
# Build a bundle from a policies directory
opa build ./policies/ -o bundle.tar.gz
# Build with a specific metadata revision tag
opa build ./policies/ -o bundle.tar.gz --revision "$(git rev-parse --short HEAD)"
# Inspect bundle contents
tar -tzf bundle.tar.gz
# Run OPA with bundle hot-reload from a local directory (dev)
opa run --server --watch --bundle ./policies/
# Check which bundle revision OPA has activated ("authz" is the bundle name from your config)
curl -s http://localhost:8181/v1/data/system/bundles/authz/manifest/revision | jq '.result'
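opa build writes a .manifest file into the root of the archive; with --revision set, its contents look roughly like this (the revision value is illustrative):

```json
{
  "revision": "1a2b3c4",
  "roots": [""]
}
```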
Conftest¶
# Test a single Kubernetes manifest
conftest test deployment.yaml
# Test all manifests in a directory with a specific policy directory
conftest test --policy ./policy/ k8s/
# Test a Terraform plan output
terraform plan -out=tfplan && terraform show -json tfplan > plan.json
conftest test plan.json
# Verify a Dockerfile
conftest test Dockerfile
# Output results in JSON (useful for CI)
conftest test --output json deployment.yaml
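By default, conftest evaluates rules named deny, violation, or warn in package main. A sketch of a policy/ rule backing the Dockerfile check above (the :latest ban is an illustrative rule, not a conftest built-in):

```rego
package main

# conftest parses a Dockerfile into a list of {Cmd, Value, ...} objects.
deny[msg] {
    input[i].Cmd == "from"
    val := input[i].Value[0]
    endswith(val, ":latest")
    msg := sprintf("base image %v uses the :latest tag", [val])
}
```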
Gotcha: Thinking Imperatively in Rego¶
Rego is declarative. You define what is true, not a sequence of steps. A common mistake is trying to write if/else chains:
# WRONG — this is not how Rego works, only one rule wins
allow {
    input.role == "admin"
} else {
    input.role == "editor"
    input.method == "GET"
}
# RIGHT — separate rules are OR'd automatically
allow { input.role == "admin" }
allow { input.role == "editor"; input.method == "GET" }
The else keyword exists in Rego but is used only in rare cases (functions returning priority-ordered defaults). Reach for multiple rule bodies first.
Remember: In Rego, expressions within a single rule body (separated by newlines or semicolons) are ANDed; separate rule bodies with the same name are ORed. Mnemonic: "semicolons are strict (AND), separate rules are soft (OR)."
Gotcha: Bundle Staleness Is Silent¶
If OPA cannot reach the bundle server, it continues serving the last successfully loaded bundle — indefinitely. There is no built-in alerting by default. A stale bundle means your policy could be weeks out of date.
Mitigate: Monitor the opa_bundle_last_success_time_seconds Prometheus metric and alert when it falls behind by more than your acceptable policy lag. Also expose /health?bundle=true as a readiness probe — this endpoint fails until bundles have been loaded successfully.
War story: A team's bundle S3 bucket got accidentally restricted by a new IAM policy. OPA kept serving the last good bundle for 3 weeks. During that time, a critical policy update (blocking a CVE-affected image) never reached production. An alert on opa_bundle_last_success_time_seconds would have caught it in minutes.
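An alerting rule along these lines closes that gap (the group name and one-hour threshold are assumptions — tune the threshold to your bundle polling interval):

```yaml
groups:
  - name: opa-bundles
    rules:
      - alert: OPABundleStale
        # Fires when the last successful bundle download is more than an hour old
        expr: time() - opa_bundle_last_success_time_seconds > 3600
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "OPA has not pulled a fresh bundle in over an hour"
```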
Gotcha: Gatekeeper Webhook Timeout Blocks All Admission¶
If Gatekeeper's webhook becomes unreachable and failurePolicy: Fail is set (the upstream default is Ignore, but many hardened installs flip it to Fail), all Kubernetes API requests that match the webhook scope will be rejected. This can make a cluster completely unmanageable.
Check your webhook config:
kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration \
-o jsonpath='{.webhooks[*].failurePolicy}'
Mitigate for production: Use failurePolicy: Ignore for non-security-critical clusters, set namespaceSelector to exclude kube-system, and ensure Gatekeeper runs with PodDisruptionBudget and multiple replicas.
Trap: With failurePolicy: Fail, a crashed Gatekeeper pod or a webhook timeout means you cannot deploy, scale, or even delete pods in matched namespaces. Always exclude the kube-system and gatekeeper-system namespaces from the webhook scope.
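Gatekeeper's documented exemption mechanism is a namespaceSelector on the webhook that skips namespaces carrying an ignore label. The relevant fragment of the webhook spec looks like this (verify against your installed version):

```yaml
webhooks:
  - name: validation.gatekeeper.sh
    namespaceSelector:
      matchExpressions:
        # Namespaces carrying this label are never sent to the webhook
        - key: admission.gatekeeper.sh/ignore
          operator: DoesNotExist
```

Note that Gatekeeper only allows the ignore label on namespaces listed in its --exempt-namespace flag, so exempting kube-system takes both the flag and the label.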
Gotcha: Partial Rule vs Complete Rule Confusion¶
A complete rule assigns exactly one value. A partial rule builds a set or object incrementally. Mixing them up causes subtle bugs:
# Complete rule — every body that succeeds must produce the SAME value
default allow = false
allow = true { input.role == "admin" }
# Partial set rule — accumulates elements from ALL matching bodies
violations[msg] { ... }
violations[msg] { ... }
If you accidentally write a complete rule with multiple bodies that can be true at the same time while producing different values, OPA raises a conflict error (eval_conflict_error) at evaluation time.
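Fleshed out, a partial set rule like the one above might read (field names are hypothetical):

```rego
package authz

# Each body that succeeds contributes one msg to the violations set;
# the result is the union across all bodies.
violations[msg] {
    not input.request.labels.team
    msg := "missing required label: team"
}

violations[msg] {
    endswith(input.request.image, ":latest")
    msg := "image tag :latest is not allowed"
}
```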
Pattern: Policy Testing Workflow¶
# 1. Write policy
vim policies/authz.rego
# 2. Write tests in the same directory
vim policies/authz_test.rego
# 3. Check syntax first
opa check --strict policies/
# 4. Run tests
opa test -v policies/
# 5. Check coverage — aim for >80% on decision rules
opa test --coverage policies/ | jq '.coverage'
# 6. Format before committing
opa fmt --write policies/
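For step 2, test rules conventionally live in files matching *_test.rego and must be prefixed test_. A sketch against a hypothetical data.authz.allow rule (the input shape mirrors the eval examples at the top of this page):

```rego
package authz

test_admin_allowed {
    allow with input as {"user": {"role": "admin"}, "method": "DELETE"}
}

test_editor_can_read {
    allow with input as {"user": {"role": "editor"}, "method": "GET"}
}

test_editor_cannot_write {
    not allow with input as {"user": {"role": "editor"}, "method": "POST"}
}
```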
Pattern: Gatekeeper Dry-Run Migration¶
When introducing a new Gatekeeper constraint in an existing cluster, never go straight to enforcement. Use enforcementAction: dryrun first:
spec:
  enforcementAction: dryrun  # audit only, no admission blocks
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
Monitor violations:
# Watch violation count over time
watch 'kubectl get constraint -o json | jq "[.items[].status.violations // [] | length] | add"'
Once violations are remediated, switch to enforcementAction: deny.
Pattern: Bundle Distribution via S3¶
# Build and push to S3
opa build ./policies/ -o bundle.tar.gz --revision "$(git rev-parse HEAD)"
aws s3 cp bundle.tar.gz s3://my-opa-bundles/authz/latest.tar.gz
# OPA config.yaml references it
# services:
# bundle-server:
# url: https://s3.amazonaws.com/my-opa-bundles
# bundles:
# authz:
# service: bundle-server
# resource: /authz/latest.tar.gz
# Verify OPA picked up the new bundle
curl -s http://localhost:8181/v1/data/system/bundles/authz/manifest/revision | jq '.result'