
Portal | Level: L2 | Domain: Security

Open Policy Agent — Street-Level Ops

Core OPA Commands

# Evaluate a query against a policy directory and an input file
opa eval \
  --data ./policies/ \
  --input input.json \
  'data.authz.allow'

# Evaluate with input piped from stdin (quick checks) —
# note --input takes a file path, so inline JSON goes via --stdin-input
echo '{"user": {"role": "admin"}, "method": "GET"}' \
  | opa eval \
      --data ./policies/authz.rego \
      --stdin-input \
      'data.authz.allow'

# Run all tests in the policies directory
opa test ./policies/

# Verbose test output (shows individual test names)
opa test -v ./policies/

# Coverage report — which lines are exercised by tests?
opa test --coverage ./policies/ | jq '.files'

# Strict syntax and type check (catches unresolved references)
opa check --strict ./policies/

# Auto-format all Rego files (modifies in place)
opa fmt --write ./policies/

# Benchmark a query (useful before deploying a policy to production)
opa bench --data ./policies/ 'data.authz.allow'

# Start OPA as a REST server (for local dev/testing)
opa run --server --addr 0.0.0.0:8181 ./policies/

# Query the running server
curl -s -X POST http://localhost:8181/v1/data/authz/allow \
  -H 'Content-Type: application/json' \
  -d '{"input": {"user": {"role": "admin"}, "method": "GET"}}' | jq .
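
The commands above all query data.authz.allow; a minimal sketch of what a hypothetical policies/authz.rego might contain, matching the input shape used in the examples (roles and fields are illustrative):

package authz

default allow = false

# Admins may do anything
allow {
    input.user.role == "admin"
}

# Editors may read
allow {
    input.user.role == "editor"
    input.method == "GET"
}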

Gatekeeper Commands

# List all installed ConstraintTemplates
kubectl get constrainttemplates

# List all Constraint instances (across all types)
kubectl get constraints

# Describe a specific constraint (see parameters and match scope)
kubectl describe k8srequiredlabels require-team-label

# Check audit violations for a specific constraint
kubectl get k8srequiredlabels require-team-label -o json \
  | jq '.status.violations'

# Check audit violations across ALL constraints at once
kubectl get constraint -o json \
  | jq '.items[].status.violations'

# See all violations with the constraint name included
kubectl get constraint -o json \
  | jq '.items[] | {name: .metadata.name, violations: .status.violations}'

# Watch Gatekeeper controller logs (admission decisions, errors)
kubectl -n gatekeeper-system logs -l control-plane=controller-manager -f

# Watch audit controller logs
kubectl -n gatekeeper-system logs -l control-plane=audit-controller -f

# Check Gatekeeper webhook configuration
kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration -o yaml
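
The constraint referenced above might look like this sketch (K8sRequiredLabels is the kind defined by the commonly used ConstraintTemplate of the same name; the match scope and label are illustrative):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team"]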

Bundle Operations

# Build a bundle from a policies directory
opa build ./policies/ -o bundle.tar.gz

# Build with a specific metadata revision tag
opa build ./policies/ -o bundle.tar.gz --revision "$(git rev-parse --short HEAD)"

# Inspect bundle contents
tar -tzf bundle.tar.gz

# Run OPA as a server, loading a local directory as a bundle and
# reloading on file changes (dev) — reload requires --watch
opa run --server --bundle --watch ./policies/

# Check the active revision of a bundle named "authz"
# (the name is the key under "bundles:" in OPA's config)
curl -s http://localhost:8181/v1/data/system/bundles/authz/manifest/revision \
  | jq '.result'

Conftest

# Test a single Kubernetes manifest
conftest test deployment.yaml

# Test all manifests in a directory with a specific policy directory
conftest test --policy ./policy/ k8s/

# Test a Terraform plan output
terraform plan -out=tfplan && terraform show -json tfplan > plan.json
conftest test plan.json

# Verify a Dockerfile
conftest test Dockerfile

# Output results in JSON (useful for CI)
conftest test --output json deployment.yaml
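
By default Conftest evaluates rules in package main (changeable with --namespace). A minimal deny-rule sketch; the field it checks is illustrative:

package main

deny[msg] {
    input.kind == "Deployment"
    not input.spec.template.spec.securityContext.runAsNonRoot
    msg := "Deployment containers must set runAsNonRoot"
}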

Gotcha: Thinking Imperatively in Rego

Rego is declarative. You define what is true, not a sequence of steps. A common mistake is trying to write if/else chains:

# WRONG mindset — an else chain forces ordered, imperative evaluation
allow {
    input.role == "admin"
} else {
    input.role == "editor"
    input.method == "GET"
}
# RIGHT — separate rules are OR'd automatically
allow { input.role == "admin" }
allow { input.role == "editor"; input.method == "GET" }

The else keyword exists in Rego but is used only in rare cases (functions returning priority-ordered defaults). Reach for multiple rule bodies first.
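
For reference, that rare legitimate use reads like a priority-ordered chain: bodies are tried top-down and the first success supplies the value (names here are made up):

# Priority-ordered default via else — first matching body wins
tier(user) = "gold" {
    user.spend > 1000
} else = "silver" {
    user.spend > 100
} else = "bronze" {
    true
}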

Remember: In Rego, expressions within a rule body (whether separated by semicolons or newlines) are ANDed; separate rule bodies with the same name are OR'd. Mnemonic: "semicolons are strict (AND), separate rules are soft (OR)."


Gotcha: Bundle Staleness Is Silent

If OPA cannot reach the bundle server, it continues serving the last successfully loaded bundle indefinitely. There is no built-in alerting. A stale bundle means your policy could be weeks out of date.

Mitigate: Monitor the opa_bundle_last_success_time_seconds Prometheus metric and alert when it falls behind by more than your acceptable policy lag. Also expose /health?bundles=true as a readiness probe: the endpoint fails until bundles have been successfully activated. (Avoid wiring it to a liveness probe; restarting OPA does not fix an unreachable bundle server, and the restart discards the last good bundle.)
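
An alerting rule along those lines might look like this sketch (it assumes the metric named above is scraped by your Prometheus; tune the threshold to your policy lag budget):

groups:
  - name: opa-bundles
    rules:
      - alert: OPABundleStale
        # Fires when the last successful bundle download is over 1h old
        expr: time() - opa_bundle_last_success_time_seconds > 3600
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "OPA has not downloaded a fresh bundle in over an hour"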

War story: A team's bundle S3 bucket got accidentally restricted by a new IAM policy. OPA kept serving the last good bundle for 3 weeks. During that time, a critical policy update (blocking a CVE-affected image) never reached production. The alert on opa_bundle_last_success_time_seconds would have caught it in minutes.

curl -s 'http://localhost:8181/health?bundles=true'

Gotcha: Gatekeeper Webhook Timeout Blocks All Admission

If Gatekeeper's webhook becomes unreachable and failurePolicy: Fail is set (upstream Gatekeeper manifests default to Ignore, but hardened installs commonly switch to Fail), all Kubernetes API requests that match the webhook scope will be rejected. This can make a cluster completely unmanageable.

Check your webhook config:

kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration \
  -o jsonpath='{.webhooks[*].failurePolicy}'

Mitigate for production: Use failurePolicy: Ignore for non-security-critical clusters, set namespaceSelector to exclude kube-system, and ensure Gatekeeper runs with PodDisruptionBudget and multiple replicas.

Fail-closed trap: With failurePolicy: Fail, if the Gatekeeper pods crash or the webhook times out, you cannot deploy, scale, or even delete pods in matched namespaces. Always exclude the kube-system and gatekeeper-system namespaces from the webhook scope.
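
The exclusion is typically a namespaceSelector on the webhook; stock Gatekeeper installs skip namespaces carrying the admission.gatekeeper.sh/ignore label (Gatekeeper only honors that label on namespaces also passed via its --exempt-namespace flag):

# Excerpt from a ValidatingWebhookConfiguration
webhooks:
  - name: validation.gatekeeper.sh
    namespaceSelector:
      matchExpressions:
        - key: admission.gatekeeper.sh/ignore
          operator: DoesNotExist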


Gotcha: Partial Rule vs Complete Rule Confusion

A complete rule assigns exactly one value. A partial rule builds a set or object incrementally. Mixing them up causes subtle bugs:

# Complete rule — can only have ONE body that evaluates to true
default allow = false
allow = true { input.role == "admin" }

# Partial set rule — accumulates elements from ALL matching bodies
violations[msg] { ... }
violations[msg] { ... }

If you accidentally write a complete rule with multiple bodies that assign different values and can be true simultaneously, OPA raises a conflict error (eval_conflict_error) at evaluation time.
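
A concrete sketch of the conflict (rule name and input fields are made up):

# Complete rule with two bodies assigning DIFFERENT values
max_replicas = 3  { input.env == "dev" }
max_replicas = 10 { input.tier == "premium" }
# For input {"env": "dev", "tier": "premium"} both bodies are true,
# so evaluation fails with a conflict error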


Pattern: Policy Testing Workflow

# 1. Write policy
vim policies/authz.rego

# 2. Write tests in the same directory
vim policies/authz_test.rego

# 3. Check syntax first
opa check --strict policies/

# 4. Run tests
opa test -v policies/

# 5. Check coverage — aim for >80% on decision rules
opa test --coverage policies/ | jq '.coverage'

# 6. Format before committing
opa fmt --write policies/
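
Step 2's test file follows OPA's conventions: test rules carry a test_ prefix and pin the input with the `with` keyword. A sketch for a hypothetical authz package:

package authz

test_admin_allowed {
    allow with input as {"user": {"role": "admin"}, "method": "DELETE"}
}

test_viewer_denied_delete {
    not allow with input as {"user": {"role": "viewer"}, "method": "DELETE"}
}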

Pattern: Gatekeeper Dry-Run Migration

When introducing a new Gatekeeper constraint in an existing cluster, never go straight to enforcement. Use enforcementAction: dryrun first:

spec:
  enforcementAction: dryrun   # audit only, no admission blocks
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]

Monitor violations:

# Watch violation count over time
watch 'kubectl get constraint -o json | jq "[.items[].status.violations // [] | length] | add"'

Once violations are remediated, switch to enforcementAction: deny.


Pattern: Bundle Distribution via S3

# Build and push to S3
opa build ./policies/ -o bundle.tar.gz --revision "$(git rev-parse HEAD)"
aws s3 cp bundle.tar.gz s3://my-opa-bundles/authz/latest.tar.gz

# OPA config.yaml references it
# services:
#   bundle-server:
#     url: https://s3.amazonaws.com/my-opa-bundles
# bundles:
#   authz:
#     service: bundle-server
#     resource: /authz/latest.tar.gz

# Verify OPA picked up the new bundle (name matches the "bundles:" config key)
curl -s http://localhost:8181/v1/data/system/bundles/authz/manifest/revision | jq '.result'