Anti-Primer: Kubernetes RBAC

Everything that can go wrong, will — and in this story, it does.

The Setup

A platform team is implementing RBAC for a multi-tenant Kubernetes cluster. Twelve development teams need access, each with different permissions. Under deadline pressure, the admin starts with cluster-wide roles and plans to 'tighten later.'

The Timeline

Hour 0: ClusterRole When Role Suffices

The admin creates a ClusterRole with pod exec permissions for a single team and binds it with a ClusterRoleBinding instead of granting it only in that team's namespace. The deadline was looming, and this seemed like the fastest path forward. The result: the team can exec into pods in every namespace, including production and security-sensitive ones.

Footgun #1: ClusterRole When Role Suffices. Granting one team pod exec through a cluster-wide binding lets that team exec into pods in every namespace, including production and security-sensitive ones.
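A minimal sketch of the namespace-scoped alternative, assuming manifests are built as plain Python dicts that mirror the `rbac.authorization.k8s.io/v1` schema. The team name, namespace, and the `<team>-devs` group are hypothetical placeholders. The verbs `kubectl exec` needs have shifted between Kubernetes versions (`create` on `pods/exec` is the long-standing requirement), so treat the rule list as an assumption to check against your cluster. The structural point is what matters: a `Role` plus a `RoleBinding` never reaches beyond its namespace, whereas a `ClusterRoleBinding` applies its rules everywhere.

```python
from typing import Any

RBAC_GROUP = "rbac.authorization.k8s.io"


def namespaced_exec_role(team: str, namespace: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return a Role and RoleBinding that allow pod exec only inside `namespace`."""
    role_name = f"{team}-pod-exec"
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",  # namespace-scoped, unlike ClusterRole
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            # Assumption: kubectl looks the pod up, then opens a session on pods/exec.
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
            {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",  # grants the Role only within this namespace
        "metadata": {"name": role_name, "namespace": namespace},
        # Hypothetical group name; use whatever your identity provider emits.
        "subjects": [{"kind": "Group", "name": f"{team}-devs", "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }
    return role, binding


def grants_cluster_wide(binding: dict[str, Any]) -> bool:
    """Only a ClusterRoleBinding extends a role's rules to every namespace."""
    return binding.get("kind") == "ClusterRoleBinding"
```

Each dict serializes with `json.dumps` into a manifest that `kubectl apply -f` accepts, since kubectl reads JSON as well as YAML. A `RoleBinding` may also reference a `ClusterRole`; the grant still stays inside the binding's namespace, which is why `grants_cluster_wide` looks only at the binding kind.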

Nobody notices yet. The engineer moves on to the next task.

Hour 1: Wildcard Verb Permissions

The admin grants verbs: ['*'] on secrets to simplify the role definition. Under time pressure, the team chose speed over caution. Because the grant carries cluster-wide scope, a developer ends up reading production database credentials from secrets in another team's namespace.

Footgun #2: Wildcard Verb Permissions. Granting verbs: ['*'] on secrets to keep the role short lets a developer read production database credentials from another team's namespace.
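A rough lint for the wildcard footgun, again assuming roles are plain dicts in the RBAC schema. `SENSITIVE_RESOURCES` is an illustrative set, not an official list, and `wildcard_findings` is a hypothetical helper: it flags any `'*'` resource (which silently includes secrets) and any `'*'` verb on a sensitive resource.

```python
from typing import Any

# Illustrative set; extend with whatever your cluster treats as sensitive.
SENSITIVE_RESOURCES = {"secrets", "pods/exec", "serviceaccounts/token"}


def wildcard_findings(role: dict[str, Any]) -> list[str]:
    """Flag rules that use '*' for resources, or '*' verbs on sensitive resources."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        if "*" in resources:
            # A wildcard resource matches secrets along with everything else.
            findings.append(f"rule {index}: wildcard resource")
        elif "*" in verbs and resources & SENSITIVE_RESOURCES:
            sensitive = ", ".join(sorted(resources & SENSITIVE_RESOURCES))
            findings.append(f"rule {index}: wildcard verb on {sensitive}")
    return findings
```

The safer shape names both the verbs and, where possible, the objects: `get` with `resourceNames` limits a rule to specific secrets. Be aware that `list` and `watch` on secrets return every secret's contents, so enumerating read verbs still exposes every credential in scope.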

The first mistake is still invisible, making the next shortcut feel justified.

Hour 2: Service Account Token in CI

The admin creates a service account bound to cluster-admin, exports a long-lived token for it, and hardcodes that token in CI. Nobody pushed back because the shortcut looked harmless in the moment. The result: a CI pipeline compromise gives an attacker full cluster control, and the token never expires.

Footgun #3: Service Account Token in CI. Hardcoding a non-expiring cluster-admin token in CI means a single pipeline compromise hands an attacker full cluster control with no expiry.
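A sketch of the short-lived alternative, assuming dict-shaped manifests as before. `token_request` builds the body for the `TokenRequest` API (`POST /api/v1/namespaces/<ns>/serviceaccounts/<name>/token`), which issues tokens that expire; `kubectl create token <name> --duration=15m` does the same from the CLI. The 10-minute floor reflects API server validation, and the server may also shorten long requests. `binds_sa_to_cluster_admin` is a hypothetical helper that detects the Hour 2 binding pattern.

```python
from typing import Any, Optional

# The API server rejects TokenRequests shorter than 10 minutes.
MIN_EXPIRATION_SECONDS = 600


def token_request(ttl_seconds: int = 900, audiences: Optional[list[str]] = None) -> dict[str, Any]:
    """Body for POST .../serviceaccounts/<name>/token: a short-lived, expiring token."""
    if ttl_seconds < MIN_EXPIRATION_SECONDS:
        raise ValueError(f"ttl_seconds must be >= {MIN_EXPIRATION_SECONDS}")
    spec: dict[str, Any] = {"expirationSeconds": ttl_seconds}
    if audiences:
        # Omitting audiences lets the API server apply its default audience.
        spec["audiences"] = audiences
    return {"apiVersion": "authentication.k8s.io/v1", "kind": "TokenRequest", "spec": spec}


def binds_sa_to_cluster_admin(binding: dict[str, Any]) -> bool:
    """True for the Hour 2 pattern: a ServiceAccount bound to cluster-admin cluster-wide."""
    return (
        binding.get("kind") == "ClusterRoleBinding"
        and binding.get("roleRef", {}).get("name") == "cluster-admin"
        and any(s.get("kind") == "ServiceAccount" for s in binding.get("subjects", []))
    )
```

Issuing a fresh token per pipeline run limits a leaked token's useful life to minutes; federating CI identity through OIDC removes the stored secret entirely.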

Pressure is mounting. The team is behind schedule and cutting more corners.

Hour 3: No Audit Logging

The admin skips Kubernetes audit policy configuration because 'RBAC is enough.' The team had gotten away with similar shortcuts before, so nobody raised a flag. The result: unauthorized access goes undetected for weeks, and the security investigation has no evidence to work with.

Footgun #4: No Audit Logging. Skipping the audit policy because 'RBAC is enough' lets unauthorized access go undetected for weeks and leaves the security investigation with no evidence.
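A minimal audit policy sketch, built as a dict and serialized to JSON (which is valid YAML, so `kube-apiserver --audit-policy-file` can read it). Rules are evaluated in order and the first match wins. The rule set is an assumption to adapt: secrets are logged at `Metadata` so credentials never land in the log, exec and attach sessions are recorded, and RBAC changes keep full bodies. `level_for` is a simplified, hypothetical matcher that ignores users, verbs, and namespaces.

```python
import json
from typing import Any


def audit_policy() -> dict[str, Any]:
    """Build a minimal audit.k8s.io/v1 Policy; the first matching rule wins."""
    return {
        "apiVersion": "audit.k8s.io/v1",
        "kind": "Policy",
        # Skip the RequestReceived stage to avoid a duplicate event per request.
        "omitStages": ["RequestReceived"],
        "rules": [
            # Secrets and configmaps: record who touched them, never the payload.
            {"level": "Metadata",
             "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
            # Exec/attach into pods: record every session.
            {"level": "Metadata",
             "resources": [{"group": "", "resources": ["pods/exec", "pods/attach"]}]},
            # RBAC changes: keep full request and response bodies.
            {"level": "RequestResponse",
             "resources": [{"group": "rbac.authorization.k8s.io"}]},
            # Everything else: metadata only.
            {"level": "Metadata"},
        ],
    }


def level_for(policy: dict[str, Any], group: str, resource: str) -> str:
    """Simplified matcher: returns the level of the first rule matching group/resource."""
    for rule in policy["rules"]:
        selectors = rule.get("resources")
        if not selectors:
            return rule["level"]  # a rule without selectors matches everything
        for sel in selectors:
            if sel.get("group", "") == group and (
                not sel.get("resources") or resource in sel["resources"]
            ):
                return rule["level"]
    return "None"


if __name__ == "__main__":
    print(json.dumps(audit_policy(), indent=2))
```

The policy alone is not enough: without `--audit-log-path` (or a webhook backend) the API server has nowhere to write events.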

By hour 3, the compounding failures have reached critical mass. Pages fire. The war room fills up. The team scrambles to understand what went wrong while the system burns.

The Postmortem

Root Cause Chain

#	Mistake	Consequence	Could Have Been Prevented By
1	ClusterRole When Role Suffices	Team can exec into pods in every namespace, including production and security-sensitive namespaces	Primer: Use namespace-scoped Roles unless cluster-wide access is genuinely required
2	Wildcard Verb Permissions	Developer reads production database credentials from secrets in another team's namespace	Primer: Enumerate specific verbs (get, list, watch); never use wildcards on sensitive resources
3	Service Account Token in CI	CI pipeline compromise gives attacker full cluster control; token never expires	Primer: Use short-lived tokens, OIDC authentication, or bound service account tokens
4	No Audit Logging	Unauthorized access goes undetected for weeks; no evidence for the security investigation	Primer: Enable audit logging for authentication, authorization, and sensitive resource access

Damage Report

Downtime: 2-4 hours of pod-level or cluster-wide disruption
Data loss: Risk of volume data loss if StatefulSets were affected
Customer impact: Intermittent 5xx errors, dropped connections, or full service outage
Engineering time to remediate: 10-20 engineer-hours for incident response, rollback, and postmortem
Reputation cost: On-call fatigue; delayed feature work; possible SLA breach notification

What the Primer Teaches

Footgun #1: Had the engineer read the primer's section on ClusterRole When Role Suffices, they would have learned to use namespace-scoped Roles unless cluster-wide access is genuinely required.
Footgun #2: Had the engineer read the primer's section on Wildcard Verb Permissions, they would have learned to enumerate specific verbs (get, list, watch) and never use wildcards on sensitive resources.
Footgun #3: Had the engineer read the primer's section on Service Account Token in CI, they would have learned to use short-lived tokens, OIDC authentication, or bound service account tokens.
Footgun #4: Had the engineer read the primer's section on No Audit Logging, they would have learned to enable audit logging for authentication, authorization, and sensitive resource access.

Cross-References

Primer — The right way
Footguns — The mistakes catalogued
Street Ops — How to do it in practice