RBAC Footguns

Mistakes that create security holes, lock out legitimate users, or grant unintended cluster-wide access.


1. Granting cluster-admin to a service account

A Helm chart or quick-fix tutorial tells you to bind cluster-admin to a service account. A vulnerability in that workload gives the attacker full control over every resource in every namespace.

What happens: Complete cluster compromise from a single pod breach.

Why: cluster-admin grants all verbs on all resources. Any pod running as that service account owns the cluster.

How to avoid: Never bind cluster-admin to workload service accounts. Create a scoped Role with only the specific resources and verbs the workload needs.
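A minimal sketch of the scoped alternative (names and namespace are illustrative; adjust resources and verbs to what your workload actually calls):

```yaml
# A namespaced Role granting only deployment management, bound to a
# dedicated service account -- instead of cluster-admin.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: staging
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-deployer
subjects:
- kind: ServiceAccount
  name: app-sa        # hypothetical workload service account
  namespace: staging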

War story: In 2018, Tesla's Kubernetes cluster was compromised through an unauthenticated Kubernetes dashboard that had cluster-admin privileges. Attackers deployed cryptominers across the cluster and accessed S3 buckets containing telemetry data. The root cause was a single overprivileged service account on a dashboard pod.


2. ClusterRoleBinding when RoleBinding would suffice

You need a CI service account to deploy to the staging namespace. You create a ClusterRoleBinding instead of a RoleBinding. The service account can now deploy to every namespace, including production.

What happens: Unintended cross-namespace access. A staging CI token can modify production workloads.

Why: ClusterRoleBinding grants access across all namespaces. RoleBinding restricts to one namespace.

How to avoid: Default to RoleBinding. Only use ClusterRoleBinding when the subject genuinely needs cluster-wide access. The binding type determines scope, not the role type.
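A sketch of the namespaced grant (the service account name is illustrative). Note that a RoleBinding can reference a ClusterRole; the ClusterRole is just a reusable rule set, and the RoleBinding confines it to one namespace:

```yaml
# ci-sa gets the built-in edit role, but only inside staging.
# A ClusterRoleBinding to the same role would grant it everywhere.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deploy
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit          # built-in ClusterRole, reused per namespace
subjects:
- kind: ServiceAccount
  name: ci-sa         # hypothetical CI service account
  namespace: ci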


3. Forgetting subresources

You grant get on pods and expect users to be able to read logs. They get 403 Forbidden on kubectl logs. Pods and pods/log are separate resources in RBAC.

What happens: Users cannot perform expected operations despite appearing to have pod access.

Why: Subresources (pods/log, pods/exec, pods/portforward, deployments/scale) require separate rules.

How to avoid: Explicitly list subresources in your Role rules. Test with kubectl auth can-i get pods/log -n <ns>.
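A sketch of a Role that lists the subresources explicitly (names are illustrative):

```yaml
# pods and pods/log are distinct RBAC resources; each must appear.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader-with-logs
  namespace: staging
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]   # exec is a create on the subresource, not a get
```

The last rule is the other common surprise: kubectl exec requires the create verb on pods/exec, not get.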


4. Wildcard permissions in roles

You use apiGroups: ["*"], resources: ["*"], verbs: ["*"] because scoping is tedious. This is cluster-admin by another name.

What happens: Any subject bound to this role has unrestricted access. Security audits flag it immediately.

Why: Wildcards match everything, including future resources. A CRD added next week is automatically accessible.

How to avoid: Enumerate specific apiGroups, resources, and verbs. It takes more lines but is the only way to enforce least privilege.
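A sketch of the enumerated replacement (the resource list is illustrative; substitute what your workload actually touches):

```yaml
# Explicit rules instead of wildcards: new CRDs and new verbs are
# denied by default until someone deliberately adds them here.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-runtime
  namespace: staging
rules:
- apiGroups: [""]
  resources: ["configmaps", "services"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update"]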


5. Not revoking access when people leave

An engineer leaves the team. Their user bindings remain. Their OIDC token may still be valid (depending on token lifetime). They retain access to the cluster.

What happens: Stale access for former team members. Compliance violation.

Why: RBAC bindings are not automatically cleaned up when identity provider group memberships change.

How to avoid: Bind roles to groups, not individual users. When someone leaves the group, all bindings are implicitly revoked. Audit bindings quarterly.
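A sketch of a group-based binding (the group name is an illustrative OIDC claim; offboarding then happens in the identity provider, not in the cluster):

```yaml
# Bind to a group: removing the user from the IdP group revokes
# cluster access with no RBAC change required.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-edit
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: platform-team   # hypothetical group claim from the IdP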


6. Default service account accumulating permissions

Multiple Helm charts and operators bind roles to the default service account in a namespace. Every pod in that namespace (unless it specifies a different SA) inherits all those permissions.

What happens: Application pods get permissions they never needed. A breach in any pod grants access to all bound roles.

Why: The default SA is used by any pod that does not specify serviceAccountName.

How to avoid: Create dedicated service accounts for each workload. Set automountServiceAccountToken: false on the default SA. Audit bindings on default regularly.
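A sketch of both mitigations together (names are illustrative):

```yaml
# Neutralize the default SA so pods that forget serviceAccountName
# get no token, and give the workload its own account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: staging
automountServiceAccountToken: false
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: billing-worker   # hypothetical dedicated workload SA
  namespace: staging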


7. Granting escalate or bind verbs

You grant the escalate verb on roles to a service account so it can "manage RBAC." That SA can now create roles with more permissions than it has, effectively escalating to cluster-admin.

What happens: Privilege escalation path. The subject can grant itself any permission.

Why: escalate allows creating or updating a role that includes permissions the modifier does not itself hold. bind allows creating bindings to roles whose permissions the subject does not itself hold.

How to avoid: Never grant escalate or bind to workload service accounts. Only cluster administrators should have these verbs.

Under the hood: Kubernetes added the escalate verb in v1.12 to make privilege escalation an explicit, auditable grant. The API server refuses to let a subject create or update a Role containing permissions the subject does not already hold; the escalate verb lifts that check for whoever has it. If you audit your RBAC and see escalate granted anywhere, treat it as equivalent to cluster-admin.
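A sketch of an audit one-liner for finding these grants, assuming kubectl access and jq installed:

```shell
# List every Role/ClusterRole that grants escalate, bind, or *.
kubectl get clusterroles,roles -A -o json \
  | jq -r '.items[]
      | select(.rules[]?.verbs[]? | IN("escalate", "bind", "*"))
      | "\(.kind)/\(.metadata.name)"' \
  | sort -u
```

Anything this prints that is not a known administrative role deserves investigation.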


8. Token auto-mount on pods that never call the API

Your application pod has automountServiceAccountToken: true (the default). The pod never calls the Kubernetes API but the token is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. A file-read vulnerability leaks the token.

What happens: Leaked SA token can be used to call the API from outside the cluster.

Why: By default, every pod gets a mounted token whether it needs one or not.

How to avoid: Set automountServiceAccountToken: false on the ServiceAccount and only override it on pods that genuinely need API access.
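A sketch of the opt-in pattern (names and image are illustrative): disable mounting at the ServiceAccount, then re-enable it only on the pod that genuinely talks to the API.

```yaml
# Token mounting off by default for this SA...
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
automountServiceAccountToken: false
---
# ...and explicitly re-enabled on the one pod that needs it.
apiVersion: v1
kind: Pod
metadata:
  name: controller
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: true   # pod-level override wins
  containers:
  - name: controller
    image: example.com/controller:1.0  # illustrative image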


9. Testing RBAC only for positive cases

You verify that your CI service account can create deployments. You do not verify that it cannot delete namespaces, read secrets, or exec into pods. The role has more permissions than intended.

What happens: Overly permissive role goes undetected until a security audit or incident.

Why: RBAC testing typically asks only "does this work?" and never "is this blocked?"

How to avoid: Test both positive and negative cases with kubectl auth can-i. Automate these checks in CI against your RBAC manifests.
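A sketch of such a CI check, assuming kubectl access and an illustrative service account name:

```shell
#!/bin/sh
# Assert both allowed and denied operations for the CI deployer SA.
SA="system:serviceaccount:ci:deployer"   # hypothetical subject

# Positive case: this must be allowed.
kubectl auth can-i create deployments -n staging --as "$SA" \
  | grep -qx yes || { echo "FAIL: deploy should be allowed"; exit 1; }

# Negative cases: each of these must be denied.
kubectl auth can-i delete namespaces --as "$SA" \
  | grep -qx no || { echo "FAIL: can delete namespaces"; exit 1; }
kubectl auth can-i get secrets -n staging --as "$SA" \
  | grep -qx no || { echo "FAIL: can read secrets"; exit 1; }
kubectl auth can-i create pods/exec -n staging --as "$SA" \
  | grep -qx no || { echo "FAIL: can exec into pods"; exit 1; }

echo "RBAC checks passed"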


10. Editing built-in ClusterRoles directly

You modify the built-in view or edit ClusterRole to add permissions for your CRD. A Kubernetes upgrade resets the built-in roles. Your custom permissions disappear.

What happens: Permission loss after cluster upgrade. Workloads that relied on the custom rules break.

Why: Built-in ClusterRoles are reconciled by the API server on startup. Manual changes are overwritten.

How to avoid: Use aggregated ClusterRoles. Label your custom ClusterRole with rbac.authorization.k8s.io/aggregate-to-view: "true" and it is automatically merged into the view role.
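A sketch of the aggregation approach (the CRD group and resource are illustrative):

```yaml
# A separate ClusterRole that the API server automatically merges
# into the built-in view role -- survives upgrades and reconciliation.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: view-widgets
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["example.com"]   # hypothetical CRD API group
  resources: ["widgets"]
  verbs: ["get", "list", "watch"]
```

The same pattern works for edit and admin via the corresponding aggregate-to-edit and aggregate-to-admin labels.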