Portal | Level: L2: Operations | Topics: Multi-Tenancy Patterns, RBAC, Kubernetes Networking, Policy Engines | Domain: Kubernetes
Multi-Tenancy Patterns - Primer¶
Why This Matters¶
You have one Kubernetes cluster. Multiple teams, products, or customers need to share it. Without proper multi-tenancy controls, one tenant's runaway deployment consumes all the cluster's memory, their misconfigured network policy exposes another tenant's database, or their RBAC lets them list secrets in every namespace.
Multi-tenancy is not a feature you toggle on. It is a set of isolation boundaries you construct from Kubernetes primitives: namespaces, resource quotas, network policies, RBAC, and priority classes. Get it wrong and you have a shared cluster that is worse than separate clusters — all the complexity of sharing with none of the isolation guarantees.
Tenancy Models¶
There are three common models. Your choice depends on how much isolation you need and how much operational overhead you can absorb.
┌──────────────────────────────────────────────────────────────┐
│ TENANCY MODELS │
├───────────────┬──────────────────┬───────────────────────────┤
│ Namespace │ Virtual Cluster │ Dedicated Cluster │
│ per Tenant │ per Tenant │ per Tenant │
│ │ │ │
│ ┌─────────┐ │ ┌─────────────┐ │ ┌───────────────────┐ │
│ │ ns: acme│ │ │ vcluster: │ │ │ cluster: acme-prod│ │
│ │ ns: beta│ │ │ acme │ │ │ cluster: beta-prod│ │
│ │ ns: gamma│ │ │ vcluster: │ │ │ cluster: gamma │ │
│ │ │ │ │ beta │ │ │ │ │
│ └─────────┘ │ └─────────────┘ │ └───────────────────┘ │
│ │ │ │
│ Isolation: ★★ │ Isolation: ★★★★ │ Isolation: ★★★★★ │
│ Cost: $ │ Cost: $$ │ Cost: $$$$$ │
│ Complexity: ★ │ Complexity: ★★★ │ Complexity: ★★ │
└───────────────┴──────────────────┴───────────────────────────┘
Namespace per Tenant¶
The most common pattern. Each tenant gets one or more namespaces. Isolation is enforced through RBAC, network policies, and resource quotas at the namespace level.
- Pros: Simple, built-in, low overhead
- Cons: Namespace is a soft boundary — cluster-scoped resources are shared, network policies have gaps, and a compromised API server affects everyone
Virtual Cluster per Tenant¶
Tools like vCluster create lightweight, isolated Kubernetes API servers inside the host cluster. Each tenant sees their own cluster with their own control plane, but workloads run on shared nodes.
- Pros: Stronger isolation, tenants can install CRDs, independent cluster-admin
- Cons: More moving parts, higher resource overhead, requires operator maturity
Dedicated Cluster per Tenant¶
Full physical isolation. Each tenant gets their own cluster.
- Pros: Maximum isolation, no noisy-neighbor risk, independent upgrades
- Cons: Expensive, hard to manage at scale, fleet management tooling required
Analogy: Namespace isolation is like apartments in a building — separate living spaces but shared plumbing, electrical, and foundation. Virtual clusters are like townhouses — separate front doors and utilities but still share the land. Dedicated clusters are like separate houses on separate lots.
For most teams, namespace per tenant is the right starting point. The rest of this primer focuses on making that model robust.
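In practice, each tenant namespace carries labels that network policies and quotas can select on. A minimal sketch — the `tenant` and `environment` label keys are illustrative conventions, not Kubernetes requirements:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: acme
  labels:
    tenant: acme             # illustrative convention for selecting tenant namespaces
    environment: production  # illustrative
    # kubernetes.io/metadata.name: acme is set automatically by the API server
    # and is the label namespaceSelectors can always rely on
```
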
Resource Quotas¶
Without quotas, one namespace can consume every CPU core and byte of memory in the cluster. Resource quotas set hard limits per namespace.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: acme
spec:
  hard:
    # Compute
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    # Object counts
    pods: "100"
    services: "20"
    persistentvolumeclaims: "30"
    configmaps: "50"
    secrets: "50"
    # Storage
    requests.storage: 500Gi
    # GPU (if applicable)
    requests.nvidia.com/gpu: "4"
```
Default trap: Once a ResourceQuota tracks a compute resource (e.g. requests.cpu or limits.memory), every new pod in that namespace must declare a request or limit for that resource; pods that omit it are rejected at admission. This forces tenants to think about their resource needs — pair the quota with a LimitRange (below) so that omitted specs receive defaults instead of rejections.
Quota Scopes¶
Scope selectors narrow a quota to a subset of pods — for example, only pods running under a specific PriorityClass:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: burst-quota
  namespace: acme
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values: ["burst"]
```
Limit Ranges¶
While quotas cap the total for a namespace, LimitRanges constrain individual pods and containers:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-limits
  namespace: acme
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: 512Mi
      defaultRequest:
        cpu: "100m"
        memory: 128Mi
      max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: "50m"
        memory: 64Mi
    - type: Pod
      max:
        cpu: "8"
        memory: 16Gi
    - type: PersistentVolumeClaim
      max:
        storage: 50Gi
      min:
        storage: 1Gi
```
The default and defaultRequest fields inject resource specs into pods that omit them. This is your safety net — even if a tenant forgets to set requests, they get sensible defaults instead of unbounded allocation.
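For example, a container submitted with no resources block (pod name is illustrative) comes out of admission as if it had requested the LimitRange defaults above:

```yaml
# Submitted by the tenant (no resources block):
apiVersion: v1
kind: Pod
metadata:
  name: worker            # illustrative
  namespace: acme
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
# After admission, the LimitRange above has injected:
#   resources:
#     requests: { cpu: 100m, memory: 128Mi }
#     limits:   { cpu: 500m, memory: 512Mi }
```
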
Network Policies¶
By default, all pods can talk to all pods. In a multi-tenant cluster, this means tenant A's web server can reach tenant B's database. Network policies fix this.
Default Deny Everything¶
Start with deny-all, then open specific paths:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
Allow Intra-Namespace Traffic¶
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```
Allow DNS¶
Without DNS egress, pods cannot resolve service names:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to: []   # an empty 'to' matches all destinations on these ports
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```
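The policy above allows port 53 egress to any destination. A tighter variant scopes DNS egress to the cluster DNS pods — a sketch assuming the standard k8s-app: kube-dns label that CoreDNS and kube-dns deployments carry in kube-system:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-strict
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # standard label on CoreDNS/kube-dns pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Note that the namespaceSelector and podSelector sit in the same `to` entry, so both must match — splitting them into two entries would OR them instead.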
Allow Ingress from Specific Namespace¶
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: acme
spec:
  podSelector:
    matchLabels:
      role: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
```
Gotcha: Network policies require a CNI that supports them (Calico, Cilium, Weave). If you are on a CNI that does not enforce network policies (e.g., default kubenet), the policies exist as objects but have zero effect.
RBAC per Tenant¶
Each tenant needs:
- A Role (or ClusterRole) defining what they can do
- A RoleBinding (namespace-scoped) granting that role to their identity
```yaml
# Tenant-scoped role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: acme
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "services", "configmaps", "jobs", "cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]      # Read secrets, but not create/delete
  - apiGroups: [""]
    resources: ["pods/log", "pods/exec"]
    verbs: ["get", "create"]    # Can exec into pods and read logs
  - apiGroups: [""]
    resources: ["resourcequotas", "limitranges"]
    verbs: ["get", "list"]      # Can view their own quotas
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: acme-developers
  namespace: acme
subjects:
  - kind: Group
    name: acme-team
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```
What Tenants Should NOT Have¶
Tenants in a namespace-per-tenant model should never be granted:
- ClusterRoleBindings (access to all namespaces)
- Access to nodes, namespaces, persistentvolumes (cluster-scoped resources)
- The escalate and bind verbs on roles (privilege escalation)
- Access to secrets in kube-system

Interview tip: A common interview question is "how do you isolate tenants in a shared Kubernetes cluster?" The strong answer covers all four layers: RBAC (who can do what), ResourceQuotas (how much they can use), NetworkPolicies (who can talk to whom), and PriorityClasses (who survives under pressure). Missing any one of these is an incomplete answer.
Priority Classes¶
When the cluster is under pressure, the scheduler must decide which pods survive. Without priority classes, every pod runs at priority 0: the scheduler never preempts, and node-pressure eviction falls back to QoS class alone.
```yaml
# System-critical (cluster infrastructure)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: system-critical
value: 1000000
globalDefault: false
description: "Cluster infrastructure: monitoring, ingress, DNS"
---
# Tenant production workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-production
value: 100000
globalDefault: false
description: "Production tenant workloads"
---
# Tenant development/staging
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-development
value: 10000
globalDefault: true
description: "Development and staging workloads"
---
# Burst/preemptible workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-burst
value: 1000
globalDefault: false
preemptionPolicy: Never
description: "Best-effort burst workloads; evicted first"
```
Assign in pod specs:
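The priorityClassName field in the pod spec selects one of the classes defined above; the pod and image names here are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server          # illustrative
  namespace: acme
spec:
  priorityClassName: tenant-production  # must name an existing PriorityClass
  containers:
    - name: api
      image: registry.example.com/acme/api:1.4.2   # illustrative
      resources:
        requests:
          cpu: "250m"
          memory: 256Mi
```
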
When a high-priority pod cannot be scheduled because resources are unavailable, the scheduler preempts (evicts) lower-priority pods to make room. Without priority classes, your monitoring stack is just as evictable as a batch job.
Tenant Provisioning — Putting It All Together¶
When you onboard a new tenant, create these resources as a unit:
Tenant: "newcorp"
├── Namespace: newcorp
├── ResourceQuota: compute and object limits
├── LimitRange: per-container defaults and caps
├── NetworkPolicy: default-deny + explicit allows
├── Role: tenant-developer
├── RoleBinding: newcorp-team -> tenant-developer
├── ServiceAccount: newcorp-deployer (for CI/CD)
└── PriorityClass reference (use existing classes)
Automate this with a controller, Helm chart, or Crossplane composition. Manual namespace provisioning does not scale past five tenants.
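One way to keep the per-tenant bundle declarative is a kustomize overlay per tenant — a sketch assuming the manifests above are split into the listed files (the directory layout and filenames are illustrative):

```yaml
# tenants/newcorp/kustomization.yaml (layout illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: newcorp
resources:
  - namespace.yaml        # Namespace with tenant labels
  - quota.yaml            # ResourceQuota
  - limits.yaml           # LimitRange
  - netpol.yaml           # default-deny + explicit allows
  - rbac.yaml             # Role + RoleBinding
  - serviceaccount.yaml   # CI/CD deployer ServiceAccount
```

Applying the directory with `kubectl apply -k tenants/newcorp` creates or updates the whole tenant as one unit.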
Blast Radius Assessment¶
┌──────────────────────────────────────────────────────────┐
│ BLAST RADIUS MAP │
│ │
│ Tenant A runaway: │
│ ├── Without quotas: consumes all cluster compute ████████│
│ ├── With quotas: constrained to their allocation ██ │
│ │ │
│ Tenant A misconfigured netpol: │
│ ├── Without default-deny: can reach all pods ████████████│
│ ├── With default-deny: isolated to own namespace ██ │
│ │ │
│ Tenant A RBAC escalation: │
│ ├── With ClusterRoleBinding: reads all secrets ██████████│
│ ├── With namespace Role: reads only their secrets ██ │
└──────────────────────────────────────────────────────────┘
Every isolation mechanism you skip increases the blast radius of a single tenant's mistake or compromise. Multi-tenancy is not one mechanism — it is the intersection of quotas, network policies, RBAC, and priority.
Key Takeaways¶
- Namespace per tenant is the right default — upgrade to virtual clusters or dedicated clusters when the isolation model breaks
- Resource quotas are not optional — one unquoted namespace can starve the entire cluster
- LimitRanges provide defaults so pods without resource specs do not bypass quotas
- Network policies must start with default-deny — additive allow rules on top
- RBAC must be namespace-scoped (Role + RoleBinding), never ClusterRoleBinding for tenants
- Priority classes decide who survives under resource pressure — do not leave it to chance
- Automate tenant provisioning — manual setup drifts and breaks at scale
Wiki Navigation¶
Prerequisites¶
- Kubernetes Ops (Production) (Topic Pack, L2)
- Platform Engineering Patterns (Topic Pack, L2)
Related Content¶
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Kubernetes Networking, RBAC
- Policy Engines (OPA / Kyverno) (Topic Pack, L2) — RBAC, Policy Engines
- Track: Kubernetes Core (Reference, L1) — Kubernetes Networking, RBAC
- API Gateways & Ingress (Topic Pack, L2) — Kubernetes Networking
- Case Study: CNI Broken After Restart (Case Study, L2) — Kubernetes Networking
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured (Case Study, L2) — Kubernetes Networking
- Case Study: CoreDNS Timeout Pod DNS (Case Study, L2) — Kubernetes Networking
- Case Study: Grafana Dashboard Empty — Prometheus Blocked by NetworkPolicy (Case Study, L2) — Kubernetes Networking
- Case Study: Service Mesh 503s — Envoy Misconfigured, RBAC Policy (Case Study, L2) — Kubernetes Networking
- Case Study: Service No Endpoints (Case Study, L1) — Kubernetes Networking
Pages that link here¶
- Anti-Primer: Multi Tenancy
- Certification Prep: CKA — Certified Kubernetes Administrator
- Certification Prep: CKAD — Certified Kubernetes Application Developer
- Cilium
- Kubernetes Networking
- Kubernetes_Core
- Level 6: Advanced Platform Engineering
- Master Curriculum: 40 Weeks
- Multi-Tenancy Patterns
- Platform Engineering Patterns
- Policy Engines (OPA / Kyverno)
- Scenario: Ingress Returns 404 Intermittently
- Symptoms
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy