Portal | Level: L2: Operations | Topics: Multi-Tenancy Patterns, RBAC, Kubernetes Networking, Policy Engines | Domain: Kubernetes

Multi-Tenancy Patterns - Primer

Why This Matters

You have one Kubernetes cluster. Multiple teams, products, or customers need to share it. Without proper multi-tenancy controls, one tenant's runaway deployment consumes all the cluster's memory, their misconfigured network policy exposes another tenant's database, or their RBAC lets them list secrets in every namespace.

Multi-tenancy is not a feature you toggle on. It is a set of isolation boundaries you construct from Kubernetes primitives: namespaces, resource quotas, network policies, RBAC, and priority classes. Get it wrong and you have a shared cluster that is worse than separate clusters — all the complexity of sharing with none of the isolation guarantees.


Tenancy Models

There are three common models. Your choice depends on how much isolation you need and how much operational overhead you can absorb.

┌──────────────────────────────────────────────────────────────┐
│                     TENANCY MODELS                           │
├───────────────┬──────────────────┬───────────────────────────┤
│  Namespace    │  Virtual Cluster │    Dedicated Cluster      │
│  per Tenant   │  per Tenant      │    per Tenant             │
│               │                  │                           │
│  ┌─────────┐  │  ┌─────────────┐ │  ┌───────────────────┐   │
│  │ ns: acme│  │  │ vcluster:   │ │  │ cluster: acme-prod│   │
│  │ ns: beta│  │  │   acme      │ │  │ cluster: beta-prod│   │
│  │ns: gamma│  │  │ vcluster:   │ │  │ cluster: gamma    │   │
│  │         │  │  │   beta      │ │  │                   │   │
│  └─────────┘  │  └─────────────┘ │  └───────────────────┘   │
│               │                  │                           │
│ Isolation: ★★ │ Isolation: ★★★★ │ Isolation: ★★★★★          │
│ Cost: $       │ Cost: $$         │ Cost: $$$$$               │
│ Complexity: ★ │ Complexity: ★★★ │ Complexity: ★★            │
└───────────────┴──────────────────┴───────────────────────────┘

Namespace per Tenant

The most common pattern. Each tenant gets one or more namespaces. Isolation is enforced through RBAC, network policies, and resource quotas at the namespace level.

  • Pros: Simple, built-in, low overhead
  • Cons: Namespace is a soft boundary — cluster-scoped resources are shared, network policies have gaps, and a compromised API server affects everyone

Virtual Cluster per Tenant

Tools like vCluster create lightweight, isolated Kubernetes API servers inside the host cluster. Each tenant sees their own cluster with their own control plane, but workloads run on shared nodes.

  • Pros: Stronger isolation, tenants can install CRDs, independent cluster-admin
  • Cons: More moving parts, higher resource overhead, requires operator maturity

Dedicated Cluster per Tenant

Full physical isolation. Each tenant gets their own cluster.

  • Pros: Maximum isolation, no noisy-neighbor risk, independent upgrades
  • Cons: Expensive, hard to manage at scale, fleet management tooling required

Analogy: Namespace isolation is like apartments in a building — separate living spaces but shared plumbing, electrical, and foundation. Virtual clusters are like townhouses — separate front doors and utilities but still share the land. Dedicated clusters are like separate houses on separate lots.

For most teams, namespace per tenant is the right starting point. The rest of this primer focuses on making that model robust.
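A tenant namespace typically carries labels that later selectors and policies key on. A minimal sketch (the tenant and environment labels are illustrative conventions, not Kubernetes requirements):

apiVersion: v1
kind: Namespace
metadata:
  name: acme
  labels:
    # kubernetes.io/metadata.name is set automatically by the API server;
    # the labels below are illustrative tenant conventions
    tenant: acme
    environment: production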


Resource Quotas

Without quotas, one namespace can consume every CPU core and byte of memory in the cluster. Resource quotas set hard limits per namespace.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: acme
spec:
  hard:
    # Compute
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi

    # Object counts
    pods: "100"
    services: "20"
    persistentvolumeclaims: "30"
    configmaps: "50"
    secrets: "50"

    # Storage
    requests.storage: 500Gi

    # GPU (if applicable)
    requests.nvidia.com/gpu: "4"

Default trap: Once a ResourceQuota tracks compute resources (requests.cpu, limits.memory, and so on) in a namespace, every pod in that namespace must declare those requests and limits. Pods that omit them are rejected at admission. This forces tenants to think about their resource needs.
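Under such a quota, a pod must spell out its compute needs. A minimal sketch of a conforming pod (the name, image, and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: acme
spec:
  containers:
    - name: app
      image: registry.example.com/acme/api:1.0   # illustrative image
      resources:
        requests:
          cpu: "250m"       # counted against requests.cpu in the quota
          memory: 256Mi     # counted against requests.memory
        limits:
          cpu: "1"          # counted against limits.cpu
          memory: 1Gi       # counted against limits.memory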

Quota scopes narrow the target:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: burst-quota
  namespace: acme
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values: ["tenant-burst"]   # must name an existing PriorityClass

Limit Ranges

While quotas cap the total for a namespace, LimitRanges constrain individual pods and containers:

apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-limits
  namespace: acme
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: 512Mi
      defaultRequest:
        cpu: "100m"
        memory: 128Mi
      max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: "50m"
        memory: 64Mi
    - type: Pod
      max:
        cpu: "8"
        memory: 16Gi
    - type: PersistentVolumeClaim
      max:
        storage: 50Gi
      min:
        storage: 1Gi

The default and defaultRequest fields inject resource specs into pods that omit them. This is your safety net — even if a tenant forgets to set requests, they get sensible defaults instead of unbounded allocation.
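For example, given the LimitRange above, a container submitted with no resources block is admitted as if it had declared the defaults. A sketch of the server-side defaulting (not literal API output):

# What the tenant submits:
containers:
  - name: app
    image: nginx          # no resources block

# What the API server persists after LimitRange defaulting:
containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "100m"       # from defaultRequest
        memory: 128Mi
      limits:
        cpu: "500m"       # from default
        memory: 512Mi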


Network Policies

By default, all pods can talk to all pods. In a multi-tenant cluster, this means tenant A's web server can reach tenant B's database. Network policies fix this.

Default Deny Everything

Start with deny-all, then open specific paths:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Allow Intra-Namespace Traffic

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}

Allow DNS

Without DNS egress, pods cannot resolve service names:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to: []               # empty "to" matches all destinations
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
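The empty to: above allows port 53 traffic to any destination. If your CNI supports it, you can scope DNS egress to the cluster DNS pods instead. A sketch, assuming CoreDNS runs in kube-system with the conventional k8s-app: kube-dns label (verify the label in your cluster):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-scoped
  namespace: acme
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the same entry are ANDed
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53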

Allow Ingress from Specific Namespace

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: acme
spec:
  podSelector:
    matchLabels:
      role: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080

Gotcha: Network policies require a CNI that supports them (Calico, Cilium, Weave). If you are on a CNI that does not enforce network policies (e.g., default kubenet), the policies exist as objects but have zero effect.


RBAC per Tenant

Each tenant needs:

  • A Role (or ClusterRole) defining what they can do
  • A RoleBinding (namespace-scoped) granting that role to their identity

# Tenant-scoped role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: acme
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "services", "configmaps", "jobs", "cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]       # Read secrets, but not create/delete
  - apiGroups: [""]
    resources: ["pods/log", "pods/exec"]
    verbs: ["get", "create"]     # Can exec into pods and read logs
  - apiGroups: [""]
    resources: ["resourcequotas", "limitranges"]
    verbs: ["get", "list"]       # Can view their own quotas

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: acme-developers
  namespace: acme
subjects:
  - kind: Group
    name: acme-team
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
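The provisioning checklist later in this primer also calls for a CI/CD ServiceAccount. A minimal sketch of one bound to the same role (the deployer name is illustrative):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: acme-deployer       # illustrative name for the CI/CD identity
  namespace: acme
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: acme-deployer-binding
  namespace: acme
subjects:
  - kind: ServiceAccount
    name: acme-deployer
    namespace: acme
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io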

What Tenants Should NOT Have

  • ClusterRole bindings (access to all namespaces)
  • Access to nodes, namespaces, persistentvolumes (cluster-scoped)
  • escalate, bind verbs on roles (privilege escalation)
  • Access to secrets in kube-system

Interview tip: A common interview question is "how do you isolate tenants in a shared Kubernetes cluster?" The strong answer covers all four layers: RBAC (who can do what), ResourceQuotas (how much they can use), NetworkPolicies (who can talk to whom), and PriorityClasses (who survives under pressure). Missing any one of these is an incomplete answer.

Priority Classes

When the cluster is under pressure, the scheduler must decide which pods survive. Without priority classes, eviction is arbitrary.

# System-critical (cluster infrastructure)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: system-critical
value: 1000000
globalDefault: false
description: "Cluster infrastructure: monitoring, ingress, DNS"

---
# Tenant production workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-production
value: 100000
globalDefault: false
description: "Production tenant workloads"

---
# Tenant development/staging
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-development
value: 10000
globalDefault: true          # fallback for pods that set no priorityClassName
description: "Development and staging workloads"

---
# Burst/preemptible workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-burst
value: 1000
globalDefault: false
preemptionPolicy: Never
description: "Best-effort burst workloads; evicted first"

Assign in pod specs:

spec:
  priorityClassName: tenant-production

When a high-priority pod needs scheduling and no resources are available, the scheduler preempts (evicts) lower-priority pods to make room. Without priority classes, your monitoring stack is as likely to be evicted as a batch job.


Tenant Provisioning — Putting It All Together

When you onboard a new tenant, create these resources as a unit:

Tenant: "newcorp"
├── Namespace: newcorp
├── ResourceQuota: compute and object limits
├── LimitRange: per-container defaults and caps
├── NetworkPolicy: default-deny + explicit allows
├── Role: tenant-developer
├── RoleBinding: newcorp-team -> tenant-developer
├── ServiceAccount: newcorp-deployer (for CI/CD)
└── PriorityClass reference (use existing classes)

Automate this with a controller, Helm chart, or Crossplane composition. Manual namespace provisioning does not scale past five tenants.
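One way to automate this is a Helm chart that stamps out the whole bundle from a per-tenant values file. A sketch of such a values file; the chart and every key shown are hypothetical:

# values-newcorp.yaml: input to a hypothetical tenant-onboarding chart
tenant:
  name: newcorp
  team: newcorp-team            # maps to the RoleBinding subject
quota:
  requests.cpu: "20"
  requests.memory: 40Gi
  limits.cpu: "40"
  limits.memory: 80Gi
network:
  allowFromNamespaces:
    - ingress-nginx
ci:
  serviceAccount: newcorp-deployer

Onboarding then collapses to a single command per tenant, e.g. helm install newcorp ./tenant-chart -f values-newcorp.yaml (chart path hypothetical).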


Blast Radius Assessment

┌──────────────────────────────────────────────────────────┐
│                   BLAST RADIUS MAP                        │
│                                                           │
│  Tenant A runaway:                                        │
│  ├── Without quotas: consumes all cluster compute ████████│
│  ├── With quotas: constrained to their allocation ██      │
│  │                                                        │
│  Tenant A misconfigured netpol:                           │
│  ├── Without default-deny: can reach all pods ████████████│
│  ├── With default-deny: isolated to own namespace ██      │
│  │                                                        │
│  Tenant A RBAC escalation:                                │
│  ├── With ClusterRoleBinding: reads all secrets ██████████│
│  ├── With namespace Role: reads only their secrets ██     │
└──────────────────────────────────────────────────────────┘

Every isolation mechanism you skip increases the blast radius of a single tenant's mistake or compromise. Multi-tenancy is not one mechanism — it is the intersection of quotas, network policies, RBAC, and priority.


Key Takeaways

  1. Namespace per tenant is the right default — upgrade to virtual clusters or dedicated clusters when the isolation model breaks
  2. Resource quotas are not optional — one unquoted namespace can starve the entire cluster
  3. LimitRanges provide defaults so pods without resource specs do not bypass quotas
  4. Network policies must start with default-deny — additive allow rules on top
  5. RBAC must be namespace-scoped (Role + RoleBinding), never ClusterRoleBinding for tenants
  6. Priority classes decide who survives under resource pressure — do not leave it to chance
  7. Automate tenant provisioning — manual setup drifts and breaks at scale
