Multi-Tenancy Patterns Footguns

  1. No resource quotas on tenant namespaces. A tenant deploys a workload with no resource limits. During a traffic spike, it consumes every available CPU core and gigabyte of memory on the node. Other tenants' pods get OOMKilled or cannot schedule. The cluster appears "full" but one tenant is using 80% of it.

Fix: Apply a ResourceQuota to every tenant namespace at provisioning time. Include both compute resources (requests.cpu, requests.memory, limits.cpu, limits.memory) and object counts (pods, services, persistentvolumeclaims). Automate this so no namespace exists without a quota.
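A quota of this shape covers both dimensions. The tenant-a namespace and all numbers below are illustrative placeholders, not sizing recommendations:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a
spec:
  hard:
    # Compute: caps the sum across all pods in the namespace.
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    # Object counts: bounds API-object sprawl per tenant.
    pods: "50"
    services: "20"
    persistentvolumeclaims: "10"
```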

  2. Everyone deploying to the default namespace. Teams skip namespace creation and deploy everything to default. There is no RBAC isolation, no quota enforcement, and no way to distinguish Tenant A's pods from Tenant B's. Cleanup requires manual identification of every resource by label, assuming labels exist.

Fix: Lock down the default namespace with a restrictive ResourceQuota (zero pods allowed) or a policy engine rule that denies deployments to default. Make namespace creation part of the onboarding process.

Debug clue: to find resources in default that shouldn't be there, run kubectl get all -n default --no-headers | grep -v kubernetes (note that get all covers only the common workload resource types, not everything). The kubernetes service is the only resource that belongs in default; everything else is likely an accident. Audit this periodically — CI pipelines that don't pass -n deploy to whatever namespace the kubeconfig context points at, which is usually default.
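A zero-pod quota is a one-object way to make default unusable for workloads while leaving the kubernetes service untouched; a sketch:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: lockdown
  namespace: default
spec:
  hard:
    pods: "0"   # any pod creation in default is rejected at admission
```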

  3. No network policies — flat network between all tenants. Every pod can reach every other pod on every port. Tenant A's compromised web server scans the internal network and finds Tenant B's unauthed Redis on port 6379. Data exfiltrated. No one knew because there were no network boundaries to alert on.

Fix: Apply a default-deny ingress AND egress NetworkPolicy to every tenant namespace. Then add explicit allow rules for required traffic: intra-namespace, DNS (port 53), and specific cross-namespace paths. Verify your CNI enforces policies (Calico, Cilium).

Default trap: The default CNI in many managed Kubernetes services (including older EKS with amazon-vpc-cni) does NOT enforce NetworkPolicy. You can create and apply NetworkPolicy objects — they're accepted by the API server with no errors — but they have zero effect on traffic. Verify enforcement by creating a deny-all policy and testing connectivity. If traffic still flows, your CNI doesn't enforce policies.
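A minimal sketch of the default-deny pair plus the DNS allowance, with tenant-a as a placeholder namespace. The first policy also doubles as the enforcement test described above:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-a
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress             # no rules listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - ports:             # DNS must be allowed or name resolution breaks
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```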

  4. Using ClusterRoleBindings for tenant access. You grant a tenant the edit ClusterRole via a ClusterRoleBinding because it was faster than creating namespace-scoped roles. That tenant now has edit access to every namespace in the cluster, including kube-system, monitoring, and other tenants' namespaces.

Fix: Never use ClusterRoleBindings for tenant access. Always use namespace-scoped RoleBindings. Create a Role in the tenant's namespace with exactly the permissions they need. Use ClusterRoles as templates but bind them with RoleBindings.
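For example, a RoleBinding can reference the built-in edit ClusterRole as a template while scoping its effect to a single namespace. The group name here is a placeholder:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-a-edit
  namespace: tenant-a      # permissions apply only inside this namespace
subjects:
  - kind: Group
    name: tenant-a-devs    # placeholder: your IdP group for the tenant
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit               # built-in ClusterRole used as a template
  apiGroup: rbac.authorization.k8s.io
```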

  5. No priority classes — eviction is a lottery. The cluster runs out of resources during a spike. The kubelet evicts pods under node pressure, and the scheduler preempts pods to make room for pending ones. Without priority classes, the order is essentially arbitrary. Your monitoring stack, ingress controllers, and critical tenant workloads get killed alongside batch jobs and dev experiments.

Fix: Define at least three priority classes: system-critical (monitoring, ingress, DNS), tenant-production (production workloads), and tenant-burst (dev, batch, preemptible). Assign them in pod specs. Set preemptionPolicy: Never on burst workloads so they cannot evict others.
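A sketch of two of the three classes; names and values are illustrative, but the relative ordering is what matters:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-production
value: 100000              # higher value = preempted/evicted later
description: "Production tenant workloads."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-burst
value: 1000
preemptionPolicy: Never    # burst pods may be preempted but never preempt others
description: "Dev, batch, and preemptible workloads."
```

Pods opt in via spec.priorityClassName; pods with no class get priority 0 unless a class sets globalDefault: true.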

  6. LimitRange defaults that are too low for production workloads. You set a LimitRange default memory limit of 256Mi because it seemed reasonable. A tenant deploys a Java application that needs 2Gi. The container gets OOMKilled repeatedly. The tenant does not realize the limit was injected by the LimitRange because they never set one themselves.

Fix: Set LimitRange defaults that are generous enough for common workloads (e.g., 512Mi-1Gi memory). Document the defaults prominently. Use max to cap the upper bound rather than relying on tight defaults. Train tenants to always specify explicit resource requests and limits.
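A LimitRange along those lines; all values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      default:             # injected limit when the pod spec omits one
        memory: 1Gi
        cpu: "1"
      defaultRequest:      # injected request when the pod spec omits one
        memory: 512Mi
        cpu: 250m
      max:                 # hard cap regardless of what the tenant requests
        memory: 4Gi
        cpu: "4"
```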

  7. Network policy namespaceSelector matching on names instead of labels. You write a NetworkPolicy that allows traffic from namespace "ingress-nginx" using a namespaceSelector. But namespaceSelector matches on labels, not names. If the ingress namespace does not have the expected label, the policy matches nothing and traffic is denied.

Fix: Always verify namespace labels: kubectl get namespace --show-labels. Use the automatic label kubernetes.io/metadata.name (available since k8s 1.21) which matches the namespace name. Or apply explicit labels during namespace provisioning and reference those in policies.
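A policy matching the ingress namespace by its automatic name label might look like this (tenant-a is a placeholder):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Automatic label set by Kubernetes 1.21+; always equals
              # the namespace's name, so this cannot drift.
              kubernetes.io/metadata.name: ingress-nginx
```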

  8. Quotas set but never monitored. You provision quotas during onboarding and forget about them. Six months later, a tenant is at 95% of their CPU quota. They try to scale during an incident and cannot. The quota rejection error is buried in deployment events that nobody watches.

Fix: Export quota usage as Prometheus metrics (kube_resourcequota). Alert when any quota exceeds 80% usage. Include quota utilization in tenant dashboards. Review and adjust quotas quarterly based on actual usage patterns.
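A Prometheus alerting rule sketch, assuming kube-state-metrics is installed (it exports kube_resourcequota as paired type="used" and type="hard" series sharing namespace, resourcequota, and resource labels):

```yaml
groups:
  - name: tenant-quotas
    rules:
      - alert: TenantQuotaNearLimit
        expr: |
          kube_resourcequota{type="used"}
            / on (namespace, resourcequota, resource)
          kube_resourcequota{type="hard"} > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Quota {{ $labels.resource }} in {{ $labels.namespace }} is above 80%"
```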

  9. Allowing pods to mount hostPath volumes. A tenant creates a pod that mounts / as a hostPath volume. They now have read-write access to the host filesystem — every other tenant's data, kubelet credentials, container runtime socket. Full cluster compromise from a single pod spec.

Fix: Use a policy engine (OPA/Gatekeeper, Kyverno, or Pod Security Standards) to block hostPath volumes in tenant namespaces. Enforce the restricted Pod Security Standard. Only allow hostPath in system namespaces for specific workloads like monitoring agents.

CVE: CVE-2017-1002101 — Kubernetes allowed containers with subPath volume mounts to access files outside the intended path via symlink attacks. A tenant could create a symlink inside their volume pointing to the host root filesystem, then access arbitrary host files. Fixed in Kubernetes 1.9.4+, 1.10.0+. This CVE is a reminder that even without hostPath, volume mount features can be exploited. Keep Kubernetes patched and enforce Pod Security Standards.
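Of the policy options in the fix above, the built-in Pod Security admission is the lowest-friction: enforcing the restricted standard on a tenant namespace rejects hostPath volumes along with other privileged features. Namespace name is a placeholder:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    # Pods violating the "restricted" standard are rejected at admission.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```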

  10. Not isolating tenant service accounts. Every namespace gets a default service account automatically. If you do not disable auto-mounting of service account tokens, every pod gets a token that can query the Kubernetes API. Combined with overly permissive RBAC, this turns any compromised pod into an API client.

Fix: Disable auto-mounting on the default service account: automountServiceAccountToken: false. Create dedicated service accounts for workloads that actually need API access. Apply RBAC to those specific service accounts with least-privilege permissions.
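A sketch of both halves, with tenant-a and the app-api-client account name as placeholders:

```yaml
# Stop auto-mounting API tokens into pods using the default account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: tenant-a
automountServiceAccountToken: false
---
# Dedicated account for a workload that genuinely needs API access;
# grant it a least-privilege Role via a RoleBinding, not cluster-wide RBAC.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-api-client
  namespace: tenant-a
```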