Comparison: CNI Plugins
Category: Networking
Last reviewed: 2026-03
Verdict (opinionated): Cilium for new clusters — eBPF-based networking is the future, and it comes with observability (Hubble) and policy enforcement built in. Calico for proven stability and the broadest deployment base. Flannel only for simple, small clusters.
Quick Decision Matrix
| Factor | Calico | Cilium | Flannel | AWS VPC CNI |
|---|---|---|---|---|
| Learning curve | Medium | Medium-High | Low | Low (AWS-specific) |
| Operational overhead | Low-Medium | Medium | Low | Low (AWS-managed) |
| Cost at small scale | Free | Free | Free | Free (AWS node cost) |
| Cost at large scale | Free + ops | Free + ops | Free + ops | Free (but IP exhaustion risk) |
| Community/ecosystem | Large (Tigera) | Large (Isovalent/Cisco) | Moderate | AWS-only |
| Hiring | Easy | Growing rapidly | Easy | AWS engineers |
| Network policy | Full (Calico + K8s) | Full (Cilium + K8s) + L7 | None | Security groups for pods |
| Dataplane | iptables / eBPF / Bird | eBPF | VXLAN overlay | AWS ENI |
| Encryption | WireGuard | WireGuard / IPsec | None | VPC encryption |
| Observability | Basic (Flow logs via Enterprise) | Hubble (excellent) | None | VPC Flow Logs |
| Performance | Excellent (eBPF mode) | Excellent (eBPF native) | Good (VXLAN overhead) | Excellent (native VPC) |
| Service mesh | None (separate) | Built-in option | None | None |
| BGP support | Yes (native) | Yes | No | No |
| Multi-cluster | Calico Federation | ClusterMesh | No | VPC peering |
| IP management | IPAM (Calico IPAM or host-local) | IPAM (Cilium IPAM or host-local) | host-local | ENI-based (VPC IPs) |
When to Pick Each
Pick Calico when:
- You want the most battle-tested CNI with the longest production track record
- BGP peering with physical network infrastructure is required (data center, bare metal)
- You need network policy enforcement but do not need L7 (HTTP-aware) policies
- Your nodes run older kernels where eBPF is not fully supported
- You want multiple dataplane options: iptables (proven), eBPF (newer), or VPP
- Enterprise support from Tigera is available if needed
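Calico enforces the standard Kubernetes NetworkPolicy API without extra CRDs for the basic case. A minimal sketch of the kind of policy it enforces (namespace, labels, and port are illustrative placeholders):

```yaml
# Standard Kubernetes NetworkPolicy, enforced by Calico out of the box.
# All names (namespace, labels, port) are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Note this is L3/L4 only: it controls which pods may reach port 8080, not which HTTP methods or paths they may use.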
Pick Cilium when:
- You are building a new cluster and want the most modern networking stack
- eBPF-based networking and security are priorities — kernel-level processing without iptables overhead
- Hubble observability (network flow visualization, DNS monitoring, HTTP metrics) is valuable
- You want optional service mesh capabilities without adding another component
- L7-aware network policies (e.g., allow GET /api/v1/* but deny POST) are needed
- You plan to use ClusterMesh for multi-cluster networking
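The L7 case above maps to a CiliumNetworkPolicy along these lines (labels, port, and path regex are illustrative placeholders; once an HTTP rule is attached to a port, requests matching no rule, such as POST, are denied):

```yaml
# CiliumNetworkPolicy sketch: L7-aware ingress rule.
# Labels, port, and path are placeholders.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-only
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"
```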
Pick Flannel when:
- You need the simplest possible CNI with minimal configuration
- Your cluster is small (< 50 nodes) and network policy is not needed
- You are running K3s, Kind, or a local development cluster
- You want a VXLAN overlay that "just works" without tuning
- You do NOT need network policy enforcement (Flannel does not support it)
Pick AWS VPC CNI when:
- You are on EKS and want pods with native VPC IP addresses
- Security groups for pods (per-pod network policy via AWS SGs) is your security model
- Integration with AWS networking (ALB target groups in IP mode, PrivateLink) is important
- You want the lowest latency — no overlay, no encapsulation, native VPC routing
- You accept the IP address management constraints (limited IPs per ENI per instance type)
Nobody Tells You
Calico
- Calico has three dataplanes: iptables (default, proven), eBPF (newer, requires kernel 5.3+), and VPP (niche, high performance). Most production deployments use iptables, not eBPF. If you want eBPF, Cilium is the more mature choice.
- At scale (1000+ nodes), iptables rules become a performance bottleneck. Each NetworkPolicy adds iptables rules, and rule evaluation is linear. This is why Calico added eBPF mode.
- Calico's BIRD BGP agent is powerful for peering with physical routers but is another component to debug. BGP sessions flapping cause route convergence delays that affect pod connectivity.
- Calico's default IPAM can lead to IP address fragmentation across nodes. Nodes get IP blocks allocated and do not return unused IPs efficiently.
- The Calico NetworkPolicy spec extends K8s NetworkPolicy with additional features (global policies, DNS-based rules, application-layer policies). This is convenient but creates Calico-specific manifests.
- Calico Enterprise (Tigera) adds compliance reporting, threat detection, and a management UI. The OSS version is solid but lacks visibility tooling.
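As an example of the Calico-specific manifests mentioned above, a GlobalNetworkPolicy applies cluster-wide, which plain Kubernetes NetworkPolicy cannot express. The sketch below is hypothetical (the metadata-endpoint example is illustrative, not a recommended baseline):

```yaml
# Calico GlobalNetworkPolicy sketch: a cluster-wide egress rule.
# This is Calico's v3 API, not portable to other CNIs.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-egress-to-metadata
spec:
  selector: all()
  types:
    - Egress
  egress:
    # Block pod egress to the cloud instance metadata endpoint...
    - action: Deny
      destination:
        nets:
          - 169.254.169.254/32
    # ...and allow everything else.
    - action: Allow
```

Convenient, but a cluster relying on such policies cannot swap CNIs without rewriting them.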
Cilium
- Cilium requires Linux kernel 5.10+ for full eBPF functionality. On older kernels, features degrade or fail entirely. Verify your node kernel version before deployment.
- Cilium's eBPF programs are loaded into the kernel. A bug in a Cilium eBPF program can affect kernel-level networking for all pods on a node. This is rare but the blast radius is larger than userspace proxies.
- Hubble is Cilium's observability layer and it is genuinely excellent — real-time network flow visualization, DNS query monitoring, HTTP request/response metrics. But Hubble adds resource consumption and storage requirements.
- Cilium upgrades require restarting the Cilium agent on each node, which briefly disrupts eBPF programs. Pods stay running but new connections may be delayed. Use a rolling upgrade strategy.
- Cilium's identity-based security model (labels → identity → policy) is more scalable than IP-based policies but requires understanding a new abstraction. "Why was this connection denied?" requires understanding identity resolution.
- The Isovalent acquisition by Cisco introduced enterprise features behind a license. Core Cilium remains open source, but advanced features (like Tetragon security observability, enterprise Hubble) are commercial.
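Given the kernel requirement above, it is worth gating a Cilium rollout on node kernel versions. A minimal sketch of the version check (the 5.10 threshold comes from the text above; Cilium's exact per-feature kernel matrix varies):

```python
def kernel_supports_full_ebpf(release: str, minimum=(5, 10)) -> bool:
    """Decide whether a `uname -r`-style string (e.g. '5.15.0-91-generic')
    meets the minimum kernel version Cilium needs for full eBPF support."""
    parts = release.split(".")
    major = int(parts[0])
    # Keep only the leading digits of the minor component ('10-arch1' -> '10')
    minor_digits = ""
    for ch in parts[1]:
        if not ch.isdigit():
            break
        minor_digits += ch
    return (major, int(minor_digits)) >= minimum

print(kernel_supports_full_ebpf("5.15.0-91-generic"))      # True
print(kernel_supports_full_ebpf("4.18.0-477.el8.x86_64"))  # False
```

In practice you would feed this the kernel versions reported under each node's `.status.nodeInfo.kernelVersion`.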
Flannel
- Flannel has no network policy support. Zero. If you need any pod-to-pod access control, you need a different CNI or a policy-only add-on like Calico (but that combination adds complexity).
- VXLAN overlay adds encapsulation overhead (~50 bytes per packet). For latency-sensitive workloads, this matters.
- Flannel is a mature, stable project that does one thing (overlay networking) well. It is not declining — it is just simple. For environments where that simplicity is valued, it is the right choice.
- Flannel's lack of features is actually a feature for development and CI clusters. There is nothing to misconfigure.
- Flannel does not provide any observability into network flows. You are blind to pod-to-pod traffic patterns.
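The ~50 bytes of VXLAN overhead mentioned above breaks down as inner Ethernet + VXLAN + UDP + outer IPv4 headers, which is why Flannel's pod interfaces default to MTU 1450 on a standard 1500-byte network:

```python
# Byte accounting for VXLAN encapsulation on an IPv4 underlay.
# The inner Ethernet frame is wrapped in VXLAN + UDP + outer IP headers;
# the outer Ethernet header does not count against the underlay MTU.
INNER_ETH = 14   # encapsulated inner Ethernet header
VXLAN_HDR = 8    # VXLAN header (flags + VNI)
OUTER_UDP = 8    # outer UDP header
OUTER_IPV4 = 20  # outer IPv4 header

overhead = INNER_ETH + VXLAN_HDR + OUTER_UDP + OUTER_IPV4
pod_mtu = 1500 - overhead  # effective MTU inside the overlay

print(overhead, pod_mtu)  # prints: 50 1450
```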
AWS VPC CNI
- The VPC CNI allocates ENI (Elastic Network Interface) secondary IPs to pods. Each EC2 instance type has a maximum ENI count and a maximum number of IPs per ENI, and each ENI's primary IP is reserved for the node itself. On small instances (t3.small: 3 ENIs with 4 IPv4 addresses each, leaving 9 usable pod IPs), you run out fast.
- The `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` settings control how many IPs are pre-allocated. Default behavior aggressively pre-allocates IPs, which can exhaust your subnet CIDR faster than expected.
- VPC CNI's prefix delegation mode (assigning /28 prefixes instead of individual IPs) dramatically increases pod density per node but requires careful subnet planning.
- Pods with VPC IPs are directly addressable from outside the cluster (within the VPC). This is convenient for ALB IP-mode targeting, but it means your pods sit on the VPC network with no overlay isolation.
- Custom networking (specifying different subnets for pods vs. nodes) works but the setup is complex and easy to misconfigure.
- If your EKS nodes span multiple availability zones and subnets, pods in different AZs can communicate, but that traffic crosses AZ boundaries and incurs cross-AZ data transfer charges.
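The per-instance pod limits above follow from the max-pods formula AWS publishes for the default VPC CNI mode: each ENI's primary IP is reserved, plus 2 for host-network pods. A quick sketch (the m5.large figures are the commonly cited ENI limits for that type):

```python
def eks_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """EKS max-pods formula for the default (non-prefix-delegation) VPC CNI:
    each ENI's primary IP is reserved for the node, and 2 is added for
    host-network pods such as kube-proxy and aws-node."""
    return enis * (ipv4_per_eni - 1) + 2

print(eks_max_pods(3, 4))   # t3.small: 3 ENIs x 4 IPs -> 11 pods
print(eks_max_pods(3, 10))  # m5.large: 3 ENIs x 10 IPs -> 29 pods
```

Prefix delegation changes this math dramatically by assigning /28 prefixes (16 addresses) per slot, at which point the node limit is usually capped by kubelet settings rather than by IPs.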
Migration Pain Assessment
| From → To | Effort | Risk | Timeline |
|---|---|---|---|
| Flannel → Calico | Medium | Medium | 1-2 days (cluster recreation) |
| Flannel → Cilium | Medium | Medium | 1-2 days (cluster recreation) |
| Calico → Cilium | High | High | 1-3 days (rolling or recreation) |
| Cilium → Calico | High | High | 1-3 days (rolling or recreation) |
| VPC CNI → Cilium/Calico | Very High | High | Full cluster rebuild |
| Any → VPC CNI | Very High | High | Full cluster rebuild |
CNI migration usually means cluster recreation. In-place CNI swap is technically possible but extremely risky — if networking breaks mid-migration, every pod loses connectivity. The safest path is standing up a new cluster with the target CNI and migrating workloads.
The Interview Answer
"For new clusters, I lean toward Cilium because eBPF-based networking is more efficient than iptables chains, and Hubble gives you network observability that no other CNI includes. Calico is the safe, proven choice with the longest track record and the broadest deployment base. The CNI decision is one of the most permanent choices you make — changing it later usually means rebuilding the cluster. So choose based on your long-term needs: if you want network policy, observability, and potentially service mesh from the CNI layer, Cilium. If you want proven stability and BGP integration, Calico. If you want simplicity for a dev cluster, Flannel."
Cross-References
- Topic Packs: Cilium, K8s Networking, eBPF Observability
- Related Comparisons: Service Meshes, Ingress Controllers