# Linux Networking: Bridges, Bonds, and VLANs
- lesson
- network-namespaces
- veth-pairs
- linux-bridges
- bonding/lacp
- vlans
- macvlan/ipvlan
- tap/tun
- docker-networking
- ovs
- tc
- kubernetes-cni
Topics: network namespaces, veth pairs, Linux bridges, bonding/LACP, VLANs, macvlan/ipvlan, tap/tun, Docker networking, OVS, tc, Kubernetes CNI
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None (everything is explained from scratch)
## The Mission
You just inherited a bare-metal server that needs to host four isolated tenant workloads. Each tenant gets its own network segment. Two tenants need VLAN access to the physical network. The server has two 10G NICs that should be bonded for redundancy. And the whole thing needs to resemble — at a conceptual level — what Docker and Kubernetes do under the hood.
By the end of this lesson, you'll have built the whole setup from scratch using nothing
but ip commands. More importantly, you'll understand why container networking works
the way it does, because you'll have built it yourself, piece by piece:
- Network namespaces: the isolation primitive that makes containers possible
- veth pairs: the virtual cables that connect isolated worlds
- Linux bridges: the software switches that tie everything together
- VLANs: Layer 2 segmentation on a single wire
- Bonding: turning two NICs into one for redundancy and bandwidth
- How Docker's bridge networking is just namespaces + veth + bridge + iptables
- Where Kubernetes CNI picks up the story
We start with a single namespace. We end with a multi-tenant network. Let's go.
## Part 1: Network Namespaces — Your Own Private Network Stack
Every process on Linux shares the same network stack by default — the same interfaces, the same routing table, the same iptables rules. Network namespaces change that. A namespace gets its own everything: interfaces, routes, ARP table, firewall rules, sockets. It's a complete network stack in a box.
Name Origin: The first Linux namespace (mount, 2002) used the flag `CLONE_NEWNS` — "new namespace" — because nobody expected there would be more than one type. Every subsequent namespace got a more specific name: `CLONE_NEWPID`, `CLONE_NEWNET`, etc. The mount namespace is still stuck with the generic flag as a historical accident.
Let's create one:
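Creating the namespace is a single command (the name `tenant1` is the one this lesson uses throughout):

```shell
# Create a network namespace named tenant1
ip netns add tenant1

# It now shows up in the namespace list
ip netns list
```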
Now look inside it:
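Listing the link devices from inside the namespace shows just how empty it is:

```shell
# Run `ip link` inside the tenant1 namespace
ip netns exec tenant1 ip link
```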
Output:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
That's it. One loopback interface, and it's DOWN. No eth0. No routes. No connectivity. This namespace is completely isolated from the host and from every other namespace.
# Bring up loopback (you'll need this for local communication)
ip netns exec tenant1 ip link set lo up
# Check the routing table — it's empty
ip netns exec tenant1 ip route
Nothing comes back. This namespace can't reach anything. That's the point.
Mental Model: Think of a network namespace as a brand new computer with no network cables plugged in. It has a network stack, but no connections. Everything you want it to reach, you have to wire up yourself.
### What lives in a namespace
Each namespace has its own:
| Resource | Isolated? | Example |
|---|---|---|
| Interfaces | Yes | `lo`, `eth0`, veth, bridges |
| Routing table | Yes | `ip route` shows different routes per namespace |
| ARP/neighbor table | Yes | `ip neigh` is per-namespace |
| iptables/nftables rules | Yes | Firewall rules are namespace-scoped |
| Sockets | Yes | A port 80 listener in ns1 doesn't conflict with ns2 |
| `/proc/net/*` | Yes | Each namespace has its own proc network files |
Interview Bridge: "How does a container get its own IP address and routing table?" The answer is network namespaces. Every container runtime (Docker, containerd, CRI-O) creates a network namespace per container (or per pod in Kubernetes). That's the entire isolation mechanism. There's no magic.
## Part 2: veth Pairs — Virtual Ethernet Cables
A namespace with no connections is useless. You need a way to get packets in and out.
Enter veth pairs.
Name Origin: `veth` = virtual Ethernet. A veth pair is two virtual Ethernet interfaces connected back-to-back. Whatever goes in one end comes out the other. Think of it as a virtual crossover cable with an interface on each end.
# Create a veth pair: veth-host and veth-tenant1
ip link add veth-host type veth peer name veth-tenant1
You now have two interfaces on the host. Let's move one end into the namespace:
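Moving an interface between namespaces is one `ip link set ... netns` command:

```shell
# Move veth-tenant1 into the tenant1 namespace
ip link set veth-tenant1 netns tenant1

# It no longer exists on the host...
ip link show veth-tenant1 2>/dev/null || echo "gone from the host"

# ...but it's visible inside tenant1
ip netns exec tenant1 ip link show veth-tenant1
```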
Now veth-tenant1 has vanished from the host — it only exists inside tenant1. But the
two ends are still connected. Assign IPs and bring them up:
# Host side
ip addr add 10.0.1.1/24 dev veth-host
ip link set veth-host up
# Tenant side (run inside the namespace)
ip netns exec tenant1 ip addr add 10.0.1.2/24 dev veth-tenant1
ip netns exec tenant1 ip link set veth-tenant1 up
Test it:
# From the host, ping the tenant
ping -c 2 10.0.1.2
# From the tenant, ping the host
ip netns exec tenant1 ping -c 2 10.0.1.1
Both should work. You just connected an isolated namespace to the host using a virtual cable.
Under the Hood: When you write to one end of a veth pair, the kernel's `veth_xmit()` function takes the packet, flips the source/destination device pointers, and delivers it to the peer's receive path — as if it arrived from a physical wire. There's no copy; the same `sk_buff` (socket buffer) is passed to the other end. This is why veth pairs have near-zero overhead.
### The problem with point-to-point
What we built works for one namespace. But what if you have four tenants that all need to talk to each other and to the host? You'd need a veth pair between every pair of namespaces — that's 6 pairs for 4 namespaces, 10 pairs for 5, and it scales as n(n-1)/2.
This is exactly the problem that switches solve in the physical world. In the virtual world, we use a Linux bridge.
## Flashcard Check #1
Cover the answers. Test yourself.
| Question | Answer |
|---|---|
| What kernel feature gives a container its own network stack? | Network namespace (`CLONE_NEWNET`) |
| What does `ip netns exec tenant1 bash` do? | Opens a shell inside the tenant1 network namespace |
| What is a veth pair? | Two virtual Ethernet interfaces connected back-to-back — a virtual cable |
| Why can't you see veth-tenant1 on the host after moving it? | It was moved into the tenant1 namespace; interfaces belong to exactly one namespace |
| What's the scaling problem with veth-only connectivity? | Point-to-point pairs scale as n(n-1)/2 — you need a bridge |
## Part 3: Linux Bridges — Software Switches
A Linux bridge is a Layer 2 switch implemented in the kernel. It learns MAC addresses, forwards frames between ports, and acts as the central meeting point for veth pairs, physical NICs, VLAN interfaces, and tap devices.
Name Origin: The term "bridge" comes from the original networking device that "bridged" two separate network segments, allowing them to act as one. The Linux bridge implementation dates back to the 2.2 kernel era (late 1990s). The old tool was `brctl` (bridge control); the modern equivalent is `ip link add type bridge`.
# Create a bridge
ip link add br-tenant type bridge
ip link set br-tenant up
# Give the bridge an IP (this becomes the gateway for tenants)
ip addr add 10.0.1.1/24 dev br-tenant
Now connect namespaces to it. Let's set up two tenants this time:
# Clean up the earlier point-to-point setup
ip link del veth-host 2>/dev/null
# Create namespace and veth pairs for tenant1 and tenant2
for i in 1 2; do
ip netns add tenant${i} 2>/dev/null
ip link add veth-br-t${i} type veth peer name veth-t${i}
ip link set veth-t${i} netns tenant${i}
# Attach host end to the bridge
ip link set veth-br-t${i} master br-tenant
ip link set veth-br-t${i} up
# Configure inside the namespace
ip netns exec tenant${i} ip addr add 10.0.1.$((i+1))/24 dev veth-t${i}
ip netns exec tenant${i} ip link set veth-t${i} up
ip netns exec tenant${i} ip link set lo up
# Set the bridge as the default gateway
ip netns exec tenant${i} ip route add default via 10.0.1.1
done
Let's break down what just happened:
| Command | What it does |
|---|---|
| `ip link add veth-br-t1 type veth peer name veth-t1` | Create a veth pair |
| `ip link set veth-t1 netns tenant1` | Move one end into the namespace |
| `ip link set veth-br-t1 master br-tenant` | Attach the other end to the bridge |
| `ip netns exec tenant1 ip route add default via 10.0.1.1` | Route traffic through the bridge |
Test connectivity:
# Tenant1 → Tenant2 (through the bridge)
ip netns exec tenant1 ping -c 2 10.0.1.3
# Tenant2 → Host (through the bridge)
ip netns exec tenant2 ping -c 2 10.0.1.1
Both tenants can reach each other and the host through the bridge. The bridge does MAC learning — it knows which MAC is behind which port, just like a physical switch.
Trivia: The `docker0` bridge that Docker creates automatically is exactly this — a Linux bridge. When you run `docker run`, Docker creates a veth pair, moves one end into the container's network namespace, and attaches the other to `docker0`. Every default Docker container network is built on the same primitives you just used.
### Giving tenants internet access
Right now, tenants can reach the host and each other, but not the outside world. For that, you need IP forwarding and NAT — the same thing your home router does:
# Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1
# NAT outbound traffic from the bridge network
iptables -t nat -A POSTROUTING -s 10.0.1.0/24 ! -o br-tenant -j MASQUERADE
# Allow forwarding for established connections
iptables -A FORWARD -i br-tenant -j ACCEPT
iptables -A FORWARD -o br-tenant -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
Now tenants can reach the internet:
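A quick check from inside a tenant (8.8.8.8 is just a convenient public address for illustration — any reachable external IP works):

```shell
# Outbound traffic is now forwarded and SNATed through the host
ip netns exec tenant1 ping -c 2 8.8.8.8
```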
Under the Hood: That MASQUERADE rule is doing SNAT — rewriting the source IP of outbound packets from 10.0.1.x to whatever IP is on the outgoing interface. The kernel's conntrack module remembers the mapping so return packets are translated back. This is exactly what Docker does when you `docker run` without `-p`. With `-p 8080:80`, Docker adds a DNAT rule in the PREROUTING chain to forward incoming traffic on host port 8080 to the container's port 80.
## Part 4: How Docker Networking Actually Works
Now that you've built a bridge network from scratch, here's the punchline: Docker's
default bridge network is this exact setup, automated.
When you run `docker run -d --name web -p 8080:80 nginx`, Docker:

- Creates a network namespace for the container
- Creates a veth pair
- Moves one end (`eth0` inside the container) into the namespace
- Attaches the other end to the `docker0` bridge
- Assigns an IP from the bridge's subnet (e.g., 172.17.0.2/16)
- Adds an iptables MASQUERADE rule for outbound NAT
- Adds an iptables DNAT rule to forward host:8080 → container:80
- Adds DNS configuration pointing to Docker's embedded DNS server (127.0.0.11)
You can see all of this:
# See the docker0 bridge
ip link show docker0
bridge link show
# See the veth pairs
ip link show type veth
# See Docker's iptables rules
iptables -t nat -L -n -v | grep -A5 DOCKER
# See the container's namespace
pid=$(docker inspect --format '{{.State.Pid}}' web)
nsenter -t $pid -n ip addr
nsenter -t $pid -n ip route
Gotcha: Docker's default bridge does not provide DNS resolution between containers by name. Only user-defined bridge networks (`docker network create`) get Docker's built-in DNS. This is why `docker-compose` always creates a custom network — so services can reach each other by name.
### The macvlan alternative
Sometimes you don't want NAT. You want the container to appear as a real host on the
physical network. Docker's macvlan driver does this:
docker network create -d macvlan \
--subnet=10.100.0.0/24 \
--gateway=10.100.0.1 \
-o parent=eth0 \
direct_net
docker run --network direct_net --ip 10.100.0.50 -d nginx
The container gets 10.100.0.50 directly on the physical network. No NAT, no bridge. The switch sees the container's MAC address as a separate host.
Gotcha: With macvlan, the container can reach everything on the network except the host itself. This is a known kernel limitation — the host's interface and its macvlan children can't communicate at Layer 2. You need a separate macvlan interface on the host or a different physical NIC for host-to-container traffic.
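The usual workaround is the one hinted at above: give the host its own macvlan child on the same parent NIC and send host-to-container traffic through it. A sketch, using this lesson's example names (`eth0`, the 10.100.0.0/24 subnet, the container at 10.100.0.50 — adjust for your network; 10.100.0.200 is an arbitrary unused address):

```shell
# Create a macvlan interface for the host itself on the same parent NIC
ip link add macvlan-host link eth0 type macvlan mode bridge
ip addr add 10.100.0.200/32 dev macvlan-host
ip link set macvlan-host up

# Host route: reach the container's macvlan IP via the host's own
# macvlan child instead of eth0 (which the kernel would block at L2)
ip route add 10.100.0.50/32 dev macvlan-host
```

With this in place, `ping 10.100.0.50` from the host goes out `macvlan-host`, and the macvlan bridge mode delivers it between siblings.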
## Flashcard Check #2
| Question | Answer |
|---|---|
| What does `ip link set veth-br master br0` do? | Attaches the veth interface to bridge br0 (like plugging a cable into a switch port) |
| What iptables chain does Docker use for port forwarding (`-p`)? | PREROUTING, with a DNAT rule |
| Why doesn't Docker's default bridge support container name DNS? | Only user-defined networks get Docker's embedded DNS server |
| What does MASQUERADE do in the POSTROUTING chain? | Rewrites the source IP to the outgoing interface's IP (dynamic SNAT) |
| Why can't a macvlan container reach its host? | Kernel limitation: a physical interface and its macvlan children can't communicate at L2 |
## Part 5: VLANs — Segmenting the Wire
So far, all our tenants share the same Layer 2 domain. Tenant1 can see Tenant2's broadcast traffic. For real isolation, you need VLANs — separate broadcast domains on the same physical wire.
Name Origin: VLAN = Virtual Local Area Network. Standardized as IEEE 802.1Q in 1998, VLANs were invented because moving a user between departments used to require physically re-cabling their switch port. The "virtual" means the segmentation is logical, not physical.
Trivia: The 802.1Q tag is only 4 bytes — inserted between the source MAC and the EtherType field. Those 4 bytes contain a 12-bit VLAN ID, giving you 4,094 usable VLANs (0 and 4095 are reserved). That seemed enormous in 1998. It became a hard constraint that drove the invention of VXLAN (24-bit ID, 16 million segments) for cloud-scale multi-tenancy.
### Creating VLAN interfaces on Linux
First, load the kernel module:
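The module is `8021q`, and it's worth verifying it actually loaded:

```shell
# Load the 802.1Q VLAN tagging module
modprobe 8021q

# Verify it's present
lsmod | grep 8021q
```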
Gotcha: If the `8021q` module isn't loaded, Linux will happily create the VLAN interface and show it as UP, but no tagged frames will be sent or received. Everything looks fine, nothing works. Always verify the module is loaded.
Now create VLAN interfaces on a physical NIC:
# Create VLAN 100 on eth0
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 10.100.0.5/24 dev eth0.100
ip link set eth0.100 up
# Create VLAN 200 on eth0
ip link add link eth0 name eth0.200 type vlan id 200
ip addr add 10.200.0.5/24 dev eth0.200
ip link set eth0.200 up
# Verify — look for "vlan protocol 802.1Q id 100"
ip -d link show eth0.100
The switch port connected to eth0 must be a trunk carrying VLANs 100 and 200. If it's an access port, tagged frames are silently dropped.
### VLAN-aware bridges for tenant isolation
Here's where it gets powerful. You can create a separate bridge per VLAN, giving each tenant true Layer 2 isolation:
# Bridge for VLAN 100 tenants
ip link add br-vlan100 type bridge
ip link set br-vlan100 up
ip link set eth0.100 master br-vlan100
# Bridge for VLAN 200 tenants
ip link add br-vlan200 type bridge
ip link set br-vlan200 up
ip link set eth0.200 master br-vlan200
# Connect tenant3 to VLAN 100
ip netns add tenant3
ip link add veth-br-t3 type veth peer name veth-t3
ip link set veth-t3 netns tenant3
ip link set veth-br-t3 master br-vlan100
ip link set veth-br-t3 up
ip netns exec tenant3 ip addr add 10.100.0.10/24 dev veth-t3
ip netns exec tenant3 ip link set veth-t3 up
# Connect tenant4 to VLAN 200
ip netns add tenant4
ip link add veth-br-t4 type veth peer name veth-t4
ip link set veth-t4 netns tenant4
ip link set veth-br-t4 master br-vlan200
ip link set veth-br-t4 up
ip netns exec tenant4 ip addr add 10.200.0.10/24 dev veth-t4
ip netns exec tenant4 ip link set veth-t4 up
Now tenant3 is on VLAN 100 and tenant4 is on VLAN 200. They're completely isolated at Layer 2 — tenant3's broadcasts never reach tenant4, and vice versa. Exactly like being on different physical switches.
# tenant3 can reach other VLAN 100 hosts
ip netns exec tenant3 ping -c 2 10.100.0.5
# tenant4 can reach other VLAN 200 hosts
ip netns exec tenant4 ping -c 2 10.200.0.5
# tenant3 CANNOT reach tenant4 (different L2 domain)
ip netns exec tenant3 ping -c 2 10.200.0.10 # fails — no route, different VLAN
Mental Model: A bridge-per-VLAN is like having multiple physical switches inside your server. Each bridge is a switch, each VLAN interface is an uplink to the physical network, and each veth pair is a patch cable to a namespace. The namespaces are the servers.
## Part 6: Bonding — Two NICs, One Fate
A single NIC is a single point of failure. Bonding combines multiple physical interfaces into one logical interface for redundancy and aggregate bandwidth.
### Bonding modes at a glance
| Mode | Name | What it does | Switch config? |
|---|---|---|---|
| 0 | balance-rr | Round-robin packets across links | Yes (static LAG) |
| 1 | active-backup | One link active, others standby | No |
| 2 | balance-xor | Hash-based distribution | Yes (static LAG) |
| 3 | broadcast | Send on all links | Yes |
| 4 | 802.3ad (LACP) | Dynamic aggregation with negotiation | Yes (LACP) |
| 5 | balance-tlb | Adaptive transmit load balance | No |
| 6 | balance-alb | Adaptive TX+RX load balance | No |
Remember: "1 for simple, 4 for fast." Mode 1 (active-backup) is the safe default — no switch coordination needed, instant failover. Mode 4 (LACP) is production standard when you want both bandwidth and redundancy, but requires switch configuration.
### Setting up mode 4 (LACP)
# Create the bond
ip link add bond0 type bond mode 802.3ad
# Set fast LACP rate (1-second PDU interval, 3-second failure detection)
ip link set bond0 type bond lacp_rate fast
# Set hash policy for good traffic distribution
ip link set bond0 type bond xmit_hash_policy layer3+4
# Enable link monitoring (100ms polling)
ip link set bond0 type bond miimon 100
# Add member interfaces
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
# Bring everything up
ip link set bond0 up
ip addr add 10.0.0.1/24 dev bond0
Let's break down the key options:
| Option | Value | Why |
|---|---|---|
| `mode 802.3ad` | LACP | Dynamic negotiation, detects one-sided failures |
| `lacp_rate fast` | 1-second PDUs | Failure detection in 3 seconds (vs 90 seconds on slow) |
| `xmit_hash_policy layer3+4` | Hash on IP+port | Distributes flows across links |
| `miimon 100` | Poll every 100ms | Detects physical link failure |
Gotcha: The default `lacp_rate` is `slow` — PDUs every 30 seconds, failure detection at 90 seconds. That's a minute and a half of sending traffic into a dead link. Always set `lacp_rate fast` in production.
### Verifying the bond
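The bond's live state is exposed through procfs:

```shell
# Full bond status: mode, hash policy, and per-member LACP details
cat /proc/net/bonding/bond0
```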
Look for:
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
LACP rate: fast
MII Status: up
Slave Interface: eth0
MII Status: up
Aggregator ID: 1
Partner Mac Address: aa:bb:cc:dd:ee:ff # <-- switch's MAC
Slave Interface: eth1
MII Status: up
Aggregator ID: 1 # <-- same ID = correctly bundled
Partner Mac Address: aa:bb:cc:dd:ee:ff
Debug Clue: If `Partner Mac Address` shows `00:00:00:00:00:00`, the switch isn't sending LACP PDUs. Either the switch port isn't configured for LACP, the switch is in passive mode (and so is your host), or there's a physical layer issue. If the two members show different Aggregator IDs, they're not actually bundled — check for speed/duplex mismatches.
### Setting up mode 1 (active-backup)
When you don't control the switch or just need simple failover:
ip link add bond0 type bond mode active-backup
ip link set bond0 type bond miimon 100
ip link set bond0 type bond primary eth0
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up
No switch configuration needed. eth0 handles all traffic. If eth0 goes down, eth1 takes
over immediately. When eth0 recovers, it becomes active again (because of primary eth0).
### War Story: The Bonding Mode That Split the Brain
A team configured a 2x10G bond on their database servers using mode 4 (LACP). Everything worked great — until a firmware update on the switch silently changed the port-channel configuration from LACP to static. The Linux side kept sending LACP PDUs. The switch ignored them. The bond stayed "up" because physical links were fine, but the switch now treated each port independently. Inbound traffic arrived on both ports with different MAC forwarding, causing duplicate packets and MAC flapping across the switch fabric. The database saw intermittent connection resets. It took three days to diagnose because monitoring only checked "bond0 is up" — nobody was checking whether the LACP partner was actually responding.
The fix: monitor `/proc/net/bonding/bond0` for `Partner Mac Address: 00:00:00:00:00:00` and alert on it. Also: always use `lacp_rate fast` so you detect switch-side misconfigurations in seconds, not minutes.
## Part 7: VLANs on a Bond — The Full Stack
In production, you don't put VLANs on a bare NIC. You put them on a bond. The layering looks like this:
┌─────────────┐
│ br-vlan100 │ ← bridge (switch for VLAN 100)
└──────┬──────┘
│
┌──────┴──────┐
│ bond0.100 │ ← VLAN sub-interface
└──────┬──────┘
│
┌──────┴──────┐
│ bond0 │ ← bond (2x10G LACP)
└──┬──────┬──┘
│ │
┌──┴──┐┌──┴──┐
│eth0 ││eth1 │ ← physical NICs
└─────┘└─────┘
Build it:
# Assume bond0 already exists from the previous section
# Create VLAN interfaces on the bond
ip link add link bond0 name bond0.100 type vlan id 100
ip link add link bond0 name bond0.200 type vlan id 200
ip link set bond0.100 up
ip link set bond0.200 up
# Create bridges for each VLAN
ip link add br-vlan100 type bridge
ip link add br-vlan200 type bridge
ip link set br-vlan100 up
ip link set br-vlan200 up
# Attach VLAN interfaces to their bridges
ip link set bond0.100 master br-vlan100
ip link set bond0.200 master br-vlan200
# Give bridges IPs (optional — if this host routes between VLANs)
ip addr add 10.100.0.1/24 dev br-vlan100
ip addr add 10.200.0.1/24 dev br-vlan200
Now you can connect tenant namespaces to these bridges exactly like before. Each tenant lands on a VLAN with full Layer 2 isolation, carried over a redundant bonded link.
Gotcha: When you switch from bare NICs to a bond, delete the old VLAN interfaces first. An `eth0.100` and a `bond0.100` can coexist — one will work, the other will silently drop traffic, and you'll spend hours confused about why half your connections fail.
## Part 8: Other Virtual Interface Types
veth pairs and bridges aren't the only virtual interfaces. Here's the extended family:
### tap and tun
Name Origin: `tun` = tunnel (operates at Layer 3, IP packets). `tap` = network tap (operates at Layer 2, Ethernet frames). The names describe what level of the stack they expose to userspace.
# Create a tap device
ip tuntap add dev tap0 mode tap
ip link set tap0 up
# Create a tun device
ip tuntap add dev tun0 mode tun
ip link set tun0 up
tap/tun devices let userspace programs send and receive packets by reading/writing a file descriptor. This is how VPNs work — OpenVPN reads encrypted packets from the network, decrypts them, and writes cleartext packets into a tun device. The kernel routes them as if they arrived on a real interface.
| Device | Layer | Delivers to userspace | Used by |
|---|---|---|---|
| `tun` | L3 | Raw IP packets | OpenVPN, userspace WireGuard implementations |
| `tap` | L2 | Ethernet frames | QEMU/KVM VMs, OpenVPN (bridge mode) |
### macvlan and ipvlan
Both create virtual interfaces on a physical NIC. The key difference:
| Feature | macvlan | ipvlan |
|---|---|---|
| MAC address | Unique per child | Shared with parent |
| Switch sees | Multiple MACs per port | One MAC per port |
| Host-to-child L2 | Broken (kernel limitation) | Works |
| Use case | Containers as "real" hosts | Environments with MAC port-security limits |
# macvlan — each child gets its own MAC
ip link add macvlan0 link eth0 type macvlan mode bridge
# ipvlan — all children share parent's MAC
ip link add ipvlan0 link eth0 type ipvlan mode l2
## Part 9: Traffic Control (tc) — One-Minute Overview
The tc command controls how the kernel queues outbound packets. Two things worth
knowing:
# Limit outbound bandwidth to 10 Mbit
tc qdisc add dev veth-br-t1 root tbf rate 10mbit burst 32kbit latency 400ms
# Simulate 100ms latency and 1% packet loss (chaos engineering)
# (an alternative root qdisc — delete the tbf one first, or this add fails)
tc qdisc add dev veth-br-t1 root netem delay 100ms loss 1%
# Remove whatever root qdisc is installed
tc qdisc del dev veth-br-t1 root
Interview Bridge: "How would you test whether your application handles network latency gracefully?" Use `tc netem`. This is what chaos engineering tools (Pumba, Chaos Mesh) use under the hood.
## Part 10: Open vSwitch (OVS) — When Linux Bridges Aren't Enough
When you need thousands of virtual ports, OpenFlow programming, or VXLAN tunnel endpoints, you reach for Open vSwitch:
# Create a switch and add ports
ovs-vsctl add-br ovs-br0
ovs-vsctl add-port ovs-br0 eth0
ovs-vsctl add-port ovs-br0 veth-br-t1
# Add a VXLAN tunnel to another host
ovs-vsctl add-port ovs-br0 vxlan0 -- \
set Interface vxlan0 type=vxlan options:remote_ip=10.0.0.2
ovs-vsctl show
OVS is the networking backbone of OpenStack and several Kubernetes CNI plugins (Antrea, OVN-Kubernetes).
Trivia: OVS was developed at Nicira (founded by Martin Casado, who also invented OpenFlow as part of his PhD at Stanford). VMware acquired Nicira in 2012 for $1.26 billion. OVS remains open source.
## Part 11: Kubernetes CNI — Where This All Comes Together
Everything we've built in this lesson — namespaces, veth pairs, bridges, VLANs, OVS — is exactly what Kubernetes CNI plugins do. CNI (Container Network Interface) is a specification: the kubelet calls a CNI binary, passes it a namespace path, and says "set up networking for this pod."
Different CNI plugins use different strategies:
| CNI Plugin | Strategy | What it creates |
|---|---|---|
| Flannel (VXLAN) | Overlay | Bridge + veth pair + VXLAN tunnel per node |
| Calico (no overlay) | Routing | veth pair + BGP routes (no bridge) |
| Cilium | eBPF | veth pair, bypasses iptables entirely |
| Weave | Overlay | Bridge + veth + encrypted tunnel |
| Multus | Meta-CNI | Delegates to multiple CNIs per pod |
But they all start with the same two steps:
- Create a veth pair
- Move one end into the pod's network namespace
The differences are in step 3: how traffic gets from the veth's host end to other pods and the outside world.
Mental Model: Every Kubernetes CNI plugin is answering the same question: "I have a veth pair. The pod end has an IP. How does a packet from this pod reach a pod on another node?" Flannel says "wrap it in VXLAN." Calico says "route it with BGP." Cilium says "program eBPF to forward it." The primitives are always the same.
## Flashcard Check #3
| Question | Answer |
|---|---|
| What Linux bonding mode uses LACP for dynamic negotiation? | Mode 4 (802.3ad) |
| Why should you set `lacp_rate fast`? | The default slow rate takes 90 seconds to detect a dead link; fast detects in 3 seconds |
| What's the relationship between `bond0.100` and `br-vlan100`? | `bond0.100` is a VLAN sub-interface attached to bridge `br-vlan100` as an uplink |
| What does `tc netem delay 100ms` do? | Adds 100ms of simulated latency to outbound packets |
| How does the tun device differ from tap? | tun passes L3 (IP) packets to userspace; tap passes L2 (Ethernet) frames |
| What two steps do ALL Kubernetes CNI plugins share? | Create a veth pair, move one end into the pod namespace |
| How does Docker implement port forwarding (`-p`)? | DNAT rule in iptables PREROUTING chain |
## Exercises
### Exercise 1: Build a two-namespace bridge (5 minutes)
Create two namespaces (ns1 and ns2), a bridge, and connect them. Verify they can
ping each other.
Hint: Follow the pattern from Part 3: create bridge, create veth pairs, move ends into namespaces, attach host ends to bridge, assign IPs, bring everything up.

Solution:
ip link add br0 type bridge
ip link set br0 up
for i in 1 2; do
ip netns add ns${i}
ip link add veth-br${i} type veth peer name veth${i}
ip link set veth${i} netns ns${i}
ip link set veth-br${i} master br0
ip link set veth-br${i} up
ip netns exec ns${i} ip addr add 10.0.0.${i}/24 dev veth${i}
ip netns exec ns${i} ip link set veth${i} up
ip netns exec ns${i} ip link set lo up
done
ip netns exec ns1 ping -c 2 10.0.0.2
### Exercise 2: Isolate with VLANs (10 minutes)
Extend Exercise 1. Create two bridges, one per VLAN (100 and 200). Put ns1 on VLAN 100 and ns2 on VLAN 200. Verify they cannot ping each other.
Hint: You'll need VLAN sub-interfaces on a parent interface (or you can use separate bridges without VLAN uplinks for pure L2 isolation between namespaces).

### Exercise 3: Trace Docker's network setup (15 minutes)
Run docker run -d --name trace-me nginx. Then:
1. Find the container's PID
2. Find its veth pair on the host
3. Confirm the veth is attached to the docker0 bridge
4. List the iptables NAT rules Docker created
5. Enter the container's network namespace with nsenter and run ip route
Hint: Every command you need appears in Part 4: `docker inspect --format '{{.State.Pid}}'`, `ip link show type veth`, `bridge link show`, `iptables -t nat -L -n -v`, and `nsenter -t PID -n`.
### Exercise 4: Bond + VLAN (20 minutes, requires two NICs or VMs)
Set up a mode 1 bond with two interfaces, create a VLAN 100 sub-interface on the bond,
and verify connectivity. Check /proc/net/bonding/bond0 and pull a cable (or bring
down an interface) to test failover.
## Cheat Sheet
### Namespace operations
| Task | Command |
|---|---|
| Create namespace | ip netns add NAME |
| List namespaces | ip netns list |
| Run command in namespace | ip netns exec NAME COMMAND |
| Delete namespace | ip netns del NAME |
| Enter container's netns | nsenter -t PID -n COMMAND |
### veth and bridge operations
| Task | Command |
|---|---|
| Create veth pair | ip link add NAME type veth peer name PEER |
| Move interface to namespace | ip link set NAME netns NSNAME |
| Create bridge | ip link add NAME type bridge |
| Attach port to bridge | ip link set NAME master BRIDGE |
| Show bridge members | bridge link show |
| Show bridge MAC table | bridge fdb show br BRIDGE |
### VLAN operations
| Task | Command |
|---|---|
| Load 802.1Q module | modprobe 8021q |
| Create VLAN interface | ip link add link PARENT name PARENT.VID type vlan id VID |
| Show VLAN details | ip -d link show PARENT.VID |
| Capture tagged frames | tcpdump -eni PARENT 'vlan VID' |
### Bond operations
| Task | Command |
|---|---|
| Create bond | ip link add bond0 type bond mode 802.3ad |
| Set LACP rate | ip link set bond0 type bond lacp_rate fast |
| Set hash policy | ip link set bond0 type bond xmit_hash_policy layer3+4 |
| Add member | ip link set ethX master bond0 |
| Check status | cat /proc/net/bonding/bond0 |
| Set monitoring | ip link set bond0 type bond miimon 100 |
### Traffic control (tc)
| Task | Command |
|---|---|
| Limit bandwidth | tc qdisc add dev DEV root tbf rate 10mbit burst 32kbit latency 400ms |
| Simulate latency | tc qdisc add dev DEV root netem delay 100ms |
| Remove qdisc | tc qdisc del dev DEV root |
## Takeaways
- Network namespaces are the foundation of container networking. Every container gets its own network stack through `CLONE_NEWNET`. No namespace, no isolation.
- veth pairs are the standard way to get packets across namespace boundaries. They're virtual cables. One end in the namespace, one end on the host (or bridge). Every container runtime uses them.
- Docker's bridge network is just namespace + veth + bridge + iptables NAT. Once you understand the primitives, Docker networking stops being magic and starts being predictable.
- LACP (mode 4) is the production standard for NIC bonding. Always set `lacp_rate fast` and `miimon 100`. Monitor the partner MAC address — if it's all zeros, your bond is not actually bonded.
- VLANs are Layer 2 isolation on a single wire. The 802.1Q tag is 4 bytes. Load the `8021q` module. Make sure the switch port is a trunk. The 4,094 VLAN limit is why clouds use VXLAN.
- Every Kubernetes CNI plugin starts with the same two steps: create a veth pair, move one end into the pod's namespace. The difference is what happens after that.
## Related Lessons
- What Happens When You Click a Link — follows a packet end-to-end through DNS, TCP, TLS
- iptables: Following a Packet Through the Chains — deep dive into the netfilter framework Docker uses for NAT
- The Hanging Deploy — processes, namespaces, and cgroups from the PID perspective
- Kubernetes Services: How Traffic Finds Your Pod — what happens after CNI sets up the namespace
- The Subnet Calculator in Your Head — IP addressing fundamentals for the VLAN subnets in this lesson