
Linux Networking: Bridges, Bonds, and VLANs


Topics: network namespaces, veth pairs, Linux bridges, bonding/LACP, VLANs, macvlan/ipvlan, tap/tun, Docker networking, OVS, tc, Kubernetes CNI
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None (everything is explained from scratch)


The Mission

You just inherited a bare-metal server that needs to host four isolated tenant workloads. Each tenant gets its own network segment. Two tenants need VLAN access to the physical network. The server has two 10G NICs that should be bonded for redundancy. And the whole thing needs to resemble — at a conceptual level — what Docker and Kubernetes do under the hood.

By the end of this lesson, you'll have built the whole setup from scratch using nothing but ip commands. More importantly, you'll understand why container networking works the way it does, because you'll have built it yourself, piece by piece:

  • Network namespaces: the isolation primitive that makes containers possible
  • veth pairs: the virtual cables that connect isolated worlds
  • Linux bridges: the software switches that tie everything together
  • VLANs: Layer 2 segmentation on a single wire
  • Bonding: turning two NICs into one for redundancy and bandwidth
  • How Docker's bridge networking is just namespaces + veth + bridge + iptables
  • Where Kubernetes CNI picks up the story

We start with a single namespace. We end with a multi-tenant network. Let's go.


Part 1: Network Namespaces — Your Own Private Network Stack

Every process on Linux shares the same network stack by default — the same interfaces, the same routing table, the same iptables rules. Network namespaces change that. A namespace gets its own everything: interfaces, routes, ARP table, firewall rules, sockets. It's a complete network stack in a box.

Name Origin: The first Linux namespace (mount, 2002) used the flag CLONE_NEWNS — "new namespace" — because nobody expected there would be more than one type. Every subsequent namespace got a more specific name: CLONE_NEWPID, CLONE_NEWNET, etc. The mount namespace is still stuck with the generic flag as a historical accident.

Let's create one:

# Create a network namespace called "tenant1"
ip netns add tenant1

# List namespaces
ip netns list

Now look inside it:

# Run 'ip link' inside the namespace
ip netns exec tenant1 ip link

Output:

1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

That's it. One loopback interface, and it's DOWN. No eth0. No routes. No connectivity. This namespace is completely isolated from the host and from every other namespace.

# Bring up loopback (you'll need this for local communication)
ip netns exec tenant1 ip link set lo up

# Check the routing table — it's empty
ip netns exec tenant1 ip route

Nothing comes back. This namespace can't reach anything. That's the point.

Mental Model: Think of a network namespace as a brand new computer with no network cables plugged in. It has a network stack, but no connections. Everything you want it to reach, you have to wire up yourself.

What lives in a namespace

Each namespace has its own:

Resource Isolated? Example
Interfaces Yes lo, eth0, veth, bridges
Routing table Yes ip route shows different routes per namespace
ARP/neighbor table Yes ip neigh is per-namespace
iptables/nftables rules Yes Firewall rules are namespace-scoped
Sockets Yes A port 80 listener in ns1 doesn't conflict with ns2
/proc/net/* Yes Each namespace has its own proc network files

Interview Bridge: "How does a container get its own IP address and routing table?" The answer is network namespaces. Every container runtime (Docker, containerd, CRI-O) creates a network namespace per container (or per pod in Kubernetes). That's the entire isolation mechanism. There's no magic.
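You can reproduce that isolation with a single command and no container runtime — a minimal sketch using unshare from util-linux (requires root):

```shell
# Spawn a shell in a fresh network namespace and look around:
# only an isolated, DOWN loopback exists -- no host interfaces,
# no routes, no firewall rules
unshare --net sh -c 'ip link; ip route'

# Meanwhile the host's interface list is completely unaffected
ip link
```

This is the same syscall path (CLONE_NEWNET) that every container runtime takes; `ip netns add` just gives the namespace a persistent name under /var/run/netns.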


Part 2: veth Pairs — Virtual Ethernet Cables

A namespace with no connections is useless. You need a way to get packets in and out. Enter veth pairs.

Name Origin: veth = virtual ethernet. A veth pair is two virtual Ethernet interfaces connected back-to-back. Whatever goes in one end comes out the other. Think of it as a virtual crossover cable with an interface on each end.

# Create a veth pair: veth-host and veth-tenant1
ip link add veth-host type veth peer name veth-tenant1

You now have two interfaces on the host. Let's move one end into the namespace:

# Move veth-tenant1 into the tenant1 namespace
ip link set veth-tenant1 netns tenant1

Now veth-tenant1 has vanished from the host — it only exists inside tenant1. But the two ends are still connected. Assign IPs and bring them up:

# Host side
ip addr add 10.0.1.1/24 dev veth-host
ip link set veth-host up

# Tenant side (run inside the namespace)
ip netns exec tenant1 ip addr add 10.0.1.2/24 dev veth-tenant1
ip netns exec tenant1 ip link set veth-tenant1 up

Test it:

# From the host, ping the tenant
ping -c 2 10.0.1.2

# From the tenant, ping the host
ip netns exec tenant1 ping -c 2 10.0.1.1

Both should work. You just connected an isolated namespace to the host using a virtual cable.

Under the Hood: When you write to one end of a veth pair, the kernel's veth_xmit() function takes the packet, flips the source/destination device pointers, and delivers it to the peer's receive path — as if it arrived from a physical wire. There's no copy; the same sk_buff (socket buffer) is passed to the other end. This is why veth pairs have near-zero overhead.

The problem with point-to-point

What we built works for one namespace. But what if you have four tenants that all need to talk to each other and to the host? You'd need a veth pair between every pair of namespaces — that's 6 pairs for 4 namespaces, 10 pairs for 5, and it scales as n(n-1)/2.

This is exactly the problem that switches solve in the physical world. In the virtual world, we use a Linux bridge.


Flashcard Check #1

Cover the answers. Test yourself.

Question Answer
What kernel feature gives a container its own network stack? Network namespace (CLONE_NEWNET)
What does ip netns exec tenant1 bash do? Opens a shell inside the tenant1 network namespace
What is a veth pair? Two virtual Ethernet interfaces connected back-to-back — a virtual cable
Why can't you see veth-tenant1 on the host after moving it? It was moved into the tenant1 namespace; interfaces belong to exactly one namespace
What's the scaling problem with veth-only connectivity? Point-to-point pairs scale as n(n-1)/2 — you need a bridge

Part 3: Linux Bridges — Software Switches

A Linux bridge is a Layer 2 switch implemented in the kernel. It learns MAC addresses, forwards frames between ports, and acts as the central meeting point for veth pairs, physical NICs, VLAN interfaces, and tap devices.

Name Origin: The term "bridge" comes from the original networking device that "bridged" two separate network segments, allowing them to act as one. The Linux bridge implementation dates back to the 2.2 kernel era (late 1990s). The old tool was brctl (bridge control); the modern equivalent is ip link add type bridge.

# Clean up the earlier point-to-point setup first.
# Deleting one end of a veth pair removes both ends,
# and this frees 10.0.1.1 for the bridge.
ip link del veth-host 2>/dev/null

# Create a bridge
ip link add br-tenant type bridge
ip link set br-tenant up

# Give the bridge an IP (this becomes the gateway for tenants)
ip addr add 10.0.1.1/24 dev br-tenant

Now connect namespaces to it. Let's set up two tenants this time:

# Create namespace and veth pairs for tenant1 and tenant2
for i in 1 2; do
    ip netns add tenant${i} 2>/dev/null
    ip link add veth-br-t${i} type veth peer name veth-t${i}
    ip link set veth-t${i} netns tenant${i}

    # Attach host end to the bridge
    ip link set veth-br-t${i} master br-tenant
    ip link set veth-br-t${i} up

    # Configure inside the namespace
    ip netns exec tenant${i} ip addr add 10.0.1.$((i+1))/24 dev veth-t${i}
    ip netns exec tenant${i} ip link set veth-t${i} up
    ip netns exec tenant${i} ip link set lo up

    # Set the bridge as the default gateway
    ip netns exec tenant${i} ip route add default via 10.0.1.1
done

Let's break down what just happened:

Command What it does
ip link add veth-br-t1 type veth peer name veth-t1 Create a veth pair
ip link set veth-t1 netns tenant1 Move one end into the namespace
ip link set veth-br-t1 master br-tenant Attach the other end to the bridge
ip netns exec tenant1 ip route add default via 10.0.1.1 Route traffic through the bridge

Test connectivity:

# Tenant1 → Tenant2 (through the bridge)
ip netns exec tenant1 ping -c 2 10.0.1.3

# Tenant2 → Host (through the bridge)
ip netns exec tenant2 ping -c 2 10.0.1.1

Both tenants can reach each other and the host through the bridge. The bridge does MAC learning — it knows which MAC is behind which port, just like a physical switch.

# See the bridge's MAC address table
bridge fdb show br br-tenant

Trivia: The docker0 bridge that Docker creates automatically is exactly this — a Linux bridge. When you run docker run, Docker creates a veth pair, moves one end into the container's network namespace, and attaches the other to docker0. Every default Docker container network is built on the same primitives you just used.

Giving tenants internet access

Right now, tenants can reach the host and each other, but not the outside world. For that, you need IP forwarding and NAT — the same thing your home router does:

# Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1

# NAT outbound traffic from the bridge network
iptables -t nat -A POSTROUTING -s 10.0.1.0/24 ! -o br-tenant -j MASQUERADE

# Allow forwarding for established connections
iptables -A FORWARD -i br-tenant -j ACCEPT
iptables -A FORWARD -o br-tenant -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Now tenants can reach the internet:

ip netns exec tenant1 ping -c 2 8.8.8.8

Under the Hood: That MASQUERADE rule is doing SNAT — rewriting the source IP of outbound packets from 10.0.1.x to whatever IP is on the outgoing interface. The kernel's conntrack module remembers the mapping so return packets are translated back. This is exactly what Docker does when you docker run without -p. With -p 8080:80, Docker adds a DNAT rule in the PREROUTING chain to forward incoming traffic on host port 8080 to the container's port 80.
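You can watch conntrack hold those mappings. A sketch, assuming the conntrack CLI (from the conntrack-tools package) is installed:

```shell
# Generate some outbound traffic from the tenant in the background
ip netns exec tenant1 ping -c 30 8.8.8.8 >/dev/null &

# List the kernel's connection-tracking entries for the bridge subnet;
# each entry shows the original tuple (src=10.0.1.2) and the reply
# tuple carrying the masqueraded (rewritten) address
conntrack -L -s 10.0.1.0/24

kill %1 2>/dev/null
```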


Part 4: How Docker Networking Actually Works

Now that you've built a bridge network from scratch, here's the punchline: Docker's default bridge network is this exact setup, automated.

When you run docker run -d --name web -p 8080:80 nginx, Docker:

  1. Creates a network namespace for the container
  2. Creates a veth pair
  3. Moves one end (eth0 inside the container) into the namespace
  4. Attaches the other end to the docker0 bridge
  5. Assigns an IP from the bridge's subnet (e.g., 172.17.0.2/16)
  6. Adds an iptables MASQUERADE rule for outbound NAT
  7. Adds an iptables DNAT rule to forward host:8080 → container:80
  8. Adds DNS configuration pointing to Docker's embedded DNS server (127.0.0.11)

You can see all of this:

# See the docker0 bridge
ip link show docker0
bridge link show

# See the veth pairs
ip link show type veth

# See Docker's iptables rules
iptables -t nat -L -n -v | grep -A5 DOCKER

# See the container's namespace
pid=$(docker inspect --format '{{.State.Pid}}' web)
nsenter -t $pid -n ip addr
nsenter -t $pid -n ip route

Gotcha: Docker's default bridge does not provide DNS resolution between containers by name. Only user-defined bridge networks (docker network create) get Docker's built-in DNS. This is why docker-compose always creates a custom network — so services can reach each other by name.
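You can see the difference in about thirty seconds — a sketch with throwaway containers (the names web-a, web-b, and app-net are arbitrary):

```shell
# Default bridge: resolving another container by name fails
docker run -d --name web-a nginx
docker run --rm busybox nslookup web-a      # fails -- no embedded DNS

# User-defined bridge: Docker's DNS (127.0.0.11) resolves names
docker network create app-net
docker run -d --network app-net --name web-b nginx
docker run --rm --network app-net busybox nslookup web-b
```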

The macvlan alternative

Sometimes you don't want NAT. You want the container to appear as a real host on the physical network. Docker's macvlan driver does this:

docker network create -d macvlan \
    --subnet=10.100.0.0/24 \
    --gateway=10.100.0.1 \
    -o parent=eth0 \
    direct_net

docker run --network direct_net --ip 10.100.0.50 -d nginx

The container gets 10.100.0.50 directly on the physical network. No NAT, no bridge. The switch sees the container's MAC address as a separate host.

Gotcha: With macvlan, the container can reach everything on the network except the host itself. This is a known kernel limitation — the host's interface and its macvlan children can't communicate at Layer 2. You need a separate macvlan interface on the host or a different physical NIC for host-to-container traffic.
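The usual workaround is the first option: give the host its own macvlan child, because macvlan siblings on the same parent can talk to each other even though parent-to-child traffic is blocked. A sketch, assuming the eth0 and 10.100.0.0/24 setup from above (10.100.0.254 is an arbitrary spare address):

```shell
# Give the host a macvlan sibling on the same parent NIC
ip link add macvlan-host link eth0 type macvlan mode bridge
ip addr add 10.100.0.254/32 dev macvlan-host
ip link set macvlan-host up

# Send traffic for the container via the sibling, not via eth0
ip route add 10.100.0.50/32 dev macvlan-host

# Host <-> container now works
ping -c 2 10.100.0.50
```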


Flashcard Check #2

Question Answer
What does ip link set veth-br master br0 do? Attaches the veth interface to bridge br0 (like plugging a cable into a switch port)
What iptables chain does Docker use for port forwarding (-p)? PREROUTING with a DNAT rule
Why doesn't Docker's default bridge support container name DNS? Only user-defined networks get Docker's embedded DNS server
What does MASQUERADE do in the POSTROUTING chain? Rewrites the source IP to the outgoing interface's IP (dynamic SNAT)
Why can't a macvlan container reach its host? Kernel limitation: a physical interface and its macvlan children can't communicate at L2

Part 5: VLANs — Segmenting the Wire

So far, all our tenants share the same Layer 2 domain. Tenant1 can see Tenant2's broadcast traffic. For real isolation, you need VLANs — separate broadcast domains on the same physical wire.

Name Origin: VLAN = Virtual Local Area Network. Standardized as IEEE 802.1Q in 1998, VLANs were invented because moving a user between departments used to require physically re-cabling their switch port. The "virtual" means the segmentation is logical, not physical.

Trivia: The 802.1Q tag is only 4 bytes — inserted between the source MAC and the EtherType field. Those 4 bytes contain a 12-bit VLAN ID, giving you 4,094 usable VLANs (0 and 4095 are reserved). That seemed enormous in 1998. It became a hard constraint that drove the invention of VXLAN (24-bit ID, 16 million segments) for cloud-scale multi-tenancy.

Creating VLAN interfaces on Linux

First, load the kernel module:

# Load 802.1Q support
modprobe 8021q
lsmod | grep 8021q

Gotcha: If the 8021q module isn't loaded, Linux will happily create the VLAN interface and show it as UP, but no tagged frames will be sent or received. Everything looks fine, nothing works. Always verify the module is loaded.

Now create VLAN interfaces on a physical NIC:

# Create VLAN 100 on eth0
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 10.100.0.5/24 dev eth0.100
ip link set eth0.100 up

# Create VLAN 200 on eth0
ip link add link eth0 name eth0.200 type vlan id 200
ip addr add 10.200.0.5/24 dev eth0.200
ip link set eth0.200 up

# Verify — look for "vlan protocol 802.1Q id 100"
ip -d link show eth0.100

The switch port connected to eth0 must be a trunk carrying VLANs 100 and 200. If it's an access port, tagged frames are silently dropped.
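When in doubt, watch the wire. A quick sketch — run this, then generate VLAN 100 traffic from another terminal (the ping target is whatever VLAN 100 host you have):

```shell
# Capture on the PARENT interface with -e to print Ethernet headers;
# tagged frames show "vlan 100" in the output. If nothing appears
# while VLAN traffic flows, suspect the trunk config or the 8021q module.
tcpdump -eni eth0 'vlan 100'
```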

VLAN-aware bridges for tenant isolation

Here's where it gets powerful. You can create a separate bridge per VLAN, giving each tenant true Layer 2 isolation:

# Bridge for VLAN 100 tenants
ip link add br-vlan100 type bridge
ip link set br-vlan100 up
ip link set eth0.100 master br-vlan100

# Bridge for VLAN 200 tenants
ip link add br-vlan200 type bridge
ip link set br-vlan200 up
ip link set eth0.200 master br-vlan200

# Connect tenant3 to VLAN 100
ip netns add tenant3
ip link add veth-br-t3 type veth peer name veth-t3
ip link set veth-t3 netns tenant3
ip link set veth-br-t3 master br-vlan100
ip link set veth-br-t3 up
ip netns exec tenant3 ip addr add 10.100.0.10/24 dev veth-t3
ip netns exec tenant3 ip link set veth-t3 up

# Connect tenant4 to VLAN 200
ip netns add tenant4
ip link add veth-br-t4 type veth peer name veth-t4
ip link set veth-t4 netns tenant4
ip link set veth-br-t4 master br-vlan200
ip link set veth-br-t4 up
ip netns exec tenant4 ip addr add 10.200.0.10/24 dev veth-t4
ip netns exec tenant4 ip link set veth-t4 up

Now tenant3 is on VLAN 100 and tenant4 is on VLAN 200. They're completely isolated at Layer 2 — tenant3's broadcasts never reach tenant4, and vice versa. Exactly like being on different physical switches.

# tenant3 can reach other VLAN 100 hosts
ip netns exec tenant3 ping -c 2 10.100.0.5

# tenant4 can reach other VLAN 200 hosts
ip netns exec tenant4 ping -c 2 10.200.0.5

# tenant3 CANNOT reach tenant4 (different L2 domain)
ip netns exec tenant3 ping -c 2 10.200.0.10  # fails — no route, different VLAN

Mental Model: A bridge-per-VLAN is like having multiple physical switches inside your server. Each bridge is a switch, each VLAN interface is an uplink to the physical network, and each veth pair is a patch cable to a namespace. The namespaces are the servers.


Part 6: Bonding — Two NICs, One Fate

A single NIC is a single point of failure. Bonding combines multiple physical interfaces into one logical interface for redundancy and aggregate bandwidth.

Bonding modes at a glance

Mode Name What it does Switch config?
0 balance-rr Round-robin packets across links Yes (static LAG)
1 active-backup One link active, others standby No
2 balance-xor Hash-based distribution Yes (static LAG)
3 broadcast Send on all links Yes
4 802.3ad (LACP) Dynamic aggregation with negotiation Yes (LACP)
5 balance-tlb Adaptive transmit load balance No
6 balance-alb Adaptive TX+RX load balance No

Remember: "1 for simple, 4 for fast." Mode 1 (active-backup) is the safe default — no switch coordination needed, instant failover. Mode 4 (LACP) is production standard when you want both bandwidth and redundancy, but requires switch configuration.

Setting up mode 4 (LACP)

# Create the bond
ip link add bond0 type bond mode 802.3ad

# Set fast LACP rate (1-second PDU interval, 3-second failure detection)
ip link set bond0 type bond lacp_rate fast

# Set hash policy for good traffic distribution
ip link set bond0 type bond xmit_hash_policy layer3+4

# Enable link monitoring (100ms polling)
ip link set bond0 type bond miimon 100

# Add member interfaces
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0

# Bring everything up
ip link set bond0 up
ip addr add 10.0.0.1/24 dev bond0

Let's break down the key options:

Option Value Why
mode 802.3ad LACP Dynamic negotiation, detects one-sided failures
lacp_rate fast 1-second PDUs Failure detection in 3 seconds (vs 90 seconds on slow)
xmit_hash_policy layer3+4 Hash on IP+port Distributes flows across links
miimon 100 Poll every 100ms Detects physical link failure

Gotcha: The default lacp_rate is slow — PDUs every 30 seconds, failure detection at 90 seconds. That's a minute and a half of sending traffic into a dead link. Always set lacp_rate fast in production.

Verifying the bond

# Full bond status
cat /proc/net/bonding/bond0

Look for:

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
LACP rate: fast
MII Status: up

Slave Interface: eth0
MII Status: up
Aggregator ID: 1
Partner Mac Address: aa:bb:cc:dd:ee:ff    # <-- switch's MAC

Slave Interface: eth1
MII Status: up
Aggregator ID: 1                          # <-- same ID = correctly bundled
Partner Mac Address: aa:bb:cc:dd:ee:ff

Debug Clue: If Partner Mac Address shows 00:00:00:00:00:00, the switch isn't sending LACP PDUs. Either the switch port isn't configured for LACP, the switch is in passive mode (and so is your host), or there's a physical layer issue. If the two members show different Aggregator IDs, they're not actually bundled — check for speed/duplex mismatches.

Setting up mode 1 (active-backup)

When you don't control the switch or just need simple failover:

ip link add bond0 type bond mode active-backup
ip link set bond0 type bond miimon 100
ip link set bond0 type bond primary eth0
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up

No switch configuration needed. eth0 handles all traffic. If eth0 goes down, eth1 takes over immediately. When eth0 recovers, it becomes active again (because of primary eth0).
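A quick failover drill — a sketch assuming the bond above:

```shell
# Which member is carrying traffic right now?
grep "Currently Active Slave" /proc/net/bonding/bond0

# Simulate a link failure and watch the standby take over
ip link set eth0 down
grep "Currently Active Slave" /proc/net/bonding/bond0   # now eth1

# Recover -- 'primary eth0' makes eth0 active again
ip link set eth0 up
grep "Currently Active Slave" /proc/net/bonding/bond0
```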


War Story: The Bonding Mode That Split the Brain

War Story: A team configured a 2x10G bond on their database servers using mode 4 (LACP). Everything worked great — until a firmware update on the switch silently changed the port-channel configuration from LACP to static. The Linux side kept sending LACP PDUs. The switch ignored them. The bond stayed "up" because physical links were fine, but the switch now treated each port independently. Inbound traffic arrived on both ports with different MAC forwarding, causing duplicate packets and MAC flapping across the switch fabric. The database saw intermittent connection resets. It took three days to diagnose because monitoring only checked "bond0 is up" — nobody was checking whether the LACP partner was actually responding.

The fix: monitor /proc/net/bonding/bond0 for Partner Mac Address: 00:00:00:00:00:00 and alert on it. Also: always use lacp_rate fast so you detect switch-side misconfigurations in seconds, not minutes.
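That check is a few lines of shell — a sketch suitable for cron or a monitoring agent (the alerting action is a placeholder):

```shell
# Alert when the LACP partner goes silent: an all-zeros partner MAC
# means the switch is no longer speaking LACP to us.
# check_lacp FILE -> returns 0 if a partner is present, 2 if not
check_lacp() {
    if grep -q "Partner Mac Address: 00:00:00:00:00:00" "$1"; then
        echo "CRITICAL: no LACP partner in $1 -- check switch config" >&2
        return 2
    fi
    return 0
}

# In production: check_lacp /proc/net/bonding/bond0 || <page someone>
```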


Part 7: VLANs on a Bond — The Full Stack

In production, you don't put VLANs on a bare NIC. You put them on a bond. The layering looks like this:

                    ┌─────────────┐
                    │  br-vlan100  │ ← bridge (switch for VLAN 100)
                    └──────┬──────┘
                    ┌──────┴──────┐
                    │  bond0.100  │ ← VLAN sub-interface
                    └──────┬──────┘
                    ┌──────┴──────┐
                    │    bond0    │ ← bond (2x10G LACP)
                    └──┬──────┬──┘
                       │      │
                    ┌──┴──┐┌──┴──┐
                    │eth0 ││eth1 │ ← physical NICs
                    └─────┘└─────┘

Build it:

# Assume bond0 already exists from the previous section

# Create VLAN interfaces on the bond
ip link add link bond0 name bond0.100 type vlan id 100
ip link add link bond0 name bond0.200 type vlan id 200
ip link set bond0.100 up
ip link set bond0.200 up

# Create bridges for each VLAN
ip link add br-vlan100 type bridge
ip link add br-vlan200 type bridge
ip link set br-vlan100 up
ip link set br-vlan200 up

# Attach VLAN interfaces to their bridges
ip link set bond0.100 master br-vlan100
ip link set bond0.200 master br-vlan200

# Give bridges IPs (optional — if this host routes between VLANs)
ip addr add 10.100.0.1/24 dev br-vlan100
ip addr add 10.200.0.1/24 dev br-vlan200

Now you can connect tenant namespaces to these bridges exactly like before. Each tenant lands on a VLAN with full Layer 2 isolation, carried over a redundant bonded link.

Gotcha: When you switch from bare NICs to a bond, delete the old VLAN interfaces first. An eth0.100 and a bond0.100 can coexist — one will work, the other will silently drop traffic, and you'll spend hours confused about why half your connections fail.


Part 8: Other Virtual Interface Types

veth pairs and bridges aren't the only virtual interfaces. Here's the extended family:

tap and tun

Name Origin: tun = tunnel (operates at Layer 3, IP packets). tap = network tap (operates at Layer 2, Ethernet frames). The names describe what level of the stack they expose to userspace.

# Create a tap device
ip tuntap add dev tap0 mode tap
ip link set tap0 up

# Create a tun device
ip tuntap add dev tun0 mode tun
ip link set tun0 up

tap/tun devices let userspace programs send and receive packets by reading/writing a file descriptor. This is how VPNs work — OpenVPN reads encrypted packets from the network, decrypts them, and writes cleartext packets into a tun device. The kernel routes them as if they arrived on a real interface.

Device Layer Delivers to userspace Used by
tun L3 Raw IP packets OpenVPN, wireguard-go (userspace WireGuard)
tap L2 Ethernet frames QEMU/KVM VMs, OpenVPN (bridge mode)

macvlan and ipvlan

Both create virtual interfaces on a physical NIC. The key difference:

Feature macvlan ipvlan
MAC address Unique per child Shared with parent
Switch sees Multiple MACs per port One MAC per port
Host-to-child L2 Broken (kernel limitation) Works
Use case Containers as "real" hosts Environments with MAC port-security limits
# macvlan — each child gets its own MAC
ip link add macvlan0 link eth0 type macvlan mode bridge

# ipvlan — all children share parent's MAC
ip link add ipvlan0 link eth0 type ipvlan mode l2

Part 9: Traffic Control (tc) — One-Minute Overview

The tc command controls how the kernel queues outbound packets. Two things worth knowing:

# Limit outbound bandwidth to 10 Mbit
tc qdisc add dev veth-br-t1 root tbf rate 10mbit burst 32kbit latency 400ms

# Simulate 100ms latency and 1% packet loss (chaos engineering)
tc qdisc add dev veth-br-t1 root netem delay 100ms loss 1%

# Remove
tc qdisc del dev veth-br-t1 root

Interview Bridge: "How would you test whether your application handles network latency gracefully?" Use tc netem. This is what chaos engineering tools (Pumba, Chaos Mesh) use under the hood.


Part 10: Open vSwitch (OVS) — When Linux Bridges Aren't Enough

When you need thousands of virtual ports, OpenFlow programming, or VXLAN tunnel endpoints, you reach for Open vSwitch:

# Create a switch and add ports
ovs-vsctl add-br ovs-br0
ovs-vsctl add-port ovs-br0 eth0
ovs-vsctl add-port ovs-br0 veth-br-t1

# Add a VXLAN tunnel to another host
ovs-vsctl add-port ovs-br0 vxlan0 -- \
    set Interface vxlan0 type=vxlan options:remote_ip=10.0.0.2

ovs-vsctl show

OVS is the networking backbone of OpenStack and several Kubernetes CNI plugins (Antrea, OVN-Kubernetes).

Trivia: OVS was developed at Nicira, co-founded by Martin Casado, whose Stanford PhD work on Ethane led directly to OpenFlow. VMware acquired Nicira in 2012 for $1.26 billion. OVS remains open source.


Part 11: Kubernetes CNI — Where This All Comes Together

Everything we've built in this lesson — namespaces, veth pairs, bridges, VLANs, OVS — is exactly what Kubernetes CNI plugins do. CNI (Container Network Interface) is a specification: the kubelet calls a CNI binary, passes it a namespace path, and says "set up networking for this pod."

Different CNI plugins use different strategies:

CNI Plugin Strategy What it creates
Flannel (VXLAN) Overlay Bridge + veth pair + VXLAN tunnel per node
Calico (no overlay) Routing veth pair + BGP routes (no bridge)
Cilium eBPF veth pair, bypasses iptables entirely
Weave Overlay Bridge + veth + encrypted tunnel
Multus Meta-CNI Delegates to multiple CNIs per pod

But they all start with the same two steps:

  1. Create a veth pair
  2. Move one end into the pod's network namespace

The differences are in step 3: how traffic gets from the veth's host end to other pods and the outside world.

Mental Model: Every Kubernetes CNI plugin is answering the same question: "I have a veth pair. The pod end has an IP. How does a packet from this pod reach a pod on another node?" Flannel says "wrap it in VXLAN." Calico says "route it with BGP." Cilium says "program eBPF to forward it." The primitives are always the same.


Flashcard Check #3

Question Answer
What Linux bonding mode uses LACP for dynamic negotiation? Mode 4 (802.3ad)
Why should you set lacp_rate fast? Default slow rate takes 90 seconds to detect a dead link; fast rate detects in 3 seconds
What's the relationship between bond0.100 and br-vlan100? bond0.100 is a VLAN sub-interface attached to bridge br-vlan100 as an uplink
What does tc netem delay 100ms do? Adds 100ms of simulated latency to outbound packets
How does the tun device differ from tap? tun passes L3 (IP) packets to userspace; tap passes L2 (Ethernet) frames
What two steps do ALL Kubernetes CNI plugins share? Create a veth pair, move one end into the pod namespace
How does Docker implement port forwarding (-p)? DNAT rule in iptables PREROUTING chain

Exercises

Exercise 1: Build a two-namespace bridge (5 minutes)

Create two namespaces (ns1 and ns2), a bridge, and connect them. Verify they can ping each other.

Hint: Follow the pattern from Part 3 — create the bridge, create veth pairs, move one end of each into a namespace, attach the host ends to the bridge, assign IPs, bring everything up.
Solution
ip link add br0 type bridge
ip link set br0 up

for i in 1 2; do
    ip netns add ns${i}
    ip link add veth-br${i} type veth peer name veth${i}
    ip link set veth${i} netns ns${i}
    ip link set veth-br${i} master br0
    ip link set veth-br${i} up
    ip netns exec ns${i} ip addr add 10.0.0.${i}/24 dev veth${i}
    ip netns exec ns${i} ip link set veth${i} up
    ip netns exec ns${i} ip link set lo up
done

ip netns exec ns1 ping -c 2 10.0.0.2

Exercise 2: Isolate with VLANs (10 minutes)

Extend Exercise 1. Create two bridges, one per VLAN (100 and 200). Put ns1 on VLAN 100 and ns2 on VLAN 200. Verify they cannot ping each other.

Hint: You'll need VLAN sub-interfaces on a parent interface (or you can use separate bridges without VLAN uplinks for pure L2 isolation between namespaces).
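One possible solution — a sketch that reuses Exercise 1's namespaces and uses separate bridges (eth0.100/eth0.200 uplinks are only needed if traffic must leave the host):

```shell
# One bridge per "VLAN"
ip link add br100 type bridge
ip link add br200 type bridge
ip link set br100 up
ip link set br200 up

# Re-home the veth host ends from br0 onto the per-VLAN bridges
ip link set veth-br1 master br100
ip link set veth-br2 master br200

# Re-address the namespaces onto per-VLAN subnets
ip netns exec ns1 ip addr flush dev veth1
ip netns exec ns1 ip addr add 10.100.0.10/24 dev veth1
ip netns exec ns2 ip addr flush dev veth2
ip netns exec ns2 ip addr add 10.200.0.10/24 dev veth2

# Must fail: different bridge, different broadcast domain
ip netns exec ns1 ping -c 2 -W 1 10.200.0.10
```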

Exercise 3: Trace Docker's network setup (15 minutes)

Run docker run -d --name trace-me nginx. Then:

  1. Find the container's PID
  2. Find its veth pair on the host
  3. Confirm the veth is attached to the docker0 bridge
  4. List the iptables NAT rules Docker created
  5. Enter the container's network namespace with nsenter and run ip route

Hint
pid=$(docker inspect --format '{{.State.Pid}}' trace-me)
# The veth shows up in 'ip link' with a peer ifindex matching the container's eth0
# Use 'bridge link show' to see what's attached to docker0
# 'iptables -t nat -L DOCKER -n' shows the DNAT rules
nsenter -t $pid -n ip route

Exercise 4: Bond + VLAN (20 minutes, requires two NICs or VMs)

Set up a mode 1 bond with two interfaces, create a VLAN 100 sub-interface on the bond, and verify connectivity. Check /proc/net/bonding/bond0 and pull a cable (or bring down an interface) to test failover.


Cheat Sheet

Namespace operations

Task Command
Create namespace ip netns add NAME
List namespaces ip netns list
Run command in namespace ip netns exec NAME COMMAND
Delete namespace ip netns del NAME
Enter container's netns nsenter -t PID -n COMMAND

veth and bridge operations

Task Command
Create veth pair ip link add NAME type veth peer name PEER
Move interface to namespace ip link set NAME netns NSNAME
Create bridge ip link add NAME type bridge
Attach port to bridge ip link set NAME master BRIDGE
Show bridge members bridge link show
Show bridge MAC table bridge fdb show br BRIDGE

VLAN operations

Task Command
Load 802.1Q module modprobe 8021q
Create VLAN interface ip link add link PARENT name PARENT.VID type vlan id VID
Show VLAN details ip -d link show PARENT.VID
Capture tagged frames tcpdump -eni PARENT 'vlan VID'

Bond operations

Task Command
Create bond ip link add bond0 type bond mode 802.3ad
Set LACP rate ip link set bond0 type bond lacp_rate fast
Set hash policy ip link set bond0 type bond xmit_hash_policy layer3+4
Add member ip link set ethX master bond0
Check status cat /proc/net/bonding/bond0
Set monitoring ip link set bond0 type bond miimon 100

Traffic control (tc)

Task Command
Limit bandwidth tc qdisc add dev DEV root tbf rate 10mbit burst 32kbit latency 400ms
Simulate latency tc qdisc add dev DEV root netem delay 100ms
Remove qdisc tc qdisc del dev DEV root

Takeaways

  • Network namespaces are the foundation of container networking. Every container gets its own network stack through CLONE_NEWNET. No namespace, no isolation.

  • veth pairs are the only way to get packets across namespace boundaries. They're virtual cables. One end in the namespace, one end on the host (or bridge). Every container runtime uses them.

  • Docker's bridge network is just namespace + veth + bridge + iptables NAT. Once you understand the primitives, Docker networking stops being magic and starts being predictable.

  • LACP (mode 4) is the production standard for NIC bonding. Always set lacp_rate fast and miimon 100. Monitor the partner MAC address — if it's all zeros, your bond is not actually bonded.

  • VLANs are Layer 2 isolation on a single wire. The 802.1Q tag is 4 bytes. Load the 8021q module. Make sure the switch port is a trunk. The 4,094 VLAN limit is why clouds use VXLAN.

  • Every Kubernetes CNI plugin starts with the same two steps: create a veth pair, move one end into the pod's namespace. The difference is what happens after that.