BGP EVPN / VXLAN — Primer¶
Why This Matters¶
Traditional datacenter networks stretched VLANs everywhere and relied on STP for loop prevention. STP works by blocking redundant links: you buy two expensive 10G uplinks and STP blocks one. VLAN IDs are 12 bits, which gives 4094 usable VLANs. In a multi-tenant datacenter with hundreds of customers each needing isolated broadcast domains, you hit that ceiling quickly. A VLAN also cannot be extended across a routed (Layer 3) boundary or the wide area without vendor-specific workarounds.
VXLAN (Virtual Extensible LAN) solves the scale problem by tunneling Layer 2 frames inside UDP packets. Each tunnel endpoint (VTEP) encapsulates an Ethernet frame in a UDP/IP packet with a 24-bit VNI (VXLAN Network Identifier) — up to 16 million segments instead of 4094 VLANs. The result is an overlay network where virtual Layer 2 segments can span physical Layer 3 boundaries without STP.
EVPN (Ethernet VPN, RFC 7432) is the control plane that makes VXLAN production-grade. Without EVPN, VTEPs rely on data-plane flood-and-learn: BUM (Broadcast, Unknown unicast, Multicast) traffic is flooded to every other VTEP in the VNI, and remote MAC addresses are learned from the flooded frames. EVPN distributes MAC and IP address information via MP-BGP routes, so VTEPs learn where endpoints live from the control plane instead of from flooding. Every network engineer building or troubleshooting modern datacenter fabrics needs to understand this stack.
Who made it: VXLAN was developed by engineers from VMware, Cisco, and Arista Networks, among others. The initial IETF draft was submitted in 2011 and published as the informational RFC 7348 in August 2014. EVPN (RFC 7432, February 2015) was co-authored by engineers from Cisco, Juniper Networks, Alcatel-Lucent, and several network operators, and was standardized through the IETF's L2VPN working group. The combination of VXLAN + EVPN replaced the earlier multicast-based flood-and-learn approach and became the dominant datacenter fabric architecture by around 2018.
Name origin: VXLAN stands for Virtual eXtensible Local Area Network. EVPN stands for Ethernet Virtual Private Network. VTEP stands for VXLAN Tunnel EndPoint. NVE stands for Network Virtualization Edge (the IETF term for the device that hosts a VTEP; Cisco NX-OS names its VTEP interface nve). VNI stands for VXLAN Network Identifier. The acronym density in this domain is high, so memorizing these expansions prevents confusion when reading vendor documentation.
Core Concepts¶
1. VXLAN Encapsulation¶
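VXLAN wraps an Ethernet frame in the headers below, listed outermost first. By default the VTEP strips the original 802.1Q tag before encapsulation, because the VNI already identifies the segment; RFC 7348 allows the tag to be carried only when explicitly configured.
- Outer Ethernet header (sending device's MAC to the underlay next-hop MAC, rewritten at each routed hop)
- Outer IP header (VTEP IP-to-VTEP IP, the underlay)
- UDP header (destination port 4789)
- VXLAN header (8 bytes, includes 24-bit VNI)
- Original inner Ethernet frame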
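The 8-byte VXLAN header from the list above can be sketched in a few lines of Python. This is an illustrative sketch of the RFC 7348 layout (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits), not code from any vendor implementation; the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination port
VXLAN_FLAG_VNI_VALID = 0x08    # the "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = build_vxlan_header(10100)
print(header.hex(), parse_vni(header))
```

The 24-bit range check is what gives the roughly 16 million segment limit: any value of 2**24 or above cannot be encoded.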
VNI to VLAN mapping (NX-OS example):
vlan 100
vn-segment 10100 # VNI 10100 mapped to VLAN 100
interface nve1 # NVE = Network Virtualization Edge (the VTEP)
no shutdown
source-interface loopback1
member vni 10100
mcast-group 239.1.1.1 # multicast for BUM (flood mode, pre-EVPN)
# or: ingress-replication protocol bgp (EVPN mode)
Arista EOS VTEP config:
interface Vxlan1
vxlan source-interface Loopback1
vxlan udp-port 4789
vxlan vlan 100 vni 10100
vxlan vlan 200 vni 10200
vxlan learn-restrict any
2. Underlay vs Overlay¶
The underlay is the physical IP-routed network that carries VXLAN tunnels. The overlay is the virtual L2/L3 network built on top.
Underlay requirements:
- All VTEP loopbacks must be reachable from all other VTEPs (via an OSPF, IS-IS, or eBGP underlay)
- MTU must accommodate the VXLAN overhead of about 50 bytes. With a 1500-byte inner MTU, set the underlay MTU to at least 1550, or 1554 if inner frames keep their 802.1Q tag
- ECMP must be enabled for load balancing across fabric links
NX-OS underlay (OSPF):
feature ospf
router ospf 1
router-id 10.0.0.1
interface Ethernet1/1 # spine uplink
ip address 10.100.1.1/31
ip ospf 1 area 0.0.0.0
ip ospf network point-to-point
no ip ospf passive-interface
interface Loopback0 # router ID and VTEP reachability; advertise the NVE source loopback (loopback1 in the NVE example) the same way
ip address 10.0.0.1/32
ip ospf 1 area 0.0.0.0
Arista underlay (eBGP):
router bgp 65001 # unique ASN per leaf (eBGP between every device)
router-id 10.0.0.11
maximum-paths 4 ecmp 4
neighbor 10.100.1.0 remote-as 65000 # spine1
neighbor 10.100.1.2 remote-as 65000 # spine2
redistribute connected route-map LOOPBACKS
Gotcha: VXLAN encapsulation adds approximately 50 bytes of overhead (outer Ethernet 14 + outer IP 20 + UDP 8 + VXLAN 8 = 50 bytes). If the underlay MTU is the standard 1500, inner IP packets are limited to 1450 bytes, which silently breaks any application that sends full 1500-byte packets. Set the underlay interface MTU to 9000 or more (jumbo frames) wherever the hardware allows, and never below 1554. MTU mismatch is the #1 cause of "VXLAN tunnel is up but large packets are dropped": small packets work, large packets vanish.
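The MTU arithmetic can be checked with a short calculation. This is a sketch assuming an IPv4 underlay; the helper names are hypothetical. An interface IP MTU does not count the outer Ethernet header, so the 14 Ethernet bytes charged against the MTU here are the inner header carried as payload, and the total is still 50 bytes.

```python
OUTER_IPV4 = 20      # outer IP header (IPv4 underlay assumed)
OUTER_UDP = 8        # UDP header
VXLAN_HEADER = 8     # VXLAN header with the 24-bit VNI
INNER_ETHERNET = 14  # inner Ethernet header, carried as payload
INNER_DOT1Q = 4      # only when the inner 802.1Q tag is preserved


def vxlan_overhead(inner_tagged: bool = False) -> int:
    """Bytes the underlay IP MTU must provide beyond the inner IP packet."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    return overhead + (INNER_DOT1Q if inner_tagged else 0)


def required_underlay_mtu(inner_ip_mtu: int, inner_tagged: bool = False) -> int:
    """Smallest underlay IP MTU that carries the inner packet unfragmented."""
    return inner_ip_mtu + vxlan_overhead(inner_tagged)


def max_inner_ip_mtu(underlay_mtu: int, inner_tagged: bool = False) -> int:
    """Largest inner IP packet that fits a given underlay IP MTU."""
    return underlay_mtu - vxlan_overhead(inner_tagged)


print(required_underlay_mtu(1500))                     # untagged inner frame
print(required_underlay_mtu(1500, inner_tagged=True))  # inner 802.1Q tag kept
print(max_inner_ip_mtu(1500))                          # default-MTU underlay
```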
3. EVPN Control Plane and Route Types¶
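EVPN runs as an address family inside MP-BGP. BGP carries EVPN routes in the L2VPN EVPN address family (AFI 25, SAFI 70). RFC 7432 defines route types 1 through 4 and RFC 9136 adds Type 5; later RFCs define further types, but these five are the ones used in most VXLAN fabrics: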
| Route Type | Name | Purpose |
|---|---|---|
| Type 1 | Ethernet Auto-Discovery | Mass withdrawal, aliasing for multihoming |
| Type 2 | MAC/IP Advertisement | Distribute MAC and optionally IP of a host |
| Type 3 | Inclusive Multicast Ethernet Tag | Advertise VTEP membership in a VNI (BUM handling) |
| Type 4 | Ethernet Segment Route | Multihoming DF election |
| Type 5 | IP Prefix Route | Route type for L3VPN / IRB inter-subnet routing |
Type 2 route — the core MAC learning mechanism:
# From 'show bgp l2vpn evpn' on NX-OS
BGP routing table entry for [2]:[0]:[0]:[48]:[aabb.cc00.1234]:[32]:[192.168.100.10]/272
Paths: (1 available)
Path type: local, path is valid, is best path
AS-Path: NONE, path locally originated
0.0.0.0 (metric 0) from 0.0.0.0 (10.0.0.11)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10100 10001 # VNI for L2, VNI for L3 (symmetric IRB)
Extcommunity: RT:65001:10100 RT:65001:10001 ENCAP:8
Type 3 route — VTEP joins VNI:
# Every VTEP that has hosts in VNI 10100 advertises a Type 3 route
# Other VTEPs use this to build their BUM replication list
BGP routing table entry for [3]:[0]:[32]:[10.0.0.11]/88
PMSI Attribute: tunnel-type:6, label:10100, tunnel-id:10.0.0.11
Extcommunity: RT:65001:10100 ENCAP:8
Type 5 route — inter-subnet (IP prefix):
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.200.0]/224
Paths: (2 available)
Gateway IP: 0.0.0.0
Extcommunity: RT:65001:10001 ENCAP:8 Router MAC:aabb.cc00.0001
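When scripting against output like the samples above, the bracketed route key can be split into named fields. The sketch below is an illustration in Python; the field positions are inferred from the three sample keys in this section, so treat them as assumptions and check them against your platform's output. The function name is hypothetical.

```python
import re

ROUTE_TYPE_NAMES = {
    1: "Ethernet Auto-Discovery",
    2: "MAC/IP Advertisement",
    3: "Inclusive Multicast Ethernet Tag",
    4: "Ethernet Segment",
    5: "IP Prefix",
}


def parse_evpn_key(key: str) -> dict:
    """Split an NX-OS style EVPN route key into a dict of fields."""
    body, _, bits = key.rpartition("/")
    fields = re.findall(r"\[([^\]]*)\]", body)
    route_type = int(fields[0])
    parsed = {
        "type": route_type,
        "name": ROUTE_TYPE_NAMES.get(route_type, "unknown"),
        "nlri_bits": int(bits),
    }
    if route_type == 2:
        # Assumed layout: [2]:[ESI]:[EthTag]:[MAC len]:[MAC]:[IP len]:[IP]
        parsed["mac"] = fields[4]
        parsed["ip"] = fields[6] if int(fields[5]) else None
    elif route_type == 3:
        # Assumed layout: [3]:[EthTag]:[IP len]:[originating VTEP]
        parsed["originator"] = fields[3]
    elif route_type == 5:
        # Assumed layout: [5]:[ESI]:[EthTag]:[prefix len]:[prefix]
        parsed["prefix"] = f"{fields[4]}/{fields[3]}"
    return parsed


for sample in (
    "[2]:[0]:[0]:[48]:[aabb.cc00.1234]:[32]:[192.168.100.10]/272",
    "[3]:[0]:[32]:[10.0.0.11]/88",
    "[5]:[0]:[0]:[24]:[192.168.200.0]/224",
):
    print(parse_evpn_key(sample))
```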
4. Symmetric vs Asymmetric IRB¶
IRB (Integrated Routing and Bridging) allows a VTEP to route between VNIs (subnets) locally.
Asymmetric IRB:
- The ingress VTEP both bridges (L2 lookup in source VNI) and routes (L3 lookup)
- Packet arrives at egress VTEP already in destination VNI, so egress only bridges
- Simpler but requires every VTEP to have every VLAN/VNI configured
- Does not scale when you have hundreds of VLANs

Symmetric IRB:
- Both ingress and egress VTEPs route
- Uses an L3 VNI (also called VRF VNI or transit VNI) in addition to per-VLAN L2 VNIs
- Ingress: route from source subnet VNI → L3 VNI
- Egress: route from L3 VNI → destination subnet VNI
- Only requires VTEPs serving a particular subnet to know about it
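The difference between the two modes comes down to which VNI appears in the VXLAN header between VTEPs and which VNIs each leaf must carry. The following Python sketch models only that bookkeeping; the function names are hypothetical and no forwarding is simulated.

```python
from __future__ import annotations

from typing import Optional


def vni_on_wire(mode: str, dst_vni: int, l3_vni: Optional[int] = None) -> int:
    """VNI placed in the VXLAN header for routed (inter-subnet) traffic."""
    if mode == "asymmetric":
        # Ingress routes into the destination subnet, then bridges across.
        return dst_vni
    if mode == "symmetric":
        if l3_vni is None:
            raise ValueError("symmetric IRB needs an L3 VNI")
        # Ingress routes into the VRF; egress routes again into dst_vni.
        return l3_vni
    raise ValueError(f"unknown IRB mode: {mode}")


def vnis_needed_on_leaf(mode: str, local_vnis: set, fabric_vnis: set,
                        l3_vni: int) -> set:
    """VNIs a leaf must have configured to route to any subnet in the VRF."""
    if mode == "asymmetric":
        return set(fabric_vnis)           # every subnet it might route into
    return set(local_vnis) | {l3_vni}     # local subnets plus the L3 VNI


fabric = {10100, 10200, 10300}
print(vni_on_wire("asymmetric", dst_vni=10200))
print(vni_on_wire("symmetric", dst_vni=10200, l3_vni=10001))
print(sorted(vnis_needed_on_leaf("symmetric", {10100}, fabric, 10001)))
```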
NX-OS symmetric IRB config:
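fabric forwarding anycast-gateway-mac aabb.cc00.fffe # shared gateway MAC, needed for anycast-gateway mode
vlan 1001 # example VLAN reserved for the L3 VNI (the ID is arbitrary)
vn-segment 10001
vrf context TENANT_A
vni 10001 # L3 VNI for this VRF
rd auto
address-family ipv4 unicast
route-target both auto evpn # import/export EVPN routes for this VRF
interface Vlan1001 # SVI that carries routed traffic on the L3 VNI
vrf member TENANT_A
ip forward
interface Vlan100 # SVI for subnet 192.168.100.0/24
vrf member TENANT_A
ip address 192.168.100.1/24
fabric forwarding mode anycast-gateway
interface nve1
host-reachability protocol bgp # learn remote hosts from EVPN instead of flood-and-learn
member vni 10001 associate-vrf # L3 VNI bound to VRF
member vni 10100 # L2 VNI for VLAN 100
ingress-replication protocol bgp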
5. Anycast Gateway¶
Every leaf in the fabric is configured with the same gateway IP and MAC for a given subnet. Hosts always route through the local leaf's gateway rather than hairpinning to a central router.
# Arista EOS anycast gateway
ip virtual-router mac-address aabb.cc00.fffe # fabric-wide shared MAC
interface Vlan100
ip address virtual 192.168.100.1/24 # anycast gateway IP
All leaves present the same gateway IP (192.168.100.1) and the same virtual MAC (aabb.cc00.fffe). A host moving from one leaf to another keeps its default gateway address. The fabric routes the first-hop traffic from whichever leaf the host is connected to.
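Because the design depends on every leaf presenting an identical gateway, a consistency check is a natural audit. The Python sketch below assumes the per-leaf gateway data has already been collected into a dictionary (how it is collected is out of scope); the function name and data shape are hypothetical.

```python
def gateway_mismatches(leaves: dict) -> dict:
    """Return VLANs whose (gateway IP, gateway MAC) differs between leaves.

    leaves maps leaf name -> {vlan_id: (gateway_ip, gateway_mac)}.
    """
    seen = {}
    for per_vlan in leaves.values():
        for vlan, gateway in per_vlan.items():
            seen.setdefault(vlan, set()).add(gateway)
    # More than one distinct pair for a VLAN means the anycast gateway is broken.
    return {vlan: gws for vlan, gws in seen.items() if len(gws) > 1}


fabric = {
    "leaf1": {100: ("192.168.100.1", "aabb.cc00.fffe")},
    "leaf2": {100: ("192.168.100.1", "aabb.cc00.fffe")},
    "leaf3": {100: ("192.168.100.1", "aabb.cc00.0003")},  # wrong virtual MAC
}
print(gateway_mismatches(fabric))
```

A host that moves to a mismatched leaf would keep sending to the cached gateway MAC, which that leaf does not own, so first-hop routing fails until the ARP entry ages out.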
6. Spine-Leaf Topology Design¶
- Every leaf connects to every spine (full mesh between tiers)
- Leaves do not connect to other leaves directly
- Spines forward underlay IP traffic only: they are not VTEPs and do not encapsulate traffic or learn tenant MAC/IP entries in the data plane, although they may reflect EVPN routes in BGP
- EVPN sessions are between leaves (or between leaves and route reflectors on spines)
- Spine BGP config with route reflector role:
# NX-OS spine as BGP route reflector for EVPN
router bgp 65000
template peer-policy EVPN_RR_CLIENT
route-reflector-client
send-community both
neighbor 10.0.0.11 remote-as 65000 # leaf1
update-source loopback0
address-family l2vpn evpn
inherit peer-policy EVPN_RR_CLIENT 1
neighbor 10.0.0.12 remote-as 65000 # leaf2
update-source loopback0
address-family l2vpn evpn
inherit peer-policy EVPN_RR_CLIENT 1
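The scaling argument for route reflectors on the spines can be made concrete with a little arithmetic. This Python sketch assumes a two-tier fabric in which every leaf peers with every spine; the function names are hypothetical.

```python
def fabric_links(leaves: int, spines: int) -> int:
    """Physical links when every leaf connects to every spine."""
    return leaves * spines


def full_mesh_ibgp_sessions(leaves: int) -> int:
    """EVPN sessions if leaves peered directly with each other."""
    return leaves * (leaves - 1) // 2


def route_reflector_sessions(leaves: int, spines: int) -> int:
    """EVPN sessions when each leaf peers only with each spine RR."""
    return leaves * spines


leaves, spines = 32, 2
print(fabric_links(leaves, spines))
print(full_mesh_ibgp_sessions(leaves))
print(route_reflector_sessions(leaves, spines))
```

Adding a leaf to the reflector design adds one session per spine, whereas a full mesh would require a new session to every existing leaf.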
7. BUM Traffic Handling¶
BUM (Broadcast, Unknown unicast, Multicast) traffic cannot be sent to a single remote VTEP: broadcast and multicast frames have many receivers, and for unknown unicast the VTEP behind which the destination sits has not been learned yet.
Two approaches:
1. Multicast underlay: VTEPs join a multicast group per VNI. BUM is sent to the group; all VTEPs in the VNI receive it. Requires PIM in the underlay.
2. Head-end replication (ingress replication): The ingress VTEP sends a unicast copy of the BUM frame to every other VTEP in the VNI. EVPN Type 3 routes advertise who needs a copy. No multicast needed.
For new deployments, ingress replication + EVPN is the modern default. Multicast requires additional underlay complexity and is harder to troubleshoot.
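Ingress replication can be pictured as building a per-VNI flood list from received Type 3 routes and then sending one unicast copy per entry. The Python sketch below illustrates that idea only; the route tuples and function names are hypothetical simplifications of what a VTEP keeps internally.

```python
from collections import defaultdict


def build_flood_lists(imet_routes, local_vtep):
    """Map each VNI to the remote VTEPs that advertised a Type 3 route for it.

    imet_routes is an iterable of (vni, originating_vtep_ip) pairs.
    """
    flood = defaultdict(set)
    for vni, vtep in imet_routes:
        if vtep != local_vtep:       # never replicate to ourselves
            flood[vni].add(vtep)
    return {vni: sorted(peers) for vni, peers in flood.items()}


def replicate(frame: bytes, vni: int, flood_lists: dict):
    """Return one (destination VTEP, VNI, frame) unicast copy per peer."""
    return [(vtep, vni, frame) for vtep in flood_lists.get(vni, [])]


routes = [
    (10100, "10.0.0.11"),
    (10100, "10.0.0.12"),
    (10100, "10.0.0.13"),
    (10200, "10.0.0.12"),
]
flood_lists = build_flood_lists(routes, local_vtep="10.0.0.11")
copies = replicate(b"arp-request", 10100, flood_lists)
print(flood_lists)
print(len(copies))
```

The cost of this approach is visible in `replicate`: one broadcast frame becomes as many underlay packets as there are remote VTEPs in the VNI.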
Remember: The five EVPN route types mnemonic: "1-Auto, 2-MAC, 3-Mcast, 4-ES, 5-IP". Type 2 (MAC/IP) is the workhorse — it distributes host locations. Type 3 (Inclusive Multicast) builds the BUM replication list. Type 5 (IP Prefix) enables inter-subnet routing via L3 VNI. Types 1 and 4 are for multihoming scenarios. In day-to-day troubleshooting, Types 2, 3, and 5 are the ones you will inspect most frequently.
8. ECMP in the Fabric¶
VXLAN uses UDP encapsulation, and the source UDP port is derived from a hash of the inner packet's 5-tuple. This means different inner flows hash to different source ports, allowing ECMP to spread traffic across multiple spine uplinks.
# Verify ECMP on NX-OS
show ip load-sharing
# Check that ECMP paths are installed
show ip route 10.0.0.12 # remote leaf VTEP address
# Should show multiple next-hops (via spine1 and spine2)
# IP Route Table for VRF "default"
# '10.0.0.12/32', ubest/mbest: 2/0
# *via 10.100.1.0, Eth1/1, ... (via spine1)
# *via 10.100.1.2, Eth1/2, ... (via spine2)
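The source-port derivation described in this section can be sketched as a hash of the inner flow mapped into a port range. This is an illustration in Python, not any vendor's algorithm: real VTEPs use their own hardware hash, CRC32 here is a stand-in, and the function name is hypothetical. The 49152-65535 range is the dynamic port range that RFC 7348 recommends.

```python
import zlib

PORT_LOW, PORT_HIGH = 49152, 65535  # dynamic/private port range


def vxlan_source_port(src_ip: str, dst_ip: str, protocol: int,
                      src_port: int, dst_port: int) -> int:
    """Derive the outer UDP source port from the inner 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    span = PORT_HIGH - PORT_LOW + 1
    # Same flow -> same port (no reordering); different flows spread out.
    return PORT_LOW + zlib.crc32(key) % span


flow_a = ("192.168.100.10", "192.168.200.20", 6, 40001, 443)
flow_b = ("192.168.100.10", "192.168.200.20", 6, 40002, 443)
print(vxlan_source_port(*flow_a), vxlan_source_port(*flow_b))
```

Underlay routers hash on the outer headers, so varying the outer source port is what lets them place different inner flows on different spine paths while keeping each flow on a single path.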
Quick Reference¶
# NX-OS EVPN verification
show nve peers # VTEP peer state
show nve vni # VNI to VTEP/VLAN mapping
show bgp l2vpn evpn summary # BGP EVPN sessions
show bgp l2vpn evpn # All EVPN routes
show bgp l2vpn evpn route-type 2 # MAC/IP routes only
show bgp l2vpn evpn route-type 3 # VTEP membership routes
show mac address-table vlan 100 # L2 MAC table
show ip arp vrf TENANT_A # ARP entries in VRF
# Arista EOS verification
show vxlan vtep # VTEP peer table
show vxlan address-table # VXLAN MAC table
show bgp evpn summary # EVPN BGP sessions
show bgp evpn route-type mac-ip # Type 2 routes
show bgp evpn route-type imet # Type 3 routes
show vxlan flood vtep # BUM replication list per VNI
show interfaces Vxlan1 # VTEP interface state
# Troubleshooting VTEP reachability (EOS syntax shown; on NX-OS use "source-interface loopback1" or a source IP)
ping 10.0.0.12 source loopback1 # Can local VTEP reach remote?
traceroute 10.0.0.12 source loopback1 # What path does it take?
# Check inner packet flow
# Source UDP port for VXLAN should vary per flow (ECMP)
tcpdump -i eth1 udp port 4789 -n
Key numbers:
| Item | Value |
|------|-------|
| VXLAN UDP port | 4789 |
| VXLAN overhead | ~50 bytes |
| VNI size | 24-bit (up to 16M segments) |
| EVPN AFI/SAFI | 25/70 (L2VPN EVPN) |
| Anycast gateway concept | Same IP+MAC on all leaves per subnet |
Wiki Navigation¶
Prerequisites¶
- Networking Deep Dive (Topic Pack, L1)
- Routing (Topic Pack, L1)
Related Content¶
- Case Study: API Latency Spike — BGP Route Leak, Fix Is Network ACL (Case Study, L2) — BGP EVPN / VXLAN