
Level: L2: Operations | Topics: VLANs, Cisco CLI | Domain: Networking

Scenario: VLAN Trunk Mismatch — Server Cannot Reach Its Gateway

Situation

At 08:30 UTC, a newly provisioned application server (app-web-04) in VLAN 150 cannot reach its default gateway or any other host. The server was set up identically to app-web-03, which works fine on the same VLAN. The network team says the switch port is configured and active. The server has an IP address and the link light is on, but all traffic fails, even to hosts on its own subnet. The application launch planned for 10:00 UTC is at risk.

What You Know

  • app-web-04 has IP 10.150.1.14/24, gateway 10.150.1.1 (VLAN 150)
  • app-web-03 (working) is on the same VLAN and same switch, different port
  • Link is up (LED active, ip link shows UP)
  • ARP for the gateway fails — no response
  • The server was just plugged into a switch port that was previously used for a trunk to another switch
  • Pinging any IP, including the gateway, gets "Destination Host Unreachable" (no ARP reply)

Investigation Steps

1. Verify the server's network configuration and VLAN tagging

Command(s):

# Check IP configuration
ip addr show
ip route show

# Check if the server is sending tagged (802.1Q) frames
ip -d link show
ip link show type vlan

# Check for any VLAN subinterfaces
ls /proc/net/vlan/ 2>/dev/null
cat /proc/net/vlan/config 2>/dev/null

# Check network configuration files
cat /etc/netplan/*.yaml 2>/dev/null
cat /etc/sysconfig/network-scripts/ifcfg-eth0 2>/dev/null
cat /etc/network/interfaces 2>/dev/null
What to look for: Two possible mismatches: (A) The server is sending VLAN-tagged frames (802.1Q) but the switch port is configured as an access port, so the switch drops the tagged frames. This happens when someone configures a VLAN subinterface like eth0.150 on the server. (B) The switch port is still configured as a trunk (from its previous use) and expects tagged frames, but the server is sending untagged frames on bare eth0. In case B, the switch puts untagged frames into the native VLAN (often VLAN 1), not VLAN 150.
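The /proc/net/vlan/config check above can be turned into a quick verdict for case A. This is a minimal sketch, assuming the 8021q module's usual file layout (two header lines, then `name | vlan-id | parent` rows); the function names `vlan_subifs` and `tagging_verdict` and the optional path argument are hypothetical, added so the parser can be run against a saved copy of the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list 802.1Q subinterfaces from the 8021q proc file.
# Assumes the usual layout: two header lines, then "name | vlan-id | parent".
vlan_subifs() {
  local cfg="${1:-/proc/net/vlan/config}"
  # A missing file usually means 8021q is not loaded, so no subinterfaces exist.
  [ -r "$cfg" ] || return 0
  awk -F'|' 'NR > 2 {
    gsub(/[[:space:]]/, "", $1); gsub(/[[:space:]]/, "", $2); gsub(/[[:space:]]/, "", $3)
    if ($1 != "") print $1, $2, $3
  }' "$cfg"
}

# Case A hint: any subinterface means the server tags its own frames.
tagging_verdict() {
  local subifs
  subifs="$(vlan_subifs "$@")"
  if [ -n "$subifs" ]; then
    echo "tagged: $subifs"
  else
    echo "untagged"
  fi
}
```

On app-web-04, `tagging_verdict` with no argument reads the live proc file. An "untagged" result points toward case B (switch side), while "tagged: eth0.150 150 eth0" means the switch port must be a trunk carrying VLAN 150.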

2. Capture frames at layer 2 to check for VLAN tags

Command(s):

# Capture with -e to show ethernet headers, look for 802.1Q tags
tcpdump -nn -e -i eth0 -c 20

# Send ARP and watch what goes out
arping -I eth0 10.150.1.1

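# Capture specifically looking for 802.1Q tagged frames
# VLAN tag ethertype is 0x8100
tcpdump -nn -e -i eth0 'ether proto 0x8100'
# Alternative: the "vlan" filter keyword; on Linux with NIC VLAN offload it may
# catch tags that the raw ethertype match above misses
tcpdump -nn -e -i eth0 vlan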

# Check ARP table
ip neigh show
arp -n
What to look for: In the tcpdump -e output, look for vlan 150 tags in the ethernet header. If you see ethertype 802.1Q (0x8100), vlan 150, the server is sending tagged frames. If you see plain ethertype IPv4 (0x0800) or ethertype ARP (0x0806) with no VLAN tag, the server is sending untagged frames. For ARP, you should see ARP requests going out but no ARP replies coming back — this confirms the frames are not reaching the gateway (or the gateway's replies are not reaching the server).
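Reading tcpdump -e output by eye works for a handful of frames. For a longer capture, a small filter can summarize it. This is a sketch that assumes tcpdump's usual link-level line format (`ethertype 802.1Q (0x8100), length 46: vlan 150, p 0, ...` for tagged frames); `classify_capture` is a hypothetical name, and the sample lines in any test input are illustrative, not captured from this incident.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `tcpdump -nn -e` lines on stdin and count tagged
# (per VLAN ID) versus untagged frames.
classify_capture() {
  awk '
    /ethertype 802\.1Q/ {
      # The VLAN ID follows the 802.1Q ethertype as "vlan N,".
      if (match($0, /vlan [0-9]+/)) {
        vid = substr($0, RSTART + 5, RLENGTH - 5)
        tagged[vid]++
      }
      next
    }
    # Any other line with an ethertype is an untagged frame.
    /ethertype / { untagged++ }
    END {
      for (v in tagged) printf "tagged vlan=%s frames=%d\n", v, tagged[v]
      if (untagged) printf "untagged frames=%d\n", untagged
    }'
}
```

Usage: `tcpdump -nn -e -i eth0 -c 20 | classify_capture`. On app-web-04 in this scenario, the expected result is only an "untagged frames=N" line, which matches case B.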

3. Compare with the working server

Command(s):

# On the working server (app-web-03)
ip -d link show
ip link show type vlan
tcpdump -nn -e -i eth0 -c 5

# Check if the working server uses a VLAN subinterface or bare interface
ip addr show eth0
ip addr show eth0.150 2>/dev/null
What to look for: If app-web-03 uses a bare eth0 with no VLAN tagging (and works), then its switch port is an access port in VLAN 150. If app-web-03 uses eth0.150 (a VLAN subinterface), then its switch port is a trunk that allows VLAN 150. The new server must match this configuration, AND its switch port must match as well.
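To compare the two servers, one option is to save `ip -d link show` from each and diff only the 802.1Q IDs. This sketch assumes iproute2's detail output, where VLAN devices print `vlan protocol 802.1Q id N`; the function names and the saved-file workflow are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract 802.1Q VLAN IDs from saved `ip -d link show` output.
vlan_ids_from_iplink() {
  grep -o 'vlan protocol 802\.1Q id [0-9]*' "$1" | awk '{print $NF}' | sort -un
}

# Compare two hosts' saved outputs; prints MATCH or MISMATCH with both ID sets.
compare_tagging() {
  local a b
  a="$(vlan_ids_from_iplink "$1" | paste -sd, -)"
  b="$(vlan_ids_from_iplink "$2" | paste -sd, -)"
  if [ "$a" = "$b" ]; then
    echo "MATCH (${a:-untagged})"
  else
    echo "MISMATCH: ${a:-untagged} vs ${b:-untagged}"
  fi
}
```

Usage: `ssh app-web-03 ip -d link show > web03.txt`, `ip -d link show > web04.txt`, then `compare_tagging web03.txt web04.txt`. A MATCH only says the servers agree; their switch ports still have to be checked separately.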

4. Verify from the switch side (if accessible)

Command(s):

# SSH to the switch and check port configuration
ssh admin@switch

# Cisco IOS style commands:
# show running-config interface GigabitEthernet0/4
# show interfaces GigabitEthernet0/4 switchport
# show interfaces GigabitEthernet0/4 trunk

# Look for:
# - "switchport mode access" vs "switchport mode trunk"
# - "switchport access vlan 150" (for access mode)
# - "switchport trunk allowed vlan" list (for trunk mode)

# If no switch access, use LLDP/CDP from the server to identify the switch port
lldpctl 2>/dev/null
What to look for: If the port shows switchport mode trunk with allowed vlan 1,100,200 (VLAN 150 not in the list), the switch is trunking but not carrying VLAN 150. If the port shows switchport mode access with switchport access vlan 1 (default VLAN), it was never reconfigured from its default. The correct configuration depends on whether the server sends tagged or untagged frames.
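The switch-side check can be scripted against a saved `show running-config interface` output. This is a sketch, not a full IOS parser. It assumes IOS omits default values from running-config (access VLAN 1, native VLAN 1, all VLANs allowed on a trunk) and leaves the mode as "unknown" when no `switchport mode` line is present, because the default mode is platform-specific. `summarize_port` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise VLAN-relevant lines of a saved
# `show running-config interface ...` read from stdin.
summarize_port() {
  awk '
    BEGIN { mode = "unknown"; access = 1; native = 1; allowed = "1-4094" }
    /switchport mode access/              { mode = "access" }
    /switchport mode trunk/               { mode = "trunk" }
    /switchport access vlan [0-9]+/       { access = $NF }
    /switchport trunk native vlan /       { native = $NF }
    /switchport trunk allowed vlan none/  { allowed = "none" }
    /switchport trunk allowed vlan [0-9]/ { allowed = $NF }
    # Long allowed lists can be continued on "... allowed vlan add ..." lines.
    /switchport trunk allowed vlan add /  { allowed = allowed "," $NF }
    END { printf "mode=%s access=%s native=%s allowed=%s\n", mode, access, native, allowed }'
}
```

For the port in this scenario, the expected summary is `mode=trunk access=1 native=1 allowed=1,100,200`: a trunk whose allowed list lacks 150.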

Root Cause

The switch port for app-web-04 was previously used as a trunk link to another switch. When it was repurposed for the new server, the network team said they "configured" it, but the port was still in trunk mode with an allowed VLAN list of 1, 100, and 200 (the VLANs needed by the old trunk). VLAN 150 was not in the allowed list. The server was configured with a bare eth0 interface (no VLAN tagging), sending untagged frames. The trunk port placed untagged frames into the native VLAN (VLAN 1), not VLAN 150. So the server's traffic ended up in VLAN 1 where no gateway at 10.150.1.1 existed, and ARP requests went unanswered. The fix required either converting the port to access mode in VLAN 150, or adding VLAN 150 to the trunk's allowed list and configuring the server to tag its frames.
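The root cause follows from how a port assigns an incoming frame to a VLAN. The sketch below models that rule under stated simplifications: tagged frames arriving on an access port are treated as dropped (typical Cisco behaviour, though some platforms differ), and voice VLANs and QinQ are ignored. `ingress_vlan` and `in_vlan_list` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical model of switch-port ingress VLAN assignment.
#   ingress_vlan MODE ACCESS_VLAN NATIVE_VLAN ALLOWED_LIST FRAME_TAG
# FRAME_TAG is "none" for an untagged frame, otherwise its 802.1Q VLAN ID.

in_vlan_list() {   # in_vlan_list 150 "1,100-199": true if the ID is in the list
  local vid="$1" item lo hi
  local IFS=','
  for item in $2; do
    if [[ "$item" == *-* ]]; then
      lo="${item%-*}"; hi="${item#*-}"
      (( vid >= lo && vid <= hi )) && return 0
    else
      (( vid == item )) && return 0
    fi
  done
  return 1
}

ingress_vlan() {
  local mode="$1" access="$2" native="$3" allowed="$4" tag="$5" vid
  case "$mode" in
    access)
      # Access port: untagged frames join the access VLAN; tagged ones are dropped.
      if [ "$tag" = none ]; then echo "$access"; else echo drop; fi ;;
    trunk)
      # Trunk port: untagged frames land in the native VLAN.
      if [ "$tag" = none ]; then vid="$native"; else vid="$tag"; fi
      if in_vlan_list "$vid" "$allowed"; then echo "$vid"; else echo drop; fi ;;
    *) echo unknown-mode; return 1 ;;
  esac
}
```

`ingress_vlan trunk 1 1 1,100,200 none` prints `1`, which is exactly what happened to app-web-04's untagged frames. Adding 150 to the allowed list alone would not help while the server stays untagged; only Option A or Option B below changes the result to `150`.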

Fix

Immediate:

# Option A: If the server should send untagged frames (standard server setup)
# On the switch, convert the port to access mode:
#   configure terminal
#   interface GigabitEthernet0/4
#   switchport mode access
#   switchport access vlan 150
#   no shutdown
#   end

# Option B: If the server must tag its own frames (less common, multi-VLAN server)
# On the server, move the address from bare eth0 to a VLAN subinterface
# (not persistent across reboots; mirror this in netplan/ifcfg afterwards):
ip addr del 10.150.1.14/24 dev eth0
ip link add link eth0 name eth0.150 type vlan id 150
ip addr add 10.150.1.14/24 dev eth0.150
ip link set eth0.150 up
ip route replace default via 10.150.1.1 dev eth0.150

# Then on the switch, ensure VLAN 150 is in the trunk allowed list:
#   configure terminal
#   interface GigabitEthernet0/4
#   switchport trunk allowed vlan add 150
#   end

# Verify: ARP should now resolve (use -I eth0.150 instead if you chose Option B)
arping -c 3 -I eth0 10.150.1.1
ping -c 3 10.150.1.1
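The two verification commands can be wrapped into a check that reports which layer failed, which also suits the post-provisioning test recommended below. This is a sketch that assumes iputils `arping`, which exits non-zero when no reply arrives within the count. The `ARPING` and `PING` variables are hypothetical override knobs so the logic can be exercised without a network.

```bash
#!/usr/bin/env bash
# Minimal post-change check: layer 2 (ARP) first, then layer 3 (ICMP).
ARPING="${ARPING:-arping}"
PING="${PING:-ping}"

check_gateway() {   # check_gateway IFACE GATEWAY
  local iface="$1" gw="$2"
  if ! "$ARPING" -c 3 -I "$iface" "$gw" >/dev/null 2>&1; then
    echo "FAIL L2: no ARP reply from $gw on $iface (check VLAN/port mode)"
    return 1
  fi
  if ! "$PING" -c 3 -W 2 "$gw" >/dev/null 2>&1; then
    echo "FAIL L3: ARP resolves but ping to $gw fails"
    return 2
  fi
  echo "OK: $gw reachable via $iface"
}
```

Usage: `check_gateway eth0 10.150.1.1` after Option A, or `check_gateway eth0.150 10.150.1.1` after Option B. The distinct return codes let deployment automation tell a VLAN problem from a routing or firewall problem.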

Preventive:

  • Implement a switch port provisioning template. When repurposing a port, always reset it to defaults first (default interface GigabitEthernet0/X on Cisco) before applying the new configuration.
  • Use a configuration management tool (Ansible, NAPALM) for switch port configurations so the intended state is documented and enforced:

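# Example Ansible task (cisco.ios collection). "state: replaced" should also clear
# leftover trunk settings on the interface, which "merged" (the default) keeps.
- name: Configure access port for app server
  cisco.ios.ios_l2_interfaces:
    config:
      - name: GigabitEthernet0/4
        mode: access
        access:
          vlan: 150
    state: replaced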
  • Enable LLDP on servers and switches so you can verify which switch port a server is connected to and cross-reference the configuration.
  • Run a post-provisioning connectivity test as part of the server deployment automation. Do not wait for the application team to discover the issue.
  • Set unused switch ports to a quarantine VLAN (switchport access vlan 999) and shut them down. This prevents accidentally inheriting old configurations.
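The reset-then-configure template and the quarantine rule can share one generator. This is a sketch that only prints IOS configuration lines for review and pushes nothing; `render_port` and its `up|down` argument are hypothetical, and Gi0/7 in the usage note is an illustrative port, not one from this incident.

```bash
#!/usr/bin/env bash
# Hypothetical template renderer: prints IOS lines only.
# "default interface" wipes leftover trunk settings before the new config.
render_port() {   # render_port IFACE VLAN up|down
  local iface="$1" vlan="$2" admin=" no shutdown"
  if [ "$3" = down ]; then
    admin=" shutdown"
  fi
  printf '%s\n' 'configure terminal' \
    "default interface $iface" \
    "interface $iface" \
    ' switchport mode access' \
    " switchport access vlan $vlan" \
    "$admin" \
    'end'
}
```

Usage: `render_port GigabitEthernet0/4 150 up` for app-web-04, and `render_port GigabitEthernet0/7 999 down` to park an unused port in the quarantine VLAN.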

Common Mistakes

  • Assuming "link is up means the network is working." Layer 1 (physical link) being up tells you nothing about layer 2 (VLAN) correctness. The port can be up but in the completely wrong VLAN.
  • Not checking for 802.1Q tags in the frame. Without using tcpdump -e, you cannot see whether the server is tagging its frames. This is invisible at the IP layer.
  • Forgetting about the native VLAN on trunk ports. Untagged frames on a trunk port go into the native VLAN, which is usually VLAN 1. This silently puts the server in the wrong network with no error messages.
  • Checking only the server side. VLAN issues almost always require checking both the server's configuration and the switch port's configuration. The mismatch is between the two.
  • Confusing "no ARP reply" with "server misconfiguration." If ARP fails for the gateway, the problem is nearly always at layer 2 (wrong VLAN, wrong port config, cable in wrong port) rather than the server's IP configuration.

Interview Angle

Q: A server has the correct IP and the link is up, but it cannot ARP for its gateway. What do you check?

Good answer shape: This is a layer 2 problem. When ARP fails for a directly connected gateway, the server's frames are not reaching the gateway at the ethernet level. Check if there is a VLAN mismatch: the server might be in the wrong VLAN (switch port configured incorrectly), or the server might be sending tagged frames on an access port (or vice versa). Use tcpdump -e to look for 802.1Q tags in outgoing frames. Verify the switch port configuration: is it access or trunk mode, and what VLAN is it assigned to? Compare with a working server on the same VLAN. Mention that repurposed switch ports are a common culprit because they retain their old configuration (trunk mode, old VLAN list) unless explicitly reset. The systematic approach is: verify layer 1 (link up), then layer 2 (correct VLAN, correct tagging), then layer 3 (correct IP/mask/gateway).

