Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
Phase 1: Linux Ops Investigation (Dead End)
Check the iSCSI initiator on db-primary-01:
$ systemctl status iscsid
● iscsid.service - Open-iSCSI
Active: active (running) since Mon 2026-03-16 10:00:00 UTC; 3 days ago
$ iscsiadm -m discovery -t sendtargets -p iscsi-san.storage.internal:3260
iscsiadm: connect to 10.0.20.50:3260 failed (connect timed out)
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: Could not perform SendTargets discovery: encountered connection failure
Discovery fails with a connection timeout on port 3260, but ping works:
$ ping -c 3 iscsi-san.storage.internal
PING iscsi-san.storage.internal (10.0.20.50) 56(84) bytes of data.
64 bytes from 10.0.20.50: icmp_seq=1 ttl=64 time=0.4 ms
64 bytes from 10.0.20.50: icmp_seq=2 ttl=64 time=0.3 ms
64 bytes from 10.0.20.50: icmp_seq=3 ttl=64 time=0.3 ms
ICMP works but TCP to port 3260 does not. Check if the port is reachable:
$ nc -zv iscsi-san.storage.internal 3260 -w 5
nc: connect to iscsi-san.storage.internal (10.0.20.50) port 3260 (tcp) failed: Connection timed out
$ telnet iscsi-san.storage.internal 3260
Trying 10.0.20.50...
# (hangs)
TCP connectivity to the iSCSI port is broken. The host is up (ping works), but iSCSI traffic cannot reach port 3260. Check the storage array's management interface:
$ ssh admin@iscsi-san.storage.internal
storage> show iscsi status
iSCSI Service: Running
Portal: 10.0.20.50:3260 (Active)
Sessions: 0 active (expected: 5)
The storage array's iSCSI service is running and listening. The problem is in the network path between the servers and the storage array on TCP port 3260.
The Pivot
The ping works (ICMP) but TCP to 3260 fails. This means either a firewall or VLAN/routing issue is blocking TCP on that port. Check the network path:
$ traceroute -T -p 3260 10.0.20.50
traceroute to 10.0.20.50 (10.0.20.50), 30 hops max
1 10.0.10.1 (10.0.10.1) 0.5 ms
2 * * *
3 * * *
TCP traceroute fails after the first hop (the gateway). ICMP traceroute succeeds:
The ICMP path is fine (2 hops). But TCP to 3260 is blocked at the gateway. Check which VLAN the servers are on:
$ ip addr show ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP
inet 10.0.10.15/24 brd 10.0.10.255 scope global ens192
The servers are on VLAN 10 (10.0.10.0/24). The storage array is on VLAN 20 (10.0.20.0/24). iSCSI traffic should be on a dedicated storage VLAN.
Phase 2: Datacenter Ops Investigation (Root Cause)
Check the switch ACLs between VLAN 10 and VLAN 20:
# On the core switch
switch# show access-lists VLAN10-to-VLAN20
Extended IP access list VLAN10-to-VLAN20
10 permit icmp 10.0.10.0 0.0.0.255 10.0.20.0 0.0.0.255
20 permit tcp 10.0.10.0 0.0.0.255 10.0.20.0 0.0.0.255 eq 22
30 permit tcp 10.0.10.0 0.0.0.255 10.0.20.0 0.0.0.255 eq 443
40 deny ip any any log
The inter-VLAN ACL permits ICMP and TCP ports 22 and 443 — but not TCP 3260 (iSCSI). This ACL was recently updated:
switch# show access-lists VLAN10-to-VLAN20 | include remark
remark Updated 2026-03-16 by netops per DC-hardening-phase2 ticket
remark Previous rules: permit ip 10.0.10.0/24 10.0.20.0/24 (any)
Three days ago, the datacenter hardening project replaced a permissive "allow all" inter-VLAN rule with an explicit allowlist. The allowlist included SSH and HTTPS (for storage management) but forgot iSCSI (TCP 3260).
The timing matches: backups worked 3 days ago, the ACL was changed 3 days ago, and the first backup failure was the night of the change.
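A pre-change check that would have flagged the omission, as an added example (not part of the original write-up or of any netops tooling): it parses the ACL entries shown above, evaluates them first-match, and reports required flows that fall through to the deny. The flow inventory is constructed from the addresses and ports on this page; the parser assumes numeric `eq` ports and wildcard-mask, `host` or `any` address forms.

```python
import ipaddress

# ACL entries as shown in the `show access-lists VLAN10-to-VLAN20` output above.
ACL = """\
10 permit icmp 10.0.10.0 0.0.0.255 10.0.20.0 0.0.0.255
20 permit tcp 10.0.10.0 0.0.0.255 10.0.20.0 0.0.0.255 eq 22
30 permit tcp 10.0.10.0 0.0.0.255 10.0.20.0 0.0.0.255 eq 443
40 deny ip any any log
"""
# Constructed flow inventory (proto, src, dst, port) built from the page's hosts and ports.
SRC, DST = "10.0.10.15", "10.0.20.50"
REQUIRED = [("icmp", SRC, DST, None)] + [("tcp", SRC, DST, p) for p in (22, 443, 3260)]

def _ip(s):
    return int(ipaddress.IPv4Address(s))

def _addr(tok):
    """Consume one address spec from tok; return (base, wildcard) as ints."""
    t = tok.pop(0)
    if t == "any":
        return 0, 0xFFFFFFFF
    if t == "host":
        return _ip(tok.pop(0)), 0
    return _ip(t), _ip(tok.pop(0))

def parse_acl(text):
    """Parse numbered permit/deny entries, ordered by sequence number."""
    rules = []
    for line in text.splitlines():
        tok = line.split()
        if len(tok) < 4 or not tok[0].isdigit() or tok[1] not in ("permit", "deny"):
            continue  # header and remark lines
        rest = tok[3:]
        src, dst = _addr(rest), _addr(rest)
        port = int(rest[1]) if rest[:1] == ["eq"] else None
        rules.append((int(tok[0]), tok[1], tok[2], src, dst, port))
    return sorted(rules, key=lambda r: r[0])

def evaluate(rules, proto, src, dst, port=None):
    """First match wins; nothing matching means the implicit deny."""
    for seq, action, rproto, (sb, sw), (db, dw), rport in rules:
        hit = rproto in ("ip", proto) and rport in (None, port)
        if hit and not (_ip(src) ^ sb) & ~sw and not (_ip(dst) ^ db) & ~dw:
            return action, seq
    return "deny", None

def missing_flows(acl_text, required):
    rules = parse_acl(acl_text)
    return [f for f in required if evaluate(rules, *f)[0] != "permit"]
```

Run against the hardened ACL, `missing_flows(ACL, REQUIRED)` returns only the TCP 3260 flow, and `evaluate` attributes the drop to sequence 40.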
Domain Bridge: Why This Crossed Domains
Key insight: The symptom was a Linux backup failure due to an unmounted iSCSI filesystem (linux_ops), the root cause was an inter-VLAN ACL change during datacenter hardening that blocked iSCSI traffic (datacenter_ops), and the fix requires updating the VLAN ACL (networking). This pattern is common because iSCSI relies on network connectivity that is often taken for granted. Datacenter hardening projects that replace permissive rules with explicit allowlists frequently miss storage protocols because they focus on management and application traffic.
Root Cause
A datacenter hardening initiative replaced permissive inter-VLAN ACLs with explicit allowlists. The allowlist for VLAN 10 (servers) to VLAN 20 (storage) included ICMP, SSH, and HTTPS but omitted TCP 3260 (iSCSI). All iSCSI connections from database servers to the storage array were silently dropped, preventing backup volume mounts.