The Firewall Rule That Blocked Itself

Category: The Incident
Domains: firewalls, networking
Read time: ~5 min

Setting the Scene

I was a network engineer at a regional hospital system, about 1,200 employees. We ran our own small datacenter — two racks, a dozen servers, a pair of FortiGate firewalls in HA. Most of the job was routine: patching, VLAN changes, the occasional new server. One Thursday afternoon, the security team asked me to add a firewall rule to block outbound traffic on some ports flagged in a recent audit. Simple stuff. I'd done it a hundred times.

What Happened

Thursday 3:15 PM — I log into the primary FortiGate via SSH on port 22 from my workstation on VLAN 10 (the management VLAN). I start adding deny rules for the flagged outbound ports: 6667 (IRC), 4444 (common for reverse shells), and a few others. Standard security hardening.

3:20 PM — I'm going through the audit list and I see port 22 flagged as "outbound SSH — restrict to known destinations." I read it as "block outbound SSH except to known hosts" and create an outbound deny rule for port 22 from all source IPs, the management VLAN included, planning to add the exceptions afterward. I apply the rule.

3:20 PM + 3 seconds — My SSH session freezes. I stare at the terminal for ten seconds before the realization hits me like a truck. I just blocked port 22. My management session runs over port 22. I blocked my own management connection.

3:22 PM — I try to open a new SSH session. It fails to connect. I try the web GUI on port 443, which still responds because 443 wasn't on my list. I log in, navigate to the firewall policy page, and discover the GUI is in read-only mode: one of the other ports I had just blocked is used by HA sync. The secondary firewall had already picked up the bad rules via HA sync before that channel went down.

3:28 PM — With the GUI stuck in read-only mode while the HA cluster is degraded, I try console access. We have a serial console server, but nobody has used it since it was installed in 2019, and the default password doesn't work.

3:35 PM — I call my colleague. He drives to the datacenter, 20 minutes away, connects a laptop to the primary FortiGate with a console cable, logs in, and removes the rule. HA sync recovers. Total management lockout: about 40 minutes. No patient-facing services were affected; the outage was confined to management access, and production traffic kept flowing. But for 40 minutes, we couldn't manage our firewalls at all.

The Moment of Truth

I typed a rule that blocked the very protocol I was using to type rules. It's the networking equivalent of cutting the branch you're sitting on. And because of HA sync, both firewalls got the bad rule simultaneously — there was no failover path.

The Aftermath

We implemented three changes immediately. First: a timed auto-revert on the FortiGate that rolls back any firewall policy change after 10 minutes unless it is explicitly confirmed (similar to scheduling a rollback with "at now + 10 minutes" on Linux). Second: out-of-band management via a dedicated cellular modem connected to the console port, so a lockout would never again require a drive to the datacenter. Third: a standing rule at the top of the policy table that explicitly permits management traffic from the management VLAN, with a comment: "DO NOT MODIFY — management access lifeline."
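The first change is a dead man's switch: apply the change, arm a timer that restores the previous configuration, and disarm it only once you have confirmed you can still reach the device. Below is a minimal Python sketch of that pattern. It assumes hypothetical get_config/set_config hooks and represents the rule table as a plain list of strings; a real deployment would call the vendor's API or CLI instead. FortiOS also appears to offer a built-in revert mode (cfg-save revert with cfg-revert-timeout under config system global), which is worth checking against the CLI reference for your version before relying on it.

```python
import copy
import threading
import time
from typing import Callable, List, Optional

Config = List[str]  # hypothetical: a rule table as a list of rule strings


class AutoRevert:
    """Dead man's switch: apply a change, then restore the previous config
    unless confirm() is called before the timeout expires.

    get_config / set_config are hypothetical hooks standing in for whatever
    actually reads and writes the firewall policy (API, CLI, Ansible, ...).
    """

    def __init__(self, get_config: Callable[[], Config],
                 set_config: Callable[[Config], None],
                 timeout_s: float = 600.0) -> None:
        self._get = get_config
        self._set = set_config
        self._timeout = timeout_s
        self._snapshot: Config = []
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()
        self._confirmed = False
        self.reverted = False

    def apply(self, new_config: Config) -> None:
        # Snapshot the known-good config before touching anything.
        self._snapshot = copy.deepcopy(self._get())
        self._set(new_config)
        # Arm the timer; if nobody confirms in time, _revert() restores the snapshot.
        self._timer = threading.Timer(self._timeout, self._revert)
        self._timer.daemon = True
        self._timer.start()

    def confirm(self) -> bool:
        """Keep the change. Returns False if the revert already happened."""
        with self._lock:
            if self.reverted:
                return False
            self._confirmed = True
            if self._timer is not None:
                self._timer.cancel()
            return True

    def _revert(self) -> None:
        with self._lock:
            if self._confirmed:  # confirm() won the race; keep the change
                return
            self._set(self._snapshot)
            self.reverted = True


if __name__ == "__main__":
    policy: Config = ["allow mgmt-vlan tcp/22"]

    def set_policy(cfg: Config) -> None:
        policy[:] = cfg  # update in place so the getter lambda sees it

    guard = AutoRevert(lambda: policy, set_policy, timeout_s=0.2)
    guard.apply(policy + ["deny any tcp/22"])
    time.sleep(0.5)  # no confirm(): simulates the admin being locked out
    print(policy)  # ['allow mgmt-vlan tcp/22']
```

The lock and the _confirmed flag matter: without them, a confirm() that lands just as the timer fires could report success while the revert still runs.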

The Lessons

Never test firewall rules on your management interface first: Always ensure your management access is protected by an immutable rule before making changes to the ports or protocols you're using to manage the device.
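Use auto-revert for firewall changes: Set a timer that rolls the change back automatically unless you explicitly confirm it. On Linux, the familiar version is scheduling the rollback with at, or a delayed shutdown -r +5 that you cancel with shutdown -c once you've confirmed access (which only helps if the new rules haven't been persisted yet). That pattern should be standard practice on every device you manage remotely.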
Always have out-of-band access: Console servers, IPMI, cellular modems — if your only management path is through the firewall you're editing, you're one typo away from a road trip.
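The first lesson can be checked mechanically before anything is applied. Evaluate the proposed rule table against the flow you're managing the device over, and refuse the change if that flow would be denied. Below is a minimal Python sketch that assumes a simplified first-match model (rules evaluated top to bottom, first match wins, implicit deny at the end), hypothetical Rule/Flow types, and hypothetical addresses (10.0.10.0/24 standing in for VLAN 10). Real FortiGate matching involves more, including interfaces, services, schedules, and separate local-in handling for traffic addressed to the firewall itself, so treat this as an illustration of the idea rather than a policy simulator.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str              # "allow" or "deny"
    src: str                 # source CIDR, e.g. "10.0.10.0/24" or "0.0.0.0/0"
    dst_port: Optional[int]  # None means any port
    comment: str = ""


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_port: int


def matches(rule: Rule, flow: Flow) -> bool:
    in_src = ipaddress.ip_address(flow.src_ip) in ipaddress.ip_network(rule.src)
    port_ok = rule.dst_port is None or rule.dst_port == flow.dst_port
    return in_src and port_ok


def verdict(rules: List[Rule], flow: Flow) -> str:
    """First match wins; anything unmatched hits the implicit deny."""
    for rule in rules:
        if matches(rule, flow):
            return rule.action
    return "deny"


def safe_to_apply(proposed: List[Rule], mgmt_flows: List[Flow]) -> bool:
    """Refuse any proposed rule table that would deny a management flow."""
    return all(verdict(proposed, f) == "allow" for f in mgmt_flows)


# Hypothetical addressing: 10.0.10.0/24 stands in for the management VLAN.
LIFELINE = Rule("allow", "10.0.10.0/24", 22, "DO NOT MODIFY: management access lifeline")
SSH_DENY = Rule("deny", "0.0.0.0/0", 22, "outbound SSH, exceptions to follow")
ADMIN_SSH = Flow("10.0.10.25", 22)

if __name__ == "__main__":
    print(safe_to_apply([SSH_DENY], [ADMIN_SSH]))            # False: locks you out
    print(safe_to_apply([LIFELINE, SSH_DENY], [ADMIN_SSH]))  # True: lifeline matches first
```

With a blanket port-22 deny like the one in this incident, the check fails. Put the lifeline rule above it and the same check passes, which is why the standing rule has to sit at the top of the table.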

What I'd Do Differently

I'd never apply firewall rules directly in a production environment again. Every change would go through a staging firewall first, then be applied via automation (Ansible or Terraform) with a mandatory canary period and auto-rollback. I'd also separate the management plane entirely — dedicated management interfaces on a separate network that firewall policy changes can never touch.
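The canary-plus-rollback step can be sketched as follows: apply the change, keep probing the management path for a fixed window, and roll back automatically as soon as a probe fails. Unlike a confirm-or-revert timer, this needs no human in the loop. Below is a minimal Python sketch in which apply, rollback, and probe are hypothetical callables you would wire to your automation (for example, an Ansible run), and tcp_probe assumes the management plane listens on TCP 22.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int = 22, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def canary_apply(apply: Callable[[], None],
                 rollback: Callable[[], None],
                 probe: Callable[[], bool],
                 window_s: float = 300.0,
                 interval_s: float = 10.0) -> bool:
    """Apply a change, probe for window_s seconds, and roll back on the first failure.

    Returns True if the change survived the whole canary window.
    """
    apply()
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        if not probe():
            rollback()
            return False
        time.sleep(interval_s)
    return True
```

In use, probe might be lambda: tcp_probe("10.0.10.1"), where the address is a placeholder for the firewall's management IP. The rollback itself must reach the device over a path the change cannot break, such as the out-of-band console link. Otherwise the canary detects the lockout but cannot undo it.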

The Quote

"I locked myself out of a firewall using the firewall. It took a 20-minute car ride to undo a 3-second mistake."

Cross-References

Topic Packs: Firewalls, Networking, Change Management
Case Studies: Networking