Incident Replay: iptables Blocking Unexpected Traffic

Setup

System context: Production application server that was recently hardened with new firewall rules. Some application features stopped working intermittently after the hardening.
Time: Monday 10:30 UTC
Your role: On-call SRE / Linux engineer

Round 1: Alert Fires

[Pressure cue: "Application team reports intermittent failures connecting to the payment gateway. Worked fine before the weekend. 'Something changed.'"]

What you see: curl to the payment gateway from the server sometimes works, sometimes times out. The gateway is reachable from other servers. ping to the gateway works consistently.

Choose your action:

- A) Check DNS resolution for the payment gateway
- B) Check iptables rules on the server
- C) Contact the payment gateway provider about their uptime
- D) Check the server's network interface for errors

If you chose B (recommended):

[Result: `iptables -L -n -v` shows a DROP rule in the OUTPUT chain for all TCP traffic to destination ports 1024 and above, with exceptions for a whitelist of known ports (80, 443, 22, 3306). The payment gateway uses port 8443, which is not on the list. ping keeps working because the rule matches only TCP, not ICMP. This rule was added during the weekend hardening. Proceed to Round 2.]
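A quick way to surface rules like this one is to filter the chain for drop or reject targets. The sketch below is illustrative only: `find_output_drops` is a hypothetical helper name, and it reads rule text in `iptables -S` syntax on stdin so it can be tried against a saved dump without root.

```shell
# Keep only rules whose target is DROP or REJECT.
# Input: rule text in "iptables -S" syntax on stdin.
# The "--" stops grep from treating the pattern's leading "-j" as an option.
find_output_drops() {
    grep -E -- '-j (DROP|REJECT)'
}
```

On a live host (as root) this could be run as `iptables -S OUTPUT | find_output_drops`. `iptables -L OUTPUT -n -v --line-numbers` additionally shows each rule's position and packet counters, which helps confirm that a DROP rule is actually being hit.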

If you chose A:

[Result: DNS resolves correctly. The issue is not DNS.]

If you chose C:

[Result: Gateway provider confirms 100% uptime. The issue is client-side.]

If you chose D:

[Result: Network interface is clean: no errors, no drops at the NIC level. The drops are at the firewall level.]

Round 2: First Triage Data

[Pressure cue: "Payment processing is unreliable. Revenue impacted. The hardening was supposed to improve security, not break things."]

What you see: The hardening script added an OUTPUT chain rule, `-A OUTPUT -p tcp --dport 1024:65535 -j DROP`, with ACCEPT exceptions for ports 80, 443, 22, and 3306. The payment gateway on port 8443 was not in the exception list. The "sometimes works" behavior is consistent with an ESTABLISHED,RELATED rule earlier in the chain: that rule matches connection-tracking state rather than port numbers, so requests that reused an already-tracked connection to the gateway got through, while every new connection to port 8443 was dropped.
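iptables evaluates a chain from top to bottom and stops at the first rule with a terminating target, so a port's fate depends on which rule covers it first. As a rough illustration, the hypothetical helper `first_match_verdict` below walks rule text in `iptables -S` syntax and reports the first verdict for a destination port. It is deliberately simplified: it assumes plain `--dport` matches and ignores state, address, and multiport matches. The sample rules are reconstructed from the description above, not copied from a real dump.

```shell
# first_match_verdict PORT
# Read "iptables -S OUTPUT" text on stdin and print the target of the first
# rule whose --dport value (single port or LOW:HIGH range) covers PORT.
# Prints nothing if no rule matches (the chain policy would then apply).
first_match_verdict() {
    awk -v port="$1" '
        $1 == "-A" {
            dport = ""; target = ""
            for (i = 1; i < NF; i++) {
                if ($i == "--dport") dport = $(i + 1)
                if ($i == "-j") target = $(i + 1)
            }
            # Skip rules without a port match, and LOG (non-terminating).
            if (dport == "" || target == "" || target == "LOG") next
            n = split(dport, r, ":")
            lo = r[1] + 0
            hi = (n == 2 ? r[2] : r[1]) + 0
            if (port + 0 >= lo && port + 0 <= hi) { print target; exit }
        }
    '
}

# Sample ruleset (assumed shape, based on the hardening described above).
sample_rules='-P OUTPUT ACCEPT
-A OUTPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 443 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 3306 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 1024:65535 -j DROP'

printf '%s\n' "$sample_rules" | first_match_verdict 8443   # prints DROP
printf '%s\n' "$sample_rules" | first_match_verdict 3306   # prints ACCEPT
```

Port 3306 survives only because its ACCEPT rule sits above the DROP range; port 8443 has no such exception and falls through to the DROP.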

Choose your action:

- A) Add port 8443 to the iptables exception list
- B) Remove the OUTPUT chain restrictions entirely
- C) Audit all external services the application connects to and whitelist them
- D) Switch from port-based filtering to conntrack-based filtering

If you chose C (recommended):

[Result: Audit reveals 5 external services on non-standard ports: payment gateway (8443), metrics collector (9090), log aggregator (5044), webhook endpoint (8080), and SMTP relay (587). Add all to the whitelist. Proceed to Round 3.]
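One way to turn the audit result into rules is to generate the commands from an inventory and review them before applying. The sketch below only prints commands and changes nothing; `whitelist_rules` is a hypothetical helper, and the service names are shorthand for the five services listed above. The generated rules allow each port to any destination; restricting them with `-d` would be tighter, but the service addresses are not given here.

```shell
# Inventory from the audit: one "name port" pair per line.
inventory='payment-gateway 8443
metrics-collector 9090
log-aggregator 5044
webhook-endpoint 8080
smtp-relay 587'

# Print (do not execute) one ACCEPT rule per service.
# "-I OUTPUT 1" inserts at the top of the chain, so every exception is
# evaluated before the existing DROP rule.
whitelist_rules() {
    while read -r name port; do
        [ -n "$port" ] || continue   # skip blank or malformed lines
        printf 'iptables -I OUTPUT 1 -p tcp --dport %s -m comment --comment "%s" -j ACCEPT\n' \
            "$port" "$name"
    done
}

printf '%s\n' "$inventory" | whitelist_rules
```

After review, the printed lines could be applied by piping them to `sh` as root. Tagging each rule with `-m comment` keeps the reason for the exception visible in later `iptables -S` output.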

If you chose A:

[Result: Fixes payment but the next service on a non-standard port will break. You are playing whack-a-mole.]

If you chose B:

[Result: Removes the security hardening entirely. The security team will revert your change.]

If you chose D:

[Result: Conntrack-based filtering is the right architecture but requires rewriting the entire ruleset. Not an incident-time change.]

Round 3: Root Cause Identification

[Pressure cue: "Whitelist updated. Why was this not caught before the hardening went live?"]

What you see: Root cause: The hardening script was tested on a staging server that only connects to services on ports 80/443. The production server's external service dependencies were not inventoried before applying the firewall rules.

Choose your action:

- A) Add a pre-hardening dependency audit step to the hardening procedure
- B) Roll out firewall rules with logging before blocking (LOG first, then DROP)
- C) Use conntrack to allow ESTABLISHED,RELATED connections and only filter NEW
- D) All of the above

If you chose D (recommended):

[Result: Dependency audit catches services before blocking. LOG rules provide visibility before enforcement. Conntrack allows established connections cleanly. Proceed to Round 4.]
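The combination of conntrack and a log-before-block phase might look roughly like the following. This is a sketch under stated assumptions, not the hardening script itself: `staged_rules` is a hypothetical generator that only prints commands, the `OUT-WOULD-DROP: ` prefix and the `10/min` rate limit are illustrative values, and the rules are written with `-A` on the assumption that they are appended to an otherwise empty OUTPUT chain.

```shell
# staged_rules MODE PORTS
#   MODE   "log"     -> only log packets that would be dropped (observe phase)
#          "enforce" -> log them and then drop them
#   PORTS  comma-separated allowed destination ports (multiport accepts up to 15)
staged_rules() {
    mode=$1
    ports=$2
    # Replies and follow-up packets of already-accepted connections pass first.
    echo 'iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT'
    # New connections are allowed only to inventoried ports.
    echo "iptables -A OUTPUT -p tcp -m multiport --dports $ports -m conntrack --ctstate NEW -j ACCEPT"
    # Rate-limited logging keeps the log volume bounded.
    echo 'iptables -A OUTPUT -p tcp --dport 1024:65535 -m conntrack --ctstate NEW -m limit --limit 10/min -j LOG --log-prefix "OUT-WOULD-DROP: "'
    if [ "$mode" = enforce ]; then
        echo 'iptables -A OUTPUT -p tcp --dport 1024:65535 -m conntrack --ctstate NEW -j DROP'
    fi
}

# Observe phase: nothing is blocked yet, would-be drops show up in the kernel log.
staged_rules log 22,80,443,3306,8443,9090,5044,8080,587
```

Running the `log` variant for a while and reviewing the logged destinations gives the dependency audit a second data source; switching to `enforce` adds the DROP only once the log has gone quiet.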

If you chose A:

[Result: Manual audit is good but can miss dynamically discovered services.]

If you chose B:

[Result: LOG before DROP is excellent for testing but should not stay permanently, because log volume can be huge.]

If you chose C:

[Result: Conntrack is the right approach for stateful filtering but needs the audit to set initial policy correctly.]

Round 4: Remediation

[Pressure cue: "All services restored. Harden the hardening process."]

Actions:

1. Verify all external service connections work: test each whitelisted port
2. Verify iptables rules are correct: `iptables -L -n -v`
3. Add the external service inventory to the hardening runbook
4. Implement a "log-and-monitor" phase before "block" phase for future hardening
5. Add automated connectivity tests for all external dependencies
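The connectivity checks in steps 1 and 5 can be sketched as a small script. The helper names `check_tcp` and `check_all` are hypothetical, and the approach assumes `bash` and coreutils `timeout` are installed; a successful TCP handshake only shows the port is reachable, not that the service behind it is healthy.

```shell
# check_tcp HOST PORT
# Succeed if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
check_tcp() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" </dev/null 2>/dev/null
}

# check_all
# Read "name host port" lines on stdin, print OK/FAIL per service,
# and return non-zero if any check failed.
check_all() {
    failed=0
    while read -r name host port; do
        [ -n "$port" ] || continue   # skip blank or malformed lines
        if check_tcp "$host" "$port"; then
            echo "OK   $name $host:$port"
        else
            echo "FAIL $name $host:$port"
            failed=1
        fi
    done
    return "$failed"
}
```

With an inventory file of `name host port` lines (for example a placeholder entry such as `payment-gateway payments.example.com 8443`; the hostname is invented for illustration), `check_all < inventory.txt` could run from cron or a CI job after every firewall change, with the non-zero exit status used to raise an alert.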

Damage Report

Total downtime: 0 (intermittent failures, not complete outage)
Blast radius: Payment processing unreliable for ~36 hours (since weekend hardening)
Optimal resolution time: 15 minutes (check iptables -> identify missing port -> audit all services -> whitelist)
If every wrong choice was made: 2+ hours plus repeated service breakage from incomplete whitelisting
