Infrastructure Forensics Footguns¶
Mistakes that destroy evidence, tip off attackers, or turn a contained incident into a full breach.
1. Rebooting the server as your first response¶
Something looks wrong. Your instinct says "reboot it." You reboot. The running processes, network connections, memory contents, and the attacker's active session — all gone. The IR team arrives to a clean-booted system with no volatile evidence to analyze.
Fix: Never reboot a potentially compromised system until evidence is captured. Run your evidence collection first: ps, ss, lsof, memory dump. Copy everything off-system. Then and only then discuss remediation with the security team.
Remember: The order of volatility for evidence collection: (1) CPU registers/cache, (2) memory (RAM), (3) network connections, (4) running processes, (5) disk contents, (6) remote logs. Collect in this order, most ephemeral first; a reboot destroys everything above disk contents. RFC 3227 formalizes this as the "Guidelines for Evidence Collection and Archiving."
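The volatile-collection pass can be scripted so it runs the same way every time under pressure. A minimal sketch, assuming a writable /tmp and standard Linux tools; the evidence path is an illustrative choice, and in a real incident you would copy the directory off-system immediately:

```shell
# Capture volatile state before anyone touches (or reboots) the box.
EVIDENCE_DIR="/tmp/evidence-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$EVIDENCE_DIR"

# Most ephemeral first: processes and sockets. Guard each tool in case
# it is missing on a minimal install.
command -v ps   >/dev/null && ps auxww  > "$EVIDENCE_DIR/ps.txt"
command -v ss   >/dev/null && ss -tupan > "$EVIDENCE_DIR/ss.txt"
command -v lsof >/dev/null && lsof -nP  > "$EVIDENCE_DIR/lsof.txt"
date -u > "$EVIDENCE_DIR/collected-at.txt"

# Hash everything so later tampering with the evidence is detectable.
sha256sum "$EVIDENCE_DIR"/* > "$EVIDENCE_DIR/manifest.sha256"
echo "Evidence captured in $EVIDENCE_DIR"
```

A full memory dump (e.g. with LiME or AVML) would come before the process listing in strict volatility order, but needs tooling staged in advance.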
2. Running rm on the suspicious file¶
You find /tmp/.hidden/miner and delete it. You feel good — threat eliminated. But now you can't analyze what it was, how it got there, who it communicated with, or whether it dropped other payloads. The IR team has nothing to reverse-engineer.
Fix: Quarantine, don't delete. Move the file to an evidence directory. Hash it with SHA-256. If you must stop it from executing, chmod 000 the file or rename it. Better yet: leave it in place and let the IR team handle it. They have sandboxes for safe analysis.
Debug clue: Before quarantining, collect metadata:
stat <file> (timestamps)
file <file> (file type)
sha256sum <file> (hash for a VirusTotal lookup)
lsof <file> (what processes have it open)
ls -la /proc/$(pgrep -f <file>)/exe (what binary the process is actually running)
This takes 30 seconds and provides critical forensic context.
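Putting the metadata-then-quarantine order into a script keeps you from deleting in a reflex. A sketch, using the /tmp/.hidden/miner path from the story; the demo creates a stand-in file so the commands have a target, and the evidence directory name is an assumption:

```shell
SUSPECT="/tmp/.hidden/miner"
EVIDENCE="/tmp/ir-evidence"
mkdir -p "$EVIDENCE" "$(dirname "$SUSPECT")"

# Demo only: create a stand-in for the suspicious binary.
echo 'stand-in payload' > "$SUSPECT"

# Collect metadata BEFORE moving anything: moving changes ctime.
stat "$SUSPECT"      > "$EVIDENCE/miner.stat"
sha256sum "$SUSPECT" > "$EVIDENCE/miner.sha256"
command -v file >/dev/null && file "$SUSPECT" > "$EVIDENCE/miner.filetype"

# Quarantine: strip all permissions, then move into the evidence dir.
chmod 000 "$SUSPECT"
mv "$SUSPECT" "$EVIDENCE/miner.quarantined"
```

Note that mv within the same filesystem preserves the file's content and mtime, which is why it beats copy-then-delete for evidence handling.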
3. Logging into the compromised server with shared credentials¶
You SSH in using the shared admin account that 10 people know the password to. Your login is now indistinguishable from the attacker's in the auth logs. Chain of custody is compromised because you can't prove which human was behind the admin session.
Fix: Always use your personal, named account. Use key-based authentication. Every action you take during investigation should be attributable to you specifically. If you must use a shared account, document the exact time you logged in and every command you ran.
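If you do end up on a shared account, the "document every command" advice can be partly automated with a tiny wrapper. A sketch; the function name and log path are illustrative assumptions:

```shell
# Log who ran what, and when, before executing each investigation command.
IR_LOG="/tmp/ir-actions.log"
ir_run() {
    printf '%s %s ran: %s\n' "$(date -u +%FT%TZ)" "$(id -un)" "$*" >> "$IR_LOG"
    "$@"
}

# Every investigation command goes through the wrapper:
ir_run uname -r
ir_run id
```

For a full keystroke-level record, script(1) or an SSH session recorder at the bastion is stronger, but the wrapper costs nothing to adopt mid-incident.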
4. Investigating on a public Slack channel¶
You post "Hey, I think server-42 might be compromised — seeing weird processes" in #engineering. The attacker, who compromised a developer's Slack account last week, now knows you're onto them. They wipe their tracks and move to a different server before you can collect evidence.
Fix: Use out-of-band communication for security incidents. Phone calls, encrypted messaging (Signal), or a dedicated incident channel with restricted membership. Assume the attacker may have access to your internal communication tools until proven otherwise.
5. Trusting the compromised system's own tools¶
You run ps aux on the compromised server and see nothing unusual. You conclude there's no malicious process. But the attacker replaced /usr/bin/ps with a version that filters out their process. The system is lying to you.
Fix: Verify binary integrity with the package manager (rpm -Vf /usr/bin/ps on RPM systems, debsums on Debian/Ubuntu). Compare SHA-256 hashes against known-good values. For deep investigations, mount the server's disk on a separate trusted system and examine files from there. Never trust output from a system you believe is compromised.
Under the hood: Rootkits that replace system binaries (userspace rootkits) are detectable with rpm -Vf. Kernel-level rootkits that hook syscalls are not: ps, ls, and netstat all rely on the same compromised syscalls. For kernel rootkit detection, boot from a trusted USB with known-good tools, or examine /proc entries directly and compare them with userspace output.
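On systems without a verifying package manager, the same check can be done against a hash manifest captured while the system was known-good. A minimal sketch; the baseline path is an assumption, and in practice both the manifest and the sha256sum binary you run should come from trusted media, since a rootkit can lie about either:

```shell
BASELINE="/tmp/baseline-hashes.sha256"

# Captured earlier, while the system was trusted:
sha256sum /bin/sh /bin/cat > "$BASELINE"

# Later, on the suspect host: --check flags any binary whose hash changed.
sha256sum --check "$BASELINE"
```

Each line of the output reads "<path>: OK" or "<path>: FAILED"; a FAILED core utility on a production box is a strong compromise indicator.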
6. Not having a baseline to compare against¶
You find an SUID binary at /usr/local/bin/helper. Is it supposed to be there? You don't know because you never documented what SUID binaries exist on a clean system. You find a cron job in /etc/cron.d/cleanup. Legitimate or attacker persistence? No baseline, no way to tell quickly.
Fix: Create and maintain baselines for all production systems. Record: SUID files, listening ports, enabled services, installed packages, cron jobs, user accounts. Store baselines in a tamper-proof location. Compare during investigations. AIDE/Tripwire automate this, but even a monthly shell script that captures baselines is better than nothing.
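Even the "monthly shell script" version of a baseline is short. A sketch covering part of the list above; paths are conventional choices, and package/service inventories are distro-specific (dpkg -l, rpm -qa, systemctl list-unit-files) so they are left as comments:

```shell
BASE="/tmp/baseline-$(uname -n)-$(date +%Y%m%d)"
mkdir -p "$BASE"

# SUID files on local filesystems.
find / -xdev -perm -4000 -type f 2>/dev/null > "$BASE/suid-files.txt"
# Listening ports (if iproute2 is present).
command -v ss >/dev/null && ss -tlnp > "$BASE/listening-ports.txt" 2>/dev/null
# User accounts: name, UID, shell.
cut -d: -f1,3,7 /etc/passwd > "$BASE/user-accounts.txt"
# Cron jobs.
ls /etc/cron.d /etc/cron.daily > "$BASE/cron-jobs.txt" 2>/dev/null
# Add per-distro package/service listings here (dpkg -l, rpm -qa, ...).

# Hash the baseline so tampering with it is detectable.
sha256sum "$BASE"/* > "$BASE/manifest.sha256"
```

During an incident, a diff of today's capture against the stored baseline answers "is this SUID binary supposed to be here?" in seconds.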
7. Killing the attacker's session without capturing traffic¶
You see an active attacker session and immediately kill it. You stopped the immediate threat but now you have no idea what they were doing, what data they accessed, where they came from (VPN, Tor, compromised jump box), or whether they have other access paths.
Fix: If the attacker is active and the risk is manageable, observe before acting. Capture network traffic with tcpdump. Watch their process activity. Notify the security team while the session is still live — they may want to monitor. Only cut the session when instructed to contain, or if you see active data destruction/exfiltration.
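Alongside a tcpdump capture, the "watch their process activity" step can be done non-destructively through /proc, which never signals the process. A sketch, using the current shell's PID ($$) as a stand-in for the attacker's real PID taken from your ps/ss output:

```shell
PID=$$   # stand-in: substitute the attacker's PID here

# Where is it running from, and what binary is it really?
ls -l "/proc/$PID/cwd" "/proc/$PID/exe" 2>/dev/null
# Full command line (null-separated in /proc).
tr '\0' ' ' < "/proc/$PID/cmdline"; echo
# Open file descriptors: sockets, logs, dropped payloads.
ls "/proc/$PID/fd" 2>/dev/null
```

Reading /proc requires matching privileges (root, or the process owner), but unlike strace it attaches nothing to the process and leaves no trace the attacker can see from inside their session.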
8. Assuming the compromise is limited to one server¶
You found the compromised server. You clean it up, rebuild it, and consider the incident closed. Two weeks later, another server shows the same symptoms. The attacker used the first server to pivot to three other systems, and you only found one.
Fix: Assume lateral movement until proven otherwise. Check every system the compromised server could reach: SSH keys, shared credentials, network access. Review authentication logs on adjacent systems. The IR investigation isn't complete until you've mapped the full scope of the attacker's access.
Gotcha: Check ~/.ssh/known_hosts on the compromised server; it lists every host the server has previously SSH'd to (unless HashKnownHosts has obscured the hostnames). Also check ~/.bash_history for SSH commands, env for connection strings, and cat /proc/*/environ for environment variables in running processes. These reveal the attacker's potential lateral movement targets.
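The gotcha's checklist can be rolled into one pass that produces a deduplicated lead list for the IR team. A sketch; the output path is an assumption, and the standard HOME layout is assumed:

```shell
{
  # Hosts this machine has SSH'd to (readable only if unhashed).
  awk '{print $1}' ~/.ssh/known_hosts 2>/dev/null
  # SSH destinations recorded in shell history.
  grep -h '^ssh ' ~/.bash_history 2>/dev/null
  # Processes whose environments hold connection strings or secrets.
  grep -al '_HOST\|_URL\|PASSWORD' /proc/[0-9]*/environ 2>/dev/null
} | sort -u > /tmp/pivot-leads.txt
wc -l /tmp/pivot-leads.txt
```

Every entry in the list is a system whose authentication logs deserve a look before the incident is declared contained.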
9. No centralized logging when you need it most¶
The attacker deleted /var/log/auth.log on the compromised server. Your only record of SSH logins was on that server. There's no central syslog, no SIEM, no log shipping. The evidence is gone because all your eggs were in one basket — the compromised basket.
Fix: Ship logs off-server in real time. Use rsyslog, Fluentd, Vector, or any log forwarder to send logs to a central, append-only logging system. The attacker can delete local logs, but they can't retroactively modify logs that were already shipped to your SIEM. This is the single most important forensic control you can implement.
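With rsyslog, the forwarding rule is a single line. A minimal sketch; the collector hostname and file path are assumptions for your environment:

```
# /etc/rsyslog.d/90-forward.conf -- collector hostname is an assumption.
# "@@" forwards over TCP; a single "@" would use lossy UDP.
*.* @@siem.example.com:514
```

Production setups add a disk-assisted queue so logs buffer locally through collector outages, and the append-only guarantee is enforced at the collector, not the client.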
War story: The SolarWinds attack (2020) demonstrated that attackers specifically target logging infrastructure. The SUNBURST malware checked for security tools and logging agents before activating, and avoided actions that would generate anomalous log entries. Centralized, immutable logging with anomaly detection is the forensic baseline — not the ceiling.
10. Forgetting to rotate credentials after the incident¶
You rebuild the server, patch the vulnerability, and close the incident. But the server had database passwords, API keys, and SSH keys to other systems in its environment variables, config files, and SSH agent. The attacker had access to all of those credentials and may have exfiltrated them.
Fix: After any compromise, rotate every credential the server had access to. Database passwords, API tokens, SSH keys, TLS certificates, service account tokens, cloud IAM keys — all of them. Yes, this is painful. But a compromised credential that wasn't rotated is a ticking time bomb. The attacker can come back any time using credentials you forgot to change.
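Building the rotation list is easier if you inventory what the host could actually reach. A sketch; the paths and grep patterns are illustrative assumptions, not an exhaustive secret scanner:

```shell
REPORT="/tmp/rotate-me.txt"
{
  # SSH private keys on disk.
  find /root /home -type f -name 'id_*' ! -name '*.pub' 2>/dev/null
  # Common credential files for cloud and registry access.
  ls /root/.aws/credentials /root/.docker/config.json 2>/dev/null
  # Secrets exposed through running processes' environments.
  grep -al 'SECRET\|TOKEN\|PASSWORD\|_KEY' /proc/[0-9]*/environ 2>/dev/null
} | sort -u > "$REPORT"
wc -l "$REPORT"
```

Treat the report as a floor, not a ceiling: anything the attacker could read during their dwell time belongs on the rotation list, whether or not it shows up here.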