Decision Tree: Suspicious Activity Detected¶
Category: Security Response
Starting Question: "Something looks like a security incident — is it?"
Estimated traversal: 2-5 minutes
Domains: security, incident-response, observability, linux-performance, networking
The Tree¶
Something looks like a security incident — is it?
│
├── What triggered the alert or observation?
│ │
│ ├── WAF rule fired / IDS/IPS alert
│ │ │
│ │ ├── Is this a known false positive pattern for this rule?
│ │ │ Check: `grep "<rule-id>" /var/log/waf/false-positive-registry.txt`
│ │ │ Or check your WAF tuning notes / runbook for this rule ID
│ │ │ │
│ │ │ ├── YES — documented false positive
│ │ │ │ └── ✅ ACTION: Tune Alert + Document
│ │ │ │ Log this occurrence; update tuning if frequency is high
│ │ │ │
│ │ │ └── NO — not a known false positive
│ │ │ │
│ │ │ ├── Does the triggering request match a deploy/automation pattern?
│ │ │ │ `kubectl get events --field-selector=reason=Scheduled -n <ns>`
│ │ │ │ Check CI/CD pipeline for recent deploys at same time
│ │ │ │ │
│ │ │ │ ├── YES — deploy/automation explains it
│ │ │ │ │ └── ✅ ACTION: Tune Alert + Document automation exception
│ │ │ │ │
│ │ │ │ └── NO — unexplained WAF trigger
│ │ │ │ └── Continue to source IP check below
│ │ │
│ ├── Anomalous login / auth event
│ │ (New location, impossible travel, off-hours access, many failed attempts)
│ │ │
│ │ ├── Is the account a service account or human account?
│ │ │ │
│ │ │ ├── Service account — check if automation changed recently
│ │ │ │ `git log --since="48 hours ago" -- devops/ .github/`
│ │ │ │ │
│ │ │ │ ├── Automation change explains it → ✅ ACTION: Document + Monitor
│ │ │ │ │
│ │ │ │ └── No change → unusual service account login = high suspicion
│ │ │ │ └── → Escalate to security team immediately
│ │ │ │
│ │ │ └── Human account — contact the account owner directly
│ │ │ "Did you log in from <location> at <time>?"
│ │ │ │
│ │ │ ├── YES — owner confirms → document + review if MFA is enforced
│ │ │ │
│ │ │ └── NO — owner denies → assume account compromise
│ │ │ └── ✅ ACTION: Contain — disable account, rotate credentials
│ │ │ → Escalate to security team
│ │ │
│ ├── Unusual process or binary running on a host/container
│ │ `ps aux | grep -vE "expected-process-list"`
│ │ `ls -la /proc/<pid>/exe` (check binary path)
│ │ `cat /proc/<pid>/cmdline | tr '\0' ' '`
│ │ │
│ │ ├── Is the process name a known system process?
│ │ │ (But verify the binary path — attackers masquerade as `sshd`, `kworker`, etc.)
│ │ │ `readlink /proc/<pid>/exe`
│ │ │ │
│ │ │ ├── Path is wrong (e.g., `/tmp/sshd` instead of `/usr/sbin/sshd`; note a real `kworker` is a kernel thread with no exe link at all)
│ │ │ │ └── ✅ ACTION: Contain Immediately — this is an IOC
│ │ │ │
│ │ │ └── Path matches known good binary
│ │ │ └── Verify file hash: `sha256sum /proc/<pid>/exe`
│ │ │ Compare against known-good hash
│ │ │
│ │ └── Unknown process with no explanation
│ │ └── → Treat as malicious; escalate immediately
│ │
│ ├── Network traffic spike / unusual connection
│ │ `ss -tnp | grep ESTABLISHED`
│ │ `netstat -an | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20`
│ │ `tcpdump -i eth0 -n -c 200 'not port 22 and not port 443'`
│ │ │
│ │ ├── Is the destination IP/domain known and expected?
│ │ │ `whois <ip>` / `dig -x <ip>` / check against allowlist
│ │ │ │
│ │ │ ├── YES — expected CDN, vendor, cloud provider
│ │ │ │ └── ✅ ACTION: Document + Monitor — likely false positive
│ │ │ │
│ │ │ └── NO — unknown destination, especially on unusual port
│ │ │ └── Check volume: large outbound transfer?
│ │ │ `iftop -i eth0 -n` or check NetFlow/VPC flow logs
│ │ │ │
│ │ │ ├── Large outbound (> baseline by 10x or to new destination)
│ │ │ │ └── ✅ ACTION: Contain Immediately — exfiltration signature
│ │ │ │
│ │ │ └── Small/normal volume — unknown but not alarming
│ │ │ └── → Monitor + Investigate
│ │ │
│ └── File modification alert (FIM, auditd, Falco)
│ `ausearch -k <watch-key> -ts recent`
│ `falco -L | grep <rule-name>` (`-L` lists rule names and descriptions; `--list` lists fields)
│ │
│ ├── Was there a recent deployment or package update at that time?
│ │ `rpm -qa --last | head -20` / `grep -E " install | upgrade " /var/log/dpkg.log | tail -20`
│ │ `kubectl rollout history deploy/<name>`
│ │ │
│ │ ├── YES — deploy explains the modification → ✅ ACTION: Document + Monitor
│ │ │
│ │ └── NO — unexplained file modification
│ │ │
│ │ ├── Is the modified file a binary, cron job, or startup script?
│ │ │ │
│ │ │ ├── YES → ✅ ACTION: Contain Immediately — persistence mechanism
│ │ │ │
│ │ │ └── NO — data file or log
│ │ │ └── → Monitor + Investigate
│ │
├── Is the activity happening on a production system?
│ │
│ ├── NO — dev/staging/sandbox only
│ │ └── Urgency is lower; investigate, but no immediate containment is needed
│ │ unless there is evidence of lateral movement toward prod
│ │
│ └── YES — production system
│ └── Any ambiguous activity on production defaults to escalation
│ Do not wait for certainty before involving security team
│
├── Is the activity escalating? (Spreading to more systems)
│ `kubectl get events --all-namespaces | grep -i "failed\|error\|kill" | tail -30`
│ Check for new processes / connections appearing on adjacent hosts
│ │
│ ├── YES — activity spreading, multiple systems affected
│ │ └── ✅ ACTION: Declare Incident — Follow IR Playbook
│ │ Containment takes priority over investigation
│ │
│ └── NO — isolated to one system
│ └── Continue to exfiltration check
│
└── Is there a data exfiltration signature?
(Large outbound transfers, DNS tunneling, unusual destinations, bulk data queries)
│
├── Check outbound transfer volume
│ `iftop -i eth0 -t -s 10` or VPC Flow Logs filtered by source pod IP
│ Check application audit logs for bulk data queries or exports
│ │
├── Check for DNS tunneling
│ `tcpdump -i eth0 -n -c 500 port 53 | awk '{for(i=1;i<=NF;i++) if ($i=="A?") print $(i+1)}' | sort | uniq -c | sort -rn | head -20`
│ Unusually long DNS queries or high query volume to one domain = tunneling
│
├── YES — exfiltration signature present
│ └── ✅ ACTION: Contain Immediately + Declare Incident
│ Block outbound at network level NOW; data loss is imminent or ongoing
│
└── NO — no exfiltration evidence
│
├── Activity is clearly malicious (confirmed IOC, unknown process, account compromise)
│ └── ✅ ACTION: Contain Immediately
│
├── Activity is ambiguous (unusual but no confirmed malicious indicator)
│ └── ⚠️ ESCALATION: Security Team — ambiguous production activity
│
└── Activity has a plausible benign explanation (automation, deploy, known tool)
└── ✅ ACTION: Tune Alert + Document
Node Details¶
Check 1: Is this a known false positive?¶
Command/method:
# WAF rule lookup — check recent false positive registry
grep -i "<rule-id>\|<alert-name>" /var/log/waf/known-fp.log
# Check if alert correlates with a scheduled job
grep "<alert-time>" /var/log/cron
kubectl get cronjobs --all-namespaces
# List recent deployments at the same time
kubectl rollout history deploy -n <namespace> 2>/dev/null | head -20  # rollout history is per-namespace
Check 2: Attribute to known automation or deploy¶
Command/method:
# Check CI/CD pipeline timing
gh run list --limit 10 --json createdAt,headBranch,status
# Check scheduled jobs
kubectl get cronjobs --all-namespaces -o wide
crontab -l && cat /etc/cron.d/*
# Check Kubernetes events around the time of the alert
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp' | tail -30
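The timing correlation behind this check can be made explicit: take the alert timestamp and the nearest deploy/run timestamp and compare them against a tolerance window. A sketch with hypothetical timestamps and an assumed 10-minute window, requiring GNU `date` for `-d`:

```shell
# Hypothetical timestamps; in practice take them from `gh run list` output
# or your deploy log. Requires GNU date for the -d flag.
alert_ts=$(date -d "2024-05-01T10:17:00Z" +%s)
deploy_ts=$(date -d "2024-05-01T10:15:30Z" +%s)
window=600   # tolerance in seconds (assumption: 10 minutes)

delta=$(( alert_ts - deploy_ts ))
abs=${delta#-}   # absolute value: deploy may be just before or just after
if [ "$abs" -le "$window" ]; then
  echo "within deploy window"
else
  echo "no deploy correlation"
fi
```

A match here justifies the "automation explains it" branch; a miss sends you on to the source IP check.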
Check 3: Source IP — internal vs external¶
Command/method:
# Check WAF/nginx log for source IP
grep "<suspicious-request-path>" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn
# Check IP reputation
curl -s "https://ipinfo.io/<ip>/json"
# Or local: check if IP is in your RFC1918 space
python3 -c "import ipaddress; print(ipaddress.ip_address('<ip>').is_private)"
# Check active connections on the host
ss -tnp | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq
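The RFC1918 check above can be wrapped in a small helper so the internal/external call is scriptable in a pipeline. A pure-shell sketch covering the standard IPv4 private ranges (the `python3` one-liner above is the more complete check, since `ipaddress` also handles IPv6):

```shell
# Classify an IPv4 address as internal (RFC1918/loopback) or external.
# Pure glob matching -- no DNS or network lookups.
is_internal() {
  case "$1" in
    10.*)        echo internal ;;
    192.168.*)   echo internal ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo internal ;;
    127.*)       echo internal ;;   # loopback, also not an external attacker
    *)           echo external ;;
  esac
}

is_internal 10.4.2.9      # internal
is_internal 172.20.1.5    # internal
is_internal 203.0.113.77  # external
```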
Check 4: Is the activity escalating?¶
Command/method:
# Check for activity on adjacent hosts (same node, same namespace)
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -50  # events show relative ages, so grepping a wall-clock HH:MM never matches
# Check for new processes spawned across the cluster
journalctl -u falco --since "30 minutes ago" 2>/dev/null | tail -50  # recent Falco alerts (falco --list only enumerates fields)
# Network scan detection — are other hosts logging similar activity?
grep "<source-ip>" /var/log/*/access.log 2>/dev/null | wc -l
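The last command's "is the same source IP hitting more than one host" idea can be sketched on sample data; the inline log lines below are stand-ins for the per-host access logs you would actually grep:

```shell
# Count how many log lines (here, one per adjacent host) mention the suspect IP.
# The sample lines and the IP are hypothetical.
suspect="198.51.100.7"
hits=$(printf '%s\n' \
  "198.51.100.7 GET /admin  host-a" \
  "203.0.113.9  GET /index  host-a" \
  "198.51.100.7 GET /.env   host-b" |
  grep -c "^$suspect ")
echo "$hits"
```

A count spanning more than one host suggests the activity is spreading, which flips you into the "Declare Incident" branch.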
Check 5: Data exfiltration signature¶
Command/method:
# VPC/cloud flow logs — check for large outbound
# AWS: filter by source pod IP in VPC Flow Logs (last 30 min)
aws logs filter-log-events \
--log-group-name /aws/vpc/flowlogs \
--filter-pattern "[version, account, eni, srcaddr=<pod-ip>, ...]" \
--start-time $(date -d '30 minutes ago' +%s)000
# Local: check top bandwidth consumers
iftop -i eth0 -t -s 30 -n 2>/dev/null
# DNS tunneling check — look for long subdomain queries
tcpdump -i eth0 -n -c 500 port 53 | awk '{for(i=1;i<=NF;i++) if ($i=="A?") print length($(i+1)), $(i+1)}' | sort -rn | head -20
# Check application for bulk data exports or unusual query patterns
kubectl logs deploy/<app> --since=30m | grep -iE "SELECT \*|LIMIT 100000|export|dump|download"
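The DNS-tunneling heuristic above boils down to a length check per queried name. A sketch on inline sample data; the long label stands in for encoded exfil, and the 60/40-character thresholds are assumptions to tune against your own traffic:

```shell
# Flag candidate tunneling names: very long total name, or a very long first
# label (encoded payloads usually live in the leftmost label).
printf '%s\n' \
  "www.example.com" \
  "aGVsbG8gd29ybGQgdGhpcyBpcyBleGZpbHRyYXRlZCBkYXRh.evil.example.net" \
  "api.internal.corp" |
awk -F. '{
  label = $1
  if (length($0) > 60 || length(label) > 40)
    print "SUSPECT:", $0
}'
```

In practice, feed it the names extracted by the tcpdump pipeline above or your resolver's query log.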
Terminal Actions¶
✅ Action: False Positive — Tune Alert + Document¶
Do:
1. Add a comment to the alert rule with the false positive explanation
2. If the rule fires too frequently: adjust threshold, add exception for known-good source, or suppress specific pattern
3. Document in your false positive registry: echo "<date> <rule-id> <reason>" >> /var/log/waf/known-fp.log
4. Review whether the tuning change could hide a real attack — consult with security team if unsure
Verify: Alert does not fire on the next occurrence of the known-benign behavior. Alert still fires on a simulated malicious version of the pattern.
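Steps 1-3 can be scripted so that logging a false positive also makes the tree's opening "is this a known false positive?" check a one-liner. A sketch; the registry path and line format are assumptions to adapt:

```shell
# Assumed registry location -- point this at wherever your team keeps FP notes.
FP_REGISTRY="${FP_REGISTRY:-/tmp/known-fp.log}"

log_fp() {  # log_fp <rule-id> <reason...>
  local rule="$1"; shift
  printf '%s %s %s\n' "$(date +%F)" "$rule" "$*" >> "$FP_REGISTRY"
}

is_known_fp() {  # exit 0 if the rule id has been logged before
  grep -q " $1 " "$FP_REGISTRY" 2>/dev/null
}

log_fp 942100 "fires on base64 payloads from the internal report exporter"
is_known_fp 942100 && echo "known false positive"
```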
✅ Action: Monitor + Investigate¶
Do:
1. Capture a snapshot of current state: running processes, network connections, open files
SNAP=/tmp/proc-snapshot-$(date +%Y%m%d-%H%M).txt
ps auxf > "$SNAP"
ss -tnp >> "$SNAP"
lsof -n >> "$SNAP"
2. Watch for new processes or connections appearing:
watch -n 60 "ss -tnp | grep ESTABLISHED; ps auxf | grep -v '\[kworker'"
3. Increase logging verbosity temporarily on the affected service
4. Correlate with SIEM/Datadog for historical context on this IP/host/account
5. Set alert for recurrence or escalation
Verify: Activity does not recur or escalate within 1 hour. Document findings (even if conclusion is "unexplained but low risk"). Do not close investigation without a written conclusion.
✅ Action: Contain Immediately¶
Do:
1. Isolate the affected system — block network access at security group / NetworkPolicy level:
# Kubernetes: apply a deny-all NetworkPolicy to the affected pod label
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: emergency-isolate
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app: <affected-app>
  policyTypes: [Ingress, Egress]
EOF
2. Capture evidence snapshots (processes, connections, logs) before destroying anything
3. Only after evidence is captured, if the workload must be killed: kubectl delete pod <pod> --grace-period=0
4. Cordon the node if host-level compromise is suspected: kubectl cordon <node>
5. Escalate to security team immediately with collected evidence
Verify: Affected system no longer has network connectivity. Evidence snapshots are saved. Security team is paged and has the evidence. Incident ticket is open.
⚠️ Escalation: Security Team — Ambiguous Production Activity¶
When: Activity on production that cannot be explained by automation or known patterns, but no confirmed IOC yet.
Who: Security team on-call; if unavailable, CISO.
Include in page: Alert description, system/service affected, timestamp and duration, source IP and destination, what makes it ambiguous (what you've ruled out and what you haven't), whether it is ongoing.
✅ Action: Declare Incident — Follow IR Playbook¶
Do:
1. Declare incident in PagerDuty/incident.io — severity based on: confirmed breach = P1, active spreading = P1, suspected exfiltration = P1
2. Assign incident commander — this person drives all decisions, communications, and escalations
3. Open incident Slack channel: #inc-<date>-<short-description>
4. Begin timeline log — document every action with timestamp from this point forward
5. Do NOT reboot or terminate systems without IC approval — you may destroy forensic evidence
6. Execute IR playbook: training/library/runbooks/incident_response.md
Verify: Incident is declared, IC is assigned, channel is open, timeline is started. Security team and management are notified.
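Step 4's timeline log can be a one-line shell helper so nobody has to remember timestamp formats mid-incident. A sketch with an assumed log path; in practice use an append-only location the whole response team can read:

```shell
# Assumed path -- substitute a shared, durable location in a real incident.
TIMELINE="${TIMELINE:-/tmp/incident-timeline.log}"

note() {  # note <what happened> -- appends a UTC-timestamped entry
  printf '%s  %s\n' "$(date -u +%FT%TZ)" "$*" >> "$TIMELINE"
}

note "Incident declared; IC assigned"
note "Applied emergency-isolate NetworkPolicy in affected namespace"
tail -n 2 "$TIMELINE"
```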
Edge Cases¶
- The alert is from a third-party security vendor scanner (e.g., Qualys, Tenable) doing an authorized scan: Their scanner traffic is indistinguishable from an attacker. Maintain a scheduled scanner IP allowlist and check scan schedules before investigating.
- Activity looks like it is from your own monitoring (Prometheus, Datadog agent): Monitoring agents do read system files, make network connections, and run processes. Know your monitoring agent's behavior and confirm via the process tree (`pstree -p <pid>`).
- Suspicious activity is in a CI/CD pipeline runner: Pipeline runners execute arbitrary code — a supply chain compromise can inject malicious steps. Check if the suspicious activity correlates with a specific pipeline run and review that run's definition.
- You are not sure if you have the right to contain (service is owned by another team): Contact the service owner first but do not wait more than 5 minutes if the activity is escalating. Containment first, apology later.
- The suspicious IP is your own load balancer or proxy: Health checks, proxied connections, and NAT gateways cause this constantly. Know your infrastructure's IP ranges before investigating an internal IP as external.
Cross-References¶
- Topic Packs: security, incident-response, observability-deep-dive, networking
- Related trees: found-vulnerability.md, secret-exposed.md
- Runbooks: incident_response.md