Decision Tree: Suspicious Activity Detected¶
Category: Security Response
Starting Question: "Something looks like a security incident — is it?"
Estimated traversal: 2-5 minutes
Domains: security, incident-response, observability, linux-performance, networking
The Tree¶
Something looks like a security incident — is it?
│
├── What triggered the alert or observation?
│ │
│ ├── WAF rule fired / IDS/IPS alert
│ │ │
│ │ ├── Is this a known false positive pattern for this rule?
│ │ │ Check: `grep "<rule-id>" /var/log/waf/false-positive-registry.txt`
│ │ │ Or check your WAF tuning notes / runbook for this rule ID
│ │ │ │
│ │ │ ├── YES — documented false positive
│ │ │ │ └── ✅ ACTION: Tune Alert + Document
│ │ │ │ Log this occurrence; update tuning if frequency is high
│ │ │ │
│ │ │ └── NO — not a known false positive
│ │ │ │
│ │ │ ├── Does the triggering request match a deploy/automation pattern?
│ │ │ │ `kubectl get events --field-selector=reason=Scheduled -n <ns>`
│ │ │ │ Check CI/CD pipeline for recent deploys at same time
│ │ │ │ │
│ │ │ │ ├── YES — deploy/automation explains it
│ │ │ │ │ └── ✅ ACTION: Tune Alert + Document automation exception
│ │ │ │ │
│ │ │ │ └── NO — unexplained WAF trigger
│ │ │ │ └── Continue to source IP check below
│ │ │
│ ├── Anomalous login / auth event
│ │ (New location, impossible travel, off-hours access, many failed attempts)
│ │ │
│ │ ├── Is the account a service account or human account?
│ │ │ │
│ │ │ ├── Service account — check if automation changed recently
│ │ │ │ `git log --since="48 hours ago" -- devops/ .github/`
│ │ │ │ │
│ │ │ │ ├── Automation change explains it → ✅ ACTION: Document + Monitor
│ │ │ │ │
│ │ │ │ └── No change → unusual service account login = high suspicion
│ │ │ │ └── → Escalate to security team immediately
│ │ │ │
│ │ │ └── Human account — contact the account owner directly
│ │ │ "Did you log in from <location> at <time>?"
│ │ │ │
│ │ │ ├── YES — owner confirms → document + review if MFA is enforced
│ │ │ │
│ │ │ └── NO — owner denies → assume account compromise
│ │ │ └── ✅ ACTION: Contain — disable account, rotate credentials
│ │ │ → Escalate to security team
│ │ │
│ ├── Unusual process or binary running on a host/container
│ │ `ps aux | grep -vE "expected-process-list"`
│ │ `ls -la /proc/<pid>/exe` (check binary path)
│ │ `cat /proc/<pid>/cmdline | tr '\0' ' '`
│ │ │
│ │ ├── Is the process name a known system process?
│ │ │ (But verify the binary path — attackers masquerade as `sshd`, `kworker`, etc.)
│ │ │ `readlink /proc/<pid>/exe`
│ │ │ │
│ │ │ ├── Path is wrong (e.g., `/tmp/sshd` instead of `/usr/sbin/sshd`; note a real `kworker` is a kernel thread with no exe link at all)
│ │ │ │ └── ✅ ACTION: Contain Immediately — this is an IOC
│ │ │ │
│ │ │ └── Path matches known good binary
│ │ │ └── Verify file hash: `sha256sum /proc/<pid>/exe`
│ │ │ Compare against known-good hash
│ │ │
│ │ └── Unknown process with no explanation
│ │ └── → Treat as malicious; escalate immediately
│ │
│ ├── Network traffic spike / unusual connection
│ │ `ss -tnp | grep ESTABLISHED`
│ │ `netstat -an | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20`
│ │ `tcpdump -i eth0 -n -c 200 'not port 22 and not port 443'`
│ │ │
│ │ ├── Is the destination IP/domain known and expected?
│ │ │ `whois <ip>` / `dig -x <ip>` / check against allowlist
│ │ │ │
│ │ │ ├── YES — expected CDN, vendor, cloud provider
│ │ │ │ └── ✅ ACTION: Document + Monitor — likely false positive
│ │ │ │
│ │ │ └── NO — unknown destination, especially on unusual port
│ │ │ └── Check volume: large outbound transfer?
│ │ │ `iftop -i eth0 -n` or check NetFlow/VPC flow logs
│ │ │ │
│ │ │ ├── Large outbound (> baseline by 10x or to new destination)
│ │ │ │ └── ✅ ACTION: Contain Immediately — exfiltration signature
│ │ │ │
│ │ │ └── Small/normal volume — unknown but not alarming
│ │ │ └── → Monitor + Investigate
│ │ │
│ └── File modification alert (FIM, auditd, Falco)
│ `ausearch -k <watch-key> -ts recent`
│ `falco -L | grep <rule-name>` (`-L` lists rule names and descriptions; `--list` lists fields)
│ │
│ ├── Was there a recent deployment or package update at that time?
│ │ `rpm -qa --last | head -20` / `grep -E " install | upgrade " /var/log/dpkg.log | tail -20`
│ │ `kubectl rollout history deploy/<name>`
│ │ │
│ │ ├── YES — deploy explains the modification → ✅ ACTION: Document + Monitor
│ │ │
│ │ └── NO — unexplained file modification
│ │ │
│ │ ├── Is the modified file a binary, cron job, or startup script?
│ │ │ │
│ │ │ ├── YES → ✅ ACTION: Contain Immediately — persistence mechanism
│ │ │ │
│ │ │ └── NO — data file or log
│ │ │ └── → Monitor + Investigate
│ │
├── Is the activity happening on a production system?
│ │
│ ├── NO — dev/staging/sandbox only
│ │ └── Urgency is lower; investigate, but no immediate containment is needed
│ │ unless there is evidence of lateral movement toward prod
│ │
│ └── YES — production system
│ └── Any ambiguous activity on production defaults to escalation
│ Do not wait for certainty before involving security team
│
├── Is the activity escalating? (Spreading to more systems)
│ `kubectl get events --all-namespaces | grep -i "failed\|error\|kill" | tail -30`
│ Check for new processes / connections appearing on adjacent hosts
│ │
│ ├── YES — activity spreading, multiple systems affected
│ │ └── ✅ ACTION: Declare Incident — Follow IR Playbook
│ │ Containment takes priority over investigation
│ │
│ └── NO — isolated to one system
│ └── Continue to exfiltration check
│
└── Is there a data exfiltration signature?
(Large outbound transfers, DNS tunneling, unusual destinations, bulk data queries)
│
├── Check outbound transfer volume
│ `iftop -i eth0 -t -s 10` or VPC Flow Logs filtered by source pod IP
│ Check application audit logs for bulk data queries or exports
│ │
├── Check for DNS tunneling
│ `tcpdump -i eth0 -n -c 500 port 53 | awk '{for(i=1;i<=NF;i++) if ($i=="A?") print $(i+1)}' | sort | uniq -c | sort -rn | head -20`
│ Unusually long DNS queries or high query volume to one domain = tunneling
│
├── YES — exfiltration signature present
│ └── ✅ ACTION: Contain Immediately + Declare Incident
│ Block outbound at network level NOW; data loss is imminent or ongoing
│
└── NO — no exfiltration evidence
│
├── Activity is clearly malicious (confirmed IOC, unknown process, account compromise)
│ └── ✅ ACTION: Contain Immediately
│
├── Activity is ambiguous (unusual but no confirmed malicious indicator)
│ └── ⚠️ ESCALATION: Security Team — ambiguous production activity
│
└── Activity has a plausible benign explanation (automation, deploy, known tool)
└── ✅ ACTION: Tune Alert + Document
Node Details¶
Check 1: Is this a known false positive?¶
Command/method:
# WAF rule lookup — check recent false positive registry
grep -i "<rule-id>\|<alert-name>" /var/log/waf/known-fp.log
# Check if alert correlates with a scheduled job
grep "<alert-time>" /var/log/cron
kubectl get cronjobs --all-namespaces
# List recent deployments at the same time
kubectl rollout history deploy -n <namespace> 2>/dev/null | head -20  # rollout history is per-namespace
Check 2: Attribute to known automation or deploy¶
Command/method:
# Check CI/CD pipeline timing
gh run list --limit 10 --json createdAt,headBranch,status
# Check scheduled jobs
kubectl get cronjobs --all-namespaces -o wide
crontab -l && cat /etc/cron.d/*
# Check Kubernetes events around the time of the alert
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp' | tail -30
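The timing correlation behind this check can be made explicit: take the alert timestamp and the nearest deploy/run timestamp and compare them against a tolerance window. A sketch with hypothetical timestamps and an assumed 10-minute window, requiring GNU `date` for `-d`:

```shell
# Hypothetical timestamps; in practice take them from `gh run list` output
# or your deploy log. Requires GNU date for the -d flag.
alert_ts=$(date -d "2024-05-01T10:17:00Z" +%s)
deploy_ts=$(date -d "2024-05-01T10:15:30Z" +%s)
window=600   # tolerance in seconds (assumption: 10 minutes)

delta=$(( alert_ts - deploy_ts ))
abs=${delta#-}   # absolute value: deploy may be just before or just after
if [ "$abs" -le "$window" ]; then
  echo "within deploy window"
else
  echo "no deploy correlation"
fi
```

A match here justifies the "automation explains it" branch; a miss sends you on to the source IP check.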
Check 3: Source IP — internal vs external¶
Command/method:
# Check WAF/nginx log for source IP
grep "<suspicious-request-path>" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn
# Check IP reputation
curl -s "https://ipinfo.io/<ip>/json"
# Or local: check if IP is in your RFC1918 space
python3 -c "import ipaddress; print(ipaddress.ip_address('<ip>').is_private)"
# Check active connections on the host
ss -tnp | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq
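The RFC1918 check above can be wrapped in a small helper so the internal/external call is scriptable in a pipeline. A pure-shell sketch covering the standard IPv4 private ranges (the `python3` one-liner above is the more complete check, since `ipaddress` also handles IPv6):

```shell
# Classify an IPv4 address as internal (RFC1918/loopback) or external.
# Pure glob matching -- no DNS or network lookups.
is_internal() {
  case "$1" in
    10.*)        echo internal ;;
    192.168.*)   echo internal ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo internal ;;
    127.*)       echo internal ;;   # loopback, also not an external attacker
    *)           echo external ;;
  esac
}

is_internal 10.4.2.9      # internal
is_internal 172.20.1.5    # internal
is_internal 203.0.113.77  # external
```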
Check 4: Is the activity escalating?¶
Command/method:
# Check for activity on adjacent hosts (same node, same namespace)
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -50  # events show relative ages, so grepping a wall-clock HH:MM never matches
# Check for new processes spawned across the cluster
journalctl -u falco --since "30 minutes ago" 2>/dev/null | tail -50  # recent Falco alerts (falco --list only enumerates fields)
# Network scan detection — are other hosts logging similar activity?
grep "<source-ip>" /var/log/*/access.log 2>/dev/null | wc -l
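The last command's "is the same source IP hitting more than one host" idea can be sketched on sample data; the inline log lines below are stand-ins for the per-host access logs you would actually grep:

```shell
# Count how many log lines (here, one per adjacent host) mention the suspect IP.
# The sample lines and the IP are hypothetical.
suspect="198.51.100.7"
hits=$(printf '%s\n' \
  "198.51.100.7 GET /admin  host-a" \
  "203.0.113.9  GET /index  host-a" \
  "198.51.100.7 GET /.env   host-b" |
  grep -c "^$suspect ")
echo "$hits"
```

A count spanning more than one host suggests the activity is spreading, which flips you into the "Declare Incident" branch.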
Check 5: Data exfiltration signature¶
Command/method:
# VPC/cloud flow logs — check for large outbound
# AWS: filter by source pod IP in VPC Flow Logs (last 30 min)
aws logs filter-log-events \
--log-group-name /aws/vpc/flowlogs \
--filter-pattern "[version, account, eni, srcaddr=<pod-ip>, ...]" \
--start-time $(date -d '30 minutes ago' +%s)000
# Local: check top bandwidth consumers
iftop -i eth0 -t -s 30 -n 2>/dev/null
# DNS tunneling check — look for long subdomain queries
tcpdump -i eth0 -n -c 500 port 53 | awk '{for(i=1;i<=NF;i++) if ($i=="A?") print length($(i+1)), $(i+1)}' | sort -rn | head -20
# Check application for bulk data exports or unusual query patterns
kubectl logs deploy/<app> --since=30m | grep -iE "SELECT \*|LIMIT 100000|export|dump|download"
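The DNS-tunneling heuristic above boils down to a length check per queried name. A sketch on inline sample data; the long label stands in for encoded exfil, and the 60/40-character thresholds are assumptions to tune against your own traffic:

```shell
# Flag candidate tunneling names: very long total name, or a very long first
# label (encoded payloads usually live in the leftmost label).
printf '%s\n' \
  "www.example.com" \
  "aGVsbG8gd29ybGQgdGhpcyBpcyBleGZpbHRyYXRlZCBkYXRh.evil.example.net" \
  "api.internal.corp" |
awk -F. '{
  label = $1
  if (length($0) > 60 || length(label) > 40)
    print "SUSPECT:", $0
}'
```

In practice, feed it the names extracted by the tcpdump pipeline above or your resolver's query log.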
Terminal Actions¶
✅ Action: False Positive — Tune Alert + Document¶
Do:
1. Add a comment to the alert rule with the false positive explanation
2. If the rule fires too frequently: adjust threshold, add exception for known-good source, or suppress specific pattern
3. Document in your false positive registry: echo "<date> <rule-id> <reason>" >> /var/log/waf/known-fp.log
4. Review whether the tuning change could hide a real attack — consult with security team if unsure
Verify: Alert does not fire on the next occurrence of the known-benign behavior. Alert still fires on a simulated malicious version of the pattern.
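Steps 1-3 can be scripted so that logging a false positive also makes the tree's opening "is this a known false positive?" check a one-liner. A sketch; the registry path and line format are assumptions to adapt:

```shell
# Assumed registry location -- point this at wherever your team keeps FP notes.
FP_REGISTRY="${FP_REGISTRY:-/tmp/known-fp.log}"

log_fp() {  # log_fp <rule-id> <reason...>
  local rule="$1"; shift
  printf '%s %s %s\n' "$(date +%F)" "$rule" "$*" >> "$FP_REGISTRY"
}

is_known_fp() {  # exit 0 if the rule id has been logged before
  grep -q " $1 " "$FP_REGISTRY" 2>/dev/null
}

log_fp 942100 "fires on base64 payloads from the internal report exporter"
is_known_fp 942100 && echo "known false positive"
```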
✅ Action: Monitor + Investigate¶
Do:
1. Capture a snapshot of current state: running processes, network connections, open files
SNAP=/tmp/proc-snapshot-$(date +%Y%m%d-%H%M).txt
ps auxf > "$SNAP"
ss -tnp >> "$SNAP"
lsof -n >> "$SNAP"
2. Watch for new processes or connections appearing:
watch -n 60 "ss -tnp | grep ESTABLISHED; ps auxf | grep -v '\[kworker'"
3. Increase logging verbosity temporarily on the affected service
4. Correlate with SIEM/Datadog for historical context on this IP/host/account
5. Set alert for recurrence or escalation
Verify: Activity does not recur or escalate within 1 hour. Document findings (even if conclusion is "unexplained but low risk"). Do not close investigation without a written conclusion.
✅ Action: Contain Immediately¶
Do:
1. Isolate the affected system — block network access at security group / NetworkPolicy level:
# Kubernetes: apply a deny-all NetworkPolicy to the affected pod label
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: emergency-isolate
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app: <affected-app>
  policyTypes: [Ingress, Egress]
EOF
2. Capture evidence snapshots (processes, connections, logs) before destroying anything
3. Only after evidence is captured, if the workload must be killed: kubectl delete pod <pod> --grace-period=0
4. Cordon the node if host-level compromise is suspected: kubectl cordon <node>
5. Escalate to security team immediately with collected evidence
Verify: Affected system no longer has network connectivity. Evidence snapshots are saved. Security team is paged and has the evidence. Incident ticket is open.
⚠️ Escalation: Security Team — Ambiguous Production Activity¶
When: Activity on production that cannot be explained by automation or known patterns, but no confirmed IOC yet.
Who: Security team on-call; if unavailable, CISO.
Include in page: Alert description, system/service affected, timestamp and duration, source IP and destination, what makes it ambiguous (what you've ruled out and what you haven't), whether it is ongoing.
✅ Action: Declare Incident — Follow IR Playbook¶
Do:
1. Declare incident in PagerDuty/incident.io — severity based on: confirmed breach = P1, active spreading = P1, suspected exfiltration = P1
2. Assign incident commander — this person drives all decisions, communications, and escalations
3. Open incident Slack channel: #inc-<date>-<short-description>
4. Begin timeline log — document every action with timestamp from this point forward
5. Do NOT reboot or terminate systems without IC approval — you may destroy forensic evidence
6. Execute IR playbook: training/library/runbooks/incident_response.md
Verify: Incident is declared, IC is assigned, channel is open, timeline is started. Security team and management are notified.
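Step 4's timeline log can be a one-line shell helper so nobody has to remember timestamp formats mid-incident. A sketch with an assumed log path; in practice use an append-only location the whole response team can read:

```shell
# Assumed path -- substitute a shared, durable location in a real incident.
TIMELINE="${TIMELINE:-/tmp/incident-timeline.log}"

note() {  # note <what happened> -- appends a UTC-timestamped entry
  printf '%s  %s\n' "$(date -u +%FT%TZ)" "$*" >> "$TIMELINE"
}

note "Incident declared; IC assigned"
note "Applied emergency-isolate NetworkPolicy in affected namespace"
tail -n 2 "$TIMELINE"
```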
Edge Cases¶
- The alert is from a third-party security vendor scanner (e.g., Qualys, Tenable) doing an authorized scan: Their scanner traffic is indistinguishable from an attacker. Maintain a scheduled scanner IP allowlist and check scan schedules before investigating.
- Activity looks like it is from your own monitoring (Prometheus, Datadog agent): Monitoring agents do read system files, make network connections, and run processes. Know your monitoring agent's behavior and confirm via the process tree (`pstree -p <pid>`).
- Suspicious activity is in a CI/CD pipeline runner: Pipeline runners execute arbitrary code — a supply chain compromise can inject malicious steps. Check if the suspicious activity correlates with a specific pipeline run and review that run's definition.
- You are not sure if you have the right to contain (service is owned by another team): Contact the service owner first but do not wait more than 5 minutes if the activity is escalating. Containment first, apology later.
- The suspicious IP is your own load balancer or proxy: Health checks, proxied connections, and NAT gateways cause this constantly. Know your infrastructure's IP ranges before investigating an internal IP as external.
Cross-References¶
- Topic Packs: security, incident-response, observability-deep-dive, networking
- Related trees: found-vulnerability.md, secret-exposed.md
- Runbooks: incident_response.md