
Decision Tree: Suspicious Activity Detected

Category: Security Response
Starting Question: "Something looks like a security incident — is it?"
Estimated traversal: 2-5 minutes
Domains: security, incident-response, observability, linux-performance, networking


The Tree

Something looks like a security incident — is it?
├── What triggered the alert or observation?
    ├── WAF rule fired / IDS/IPS alert
        └── Is this a known false positive pattern for this rule?
            Check: `grep "<rule-id>" /var/log/waf/false-positive-registry.txt`
            Or check your WAF tuning notes / runbook for this rule ID
            ├── YES → documented false positive
                └── ✅ ACTION: Tune Alert + Document
                    Log this occurrence; update tuning if frequency is high
            └── NO → not a known false positive
                └── Does the triggering request match a deploy/automation pattern?
                    `kubectl get events --field-selector=reason=Scheduled -n <ns>`
                    Check the CI/CD pipeline for deploys at the same time
                    ├── YES → deploy/automation explains it
                        └── ✅ ACTION: Tune Alert + Document automation exception
                    └── NO → unexplained WAF trigger
                        └── Continue to the source IP check below
    ├── Anomalous login / auth event
        (New location, impossible travel, off-hours access, many failed attempts)
        └── Is the account a service account or a human account?
            ├── Service account → check whether automation changed recently
                `git log --since="48 hours ago" -- devops/ .github/`
                ├── Automation change explains it → ✅ ACTION: Document + Monitor
                └── No change → unusual service account login = high suspicion
                    └── ⚠️ Escalate to the security team immediately
            └── Human account → contact the account owner directly
                "Did you log in from <location> at <time>?"
                ├── YES → owner confirms → document + review whether MFA is enforced
                └── NO → owner denies → assume account compromise
                    └── ✅ ACTION: Contain → disable the account, rotate credentials
                        Escalate to the security team
    ├── Unusual process or binary running on a host/container
        `ps aux | grep -vE "expected-process-list"`
        `ls -la /proc/<pid>/exe`  (check the binary path)
        `cat /proc/<pid>/cmdline | tr '\0' ' '`
        ├── Is the process name a known system process?
            (But verify the binary path — attackers masquerade as `sshd`, `kworker`, etc.)
            `readlink /proc/<pid>/exe`
            ├── Path is wrong (e.g., `/tmp/sshd` instead of `/usr/sbin/sshd`)
                └── ✅ ACTION: Contain Immediately — this is an IOC
            └── Path matches the known good binary
                └── Verify the file hash: `sha256sum /proc/<pid>/exe`
                    Compare against a known-good hash
        └── Unknown process with no explanation
            └── ⚠️ Treat as malicious; escalate immediately
    ├── Network traffic spike / unusual connection
        `ss -tnp | grep ESTAB`
        `netstat -an | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20`
        `tcpdump -i eth0 -n -c 200 'not port 22 and not port 443'`
        └── Is the destination IP/domain known and expected?
            `whois <ip>` / `dig -x <ip>` / check against the allowlist
            ├── YES → expected CDN, vendor, or cloud provider
                └── ✅ ACTION: Document + Monitor — likely false positive
            └── NO → unknown destination, especially on an unusual port
                └── Check volume: large outbound transfer?
                    `iftop -i eth0 -n` or check NetFlow/VPC flow logs
                    ├── Large outbound (> 10x baseline, or to a new destination)
                        └── ✅ ACTION: Contain Immediately — exfiltration signature
                    └── Small/normal volume → unknown but not alarming
                        └── ✅ Monitor + Investigate
    └── File modification alert (FIM, auditd, Falco)
        `ausearch -k <watch-key> -ts recent`
        `journalctl -u falco --since "1 hour ago" | grep <rule-name>`
        └── Was there a recent deployment or package update at that time?
            `rpm -qa --last | head -20` / `grep -E "install|upgrade" /var/log/dpkg.log | tail -20`
            `kubectl rollout history deploy/<name>`
            ├── YES → deploy explains the modification → ✅ ACTION: Document + Monitor
            └── NO → unexplained file modification
                └── Is the modified file a binary, cron job, or startup script?
                    ├── YES → ✅ ACTION: Contain Immediately — persistence mechanism
                    └── NO → data file or log
                        └── ✅ Monitor + Investigate
├── Is the activity happening on a production system?
    ├── NO → dev/staging/sandbox only
        └── Urgency is lower; investigate, but no immediate containment is needed
            unless there is evidence of lateral movement toward prod
    └── YES → production system
        └── Any ambiguous activity on production defaults to escalation
            Do not wait for certainty before involving the security team
├── Is the activity escalating? (Spreading to more systems)
    `kubectl get events --all-namespaces | grep -i "failed\|error\|kill" | tail -30`
    Check for new processes / connections appearing on adjacent hosts
    ├── YES → activity spreading, multiple systems affected
        └── ✅ ACTION: Declare Incident — Follow the IR Playbook
            Containment takes priority over investigation
    └── NO → isolated to one system
        └── Continue to the exfiltration check
└── Is there a data exfiltration signature?
    (Large outbound transfers, DNS tunneling, unusual destinations, bulk data queries)
    Check outbound transfer volume:
        `iftop -i eth0 -t -s 10` or VPC Flow Logs filtered by the source pod IP
        Check application audit logs for bulk data queries or exports
    Check for DNS tunneling:
        `tcpdump -i eth0 -n -l port 53 | awk '{for(i=1;i<NF;i++) if($i=="A?") print $(i+1)}' | sort | uniq -c | sort -rn | head -20`
        Unusually long DNS queries or high query volume to one domain = tunneling
    ├── YES → exfiltration signature present
        └── ✅ ACTION: Contain Immediately + Declare Incident
            Block outbound at the network level NOW; data loss is imminent or ongoing
    └── NO → no exfiltration evidence
        ├── Activity is clearly malicious (confirmed IOC, unknown process, account compromise)
            └── ✅ ACTION: Contain Immediately
        ├── Activity is ambiguous (unusual but no confirmed malicious indicator)
            └── ⚠️ ESCALATION: Security Team — ambiguous production activity
        └── Activity has a plausible benign explanation (automation, deploy, known tool)
            └── ✅ ACTION: Tune Alert + Document

Node Details

Check 1: Is this a known false positive?

Command/method:

# WAF rule lookup — check recent false positive registry
grep -i "<rule-id>\|<alert-name>" /var/log/waf/known-fp.log

# Check if alert correlates with a scheduled job
grep "<alert-time>" /var/log/cron
kubectl get cronjobs --all-namespaces
# List recent deploys (new ReplicaSets approximate deploy times;
# `kubectl rollout history` does not support --all-namespaces)
kubectl get rs --all-namespaces --sort-by='.metadata.creationTimestamp' | tail -20
What you're looking for: A documented, recurring pattern that has been previously validated as safe. Security scanners, vulnerability scanners, CDN health checks, and synthetic monitoring all routinely trigger security alerts.

Common pitfall: The "known false positive" exemption is frequently abused to avoid investigating inconvenient alerts. Require written documentation of why it is a false positive, not just institutional memory.
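The "written documentation" requirement can be enforced mechanically. Below is a minimal sketch, assuming a registry line format of `<date> <rule-id> <reason>` (mirroring the `echo` command in the Tune Alert action); an entry without a reason does not count.

```python
# Sketch: accept an alert as a known false positive only if the registry
# entry carries a written reason. The "<date> <rule-id> <reason>" line
# format is an assumption about your registry layout.
def is_documented_false_positive(rule_id, registry_lines):
    for line in registry_lines:
        parts = line.split(maxsplit=2)
        # Require all three fields and a non-empty reason text
        if len(parts) == 3 and parts[1] == rule_id and parts[2].strip():
            return True
    return False

registry = [
    "2024-05-01 942100 Qualys scheduled scan hits SQLi rule on /healthz",
    "2024-05-03 920350",  # rule id logged but no reason: does not count
]
print(is_documented_false_positive("942100", registry))  # True
print(is_documented_false_positive("920350", registry))  # False
```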

Check 2: Attribute to known automation or deploy

Command/method:

# Check CI/CD pipeline timing
gh run list --limit 10 --json createdAt,headBranch,status

# Check scheduled jobs
kubectl get cronjobs --all-namespaces -o wide
crontab -l && cat /etc/cron.d/*

# Check Kubernetes events around the time of the alert
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp' | tail -30
What you're looking for: A deploy, smoke test, scanner, or scheduled job that ran at the exact time of the alert and explains the observed behavior.

Common pitfall: Timing correlation is not the same as causation. A deploy at 14:02 does not explain a suspicious process that started at 14:47. Check timestamps carefully.
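The timestamp check can be made explicit rather than eyeballed. A minimal sketch, assuming ISO-style timestamps; the 10-minute window is an assumption you should tune to your pipeline's rollout duration:

```python
# Sketch: a deploy only plausibly explains an alert if the two are close
# in time. The 10-minute default window is an assumption, not a standard.
from datetime import datetime, timedelta

def deploy_explains_alert(deploy_at: str, alert_at: str,
                          window: timedelta = timedelta(minutes=10)) -> bool:
    fmt = "%Y-%m-%dT%H:%M:%S"
    delta = abs(datetime.strptime(alert_at, fmt) - datetime.strptime(deploy_at, fmt))
    return delta <= window

# Deploy at 14:02 vs. suspicious process at 14:47: 45 minutes apart,
# so the deploy does not explain it.
print(deploy_explains_alert("2024-05-01T14:02:00", "2024-05-01T14:47:00"))  # False
print(deploy_explains_alert("2024-05-01T14:02:00", "2024-05-01T14:05:30"))  # True
```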

Check 3: Source IP — internal vs external

Command/method:

# Check WAF/nginx log for source IP
grep "<suspicious-request-path>" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn

# Check IP reputation
curl -s "https://ipinfo.io/<ip>/json"
# Or local: check if IP is in your RFC1918 space
python3 -c "import ipaddress; print(ipaddress.ip_address('<ip>').is_private)"

# Check active connections on the host
ss -tnp | grep ESTAB | awk '{print $5}' | cut -d: -f1 | sort | uniq
What you're looking for: Whether the source is internal (could be a misconfigured service or a compromised internal host) or external (direct attack or C2 communication). Internal sources are not automatically safe — lateral movement from a compromised internal host is common.

Common pitfall: VPN and NAT. An "internal" IP may be a VPN endpoint proxying external traffic. Check both the source IP and whether it correlates with a specific VPN user or session.

Check 4: Is the activity escalating?

Command/method:

# Check for activity on adjacent hosts (same node, same namespace)
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -50

# Check for new processes spawned across the cluster
journalctl -u falco --since "30 minutes ago" | tail -50

# Network scan detection — are other hosts logging similar activity?
grep "<source-ip>" /var/log/*/access.log 2>/dev/null | wc -l
What you're looking for: The same pattern appearing on multiple systems, or new activity appearing on systems adjacent to the initial alert. Ransomware and worms move fast — escalation is a strong signal of an active attack.

Common pitfall: Alert deduplication hides escalation. Your SIEM may suppress repeat alerts for the same rule, making it look like a single event when it is actually spreading. Check raw logs, not just alert counts.
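A crude but useful escalation test over raw log lines (not deduplicated alerts): count the distinct hosts where the indicator appears. A sketch, assuming a hypothetical `"host message"` line format:

```python
# Sketch: if the same indicator shows up on more than one host, treat
# the activity as spreading. The "host message" log format is an
# assumption for illustration.
def affected_hosts(log_lines, indicator):
    hosts = set()
    for line in log_lines:
        host, _, rest = line.partition(" ")
        if indicator in rest:
            hosts.add(host)
    return hosts

lines = [
    "web-1 connection to 198.51.100.7:4444",
    "web-2 connection to 198.51.100.7:4444",
    "db-1 routine backup to s3",
]
hits = affected_hosts(lines, "198.51.100.7")
print(sorted(hits))   # ['web-1', 'web-2']
print(len(hits) > 1)  # True -> spreading: declare an incident
```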

Check 5: Data exfiltration signature

Command/method:

# VPC/cloud flow logs — check for large outbound
# AWS: filter by source pod IP in VPC Flow Logs (last 30 min)
aws logs filter-log-events \
  --log-group-name /aws/vpc/flowlogs \
  --filter-pattern "[version, account, eni, srcaddr=<pod-ip>, ...]" \
  --start-time $(date -d '30 minutes ago' +%s)000

# Local: check top bandwidth consumers
iftop -i eth0 -t -s 30 -n 2>/dev/null

# DNS tunneling check — look for long subdomain queries
tcpdump -i eth0 -n -l port 53 | awk '{for(i=1;i<NF;i++) if($i=="A?") print length($(i+1)), $(i+1)}' | sort -rn | head -20

# Check application for bulk data exports or unusual query patterns
kubectl logs deploy/<app> --since=30m | grep -iE "SELECT \*|LIMIT 100000|export|dump|download"
What you're looking for: Sustained outbound transfer (not a one-time request), large DNS query labels (> 50 chars = tunneling candidate), and bulk application queries without a business event to explain them.

Common pitfall: Confusing backup processes with exfiltration. Scheduled backups to S3/GCS look exactly like exfiltration in network traffic. Know your backup schedule and destination addresses.
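The DNS-label heuristic can be sketched as code: flag query names whose first label exceeds the 50-character threshold from this check, or looks high-entropy (encoded data). The entropy cutoff of 3.5 bits/char is an assumption to tune against your own traffic:

```python
# Sketch: score DNS query names for tunneling candidates. The >50-char
# threshold comes from this check; the entropy cutoff is an assumption.
import math
from collections import Counter

def entropy(s: str) -> float:
    # Shannon entropy in bits per character
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def looks_like_tunneling(qname: str) -> bool:
    first = qname.rstrip(".").split(".")[0]
    return len(first) > 50 or (len(first) >= 16 and entropy(first) > 3.5)

print(looks_like_tunneling("www.example.com."))  # False
print(looks_like_tunneling(
    "a3f09c1be77d204e8d9a6b1c44f0e2a9d8c7b6a5f4e3d2c1b0a9f8e7.evil.example."))  # True
```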


Terminal Actions

✅ Action: False Positive — Tune Alert + Document

Do:
1. Add a comment to the alert rule with the false positive explanation
2. If the rule fires too frequently: adjust the threshold, add an exception for the known-good source, or suppress the specific pattern
3. Document in your false positive registry: echo "<date> <rule-id> <reason>" >> /var/log/waf/known-fp.log
4. Review whether the tuning change could hide a real attack — consult the security team if unsure

Verify: The alert does not fire on the next occurrence of the known-benign behavior, but still fires on a simulated malicious version of the pattern.

✅ Action: Monitor + Investigate

Do:
1. Capture a snapshot of current state — running processes, network connections, open files:

ps auxf > /tmp/proc-snapshot-$(date +%Y%m%d-%H%M).txt
ss -tnp >> /tmp/proc-snapshot-$(date +%Y%m%d-%H%M).txt
lsof -n >> /tmp/proc-snapshot-$(date +%Y%m%d-%H%M).txt

2. Set a 30-minute watch: watch -n 60 "ss -tnp | grep ESTAB; ps auxf | grep -v '\[kworker'"
3. Temporarily increase logging verbosity on the affected service
4. Correlate with SIEM/Datadog for historical context on this IP/host/account
5. Set an alert for recurrence or escalation

Verify: Activity does not recur or escalate within 1 hour. Document findings, even if the conclusion is "unexplained but low risk". Do not close the investigation without a written conclusion.

✅ Action: Contain Immediately

Do:
1. Isolate the affected system — block network access at the security group / NetworkPolicy level:

# Kubernetes: apply a deny-all NetworkPolicy to the affected pod label
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: emergency-isolate
  namespace: <namespace>
spec:
  podSelector:
    matchLabels:
      app: <affected-app>
  policyTypes: [Ingress, Egress]
EOF
2. Preserve state before terminating — capture the process list, network connections, and active files
3. Terminate the suspicious process or pod: kubectl delete pod <pod> --grace-period=0
4. Cordon the node if host-level compromise is suspected: kubectl cordon <node>
5. Escalate to the security team immediately with the collected evidence

Verify: The affected system no longer has network connectivity, evidence snapshots are saved, the security team is paged and has the evidence, and an incident ticket is open.

⚠️ Escalation: Security Team — Ambiguous Production Activity

When: Activity on production that cannot be explained by automation or known patterns, but no confirmed IOC yet.

Who: Security team on-call; if unavailable, the CISO.

Include in the page: Alert description, system/service affected, timestamp and duration, source IP and destination, what makes it ambiguous (what you've ruled out and what you haven't), and whether it is ongoing.

✅ Action: Declare Incident — Follow IR Playbook

Do:
1. Declare the incident in PagerDuty/incident.io — severity: confirmed breach = P1, active spreading = P1, suspected exfiltration = P1
2. Assign an incident commander — this person drives all decisions, communications, and escalations
3. Open an incident Slack channel: #inc-<date>-<short-description>
4. Begin a timeline log — document every action with a timestamp from this point forward
5. Do NOT reboot or terminate systems without IC approval — you may destroy forensic evidence
6. Execute the IR playbook: training/library/runbooks/incident_response.md

Verify: The incident is declared, the IC is assigned, the channel is open, and the timeline is started. The security team and management are notified.


Edge Cases

  • The alert is from a third-party security vendor scanner (e.g., Qualys, Tenable) doing an authorized scan: Their scanner traffic is indistinguishable from an attacker's. Maintain an allowlist of scheduled scanner IPs and check scan schedules before investigating.
  • Activity looks like it is from your own monitoring (Prometheus, Datadog agent): Monitoring agents do read system files, make network connections, and run processes. Know your monitoring agent's behavior and confirm via process tree (pstree -p <pid>).
  • Suspicious activity is in a CI/CD pipeline runner: Pipeline runners execute arbitrary code — a supply chain compromise can inject malicious steps. Check if the suspicious activity correlates with a specific pipeline run and review that run's definition.
  • You are not sure if you have the right to contain (service is owned by another team): Contact the service owner first but do not wait more than 5 minutes if the activity is escalating. Containment first, apology later.
  • The suspicious IP is your own load balancer or proxy: Health checks, proxied connections, and NAT gateways cause this constantly. Know your infrastructure's IP ranges before investigating an internal IP as external.
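The last edge case above lends itself to a lookup table: check a source IP against named infrastructure ranges before treating it as hostile. A sketch; the ranges below are placeholders — substitute your own load balancer, NAT gateway, and health-checker CIDRs:

```python
# Sketch: map an "internal" source IP to the infrastructure component
# that owns it. All ranges here are hypothetical placeholders.
import ipaddress

INFRA_RANGES = {
    "load-balancer": ipaddress.ip_network("10.0.100.0/24"),
    "nat-gateway": ipaddress.ip_network("10.0.200.0/28"),
}

def infra_owner(ip: str):
    addr = ipaddress.ip_address(ip)
    for name, net in INFRA_RANGES.items():
        if addr in net:
            return name
    return None  # not known infrastructure: investigate normally

print(infra_owner("10.0.100.17"))  # load-balancer
print(infra_owner("10.0.50.3"))    # None
```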

Cross-References