
Runbook: Unauthorized Access Investigation

Domain: Security
Alert: Anomalous API calls, unexpected IAM role assumption, security tool alert, or user report
Severity: P1
Est. Resolution Time: 1-2 hours initial investigation
Escalation Timeout: Immediate — page security on-call the moment you confirm unauthorized access
Last Tested: 2026-03-19
Prerequisites: CloudTrail/audit log access, kubectl audit log access, IAM console, SIEM if available

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
aws cloudtrail lookup-events \
  --start-time $(date -d '1 hour ago' --iso-8601=seconds) \
  --output json 2>/dev/null | jq '.Events | length'
If output shows a large number of events (>100) from a single unusual IP or username → high likelihood of active attack; jump to Step 5 (Contain) immediately after Step 1, then come back to scope.
If output shows 0 or normal volume → the alert may be a false positive or the activity is older; proceed through the steps in order.
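When the quick count comes back high, the next question is whether the volume is concentrated in one actor. A minimal sketch of grouping events by source IP — the file name and "IP EventName" line format are illustrative stand-ins for real exported CloudTrail data:

```shell
# Illustrative only: simulate exported events as "IP EventName" lines.
printf '%s\n' \
  '198.51.100.7 AssumeRole' \
  '198.51.100.7 ListBuckets' \
  '198.51.100.7 GetSecretValue' \
  '203.0.113.9 DescribeInstances' \
  > /tmp/events.txt

# Count events per source IP, busiest first — one IP dominating the
# window is the "single unusual IP" signal described above.
awk '{print $1}' /tmp/events.txt | sort | uniq -c | sort -rn
```

Against real data you would feed the `SourceIPAddress` field extracted with jq into the same `sort | uniq -c | sort -rn` pipeline.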

Step 1: Scope the Incident — What, Where, When, Who

Why: You cannot contain what you haven't scoped. Understanding the attack surface before acting prevents you from missing affected systems and prevents you from over-containing (taking down unaffected services).

# Identify the trigger: what alert fired, what was the suspected actor?
# Common triggers:
# - GuardDuty alert: check the finding details in AWS console → GuardDuty → Findings
# - User report: get specifics — which system, what did they see, when?
# - Kubernetes audit: which user/serviceaccount made the unusual call?

# Get a summary of recent CloudTrail events:
aws cloudtrail lookup-events \
  --start-time $(date -d '4 hours ago' --iso-8601=seconds) \
  --output json | \
  jq '.Events[] | {Username: .Username, EventName: .EventName, EventTime: .EventTime, SourceIPAddress: (.CloudTrailEvent | fromjson | .sourceIPAddress)}' \
  | head -100

# Look for unusual patterns:
# - Activity from unexpected geographies (check SourceIPAddress)
# - API calls at unusual hours (e.g., 3 AM in the company's timezone)
# - Enumeration calls: ListBuckets, DescribeInstances, ListUsers, GetSecretValue
# - Privilege escalation: AttachUserPolicy, CreateAccessKey, AssumeRole
echo "Identify: which actor (user/role/IP), which systems, what time window"
Expected output:
A timeline showing:
  - When the first suspicious event occurred
  - Which IAM entity (username, role ARN, or access key) was used
  - Source IP address(es) — run these through a geo/reputation lookup
  - Which API calls were made (enumeration, data access, privilege escalation?)
If this fails: If CloudTrail is not available (not enabled, or log access denied), escalate immediately — without audit logs the environment cannot be investigated and the attacker cannot be tracked.
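Building the timeline from extracted events can be as simple as a lexical sort, since ISO-8601 timestamps sort chronologically as plain strings. A sketch with an illustrative sample (timestamps, event names, and username are made up):

```shell
# Illustrative sample: one "timestamp EventName user" line per event.
printf '%s\n' \
  '2026-03-19T03:12:44Z CreateAccessKey suspect-user' \
  '2026-03-19T02:58:01Z ListUsers suspect-user' \
  '2026-03-19T03:05:17Z GetSecretValue suspect-user' \
  > /tmp/timeline.txt

# ISO-8601 timestamps sort correctly as strings, so this is chronological.
sort /tmp/timeline.txt
```

The first line of the sorted output is your "first suspicious event" anchor; the enumeration-then-escalation ordering (ListUsers before CreateAccessKey) is itself a finding.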

Step 2: Check Kubernetes Audit Logs

Why: Cloud-level access (IAM) and cluster-level access (Kubernetes RBAC) are separate attack surfaces. Attackers who compromise a cloud credential often pivot into the Kubernetes cluster.

# Check recent Kubernetes events for unusual activity:
kubectl get events --sort-by='.metadata.creationTimestamp' -A | tail -50

# Check what permissions a suspected user or service account has:
kubectl auth can-i --list --as=<SUSPECT_USER_OR_SERVICEACCOUNT> -n <NAMESPACE>

# Check cluster role bindings for the suspected identity:
kubectl get clusterrolebinding -o yaml | grep -B5 -A10 "<SUSPECT_USER_OR_SA>"

# Check for unexpected service accounts or RBAC changes:
kubectl get serviceaccounts -A | grep -v default
kubectl get clusterrolebinding | grep -v "system:\|kubeadm\|cluster-admin-binding"

# If you have access to the Kubernetes API server audit log (self-managed clusters):
grep -i "<SUSPECT_USER_OR_SA>" /var/log/kubernetes/audit.log | tail -50
# EKS: CloudTrail captures Kubernetes API calls — filter on "EKS" service
# GKE: Cloud Audit Logs → Data Access → Kubernetes Engine API
Expected output:
kubectl get events: look for Forbidden events, unusual creates/deletes, or pod exec events.
kubectl auth can-i: if the suspect has broader permissions than expected, that is a finding.
Audit log: look for verbs like "create", "delete", "exec", "get" on sensitive resources (secrets, pods, nodes).
If this fails: If you cannot access cluster audit logs, note this gap in your incident report — it means you cannot rule out cluster compromise.
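The verb/resource filtering described above can be sketched as follows. The flat `verb=... resource=...` line format is an assumption for illustration — real Kubernetes audit logs are JSON and would be filtered with jq on `.verb` and `.objectRef.resource` instead:

```shell
# Illustrative flat-format sample standing in for real audit entries.
printf '%s\n' \
  'verb=get resource=secrets user=suspect-sa' \
  'verb=list resource=configmaps user=suspect-sa' \
  'verb=exec resource=pods user=suspect-sa' \
  > /tmp/audit-sample.log

# Sensitive verbs against sensitive resources are the findings to chase.
grep -E 'verb=(get|create|delete|exec)' /tmp/audit-sample.log \
  | grep -E 'resource=(secrets|pods|nodes)'
```

Here the secrets read and the pod exec survive the filter while the benign configmap list is dropped.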

Step 3: Check Cloud Audit Logs for the Full Blast Radius

Why: The initial alert may show only one API call. The full audit log shows everything the attacker did — which data they accessed, what resources they created, whether they created persistence mechanisms.

# AWS CloudTrail — detailed query for a specific actor:
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=<SUSPECT_USERNAME> \
  --start-time $(date -d '48 hours ago' --iso-8601=seconds) \
  --output json | \
  jq '.Events[] | {Username: .Username, EventName: .EventName, EventTime: .EventTime, SourceIP: (.CloudTrailEvent | fromjson | .sourceIPAddress)}'

# Look for these high-risk API calls specifically:
# - CreateAccessKey, CreateLoginProfile → new persistence
# - PutUserPolicy, AttachUserPolicy → privilege escalation
# - GetSecretValue → secret exfiltration
# - GetObject, PutObject → S3 data access
# - RunInstances → new compute (cryptomining, lateral movement)
# - ModifyInstanceAttribute, AuthorizeSecurityGroupIngress → network changes

# GCP audit logs for a specific service account:
gcloud logging read \
  'protoPayload.authenticationInfo.principalEmail="<SERVICE_ACCOUNT_EMAIL>" AND severity>=WARNING' \
  --limit=100 \
  --format="json" | \
  jq '.[] | {time: .timestamp, method: .protoPayload.methodName, resource: .resource.labels}'
Expected output:
A list of API calls made by the suspect identity.
Red flags:
  - CreateAccessKey on a different user (attacker created new credentials for persistence)
  - GetSecretValue on production secrets (data exfiltration)
  - RunInstances (crypto-mining or infrastructure for further attacks)
  - Calls to services the suspect user/role should never touch
If this fails: If logs are incomplete or missing, the attacker may have deleted them — a serious escalation that indicates a sophisticated, destructive attacker. Escalate to the security team immediately.
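Once event names are extracted, the high-risk calls listed above can be flagged with a single pattern. A sketch over an illustrative sample file (the file name and its contents are stand-ins for jq-extracted `EventName` values):

```shell
# Illustrative sample of extracted EventName values.
printf '%s\n' CreateAccessKey DescribeInstances GetSecretValue ListBuckets \
  > /tmp/event-names.txt

# One pattern covering the high-risk calls called out above.
RISKY='CreateAccessKey|CreateLoginProfile|PutUserPolicy|AttachUserPolicy|GetSecretValue|RunInstances|AuthorizeSecurityGroupIngress'
grep -E "$RISKY" /tmp/event-names.txt
```

Any hit warrants a closer look at that event's full CloudTrail record (request parameters, source IP, user agent).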

Step 4: Identify the Attack Vector

Why: Containment is more effective when you know how the attacker got in — it tells you what to close, not just what to quarantine.

# Common vectors to investigate:

# 1. Stolen credential (most common):
# - Check git history for the compromised key: git log -p --all | grep -i "<KEY_FRAGMENT>"
# - Check CI/CD logs for the credential being echoed or printed

# 2. Exposed service (misconfigured access):
# - Check security groups / firewall rules for unexpected 0.0.0.0/0 rules:
aws ec2 describe-security-groups --output json | \
  jq '.SecurityGroups[] | select(.IpPermissions[].IpRanges[].CidrIp == "0.0.0.0/0") | {GroupId, GroupName}'

# 3. Compromised Kubernetes RBAC (serviceaccount misuse):
kubectl get serviceaccounts -A -o yaml | grep "automountServiceAccountToken"
kubectl get pods -A -o yaml | grep "serviceAccountName" | sort -u

# 4. Compromised build system (supply chain):
# - Check recent CI/CD pipeline logs for unexpected outbound connections or commands
# - Check if a recent dependency update pulled in a malicious package

echo "Document the attack vector — this determines what you need to fix, not just quarantine"
Expected output:
A hypothesis about the entry point:
  "IAM access key <KEY_ID> was found in a public GitHub repo committed on <DATE>"
  "Security group <SG_ID> had port 22 open to 0.0.0.0/0 — SSH brute force possible"
  "Service account <SA_NAME> has cluster-admin binding — over-privileged, possibly exploited"
If this fails: If the attack vector is unknown, document that and continue to Step 5 — do not delay containment while investigating the vector.
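For vector #1 (stolen credential), a quick complementary check is scanning the working tree for leaked AWS access key IDs by their `AKIA` prefix. The directory is illustrative, and `AKIAIOSFODNN7EXAMPLE` is AWS's own documentation example key, not a real credential:

```shell
# Illustrative repo with a leaked-looking key.
mkdir -p /tmp/repo-scan
echo 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE' > /tmp/repo-scan/config.ini

# Long-term AWS access key IDs are "AKIA" followed by 16 uppercase
# alphanumerics; scan recursively for that shape.
grep -rE 'AKIA[0-9A-Z]{16}' /tmp/repo-scan
```

Run the same pattern against `git log -p --all` output to catch keys that were committed and later removed.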

Step 5: Contain — Stop the Bleeding

Why: Containment limits the damage. Do this after scoping (not before) so you do not accidentally destroy evidence or miss affected systems.

# Revoke the compromised credential IMMEDIATELY:
# AWS access key:
aws iam update-access-key --access-key-id <COMPROMISED_KEY_ID> --status Inactive

# If a Kubernetes service account token was compromised, delete and recreate the SA:
kubectl delete serviceaccount <SA_NAME> -n <NAMESPACE>
kubectl create serviceaccount <SA_NAME> -n <NAMESPACE>

# Restrict the compromised IAM role temporarily:
aws iam put-role-policy --role-name <ROLE_NAME> \
  --policy-name EmergencyDeny \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*"}]}'

# Isolate affected Kubernetes pods if they may be running attacker code:
# (Adds a network policy to block all ingress/egress — WARNING: this kills traffic to those pods)
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-compromised-pod
  namespace: <NAMESPACE>
spec:
  podSelector:
    matchLabels:
      app: <APP_LABEL>
  policyTypes:
  - Ingress
  - Egress
EOF
Expected output:
AWS update-access-key: no output on success.
kubectl delete serviceaccount: "serviceaccount "<SA_NAME>" deleted"
IAM policy attachment: no output on success.
NetworkPolicy: "networkpolicy.networking.k8s.io/isolate-compromised-pod created"
If this fails: If you cannot contain due to permissions, escalate to security on-call and cloud admin immediately — containment cannot wait.
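One caveat on the role containment above: a deny-all inline policy blocks new use of the role, but temporary credentials already issued stay valid until they expire. A sketch of the session-revocation approach the IAM console uses ("Revoke active sessions") — a deny conditioned on token issue time; the timestamp is a placeholder you replace with the containment moment:

```shell
# Deny any session whose credentials were issued before the cutoff.
# aws:TokenIssueTime is the standard IAM condition key for this.
cat > /tmp/revoke-sessions.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {"DateLessThan": {"aws:TokenIssueTime": "<CONTAINMENT_TIME_ISO8601>"}}
  }]
}
EOF

# Then attach it (not run here):
# aws iam put-role-policy --role-name <ROLE_NAME> \
#   --policy-name AWSRevokeOlderSessions \
#   --policy-document file:///tmp/revoke-sessions.json
```

Unlike the blanket deny, this lets the role be re-used legitimately after containment without removing the policy.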

Step 6: Preserve Evidence — Do Not Modify Affected Systems

Why: Forensic evidence (logs, memory, disk state) is time-sensitive and can be destroyed by routine operations. Preserve before remediating.

# Export CloudTrail events for the incident window to a file:
aws cloudtrail lookup-events \
  --start-time <INCIDENT_START_TIME> \
  --end-time <INCIDENT_END_TIME> \
  --output json > cloudtrail-incident-$(date +%Y%m%d-%H%M%S).json

# Export Kubernetes events:
kubectl get events -A --sort-by='.metadata.creationTimestamp' -o json > \
  k8s-events-$(date +%Y%m%d-%H%M%S).json

# If a pod is suspected to be compromised, capture its logs before deleting:
kubectl logs <POD_NAME> -n <NAMESPACE> --all-containers > \
  pod-logs-<POD_NAME>-$(date +%Y%m%d-%H%M%S).txt

# Do NOT: restart affected instances, delete compromised pods, or modify any resources
# until the security team has confirmed forensic evidence is preserved.
echo "Preserve logs BEFORE remediating — upload to a secure evidence storage location"
Expected output:
JSON files created with timestamped names.
These files should be uploaded to your security team's evidence bucket/folder immediately.
If this fails: If you cannot export logs locally, use your cloud console to set log retention to "Do not delete" for the incident time period before it ages out.
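Before uploading evidence, record checksums so any later tampering or corruption is detectable. A sketch using a stand-in evidence file (`sha256sum` is GNU coreutils; on macOS use `shasum -a 256` instead):

```shell
# Illustrative stand-in for an exported evidence file.
echo '{"Events":[]}' > /tmp/cloudtrail-incident-sample.json

# Record checksums at collection time; re-running with -c later
# verifies the files were not altered.
sha256sum /tmp/cloudtrail-incident-sample.json > /tmp/evidence.sha256
sha256sum -c /tmp/evidence.sha256
```

Store the `.sha256` file alongside the evidence in the secure storage location and note both in the incident timeline.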

Verification

# Confirm containment is in place
aws iam get-access-key-last-used --access-key-id <COMPROMISED_KEY_ID>
# LastUsedDate should not advance past the time you disabled the key

kubectl get networkpolicy -n <NAMESPACE>
# Should show the isolation policy you applied
Success looks like: compromised credentials revoked and showing no new activity; affected pods isolated; security on-call engaged; evidence preserved; escalation path notified.
If still broken: Escalate — see below.

Escalation

Condition → Who to Page → What to Say

  • Confirmed unauthorized access → Security on-call (IMMEDIATE) → "Security incident: confirmed unauthorized access to <SYSTEM> — attacker used <CREDENTIAL>, active since <START_TIME>"
  • Evidence of data exfiltration → Security on-call + Legal/Compliance → "Security incident: possible data exfiltration from <RESOURCE> accessed by unauthorized actor"
  • Attacker still active → Security on-call (IMMEDIATE) → "Active attacker: still seeing API calls from <SOURCE_IP> as of <TIMESTAMP>"
  • Scope expanding to multiple systems → Security on-call → "Lateral movement detected: attack has spread from <SYSTEM_A> to <SYSTEM_B>, expanding scope"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete
  • Conduct a full forensic review with the security team
  • Remediate the attack vector (rotate credentials, fix misconfiguration, patch vulnerability)
  • Review all IAM policies and Kubernetes RBAC for over-provisioning (principle of least privilege)
  • Enable or improve audit logging if gaps were found during the investigation
  • Notify affected users/customers if personal data was accessed (may be legally required)
  • Submit a report to management and (if required) regulatory bodies

Common Mistakes

  1. Containing before scoping: Revoking credentials or deleting pods before you understand the full scope can destroy forensic evidence and leave you blind to lateral movement. Scope first (even 5 minutes) then contain.
  2. Not involving the security team immediately for confirmed breach: On-call engineers are not expected to run a breach investigation alone. Call the security team the moment unauthorized access is confirmed.
  3. Not checking lateral movement: Attackers who compromise one system always try to move to others. Check related systems, shared credentials, and adjacent services — the initial alert is rarely the full story.
  4. Deleting compromised resources before evidence is preserved: Deleting a pod, instance, or log group destroys forensic evidence. Always export logs first.
  5. Treating all anomalous events as equally urgent: A suspicious CloudTrail event may be a misconfigured monitoring tool or a developer mistake. Triage severity before triggering a full incident response — but when in doubt, escalate.

Cross-References

  • Topic Pack: training/library/topic-packs/security-fundamentals/ (deep background on incident response and forensics)
  • Related Runbook: credential-rotation.md — to rotate the compromised credential after containment
  • Related Runbook: cve-response.md — if the attack exploited a known CVE
  • Related Runbook: ../kubernetes/rbac_forbidden.md — for RBAC investigation and hardening
