- security
- l2
- runbook
- incident-triage
- audit-logging

Portal | Level: L2: Operations | Topics: Incident Triage, Audit Logging | Domain: Security
Runbook: Unauthorized Access Investigation¶
| Field | Value |
|---|---|
| Domain | Security |
| Alert | Anomalous API calls, unexpected IAM role assumption, security tool alert, or user report |
| Severity | P1 |
| Est. Resolution Time | 1-2 hours initial investigation |
| Escalation Timeout | Immediate — page security on-call the moment you confirm unauthorized access |
| Last Tested | 2026-03-19 |
| Prerequisites | CloudTrail/audit log access, kubectl audit log access, IAM console, SIEM if available |
Quick Assessment (30 seconds)¶
# Run this first: a quick event-volume check for the last hour (an unusual spike suggests automated or scripted activity)
aws cloudtrail lookup-events \
--start-time $(date -d '1 hour ago' --iso-8601=seconds) \
--output json 2>/dev/null | jq '.Events | length'
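The raw count is easier to act on with a baseline comparison. A minimal sketch; the check_volume helper and the 500 threshold are illustrative, not part of any tool, and should be baselined against normal traffic for your account:

```shell
# check_volume: compare an event count against a baseline threshold.
# $1 = event count (e.g. the jq count from the command above), $2 = threshold
check_volume() {
  if [ "$1" -gt "$2" ]; then
    echo "HIGH: $1 events in the last hour, widen the investigation window"
  else
    echo "OK: $1 events in the last hour"
  fi
}

# Example: substitute the real count from the command above
check_volume 142 500
```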
Step 1: Scope the Incident — What, Where, When, Who¶
Why: You cannot contain what you haven't scoped. Understanding the attack surface before acting keeps you from missing affected systems and from over-containing (taking down unaffected services).
# Identify the trigger: what alert fired, what was the suspected actor?
# Common triggers:
# - GuardDuty alert: check the finding details in AWS console → GuardDuty → Findings
# - User report: get specifics — which system, what did they see, when?
# - Kubernetes audit: which user/serviceaccount made the unusual call?
# Get a summary of recent CloudTrail events:
aws cloudtrail lookup-events \
--start-time $(date -d '4 hours ago' --iso-8601=seconds) \
--output json | \
jq '.Events[] | {Username: .Username, EventName: .EventName, EventTime: .EventTime, SourceIPAddress: (.CloudTrailEvent | fromjson | .sourceIPAddress)}' \
| head -100
# Look for unusual patterns:
# - Activity from unexpected geographies (check SourceIPAddress)
# - API calls at unusual hours (e.g., 3 AM in the company's timezone)
# - Enumeration calls: ListBuckets, DescribeInstances, ListUsers, GetSecretValue
# - Privilege escalation: AttachUserPolicy, CreateAccessKey, AssumeRole
echo "Identify: which actor (user/role/IP), which systems, what time window"
A timeline showing:
- When the first suspicious event occurred
- Which IAM entity (username, role ARN, or access key) was used
- Source IP address(es) — run these through a geo/reputation lookup
- Which API calls were made (enumeration, data access, privilege escalation?)
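The timeline above can be roughed out offline once events are exported. A sketch using only coreutils, assuming a JSON-lines export with eventTime/userName/sourceIPAddress/eventName fields; the file names, sample data, and flagged-verb list are all illustrative:

```shell
# Build a rough timeline from a CloudTrail export (one JSON event per line).
# Field names and the flagged-verb list are illustrative; adjust to your data.
cat > /tmp/trail-sample.jsonl <<'EOF'
{"eventTime":"2026-03-19T03:02:11Z","userName":"ci-deploy","sourceIPAddress":"203.0.113.7","eventName":"ListBuckets"}
{"eventTime":"2026-03-19T03:03:40Z","userName":"ci-deploy","sourceIPAddress":"203.0.113.7","eventName":"GetSecretValue"}
{"eventTime":"2026-03-19T09:15:02Z","userName":"alice","sourceIPAddress":"198.51.100.2","eventName":"DescribeInstances"}
EOF

# Keep only enumeration / escalation / exfiltration verbs, then sort by time:
grep -E '"eventName":"(ListBuckets|ListUsers|GetSecretValue|CreateAccessKey|AttachUserPolicy|AssumeRole)"' \
  /tmp/trail-sample.jsonl |
  sed -E 's/.*"eventTime":"([^"]*)".*"userName":"([^"]*)".*"sourceIPAddress":"([^"]*)".*"eventName":"([^"]*)".*/\1  \2  \3  \4/' |
  sort > /tmp/trail-timeline.txt

cat /tmp/trail-timeline.txt
```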
Step 2: Check Kubernetes Audit Logs¶
Why: Cloud-level access (IAM) and cluster-level access (Kubernetes RBAC) are separate attack surfaces. Attackers who compromise a cloud credential often pivot into the Kubernetes cluster.
# Check recent Kubernetes events for unusual activity:
kubectl get events --sort-by='.metadata.creationTimestamp' -A | tail -50
# Check what permissions a suspected user or service account has:
kubectl auth can-i --list --as=<SUSPECT_USER_OR_SERVICEACCOUNT> -n <NAMESPACE>
# Check cluster role bindings for the suspected identity:
kubectl get clusterrolebinding -o yaml | grep -B5 -A10 "<SUSPECT_USER_OR_SA>"
# Check for unexpected service accounts or RBAC changes:
kubectl get serviceaccounts -A | grep -v default
kubectl get clusterrolebinding | grep -v "system:\|kubeadm\|cluster-admin-binding"
# If you have access to the Kubernetes API server audit log (self-managed clusters):
grep -i "<SUSPECT_USER_OR_SA>" /var/log/kubernetes/audit.log | tail -50
# EKS: CloudTrail captures Kubernetes API calls — filter on "EKS" service
# GKE: Cloud Audit Logs → Data Access → Kubernetes Engine API
- kubectl get events: look for Forbidden events, unusual creates/deletes, or pod exec events.
- kubectl auth can-i: if the suspect has broader permissions than expected, that is a finding.
- Audit log: look for verbs like "create", "delete", "exec", "get" on sensitive resources (secrets, pods, nodes).
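For file-based audit logs, the manual greps above can be combined into one pass. A sketch; the file path, suspect identity, sample entries, and field layout are illustrative and depend on your audit policy (on EKS/GKE, query the cloud audit log instead):

```shell
# Scan a Kubernetes audit log (JSON lines) for sensitive verbs on sensitive
# resources by one identity. Paths, identity, and sample data are illustrative.
SUSPECT="system:serviceaccount:prod:deployer"
cat > /tmp/k8s-audit-sample.log <<'EOF'
{"verb":"get","user":{"username":"system:serviceaccount:prod:deployer"},"objectRef":{"resource":"secrets","name":"db-creds"}}
{"verb":"list","user":{"username":"alice"},"objectRef":{"resource":"pods"}}
{"verb":"create","user":{"username":"system:serviceaccount:prod:deployer"},"objectRef":{"resource":"pods","subresource":"exec"}}
EOF

# Filter: suspect identity AND sensitive verb AND sensitive resource
grep "\"username\":\"$SUSPECT\"" /tmp/k8s-audit-sample.log |
  grep -E '"verb":"(get|create|delete|update|patch)"' |
  grep -E '"resource":"(secrets|pods|nodes)"' > /tmp/k8s-suspect-hits.log

cat /tmp/k8s-suspect-hits.log
```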
Step 3: Check Cloud Audit Logs for the Full Blast Radius¶
Why: The initial alert may show only one API call. The full audit log shows everything the attacker did — which data they accessed, what resources they created, whether they created persistence mechanisms.
# AWS CloudTrail — detailed query for a specific actor:
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=Username,AttributeValue=<SUSPECT_USERNAME> \
--start-time $(date -d '48 hours ago' --iso-8601=seconds) \
--output json | \
jq '.Events[] | {Username: .Username, EventName: .EventName, EventTime: .EventTime, SourceIP: (.CloudTrailEvent | fromjson | .sourceIPAddress)}'
# Look for these high-risk API calls specifically:
# - CreateAccessKey, CreateLoginProfile → new persistence
# - PutUserPolicy, AttachUserPolicy → privilege escalation
# - GetSecretValue → secret exfiltration
# - GetObject, PutObject → S3 data access
# - RunInstances → new compute (cryptomining, lateral movement)
# - ModifyInstanceAttribute, AuthorizeSecurityGroupIngress → network changes
# GCP audit logs for a specific service account:
gcloud logging read \
'protoPayload.authenticationInfo.principalEmail="<SERVICE_ACCOUNT_EMAIL>" AND severity>=WARNING' \
--limit=100 \
--format="json" | \
jq '.[] | {time: .timestamp, method: .protoPayload.methodName, resource: .resource.labels}'
A list of API calls made by the suspect identity.
Red flags:
- CreateAccessKey on a different user (attacker created new credentials for persistence)
- GetSecretValue on production secrets (data exfiltration)
- RunInstances (crypto-mining or infrastructure for further attacks)
- Calls to services the suspect user/role should never touch
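To surface the red flags above quickly across many events, a per-actor tally helps. A sketch; the export file, sample data, and verb list are illustrative and non-exhaustive:

```shell
# Tally high-risk API calls per actor from an exported event list.
# File name, sample data, and the verb list are illustrative.
cat > /tmp/trail-export.jsonl <<'EOF'
{"userName":"svc-backup","eventName":"CreateAccessKey"}
{"userName":"svc-backup","eventName":"GetSecretValue"}
{"userName":"svc-backup","eventName":"GetSecretValue"}
{"userName":"alice","eventName":"RunInstances"}
EOF

# Keep high-risk verbs, reduce to "actor verb", count, highest first:
grep -E '"eventName":"(CreateAccessKey|CreateLoginProfile|PutUserPolicy|AttachUserPolicy|GetSecretValue|RunInstances)"' \
  /tmp/trail-export.jsonl |
  sed -E 's/.*"userName":"([^"]*)".*"eventName":"([^"]*)".*/\1 \2/' |
  sort | uniq -c | sort -rn > /tmp/high-risk-tally.txt

cat /tmp/high-risk-tally.txt
```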
Step 4: Identify the Attack Vector¶
Why: Containment is more effective when you know how the attacker got in — it tells you what to close, not just what to quarantine.
# Common vectors to investigate:
# 1. Stolen credential (most common):
# - Check git history for the compromised key: git log -p --all | grep -i "<KEY_FRAGMENT>"
# - Check CI/CD logs for the credential being echoed or printed
# 2. Exposed service (misconfigured access):
# - Check security groups / firewall rules for unexpected 0.0.0.0/0 rules:
aws ec2 describe-security-groups --output json | \
jq '.SecurityGroups[] | select(any(.IpPermissions[].IpRanges[]?; .CidrIp == "0.0.0.0/0")) | {GroupId, GroupName}'
# 3. Compromised Kubernetes RBAC (serviceaccount misuse):
kubectl get serviceaccounts -A -o yaml | grep "automountServiceAccountToken"
kubectl get pods -A -o yaml | grep "serviceAccountName" | sort -u
# 4. Compromised build system (supply chain):
# - Check recent CI/CD pipeline logs for unexpected outbound connections or commands
# - Check if a recent dependency update pulled in a malicious package
echo "Document the attack vector — this determines what you need to fix, not just quarantine"
A hypothesis about the entry point:
"IAM access key <KEY_ID> was found in a public GitHub repo committed on <DATE>"
"Security group <SG_ID> had port 22 open to 0.0.0.0/0 — SSH brute force possible"
"Service account <SA_NAME> has cluster-admin binding — over-privileged, possibly exploited"
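The security-group check in vector 2 can be run against a saved export so the finding is reviewable later as evidence. A sketch; the export file and sample data are illustrative (in a real incident, redirect the aws command's output to the file first):

```shell
# Flag world-open ingress rules in a saved describe-security-groups export.
# File name and sample data are illustrative.
cat > /tmp/sg-export.json <<'EOF'
{"SecurityGroups":[
  {"GroupId":"sg-0abc","GroupName":"web","IpPermissions":[{"FromPort":443,"IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]},
  {"GroupId":"sg-0def","GroupName":"db","IpPermissions":[{"FromPort":5432,"IpRanges":[{"CidrIp":"10.0.0.0/8"}]}]}
]}
EOF

# Line-by-line match, so this assumes one group per line (as exported above):
grep '"CidrIp":"0.0.0.0/0"' /tmp/sg-export.json |
  sed -E 's/.*"GroupId":"([^"]*)".*"GroupName":"([^"]*)".*/\1 \2/' > /tmp/world-open-sgs.txt

cat /tmp/world-open-sgs.txt
```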
Step 5: Contain — Stop the Bleeding¶
Why: Containment limits the damage. Do this after scoping (not before) so you do not accidentally destroy evidence or miss affected systems.
# Revoke the compromised credential IMMEDIATELY:
# AWS access key (add --user-name <USER_NAME> if the key belongs to another user):
aws iam update-access-key --access-key-id <COMPROMISED_KEY_ID> --status Inactive
# If a Kubernetes service account token was compromised, delete and recreate the SA,
# then restart any pods that used it so they pick up fresh tokens:
kubectl delete serviceaccount <SA_NAME> -n <NAMESPACE>
kubectl create serviceaccount <SA_NAME> -n <NAMESPACE>
# Restrict the compromised IAM role temporarily:
aws iam put-role-policy --role-name <ROLE_NAME> \
--policy-name EmergencyDeny \
--policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*"}]}'
# Isolate affected Kubernetes pods if they may be running attacker code:
# (Adds a network policy to block all ingress/egress — WARNING: this kills traffic to those pods)
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-compromised-pod
  namespace: <NAMESPACE>
spec:
  podSelector:
    matchLabels:
      app: <APP_LABEL>
  policyTypes:
  - Ingress
  - Egress
EOF
- aws iam update-access-key: no output on success.
- kubectl delete serviceaccount: serviceaccount "<SA_NAME>" deleted
- aws iam put-role-policy (EmergencyDeny): no output on success.
- NetworkPolicy: networkpolicy.networking.k8s.io/isolate-compromised-pod created
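A variant of the isolation step that renders the manifest to a file first, so a second responder can review it before it is applied. The namespace and label values here are illustrative:

```shell
# Render the isolation NetworkPolicy to a file for review before applying.
# Namespace and label values are illustrative.
NAMESPACE=payments
APP_LABEL=api
cat > /tmp/isolate.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-compromised-pod
  namespace: ${NAMESPACE}
spec:
  podSelector:
    matchLabels:
      app: ${APP_LABEL}
  policyTypes:
  - Ingress
  - Egress
EOF

cat /tmp/isolate.yaml
# After review: kubectl apply -f /tmp/isolate.yaml
```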
Step 6: Preserve Evidence — Do Not Modify Affected Systems¶
Why: Forensic evidence (logs, memory, disk state) is time-sensitive and can be destroyed by routine operations. Preserve before remediating.
# Export CloudTrail events for the incident window to a file:
aws cloudtrail lookup-events \
--start-time <INCIDENT_START_TIME> \
--end-time <INCIDENT_END_TIME> \
--output json > cloudtrail-incident-$(date +%Y%m%d-%H%M%S).json
# Export Kubernetes events:
kubectl get events -A --sort-by='.metadata.creationTimestamp' -o json > \
k8s-events-$(date +%Y%m%d-%H%M%S).json
# If a pod is suspected to be compromised, capture its logs before deleting:
kubectl logs <POD_NAME> -n <NAMESPACE> --all-containers > \
pod-logs-<POD_NAME>-$(date +%Y%m%d-%H%M%S).txt
# Do NOT: restart affected instances, delete compromised pods, or modify any resources
# until the security team has confirmed forensic evidence is preserved.
echo "Preserve logs BEFORE remediating — upload to a secure evidence storage location"
JSON files created with timestamped names.
These files should be uploaded to your security team's evidence bucket/folder immediately.
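Before uploading, checksumming the evidence files supports chain of custody: anyone can later prove the files were not modified after collection. A sketch using sha256sum; the file names are illustrative:

```shell
# Checksum evidence files before upload so their integrity can be verified
# later. File names here are illustrative placeholders.
printf 'sample evidence\n' > /tmp/cloudtrail-incident-sample.json
( cd /tmp && sha256sum cloudtrail-incident-sample.json > evidence-manifest.sha256 )

# Re-verify at any time; exit status 0 means the files are unmodified:
( cd /tmp && sha256sum -c evidence-manifest.sha256 )
```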
Verification¶
# Confirm containment is in place
aws iam get-access-key-last-used --access-key-id <COMPROMISED_KEY_ID>
# "LastUsedDate" should not be later than the time you disabled the key
kubectl get networkpolicy -n <NAMESPACE>
# Should show the isolation policy you applied
Escalation¶
| Condition | Who to Page | What to Say |
|---|---|---|
| Confirmed unauthorized access | Security on-call (IMMEDIATE) | "Security incident: confirmed unauthorized access to <SYSTEM> by <ACTOR>, starting at <TIME>" |
| Evidence of data exfiltration | Security on-call + Legal/Compliance | "Security incident: possible data exfiltration from <SYSTEM>; <DATA_DESCRIPTION> accessed by <ACTOR>" |
| Attacker still active | Security on-call (IMMEDIATE) | "Active attacker: still seeing API calls from <ACTOR_OR_IP> as of <TIME>; need immediate containment support" |
| Scope expanding to multiple systems | Security on-call | "Lateral movement detected: attack has spread from <FIRST_SYSTEM> to <ADDITIONAL_SYSTEMS>" |
Post-Incident¶
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
- Conduct a full forensic review with the security team
- Remediate the attack vector (rotate credentials, fix misconfiguration, patch vulnerability)
- Review all IAM policies and Kubernetes RBAC for over-provisioning (principle of least privilege)
- Enable or improve audit logging if gaps were found during the investigation
- Notify affected users/customers if personal data was accessed (may be legally required)
- Submit a report to management and (if required) regulatory bodies
Common Mistakes¶
- Containing before scoping: Revoking credentials or deleting pods before you understand the full scope can destroy forensic evidence and leave you blind to lateral movement. Scope first (even for five minutes), then contain.
- Not involving the security team immediately for confirmed breach: On-call engineers are not expected to run a breach investigation alone. Call the security team the moment unauthorized access is confirmed.
- Not checking lateral movement: Attackers who compromise one system always try to move to others. Check related systems, shared credentials, and adjacent services — the initial alert is rarely the full story.
- Deleting compromised resources before evidence is preserved: Deleting a pod, instance, or log group destroys forensic evidence. Always export logs first.
- Treating all anomalous events as equally urgent: A suspicious CloudTrail event may be a misconfigured monitoring tool or a developer mistake. Triage severity before triggering a full incident response — but when in doubt, escalate.
Cross-References¶
- Topic Pack: training/library/topic-packs/security-fundamentals/ (deep background on incident response and forensics)
- Related Runbook: credential-rotation.md — to rotate the compromised credential after containment
- Related Runbook: cve-response.md — if the attack exploited a known CVE
- Related Runbook: ../kubernetes/rbac_forbidden.md — for RBAC investigation and hardening