On-Call Survival: Security
Print this. Pin it. Read it at 3 AM.
When in doubt: contain first, investigate second, explain third.
Alert: Compromised Credentials / Leaked Secret
Severity: P1
First command:
# Identify scope: which secret, where it was used
# Was it committed to git? Search all history for a distinctive fragment of the secret
git log --all --oneline -S '<secret-fragment>' | head -20
# Check secret scanning alerts in GitHub Security tab
gh api repos/<org>/<repo>/secret-scanning/alerts --jq '.[].secret_type'
Decision tree:
Was the secret committed to git?
├── Yes → ROTATE IMMEDIATELY (even if the repo is private; assume it was indexed).
│         Revoke the old credential in the issuing system (AWS console, GitHub settings, etc.)
│         Issue a new credential. Update the secret in K8s/CI.
│         Purge from git history: escalate to git admin (requires forced history rewrite).
│         Log the incident: what was exposed, when, rotation timestamp.
└── No → Was it exposed in logs / error messages?
    ├── Yes → Rotate credential. Save a restricted copy as evidence, then truncate/delete the affected log files.
    │         Check who has log access. Escalate to security team.
    └── No → Phishing / social engineering?
        → Escalate to security team immediately. Do not investigate alone.
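The "Log the incident" step in the tree can be as small as one appended line per rotation. The following is a minimal sketch, not a prescribed format: the function name `log_secret_incident`, the field layout, and the file path are all hypothetical. Record the secret's name or ID only, never its value.

```shell
# Hypothetical helper: append one tab-separated incident record.
# Fields: time of record (UTC), secret name, exposure time, rotation time.
# Pass the secret's NAME or ID, never the secret value itself.
log_secret_incident() {
  local file="$1" secret_name="$2" exposed_at="$3" rotated_at="$4"
  printf '%s\t%s\texposed=%s\trotated=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$secret_name" "$exposed_at" "$rotated_at" >> "$file"
}
```

Example call (values are placeholders): `log_secret_incident ./incident.log ci-deploy-key 2025-01-01T02:10Z 2025-01-01T02:25Z`.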
Escalation trigger: Secret grants production DB/cloud access; secret has been active for > 1 hour post-exposure; cannot identify exposure scope; evidence of use by unauthorized party.
Safe actions: Identify scope, check secret scanning alerts — read-only before escalation.
Dangerous actions: Rotating credentials (brief service disruption), purging git history (destructive, requires coordination).
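For the "exposed in logs" branch, a read-only scan can confirm which key IDs actually appear in a log file before anything is rotated or deleted. This sketch assumes the leaked secret is an AWS long-term access key ID (the `AKIA` prefix followed by 16 uppercase alphanumerics); the function name `find_key_ids` is hypothetical, and other credential types need their own pattern.

```shell
# Hypothetical helper: list the distinct AWS-style access key IDs in a log file.
# Read-only: the log file is never modified.
find_key_ids() {
  local logfile="$1"
  # -o prints only the matching text; -E enables the {16} repetition syntax.
  grep -oE 'AKIA[0-9A-Z]{16}' "$logfile" | sort -u
}
```

Key IDs are identifiers, not secret keys, but treat the output as sensitive anyway and share it only in the incident channel.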
Alert: Unauthorized Access / Suspicious Activity
Severity: P1
First command:
# Kubernetes: recent cluster events (who called the API is recorded in the API server audit log, not in events)
kubectl get events -A --sort-by='.lastTimestamp' | tail -30
# Cloud: recent API calls (AWS CloudTrail / GCP Audit Logs)
# Check: logins from unexpected IPs, unusual resource creation/deletion
Decision tree:
Is there an active session / connection still open?
├── Yes → Contain immediately:
│         Kubernetes: kubectl delete clusterrolebinding <suspicious-binding>
│                     (or: kubectl delete rolebinding <suspicious-binding> -n <ns>)
│         Cloud: Revoke the IAM user access key or assume-role session
│         SSH: pkill -u <user> or block at firewall level
│         THEN: collect evidence before more cleanup (screenshots, logs)
└── No (historical activity, no active session)?
    ├── Assess blast radius: what did the intruder access/create/delete?
    ├── Preserve logs: copy audit logs and events before they rotate
    │       kubectl get events -A > /tmp/k8s-events-$(date +%Y%m%d).txt
    └── Escalate to security team with: actor, actions, timeline, resources affected.
Escalation trigger: ANY unauthorized access to production systems. Do not try to resolve alone — escalate immediately.
Safe actions: Read audit logs, get events, identify suspicious actors — read-only.
Dangerous actions: Revoke access (may alert intruder to containment), delete evidence, delete resources (preserve for forensics).
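Preserved logs are more useful to a forensics team if each copy carries a timestamp and a checksum. The sketch below shows one way to do that; the function name `preserve_evidence` and the directory layout are hypothetical, and it assumes GNU coreutils (`sha256sum`), so macOS would need `shasum -a 256` instead.

```shell
# Hypothetical helper: copy a file into an evidence directory with a UTC
# timestamp suffix and write a SHA-256 digest next to the copy.
# Prints the path of the copy on success.
preserve_evidence() {
  local src="$1" dest_dir="$2" stamp dest name
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dest_dir" || return 1
  name="$(basename "$src").$stamp"
  dest="$dest_dir/$name"
  # -p keeps the original modification time and permissions.
  cp -p "$src" "$dest" || return 1
  # The digest uses a relative name so `sha256sum -c` works inside dest_dir.
  (cd "$dest_dir" && sha256sum "$name" > "$name.sha256") || return 1
  printf '%s\n' "$dest"
}
```

To verify a copy later, run `sha256sum -c <file>.sha256` from inside the evidence directory.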
Alert: Critical CVE in Running Container/Package
Severity: P1 (CVSS ≥ 9.0, exploitable in your context) / P2 (CVSS 7.0-8.9)
First command:
# Check which images are running
kubectl get pods -A -o jsonpath='{.items[*].spec.containers[*].image}' | tr -s '[:space:]' '\n' | sort -u
# Check CVE scanner output (Trivy, Snyk, Grype)
trivy image <image>:<tag> --severity CRITICAL,HIGH
Decision tree:
Is the vulnerability actively exploitable in your deployment?
├── No (not network-reachable, not in code path) → Log it. Schedule patch within SLA.
│       CVSS ≥ 9.0: patch within 7 days. CVSS 7.0-8.9: patch within 30 days.
└── Yes → Is a patched base image available?
    ├── Yes → Rebuild image with patched base. Test. Deploy.
    │         Fast track through CI: treat as hotfix.
    └── No → Mitigate while waiting for patch:
             - Network policy: restrict access to affected service
             - WAF rule: block exploit patterns if known
             - Disable the vulnerable feature if possible
             Escalate to security: "CVE <id> in <image>, no patch available, mitigation applied: <describe>"
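The "Network policy" mitigation above could look roughly like this. It is a sketch under assumptions, not a drop-in policy: the function name `render_restrict_policy` is hypothetical, it assumes pods are labelled `app=<name>`, and it only has an effect if the cluster's network plugin enforces NetworkPolicy.

```shell
# Hypothetical helper: print a NetworkPolicy manifest that allows ingress to
# one workload only from one other workload in the same namespace.
# Assumes pods carry an "app" label; adjust the selectors to your labels.
render_restrict_policy() {
  local ns="$1" app="$2" allowed_app="$3"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-${app}
  namespace: ${ns}
spec:
  podSelector:
    matchLabels:
      app: ${app}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: ${allowed_app}
EOF
}
```

Review the manifest before applying it, for example with `render_restrict_policy <ns> <app> <allowed-app> | kubectl apply --dry-run=server -f -`. Network isolation is a "get approval" action in this guide.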
Escalation trigger: CVSS ≥ 9 with active exploit in the wild; vulnerability in auth/crypto code; evidence of exploitation attempt in logs.
Safe actions: Scan images, check CVE details — read-only.
Dangerous actions: Deploying unvetted patches to production, disabling security controls to test exploitability.
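The patch SLA from the decision tree (7 days for CVSS 9 and above, 30 days for CVSS 7 up to 9) can be encoded as a small lookup for use in ticket automation. A minimal sketch; the function name `patch_sla_days` is hypothetical, and scores below 7 return `unspecified` because this guide does not define an SLA for them.

```shell
# Hypothetical helper: map a CVSS base score to this guide's patch SLA in days.
# awk is used because shell arithmetic cannot compare decimals.
patch_sla_days() {
  awk -v s="$1" 'BEGIN {
    if (s + 0 >= 9.0)      print 7
    else if (s + 0 >= 7.0) print 30
    else                   print "unspecified"
  }'
}
```

For example, `patch_sla_days 9.8` prints `7` and `patch_sla_days 7.5` prints `30`.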
Alert: Certificate Issue (Expired / Revoked)
Severity: P1 (user-facing HTTPS broken) / P2 (internal service)
First command:
echo | openssl s_client -connect <hostname>:443 2>/dev/null | openssl x509 -noout -dates -subject -issuer
Check the notAfter date to see whether the cert has expired. Also check the issuer: is this a known/trusted CA?
Decision tree:
Is the cert expired?
├── Yes → See Networking guide → "TLS Certificate Error" section.
└── No → Is the issuer unexpected / untrusted?
    ├── Yes → POSSIBLE MITM OR CERT SUBSTITUTION.
    │         Do NOT proceed. Escalate to security team immediately.
    │         Preserve the cert: echo | openssl s_client -connect <host>:443 2>/dev/null | openssl x509 > /tmp/suspicious-cert.pem
    └── No → Is the cert revoked? (Check with OCSP/CRL)
             openssl ocsp -issuer <ca-cert> -cert <cert> -url <ocsp-url> -resp_text
             If revoked: rotate immediately. Escalate to security.
             If valid but browser warns: the intermediate chain is probably missing. Add the chain to the server config.
Escalation trigger: Unexpected issuer (possible MITM); revoked cert; cert for wrong domain (phishing risk); cannot issue replacement.
Safe actions: Read cert details with openssl — read-only.
Dangerous actions: Accepting an untrusted cert as safe, disabling TLS verification.
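Reading the notAfter date by eye at 3 AM is error-prone, so it can help to turn it into a day count. The sketch below assumes GNU `date` (its `-d` flag parses the timestamp format that openssl prints); the function name `days_until_expiry` is hypothetical. A negative result means the cert has already expired.

```shell
# Hypothetical helper: whole days from now until a certificate's notAfter time.
# $1: the timestamp after "notAfter=", e.g. "Jun  1 12:00:00 2025 GMT"
# $2: optional "now" in epoch seconds (defaults to the current time)
# Assumes GNU date; BSD/macOS date uses different flags.
days_until_expiry() {
  local not_after="$1" now="${2:-$(date +%s)}" expiry
  expiry="$(date -u -d "$not_after" +%s)" || return 1
  echo $(( (expiry - now) / 86400 ))
}
```

Usage against a live host: `days_until_expiry "$(echo | openssl s_client -connect <hostname>:443 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)"`.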
Alert: Unauthorized Kubernetes RBAC / Privilege Escalation
Severity: P1
First command:
kubectl get clusterrolebindings,rolebindings -A -o custom-columns='KIND:.kind,NAMESPACE:.metadata.namespace,NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects[*].name' | head -40
# Look for: unexpected users/groups/service accounts bound to cluster-admin,
# or to roles with create/delete on sensitive resources.
Decision tree:
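Is an unexpected service account or user bound to cluster-admin?
├── Yes → Who added this binding?
│         kubectl get clusterrolebinding <name> -o yaml | grep -E "creationTimestamp|annotations|labels"
│         (Metadata is only a hint; the API server audit log records the actual creator.)
│         Unknown origin? → Delete the binding AND escalate to security.
│         kubectl delete clusterrolebinding <name>
└── No → Is a service account with broad permissions compromised?
    ├── Yes → Rotate the service account token:
    │         kubectl delete secret <sa-token-secret> -n <ns>
    │         (Kubernetes ≤ 1.23 auto-creates a new token Secret. On 1.24+ it is not recreated;
    │          pods use short-lived projected tokens that are invalidated when the pod is deleted.)
    └── No → New RBAC binding added recently?
             kubectl get clusterrolebindings --sort-by=.metadata.creationTimestamp
             (Kubernetes does not emit events for RBAC changes by default; check the audit log for who made the change.)
             Unauthorized change? → Delete and escalate.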
Escalation trigger: Any suspicious cluster-admin binding of unknown origin; service account token leaked externally; cannot determine if access was used.
Safe actions: Read RBAC bindings — read-only.
Dangerous actions: Delete RBAC bindings (may break services), rotate service account tokens.
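Spotting an "unexpected" subject is easier against a written allowlist than from memory. The following sketch filters a list of subject names against such a list; the function name `unexpected_subjects` and the allowlist file are hypothetical, and the file must be maintained by your team (one approved subject name per line).

```shell
# Hypothetical helper: read subject names on stdin and print those that are
# NOT in the allowlist file. Read-only; nothing is changed in the cluster.
unexpected_subjects() {
  local allowlist="$1"
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
  grep -vxFf "$allowlist" | sort -u
}
```

One way to feed it, using the same jq filter style as the Quick Reference: `kubectl get clusterrolebindings -o json | jq -r '.items[] | select(.roleRef.name=="cluster-admin") | .subjects[]?.name' | unexpected_subjects <allowlist-file>`. Any name it prints warrants a closer look before deleting anything.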
Quick Reference
Most Useful Commands
# Check for exposed secrets in git
gh api repos/<org>/<repo>/secret-scanning/alerts
# Recent K8s API events
kubectl get events -A --sort-by='.lastTimestamp' | tail -30
# Who is bound to cluster-admin
kubectl get clusterrolebindings -o json | jq '.items[] | select(.roleRef.name=="cluster-admin") | {name:.metadata.name, subjects:.subjects}'
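# All service accounts, plus any legacy Secret-based tokens
kubectl get serviceaccounts -A
kubectl get secrets -A --field-selector type=kubernetes.io/service-account-token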
# Scan running image for CVEs
trivy image <image>:<tag> --severity CRITICAL,HIGH
# Check certificate validity
echo | openssl s_client -connect <host>:443 2>/dev/null | openssl x509 -noout -dates -subject
# Collect cluster events (preserve before they expire)
kubectl get events -A > /tmp/k8s-events-$(date +%Y%m%d-%H%M%S).txt
# Events that mention secrets (e.g. failed mounts); changes to secrets themselves are recorded in the audit log
kubectl get events -A | grep -i secret
Escalation Contacts
| Situation | Team | Channel |
|---|---|---|
| Any unauthorized access | Security team immediately | #security-incidents |
| Leaked production secret | Security + on-call lead | #security-incidents |
| Critical CVE (CVSS ≥ 9) | Security + app team | #security-incidents |
| Suspicious RBAC change | Security team | #security-incidents |
| Possible MITM / cert fraud | Security team immediately | Direct page |
Safe vs Dangerous Actions
| Safe (do without asking) | Dangerous (get approval) |
|---|---|
| Read audit logs | Revoke credentials / access |
| Scan images for CVEs | Delete RBAC bindings |
| Read RBAC bindings | Rotate service account tokens |
| Preserve logs as evidence | Purge git history |
| Check cert details | Network isolation / firewall |