- security
- l2
- runbook
- security-scanning
- incident-triage
Level: L2: Operations | Topics: Security Scanning, Incident Triage | Domain: Security
Runbook: CVE Response (Critical Vulnerability)
| Field | Value |
|---|---|
| Domain | Security |
| Alert | CVE scanner reports critical/high vulnerability in a production image, or security advisory received |
| Severity | P1 (critical CVE actively exploited), P2 (critical CVE not yet exploited), P3 (high CVE) |
| Est. Resolution Time | 1-4 hours |
| Escalation Timeout | 30 minutes — page if not resolved (for P1) |
| Last Tested | 2026-03-19 |
| Prerequisites | Container image access, Dockerfile knowledge, ability to build/push images, kubectl access |
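The severity rule in the table above can be read as a small lookup. The sketch below is one possible encoding in shell; the function name `cve_priority` and the yes/no flag for active exploitation are illustrative assumptions, not part of any existing tooling. The table does not distinguish exploited from unexploited HIGH CVEs, so the sketch maps both to P3.

```shell
#!/usr/bin/env bash
# cve_priority <SEVERITY> [EXPLOITED]   (hypothetical helper)
#   SEVERITY:  CRITICAL or HIGH, as reported by the scanner (case-insensitive)
#   EXPLOITED: yes or no, whether the CVE is known to be actively exploited (default: no)
cve_priority() {
  local severity exploited
  severity=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  exploited=$(printf '%s' "${2:-no}" | tr '[:upper:]' '[:lower:]')
  case "$severity:$exploited" in
    CRITICAL:yes) echo "P1" ;;   # critical CVE actively exploited
    CRITICAL:*)   echo "P2" ;;   # critical CVE not yet exploited
    HIGH:*)       echo "P3" ;;   # high CVE (the table makes no exploited/unexploited split)
    *)            echo "unrated"; return 1 ;;   # outside the scope of this runbook
  esac
}
```

For example, `cve_priority CRITICAL yes` prints `P1`, which is the case that starts the 30-minute escalation timer.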
Quick Assessment (30 seconds)
# Run this first — it tells you the scope of the problem
trivy image <IMAGE_NAME>:<TAG> --severity CRITICAL,HIGH --no-progress
If output shows: a CVE with Fixed Version: available → A patched version exists; proceed to Step 3 to get the fix
If output shows: a CVE with Fixed Version: none available → No upstream fix yet; proceed to Step 1 to assess exploitability and consider mitigating controls
Step 1: Assess the CVE - Triage Before Panicking
Why: Not all critical CVEs are equally dangerous. A critical CVE in a library that your code never calls may have zero actual risk. Triage determines whether you need to act in hours or days.
# Get detailed CVE information.
# The ? operators skip scan results that carry no Vulnerabilities array instead of aborting jq:
trivy image <IMAGE_NAME>:<TAG> --severity CRITICAL,HIGH --format json --no-progress | \
jq '.Results[]? | .Vulnerabilities[]? | {VulnerabilityID, Severity, PkgName, InstalledVersion, FixedVersion, Description}'
# Key questions to answer:
# 1. What package is affected? Is it in our application's code path?
# 2. What is the attack vector? (Network/Adjacent/Local/Physical) — Network = highest risk
# 3. What is the CVSS score, and is it actually 9.0 or higher (critical)?
# 4. Is this CVE known to be actively exploited? Check: https://www.cisa.gov/known-exploited-vulnerabilities-catalog
# Check the CVE details online:
# NVD: https://nvd.nist.gov/vuln/detail/<CVE_ID>
# Example: https://nvd.nist.gov/vuln/detail/CVE-2021-44228
echo "Review CVE details at https://nvd.nist.gov/vuln/detail/<CVE_ID>"
Trivy JSON output listing the package, version, CVE ID, and whether a fix is available.
Example:
{
"VulnerabilityID": "CVE-2024-XXXXX",
"Severity": "CRITICAL",
"PkgName": "openssl",
"InstalledVersion": "3.0.2",
"FixedVersion": "3.0.9",
"Description": "..."
}
If trivy is not installed, use grype <IMAGE_NAME>:<TAG> as an alternative. If neither is available, check the registry UI (Docker Hub, ECR, GHCR); most provide built-in vulnerability scanning.
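Question 4 above (is the CVE known to be actively exploited?) can also be answered from a downloaded copy of the CISA catalog instead of the web page. The following is a minimal sketch: the helper name `in_kev` is hypothetical, and the download URL in the usage comment is an assumption that should be confirmed on the catalog page before relying on it.

```shell
#!/usr/bin/env bash
# in_kev <CVE_ID> <KEV_JSON_FILE>   (hypothetical helper)
# Returns 0 if the CVE ID appears as a quoted JSON string in the catalog file,
# 1 if it does not, and 2 if the file cannot be read.
in_kev() {
  local cve=$1 catalog=$2
  [ -r "$catalog" ] || { echo "cannot read catalog: $catalog" >&2; return 2; }
  # Match the ID together with its surrounding quotes so that a shorter ID
  # (CVE-2021-4422) does not match a longer one (CVE-2021-44228).
  grep -qiF -- "\"$cve\"" "$catalog"
}

# Usage (URL is an assumption; verify it on the catalog page):
#   curl -fsSL -o kev.json https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
#   in_kev CVE-2021-44228 kev.json && echo "listed in KEV: known to be exploited"
```

A match only says the CVE is exploited somewhere in the wild; whether your deployment is reachable is still a judgment call from questions 1 and 2.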
Step 2: Identify All Affected Images in Production
Why: A vulnerability rarely affects only one image. If you have multiple services built from the same base image, they are all affected and must all be patched.
# List all images currently running in the cluster (one image per line, even for multi-container pods):
kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u
# Filter by name. This only matches the image reference itself; images built FROM the
# vulnerable base under a different name need an SBOM or a scan to be found:
kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | \
sort -u | grep "<BASE_IMAGE_NAME>"
# If you have an image inventory or SBOM, use it to find all images with the vulnerable package.
# Trivy scans one image per invocation, so loop over a list file (one image reference per line):
xargs -I{} trivy image {} --severity CRITICAL,HIGH --no-progress --quiet < images.txt
A list of unique image references (registry/image:tag) currently running.
Identify which ones share the vulnerable base image or package.
Example:
myregistry.com/frontend:v1.2.3 ← uses node:18-alpine base (affected)
myregistry.com/api:v2.0.1 ← uses python:3.11-slim base (not affected)
myregistry.com/worker:v1.5.0 ← uses node:18-alpine base (affected)
Alternatively, search the deployment manifests in the repository: grep -r "image:" k8s/ | grep -v "#" or grep -r "image:" helm/
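The manifest search can be tightened so that it returns clean, de-duplicated image references rather than raw matching lines. This is a sketch under assumptions: the helper name `images_in_manifests` is hypothetical, it only looks at `*.yaml` and `*.yml` files, and Helm templates will show up as unrendered template expressions.

```shell
#!/usr/bin/env bash
# images_in_manifests <DIR>...   (hypothetical helper)
# Prints the unique image references found on "image:" lines in YAML files.
# Lines that are commented out are ignored.
images_in_manifests() {
  grep -rhE --include='*.yaml' --include='*.yml' '^[[:space:]-]*image:' "$@" 2>/dev/null |
    sed -E \
      -e 's/[[:space:]]+#.*$//' \
      -e 's/[[:space:]]+$//' \
      -e 's/^[[:space:]-]*image:[[:space:]]*//' \
      -e "s/^[\"']//" \
      -e "s/[\"']\$//" |
    grep -v '^$' |
    sort -u
}

# Usage:
#   images_in_manifests k8s/ helm/
```

The output has the same shape as the kubectl listing above, so the same grep for a base image or registry name applies to both.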
Step 3: Check for a Patched Base Image or Package Version
Why: Most CVEs in container images are fixed by updating the base image or a system package — you need to know if a fix is available before writing code.
# Pull the latest version of your base image and check if the CVE is fixed:
docker pull <BASE_IMAGE>:<TAG>
trivy image <BASE_IMAGE>:<TAG> --severity CRITICAL,HIGH --no-progress | grep <CVE_ID>
# If using a pinned version, check the release notes or Docker Hub tags for a patched version:
# Example for Alpine:
docker pull alpine:3.19
trivy image alpine:3.19 --severity CRITICAL,HIGH --no-progress | grep <CVE_ID>
# If the CVE is in an individual package (not the base image itself), check the package manager for a fix.
# Refresh the package index first; official images usually ship without one, so no candidate version is listed.
# Debian/Ubuntu:
docker run --rm <BASE_IMAGE>:<TAG> sh -c 'apt-get update -qq && apt-cache policy <PACKAGE_NAME>'
# Alpine (apk policy lists the versions available in the repositories):
docker run --rm <BASE_IMAGE>:<TAG> sh -c 'apk update >/dev/null && apk policy <PACKAGE_NAME>'
# Python:
pip index versions <PACKAGE_NAME>
If patched: CVE_ID does NOT appear in trivy output for the new base image/tag.
If not patched: CVE_ID still appears — you need a workaround or must wait for an upstream fix.
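When the package manager reports a version number, the remaining question is whether it is at least the FixedVersion from the scanner. A rough sketch of that comparison follows; `version_ge` is a hypothetical helper and relies on `sort -V` from GNU coreutils. It is only an approximation of distribution version ordering (Debian epochs and `~` suffixes are not handled), so on Debian-based images `dpkg --compare-versions` is the more reliable check.

```shell
#!/usr/bin/env bash
# version_ge <CANDIDATE> <FIXED>   (hypothetical helper)
# Returns 0 if CANDIDATE sorts at or after FIXED in version order.
version_ge() {
  # sort -V puts the lower version first; if that is FIXED, CANDIDATE is >= FIXED.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Usage, with the versions from the Step 1 example:
#   version_ge 3.0.2 3.0.9 || echo "installed version predates the fix"
#   version_ge 3.0.10 3.0.9 && echo "candidate includes the fix"
```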
Step 4: Update the Dockerfile and Rebuild
Why: The fix must be baked into the image — it cannot be applied to running containers. Update the Dockerfile and rebuild.
# Option A — update the base image tag in the Dockerfile:
# Before: FROM node:18.12-alpine
# After: FROM node:18.20-alpine ← patched version
# Edit the Dockerfile:
sed -i 's|FROM <BASE_IMAGE>:<OLD_TAG>|FROM <BASE_IMAGE>:<PATCHED_TAG>|g' Dockerfile
# Option B — add an apt-get/apk upgrade step to patch a specific package:
# Add this line AFTER the FROM line in your Dockerfile:
# RUN apt-get update && apt-get install -y --only-upgrade <PACKAGE_NAME> && rm -rf /var/lib/apt/lists/*
# For Alpine:
# RUN apk update && apk upgrade <PACKAGE_NAME>
# Rebuild the image:
docker build -t <IMAGE_NAME>:<NEW_TAG> .
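A plain sed substitution exits successfully even when nothing matched, which makes it easy to rebuild an unchanged Dockerfile without noticing. The sketch below is one way to make Option A fail loudly in that case. The helper name `bump_base_image` is hypothetical, and it assumes simple `FROM image:tag` lines (optionally followed by `AS stage`); lines that use `FROM --platform=...` are not handled.

```shell
#!/usr/bin/env bash
# bump_base_image <DOCKERFILE> <OLD_IMAGE:TAG> <NEW_IMAGE:TAG>   (hypothetical helper)
# Rewrites every "FROM <old>" line, including "FROM <old> AS stage".
# Returns non-zero and leaves the file untouched if no such line exists.
bump_base_image() {
  local file=$1 old=$2 new=$3 tmp
  tmp=$(mktemp) || return 1
  if awk -v old="$old" -v new="$new" '
       toupper($1) == "FROM" && $2 == old { $2 = new; changed = 1 }
       { print }
       END { if (!changed) exit 3 }
     ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"    # overwrite in place so the file keeps its permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    echo "no FROM $old line found in $file" >&2
    return 1
  fi
}

# Usage:
#   bump_base_image Dockerfile node:18.12-alpine node:18.20-alpine && docker build -t <IMAGE_NAME>:<NEW_TAG> .
```

Comparing the image reference as an exact string also avoids the sed pitfall where the dots in a tag act as regex wildcards.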
Step 5: Scan the Rebuilt Image - Confirm the CVE Is Fixed
Why: A rebuild does not guarantee the CVE is fixed. You must scan again to confirm the vulnerability is gone before deploying.
# Scan the newly built image:
trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL,HIGH --no-progress
# Confirm the specific CVE is no longer present:
trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL,HIGH --no-progress | grep <CVE_ID>
# If this returns nothing, the CVE is fixed.
# Alternative scanner:
grype <IMAGE_NAME>:<NEW_TAG> --only-fixed
trivy scan: no output for the specific CVE_ID — it has been fixed.
Overall: ideally zero CRITICAL findings. If other CVEs remain, triage each one.
If the CVE still appears after the rebuild, add RUN apt-get install -y <PACKAGE>=<FIXED_VERSION> (or the Alpine equivalent) to the Dockerfile to force the patched version, then rebuild and rescan.
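The grep check in this step succeeds when the CVE is found, which is the opposite of what a deploy script wants, and an empty report from a failed scan looks the same as a clean one. The sketch below inverts the check and treats empty input as inconclusive. The helper name `cve_absent` is hypothetical, and the empty-input rule is a conservative assumption rather than documented scanner behaviour.

```shell
#!/usr/bin/env bash
# cve_absent <CVE_ID>   (hypothetical helper; reads a scan report on stdin)
# Returns 0 if the CVE ID does not appear, 1 if it does, 2 if the report is empty.
cve_absent() {
  local cve=$1 report
  report=$(cat)
  if [ -z "$report" ]; then
    echo "inconclusive: scan output was empty" >&2
    return 2
  fi
  # -w keeps CVE-2024-1234 from matching inside CVE-2024-12345; -F disables regex.
  if printf '%s\n' "$report" | grep -qwF -- "$cve"; then
    echo "still present: $cve" >&2
    return 1
  fi
  echo "not found: $cve"
}

# Usage:
#   trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL,HIGH --no-progress | cve_absent <CVE_ID> \
#     && echo "safe to continue to Step 6"
```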
Step 6: Push the Patched Image and Deploy
Why: The fix is only in production when the patched image is deployed. Deploy immediately for P1/P2 — do not wait for the next scheduled release.
# Push the patched image to the registry:
docker push <IMAGE_NAME>:<NEW_TAG>
# Update the Kubernetes deployment to use the new image:
kubectl set image deployment/<DEPLOY_NAME> \
<CONTAINER_NAME>=<IMAGE_NAME>:<NEW_TAG> \
-n <NAMESPACE>
# Monitor the rollout:
kubectl rollout status deployment/<DEPLOY_NAME> -n <NAMESPACE>
# Verify pods are running the new image:
kubectl get pods -n <NAMESPACE> -l app=<APP_LABEL> \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
"deployment.apps/<DEPLOY_NAME> image updated"
"deployment "<DEPLOY_NAME>" successfully rolled out"
Pod image column shows the new tag (not the old one).
If pods crash or the rollout does not complete, roll back with kubectl rollout undo deployment/<DEPLOY_NAME> -n <NAMESPACE> and investigate the crash logs.
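Checking the image column by eye gets error-prone once a deployment has more than a handful of pods. As a sketch, the tab-separated output of the verification command in this step can be piped through a small filter that lists only the pods still on another image. The helper name `pods_not_on_image` is hypothetical, and like the command above it only sees the first container of each pod.

```shell
#!/usr/bin/env bash
# pods_not_on_image <EXPECTED_IMAGE>   (hypothetical helper)
# Reads "pod-name<TAB>image" lines on stdin, prints the pods running any other
# image, and returns 1 if at least one such pod exists.
pods_not_on_image() {
  awk -F '\t' -v want="$1" '
    NF >= 2 && $2 != want { print $1; stale = 1 }
    END { exit (stale ? 1 : 0) }
  '
}

# Usage:
#   kubectl get pods -n <NAMESPACE> -l app=<APP_LABEL> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' \
#     | pods_not_on_image <IMAGE_NAME>:<NEW_TAG> && echo "all pods are on the patched image"
```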
Verification
# Confirm the issue is resolved
kubectl get pods -n <NAMESPACE> -l app=<APP_LABEL>
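# Scan at the same severities used in triage; a CRITICAL-only scan would never show a HIGH CVE.
# No output from grep means the CVE is no longer reported:
trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL,HIGH --no-progress | grep <CVE_ID>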
Escalation
| Condition | Who to Page | What to Say |
|---|---|---|
| Not resolved in 30 min (P1) | Security on-call + Platform on-call | "P1 CVE: |
| CVE is being actively exploited against us | Security on-call | "Security incident: active exploitation of |
| No fix available from upstream | Security on-call | "CVE |
| Scope expanding (many services affected) | Security on-call + Platform on-call | "CVE |
Post-Incident
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
- Pin base image versions (do not use :latest) so rebuilds are reproducible
- Add CVE scanning to the CI pipeline so vulnerabilities are caught before they reach production
- Schedule a follow-up scan in 30 days to check for newly discovered CVEs in the same image
- Update the team's SLA document for CVE response times (P1: 24h, P2: 7 days, P3: 30 days)
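For the CI scanning item in the list above, one common shape is a pipeline step that fails the build when the scanner finds fixable CRITICAL or HIGH issues. The sketch below uses trivy's `--exit-code` and `--ignore-unfixed` flags. The function name `ci_scan_gate` and the decision to ignore unfixed findings are assumptions; adjust both to your own policy.

```shell
#!/usr/bin/env bash
# ci_scan_gate <IMAGE>   (hypothetical helper for a CI job)
# Fails when trivy reports CRITICAL or HIGH findings that already have a fix.
ci_scan_gate() {
  local image=$1
  # --exit-code 1 makes trivy exit non-zero when matching findings exist;
  # --ignore-unfixed skips findings with no upstream fix (a policy assumption).
  if trivy image --exit-code 1 --ignore-unfixed --severity CRITICAL,HIGH --no-progress "$image"; then
    echo "scan gate passed: $image"
  else
    echo "scan gate failed: $image has fixable CRITICAL/HIGH findings (or the scan errored)" >&2
    return 1
  fi
}

# Usage in a pipeline step, after the image is built and before it is pushed:
#   ci_scan_gate <IMAGE_NAME>:<NEW_TAG> || exit 1
```

Running the gate before the push keeps a vulnerable tag out of the registry entirely, so it cannot be deployed by accident.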
Common Mistakes
- Updating the base image without re-pinning the version: Changing FROM node:18-alpine to FROM node:latest fixes the CVE now but introduces drift; the next build may pull a different version. Always pin to an explicit version tag.
- Not scanning after the fix: Rebuilding does not guarantee the CVE is fixed. Always run trivy again on the new image before deploying.
- Deploying without testing: A patched base image can introduce breaking changes (new library versions, different binary locations). Run your test suite against the new image before deploying to production.
- Treating all CVEs as equal severity: A CRITICAL CVE with no network attack vector and no public exploit is very different from a CRITICAL CVE being actively exploited in the wild. Triage before acting.
- Patching only the alerting service: If multiple services share the same base image, they must all be patched. Check all running images, not just the one that triggered the alert.
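The first mistake in the list above (unpinned tags) is easy to check mechanically. The following is a rough sketch that flags image references with no tag or with the latest tag. The helper name `unpinned_images` is hypothetical, and references pinned by digest are treated as pinned.

```shell
#!/usr/bin/env bash
# unpinned_images   (hypothetical helper; reads one image reference per line on stdin)
# Prints references that have no tag or use the "latest" tag.
unpinned_images() {
  awk '
    NF == 0     { next }    # skip blank lines
    /@sha256:/  { next }    # pinned by digest
    {
      # Only look at the last path component, so a registry port
      # (localhost:5000/app) is not mistaken for a tag.
      n = split($0, parts, "/")
      last = parts[n]
      if (index(last, ":") == 0 || last ~ /:latest$/) print $0
    }
  '
}

# Usage against the running cluster or a Dockerfile (simple FROM lines only):
#   kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u | unpinned_images
#   awk 'toupper($1) == "FROM" { print $2 }' Dockerfile | unpinned_images
```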
Cross-References
- Topic Pack: training/library/topic-packs/security-fundamentals/ (deep background on vulnerability management)
- Related Runbook: credential-rotation.md (if the CVE led to a credential exposure)
- Related Runbook: unauthorized-access.md (if the CVE was actively exploited)
- Related Runbook: ../cicd/build-failure-triage.md (if the patched image build fails)