
Runbook: CVE Response (Critical Vulnerability)

Field                  Value
Domain                 Security
Alert                  CVE scanner reports critical/high vulnerability in a production image, or security advisory received
Severity               P1 (critical CVE actively exploited), P2 (critical CVE not yet exploited), P3 (high CVE)
Est. Resolution Time   1-4 hours
Escalation Timeout     30 minutes; page if not resolved (P1 only)
Last Tested            2026-03-19
Prerequisites          Container image access, Dockerfile knowledge, ability to build/push images, kubectl access

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
trivy image <IMAGE_NAME>:<TAG> --severity CRITICAL,HIGH --no-progress
If output shows a CVE with a Fixed Version listed → a patched version exists; proceed to Step 3 to get the fix.
If output shows a CVE with no Fixed Version available → no upstream fix yet; proceed to Step 1 to assess exploitability and consider mitigating controls.
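
This decision can also be scripted. A minimal sketch, assuming trivy and jq are installed; the image reference is a placeholder:

```shell
# Count vulnerabilities that already have a patched version available.
scan=$(trivy image <IMAGE_NAME>:<TAG> --severity CRITICAL,HIGH --format json --no-progress)
fixable=$(echo "$scan" | jq '[.Results[]?.Vulnerabilities[]? | select(.FixedVersion != null and .FixedVersion != "")] | length')
if [ "$fixable" -gt 0 ]; then
  echo "$fixable CVE(s) have a fix available -> go to Step 3"
else
  echo "No fixes available yet -> go to Step 1 (triage and mitigations)"
fi
```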

Step 1: Assess the CVE — Triage Before Panicking

Why: Not all critical CVEs are equally dangerous. A critical CVE in a library that your code never calls may have zero actual risk. Triage determines whether you need to act in hours or days.

# Get detailed CVE information:
trivy image <IMAGE_NAME>:<TAG> --severity CRITICAL,HIGH --format json --no-progress | \
  jq '.Results[] | .Vulnerabilities[] | {VulnerabilityID, Severity, PkgName, InstalledVersion, FixedVersion, Description}'

# Key questions to answer:
# 1. What package is affected? Is it in our application's code path?
# 2. What is the attack vector? (Network/Adjacent/Local/Physical) — Network = highest risk
# 3. What is the CVSS score? Is it actually in the critical range (9.0-10.0)?
# 4. Is this CVE known to be actively exploited? Check: https://www.cisa.gov/known-exploited-vulnerabilities-catalog

# Check the CVE details online:
# NVD: https://nvd.nist.gov/vuln/detail/<CVE_ID>
# Example: https://nvd.nist.gov/vuln/detail/CVE-2021-44228
echo "Review CVE details at https://nvd.nist.gov/vuln/detail/<CVE_ID>"
Expected output:
Trivy JSON output listing the package, version, CVE ID, and whether a fix is available.
Example:
  {
    "VulnerabilityID": "CVE-2024-XXXXX",
    "Severity": "CRITICAL",
    "PkgName": "openssl",
    "InstalledVersion": "3.0.2",
    "FixedVersion": "3.0.9",
    "Description": "..."
  }
If this fails: If trivy is not installed, use grype <IMAGE_NAME>:<TAG> as an alternative. If neither is available, check the registry UI (Docker Hub, ECR, GHCR) — most provide built-in vulnerability scanning.
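
The KEV check from question 4 can be done from the shell as well. A sketch, assuming curl and jq are available; the feed URL is CISA's published JSON feed and <CVE_ID> is a placeholder, so verify the URL before wiring this into automation:

```shell
# Check whether the CVE appears in CISA's Known Exploited Vulnerabilities catalog.
KEV_URL="https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
if curl -sf "$KEV_URL" | jq -e --arg cve "<CVE_ID>" \
     '.vulnerabilities[] | select(.cveID == $cve)' > /dev/null; then
  echo "<CVE_ID> is in the KEV catalog: treat as P1 (actively exploited)"
else
  echo "<CVE_ID> not in KEV catalog: continue triage"
fi
```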

Step 2: Identify All Affected Images in Production

Why: A vulnerability rarely affects only one image. If you have multiple services built from the same base image, they are all affected and must all be patched.

# List all images currently running in the cluster
# (tr splits space-separated images from multi-container pods onto their own lines):
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | \
  tr ' ' '\n' | sort -u

# Filter to images using the same base or package:
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | \
  tr ' ' '\n' | sort -u | grep "<BASE_IMAGE_NAME>"

# If you have an image inventory or SBOM, use it to find all images with the vulnerable package:
# Trivy can scan a whole registry or list of images — example with a list file:
cat images.txt | xargs -I{} trivy image {} --severity CRITICAL --no-progress --quiet
Expected output:
A list of unique image references (registry/image:tag) currently running.
Identify which ones share the vulnerable base image or package.
Example:
  myregistry.com/frontend:v1.2.3   ← uses node:18-alpine base (affected)
  myregistry.com/api:v2.0.1        ← uses python:3.11-slim base (not affected)
  myregistry.com/worker:v1.5.0     ← uses node:18-alpine base (affected)
If this fails: If you cannot list cluster images, check deployment manifests in git: grep -r "image:" k8s/ | grep -v "#" or grep -r "image:" helm/
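
To go from the raw image list to an affected/not-affected verdict, the listing and scanning commands can be combined. A sketch, assuming trivy can pull every image; this is slow on large clusters:

```shell
# Scan every unique running image for the specific CVE.
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' \
  | tr ' ' '\n' | sort -u | while read -r img; do
    if trivy image "$img" --severity CRITICAL,HIGH --no-progress --quiet 2>/dev/null \
         | grep -q "<CVE_ID>"; then
      echo "AFFECTED: $img"
    fi
done
```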

Step 3: Check for a Patched Base Image or Package Version

Why: Most CVEs in container images are fixed by updating the base image or a system package — you need to know if a fix is available before writing code.

# Pull the latest version of your base image and check if the CVE is fixed:
docker pull <BASE_IMAGE>:<TAG>
trivy image <BASE_IMAGE>:<TAG> --severity CRITICAL --no-progress | grep <CVE_ID>

# If using a pinned version, check the release notes or Docker Hub tags for a patched version:
# Example for Alpine:
docker pull alpine:3.19
trivy image alpine:3.19 --severity CRITICAL --no-progress | grep <CVE_ID>

# If the CVE is in a package (not the OS), check the package manager for a fix:
# Debian/Ubuntu (refresh the package index first, or the candidate version will be stale):
docker run --rm <BASE_IMAGE>:<TAG> sh -c 'apt-get update -qq && apt-cache policy <PACKAGE_NAME>'
# Alpine:
docker run --rm <BASE_IMAGE>:<TAG> sh -c 'apk update -q && apk info <PACKAGE_NAME>'
# Python:
pip index versions <PACKAGE_NAME>
Expected output:
If patched: CVE_ID does NOT appear in trivy output for the new base image/tag.
If not patched: CVE_ID still appears — you need a workaround or must wait for an upstream fix.
If this fails: If the CVE is not fixed in any available version, consider runtime mitigations (network policy to block external access to the vulnerable service, WAF rule if it's a web vulnerability) and document the accepted risk with a target fix date.
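
If you land in the no-upstream-fix branch, one possible compensating control is a NetworkPolicy that limits who can reach the vulnerable service. This is only a sketch: the policy name, namespace, and label are placeholders, and it assumes your CNI plugin actually enforces NetworkPolicy objects.

```shell
# Restrict ingress to the vulnerable service to same-namespace pods only.
kubectl apply -n <NAMESPACE> -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-vulnerable-service
spec:
  podSelector:
    matchLabels:
      app: <APP_LABEL>
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector: {}   # allow traffic from pods in the same namespace only
EOF
```

Remember to remove the policy (or loosen it deliberately) once the patched image is deployed.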

Step 4: Update the Dockerfile and Rebuild

Why: The fix must be baked into the image — it cannot be applied to running containers. Update the Dockerfile and rebuild.

# Option A — update the base image tag in the Dockerfile:
# Before: FROM node:18.12-alpine
# After:  FROM node:18.20-alpine  ← patched version
# Edit the Dockerfile:
sed -i 's|FROM <BASE_IMAGE>:<OLD_TAG>|FROM <BASE_IMAGE>:<PATCHED_TAG>|g' Dockerfile

# Option B — add an apt-get/apk upgrade step to patch a specific package:
# Add this line AFTER the FROM line in your Dockerfile:
# RUN apt-get update && apt-get install -y --only-upgrade <PACKAGE_NAME> && rm -rf /var/lib/apt/lists/*
# For Alpine:
# RUN apk add --no-cache --upgrade <PACKAGE_NAME>

# Rebuild the image:
docker build -t <IMAGE_NAME>:<NEW_TAG> .
Expected output:
Docker build succeeds. With the legacy builder:
  "Successfully built <IMAGE_ID>"
  "Successfully tagged <IMAGE_NAME>:<NEW_TAG>"
With BuildKit (the default in current Docker versions), the final lines instead include:
  "=> => naming to <IMAGE_NAME>:<NEW_TAG>"
If this fails: If the build fails after updating the base image, there may be a compatibility issue between the new base image version and your application dependencies. Check the base image's release notes for breaking changes.
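
Before rebuilding, it is worth confirming the Dockerfile edit actually took, and forcing Docker to re-pull the tag rather than reuse a stale local copy. A sketch; the tags are placeholders:

```shell
# Verify the FROM line now pins the patched tag:
grep -n '^FROM' Dockerfile          # should show <BASE_IMAGE>:<PATCHED_TAG>

# Rebuild with --pull so the latest content for that tag is fetched:
docker build --pull -t <IMAGE_NAME>:<NEW_TAG> .
```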

Step 5: Scan the Rebuilt Image — Confirm the CVE Is Fixed

Why: A rebuild does not guarantee the CVE is fixed. You must scan again to confirm the vulnerability is gone before deploying.

# Scan the newly built image:
trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL,HIGH --no-progress

# Confirm the specific CVE is no longer present:
trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL,HIGH --no-progress | grep <CVE_ID>
# If this returns nothing, the CVE is fixed.

# Alternative scanner:
grype <IMAGE_NAME>:<NEW_TAG> --only-fixed
Expected output:
trivy scan: no output for the specific CVE_ID — it has been fixed.
Overall: ideally zero CRITICAL findings. If other CVEs remain, triage each one.
If this fails: If the CVE persists after rebuild, the package was not actually updated. Try adding an explicit RUN apt-get install -y <PACKAGE>=<FIXED_VERSION> (or Alpine equivalent) to force the patched version.

Step 6: Push the Patched Image and Deploy

Why: The fix is only in production when the patched image is deployed. Deploy immediately for P1/P2 — do not wait for the next scheduled release.

# Push the patched image to the registry:
docker push <IMAGE_NAME>:<NEW_TAG>

# Update the Kubernetes deployment to use the new image:
kubectl set image deployment/<DEPLOY_NAME> \
  <CONTAINER_NAME>=<IMAGE_NAME>:<NEW_TAG> \
  -n <NAMESPACE>

# Monitor the rollout:
kubectl rollout status deployment/<DEPLOY_NAME> -n <NAMESPACE>

# Verify pods are running the new image:
kubectl get pods -n <NAMESPACE> -l app=<APP_LABEL> \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
Expected output:
"deployment.apps/<DEPLOY_NAME> image updated"
deployment "<DEPLOY_NAME>" successfully rolled out
Pod image column shows the new tag (not the old one).
If this fails: If the deployment fails (pods crash), the patched base image may have introduced a breaking change. Roll back with kubectl rollout undo deployment/<DEPLOY_NAME> -n <NAMESPACE> and investigate the crash logs.
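
If Step 2 found several affected services, the rollout can be looped rather than run by hand. A sketch, assuming a hypothetical deployments.txt inventory you build from Step 2's output, with one "namespace deployment container" triple per line:

```shell
# Roll the patched image out to every affected deployment, waiting for each.
while read -r ns deploy container; do
  kubectl set image "deployment/$deploy" "$container=<IMAGE_NAME>:<NEW_TAG>" -n "$ns"
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=300s
done < deployments.txt
```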

Verification

# Confirm the issue is resolved
kubectl get pods -n <NAMESPACE> -l app=<APP_LABEL>
trivy image <IMAGE_NAME>:<NEW_TAG> --severity CRITICAL --no-progress | grep <CVE_ID>
Success looks like: all pods running the new image tag; no output from the trivy grep for the CVE ID; no new application errors in logs.
If still broken: escalate (see below).

Escalation

Condition Who to Page What to Say
Not resolved in 30 min (P1) Security on-call + Platform on-call "P1 CVE: <CVE_ID> in production service <SERVICE_NAME>, no patch deployed yet, need immediate help"
CVE is being actively exploited against us Security on-call "Security incident: active exploitation of <CVE_ID> detected in <SERVICE_NAME>, initiating incident response"
No fix available from upstream Security on-call "CVE <CVE_ID> has no upstream fix available — need security review for compensating controls"
Scope expanding (many services affected) Security on-call + Platform on-call "CVE <CVE_ID> affects <N> services — coordinated patching needed across <TEAM_LIST>"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete
  • Pin base image versions (do not use :latest) so rebuilds are reproducible
  • Add CVE scanning to the CI pipeline so vulnerabilities are caught before they reach production
  • Schedule a follow-up scan in 30 days to check for newly discovered CVEs in the same image
  • Update the team's SLA document for CVE response times (P1: 24h, P2: 7 days, P3: 30 days)
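
For the base-image pinning item, pinning by digest is stricter than pinning by tag, since a tag can be re-pushed to point at different content. A sketch; the node tag is illustrative:

```shell
# Resolve the tag to an immutable digest and pin the Dockerfile to it.
docker pull node:18.20-alpine
digest=$(docker inspect --format '{{index .RepoDigests 0}}' node:18.20-alpine)
echo "Pin in Dockerfile as: FROM $digest"
# e.g. FROM node@sha256:...
```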

Common Mistakes

  1. Updating the base image without re-pinning the version: Changing FROM node:18-alpine to FROM node:latest fixes the CVE now but introduces drift — the next build may pull a different version. Always pin to an explicit version tag.
  2. Not scanning after the fix: Rebuilding does not guarantee the CVE is fixed. Always run trivy again on the new image before deploying.
  3. Deploying without testing: A patched base image can introduce breaking changes (new library versions, different binary locations). Run your test suite against the new image before deploying to production.
  4. Treating all CVEs as equal severity: A CRITICAL CVE with no network attack vector and no public exploit is very different from a CRITICAL CVE being actively exploited in the wild. Triage before acting.
  5. Patching only the alerting service: If multiple services share the same base image, they must all be patched. Check all running images, not just the one that triggered the alert.
