
Vault: Secrets That Expire on Purpose

  • lesson
  • hashicorp-vault
  • secrets-management
  • dynamic-credentials
  • kubernetes
  • pki
  • encryption
  • access-control

Topics: HashiCorp Vault, secrets management, dynamic credentials, Kubernetes, PKI, encryption, access control
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (everything is explained from scratch)


The Mission

It's Monday morning. A security scanner just flagged a PostgreSQL password in a public GitHub repository. It's been there for eleven days. The password is prod-db-2024! and it has full read-write access to the production database. Eleven days of exposure. Every automated credential scraper on the internet has it by now.

Your job: rotate that credential right now, then set things up so this can never happen again. Not "probably won't happen" — structurally impossible. By the end of this lesson, your database credentials won't exist long enough to leak. They'll be generated on demand, scoped to one consumer, and dead within an hour.

We're going to build up from the disaster to the fix, touching secrets management, Vault architecture, dynamic credentials, Kubernetes integration, and production operations along the way.


Part 1: Stop the Bleeding

Before we learn anything, we fix the immediate problem. The credential is public.

# Step 1: Connect to PostgreSQL and change the password immediately
psql -h db.example.com -U admin -d myapp -c \
  "ALTER USER app_user PASSWORD 'emergency-rotated-$(date +%s)';"

# Step 2: Restart the application so it picks up the new password
kubectl rollout restart deployment/myapp -n production

# Step 3: Verify the old password no longer works
PGPASSWORD='prod-db-2024!' psql -h db.example.com -U app_user -d myapp -c "SELECT 1;"
# Expected: FATAL: password authentication failed

Good. The fire is out. But you've just replaced one static password with another static password. If someone commits this one to Git in six months, you're back here. The real fix isn't a better password — it's eliminating long-lived passwords entirely.

Mental Model: Static secrets are like house keys cut from metal — they work until someone physically takes them away. Dynamic secrets are like hotel key cards — they stop working at checkout time whether you return them or not. Vault turns your infrastructure into a hotel.


Part 2: What Is Vault and Why Does It Exist

HashiCorp Vault is a secrets management tool that stores, generates, and controls access to secrets. But calling it a "password manager for servers" undersells it. In its most powerful mode, Vault doesn't store secrets at all — it generates them on demand and destroys them automatically.

Trivia: Vault was released in April 2015 by Mitchell Hashimoto and Armon Dadgar. Before Vault, the industry standard for secrets management was — no joke — encrypted Excel spreadsheets, password-protected Word documents, and sticky notes on monitors. Vault introduced dynamic secrets, which was a genuine paradigm shift: credentials that exist only for as long as they're needed, then evaporate.

The four pillars

| Concept | What it does | Example |
| --- | --- | --- |
| Secret engine | Stores or generates secrets | KV (static), database (dynamic), PKI (certificates), transit (encryption) |
| Auth method | Verifies who's asking | Kubernetes ServiceAccount, AppRole, OIDC, username/password |
| Policy | Controls what they can access | "This token can read secret/data/myapp/* and nothing else" |
| Lease | Controls how long it lasts | "These database credentials expire in 1 hour" |

These four concepts interact on every single request. A pod authenticates (auth method), receives a token scoped to a policy, uses that token to read from a secret engine, and the result has a lease that determines when it expires. Miss any one of these and you'll hit a wall.
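To make that interaction concrete, here's a toy sketch of one request in Python. Every name here (POLICIES, authenticate, read_secret) is illustrative shorthand, not the real Vault API; the real enforcement happens server-side inside Vault.

```python
# A toy of the four pillars on one request (illustration only).
import time

POLICIES = {"myapp-secrets": {"secret/data/myapp/*": {"read", "list"}}}

def authenticate(service_account):
    """Auth method: a verified identity yields a policy-scoped token."""
    if service_account != "myapp-sa":          # pretend K8s validated the JWT
        raise PermissionError("unknown identity")
    return {"policy": "myapp-secrets", "expires": time.time() + 3600}

def read_secret(token, path):
    """Policy gates the path; the secret engine answers with a leased value."""
    rules = POLICIES[token["policy"]]
    if not any(path.startswith(p.rstrip("*")) and "read" in caps
               for p, caps in rules.items()):
        raise PermissionError(f"permission denied: {path}")
    return {"value": "s3cr3t", "lease_duration": 3600}     # the lease pillar

token = authenticate("myapp-sa")                           # auth method
secret = read_secret(token, "secret/data/myapp/database")  # engine + policy + lease
assert secret["lease_duration"] == 3600
```

Notice that removing any one piece (identity, policy, engine path, lease) breaks the whole flow, which is exactly the "miss any one and you hit a wall" point.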


Part 3: The Seal/Unseal Ceremony

Before Vault can serve a single secret, it needs to be unsealed. This is the part that surprises people coming from simpler tools.

Vault encrypts everything it stores. On startup, it has the encrypted data but not the key to decrypt it. This is the sealed state. You need to provide the master key to unlock it.

But here's the twist: the master key doesn't exist as a single thing. It's been split into pieces using Shamir's Secret Sharing.

Under the Hood: Shamir's Secret Sharing was invented in 1979 by Adi Shamir — the same Shamir who is the "S" in RSA encryption. The math is polynomial interpolation: any k points on a polynomial of degree k-1 can reconstruct the polynomial, but k-1 points reveal absolutely nothing about it. Vault's default is 5 shares with a threshold of 3 — meaning any 3 of the 5 key holders can unseal Vault, but 2 key holders together learn nothing about the master key.
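You can watch the math work in a few lines of Python. This is a toy over one big prime field (Vault's actual implementation splits the key byte-by-byte over GF(2^8)), but the threshold property demonstrated is the same: any 3 shares reconstruct the secret.

```python
# Toy Shamir split/reconstruct over a prime field (illustration only;
# not Vault's real byte-wise GF(2^8) implementation).
import random

P = 2**127 - 1   # a prime comfortably larger than our toy secret

def split(secret, shares=5, threshold=3):
    # Random polynomial of degree threshold-1 with constant term = secret
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, shares + 1)]

def reconstruct(points):
    # Lagrange interpolation at x=0 recovers the constant term (the secret)
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(1337)
assert reconstruct(shares[:3]) == 1337   # any 3 of 5 reconstruct
assert reconstruct(shares[2:]) == 1337   # a different 3 also work
```

With only 2 points you can fit infinitely many degree-2 polynomials through them, so 2 key holders genuinely learn nothing about the constant term.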

Trivia: The unseal ceremony is modeled on nuclear launch key concepts and bank vault procedures — multiple people must act together, preventing any single person from accessing the secrets alone. This is called "split knowledge" in the security world.

# First-time initialization: creates the master key and splits it
vault operator init -key-shares=5 -key-threshold=3
# Output:
# Unseal Key 1: s.Ah4f8Gj2kL...
# Unseal Key 2: s.Bm7n3Pq9xR...
# Unseal Key 3: s.Cx1w5Yz0tU...
# Unseal Key 4: s.Dv6e2Mn8jK...
# Unseal Key 5: s.Ew9i4Qr7sF...
# Initial Root Token: hvs.abc123def456
#
# ⚠ Store each key with a DIFFERENT person. Losing enough keys = losing Vault forever.

# Unsealing: three different people each provide their key
vault operator unseal s.Ah4f8Gj2kL...   # Person 1 — "Unseal Progress 1/3"
vault operator unseal s.Bm7n3Pq9xR...   # Person 2 — "Unseal Progress 2/3"
vault operator unseal s.Cx1w5Yz0tU...   # Person 3 — "Sealed: false" 🎉

# Check status
vault status
# Sealed          false
# HA Enabled      true
# Version         1.15.4

Auto-unseal: because 3am pages are not a ceremony

Manual unsealing is beautifully secure and operationally terrible. If Vault restarts at 3am — after a kernel update, a node eviction, a power blip — it comes up sealed. No secrets are served. Every application that depends on Vault starts failing. Someone has to wake up, coordinate with two other key holders, and unseal.

Auto-unseal delegates the master key to a cloud KMS:

# vault.hcl — auto-unseal with AWS KMS
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal-key"
}

Now Vault unseals itself on restart using the KMS key. The security model shifts: instead of protecting 5 key shares, you protect one KMS key's IAM permissions.

Gotcha: A team configured auto-unseal with AWS KMS, then rebuilt their Vault cluster in a new AWS account without migrating the KMS key. The old KMS key was deleted. The Vault data was encrypted with a key that no longer existed. All secrets were permanently lost. Always back up the KMS key ARN, ensure cross-account access, and test recovery before you need it.

Remember: Mnemonic for Vault health endpoint status codes: "200 happy, 429 waiting, 501 naked, 503 locked." 200 = active and ready. 429 = standby node in HA (waiting its turn). 501 = not initialized (no keys yet — naked). 503 = sealed (locked up tight).
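If you're wiring that mnemonic into a monitoring probe, a tiny lookup helper captures it (the function name is ours, not a Vault API; the codes are Vault's documented /v1/sys/health responses):

```python
# Translate /v1/sys/health HTTP status codes into cluster state.
HEALTH = {
    200: "active (initialized, unsealed, ready)",
    429: "standby (healthy HA node, waiting its turn)",
    472: "disaster recovery mode",
    501: "not initialized (no keys yet)",
    503: "sealed (locked up tight)",
}

def interpret_health(status_code: int) -> str:
    return HEALTH.get(status_code, f"unexpected status {status_code}")
```

Feed it the status code returned by a GET against /v1/sys/health and alert on anything other than 200 or 429.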


Flashcard Check #1

| Question | Answer (cover this column) |
| --- | --- |
| What does "sealed" mean in Vault? | Vault has encrypted data but cannot decrypt it — the master key is not in memory |
| How many key shares are needed to unseal with default settings? | 3 of 5 (Shamir's Secret Sharing) |
| What algorithm splits the master key? | Shamir's Secret Sharing (1979) — polynomial interpolation |
| What does auto-unseal replace? | Manual unseal ceremony — delegates to a cloud KMS |
| What happens if Vault restarts without auto-unseal? | It starts sealed. No secrets can be read until humans unseal it |

Part 4: Secret Engines — The Vaults Inside the Vault

Vault is not one big bucket. It's a collection of secret engines, each mounted at a path, each with different superpowers.

KV v2: the simple one

Key-Value version 2 stores static secrets with versioning. Think of it as an encrypted, access-controlled, audited configuration store.

# Enable KV v2 at the "secret/" path
vault secrets enable -path=secret kv-v2

# Store a secret
vault kv put secret/myapp/database \
  username="dbadmin" \
  password="s3cur3_p@ss" \
  host="db.example.com"

# Read it back
vault kv get secret/myapp/database
# Key        Value
# ---        -----
# host       db.example.com
# password   s3cur3_p@ss
# username   dbadmin

# Read just one field
vault kv get -field=password secret/myapp/database
# s3cur3_p@ss

# Oops, bad password. Update it:
vault kv put secret/myapp/database \
  username="dbadmin" \
  password="n3w_p@ss_2026" \
  host="db.example.com"

# But the old version is still there (v2 keeps history):
vault kv get -version=1 secret/myapp/database
# password = s3cur3_p@ss  ← version 1

# Soft-delete (recoverable):
vault kv delete secret/myapp/database
# Permanent destroy:
vault kv destroy -versions=1 secret/myapp/database

Gotcha: KV v2 adds /data/ and /metadata/ to internal paths. The CLI hides this — vault kv get secret/myapp works fine. But policies must use the internal path: path "secret/data/myapp/*", not path "secret/myapp/*". This mismatch is the #1 cause of "permission denied" errors when people first set up Vault policies. Use vault kv get -output-policy secret/myapp/config to see the exact policy path you need.
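The translation the CLI hides can be made explicit with a tiny helper (the function name is ours, not a Vault API; it assumes a KV v2 engine mounted at `mount`):

```python
# Turn the path you type (vault kv get secret/myapp/database) into the
# path a policy must grant (secret/data/myapp/database).
def kv2_policy_path(cli_path: str, mount: str = "secret") -> str:
    prefix = mount + "/"
    if not cli_path.startswith(prefix):
        raise ValueError(f"{cli_path!r} is not under the {mount}/ mount")
    return f"{mount}/data/{cli_path[len(prefix):]}"

assert kv2_policy_path("secret/myapp/database") == "secret/data/myapp/database"
```

The same rule applies to list operations, which go through the metadata/ prefix instead of data/.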

Database engine: credentials that self-destruct

This is where Vault gets interesting. Instead of storing a database password, Vault creates a temporary database user on the fly and destroys it when the lease expires.

# Enable the database secrets engine
vault secrets enable database

# Tell Vault how to connect to PostgreSQL
vault write database/config/mydb \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/mydb" \
  allowed_roles="readonly,readwrite" \
  username="vault_admin" \
  password="vault_admin_pass"

# Define a role: what kind of user to create
vault write database/roles/readonly \
  db_name=mydb \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

Now watch this:

# Generate temporary credentials
vault read database/creds/readonly
# Key                Value
# ---                -----
# lease_id           database/creds/readonly/abc123
# lease_duration     1h
# username           v-token-readonly-8hF3kL
# password           A1b-2Cd-3Ef-4Gh

# That username and password work RIGHT NOW:
psql -h db.example.com -U v-token-readonly-8hF3kL -d mydb
# Connected!

# In 1 hour, Vault automatically runs:
# REVOKE ALL ON ALL TABLES FROM "v-token-readonly-8hF3kL";
# DROP ROLE "v-token-readonly-8hF3kL";

Every request gets a unique username and password. If one leaks, it affects one consumer for at most one hour. Compare that to prod-db-2024! sitting in Git for eleven days.

Mental Model: Static secrets are like giving everyone a copy of the office master key. Dynamic secrets are like a receptionist who creates a badge for each visitor, sets it to expire at 5pm, and shreds it if they don't return it. The receptionist is Vault. The badge is the lease.

Other engines worth knowing

| Engine | What it does | When to use it |
| --- | --- | --- |
| PKI | Issues TLS certificates from an internal CA | Service-to-service mTLS, short-lived certs |
| AWS | Generates temporary IAM credentials | Applications that need AWS access |
| Transit | Encrypts/decrypts data without exposing keys | Application-level encryption (Vault never stores your data) |
| SSH | Signs SSH certificates or generates OTPs | SSH access without distributing private keys |

# Transit: encryption without seeing the key
vault secrets enable transit
vault write -f transit/keys/myapp-key

vault write transit/encrypt/myapp-key \
  plaintext="$(echo -n "credit-card-1234" | base64)"   # -n: don't base64 a trailing newline
# ciphertext: vault:v1:8SDd3WHDOjf7...

vault write transit/decrypt/myapp-key \
  ciphertext="vault:v1:8SDd3WHDOjf7..."
# plaintext: Y3JlZGl0LWNhcmQtMTIzNA==   (base64 of "credit-card-1234")

Under the Hood: Transit key rotation creates a new key version but keeps old versions for decryption. You can set min_decryption_version to a higher number, which effectively performs crypto-shredding — making old ciphertext permanently unreadable without deleting the ciphertext itself. Useful for GDPR "right to be forgotten" compliance.
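A toy model makes the version-gating behavior concrete. Nothing below is cryptographic or part of Vault's API (real transit uses AES-GCM server-side); it only models how min_decryption_version shreds old ciphertext:

```python
# Toy model of transit key versioning and crypto-shredding.
class TransitKey:
    def __init__(self):
        self.latest = 1
        self.min_decryption_version = 1

    def rotate(self):
        self.latest += 1                       # new key version, old ones kept

    def encrypt(self, plaintext: str) -> str:
        # ciphertext carries its key version, like "vault:v2:..."
        return f"vault:v{self.latest}:{plaintext[::-1]}"   # stand-in "cipher"

    def decrypt(self, ciphertext: str) -> str:
        _, ver, body = ciphertext.split(":", 2)
        if int(ver[1:]) < self.min_decryption_version:
            raise ValueError("ciphertext version below min_decryption_version")
        return body[::-1]

key = TransitKey()
old = key.encrypt("pii-record")
key.rotate()
assert key.decrypt(old) == "pii-record"   # old versions still decrypt...
key.min_decryption_version = 2            # ...until the floor is raised
```

After the last line, the v1 ciphertext is effectively shredded even though the bytes still exist.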


Part 5: Auth Methods — Proving You Are Who You Claim

Vault doesn't hand secrets to anonymous callers. Every request requires a token, and tokens come from authentication.

Token auth (the foundation)

Every auth method eventually produces a token. Tokens are the internal currency.

# Create a token with a specific policy and TTL
vault token create -policy=myapp-secrets -ttl=1h
# Key                Value
# ---                -----
# token              hvs.CAESIJ...
# token_policies     [default myapp-secrets]
# token_ttl          1h

# Use it
export VAULT_TOKEN="hvs.CAESIJ..."
vault kv get secret/myapp/database

AppRole (for machines and CI/CD)

AppRole splits authentication into two pieces: a role_id (like a username — long-lived, baked into config) and a secret_id (like a password — short-lived, delivered at runtime).

vault auth enable approle

# Create a role for CI pipelines
vault write auth/approle/role/ci-pipeline \
  token_policies="ci-deploy" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=10m

# Get the role ID (bake this into the CI image)
vault read auth/approle/role/ci-pipeline/role-id
# role_id: 7a6b8c9d-e0f1-2345-6789-abcdef012345

# Generate a secret ID (inject this at runtime via trusted orchestrator)
vault write -f auth/approle/role/ci-pipeline/secret-id
# secret_id: a1b2c3d4-5678-90ab-cdef-ghijklmnopqr (expires in 10 minutes)

# Login
vault write auth/approle/login \
  role_id="7a6b8c9d-e0f1-2345-6789-abcdef012345" \
  secret_id="a1b2c3d4-5678-90ab-cdef-ghijklmnopqr"
# Returns: a Vault token with the ci-deploy policy

Under the Hood: The split between role_id and secret_id is deliberate. The role ID is safe to bake into a container image — it identifies which role, but can't authenticate alone. The secret ID is the short-lived credential injected at runtime by a trusted orchestrator (Terraform, Kubernetes, your CI platform). Neither piece is useful without the other.

Kubernetes auth (the most common in practice)

A pod authenticates to Vault using its ServiceAccount JWT token. Vault validates the token against the Kubernetes API server.

vault auth enable kubernetes

vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc"

vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp-sa \
  bound_service_account_namespaces=production \
  policies=myapp-secrets \
  ttl=1h

Now any pod running as ServiceAccount myapp-sa in the production namespace can authenticate to Vault and get a token with the myapp-secrets policy. No passwords anywhere.

Gotcha: Never use bound_service_account_namespaces=["*"] in production. This lets any namespace with a matching ServiceAccount name authenticate as this role — including test namespaces, CI namespaces, and anything a developer creates. Always specify exact namespaces.

OIDC (for humans)

vault auth enable oidc

vault write auth/oidc/config \
  oidc_discovery_url="https://accounts.google.com" \
  oidc_client_id="vault-app" \
  oidc_client_secret="client-secret-here" \
  default_role="engineer"

# Login opens a browser
vault login -method=oidc role=engineer

Part 6: Policies — The Principle of Least Privilege in HCL

Policies control what a token can do. They're written in HCL, path-based, and default to deny everything.

# myapp-policy.hcl
# Read application secrets
path "secret/data/myapp/*" {
  capabilities = ["read", "list"]
}

# Generate dynamic database credentials
path "database/creds/readonly" {
  capabilities = ["read"]
}

# Allow the token to manage itself
path "auth/token/renew-self" {
  capabilities = ["update"]
}
path "auth/token/lookup-self" {
  capabilities = ["read"]
}

# Explicitly deny access to other apps
path "secret/data/billing/*" {
  capabilities = ["deny"]
}

The capabilities are: create, read, update, delete, list, sudo, deny. Deny always wins.
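Deny-wins is easy to model. Here's a toy evaluator (illustrative only; real Vault layers longest-prefix matching and `+` segment wildcards on top of this):

```python
# Toy policy evaluator: default deny, explicit deny overrides any grant.
def allowed(policy: dict, path: str, capability: str) -> bool:
    verdict = False                          # default: deny everything
    for pattern, caps in policy.items():
        prefix = pattern.rstrip("*")
        if path == pattern or (pattern.endswith("*") and path.startswith(prefix)):
            if "deny" in caps:
                return False                 # deny always wins
            if capability in caps:
                verdict = True
    return verdict

myapp = {
    "secret/data/myapp/*":     {"read", "list"},
    "database/creds/readonly": {"read"},
    "secret/data/billing/*":   {"deny"},
}
assert allowed(myapp, "secret/data/myapp/database", "read")
assert not allowed(myapp, "secret/data/billing/stripe-key", "read")
assert not allowed(myapp, "aws/creds/admin", "read")   # unmatched: deny
```

The unmatched-path case is the important one: you never have to write a deny rule for paths you simply didn't mention.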

# Apply the policy
vault policy write myapp-secrets myapp-policy.hcl

# Test what a token can actually do
vault token capabilities hvs.CAESIJ... secret/data/myapp/database
# read

vault token capabilities hvs.CAESIJ... secret/data/billing/stripe-key
# deny

Gotcha: The root policy bypasses everything and never expires. After initial setup, revoke the root token immediately: vault token revoke <root-token>. If you need root access later, regenerate a temporary one with vault operator generate-root using the unseal keys, use it, then revoke it again. Services should never authenticate with root.


Flashcard Check #2

| Question | Answer (cover this column) |
| --- | --- |
| What are the four Vault pillars? | Secret engine, auth method, policy, lease |
| What's the difference between static and dynamic secrets? | Static are stored and retrieved; dynamic are generated on demand with a TTL and auto-revoked |
| Why does KV v2 cause "permission denied" surprises? | Policies need secret/data/... path, but CLI uses secret/... — the /data/ is hidden |
| What does AppRole's secret_id do vs role_id? | role_id = identity (long-lived), secret_id = credential (short-lived, runtime-injected) |
| What's the default policy behavior? | Deny everything — you must explicitly grant access |
| Why is root token dangerous in production? | Bypasses all policies, never expires, full access to everything |

Part 7: Dynamic Database Secrets — The Full Walkthrough

Back to our mission. We rotated the leaked password manually. Now we're going to make passwords unnecessary. Here's the complete setup, step by step.

Step 1: Vault needs a privileged database account

Vault itself needs credentials that can create and destroy database users. This is the only long-lived credential — and it stays inside Vault, never in Git.

# Create a Vault-managed admin user in PostgreSQL
psql -h db.example.com -U postgres -c \
  "CREATE ROLE vault_admin WITH LOGIN PASSWORD 'vault-managed-2026' CREATEROLE;
   GRANT ALL ON SCHEMA public TO vault_admin;"

Step 2: Configure the database engine

vault secrets enable database

vault write database/config/production-db \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/myapp" \
  allowed_roles="app-readonly,app-readwrite" \
  username="vault_admin" \
  password="vault-managed-2026"

# Rotate the root password so even you don't know it anymore
vault write -f database/rotate-root/production-db
# Now Vault has changed vault_admin's password to something only Vault knows

That last command is important. After rotate-root, the password you typed on the command line is no longer valid. Vault generated a new one and stored it internally. No human knows the database admin password.

Step 3: Create roles

# Read-only role: SELECT only, 1-hour TTL
vault write database/roles/app-readonly \
  db_name=production-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

# Read-write role: full DML, 30-minute TTL (shorter = less risk)
vault write database/roles/app-readwrite \
  db_name=production-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="30m" \
  max_ttl="4h"

Step 4: Use it

# Application requests credentials
vault read database/creds/app-readonly
# Key                Value
# ---                -----
# lease_id           database/creds/app-readonly/7yKz...
# lease_duration     1h
# username           v-approle-app-read-Xk9mN2
# password           B4f-7Gh-1Jk-3Lm

# That credential works immediately
psql -h db.example.com -U v-approle-app-read-Xk9mN2 -d myapp -c "SELECT count(*) FROM users;"
#  count
# -------
#  42857

# After 1 hour: Vault drops the role. The credential is dead.

Step 5: Lease management

# Renew a lease before it expires (extend the TTL)
vault lease renew database/creds/app-readonly/7yKz...
# lease_duration: 1h (reset from now)

# Renew with a specific increment
vault lease renew -increment=30m database/creds/app-readonly/7yKz...

# Revoke immediately (emergency rotation)
vault lease revoke database/creds/app-readonly/7yKz...

# Nuclear option: revoke ALL credentials for this role
vault lease revoke -prefix database/creds/app-readonly/

Gotcha: If your application requests credentials at startup and never renews the lease, the credentials die silently after the TTL expires. Long-running batch jobs are especially vulnerable — they start at noon with a 1-hour lease and fail at 1:01pm with a cryptic database connection error. Use a Vault SDK with built-in lease renewal, or implement renewal at 2/3 of the TTL interval.
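The renew-at-2/3 discipline is a few lines of code. This sketch injects `renew` and `stop` so it's testable without a server; in a real app, `renew` would be your Vault client's lease-renew call, and the names here are ours:

```python
# Renew a lease at 2/3 of its TTL, forever (sketch, not a real client).
import time

def renewal_loop(lease_id, ttl_seconds, renew, stop):
    while not stop():
        time.sleep(ttl_seconds * 2 / 3)   # renew well before expiry
        ttl_seconds = renew(lease_id)     # server returns the fresh TTL

# Dry run with a fake renew function (sub-second TTL, no Vault needed):
calls = []
def fake_renew(lease_id):
    calls.append(lease_id)
    return 0.3                            # pretend the new TTL is 0.3s

renewal_loop("database/creds/app-readonly/example", 0.3,
             fake_renew, stop=lambda: len(calls) >= 2)
assert len(calls) == 2
```

The 2/3 factor leaves a full third of the TTL as slack for retries if the renew call itself fails.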


Part 8: Vault Agent — The Secret Delivery Truck

Your application shouldn't need to know about Vault. It should just read a file. Vault Agent handles authentication, token renewal, and secret rendering as a sidecar process.

# vault-agent.hcl
auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "myapp"
    }
  }
  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}

template {
  source      = "/etc/vault/templates/db.tpl"
  destination = "/app/config/database.env"
  perms       = 0600
  command     = "pkill -HUP myapp"  # signal the app to reload
}

vault {
  address = "https://vault.example.com:8200"
}

The template file uses Consul Template syntax:

{{ with secret "database/creds/app-readonly" }}
DB_HOST=db.example.com
DB_USER={{ .Data.username }}
DB_PASS={{ .Data.password }}
{{ end }}
# Start the agent
vault agent -config=vault-agent.hcl

# The agent:
# 1. Authenticates to Vault using the pod's ServiceAccount
# 2. Renders the template with live credentials
# 3. Writes /app/config/database.env with mode 0600
# 4. Signals the app to reload
# 5. Renews the lease before it expires
# 6. Re-renders the template with new credentials when the old ones expire

Your application just reads /app/config/database.env. It never touches the Vault API.
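What "just reads a file" looks like on the application side, as a minimal KEY=VALUE parser (the parser is ours; the format matches the template output above):

```python
# Read the env file Vault Agent renders; re-run on SIGHUP to pick up
# rotated credentials.
import os
import tempfile

def load_env_file(path: str) -> dict:
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

# Demo against a file shaped like the rendered template:
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("DB_HOST=db.example.com\nDB_USER=v-approle-app-read-Xk9mN2\n")
creds = load_env_file(f.name)
assert creds["DB_USER"] == "v-approle-app-read-Xk9mN2"
os.unlink(f.name)
```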


Part 9: Vault in Kubernetes — Three Approaches

Option 1: Vault Agent Injector (most common)

Uses a mutating admission webhook to inject a Vault Agent sidecar into pods automatically.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "myapp"
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/app-readonly"
        vault.hashicorp.com/agent-inject-template-db-creds: |
          {{- with secret "database/creds/app-readonly" -}}
          postgresql://{{ .Data.username }}:{{ .Data.password }}@db.example.com:5432/myapp
          {{- end -}}
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: app
          image: myapp:latest
          # The rendered secret appears at /vault/secrets/db-creds;
          # the app reads that file at startup (no env var, no Vault SDK)

Option 2: Vault CSI Provider

Mounts secrets as volumes via the Secrets Store CSI Driver. No sidecar — secrets appear as files in a mounted volume.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-db-creds
spec:
  provider: vault
  parameters:
    roleName: "myapp"
    vaultAddress: "https://vault.example.com:8200"
    objects: |
      - objectName: "db-password"
        secretPath: "database/creds/app-readonly"
        secretKey: "password"

Option 3: Vault Secrets Operator (VSO)

The newest option. A Kubernetes-native operator that syncs Vault secrets into Kubernetes Secret objects.

apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: db-creds
spec:
  mount: database
  path: creds/app-readonly
  destination:
    name: myapp-db-creds
    create: true
  rolloutRestartTargets:
    - kind: Deployment
      name: myapp

VSO automatically triggers a rollout restart when the dynamic credentials rotate — solving the stale-secrets problem that plagues the other approaches.

| Approach | Sidecar? | Auto-rotation? | K8s Secret created? |
| --- | --- | --- | --- |
| Agent Injector | Yes | Yes (agent renews) | No (writes to file) |
| CSI Provider | No | Limited | Optional |
| VSO | No | Yes (operator handles) | Yes |

Flashcard Check #3

| Question | Answer (cover this column) |
| --- | --- |
| What does vault write -f database/rotate-root/... do? | Changes the DB admin password to a random one only Vault knows |
| Why set default_ttl="1h" on a database role? | Credentials auto-expire after 1 hour, limiting blast radius of a leak |
| What does Vault Agent do for applications? | Handles auth, token renewal, secret rendering, and lease management — app just reads a file |
| What's the KV v2 /data/ path trap? | CLI uses secret/myapp, but policies must use secret/data/myapp |
| How does Vault Secrets Operator handle credential rotation? | Syncs new credentials to a K8s Secret and triggers rollout restart |

Part 10: High Availability and Disaster Recovery

Raft consensus (integrated storage)

Vault's recommended storage backend is integrated Raft storage. Three or five nodes form a consensus cluster. One is the active leader; the rest are standby replicas.

# Check cluster membership
vault operator raft list-peers
# Node       Address              State       Voter
# ----       -------              -----       -----
# vault-0    10.0.1.10:8201       leader      true
# vault-1    10.0.1.11:8201       follower    true
# vault-2    10.0.1.12:8201       follower    true

# Check autopilot health
vault operator raft autopilot state
# Healthy: true
# Leader: vault-0

Snapshots

# Take a snapshot (backup)
vault operator raft snapshot save vault-backup-2026-03-23.snap

# Restore from a snapshot
vault operator raft snapshot restore vault-backup-2026-03-23.snap

# Automate daily snapshots
# (cron, systemd timer, or Kubernetes CronJob)
0 2 * * * vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d).snap

War Story: A three-node Vault cluster ran on Kubernetes. A network partition isolated the leader from the two followers. The followers elected a new leader — this is Raft working correctly. But the old leader didn't know it was deposed. It continued accepting writes for about 10 seconds (within the lease heartbeat window) before realizing it was partitioned. It then sealed itself — which is the correct safety behavior. But the applications connected to the old leader suddenly lost their Vault connection. The new leader was serving secrets fine, but the apps had cached the old leader's address. The fix was configuring applications to use the Vault service DNS name (which follows the active leader) rather than a specific pod IP. Lesson: Vault's Raft failover works, but your clients need to follow the leader.

Disaster recovery in practice

| What failed | What you do |
| --- | --- |
| Single node down | Raft handles it — majority still has quorum |
| Two of three nodes down | Cluster loses quorum. Vault seals. Bring a node back or force-join |
| All nodes down | Restore from snapshot. Re-initialize if snapshots are lost |
| KMS key deleted (auto-unseal) | Data is permanently unrecoverable. This is why you test DR |
| Unseal keys lost (manual seal) | Data is permanently unrecoverable. This is why you use auto-unseal |

Part 11: Audit Logging — Trust but Verify

Vault can log every single request and response. In production, this is non-negotiable.

# Enable file-based audit logging
vault audit enable file file_path=/var/log/vault/audit.log

# Enable syslog too (belt and suspenders)
vault audit enable -path=syslog syslog

# Check active audit devices
vault audit list

The audit log is JSON — one line per request/response pair:

# Who's been reading our database credentials?
cat /var/log/vault/audit.log | \
  jq 'select(.request.path == "database/creds/app-readonly") |
      {time: .time, remote: .request.remote_address,
       accessor: .auth.accessor}'

# Count requests per path (find the hot paths)
cat /var/log/vault/audit.log | \
  jq -r '.request.path' | sort | uniq -c | sort -rn | head -10

# Find permission denied events (misconfigured apps or attackers)
cat /var/log/vault/audit.log | \
  jq 'select(.response.data.error != null) |
      {time: .time, path: .request.path, error: .response.data.error}'

Gotcha: Vault will refuse to serve any request if all configured audit devices fail. This is fail-closed by design — better to be unavailable than unaudited. Monitor your audit log disk space. If the audit log partition fills up, Vault stops working entirely. Ship logs to a SIEM and rotate aggressively.

Under the Hood: Vault HMACs sensitive fields in audit logs by default — the values are hashed, not plaintext. This means you can see that a secret was read but not what the value was. If you need to correlate a leaked value back to an accessor, use the sys/audit-hash API endpoint to compute the HMAC of the suspect value and search the logs for it.
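A rough local sketch of that correlation trick: HMAC-SHA256 the suspect value, then grep the audit log for the digest. The salt below is a placeholder (each audit device has its own), so in practice you ask Vault for the hash rather than computing it locally:

```python
# Approximation of the audit-hash computation (salt is a placeholder;
# the real per-device salt lives inside Vault).
import hashlib
import hmac

def audit_hash(value: str, salt: bytes) -> str:
    digest = hmac.new(salt, value.encode(), hashlib.sha256).hexdigest()
    return f"hmac-sha256:{digest}"

needle = audit_hash("A1b-2Cd-3Ef-4Gh", salt=b"device-salt-placeholder")
# then: grep -F "$needle" /var/log/vault/audit.log
```

The hmac-sha256: prefix matches the format the hashed fields use in the log, which is what makes the grep work.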


Part 12: Putting It All Together — From Leak to Lockdown

Let's revisit the mission. We started with a password in Git. Here's the complete before and after:

Before:

Developer → hardcodes DB password → commits to Git → password exposed for 11 days
                           Every clone has the password forever

After:

Pod starts → ServiceAccount authenticates to Vault → Vault creates temp DB user
                                              TTL: 1 hour, auto-revoked
                                              Unique per pod, audited, renewable

No password in Git. No password in environment variables. No password in Kubernetes Secrets. The password exists only in Vault's memory and the database's pg_authid table, for one hour.


Exercises

Exercise 1: Read a Vault secret (2 minutes)

Start a dev Vault server and store a secret:

vault server -dev -dev-root-token-id="dev-token"
export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_TOKEN='dev-token'

vault kv put secret/exercise/hello message="vault works"

Now retrieve just the message field using vault kv get.

Solution
vault kv get -field=message secret/exercise/hello
# vault works

Exercise 2: Write a least-privilege policy (10 minutes)

Write a policy called web-frontend that:

  • Can read secrets under secret/data/frontend/*
  • Can list secrets under secret/metadata/frontend/*
  • Can generate database credentials from database/creds/frontend-readonly
  • Cannot access anything else

Solution
# web-frontend.hcl
path "secret/data/frontend/*" {
  capabilities = ["read"]
}
path "secret/metadata/frontend/*" {
  capabilities = ["list"]
}
path "database/creds/frontend-readonly" {
  capabilities = ["read"]
}
vault policy write web-frontend web-frontend.hcl
vault token create -policy=web-frontend -ttl=1h

# Test it
vault token capabilities <new-token> secret/data/frontend/config
# read
vault token capabilities <new-token> secret/data/billing/keys
# deny

Exercise 3: Explain the failure (judgment call)

Your colleague shows you this Vault Agent template that isn't working:

template {
  source      = "/etc/vault/db.tpl"
  destination = "/app/secrets/db.env"
}

The template renders once at startup but never updates when credentials rotate. The app crashes 1 hour after deploy. What's missing and how would you fix it?

Solution

Vault Agent will re-render the template when the dynamic secret rotates, but nothing tells the app the file changed, so it keeps using dead credentials. You need:

  1. A `command` to signal the app when the template re-renders
  2. A template that references a dynamic secret (so there's a lease to track)
  3. An app that can reload configuration without a full restart
template {
  source      = "/etc/vault/db.tpl"
  destination = "/app/secrets/db.env"
  perms       = 0600
  command     = "pkill -HUP myapp"  # or: systemctl reload myapp
}
Alternatively, use VSO with `rolloutRestartTargets` to trigger a deployment rollout when credentials change.

Cheat Sheet

Environment

export VAULT_ADDR='https://vault.example.com:8200'
export VAULT_TOKEN='hvs.xxx'           # or use vault login
export VAULT_SKIP_VERIFY=true          # dev only — skip TLS verification

Core operations

| Action | Command |
| --- | --- |
| Check status | vault status |
| Login (userpass) | vault login -method=userpass username=alice |
| Login (AppRole) | vault write auth/approle/login role_id=X secret_id=Y |
| Read KV secret | vault kv get secret/path |
| Read one field | vault kv get -field=key secret/path |
| Write KV secret | vault kv put secret/path key=value |
| Get dynamic DB creds | vault read database/creds/role-name |
| Renew a lease | vault lease renew <lease-id> |
| Revoke a lease | vault lease revoke <lease-id> |
| Revoke all for a role | vault lease revoke -prefix database/creds/role/ |
| Write a policy | vault policy write name file.hcl |
| Check capabilities | vault token capabilities <token> <path> |
| Unseal | vault operator unseal <key> |
| Seal (emergency) | vault operator seal |
| Snapshot (backup) | vault operator raft snapshot save file.snap |

Health endpoint codes

| Code | Meaning |
| --- | --- |
| 200 | Active, unsealed, initialized |
| 429 | Standby node (HA — waiting its turn) |
| 472 | Data recovery mode |
| 501 | Not initialized |
| 503 | Sealed |

Policy capabilities

create · read · update · delete · list · sudo · deny

Deny always wins. Default is deny-all.


Takeaways

  • Static secrets are a liability. Every long-lived credential is a breach waiting to happen. Dynamic secrets eliminate the problem by making credentials ephemeral.

  • Vault generates, it doesn't just store. The real power is in secret engines that create credentials on demand — database users, AWS IAM credentials, TLS certificates — all with automatic expiration.

  • The seal/unseal ceremony exists for a reason. Shamir's Secret Sharing ensures no single person can access the vault. Auto-unseal trades human ceremony for KMS dependency — choose based on your operational reality.

  • Policies default to deny. Write least-privilege policies using exact paths. The KV v2 /data/ path prefix will trip you up — use -output-policy to see the real path.

  • Vault Agent makes adoption transparent. Applications read files. The agent handles auth, renewal, and re-rendering. Your app doesn't need a Vault SDK.

  • Audit everything. Vault's fail-closed audit logging means every request is recorded. If audit fails, Vault stops serving — by design.