Vault: Secrets That Expire on Purpose


Topics: HashiCorp Vault, secrets management, dynamic credentials, Kubernetes, PKI, encryption, access control
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (everything is explained from scratch)

The Mission

It's Monday morning. A security scanner just flagged a PostgreSQL password in a public GitHub repository. It's been there for eleven days. The password is prod-db-2024! and it has full read-write access to the production database. Eleven days of exposure. Every automated credential scraper on the internet has it by now.

Your job: rotate that credential right now, then set things up so this can never happen again. Not "probably won't happen" — structurally impossible. By the end of this lesson, your database credentials won't exist long enough to leak. They'll be generated on demand, scoped to one consumer, and dead within an hour.

We're going to build up from the disaster to the fix, touching secrets management, Vault architecture, dynamic credentials, Kubernetes integration, and production operations along the way.

Part 1: Stop the Bleeding

Before we learn anything, we fix the immediate problem. The credential is public.

# Step 1: Connect to PostgreSQL and change the password immediately.
# Use a random value; a timestamp-based password is guessable.
NEW_PASS="$(openssl rand -hex 24)"
psql -h db.example.com -U admin -d myapp -c \
  "ALTER USER app_user PASSWORD '${NEW_PASS}';"

# Step 2: Update the password wherever the app reads it from, then restart
# the application so it picks up the new value
kubectl rollout restart deployment/myapp -n production

# Step 3: Verify the old (leaked) password no longer works
PGPASSWORD='prod-db-2024!' psql -h db.example.com -U app_user -d myapp -c "SELECT 1;"
# Expected: FATAL: password authentication failed

Good. The fire is out. But you've just replaced one static password with another static password. If someone commits this one to Git in six months, you're back here. The real fix isn't a better password — it's eliminating long-lived passwords entirely.

Mental Model: Static secrets are like house keys cut from metal — they work until someone physically takes them away. Dynamic secrets are like hotel key cards — they stop working at checkout time whether you return them or not. Vault turns your infrastructure into a hotel.

Part 2: What Is Vault and Why Does It Exist

HashiCorp Vault is a secrets management tool that stores, generates, and controls access to secrets. But calling it a "password manager for servers" undersells it. In its most powerful mode, Vault doesn't store secrets at all — it generates them on demand and destroys them automatically.

Trivia: HashiCorp, the company founded by Mitchell Hashimoto and Armon Dadgar, released Vault in April 2015. Before tools like Vault, many teams kept their secrets in encrypted spreadsheets, password-protected documents, or sticky notes on monitors. Vault popularized dynamic secrets: credentials that exist only for as long as they're needed, then evaporate.

The four pillars

Concept	What it does	Example
Secret engine	Stores or generates secrets	KV (static), database (dynamic), PKI (certificates), transit (encryption)
Auth method	Verifies who's asking	Kubernetes ServiceAccount, AppRole, OIDC, username/password
Policy	Controls what they can access	"This token can read `secret/data/myapp/*` and nothing else"
Lease	Controls how long it lasts	"These database credentials expire in 1 hour"

These four concepts interact on every single request. A pod authenticates (auth method), receives a token scoped to a policy, uses that token to read from a secret engine, and the result has a lease that determines when it expires. Miss any one of these and you'll hit a wall.

Part 3: The Seal/Unseal Ceremony

Before Vault can serve a single secret, it needs to be unsealed. This is the part that surprises people coming from simpler tools.

Vault encrypts everything it stores. On startup, it has the encrypted data but not the key to decrypt it. This is the sealed state. You need to provide the master key to unlock it.

But here's the twist: the master key doesn't exist as a single thing. It's been split into pieces using Shamir's Secret Sharing.

Under the Hood: Shamir's Secret Sharing was invented in 1979 by Adi Shamir — the same Shamir who is the "S" in RSA encryption. The math is polynomial interpolation: any k points on a polynomial of degree k-1 can reconstruct the polynomial, but k-1 points reveal absolutely nothing about it. Vault's default is 5 shares with a threshold of 3 — meaning any 3 of the 5 key holders can unseal Vault, but 2 key holders together learn nothing about the master key.
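To make the polynomial idea concrete, here is a minimal Python sketch of threshold sharing over a prime field. It illustrates the math only: the function names and the choice of prime are assumptions made for this example, and Vault's own implementation works byte by byte over GF(2^8) rather than on one large integer.

```python
import secrets

# Assumption for the demo: a Mersenne prime big enough for a small integer secret.
PRIME = 2**127 - 1

def split_secret(secret, shares=5, threshold=3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]

def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to read the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With the default 5 shares and threshold 3, `recover_secret` returns the original value for any three of the five points, which mirrors the "any 3 of 5 key holders" rule above.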

Trivia: The unseal ceremony is modeled on nuclear launch key concepts and bank vault procedures — multiple people must act together, preventing any single person from accessing the secrets alone. This is called "split knowledge" in the security world.

# First-time initialization: creates the master key and splits it
vault operator init -key-shares=5 -key-threshold=3
# Output:
# Unseal Key 1: s.Ah4f8Gj2kL...
# Unseal Key 2: s.Bm7n3Pq9xR...
# Unseal Key 3: s.Cx1w5Yz0tU...
# Unseal Key 4: s.Dv6e2Mn8jK...
# Unseal Key 5: s.Ew9i4Qr7sF...
# Initial Root Token: hvs.abc123def456
#
# ⚠ Store each key with a DIFFERENT person. Losing enough keys = losing Vault forever.

# Unsealing: three different people each provide their key
vault operator unseal s.Ah4f8Gj2kL...   # Person 1 — "Unseal Progress 1/3"
vault operator unseal s.Bm7n3Pq9xR...   # Person 2 — "Unseal Progress 2/3"
vault operator unseal s.Cx1w5Yz0tU...   # Person 3 — "Sealed: false" 🎉

# Check status
vault status
# Sealed          false
# HA Enabled      true
# Version         1.15.4

Auto-unseal: because 3am pages are not a ceremony

Manual unsealing is beautifully secure and operationally terrible. If Vault restarts at 3am — after a kernel update, a node eviction, a power blip — it comes up sealed. No secrets are served. Every application that depends on Vault starts failing. Someone has to wake up, coordinate with two other key holders, and unseal.

Auto-unseal delegates the master key to a cloud KMS:

# vault.hcl — auto-unseal with AWS KMS
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal-key"
}

Now Vault unseals itself on restart using the KMS key. The security model shifts: instead of protecting 5 key shares, you protect one KMS key's IAM permissions.

Gotcha: A team configured auto-unseal with AWS KMS, then rebuilt their Vault cluster in a new AWS account without migrating the KMS key. The old KMS key was deleted. The Vault data was encrypted with a key that no longer existed. All secrets were permanently lost. Always back up the KMS key ARN, ensure cross-account access, and test recovery before you need it.

Remember: Mnemonic for Vault health endpoint status codes: "200 happy, 429 waiting, 501 naked, 503 locked." 200 = active and ready. 429 = standby node in HA (waiting its turn). 501 = not initialized (no keys yet — naked). 503 = sealed (locked up tight).
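A monitoring script can turn those status codes into something readable. The sketch below is one possible mapping of the default codes returned by the `/v1/sys/health` endpoint; the helper names and the choice of which states deserve attention are assumptions for illustration, not a Vault API.

```python
# Default status codes returned by GET /v1/sys/health.
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}

def describe_health(status_code):
    """Translate a health status code into a short description."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")

def needs_attention(status_code):
    """Hypothetical alerting rule: flag sealed, uninitialized, or unknown states."""
    return status_code in (501, 503) or status_code not in HEALTH_STATES
```

A standby answering 429 is healthy, so a load balancer or alert that treats every non-200 response as a failure would misreport an HA cluster.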

Flashcard Check #1

Question	Answer (cover this column)
What does "sealed" mean in Vault?	Vault has encrypted data but cannot decrypt it — the master key is not in memory
How many key shares are needed to unseal with default settings?	3 of 5 (Shamir's Secret Sharing)
What algorithm splits the master key?	Shamir's Secret Sharing (1979) — polynomial interpolation
What does auto-unseal replace?	Manual unseal ceremony — delegates to a cloud KMS
What happens if Vault restarts without auto-unseal?	It starts sealed. No secrets can be read until humans unseal it

Part 4: Secret Engines — The Vaults Inside the Vault

Vault is not one big bucket. It's a collection of secret engines, each mounted at a path, each with different superpowers.

KV v2: the simple one

Key-Value version 2 stores static secrets with versioning. Think of it as an encrypted, access-controlled, audited configuration store.

# Enable KV v2 at the "secret/" path
vault secrets enable -path=secret kv-v2

# Store a secret
vault kv put secret/myapp/database \
  username="dbadmin" \
  password="s3cur3_p@ss" \
  host="db.example.com"

# Read it back
vault kv get secret/myapp/database
# Key        Value
# ---        -----
# host       db.example.com
# password   s3cur3_p@ss
# username   dbadmin

# Read just one field
vault kv get -field=password secret/myapp/database
# s3cur3_p@ss

# Oops, bad password. Update it:
vault kv put secret/myapp/database \
  username="dbadmin" \
  password="n3w_p@ss_2026" \
  host="db.example.com"

# But the old version is still there (v2 keeps history):
vault kv get -version=1 secret/myapp/database
# password = s3cur3_p@ss  ← version 1

# Soft-delete (recoverable):
vault kv delete secret/myapp/database
# Permanent destroy:
vault kv destroy -versions=1 secret/myapp/database

Gotcha: KV v2 adds /data/ and /metadata/ to internal paths. The CLI hides this — vault kv get secret/myapp works fine. But policies must use the internal path: path "secret/data/myapp/*", not path "secret/myapp/*". This mismatch is the #1 cause of "permission denied" errors when people first set up Vault policies. Use vault kv get -output-policy secret/myapp/config to see the exact policy path you need.
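The path translation is mechanical, so it can be written down in a few lines. This Python helper is a sketch with a hypothetical name; it only does the string rewriting described above and assumes the engine is mounted at `secret/` unless told otherwise.

```python
def kv2_api_path(cli_path, mount="secret", kind="data"):
    """Convert a `vault kv` CLI path into the internal path a policy must name.

    kind is "data" for reading/writing values or "metadata" for listing.
    """
    if kind not in ("data", "metadata"):
        raise ValueError("kind must be 'data' or 'metadata'")
    mount = mount.strip("/")
    prefix = mount + "/"
    if not cli_path.startswith(prefix):
        raise ValueError(f"{cli_path!r} is not under mount {mount!r}")
    return f"{mount}/{kind}/{cli_path[len(prefix):]}"

print(kv2_api_path("secret/myapp/*"))                   # secret/data/myapp/*
print(kv2_api_path("secret/myapp/*", kind="metadata"))  # secret/metadata/myapp/*
```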

Database engine: credentials that self-destruct

This is where Vault gets interesting. Instead of storing a database password, Vault creates a temporary database user on the fly and destroys it when the lease expires.

# Enable the database secrets engine
vault secrets enable database

# Tell Vault how to connect to PostgreSQL
vault write database/config/mydb \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/mydb" \
  allowed_roles="readonly,readwrite" \
  username="vault_admin" \
  password="vault_admin_pass"

# Define a role: what kind of user to create
vault write database/roles/readonly \
  db_name=mydb \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

Now watch this:

# Generate temporary credentials
vault read database/creds/readonly
# Key                Value
# ---                -----
# lease_id           database/creds/readonly/abc123
# lease_duration     1h
# username           v-token-readonly-8hF3kL
# password           A1b-2Cd-3Ef-4Gh

# That username and password work RIGHT NOW:
psql -h db.example.com -U v-token-readonly-8hF3kL -d mydb
# Connected!

# In 1 hour, Vault automatically runs:
# REVOKE ALL ON ALL TABLES FROM "v-token-readonly-8hF3kL";
# DROP ROLE "v-token-readonly-8hF3kL";

Every request gets a unique username and password. If one leaks, it affects one consumer for at most one hour. Compare that to prod-db-2024! sitting in Git for eleven days.
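It can help to see what the placeholders in `creation_statements` turn into. The following Python sketch mimics the substitution with plain string replacement; the role name, password, and timestamp format are made-up values for illustration, and the real plugin generates its own.

```python
from datetime import datetime, timedelta, timezone

CREATION_TEMPLATE = (
    'CREATE ROLE "{{name}}" WITH LOGIN PASSWORD \'{{password}}\' '
    "VALID UNTIL '{{expiration}}'; "
    'GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{{name}}";'
)

def render_statements(template, name, password, issued_at, ttl):
    """Fill in {{name}}, {{password}} and {{expiration}} for one lease."""
    # Assumed timestamp format; shown only to make the example readable.
    expiration = (issued_at + ttl).strftime("%Y-%m-%d %H:%M:%S+00")
    values = {"name": name, "password": password, "expiration": expiration}
    sql = template
    for key, value in values.items():
        sql = sql.replace("{{" + key + "}}", value)
    return sql

issued = datetime(2026, 3, 23, 12, 0, tzinfo=timezone.utc)
print(render_statements(CREATION_TEMPLATE, "v-demo-readonly", "A1b-2Cd-3Ef-4Gh",
                        issued, timedelta(hours=1)))
```

Each call with a fresh name and password yields a different database role, which is why one leaked credential never exposes another consumer.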

Mental Model: Static secrets are like giving everyone a copy of the office master key. Dynamic secrets are like a receptionist who creates a badge for each visitor, sets it to expire at 5pm, and shreds it if they don't return it. The receptionist is Vault. The badge is the lease.

Other engines worth knowing

Engine	What it does	When to use it
PKI	Issues TLS certificates from an internal CA	Service-to-service mTLS, short-lived certs
AWS	Generates temporary IAM credentials	Applications that need AWS access
Transit	Encrypts/decrypts data without exposing keys	Application-level encryption (Vault never stores your data)
SSH	Signs SSH certificates or generates OTPs	SSH access without distributing private keys

# Transit: encryption without seeing the key
vault secrets enable transit
vault write -f transit/keys/myapp-key

# printf avoids the trailing newline that echo would add to the plaintext
vault write transit/encrypt/myapp-key \
  plaintext="$(printf '%s' "credit-card-1234" | base64)"
# ciphertext: vault:v1:8SDd3WHDOjf7...

vault write transit/decrypt/myapp-key \
  ciphertext="vault:v1:8SDd3WHDOjf7..."
# plaintext: Y3JlZGl0LWNhcmQtMTIzNA==   (base64 of "credit-card-1234")

Under the Hood: Transit key rotation creates a new key version but keeps old versions for decryption. You can set min_decryption_version to a higher number, which effectively performs crypto-shredding — making old ciphertext permanently unreadable without deleting the ciphertext itself. Useful for GDPR "right to be forgotten" compliance.
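Transit takes and returns base64, so the application has to encode before encrypting and decode after decrypting. A small Python sketch of that wrapping step follows; the helper names are hypothetical, and the second print shows how a stray trailing newline silently changes the plaintext.

```python
import base64

def to_transit_plaintext(data):
    """Encode raw bytes the way the transit encrypt endpoint expects them."""
    return base64.b64encode(data).decode("ascii")

def from_transit_plaintext(encoded):
    """Decode the base64 plaintext returned by the transit decrypt endpoint."""
    return base64.b64decode(encoded)

card = b"credit-card-1234"
print(to_transit_plaintext(card))            # Y3JlZGl0LWNhcmQtMTIzNA==
print(to_transit_plaintext(card + b"\n"))    # Y3JlZGl0LWNhcmQtMTIzNAo=
print(from_transit_plaintext("Y3JlZGl0LWNhcmQtMTIzNA=="))  # b'credit-card-1234'
```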

Part 5: Auth Methods — Proving You Are Who You Claim

Vault doesn't hand secrets to anonymous callers. Every request requires a token, and tokens come from authentication.

Token auth (the foundation)

Every auth method eventually produces a token. Tokens are the internal currency.

# Create a token with a specific policy and TTL
vault token create -policy=myapp-secrets -ttl=1h
# Key                Value
# ---                -----
# token              hvs.CAESIJ...
# token_policies     [default myapp-secrets]
# token_ttl          1h

# Use it
export VAULT_TOKEN="hvs.CAESIJ..."
vault kv get secret/myapp/database

AppRole (for machines and CI/CD)

AppRole splits authentication into two pieces: a role_id (like a username — long-lived, baked into config) and a secret_id (like a password — short-lived, delivered at runtime).

vault auth enable approle

# Create a role for CI pipelines
vault write auth/approle/role/ci-pipeline \
  token_policies="ci-deploy" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=10m

# Get the role ID (bake this into the CI image)
vault read auth/approle/role/ci-pipeline/role-id
# role_id: 7a6b8c9d-e0f1-2345-6789-abcdef012345

# Generate a secret ID (inject this at runtime via trusted orchestrator)
vault write -f auth/approle/role/ci-pipeline/secret-id
# secret_id: a1b2c3d4-5678-90ab-cdef-ghijklmnopqr (expires in 10 minutes)

# Login
vault write auth/approle/login \
  role_id="7a6b8c9d-e0f1-2345-6789-abcdef012345" \
  secret_id="a1b2c3d4-5678-90ab-cdef-ghijklmnopqr"
# Returns: a Vault token with the ci-deploy policy

Under the Hood: The split between role_id and secret_id is deliberate. The role ID is safe to bake into a container image — it identifies which role, but can't authenticate alone. The secret ID is the short-lived credential injected at runtime by a trusted orchestrator (Terraform, Kubernetes, your CI platform). Neither piece is useful without the other.

Kubernetes auth (the most common in practice)

A pod authenticates to Vault using its ServiceAccount JWT token. Vault validates the token against the Kubernetes API server.

vault auth enable kubernetes

vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc"

vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp-sa \
  bound_service_account_namespaces=production \
  token_policies=myapp-secrets \
  token_ttl=1h

Now any pod running as ServiceAccount myapp-sa in the production namespace can authenticate to Vault and get a token with the myapp-secrets policy. No passwords anywhere.
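From inside the pod, the login is a single HTTP POST carrying the role name and the ServiceAccount JWT. The Python sketch below builds that request with the standard library; the function names are hypothetical, the mount path is assumed to be the default `kubernetes`, and error handling is left out.

```python
import json
import urllib.request

# Standard location of the projected ServiceAccount token inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

def build_k8s_login_request(vault_addr, role, jwt, mount="kubernetes"):
    """Build (but do not send) the login request for the Kubernetes auth method."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def login(vault_addr, role, token_path=SA_TOKEN_PATH):
    """Read the pod's JWT, log in, and return the Vault client token."""
    with open(token_path, encoding="utf-8") as fh:
        jwt = fh.read().strip()
    request = build_k8s_login_request(vault_addr, role, jwt)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["auth"]["client_token"]
```

In practice Vault Agent or an SDK does this for you; the point is that the only credential involved is the JWT Kubernetes already mounted into the pod.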

Gotcha: Never use bound_service_account_namespaces=["*"] in production. This lets any namespace with a matching ServiceAccount name authenticate as this role — including test namespaces, CI namespaces, and anything a developer creates. Always specify exact namespaces.

OIDC (for humans)

vault auth enable oidc

vault write auth/oidc/config \
  oidc_discovery_url="https://accounts.google.com" \
  oidc_client_id="vault-app" \
  oidc_client_secret="client-secret-here" \
  default_role="engineer"

# Login opens a browser
vault login -method=oidc role=engineer

Part 6: Policies — The Principle of Least Privilege in HCL

Policies control what a token can do. They're written in HCL, path-based, and default to deny everything.

# myapp-policy.hcl
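# Read application secrets (KV v2 data path)
path "secret/data/myapp/*" {
  capabilities = ["read"]
}

# List secret names (KV v2 serves listings from the metadata path)
path "secret/metadata/myapp/*" {
  capabilities = ["list"]
}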

# Generate dynamic database credentials
path "database/creds/readonly" {
  capabilities = ["read"]
}

# Allow the token to manage itself
path "auth/token/renew-self" {
  capabilities = ["update"]
}
path "auth/token/lookup-self" {
  capabilities = ["read"]
}

# Explicitly deny access to other apps
path "secret/data/billing/*" {
  capabilities = ["deny"]
}

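The core capabilities are: create, read, update, delete, list, sudo, deny. When several rules match a request, the most specific path wins, and deny overrides any other capability granted on that path.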

# Apply the policy
vault policy write myapp-secrets myapp-policy.hcl

# Test what a token can actually do
vault token capabilities hvs.CAESIJ... secret/data/myapp/database
# read

vault token capabilities hvs.CAESIJ... secret/data/billing/stripe-key
# deny
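The way rules combine can be modeled in a few lines. The Python sketch below is a deliberate simplification, not Vault's actual matcher: it assumes patterns are either exact paths or end in a single `*`, prefers an exact match, then the longest prefix, and lets deny override everything else on the chosen path. The policy contents are hypothetical.

```python
def effective_capabilities(policies, request_path):
    """policies: list of dicts mapping a path pattern to a list of capabilities."""
    def matches(pattern, path):
        if pattern.endswith("*"):
            return path.startswith(pattern[:-1])
        return pattern == path

    # Union the capabilities of identical patterns across all attached policies.
    matching = {}
    for policy in policies:
        for pattern, caps in policy.items():
            if matches(pattern, request_path):
                matching.setdefault(pattern, set()).update(caps)
    if not matching:
        return {"deny"}  # default: no matching rule means no access
    # Simplification: an exact match beats a glob, then the longest pattern wins.
    best = max(matching, key=lambda p: (not p.endswith("*"), len(p)))
    caps = matching[best]
    return {"deny"} if "deny" in caps else caps

app = {"secret/data/myapp/*": ["read"], "secret/data/billing/*": ["deny"]}
broad = {"secret/data/*": ["read", "list"]}
print(effective_capabilities([app, broad], "secret/data/billing/stripe-key"))  # {'deny'}
```

Even with the broad policy attached, the more specific billing rule decides the outcome, which is why an explicit deny on a narrow path is an effective guard rail.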

Gotcha: The root policy bypasses all access checks, and the initial root token never expires. After initial setup, revoke it: vault token revoke <root-token>. If you need root access later, generate a temporary root token with vault operator generate-root using the unseal keys, use it, then revoke it again. Services should never authenticate with root.

Flashcard Check #2

Question	Answer (cover this column)
What are the four Vault pillars?	Secret engine, auth method, policy, lease
What's the difference between static and dynamic secrets?	Static are stored and retrieved; dynamic are generated on demand with a TTL and auto-revoked
Why does KV v2 cause "permission denied" surprises?	Policies need `secret/data/...` path, but CLI uses `secret/...` — the `/data/` is hidden
What does AppRole's `secret_id` do vs `role_id`?	`role_id` = identity (long-lived), `secret_id` = credential (short-lived, runtime-injected)
What's the default policy behavior?	Deny everything — you must explicitly grant access
Why is root token dangerous in production?	Bypasses all policies, never expires, full access to everything

Part 7: Dynamic Database Secrets — The Full Walkthrough

Back to our mission. We rotated the leaked password manually. Now we're going to make passwords unnecessary. Here's the complete setup, step by step.

Step 1: Vault needs a privileged database account

Vault itself needs credentials that can create and destroy database users. This is the only long-lived credential — and it stays inside Vault, never in Git.

# Create a Vault-managed admin user in PostgreSQL
psql -h db.example.com -U postgres -c \
  "CREATE ROLE vault_admin WITH LOGIN PASSWORD 'vault-managed-2026' CREATEROLE;
   GRANT ALL ON SCHEMA public TO vault_admin;"

Step 2: Configure the database engine

vault secrets enable database

vault write database/config/production-db \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/myapp" \
  allowed_roles="app-readonly,app-readwrite" \
  username="vault_admin" \
  password="vault-managed-2026"

# Rotate the root password so even you don't know it anymore
vault write -f database/rotate-root/production-db
# Now Vault has changed vault_admin's password to something only Vault knows

That last command is important. After rotate-root, the password you typed on the command line is no longer valid. Vault generated a new one and stored it internally. No human knows the database admin password.

Step 3: Create roles

# Read-only role: SELECT only, 1-hour TTL
vault write database/roles/app-readonly \
  db_name=production-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

# Read-write role: full DML, 30-minute TTL (shorter = less risk)
vault write database/roles/app-readwrite \
  db_name=production-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="30m" \
  max_ttl="4h"

Step 4: Use it

# Application requests credentials
vault read database/creds/app-readonly
# Key                Value
# ---                -----
# lease_id           database/creds/app-readonly/7yKz...
# lease_duration     1h
# username           v-approle-app-read-Xk9mN2
# password           B4f-7Gh-1Jk-3Lm

# That credential works immediately
psql -h db.example.com -U v-approle-app-read-Xk9mN2 -d myapp -c "SELECT count(*) FROM users;"
#  count
# -------
#  42857

# After 1 hour: Vault drops the role. The credential is dead.

Step 5: Lease management

# Renew a lease before it expires (extend the TTL)
vault lease renew database/creds/app-readonly/7yKz...
# lease_duration: 1h (reset from now)

# Renew with a specific increment
vault lease renew -increment=30m database/creds/app-readonly/7yKz...

# Revoke immediately (emergency rotation)
vault lease revoke database/creds/app-readonly/7yKz...

# Nuclear option: revoke ALL credentials for this role
vault lease revoke -prefix database/creds/app-readonly/

Gotcha: If your application requests credentials at startup and never renews the lease, the credentials die silently after the TTL expires. Long-running batch jobs are especially vulnerable — they start at noon with a 1-hour lease and fail at 1:01pm with a cryptic database connection error. Use a Vault SDK with built-in lease renewal, or implement renewal at 2/3 of the TTL interval.
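The "renew at 2/3 of the TTL" rule is easy to turn into a schedule. This Python sketch is a planning aid under stated assumptions, not a Vault client: it assumes every renewal resets the lease to the full `ttl` from the moment of renewal, capped at `max_ttl` after issue, and the function name is hypothetical.

```python
def renewal_times(ttl, max_ttl, num=2, den=3):
    """Return (renewal offsets in seconds since issue, final expiry offset).

    Renews when num/den of the remaining lease time has elapsed.
    """
    if not 0 < num < den or ttl * num // den < 1:
        raise ValueError("ttl too short for the requested renewal fraction")
    times = []
    now = 0
    expires = min(ttl, max_ttl)
    while expires < max_ttl:
        now += (expires - now) * num // den   # wait 2/3 of the time left
        times.append(now)
        expires = min(now + ttl, max_ttl)     # renewal cannot pass max_ttl
    return times, expires

print(renewal_times(3600, 7200))   # ([2400, 4800], 7200)
```

Once the final expiry is reached, renewal no longer helps: the job has to request a brand-new credential, so a batch run longer than `max_ttl` must be able to reconnect with fresh credentials.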

Part 8: Vault Agent — The Secret Delivery Truck

Your application shouldn't need to know about Vault. It should just read a file. Vault Agent handles authentication, token renewal, and secret rendering as a sidecar process.

# vault-agent.hcl
auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "myapp"
    }
  }
  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}

template {
  source      = "/etc/vault/templates/db.tpl"
  destination = "/app/config/database.env"
  perms       = 0600
  command     = "pkill -HUP myapp"  # signal the app to reload
}

vault {
  address = "https://vault.example.com:8200"
}

The template file uses Consul Template syntax:

{{ with secret "database/creds/app-readonly" }}
DB_HOST=db.example.com
DB_USER={{ .Data.username }}
DB_PASS={{ .Data.password }}
{{ end }}

# Start the agent
vault agent -config=vault-agent.hcl

# The agent:
# 1. Authenticates to Vault using the pod's ServiceAccount
# 2. Renders the template with live credentials
# 3. Writes /app/config/database.env with mode 0600
# 4. Signals the app to reload
# 5. Renews the lease before it expires
# 6. Re-renders the template with new credentials when the old ones expire

Your application just reads /app/config/database.env. It never touches the Vault API.
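On the application side, "just read a file" plus "reload on SIGHUP" might look like the Python sketch below. The class and function names are hypothetical, and the parser assumes the simple `KEY=value` lines produced by the template above, with no quoting or escaping.

```python
import signal

def load_env_file(path):
    """Parse KEY=value lines, ignoring blanks and comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")  # split on the first "=" only
            values[key] = value
    return values

class DatabaseConfig:
    """Holds the current credentials and re-reads them on demand."""

    def __init__(self, path):
        self.path = path
        self.values = load_env_file(path)

    def reload(self, signum=None, frame=None):
        # Signature matches what signal.signal() passes to a handler.
        self.values = load_env_file(self.path)

def install_reload_handler(config):
    """Re-read the file whenever the agent's `command` sends SIGHUP."""
    signal.signal(signal.SIGHUP, config.reload)
```

A real service would also need to open new database connections with the reloaded credentials; swapping the values in memory is only the first half.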

Part 9: Vault in Kubernetes — Three Approaches

Option 1: Vault Agent Injector (most common)

Uses a mutating admission webhook to inject a Vault Agent sidecar into pods automatically.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "myapp"
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/app-readonly"
        vault.hashicorp.com/agent-inject-template-db-creds: |
          {{- with secret "database/creds/app-readonly" -}}
          postgresql://{{ .Data.username }}:{{ .Data.password }}@db.example.com:5432/myapp
          {{- end -}}
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: app
          image: myapp:latest
          # The secret is rendered to /vault/secrets/db-creds; the app must
          # read that file (an env var can only carry the path, not the contents)
          env:
            - name: DATABASE_URL_FILE
              value: "/vault/secrets/db-creds"

Option 2: Vault CSI Provider

Mounts secrets as volumes via the Secrets Store CSI Driver. No sidecar — secrets appear as files in a mounted volume.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-db-creds
spec:
  provider: vault
  parameters:
    roleName: "myapp"
    vaultAddress: "https://vault.example.com:8200"
    objects: |
      - objectName: "db-password"
        secretPath: "database/creds/app-readonly"
        secretKey: "password"

Option 3: Vault Secrets Operator (VSO)

The newest option. A Kubernetes-native operator that syncs Vault secrets into Kubernetes Secret objects.

apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: db-creds
spec:
  mount: database
  path: creds/app-readonly
  destination:
    name: myapp-db-creds
    create: true
  rolloutRestartTargets:
    - kind: Deployment
      name: myapp

VSO automatically triggers a rollout restart when the dynamic credentials rotate — solving the stale-secrets problem that plagues the other approaches.

Approach	Sidecar?	Auto-rotation?	K8s Secret created?
Agent Injector	Yes	Yes (agent renews)	No (writes to file)
CSI Provider	No	Limited	Optional
VSO	No	Yes (operator handles)	Yes

Flashcard Check #3

Question	Answer (cover this column)
What does `vault write -f database/rotate-root/...` do?	Changes the DB admin password to a random one only Vault knows
Why set `default_ttl="1h"` on a database role?	Credentials auto-expire after 1 hour, limiting blast radius of a leak
What does Vault Agent do for applications?	Handles auth, token renewal, secret rendering, and lease management — app just reads a file
What's the KV v2 `/data/` path trap?	CLI uses `secret/myapp`, but policies must use `secret/data/myapp`
How does Vault Secrets Operator handle credential rotation?	Syncs new credentials to a K8s Secret and triggers rollout restart

Part 10: High Availability and Disaster Recovery

Raft consensus (integrated storage)

Vault's recommended storage backend is integrated Raft storage. Three or five nodes form a consensus cluster. One is the active leader; the rest are standby replicas.

# Check cluster membership
vault operator raft list-peers
# Node       Address              State       Voter
# ----       -------              -----       -----
# vault-0    10.0.1.10:8201       leader      true
# vault-1    10.0.1.11:8201       follower    true
# vault-2    10.0.1.12:8201       follower    true

# Check autopilot health
vault operator raft autopilot state
# Healthy: true
# Leader: vault-0

Snapshots

# Take a snapshot (backup)
vault operator raft snapshot save vault-backup-2026-03-23.snap

# Restore from a snapshot
vault operator raft snapshot restore vault-backup-2026-03-23.snap

# Automate daily snapshots
# (cron, systemd timer, or Kubernetes CronJob)
0 2 * * * vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d).snap

War Story: A three-node Vault cluster ran on Kubernetes. A network partition isolated the leader from the two followers. The followers elected a new leader, which is Raft working correctly. The old leader did not know it had been deposed. For about 10 seconds (within the lease heartbeat window) it kept accepting requests, although it could no longer commit writes without a quorum. It then stepped down, which is the correct safety behavior. But the applications connected to the old leader suddenly lost their Vault connection. The new leader was serving secrets fine, but the apps had cached the old leader's address. The fix was configuring applications to use the Vault service DNS name (which follows the active leader) rather than a specific pod IP. Lesson: Vault's Raft failover works, but your clients need to follow the leader.

Disaster recovery in practice

What failed	What you do
Single node down	Raft handles it — majority still has quorum
Two of three nodes down	Cluster loses quorum and cannot elect a leader, so requests fail. Bring a node back, or recover with a `peers.json` file
All nodes down	Restore from snapshot. Re-initialize if snapshots are lost
KMS key deleted (auto-unseal)	Data is permanently unrecoverable. This is why you test DR
Unseal keys lost (manual seal)	Data is permanently unrecoverable. This is why you use auto-unseal
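The first two rows of that table follow from simple majority arithmetic. A short Python sketch, with hypothetical function names, makes the numbers explicit:

```python
def quorum(nodes):
    """Smallest majority of a Raft cluster with `nodes` voters."""
    return nodes // 2 + 1

def failures_tolerated(nodes):
    """How many voters can be lost while a majority still remains."""
    return nodes - quorum(nodes)

for n in (3, 4, 5):
    print(n, quorum(n), failures_tolerated(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A fourth node adds cost without adding fault tolerance, which is why clusters are sized at three or five voters.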

Part 11: Audit Logging — Trust but Verify

Vault can log every single request and response. In production, this is non-negotiable.

# Enable file-based audit logging
vault audit enable file file_path=/var/log/vault/audit.log

# Enable syslog too (belt and suspenders)
vault audit enable -path=syslog syslog

# Check active audit devices
vault audit list

The audit log is JSON, one object per line, with separate entries for each request and each response:

# Who's been reading our database credentials?
# (filter on response entries so each call is reported once)
jq 'select(.type == "response" and .request.path == "database/creds/app-readonly") |
    {time: .time, remote: .request.remote_address,
     accessor: .auth.accessor}' /var/log/vault/audit.log

# Count requests per path (find the hot paths)
jq -r 'select(.type == "request") | .request.path' /var/log/vault/audit.log | \
  sort | uniq -c | sort -rn | head -10

# Find failed requests such as permission denied (misconfigured apps or attackers)
jq 'select(.type == "response" and .error != null and .error != "") |
    {time: .time, path: .request.path, error: .error}' /var/log/vault/audit.log

Gotcha: Vault will refuse to serve any request if all configured audit devices fail. This is fail-closed by design — better to be unavailable than unaudited. Monitor your audit log disk space. If the audit log partition fills up, Vault stops working entirely. Ship logs to a SIEM and rotate aggressively.

Under the Hood: Vault HMACs sensitive fields in audit logs by default, so the values appear as hashes rather than plaintext. You can see that a secret was read but not what the value was. To correlate a leaked value back to an accessor, compute its HMAC through the sys/audit-hash/<device-path> endpoint (for example, vault write sys/audit-hash/file input=<value>) and search the logs for the result.
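Why a keyed hash makes that correlation possible can be shown with the standard library. In this Python sketch the salt is a made-up stand-in for the per-device key that only Vault holds, so the output will not match a real audit log; it only demonstrates that the same input always maps to the same searchable string.

```python
import hashlib
import hmac

def audit_style_hmac(salt, value):
    """HMAC-SHA256 of `value`, formatted like a hashed audit log field."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"hmac-sha256:{digest}"

def lines_mentioning(log_lines, hashed_value):
    """Return the raw log lines that contain the hashed value."""
    return [line for line in log_lines if hashed_value in line]

salt = b"demo-salt"  # hypothetical; the real key never leaves Vault
leaked = audit_style_hmac(salt, "B4f-7Gh-1Jk-3Lm")
log = [
    '{"type":"response","response":{"data":{"password":"%s"}}}' % leaked,
    '{"type":"response","response":{"data":{"password":"hmac-sha256:other"}}}',
]
print(len(lines_mentioning(log, leaked)))  # 1
```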

Part 12: Putting It All Together — From Leak to Lockdown

Let's revisit the mission. We started with a password in Git. Here's the complete before and after:

Before:

Developer → hardcodes DB password → commits to Git → password exposed for 11 days
                                    ↓
                           Every clone has the password forever

After:

Pod starts → ServiceAccount authenticates to Vault → Vault creates temp DB user
                                                       ↓
                                              TTL: 1 hour, auto-revoked
                                              Unique per pod, audited, renewable

No password in Git. No password in environment variables. No password in Kubernetes Secrets. The password exists only in Vault's memory and the database's pg_authid table, for one hour.

Exercises

Exercise 1: Read a Vault secret (2 minutes)

Start a dev Vault server and store a secret:

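# Terminal 1: start a dev server (it stays in the foreground)
vault server -dev -dev-root-token-id="dev-token"
# Terminal 2: point the CLI at the dev server
export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_TOKEN='dev-token'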

vault kv put secret/exercise/hello message="vault works"

Now retrieve just the message field using vault kv get.

Solution

vault kv get -field=message secret/exercise/hello
# vault works

Exercise 2: Write a least-privilege policy (10 minutes)

Write a policy called web-frontend that:
- Can read secrets under secret/data/frontend/*
- Can list secrets under secret/metadata/frontend/*
- Can generate database credentials from database/creds/frontend-readonly
- Cannot access anything else

Solution

# web-frontend.hcl
path "secret/data/frontend/*" {
  capabilities = ["read"]
}
path "secret/metadata/frontend/*" {
  capabilities = ["list"]
}
path "database/creds/frontend-readonly" {
  capabilities = ["read"]
}

vault policy write web-frontend web-frontend.hcl
vault token create -policy=web-frontend -ttl=1h

# Test it
vault token capabilities <new-token> secret/data/frontend/config
# read
vault token capabilities <new-token> secret/data/billing/keys
# deny

Exercise 3: Explain the failure (judgment call)

Your colleague shows you this Vault Agent template that isn't working:

template {
  source      = "/etc/vault/db.tpl"
  destination = "/app/secrets/db.env"
}

The template renders once at startup but never updates when credentials rotate. The app crashes 1 hour after deploy. What's missing and how would you fix it?

Solution

From the app's point of view the credentials never update: even when Vault Agent writes fresh credentials to the file, nothing tells the running app to pick them up, so it keeps using the expired ones. You need:
1. A `command` to signal the app whenever the template is re-rendered
2. A template that references a dynamic secret (so there is a lease for the agent to track)
3. An app that can reload its configuration without a full restart

template {
  source      = "/etc/vault/db.tpl"
  destination = "/app/secrets/db.env"
  perms       = 0600
  command     = "pkill -HUP myapp"  # or: systemctl reload myapp
}

Alternatively, use VSO with `rolloutRestartTargets` to trigger a deployment rollout when credentials change.
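
For the `command` approach, it can help to check the rendered file before signalling the app. The following is a minimal sketch, not part of Vault Agent itself: `env_file_ready` and `reload_app` are hypothetical helpers, and the key names `DB_USERNAME` and `DB_PASSWORD` are assumptions about what db.tpl renders.

```shell
# env_file_ready FILE KEY...
# Succeeds only if FILE is non-empty and defines every KEY as KEY=<value>.
# The function body is a subshell, so its variables stay local.
env_file_ready() (
  file="$1"; shift
  [ -s "$file" ] || exit 1
  for key in "$@"; do
    grep -q "^${key}=." "$file" || exit 1
  done
  exit 0
)

# reload_app: hypothetical wrapper to call from the template's `command`.
# Key names are assumed; match them to whatever your template writes.
reload_app() {
  if env_file_ready /app/secrets/db.env DB_USERNAME DB_PASSWORD; then
    pkill -HUP myapp
  else
    echo "db.env incomplete, not signalling myapp" >&2
    return 1
  fi
}
```

Saved as a script that ends by calling `reload_app`, this could replace the bare `pkill` in the `command` setting, so a partially rendered file never triggers a reload.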

Cheat Sheet¶

Environment¶

export VAULT_ADDR='https://vault.example.com:8200'
export VAULT_TOKEN='hvs.xxx'           # or use vault login
export VAULT_SKIP_VERIFY=true          # dev only — skip TLS verification

Core operations¶

Action	Command
Check status	`vault status`
Login (userpass)	`vault login -method=userpass username=alice`
Login (AppRole)	`vault write auth/approle/login role_id=X secret_id=Y`
Read KV secret	`vault kv get secret/path`
Read one field	`vault kv get -field=key secret/path`
Write KV secret	`vault kv put secret/path key=value`
Get dynamic DB creds	`vault read database/creds/role-name`
Renew a lease	`vault lease renew <lease-id>`
Revoke a lease	`vault lease revoke <lease-id>`
Revoke all for a role	`vault lease revoke -prefix database/creds/role/`
Write a policy	`vault policy write name file.hcl`
Check capabilities	`vault token capabilities <token> <path>`
Unseal	`vault operator unseal <key>`
Seal (emergency)	`vault operator seal`
Snapshot (backup)	`vault operator raft snapshot save file.snap`

Health endpoint codes¶

Code	Meaning
200	Active, unsealed, initialized
429	Standby node (HA — waiting its turn)
472	Disaster recovery (DR) replication secondary
501	Not initialized
503	Sealed
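
A monitoring script can turn these codes into readable states. The following is a minimal sketch, assuming `curl` is installed and `VAULT_ADDR` is exported; `vault_health_meaning` and `vault_health_code` are hypothetical helper names, not part of the Vault CLI.

```shell
# Translate a /v1/sys/health HTTP status code into a short state name.
vault_health_meaning() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    501) echo "not-initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Fetch only the status code from the health endpoint.
# Assumes VAULT_ADDR is set, e.g. https://vault.example.com:8200
vault_health_code() {
  curl -s -o /dev/null -w '%{http_code}' "${VAULT_ADDR}/v1/sys/health"
}

# Usage (needs a reachable Vault server):
#   vault_health_meaning "$(vault_health_code)"
```

If Vault is unreachable, `curl` reports `000`, which the helper maps to `unknown`.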

Policy capabilities¶

create · read · update · delete · list · sudo · deny

Deny always wins. Default is deny-all.
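
These two rules can be modeled in a few lines of shell. This is a simplified sketch, not Vault's actual implementation: `is_allowed` is a hypothetical helper that takes the requested capability followed by the combined capability list from every policy matching one path. It ignores how Vault selects the matching path in the first place.

```shell
# is_allowed WANT CAP...
# WANT   the capability being requested (e.g. read)
# CAP... capabilities granted on the path by all attached policies
# The function body is a subshell, so exit only ends the function.
is_allowed() (
  want="$1"; shift
  granted=1                          # default deny: nothing granted yet
  for cap in "$@"; do
    [ "$cap" = "deny" ] && exit 1    # an explicit deny overrides any grant
    [ "$cap" = "$want" ] && granted=0
  done
  exit "$granted"
)

# Usage:
#   is_allowed read read list   && echo allowed   # granted by a policy
#   is_allowed read read deny   || echo denied    # deny wins
#   is_allowed update read list || echo denied    # never granted
```

With an empty capability list the loop never runs and the call fails, which mirrors the deny-all default.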

Takeaways¶

Static secrets are a liability. Every long-lived credential is a breach waiting to happen. Dynamic secrets eliminate the problem by making credentials ephemeral.
Vault generates, it doesn't just store. The real power is in secret engines that create credentials on demand — database users, AWS IAM credentials, TLS certificates — all with automatic expiration.
The seal/unseal ceremony exists for a reason. Shamir's Secret Sharing splits the unseal key so that no single person can unseal Vault alone. Auto-unseal trades the human ceremony for a KMS dependency. Choose based on your operational reality.
Policies default to deny. Write least-privilege policies using exact paths. The KV v2 /data/ path prefix will trip you up — use -output-policy to see the real path.
Vault Agent makes adoption transparent. Applications read files. The agent handles auth, renewal, and re-rendering. Your app doesn't need a Vault SDK.
Audit everything. Vault's audit logging is fail-closed: every request must be recorded by at least one enabled audit device. If none of them can write the entry, Vault refuses to serve the request. That is by design.

Related Lessons¶

Secrets Management Without Tears — the broader landscape (Sealed Secrets, SOPS, ESO) beyond Vault
Permission Denied — when access control goes wrong across multiple layers, including Vault policies
What Happens When Your Certificate Expires — Vault's PKI engine is one solution to certificate lifecycle
The Container Escape — why secrets in environment variables and mounted files need careful permission management
How Incident Response Actually Works — the broader incident framework when you find a leaked credential
