# Vault — Secrets That Expire on Purpose
- Topics: HashiCorp Vault, secrets management, dynamic credentials, Kubernetes, PKI, encryption, access control
- Level: L1–L2 (Foundations → Operations)
- Time: 60–90 minutes
- Prerequisites: None (everything is explained from scratch)
## The Mission
It's Monday morning. A security scanner just flagged a PostgreSQL password in a public GitHub
repository. It's been there for eleven days. The password is prod-db-2024! and it has full
read-write access to the production database. Eleven days of exposure. Every automated
credential scraper on the internet has it by now.
Your job: rotate that credential right now, then set things up so this can never happen again. Not "probably won't happen" — structurally impossible. By the end of this lesson, your database credentials won't exist long enough to leak. They'll be generated on demand, scoped to one consumer, and dead within an hour.
We're going to build up from the disaster to the fix, touching secrets management, Vault architecture, dynamic credentials, Kubernetes integration, and production operations along the way.
## Part 1: Stop the Bleeding
Before we learn anything, we fix the immediate problem. The credential is public.
```bash
# Step 1: Generate a strong random password and set it immediately
NEW_PASS="$(openssl rand -base64 24)"
psql -h db.example.com -U admin -d myapp -c \
  "ALTER USER app_user PASSWORD '${NEW_PASS}';"

# Step 2: Update wherever the app reads its password (e.g. its Kubernetes Secret),
# then restart the application so it picks up the new password
kubectl rollout restart deployment/myapp -n production

# Step 3: Verify the old password no longer works
# (single quotes stop the shell from interpreting the "!")
PGPASSWORD='prod-db-2024!' psql -h db.example.com -U app_user -d myapp -c "SELECT 1;"
# Expected: FATAL: password authentication failed
```
Good. The fire is out. But you've just replaced one static password with another static password. If someone commits this one to Git in six months, you're back here. The real fix isn't a better password — it's eliminating long-lived passwords entirely.
Mental Model: Static secrets are like house keys cut from metal — they work until someone physically takes them away. Dynamic secrets are like hotel key cards — they stop working at checkout time whether you return them or not. Vault turns your infrastructure into a hotel.
## Part 2: What Is Vault and Why Does It Exist
HashiCorp Vault is a secrets management tool that stores, generates, and controls access to secrets. But calling it a "password manager for servers" undersells it. In its most powerful mode, Vault doesn't store secrets at all — it generates them on demand and destroys them automatically.
Trivia: Vault was released in April 2015 by Mitchell Hashimoto and Armon Dadgar. Before Vault, the industry standard for secrets management was — no joke — encrypted Excel spreadsheets, password-protected Word documents, and sticky notes on monitors. Vault introduced dynamic secrets, which was a genuine paradigm shift: credentials that exist only for as long as they're needed, then evaporate.
### The four pillars
| Concept | What it does | Example |
|---|---|---|
| Secret engine | Stores or generates secrets | KV (static), database (dynamic), PKI (certificates), transit (encryption) |
| Auth method | Verifies who's asking | Kubernetes ServiceAccount, AppRole, OIDC, username/password |
| Policy | Controls what they can access | "This token can read secret/data/myapp/* and nothing else" |
| Lease | Controls how long it lasts | "These database credentials expire in 1 hour" |
These four concepts interact on every single request. A pod authenticates (auth method), receives a token scoped to a policy, uses that token to read from a secret engine, and the result has a lease that determines when it expires. Miss any one of these and you'll hit a wall.
## Part 3: The Seal/Unseal Ceremony
Before Vault can serve a single secret, it needs to be unsealed. This is the part that surprises people coming from simpler tools.
Vault encrypts everything it stores. On startup, it has the encrypted data but not the key to decrypt it. This is the sealed state. You need to provide the master key to unlock it.
But here's the twist: the master key doesn't exist as a single thing. It's been split into pieces using Shamir's Secret Sharing.
Under the Hood: Shamir's Secret Sharing was invented in 1979 by Adi Shamir — the same Shamir who is the "S" in RSA encryption. The math is polynomial interpolation: any `k` points on a polynomial of degree `k-1` can reconstruct the polynomial, but `k-1` points reveal absolutely nothing about its constant term, which is the secret. Vault's default is 5 shares with a threshold of 3 — meaning any 3 of the 5 key holders can unseal Vault, but 2 key holders together learn nothing about the master key.

Trivia: The unseal ceremony is modeled on nuclear launch key concepts and bank vault procedures — multiple people must act together, preventing any single person from accessing the secrets alone. This is called "split knowledge" in the security world.
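The threshold property is easier to see in code. Below is a toy Python sketch of Shamir's scheme over a prime field. It is illustrative only. Vault's own implementation works byte by byte over GF(2^8), and the prime, the function names and the integer encoding here are assumptions chosen for readability.

```python
import secrets

# Toy field: a Mersenne prime larger than any secret we split (assumption).
PRIME = 2**127 - 1


def split(secret: int, shares: int, threshold: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 < threshold <= shares:
        raise ValueError("need 1 < threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, reduced mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    # Share i is the point (i, f(i)); x = 0 (the secret) is never handed out.
    return [(x, f(x)) for x in range(1, shares + 1)]


def combine(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the points at x = 0 to recover the constant term."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


if __name__ == "__main__":
    root_key = secrets.randbelow(PRIME)
    key_shares = split(root_key, shares=5, threshold=3)
    assert combine(key_shares[:3]) == root_key                     # any 3 of 5 work
    assert combine([key_shares[0], key_shares[2], key_shares[4]]) == root_key
    assert combine(key_shares[:2]) != root_key                     # 2 shares do not
```

With Vault's defaults of 5 shares and a threshold of 3, any three points rebuild the key. Two points only define a line, and that line's value at zero is unrelated to the key.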
```bash
# First-time initialization: creates the master key and splits it
vault operator init -key-shares=5 -key-threshold=3
# Output:
# Unseal Key 1: <unseal-key-1>
# Unseal Key 2: <unseal-key-2>
# Unseal Key 3: <unseal-key-3>
# Unseal Key 4: <unseal-key-4>
# Unseal Key 5: <unseal-key-5>
# Initial Root Token: hvs.abc123def456
#
# ⚠ Store each key with a DIFFERENT person. Losing enough keys = losing Vault forever.

# Unsealing: three different people each provide their key
# (run "vault operator unseal" with no argument to be prompted instead,
# which keeps the key out of shell history)
vault operator unseal <unseal-key-1>   # Person 1 — "Unseal Progress 1/3"
vault operator unseal <unseal-key-2>   # Person 2 — "Unseal Progress 2/3"
vault operator unseal <unseal-key-3>   # Person 3 — "Sealed: false" 🎉

# Check status
vault status
# Sealed          false
# HA Enabled      true
# Version         1.15.4
```
### Auto-unseal: because 3am pages are not a ceremony
Manual unsealing is beautifully secure and operationally terrible. If Vault restarts at 3am — after a kernel update, a node eviction, a power blip — it comes up sealed. No secrets are served. Every application that depends on Vault starts failing. Someone has to wake up, coordinate with two other key holders, and unseal.
Auto-unseal delegates the master key to a cloud KMS:
```hcl
# vault.hcl — auto-unseal with AWS KMS
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal-key"
}
```
Now Vault unseals itself on restart using the KMS key. The security model shifts: instead of protecting 5 key shares, you protect one KMS key's IAM permissions.
Gotcha: A team configured auto-unseal with AWS KMS, then rebuilt their Vault cluster in a new AWS account without migrating the KMS key. The old KMS key was deleted. The Vault data was encrypted with a key that no longer existed. All secrets were permanently lost. Always back up the KMS key ARN, ensure cross-account access, and test recovery before you need it.
Remember: Mnemonic for Vault health endpoint status codes: "200 happy, 429 waiting, 501 naked, 503 locked." 200 = active and ready. 429 = standby node in HA (waiting its turn). 501 = not initialized (no keys yet — naked). 503 = sealed (locked up tight).
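The mnemonic follows a fixed precedence. The Python sketch below models how a node chooses its `GET /v1/sys/health` status code, for example when you write a load-balancer check. It is a simplified model, not Vault's code. It ignores the Enterprise codes 472 and 473, and of the endpoint's query parameters it models only `standbyok`, which makes a standby node answer 200.

```python
def health_status(initialized: bool, sealed: bool, standby: bool,
                  standbyok: bool = False) -> int:
    """Status code a node would return from /v1/sys/health (simplified model).

    Precedence: "not initialized" beats "sealed", which beats "standby".
    """
    if not initialized:
        return 501  # "naked": no keys yet
    if sealed:
        return 503  # "locked"
    if standby:
        return 200 if standbyok else 429  # "waiting" unless standbyok is set
    return 200  # "happy": active and ready
```

For example, `health_status(True, False, True)` is 429, so a load balancer that only accepts 200 sends traffic to the active node alone.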
## Flashcard Check #1
| Question | Answer (cover this column) |
|---|---|
| What does "sealed" mean in Vault? | Vault has encrypted data but cannot decrypt it — the master key is not in memory |
| How many key shares are needed to unseal with default settings? | 3 of 5 (Shamir's Secret Sharing) |
| What algorithm splits the master key? | Shamir's Secret Sharing (1979) — polynomial interpolation |
| What does auto-unseal replace? | Manual unseal ceremony — delegates to a cloud KMS |
| What happens if Vault restarts without auto-unseal? | It starts sealed. No secrets can be read until humans unseal it |
## Part 4: Secret Engines — The Vaults Inside the Vault
Vault is not one big bucket. It's a collection of secret engines, each mounted at a path, each with different superpowers.
### KV v2: the simple one
Key-Value version 2 stores static secrets with versioning. Think of it as an encrypted, access-controlled, audited configuration store.
```bash
# Enable KV v2 at the "secret/" path
vault secrets enable -path=secret kv-v2

# Store a secret
vault kv put secret/myapp/database \
  username="dbadmin" \
  password="s3cur3_p@ss" \
  host="db.example.com"

# Read it back
vault kv get secret/myapp/database
# Key         Value
# ---         -----
# host        db.example.com
# password    s3cur3_p@ss
# username    dbadmin

# Read just one field
vault kv get -field=password secret/myapp/database
# s3cur3_p@ss

# Oops, bad password. Update it:
vault kv put secret/myapp/database \
  username="dbadmin" \
  password="n3w_p@ss_2026" \
  host="db.example.com"

# But the old version is still there (v2 keeps history):
vault kv get -version=1 secret/myapp/database
# password = s3cur3_p@ss  ← version 1

# Soft-delete (recoverable):
vault kv delete secret/myapp/database

# Permanent destroy:
vault kv destroy -versions=1 secret/myapp/database
```
Gotcha: KV v2 adds `/data/` and `/metadata/` to internal paths. The CLI hides this — `vault kv get secret/myapp` works fine. But policies must use the internal path: `path "secret/data/myapp/*"`, not `path "secret/myapp/*"`. This mismatch is the #1 cause of "permission denied" errors when people first set up Vault policies. Use `vault kv get -output-policy secret/myapp/config` to see the exact policy path you need.
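The CLI's translation is mechanical, so it can be written down. The Python helper below is hypothetical and not part of any Vault library. It maps a CLI-style KV v2 path to the internal API paths a policy has to name, and it assumes the engine is mounted at `secret/`.

```python
def kv2_api_paths(cli_path: str, mount: str = "secret") -> dict[str, str]:
    """Map a CLI-style KV v2 path to the internal paths that policies must use."""
    prefix = mount.strip("/") + "/"
    if not cli_path.startswith(prefix):
        raise ValueError(f"{cli_path!r} is not under the {prefix!r} mount")
    rest = cli_path[len(prefix):]
    # KV v2 spreads one logical secret across several API prefixes.
    return {
        "data": f"{prefix}data/{rest}",          # kv get / kv put
        "metadata": f"{prefix}metadata/{rest}",  # kv list, version history
        "delete": f"{prefix}delete/{rest}",      # soft-delete specific versions
        "undelete": f"{prefix}undelete/{rest}",  # restore soft-deleted versions
        "destroy": f"{prefix}destroy/{rest}",    # kv destroy (permanent)
    }
```

For example, `kv2_api_paths("secret/myapp/*")["data"]` gives `secret/data/myapp/*`, which is the string that belongs in the policy.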
### Database engine: credentials that self-destruct
This is where Vault gets interesting. Instead of storing a database password, Vault creates a temporary database user on the fly and destroys it when the lease expires.
```bash
# Enable the database secrets engine
vault secrets enable database

# Tell Vault how to connect to PostgreSQL
vault write database/config/mydb \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/mydb" \
  allowed_roles="readonly,readwrite" \
  username="vault_admin" \
  password="vault_admin_pass"

# Define a role: what kind of user to create
vault write database/roles/readonly \
  db_name=mydb \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"
```
Now watch this:
```bash
# Generate temporary credentials
vault read database/creds/readonly
# Key                Value
# ---                -----
# lease_id           database/creds/readonly/abc123
# lease_duration     1h
# username           v-token-readonly-8hF3kL
# password           A1b-2Cd-3Ef-4Gh

# That username and password work RIGHT NOW:
psql -h db.example.com -U v-token-readonly-8hF3kL -d mydb
# Connected!

# In 1 hour, when the lease expires, Vault revokes the user,
# roughly equivalent to running:
# REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM "v-token-readonly-8hF3kL";
# DROP ROLE "v-token-readonly-8hF3kL";
```
Every request gets a unique username and password. If one leaks, it affects one consumer
for at most one hour. Compare that to prod-db-2024! sitting in Git for eleven days.
Mental Model: Static secrets are like giving everyone a copy of the office master key. Dynamic secrets are like a receptionist who creates a badge for each visitor, sets it to expire at 5pm, and shreds it if they don't return it. The receptionist is Vault. The badge is the lease.
### Other engines worth knowing
| Engine | What it does | When to use it |
|---|---|---|
| PKI | Issues TLS certificates from an internal CA | Service-to-service mTLS, short-lived certs |
| AWS | Generates temporary IAM credentials | Applications that need AWS access |
| Transit | Encrypts/decrypts data without exposing keys | Application-level encryption (Vault never stores your data) |
| SSH | Signs SSH certificates or generates OTPs | SSH access without distributing private keys |
```bash
# Transit: encryption without seeing the key
vault secrets enable transit
vault write -f transit/keys/myapp-key

# printf (not echo) avoids base64-encoding a trailing newline
vault write transit/encrypt/myapp-key \
  plaintext="$(printf '%s' "credit-card-1234" | base64)"
# ciphertext: vault:v1:8SDd3WHDOjf7...

vault write transit/decrypt/myapp-key \
  ciphertext="vault:v1:8SDd3WHDOjf7..."
# plaintext: Y3JlZGl0LWNhcmQtMTIzNA== (base64 of "credit-card-1234")
```
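Under the Hood: Transit key rotation creates a new key version but keeps old versions for decryption. You can set `min_decryption_version` to a higher number so Vault refuses to decrypt ciphertext made with older versions. To make that irreversible, also trim the old versions with the key's `trim` endpoint. Until you do, raising `min_decryption_version` can be undone. Trimming is what performs crypto-shredding: old ciphertext becomes permanently unreadable without deleting the ciphertext itself. This is useful for GDPR "right to be forgotten" compliance.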
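Two details of the transit workflow can be checked without a Vault server. First, the plaintext must be base64-encoded, and `echo` silently appends a newline that then gets encoded too. Second, every ciphertext names the key version that produced it in its `vault:vN:` prefix, and that version is what `min_decryption_version` is compared against. The Python below is a hypothetical offline model of both, not Vault's implementation.

```python
import base64
import re

# Transit ciphertext looks like "vault:v<version>:<base64 payload>".
_CIPHERTEXT = re.compile(r"^vault:v(\d+):")


def encode_plaintext(text: str) -> str:
    """Base64-encode exactly the given characters (no trailing newline added)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def key_version(ciphertext: str) -> int:
    """Extract the key version that produced a transit ciphertext."""
    match = _CIPHERTEXT.match(ciphertext)
    if match is None:
        raise ValueError("not a transit ciphertext")
    return int(match.group(1))


def decryptable(ciphertext: str, min_decryption_version: int,
                latest_version: int) -> bool:
    """Model: only versions in [min_decryption_version, latest] can decrypt."""
    return min_decryption_version <= key_version(ciphertext) <= latest_version
```

For example, `encode_plaintext("credit-card-1234")` gives `Y3JlZGl0LWNhcmQtMTIzNA==`, while the same text with a trailing newline (what `echo` produces) gives `Y3JlZGl0LWNhcmQtMTIzNAo=`. After you raise `min_decryption_version` to 2, a `vault:v1:` ciphertext is no longer decryptable.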
## Part 5: Auth Methods — Proving You Are Who You Claim
Vault doesn't hand secrets to anonymous callers. Every request requires a token, and tokens come from authentication.
### Token auth (the foundation)
Every auth method eventually produces a token. Tokens are the internal currency.
```bash
# Create a token with a specific policy and TTL
vault token create -policy=myapp-secrets -ttl=1h
# Key               Value
# ---               -----
# token             hvs.CAESIJ...
# token_policies    [default myapp-secrets]
# token_ttl         1h

# Use it
export VAULT_TOKEN="hvs.CAESIJ..."
vault kv get secret/myapp/database
```
### AppRole (for machines and CI/CD)
AppRole splits authentication into two pieces: a role_id (like a username — long-lived,
baked into config) and a secret_id (like a password — short-lived, delivered at runtime).
```bash
vault auth enable approle

# Create a role for CI pipelines
vault write auth/approle/role/ci-pipeline \
  token_policies="ci-deploy" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=10m

# Get the role ID (bake this into the CI image)
vault read auth/approle/role/ci-pipeline/role-id
# role_id: 7a6b8c9d-e0f1-2345-6789-abcdef012345

# Generate a secret ID (inject this at runtime via trusted orchestrator)
vault write -f auth/approle/role/ci-pipeline/secret-id
# secret_id: a1b2c3d4-5678-90ab-cdef-ghijklmnopqr (expires in 10 minutes)

# Login
vault write auth/approle/login \
  role_id="7a6b8c9d-e0f1-2345-6789-abcdef012345" \
  secret_id="a1b2c3d4-5678-90ab-cdef-ghijklmnopqr"
# Returns: a Vault token with the ci-deploy policy
```
Under the Hood: The split between `role_id` and `secret_id` is deliberate. The role ID is safe to bake into a container image — it identifies which role, but can't authenticate alone. The secret ID is the short-lived credential injected at runtime by a trusted orchestrator (Terraform, Kubernetes, your CI platform). Neither piece is useful without the other.
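Under the CLI, an AppRole login is a single HTTP call. You POST both halves to `/v1/auth/approle/login`, and the token comes back in the JSON response at `auth.client_token`. The Python sketch below uses only the standard library and only builds the request; nothing is sent. The Vault address and IDs are placeholders.

```python
import json
import urllib.request


def approle_login_request(vault_addr: str, role_id: str, secret_id: str,
                          mount: str = "approle") -> urllib.request.Request:
    """Build (but do not send) the HTTP request behind `vault write auth/approle/login`."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_from_login_response(payload: bytes) -> str:
    """Extract the Vault token from a login response body."""
    return json.loads(payload)["auth"]["client_token"]
```

To actually log in, send the request with `urllib.request.urlopen(req)` and pass the response body to `token_from_login_response`. In CI, the role ID would come from the image and the secret ID from the orchestrator at runtime.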
### Kubernetes auth (the most common in practice)
A pod authenticates to Vault using its ServiceAccount JWT token. Vault validates the token against the Kubernetes API server.
```bash
vault auth enable kubernetes

vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc"

vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp-sa \
  bound_service_account_namespaces=production \
  policies=myapp-secrets \
  ttl=1h
```
Now any pod running as ServiceAccount myapp-sa in the production namespace can
authenticate to Vault and get a token with the myapp-secrets policy. No passwords
anywhere.
Gotcha: Never use `bound_service_account_namespaces=["*"]` in production. This lets any namespace with a matching ServiceAccount name authenticate as this role — including test namespaces, CI namespaces, and anything a developer creates. Always specify exact namespaces.
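The binding check behind that gotcha is easy to model. Below is a simplified Python sketch of how a role's bound names and namespaces gate a login. It is an assumption, not Vault's code, and it treats `"*"` as the only wildcard. Alongside it is the JSON body a pod sends to `/v1/auth/kubernetes/login`, with the token read from the standard ServiceAccount token path.

```python
import json

# Default location of the pod's ServiceAccount token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def role_allows(sa_name: str, namespace: str,
                bound_names: list[str], bound_namespaces: list[str]) -> bool:
    """Simplified model: '*' matches anything, otherwise require an exact match."""
    name_ok = "*" in bound_names or sa_name in bound_names
    ns_ok = "*" in bound_namespaces or namespace in bound_namespaces
    return name_ok and ns_ok


def kubernetes_login_body(role: str, token_path: str = SA_TOKEN_PATH) -> str:
    """JSON body for POST /v1/auth/kubernetes/login."""
    with open(token_path, encoding="utf-8") as fh:
        jwt = fh.read().strip()
    return json.dumps({"role": role, "jwt": jwt})
```

For example, `role_allows("myapp-sa", "dev-sandbox", ["myapp-sa"], ["*"])` returns `True`. That is exactly the hole the gotcha warns about: any namespace that creates a ServiceAccount named `myapp-sa` gets in.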
### OIDC (for humans)
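```bash
vault auth enable oidc

vault write auth/oidc/config \
  oidc_discovery_url="https://accounts.google.com" \
  oidc_client_id="vault-app" \
  oidc_client_secret="client-secret-here" \
  default_role="engineer"

# The "engineer" role must exist too; 8250 is the CLI's local callback port
# (the "engineer" policy name is an example)
vault write auth/oidc/role/engineer \
  user_claim="sub" \
  allowed_redirect_uris="http://localhost:8250/oidc/callback" \
  token_policies="engineer"

# Login opens a browser
vault login -method=oidc role=engineer
```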
## Part 6: Policies — The Principle of Least Privilege in HCL
Policies control what a token can do. They're written in HCL, path-based, and default to deny everything.
```hcl
# myapp-policy.hcl

# Read application secrets
path "secret/data/myapp/*" {
  capabilities = ["read", "list"]
}

# Generate dynamic database credentials
path "database/creds/readonly" {
  capabilities = ["read"]
}

# Allow the token to manage itself
path "auth/token/renew-self" {
  capabilities = ["update"]
}

path "auth/token/lookup-self" {
  capabilities = ["read"]
}

# Explicitly deny access to other apps
path "secret/data/billing/*" {
  capabilities = ["deny"]
}
```
The capabilities are: `create`, `read`, `update`, `patch`, `delete`, `list`, `sudo`, `deny`. Deny always wins.
```bash
# Apply the policy
vault policy write myapp-secrets myapp-policy.hcl

# Test what a token can actually do
vault token capabilities hvs.CAESIJ... secret/data/myapp/database
# list, read

vault token capabilities hvs.CAESIJ... secret/data/billing/stripe-key
# deny
```
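The evaluation rules (default deny, deny wins) fit in a few lines of Python. This sketch is a simplification, not Vault's algorithm. It supports only exact paths and a trailing `*` glob, whereas real policies also allow `+` segment wildcards and have finer priority rules. Within each policy it picks the most specific matching rule, then it unions the results across policies and lets `deny` override everything.

```python
from __future__ import annotations

# A policy maps path patterns to capability sets, e.g. {"secret/data/myapp/*": {"read"}}.
Policy = dict[str, set[str]]


def _specificity(pattern: str, path: str) -> tuple[int, int] | None:
    """Rank a matching pattern: exact matches beat globs, longer globs beat shorter."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return (0, len(prefix)) if path.startswith(prefix) else None
    return (1, len(pattern)) if pattern == path else None


def capabilities(policies: list[Policy], path: str) -> set[str]:
    """Effective capabilities of a token carrying `policies` on `path`."""
    granted: set[str] = set()
    for policy in policies:
        best, best_caps = None, set()
        for pattern, caps in policy.items():
            score = _specificity(pattern, path)
            if score is not None and (best is None or score > best):
                best, best_caps = score, caps
        granted |= best_caps
    if "deny" in granted or not granted:
        return {"deny"}  # explicit deny wins; no match at all means default deny
    return granted
```

With the `myapp-secrets` rules from the policy file above, `secret/data/myapp/database` yields `{"list", "read"}`. `secret/data/billing/stripe-key` yields `{"deny"}`, even if a second policy grants `read` on billing.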
Gotcha: The `root` policy bypasses everything and never expires. After initial setup, revoke the root token immediately: `vault token revoke <root-token>`. If you need root access later, regenerate a temporary one with `vault operator generate-root` using the unseal keys, use it, then revoke it again. Services should never authenticate with root.
## Flashcard Check #2
| Question | Answer (cover this column) |
|---|---|
| What are the four Vault pillars? | Secret engine, auth method, policy, lease |
| What's the difference between static and dynamic secrets? | Static are stored and retrieved; dynamic are generated on demand with a TTL and auto-revoked |
| Why does KV v2 cause "permission denied" surprises? | Policies need secret/data/... path, but CLI uses secret/... — the /data/ is hidden |
| What does AppRole's `secret_id` do vs `role_id`? | `role_id` = identity (long-lived), `secret_id` = credential (short-lived, runtime-injected) |
| What's the default policy behavior? | Deny everything — you must explicitly grant access |
| Why is root token dangerous in production? | Bypasses all policies, never expires, full access to everything |
## Part 7: Dynamic Database Secrets — The Full Walkthrough
Back to our mission. We rotated the leaked password manually. Now we're going to make passwords unnecessary. Here's the complete setup, step by step.
### Step 1: Vault needs a privileged database account
Vault itself needs credentials that can create and destroy database users. This is the only long-lived credential — and it stays inside Vault, never in Git.
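```bash
# Create a Vault-managed admin user in the app database. WITH GRANT OPTION lets
# it pass table privileges on to the users it creates.
psql -h db.example.com -U postgres -d myapp -c \
  "CREATE ROLE vault_admin WITH LOGIN PASSWORD 'vault-managed-2026' CREATEROLE;
   GRANT ALL ON SCHEMA public TO vault_admin;
   GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO vault_admin WITH GRANT OPTION;"
```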
### Step 2: Configure the database engine
```bash
vault secrets enable database

vault write database/config/production-db \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/myapp" \
  allowed_roles="app-readonly,app-readwrite" \
  username="vault_admin" \
  password="vault-managed-2026"

# Rotate the root password so even you don't know it anymore
vault write -f database/rotate-root/production-db
# Now Vault has changed vault_admin's password to something only Vault knows
```
That last command is important. After rotate-root, the password you typed on the command
line is no longer valid. Vault generated a new one and stored it internally. No human knows
the database admin password.
### Step 3: Create roles
```bash
# Read-only role: SELECT only, 1-hour TTL.
# PostgreSQL refuses to DROP a role that still holds privileges, so revoke first.
vault write database/roles/app-readonly \
  db_name=production-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; \
    DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

# Read-write role: full DML, 30-minute TTL (shorter = less risk)
vault write database/roles/app-readwrite \
  db_name=production-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' \
    VALID UNTIL '{{expiration}}'; \
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; \
    DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="30m" \
  max_ttl="4h"
```
### Step 4: Use it
```bash
# Application requests credentials
vault read database/creds/app-readonly
# Key                Value
# ---                -----
# lease_id           database/creds/app-readonly/7yKz...
# lease_duration     1h
# username           v-approle-app-read-Xk9mN2
# password           B4f-7Gh-1Jk-3Lm

# That credential works immediately
psql -h db.example.com -U v-approle-app-read-Xk9mN2 -d myapp -c "SELECT count(*) FROM users;"
#  count
# -------
#  42857

# After 1 hour: Vault drops the role. The credential is dead.
```
### Step 5: Lease management
```bash
# Renew a lease before it expires (extend the TTL)
vault lease renew database/creds/app-readonly/7yKz...
# lease_duration: 1h (reset from now)

# Renew with a specific increment
vault lease renew -increment=30m database/creds/app-readonly/7yKz...

# Revoke immediately (emergency rotation)
vault lease revoke database/creds/app-readonly/7yKz...

# Nuclear option: revoke ALL credentials for this role
vault lease revoke -prefix database/creds/app-readonly/
```
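Renewal never extends a lease past the role's `max_ttl`, which is counted from when the credentials were issued. Below is a minimal Python model of that rule. The class is hypothetical and times are in seconds. It uses the `app-readonly` role's 1h default TTL and 24h maximum.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    issued_at: float   # when the credentials were created
    expires_at: float  # current expiry
    max_ttl: float     # hard ceiling, measured from issued_at

    def renew(self, now: float, increment: float) -> float:
        """Extend the lease to now + increment, capped at issued_at + max_ttl.

        Returns the new remaining TTL in seconds.
        """
        if now >= self.expires_at:
            raise ValueError("lease already expired; request new credentials")
        hard_limit = self.issued_at + self.max_ttl
        self.expires_at = min(now + increment, hard_limit)
        return self.expires_at - now
```

For example, `Lease(0, 3600, 86400).renew(now=2400, increment=3600)` grants the full 3600 seconds. A lease issued at 0 and renewed at second 84000 only gets 2400 seconds, because it hits the 24-hour wall. Once renewal returns less than you asked for, the application has to request fresh credentials.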
Gotcha: If your application requests credentials at startup and never renews the lease, the credentials die silently after the TTL expires. Long-running batch jobs are especially vulnerable — they start at noon with a 1-hour lease and fail at 1:01pm with a cryptic database connection error. Use a Vault SDK with built-in lease renewal, or implement renewal at 2/3 of the TTL interval.
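The 2/3 rule becomes a small loop. In this Python sketch, `renew` is a hypothetical callable standing in for whatever your client uses to renew the lease, and it returns the new lease duration in seconds. The loop exits when renewals stop buying useful time, so the caller can fetch new credentials instead of failing at 1:01pm.

```python
import time
from typing import Callable


def keep_lease_alive(renew: Callable[[], float], lease_duration: float,
                     sleep: Callable[[float], None] = time.sleep,
                     min_useful_ttl: float = 60.0) -> int:
    """Renew at 2/3 of each lease duration until renewals hit the max_ttl wall.

    Returns the number of renewals performed. When it returns, the caller
    should request fresh credentials.
    """
    renewals = 0
    while lease_duration >= min_useful_ttl:
        sleep(lease_duration * 2 / 3)  # keep 1/3 of the TTL as a safety margin
        lease_duration = renew()
        renewals += 1
    return renewals
```

The `sleep` parameter is injectable so that the loop can be exercised without waiting an hour. In production you keep the default `time.sleep` and run the loop in a background thread.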
## Part 8: Vault Agent — The Secret Delivery Truck
Your application shouldn't need to know about Vault. It should just read a file. Vault Agent handles authentication, token renewal, and secret rendering as a sidecar process.
```hcl
# vault-agent.hcl
auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "myapp"
    }
  }

  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}

template {
  source      = "/etc/vault/templates/db.tpl"
  destination = "/app/config/database.env"
  perms       = 0600
  command     = "pkill -HUP myapp" # signal the app to reload
}

vault {
  address = "https://vault.example.com:8200"
}
```
The template file uses Consul Template syntax:
```text
{{ with secret "database/creds/app-readonly" }}
DB_HOST=db.example.com
DB_USER={{ .Data.username }}
DB_PASS={{ .Data.password }}
{{ end }}
```
```bash
# Start the agent
vault agent -config=vault-agent.hcl

# The agent:
# 1. Authenticates to Vault using the pod's ServiceAccount
# 2. Renders the template with live credentials
# 3. Writes /app/config/database.env with mode 0600
# 4. Signals the app to reload
# 5. Renews the lease before it expires
# 6. Re-renders the template with new credentials when the old ones expire
```
Your application just reads /app/config/database.env. It never touches the Vault API.
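On the application side, "just reads a file" can be this small. The Python sketch below parses the `KEY=value` lines the template renders and re-reads them when the agent's `command` sends SIGHUP. The class and function names are hypothetical. A real app would also reopen its database connection pool after a reload.

```python
import signal


def parse_env_file(path: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")  # split on the first '=' only
            values[key.strip()] = value.strip()
    return values


class DatabaseSettings:
    """Holds the current credentials and reloads them on demand."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.values = parse_env_file(path)

    def reload(self, signum=None, frame=None) -> None:
        # Signature also fits a signal handler: handler(signum, frame).
        self.values = parse_env_file(self.path)


def install_sighup_reload(settings: DatabaseSettings) -> None:
    """Re-read the file whenever the agent runs `pkill -HUP myapp` (POSIX only)."""
    signal.signal(signal.SIGHUP, settings.reload)
```

At startup you would call `settings = DatabaseSettings("/app/config/database.env")` and then `install_sighup_reload(settings)`. Splitting on the first `=` keeps passwords that contain `=` intact.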
## Part 9: Vault in Kubernetes — Three Approaches
### Option 1: Vault Agent Injector (most common)
Uses a mutating admission webhook to inject a Vault Agent sidecar into pods automatically.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "myapp"
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/app-readonly"
        vault.hashicorp.com/agent-inject-template-db-creds: |
          {{- with secret "database/creds/app-readonly" -}}
          postgresql://{{ .Data.username }}:{{ .Data.password }}@db.example.com:5432/myapp
          {{- end -}}
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: app
          image: myapp:latest
          # The rendered connection string is written to /vault/secrets/db-creds;
          # the app reads it from that file rather than from the env value itself
          env:
            - name: DATABASE_URL_FILE
              value: "/vault/secrets/db-creds"
```
### Option 2: Vault CSI Provider
Mounts secrets as volumes via the Secrets Store CSI Driver. No sidecar — secrets appear as files in a mounted volume.
```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-db-creds
spec:
  provider: vault
  parameters:
    roleName: "myapp"
    vaultAddress: "https://vault.example.com:8200"
    objects: |
      - objectName: "db-password"
        secretPath: "database/creds/app-readonly"
        secretKey: "password"
```
### Option 3: Vault Secrets Operator (VSO)
The newest option. A Kubernetes-native operator that syncs Vault secrets into Kubernetes Secret objects.
```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: db-creds
spec:
  mount: database
  path: creds/app-readonly
  destination:
    name: myapp-db-creds
    create: true
  rolloutRestartTargets:
    - kind: Deployment
      name: myapp
```
VSO automatically triggers a rollout restart when the dynamic credentials rotate. This solves the stale-secret problem for applications that only read their credentials at startup.
| Approach | Sidecar? | Auto-rotation? | K8s Secret created? |
|---|---|---|---|
| Agent Injector | Yes | Yes (agent renews) | No (writes to file) |
| CSI Provider | No | Limited | Optional |
| VSO | No | Yes (operator handles) | Yes |
## Flashcard Check #3
| Question | Answer (cover this column) |
|---|---|
| What does `vault write -f database/rotate-root/...` do? | Changes the DB admin password to a random one only Vault knows |
| Why set `default_ttl="1h"` on a database role? | Credentials auto-expire after 1 hour, limiting blast radius of a leak |
| What does Vault Agent do for applications? | Handles auth, token renewal, secret rendering, and lease management — app just reads a file |
| What's the KV v2 `/data/` path trap? | CLI uses `secret/myapp`, but policies must use `secret/data/myapp` |
| How does Vault Secrets Operator handle credential rotation? | Syncs new credentials to a K8s Secret and triggers rollout restart |
## Part 10: High Availability and Disaster Recovery
### Raft consensus (integrated storage)
Vault's recommended storage backend is integrated Raft storage. Three or five nodes form a consensus cluster. One is the active leader; the rest are standby replicas.
```bash
# Check cluster membership
vault operator raft list-peers
# Node       Address            State       Voter
# ----       -------            -----       -----
# vault-0    10.0.1.10:8201     leader      true
# vault-1    10.0.1.11:8201     follower    true
# vault-2    10.0.1.12:8201     follower    true

# Check autopilot health
vault operator raft autopilot state
# Healthy: true
# Leader:  vault-0
```
### Snapshots
```bash
# Take a snapshot (backup)
vault operator raft snapshot save vault-backup-2026-03-23.snap

# Restore from a snapshot
vault operator raft snapshot restore vault-backup-2026-03-23.snap

# Automate daily snapshots
# (cron, systemd timer, or Kubernetes CronJob)
0 2 * * * vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d).snap
```
War Story: A three-node Vault cluster ran on Kubernetes. A network partition isolated the leader from the two followers. The followers elected a new leader — this is Raft working correctly. But the old leader didn't know it was deposed. It continued accepting writes for about 10 seconds (within the lease heartbeat window) before realizing it was partitioned. It then sealed itself — which is the correct safety behavior. But the applications connected to the old leader suddenly lost their Vault connection. The new leader was serving secrets fine, but the apps had cached the old leader's address. The fix was configuring applications to use the Vault service DNS name (which follows the active leader) rather than a specific pod IP. Lesson: Vault's Raft failover works, but your clients need to follow the leader.
### Disaster recovery in practice
| What failed | What you do |
|---|---|
| Single node down | Raft handles it — majority still has quorum |
| Two of three nodes down | Cluster loses quorum and stops serving requests. Bring a node back, or recover manually with a `peers.json` file |
| All nodes down | Restore from snapshot. Re-initialize if snapshots are lost |
| KMS key deleted (auto-unseal) | Data is permanently unrecoverable. This is why you test DR |
| Unseal keys lost (manual seal) | Data is permanently unrecoverable. This is why you use auto-unseal |
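The first two rows follow from Raft's majority rule: a cluster of n voters needs floor(n/2) + 1 of them to elect a leader and commit writes. A quick Python check shows why clusters are sized at three or five nodes, and why a fourth node buys no extra fault tolerance.

```python
def quorum(voters: int) -> int:
    """Votes needed for a Raft majority."""
    return voters // 2 + 1


def failures_tolerated(voters: int) -> int:
    """How many voters can fail while the cluster keeps quorum."""
    return voters - quorum(voters)


for n in (1, 3, 4, 5):
    print(f"{n} voters: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Three voters tolerate one failure, four still tolerate only one, and five tolerate two.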
## Part 11: Audit Logging — Trust but Verify
Vault can log every single request and response. In production, this is non-negotiable.
```bash
# Enable file-based audit logging
vault audit enable file file_path=/var/log/vault/audit.log

# Enable syslog too (belt and suspenders)
vault audit enable -path=syslog syslog

# Check active audit devices
vault audit list
```
The audit log is JSON, one object per line. Every request produces a `request` entry and a matching `response` entry, distinguished by the `type` field:
```bash
# Who's been reading our database credentials?
jq 'select(.type == "request" and .request.path == "database/creds/app-readonly") |
    {time: .time, remote: .request.remote_address, accessor: .auth.accessor}' \
  /var/log/vault/audit.log

# Count requests per path (find the hot paths)
jq -r 'select(.type == "request") | .request.path' /var/log/vault/audit.log |
  sort | uniq -c | sort -rn | head -10

# Find failed requests (misconfigured apps or attackers)
jq 'select(.type == "response" and .error != null) |
    {time: .time, path: .request.path, error: .error}' /var/log/vault/audit.log
```
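When jq gets unwieldy, the same questions are easy to answer in Python. This sketch assumes the file device's JSON-lines format, with `type`, `time`, `request.path` and a top-level `error` field on failed entries.

```python
import json
from collections import Counter
from typing import Iterable


def load_entries(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def hot_paths(entries: list[dict], top: int = 10) -> list[tuple[str, int]]:
    """Count request entries per path (counting responses too would double-count)."""
    counts = Counter(e["request"]["path"] for e in entries if e.get("type") == "request")
    return counts.most_common(top)


def failures(entries: list[dict]) -> list[tuple[str, str, str]]:
    """(time, path, error) for every response entry that carries an error."""
    return [
        (e.get("time", ""), e["request"]["path"], e["error"])
        for e in entries
        if e.get("type") == "response" and e.get("error")
    ]
```

To load a real log, run `entries = load_entries(fh)` inside `with open("/var/log/vault/audit.log", encoding="utf-8") as fh:`.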
Gotcha: Vault will refuse to serve any request if all configured audit devices fail. This is fail-closed by design — better to be unavailable than unaudited. Monitor your audit log disk space. If the audit log partition fills up, Vault stops working entirely. Ship logs to a SIEM and rotate aggressively.
Under the Hood: Vault HMACs sensitive fields in audit logs by default: the values are hashed, not plaintext. This means you can see that a secret was read but not what the value was. If you need to correlate a leaked value back to an accessor, ask Vault to compute the HMAC of the suspect value with the audit device's hash endpoint (for example `vault write sys/audit-hash/file input="<suspect value>"`, where `file` is the audit device's path), then search the logs for the result.
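Searching works because the hash is an HMAC-SHA256 keyed with a salt that only Vault holds, so the same input always produces the same string. You don't have the salt, which is why Vault has to compute the hash for you. The Python below only illustrates the idea with a made-up salt, formatted with the `hmac-sha256:` prefix that hashed fields carry in the audit log.

```python
import hashlib
import hmac


def audit_style_hash(salt: bytes, value: str) -> str:
    """HMAC-SHA256 of `value`, formatted like a hashed field in a Vault audit entry."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "hmac-sha256:" + digest


def find_matches(lines: list[str], needle_hash: str) -> list[int]:
    """Indexes of audit lines that contain the hashed value."""
    return [i for i, line in enumerate(lines) if needle_hash in line]
```

Because the output depends on the salt, a hash computed with any key other than Vault's own will never match the log.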
## Part 12: Putting It All Together — From Leak to Lockdown
Let's revisit the mission. We started with a password in Git. Here's the complete before and after:
Before:

```text
Developer → hardcodes DB password → commits to Git → password exposed for 11 days
                                          ↓
                          Every clone has the password forever
```

After:

```text
Pod starts → ServiceAccount authenticates to Vault → Vault creates temp DB user
                                                          ↓
                                               TTL: 1 hour, auto-revoked
                                               Unique per pod, audited, renewable
```
No password in Git. No password in environment variables. No password in Kubernetes Secrets. Each password lives only inside Vault, in the rendered file of the pod that requested it, and (as a hash) in PostgreSQL's `pg_authid` catalog, for one hour at most.
## Exercises
### Exercise 1: Read a Vault secret (2 minutes)
Start a dev Vault server and store a secret:
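```bash
# Terminal 1: dev server (in-memory, already unsealed; never use in production)
vault server -dev -dev-root-token-id="dev-token"

# Terminal 2:
export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_TOKEN='dev-token'
vault kv put secret/exercise/hello message="vault works"
```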
Now retrieve just the message field using vault kv get.
### Exercise 2: Write a least-privilege policy (10 minutes)
Write a policy called web-frontend that:
- Can read secrets under secret/data/frontend/*
- Can list secrets under secret/metadata/frontend/*
- Can generate database credentials from database/creds/frontend-readonly
- Cannot access anything else
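Solution: grant `read` on `secret/data/frontend/*`, `list` on `secret/metadata/frontend/*`, and `read` on `database/creds/frontend-readonly`, each in its own `path` block. Nothing else needs a rule, because policies deny everything by default. Load it with `vault policy write web-frontend web-frontend.hcl`.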
### Exercise 3: Explain the failure (judgment call)
Your colleague shows you this Vault Agent template that isn't working:
The template renders once at startup but never updates when credentials rotate. The app crashes 1 hour after deploy. What's missing and how would you fix it?
Solution: The template renders at startup but Vault Agent doesn't re-render when the dynamic secret's lease expires. You need:

1. A `command` to signal the app when the template re-renders
2. To ensure the template references a dynamic secret (so it has a lease to track)
3. An app that can reload configuration without a full restart

Alternatively, use VSO with `rolloutRestartTargets` to trigger a deployment rollout when credentials change.

## Cheat Sheet
### Environment
```bash
export VAULT_ADDR='https://vault.example.com:8200'
export VAULT_TOKEN='hvs.xxx'      # or use vault login
export VAULT_SKIP_VERIFY=true     # dev only — skip TLS verification
```
### Core operations
| Action | Command |
|---|---|
| Check status | vault status |
| Login (userpass) | vault login -method=userpass username=alice |
| Login (AppRole) | vault write auth/approle/login role_id=X secret_id=Y |
| Read KV secret | vault kv get secret/path |
| Read one field | vault kv get -field=key secret/path |
| Write KV secret | vault kv put secret/path key=value |
| Get dynamic DB creds | vault read database/creds/role-name |
| Renew a lease | vault lease renew <lease-id> |
| Revoke a lease | vault lease revoke <lease-id> |
| Revoke all for a role | vault lease revoke -prefix database/creds/role/ |
| Write a policy | vault policy write name file.hcl |
| Check capabilities | vault token capabilities <token> <path> |
| Unseal | vault operator unseal <key> |
| Seal (emergency) | vault operator seal |
| Snapshot (backup) | vault operator raft snapshot save file.snap |
### Health endpoint codes
| Code | Meaning |
|---|---|
| 200 | Active, unsealed, initialized |
| 429 | Standby node (HA — waiting its turn) |
| 472 | Disaster recovery replication secondary (active) |
| 473 | Performance standby |
| 501 | Not initialized |
| 503 | Sealed |
### Policy capabilities
create · read · update · patch · delete · list · sudo · deny

Deny always wins. Default is deny-all.
## Takeaways
- Static secrets are a liability. Every long-lived credential is a breach waiting to happen. Dynamic secrets eliminate the problem by making credentials ephemeral.
- Vault generates, it doesn't just store. The real power is in secret engines that create credentials on demand — database users, AWS IAM credentials, TLS certificates — all with automatic expiration.
- The seal/unseal ceremony exists for a reason. Shamir's Secret Sharing ensures no single person can access the vault. Auto-unseal trades human ceremony for KMS dependency — choose based on your operational reality.
- Policies default to deny. Write least-privilege policies using exact paths. The KV v2 `/data/` path prefix will trip you up — use `-output-policy` to see the real path.
- Vault Agent makes adoption transparent. Applications read files. The agent handles auth, renewal, and re-rendering. Your app doesn't need a Vault SDK.
- Audit everything. Vault's fail-closed audit logging means every request is recorded. If audit fails, Vault stops serving — by design.
## Related Lessons
- Secrets Management Without Tears — the broader landscape (Sealed Secrets, SOPS, ESO) beyond Vault
- Permission Denied — when access control goes wrong across multiple layers, including Vault policies
- What Happens When Your Certificate Expires — Vault's PKI engine is one solution to certificate lifecycle
- The Container Escape — why secrets in environment variables and mounted files need careful permission management
- How Incident Response Actually Works — the broader incident framework when you find a leaked credential