HashiCorp Vault Footguns

Mistakes that cause outages or security incidents.


1. Losing All Unseal Keys (Shamir Seal)

You deploy Vault with Shamir secret sharing, distribute unseal keys, and fail to store them securely offline. A key holder leaves the company or the keys get lost. The next time Vault restarts — after an OS patch, power failure, or container restart — it starts sealed and cannot be opened. The entire secrets store is permanently inaccessible.

Fix: Store all unseal key shares in separate offline locations (e.g., a 3-of-5 scheme: one share in each of three physical safes plus two held by senior engineers as PGP-encrypted backups). Test unsealing in staging quarterly. For production, use auto-unseal with a cloud KMS — Vault restarts without human intervention, and the KMS key itself becomes the single secret to protect.
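
As a sketch, auto-unseal with AWS KMS is a single seal stanza in the server configuration — the region and key alias below are placeholders for your own values:

```hcl
# Server config fragment: auto-unseal via AWS KMS instead of Shamir shares.
# region and kms_key_id are placeholders for your environment.
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal"
}
```

Adding this stanza to an existing Shamir-sealed cluster requires a one-time seal migration (unsealing with the -migrate flag); plan that as a maintenance operation.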


2. KV v2 Policy Path Mismatch

You enable the KV v2 secrets engine and write a policy with path "secret/myapp/*". The application gets 403 permission denied even though the path looks right. In KV v2, the actual storage paths include a data/ or metadata/ prefix — secret/data/myapp/* for reads and secret/metadata/myapp/* for listing. The policy must match the internal path, not the CLI shorthand.

Fix: Always write KV v2 policies with the explicit internal paths:

path "secret/data/myapp/*" { capabilities = ["read"] }
path "secret/metadata/myapp/*" { capabilities = ["list"] }
Run vault kv get -output-policy secret/myapp/config to print the exact policy snippet that command requires — a quick way to discover the internal path for any secret.


3. Root Token Left Active in Production

During initial setup you authenticate with the root token for convenience. You add it to .env files, CI/CD secrets, or Kubernetes secrets "temporarily." The root token bypasses all policies and has no expiry. It becomes the de facto permanent auth method for all services. When the token leaks — and it will — every secret Vault holds is compromised.

Fix: Revoke the root token immediately after setup using vault token revoke <root-token>. If you need root-level access later, regenerate a temporary root token via vault operator generate-root using the unseal keys, use it, then revoke it again. Services should authenticate via AppRole, Kubernetes auth, or AWS IAM auth — never root token.
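
A minimal AppRole setup, as a sketch (the role name, policy name, and TTLs are illustrative; the role_id and secret_id placeholders are filled from the preceding reads):

```shell
# Enable AppRole and create a role bound to one scoped policy.
vault auth enable approle
vault write auth/approle/role/myapp \
  token_policies="myapp-policy" \
  token_ttl=1h token_max_ttl=4h

# role_id ships with the app; secret_id is injected at deploy time.
vault read auth/approle/role/myapp/role-id
vault write -f auth/approle/role/myapp/secret-id

# The service logs in with both halves and gets a scoped, expiring token.
vault write auth/approle/login role_id=<role_id> secret_id=<secret_id>
```

The resulting token carries only myapp-policy and expires on its own — the opposite of a root token in every way that matters.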


4. Not Renewing Leases (Dynamic Secret Expiry)

Your application uses Vault's AWS dynamic credentials or database secrets with a short default TTL (1 hour). The application requests credentials at startup and never renews them. After 60 minutes the lease expires, AWS revokes the access key, and the application starts throwing 403 InvalidClientTokenId errors. This typically surfaces during long-running batch jobs or after a quiet overnight period.

Fix: Use a client that handles lease renewal automatically (the official Go client's LifetimeWatcher does this; community libraries such as Python's hvac require you to drive renewal yourself). Set default_ttl long enough for your renewal cadence and max_ttl as a safe ceiling. If you cannot rely on a library, run a background goroutine/thread that calls vault lease renew <lease_id> at 2/3 of the TTL interval. Monitor lease expirations via Vault's telemetry (e.g., the vault.expire.num_leases metric) and alert on credential auth failures in the application.
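
A hedged sketch of the TTL tuning and manual renewal flow, assuming a PostgreSQL database secrets engine mounted at database/ with a connection named mydb (all names are illustrative):

```shell
# Tune TTLs on the role: leases last 1h by default, never beyond 24h.
vault write database/roles/myapp \
  db_name=mydb \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';" \
  default_ttl=1h max_ttl=24h

# Issue credentials; note the lease_id in the response.
vault read database/creds/myapp

# Renew well before expiry (e.g., at 2/3 of the TTL, from a sidecar or cron).
vault lease renew database/creds/myapp/<lease_id>
```

Renewals extend the lease up to max_ttl; once max_ttl is reached the application must request fresh credentials, so build that re-request path too.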


5. Vault Agent Template Renders Stale Secrets

You use Vault Agent with template rendering to write secrets to files. Vault Agent renders the template once at pod startup. The underlying dynamic secret rotates or expires, but the file on disk still contains the old value. The application reads from the file and fails with an auth error minutes later.

Fix: Configure the Vault Agent template stanza's exec block (or the older command option) so the agent signals or restarts the application every time it re-renders, and set error_on_missing_key so missing secrets fail loudly instead of rendering blanks. For static KV secrets, tune static_secret_render_interval so the agent re-reads them periodically. In Kubernetes, combine the vault.hashicorp.com/agent-inject-template-* annotations with agent-inject-command-* to signal the app on rotation. Alternatively, use the Vault CSI provider with secret rotation enabled (the driver's --enable-secret-rotation and --rotation-poll-interval flags).
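
A minimal agent config fragment along these lines (file paths, the app name in the signal command, and the 5m interval are assumptions):

```hcl
# Vault Agent config fragment: re-render and signal the app on rotation.
template {
  source      = "/etc/vault/templates/db.env.ctmpl"
  destination = "/run/secrets/db.env"

  # Runs on every re-render so the app reloads rotated credentials.
  exec {
    command = ["pkill", "-HUP", "myapp"]
  }
}

# Static KV secrets have no lease; re-read them on a fixed interval.
template_config {
  static_secret_render_interval = "5m"
}
```

The key property: the file on disk and the live secret can never drift for longer than one render interval, and the app is told when the file changes.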


6. Overly Broad Policies (Path Wildcards)

Under time pressure, you write a policy like path "secret/*" { capabilities = ["read", "list"] }. This grants the service account read access to every secret in the KV engine — including other teams' secrets, service passwords, and infrastructure credentials. A single compromised pod or leaked token can read everything.

Fix: Write least-privilege policies scoped to exactly the paths the service needs. Use vault policy fmt to lint policies and vault token capabilities <token> <path> to audit what a given token can actually access. Enforce policy reviews in your GitOps workflow — store policies in git, require PR review, apply via Terraform (vault_policy resource) or Vault's API in CI.
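
The GitOps flow above might look like this in Terraform, as a sketch (resource and policy names are illustrative):

```hcl
# Least-privilege policy managed in git and applied via Terraform.
# Destroying this resource removes the policy from Vault.
resource "vault_policy" "myapp" {
  name   = "myapp-policy"
  policy = <<-EOT
    path "secret/data/myapp/*" {
      capabilities = ["read"]
    }
    path "secret/metadata/myapp/*" {
      capabilities = ["list"]
    }
  EOT
}
```

Because the policy lives in a PR-reviewed repo, a wildcard like secret/* has to survive code review before it can reach production.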


7. Not Enabling Audit Logging

Vault is deployed in production with no audit devices configured. When a security incident occurs — leaked credentials, unexplained data access, token abuse — there is no audit trail. You cannot determine which token accessed which secrets, when, or from where.

Fix: Enable at least one audit device before Vault handles production secrets:

vault audit enable file file_path=/var/log/vault/audit.log
vault audit enable syslog
Vault will refuse to respond if all audit devices fail (fail-closed), so monitor the audit log disk space and pipeline. Ship audit logs to a SIEM. Alert on repeated 403 events from the same accessor — it often signals a misconfigured service or an attacker probing paths.


8. Snapshot-Restorable Unseal Keys in Backups

Your Vault backup procedure snapshots the full VM image or etcd data. A well-meaning ops engineer stores these snapshots in the same cloud account as the Vault deployment. The snapshots contain the encrypted Vault data and — if the encryption key is also in the same account — provide everything an attacker needs to decrypt the entire secrets store offline.

Fix: Separate backup storage from the Vault deployment. For Vault's Raft storage, use vault operator raft snapshot save (not raw disk snapshots) and store snapshots encrypted in a different cloud account or region. Never store the root token or the root (master) key adjacent to backups. Treat backups as being as sensitive as the live Vault data — restrict access, encrypt in transit and at rest.
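
A sketch of that backup flow (the bucket name and AWS profile are placeholders; the profile is assumed to have write-only access to a bucket in a separate account):

```shell
# Take a consistent Raft snapshot via the API, not a raw disk image.
vault operator raft snapshot save "vault-$(date +%F).snap"

# Ship it to a bucket owned by a *different* cloud account, using a
# dedicated write-only credential (names here are placeholders).
aws s3 cp "vault-$(date +%F).snap" s3://vault-backups-separate-account/ \
  --profile backup-writer
```

With the snapshot in one account and the auto-unseal KMS key in another, compromising either alone yields nothing decryptable.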


9. Kubernetes Auth Role Namespace Wildcard in Production

You configure the Kubernetes auth role with bound_service_account_namespaces=["*"] to avoid per-namespace Vault role configuration. Any service account named myapp in any namespace can now authenticate as the myapp Vault role and read those secrets — including test namespaces, CI namespaces, and any namespace a developer creates.

Fix: Always specify exact namespaces in Kubernetes auth roles:

vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  policies=myapp-policy
Use separate Vault roles per environment. Restrict namespace creation via Kubernetes RBAC so developers cannot create namespaces that could impersonate production service accounts.


10. Forgetting to Revoke Dynamic Secrets on Service Decommission

You decommission an application but forget to revoke its Vault role and outstanding leases. The leases keep renewing, so dynamic AWS IAM users or database accounts remain active and old tokens keep working until they hit max TTL. If the service's Kubernetes namespace still exists and the service account isn't deleted, the role remains exploitable indefinitely.

Fix: Include Vault cleanup in your service decommission runbook: vault lease revoke -prefix <mount>/creds/<role>/, delete the Vault role, and remove the policy. Use Terraform to manage Vault roles — destroying the Terraform resource automatically triggers cleanup. Set short max_ttl (e.g., 24h) on all dynamic credential roles so leaked leases self-destruct quickly.
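
The runbook steps above, sketched for a service using a database mount and Kubernetes auth (mount, role, and policy names are illustrative):

```shell
# 1. Revoke every outstanding lease issued under the role.
vault lease revoke -prefix database/creds/myapp

# 2. Delete the dynamic-secret role so no new credentials can be issued.
vault delete database/roles/myapp

# 3. Remove the auth role and the policy it granted.
vault delete auth/kubernetes/role/myapp
vault policy delete myapp-policy
```

Run the revoke first: deleting the role does not revoke leases that already exist.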