HashiCorp Vault - Street-Level Ops

Quick Diagnosis Commands

# Vault health and status
vault status
vault operator raft list-peers  # HA cluster peers

# Check auth methods and secret engines
vault auth list
vault secrets list

# Token lookup (your current token)
vault token lookup
vault token lookup <token>

# List tokens with accessor info
vault list auth/token/accessors

# Check a specific accessor
vault token lookup -accessor <accessor>

# Audit logs (if file audit enabled)
tail -f /var/log/vault/audit.log | jq .

# Check seal status and unseal progress
vault status | grep -E "Sealed|Unseal Progress"

# List leases for a secret path
vault list sys/leases/lookup/aws/creds/my-role/

# Revoke a specific lease
vault lease revoke <lease_id>

# Renew a lease
vault lease renew <lease_id>

# Check policy attached to token
vault token lookup | grep policies
vault policy read <policy-name>

# KV secret operations
vault kv get secret/myapp/config
vault kv get -field=password secret/myapp/config
vault kv list secret/myapp/
vault kv put secret/myapp/config key=value
vault kv metadata get secret/myapp/config   # v2 metadata

# AppRole auth
vault read auth/approle/role/myapp/role-id
vault write -f auth/approle/role/myapp/secret-id
vault write auth/approle/login role_id=<id> secret_id=<sid>

> **Under the hood:** AppRole's `secret_id` is the short-lived credential (analogous to a password), while `role_id` is the long-lived identifier (analogous to a username). In CI pipelines, bake `role_id` into the image and inject `secret_id` at runtime via a trusted orchestrator. Never store both in the same place.
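
The role_id/secret_id split can be sketched as a small login helper. This is a minimal sketch, not a prescribed layout: the file path `/etc/vault/role-id` and the `VAULT_SECRET_ID` variable name are assumptions chosen for illustration.

```shell
# Log in with AppRole, taking role_id from a file baked into the image
# and secret_id from the environment. The default file path and the
# VAULT_SECRET_ID variable name are illustrative assumptions.
approle_login() {
  role_id_file=${1:-/etc/vault/role-id}
  if [ -z "${VAULT_SECRET_ID:-}" ]; then
    echo "VAULT_SECRET_ID is not set" >&2
    return 1
  fi
  role_id=$(cat "$role_id_file") || return 1
  # -field=token prints only the client token from the login response
  vault write -field=token auth/approle/login \
    role_id="$role_id" secret_id="$VAULT_SECRET_ID"
}
```

A caller would typically capture the result, for example `VAULT_TOKEN=$(approle_login) && export VAULT_TOKEN`.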

# PKI quick check
vault pki health-check pki
vault list pki/certs
vault write pki/issue/my-role common_name="app.example.com" ttl=24h

Common Scenarios

Scenario 1: Vault Sealed After Restart

Vault starts sealed. No secrets can be retrieved.

# Check seal status
vault status
# Sealed: true
# Unseal Progress: 0/3

# If using Shamir unseal (manual)
# Need threshold number of key holders to unseal
vault operator unseal <unseal-key-1>
vault operator unseal <unseal-key-2>
vault operator unseal <unseal-key-3>
# Watch "Unseal Progress" increment to threshold

# If using auto-unseal (AWS KMS, GCP Cloud KMS, Azure Key Vault)
# The problem is likely the cloud KMS: check that Vault can still reach and use the key
# AWS KMS: the IAM role on the Vault instance needs kms:Encrypt, kms:Decrypt, and kms:DescribeKey on the key
aws kms describe-key --key-id <kms-key-id>

# Check Vault logs for auto-unseal errors
journalctl -u vault --since "10 minutes ago" | grep -i "unseal\|kms\|error"

# Vault cluster: unseal each node individually
# (unsealing is per-node, not cluster-wide).
# Repeat the loop with a different key share until each node reaches the threshold.
for node in vault-0 vault-1 vault-2; do
  kubectl exec -n vault "$node" -- vault operator unseal <unseal-key>
done

> **Remember:** Vault always starts sealed after a restart. Auto-unseal (via AWS KMS, GCP Cloud KMS, or Azure Key Vault) eliminates the manual unseal ceremony. Without auto-unseal, a Vault pod restart in Kubernetes requires human intervention before any secrets can be served. This is a common operational surprise for new Vault deployments.
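
For scripting the check, `vault status` reports the seal state through its exit code (0 when unsealed, 2 when sealed, 1 on error). A minimal sketch, with `seal_state` as a hypothetical helper name:

```shell
# Translate the exit code of `vault status` into a word.
# Exit codes: 0 = unsealed, 2 = sealed, anything else = error.
seal_state() {
  rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

After submitting key shares, a wait loop such as `until [ "$(seal_state)" = "unsealed" ]; do sleep 5; done` can gate whatever depends on Vault being available.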

Scenario 2: Token Expired / Permission Denied

Application can't read secrets — 403 permission denied or token expired.

# Step 1: Check if the token is valid at all
vault token lookup
# If error: token expired or invalid

# Step 2: If using AppRole, issue a fresh secret-id (the old one may have expired
# or hit its use limit, depending on the role's secret_id_ttl and secret_id_num_uses)
vault write -f auth/approle/role/myapp/secret-id
# New secret_id in response

# Step 3: Check token's policies
vault token lookup | grep -A5 policies

# Step 4: Check what the policy actually allows
vault policy read myapp-policy

# Step 5: Test path access directly
vault kv get secret/myapp/config
# If 403: policy doesn't allow this path

# Step 6: Write a corrected policy
vault policy write myapp-policy - <<EOF
path "secret/data/myapp/*" {
  capabilities = ["read"]
}
EOF
# Note: KV v2 requires "secret/data/..." path, not "secret/..."

> **Default trap:** KV v2 policy paths must include `/data/` (`secret/data/myapp/*`), but the CLI command uses the short path (`vault kv get secret/myapp/config`). This mismatch is a common cause of "permission denied" errors when migrating from KV v1 to v2. The CLI transparently adds `/data/`; your policies must match the internal path, not the CLI path.

# Vault Agent: check agent log if using agent injection
kubectl logs <pod> -c vault-agent
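
Rather than reading the policy by eye, `vault token capabilities <path>` reports what the current token may do on a path. A minimal sketch built on it, with `can_read` as a hypothetical helper name:

```shell
# Succeeds when the current token may read the given API path.
# For KV v2, pass the internal path (secret/data/...), not the CLI path.
can_read() {
  caps=$(vault token capabilities "$1") || return 2
  # Output is a comma-separated list such as "list, read" or "deny"
  case ",$(printf '%s' "$caps" | tr -d ' \n')," in
    *,read,*|*,root,*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `can_read secret/data/myapp/config || echo "policy does not grant read"`.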

Scenario 3: Dynamic AWS Credentials Not Working

Application gets 403 from AWS using Vault-issued credentials.

# Step 1: Manually generate credentials and test
vault read aws/creds/my-role
# access_key, secret_key, lease_id returned

# Step 2: Test the credentials directly
AWS_ACCESS_KEY_ID=<from-vault> AWS_SECRET_ACCESS_KEY=<from-vault> \
  aws sts get-caller-identity

# Step 3: If credential works but expires fast, check TTL
vault read aws/roles/my-role
# Check max_ttl and default_ttl

# Step 4: Check Vault's IAM role can generate credentials
vault read aws/config/root  # check iam user/role Vault uses
# Verify Vault's IAM entity has iam:CreateAccessKey, sts:AssumeRole etc.

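# Step 5: If the lease has already expired, request fresh credentials
vault read aws/creds/my-role
# To force a cleanup, revoke by prefix. This revokes EVERY outstanding lease
# for the role, so all current consumers of those credentials lose access.
vault lease revoke -prefix aws/creds/my-role/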

# Step 6: If using assumed-role, check the target role's trust policy
# Vault's IAM entity must be in the trust relationship
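
Freshly issued IAM credentials can take a few seconds to become usable because IAM is eventually consistent, so a failure immediately after Step 1 may be transient. A small retry wrapper helps tell that apart from a real permissions problem. This is a sketch; `retry` is a hypothetical helper, not part of Vault or the AWS CLI.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

With the Vault-issued keys exported, `retry 6 5 aws sts get-caller-identity` retries for roughly half a minute before giving up.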

Scenario 4: Vault Agent Sidecar Not Injecting Secrets

Pod running but secrets not appearing at /vault/secrets/.

# Step 1: Check pod annotations are correct
kubectl get pod <pod> -o yaml | grep -A20 annotations

# Required annotations:
# vault.hashicorp.com/agent-inject: "true"
# vault.hashicorp.com/role: "myapp"
# vault.hashicorp.com/agent-inject-secret-config: "secret/data/myapp/config"

# Step 2: Check if vault-agent container is running
kubectl describe pod <pod> | grep -A5 "vault-agent"
kubectl logs <pod> -c vault-agent-init
kubectl logs <pod> -c vault-agent

# Step 3: Vault Agent auth failure
# Look for "403" or "permission denied" in agent logs
kubectl logs <pod> -c vault-agent-init | tail -50

# Step 4: Check Kubernetes auth is configured in Vault
vault auth list | grep kubernetes
vault read auth/kubernetes/config
vault read auth/kubernetes/role/myapp

# Step 5: Verify service account binding
kubectl get serviceaccount <sa-name> -n <namespace>
# SA name must match what the Vault role expects

# Step 6: Rebuild the Vault Kubernetes role if needed
vault write auth/kubernetes/role/myapp \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=production \
  policies=myapp-policy \
  ttl=1h
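
A frequent cause of missing injection is that the annotations sit on the Deployment's own metadata rather than on the pod template, so the pod itself never carries them. The check in Step 1 can be scripted as below; `injection_enabled` is a hypothetical helper name and the `default` namespace fallback is an assumption.

```shell
# Succeeds when the pod itself carries the agent-inject annotation.
# Dots inside the annotation key are escaped for kubectl's JSONPath.
injection_enabled() {
  pod=$1
  namespace=${2:-default}
  value=$(kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.metadata.annotations.vault\.hashicorp\.com/agent-inject}') || return 2
  [ "$value" = "true" ]
}
```

For example, `injection_enabled <pod> production || echo "annotation missing on pod"`.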

Key Patterns

KV v1 vs KV v2 Path Difference

# KV v1: path is "secret/<path>"
vault kv get secret/myapp/config         # works
# Policy: path "secret/myapp/*" { capabilities = ["read"] }

# KV v2: path is "secret/data/<path>" for read, "secret/metadata/<path>" for list
vault kv get secret/myapp/config         # CLI handles this transparently
# But in policies you must use the internal paths:
# path "secret/data/myapp/*" { capabilities = ["read"] }
# path "secret/metadata/myapp/*" { capabilities = ["list"] }

# Check which version is running
vault secrets list -detailed | grep secret
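
The CLI-to-policy path mapping for KV v2 is mechanical, which makes it easy to sketch. The helper name `kv2_api_path` is hypothetical; it only builds the string and does not contact Vault.

```shell
# kv2_api_path KIND MOUNT PATH
# KIND is "data" (read/write) or "metadata" (list and metadata operations).
kv2_api_path() {
  kind=$1
  mount=${2%/}   # tolerate a trailing slash on the mount
  secret=${3#/}  # tolerate a leading slash on the secret path
  printf '%s/%s/%s\n' "$mount" "$kind" "$secret"
}
```

`kv2_api_path data secret myapp/config` prints `secret/data/myapp/config`, which is the path a policy must name for `vault kv get secret/myapp/config` to succeed.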

Vault Lease Renewal

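# List lease prefixes and IDs (requires sudo capability on sys/leases/lookup/);
# the listing shows IDs only, not expiry times
vault list sys/leases/lookup/
# Inspect one lease to see its expire_time and remaining ttl
vault lease lookup <lease_id>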

# Renew before expiry (application should do this)
vault lease renew <lease_id>

# Set max TTL for a secret engine
vault secrets tune -max-lease-ttl=720h aws/

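# In application: use a Vault SDK to renew instead of shelling out
# Token renewal:
#   Go SDK: client.Auth().Token().RenewSelf(increment)
#   Python hvac: client.auth.token.renew_self()
# Lease renewal (dynamic secrets):
#   Go SDK: client.Sys().Renew(leaseID, increment)
#   Python hvac: client.sys.renew_lease(lease_id=...)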
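
Where an SDK is not available, the renew-or-reissue decision can be sketched in shell. The helper name `renew_or_reissue` and the 3600-second default increment are assumptions for illustration.

```shell
# Try to extend a lease and report what the caller should do next.
# Prints "renewed" on success, "reissue" when renewal is refused
# (for example because the lease expired or reached its max TTL).
renew_or_reissue() {
  lease_id=$1
  increment=${2:-3600}   # requested extension in seconds (assumed default)
  if vault lease renew -increment="$increment" "$lease_id" >/dev/null 2>&1; then
    echo "renewed"
  else
    echo "reissue"
  fi
}
```

On "reissue", the application should read the secret path again to obtain new credentials and a new lease.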

Transit Encryption (Encryption-as-a-Service)

# Create a transit key
vault write -f transit/keys/myapp-key

# Encrypt data (printf avoids the trailing newline that echo would add)
vault write transit/encrypt/myapp-key \
  plaintext=$(printf '%s' "my secret data" | base64)
# Returns: vault:v1:<ciphertext>

# Decrypt data
vault write transit/decrypt/myapp-key \
  ciphertext="vault:v1:<ciphertext>"
# Returns base64-encoded plaintext; pipe it through `base64 --decode` to recover the original

# Rotate key (old ciphertext still decryptable)
vault write -f transit/keys/myapp-key/rotate

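# Refuse to decrypt ciphertext produced by key versions older than 2
vault write transit/keys/myapp-key/config min_decryption_version=2

> **Under the hood:** Transit key rotation creates a new key version but keeps old versions for decryption. Raising `min_decryption_version` makes Vault refuse to decrypt ciphertext from older versions, but those versions still exist and the setting can be lowered again. For crypto-shredding (making old data permanently unreadable without deleting the ciphertext itself), the old key versions must also be removed, for example by trimming the key via `transit/keys/<name>/trim`.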
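
The base64 handling on both sides of transit is easy to get wrong, so a round-trip sketch may help. The function names are hypothetical; the `-field` flag selects a single value from the response.

```shell
# Encrypt a string with a transit key; prints the ciphertext.
transit_encrypt() {
  key=$1
  # printf avoids a trailing newline; tr removes base64 line wrapping
  b64=$(printf '%s' "$2" | base64 | tr -d '\n')
  vault write -field=ciphertext "transit/encrypt/$key" plaintext="$b64"
}

# Decrypt a transit ciphertext; prints the original string.
transit_decrypt() {
  key=$1
  vault write -field=plaintext "transit/decrypt/$key" ciphertext="$2" |
    base64 --decode
}
```

For example, `ct=$(transit_encrypt myapp-key "my secret data")` followed by `transit_decrypt myapp-key "$ct"` should print the original text.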

Audit Log Review

# Enable file audit logging
vault audit enable file file_path=/var/log/vault/audit.log

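# Parse audit log for specific token activity
# Accessors are HMAC'd in the log by default; compute the hash first with:
#   vault write sys/audit-hash/file input=<accessor>
jq 'select(.auth.accessor == "<hmac-of-accessor>")' /var/log/vault/audit.log

# Find denied requests (the response entry carries an error field)
jq 'select(.type == "response" and ((.error // "") | contains("permission denied")))' /var/log/vault/audit.log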

# Count requests by path
cat /var/log/vault/audit.log | jq -r '.request.path' | sort | uniq -c | sort -rn | head -20

> **Gotcha:** Vault audit logs record every request, including the full request body for write operations. This means secrets appear in the audit log (HMAC'd by default). If you disable HMAC (the `log_raw` option, sometimes used for debugging), plaintext secrets appear in the log file. Never disable HMAC in production, and treat audit logs with the same security as the secrets themselves.
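
Because sensitive strings are HMAC'd in the log, searching for a raw accessor finds nothing. Vault can compute the matching hash through `sys/audit-hash/<device-path>`. A minimal sketch, assuming the audit device is mounted at `file` and using a hypothetical helper name:

```shell
# Print audit entries that mention the given accessor.
# The audit device path ("file") and the default log location are
# assumptions; adjust them if the device was enabled elsewhere.
audit_entries_for_accessor() {
  accessor=$1
  log_file=${2:-/var/log/vault/audit.log}
  # Ask Vault for the HMAC this audit device would have written
  hash=$(vault write -field=hash sys/audit-hash/file input="$accessor") || return 1
  grep -F "$hash" "$log_file"
}
```

The output can be piped to `jq .` for readability.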

War story: A team configured Vault auto-unseal with AWS KMS but forgot to add the KMS key to their Terraform state. When they rebuilt the Vault cluster in a new account, the old KMS key was gone. The Vault data was encrypted with a key they could no longer use, and all secrets were permanently lost. Record which KMS key seals Vault, protect that key from deletion, and arrange cross-account access to it (or migrate the seal) before moving or rebuilding the cluster.

See Also