
Runbook: Velero Backup & Restore (Application-Level DR)

Symptoms

  • Need to migrate workloads between clusters
  • Accidental namespace deletion
  • Need to restore specific applications, not the entire cluster
  • Disaster recovery for application state (PVCs, configs, secrets)

Fast Triage

# Check Velero status
velero version
velero get backup-locations
velero backup get

# Check latest backup
velero backup describe <latest-backup-name> --details

What Velero Does

Velero backs up Kubernetes resources and persistent volumes:

[Namespaces] + [Deployments] + [Services] + [ConfigMaps] + [Secrets] + [PVCs]
                                    |
                              [Object Storage (S3/GCS)]
                                    +
                              [Volume Snapshots]

Installation

# Install Velero CLI
curl -LO https://github.com/vmware-tanzu/velero/releases/download/v1.13.0/velero-v1.13.0-linux-amd64.tar.gz
tar xvf velero-v1.13.0-linux-amd64.tar.gz
sudo mv velero-v1.13.0-linux-amd64/velero /usr/local/bin/

# Install Velero server (AWS example)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket velero-backups \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --secret-file ./credentials-velero

# Verify
kubectl get pods -n velero
velero get backup-locations
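The `--secret-file ./credentials-velero` argument in the install command expects a credentials file to exist already. A minimal sketch of creating one, assuming the AWS plugin reads the standard AWS shared-credentials format; the function name `write_velero_credentials` and its arguments are hypothetical, and the values in the usage line are placeholders:

```shell
# Sketch: write the file passed to `velero install --secret-file`.
# Assumption: the AWS plugin accepts the standard AWS shared-credentials
# layout with a [default] profile. The function name is illustrative only.
write_velero_credentials() (
  path="$1"
  key_id="$2"
  secret_key="$3"
  # Restrict permissions so a newly created file is readable by the owner only.
  umask 077
  printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
    "$key_id" "$secret_key" > "$path"
)

# Usage (placeholders, not real credentials):
#   write_velero_credentials ./credentials-velero "<ACCESS_KEY_ID>" "<SECRET_ACCESS_KEY>"
```

Keeping the file out of version control and deleting it after `velero install` has created the in-cluster secret is a reasonable precaution.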

Backup Procedures

Full Cluster Backup

velero backup create full-backup-$(date +%Y%m%d) \
  --wait

Namespace Backup

velero backup create grokdevops-backup-$(date +%Y%m%d) \
  --include-namespaces grokdevops \
  --wait

Scheduled Backups

# Daily backup of grokdevops namespace, retain 7 days
velero schedule create grokdevops-daily \
  --schedule="0 2 * * *" \
  --include-namespaces grokdevops \
  --ttl 168h

# Weekly full backup, retain 30 days
velero schedule create weekly-full \
  --schedule="0 3 * * 0" \
  --ttl 720h
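The `--ttl` values above are retention periods expressed in hours (7 days is 168h, 30 days is 720h). A small sketch of that conversion, useful when retention is specified in days elsewhere; `ttl_from_days` is a hypothetical helper name, not part of Velero:

```shell
# Convert a retention period in whole days to the hour-based duration
# string used with --ttl. Helper name is illustrative only.
ttl_from_days() {
  printf '%dh\n' "$(( $1 * 24 ))"
}

ttl_from_days 7    # prints 168h
ttl_from_days 30   # prints 720h

# Possible use with the daily schedule shown above:
#   velero schedule create grokdevops-daily \
#     --schedule="0 2 * * *" \
#     --include-namespaces grokdevops \
#     --ttl "$(ttl_from_days 7)"
```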

Backup with Volume Snapshots

velero backup create with-volumes-$(date +%Y%m%d) \
  --include-namespaces grokdevops \
  --snapshot-volumes=true \
  --wait

Restore Procedures

Restore a Namespace

# Restore grokdevops namespace from backup
velero restore create --from-backup grokdevops-backup-20240115 \
  --wait

# Check restore status
velero restore describe <restore-name> --details

Restore to a Different Namespace

velero restore create --from-backup grokdevops-backup-20240115 \
  --namespace-mappings grokdevops:grokdevops-restored \
  --wait

Restore Specific Resources

# Restore only deployments and services
velero restore create --from-backup grokdevops-backup-20240115 \
  --include-resources deployments,services \
  --wait

# Restore only deployments that match a label selector
velero restore create --from-backup grokdevops-backup-20240115 \
  --include-resources deployments \
  --selector app=grokdevops \
  --wait

Migrate to Another Cluster

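[!WARNING] Restoring into a namespace that already contains resources is risky. By default Velero skips any resource that already exists in the target cluster (recording a warning when it differs from the backup), so the restore can finish with stale objects left in place. With --existing-resource-policy=update it instead patches those resources to match the backup, overwriting local changes. Always restore to a new namespace or an empty cluster first, verify the result, then cut over.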

# Source cluster: create backup
velero backup create migration-$(date +%Y%m%d) \
  --include-namespaces grokdevops \
  --wait

# Target cluster: install Velero pointing to same bucket
velero install --provider aws --bucket velero-backups ...

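# Target cluster: confirm the backup has synced from the bucket
velero backup get

# Target cluster: restore using the exact backup name from the source cluster.
# Do not recompute it with $(date ...): the name will not match if the restore
# runs on a different day than the backup.
velero restore create --from-backup migration-<YYYYMMDD> \
  --wait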
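The restore on the target cluster can only succeed once the backup is visible there, which may lag behind the install while Velero syncs from the bucket. A hedged sketch of waiting for it; the function name `wait_for_backup`, the retry count, and the delay are illustrative assumptions, and it relies on `velero backup get <name>` exiting non-zero while the backup is unknown:

```shell
# Sketch: poll the target cluster until a named backup is visible.
# Assumption: `velero backup get <name>` exits non-zero when the backup
# does not exist yet. Defaults (30 attempts, 10 s apart) are arbitrary.
wait_for_backup() (
  name="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if velero backup get "$name" >/dev/null 2>&1; then
      exit 0   # backup is visible; leaves the function's subshell
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "backup $name not visible after $attempts attempts" >&2
  exit 1
)

# Usage, with the exact name recorded on the source cluster:
#   BACKUP_NAME="migration-<YYYYMMDD>"
#   wait_for_backup "$BACKUP_NAME" && \
#     velero restore create --from-backup "$BACKUP_NAME" --wait
```

Storing the backup name in a variable at creation time and passing it along avoids recomputing the date on the target side.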

Verification

# Check restore status
velero restore describe <restore-name>
# Expected: Phase: Completed

# Verify resources
kubectl get all -n grokdevops
kubectl get pvc -n grokdevops

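# Verify application health
kubectl rollout status deployment/grokdevops -n grokdevops
# Run from a pod inside the grokdevops namespace; the short service name does not resolve outside the cluster
curl -s http://grokdevops:8000/health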
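The "Expected: Phase: Completed" check above can be scripted so that automation fails fast on a partial or failed restore. A minimal sketch that reads the phase from the `Phase:` line of `velero restore describe`; the helper names `restore_phase` and `restore_completed` are hypothetical, and parsing human-readable output is an assumption that may need adjusting across Velero versions:

```shell
# Print the phase from the first "Phase:" line of the describe output.
restore_phase() {
  velero restore describe "$1" | awk '$1 == "Phase:" && !seen { print $2; seen = 1 }'
}

# Succeed only if the restore finished in the Completed phase.
restore_completed() {
  [ "$(restore_phase "$1")" = "Completed" ]
}

# Usage:
#   restore_completed <restore-name> || velero restore logs <restore-name>
```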

Monitoring

# List all backups with status
velero backup get

# Check for failed backups
velero backup get -o json | jq '.items[] | select(.status.phase != "Completed") | .metadata.name'

# Check schedule status
velero schedule get

Common Issues

Issue                                 | Fix
Backup stuck in "InProgress"          | Check Velero pod logs; verify S3 connectivity
Restore fails with "already exists"   | Use --existing-resource-policy=update
PVC not restored                      | Verify snapshot-location is configured and the CSI driver supports snapshots
Partial restore                       | Check velero restore logs <name> for skipped resources
Credentials expired                   | Update the Velero secret with new AWS/GCP credentials
