# etcd - Street-Level Ops

Real-world workflows for operating, backing up, and troubleshooting etcd in Kubernetes clusters.
## Health Check

```bash
# Set env for all commands (put this in your shell profile)
export ETCDCTL_API=3
ETCD_CERTS="--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key"

# Quick cluster health
etcdctl endpoint health --cluster $ETCD_CERTS
# https://10.0.1.10:2379 is healthy: successfully committed proposal: took = 2.34ms
# https://10.0.1.11:2379 is healthy: successfully committed proposal: took = 3.12ms
# https://10.0.1.12:2379 is healthy: successfully committed proposal: took = 2.87ms

# Detailed status per member (shows leader, DB size, raft index)
etcdctl endpoint status --write-out=table --cluster $ETCD_CERTS
# +------------------+--------+---------+---------+-----------+------------+
# |     ENDPOINT     |   ID   | VERSION | DB SIZE | IS LEADER | RAFT INDEX |
# +------------------+--------+---------+---------+-----------+------------+

# Who is the leader?
etcdctl endpoint status --write-out=json --cluster $ETCD_CERTS | jq '.[] | select(.Status.leader == .Status.header.member_id) | .Endpoint'
```
> **Debug clue:** If `endpoint health` shows high `took` values (>100ms), etcd is under disk pressure. WAL fsync latency is the #1 performance bottleneck — etcd needs low-latency storage (SSD, not network-attached EBS gp2).

> **Gotcha:** Always use the `--cluster` flag for health checks. Without it, you only check the single endpoint you connected to — the other two members could be down and you would not know.

> **Remember:** etcd health check mnemonic: H-S-M — Health, Size, Members. Run `endpoint health` (latency), `endpoint status` (DB size + leader), and `member list` (quorum). If any of these three looks wrong, the cluster is degrading.
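The latency threshold above is easy to script against. A minimal sketch, assuming the `took = X.XXms` output format shown earlier (the helper name `check_etcd_latency` is made up):

```bash
# Hypothetical helper: reads `etcdctl endpoint health --cluster` output on
# stdin and prints any endpoint whose commit latency exceeds 100 ms.
check_etcd_latency() {
  awk '/is healthy/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /ms$/) {              # field like "2.34ms"
        sub(/ms$/, "", $i)
        if ($i + 0 > 100) print $1   # $1 is the endpoint URL
      }
  }'
}

# Usage:
# etcdctl endpoint health --cluster $ETCD_CERTS | check_etcd_latency
```

An empty result means every member committed its health-check proposal within the 100 ms budget.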
## Backup

```bash
# Snapshot save (run on a control plane node)
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 $ETCD_CERTS

# Verify the snapshot
etcdctl snapshot status /backup/etcd-20260315-140000.db --write-out=table
# +----------+----------+------------+------------+
# |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
# +----------+----------+------------+------------+
# | 9f8c2d1a |   482918 |       1247 |     4.2 MB |
# +----------+----------+------------+------------+
```
> **War story:** A team had etcd backups running hourly but never tested restore. When they needed it, the snapshots were corrupt because the backup script was writing to a full disk and silently producing zero-byte files. Always verify with `etcdctl snapshot status` after each backup.
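That failure mode is cheap to guard against in the backup script itself. A sketch (the `verify_snapshot` name is made up; the `etcdctl snapshot status` call assumes the cert setup from above):

```bash
# Refuse to accept a snapshot unless the file is non-empty AND etcdctl can
# parse it. Catches the zero-byte files from the war story above.
verify_snapshot() {
  snap="$1"
  if [ ! -s "$snap" ]; then
    echo "BACKUP FAILED: $snap is missing or empty" >&2
    return 1
  fi
  etcdctl snapshot status "$snap" --write-out=table
}

# Usage (after each snapshot save):
# verify_snapshot /backup/etcd-20260315-140000.db || alert-oncall
```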
```bash
# Cron job for hourly backups (add to /etc/crontab)
# 0 * * * * /usr/local/bin/etcdctl snapshot save /backup/etcd-$(date +\%Y\%m\%d-\%H\%M).db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key && find /backup -name "etcd-*.db" -mtime +7 -delete
```
## Restore

```bash
# Stop kube-apiserver and etcd on ALL control plane nodes first.
# On kubeadm clusters, move the static pod manifests:
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
mv /etc/kubernetes/manifests/etcd.yaml /tmp/

# Restore on each member (different --name and --initial-advertise-peer-urls per node)
etcdctl snapshot restore /backup/etcd-20260315-140000.db \
  --data-dir=/var/lib/etcd-restored \
  --name=etcd-0 \
  --initial-cluster=etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls=https://10.0.1.10:2380

# Point etcd config to the new data directory:
# edit /etc/kubernetes/manifests/etcd.yaml and change --data-dir to /var/lib/etcd-restored

# Restore the static pod manifests
mv /tmp/etcd.yaml /etc/kubernetes/manifests/
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

# Verify cluster comes back
etcdctl endpoint health --cluster $ETCD_CERTS
```
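Keeping the three per-node invocations straight under pressure is where mistakes happen. A dry-run helper that only prints the command for each member can help (a sketch; the function name is made up and the names/IPs are the example cluster above):

```bash
# Print (do not run) the restore command for one member. The --initial-cluster
# map is identical on every node; only --name and the advertised peer URL change.
print_restore_cmd() {
  name="$1"; ip="$2"; snap="$3"
  printf 'etcdctl snapshot restore %s \\\n' "$snap"
  printf '  --data-dir=/var/lib/etcd-restored \\\n'
  printf '  --name=%s \\\n' "$name"
  printf '  --initial-cluster=etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \\\n'
  printf '  --initial-advertise-peer-urls=https://%s:2380\n' "$ip"
}

# Review before pasting onto the node:
print_restore_cmd etcd-1 10.0.1.11 /backup/etcd-20260315-140000.db
```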
## Database Size and Compaction

```bash
# Check current database size
etcdctl endpoint status --write-out=json $ETCD_CERTS | jq '.[].Status.dbSize' | numfmt --to=iec

# Check if alarms are set (quota exceeded?)
etcdctl alarm list $ETCD_CERTS

# Compact old revisions
REV=$(etcdctl endpoint status --write-out=json $ETCD_CERTS | jq '.[0].Status.header.revision')
etcdctl compaction $REV $ETCD_CERTS

# Defrag to reclaim disk space (run on ONE member at a time, blocks during execution)
etcdctl defrag --endpoints=https://10.0.1.10:2379 $ETCD_CERTS
# WARNING: defrag blocks ALL reads/writes during execution. Run on non-leader members first.

# If quota exceeded: compact, defrag, then disarm
etcdctl alarm disarm $ETCD_CERTS
```
> **Default trap:** The default etcd database quota is 2GB. A busy cluster with many ConfigMaps, Secrets, and CRDs can hit this. When the quota is exceeded, etcd goes read-only and the entire cluster freezes. Set `--quota-backend-bytes=8589934592` (8GB) in production.

> **Gotcha:** The `etcdctl defrag` command blocks ALL reads and writes on the target member during execution. In a 3-node cluster, always defrag the non-leader members first, then the leader last. If you defrag the leader, it triggers a leader election mid-defrag, potentially causing API server timeouts.

> **Scale note:** etcd's recommended max DB size is 8GB (the `--quota-backend-bytes` hard cap). Beyond 8GB, etcd warns at startup and performance degrades. If your DB is growing past 4GB, audit with `etcdctl get /registry --prefix --keys-only | awk -F/ '{print $3}' | sort | uniq -c | sort -rn` to find which resource type is consuming space — often it is Events, Leases, or custom CRDs.
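Picking the defrag order by hand is error-prone; the leader check from the health section can be reused to sort endpoints followers-first. A sketch (`defrag_order` is a made-up name; field names match `etcdctl endpoint status -w json` output on etcd v3.4+):

```bash
# Sort endpoints so followers come first and the leader last, which is
# the order the Gotcha above recommends for defrag.
defrag_order() {
  # jq sorts false before true, so the leader (true) lands at the end
  jq -r 'sort_by(.Status.leader == .Status.header.member_id) | .[].Endpoint'
}

# Usage:
# for ep in $(etcdctl endpoint status -w json --cluster $ETCD_CERTS | defrag_order); do
#   etcdctl defrag --endpoints="$ep" $ETCD_CERTS
# done
```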
## Member Management

```bash
# List all members
etcdctl member list --write-out=table $ETCD_CERTS

# Remove a failed member
etcdctl member remove <MEMBER_ID> $ETCD_CERTS

# Add a replacement member
etcdctl member add etcd-2 --peer-urls=https://10.0.1.12:2380 $ETCD_CERTS
# Start the new node with --initial-cluster-state=existing
```
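The replacement node must be told it is joining a live cluster, not bootstrapping a new one. A sketch of the relevant startup flags (values from the example cluster above; on kubeadm clusters these land in the etcd static pod manifest rather than a command line):

```bash
# Replacement member startup flags (sketch). The critical flag is
# --initial-cluster-state=existing: "new" would try to bootstrap a
# fresh cluster instead of joining this one.
etcd --name etcd-2 \
  --initial-advertise-peer-urls https://10.0.1.12:2380 \
  --listen-peer-urls https://10.0.1.12:2380 \
  --initial-cluster etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \
  --initial-cluster-state existing
```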
## Certificate Check

```bash
# Check certificate expiry dates
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -enddate
# notAfter=Mar 15 00:00:00 2027 GMT

# Check all etcd certs at once
for cert in /etc/kubernetes/pki/etcd/*.crt; do
  echo "$cert: $(openssl x509 -in "$cert" -noout -enddate)"
done

# Renew with kubeadm
kubeadm certs renew all
systemctl restart kubelet
```
> **Default trap:** kubeadm-managed etcd certificates expire after 1 year by default. If you forget to renew, the API server loses contact with etcd and the entire cluster goes read-only. Set a calendar reminder or automate with `kubeadm certs check-expiration` in a cron job that alerts 30 days before expiry.
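The 30-day alert can also be built from plain openssl, with no kubeadm dependency. A sketch using `openssl x509 -checkend` (the `check_cert_expiry` name is made up):

```bash
# Warn about any cert in a directory that expires within 30 days.
# -checkend takes seconds and exits non-zero if the cert expires sooner.
check_cert_expiry() {
  dir="$1"
  threshold=$((30 * 24 * 3600))
  for c in "$dir"/*.crt; do
    [ -f "$c" ] || continue
    openssl x509 -checkend "$threshold" -noout -in "$c" >/dev/null ||
      echo "WARNING: $c expires within 30 days"
  done
  return 0
}

check_cert_expiry /etc/kubernetes/pki/etcd
```

Wire the function into cron and pipe its output to your alerting channel.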
## Read Kubernetes Data from etcd

```bash
# List all keys (careful: output can be large)
etcdctl get / --prefix --keys-only $ETCD_CERTS | head -30

# Read a specific pod definition (values are protobuf-encoded, not JSON)
etcdctl get /registry/pods/default/nginx-7d9fc $ETCD_CERTS

# Count keys by resource type
etcdctl get /registry --prefix --keys-only $ETCD_CERTS | \
  awk -F/ '{print $3}' | sort | uniq -c | sort -rn | head -20
```
## Performance Check

```bash
# Check WAL fsync latency (most common perf bottleneck).
# Look at Prometheus metrics if available:
# histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

# Quick disk latency test on the etcd data directory
dd if=/dev/zero of=/var/lib/etcd/test bs=512 count=1000 oflag=dsync 2>&1 | tail -1
rm /var/lib/etcd/test
```
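The dd summary reports total elapsed time for the 1000 synchronous writes; dividing by the write count gives an average per-fsync latency. A parsing sketch, assuming GNU coreutils dd's summary line format (the function name is made up):

```bash
# Parse GNU dd's summary line ("... copied, 2.08893 s, 245 kB/s") from stdin
# and print average milliseconds per dsync write, assuming 1000 writes as above.
fsync_ms_per_write() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "s,")    # the field before "s," is total elapsed seconds
        printf "%.2f ms/write\n", $(i-1) * 1000 / 1000
  }'
}

# Usage:
# dd if=/dev/zero of=/var/lib/etcd/test bs=512 count=1000 oflag=dsync 2>&1 \
#   | tail -1 | fsync_ms_per_write
```

Compare the result against your WAL fsync p99 target; single-digit milliseconds is what healthy etcd storage looks like.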
```bash
# Check for leader changes (instability indicator)
etcdctl endpoint status --write-out=json --cluster $ETCD_CERTS | jq '.[].Status.raftTerm'
```
> **Under the hood:** A rising `raftTerm` means leader elections are happening. Frequent elections indicate network partitions between control plane nodes or a disk too slow for the heartbeat interval (default 100ms). If `raftTerm` jumps by more than 2-3 in an hour, investigate network and disk latency immediately.
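Healthy members all report the same term, so a quick spread check flags a member that is behind or partitioned. A sketch (`term_spread` is a made-up name):

```bash
# Max minus min raftTerm across members, from `endpoint status -w json --cluster`
# output on stdin. 0 means all members agree on the current term; anything
# else means at least one member is lagging or partitioned.
term_spread() {
  jq 'map(.Status.raftTerm) | max - min'
}

# Usage:
# etcdctl endpoint status --write-out=json --cluster $ETCD_CERTS | term_spread
```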
## Quick Reference
- Cheatsheet: Etcd-Operations