etcd - Street-Level Ops

Real-world workflows for operating, backing up, and troubleshooting etcd in Kubernetes clusters.

Health Check

# Set env for all commands (put this in your shell profile)
export ETCDCTL_API=3
ETCD_CERTS="--cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key"

# Quick cluster health
etcdctl endpoint health --cluster $ETCD_CERTS
# https://10.0.1.10:2379 is healthy: successfully committed proposal: took = 2.34ms
# https://10.0.1.11:2379 is healthy: successfully committed proposal: took = 3.12ms
# https://10.0.1.12:2379 is healthy: successfully committed proposal: took = 2.87ms

# Detailed status per member (shows leader, DB size, raft index)
etcdctl endpoint status --write-out=table --cluster $ETCD_CERTS
# +------------------+--------+---------+---------+-----------+------------+
# |    ENDPOINT      |   ID   | VERSION | DB SIZE | IS LEADER | RAFT INDEX |
# +------------------+--------+---------+---------+-----------+------------+

# Who is the leader?
etcdctl endpoint status --write-out=json --cluster $ETCD_CERTS | jq '.[] | select(.Status.leader == .Status.header.member_id) | .Endpoint'

Debug clue: If endpoint health shows high took values (>100ms), etcd is under disk pressure. WAL fsync latency is the #1 performance bottleneck — etcd needs low-latency storage (SSD, not network-attached EBS gp2).

Gotcha: Always use --cluster flag for health checks. Without it, you only check the single endpoint you connected to — the other two members could be down and you would not know.

Remember: etcd health check mnemonic: H-S-M — Health, Size, Members. Run endpoint health (latency), endpoint status (DB size + leader), member list (quorum). If any of these three look wrong, the cluster is degrading.
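
The three H-S-M checks can be chained in one small wrapper (a sketch: `check` is a hypothetical helper, and the script assumes `ETCD_CERTS` is exported as shown above):

```shell
#!/usr/bin/env bash
# hsm-check.sh -- Health, Status, Members in one pass (sketch).
# Assumes ETCD_CERTS is exported as shown at the top of this page.
CERTS=${ETCD_CERTS:-}

check() {                     # print a header, run the check, flag failures
  local label=$1; shift
  echo "== $label =="
  "$@" || echo "CHECK FAILED: $label"
}

check "Health (latency)"       etcdctl endpoint health --cluster $CERTS
check "Status (size + leader)" etcdctl endpoint status --write-out=table --cluster $CERTS
check "Members (quorum)"       etcdctl member list --write-out=table $CERTS
```

Any of the three printing `CHECK FAILED` is your cue to dig into that leg of the mnemonic.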

Backup

# Snapshot save (run on a control plane node)
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 $ETCD_CERTS

# Verify the snapshot
etcdctl snapshot status /backup/etcd-20260315-140000.db --write-out=table
# +----------+----------+------------+------------+
# |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
# +----------+----------+------------+------------+
# | 9f8c2d1a |   482918 |       1247 |    4.2 MB  |
# +----------+----------+------------+------------+

> **War story:** A team had etcd backups running hourly but never tested restore. When they needed it, the snapshots were corrupt because the backup script was writing to a full disk and silently producing zero-byte files. Always verify with `etcdctl snapshot status` after each backup.

# Cron job for hourly backups (add to /etc/crontab)
# 0 * * * * /usr/local/bin/etcdctl snapshot save /backup/etcd-$(date +\%Y\%m\%d-\%H\%M).db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key && find /backup -name "etcd-*.db" -mtime +7 -delete
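
The war story above suggests wiring verification into the backup itself. A sketch of such a wrapper (`verify_snapshot` is a hypothetical helper; the `/backup` path is an assumption):

```shell
#!/usr/bin/env bash
# backup-etcd.sh -- snapshot, then verify, so a full disk can't leave
# silent zero-byte backups behind (sketch; /backup path is an assumption).
CERTS=${ETCD_CERTS:-}
SNAP="/backup/etcd-$(date +%Y%m%d-%H%M%S).db"

verify_snapshot() {           # reject missing or empty snapshot files
  local f=$1
  [ -s "$f" ] || { echo "EMPTY OR MISSING SNAPSHOT: $f" >&2; return 1; }
  etcdctl snapshot status "$f" --write-out=table
}

etcdctl snapshot save "$SNAP" --endpoints=https://127.0.0.1:2379 $CERTS \
  && verify_snapshot "$SNAP" \
  || echo "BACKUP FAILED: $SNAP" >&2
```

Cron this instead of the bare `snapshot save`, and make the `BACKUP FAILED` line feed whatever alerting you already have.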

Restore

# Stop kube-apiserver and etcd on ALL control plane nodes first
# On kubeadm clusters, move the static pod manifests:
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
mv /etc/kubernetes/manifests/etcd.yaml /tmp/

# Restore on each member (different --name and --initial-advertise-peer-urls per node)
etcdctl snapshot restore /backup/etcd-20260315-140000.db \
  --data-dir=/var/lib/etcd-restored \
  --name=etcd-0 \
  --initial-cluster=etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls=https://10.0.1.10:2380

# Point etcd at the new data directory
# Edit /etc/kubernetes/manifests/etcd.yaml: change --data-dir AND the
# matching hostPath volume to /var/lib/etcd-restored (changing only the
# flag leaves the pod mounting the old directory)

# Restore static pod manifests
mv /tmp/etcd.yaml /etc/kubernetes/manifests/
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

# Verify cluster comes back
etcdctl endpoint health --cluster $ETCD_CERTS

Database Size and Compaction

# Check current database size
etcdctl endpoint status --write-out=json $ETCD_CERTS | jq '.[].Status.dbSize' | numfmt --to=iec

# Check if alarms are set (quota exceeded?)
etcdctl alarm list $ETCD_CERTS

# Compact old revisions
REV=$(etcdctl endpoint status --write-out=json $ETCD_CERTS | jq '.[0].Status.header.revision')
etcdctl compaction $REV $ETCD_CERTS

# Defrag to reclaim disk space (run on ONE member at a time)
etcdctl defrag --endpoints=https://10.0.1.10:2379 $ETCD_CERTS

# WARNING: defrag blocks ALL reads/writes during execution. Run on non-leader first.

# If quota exceeded: compact, defrag, then disarm
etcdctl alarm disarm $ETCD_CERTS

Default trap: The default etcd database quota is 2GB. A busy cluster with many ConfigMaps, Secrets, and CRDs can hit this. When the quota is exceeded, etcd goes read-only and the entire cluster freezes. Set --quota-backend-bytes=8589934592 (8GB) in production.
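
To catch quota pressure before the cluster freezes, the size check can be turned into a threshold alert (a sketch; `pct_of_quota` is a hypothetical helper, and `QUOTA` defaults to the 8GB value suggested above):

```shell
#!/usr/bin/env bash
# quota-watch.sh -- warn before the backend quota freezes the cluster
# (sketch; QUOTA defaults to the 8GB value suggested above).
CERTS=${ETCD_CERTS:-}
QUOTA=${QUOTA:-8589934592}

pct_of_quota() {              # integer percentage of the quota in use
  echo $(( $1 * 100 / QUOTA ))
}

size=$(etcdctl endpoint status --write-out=json $CERTS 2>/dev/null \
         | jq '.[0].Status.dbSize') || size=0
size=${size:-0}
used=$(pct_of_quota "$size")
echo "dbSize=${size}B (${used}% of quota)"
if [ "$used" -ge 80 ]; then
  echo "WARNING: nearing quota -- compact and defrag now" >&2
fi
```

The 80% threshold is an assumption; the point is to act while etcd is still writable.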

Gotcha: The etcdctl defrag command blocks ALL reads and writes on the target member during execution. In a 3-node cluster, always defrag the non-leader members first, then the leader last. If you defrag the leader, it triggers a leader election mid-defrag, potentially causing API server timeouts.

Scale note: etcd's recommended max DB size is 8GB (the suggested upper limit for --quota-backend-bytes, not a hard cap). Beyond 8GB, etcd warns at startup and performance degrades. If your DB is growing past 4GB, audit with etcdctl get /registry --prefix --keys-only $ETCD_CERTS | awk -F/ '{print $3}' | sort | uniq -c | sort -rn to find which resource type is consuming space — often it is Events, Leases, or custom CRDs.

Member Management

# List all members
etcdctl member list --write-out=table $ETCD_CERTS

# Remove a failed member (always remove first, then add the replacement)
etcdctl member remove <MEMBER_ID> $ETCD_CERTS

# Add a replacement member
etcdctl member add etcd-2 --peer-urls=https://10.0.1.12:2380 $ETCD_CERTS

# Start the new node with --initial-cluster-state=existing
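
For a systemd/binary install, that final startup might look like the sketch below. The IPs and names mirror the examples above; in practice, copy the exact `ETCD_NAME` and `ETCD_INITIAL_*` values that `etcdctl member add` prints. On kubeadm clusters these flags go under `command:` in the etcd static pod manifest instead.

```shell
# Sketch: flags for the replacement node (values mirror the example IPs above).
# The data dir must be EMPTY before the new member joins.
etcd --name=etcd-2 \
  --initial-cluster-state=existing \
  --initial-cluster=etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls=https://10.0.1.12:2380 \
  --listen-peer-urls=https://10.0.1.12:2380 \
  --advertise-client-urls=https://10.0.1.12:2379 \
  --listen-client-urls=https://10.0.1.12:2379,https://127.0.0.1:2379 \
  --data-dir=/var/lib/etcd
```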

Certificate Check

# Check certificate expiry dates
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -enddate
# notAfter=Mar 15 00:00:00 2027 GMT

# Check all etcd certs at once
for cert in /etc/kubernetes/pki/etcd/*.crt; do
  echo "$cert: $(openssl x509 -in $cert -noout -enddate)"
done

# Renew with kubeadm
kubeadm certs renew all
# Then restart the control plane static pods (kube-apiserver, etcd,
# controller-manager, scheduler) so they pick up the new certs --
# restarting kubelet alone does not recreate static pods

Default trap: kubeadm-managed etcd certificates expire after 1 year by default. If you forget to renew, the API server loses contact with etcd and the API goes down — existing workloads keep running, but nothing can be read or changed through the API. Set a calendar reminder or automate with kubeadm certs check-expiration in a cron job that alerts 30 days before expiry.
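
A cron-able sketch of that alert (`days_until` is a hypothetical helper; the 30-day threshold mirrors the advice above, and GNU date is assumed):

```shell
#!/usr/bin/env bash
# cert-expiry-alert.sh -- warn when etcd certs approach expiry (sketch;
# the 30-day threshold is an assumption; GNU date is assumed).
THRESHOLD_DAYS=${THRESHOLD_DAYS:-30}

days_until() {                # whole days from now until a notAfter date
  local end_epoch now_epoch
  end_epoch=$(date -d "$1" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

for cert in /etc/kubernetes/pki/etcd/*.crt; do
  [ -f "$cert" ] || continue  # glob stays literal on non-control-plane hosts
  end=$(openssl x509 -in "$cert" -noout -enddate | cut -d= -f2)
  left=$(days_until "$end")
  echo "$cert expires in $left days"
  if [ "$left" -lt "$THRESHOLD_DAYS" ]; then
    echo "RENEW SOON: $cert" >&2   # hook alerting (mail, webhook) here
  fi
done
```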

Read Kubernetes Data from etcd

# List all keys (careful: can be large)
etcdctl get / --prefix --keys-only $ETCD_CERTS | head -30

# Read a specific pod definition (values are stored as binary protobuf;
# pipe through strings, or decode with a tool such as auger)
etcdctl get /registry/pods/default/nginx-7d9fc $ETCD_CERTS

# Count keys by resource type
etcdctl get /registry --prefix --keys-only $ETCD_CERTS | \
  awk -F/ '{print $3}' | sort | uniq -c | sort -rn | head -20

Performance Check

# Check WAL fsync latency (most common perf bottleneck)
# Look at Prometheus metrics if available:
# histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

# Quick disk write-latency test on the etcd data directory
# (1000 synced 512B writes; if this takes much more than ~10s, i.e.
# >10ms per fsync, the disk is too slow for etcd)
dd if=/dev/zero of=/var/lib/etcd/test bs=512 count=1000 oflag=dsync 2>&1 | tail -1
rm /var/lib/etcd/test

# Check for leader changes (instability indicator)
etcdctl endpoint status --write-out=json --cluster $ETCD_CERTS | jq '.[].Status.raftTerm'

Under the hood: A rising raftTerm means leader elections are happening. Frequent elections indicate network partitions between control plane nodes or disk too slow for the heartbeat interval (default 100ms). If raftTerm jumps by more than 2-3 in an hour, investigate network and disk latency immediately.
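
A minimal raftTerm watcher along these lines (a sketch; `term_jumped`, the state-file path, and the jump threshold of 2 are all assumptions):

```shell
#!/usr/bin/env bash
# raft-term-watch.sh -- compare raftTerm between runs and alert on jumps
# (sketch; the state file path and the threshold of 2 are assumptions).
CERTS=${ETCD_CERTS:-}
STATE=${STATE:-/tmp/etcd-raft-term.last}

term_jumped() {               # args: previous term, current term
  [ $(( $2 - $1 )) -gt 2 ]
}

cur=$(etcdctl endpoint status --write-out=json $CERTS 2>/dev/null \
        | jq '.[0].Status.raftTerm') || cur=""
if [ -n "$cur" ] && [ -f "$STATE" ]; then
  prev=$(cat "$STATE")
  if term_jumped "$prev" "$cur"; then
    echo "ALERT: raft term jumped $prev -> $cur -- check network/disk" >&2
  fi
fi
if [ -n "$cur" ]; then echo "$cur" > "$STATE"; fi
```

Run it from cron at whatever interval matches the "2-3 per hour" rule of thumb above.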


Quick Reference