
etcd


11 cards — 🟢 3 easy | 🟡 5 medium | 🔴 3 hard

🟢 Easy (3)

1. What does etcd store in a Kubernetes cluster?

All Kubernetes cluster state: resource definitions (Pods, Deployments, Services), RBAC policies, Secrets, ConfigMaps, leases, and CRD instances. It does NOT store container images, logs, metrics, or persistent volume data.

Remember: etcd = Kubernetes brain. If it dies without backup, ALL cluster state is lost. Back up daily.

Gotcha: etcd stores API objects only — not images, logs, metrics, or PV data.

2. How many members can fail in a 3-member etcd cluster while maintaining quorum?

One member. A 3-member cluster has a quorum of 2, so it tolerates exactly 1 failure. This is why 3 is the minimum recommended production size.

Remember: quorum = (N/2)+1. 3 members: quorum=2, tolerates 1 failure. 5 members: quorum=3, tolerates 2.
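The quorum formula can be checked directly with integer arithmetic (a standard Raft majority calculation, which is what etcd uses):

```shell
# For each cluster size N: quorum = floor(N/2)+1, tolerated failures = N - quorum.
for n in 1 3 4 5 7; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerates=$tolerated failures"
done
```

Note that 4 members tolerate the same single failure as 3, which is the point of the "odd number" rule covered below.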

3. What command creates an etcd snapshot backup?

etcdctl snapshot save /path/to/snapshot.db (with appropriate --endpoints, --cacert, --cert, and --key flags for TLS-secured clusters). Verify with etcdctl snapshot status.

Gotcha: verify with etcdctl snapshot status. Corrupt snapshots give false confidence — worse than no backup.
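A sketch of the full backup-and-verify sequence; the endpoint and the kubeadm-style certificate paths are assumptions, so adjust for your cluster:

```shell
# Take the snapshot over TLS (paths assume a kubeadm control plane).
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db

# Verify integrity before trusting the backup.
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /var/backups/etcd-snapshot.db
```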

🟡 Medium (5)

1. Why should etcd clusters always have an odd number of members?

Even-numbered clusters require the same quorum as the next odd number but tolerate fewer failures. A 4-member cluster needs a quorum of 3 (same as 5 members) but can only tolerate 1 failure (vs 2 for a 5-member cluster). Even sizes add cost without improving fault tolerance.

2. What is the difference between compaction and defragmentation in etcd?

Compaction removes old key revision history, marking the space as free but not reclaiming it on disk. Defragmentation reclaims that freed space, reducing the actual database file size. Both are needed for maintenance: compact first, then defrag.
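A sketch of the compact-then-defrag sequence (TLS flags omitted for brevity; the JSON extraction is one rough way to get the current revision):

```shell
# Compact up to the current revision, read from endpoint status.
rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json \
      | grep -o '"revision":[0-9]*' | head -1 | cut -d: -f2)
ETCDCTL_API=3 etcdctl compact "$rev"

# Defrag runs per member and blocks I/O while it works,
# so target one endpoint at a time rather than the whole cluster.
ETCDCTL_API=3 etcdctl defrag --endpoints=https://127.0.0.1:2379
```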

3. What happens when etcd exceeds its database size quota, and how do you fix it?

etcd raises a NOSPACE alarm and rejects writes (the keyspace becomes effectively read-only). Fix: run etcdctl compact (with a revision number) to remove old revisions, etcdctl defrag to reclaim space, then etcdctl alarm disarm to clear the alarm. Optionally increase --quota-backend-bytes.
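The recovery steps can be sketched as a runbook (endpoints and TLS flags omitted; run against your cluster's endpoints):

```shell
etcdctl alarm list            # confirm the NOSPACE alarm is active
rev=$(etcdctl endpoint status --write-out=json \
      | grep -o '"revision":[0-9]*' | head -1 | cut -d: -f2)
etcdctl compact "$rev"        # drop old key revisions
etcdctl defrag                # reclaim the freed space on disk
etcdctl alarm disarm          # clear the alarm; writes resume
```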

4. Why is SSD storage mandatory for production etcd, and what metric indicates disk problems?

etcd requires fast synchronous writes for its write-ahead log (WAL). Spinning disks cause high fsync latency, triggering frequent leader elections and API server timeouts. Monitor etcd_disk_wal_fsync_duration_seconds — if p99 exceeds 10ms, disk is the bottleneck.
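One way to inspect the metric by hand; the metrics port is an assumption (kubeadm typically exposes it on 2381, otherwise the client port serves /metrics):

```shell
# Pull the raw fsync histogram from etcd's metrics endpoint.
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds

# Equivalent Prometheus query for the p99 (as a comment, not runnable here):
# histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
```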

5. What is critical to remember when restoring an etcd snapshot to a multi-member cluster?

The restore must be performed on every member of the new cluster, each with its own unique --name and --initial-advertise-peer-urls. The API server and etcd must be stopped first. Restore creates a new data directory; you then point etcd config to it and restart.
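A sketch of the restore on one member of a 3-node cluster; the member names, IPs, and paths are assumptions, and the same command must be repeated on each member with its own --name and peer URL:

```shell
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-snapshot.db \
  --name infra-0 \
  --data-dir /var/lib/etcd-restored \
  --initial-cluster infra-0=https://10.0.0.1:2380,infra-1=https://10.0.0.2:2380,infra-2=https://10.0.0.3:2380 \
  --initial-advertise-peer-urls https://10.0.0.1:2380

# Then point etcd's --data-dir at /var/lib/etcd-restored and restart the member.
```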

🔴 Hard (3)

1. What are the two recovery options when an etcd cluster loses quorum, and which is preferred?

Option 1 (preferred): Restore from a recent snapshot onto new nodes. Option 2 (last resort): Force a surviving member to start a new single-member cluster with --force-new-cluster, then add new members. The force option risks data inconsistency and should only be used when no snapshot is available.
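The last-resort path can be sketched as follows; the data directory, member name, and peer URL are assumptions:

```shell
# Restart the surviving member as a brand-new one-node cluster,
# reusing its existing data directory.
etcd --force-new-cluster --data-dir /var/lib/etcd

# Once it is healthy, grow back to three members one at a time.
etcdctl member add infra-1 --peer-urls=https://10.0.0.2:2380
```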

2. How does certificate expiry cause etcd failure, and how do you prevent it?

Expired TLS certificates prevent etcd members from communicating with each other and with the API server, causing TLS handshake errors. The cluster effectively goes down. Prevention: monitor expiry dates (openssl x509 -in cert -noout -enddate), automate rotation, and for kubeadm clusters run kubeadm certs renew all before expiry.
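The checks and renewal look like this in practice; the certificate path assumes a kubeadm layout:

```shell
# Print the expiry date of the etcd server certificate.
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -enddate

# On kubeadm clusters: audit all control-plane certs, then renew.
kubeadm certs check-expiration
kubeadm certs renew all
```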

3. Can etcd experience a true split-brain during a network partition? Explain.

No. Raft consensus prevents true split-brain. During a network partition, only the partition containing the majority (quorum) can accept writes. The minority partition becomes read-only. However, stale reads from the minority partition can confuse monitoring tools. Once the partition resolves, members reconcile automatically via Raft log replication.