Portal | Level: L2: Operations | Topics: etcd | Domain: Kubernetes
etcd Operations Skill Check
Rate yourself 0-2 on each item: 0 = never done, 1 = done with help, 2 = confident
Health & Monitoring
- Check etcd cluster health with `etcdctl endpoint health`
- Check member list and leader status
- Monitor etcd metrics (db size, WAL fsync duration, leader changes)
- Set up alerts for etcd latency and disk space
- Understand quorum requirements (floor(N/2) + 1 of N voting members, e.g. 2 of 3 or 3 of 5)
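The health checks and the quorum arithmetic above can be sketched as a few shell helpers. This is a minimal sketch, assuming `etcdctl` (v3 API) is on `PATH` and that endpoints and TLS credentials are supplied through the `ETCDCTL_ENDPOINTS`, `ETCDCTL_CACERT`, `ETCDCTL_CERT`, and `ETCDCTL_KEY` environment variables. The function names are hypothetical and not part of etcdctl.

```bash
#!/usr/bin/env bash
# Hypothetical helper names; only the etcdctl subcommands are real.

# Health of every member the cluster advertises.
etcd_health() {
  etcdctl endpoint health --cluster
}

# Member list, then per-endpoint status (leader flag, DB size, raft term).
etcd_members_and_leader() {
  etcdctl member list --write-out=table
  etcdctl endpoint status --cluster --write-out=table
}

# Quorum for N voting members: floor(N/2) + 1 (shell division truncates).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Member failures tolerated while keeping quorum: N - quorum(N).
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
```

For example, `quorum 5` prints `3` and `fault_tolerance 5` prints `2`. A 4-member cluster tolerates only one failure, the same as a 3-member cluster, which is why odd sizes are preferred. For the metrics item, the Prometheus series usually watched are `etcd_mvcc_db_total_size_in_bytes`, `etcd_disk_wal_fsync_duration_seconds`, and `etcd_server_leader_changes_seen_total`.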
Backup & Restore
- Take a snapshot with `etcdctl snapshot save`
- Verify a snapshot with `etcdctl snapshot status`
- Restore a cluster from a snapshot (single node and multi-node)
- Set up automated periodic etcd backups
- Test restore procedure in a non-production environment
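The backup and restore items above might look like the following in practice. This is a sketch under assumptions: `BACKUP_DIR`, `RESTORE_TOKEN`, their defaults, and the function names are hypothetical, and connection settings are expected to come from the `ETCDCTL_*` environment variables. etcd 3.5 deprecates `etcdctl snapshot status` and `etcdctl snapshot restore` in favour of the `etcdutl` equivalents, so check which tool your version expects.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; paths and defaults are placeholders.

# Save a timestamped snapshot, print its summary, then print its path.
etcd_backup() {
  local dir="${BACKUP_DIR:-/var/backups/etcd}"
  local snap="${dir}/etcd-$(date +%Y%m%dT%H%M%S).db"
  mkdir -p "$dir" || return 1
  etcdctl snapshot save "$snap" || return 1
  # Summary includes hash, revision, total keys, and size.
  etcdctl snapshot status "$snap" --write-out=table || return 1
  echo "$snap"
}

# Restore one member's data directory from a snapshot.
# Args: snapshot file, member name, peer URL, initial-cluster string, data dir.
etcd_restore_member() {
  etcdctl snapshot restore "$1" \
    --name "$2" \
    --initial-advertise-peer-urls "$3" \
    --initial-cluster "$4" \
    --initial-cluster-token "${RESTORE_TOKEN:-etcd-restored}" \
    --data-dir "$5"
}
```

For a multi-node restore, run `etcd_restore_member` once per member using the same snapshot file, the same `--initial-cluster` string, and the same token, then start each etcd against its new data directory. `etcd_backup` is the piece a cron job or systemd timer would call for periodic backups.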
Space Management
- Check current database size and quota
- Compact etcd revision history
- Defragment etcd to reclaim disk space
- Handle the "mvcc: database space exceeded" error (raised when the NOSPACE alarm is active)
- Disarm the NOSPACE alarm after space recovery
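The space-recovery sequence above (compact, defragment, disarm) can be sketched as below, following the order used in the etcd maintenance documentation. The function names are hypothetical, and the revision is extracted with `grep` on the assumption that the JSON output contains a `"revision":<number>` field with no spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the etcdctl subcommands are real.

# Read `endpoint status` JSON on stdin and print the first revision number.
current_revision() {
  grep -o '"revision":[0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Compact history up to the current revision, defragment, clear alarms.
etcd_reclaim_space() {
  local rev
  rev="$(etcdctl endpoint status --write-out=json | current_revision)"
  [ -n "$rev" ] || { echo "could not read current revision" >&2; return 1; }
  etcdctl alarm list                  # shows NOSPACE if the quota was hit
  etcdctl compact "$rev" || return 1  # drop revision history before $rev
  etcdctl defrag || return 1          # return freed pages to the filesystem
  etcdctl alarm disarm                # writes stay blocked until this runs
}
```

Defragmentation is per member and blocks that member while it runs, so on a multi-member cluster it is usually done one endpoint at a time. Current database size appears in the `DB SIZE` column of `etcdctl endpoint status --write-out=table`; the quota itself is set on the server with `--quota-backend-bytes`.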
Cluster Operations
- Add a new member to an etcd cluster
- Remove a failed member from the cluster
- Understand learner (non-voting) members
- Migrate etcd from stacked to external topology
- Handle leader election issues
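The membership operations above map onto three `etcdctl member` subcommands. A minimal sketch, with hypothetical wrapper names; the `--learner` flag assumes etcd 3.4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around real etcdctl member subcommands.

# Add a new member as a non-voting learner. Args: member name, peer URL.
etcd_add_learner() {
  etcdctl member add "$1" --peer-urls="$2" --learner
}

# Promote a caught-up learner to a voting member. Arg: member ID (hex).
etcd_promote() {
  etcdctl member promote "$1"
}

# Remove a member, for example one that has failed. Arg: member ID (hex).
etcd_remove_member() {
  etcdctl member remove "$1"
}
```

Member IDs come from `etcdctl member list`. A learner does not count toward quorum until it is promoted, so adding one does not raise the number of votes needed. When replacing a failed member, removing it before adding the replacement avoids raising the quorum size while a member is already down.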
Troubleshooting
- Diagnose slow etcd responses (disk latency, network)
- Investigate "request timed out" errors
- Debug split-brain scenarios
- Check etcd logs for warning patterns
- Use `etcdctl get` / `etcdctl watch` for debugging
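A few debugging helpers for the items above, as a sketch. The function names are hypothetical, and the log patterns are examples of warnings etcd emits under slow disk or an overloaded leader, not an exhaustive list.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the etcdctl subcommands are real.

# List keys under a prefix without printing values.
etcd_keys() {
  etcdctl get "$1" --prefix --keys-only
}

# Stream changes under a prefix until interrupted.
etcd_watch() {
  etcdctl watch "$1" --prefix
}

# Filter etcd logs on stdin for common slow-disk / slow-request warnings.
etcd_log_warnings() {
  grep -E 'took too long|slow fdatasync|failed to send out heartbeat'
}
```

On a Kubernetes control plane, `etcd_keys /registry/` lists the API server's keys, since `/registry` is the default storage prefix. Logs can be piped in from wherever etcd writes them, for example `journalctl -u etcd | etcd_log_warnings`, where the unit name `etcd` is an assumption about the host setup.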
Scoring
| Score | Level |
|---|---|
| 0-6 | Beginner — study the etcd primer and practice backups |
| 7-12 | Intermediate — practice restore and space management |
| 13-18 | Advanced — handle cluster member operations |
| 19+ | Expert — lead etcd incident response |
Wiki Navigation
Related Content
- Interview: etcd Space Exceeded (Scenario, L3) — etcd
- Runbook: etcd Backup & Restore (Runbook, L2) — etcd
- Runbook: etcd High Latency / Slow Operations (Runbook, L3) — etcd
- Scenario: etcd Troubleshooting (Scenario, L3) — etcd
- etcd (Topic Pack, L1) — etcd
- etcd Drills (Drill, L2) — etcd
- etcd Flashcards (CLI) (flashcard_deck, L1) — etcd