Elasticsearch: Operations for People Who Didn't Choose It

Topics: Elasticsearch ops, shard allocation, disk watermarks, mapping explosions, cluster health
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: None (you inherited an ES cluster and need to keep it alive)


The Mission

You didn't choose Elasticsearch. Someone before you deployed it for logging, or search, or analytics. Now the cluster is yellow, disk is filling, and queries are timing out. The documentation is 3,000 pages. You need the 30 pages that matter.


Cluster Health: Green, Yellow, Red

curl -s http://localhost:9200/_cluster/health | jq .
# → {
#     "status": "yellow",        ← NOT green
#     "number_of_nodes": 3,
#     "unassigned_shards": 5     ← shards without a home
#   }
| Status | Meaning | Action |
|--------|---------|--------|
| Green  | All primary and replica shards allocated | None (this is good) |
| Yellow | All primaries allocated, some replicas missing | Find why replicas aren't allocated |
| Red    | Some primary shards missing — DATA LOSS RISK | Emergency: find and fix missing primaries |

Yellow is the most common non-green state. It usually means: you have number_of_replicas: 1 but only 1 node (replicas can't be on the same node as the primary).

# Why are shards unassigned?
curl -s 'http://localhost:9200/_cluster/allocation/explain' | jq .
# → "allocate_explanation": "cannot allocate because all found copies are stale or corrupt"
# → Or: "the node is above the high watermark"
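If the cluster is a single node and the missing replicas are expected, you can clear the yellow state by dropping replicas to zero. A minimal sketch, using a hypothetical logs index (note: with zero replicas the data has no redundancy):

PUT logs/_settings
{
  "number_of_replicas": 0
}

Only do this where a replica genuinely has nowhere to go; on a multi-node cluster, fix the allocation problem instead.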

Disk Watermarks: Why Elasticsearch Stops Writing

ES has three disk thresholds:

| Watermark   | Default       | What happens |
|-------------|---------------|--------------|
| Low         | 85% disk used | No new shards allocated to this node |
| High        | 90% disk used | Shards start relocating AWAY from this node |
| Flood stage | 95% disk used | Indices become READ-ONLY; writes rejected |
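The thresholds are live cluster settings. This sketch sets them explicitly to their documented defaults (they also accept absolute values like "10gb" of free space):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}

Raising the flood stage buys minutes, not a fix; the disk still needs space freed.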
# Check disk usage per node
curl -s 'http://localhost:9200/_cat/allocation?v'
# → node   shards disk.used disk.avail disk.percent
# → node-1    142     85gb      15gb          85     ← at low watermark!
# → node-2    138     90gb      10gb          90     ← at HIGH watermark!

When flood stage triggers, your indices go read-only. Logs stop being indexed. Searches work but nothing new gets written.
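A write against a flood-stage-blocked index fails with an error along these lines (exact wording and status code vary by version):

POST logs/_doc
{ "msg": "hello" }

# → 429, "blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded
#    flood-stage watermark, index has read-only-allow-delete block]"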

# Unlock read-only indices (after freeing disk space!)
curl -X PUT 'http://localhost:9200/_all/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'

Gotcha: Unlocking indices without freeing disk space just triggers flood stage again within minutes. Free space first (delete old indices, add disk), then unlock. On Elasticsearch 7.4 and later the block is also removed automatically once disk usage drops back below the high watermark, but verify rather than assume.


Index Lifecycle: Stop Storing Everything Forever

# Check index sizes
curl -s 'http://localhost:9200/_cat/indices?v&s=store.size:desc' | head -10
# → index                     status store.size
# → logs-2025-01              open   120gb     ← 14 months old!
# → logs-2025-02              open   115gb
# → logs-2025-03              open   118gb

If you're storing logs for 14 months and only query the last 7 days, you're wasting 90% of your disk.

ILM (Index Lifecycle Management)

PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Hot (current, fast SSD) → Warm (7 days, cheaper disk, merged) → Delete (30 days).
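A policy does nothing on its own: new indices have to reference it, typically via an index template. A minimal sketch using the composable template API (7.8+); the template and alias names here are illustrative:

PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}

Indices created under logs-* then move through hot → warm → delete automatically.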


The Mapping Explosion

# Rough field count per index: size of the largest "properties" object
# in the mapping (a heuristic, not an exact total)
curl -s 'http://localhost:9200/logs/_mapping' | jq '.. | objects | keys | length' | sort -rn | head -1
# → 5000 ← this index has ~5,000 unique fields!

Every unique field name creates a mapping entry. Logging systems that ingest arbitrary JSON (Kubernetes labels, user-defined fields) can create thousands of unique fields. This consumes memory, slows queries, and eventually crashes the node.

Fix: Set a field limit and use strict mappings:

PUT logs/_settings
{
  "index.mapping.total_fields.limit": 1000
}

Gotcha: Setting dynamic: "strict" on mappings rejects documents with unknown fields. This breaks log ingestion if your applications add new fields without updating the mapping. Use dynamic: "false" instead — it accepts the document but doesn't index unknown fields (they're stored but not searchable).
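The dynamic parameter can be toggled on a live index's mapping; a sketch for a hypothetical logs index:

PUT logs/_mapping
{
  "dynamic": "false"
}

Existing fields keep working; new, unmapped fields in incoming documents are kept in _source but no longer create mapping entries.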


Essential Operations

# Cluster health
curl -s 'localhost:9200/_cluster/health?pretty'

# Node stats (CPU, memory, disk per node)
curl -s 'localhost:9200/_cat/nodes?v&h=name,cpu,heap.percent,disk.used_percent'

# Index sizes (sorted by size) — quote URLs with '&' or the shell splits them
curl -s 'localhost:9200/_cat/indices?v&s=store.size:desc' | head

# Shard allocation
curl -s 'localhost:9200/_cat/shards?v&s=state' | head -20

# Unassigned shard explanation
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'

# Delete old indices
curl -X DELETE 'localhost:9200/logs-2025-01'

# Force merge (reduce segments, free disk)
curl -X POST 'localhost:9200/logs-2025-06/_forcemerge?max_num_segments=1'

Flashcard Check

Q1: Cluster status is yellow. What does this mean?

All primary shards are allocated but some replicas are missing. Data is safe but not redundant. Common cause: single-node cluster with replicas configured.

Q2: Disk at 95%. What happens?

Flood stage watermark triggers. All indices become read-only. Writes are rejected. Free disk space, then unlock indices.

Q3: Index has 5,000 unique fields. Why is this a problem?

Each field consumes memory for mapping metadata. Thousands of fields slow queries, waste memory, and can crash nodes. Set total_fields.limit.


Cheat Sheet

| Task | Command |
|------|---------|
| Cluster health | `curl 'localhost:9200/_cluster/health?pretty'` |
| Node stats | `curl 'localhost:9200/_cat/nodes?v'` |
| Index sizes | `curl 'localhost:9200/_cat/indices?v&s=store.size:desc'` |
| Shard status | `curl 'localhost:9200/_cat/shards?v'` |
| Unassigned why | `curl 'localhost:9200/_cluster/allocation/explain?pretty'` |
| Delete index | `curl -X DELETE 'localhost:9200/INDEX_NAME'` |
| Unlock read-only | `curl -X PUT 'localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete":null}'` |

Disk Watermarks

| Level | Threshold | Effect |
|-------|-----------|--------|
| Low   | 85%       | No new shards allocated |
| High  | 90%       | Shards relocate away |
| Flood | 95%       | Indices go READ-ONLY |

Takeaways

  1. Yellow = replicas missing, not data loss. Green = everything allocated. Red = data at risk. Yellow is usually a single-node configuration issue.

  2. Disk watermarks are non-negotiable. At 95%, writes stop. Monitor disk and set up ILM to delete old indices automatically.

  3. Mapping explosions kill clusters. Set total_fields.limit. Use dynamic: false for log indices to accept but not index unknown fields.

  4. ILM is essential for log clusters. Hot → Warm → Delete. Without it, you store everything forever and the disk fills.


Exercises

  1. Run a single-node cluster and check health. Start Elasticsearch in a container: docker run -d --name es -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.12.0. Run curl -s localhost:9200/_cluster/health | jq . and note the status. It should be green or yellow. If yellow, use curl -s localhost:9200/_cluster/allocation/explain | jq . to find out why replicas are unassigned on a single-node cluster.

  2. Index documents and inspect shard allocation. Create an index with curl -X PUT localhost:9200/test-index. Index a few documents: curl -X POST localhost:9200/test-index/_doc -H 'Content-Type: application/json' -d '{"msg":"hello"}'. Check shard allocation with curl -s localhost:9200/_cat/shards/test-index?v. Note the shard count, state, and which node they live on. Try setting replicas to 0: curl -X PUT localhost:9200/test-index/_settings -H 'Content-Type: application/json' -d '{"number_of_replicas":0}' and check cluster health again.

  3. Simulate the flood-stage watermark. On your test cluster, check disk usage with curl -s localhost:9200/_cat/allocation?v. Lower the watermark thresholds to trigger them on your current disk usage (e.g., curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.disk.watermark.flood_stage":"1b"}}'). Try to index a document and observe the read-only block. Unlock with curl -X PUT localhost:9200/test-index/_settings -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete":null}'. Reset watermarks to defaults afterward.

  4. Find big indices and calculate retention savings. Run curl -s 'localhost:9200/_cat/indices?v&s=store.size:desc' (quote the URL so the shell doesn't split it at '&') and identify the largest indices. For a log-based index pattern, calculate how much disk you would save by reducing retention from 90 days to 30 days. Write the ILM policy JSON (hot/delete only) that would implement this retention. Clean up your test container with docker rm -f es.


  • The Disk That Filled Up — when ES disk watermarks cause write failures
  • When the Queue Backs Up — when log ingestion backs up because ES is slow
  • The Monitoring That Lied — when ES metrics don't reflect user experience