Redis Operations - Street-Level Ops¶
What experienced Redis operators know that tutorials don't teach.
Quick Diagnosis Commands¶
# Connect to Redis CLI
redis-cli # local, default port 6379
redis-cli -h redis.internal -p 6379
redis-cli -h redis.internal -p 6379 -a <password>
redis-cli -h redis.internal -p 6379 --no-auth-warning -a <password>
# Basic health check
redis-cli ping # → PONG
redis-cli -h redis.internal ping
# Server info (everything)
redis-cli info
redis-cli info server # version, uptime, OS
redis-cli info clients # connected clients, blocked, tracking
redis-cli info memory # memory usage, fragmentation, eviction stats
redis-cli info stats # commands processed, keyspace hits/misses
redis-cli info replication # role, master/replica info, replication lag
redis-cli info keyspace # keys per database, expires
redis-cli info persistence # RDB/AOF status, last save time
# Memory analysis
redis-cli memory doctor # automated memory health check
redis-cli memory usage mykey # size in bytes of a specific key
redis-cli memory usage mykey SAMPLES 0 # sample all nested elements (exact, slower)
redis-cli memory stats # detailed memory breakdown
# Slow log — queries that exceeded slowlog-log-slower-than
redis-cli slowlog get 25 # last 25 slow commands
redis-cli slowlog len # total slow log entries
redis-cli slowlog reset
# Monitor — live command stream (WARNING: CPU intensive, dev only)
redis-cli monitor
# Latency diagnosis
redis-cli --latency # measure latency in ms (Ctrl-C to stop)
redis-cli --latency-history # latency samples over time
redis-cli --latency-dist # latency distribution histogram
redis-cli --intrinsic-latency 5 # baseline latency of the host itself (run on the Redis server)
# Key scanning (NEVER USE KEYS IN PRODUCTION)
redis-cli scan 0 COUNT 100 # returns (cursor, [keys]) — iterate with cursor
redis-cli scan 0 MATCH "session:*" COUNT 100
redis-cli scan 0 TYPE string COUNT 100
# Count keys matching pattern safely
redis-cli --scan --pattern "session:*" | wc -l
# Database info
redis-cli dbsize # total key count (current db)
redis-cli info keyspace # key count per database
# Client list
redis-cli client list # all connected clients
redis-cli client list TYPE normal # filter by type
redis-cli client getname # name of current connection
redis-cli client kill ID <id> # kill specific client
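The INFO sections above are plain `key:value` text, which makes health checks easy to script. A minimal parser sketch (`parse_info` is my own helper name, not a redis-py API):

```python
def parse_info(raw: str) -> dict:
    # Parse `redis-cli info` output: "key:value" lines, "#" section headers.
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and section headers like "# Memory"
        key, _, value = line.partition(":")
        info[key] = value
    return info

sample = """# Memory
used_memory:1048576
mem_fragmentation_ratio:1.23
"""
print(parse_info(sample)["mem_fragmentation_ratio"])  # → 1.23
```

Feed it `subprocess.check_output(["redis-cli", "info"])` and you can alert on any metric below without extra dependencies.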
Common Scenarios¶
Scenario 1: Redis Memory Approaching Limit¶
Redis memory usage is at 90% of maxmemory. Application is seeing cache misses or write errors.
Diagnosis:
redis-cli info memory
# Check:
# used_memory_human — actual data
# used_memory_rss_human — OS-level memory (includes fragmentation)
# mem_fragmentation_ratio — > 1.5 means high fragmentation; < 1 means using swap
# maxmemory_policy — how Redis evicts (or errors) when full
# evicted_keys — from info stats; nonzero means eviction is happening
# maxmemory — current limit (0 = no limit)
redis-cli memory doctor
# Reports: fragmentation, big key warnings, AOF buffer, etc.
# Find your biggest keys
redis-cli --bigkeys # scans entire keyspace, finds largest per type
# Outputs: biggest key per data type, plus summary stats
# Find keys that should have expired
redis-cli info stats | grep expired_keys # total expired
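The fragmentation thresholds above turn into a tiny triage helper (the function name is mine; the thresholds are the rule of thumb stated in the comments):

```python
def classify_fragmentation(ratio: float) -> str:
    # Rule of thumb: > 1.5 means high fragmentation,
    # < 1.0 means the OS has pushed Redis memory into swap.
    if ratio < 1.0:
        return "swapping"
    if ratio > 1.5:
        return "fragmented"
    return "ok"

print(classify_fragmentation(2.1))  # → fragmented
```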
Fix:
# Check current eviction policy
redis-cli config get maxmemory-policy
# Set eviction policy (choose based on workload)
redis-cli config set maxmemory-policy allkeys-lru # evict LRU regardless of TTL
redis-cli config set maxmemory-policy volatile-lru # evict only keys with TTL
redis-cli config set maxmemory-policy volatile-ttl # evict soonest-to-expire first
redis-cli config set maxmemory-policy noeviction # return errors on writes when full (the Redis default)
# Increase memory limit (if headroom exists)
redis-cli config set maxmemory 4gb
# Defragment memory (Redis 4.0+, use during low traffic)
redis-cli config set activedefrag yes
redis-cli memory purge # ask the allocator (jemalloc) to release dirty pages now
# Delete large/stale keys
redis-cli del bigkey # immediate delete
redis-cli unlink bigkey # async delete (better for large keys)
# Scan and delete keys by pattern (do NOT use KEYS + DEL in production)
redis-cli --scan --pattern "tmp:*" | xargs -L 50 redis-cli unlink
Scenario 2: Replication Lag or Replica Not Syncing¶
A Redis replica is falling behind or shows errors in replication.
Diagnosis:
# On replica
redis-cli info replication
# Check:
# role: slave
# master_host, master_port — correct?
# master_link_status: up/down
# master_last_io_seconds_ago — time since last data from master
# master_sync_in_progress: 1 if full sync in progress
# slave_repl_offset vs master_repl_offset — gap is lag in bytes
# On primary
redis-cli info replication
# connected_slaves — should show replicas
# slave0: ip=...,port=...,state=online,offset=...,lag=0
# Check for replication errors in the server log
tail -f /var/log/redis/redis-server.log
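The offset gap from INFO replication translates directly into lag. A small sketch making the arithmetic explicit (helper names and the catch-up estimate are my own):

```python
def replication_lag_bytes(master_repl_offset: int, slave_repl_offset: int) -> int:
    # Both offsets are byte positions in the replication stream;
    # the difference is data the replica has not yet received.
    return max(0, master_repl_offset - slave_repl_offset)

def catchup_seconds(lag_bytes: int, write_rate_bytes_per_sec: float) -> float:
    # Rough time-to-catch-up estimate given the primary's write rate.
    if write_rate_bytes_per_sec <= 0:
        return float("inf")
    return lag_bytes / write_rate_bytes_per_sec

print(replication_lag_bytes(5_000_000, 4_000_000))  # → 1000000
```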
Fix:
# If replica is permanently disconnected, force a full resync
redis-cli replicaof NO ONE # detach from master
redis-cli replicaof <master-ip> 6379 # reattach (triggers PSYNC or full SYNC)
# If partial resync fails (replica offset too far behind):
# Increase repl-backlog-size on primary (allows partial resync window)
redis-cli config set repl-backlog-size 256mb
# On the replica, if you want to restart fresh: detach, flush, reattach
redis-cli replicaof NO ONE
redis-cli flushall # WARNING: clears all data on this node
redis-cli replicaof <master-ip> 6379
# Check replica output buffers
redis-cli client list TYPE replica # omem column = output buffer per replica
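The repl-backlog-size bump above follows a sizing rule: the backlog must hold every byte written while a replica is disconnected, or partial resync fails and a costly full sync runs instead. A sketch of that calculation (the function and the 2x safety factor are my own choices):

```python
def backlog_size_bytes(write_rate_bytes_per_sec: float,
                       disconnect_window_sec: float,
                       safety: float = 2.0) -> int:
    # Partial resync succeeds only if the bytes written while the replica
    # was away still fit in the backlog; size it with headroom.
    return int(write_rate_bytes_per_sec * disconnect_window_sec * safety)

# 1 MB/s of writes, tolerate a 60 s disconnect:
print(backlog_size_bytes(1_000_000, 60))  # → 120000000
```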
Scenario 3: KEYS Command Blocking Production¶
An engineer ran redis-cli keys "*" or an application called KEYS in production. Redis is single-threaded for command execution. KEYS on a 10M key database takes seconds, blocking all other commands. Applications see timeouts across the board.
Diagnosis:
# Check if KEYS is in the slow log
redis-cli slowlog get 10 | grep -A5 KEYS
# See current blocked clients
redis-cli info clients | grep blocked_clients
# Monitor for KEYS usage
redis-cli monitor 2>/dev/null | grep -i '"KEYS"' # MONITOR quotes commands; CPU intensive, sample briefly
Fix:
# Short-term: if a long KEYS is running, identify and kill the client
redis-cli client list | grep -i "cmd=keys" # find the culprit client
redis-cli client kill ID <id>
# Replace all KEYS usage with SCAN:
# SCAN is O(1) per call, paginated, safe in production
# Application pattern (Python-style):
cursor = 0
while True:
    cursor, keys = redis.scan(cursor, match="session:*", count=100)
    process(keys)
    if cursor == 0:
        break
# On the CLI:
redis-cli --scan --pattern "session:*" --count 100
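To see the cursor contract without a live server, here is the same loop run against a stub that mimics SCAN's (cursor, batch) protocol. Real Redis cursors are opaque values, not array indices, but the terminate-on-zero rule is the same:

```python
import fnmatch

class FakeRedis:
    """Stub mimicking Redis SCAN cursor semantics (illustration only)."""
    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor, match="*", count=100):
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0  # Redis signals completion with cursor 0
        return next_cursor, [k for k in batch if fnmatch.fnmatch(k, match)]

r = FakeRedis([f"session:{i}" for i in range(250)] + ["other:1"])
cursor, found = 0, []
while True:
    cursor, keys = r.scan(cursor, match="session:*", count=100)
    found.extend(keys)
    if cursor == 0:
        break
print(len(found))  # → 250
```

Note the do-while shape: cursor 0 both starts the iteration and ends it, so the batch must be processed before the termination check.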
# Rename or disable dangerous commands via redis.conf
rename-command KEYS "" # disable entirely
rename-command FLUSHDB "FLUSHDB_RENAMED_abc123" # rename to obscure name
# On Redis 6+, ACLs are the preferred mechanism (e.g. ACL SETUSER app ... -keys)
Scenario 4: Persistence Failure (RDB Save Failing)¶
Redis logs show "BGSAVE failed: No space left on device" or RDB save keeps failing, leaving persistence disabled.
Diagnosis:
redis-cli info persistence
# Check:
# rdb_bgsave_in_progress: 1 if saving now
# rdb_last_bgsave_status: ok or err
# rdb_last_bgsave_time_sec: how long the last save took
# rdb_last_cow_size: copy-on-write memory used during last save
# aof_enabled: 0 or 1
# aof_last_bgrewrite_status: ok or err
# Check disk space
df -h /var/lib/redis
Fix:
# Check if save is configured
redis-cli config get save # e.g., "3600 1 300 100 60 10000"
# Disable automatic saves if disk is full (emergency)
redis-cli config set save ""
# Trigger a manual save
redis-cli bgsave # forks a background save, returns immediately
redis-cli save # blocks the server until complete; avoid on large datasets
redis-cli lastsave # timestamp of last successful RDB save
# For AOF issues
redis-cli bgrewriteaof # compact the AOF file
redis-cli config set appendonly no # disable AOF temporarily if rewrite is failing
# If BGSAVE consistently fails with fork errors (cannot allocate memory):
# allow memory overcommit so the fork's copy-on-write pages can be promised
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
# Make permanent: echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
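The save string shown by `config get save` flattens (seconds, changes) pairs into one line; a small parser (hypothetical helper) makes the schedule readable:

```python
def parse_save_points(save: str) -> list:
    # "3600 1 300 100 60 10000" → [(3600, 1), (300, 100), (60, 10000)]
    # Each pair means: snapshot if N+ keys changed within the window.
    nums = [int(n) for n in save.split()]
    return list(zip(nums[0::2], nums[1::2]))

print(parse_save_points("3600 1 300 100 60 10000"))
# → [(3600, 1), (300, 100), (60, 10000)]
```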
Key Patterns¶
RDB vs AOF Persistence¶
# RDB (point-in-time snapshots)
# redis.conf settings:
save 3600 1 # save if 1+ keys changed in the last 3600s
save 300 100 # save if 100+ keys changed in the last 300s
save 60 10000 # save if 10000+ keys changed in the last 60s
dbfilename dump.rdb
dir /var/lib/redis
# AOF (append-only log — better durability)
appendonly yes
appendfsync everysec # sync to disk every second (balance of safety/perf)
# appendfsync always # sync every write (safest, slowest)
# appendfsync no # OS decides (fastest, can lose data)
auto-aof-rewrite-percentage 100 # rewrite when AOF doubles in size
auto-aof-rewrite-min-size 64mb
# Both (RDB + AOF) — recommended for production
# RDB for fast restarts, AOF for durability
appendonly yes
save 3600 1
Eviction Policies Reference¶
Remember:
allkeys-lru is the safe default for caches. noeviction is the safe default for data you cannot afford to lose (sessions, queues). If you are unsure, start with allkeys-lru: the worst case is a cache miss, not an application error.
noeviction — return error when maxmemory reached (use for sessions, not caches)
allkeys-lru — evict LRU key from all keys (recommended for pure caches)
volatile-lru — evict LRU key only from keys with TTL set
allkeys-lfu — evict least frequently used from all keys (Redis 4.0+)
volatile-lfu — evict LFU from keys with TTL
allkeys-random — evict random key from all keys
volatile-random — evict random key from keys with TTL
volatile-ttl — evict the key with nearest expiry time
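To make allkeys-lru concrete, here is a toy model of the eviction behavior. Note that real Redis uses approximate LRU (randomly sampling maxmemory-samples keys), not a strict ordering like this:

```python
from collections import OrderedDict

class LRUCache:
    # Toy model of allkeys-lru: writing past capacity evicts the least
    # recently touched key. Capacity stands in for maxmemory.
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None  # cache miss, never an error
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU entry

c = LRUCache(2)
c.set("a", 1); c.set("b", 2)
c.get("a")        # touch "a", so "b" becomes least recently used
c.set("c", 3)     # over capacity: evicts "b"
print(c.get("b"))  # → None
```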
Cluster Diagnosis¶
# Cluster health
redis-cli cluster info
# cluster_enabled: 1
# cluster_state: ok (or fail)
# cluster_slots_assigned: should be 16384
# cluster_known_nodes: total nodes
# cluster_size: number of master nodes
# Node topology
redis-cli cluster nodes # all nodes, their roles, slots
redis-cli cluster myid # this node's ID
# Check slot assignment
redis-cli cluster info | grep slots
redis-cli --cluster check <any-node-host>:<port>
# Find which node owns a key
redis-cli cluster keyslot mykey # returns slot number (0-16383)
redis-cli -c -h <any-node> get mykey # -c enables cluster redirect
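CLUSTER KEYSLOT is CRC16 (XMODEM variant) mod 16384, with hash-tag extraction: if the key contains a non-empty {...} section, only that part is hashed, so related keys land on the same slot. A self-contained sketch for doing slot math offline (verify against CLUSTER KEYSLOT before relying on it):

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): poly 0x1021, init 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def keyslot(key: str) -> int:
    # Hash tags: hash only the first non-empty {...} section, if present.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(keyslot("{user1000}.following") == keyslot("{user1000}.followers"))  # → True
```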
# Reshard slots (manual rebalancing)
redis-cli --cluster rebalance <any-node>:<port> --cluster-use-empty-masters
# Add a new node
redis-cli --cluster add-node <new-node-ip>:<port> <existing-node-ip>:<port>
# Demote a node to a replica (run against the node being demoted)
redis-cli cluster replicate <master-node-id>
Keyspace Notifications¶
# Enable notifications (redis.conf or runtime)
redis-cli config set notify-keyspace-events "KEA"
# K = keyspace events (channel per key), E = keyevent events (channel per event)
# A = alias for all event classes
# x = expired events only: "Kx"
# Subscribe to expiry events (in a separate client)
redis-cli psubscribe "__keyevent@0__:expired"
# Subscribe to all keyspace events on DB 0
redis-cli psubscribe "__keyspace@0__:*"
# Useful combinations:
notify-keyspace-events "Kx" # only expired key events (low overhead)
notify-keyspace-events "Kg" # generic commands (del, expire, rename)
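Notification channels encode the event kind, database, and payload in the channel name itself. A small parser (my helper, not a redis-py API) splits them apart; for __keyspace__ channels the suffix is the key and the message is the event, for __keyevent__ channels it is the reverse:

```python
import re

def parse_notification_channel(channel: str):
    # "__keyevent@0__:expired" → ("keyevent", 0, "expired")
    # For keyspace channels the trailing part is the key name instead.
    m = re.match(r"__(keyspace|keyevent)@(\d+)__:(.+)", channel)
    return (m.group(1), int(m.group(2)), m.group(3)) if m else None

print(parse_notification_channel("__keyevent@0__:expired"))
# → ('keyevent', 0, 'expired')
```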
Sentinel (High Availability without Cluster)¶
# Check sentinel status
redis-cli -p 26379 sentinel masters
redis-cli -p 26379 sentinel slaves mymaster
redis-cli -p 26379 sentinel sentinels mymaster
# Failover state
redis-cli -p 26379 sentinel ckquorum mymaster
# → OK 2 usable Sentinels. Quorum and failover authorization can be reached
# Force a manual failover (for maintenance)
redis-cli -p 26379 sentinel failover mymaster
# Sentinel conf
sentinel monitor mymaster 10.0.0.1 6379 2 # 2 = quorum
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel auth-pass mymaster <password>
Connection and Performance Tuning¶
Default trap:
tcp-backlog defaults to 511, but the OS enforces net.core.somaxconn as the ceiling. If somaxconn is 128 (a common default), Redis silently clamps the backlog to 128 and you get connection drops under load. Always tune both: sysctl -w net.core.somaxconn=65535 on the host, then set tcp-backlog 65535 in redis.conf.
# redis.conf tuning for production
tcp-keepalive 300 # keepalive interval in seconds
timeout 0 # don't close idle connections (let app manage)
tcp-backlog 511 # connection backlog (match net.core.somaxconn)
hz 10 # background task frequency (increase for tighter expiry)
# Disable THP (Transparent Huge Pages) — Redis docs recommend this
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
# Network tweaks for Redis
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535
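The somaxconn clamp described above is literally a min(), which is worth encoding in any config-audit script (helper name is mine):

```python
def effective_backlog(tcp_backlog: int, somaxconn: int) -> int:
    # The kernel silently caps every listen() backlog at net.core.somaxconn,
    # so Redis gets the smaller of the two values regardless of redis.conf.
    return min(tcp_backlog, somaxconn)

print(effective_backlog(511, 128))  # → 128
```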