- devops
- l2
- topic-pack
- redis
- database-ops

---

Portal | Level: L2: Operations | Topics: Redis Operations, Database Operations | Domain: DevOps & Tooling
Redis Operations - Primer¶
Why This Matters¶
Redis is the universal caching layer and data structure server. It sits in front of databases, between microservices, and inside session stores across almost every production stack. Redis is single-threaded by design (one CPU core handles all commands sequentially), which makes it fast (no lock contention) but operationally sensitive — one slow command (KEYS *, large SORT) blocks every other client. Understanding persistence modes, eviction policies, replication, and clustering is essential because Redis failures are typically cascading: when the cache layer fails, the database behind it gets hit with the full query load and often falls over too.
Who made it: Redis was created by Salvatore Sanfilippo (known as "antirez") in 2009 in Sicily, Italy. He built it to improve the scalability of his startup LLOOGG, a real-time web analytics tool. The name stands for Remote Dictionary Server. After open-sourcing it on Hacker News, GitHub and Instagram were among the first major adopters. Sanfilippo served as BDFL (Benevolent Dictator for Life) for 11 years before stepping down in 2020, then returned to Redis (the company) in late 2024 as an evangelist.
Name origin: Redis commands follow a naming convention where the first letter often indicates the data structure:
S for sets (SADD, SMEMBERS), Z for sorted sets (ZADD, ZRANGE), H for hashes (HSET, HGET), L for lists (LPUSH, LRANGE). String commands have no prefix (GET, SET). Once you internalize this pattern, you can guess command names without looking them up.
Core Concepts¶
1. Essential redis-cli Commands¶
# Connect
redis-cli -h localhost -p 6379
redis-cli -h redis.example.com -p 6379 -a 'password' --tls
# Basic operations
SET user:123:name "Alice" EX 3600 # set with 1-hour TTL
GET user:123:name
DEL user:123:name
EXISTS user:123:name # returns 1 or 0
TTL user:123:name # seconds remaining (-1 = no expiry, -2 = missing)
# Key inspection
TYPE mykey # string, list, set, hash, zset, stream
OBJECT ENCODING mykey # internal encoding (ziplist, hashtable, etc.)
OBJECT IDLETIME mykey # seconds since last access
SCAN 0 MATCH user:* COUNT 100 # iterate keys safely (never use KEYS in production)
# Hash operations (structured objects)
HSET user:123 name "Alice" email "alice@example.com" role "admin"
HGET user:123 name
HGETALL user:123
# List operations (queues)
LPUSH queue:tasks "job-42"
RPOP queue:tasks
LLEN queue:tasks
# Sorted sets (leaderboards, rate limiters)
ZADD leaderboard 1500 "player1" 2300 "player2"
ZRANGEBYSCORE leaderboard 1000 2000
ZRANK leaderboard "player1"
# Pub/Sub
SUBSCRIBE events:deploy # blocks, waits for messages
PUBLISH events:deploy "v2.3.1 deployed"
# Server info
INFO server # version, uptime, config
INFO memory # used memory, fragmentation ratio
INFO replication # master/replica status
INFO stats # total commands, keyspace hits/misses
INFO keyspace # per-database key count
2. Persistence Configuration¶
Redis offers two persistence mechanisms. Most production deployments use both.
RDB (point-in-time snapshots):
# redis.conf
save 900 1 # snapshot if >= 1 key changed in 900 seconds
save 300 10 # snapshot if >= 10 keys changed in 300 seconds
save 60 10000 # snapshot if >= 10000 keys changed in 60 seconds
dbfilename dump.rdb
dir /var/lib/redis
AOF (append-only file — write log):
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec # fsync once per second (best balance of safety and performance)
# appendfsync always # fsync every write (slow but safest)
# appendfsync no # let OS decide (fastest, risk data loss)
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
| Mode | Data loss on crash | Disk I/O | Recovery speed |
|---|---|---|---|
| RDB only | Up to last snapshot interval | Low | Fast |
| AOF everysec | Up to 1 second | Moderate | Slower (replays log) |
| AOF always | None (theoretically) | High | Slowest |
| RDB + AOF | Up to 1 second | Moderate | Uses AOF for recovery |
# Manual snapshot
redis-cli BGSAVE
# Manual AOF rewrite (compacts the log)
redis-cli BGREWRITEAOF
# Check last save time
redis-cli LASTSAVE
3. Memory Management and Eviction¶
# Check memory usage
redis-cli INFO memory
# used_memory_human: 2.5G
# used_memory_rss_human: 3.1G # RSS > used = fragmentation
# mem_fragmentation_ratio: 1.24 # > 1.5 is problematic
# maxmemory_human: 4G
# maxmemory_policy: allkeys-lru
# Memory usage for a specific key
redis-cli MEMORY USAGE mykey
Eviction policies (maxmemory-policy in redis.conf):
| Policy | Behavior |
|---|---|
| noeviction | Return errors on write when full (default) |
| allkeys-lru | Evict least recently used keys |
| allkeys-lfu | Evict least frequently used keys (Redis 4.0+) |
| volatile-lru | Evict LRU keys only among those with TTL |
| volatile-ttl | Evict keys with shortest TTL first |
| allkeys-random | Evict random keys |
Default trap: Redis ships with maxmemory-policy noeviction — meaning when Redis fills up, it returns errors on every write instead of evicting old keys. For a cache, this is almost never what you want. Set maxmemory-policy allkeys-lfu (or allkeys-lru) explicitly. For a session store or primary data store, noeviction is correct because you do not want Redis silently deleting data. Know your use case and set the policy intentionally.
Gotcha: The KEYS * command scans the entire keyspace in a single blocking operation. In production with millions of keys, this can block Redis for seconds — causing timeouts across every client. Always use SCAN instead, which iterates incrementally. Some Redis GUIs and admin tools use KEYS internally. Rename the command in redis.conf (rename-command KEYS "") to prevent accidental use.
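The behavior behind the allkeys-lru policy is easy to model. Below is a hedged, in-memory Python approximation of LRU eviction (real Redis samples a handful of keys rather than tracking exact recency, so this is an idealized sketch, not Redis's actual algorithm):

```python
from collections import OrderedDict

class LRUCache:
    """Toy model of allkeys-lru: evict the least recently used key when full."""
    def __init__(self, maxkeys):
        self.maxkeys = maxkeys
        self.data = OrderedDict()

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)              # touching a key makes it "recent"
        self.data[key] = value
        if len(self.data) > self.maxkeys:
            evicted, _ = self.data.popitem(last=False)  # drop the oldest entry
            return evicted                          # Redis evicts silently
        return None

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None                                 # cache miss

cache = LRUCache(maxkeys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")               # "a" is now the most recently used
evicted = cache.set("c", 3)  # over capacity: "b" is evicted, not "a"
```

Under noeviction, the equivalent of the final `set` would fail with an error instead of silently dropping "b" — which is exactly the trade-off the policy table describes.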
4. Sentinel (High Availability)¶
Sentinel monitors Redis instances, performs automatic failover, and serves as a service discovery endpoint.
# sentinel.conf
sentinel monitor mymaster 10.0.1.10 6379 2 # name, ip, port, quorum
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
sentinel auth-pass mymaster yourpassword
# Start sentinel
redis-sentinel /etc/redis/sentinel.conf
# Query sentinel
redis-cli -p 26379 SENTINEL masters
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
redis-cli -p 26379 SENTINEL replicas mymaster
# Manual failover
redis-cli -p 26379 SENTINEL failover mymaster
Minimum Sentinel deployment: 3 Sentinel nodes (quorum of 2), 1 master, 1+ replicas. Sentinels must run in separate failure domains.
5. Redis Cluster¶
Cluster provides automatic sharding across multiple Redis nodes. Data is split into 16384 hash slots distributed across masters.
# Create a cluster (3 masters, 3 replicas)
redis-cli --cluster create \
10.0.1.1:6379 10.0.1.2:6379 10.0.1.3:6379 \
10.0.1.4:6379 10.0.1.5:6379 10.0.1.6:6379 \
--cluster-replicas 1
# Check cluster health
redis-cli -c -h 10.0.1.1 CLUSTER INFO
redis-cli -c -h 10.0.1.1 CLUSTER NODES
# Reshard slots between nodes
redis-cli --cluster reshard 10.0.1.1:6379
# Add a node to the cluster
redis-cli --cluster add-node 10.0.1.7:6379 10.0.1.1:6379
# Connect in cluster mode (follows MOVED redirects)
redis-cli -c -h 10.0.1.1 -p 6379
Cluster constraints:
- Multi-key operations only work on keys in the same hash slot
- Use hash tags to colocate keys: {user:123}:profile and {user:123}:sessions go to the same slot
- No database selection (only DB 0)
- Lua scripts must use keys in the same slot
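The slot assignment behind these constraints is simple to reproduce: Redis Cluster hashes the key (or the hash-tag substring between the first { and the next }) with CRC16/XMODEM and takes the result modulo 16384. A minimal Python sketch:

```python
def crc16(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0) -- the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots, honoring hash tags."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # only non-empty {...} counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Hash tags force related keys into the same slot, enabling multi-key ops:
same = key_slot("{user:123}:profile") == key_slot("{user:123}:sessions")
```

Both tagged keys hash only the "user:123" portion, so they always land in the same slot; untagged keys hash the whole key and scatter across the 16384 slots.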
6. Monitoring and Diagnostics¶
# Real-time command monitor (use briefly — performance impact)
redis-cli MONITOR
# Slow log (commands exceeding threshold)
redis-cli CONFIG SET slowlog-log-slower-than 10000 # 10ms in microseconds
redis-cli SLOWLOG GET 10
redis-cli SLOWLOG LEN
redis-cli SLOWLOG RESET
# Client list
redis-cli CLIENT LIST
redis-cli CLIENT KILL ID 42
# Latency diagnostics
redis-cli --latency # continuous latency sampling
redis-cli --latency-history # latency over time
redis-cli --bigkeys # scan for large keys (safe in production)
redis-cli --memkeys # scan for memory-heavy keys
# Check keyspace hit rate
redis-cli INFO stats | grep keyspace
# keyspace_hits:12345678
# keyspace_misses:123456
# Hit rate = hits / (hits + misses) — aim for > 95%
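The hit-rate arithmetic above is worth scripting so it can feed a dashboard or alert. A small Python sketch that parses the INFO stats text format (keyspace_hits and keyspace_misses are the real field names; the sample numbers are illustrative):

```python
def hit_rate(info_stats: str) -> float:
    """Compute keyspace hit rate from `redis-cli INFO stats` output."""
    fields = {}
    for line in info_stats.splitlines():
        if ":" in line and not line.startswith("#"):   # skip section headers
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    return hits / (hits + misses) if hits + misses else 0.0

sample = "keyspace_hits:12345678\nkeyspace_misses:123456"
rate = hit_rate(sample)   # ~0.99 for this sample; alert if it drops below 0.95
```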
7. Operational Patterns¶
Cache-aside pattern:
1. Application checks Redis first
2. On miss, query database, write result to Redis with TTL
3. On write, invalidate the cache key (or update)
Distributed locking (single-instance lock; the Redlock algorithm extends this pattern across multiple independent masters):
# Acquire lock with expiry
SET lock:order:42 "owner-uuid" NX EX 30
# NX = only if not exists, EX 30 = 30 second expiry
# Release lock (only if you own it — use Lua script)
redis-cli EVAL "if redis.call('get',KEYS[1]) == ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 lock:order:42 "owner-uuid"
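The Lua script exists because check-then-delete as two separate commands is racy: your lock can expire and be acquired by another client between the GET and the DEL. A minimal Python model of the ownership check the script enforces (the dict stands in for Redis; names are illustrative):

```python
import uuid

locks = {}   # lock name -> owner token; stand-in for Redis keys

def acquire(name: str):
    """Models SET name token NX: succeeds only if the lock is free."""
    token = str(uuid.uuid4())
    if name not in locks:
        locks[name] = token
        return token
    return None

def release(name: str, token: str) -> bool:
    """Delete only if we still own the lock (what the Lua script guarantees)."""
    if locks.get(name) == token:
        del locks[name]
        return True
    return False   # another owner holds it now; do NOT delete

t1 = acquire("lock:order:42")
blocked = acquire("lock:order:42")        # None: second acquirer is refused
release("lock:order:42", "wrong-token")   # False: lock survives
release("lock:order:42", t1)              # True: lock freed by its owner
```

In real Redis the get-compare-delete must be one atomic EVAL; this sketch is only single-threaded Python, so the race it defends against never actually occurs here.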
Rate limiting with sorted sets:
# Sliding window rate limiter
ZADD ratelimit:user:123 <timestamp> <request-id>
ZREMRANGEBYSCORE ratelimit:user:123 0 <timestamp - window>
ZCARD ratelimit:user:123
# If count > limit, reject
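The sorted-set recipe above translates directly into code. A hedged Python simulation using a plain list of timestamps in place of the ZSET (the window size and limit are assumptions for illustration):

```python
WINDOW = 60.0   # sliding window in seconds
LIMIT = 5       # max requests per window
hits = {}       # user -> list of request timestamps (the "ZSET")

def allow(user: str, now: float) -> bool:
    """Sliding-window check: ZADD + ZREMRANGEBYSCORE + ZCARD in one function."""
    window = hits.setdefault(user, [])
    # ZREMRANGEBYSCORE 0 (now - WINDOW): drop entries older than the window
    window[:] = [t for t in window if t > now - WINDOW]
    if len(window) >= LIMIT:    # ZCARD at or over the limit: reject
        return False
    window.append(now)          # ZADD the new request
    return True

# 5 requests pass, the 6th is rejected, and capacity returns as entries age out:
results = [allow("user:123", float(t)) for t in range(6)]
# results == [True, True, True, True, True, False]
recovered = allow("user:123", 65.0)   # True (old entries aged out of the window)
```

In production the prune-count-add sequence should run atomically (a MULTI/EXEC transaction or a Lua script), otherwise concurrent clients can slip past the limit between steps.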
Quick Reference¶
# Service management
sudo systemctl start/stop/restart redis
# Check connectivity
redis-cli PING # should return PONG
# Flush (dangerous — know what you're doing)
redis-cli FLUSHDB # current database
redis-cli FLUSHALL # all databases
# Config at runtime (no restart)
redis-cli CONFIG SET maxmemory 4gb
redis-cli CONFIG SET maxmemory-policy allkeys-lfu
redis-cli CONFIG REWRITE # persist runtime changes to redis.conf
# Replica status
redis-cli INFO replication
Wiki Navigation¶
Related Content¶
- AWS Database Flashcards (CLI) (flashcard_deck, L1) — Database Operations
- Database Operations Flashcards (CLI) (flashcard_deck, L1) — Database Operations
- Database Operations on Kubernetes (Topic Pack, L2) — Database Operations
- Database Ops Drills (Drill, L2) — Database Operations
- Interview: Database Failover During Deploy (Scenario, L3) — Database Operations
- PostgreSQL Operations (Topic Pack, L2) — Database Operations
- SQL Fundamentals (Topic Pack, L0) — Database Operations
- SQLite Operations & Internals (Topic Pack, L2) — Database Operations
- Skillcheck: Database Ops (Assessment, L2) — Database Operations