Redis in Production: More Than a Cache

Topics: Redis data structures, persistence, pub/sub, replication, memory management, operations
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: Basic key-value store concept
The Mission
You inherited a Redis instance. It's used as a cache, a session store, a message queue, a rate limiter, and a leaderboard. Different teams added different uses over the years. Nobody documented any of it. Redis is now critical infrastructure that nobody fully understands.
This lesson covers what Redis actually does, how it persists data (or doesn't), and the operational gotchas that bite everyone.
Redis in 60 Seconds
Redis is an in-memory data structure server. Not just key-value — it has strings, lists, sets, sorted sets, hashes, streams, and more. All data lives in RAM, which makes it fast (sub-millisecond responses) but means it's limited by available memory.
# Basic operations
redis-cli SET user:1234:name "Alice" # String
redis-cli GET user:1234:name # → "Alice"
redis-cli HSET user:1234 name Alice age 30 # Hash (like a Python dict)
redis-cli HGETALL user:1234 # → name Alice age 30
redis-cli LPUSH queue:orders '{"id": 42}' # List (used as a queue)
redis-cli RPOP queue:orders # → {"id": 42}
redis-cli ZADD leaderboard 9500 alice 8700 bob # Sorted set (score-based)
redis-cli ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10 by score
redis-cli SET session:abc123 '{"user_id": 42}' EX 3600 # Expire in 1 hour
Name Origin: Redis stands for REmote DIctionary Server. Created by Salvatore Sanfilippo ("antirez") in 2009 to solve a web analytics scaling problem. He needed a system faster than MySQL for real-time statistics. He wrote Redis in C, single-threaded, and it became one of the most popular databases in the world.
Trivia: Redis is single-threaded for command processing. One CPU core handles all commands sequentially. This sounds slow, but it means: no locks, no mutexes, no race conditions. A single Redis instance handles 100,000+ operations per second on modest hardware. Multi-threading was added for I/O in Redis 6.0 (2020), but command processing is still single-threaded.
Persistence: Will My Data Survive a Restart?
Redis lives in RAM. Without persistence, restarting Redis = all data gone. Two persistence mechanisms:
RDB (Redis Database file) — Snapshots
# redis.conf
save 900 1 # Snapshot if 1 key changed in 900 seconds
save 300 10 # Snapshot if 10 keys changed in 300 seconds
save 60 10000 # Snapshot if 10000 keys changed in 60 seconds
Redis forks a child process, the child writes the dataset to dump.rdb. The parent keeps
serving requests. Using copy-on-write, the child gets a consistent snapshot without blocking.
- Pro: Compact file, fast restarts, good for backups
- Con: Data loss between snapshots (up to 15 minutes with default settings)
AOF (Append Only File) — Write Log
# redis.conf
appendonly yes
appendfsync everysec # fsync every second (good compromise)
# appendfsync always # fsync every write (safest, slowest)
# appendfsync no # let the OS decide (fastest, risky)
Every write command is appended to a log file. On restart, Redis replays the log.
- Pro: At most 1 second of data loss (with `appendfsync everysec`)
- Con: Larger file than RDB, slower restart (must replay all commands)
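Whichever mechanism is on, its health shows up in `INFO persistence`. A quick check, sketched as a small script (it skips quietly when no local instance is reachable; the field names are from Redis's standard `INFO` output):

```shell
#!/usr/bin/env bash
# rdb_last_bgsave_status should read "ok"; rdb_changes_since_last_save is how
# many writes a crash would lose under RDB; aof_enabled is 1 when AOF is on.
check_persistence() {
  redis-cli INFO persistence 2>/dev/null \
    | grep -E 'rdb_(changes_since_last_save|last_bgsave_status)|aof_(enabled|last_write_status)' \
    || return 0   # no instance reachable (or no matching fields) — skip quietly
}
check_persistence
```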
The Right Choice
- Cache only (can regenerate): no persistence (`save ""`)
- Session store (lose = inconvenience): RDB snapshots
- Critical data (lose = problem): AOF + RDB (both)
Gotcha: AOF files grow over time. Redis has AOF rewrite (`BGREWRITEAOF`) that compacts the log, but if rewrites fall behind the write volume, the file grows without bound. Monitor AOF file size: if the disk fills because of the AOF, Redis stops accepting writes.
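Rewrite frequency is tunable in redis.conf; the two directives below (shown with their shipped defaults) trigger an automatic `BGREWRITEAOF` once the file has doubled since the last rewrite and is at least 64MB:

```conf
# redis.conf — automatic AOF rewrite thresholds (defaults shown)
auto-aof-rewrite-percentage 100   # rewrite when the AOF has grown 100% since the last rewrite
auto-aof-rewrite-min-size 64mb    # ...but never while it is smaller than this
```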
Memory Management: The Eviction Question

Redis is bounded by available RAM. When it hits the configured limit (`maxmemory`), it must decide what to do with new writes:
| Policy | Behavior |
|---|---|
| `noeviction` | Return errors on writes (default — dangerous for caches!) |
| `allkeys-lru` | Evict the least recently used key (best for caches) |
| `volatile-lru` | Evict the LRU key among those with an expiry set |
| `allkeys-random` | Evict random keys |
| `volatile-ttl` | Evict keys with the shortest TTL |
Gotcha: The default policy is `noeviction`. If you're using Redis as a cache without setting `maxmemory-policy`, Redis fills up and starts returning errors instead of evicting old entries. Every cache should use `allkeys-lru`.
# Check memory usage
redis-cli INFO memory
# → used_memory_human:1.87G
# → maxmemory_human:2.00G
# → maxmemory_policy:allkeys-lru
# → evicted_keys:45231 ← keys evicted to make room
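The policy can also be changed at runtime, no restart needed. A sketch (assumes `redis-cli` can reach the instance; each command skips quietly if it cannot):

```shell
#!/usr/bin/env bash
# CONFIG SET applies immediately; CONFIG REWRITE writes the running config back
# to redis.conf (it only succeeds if Redis was started with a config file).
fix_cache_eviction() {
  redis-cli CONFIG SET maxmemory 2gb                >/dev/null 2>&1 || return 0
  redis-cli CONFIG SET maxmemory-policy allkeys-lru >/dev/null 2>&1 || return 0
  redis-cli CONFIG REWRITE                          >/dev/null 2>&1 || return 0
}
fix_cache_eviction
```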
Common Redis Problems
Problem 1: Keys Without Expiry Grow Forever
# Find keys without TTL (one TTL call per key — slow on large databases;
# prefer running this against a replica in production)
redis-cli --scan --pattern '*' | while read -r key; do
  ttl=$(redis-cli TTL "$key")
  if [ "$ttl" = "-1" ]; then   # -1 = key exists with no expiry (-2 = no such key)
    echo "No expiry: $key"
  fi
done
Keys without TTL persist until manually deleted or evicted (under `allkeys-lru`). Session stores that don't set an expiry accumulate stale sessions until memory fills.
Problem 2: Big Keys
One key with a 50MB value blocks Redis while serializing/deserializing (remember: single-threaded). This causes latency spikes for all clients.
# Find big keys
redis-cli --bigkeys
# → [00.00%] Biggest string found so far '"cache:report:2026"' with 52428800 bytes
# ↑ 50MB string!
Fix: Break big values into smaller keys, use Redis Streams for large datasets, or offload to S3/database.
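One way to apply the first fix, sketched with hypothetical key names (`cache:report:2026:chunk:N`): split the payload into chunks small enough not to stall the event loop, and store the chunk count so readers can reassemble. The `SET` calls skip quietly if no Redis is reachable.

```shell
#!/usr/bin/env bash
# Split a large payload into 100KB chunks under numbered keys instead of one
# huge string; a reader fetches the chunk count, then each chunk in order.
payload=$(head -c 300000 /dev/zero | tr '\0' 'x')   # 300KB of demo data
chunk_size=100000
i=0
while [ -n "$payload" ]; do
  chunk=${payload:0:chunk_size}                     # bash substring expansion
  payload=${payload:chunk_size}
  redis-cli SET "cache:report:2026:chunk:$i" "$chunk" >/dev/null 2>&1 || true
  i=$((i + 1))
done
redis-cli SET "cache:report:2026:chunks" "$i" >/dev/null 2>&1 || true
echo "stored $i chunks"
```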
Problem 3: Slow Commands
Some Redis commands are O(n) and block the single thread:
# DANGEROUS on large datasets:
KEYS *            # O(n) — scans all keys. Use SCAN instead.
SMEMBERS bigset   # O(n) — returns every member of the set
SORT biglist      # O(n + m*log(m)) — sorts and returns the result
FLUSHALL          # Blocks while deleting every key (FLUSHALL ASYNC defers deletion)
# Check slow commands
redis-cli SLOWLOG GET 10
# → 1) (integer) 1234
# 2) (integer) 1711108800
# 3) (integer) 150000 ← 150ms!
# 4) 1) "KEYS"
# 2) "*" ← there's the problem
Remember: `KEYS *` is banned in production. Use `SCAN` (cursor-based, non-blocking).
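A minimal cursor loop, sketched below (assumes `redis-cli` on PATH and a reachable instance; it exits quietly otherwise). Each `SCAN` reply is the next cursor followed by a batch of keys; a returned cursor of 0 means the iteration is complete:

```shell
#!/usr/bin/env bash
# Iterate over all keys matching a pattern without blocking Redis.
scan_keys() {
  local pattern=$1 cursor=0 reply
  while :; do
    # Line 1 of the reply is the next cursor; remaining lines are keys.
    reply=$(redis-cli SCAN "$cursor" MATCH "$pattern" COUNT 100 2>/dev/null) || return 0
    cursor=$(printf '%s\n' "$reply" | head -n 1)
    printf '%s\n' "$reply" | tail -n +2
    [ "$cursor" = "0" ] && break
  done
}

scan_keys 'user:*'
```

Note that `SCAN` guarantees each key present for the whole iteration is returned at least once, but it may return duplicates — deduplicate client-side if that matters.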
Flashcard Check
Q1: Redis is single-threaded. How does it handle 100K ops/sec?
No locks, no mutexes, no context switches. Commands execute sequentially in memory. Each operation is microseconds. Single-threaded simplicity = extreme speed.
Q2: `maxmemory-policy: noeviction` — what happens when memory fills?
Redis returns errors on new writes. For caches this is wrong — use `allkeys-lru` to evict least-recently-used keys automatically.
Q3: RDB vs AOF — when to use which?
Cache (can regenerate): RDB or nothing. Sessions (lose = inconvenience): RDB. Critical data (lose = problem): AOF + RDB.
Q4: Why is `KEYS *` banned in production?
O(n) — scans every key, blocking the single thread. A database with 10 million keys blocks for seconds. Use `SCAN` (cursor-based, non-blocking).
Cheat Sheet

Essential Redis Commands
| Task | Command |
|---|---|
| Memory usage | redis-cli INFO memory |
| Find big keys | redis-cli --bigkeys |
| Slow command log | redis-cli SLOWLOG GET 10 |
| Connected clients | redis-cli INFO clients |
| Persistence status | redis-cli INFO persistence |
| Replication status | redis-cli INFO replication |
| Safe key scan | redis-cli SCAN 0 COUNT 100 |
| Key TTL | redis-cli TTL keyname |
| Set expiry | redis-cli EXPIRE keyname 3600 |
| Monitor commands (debug) | redis-cli MONITOR (careful — high overhead) |
redis.conf Essentials
maxmemory 2gb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
save 900 1
save 300 10
Takeaways
- Redis is not just a cache. Queues, pub/sub, rate limiting, leaderboards, sessions — it's a multi-purpose data structure server. But each use has different persistence needs.
- Set `maxmemory-policy` to `allkeys-lru` for caches. The default (`noeviction`) causes errors when memory fills. Every cache needs eviction.
- `KEYS *` is production's enemy. Use `SCAN`. One `KEYS *` on 10M keys blocks Redis for seconds — every client waits.
- Single-threaded = no locks, but one slow command blocks everything. Big keys, `KEYS`, `SORT` — all block the event loop. Monitor with `SLOWLOG`.
- Persistence is not automatic. RDB + AOF for data you care about. Nothing for pure caches. Monitor AOF file size or it fills the disk.
Exercises
1. Explore Redis data structures. Start Redis in a container: `docker run -d --name redis-test -p 6379:6379 redis:7`. Connect with `redis-cli -p 6379`. Create one of each: a string (`SET`), a hash (`HSET`), a list (`LPUSH`), and a sorted set (`ZADD`). Use `TYPE keyname` to verify each type. Use `TTL keyname` to confirm none have an expiry set. Add a 60-second expiry to one key with `EXPIRE keyname 60` and watch it with `TTL keyname` until it disappears.
2. Find big keys and slow commands. With your test Redis running, insert a large key: `redis-cli SET bigkey $(python3 -c "print('x' * 1_000_000)")`. Run `redis-cli --bigkeys` and confirm it identifies the large string. Then run `redis-cli SLOWLOG GET 10` and note any slow operations. Run `redis-cli KEYS '*'` (safe here because the database is tiny), then check the slowlog again to see whether it was recorded.
3. Test eviction policies. Start a Redis container with a 1MB memory limit: `docker run -d --name redis-evict -p 6380:6379 redis:7 redis-server --maxmemory 1mb --maxmemory-policy noeviction`. Write keys in a loop until you hit the memory limit and observe the OOM error. Stop the container and start another with `--maxmemory-policy allkeys-lru`. Fill it again and observe that old keys are evicted instead of writes returning errors. Clean up with `docker rm -f redis-evict`.
4. Compare RDB and AOF persistence. Start Redis with AOF enabled: `docker run -d --name redis-persist -p 6381:6379 redis:7 redis-server --appendonly yes`. Write a few keys. Run `docker exec redis-persist redis-cli BGSAVE` to trigger an RDB snapshot, then `docker exec redis-persist ls -la /data/` to see `dump.rdb` alongside the `appendonlydir/` AOF files (Redis 7 splits the AOF into base and incremental parts). Compare file sizes. Stop and restart the container and confirm the data survived. Clean up with `docker rm -f redis-persist`.
5. Use SCAN instead of KEYS. Connect to your test Redis and create 100 keys: `for i in $(seq 1 100); do redis-cli -p 6379 SET "user:$i" "data$i"; done`. Use `redis-cli SCAN 0 COUNT 10` to iterate through keys in batches, following the cursor value returned by each call until it returns to 0. Count the total keys found and confirm it matches 100. This demonstrates the safe, non-blocking alternative to `KEYS *`.
Related Lessons
- The Split-Brain Nightmare — Redis Sentinel split-brain
- Out of Memory — when Redis hits maxmemory
- The Cascading Timeout — Redis as a circuit breaker cache