
Redis in Production: More Than a Cache

Topics: Redis data structures, persistence, pub/sub, replication, memory management, operations
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: Basic key-value store concept


The Mission

You inherited a Redis instance. It's used as a cache, a session store, a message queue, a rate limiter, and a leaderboard. Different teams added different uses over the years. Nobody documented any of it. Redis is now critical infrastructure that nobody fully understands.

This lesson covers what Redis actually does, how it persists data (or doesn't), and the operational gotchas that bite everyone.


Redis in 60 Seconds

Redis is an in-memory data structure server. Not just key-value — it has strings, lists, sets, sorted sets, hashes, streams, and more. All data lives in RAM, which makes it fast (sub-millisecond responses) but means it's limited by available memory.

# Basic operations
redis-cli SET user:1234:name "Alice"            # String
redis-cli GET user:1234:name                    # → "Alice"

redis-cli HSET user:1234 name Alice age 30      # Hash (like a Python dict)
redis-cli HGETALL user:1234                     # → name Alice age 30

redis-cli LPUSH queue:orders '{"id": 42}'       # List (used as a queue)
redis-cli RPOP queue:orders                     # → {"id": 42}

redis-cli ZADD leaderboard 9500 alice 8700 bob  # Sorted set (score-based)
redis-cli ZREVRANGE leaderboard 0 9 WITHSCORES  # Top 10 by score

redis-cli SET session:abc123 '{"user_id": 42}' EX 3600  # Expire in 1 hour

Name Origin: Redis stands for REmote DIctionary Server. Created by Salvatore Sanfilippo ("antirez") in 2009 to solve a web analytics scaling problem. He needed a system faster than MySQL for real-time statistics. He wrote Redis in C, single-threaded, and it became one of the most popular databases in the world.

Trivia: Redis is single-threaded for command processing. One CPU core handles all commands sequentially. This sounds slow, but it means: no locks, no mutexes, no race conditions. A single Redis instance handles 100,000+ operations per second on modest hardware. Multi-threading was added for I/O in Redis 6.0 (2020), but command processing is still single-threaded.
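To make the single-threaded model concrete, here is a toy sketch (not Redis code, just an illustration): one loop, one in-memory dict, commands executed strictly one at a time. Because nothing runs concurrently, no locks are needed anywhere — the same property that lets Redis skip mutexes around its keyspace.

```python
# Toy single-threaded command loop over an in-memory dict.
# Commands execute strictly sequentially, so no locking is needed.

def run_commands(commands):
    store = {}                      # the whole "database" is one dict
    results = []
    for cmd, *args in commands:     # one command at a time, in order
        if cmd == "SET":
            key, value = args
            store[key] = value
            results.append("OK")
        elif cmd == "GET":
            results.append(store.get(args[0]))
        elif cmd == "DEL":
            results.append(1 if store.pop(args[0], None) is not None else 0)
        else:
            results.append(f"ERR unknown command '{cmd}'")
    return results

print(run_commands([
    ("SET", "user:1", "Alice"),
    ("GET", "user:1"),
    ("DEL", "user:1"),
    ("GET", "user:1"),
]))
# → ['OK', 'Alice', 1, None]
```

The flip side of this simplicity is covered below: one slow command blocks every client behind it.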


Persistence: Will My Data Survive a Restart?

Redis lives in RAM. Without persistence, restarting Redis = all data gone. Two persistence mechanisms:

RDB (Redis Database file) — Snapshots

# redis.conf
save 900 1      # Snapshot if 1 key changed in 900 seconds
save 300 10     # Snapshot if 10 keys changed in 300 seconds
save 60 10000   # Snapshot if 10000 keys changed in 60 seconds

Redis forks a child process; the child writes the dataset to dump.rdb while the parent keeps serving requests. Thanks to copy-on-write, the child sees a consistent point-in-time snapshot without blocking the parent.

  • Pro: Compact file, fast restarts, good for backups
  • Con: Data loss between snapshots (up to 15 minutes with default settings)
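The fork trick can be demonstrated directly in Python on any POSIX system (this is a conceptual sketch, not how Redis serializes RDB files): after os.fork(), the child's view of memory is frozen at fork time, so its snapshot stays consistent even while the parent keeps writing.

```python
# Fork-based snapshot sketch (POSIX only): the child dumps the state
# as it was at fork time; the parent's later writes don't leak in.
import json
import os
import tempfile

store = {"user:1": "Alice", "user:2": "Bob"}
snapshot_path = os.path.join(tempfile.mkdtemp(), "dump.json")

pid = os.fork()
if pid == 0:                       # child: write the pre-fork state
    with open(snapshot_path, "w") as f:
        json.dump(store, f)
    os._exit(0)

store["user:3"] = "Carol"          # parent: keeps accepting writes
os.waitpid(pid, 0)                 # wait for the "BGSAVE" to finish

with open(snapshot_path) as f:
    print(json.load(f))            # → {'user:1': 'Alice', 'user:2': 'Bob'}
```

Note that "user:3", written after the fork, is absent from the snapshot — exactly the data-loss window RDB accepts between snapshots.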

AOF (Append Only File) — Write log

# redis.conf
appendonly yes
appendfsync everysec    # fsync every second (good compromise)
# appendfsync always    # fsync every write (safest, slowest)
# appendfsync no        # let the OS decide (fastest, risky)

Every write command is appended to a log file. On restart, Redis replays the log.

  • Pro: At most 1 second of data loss (with appendfsync everysec)
  • Con: Larger file than RDB, slower restart (must replay all commands)
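The replay-the-log idea can be sketched in a few lines (a toy model, not the real AOF format): every write is appended as one line, and "restart" means rebuilding state from the top of the file.

```python
# Toy append-only file: log every write, rebuild state by replaying.
import json
import os
import tempfile

aof_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")

def append_write(cmd):
    with open(aof_path, "a") as f:
        f.write(json.dumps(cmd) + "\n")    # one command per line

def replay():                              # what happens on restart
    store = {}
    with open(aof_path) as f:
        for line in f:
            cmd, key, *rest = json.loads(line)
            if cmd == "SET":
                store[key] = rest[0]
            elif cmd == "DEL":
                store.pop(key, None)
    return store

append_write(["SET", "user:1", "Alice"])
append_write(["SET", "user:1", "Alicia"])  # overwrite is appended, not merged
append_write(["DEL", "user:1"])
append_write(["SET", "user:2", "Bob"])

print(replay())   # → {'user:2': 'Bob'}
```

Notice the file holds four entries but only one key survives — which is why AOF files grow and restarts take longer than loading an RDB snapshot.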

The right choice

Cache only (can regenerate):          No persistence (save "")
Session store (lose = inconvenience): RDB snapshots
Critical data (lose = problem):       AOF + RDB (both)

Gotcha: AOF files grow over time. Redis has AOF rewrite (BGREWRITEAOF) that compacts the log, but if the rewrite falls behind write volume, the file grows without bound. Monitor AOF file size: if your disk fills because of AOF, Redis stops accepting writes.
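What a rewrite does is easy to model (a sketch of the idea, not the real implementation): rather than trimming the old log in place, generate a fresh minimal log from the current state, so a thousand overwrites of one key collapse into a single SET.

```python
# AOF rewrite sketch: replay the old log, then emit one SET per
# surviving key -- the compacted log reproduces the same final state.

def rewrite_aof(log):
    store = {}
    for cmd, key, *rest in log:        # replay the old, bloated log...
        if cmd == "SET":
            store[key] = rest[0]
        elif cmd == "DEL":
            store.pop(key, None)
    # ...then write a minimal log from the final state
    return [["SET", k, v] for k, v in store.items()]

old_log = [["SET", "counter", str(i)] for i in range(1000)]  # 1000 entries
print(len(rewrite_aof(old_log)))   # → 1
```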


Memory Management: The Eviction Question

Redis is bounded by available RAM. When it hits the limit (maxmemory), it must decide what to do with new writes:

# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru    # Evict least-recently-used keys

Policy          Behavior
noeviction      Return errors on write (default — dangerous for caches!)
allkeys-lru     Evict least recently used key (best for caches)
volatile-lru    Evict LRU keys that have an expire set
allkeys-random  Evict random keys
volatile-ttl    Evict keys with shortest TTL

Gotcha: The default policy is noeviction. If you're using Redis as a cache without setting maxmemory-policy, Redis fills up and starts returning errors instead of evicting old entries. Every cache should use allkeys-lru.

# Check memory usage
redis-cli INFO memory
# → used_memory_human:1.87G
# → maxmemory_human:2.00G
# → maxmemory_policy:allkeys-lru
# → evicted_keys:45231        ← keys evicted to make room
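The allkeys-lru behavior can be sketched with an ordered dict (a conceptual model only — real Redis uses an approximated, sampled LRU rather than an exact list): when the store is full, the coldest key is evicted; any read or write makes a key "hot" again.

```python
# LRU eviction sketch: full store + new key => coldest key is evicted.
from collections import OrderedDict

class LRUStore:
    def __init__(self, maxkeys):
        self.maxkeys = maxkeys
        self.data = OrderedDict()       # coldest entries first

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.maxkeys:
            evicted, _ = self.data.popitem(last=False)  # drop coldest key
            print(f"evicted {evicted}")

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # a read makes the key hot again
            return self.data[key]
        return None

store = LRUStore(maxkeys=2)
store.set("a", 1)
store.set("b", 2)
store.get("a")              # touch "a", so "b" is now the coldest key
store.set("c", 3)           # → evicted b
print(sorted(store.data))   # → ['a', 'c']
```

With noeviction, the `store.set("c", 3)` above would instead return an error — which is exactly what a cache should not do.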

Common Redis Problems

Problem 1: Keys without expiry grow forever

# Find keys without TTL
redis-cli --scan --pattern '*' | while read key; do
    ttl=$(redis-cli TTL "$key")
    if [ "$ttl" = "-1" ]; then
        echo "No expiry: $key"
    fi
done

Keys without TTL persist until manually deleted or evicted (if allkeys-lru). Session stores that don't set expiry accumulate stale sessions until memory fills.
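The TTL mechanics can be sketched like this (a toy model; Redis combines this kind of lazy, on-access expiry with a background sweep): keys with a deadline vanish once it passes, keys without one live until something deletes them — which is how un-expired sessions pile up.

```python
# Toy TTL store: expired keys disappear on access; keys with no TTL
# (the "-1" keys found above) live forever unless explicitly deleted.
import time

class TTLStore:
    def __init__(self):
        self.data = {}    # key -> (value, deadline or None)

    def set(self, key, value, ex=None):
        deadline = time.monotonic() + ex if ex is not None else None
        self.data[key] = (value, deadline)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and time.monotonic() >= deadline:
            del self.data[key]     # lazy expiry: reap on access
            return None
        return value

s = TTLStore()
s.set("session:abc", "user 42", ex=0.05)   # expires in 50 ms
s.set("session:old", "user 7")             # no TTL: never expires
time.sleep(0.1)
print(s.get("session:abc"), s.get("session:old"))   # → None user 7
```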

Problem 2: Big keys

One key with a 50MB value blocks Redis while serializing/deserializing (remember: single-threaded). This causes latency spikes for all clients.

# Find big keys
redis-cli --bigkeys
# → [00.00%] Biggest string found so far '"cache:report:2026"' with 52428800 bytes
#   ↑ 50MB string!

Fix: Break big values into smaller keys, use Redis Streams for large datasets, or offload to S3/database.
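One way the "break it into smaller keys" fix can look (a sketch with hypothetical key names, not a standard library): split the value into fixed-size chunks, each under its own key, plus an index key recording how many chunks to fetch on read.

```python
# Chunking sketch: one 50 MB key becomes many small keys, so no single
# command serializes the whole value and blocks the event loop.

def chunk_key_values(key, value, chunk_size):
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
    pairs = [(f"{key}:chunk:{i}", c) for i, c in enumerate(chunks)]
    pairs.append((f"{key}:chunks", str(len(chunks))))  # index key for reassembly
    return pairs

big = "x" * 10_000
pairs = chunk_key_values("cache:report", big, chunk_size=4096)
print([k for k, _ in pairs])
# → ['cache:report:chunk:0', 'cache:report:chunk:1', 'cache:report:chunk:2', 'cache:report:chunks']
```

Each SET/GET now touches at most chunk_size bytes, keeping per-command latency bounded.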

Problem 3: Slow commands

Some Redis commands are O(n) and block the single thread:

# DANGEROUS on large datasets:
KEYS *           # O(n) — scans all keys. Use SCAN instead.
SMEMBERS bigset  # O(n) — returns all members of a set
SORT             # O(n+m*log(m)) — sorts and returns
FLUSHALL         # Blocks until all keys deleted (use FLUSHALL ASYNC on Redis 4+)

# Check slow commands
redis-cli SLOWLOG GET 10
# → 1) (integer) 1234
#    2) (integer) 1711108800
#    3) (integer) 150000    ← 150ms!
#    4) 1) "KEYS"
#       2) "*"              ← there's the problem

Remember: KEYS * is banned in production. Use SCAN (cursor-based, non-blocking):

redis-cli SCAN 0 COUNT 100
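The cursor protocol looks like this in sketch form (a toy offset cursor for illustration — real Redis cursors are hash-table bucket positions, not offsets, and SCAN only guarantees COUNT approximately): each call returns a small batch plus the cursor for the next call, and a returned cursor of 0 means the iteration is complete. Other clients' commands run between batches, which is why SCAN doesn't stall the server the way KEYS does.

```python
# Cursor-based iteration sketch: small batches, resume via cursor,
# done when the cursor comes back as 0.

def scan(store_keys, cursor, count):
    batch = store_keys[cursor:cursor + count]
    next_cursor = cursor + count
    return (0 if next_cursor >= len(store_keys) else next_cursor), batch

keys = [f"user:{i}" for i in range(10)]
cursor, seen = 0, []
while True:
    cursor, batch = scan(keys, cursor, count=3)
    seen.extend(batch)          # other work can run between batches
    if cursor == 0:
        break
print(len(seen))   # → 10
```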


Flashcard Check

Q1: Redis is single-threaded. How does it handle 100K ops/sec?

No locks, no mutexes, no context switches. Commands execute sequentially in memory. Each operation is microseconds. Single-threaded simplicity = extreme speed.

Q2: maxmemory-policy: noeviction — what happens when memory fills?

Redis returns errors on new writes. For caches, this is wrong — use allkeys-lru to evict least-recently-used keys automatically.

Q3: RDB vs AOF — when to use which?

Cache (can regenerate): RDB or nothing. Sessions (lose = inconvenience): RDB. Critical data (lose = problem): AOF + RDB.

Q4: Why is KEYS * banned in production?

O(n) — scans every key, blocking the single thread. A database with 10 million keys blocks for seconds. Use SCAN (cursor-based, non-blocking).


Cheat Sheet

Essential Redis Commands

Task                      Command
Memory usage              redis-cli INFO memory
Find big keys             redis-cli --bigkeys
Slow command log          redis-cli SLOWLOG GET 10
Connected clients         redis-cli INFO clients
Persistence status        redis-cli INFO persistence
Replication status        redis-cli INFO replication
Safe key scan             redis-cli SCAN 0 COUNT 100
Key TTL                   redis-cli TTL keyname
Set expiry                redis-cli EXPIRE keyname 3600
Monitor commands (debug)  redis-cli MONITOR (careful — high overhead)

redis.conf Essentials

maxmemory 2gb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
save 900 1
save 300 10

Takeaways

  1. Redis is not just a cache. Queues, pub/sub, rate limiting, leaderboards, sessions — it's a multi-purpose data structure server. But each use has different persistence needs.

  2. Set maxmemory-policy to allkeys-lru for caches. The default (noeviction) causes errors when memory fills. Every cache needs eviction.

  3. KEYS * is production's enemy. Use SCAN. One KEYS * on 10M keys blocks Redis for seconds — every client waits.

  4. Single-threaded = no locks, but one slow command blocks everything. Big keys, KEYS, SORT — all block the event loop. Monitor with SLOWLOG.

  5. Persistence is not automatic. RDB + AOF for data you care about. Nothing for pure caches. Monitor AOF file size or it fills the disk.


Exercises

  1. Explore Redis data structures. Start Redis in a container: docker run -d --name redis-test -p 6379:6379 redis:7. Connect with redis-cli -p 6379. Create one of each: a string (SET), a hash (HSET), a list (LPUSH), and a sorted set (ZADD). Use TYPE keyname to verify each type. Use TTL keyname to confirm none have an expiry set. Add a 60-second expiry to one key with EXPIRE keyname 60 and watch it with TTL keyname until it disappears.

  2. Find big keys and slow commands. With your test Redis running, insert a large key: redis-cli SET bigkey $(python3 -c "print('x' * 1_000_000)"). Run redis-cli --bigkeys and confirm it identifies the large string. Then run redis-cli SLOWLOG GET 10 and note any slow operations. Run redis-cli KEYS '*' (safe here because the database is tiny) and then check the slowlog again to see if it was recorded.

  3. Test eviction policies. Start a Redis container with a 1MB memory limit: docker run -d --name redis-evict -p 6380:6379 redis:7 redis-server --maxmemory 1mb --maxmemory-policy noeviction. Write keys in a loop until you hit the memory limit and observe the OOM error. Stop the container and start another with --maxmemory-policy allkeys-lru. Fill it again and observe that old keys are evicted instead of returning errors. Clean up with docker rm -f redis-evict.

  4. Compare RDB and AOF persistence. Start Redis with AOF enabled: docker run -d --name redis-persist -p 6381:6379 redis:7 redis-server --appendonly yes. Write a few keys. Run docker exec redis-persist ls -la /data/ to see both dump.rdb and the AOF file. Run docker exec redis-persist redis-cli BGSAVE to trigger an RDB snapshot. Compare file sizes. Stop and restart the container and confirm data survived. Clean up with docker rm -f redis-persist.

  5. Use SCAN instead of KEYS. Connect to your test Redis and create 100 keys: for i in $(seq 1 100); do redis-cli -p 6379 SET "user:$i" "data$i"; done. Use redis-cli SCAN 0 COUNT 10 to iterate through keys in batches. Follow the cursor value returned by each call until the cursor returns to 0. Count the total keys found and confirm it matches 100. This demonstrates the safe, non-blocking alternative to KEYS *.


  • The Split-Brain Nightmare — Redis Sentinel split-brain
  • Out of Memory — when Redis hits maxmemory
  • The Cascading Timeout — Redis as a circuit breaker cache