
Message Queues


30 cards — 🟢 9 easy | 🟡 15 medium | 🔴 6 hard

🟢 Easy (9)

1. What is the fundamental difference between a traditional message queue (RabbitMQ) and a log-based broker (Kafka)?

Answer: Traditional queues delete messages after consumption; they are designed for task distribution to a single consumer. Log-based brokers retain messages as an immutable log, designed for event streaming where multiple independent consumer groups replay at their own pace.

Remember: RabbitMQ: AMQP. Exchange→Binding→Queue→Consumer. Ports: 5672, 15672(mgmt).

2. What does at-least-once delivery mean in a message queue context?

Answer: The broker guarantees a message is delivered one or more times. The consumer must explicitly acknowledge success before the broker discards the message. If the consumer crashes before acking, the message is redelivered, so duplicates are possible. Consumers must therefore be idempotent.

Remember: At-most-once(lose), At-least-once(dupe), Exactly-once(hard). "Most=lose, Least=dupe."
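
The dedup check this card implies can be sketched in a few lines. A minimal sketch, assuming each message carries a unique id; the in-memory set stands in for a durable store, and all names are illustrative:

```python
# Sketch of an idempotent consumer under at-least-once delivery.
processed_ids = set()  # in production: a DB table or Redis, not process memory

def handle(message):
    """Process a message at most once, even if the broker redelivers it."""
    if message["id"] in processed_ids:
        return "skipped duplicate"
    # ... business logic would run here ...
    processed_ids.add(message["id"])
    return "processed"

msg = {"id": "order-42", "body": "charge $10"}
print(handle(msg))  # processed
print(handle(msg))  # redelivery is a no-op: skipped duplicate
```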

3. What are the four RabbitMQ exchange types?

Answer: Direct (route by exact routing key), Fanout (broadcast to all bound queues), Topic (wildcard pattern matching on the routing key), and Headers (route by message header attributes). Direct and Topic are the most commonly used in production.

Remember: RabbitMQ: AMQP. Exchange→Binding→Queue→Consumer. Ports: 5672, 15672(mgmt).

4. What is a Kafka consumer group?

Answer: A set of consumer instances that cooperate to consume a topic. Kafka assigns each partition to exactly one consumer in the group at a time, distributing load. Multiple independent groups can each consume the full topic at their own pace.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

5. What is the hard limit on Kafka consumer group parallelism?

Answer: The partition count of the topic. A group cannot have more active consumers than partitions; extra consumers sit idle. To scale beyond the current limit you must increase the partition count.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
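
The parallelism cap is easy to see in a toy assignment function. This is a round-robin sketch for illustration, not Kafka's actual assignor:

```python
def assign(partitions, consumers):
    # Each partition goes to exactly one consumer in the group; with more
    # consumers than partitions, the extras receive nothing and sit idle.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 5 consumers: two consumers are idle.
groups = assign([0, 1, 2], ["c1", "c2", "c3", "c4", "c5"])
print(groups)  # c4 and c5 receive no partitions
```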

6. What is a dead-letter queue (DLQ) and why is it essential?

Answer: A DLQ is a holding queue for messages that cannot be processed successfully. Without one, a poison message either loops forever (blocking the queue) or is silently dropped. A DLQ preserves failed messages for investigation and replay.

Remember: DLQ = where failed messages go. Essential for debugging and retry.

7. What does basic.qos prefetch_count control in RabbitMQ?

Answer: It limits how many unacknowledged messages the broker will push to a consumer at once. Without a prefetch limit, the broker can saturate a single consumer with the entire queue backlog. Prefetch is required for fair work distribution and backpressure.

Remember: RabbitMQ: AMQP. Exchange→Binding→Queue→Consumer. Ports: 5672, 15672(mgmt).

8. What does consumer lag measure in Kafka?

Answer: The difference between the log-end offset (latest message produced) and the current offset (the position the consumer group has committed): LAG = LOG-END-OFFSET minus CURRENT-OFFSET. A growing lag means the consumer is falling behind the production rate.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
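
The lag formula is a per-partition subtraction. A minimal sketch with made-up offsets:

```python
def consumer_lag(log_end, committed):
    # Per-partition lag: messages produced but not yet consumed by the group.
    return {p: log_end[p] - committed.get(p, 0) for p in log_end}

log_end   = {0: 1500, 1: 980, 2: 2100}   # latest offsets produced
committed = {0: 1500, 1: 950, 2: 100}    # offsets the group has committed
print(consumer_lag(log_end, committed))  # {0: 0, 1: 30, 2: 2000}
```

Partition 2 is the one to worry about: its lag is growing while the others keep up.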

9. What three core problems do message queues solve?

Answer: Decoupling (producers and consumers are independent, so failures do not cascade), async processing (long-running work is handed off and the caller returns immediately), and load leveling (absorbing traffic spikes and draining at a sustainable rate).

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

🟡 Medium (15)

1. How is exactly-once delivery achieved in Kafka and what is the cost?

Answer: Idempotent producers (enable.idempotence=true) prevent producer-side duplicates. The transactional API (transactional.id) wraps read-process-write in a transaction; transactional output also requires consumers to read with isolation.level=read_committed. Cost: roughly 20% throughput reduction and careful configuration. Exactly-once is not free; prefer at-least-once plus idempotent consumers when possible.

Remember: At-most-once(lose), At-least-once(dupe), Exactly-once(hard). "Most=lose, Least=dupe."

2. What triggers a Kafka consumer group rebalance and why is it disruptive?

Answer: A rebalance is triggered when a consumer joins, leaves, or is considered dead (exceeds max.poll.interval.ms, or misses heartbeats within session.timeout.ms). It is disruptive because all consumers in the group pause consumption while the group coordinator reassigns partitions, causing lag spikes. Incremental cooperative rebalancing (Kafka 2.4+) mitigates this by pausing only the reassigned partitions.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

3. What is the outbox pattern and what problem does it solve?

Answer: It solves the dual-write problem: atomically updating a database and publishing an event. In the same DB transaction as the business write, the event is written to an outbox table; a separate relay process reads the outbox and publishes to the broker, using CDC (e.g., Debezium) or polling. This eliminates the race condition between the DB commit and the broker publish.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
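
A minimal sketch of the pattern, with SQLite standing in for the service database and the broker publish stubbed out; table and column names are illustrative:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT,"
           " payload TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id, total):
    # Business write and event write commit atomically in ONE transaction.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("orders", json.dumps({"id": order_id, "total": total})))

def relay():
    # A separate relay process reads unpublished rows and hands them to the
    # broker; here the publish call is a comment.
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        # broker.publish(topic, payload) would go here
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()
    return len(rows)

place_order("o-1", 9.99)
print(relay())  # 1
```

If the process dies between the commit and the publish, the event is still in the outbox and the relay picks it up later; that is the whole point of the pattern.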

4. What is the difference between classic mirrored queues and quorum queues in RabbitMQ?

Answer: Classic mirrored queues replicate to all nodes synchronously, with high write amplification; they were deprecated in 3.9. Quorum queues use Raft-based replication and require a majority quorum (2 of 3, 3 of 5 nodes) for writes. They offer stronger durability guarantees and behave more predictably under failure. Use quorum queues for all production queues.

Remember: RabbitMQ: AMQP. Exchange→Binding→Queue→Consumer. Ports: 5672, 15672(mgmt).

5. What is Kafka log compaction and when should you use it?

Answer: Log compaction retains only the latest value per key in a partition, discarding older records with the same key; tombstones (records with a null value) mark deletions. Use it for changelog streams where only current state matters: user profiles, config changes, inventory levels. Consumers can bootstrap state by replaying the compacted log. Enable it with cleanup.policy=compact.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

6. What is partition skew and how do you detect and fix it?

Answer: Partition skew is when some partitions receive far more messages than others because of a low-cardinality or hot partition key. Detection: compare per-partition lag and offsets; one partition with millions of messages while others sit near zero is the signature. Fix: choose a higher-cardinality key, add a random suffix to hot keys, or use a null key (round-robin), accepting the loss of ordering guarantees.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
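
Detection can be automated by comparing each partition's message count against the median. A sketch; the 10x threshold and the counts are illustrative:

```python
from statistics import median

def hot_partitions(counts, ratio=10):
    # Flag partitions whose message count dwarfs the median partition.
    # Median, not mean: a single hot partition inflates the mean and hides itself.
    med = median(counts.values())
    return [p for p, n in counts.items() if med and n > ratio * med]

counts = {0: 4_200_000, 1: 310, 2: 295, 3: 301}  # per-partition message counts
print(hot_partitions(counts))  # [0]  (partition 0 is holding a hot key)
```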

7. What three conditions cause a message to be sent to a RabbitMQ dead-letter exchange?

Answer: The consumer nacks the message with requeue=false, the message TTL expires, or the queue length limit (x-max-length) is exceeded. The dead-letter exchange routes the message to the DLQ with the original routing key and the death reason in the headers.

Remember: RabbitMQ: AMQP. Exchange→Binding→Queue→Consumer. Ports: 5672, 15672(mgmt).
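
All three conditions are wired up through queue arguments at declaration time. The x-* keys below are RabbitMQ's standard arguments; the exchange and queue names, and the specific values, are illustrative:

```python
# Queue arguments connecting a work queue to a dead-letter exchange.
work_queue_args = {
    "x-dead-letter-exchange": "dlx",           # where nacked/expired messages go
    "x-dead-letter-routing-key": "work.dead",  # optional override of routing key
    "x-message-ttl": 60_000,                   # ms; expiry dead-letters the message
    "x-max-length": 100_000,                   # overflow dead-letters excess messages
}
# With pika this would be passed as:
# channel.queue_declare(queue="work", arguments=work_queue_args)
```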

8. What does acks=all mean in Kafka producer configuration and why is it important?

Answer: The producer waits for acknowledgment from all in-sync replicas (ISR) before considering the write successful. This is the strongest durability guarantee: no data loss even if the leader fails immediately after the ack. With weaker settings (e.g., acks=1), a leader failure after the ack but before replication causes silent data loss. Always combine acks=all with enable.idempotence=true for important topics.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

9. What are three practical strategies for making a message consumer idempotent?

Answer: 1. Idempotency table: store processed message IDs in the DB in the same transaction as the business logic. 2. Conditional update: UPDATE ... WHERE current_state = expected_pre_state, which is safe to run twice. 3. Redis SET NX EX: a fast, short-lived deduplication window for non-financial events. Choose based on durability requirements and the acceptable dedup window.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
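
Strategy 2, the state-guarded update, in a runnable sketch with SQLite; the schema and names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, state TEXT)")
db.execute("INSERT INTO orders VALUES ('o-1', 'pending')")

def mark_paid(order_id):
    # The WHERE clause guards on the expected pre-state, so replaying the
    # same message matches zero rows and changes nothing.
    cur = db.execute(
        "UPDATE orders SET state = 'paid' WHERE id = ? AND state = 'pending'",
        (order_id,))
    db.commit()
    return cur.rowcount == 1  # True only on the first effective transition

print(mark_paid("o-1"))  # True:  transition happened
print(mark_paid("o-1"))  # False: duplicate delivery was a no-op
```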

10. How do you prevent consumer group rebalance storms in Kafka?

Answer: Increase max.poll.interval.ms to exceed the worst-case processing time. Reduce max.poll.records to stay under that interval. Use CooperativeStickyAssignor for incremental rebalancing. Set group.instance.id (static membership) to avoid rebalances on expected consumer restarts.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
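
The four levers map to consumer properties along these lines. The property keys are standard Kafka consumer configs; the values are illustrative starting points, not recommendations for every workload:

```properties
# Tolerate slow batches: must exceed worst-case (records x per-record time)
max.poll.interval.ms=600000
# Smaller batches keep each poll cycle well under the interval
max.poll.records=100
# Incremental cooperative rebalancing instead of stop-the-world
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
# Static membership: a restart with the same id does not trigger a rebalance
group.instance.id=worker-1
```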

11. What is the difference between messages_ready and messages_unacknowledged in RabbitMQ?

Answer: messages_ready counts messages waiting to be delivered to a consumer; messages_unacknowledged counts messages delivered to consumers but not yet acked. High unacknowledged with low ready means consumers received messages but are slow to process, or have crashed without acking; those messages return to the ready state when the channel closes.

Remember: RabbitMQ: AMQP. Exchange→Binding→Queue→Consumer. Ports: 5672, 15672(mgmt).

12. How does Kafka guarantee message ordering and what breaks that guarantee?

Answer: Kafka guarantees ordering only within a single partition. All messages with the same partition key land on the same partition, so per-key ordering is preserved. Ordering breaks when the partition count changes (keys re-route to new partitions), when a null key is used (round-robin, no ordering), or across partitions (a consumer reading multiple partitions gets no cross-partition order guarantee).

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

13. What is the consumer pause/resume pattern for backpressure in Kafka?

Answer: When a downstream dependency is overloaded, the consumer calls consumer.pause(assignment()) to stop fetching new records from the broker; after the downstream recovers, it calls consumer.resume(assignment()). This prevents the consumer from accumulating a large local buffer while waiting and avoids triggering max.poll.interval.ms timeouts.

Remember: Backpressure = slow producers when consumers lag. Without it → OOM.
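
The loop can be sketched with a stub standing in for the real client. The method names mirror the Kafka consumer API; the stub class itself is invented for illustration:

```python
class StubConsumer:
    # Stand-in for a Kafka consumer; only the calls used below are modeled.
    def __init__(self):
        self.paused = False
    def assignment(self):
        return [("orders", 0), ("orders", 1)]
    def pause(self, partitions):
        self.paused = True
    def resume(self, partitions):
        self.paused = False
    def poll(self):
        # A paused consumer keeps polling (staying in the group) but
        # fetches no records.
        return [] if self.paused else ["record"]

def poll_with_backpressure(consumer, downstream_healthy):
    if not downstream_healthy():
        consumer.pause(consumer.assignment())   # stop fetching, keep membership
    elif consumer.paused:
        consumer.resume(consumer.assignment())  # downstream recovered
    return consumer.poll()

c = StubConsumer()
print(poll_with_backpressure(c, lambda: False))  # []          (paused)
print(poll_with_backpressure(c, lambda: True))   # ['record']  (resumed)
```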

14. What command shows per-partition consumer lag for a Kafka consumer group?

Answer: kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group &lt;group-name&gt;. Output columns: TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, CONSUMER-ID. The LAG column shows messages produced but not yet consumed, per partition.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

15. What is the risk of setting a short message TTL on a queue or topic?

Answer: During a consumer outage, messages expire before the consumer recovers, and important business events (orders, payments, audit records) are permanently lost. TTL should be set from the consumer recovery SLA, not the storage budget. Alert when consumer lag time approaches the retention time (e.g., lag time > 70% of retention.ms).

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
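
The alert rule at the end can be sketched as a simple predicate; the 70% threshold and the numbers are illustrative:

```python
def retention_at_risk(lag_seconds, retention_ms, threshold=0.7):
    # Fire when the consumer's time-lag approaches the retention window,
    # i.e. before unconsumed messages start expiring.
    return lag_seconds * 1000 > threshold * retention_ms

# 7-day retention, consumer about 5.2 days behind: the alert fires.
print(retention_at_risk(lag_seconds=5.2 * 86400, retention_ms=7 * 86400 * 1000))
```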

🔴 Hard (6)

1. Explain the read-process-write transaction pattern in Kafka for exactly-once semantics.

Answer: The producer sets transactional.id and calls beginTransaction(). The consumer reads with isolation.level=read_committed. Within a transaction: read from the input topic, process, produce to the output topic, commit the consumer offsets as part of the same transaction via sendOffsetsToTransaction(), then call commitTransaction(). If any step fails, abortTransaction() rolls back. Consumers with read_committed see only committed records. Overhead: roughly 20% throughput reduction plus added latency per transaction.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
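
The call ordering can be illustrated with a recording stub in place of a real transactional producer. The method names echo the Kafka Java API in snake_case; the stub class is invented for illustration:

```python
class StubTxnProducer:
    # Records which transactional calls happen, and in what order.
    def __init__(self):
        self.calls = []
    def begin_transaction(self):
        self.calls.append("begin")
    def send(self, topic, value):
        self.calls.append(f"send:{topic}")
    def send_offsets_to_transaction(self, offsets, group_id):
        self.calls.append("offsets")
    def commit_transaction(self):
        self.calls.append("commit")
    def abort_transaction(self):
        self.calls.append("abort")

def read_process_write(producer, records, group_id):
    producer.begin_transaction()
    try:
        for rec in records:
            producer.send("output", rec.upper())  # process + write
        # Offsets commit atomically with the produced records:
        producer.send_offsets_to_transaction({"input-0": len(records)}, group_id)
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()              # all-or-nothing
        raise

p = StubTxnProducer()
read_process_write(p, ["a", "b"], "my-group")
print(p.calls)  # ['begin', 'send:output', 'send:output', 'offsets', 'commit']
```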

2. Why does the outbox pattern require change data capture (CDC) rather than polling for high-throughput systems?

Answer: Polling introduces latency (the poll interval), creates thundering-herd load spikes on the outbox table, and requires careful handling of concurrent writers and readers. CDC (e.g., Debezium) tails the database WAL and streams row changes in near real time with minimal DB overhead; it scales to millions of rows/sec, has sub-second latency, and is transactionally consistent with the source. Polling is acceptable for low-throughput systems; CDC becomes necessary above hundreds of events per second.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

3. What are the operational consequences of increasing Kafka partition count on a live topic?

Answer: Adding partitions triggers a consumer group rebalance, and messages with existing keys may be re-routed to new partitions (partition = hash(key) mod partition_count, and the modulus changed), breaking per-key ordering for in-flight messages. Existing messages stay where they are, but new messages for the same key may land on a different partition, so temporal ordering is broken across the change boundary. Plan partition count before go-live and treat live increases as a major operational event.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).
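
The re-routing effect is visible with any hash-mod-N partitioner. In this sketch CRC32 stands in for Kafka's murmur2 (to stay stdlib-only), and the keys are made up:

```python
import zlib

def partition_for(key, partition_count):
    # Same structure as Kafka's default partitioner: hash(key) mod N.
    return zlib.crc32(key.encode()) % partition_count

keys = [f"user-{i}" for i in range(20)]
before = {k: partition_for(k, 6) for k in keys}
after  = {k: partition_for(k, 8) for k in keys}  # partitions increased 6 -> 8
moved  = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys now route to a different partition")
```

The keys themselves did not change, only the modulus did; every moved key loses ordering continuity with its own earlier messages.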

4. What is incremental cooperative rebalancing in Kafka and how does it differ from eager rebalancing?

Answer: Eager rebalancing (the default protocol; the only option before 2.4): all consumers in the group revoke all partitions, stop consuming, wait for the coordinator to reassign, then resume. That is a full stop-the-world pause regardless of which partitions actually move. Incremental cooperative rebalancing (CooperativeStickyAssignor, 2.4+): only the partitions being reassigned are revoked; consumers keep their other partitions and continue processing, over multiple rounds of incremental assignment. This dramatically reduces pause time for large groups, but requires every consumer in the group to use the cooperative assignor.

Remember: Queues decouple services. Key choices: delivery guarantee, ordering, throughput.

Example: RabbitMQ(flexible), Kafka(high throughput), SQS(managed), Redis Streams(light).

5. How does the Outbox pattern differ from using a distributed transaction (two-phase commit) to guarantee DB+broker consistency?

Answer: 2PC requires both the DB and the broker to participate in a distributed transaction under a coordinator. That creates tight coupling, blocking locks, and a catastrophic availability impact if the coordinator fails, and most modern cloud-native brokers do not support it at all. The outbox pattern avoids distributed transactions entirely: the event is first written atomically to the local DB (guaranteed by single-node ACID), then delivered to the broker in a separate step. The relay's at-least-once delivery is handled by idempotent consumers. No cross-system coordinator is required.

6. What causes blocking consumer thread syndrome in RabbitMQ and how does prefetch interact with it?

Answer: Consumer threads block on synchronous external calls (DB queries, HTTP calls). With prefetch_count=N, the broker pushes N messages to the consumer immediately; if all worker threads block, nothing gets acked. After the heartbeat timeout, the broker considers the consumer unhealthy and closes the channel, and all N unacked messages return to the ready state and are redelivered. Fix: set prefetch to match the number of worker threads, add timeouts to all external calls, use circuit breakers, and prefer async I/O. A consumer that blocks silently destroys throughput and causes phantom redeliveries.