Quiz: Kafka

15 questions

L1 (8 questions)

1. What is a Kafka partition and why does partition count matter?

Show answer

A partition is an ordered, append-only log within a topic. Partitions enable parallelism: within a consumer group, each partition is consumed by at most one consumer at a time, so the partition count caps the group's useful parallelism. More partitions allow higher throughput but add overhead (file handles, memory, leader elections). You can increase a topic's partition count but never decrease it.

2. What is a Kafka consumer group and how does rebalancing work?

Show answer

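A consumer group is a set of consumers that divide a topic's partitions among themselves. Each partition is assigned to exactly one consumer in the group. A rebalance occurs when group membership or the set of partitions changes (a consumer joins, leaves, or times out; partitions are added), and partitions are redistributed. With the eager protocol all consumers stop fetching during the rebalance; cooperative (incremental) rebalancing pauses only the partitions that move. Either way, rebalances can cause lag spikes.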
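A minimal sketch of the redistribution, assuming a simple round-robin strategy. The function names are hypothetical and this is not Kafka's assignor code; the real assignors (range, round-robin, sticky, cooperative sticky) are more elaborate.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to consumers in turn; each partition gets exactly one owner."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


def owners(assignment):
    """Invert {consumer: [partitions]} into {partition: consumer}."""
    return {p: c for c, ps in assignment.items() for p in ps}


partitions = list(range(6))
before = assign_round_robin(partitions, ["c1", "c2"])
after = assign_round_robin(partitions, ["c1", "c2", "c3"])  # c3 joins, triggering a rebalance

# Partitions whose owner changed; these are the ones whose consumption is interrupted.
moved = sorted(p for p in partitions if owners(before)[p] != owners(after)[p])
print(before, after, moved)
```

In this toy model four of the six partitions change owner when one consumer joins, which is why sticky strategies that minimize movement exist.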

3. How does Kafka guarantee message ordering?

Show answer

Ordering is guaranteed only within a single partition. With the default partitioner, messages with the same key go to the same partition (hash of the key modulo the partition count), so per-key order holds as long as the partition count does not change. Cross-partition ordering is NOT guaranteed. If you need global ordering, use a single partition (and lose parallelism). Design your key strategy around ordering requirements.
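The key-to-partition rule can be sketched as follows. This is an illustration only: `partition_for` is a hypothetical helper, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner uses, so the partition numbers will not match a real cluster.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "updated"),
    ("user-1", "deleted"),
]

# One append-only list per partition, mimicking three partition logs.
logs = {p: [] for p in range(3)}
for key, value in events:
    logs[partition_for(key.encode(), 3)].append((key, value))

# All records for a key land in one partition, so their relative order is kept.
user1 = [v for log in logs.values() for k, v in log if k == "user-1"]
print(user1)
```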

4. What is Kafka retention and how does it differ from compaction?

Show answer

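Retention (cleanup.policy=delete): messages are deleted once a time (retention.ms) or size (retention.bytes) threshold is exceeded; entire log segments are dropped at a time. Compaction (cleanup.policy=compact): keeps at least the latest value per key, which is useful for changelog/state topics. A topic can use either or both (cleanup.policy=compact,delete). Compaction alone never deletes the latest record for a key, except that tombstones (records with a null value) are themselves removed after delete.retention.ms.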
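The difference between the two policies can be shown on a toy log. This sketch works record by record for clarity; real brokers apply retention to whole segments and leave the active segment uncompacted. The record layout and function names are assumptions made for the example.

```python
def apply_retention(log, now_ms, retention_ms):
    """Time-based retention: drop records older than the threshold."""
    return [r for r in log if now_ms - r["ts"] <= retention_ms]


def apply_compaction(log):
    """Keep only the newest record per key, preserving offset order."""
    latest = {}
    for record in log:
        latest[record["key"]] = record  # later offsets overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r["offset"])


log = [
    {"offset": 0, "key": "a", "value": "v1", "ts": 0},
    {"offset": 1, "key": "b", "value": "v1", "ts": 100},
    {"offset": 2, "key": "a", "value": "v2", "ts": 200},
    {"offset": 3, "key": "c", "value": "v1", "ts": 300},
]

retained = apply_retention(log, now_ms=350, retention_ms=200)
compacted = apply_compaction(log)
print([r["offset"] for r in retained], [r["offset"] for r in compacted])
```

Retention discards by age regardless of key (key "b" disappears), while compaction discards by key regardless of age (only the superseded "a" record disappears).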

5. What happens when a Kafka broker fails?

Show answer

The controller elects a new leader for each partition the broker led, chosen from the in-sync replica set (ISR). Producers and consumers refresh their metadata and switch to the new leaders automatically. If the failed broker was the only ISR member, the partition is unavailable until it recovers, unless unclean leader election is enabled (unclean.leader.election.enable=true), which risks data loss.
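A simplified sketch of that decision, assuming the new leader is the first replica in assignment order that is both alive and in the ISR. `elect_leader` is a hypothetical function, not the controller's actual code.

```python
def elect_leader(replicas, isr, live_brokers, allow_unclean=False):
    """Return the new leader's broker id, or None if the partition stays offline."""
    # Clean election: only replicas that are alive and in sync are eligible.
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    # Unclean election: accept any live replica, even one that is missing data.
    if allow_unclean:
        for broker in replicas:
            if broker in live_brokers:
                return broker
    return None


replicas = [1, 2, 3]
live = {2, 3}  # broker 1 has failed

print(elect_leader(replicas, isr={1, 2, 3}, live_brokers=live))
print(elect_leader(replicas, isr={1}, live_brokers=live))
print(elect_leader(replicas, isr={1}, live_brokers=live, allow_unclean=True))
```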

6. What is the role of ZooKeeper in Kafka (and what replaces it)?

Show answer

ZooKeeper: stores cluster metadata, manages broker registration, elects the controller, and tracks ISR lists. KRaft (Kafka Raft) replaces ZooKeeper (production-ready since Kafka 3.3): metadata is stored in an internal Kafka log (__cluster_metadata) replicated by a Raft quorum of controllers, which simplifies operations. New clusters should use KRaft; existing clusters should plan migration, since Kafka 4.0 removes ZooKeeper mode entirely.

7. What is the minimum configuration for a production Kafka cluster?

Show answer

At least 3 brokers (for replication factor 3). replication.factor=3 and min.insync.replicas=2 on critical topics. Dedicated data disks (not shared with the OS). JVM heap tuned (6-8GB typical), leaving the remaining RAM to the OS page cache. Monitoring: under-replicated partitions, consumer lag, broker disk/CPU. A separate ZooKeeper ensemble (3 or 5 nodes), or KRaft controllers instead.

8. What are acks settings in Kafka producers and their tradeoffs?

Show answer

acks=0: fire-and-forget, fastest, may lose messages. acks=1: the leader acknowledges, good throughput, may lose messages if the leader fails before followers replicate. acks=all (-1): all in-sync replicas acknowledge, safest, slower; this is the producer default since Kafka 3.0. For critical data use acks=all with min.insync.replicas=2.
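How acks interacts with min.insync.replicas can be sketched as a small decision function. `produce_outcome` is a hypothetical model of the broker's behavior, not a client API; the point it illustrates is that min.insync.replicas is only enforced for acks=all.

```python
def produce_outcome(acks, isr_size, min_insync_replicas):
    """Model whether a write is accepted and how many acknowledgements it waits for."""
    # min.insync.replicas is only checked when the producer asks for acks=all.
    if acks == "all" and isr_size < min_insync_replicas:
        return "rejected: not enough in-sync replicas"
    needed = {"0": 0, "1": 1, "all": isr_size}[acks]
    return f"accepted after {needed} acknowledgement(s)"


# Replication factor 3, min.insync.replicas=2, two followers have fallen out of the ISR.
print(produce_outcome("all", isr_size=1, min_insync_replicas=2))
print(produce_outcome("1", isr_size=1, min_insync_replicas=2))
print(produce_outcome("all", isr_size=3, min_insync_replicas=2))
```

With acks=1 the write in the degraded case is still accepted, which is exactly the window in which a leader failure loses data.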

L2 (7 questions)

1. Consumer lag is increasing steadily. What do you investigate?

Show answer

1. Consumer throughput vs producer rate.
2. Consumer processing time per message (slow downstream calls?).
3. Consumer group rebalancing too frequently.
4. Not enough consumers (fewer than partitions).
5. GC pauses in the consumer.
6. Deserialization errors causing retries.
7. max.poll.records too high, so processing one batch takes longer than max.poll.interval.ms and the consumer is removed from the group.
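Before working through the checklist it helps to quantify the lag. A minimal sketch, assuming you already have the log end offsets and committed offsets per partition (for example from a monitoring tool); the function names and sample numbers are made up for illustration.

```python
def partition_lag(end_offsets, committed_offsets):
    """Lag per partition = log end offset minus the group's committed offset."""
    # Assumption: a partition with no committed offset is measured from offset 0.
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}


def lag_is_growing(samples):
    """True if total lag rose at every successive, evenly spaced check."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


lag = partition_lag({0: 100, 1: 250}, {0: 90, 1: 200})
total = sum(lag.values())
print(lag, total)
print(lag_is_growing([60, 140, 310]))  # steady growth: consumers cannot keep up
print(lag_is_growing([60, 40, 75]))    # fluctuating: likely bursts, not a capacity gap
```

Lag concentrated in one partition points at a hot key or a stuck consumer, while lag growing evenly across partitions points at overall consumer throughput.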

2. What is the difference between at-least-once, at-most-once, and exactly-once in Kafka?

Show answer

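At-most-once: commit the offset before processing (a crash may lose messages). At-least-once: process, then commit (a crash may cause duplicates); this is the usual default. Exactly-once: use an idempotent producer (enable.idempotence=true) plus transactions (a transactional.id on the producer, consumed offsets committed inside the transaction, and isolation.level=read_committed on downstream consumers). This covers Kafka-to-Kafka read-process-write pipelines; external side effects still need their own idempotence. Exactly-once has performance overhead. Most systems use at-least-once with idempotent consumers.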
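The effect of commit ordering can be seen in a toy simulation with one crash. `run` is a hypothetical helper that models a single consumer restarting from its committed offset; it does not use a Kafka client.

```python
def run(messages, commit_first, crash_at):
    """Consume until a simulated crash, then resume from the committed offset."""
    processed = []
    committed = 0
    crashed = False
    while committed < len(messages):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crashed after commit, before processing: message lost
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crashed after processing, before commit: reprocessed
            committed = offset + 1
    return processed


messages = ["a", "b", "c"]
at_most_once = run(messages, commit_first=True, crash_at=1)
at_least_once = run(messages, commit_first=False, crash_at=1)

# An idempotent consumer drops repeats, here by treating the value as a unique id.
deduplicated = list(dict.fromkeys(at_least_once))
print(at_most_once, at_least_once, deduplicated)
```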

3. A Kafka broker reports under-replicated partitions. What do you check?

Show answer

1. Broker is up and connected to the cluster.
2. Network between brokers is healthy (latency, packet loss).
3. Broker disk I/O — slow disks delay replication.
4. Broker is overloaded (too many partitions as leader).
5. Check ISR (in-sync replicas) shrinkage in broker logs.
6. replica.lag.time.max.ms may be too tight.
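To decide which broker to inspect first, it can help to group the under-replicated partitions by the replica that is missing from the ISR. A sketch over a hand-written metadata snapshot; the dictionary layout and function names are assumptions, not the output format of any Kafka tool.

```python
from collections import Counter


def under_replicated(partitions):
    """Return partitions whose ISR is smaller than their replica list."""
    return sorted(
        name for name, p in partitions.items() if len(p["isr"]) < len(p["replicas"])
    )


def lagging_brokers(partitions):
    """Count, per broker, how many partitions it has dropped out of the ISR for."""
    counts = Counter()
    for p in partitions.values():
        for broker in set(p["replicas"]) - set(p["isr"]):
            counts[broker] += 1
    return dict(counts)


snapshot = {
    "orders-0": {"replicas": [1, 2, 3], "isr": [1, 2, 3]},
    "orders-1": {"replicas": [2, 3, 1], "isr": [2, 3]},
    "orders-2": {"replicas": [3, 1, 2], "isr": [3, 2]},
}

print(under_replicated(snapshot))
print(lagging_brokers(snapshot))
```

If one broker accounts for every missing replica, as broker 1 does here, start with that broker's health, disk, and network; if the gaps are spread across brokers, suspect cluster-wide load or settings.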

4. How do you safely increase Kafka partitions for a topic?

Show answer

kafka-topics.sh --bootstrap-server <host:port> --alter --topic <topic> --partitions N. Existing data is NOT redistributed; only new messages go to the new partitions. Key-based routing changes (messages with the same key may go to a different partition than before), which breaks per-key ordering across the change. The consumer group must rebalance. You cannot decrease partitions. Plan partition count carefully upfront.
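The routing change is easy to underestimate. The sketch below counts how many keys map to a different partition after growing a topic from 6 to 8 partitions. It uses CRC32 as a stand-in for Kafka's murmur2 hash and made-up keys, so the exact count is illustrative only.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


def moved_keys(keys, old_count, new_count):
    """Count keys whose partition differs between the old and new partition counts."""
    return sum(
        partition_for(k, old_count) != partition_for(k, new_count) for k in keys
    )


keys = [f"user-{i}".encode() for i in range(1000)]
moved = moved_keys(keys, old_count=6, new_count=8)
print(f"{moved} of {len(keys)} keys now route to a different partition")
```

Any key that moves has its older records in one partition and its newer records in another, so consumers that rely on per-key ordering need a plan (for example, draining the topic first) before the change.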

5. A Kafka producer reports NotLeaderForPartitionException intermittently. What is happening?

Show answer

The partition leader changed (broker restart, partition reassignment, or preferred leader election) and the producer's metadata cache is stale. The producer refreshes metadata and retries automatically as long as retries > 0 (the default in current clients, bounded by delivery.timeout.ms). In Kafka 2.6 and later the error is named NotLeaderOrFollowerException. If it persists: check broker stability, controller logs, and under-replicated partitions for frequent leader changes.
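The refresh-and-retry behavior can be modeled without a broker. Everything in this sketch (`ToyCluster`, `NotLeaderError`, `send_with_retry`) is hypothetical and only mimics what a producer client does internally.

```python
class NotLeaderError(Exception):
    """Raised when a write is sent to a broker that no longer leads the partition."""


class ToyCluster:
    def __init__(self, leader):
        self.leader = leader

    def send(self, broker, record):
        if broker != self.leader:
            raise NotLeaderError(f"broker {broker} is not the leader")
        return f"{record} written by broker {broker}"


def send_with_retry(cluster, cached_leader, record, retries=3):
    """Try the cached leader; on NotLeaderError refresh metadata and try again."""
    leader = cached_leader
    for _ in range(retries + 1):
        try:
            return cluster.send(leader, record)
        except NotLeaderError:
            leader = cluster.leader  # stands in for a metadata refresh
    raise NotLeaderError("retries exhausted")


cluster = ToyCluster(leader=2)  # leadership moved from broker 1 to broker 2
result = send_with_retry(cluster, cached_leader=1, record="r1")
print(result)
```

With retries=0 the same call fails on the first attempt, which is why intermittent errors of this kind are harmless only when retries are enabled.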

6. How do you handle poison messages (messages that cause consumer failures)?

Show answer

1. Wrap processing in try-catch and send failures to a dead-letter topic (DLT).
2. Log the problematic message for debugging.
3. Commit the offset to skip past the poison message.
4. Set up alerts on DLT message count.
5. Never let a single bad message block an entire partition indefinitely.
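The steps above can be sketched as a consume loop that diverts failures instead of stopping. This is a broker-free illustration: `consume` is a hypothetical function, a plain list stands in for the dead-letter topic, and the returned number stands in for the committed offset.

```python
import json


def consume(records, handler, dead_letters):
    """Process records in order; divert failures to dead_letters and keep going."""
    committed = 0
    for offset, raw in enumerate(records):
        try:
            handler(raw)
        except Exception as exc:
            # Keep the payload and the error so the message can be debugged later.
            dead_letters.append({"offset": offset, "payload": raw, "error": repr(exc)})
        # Advance past the record either way so one bad message cannot block the partition.
        committed = offset + 1
    return committed


records = ['{"id": 1}', "not json", '{"id": 2}']
seen = []
dead_letters = []
committed = consume(records, lambda raw: seen.append(json.loads(raw)["id"]), dead_letters)
print(seen, committed, len(dead_letters))
```

An alert on the length of the dead-letter list (the DLT message count in a real setup) is what turns silently skipped records into something a person looks at.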

7. How do you monitor a Kafka cluster effectively?

Show answer

Key metrics:
1. Under-replicated partitions (> 0 = problem).
2. Consumer group lag.
3. Request latency (produce, fetch).
4. Broker disk usage and I/O.
5. Active controller count (exactly 1).
6. ISR shrink/expand rate.
7. Network processor and request handler idle ratios (NetworkProcessorAvgIdlePercent, RequestHandlerAvgIdlePercent).

Tools: Burrow, Kafka Exporter + Prometheus, Confluent Control Center.