RabbitMQ Footguns¶
1. Prefetch Count of 0 — One Consumer Hogs All Messages¶
You leave basic_qos unset (prefetch_count = 0), which means unlimited prefetch. When multiple consumers are running, the first consumer to connect receives all queued messages and buffers them locally. Other consumers sit idle while the first one is overwhelmed.
Fix: Always set channel.basic_qos(prefetch_count=N) where N is a number your consumer can process concurrently (10-50 is a common starting point). This ensures messages are distributed fairly across consumers and re-delivered if a consumer dies.
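To see why prefetch matters, here is a toy model (plain Python, function name hypothetical) of what the broker effectively does when consumer A is subscribed with a backlog queued and consumer B connects afterwards:

```python
def deliver_backlog(backlog, prefetch):
    """Consumer A is already subscribed when `backlog` messages are
    queued; consumer B connects afterwards. Returns (pushed_to_A,
    left_for_B). prefetch=0 means unlimited, as in AMQP basic.qos."""
    if prefetch == 0:
        return backlog, 0             # broker pushes the entire backlog to A
    to_a = min(prefetch, backlog)
    return to_a, backlog - to_a       # A holds at most `prefetch` unacked

assert deliver_backlog(10_000, 0) == (10_000, 0)   # B starves, A is swamped
assert deliver_backlog(10_000, 50) == (50, 9_950)  # the rest is shared out
```

With a limit set, unacked messages beyond the prefetch window stay on the broker, where they can be dispatched to the next free consumer or re-delivered if A dies.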
2. Deleting a Queue While Messages Are In-Flight¶
During maintenance, you delete a queue that still has active consumers and unacknowledged messages. All in-flight messages are dropped immediately — there's no recycle bin. If your consumers don't have idempotent processing, you get data loss.
Fix: Drain the queue to zero messages before deleting. Use rabbitmqctl purge_queue only when you intend to discard messages. For production queues, use the management UI to verify messages_ready and messages_unacknowledged are both 0 before deletion.
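A pre-deletion check can be sketched against the stats the management API exposes; the field names (messages_ready, messages_unacknowledged) match the /api/queues response, but the helper itself is illustrative:

```python
def safe_to_delete(queue_stats):
    """queue_stats: one entry from the management API's /api/queues
    response. Deletion is safe only when nothing is queued AND nothing
    is in flight with a consumer."""
    return (queue_stats.get("messages_ready", 0) == 0
            and queue_stats.get("messages_unacknowledged", 0) == 0)

assert safe_to_delete({"messages_ready": 0, "messages_unacknowledged": 0}) is True
assert safe_to_delete({"messages_ready": 0, "messages_unacknowledged": 3}) is False
```

Checking messages_ready alone is not enough: unacked messages are still owned by the queue and die with it.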
3. Classic Mirrored Queues Cause Split-Brain Data Loss¶
You use the deprecated ha-all policy (classic mirrored queues) for durability. During a network partition, RabbitMQ promotes mirrors to masters on both sides. After recovery, the cluster merges but one side's messages are lost because both sides accepted new messages as "master."
Fix: Migrate to quorum queues (x-queue-type: quorum), which use Raft consensus to prevent split-brain. Quorum queues are the current recommended approach for durable, HA message storage. Set cluster_partition_handling = pause_minority in rabbitmq.conf as a safety net.
War story: In documented testing by Jack Vanlightly, a classic mirrored queue cluster lost ~40% of messages (39,189 out of 100,000) during a network partition. Both sides accepted writes as "master" and one side's messages were silently discarded on recovery. Quorum queues using Raft consensus make this scenario impossible — they refuse writes on the minority side.
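The majority rule that makes quorum queues safe can be stated in a few lines; this is a sketch of the Raft-style invariant, not RabbitMQ internals:

```python
def accepts_writes(replicas_reachable, cluster_size):
    """Raft-style rule a quorum queue follows during a partition: a side
    may accept writes only if it can reach a strict majority of replicas."""
    return replicas_reachable > cluster_size // 2

# A 3-node cluster split 2 / 1 by a partition:
assert accepts_writes(2, 3) is True    # majority side keeps accepting
assert accepts_writes(1, 3) is False   # minority refuses: no conflicting history
```

Classic mirrors had no such rule, so both sides of a partition could accept writes and one side's history had to be discarded on merge.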
4. Unbounded Queue Growth Exhausts Broker Memory¶
You don't set a max-length policy on queues. A consumer outage or slow consumer lets a queue grow to millions of messages, consuming all broker RAM. RabbitMQ triggers a memory alarm and blocks all publishers cluster-wide, cascading the failure.
Fix: Set max-length policies on all queues: rabbitmqctl set_policy max-len "^" '{"max-length": 100000, "overflow": "reject-publish"}'. Use reject-publish overflow behavior so publishers get a rejection error instead of silently losing messages. Monitor queue depth and alert before the limit is hit.
Default trap: RabbitMQ's default memory high watermark is 40% of system RAM. When hit, it blocks ALL publishers cluster-wide, not just the offending queue's publishers. One runaway queue can halt message processing for your entire organization. The vm_memory_high_watermark alarm is cluster-scoped, not queue-scoped.
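The two overflow modes fail very differently, which is worth internalizing; a toy model (a deque standing in for the queue, function name hypothetical):

```python
from collections import deque

def publish(queue, msg, max_length, overflow):
    """Toy model of a max-length policy's overflow behavior."""
    if len(queue) < max_length:
        queue.append(msg)
        return True                  # accepted normally
    if overflow == "reject-publish":
        return False                 # publisher gets a nack and can react
    if overflow == "drop-head":      # the default: oldest message silently lost
        queue.popleft()
        queue.append(msg)
        return True

q = deque()
assert publish(q, "m1", max_length=1, overflow="reject-publish") is True
assert publish(q, "m2", max_length=1, overflow="reject-publish") is False
assert list(q) == ["m1"]             # m2 was refused, not silently dropped
```

With the default drop-head behavior, the publisher is told everything succeeded while the oldest messages quietly disappear.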
5. Using Transient Queues and Messages for Important Data¶
You declare queues without durable=true and publish messages without delivery_mode=2 (persistent). If RabbitMQ restarts, both the queue definition and all messages are gone. This is the correct behavior for truly ephemeral data, but a footgun when used for business-critical messages.
Fix: For any data you can't afford to lose: declare queues with durable=True, bind to durable exchanges, and publish with delivery_mode=2. Accept the performance overhead — persistent messages are written to disk before acknowledgment. Use transient mode intentionally, not by default.
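The durability rule is a strict AND of queue and message settings; a minimal sketch of the AMQP semantics (helper name hypothetical):

```python
def survives_broker_restart(queue_durable, delivery_mode):
    """A message outlives a broker restart only when BOTH the queue is
    durable AND the message was published with delivery_mode=2."""
    return queue_durable and delivery_mode == 2

assert survives_broker_restart(True, 2) is True
assert survives_broker_restart(True, 1) is False    # durable queue, transient message
assert survives_broker_restart(False, 2) is False   # persistent message, transient queue
```

Getting only one of the two settings right is a common partial fix that still loses data.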
6. Not Configuring a Dead Letter Exchange Means Rejected Messages Vanish¶
A consumer rejects a message with basic_nack(requeue=False) because it can't process it. Without a Dead Letter Exchange (DLX), the message is silently discarded. You have no way to inspect what failed or retry it later.
Fix: Always configure a DLX for queues that process important messages. Create a *.dlq queue bound to a dead-letter exchange, and set the policy: rabbitmqctl set_policy dlx "^" '{"dead-letter-exchange": "main.dlx"}'. Monitor DLQ depth as an operational metric — growth indicates systematic processing failures.
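The broker's decision on a rejected message can be sketched in a few lines (toy model; x-dead-letter-exchange is the real queue argument, everything else is illustrative):

```python
def on_nack(message, queue_args, dlq):
    """What the broker does with basic.nack(requeue=False): dead-letter
    the message if the queue has a DLX configured, otherwise discard it."""
    if queue_args.get("x-dead-letter-exchange"):
        dlq.append(message)     # routed via the DLX to the bound DLQ
        return "dead-lettered"
    return "discarded"          # gone forever: nothing to inspect or retry

dlq = []
assert on_nack("bad-msg", {}, dlq) == "discarded"
assert on_nack("bad-msg", {"x-dead-letter-exchange": "main.dlx"}, dlq) == "dead-lettered"
assert dlq == ["bad-msg"]
```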
7. Publishing to a Non-Existent Exchange Silently Drops Messages¶
You publish a message to an exchange that hasn't been declared yet. basic.publish is fire-and-forget: the broker drops the message and raises an asynchronous channel exception (404 NOT_FOUND), but the publish call itself returns no error, so your application thinks it succeeded.
Fix: Enable publisher confirms (channel.confirm_select()) and handle returned/nacked messages. Use mandatory=True to get messages returned if they can't be routed. Declare exchanges and queues at application startup and verify they exist before publishing. Use passive declarations to check without creating.
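What the publisher actually observes in each case can be summarized in a toy model (function name hypothetical; note that mandatory covers the unroutable-message case, while a missing exchange surfaces as an asynchronous channel error you only catch with confirms or error callbacks):

```python
def publish(exchange_exists, routable, mandatory):
    """Toy model of what a publisher observes under AMQP 0-9-1 semantics.
    basic.publish is fire-and-forget, so none of these surface synchronously."""
    if not exchange_exists:
        return "channel-error-404"      # async channel close; message dropped
    if not routable:
        return "basic.return" if mandatory else "silently-dropped"
    return "routed"

assert publish(True, False, mandatory=False) == "silently-dropped"
assert publish(True, False, mandatory=True) == "basic.return"   # message comes back
assert publish(False, True, mandatory=True) == "channel-error-404"
```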
8. Running RabbitMQ With Default Credentials in Production¶
You deploy RabbitMQ and forget to change the default guest/guest account. The guest user can only connect from localhost by default, so it's not immediately exploitable — but operators often work around this by enabling guest remote access. Now the broker is accessible with known credentials.
Fix: Create application-specific users with minimal permissions (set_permissions scoped to specific vhosts and patterns). Delete or disable the guest user. Enable the management plugin with TLS. Rotate credentials regularly and inject them via secrets management, not config files.
Debug clue: The guest user can only connect from localhost by default (loopback_users.guest = true). If you see ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN from a remote host, it is the loopback restriction, not a wrong password. Create a dedicated user instead of enabling remote guest access.
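The loopback restriction as a sketch (hypothetical helper; 127.0.0.1 and ::1 stand in for localhost):

```python
def login_allowed(user, from_host, loopback_users=("guest",)):
    """Default policy: users listed in loopback_users may authenticate
    only over localhost, regardless of whether the password is correct."""
    if user in loopback_users and from_host not in ("127.0.0.1", "::1"):
        return False    # surfaces as ACCESS_REFUSED on the client
    return True

assert login_allowed("guest", "127.0.0.1") is True
assert login_allowed("guest", "10.0.0.5") is False     # right password, still refused
assert login_allowed("app_user", "10.0.0.5") is True   # dedicated user works remotely
```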
9. Ignoring Disk Free Alarms Until the Broker Stops Accepting Writes¶
RabbitMQ monitors disk space and blocks publishers when free disk on the data volume drops below the disk_free_limit threshold (default: 50MB). If you're not alerting on free disk, you discover this condition only when applications stop processing messages.
Fix: Alert when disk free falls below 20% on the RabbitMQ data volume (/var/lib/rabbitmq). Monitor the disk_free_alarm metric via the management API. Persistent messages, Mnesia database, and log files all compete for the same disk. Size the volume generously relative to queue depth expectations.
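The alarm condition itself is a simple threshold check; a sketch assuming the 50MB default (helper name hypothetical):

```python
def disk_alarm_active(disk_free_bytes, limit_bytes=50 * 1024 * 1024):
    """True when RabbitMQ would raise its disk alarm and block ALL
    publishers cluster-wide (disk_free_limit default: 50MB absolute)."""
    return disk_free_bytes < limit_bytes

assert disk_alarm_active(20 * 1024 * 1024) is True   # 20MB free: publishers blocked
assert disk_alarm_active(10 * 1024**3) is False      # 10GB free: no alarm
```

Your own alerting threshold should fire far earlier than this limit, which is why the fix above suggests 20% free rather than the broker's last-resort default.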