
MongoDB Operations Footguns


1. Choosing a monotonically increasing shard key

You shard a collection on { created_at: 1 } or { _id: 1 } (ObjectId). All new inserts go to the shard holding the last chunk. That shard becomes a hotspot — 100% of write load while other shards sit idle. The balancer moves chunks after the fact, but writes still land on one shard.

Fix: Use a hashed shard key for insert-heavy workloads, or a compound key where the high-cardinality, evenly-distributed field comes first. For ObjectId: sh.shardCollection("mydb.events", { _id: "hashed" }).

Gotcha: Once a collection is sharded, you cannot change the shard key (prior to MongoDB 5.0's reshardCollection). On 4.x and earlier, fixing a bad shard key means dumping, dropping, re-sharding, and restoring the entire collection. Choose the shard key carefully — it is the most permanent decision in your schema.
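For illustration, the two fixes (and the 5.0+ escape hatch) look like this in mongosh — collection and field names here are hypothetical:

```javascript
// Hashed shard key: spreads monotonically increasing _id inserts across shards
sh.shardCollection("mydb.events", { _id: "hashed" })

// Compound key: high-cardinality, evenly distributed field first
sh.shardCollection("mydb.orders", { customer_id: 1, created_at: 1 })

// MongoDB 5.0+ only: change an existing shard key in place (expensive, but online)
db.adminCommand({ reshardCollection: "mydb.events", key: { _id: "hashed" } })
```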


2. Not sizing the oplog for your maintenance windows

Your oplog holds 24 hours of writes. You do a planned MongoDB upgrade that takes 30 hours (rolling restart, slow secondary sync). The secondary that was down for 20 hours tries to reconnect but the oplog has rolled past its last known position. It triggers a full initial sync — which takes another 6 hours.

Fix: Size the oplog based on your worst-case maintenance window. A conservative rule: oplog window = 2× your longest downtime scenario. For active clusters: rs.printReplicationInfo() to check current window, resize with db.adminCommand({ replSetResizeOplog: 1, size: <MB> }).
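The sizing rule is simple arithmetic; a sketch (the throughput and window numbers below are made up):

```python
def required_oplog_mb(write_mb_per_hour: float,
                      worst_case_downtime_hours: float,
                      safety_factor: float = 2.0) -> int:
    """Oplog must retain at least safety_factor x the longest downtime's writes."""
    return int(write_mb_per_hour * worst_case_downtime_hours * safety_factor)

# Example: ~2 GB/hour of oplog churn, 30-hour worst-case maintenance window
size_mb = required_oplog_mb(2048, 30)
print(size_mb)  # 122880 MB, i.e. ~120 GB
# Then: db.adminCommand({ replSetResizeOplog: 1, size: 122880 })
```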


3. Running mongodump from the primary on a large dataset

Your backup script runs mongodump --host primary-host .... On a 500 GB collection, the dump reads all data through the primary's cache, evicting hot working-set pages and causing read latency to spike for every query. The primary's CPU sits at 90% for 4 hours during business hours.

Fix: Always dump from a secondary: add ?readPreference=secondary to the URI, or use --readPreference=secondary. Secondaries serve reads without impacting primary write throughput.
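A sketch of the invocation (hosts, credentials, and paths are placeholders):

```shell
# Read the dump from a secondary instead of the primary
mongodump \
  --uri "mongodb://backup_user:***@rs0-a:27017,rs0-b:27017,rs0-c:27017/?replicaSet=rs0" \
  --readPreference=secondary \
  --out /backups/$(date +%F)
```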


4. Deleting documents to free space, then being surprised it's still on disk

You run db.logs.deleteMany(...) on most of a large collection, expecting disk space to be returned. WiredTiger marks the freed blocks as reusable within the collection's data file, but does not shrink the file or return the space to the OS. The data directory stays the same size. (A full db.collection.drop(), by contrast, deletes the underlying files and does return the space.)

Fix: To reclaim OS disk space after large deletes, run db.runCommand({ compact: "collection_name" }) (before MongoDB 4.4 this blocks operations while it runs; space return is best-effort) or rebuild via mongodump + mongorestore. For a sharded cluster, another option is to move chunks off a shard, drain it, and remove it.


5. Forgetting $set in an update — replacing the whole document

You run db.users.update({ _id: id }, { status: "active" }) instead of db.users.update({ _id: id }, { $set: { status: "active" } }). Without an update operator, the legacy update() helper replaces the entire document with { status: "active" } — you just deleted every other field (name, email, created_at, everything).

Fix: Always use $set (or $inc, $push, etc.) for partial updates. Only use a replacement-style update when you explicitly want to overwrite the entire document. In pymongo, use update_one with an operator:

collection.update_one({"_id": user_id}, {"$set": {"status": "active"}})

Default trap: The legacy update() helper and replaceOne() accept a plain document and succeed silently — the replacement returns modifiedCount: 1 and looks correct until someone queries the missing fields. Recent mongosh and driver versions do reject updateOne/updateMany calls whose update document contains no $ operator (pymongo raises ValueError, mongosh throws "Update document requires atomic operators"), but older driver versions do not. Code review should flag any update call where the second argument does not start with a $ operator.
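A toy stdlib-only model of the two update semantics makes the difference concrete (this simulates the server's behavior for illustration; it is not pymongo, and only $set is modeled):

```python
def apply_update(doc: dict, update: dict) -> dict:
    """Crude simulation of MongoDB update semantics."""
    if any(k.startswith("$") for k in update):
        merged = dict(doc)
        merged.update(update.get("$set", {}))  # partial update: merge fields
        return merged
    # No operators: the update document REPLACES everything except _id
    return {"_id": doc["_id"], **update}

user = {"_id": 1, "name": "Alice", "email": "a@example.com"}
print(apply_update(user, {"status": "active"}))
# {'_id': 1, 'status': 'active'}  <- name and email are gone
print(apply_update(user, {"$set": {"status": "active"}}))
# {'_id': 1, 'name': 'Alice', 'email': 'a@example.com', 'status': 'active'}
```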


6. Not setting writeConcern: majority for critical writes

Your application runs on MongoDB < 5.0, where the default write concern is w: 1 — acknowledged by the primary only (5.0 changed the default to "majority" for most replica-set topologies). The primary accepts the write, confirms it to the application, then crashes before the write replicates to a secondary. A new primary is elected from the secondaries and the unreplicated write is rolled back. Your payment record is gone.

Fix: For critical data, use w: "majority" write concern:

db.payments.insertOne(
  { amount: 1000, user: "alice" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

Set it in the connection string or as the client default for the entire application. Accept the latency tradeoff: the write is not acknowledged until it has replicated to a majority of members, which adds roughly one replication round trip.
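The same default can also be set cluster-wide via the connection string — hosts and credentials here are placeholders:

```
mongodb://app:***@rs0-a,rs0-b,rs0-c/mydb?replicaSet=rs0&w=majority&wtimeoutMS=5000
```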


7. Creating indexes in the foreground on a production collection (pre-4.2)

On MongoDB < 4.2, db.collection.createIndex() by default runs as a foreground build. The entire database is locked for reads and writes until the index build completes. On a 100 GB collection this can be hours.

Fix: On MongoDB < 4.2, always pass { background: true }. On MongoDB 4.2+, all index builds are online by default (hybrid build). On MongoDB 4.4+, index builds take a brief exclusive lock only at the start and end, not during.
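For illustration (collection and field names hypothetical):

```javascript
// MongoDB < 4.2 only: avoid the blocking foreground build
db.orders.createIndex({ customer_id: 1 }, { background: true })

// MongoDB 4.2+: the same call is already an online (hybrid) build
db.orders.createIndex({ customer_id: 1 })
```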


8. Using $where or JavaScript in queries

You use db.users.find({ $where: "this.age > 30" }). This executes JavaScript for every document in the collection, disables index use entirely, and runs single-threaded. On a 10M document collection it takes minutes.

Fix: Never use $where in production. Use native query operators: db.users.find({ age: { $gt: 30 } }). This uses indexes and is orders of magnitude faster.


9. Storing large files or blobs as document fields

You store user profile photos as base64 strings in a photo field of a user document. The average document grows from 1 KB to 500 KB. Queries that fetch users now transfer 500 KB per user. The working set that fits in WiredTiger cache shrinks from millions of documents to thousands.

Fix: Use GridFS for files larger than 16 MB (the hard BSON document size limit). For smaller files that are frequently accessed, store them in object storage (S3, MinIO) and keep only the URL in MongoDB. Keep documents small: WiredTiger loads the whole document into cache on any access, not just the fields you project.
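The base64 inflation alone is measurable with the stdlib (the file size is illustrative):

```python
import base64

photo = b"\xff" * 360_000          # pretend this is a 360 KB JPEG
encoded = base64.b64encode(photo)  # what ends up in the document field
overhead = len(encoded) / len(photo)
print(len(encoded), round(overhead, 2))  # 480000 1.33 -> ~33% larger before BSON overhead
```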


10. Not enabling auth in development, then deploying to production with the same pattern

Your development MongoDB has no authentication. You containerize it and deploy to a staging environment where the port is accessible from other services. Someone discovers this, pivots to your database, and exfiltrates everything.

Fix: Always enable authentication, even in development. In mongod.conf:

security:
  authorization: enabled

For Docker: never expose the MongoDB port publicly without authentication. Use connection strings with credentials. Rotate the admin password before the production deploy.
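With the official Docker image, setting root credentials at first startup also enables authorization — the values below are placeholders:

```yaml
# docker-compose.yml (fragment)
services:
  mongo:
    image: mongo:7
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: change-me-before-prod
    # note: no "ports:" mapping — reachable only on the compose network
```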


11. Sharding with a field that has low cardinality

You shard on { country: 1 } because you query by country. But you only have 30 countries, so MongoDB can create at most 30 distinct chunks. With 10 shards, each country's data lives in a single chunk, and once a chunk grows past the maximum size it cannot be split (the key's cardinality is exhausted): it becomes a jumbo chunk that the balancer cannot move.

Fix: Shard keys need high cardinality. If you need country-based distribution, use a compound shard key: { country: 1, user_id: 1 } or { country: 1, created_at: 1 }. The second field provides the cardinality needed for chunk splitting.
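A quick stdlib sketch of why the compound key restores splittability — synthetic data, counting distinct key values only, not actual chunking logic:

```python
from itertools import product

countries = [f"C{i}" for i in range(30)]
user_ids = range(1000)

# The number of possible chunk boundaries is bounded by distinct shard-key values
single_key_values = {(c,) for c in countries}
compound_key_values = set(product(countries, user_ids))

print(len(single_key_values))    # 30    -> at most 30 chunks, ever
print(len(compound_key_values))  # 30000 -> plenty of split points
```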


12. Relying on _id index for all queries — not auditing unused indexes

You created 15 indexes over 2 years for various queries. Queries changed. Some indexes are never used. Each unused index consumes memory (index pages in WiredTiger cache) and slows every write (indexes must be updated on insert/update/delete).

Fix: Audit index usage regularly:

db.orders.aggregate([{ $indexStats: {} }]).toArray()
// Find indexes with accesses.ops == 0 (never used since last restart)
// Drop unused indexes
db.orders.dropIndex("old_index_name")

Do this as part of quarterly maintenance. Note: $indexStats counters reset on mongod restart — run it only after the server has been up long enough for all query patterns to execute.