
MongoDB Operations — Primer

Why This Matters

MongoDB is widely deployed for document-oriented workloads: user profiles, event streams, product catalogs, real-time analytics. As an operator you need to understand replica set health, what happens during elections, how to read explain() output, how sharding affects query routing, and how to back up and restore safely. MongoDB's operational model differs significantly from relational databases — understanding those differences prevents outages.

Core Concepts

1. Document Model and BSON

Name origin: BSON stands for "Binary JSON." MongoDB Inc. (originally 10gen, founded 2007) derived the database name from "humongous" — reflecting its design goal of handling massive datasets.

MongoDB stores documents (JSON-like, actually BSON) in collections. No fixed schema per document, though your application enforces structure.

// Sample document
{
  "_id": ObjectId("6579a4b8c3e21a0012345678"),  // auto-generated if not provided
  "username": "alice",
  "email": "alice@example.com",
  "created_at": ISODate("2024-01-15T10:00:00Z"),
  "profile": {
    "age": 32,
    "city": "Berlin"
  },
  "tags": ["admin", "beta"],
  "last_login": null
}

BSON types that matter operationally:
- ObjectId: 12-byte unique ID (4-byte timestamp + 5-byte random value + 3-byte counter). Not a UUID.
- ISODate: UTC timestamp — always store dates as ISODate, never as strings.
- NumberLong vs NumberInt: JSON numbers default to Double in some drivers; be explicit.
- BinData: for binary data (e.g., UUIDs, encrypted values).
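Because an ObjectId's first 4 bytes are a big-endian Unix timestamp, every ObjectId encodes its own creation time. A minimal sketch that decodes it straight from the hex string (pymongo exposes the same value as `ObjectId.generation_time`):

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time from an ObjectId's leading 4 bytes (8 hex chars)."""
    seconds = int(oid_hex[:8], 16)  # big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# The sample document's _id above was generated in December 2023:
print(objectid_timestamp("6579a4b8c3e21a0012345678").isoformat())
```

This is handy during incidents: you can approximate insert times from `_id` alone, without a `created_at` field.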

2. Replica Sets

A replica set is a group of mongod instances, typically three or more and an odd number so a majority quorum always exists. One member is the primary; the rest are secondaries.

Primary ──(oplog replication)──► Secondary 1
        ──(oplog replication)──► Secondary 2

Election: When the primary becomes unreachable, secondaries detect its absence via heartbeats (sent every 2 seconds). They elect a new primary using a Raft-based voting protocol; winning requires a majority of votes (2 of 3 in a 3-node set). With default settings (a 10-second election timeout), failover typically completes in roughly 12 seconds.
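The majority requirement is simple arithmetic, and it explains why even-sized voting sets buy no extra fault tolerance. A quick sketch:

```python
def majority(voting_members: int) -> int:
    """Votes needed to win an election: floor(n/2) + 1."""
    return voting_members // 2 + 1

assert majority(3) == 2   # a 3-node set survives 1 failure
assert majority(5) == 3   # a 5-node set survives 2 failures
assert majority(4) == 3   # 4 nodes still tolerate only 1 failure, hence odd counts
```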

// Check replica set status
rs.status()
// Key fields:
// - set: replica set name
// - members[n].stateStr: PRIMARY / SECONDARY / ARBITER / RECOVERING
// - members[n].health: 1 = healthy, 0 = unreachable
// - members[n].optimeDate: time of the last applied oplog entry
// (replication lag = primary optimeDate minus a secondary's optimeDate;
//  rs.printSecondaryReplicationInfo() prints it directly)

// Initiate a replica set (first-time setup)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
})

// Add a member
rs.add("mongo4:27017")
rs.addArb("mongo-arbiter:27017")  // arbiter: votes but holds no data

// Step down the current primary (triggers election)
rs.stepDown(60)   // 60 seconds before eligible to be primary again

// Check oplog size and usage
use local
db.oplog.rs.stats()
db.oplog.rs.find().sort({$natural:-1}).limit(1)  // latest oplog entry

Oplog: The oplog is a capped collection in the local database that records all writes to the primary. Secondaries tail this collection to replicate. Oplog window = how far back in time a secondary can fall behind and still recover without a full resync. Tune with --oplogSize or replication.oplogSizeMB.
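rs.printReplicationInfo() reports the oplog's first and last event times; the window math is worth automating. A minimal sketch (hypothetical helper, plain arithmetic on those two timestamps):

```python
import time

def oplog_window_hours(first_entry_ts: float, last_entry_ts: float) -> float:
    """Oplog window: span between the oldest and newest entries, in hours."""
    return (last_entry_ts - first_entry_ts) / 3600.0

# Hypothetical numbers: oldest entry 48 h ago, newest just written.
now = time.time()
window = oplog_window_hours(now - 48 * 3600, now)
print(f"oplog window: {window:.1f} h")
# Rule of thumb: keep the window comfortably larger than your longest
# planned secondary downtime, or the member will need a full resync.
```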

Read Preference: Controls which replica set member handles reads.

// Options:
// primary (default): all reads from primary
// primaryPreferred: reads from primary if available, else secondary
// secondary: always reads from secondaries (may be stale)
// secondaryPreferred: secondaries preferred
// nearest: lowest network latency

// Set in connection string
mongodb://mongo1,mongo2,mongo3/mydb?replicaSet=rs0&readPreference=secondaryPreferred

// Set in pymongo
from pymongo import MongoClient, ReadPreference
client = MongoClient(hosts, readPreference=ReadPreference.SECONDARY_PREFERRED)

3. Sharding

Sharding distributes data across multiple replica sets (shards). Required when a single replica set can't hold or serve all data.

┌─────────────────────────────────────────┐
│                mongos (router)           │  ← application connects here
└──────────────────┬──────────────────────┘
                   │ consults config servers
┌──────────────────▼──────────────────────┐
│            Config Servers (replica set)  │  ← stores chunk metadata
└──────┬──────────────────────┬───────────┘
       │                      │
  ┌────▼────┐            ┌────▼────┐
  │ Shard 1 │            │ Shard 2 │   ← each is a replica set
  │ (rs1)   │            │ (rs2)   │
  └─────────┘            └─────────┘

Shard Key: Determines which shard a document goes to. The most important design decision in a sharded cluster.

// Enable sharding on a database
sh.enableSharding("mydb")

// Shard a collection on a field
sh.shardCollection("mydb.orders", { customer_id: 1 })    // ranged
sh.shardCollection("mydb.events", { event_id: "hashed" }) // hashed

// Check shard distribution
sh.status()
db.orders.getShardDistribution()

// Balancer — moves chunks between shards for even distribution
sh.isBalancerRunning()
sh.stopBalancer()    // stop during maintenance
sh.startBalancer()

Ranged vs hashed sharding:
- Ranged: adjacent documents land on the same shard — good for range queries, bad for monotonically increasing keys (hotspot on the last shard).
- Hashed: even distribution — good for insert throughput, bad for range queries (scatter-gather across all shards).

Avoid hotspot keys: _id with ObjectId, timestamps, and auto-incrementing IDs all route writes to a single shard — the one owning the chunk that covers the current maximum value. Use compound shard keys or hashed sharding to distribute writes.

Gotcha: The shard key is immutable after collection creation (MongoDB < 5.0) and very expensive to change (MongoDB 5.0+ supports reshardCollection but it copies the entire dataset). Get the shard key right before sharding — changing it later is a migration, not a config change.
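The hotspot problem is easy to see with a toy model. This sketch is not MongoDB's actual routing (MongoDB's hashed index uses a 64-bit slice of an MD5-based hash internally); it just routes monotonically increasing keys to 4 hypothetical shards both ways:

```python
import hashlib

SHARDS = 4

def ranged_shard(key: int, chunk_max: int = 1000) -> int:
    # Toy ranged model: keys above the highest chunk boundary all land on
    # the shard owning the maxKey chunk (here: the last shard).
    return min(key // chunk_max, SHARDS - 1)

def hashed_shard(key: int) -> int:
    # Toy hashed model: bucket by a digest of the key; any uniform hash
    # shows the effect.
    h = int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")
    return h % SHARDS

new_keys = range(10_000, 10_100)   # monotonically increasing inserts
print({ranged_shard(k) for k in new_keys})   # one shard takes every insert
print({hashed_shard(k) for k in new_keys})   # inserts spread across shards
```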

4. WiredTiger Storage Engine

WiredTiger is the default storage engine since MongoDB 3.2. It provides:
- Document-level concurrency (not collection-level locking like the old MMAPv1 engine)
- Compression (snappy by default; zlib and zstd available)
- An internal cache that controls memory usage

# mongod.conf
storage:
  dbPath: /var/lib/mongodb
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8          # default: the larger of 50% of (RAM - 1 GB) or 256 MB
      journalCompressor: snappy
    collectionConfig:
      blockCompressor: snappy  # zlib for better compression ratio
    indexConfig:
      prefixCompression: true

// Check WiredTiger cache usage
db.serverStatus().wiredTiger.cache
// Key metrics:
// "bytes currently in the cache" vs "maximum bytes configured"
// "pages read into cache" and "pages written from cache" — high values indicate pressure
// "tracked dirty bytes in the cache" — high means write pressure

5. explain() and Query Analysis

// Basic explain — query plan only
db.orders.explain().find({ customer_id: "abc123", status: "pending" })

// Execution stats — actually runs the query and reports real metrics
db.orders.explain("executionStats").find({ customer_id: "abc123" })

// All plans — shows winning and rejected plans
db.orders.explain("allPlansExecution").find({ customer_id: "abc123" })

Key fields in executionStats:

{
  "executionSuccess": true,
  "nReturned": 5,
  "executionTimeMillis": 12,
  "totalKeysExamined": 5,        // index keys scanned
  "totalDocsExamined": 5,        // documents fetched from collection
  "executionStages": {
    "stage": "FETCH",            // COLLSCAN = full scan (bad), IXSCAN = index scan (good)
    "inputStage": {
      "stage": "IXSCAN",
      "keyPattern": { "customer_id": 1, "status": 1 },
      "indexName": "customer_id_1_status_1"
    }
  }
}

Optimal query: totalKeysExamined == nReturned (no wasted index scans) and stage is IXSCAN not COLLSCAN.

Remember: a healthy plan has totalKeysExamined equal to nReturned and IXSCAN (not COLLSCAN) at the bottom of the stage tree. If totalKeysExamined is much larger than nReturned, the index is scanning more entries than it needs — refine the index or add a compound index that better matches the query predicate.
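That check is mechanical enough to script. A sketch that grades an executionStats document (field names as in the example above; the 10x thresholds are arbitrary starting points):

```python
def grade_explain(stats: dict) -> list:
    """Flag common inefficiencies in an explain("executionStats") result."""
    warnings = []
    n = stats["nReturned"]
    keys = stats["totalKeysExamined"]
    docs = stats["totalDocsExamined"]
    if stats["executionStages"]["stage"] == "COLLSCAN":
        warnings.append("full collection scan: no usable index")
    if n and keys > 10 * n:
        warnings.append(f"scanned {keys} index keys for {n} results: poor selectivity")
    if n and docs > 10 * n:
        warnings.append(f"fetched {docs} documents for {n} results: consider a covering index")
    return warnings

# The sample executionStats above is optimal: 5 keys, 5 docs, 5 returned.
assert grade_explain({
    "nReturned": 5, "totalKeysExamined": 5, "totalDocsExamined": 5,
    "executionStages": {"stage": "FETCH"},
}) == []
```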

6. Index Types

// Single field
db.orders.createIndex({ customer_id: 1 })                // ascending
db.orders.createIndex({ created_at: -1 })                // descending (good for sort)

// Compound index — order matters
db.orders.createIndex({ customer_id: 1, status: 1, created_at: -1 })
// Covers queries on: (customer_id), (customer_id, status), (customer_id, status, created_at)
// Does NOT cover: (status), (created_at), (status, created_at)
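The prefix rule is worth encoding. A sketch (hypothetical helper, not a driver API) that checks whether a set of equality-filter fields forms a leading prefix of a compound index — note the real planner can still use an index for non-prefix filtering, just far less efficiently:

```python
def uses_index_prefix(query_fields: set, index_fields: list) -> bool:
    """True if the queried fields form a leading prefix of the compound index."""
    prefix_len = 0
    for field in index_fields:
        if field in query_fields:
            prefix_len += 1
        else:
            break
    return prefix_len == len(query_fields) and prefix_len > 0

idx = ["customer_id", "status", "created_at"]
assert uses_index_prefix({"customer_id"}, idx)
assert uses_index_prefix({"customer_id", "status"}, idx)
assert not uses_index_prefix({"status"}, idx)             # not a leading prefix
assert not uses_index_prefix({"status", "created_at"}, idx)
```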

// Sparse index — only indexes documents where the field exists
db.users.createIndex({ invite_code: 1 }, { sparse: true })

// TTL index — auto-delete documents after expiry
db.sessions.createIndex({ created_at: 1 }, { expireAfterSeconds: 3600 })

// Text index — full-text search
db.articles.createIndex({ title: "text", body: "text" })
db.articles.find({ $text: { $search: "mongodb performance" } })

// 2dsphere — geospatial queries
db.venues.createIndex({ location: "2dsphere" })
db.venues.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [13.4050, 52.5200] },
      $maxDistance: 1000   // meters
    }
  }
})

// Partial index — only index documents matching a filter
db.orders.createIndex(
  { customer_id: 1, created_at: -1 },
  { partialFilterExpression: { status: "active" } }
)

// List indexes
db.orders.getIndexes()

// Index stats — usage counters
db.orders.aggregate([{ $indexStats: {} }])
// Use this to find unused indexes that waste memory and slow writes
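The $indexStats output can be scripted into an unused-index report. A sketch over documents shaped like the aggregation's output (`name` and `accesses.ops` are the real field names); remember the counters reset on mongod restart, so observe a full traffic cycle before dropping anything:

```python
def unused_indexes(index_stats: list) -> list:
    """Names of indexes with zero recorded accesses (never flag _id_)."""
    return [s["name"] for s in index_stats
            if s["accesses"]["ops"] == 0 and s["name"] != "_id_"]

stats = [
    {"name": "_id_", "accesses": {"ops": 0}},
    {"name": "customer_id_1", "accesses": {"ops": 15230}},
    {"name": "legacy_field_1", "accesses": {"ops": 0}},
]
assert unused_indexes(stats) == ["legacy_field_1"]
```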

7. Backup and Restore

mongodump / mongorestore — logical backup, BSON format.

# Dump entire instance
mongodump --uri="mongodb://user:pass@host:27017" --out=/backup/$(date +%F)

# Dump with oplog (point-in-time snapshot for replica sets)
mongodump --uri="mongodb://host:27017" --oplog --out=/backup/$(date +%F)

# Dump single database
mongodump --uri="mongodb://host:27017/mydb" --out=/backup/mydb_$(date +%F)

# Restore entire instance
mongorestore --uri="mongodb://host:27017" /backup/2024-01-15/

# Restore single collection
mongorestore --uri="mongodb://host:27017" \
  --nsInclude="mydb.orders" /backup/2024-01-15/

# Restore with oplog (replay oplog entries to reach a point in time)
mongorestore --uri="mongodb://host:27017" \
  --oplogReplay /backup/2024-01-15/

# Drop and restore (for full overwrites)
mongorestore --uri="mongodb://host:27017" --drop /backup/2024-01-15/

Atlas backups vs self-hosted: Atlas provides continuous cloud backups with PITR (point-in-time recovery) out of the box. Self-hosted setups use mongodump (slow, logical) or filesystem snapshots (fast, but must be consistent — freeze writes first or snapshot all replica set members at the same oplog position).

8. currentOp and killOp

// Show all currently running operations
db.currentOp()

// Filter: long-running ops over 5 seconds
db.currentOp({ "secs_running": { $gt: 5 } })

// Filter: by operation type
db.currentOp({ "op": "query" })   // "query", "insert", "update", "remove", "command"

// Filter: show index builds
db.currentOp({ "msg": { $exists: true } })

// Kill an operation (use opid from currentOp output)
db.killOp(12345)

// Find and kill all slow operations over 60 seconds
var ops = db.currentOp({ secs_running: { $gt: 60 } }).inprog;
ops.forEach(function(op) {
  print("Killing opid: " + op.opid + " running for " + op.secs_running + "s");
  db.killOp(op.opid);
});

9. Connection Pooling

MongoDB drivers maintain a connection pool per mongod/mongos. The pool is per host+port+auth combination.

# Python (pymongo) — connection pool tuning
from pymongo import MongoClient

client = MongoClient(
    "mongodb://user:pass@mongo1,mongo2,mongo3/mydb?replicaSet=rs0",
    maxPoolSize=50,         # max connections per mongos/mongod (default 100)
    minPoolSize=5,          # pre-created connections
    maxIdleTimeMS=60000,    # close idle connections after 60s
    waitQueueTimeoutMS=5000 # raise error if no connection available after 5s
)

// Check server connection stats
db.serverStatus().connections
// current: active connections
// available: remaining connections before the limit
// totalCreated: lifetime created connections

// Connection limit per mongod: net.maxIncomingConnections (default 65536)
// The practical ceiling is usually the OS open-file ulimit on the mongod host
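serverStatus().connections exposes current and available, and their sum is the effective limit, so headroom is one division. A sketch of the alerting math:

```python
def connection_headroom(current: int, available: int) -> float:
    """Fraction of the connection limit still free (current + available = limit)."""
    limit = current + available
    return available / limit

assert connection_headroom(current=800, available=200) == 0.2
# Alert well before headroom hits zero: a connection storm (e.g., an app
# deploy that resets every pool at once) can consume the rest in seconds.
```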

10. Monitoring Commands

// Cluster-wide overview
rs.status()           // replica set health
sh.status()           // sharding status (if sharded)
db.serverStatus()     // extensive metrics dump

// Key serverStatus sections:
db.serverStatus().connections      // connection pool
db.serverStatus().opcounters       // insert/query/update/delete/command rates
db.serverStatus().wiredTiger.cache // memory pressure
db.serverStatus().repl             // replication info
db.serverStatus().locks            // lock stats (coarse; WiredTiger uses document-level concurrency)

// Per-collection stats
db.orders.stats()
db.orders.stats({ scale: 1048576 })  // in MB

// Database stats
db.stats()

# mongostat — real-time stats (like vmstat for MongoDB)
mongostat --uri="mongodb://user:pass@host:27017" 1   # every 1 second

# mongotop — per-collection read/write time
mongotop --uri="mongodb://user:pass@host:27017" 5    # every 5 seconds

# Log slow queries (ops > 100ms)
# In mongod.conf:
# operationProfiling:
#   mode: slowOp
#   slowOpThresholdMs: 100
# Or at runtime:
db.setProfilingLevel(1, { slowms: 100 })

# Query the profiler collection
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(10).pretty()

11. Common Aggregation Pipeline Patterns

// Count + group by status
db.orders.aggregate([
  { $match: { created_at: { $gte: ISODate("2024-01-01") } } },
  { $group: { _id: "$status", count: { $sum: 1 }, total: { $sum: "$amount" } } },
  { $sort: { count: -1 } }
])

// Join (lookup) — like a LEFT JOIN
db.orders.aggregate([
  { $lookup: {
    from: "customers",
    localField: "customer_id",
    foreignField: "_id",
    as: "customer"
  }},
  { $unwind: "$customer" },
  { $project: { "customer_id": 1, "amount": 1, "customer.name": 1 } }
])

// Bucket — histogram-like grouping
db.orders.aggregate([
  { $bucket: {
    groupBy: "$amount",
    boundaries: [0, 10, 50, 100, 500, 1000],
    default: "over_1000",
    output: { count: { $sum: 1 }, total: { $sum: "$amount" } }
  }}
])

// Facet — multiple aggregations in one pass
db.products.aggregate([
  { $facet: {
    "by_category": [{ $group: { _id: "$category", count: { $sum: 1 } } }],
    "by_price_range": [{ $bucket: { groupBy: "$price", boundaries: [0,10,50,100], default: "other" } }],
    "total_count": [{ $count: "n" }]
  }}
])

// Time series — group by hour
db.events.aggregate([
  { $match: { ts: { $gte: ISODate("2024-01-15") } } },
  { $group: {
    _id: { $dateToString: { format: "%Y-%m-%dT%H:00:00", date: "$ts" } },
    count: { $sum: 1 }
  }},
  { $sort: { _id: 1 } }
])
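The $dateToString truncation above has a direct Python equivalent, useful for checking pipeline output offline:

```python
from datetime import datetime, timezone

def hour_bucket(ts: datetime) -> str:
    """Truncate a timestamp to its hour, matching the $dateToString format above."""
    return ts.strftime("%Y-%m-%dT%H:00:00")

ts = datetime(2024, 1, 15, 10, 42, 31, tzinfo=timezone.utc)
assert hour_bucket(ts) == "2024-01-15T10:00:00"
```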

Quick Reference

# Connect
mongosh "mongodb://user:pass@host:27017/mydb"
mongosh "mongodb://mongo1,mongo2,mongo3/mydb?replicaSet=rs0"

# Replica set health
rs.status()
rs.printReplicationInfo()    # primary oplog window
rs.printSecondaryReplicationInfo()   # secondary lag (rs.printSlaveReplicationInfo in older shells)

# Basic CRUD
db.collection.findOne({ field: "value" })
db.collection.find({ status: "active" }).limit(10).pretty()
db.collection.insertOne({ key: "value", created: new Date() })
db.collection.updateOne({ _id: id }, { $set: { status: "done" } })
db.collection.deleteOne({ _id: id })

# Admin
db.adminCommand({ listDatabases: 1 })
db.currentOp({ secs_running: { $gt: 5 } })
db.setProfilingLevel(1, { slowms: 50 })
