databases | 37 min read

42 NoSQL Database Interview Questions and Answers (2026)

42 NoSQL interview questions covering MongoDB, Redis, and DynamoDB: aggregation pipelines, data structures, GSI vs LSI, and CAP theorem. Updated for 2026.

By Dev Encyclopedia | Published June 10, 2026
On this page

  • Category 1: Core NoSQL Concepts (Q1-Q6)
  • Q1. What is NoSQL and how does it differ from a relational database?
  • Q2. What are the four main types of NoSQL databases?
  • Q3. What is the CAP theorem and how does it apply to NoSQL databases?
  • Q4. What is BASE and how does it differ from ACID?
  • Q5. When should you choose NoSQL over a relational database?
  • Q6. What is eventual consistency and what does it mean in practice?
  • Category 2: MongoDB (Q7-Q20)
  • Q7. What is MongoDB? Explain its core data model.
  • Q8. What is BSON and why does MongoDB use it instead of plain JSON?
  • Q9. How do you perform CRUD operations in MongoDB?
  • Q10. How does indexing work in MongoDB and what index types exist?
  • Q11. Explain the MongoDB aggregation pipeline.
  • Q12. What is the difference between embedded documents and references in MongoDB schema design?
  • Q13. What is sharding in MongoDB and how does it work?
  • Q14. What is a replica set in MongoDB and what roles do nodes play?
  • Q15. Does MongoDB support ACID transactions? How do they work?
  • Q16. How does the WiredTiger storage engine work?
  • Q17. What is a TTL index and when do you use it?
  • Q18. What is a Change Stream in MongoDB?
  • Q19. How do you diagnose and optimize a slow MongoDB query?
  • Q20. What update operators does MongoDB support?
  • Category 3: Redis (Q21-Q31)
  • Q21. What is Redis and what are its primary use cases?
  • Q22. What are Redis data structures and what is each used for?
  • Q23. What are the Redis persistence options and how do you choose?
  • Q24. What are Redis eviction policies and how do you choose?
  • Q25. What is Redis Pub/Sub and what are its limitations?
  • Q26. What is Redis pipelining and when do you use it?
  • Q27. What is Redis Cluster and how does it differ from Redis Sentinel?
  • Q28. What is a distributed lock in Redis and how do you implement one?
  • Q29. How do you implement rate limiting with Redis?
  • Q30. What is the cache stampede problem and how do you prevent it?
  • Q31. What is the difference between Redis and Memcached?
  • Category 4: Amazon DynamoDB (Q32-Q42)
  • Q32. What is Amazon DynamoDB and what makes it different from other databases?
  • Q33. What is the difference between a partition key and a sort key?
  • Q34. What is the difference between a GSI and an LSI?
  • Q35. What is the difference between Query and Scan in DynamoDB?
  • Q36. How do you calculate RCUs and WCUs?
  • Q37. What are DynamoDB consistency models?
  • Q38. What is the hot partition problem and how do you prevent it?
  • Q39. What are DynamoDB Streams and what are common use cases?
  • Q40. What is DynamoDB Accelerator (DAX)?
  • Q41. What is single-table design in DynamoDB and why is it recommended?
  • Q42. How do DynamoDB transactions work?
  • Quick Reference: All 42 Questions at a Glance
  • Frequently Asked Questions

NoSQL fluency is now a baseline expectation for backend and cloud roles, not a specialty. MongoDB powers document storage for thousands of production apps, Redis sits in front of many APIs as a cache or queue, and DynamoDB is a common default data layer for serverless architectures on AWS. Most senior backend interviews expect working knowledge of at least two of these three.

These 42 questions cover what interviewers actually ask in 2026: foundational NoSQL theory, then database-specific internals, query patterns, and operational tradeoffs. If you're working with MongoDB through an ORM layer in NestJS, our NestJS interview questions guide covers the Mongoose side of that stack. And if this is part of a broader interview prep pass, our Node.js interview questions series pairs naturally with this one, since MongoDB, Redis, and DynamoDB all have mature Node.js clients.

Coming from a relational background? It helps to see schema design from the other side first. Our guide to Drizzle ORM migrations shows how rigid, versioned schema changes work in SQL, which is the exact tradeoff NoSQL databases are designed to avoid.

💡 How to use this guide

Skim the Quick Reference table near the bottom to see all 42 questions and their core concept at a glance. Then use the table of contents to jump into whichever database (MongoDB, Redis, or DynamoDB) you're weakest on. Each answer includes the kind of detail and tradeoff discussion that signals real hands-on experience, not a memorized definition.

Category 1: Core NoSQL Concepts (Q1-Q6)

These questions establish whether you understand why NoSQL databases exist and what tradeoffs they make versus relational systems. Almost every NoSQL interview opens here, regardless of which specific database the role uses.

Q1. What is NoSQL and how does it differ from a relational database?

NoSQL (Not Only SQL) is a class of database systems that store and retrieve data using models other than the traditional relational table model. The name reflects that SQL is not the primary or only query mechanism, not that SQL is absent entirely.

Key differences from relational databases:

  • Schema: Relational databases enforce a fixed schema. Every row in a table has the same columns and data types. NoSQL databases are typically schema-flexible: documents in the same collection can have different fields.
  • Scaling: Relational databases scale vertically (bigger server) by default. NoSQL databases are designed to scale horizontally (more servers, data distributed across nodes).
  • Data model: Relational data is normalized into tables with foreign keys. NoSQL data is often denormalized: related data is stored together to minimize the number of reads required.
  • Transactions: Relational databases have had full ACID transactions for decades. Most NoSQL systems historically offered weaker consistency guarantees, though MongoDB now supports multi-document ACID transactions and DynamoDB supports transactions across items.

Use-case fit:

  • Relational: financial systems, ERP, reporting, any domain with complex relationships, strict integrity requirements, and predictable query patterns.
  • NoSQL: user profiles, product catalogs, session storage, real-time feeds, IoT data, content management, anything with variable schemas or extreme scale requirements.

Q2. What are the four main types of NoSQL databases?

  • Document stores: store data as JSON-like documents. Each document is self-contained and can have nested arrays and objects, with no fixed schema. Best for content management, user profiles, and product catalogs. Examples: MongoDB, CouchDB, Amazon DocumentDB.
  • Key-value stores: the simplest model. Each value is stored under a unique key, and the database knows nothing about the value's structure. Extremely fast for reads and writes. Best for session storage, caching, and real-time leaderboards. Examples: Redis, Amazon DynamoDB (can function as key-value), Memcached.
  • Column-family stores (wide-column): group columns into families within each row, and rows in the same table can have different sets of columns. Data is partitioned by row key, which makes these stores efficient for write-heavy workloads and for reading ranges of rows within a partition. Best for time-series data, IoT telemetry, and event logging at large scale. Examples: Apache Cassandra, HBase, Google Bigtable (DynamoDB is sometimes grouped here too, though AWS describes it as a key-value and document database).
  • Graph databases: store data as nodes and edges representing entities and relationships, optimized for traversing connected data. Best for social networks, recommendation engines, fraud detection, and knowledge graphs. Examples: Neo4j, Amazon Neptune, ArangoDB.

Q3. What is the CAP theorem and how does it apply to NoSQL databases?

The CAP theorem states that a distributed database can guarantee at most two of the following three properties simultaneously.

  • Consistency (C): every read receives the most recent write or an error. All nodes see the same data at the same time.
  • Availability (A): every request receives a response (not necessarily the most recent data). The system stays operational even if some nodes fail.
  • Partition Tolerance (P): the system continues to operate when network partitions occur (nodes cannot communicate with each other).

In any real distributed system, network partitions are unavoidable, so every system must be partition tolerant. The real choice is between C and A.

| System type | Behavior | Examples |
| --- | --- | --- |
| CP (Consistency + Partition Tolerance) | Returns an error or timeout if data cannot be confirmed as consistent. Good for financial systems. | HBase, MongoDB (replica set default mode), ZooKeeper |
| AP (Availability + Partition Tolerance) | Always returns a response, possibly stale. Good for systems where availability matters more than perfect freshness. | DynamoDB (eventual consistency mode), Cassandra, CouchDB |
| CA (Consistency + Availability) | Only possible without partitions, meaning a single-node system. | Traditional RDBMS running on one server |

ℹ Interview nuance

Most modern distributed databases are not purely CP or AP. MongoDB and DynamoDB both offer tunable consistency, meaning you choose the level per operation. Naming this nuance unprompted is what separates a memorized answer from a real one.

Q4. What is BASE and how does it differ from ACID?

ACID (Atomicity, Consistency, Isolation, Durability) is the property set for traditional relational database transactions. It prioritizes correctness.

BASE is the competing model used by many NoSQL systems.

  • Basically Available: the system guarantees availability but may return stale or partially consistent data. The system responds even under partial failure.
  • Soft State: the state of the system may change over time even without new input, as nodes catch up to each other through eventual consistency.
  • Eventually Consistent: given enough time and no new updates, all nodes will converge to the same value. Reads after a write may return the old value for a period.

Practical example: if you post a like on a social media platform and a friend refreshes their feed half a second later, they might not see it yet. That is eventual consistency. The system is highly available (it did not fail) but not immediately consistent.

ACID is right for banking, payments, inventory systems, anything where reading stale data causes real harm. BASE is right for social feeds, analytics, recommendation engines, any system where brief staleness is acceptable and scale is paramount.

Q5. When should you choose NoSQL over a relational database?

Choose NoSQL when one or more of these conditions apply.

  • The schema is genuinely variable or evolving rapidly: if different records have different fields and you do not want to run ALTER TABLE migrations for every feature change, a document store fits naturally.
  • You need horizontal scale beyond what a single server can provide: if you expect hundreds of millions of records or millions of writes per second, NoSQL sharding (MongoDB, Cassandra) or managed auto-scaling (DynamoDB) are designed for this. Relational databases can scale, but it's harder and more expensive.
  • Your data is a natural hierarchy: user documents with embedded addresses, preferences, and activity history map cleanly to a document store. Normalizing this into 8 relational tables adds join complexity without benefit.
  • You need sub-millisecond access to frequently read data: use Redis as a cache layer in front of any database.
  • Your access patterns are known and simple: DynamoDB's single-table design is extremely fast when you know your access patterns upfront and design keys accordingly.

⚠ When NOT to choose NoSQL

Avoid NoSQL when you need complex JOINs across many entity types, full ACID transactions across unrelated entities, or when your team is more familiar with SQL and the dataset isn't large enough to justify the operational complexity.

Q6. What is eventual consistency and what does it mean in practice?

Eventual consistency means that if no new updates are made to a piece of data, all replicas of that data will eventually converge to the same value. There is no guarantee on how long convergence takes: it could be milliseconds or seconds.

In practice, for an AP system like DynamoDB in eventual consistency mode:

  1. You write a user's email update to the primary node.
  2. The write is acknowledged to your application immediately.
  3. Replica nodes receive the update asynchronously, typically within milliseconds, but not guaranteed.
  4. A read from a replica in the next few milliseconds may return the old email.
  5. After replication completes, all replicas return the new email.

For most web applications, this is fine. Users do not notice a 100ms delay before their profile update is visible everywhere.

For systems where it matters (reading your own writes, financial balances), use strongly consistent reads. DynamoDB offers this at 2x the read cost. MongoDB's read concern "majority" ensures you only read data confirmed by a majority of replica set members.
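
The write-then-stale-read sequence above can be sketched with a toy in-memory model. This is only an illustration of asynchronous replication, not how DynamoDB or MongoDB are implemented; the class name `ToyReplicatedStore` and its methods are hypothetical, and `replicate()` stands in for the background replication that real systems run on their own.

```javascript
// Toy model of asynchronous replication (hypothetical, for illustration only).
class ToyReplicatedStore {
  constructor(replicaCount) {
    this.primary = new Map();
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.pending = []; // writes acknowledged but not yet applied to replicas
  }

  // The write is acknowledged as soon as the primary has it.
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // Eventually consistent read: served by a replica, may be stale.
  readEventual(key, replicaIndex = 0) {
    return this.replicas[replicaIndex].get(key);
  }

  // Strongly consistent read: served by the primary.
  readStrong(key) {
    return this.primary.get(key);
  }

  // Stand-in for background replication catching up.
  replicate() {
    for (const [key, value] of this.pending) {
      for (const replica of this.replicas) replica.set(key, value);
    }
    this.pending = [];
  }
}

const store = new ToyReplicatedStore(2);
store.write("email", "old@example.com");
store.replicate();
store.write("email", "new@example.com");

const staleRead = store.readEventual("email"); // replica has not caught up yet
const strongRead = store.readStrong("email");  // primary already has the new value
store.replicate();
const convergedRead = store.readEventual("email");
```

Here `staleRead` still holds the old email while `strongRead` and `convergedRead` hold the new one, which is the window that strongly consistent reads are meant to close.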

Category 2: MongoDB (Q7-Q20)

MongoDB questions test both conceptual understanding (BSON, schema design, sharding) and hands-on fluency with its query language and aggregation pipeline. Expect a mix of whiteboard explanation and live query writing.

Q7. What is MongoDB? Explain its core data model.

MongoDB is a source-available, document-oriented NoSQL database (the Community Server is licensed under the SSPL). Data is stored as BSON (Binary JSON) documents in collections. A collection is analogous to a SQL table. A document is analogous to a row, but each document can have a completely different structure from other documents in the same collection.

Core concepts:

  • Database: a container for collections. One MongoDB instance can have many databases.
  • Collection: a group of documents. No fixed schema is enforced unless you use schema validation.
  • Document: a BSON object with field-value pairs. Supports nested documents and arrays. Maximum document size is 16MB.
  • `_id`: every document has a unique _id field. If you do not provide one, MongoDB generates an ObjectId automatically. An ObjectId is a 12-byte value made up of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value unique to the process, and a 3-byte incrementing counter, which makes ObjectIds roughly sortable by creation time.
javascript
// Example document in a 'products' collection
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  name: "Wireless Keyboard",
  brand: "Logitech",
  price: 79.99,
  tags: ["electronics", "peripherals", "wireless"],
  specs: {
    weight: "450g",
    connectivity: "Bluetooth 5.0",
    batteryLife: "24 months"
  },
  variants: [
    { color: "black", sku: "MK-BLK-US" },
    { color: "white", sku: "MK-WHT-US" }
  ],
  inStock: true,
  createdAt: ISODate("2026-03-15T10:30:00Z")
}
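
Because the leading bytes of an ObjectId are a timestamp, the creation time can be recovered from the hex string alone. The helper below is a minimal sketch with a hypothetical name (`objectIdToDate`); in real code the driver's `ObjectId` type already exposes `getTimestamp()` for the same purpose.

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch, big-endian.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```

The resolution is one second, so ObjectIds created within the same second are ordered only by the remaining bytes, not by exact creation time.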

Q8. What is BSON and why does MongoDB use it instead of plain JSON?

BSON (Binary JSON) is a binary-encoded serialization of JSON-like documents. MongoDB stores and transmits data as BSON rather than text JSON for several reasons.

  • Speed: BSON is faster to parse than text JSON because types and lengths are encoded in binary, so the parser can skip over values it does not need instead of tokenizing the whole document as text.
  • Additional types: plain JSON supports only string, number, boolean, null, array, and object. BSON adds Date (stored as int64 milliseconds), ObjectId, Binary data (for file chunks), Int32, Int64, Decimal128, Regular expression, and Timestamp. These types are critical for database operations.
  • Traversability: BSON documents include length prefixes that allow the database to skip over fields efficiently without deserializing them.
  • Size: BSON is not necessarily smaller than JSON. Type bytes and length prefixes can make small documents slightly larger; the format is optimized for fast traversal rather than compactness.

In your application code, you work with JSON or your language's native objects. The MongoDB driver handles the JSON-to-BSON conversion automatically.
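
To make the length prefixes concrete, the sketch below hand-encodes the one-field document `{ a: "hi" }` following the BSON layout for a string element. It is an illustration only: the function name `encodeSingleStringDoc` is hypothetical, it handles a single string field, and real applications should rely on the driver's BSON library.

```javascript
// Hand-encode a document with one string field in BSON layout (illustration only).
function encodeSingleStringDoc(key, value) {
  const keyBytes = Buffer.from(key + "\0", "utf8");     // field name as a NUL-terminated cstring
  const valueBytes = Buffer.from(value + "\0", "utf8"); // string bytes plus trailing NUL
  const total = 4 + 1 + keyBytes.length + 4 + valueBytes.length + 1;
  const buf = Buffer.alloc(total);
  let offset = 0;
  offset = buf.writeInt32LE(total, offset);             // document length prefix
  offset = buf.writeUInt8(0x02, offset);                // element type 0x02: UTF-8 string
  offset += keyBytes.copy(buf, offset);
  offset = buf.writeInt32LE(valueBytes.length, offset); // string length prefix (includes NUL)
  offset += valueBytes.copy(buf, offset);
  buf.writeUInt8(0x00, offset);                         // document terminator
  return buf;
}

const bson = encodeSingleStringDoc("a", "hi");
const json = Buffer.from(JSON.stringify({ a: "hi" }), "utf8");
console.log(bson.toString("hex")); // 0f0000000261000300000068690000
console.log(bson.length, json.length); // 15 10
```

For this tiny document the BSON form is larger than the JSON text, but a reader can learn the value's length from its prefix and step past it without scanning for a closing quote.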

Q9. How do you perform CRUD operations in MongoDB?

MongoDB uses a rich query language expressed as JSON-like objects.

javascript
// INSERT
// Insert one document
db.users.insertOne({
  name: "Alice Chen",
  email: "alice@example.com",
  age: 28,
  roles: ["user", "editor"]
});

// Insert many documents
db.users.insertMany([
  { name: "Bob Smith", email: "bob@example.com", age: 34 },
  { name: "Carol Lee",  email: "carol@example.com", age: 25 }
]);

// READ
// Find all documents
db.users.find({});

// Find with filter, projection (include only specific fields)
db.users.find(
  { age: { $gte: 25 } },          // filter: age >= 25
  { name: 1, email: 1, _id: 0 }   // projection: include name, email; exclude _id
);

// Find one document
db.users.findOne({ email: "alice@example.com" });

// UPDATE
// Update one document
db.users.updateOne(
  { email: "alice@example.com" },                       // filter
  { $set: { age: 29 }, $addToSet: { roles: "admin" } }  // update operators
);

// Update many
db.users.updateMany(
  { age: { $lt: 18 } },
  { $set: { status: "minor" } }
);

// Upsert: insert if not found, update if found
db.users.updateOne(
  { email: "new@example.com" },
  { $set: { name: "New User", age: 22 } },
  { upsert: true }
);

// DELETE
db.users.deleteOne({ email: "bob@example.com" });
db.users.deleteMany({ status: "inactive", lastLogin: { $lt: new Date("2024-01-01") } });

Common query operators:

  • $eq, $ne, $gt, $gte, $lt, $lte: comparison
  • $in, $nin: value in / not in array
  • $and, $or, $not, $nor: logical operators
  • $exists: field exists check
  • $regex: pattern matching
  • $elemMatch: match array element conditions

Q10. How does indexing work in MongoDB and what index types exist?

MongoDB supports several index types, all built on B-Tree structures unless noted otherwise.

Single field index, the most common, created on one field:

javascript
db.orders.createIndex({ userId: 1 });   // ascending
db.orders.createIndex({ createdAt: -1 }); // descending (for sort queries)

Compound index, covers multiple fields. Field order matters (left-prefix rule, same as SQL): a query can use the index only if it filters on a leading prefix of the indexed fields, so put the field that most queries filter on first.

javascript
db.orders.createIndex({ userId: 1, status: 1, createdAt: -1 });
// Supports queries on: userId | userId+status | userId+status+createdAt
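
The left-prefix rule can be sketched as a small function. This is a deliberately simplified model that only considers equality filters and ignores sorts and range predicates; `usableIndexPrefix` is a hypothetical helper, not part of MongoDB.

```javascript
// Simplified model: which leading index fields can an equality query use?
function usableIndexPrefix(indexFields, queryFields) {
  const queried = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!queried.has(field)) break; // the first gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const index = ["userId", "status", "createdAt"];
const fullPrefix = usableIndexPrefix(index, ["userId", "status"]);    // userId and status
const partial = usableIndexPrefix(index, ["userId", "createdAt"]);    // userId only
const none = usableIndexPrefix(index, ["status"]);                    // nothing usable
```

The second case is the one that surprises people: skipping `status` means only the `userId` part of the index narrows the search, and the `createdAt` condition is applied to whatever that prefix matched.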

Unique index, enforces uniqueness. The primary key (_id) has a built-in unique index. Create one on email, username, etc.

javascript
db.users.createIndex({ email: 1 }, { unique: true });

Sparse index, only indexes documents that have the field. Saves space for optional fields that many documents omit.

javascript
db.users.createIndex({ phoneNumber: 1 }, { sparse: true });

TTL index (Time-To-Live), automatically removes documents after a specified time. Used for session data, temporary caches, and log retention.

javascript
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });
// Documents are deleted 24 hours after their createdAt timestamp

Text index, supports full-text search across string fields.

javascript
db.articles.createIndex({ title: "text", body: "text" });
db.articles.find({ $text: { $search: "mongodb performance" } });

Geospatial index (2dsphere), for geographic queries (near, within polygon).

javascript
db.locations.createIndex({ coordinates: "2dsphere" });
db.locations.find({
  coordinates: {
    $near: { $geometry: { type: "Point", coordinates: [-73.9857, 40.7484] },
             $maxDistance: 1000 }  // within 1km
  }
});

Check index usage and efficiency with explain().

javascript
db.orders.find({ userId: "abc123" }).explain("executionStats");
// Look for: IXSCAN (good) vs COLLSCAN (bad, full collection scan)

Q11. Explain the MongoDB aggregation pipeline.

The aggregation pipeline processes documents through a sequence of stages. Each stage transforms the documents and passes them to the next stage. It is the preferred way to do data transformation, grouping, and analytics in MongoDB, replacing the older Map-Reduce approach.

Core pipeline stages:

javascript
db.orders.aggregate([

  // $match: filter documents (like WHERE in SQL), put early to reduce work
  { $match: {
    status: "completed",
    createdAt: { $gte: ISODate("2026-01-01") }
  }},

  // $lookup: left join with another collection
  { $lookup: {
    from: "users",
    localField: "userId",
    foreignField: "_id",
    as: "user"
  }},

  // $unwind: deconstruct an array field into individual documents
  { $unwind: "$user" },

  // $group: aggregate documents (like GROUP BY in SQL)
  { $group: {
    _id: "$user.country",               // group by country
    totalRevenue: { $sum: "$total" },   // sum order totals
    orderCount:   { $sum: 1 },          // count orders
    avgOrderValue: { $avg: "$total" }   // average order value
  }},

  // $addFields / $project: add or reshape fields
  { $addFields: {
    revenuePerOrder: { $divide: ["$totalRevenue", "$orderCount"] }
  }},

  // $sort: sort results
  { $sort: { totalRevenue: -1 } },

  // $limit: limit output count
  { $limit: 10 },

  // $skip: skip N results (for pagination; place it before $limit)
  // { $skip: 20 },

  // $out: write results to a new collection
  // { $out: "country_revenue_report" }
]);

Other useful stages:

  • $count: count documents
  • $facet: run multiple sub-pipelines on the same data simultaneously
  • $bucket / $bucketAuto: group into buckets/ranges (histograms)
  • $replaceRoot: replace the root document with a nested field
  • $merge: write results back to a collection (incremental materialized views)
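
To see what the $match, $group, $sort, and $limit stages compute, it can help to mirror them in plain JavaScript over an in-memory array. The sketch below is not how MongoDB executes a pipeline; the sample data and the function name `revenueByCountry` are hypothetical, and the documents are assumed to be already joined with their user (the result of $lookup plus $unwind).

```javascript
// Hypothetical sample data standing in for orders already joined with users.
const orders = [
  { status: "completed", total: 100, user: { country: "US" } },
  { status: "completed", total: 50, user: { country: "US" } },
  { status: "completed", total: 80, user: { country: "DE" } },
  { status: "pending", total: 999, user: { country: "DE" } },
];

function revenueByCountry(docs, limit) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.status !== "completed") continue;          // $match
    const key = doc.user.country;                      // $group _id
    const g = groups.get(key) ?? { _id: key, totalRevenue: 0, orderCount: 0 };
    g.totalRevenue += doc.total;                       // $sum: "$total"
    g.orderCount += 1;                                 // $sum: 1
    groups.set(key, g);
  }
  return [...groups.values()]
    .map((g) => ({ ...g, avgOrderValue: g.totalRevenue / g.orderCount })) // $avg
    .sort((a, b) => b.totalRevenue - a.totalRevenue)   // $sort: { totalRevenue: -1 }
    .slice(0, limit);                                  // $limit
}

const report = revenueByCountry(orders, 10);
// report[0] is the US group (revenue 150 over 2 orders), report[1] is DE (80 over 1 order)
```

Filtering first, as the loop does with the status check, is the same reason $match belongs at the start of a pipeline: every later stage then touches fewer documents.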

Q12. What is the difference between embedded documents and references in MongoDB schema design?

This is the most fundamental MongoDB schema design question. There is no single right answer: it depends on your access patterns.

Embedding (denormalization) stores related data inside the parent document.

javascript
// User document with embedded addresses
{
  _id: ObjectId("..."),
  name: "Alice",
  addresses: [
    { type: "home",    city: "New York", zip: "10001" },
    { type: "billing", city: "Boston",   zip: "02134" }
  ]
}

Use embedding when:

  • The embedded data is always read together with the parent (avoids extra queries)
  • The embedded data is not shared between multiple documents
  • The embedded array has a bounded size (not growing unboundedly)
  • You need atomic updates across parent and child in one write

Referencing (normalization) stores a reference (ObjectId) to another collection, then uses $lookup to join.

javascript
// Order references user by ID
{ _id: ObjectId("..."), userId: ObjectId("..."), total: 49.99 }

// Lookup user when needed
db.orders.aggregate([
  { $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } }
]);

Use referencing when:

  • The related data is large and not always needed
  • The related document is shared by many parents (many-to-many)
  • The child data grows unboundedly (comments, events, log entries)
  • You need to update the child data independently and reflect changes everywhere

💡 Rule of thumb

If you always read the data together, embed it. If the data grows unboundedly or is shared widely, reference it.

Q13. What is sharding in MongoDB and how does it work?

Sharding is MongoDB's horizontal scaling mechanism. It distributes data across multiple servers (shards). Each shard holds a subset of the data. Together, all shards contain the full dataset.

Architecture:

  • Shards: each shard is a replica set holding a portion of the data
  • Mongos (query router): receives application queries, determines which shard(s) hold the relevant data, and routes accordingly
  • Config servers: store cluster metadata, including which shard holds which data ranges

Shard key: a field (or compound of fields) you choose to partition data by. MongoDB distributes documents across shards either by ranges of shard key values or by a hash of those values.

javascript
// Enable sharding on a database
sh.enableSharding("ecommerce");

// Shard a collection by a hashed key (even distribution)
sh.shardCollection("ecommerce.orders", { userId: "hashed" });

// Shard by range: allows efficient range queries by date, but a monotonically
// increasing key such as a timestamp sends all new writes to one shard
sh.shardCollection("ecommerce.events", { timestamp: 1 });

Shard key selection is critical:

  • High cardinality: many distinct values to spread data evenly
  • High write distribution: avoid "hotspot" shards that receive most writes
  • Query alignment: include the shard key in most queries so mongos can route to a single shard instead of broadcasting to all shards

⚠ Scatter-gather queries

A query that does not include the shard key must be sent to ALL shards and results merged by mongos. This is expensive at scale, and avoiding it is the whole point of careful shard key design.
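
The difference between a targeted query and a scatter-gather query can be sketched with a toy router. This is a simplified model with a hypothetical chunk map and a hypothetical `targetShards` function; it handles only equality on a ranged shard key and is not the mongos implementation.

```javascript
// Hypothetical chunk map of the kind config servers hold: [min, max) ranges per shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Return the shards a router would need to contact for an equality query.
function targetShards(query, shardKey, chunkMap) {
  if (!(shardKey in query)) {
    // No shard key in the filter: every shard has to be asked.
    return [...new Set(chunkMap.map((c) => c.shard))];
  }
  const value = query[shardKey];
  const chunk = chunkMap.find((c) => value >= c.min && value < c.max);
  return chunk ? [chunk.shard] : [];
}

const targeted = targetShards({ customerId: 1500 }, "customerId", chunks);
const broadcast = targetShards({ status: "pending" }, "customerId", chunks);
// targeted contacts only shardB; broadcast contacts shardA, shardB, and shardC
```

With three shards the broadcast costs three round trips instead of one, and that multiplier grows with every shard added to the cluster.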

Q14. What is a replica set in MongoDB and what roles do nodes play?

A replica set is a group of MongoDB instances (typically 3 or more) that maintain the same dataset. It provides high availability and data redundancy.

Node roles:

  • Primary: receives all write operations. There is exactly one primary at any time. Replicates writes to secondaries via the oplog (operations log).
  • Secondary: maintains a copy of the primary's data by continuously applying operations from the primary's oplog. Can serve read operations if configured to do so. Participates in elections.
  • Arbiter (optional): holds no data. Participates in elections to break ties. Used to achieve an odd number of voting members without storing an extra copy of data.

Automatic failover: if the primary becomes unavailable, the remaining nodes hold an election. An eligible secondary with a sufficiently up-to-date oplog wins a majority of votes and becomes the new primary. With default settings, this typically completes in under 12 seconds.

javascript
// Connect to a replica set from application
const { MongoClient } = require("mongodb");

const client = new MongoClient(
  "mongodb://node1:27017,node2:27017,node3:27017/mydb?replicaSet=rs0",
  { readPreference: "secondaryPreferred" } // reads go to secondaries when possible
);

// Read preferences:
// primary (default): always read from primary (strongly consistent)
// primaryPreferred: primary if available, else secondary
// secondary: always secondary (may be stale)
// secondaryPreferred: secondary if available, else primary
// nearest: lowest network latency
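
Elections need a majority of voting members, which is why replica sets are sized with an odd number of voters and why an arbiter is sometimes added. The arithmetic can be sketched as follows; `electionMath` is a hypothetical helper name.

```javascript
// Votes needed to elect a primary, and how many voting members can be lost.
function electionMath(votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}

const three = electionMath(3); // majority 2, tolerates 1 failure
const four = electionMath(4);  // majority 3, still tolerates only 1 failure
const five = electionMath(5);  // majority 3, tolerates 2 failures
```

Going from three to four voters adds cost without adding fault tolerance, which is the usual argument for keeping the voter count odd.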

Q15. Does MongoDB support ACID transactions? How do they work?

Yes. MongoDB has supported ACID transactions since version 4.0 (for replica sets) and 4.2 (for sharded clusters). Before 4.0, atomicity was guaranteed only within a single document.

javascript
const session = client.startSession();

try {
  session.startTransaction({
    readConcern:  { level: "snapshot" },  // reads see a consistent snapshot
    writeConcern: { w: "majority" }       // write confirmed by majority of nodes
  });

  // Both operations succeed or both are rolled back
  await ordersCollection.insertOne(
    { userId, items, total: 149.99, status: "pending" },
    { session }
  );

  await inventoryCollection.updateMany(
    { _id: { $in: itemIds } },
    { $inc: { stock: -1 } },
    { session }
  );

  await session.commitTransaction();
  console.log("Transaction committed");

} catch (err) {
  await session.abortTransaction();
  console.error("Transaction aborted:", err.message);
} finally {
  await session.endSession();
}

⚠ Transaction limits

Multi-document transactions in MongoDB have a 60-second timeout by default. If a transaction takes too long, it is aborted. For most web API use cases, single-document operations (which are always atomic) are preferred over transactions because they are faster and simpler. Use transactions only when you genuinely need atomicity across multiple documents.

Q16. How does the WiredTiger storage engine work?

WiredTiger has been MongoDB's default storage engine since version 3.2. It replaced MMAPv1 and brought significant improvements.

  • Document-level concurrency: WiredTiger uses optimistic concurrency control at the document level. Multiple writers can modify different documents in the same collection simultaneously without blocking each other. MMAPv1 only had collection-level locking, creating a bottleneck.
  • Compression: WiredTiger compresses collection data at rest using snappy (default), zlib, or zstd. Indexes use prefix compression by default. Both reduce disk I/O and storage costs significantly.
  • Write-ahead log (journal): all writes go to the journal first. If MongoDB crashes, the journal is replayed on startup to recover data written since the last checkpoint. Checkpoints occur every 60 seconds by default.
  • Cache: WiredTiger maintains an in-memory cache. The default size is the larger of 50% of (RAM minus 1 GB) or 256 MB. Frequently accessed data stays in cache. Eviction runs in the background to keep cache usage within bounds.
  • MVCC (Multi-Version Concurrency Control): WiredTiger uses MVCC for reads, similar to PostgreSQL. Readers see a consistent snapshot without blocking writers.
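
The default cache size is easy to misread, so here is the arithmetic as a short sketch. It assumes the default is the larger of 50% of (RAM minus 1 GB) or 256 MB, with binary units; `defaultWiredTigerCacheBytes` is a hypothetical helper, not a MongoDB API.

```javascript
const GiB = 1024 ** 3;
const MiB = 1024 ** 2;

// Assumed default: the larger of 50% of (RAM - 1 GiB) or 256 MiB.
function defaultWiredTigerCacheBytes(ramBytes) {
  return Math.max(0.5 * (ramBytes - GiB), 256 * MiB);
}

const cacheOn16GiB = defaultWiredTigerCacheBytes(16 * GiB) / GiB; // 7.5
const cacheOn1GiB = defaultWiredTigerCacheBytes(1 * GiB) / MiB;   // 256
```

On a 16 GiB host that leaves roughly half the memory for the operating system's file cache and other processes, which matters when MongoDB shares a machine or runs in a container with a memory limit.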

Q17. What is a TTL index and when do you use it?

A TTL (Time-To-Live) index tells MongoDB to automatically delete documents after a specified duration. It is a single-field index on a Date field.

javascript
// Automatically delete sessions 7 days after creation
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 604800 }  // 7 days in seconds
);

// Session document
db.sessions.insertOne({
  sessionId: "abc123",
  userId: "user456",
  data: { cart: [], preferences: {} },
  createdAt: new Date()  // TTL index field
});

// You can also set a specific expiry time per document
db.notifications.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }  // document expires exactly at the date stored in expiresAt
);

db.notifications.insertOne({
  message: "Your trial expires soon",
  userId: "user789",
  expiresAt: new Date("2026-12-31T23:59:59Z")  // this specific document expires here
});

MongoDB runs a background task every 60 seconds to delete expired documents. Deletion is not instantaneous: documents can persist for 60 seconds or more after expiry, and longer when the server is under heavy load. This is acceptable for session cleanup, cache expiration, and temporary data, but not for exact-time deletion requirements.
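
The expiry rule itself is simple date arithmetic: a document becomes eligible once the indexed Date plus expireAfterSeconds has passed. The sketch below models only that rule, not the background task; `ttlExpiry` and `isEligibleForDeletion` are hypothetical helper names.

```javascript
// When does a document become eligible for TTL deletion?
function ttlExpiry(doc, field, expireAfterSeconds) {
  const value = doc[field];
  if (!(value instanceof Date)) return null; // documents without a Date in the field never expire
  return new Date(value.getTime() + expireAfterSeconds * 1000);
}

function isEligibleForDeletion(doc, field, expireAfterSeconds, now) {
  const expiry = ttlExpiry(doc, field, expireAfterSeconds);
  return expiry !== null && now.getTime() >= expiry.getTime();
}

const sessionDoc = { createdAt: new Date("2026-03-01T00:00:00Z") };
const sessionExpiry = ttlExpiry(sessionDoc, "createdAt", 604800);
console.log(sessionExpiry.toISOString()); // 2026-03-08T00:00:00.000Z

const beforeExpiry = isEligibleForDeletion(sessionDoc, "createdAt", 604800, new Date("2026-03-07T23:59:59Z"));
const afterExpiry = isEligibleForDeletion(sessionDoc, "createdAt", 604800, new Date("2026-03-08T00:00:01Z"));
```

If the application must never serve an expired document, it is safer to also filter on the date field at read time than to rely on the deletion having already happened.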

Q18. What is a Change Stream in MongoDB?

A Change Stream is a real-time stream of change events (insertions, updates, deletions, collection drops) on a collection, database, or entire cluster. Change Streams use MongoDB's replication oplog under the hood and require a replica set or sharded cluster.

javascript
// Watch a collection for all changes
const changeStream = db.collection("orders").watch();

changeStream.on("change", (event) => {
  console.log("Change type:", event.operationType); // insert, update, delete, replace
  console.log("Document key:", event.documentKey._id);
  console.log("Full document:", event.fullDocument);  // present for insert and replace by default
  console.log("Update description:", event.updateDescription); // for updates
});

// Filter to only watch for new completed orders
const pipeline = [
  { $match: {
    operationType: "update",
    "updateDescription.updatedFields.status": "completed"
  }}
];
const filteredStream = db.collection("orders").watch(pipeline, {
  fullDocument: "updateLookup"  // include full document on update events
});

// Resume a stream after reconnection using a resume token
// Every change event includes a _id (resume token)
const resumeToken = lastProcessedEvent._id;
const resumedStream = collection.watch([], { resumeAfter: resumeToken });

Use cases: invalidating application caches when data changes, powering live dashboards, triggering notifications, feeding Kafka or other event streams, and building audit logs.

Q19. How do you diagnose and optimize a slow MongoDB query?

Step 1: use explain() to inspect the query plan.

javascript
// Run explain with executionStats for real numbers
db.orders.find({ userId: "abc123", status: "pending" })
         .sort({ createdAt: -1 })
         .explain("executionStats");

// Key things to look for in the output:
// queryPlanner.winningPlan: a "COLLSCAN" stage = bad (no index used);
//   an "IXSCAN" stage (usually nested under "FETCH") = good (using an index)
// executionStats.totalDocsExamined vs executionStats.nReturned:
//   If examined is much greater than returned, the index is not selective enough
// executionStats.executionTimeMillis: how long the query took
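As a rough illustration of reading that output programmatically, the sketch below walks a hand-written, trimmed explain document (not real server output) and flags the patterns described in the comments above. The helper names `collectStages` and `diagnoseExplain`, the sample values, and the 10x examined-to-returned threshold are assumptions made for this example; real explain output carries many more fields, and newer query engines may nest the plan differently.

```javascript
// Collect every stage name in a winning plan tree (classic plan shape assumed).
function collectStages(plan, stages = []) {
  if (!plan) return stages;
  stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// Return human-readable findings; ratioThreshold is an arbitrary illustrative cutoff.
function diagnoseExplain(explain, ratioThreshold = 10) {
  const stages = collectStages(explain.queryPlanner.winningPlan);
  const { nReturned, totalDocsExamined } = explain.executionStats;
  const findings = [];
  if (stages.includes("COLLSCAN")) {
    findings.push("collection scan: no usable index");
  }
  if (stages.includes("SORT")) {
    findings.push("in-memory sort: the index does not cover the sort");
  }
  if (totalDocsExamined > Math.max(nReturned, 1) * ratioThreshold) {
    findings.push("index not selective: examined far more documents than returned");
  }
  return findings;
}

// Hypothetical, trimmed explain("executionStats") output.
const slowPlan = {
  queryPlanner: { winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } } },
  executionStats: { nReturned: 12, totalDocsExamined: 250000, executionTimeMillis: 840 },
};
const goodPlan = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } },
  executionStats: { nReturned: 12, totalDocsExamined: 12, executionTimeMillis: 1 },
};

console.log(diagnoseExplain(slowPlan));
console.log(diagnoseExplain(goodPlan));
```

A check like this could run in a test suite against the queries that matter most, so a dropped index shows up before production does.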

Step 2: identify the issue and fix it.

  • COLLSCAN on a frequently queried field: add an index (see code below)
  • Too many documents examined: make the index more selective by adding more fields to a compound index
  • Slow sort without index: add the sort field to the index (for a compound index, the sort directions must match the index or be its exact inverse)
  • Large document projection: use projection to return only needed fields (see code below)

javascript
// Fix a COLLSCAN by adding a compound index
db.orders.createIndex({ userId: 1, status: 1, createdAt: -1 });

// Use projection to return only needed fields
db.orders.find({ userId: "abc123" }, { _id: 1, total: 1, status: 1 });
// Returns only 3 fields instead of the entire document

Step 3: monitor in production with MongoDB Atlas Performance Advisor or the database profiler.

javascript
// Enable profiler to log slow queries (over 100ms)
db.setProfilingLevel(1, { slowms: 100 });

// View profiled queries
db.system.profile.find().sort({ ts: -1 }).limit(10);

Q20. What update operators does MongoDB support?

javascript
// $set: set field values
db.users.updateOne({ _id: id }, { $set: { name: "Updated Name", age: 30 } });

// $unset: remove a field
db.users.updateOne({ _id: id }, { $unset: { temporaryField: "" } });

// $inc: increment a numeric field
db.products.updateOne({ _id: id }, { $inc: { viewCount: 1, stock: -2 } });

// $push: append to an array
db.posts.updateOne({ _id: id }, { $push: { comments: { text: "Great post!", userId: "u1" } } });

// $addToSet: append to array only if value is not already present
db.users.updateOne({ _id: id }, { $addToSet: { tags: "verified" } });

// $pull: remove matching elements from array
db.users.updateOne({ _id: id }, { $pull: { tags: "spam" } });

// $pop: remove first (-1) or last (1) element of array
db.lists.updateOne({ _id: id }, { $pop: { items: 1 } });  // remove last

// $rename: rename a field
db.users.updateMany({}, { $rename: { "fullName": "name" } });

// $currentDate: set field to current date
db.orders.updateOne({ _id: id }, { $currentDate: { updatedAt: true } });

// $min / $max: update only if new value is lower/higher than current
db.scores.updateOne({ userId: id }, { $max: { highScore: 9500 } });
// Only updates if 9500 is greater than the current highScore

Category 3: Redis (Q21-Q31)

Redis questions probe two things: whether you know its data structures well enough to pick the right one for a given problem, and whether you understand its operational tradeoffs around persistence, eviction, and clustering. Expect rapid-fire "what would you use for X" scenarios.

Q21. What is Redis and what are its primary use cases?

Redis (Remote Dictionary Server) is an open-source, in-memory data structure store. All data lives in RAM, which is why reads and writes take microseconds. It supports persistence, replication, clustering, and over a dozen data structure types.

Primary use cases:

  • Caching: store expensive database query results, API responses, or computed values in Redis with a TTL. Subsequent requests hit Redis instead of the database, reducing latency from milliseconds to microseconds.
  • Session storage: store user session data with automatic expiry. Scales horizontally across multiple application servers without sticky sessions.
  • Rate limiting: use Redis atomic increment operations to count requests per IP or user within a time window.
  • Pub/Sub messaging: publish messages to channels so all subscribers receive them in real time. Used for live notifications, chat, and event broadcasting.
  • Leaderboards and counters: Redis Sorted Sets suit real-time rankings, with O(log n) insertion and O(log n + m) range queries (m = number of items returned).
  • Job queues: use Redis Lists as FIFO queues. Libraries like BullMQ (Node.js) are built on Redis, and Celery (Python) can use Redis as its message broker.
  • Distributed locks: implement mutex locks across multiple application servers using atomic Redis operations (SET with the NX option, or the Redlock algorithm).

Q22. What are Redis data structures and what is each used for?

Redis is not just a key-value store. The "value" can be one of many rich data types.

String: the most basic type. Can store text, serialized JSON, integers, or binary data. Integers can be incremented atomically.

text
SET user:1001:name "Alice Chen"
GET user:1001:name
SET page:views 0
INCR page:views           # atomic increment, becomes 1
INCRBY page:views 10      # becomes 11
EXPIRE user:1001:name 3600   # expire in 1 hour

Hash: a map of field-value pairs inside a single key. Best for storing objects (user profiles, product details). More memory-efficient than separate string keys.

text
HSET user:1001 name "Alice" email "alice@example.com" age 28
HGET user:1001 email           # "alice@example.com"
HGETALL user:1001              # all fields and values
HINCRBY user:1001 loginCount 1 # increment a numeric hash field

List: an ordered collection of strings (doubly linked list). Supports push/pop from both ends. Used for queues, timelines, activity feeds.

text
RPUSH jobs:queue "job:1" "job:2" "job:3"  # push to right (tail)
LPOP jobs:queue                            # pop from left (head), FIFO queue
LRANGE timeline:user:1001 0 9             # get first 10 items
LLEN jobs:queue                           # list length

Set: unordered collection of unique strings. Supports set operations (union, intersection, difference). Used for tags, followers, unique visitors.

text
SADD article:1:tags "mongodb" "nosql" "database"
SMEMBERS article:1:tags
SISMEMBER article:1:tags "nosql"  # 1 (exists) or 0 (not exists)
SINTER user:1:follows user:2:follows  # mutual follows
SUNIONSTORE result:tags tag:A tag:B   # store union of two sets

Sorted Set (ZSet): like a set but every member has a score (float). Members are sorted by score, with O(log n) operations. The go-to structure for leaderboards and ranking systems.

text
ZADD leaderboard 9500 "alice" 8750 "bob" 9200 "carol"
ZRANK leaderboard "alice"     # 2 (ascending order: lowest score is rank 0)
ZREVRANK leaderboard "alice"  # 0 (descending order: highest score is rank 0, the top of the leaderboard)
ZRANGE leaderboard 0 2 REV WITHSCORES  # top 3 with scores
ZINCRBY leaderboard 250 "bob"           # add 250 to bob's score

Bitmap: treat a string as a bit array. Extremely memory-efficient for boolean tracking (daily active users, feature flags per user ID).

text
SETBIT users:active:2026-06-08 1001 1  # user 1001 was active today
GETBIT users:active:2026-06-08 1001    # 1
BITCOUNT users:active:2026-06-08       # total active users today

HyperLogLog: a probabilistic data structure for cardinality estimation. Uses about 12KB of memory regardless of the number of unique elements, with roughly a 0.81% error rate.

text
PFADD page:visitors:today user1 user2 user3 user4
PFCOUNT page:visitors:today  # estimated unique visitors
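Both figures follow from the structure's layout. The arithmetic below assumes the dense representation (16,384 registers of 6 bits each) and the standard HyperLogLog error formula 1.04 / sqrt(m), where m is the register count; the variable names are illustrative.

```javascript
const registers = 2 ** 14;      // 16,384 registers
const bitsPerRegister = 6;

// Fixed memory footprint of the dense encoding, independent of cardinality.
const memoryBytes = (registers * bitsPerRegister) / 8;

// Standard error of the HyperLogLog estimator.
const standardError = 1.04 / Math.sqrt(registers);

console.log(`${memoryBytes / 1024} KB`);               // 12 KB
console.log(`${(standardError * 100).toFixed(2)}%`);   // 0.81%
```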

Streams: an append-only log of messages. Persistent, with consumer group support. More powerful than Pub/Sub for reliable message delivery.

text
XADD events:orders * userId 1001 total 49.99 status pending
XREAD COUNT 10 STREAMS events:orders 0

Q23. What are the Redis persistence options and how do you choose?

Redis supports two persistence mechanisms: RDB and AOF. You can use one, both, or neither (pure in-memory, data lost on restart).

RDB (Redis Database Backup / Snapshots):

  • Periodically saves a point-in-time snapshot of the entire dataset to an .rdb file on disk
  • Configured by save rules: save 900 1 saves if at least 1 key changed in 900 seconds
  • A forked child process writes the snapshot; the main process continues serving traffic
  • Fast restarts: loading an RDB file is faster than replaying an AOF log
  • Risk: data written since the last snapshot is lost on crash

text
# redis.conf (comments must be on their own line; trailing comments are not supported)
# Save if 1 change in 15 minutes
save 900 1
# Save if 10 changes in 5 minutes
save 300 10
# Save if 10000 changes in 1 minute
save 60 10000

AOF (Append Only File):

  • Logs every write operation to an .aof file
  • On restart, Redis replays the AOF to reconstruct the dataset
  • Three fsync policies: always fsyncs after every write (safest and slowest, since throughput is bounded by how fast the disk can fsync), everysec fsyncs once per second (default, roughly 1 second of data loss at most), and no lets the OS decide when to flush (fastest, most data loss risk)
  • AOF files grow continuously and are compacted by an AOF rewrite, triggered automatically or manually with BGREWRITEAOF

text
# redis.conf
appendonly yes
# Recommended balance of safety and performance
appendfsync everysec

Both (recommended for production): run with both RDB and AOF enabled. AOF provides the durability guarantee, RDB provides faster restarts. Redis uses the AOF for recovery when both are present.

No persistence: pure cache use case where losing data on restart is acceptable. Highest performance, lowest disk I/O.

💡 How to choose

Can you lose minutes of data? Use RDB only. Can you lose 1 second of data? Use AOF with everysec. Can you lose zero data? Use AOF with always, accepting the significant performance cost.

Q24. What are Redis eviction policies and how do you choose?

When Redis reaches its maxmemory limit, it uses an eviction policy to decide which keys to remove to make room for new data. Available policies as of Redis 7.x:

  • noeviction: returns an error when memory is full. No keys are removed. Use when you cannot afford to lose data. This is the default.
  • allkeys-lru: evicts the least recently used key from all keys. A good general-purpose caching policy when all keys are fair game.
  • volatile-lru: evicts the least recently used key from keys with a TTL set. Keys without a TTL are never evicted. Use when you want to protect permanent data while caching TTL-bound data.
  • allkeys-lfu: evicts the least frequently used key from all keys. Better than LRU for workloads where some data is accessed seasonally or in bursts.
  • volatile-lfu: same as LFU but only on keys with TTL.
  • allkeys-random: evicts a random key from all keys. Rarely the right choice.
  • volatile-random: evicts a random key from keys with TTL.
  • volatile-ttl: evicts the key with the nearest expiry time. Useful when you want to prioritize keeping longer-lived cached items.

text
# redis.conf
maxmemory 4gb
# Recommended for general caching
maxmemory-policy allkeys-lru

For a cache where all data has a TTL and you want it to work automatically, use allkeys-lru or allkeys-lfu. For a cache where some keys must survive (session data mixed with volatile cache), use volatile-lru.
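To make the LRU idea concrete, here is a minimal in-process sketch of exact LRU eviction. It is only a model: the `LruCache` class is hypothetical, it caps entries by count rather than bytes, and Redis itself approximates LRU by sampling a few keys instead of tracking a full ordering.

```javascript
// Exact LRU over a Map, relying on Map's insertion-order iteration.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  // Returns the evicted key, or null if nothing was evicted.
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value; // least recently used
      this.map.delete(oldest);
      return oldest;
    }
    return null;
  }
}

const cache = new LruCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");                    // touch "a", so "b" is now least recently used
const evicted = cache.set("c", 3); // over capacity: evicts "b"
console.log(evicted);
```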

Q25. What is Redis Pub/Sub and what are its limitations?

Pub/Sub is a messaging pattern where publishers send messages to channels without knowing who will receive them. Subscribers listen on channels and receive messages published to them.

text
# Publisher (in one Redis client)
PUBLISH notifications:user:1001 '{"type":"order_shipped","orderId":"ORD-789"}'

# Subscriber (in another Redis client)
SUBSCRIBE notifications:user:1001

# Subscribe to a pattern (all notification channels)
PSUBSCRIBE notifications:*

In Node.js:

javascript
// node-redis v4+ style: a subscriber needs its own dedicated, connected client
const subscriber = redis.duplicate();
await subscriber.connect();
await subscriber.subscribe("notifications:user:1001", (message) => {
  const event = JSON.parse(message);
  sendWebSocketNotification(event);
});

// Publisher (the original, non-subscribed connection)
await redis.publish("notifications:user:1001",
  JSON.stringify({ type: "order_shipped", orderId: "ORD-789" }));

Key limitations:

  • No message persistence: if no subscriber is listening when a message is published, the message is lost. There is no queue and no retention.
  • No delivery guarantee: Pub/Sub is fire-and-forget. If a subscriber disconnects and reconnects, it misses all messages published during the gap.
  • No consumer groups: all subscribers to a channel receive every message. You cannot have competing consumers where only one processes each message.

ℹ When to use Pub/Sub vs Streams

Use Pub/Sub for live notifications, real-time dashboards, and WebSocket event broadcasting, scenarios where losing a message is acceptable. For reliable message delivery with persistence and consumer groups, use Redis Streams instead (see Q22).

Q26. What is Redis pipelining and when do you use it?

Pipelining sends multiple commands to Redis in a single network round trip without waiting for each response. Without pipelining, each command incurs a network round trip, typically 0.5 to 2ms on a LAN. With pipelining, N commands use only one round trip.

javascript
// Without pipelining: N round trips (slow for large N)
for (const key of keys) {
  await redis.get(key);
}

// With pipelining: 1 round trip for all commands
const pipeline = redis.pipeline();
for (const key of keys) {
  pipeline.get(key);
}
const results = await pipeline.exec();
// results is an array of [error, value] for each command

⚠ Pipelining is not atomic

Pipelining does not guarantee atomicity. Other client commands can interleave between pipelined commands on the server. For atomic batch operations, use MULTI/EXEC transactions or Lua scripts.

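Use pipelining for bulk operations like warming a cache, loading initial data, or processing batches of updates. Redis executes pipelined commands in the order sent, but the client does not see any reply until the batch returns, so pipelining fits when no command depends on an earlier command's result and you do not need atomicity.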

Q27. What is Redis Cluster and how does it differ from Redis Sentinel?

Both provide high availability, but they serve different purposes.

Redis Sentinel manages a master-replica setup. It monitors the master, detects failure, performs automatic failover by promoting a replica to master, and notifies clients of the new master address. It provides high availability but not horizontal scaling: all data still lives on one master, and replicas are read-only copies.

text
# Sentinel configuration (comments must be on their own line)
# Monitor the master at 127.0.0.1:6379 with a quorum of 2
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000

Redis Cluster shards data across multiple nodes. Each node holds a subset of the 16,384 hash slots. It provides both horizontal scaling (data spread across nodes) and high availability (each shard has replicas). Clients must be cluster-aware.

bash
# Create a 6-node cluster (3 primary, 3 replica)
redis-cli --cluster create \
  127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1

# Cluster info
redis-cli -p 7000 cluster info
redis-cli -p 7000 cluster nodes

Which to choose:

  • Dataset fits on one server, need HA: use Redis Sentinel
  • Dataset too large for one server, need to scale writes: use Redis Cluster
  • Managed cloud Redis (ElastiCache, Redis Cloud): cluster mode is usually a configuration toggle, handled for you
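The mapping from a key to one of the 16,384 hash slots can be sketched in a few lines. Redis Cluster specifies CRC16 (the XMODEM variant) of the key modulo 16384, and hashes only the part inside the first `{...}` when a non-empty hash tag is present. The function names below are illustrative, and in practice a cluster-aware client does this for you.

```javascript
// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// If the key contains a non-empty {tag}, only the tag is hashed.
function hashTag(key) {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}

function keySlot(key) {
  const bytes = new TextEncoder().encode(hashTag(key));
  return crc16(bytes) % 16384;
}

console.log(keySlot("foo")); // 12182
// Keys sharing a hash tag land in the same slot, so multi-key commands work on them.
console.log(keySlot("{user:1001}:cart") === keySlot("{user:1001}:orders")); // true
```

Hash tags are the usual way to keep related keys on one shard, at the cost of concentrating that entity's traffic on a single node.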

Q28. What is a distributed lock in Redis and how do you implement one?

A distributed lock allows multiple application instances to coordinate access to a shared resource, ensuring only one instance executes a critical section at a time.

Simple lock using SET NX EX (single Redis node): NX means only set if the key does not exist (atomic), and EX means expire in N seconds, which prevents the lock being held forever if a process crashes.

javascript
const orderId = "order-789";
const lockKey = `lock:payment:${orderId}`;
const lockValue = crypto.randomUUID(); // unique value per lock holder
const ttl = 30; // seconds

const acquired = await redis.set(lockKey, lockValue, "NX", "EX", ttl);

if (acquired === "OK") {
  try {
    await processPayment(orderId);
  } finally {
    // Release: only delete if we still own it (Lua for atomicity)
    const script = `
      if redis.call("GET", KEYS[1]) == ARGV[1] then
        return redis.call("DEL", KEYS[1])
      else
        return 0
      end
    `;
    await redis.eval(script, 1, lockKey, lockValue);
  }
} else {
  throw new Error("Could not acquire lock: another instance is processing");
}

⚠ Why the Lua script matters

The Lua script for release is critical. Without it, you might delete another process's lock: your lock expired while processing, another process acquired it, and you then delete theirs. The script only protects the release, so the TTL still needs to comfortably exceed the expected processing time. For production setups with several independent Redis nodes, consider the Redlock algorithm (implemented for Node.js in the redlock package), which acquires the lock on a majority of nodes to tolerate individual node failures.
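The race described above can be replayed with an in-memory stand-in. `FakeRedis` below is a hypothetical toy with an explicit clock (times in seconds), not a Redis client; `releaseIfOwner` plays the role of the Lua compare-and-delete script.

```javascript
class FakeRedis {
  constructor() {
    this.store = new Map();
  }

  // Models SET key value NX EX ttl at a given time.
  setNxEx(key, value, ttlSeconds, nowSeconds) {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > nowSeconds) return null; // still held
    this.store.set(key, { value, expiresAt: nowSeconds + ttlSeconds });
    return "OK";
  }

  get(key, nowSeconds) {
    const entry = this.store.get(key);
    return entry && entry.expiresAt > nowSeconds ? entry.value : null;
  }

  del(key) {
    return this.store.delete(key) ? 1 : 0;
  }

  // Models the Lua script: delete only if the stored value is still ours.
  releaseIfOwner(key, value, nowSeconds) {
    return this.get(key, nowSeconds) === value ? this.del(key) : 0;
  }
}

const fake = new FakeRedis();
const key = "lock:payment:order-789";

fake.setNxEx(key, "worker-A", 30, 0);                       // A acquires at t=0, TTL 30s
const bAcquired = fake.setNxEx(key, "worker-B", 30, 31);    // A's lock expired; B acquires at t=31
const released = fake.releaseIfOwner(key, "worker-A", 35);  // A finishes late at t=35

console.log(bAcquired, released, fake.get(key, 35)); // OK 0 worker-B
```

With a plain `del(key)` in place of `releaseIfOwner`, worker A would have removed worker B's lock at t=35.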

Q29. How do you implement rate limiting with Redis?

A fixed-window counter using INCR and EXPIRE is the simplest pattern: each window gets its own key, and the key's count is the number of requests in that window.

javascript
async function isRateLimited(userId, limit = 100, windowSeconds = 60) {
  const key = `ratelimit:${userId}:${Math.floor(Date.now() / 1000 / windowSeconds)}`;

  const count = await redis.incr(key);

  if (count === 1) {
    // First request in this window: set expiry
    await redis.expire(key, windowSeconds * 2);
  }

  return count > limit;
}

// Usage in Express middleware
app.use(async (req, res, next) => {
  const userId = req.user?.id || req.ip;
  if (await isRateLimited(userId)) {
    return res.status(429).json({ error: "Too many requests" });
  }
  next();
});
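Because the counter above buckets requests by `Math.floor(now / window)`, a client can burst at the end of one window and again at the start of the next. The in-memory sketch below reproduces that edge case with explicit timestamps; `makeFixedWindowLimiter` is a hypothetical stand-in for the Redis-backed function, using a Map in place of INCR.

```javascript
function makeFixedWindowLimiter(limit, windowMs) {
  const counts = new Map(); // key: "userId:windowIndex" -> request count
  return function isLimited(userId, nowMs) {
    const key = `${userId}:${Math.floor(nowMs / windowMs)}`;
    const count = (counts.get(key) || 0) + 1;
    counts.set(key, count);
    return count > limit;
  };
}

const isLimited = makeFixedWindowLimiter(100, 60000);
let allowed = 0;

// 100 requests one second before the window boundary...
for (let i = 0; i < 100; i++) if (!isLimited("u1", 59000)) allowed++;
// ...and 100 more one second after it.
for (let i = 0; i < 100; i++) if (!isLimited("u1", 61000)) allowed++;

console.log(allowed); // 200 requests accepted within about 2 seconds
```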

A more precise sliding window uses Sorted Sets.

javascript
async function isRateLimitedPrecise(userId, limit = 100, windowMs = 60000) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `ratelimit:precise:${userId}`;

  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);  // remove old entries
  pipeline.zadd(key, now, `${now}-${Math.random()}`); // add current request
  pipeline.zcard(key);                               // count requests in window
  pipeline.expire(key, Math.ceil(windowMs / 1000));  // auto-cleanup

  const results = await pipeline.exec();
  const requestCount = results[2][1];
  return requestCount > limit;
}
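The same boundary scenario can be replayed against an in-memory model of the sorted-set approach. `makeSlidingLogLimiter` is a hypothetical sketch that keeps an array of timestamps per user in place of ZADD and ZREMRANGEBYSCORE; like the Redis version above, it records rejected requests too.

```javascript
function makeSlidingLogLimiter(limit, windowMs) {
  const logs = new Map(); // userId -> array of request timestamps
  return function isLimited(userId, nowMs) {
    const windowStart = nowMs - windowMs;
    // Drop entries at or before windowStart (mirrors ZREMRANGEBYSCORE key 0 windowStart).
    const recent = (logs.get(userId) || []).filter((t) => t > windowStart);
    recent.push(nowMs); // record the current request
    logs.set(userId, recent);
    return recent.length > limit;
  };
}

const isLimited = makeSlidingLogLimiter(100, 60000);
let allowed = 0;

for (let i = 0; i < 100; i++) if (!isLimited("u1", 59000)) allowed++;
for (let i = 0; i < 100; i++) if (!isLimited("u1", 61000)) allowed++;

console.log(allowed); // 100: the second burst is rejected
```

The trade-off is memory: one entry per request inside the window, rather than one counter per window.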

Q30. What is the cache stampede problem and how do you prevent it?

A cache stampede (also called thundering herd) occurs when a cached item expires and many requests all simultaneously find the cache miss and attempt to recompute the expensive value at the same time. Every one of those requests hits the database or API, causing a sudden spike in backend load.

Prevention strategies:

  1. Cache locking (mutex): only one request recomputes, others wait

    javascript
    async function getWithLock(key, computeFn, ttl = 300) {
      const cached = await redis.get(key);
      if (cached) return JSON.parse(cached);
    
      const lockKey = `lock:${key}`;
      const acquired = await redis.set(lockKey, "1", "NX", "EX", 10);
    
      if (acquired) {
        try {
          const value = await computeFn();
          await redis.setex(key, ttl, JSON.stringify(value));
          return value;
        } finally {
          await redis.del(lockKey);
        }
      } else {
        // Another process is computing: wait briefly and retry
        await new Promise(r => setTimeout(r, 200));
        return getWithLock(key, computeFn, ttl);
      }
    }
  2. Probabilistic early expiration (XFetch)

    Proactively recompute before expiry using a probabilistic algorithm that triggers a re-fetch earlier for expensive-to-compute values.

  3. Stale-while-revalidate

    Always return cached (even stale) data immediately, then trigger background recomputation asynchronously. Once a key has been populated the first time, callers never wait on a cold miss.

  4. Pre-warming

    Proactively populate the cache before expiry using a background job.
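Strategy 2 (probabilistic early expiration) reduces to one inequality: recompute when `now - delta * beta * ln(rand) >= expiry`, where delta is how long the value takes to compute and beta (1 by default) tunes how eagerly to refresh. The sketch below is one way to write that check; the function and parameter names are assumptions, and `random` is injectable only so the behavior can be tested deterministically.

```javascript
// Decide whether this request should recompute the value before it expires.
function shouldRefreshEarly({ nowMs, expiresAtMs, computeMs, beta = 1, random = Math.random }) {
  // Math.random() returns [0, 1); use 1 - r so the value is in (0, 1] and log() stays finite.
  const r = 1 - random();
  // Math.log(r) <= 0, so this shifts "now" forward by a random amount scaled by computeMs.
  return nowMs - computeMs * beta * Math.log(r) >= expiresAtMs;
}

// Far from expiry with a cheap value: refreshes are rare.
// Close to expiry or with an expensive value: refreshes become likely.
const decision = shouldRefreshEarly({
  nowMs: Date.now(),
  expiresAtMs: Date.now() + 5000,
  computeMs: 800,
});
console.log(decision);
```

Storing `computeMs` alongside the cached value lets each key carry its own cost, so expensive keys start refreshing earlier than cheap ones.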

Q31. What is the difference between Redis and Memcached?

Both are in-memory key-value caches, but Redis is significantly more capable.

Feature | Redis | Memcached
Data types | String, List, Hash, Set, ZSet, Stream, Bitmap, HLL | String only
Persistence | RDB + AOF | None (cache only)
Replication | Yes (master-replica, cluster) | None built in (client-side sharding only)
Pub/Sub | Yes | No
Atomic operations | Yes (INCR, LPUSH, ZADD, etc.) | Limited (INCR, DECR)
Lua scripting | Yes | No
Multi-threading | Single-threaded event loop (I/O threads in Redis 6+) | Multi-threaded
Max value size | 512MB | 1MB (default)
Clustering | Redis Cluster (native) | Not native

When would you choose Memcached over Redis? When you need pure caching with simple string values, maximum multi-threading performance, and the simplest possible operations. In practice, Redis has replaced Memcached in the vast majority of new projects because the feature set is so much richer with no meaningful performance tradeoff for most cache workloads.

Category 4: Amazon DynamoDB (Q32-Q42)

DynamoDB questions test a different muscle than MongoDB or Redis. There is no query language to recall and no persistence config to tune. Instead, interviewers want to know whether you can design a key schema and access patterns upfront, since DynamoDB punishes designs that get this wrong far more than a relational database does.

Q32. What is Amazon DynamoDB and what makes it different from other databases?

DynamoDB is AWS's fully managed, serverless NoSQL database. You do not manage servers, patches, backups, or replication: AWS handles everything. DynamoDB scales automatically from zero to millions of requests per second with single-digit millisecond latency at any scale.

Key characteristics:

  • Serverless: no servers to provision or manage. Pay per request in on-demand mode, or pay for provisioned read/write capacity.
  • Multi-region replication: Global Tables replicate data across multiple AWS regions automatically.
  • Flexible data model: stores items (similar to documents) with a required primary key and any additional attributes. No fixed schema beyond the key.
  • Single-table design: DynamoDB encourages storing multiple entity types in one table, using composite keys to separate them. This is a sharp departure from relational modeling and requires careful upfront design.
  • HTTP-based API: all operations (GetItem, PutItem, Query, Scan) are HTTPS API calls. No persistent database connection required.

Best for: serverless applications, applications with predictable access patterns, global applications requiring multi-region, IoT data ingestion, user session storage, and gaming leaderboards.

Q33. What is the difference between a partition key and a sort key?

Every DynamoDB table has a primary key, which can be one of two types.

  • Simple primary key (partition key only): a single attribute that uniquely identifies each item. DynamoDB hashes the partition key to determine which internal partition stores the item.
  • Composite primary key (partition key + sort key): two attributes together uniquely identify each item. All items with the same partition key are stored together (in the same partition), sorted by sort key. This enables efficient range queries within a partition.
javascript
// Table: Orders
// Partition key: userId | Sort key: orderId (sorts by time if you use a ULID)

// Items in the same partition (same userId):
{ userId: "user-1001", orderId: "ORD-2026-001", total: 49.99,  status: "shipped" }
{ userId: "user-1001", orderId: "ORD-2026-002", total: 129.00, status: "pending" }
{ userId: "user-1001", orderId: "ORD-2026-003", total: 22.50,  status: "delivered" }

// Query: get all orders for user-1001 sorted by orderId
// KeyConditionExpression: userId = :uid
// This is a single-partition query: extremely fast

Design principle: the partition key determines where data is stored and how well writes are distributed. The sort key determines how data within a partition is ordered and what range queries are possible.
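A toy in-memory model can make the two roles tangible. `TinyTable` below is purely hypothetical: a Map stands in for hashing on the partition key, and each partition keeps its items ordered by sort key so a range query is a slice of one partition. It uses plain JavaScript string comparison, which matches DynamoDB's byte ordering only for ASCII keys like these.

```javascript
class TinyTable {
  constructor() {
    this.partitions = new Map(); // partition key (userId) -> items sorted by orderId
  }

  put(item) {
    const items = this.partitions.get(item.userId) || [];
    // Same partition key + sort key replaces the existing item.
    const rest = items.filter((i) => i.orderId !== item.orderId);
    rest.push(item);
    rest.sort((a, b) => (a.orderId < b.orderId ? -1 : a.orderId > b.orderId ? 1 : 0));
    this.partitions.set(item.userId, rest);
  }

  // Like a Query: one partition, with an inclusive BETWEEN on the sort key.
  query(userId, start, end) {
    const items = this.partitions.get(userId) || [];
    return items.filter((i) => i.orderId >= start && i.orderId <= end);
  }
}

const table = new TinyTable();
table.put({ userId: "user-1001", orderId: "ORD-2026-003", total: 22.5 });
table.put({ userId: "user-1001", orderId: "ORD-2026-001", total: 49.99 });
table.put({ userId: "user-1001", orderId: "ORD-2026-002", total: 129.0 });
table.put({ userId: "user-2002", orderId: "ORD-2026-001", total: 10.0 });

const page = table.query("user-1001", "ORD-2026-001", "ORD-2026-002");
console.log(page.map((i) => i.orderId)); // [ 'ORD-2026-001', 'ORD-2026-002' ]
```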

Q34. What is the difference between a GSI and an LSI?

Both are secondary indexes that let you query data using attributes other than the primary key.

Property | Global Secondary Index (GSI) | Local Secondary Index (LSI)
Partition key | Can differ from the base table | Same as the base table
Scope | Spans all partitions of the base table | Local to each partition (co-located with base table data)
Created | Anytime, including after table creation | Only at table creation, cannot be added later
Throughput | Its own provisioned throughput | Shares the base table's throughput
Read consistency | Eventually consistent only | Eventually or strongly consistent
Limit per table | Up to 20 (default quota) | Up to 5
Size limit | None beyond table limits | 10GB per partition key value (shared with base table)

javascript
// Base table: UserOrders
// PK: userId | SK: orderId

// LSI: query by status within a user's orders
// Same PK: userId | SK: status
// Lets you query: all pending orders for user-1001

// GSI: query by status across ALL users
// New PK: status | SK: orderId
// Lets you query: all pending orders across the entire system

💡 The rule that decides GSI vs LSI

If you need to filter within a single user's (or entity's) data, reach for an LSI. If you need to query across all users or entities, reach for a GSI. Since LSIs cannot be added after table creation, most teams default to GSIs unless they know the LSI access pattern upfront.

Q35. What is the difference between Query and Scan in DynamoDB?

Query retrieves items using the primary key or a secondary index. It must specify the partition key value, and returns items sorted by sort key within that partition. It is efficient because it only reads the relevant partition.

Scan reads every item in the table (or index) and then optionally filters. It reads all partitions, so it consumes RCUs proportional to the entire table size, not just the items returned.

javascript
// QUERY (efficient): get orders for a specific user
const result = await dynamoDB.query({
  TableName: "Orders",
  KeyConditionExpression: "userId = :uid AND orderId BETWEEN :start AND :end",
  ExpressionAttributeValues: {
    ":uid":   { S: "user-1001" },
    ":start": { S: "ORD-2026-001" },
    ":end":   { S: "ORD-2026-999" },
  },
}).promise();

// SCAN (expensive, reads all items): use only when necessary
const result2 = await dynamoDB.scan({
  TableName: "Orders",
  FilterExpression: "total > :minTotal",
  ExpressionAttributeValues: { ":minTotal": { N: "100" } },
}).promise();
// FilterExpression runs AFTER reading all items, it does not reduce RCU cost

Scan is acceptable when you genuinely need to process every item in the table, for migration jobs, exports, or analysis. In those cases, use a parallel scan to split the work across multiple segments.

javascript
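// Parallel scan with 4 segments
const promises = [];
for (let segment = 0; segment < 4; segment++) {
  promises.push(dynamoDB.scan({
    TableName: "Orders",
    TotalSegments: 4,
    Segment: segment,
  }).promise());
}
// Each result holds at most 1MB; keep paging with LastEvaluatedKey per segment
const segmentResults = await Promise.all(promises);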

💡 Interview rule

Always prefer Query over Scan. Design your table so every production read is a Query. If you find yourself reaching for Scan in a hot path, that is a sign the table's key design does not match your access patterns.

Q36. How do you calculate RCUs and WCUs?

RCU (Read Capacity Unit):

  • 1 RCU = 1 strongly consistent read per second of an item up to 4KB
  • 1 RCU = 2 eventually consistent reads per second of an item up to 4KB
  • For items larger than 4KB, round up: a 9KB item costs 3 RCUs strongly consistent (ceil(9/4) = 3)

WCU (Write Capacity Unit):

  • 1 WCU = 1 write of up to 1KB
  • For items larger than 1KB, round up: a 3.5KB item costs 4 WCUs (ceil(3.5) = 4)

Transactional reads and writes (DynamoDB Transactions) always cost 2x the normal RCU/WCU for the same item size.

Scenario | Calculation | Result
1,000 reads/sec of 8KB items, strong consistency | ceil(8/4) * 1 * 1,000 | 2,000 RCU
500 reads/sec of 8KB items, eventual consistency | ceil(8/4) * 0.5 * 500 | 500 RCU
200 writes/sec of 2.5KB items | ceil(2.5) * 200 | 600 WCU
200 transactional writes/sec of 2.5KB items | 600 * 2 (transactions cost 2x) | 1,200 WCU
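The rules above fit in two small functions. This is a sketch with hypothetical names (`readCapacityUnits`, `writeCapacityUnits`), not an AWS API; it reproduces the four scenarios in the table.

```javascript
// RCUs needed per second: 4KB read units, scaled by consistency mode.
function readCapacityUnits({ itemKb, readsPerSecond, consistency = "strong" }) {
  const multipliers = { eventual: 0.5, strong: 1, transactional: 2 };
  const multiplier = multipliers[consistency];
  if (multiplier === undefined) {
    throw new Error(`unknown consistency mode: ${consistency}`);
  }
  const unitsPerRead = Math.ceil(itemKb / 4);
  return Math.ceil(unitsPerRead * multiplier * readsPerSecond);
}

// WCUs needed per second: 1KB write units, doubled for transactional writes.
function writeCapacityUnits({ itemKb, writesPerSecond, transactional = false }) {
  const unitsPerWrite = Math.ceil(itemKb);
  return unitsPerWrite * (transactional ? 2 : 1) * writesPerSecond;
}

console.log(readCapacityUnits({ itemKb: 8, readsPerSecond: 1000 }));                        // 2000
console.log(readCapacityUnits({ itemKb: 8, readsPerSecond: 500, consistency: "eventual" })); // 500
console.log(writeCapacityUnits({ itemKb: 2.5, writesPerSecond: 200 }));                      // 600
console.log(writeCapacityUnits({ itemKb: 2.5, writesPerSecond: 200, transactional: true })); // 1200
```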

On-demand mode means you pay per actual request with no capacity planning, good for unpredictable or spiky workloads. Provisioned mode means you set RCU and WCU limits and pay for the provisioned capacity whether it is used or not, with optional auto-scaling, good for predictable workloads.

Q37. What are DynamoDB consistency models?

DynamoDB offers two consistency options per read operation.

  • Eventually consistent reads (default): returns data from any of the three AZ replicas, may return stale data if a recent write has not propagated yet, costs 0.5 RCU per 4KB. Best for product catalogs, game state, and most web application reads.
  • Strongly consistent reads: returns data from the primary replica, reflecting all writes completed before the read. Costs 1 RCU per 4KB. Not available on Global Secondary Indexes (GSIs always use eventual consistency). Best for financial balances, inventory counts, and any read-your-own-writes requirement.
javascript
// Eventually consistent (default)
const result = await dynamoDB.getItem({
  TableName: "Orders",
  Key: { userId: { S: "user-1001" }, orderId: { S: "ORD-001" } },
}).promise();

// Strongly consistent
const strongResult = await dynamoDB.getItem({
  TableName: "Orders",
  Key: { userId: { S: "user-1001" }, orderId: { S: "ORD-001" } },
  ConsistentRead: true, // double the RCU cost
}).promise();

Q38. What is the hot partition problem and how do you prevent it?

A hot partition occurs when too many requests target the same partition key value, sending all that traffic to a single physical partition. A partition can serve at most 3,000 RCUs and 1,000 WCUs per second. DynamoDB automatically splits partitions as they grow, but a single key's traffic is always routed to the same partition, so splitting does not help if every request is for one key.

Example: a global leaderboard table where every item shares one partition key value such as "LEADERBOARD" means every write and read hits one partition.

javascript
// Instead of: { pk: "LEADERBOARD", score: 9500, userId: "alice" }
// Use:        { pk: "LEADERBOARD#3", score: 9500, userId: "alice" }
// Rotate the suffix 0-9 on writes; query all 10 suffixes and merge on reads

  • Shard the hot key: append a random suffix (0-9) to the partition key and distribute writes across 10 logical partitions. Reads query all 10 shards and aggregate.
  • Design better partition keys: use high-cardinality attributes (userId, orderId, deviceId) rather than low-cardinality ones (country, status, type).
  • Write sharding for counters: instead of incrementing one counter item, write to one of N shard counters at random. Read by summing all shards.
  • DynamoDB adaptive capacity (automatic): DynamoDB automatically redistributes capacity to hot partitions within seconds. This helps with burst traffic, but it is not a substitute for good key design.
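The key-sharding approach from the first bullet needs three small pieces: picking a shard on write, listing every shard on read, and merging the per-shard results. The helpers below are hypothetical and leave out the DynamoDB calls themselves; `random` is injectable only to make the example deterministic.

```javascript
const SHARD_COUNT = 10;

// Write path: pick one of the logical shards at random.
function shardedKey(baseKey, random = Math.random) {
  return `${baseKey}#${Math.floor(random() * SHARD_COUNT)}`;
}

// Read path: every shard key that must be queried.
function allShardKeys(baseKey) {
  return Array.from({ length: SHARD_COUNT }, (_, i) => `${baseKey}#${i}`);
}

// Merge per-shard query results into a single top-N list by score.
function mergeTopScores(shardResults, topN) {
  return shardResults
    .flat()
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

console.log(shardedKey("LEADERBOARD", () => 0.35)); // LEADERBOARD#3
console.log(allShardKeys("LEADERBOARD").length);    // 10

const top2 = mergeTopScores(
  [
    [{ userId: "alice", score: 9500 }],
    [{ userId: "bob", score: 8750 }, { userId: "carol", score: 9200 }],
  ],
  2
);
console.log(top2.map((e) => e.userId)); // [ 'alice', 'carol' ]
```

Reads now cost one Query per shard, so the shard count is a trade-off between write spread and read fan-out.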

Q39. What are DynamoDB Streams and what are common use cases?

DynamoDB Streams capture a time-ordered sequence of item-level changes in a DynamoDB table. Each stream record contains the item's before state, after state, or both, depending on the configured StreamViewType. Records are retained for 24 hours.

  • KEYS_ONLY: only the key attributes of the changed item
  • NEW_IMAGE: the entire item after the change
  • OLD_IMAGE: the entire item before the change
  • NEW_AND_OLD_IMAGES: both before and after
javascript — lambda-handler.js
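// Lambda trigger: processes stream records automatically
// Assumes the stream uses StreamViewType NEW_AND_OLD_IMAGES
const AWS = require("aws-sdk"); // AWS SDK v2; bundle it yourself on Node.js 18+ runtimes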
exports.handler = async (event) => {
  for (const record of event.Records) {
    const { eventName, dynamodb } = record;

    if (eventName === "INSERT") {
      const newItem = AWS.DynamoDB.Converter.unmarshall(dynamodb.NewImage);
      await sendWelcomeEmail(newItem.email);
    }

    if (eventName === "MODIFY") {
      const oldItem = AWS.DynamoDB.Converter.unmarshall(dynamodb.OldImage);
      const newItem = AWS.DynamoDB.Converter.unmarshall(dynamodb.NewImage);

      if (oldItem.status !== newItem.status && newItem.status === "shipped") {
        await sendShippingNotification(newItem);
      }
    }
  }
};

Common use cases: triggering notifications on data change (order status, user signup), replicating data to other AWS services (OpenSearch for full-text search, Redshift for analytics, S3 for archival), invalidating a Redis cache when DynamoDB changes, audit logging, and cross-region replication (the mechanism behind Global Tables).

Q40. What is DynamoDB Accelerator (DAX)?

DAX is a fully managed, in-memory cache built specifically for DynamoDB. It is API-compatible: you swap the DynamoDB client for the DAX client and no other code changes are required.

  • Read latency drops from single-digit milliseconds (DynamoDB) to microseconds (DAX)
  • Write-through cache: a write goes to DynamoDB first and then to the DAX item cache, and it succeeds only if both succeed
  • Automatic TTL for cached items
  • Multi-AZ deployment for high availability
javascript
const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const AmazonDaxClient = require("amazon-dax-client");

// Without DAX: standard DynamoDB client
const ddb = new AWS.DynamoDB.DocumentClient();

// With DAX: swap the client, same API
const dax = new AmazonDaxClient({ endpoints: ["dax-cluster.xxx.dax.amazonaws.com:8111"] });
const daxClient = new AWS.DynamoDB.DocumentClient({ service: dax });

// The rest of your code is identical
const result = await daxClient.get({
  TableName: "Products",
  Key: { productId: "prod-001" },
}).promise();
Use DAX when | Avoid DAX when
Read-heavy workloads with the same items read repeatedly | The workload is write-heavy (DAX does not help writes much)
You need microsecond read latency | You need strongly consistent reads (DAX passes them through to DynamoDB uncached, so they gain nothing)
Read costs are high and cache hit rate would be high | Most reads are unique or infrequent (low cache hit rate)
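
To make the read-through and write-through behaviour concrete, here is a small in-memory sketch of the pattern. It is an illustration of the caching pattern only, not DAX's implementation: the class name, the Map-backed store, and the 5-minute TTL are assumptions for the example.

```javascript
// Sketch of a read-through / write-through cache. Not DAX's implementation.
class WriteThroughCache {
  constructor(store, ttlMs) {
    this.store = store;     // backing store (a Map standing in for DynamoDB)
    this.ttlMs = ttlMs;     // how long a cached entry stays valid
    this.cache = new Map(); // key -> { value, expiresAt }
    this.hits = 0;
    this.misses = 0;
  }

  get(key, now = Date.now()) {
    const entry = this.cache.get(key);
    if (entry && entry.expiresAt > now) {
      this.hits++;
      return entry.value;
    }
    // Miss or expired: read from the backing store and populate the cache
    this.misses++;
    const value = this.store.get(key);
    this.cache.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }

  put(key, value, now = Date.now()) {
    // Write to the backing store first, then refresh the cached copy
    this.store.set(key, value);
    this.cache.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const store = new Map([["prod-001", { name: "Widget" }]]);
const cache = new WriteThroughCache(store, 5 * 60 * 1000);

cache.get("prod-001", 0);    // miss: loaded from the store
cache.get("prod-001", 1000); // hit
cache.put("prod-001", { name: "Widget v2" }, 2000);
const afterWrite = cache.get("prod-001", 3000); // hit, sees the new value
console.log(afterWrite.name, cache.hits, cache.misses); // Widget v2 2 1
```

The hit and miss counters show why the cache hit rate decides whether a cache pays off: every miss still costs a read against the backing store.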

Q41. What is single-table design in DynamoDB and why is it recommended?

Single-table design means storing multiple entity types (users, orders, products, sessions) in one DynamoDB table using composite keys that encode the entity type and identifier.

The reason: DynamoDB does not support joins. With separate tables per entity type, retrieving a user and their orders requires two separate API calls. In a relational database, one JOIN handles this. In DynamoDB, the answer is to co-locate related data in the same table so you can fetch all of it in one Query.

javascript
// Users
{ PK: "USER#1001",     SK: "METADATA",       name: "Alice", email: "alice@example.com" }

// User's orders
{ PK: "USER#1001",     SK: "ORDER#ORD-001",  total: 49.99,  status: "shipped" }
{ PK: "USER#1001",     SK: "ORDER#ORD-002",  total: 129.00, status: "pending" }

// Order's items
{ PK: "ORDER#ORD-001", SK: "ITEM#prod-1",    qty: 2, price: 24.99 }
{ PK: "ORDER#ORD-001", SK: "ITEM#prod-2",    qty: 1, price: 0 }

// Products
{ PK: "PRODUCT#001",   SK: "METADATA",       name: "Widget", price: 24.99 }

// GSI for querying orders by status across all users:
// GSI PK: status | GSI SK: SK (orderId)
javascript
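const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const dynamoDB = new AWS.DynamoDB(); // low-level client: values use { S: ... } format

// One Query fetches all of a user's orders
await dynamoDB.query({
  TableName: "EcommerceTable",
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk)",
  ExpressionAttributeValues: {
    ":pk": { S: "USER#1001" },
    ":sk": { S: "ORDER#" },
  },
}).promise();
// Returns: all ORDER# items for USER#1001
// Drop the begins_with condition to also return the USER#1001 METADATA item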
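
The key conventions are easier to keep consistent when they live in small helper functions. The sketch below is illustrative only: the helper names, the second user "Bob", and the in-memory query function (a stand-in that mimics a Query with begins_with) are assumptions, not part of any SDK.

```javascript
// Sketch: key builders plus an in-memory stand-in for Query with begins_with.
// Helper names and sample data are hypothetical.
const keys = {
  user: (userId) => ({ PK: `USER#${userId}`, SK: "METADATA" }),
  order: (userId, orderId) => ({ PK: `USER#${userId}`, SK: `ORDER#${orderId}` }),
};

const items = [
  { ...keys.user("1001"), name: "Alice" },
  { ...keys.order("1001", "ORD-002"), status: "pending" },
  { ...keys.order("1001", "ORD-001"), status: "shipped" },
  { ...keys.user("1002"), name: "Bob" },
];

// Mimics KeyConditionExpression "PK = :pk AND begins_with(SK, :prefix)".
// Results come back ordered by sort key, as they do within a partition.
function query(table, pk, skPrefix = "") {
  return table
    .filter((item) => item.PK === pk && item.SK.startsWith(skPrefix))
    .sort((a, b) => (a.SK < b.SK ? -1 : a.SK > b.SK ? 1 : 0));
}

console.log(query(items, "USER#1001", "ORDER#").map((i) => i.SK));
// [ 'ORDER#ORD-001', 'ORDER#ORD-002' ]
console.log(query(items, "USER#1001").map((i) => i.SK));
// [ 'METADATA', 'ORDER#ORD-001', 'ORDER#ORD-002' ]
```

The second call shows the co-location benefit: querying on the partition key alone returns the user's metadata item and the orders together, because they share PK "USER#1001".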

⚠ Plan access patterns first

Single-table design requires upfront planning around access patterns. Changing the key design later is painful: it requires migrating all data. List every access pattern your application needs before you design the table, not after.

Q42. How do DynamoDB transactions work?

DynamoDB supports ACID transactions across multiple items within a single account and region, through two operations.

  • TransactGetItems: atomically reads up to 100 items. All reads see a consistent snapshot.
  • TransactWriteItems: atomically writes up to 100 items across multiple tables. Either all succeed or all fail. Costs 2x the normal WCU.
javascript
const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const dynamoDB = new AWS.DynamoDB(); // low-level client: values use { S: ... } / { N: ... } format

// Transfer credits between users atomically
await dynamoDB.transactWriteItems({
  TransactItems: [
    {
      // Deduct from sender, but only if the sender has enough credits.
      // The condition sits on the Update itself: a transaction cannot contain
      // two operations (such as ConditionCheck plus Update) on the same item.
      Update: {
        TableName: "Wallets",
        Key: { userId: { S: "user-A" } },
        UpdateExpression: "SET credits = credits - :amount",
        ConditionExpression: "credits >= :amount",
        ExpressionAttributeValues: { ":amount": { N: "100" } },
      },
    },
    {
      // Add to receiver
      Update: {
        TableName: "Wallets",
        Key: { userId: { S: "user-B" } },
        UpdateExpression: "SET credits = credits + :amount",
        ExpressionAttributeValues: { ":amount": { N: "100" } },
      },
    },
    {
      // Record the transfer
      Put: {
        TableName: "TransferLog",
        Item: {
          transferId: { S: "TXN-001" },
          from: { S: "user-A" },
          to: { S: "user-B" },
          amount: { N: "100" },
          timestamp: { S: new Date().toISOString() },
        },
      },
    },
  ],
}).promise();
// If ANY condition fails or write fails, nothing is committed
  • Maximum 100 items per transaction
  • 4MB total size limit for all items in the transaction
  • Transactions cannot span AWS regions (Global Tables)
  • 2x WCU cost compared to non-transactional writes
  • Through DAX, transactional calls are passed on to DynamoDB; DAX then refreshes its cache with background reads, which consume extra read capacity
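
The 2x write cost is easy to estimate up front. The following sketch assumes the standard rule of 1 WCU per 1 KB (rounded up) for a normal write and doubles it per item for a transaction. The function names and the item sizes are hypothetical.

```javascript
// Sketch: estimate write capacity consumed by a TransactWriteItems call.
// Function names and item sizes are hypothetical.
function standardWcu(itemSizeBytes) {
  // A standard write consumes 1 WCU per 1 KB, rounded up
  return Math.max(1, Math.ceil(itemSizeBytes / 1024));
}

function transactionalWcu(itemSizesBytes) {
  // Each written item in a transaction costs twice its standard write cost
  return itemSizesBytes.reduce((sum, size) => sum + 2 * standardWcu(size), 0);
}

// Two small wallet items and one 1,500-byte log item
const sizes = [200, 200, 1500];
console.log(transactionalWcu(sizes)); // 8
```

Here the two wallet items cost 2 WCU each and the log item costs 4 (2 KB rounded up, doubled), so the same three writes that would cost 4 WCU outside a transaction cost 8 inside one.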

Quick Reference: All 42 Questions at a Glance

Use this table to scan every question and its core concept in one pass. It is the fastest way to spot the topics you need to revisit before an interview.

# | Question | Core concept
Q1 | NoSQL vs relational database | Schema flexibility, scaling, consistency tradeoffs
Q2 | Four NoSQL types | Document, key-value, column-family, graph
Q3 | CAP theorem | CP vs AP, partition tolerance is required
Q4 | BASE vs ACID | Basically Available, Soft state, Eventually consistent
Q5 | When to choose NoSQL | Variable schema, scale, access pattern clarity
Q6 | Eventual consistency in practice | Replica lag, tunable per operation
Q7 | MongoDB data model | Documents, collections, BSON, _id, ObjectId
Q8 | BSON vs JSON | Binary format, additional types, speed
Q9 | MongoDB CRUD operations | insertOne/Many, find, updateOne/Many, deleteOne/Many
Q10 | MongoDB index types | Single, compound, unique, sparse, TTL, text, 2dsphere
Q11 | Aggregation pipeline | $match, $lookup, $group, $sort, $project, $unwind
Q12 | Embed vs reference schema design | Access pattern decides, bounded vs unbounded arrays
Q13 | Sharding | Mongos, config servers, shard key design, scatter-gather
Q14 | Replica sets | Primary, secondary, arbiter, automatic failover
Q15 | ACID transactions in MongoDB | Multi-document (v4.0+), session API, 60s timeout
Q16 | WiredTiger storage engine | Document-level locking, compression, MVCC, journaling
Q17 | TTL index | Auto-delete documents, 60s check interval
Q18 | Change Streams | Real-time change events, resume tokens, Kafka integration
Q19 | Slow query diagnosis | explain executionStats, COLLSCAN vs IXSCAN, profiler
Q20 | MongoDB update operators | $set, $inc, $push, $addToSet, $pull, $unset, $min/$max
Q21 | Redis and its use cases | Caching, sessions, pub/sub, leaderboards, queues, locks
Q22 | Redis data structures | String, Hash, List, Set, ZSet, Bitmap, HLL, Stream
Q23 | RDB vs AOF persistence | Snapshots vs write log, fsync policies, use both
Q24 | Redis eviction policies | allkeys-lru, volatile-lru, allkeys-lfu, volatile-ttl
Q25 | Redis Pub/Sub and limitations | Fire-and-forget, no persistence, no consumer groups
Q26 | Redis pipelining | Batch commands, one round trip, not atomic
Q27 | Redis Cluster vs Sentinel | Sharding plus HA vs HA only, hash slots, failover
Q28 | Distributed locks in Redis | SET NX EX, Lua for atomic release, Redlock
Q29 | Rate limiting with Redis | INCR plus EXPIRE, sliding window with ZSet
Q30 | Cache stampede prevention | Mutex lock, XFetch, stale-while-revalidate
Q31 | Redis vs Memcached | Data types, persistence, pub/sub, cluster
Q32 | What is DynamoDB | Serverless, managed, auto-scaling, HTTP API
Q33 | Partition key vs sort key | Simple PK vs composite PK, range queries in partition
Q34 | GSI vs LSI | Different PK vs same PK, creation timing, consistency
Q35 | Query vs Scan | Targeted (efficient) vs full table (expensive)
Q36 | RCU and WCU calculation | 4KB per read, 1KB per write, consistency multiplier
Q37 | Consistency models | Eventually consistent (0.5 RCU) vs strongly consistent (1 RCU)
Q38 | Hot partition problem | Key sharding, suffix randomization, adaptive capacity
Q39 | DynamoDB Streams | Item-level changes, Lambda triggers, 24h retention
Q40 | DynamoDB Accelerator (DAX) | Microsecond reads, write-through, API-compatible
Q41 | Single-table design | Multiple entities one table, composite key patterns
Q42 | DynamoDB transactions | TransactWrite/Get, 100 items max, 2x WCU cost

💡 Five things to memorize before you walk in

The CAP theorem tradeoff and why every distributed system must be partition tolerant (Q3), the embed vs reference decision rule for MongoDB schema design (Q12), the Redis data structure each use case maps to, especially Sorted Sets for leaderboards and Streams for reliable delivery (Q22), the GSI vs LSI rule based on whether you query within or across entities (Q34), and the fact that Query (not Scan) should back every production DynamoDB read (Q35).

Frequently Asked Questions

What level of NoSQL knowledge do these 42 questions target?

This guide spans junior fundamentals through senior architecture. Questions 1 through 6 cover the theory every backend role expects: NoSQL types, CAP theorem, BASE vs ACID, and when NoSQL is the right call at all. Questions 7 through 42 go deep on MongoDB, Redis, and DynamoDB individually, the territory where mid-level and senior candidates are differentiated.

If you are early in your career, focus on Category 1 and the CRUD/data structure basics in Categories 2 and 3 first. If you are interviewing for a senior or staff backend or cloud role, the schema design questions (Q12, Q41), scaling questions (Q13, Q38), and operational questions (Q19, Q24, Q30) are where most of your prep time should go.

How do MongoDB, Redis, and DynamoDB compare, and how do I know which one an interviewer expects me to know?
Aspect | MongoDB | Redis | DynamoDB
Type | Document store | In-memory key-value / data structure store | Managed key-value / wide-column
Primary strength | Flexible schema, rich queries, aggregation | Microsecond latency, rich data structures | Serverless scale, predictable performance
Typical role | Primary application database | Cache, queue, session store, rate limiter | Primary database for AWS-native and serverless apps
Scaling model | Sharding (horizontal) | Cluster (hash slots) or Sentinel (HA) | Automatic, fully managed

Look at the job description for which AWS services, frameworks, or stack the company uses. A team running on AWS Lambda and API Gateway almost certainly expects DynamoDB fluency. A team with a traditional Node.js or Python backend more likely expects MongoDB as the primary store, often with Redis in front of it as a cache. Most senior backend roles expect working knowledge of at least two of the three, so do not skip a database entirely just because the job title does not mention it.

How do interviewers actually test CAP theorem and eventual consistency knowledge beyond asking for definitions?
  1. Definition check - state what Consistency, Availability, and Partition Tolerance each mean (Q3).
  2. The real tradeoff - explain why partition tolerance is non-negotiable in any distributed system, so the actual choice is between C and A (Q3).
  3. Classify a real database - given MongoDB, DynamoDB, or Cassandra, say whether it leans CP or AP and why (Q3).
  4. Eventual consistency scenario - walk through what a user sees if they read immediately after a write to a replica that has not caught up yet (Q6).
  5. The fix - name the mechanism for reading your own writes: strongly consistent reads in DynamoDB, or in MongoDB reading from the primary or using a causally consistent session with readConcern: "majority" and writeConcern: "majority" (Q6).
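
The stale-read scenario in step 4 can be reproduced with a toy model. This is a deliberately simplified sketch, assuming one primary, one replica, and manual replication. The class and method names are hypothetical and do not correspond to any database API.

```javascript
// Sketch: a primary with one lagging replica, to show a stale read.
// Class and method names are hypothetical.
class ReplicatedStore {
  constructor() {
    this.primary = new Map();
    this.replica = new Map();
    this.pending = []; // writes not yet applied to the replica
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]); // replication happens later
  }

  replicate() {
    // Apply queued writes to the replica in order
    for (const [key, value] of this.pending) this.replica.set(key, value);
    this.pending = [];
  }

  read(key, { consistent = false } = {}) {
    // A consistent read goes to the primary; otherwise the replica answers
    return (consistent ? this.primary : this.replica).get(key);
  }
}

const db = new ReplicatedStore();
db.write("profile:1001", "Alice");
db.replicate();
db.write("profile:1001", "Alice Smith"); // replica has not caught up yet

console.log(db.read("profile:1001"));                       // Alice (stale)
console.log(db.read("profile:1001", { consistent: true })); // Alice Smith
db.replicate();
console.log(db.read("profile:1001"));                       // Alice Smith
```

The user who just saved "Alice Smith" sees the old name on the default read, which is exactly the situation a consistent read is meant to avoid, at the cost of always going to the primary.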

ℹ Info

The strongest answers note that most modern databases are tunable rather than purely CP or AP: MongoDB and DynamoDB both let you choose the consistency level per operation. Saying this out loud signals you understand the theory is a spectrum, not a label.

How can I practice these MongoDB, Redis, and DynamoDB commands before an interview?

Run all three locally with Docker so you can type the exact commands from this guide and see real output:

bash
# MongoDB
docker run -d -p 27017:27017 --name mongo mongo:7

# Redis
docker run -d -p 6379:6379 --name redis redis:7

# DynamoDB Local
docker run -d -p 8000:8000 amazon/dynamodb-local

💡 Tip

Connect with mongosh for MongoDB, redis-cli for Redis, and the AWS CLI with --endpoint-url http://localhost:8000 for DynamoDB Local. Reproducing the aggregation pipeline (Q11), the ZSET leaderboard commands (Q22), and the single-table query (Q41) yourself builds the muscle memory that makes interview answers sound rehearsed in a good way.

Do I need to know single-table design for every DynamoDB interview?

Not always, but you should be able to explain it and why it exists (Q41). Some teams use DynamoDB with simple, multi-table designs for low-complexity workloads, and that is a legitimate choice an interviewer may want you to recognize as valid for smaller systems.

  • Know the why - DynamoDB has no joins, so co-locating related items lets one Query replace what would be multiple round trips.
  • Know the cost - single-table design requires defining every access pattern upfront, and changing the key schema later means migrating all data.
  • Know when to skip it - small services with few entity types and low request volume may be simpler and cheaper to reason about with separate tables.

What follow-up questions tend to come after these 42, once I have answered the basics?
  • Migration scenarios - "How would you migrate this collection from embedded documents to references without downtime?" builds on Q12.
  • Capacity planning - "Walk me through provisioning a DynamoDB table for this traffic pattern" builds on Q36 and Q38.
  • Failure scenarios - "What happens to in-flight writes if the MongoDB primary fails mid-transaction?" builds on Q14 and Q15.
  • Cache invalidation - "How do you keep a Redis cache in sync with MongoDB or DynamoDB writes?" builds on Q21, Q30, and Q39 (DynamoDB Streams as the invalidation trigger).
  • System design tie-in - expect these concepts to resurface inside a larger system design question, such as designing a leaderboard, a session store, or a multi-tenant SaaS data layer.

