42 NoSQL Database Interview Questions and Answers (2026)
42 NoSQL interview questions covering MongoDB, Redis, and DynamoDB: aggregation pipelines, data structures, GSI vs LSI, and CAP theorem. Updated for 2026.
NoSQL fluency is now a baseline expectation for backend and cloud roles, not a specialty. MongoDB powers document storage for thousands of production apps, Redis sits in front of almost every API as a cache or queue, and DynamoDB is the default data layer for serverless architectures on AWS. Most senior backend interviews expect working knowledge of at least two of these three.
These 42 questions cover what interviewers actually ask in 2026: foundational NoSQL theory, then database-specific internals, query patterns, and operational tradeoffs. If you're working with MongoDB through an ODM layer in NestJS, our NestJS interview questions guide covers the Mongoose side of that stack. And if this is part of a broader interview prep pass, our Node.js interview questions series pairs naturally with this one, since many of the code examples here use Node.js drivers.
Coming from a relational background? It helps to see schema design from the other side first. Our guide to Drizzle ORM migrations shows how rigid, versioned schema changes work in SQL, which is the exact tradeoff NoSQL databases are designed to avoid.
Category 1: Core NoSQL Concepts (Q1-Q6)
These questions establish whether you understand why NoSQL databases exist and what tradeoffs they make versus relational systems. Almost every NoSQL interview opens here, regardless of which specific database the role uses.
Q1. What is NoSQL and how does it differ from a relational database?
NoSQL (Not Only SQL) is a class of database systems that store and retrieve data using models other than the traditional relational table model. The name reflects that SQL is not the primary or only query mechanism, not that SQL is absent entirely.
Key differences from relational databases:
- Schema: Relational databases enforce a fixed schema. Every row in a table has the same columns and data types. NoSQL databases are typically schema-flexible: documents in the same collection can have different fields.
- Scaling: Relational databases scale vertically (bigger server) by default. NoSQL databases are designed to scale horizontally (more servers, data distributed across nodes).
- Data model: Relational data is normalized into tables with foreign keys. NoSQL data is often denormalized: related data is stored together to minimize the number of reads required.
- Transactions: Relational databases have had full ACID transactions for decades. Most NoSQL systems historically offered weaker consistency guarantees, though MongoDB now supports multi-document ACID transactions and DynamoDB supports transactions across items.
Use-case fit:
- Relational: financial systems, ERP, reporting, any domain with complex relationships, strict integrity requirements, and predictable query patterns.
- NoSQL: user profiles, product catalogs, session storage, real-time feeds, IoT data, content management, anything with variable schemas or extreme scale requirements.
Q2. What are the four main types of NoSQL databases?
- Document stores: store data as JSON-like documents. Each document is self-contained and can have nested arrays and objects, with no fixed schema. Best for content management, user profiles, and product catalogs. Examples: MongoDB, CouchDB, Amazon DocumentDB.
- Key-value stores: the simplest model. Each value is stored under a unique key, and the database knows nothing about the value's structure. Extremely fast for reads and writes. Best for session storage, caching, and real-time leaderboards. Examples: Redis, Amazon DynamoDB (can function as key-value), Memcached.
- Column-family stores (wide-column): group columns into families and let each row carry its own set of columns, so sparse rows are cheap to store. Data is partitioned by row key, which suits high-volume writes and key-based lookups. Best for time-series data, IoT telemetry, and other write-heavy workloads. Examples: Apache Cassandra, HBase. DynamoDB is sometimes listed here as well, although it is more commonly described as a key-value and document store.
- Graph databases: store data as nodes and edges representing entities and relationships, optimized for traversing connected data. Best for social networks, recommendation engines, fraud detection, and knowledge graphs. Examples: Neo4j, Amazon Neptune, ArangoDB.
Q3. What is the CAP theorem and how does it apply to NoSQL databases?
The CAP theorem states that a distributed database can guarantee at most two of the following three properties simultaneously.
- Consistency (C): every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A): every request receives a response (not necessarily the most recent data). The system stays operational even if some nodes fail.
- Partition Tolerance (P): the system continues to operate when network partitions occur (nodes cannot communicate with each other).
In any real distributed system, network partitions are unavoidable, so every system must be partition tolerant. The real choice is between C and A.
| System type | Behavior | Examples |
|---|---|---|
| CP (Consistency + Partition Tolerance) | Returns an error or timeout if data cannot be confirmed as consistent. Good for financial systems. | HBase, MongoDB (replica set default mode), ZooKeeper |
| AP (Availability + Partition Tolerance) | Always returns a response, possibly stale. Good for systems where availability matters more than perfect freshness. | DynamoDB (eventual consistency mode), Cassandra, CouchDB |
| CA (Consistency + Availability) | Only possible without partitions, meaning a single-node system. | Traditional RDBMS running on one server |
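The CP and AP rows above can be sketched with a toy replica that has lost contact with its peers. This is an illustrative model in plain JavaScript, not how any of the listed databases is implemented; `Replica`, `mode`, and `partitioned` are hypothetical names.

```javascript
// Toy model: one replica deciding how to answer a read during a partition.
class Replica {
  constructor(mode) {
    this.mode = mode;          // "CP" or "AP" (hypothetical labels)
    this.value = "v1";         // last value this replica has seen
    this.partitioned = false;  // true when it cannot reach its peers
  }

  read() {
    if (this.partitioned && this.mode === "CP") {
      // CP: refuse to answer rather than risk returning stale data.
      throw new Error("unavailable: cannot confirm latest value");
    }
    // AP (or no partition): answer with whatever is stored locally.
    return this.value;
  }
}

const cp = new Replica("CP");
const ap = new Replica("AP");
cp.partitioned = true;
ap.partitioned = true;

let cpResult;
try {
  cpResult = cp.read();
} catch (err) {
  cpResult = err.message;
}
const apResult = ap.read();

console.log(cpResult); // unavailable: cannot confirm latest value
console.log(apResult); // v1 (possibly stale)
```

The CP replica gives up availability to protect consistency, while the AP replica stays available and accepts that `v1` may no longer be the latest value.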
Q4. What is BASE and how does it differ from ACID?
ACID (Atomicity, Consistency, Isolation, Durability) is the property set for traditional relational database transactions. It prioritizes correctness.
BASE is the competing model used by many NoSQL systems.
- Basically Available: the system guarantees availability but may return stale or partially consistent data. The system responds even under partial failure.
- Soft State: the state of the system may change over time even without new input, as nodes catch up to each other through eventual consistency.
- Eventually Consistent: given enough time and no new updates, all nodes will converge to the same value. Reads after a write may return the old value for a period.
Practical example: if you post a like on a social media platform and a friend refreshes their feed half a second later, they might not see it yet. That is eventual consistency. The system is highly available (it did not fail) but not immediately consistent.
ACID is right for banking, payments, inventory systems, anything where reading stale data causes real harm. BASE is right for social feeds, analytics, recommendation engines, any system where brief staleness is acceptable and scale is paramount.
Q5. When should you choose NoSQL over a relational database?
Choose NoSQL when one or more of these conditions apply.
- The schema is genuinely variable or evolving rapidly: if different records have different fields and you do not want to run `ALTER TABLE` migrations for every feature change, a document store fits naturally.
- You need horizontal scale beyond what a single server can provide: if you expect hundreds of millions of records or millions of writes per second, NoSQL sharding (MongoDB, Cassandra) or managed auto-scaling (DynamoDB) are designed for this. Relational databases can scale, but it's harder and more expensive.
- Your data is a natural hierarchy: user documents with embedded addresses, preferences, and activity history map cleanly to a document store. Normalizing this into 8 relational tables adds join complexity without benefit.
- You need sub-millisecond access to frequently read data: use Redis as a cache layer in front of any database.
- Your access patterns are known and simple: DynamoDB's single-table design is extremely fast when you know your access patterns upfront and design keys accordingly.
Q6. What is eventual consistency and what does it mean in practice?
Eventual consistency means that if no new updates are made to a piece of data, all replicas of that data will eventually converge to the same value. There is no guarantee on how long convergence takes: it could be milliseconds or seconds.
In practice, for an AP system like DynamoDB in eventual consistency mode:
- You write a user's email update to the primary node.
- The write is acknowledged to your application immediately.
- Replica nodes receive the update asynchronously, typically within milliseconds, but not guaranteed.
- A read from a replica in the next few milliseconds may return the old email.
- After replication completes, all replicas return the new email.
For most web applications, this is fine. Users do not notice a 100ms delay before their profile update is visible everywhere.
For systems where it matters (reading your own writes, financial balances), use strongly consistent reads. DynamoDB offers this at 2x the read cost. MongoDB's read concern "majority" ensures you only read data confirmed by a majority of replica set members.
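The write, stale read, and convergence steps described above can be sketched with a toy store that replicates asynchronously. This is a plain-JavaScript simulation under simplifying assumptions (one primary, one replica, manual replication); `ToyStore` and its methods are hypothetical and not part of any database client.

```javascript
// Toy primary with one asynchronous replica (not a real database client).
class ToyStore {
  constructor() {
    this.primary = new Map();
    this.replica = new Map();
    this.pending = []; // writes acknowledged but not yet replicated
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]); // replication happens later
    return "ack";
  }

  // Eventually consistent read: served by the replica, may be stale.
  readEventual(key) {
    return this.replica.get(key);
  }

  // Strongly consistent read: served by the primary.
  readStrong(key) {
    return this.primary.get(key);
  }

  // Simulates the replication stream catching up.
  replicate() {
    for (const [key, value] of this.pending) this.replica.set(key, value);
    this.pending = [];
  }
}

const store = new ToyStore();
store.write("user:1:email", "old@example.com");
store.replicate();

store.write("user:1:email", "new@example.com");
const staleRead = store.readEventual("user:1:email"); // replica not caught up yet
const strongRead = store.readStrong("user:1:email");
store.replicate();
const convergedRead = store.readEventual("user:1:email");

console.log(staleRead);     // old@example.com
console.log(strongRead);    // new@example.com
console.log(convergedRead); // new@example.com
```

The strong read costs more in a real system because it must be served by (or confirmed with) the authoritative copy, which is the tradeoff behind DynamoDB's higher price for strongly consistent reads.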
Category 2: MongoDB (Q7-Q20)
MongoDB questions test both conceptual understanding (BSON, schema design, sharding) and hands-on fluency with its query language and aggregation pipeline. Expect a mix of whiteboard explanation and live query writing.
Q7. What is MongoDB? Explain its core data model.
MongoDB is an open-source, document-oriented NoSQL database. Data is stored as BSON (Binary JSON) documents in collections. A collection is analogous to a SQL table. A document is analogous to a row, but each document can have a completely different structure from other documents in the same collection.
Core concepts:
- Database: a container for collections. One MongoDB instance can have many databases.
- Collection: a group of documents. No fixed schema is enforced unless you use schema validation.
- Document: a BSON object with field-value pairs. Supports nested documents and arrays. Maximum document size is 16MB.
- `_id`: every document has a unique `_id` field. If you do not provide one, MongoDB generates an `ObjectId` automatically. An `ObjectId` is a 12-byte value made up of a 4-byte timestamp, a 5-byte random value unique to the machine and process, and a 3-byte incrementing counter, so ObjectIds sort roughly by creation time.
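Because the first 4 bytes of an ObjectId are a timestamp in seconds since the Unix epoch, the creation time can be recovered from the hex string alone. A minimal sketch in plain JavaScript follows; `objectIdToDate` is a hypothetical helper (the drivers expose the same information through `ObjectId.getTimestamp()`).

```javascript
// Extract the creation time from an ObjectId's 24-character hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```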
// Example document in a 'products' collection
{
_id: ObjectId("507f1f77bcf86cd799439011"),
name: "Wireless Keyboard",
brand: "Logitech",
price: 79.99,
tags: ["electronics", "peripherals", "wireless"],
specs: {
weight: "450g",
connectivity: "Bluetooth 5.0",
batteryLife: "24 months"
},
variants: [
{ color: "black", sku: "MK-BLK-US" },
{ color: "white", sku: "MK-WHT-US" }
],
inStock: true,
createdAt: ISODate("2026-03-15T10:30:00Z")
}
Q8. What is BSON and why does MongoDB use it instead of plain JSON?
BSON (Binary JSON) is a binary-encoded serialization of JSON-like documents. MongoDB stores and transmits data as BSON rather than text JSON for several reasons.
- Speed: BSON is faster to parse than text JSON because field names and lengths are encoded in binary, allowing the parser to jump to specific fields without scanning the entire document.
- Additional types: plain JSON supports only string, number, boolean, null, array, and object. BSON adds Date (stored as int64 milliseconds), ObjectId, Binary data (for file chunks), Int32, Int64, Decimal128, Regular expression, and Timestamp. These types are critical for database operations.
- Traversability: BSON documents include length prefixes that allow the database to skip over fields efficiently without deserializing them.
- Size: BSON can be slightly larger than JSON for small documents but is more efficient for large documents with many fields.
In your application code, you work with JSON or your language's native objects. The MongoDB driver handles the JSON-to-BSON conversion automatically.
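To make the length prefixes and type tags concrete, here is a minimal sketch of a BSON encoder that handles only flat documents with string and int32 values. It follows the public BSON layout (int32 document length, typed elements, `0x00` terminator); `encodeBson` and `int32le` are hypothetical helpers, and real drivers cover many more types.

```javascript
// Minimal BSON encoder for flat documents with string and int32 values only.
function encodeBson(doc) {
  const utf8 = new TextEncoder();
  const parts = [];

  for (const [key, value] of Object.entries(doc)) {
    const name = [...utf8.encode(key), 0x00]; // field name as a C string
    if (typeof value === "string") {
      const bytes = utf8.encode(value);
      const len = int32le(bytes.length + 1); // length includes the trailing NUL
      parts.push(0x02, ...name, ...len, ...bytes, 0x00); // 0x02 = string
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      parts.push(0x10, ...name, ...int32le(value)); // 0x10 = int32
    } else {
      throw new Error(`unsupported value for field "${key}"`);
    }
  }

  // Document = int32 total length + elements + 0x00 terminator.
  const total = 4 + parts.length + 1;
  return Uint8Array.from([...int32le(total), ...parts, 0x00]);
}

// Encode a 32-bit signed integer as 4 little-endian bytes.
function int32le(n) {
  const view = new DataView(new ArrayBuffer(4));
  view.setInt32(0, n, true);
  return [...new Uint8Array(view.buffer)];
}

const bytes = encodeBson({ hello: "world" });
const hex = [...bytes].map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(bytes.length); // 22
console.log(hex);
// 16 00 00 00 02 68 65 6c 6c 6f 00 06 00 00 00 77 6f 72 6c 64 00 00
```

The leading `16 00 00 00` is the total document length (22 bytes), and the `06 00 00 00` before `world` is the string length. Those prefixes are what let a reader skip a field without parsing its contents.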
Q9. How do you perform CRUD operations in MongoDB?
MongoDB uses a rich query language expressed as JSON-like objects.
// INSERT
// Insert one document
db.users.insertOne({
name: "Alice Chen",
email: "alice@example.com",
age: 28,
roles: ["user", "editor"]
});
// Insert many documents
db.users.insertMany([
{ name: "Bob Smith", email: "bob@example.com", age: 34 },
{ name: "Carol Lee", email: "carol@example.com", age: 25 }
]);
// READ
// Find all documents
db.users.find({});
// Find with filter, projection (include only specific fields)
db.users.find(
{ age: { $gte: 25 } }, // filter: age >= 25
{ name: 1, email: 1, _id: 0 } // projection: include name, email; exclude _id
);
// Find one document
db.users.findOne({ email: "alice@example.com" });
// UPDATE
// Update one document
db.users.updateOne(
{ email: "alice@example.com" }, // filter
{ $set: { age: 29 }, $addToSet: { roles: "admin" } } // update operators
);
// Update many
db.users.updateMany(
{ age: { $lt: 18 } },
{ $set: { status: "minor" } }
);
// Upsert: insert if not found, update if found
db.users.updateOne(
{ email: "new@example.com" },
{ $set: { name: "New User", age: 22 } },
{ upsert: true }
);
// DELETE
db.users.deleteOne({ email: "bob@example.com" });
db.users.deleteMany({ status: "inactive", lastLogin: { $lt: new Date("2024-01-01") } });
Common query operators:
- $eq, $ne, $gt, $gte, $lt, $lte: comparison
- $in, $nin: value in / not in array
- $and, $or, $not, $nor: logical operators
- $exists: field exists check
- $regex: pattern matching
- $elemMatch: match array element conditions
Q10. How does indexing work in MongoDB and what index types exist?
MongoDB supports several index types, all built on B-Tree structures unless noted otherwise.
Single field index, the most common, created on one field:
db.orders.createIndex({ userId: 1 }); // ascending
db.orders.createIndex({ createdAt: -1 }); // descending (for sort queries)
Compound index, covers multiple fields. Field order matters (left-prefix rule, same as SQL), so put the field that most queries filter on first.
db.orders.createIndex({ userId: 1, status: 1, createdAt: -1 });
// Supports queries on: userId | userId+status | userId+status+createdAt
Unique index, enforces uniqueness. The primary key (_id) has a built-in unique index. Create one on email, username, etc.
db.users.createIndex({ email: 1 }, { unique: true });
Sparse index, only indexes documents that have the field. Saves space for optional fields that many documents omit.
db.users.createIndex({ phoneNumber: 1 }, { sparse: true });
TTL index (Time-To-Live), automatically removes documents after a specified time. Used for session data, temporary caches, and log retention.
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });
// Documents are deleted 24 hours after their createdAt timestamp
Text index, supports full-text search across string fields.
db.articles.createIndex({ title: "text", body: "text" });
db.articles.find({ $text: { $search: "mongodb performance" } });
Geospatial index (2dsphere), for geographic queries (near, within polygon).
db.locations.createIndex({ coordinates: "2dsphere" });
db.locations.find({
coordinates: {
$near: { $geometry: { type: "Point", coordinates: [-73.9857, 40.7484] },
$maxDistance: 1000 } // within 1km
}
});
Check index usage and efficiency with explain().
db.orders.find({ userId: "abc123" }).explain("executionStats");
// Look for: IXSCAN (good) vs COLLSCAN (bad, full collection scan)
Q11. Explain the MongoDB aggregation pipeline.
The aggregation pipeline processes documents through a sequence of stages. Each stage transforms the documents and passes them to the next stage. It is the preferred way to do data transformation, grouping, and analytics in MongoDB, replacing the older Map-Reduce approach.
Core pipeline stages:
db.orders.aggregate([
// $match: filter documents (like WHERE in SQL), put early to reduce work
{ $match: {
status: "completed",
createdAt: { $gte: ISODate("2026-01-01") }
}},
// $lookup: left join with another collection
{ $lookup: {
from: "users",
localField: "userId",
foreignField: "_id",
as: "user"
}},
// $unwind: deconstruct an array field into individual documents
{ $unwind: "$user" },
// $group: aggregate documents (like GROUP BY in SQL)
{ $group: {
_id: "$user.country", // group by country
totalRevenue: { $sum: "$total" }, // sum order totals
orderCount: { $sum: 1 }, // count orders
avgOrderValue: { $avg: "$total" } // average order value
}},
// $addFields / $project: add or reshape fields
{ $addFields: {
revenuePerOrder: { $divide: ["$totalRevenue", "$orderCount"] }
}},
// $sort: sort results
{ $sort: { totalRevenue: -1 } },
// $limit: limit output count
{ $limit: 10 },
// $skip: skip N results (for pagination)
// { $skip: 20 },
// $out: write results to a new collection
// { $out: "country_revenue_report" }
]);
Other useful stages:
- $count: count documents
- $facet: run multiple sub-pipelines on the same data simultaneously
- $bucket / $bucketAuto: group into buckets/ranges (histograms)
- $replaceRoot: replace the root document with a nested field
- $merge: write results back to a collection (incremental materialized views)
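To make the stage-by-stage model concrete, the sketch below mimics `$match`, `$group`, and `$sort` on an in-memory array. It is plain JavaScript rather than MongoDB code; `match`, `groupSum`, `sortDesc`, and `runPipeline` are hypothetical helpers that only illustrate how each stage consumes the previous stage's output.

```javascript
// Plain-JavaScript stand-ins for three pipeline stages. Each stage is a
// function from an array of documents to a new array of documents.
const match = (predicate) => (docs) => docs.filter(predicate);

const groupSum = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const entry = totals.get(key) ?? { _id: key, total: 0, count: 0 };
    entry.total += doc[sumField];
    entry.count += 1;
    totals.set(key, entry);
  }
  return [...totals.values()];
};

const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);

// Run the stages in order, feeding each stage's output to the next.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const orders = [
  { country: "US", status: "completed", total: 100 },
  { country: "DE", status: "completed", total: 250 },
  { country: "US", status: "pending", total: 999 },
  { country: "US", status: "completed", total: 50 },
];

const report = runPipeline(orders, [
  match((o) => o.status === "completed"), // like $match
  groupSum("country", "total"),           // like $group with $sum
  sortDesc("total"),                      // like $sort: { total: -1 }
]);

console.log(report);
// [ { _id: 'DE', total: 250, count: 1 }, { _id: 'US', total: 150, count: 2 } ]
```

Filtering first means the grouping step touches three documents instead of four, which is the same reason `$match` belongs early in a real pipeline.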
Q12. What is the difference between embedded documents and references in MongoDB schema design?
This is the most fundamental MongoDB schema design question. There is no single right answer; it depends on your access patterns.
Embedding (denormalization) stores related data inside the parent document.
// User document with embedded addresses
{
_id: ObjectId("..."),
name: "Alice",
addresses: [
{ type: "home", city: "New York", zip: "10001" },
{ type: "billing", city: "Boston", zip: "02134" }
]
}
Use embedding when:
- The embedded data is always read together with the parent (avoids extra queries)
- The embedded data is not shared between multiple documents
- The embedded array has a bounded size (not growing unboundedly)
- You need atomic updates across parent and child in one write
Referencing (normalization) stores a reference (ObjectId) to another collection, then uses $lookup to join.
// Order references user by ID
{ _id: ObjectId("..."), userId: ObjectId("..."), total: 49.99 }
// Lookup user when needed
db.orders.aggregate([
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } }
]);
Use referencing when:
- The related data is large and not always needed
- The related document is shared by many parents (many-to-many)
- The child data grows unboundedly (comments, events, log entries)
- You need to update the child data independently and reflect changes everywhere
Q13. What is sharding in MongoDB and how does it work?
Sharding is MongoDB's horizontal scaling mechanism. It distributes data across multiple servers (shards). Each shard holds a subset of the data. Together, all shards contain the full dataset.
Architecture:
- Shards: each shard is a replica set holding a portion of the data
- Mongos (query router): receives application queries, determines which shard(s) hold the relevant data, and routes accordingly
- Config servers: store cluster metadata, including which shard holds which data ranges
Shard key: the field (or compound set of fields) that MongoDB uses to partition data. Documents are distributed either by ranges of shard key values or by a hash of the shard key.
// Enable sharding on a database
sh.enableSharding("ecommerce");
// Shard a collection by a hashed key (even distribution)
sh.shardCollection("ecommerce.orders", { userId: "hashed" });
// Shard by range: supports efficient range queries by date, but a monotonically
// increasing key such as a timestamp sends all new writes to a single shard
sh.shardCollection("ecommerce.events", { timestamp: 1 });
Shard key selection is critical:
- High cardinality: many distinct values to spread data evenly
- High write distribution: avoid "hotspot" shards that receive most writes
- Query alignment: include the shard key in most queries so `mongos` can route to a single shard instead of broadcasting to all shards
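The hotspot risk can be illustrated with a small simulation that places a burst of increasing timestamps using range partitioning and then hash partitioning. This is a sketch under stated assumptions: four shards, fixed range boundaries, and `fnv1a` as a stand-in hash (it is not the function MongoDB uses for hashed shard keys). All names are hypothetical.

```javascript
// Compare how range-based and hash-based partitioning place a burst of
// writes whose keys keep increasing (for example, timestamps).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const SHARDS = 4;
// Range boundaries: shard i owns keys in [i * 1000, (i + 1) * 1000),
// and the last shard also owns everything above.
const rangeShard = (ts) => Math.min(Math.floor(ts / 1000), SHARDS - 1);
const hashShard = (ts) => fnv1a(String(ts)) % SHARDS;

function distribution(keys, pickShard) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[pickShard(k)] += 1;
  return counts;
}

// 1,000 new events, all newer than any existing range boundary.
const recent = Array.from({ length: 1000 }, (_, i) => 5000 + i);

const byRange = distribution(recent, rangeShard);
const byHash = distribution(recent, hashShard);

console.log("range:", byRange); // every write lands on the last shard
console.log("hash: ", byHash);  // writes are spread across all shards
```

Range partitioning keeps neighbouring keys together, which helps range queries but concentrates new writes. Hashing spreads the writes at the cost of turning range queries into scatter-gather operations.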
Q14. What is a replica set in MongoDB and what roles do nodes play?
A replica set is a group of MongoDB instances (typically 3 or more) that maintain the same dataset. It provides high availability and data redundancy.
Node roles:
- Primary: receives all write operations. There is exactly one primary at any time. Replicates writes to secondaries via the oplog (operations log).
- Secondary: maintains a copy of the primary's data by continuously applying operations from the primary's oplog. Can serve read operations if configured to do so. Participates in elections.
- Arbiter (optional): holds no data. Participates in elections to break ties. Used to achieve an odd number of voting members without storing an extra copy of data.
Automatic failover: if the primary becomes unavailable, the remaining nodes hold an election. The secondary with the most up-to-date oplog wins and becomes the new primary. This typically completes in under 12 seconds.
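An election needs a majority of voting members, which is why odd member counts are preferred. The arithmetic can be sketched as follows; `majority` and `faultTolerance` are hypothetical helper names.

```javascript
// Votes needed to elect a primary, and how many voting members can be
// unavailable while an election can still succeed.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} down`);
}
// 3 members: majority 2, tolerates 1 down
// 4 members: majority 3, tolerates 1 down
// 5 members: majority 3, tolerates 2 down
```

Going from 3 to 4 members adds a copy of the data but no extra fault tolerance, which is the case an arbiter or a fifth member is meant to address.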
// Connect to a replica set from application
const client = new MongoClient(
"mongodb://node1:27017,node2:27017,node3:27017/mydb?replicaSet=rs0",
{ readPreference: "secondaryPreferred" } // reads go to secondaries when possible
);
// Read preferences:
// primary (default): always read from primary (strongly consistent)
// primaryPreferred: primary if available, else secondary
// secondary: always secondary (may be stale)
// secondaryPreferred: secondary if available, else primary
// nearest: lowest network latency
Q15. Does MongoDB support ACID transactions? How do they work?
Yes. MongoDB has supported ACID transactions since version 4.0 (for replica sets) and 4.2 (for sharded clusters). Before 4.0, atomicity was guaranteed only within a single document.
const session = client.startSession();
try {
session.startTransaction({
readConcern: { level: "snapshot" }, // reads see a consistent snapshot
writeConcern: { w: "majority" } // write confirmed by majority of nodes
});
// Both operations succeed or both are rolled back
await ordersCollection.insertOne(
{ userId, items, total: 149.99, status: "pending" },
{ session }
);
await inventoryCollection.updateMany(
{ _id: { $in: itemIds } },
{ $inc: { stock: -1 } },
{ session }
);
await session.commitTransaction();
console.log("Transaction committed");
} catch (err) {
await session.abortTransaction();
console.error("Transaction aborted:", err.message);
} finally {
await session.endSession();
}
Q16. How does the WiredTiger storage engine work?
WiredTiger has been MongoDB's default storage engine since version 3.2. It replaced MMAPv1 and brought significant improvements.
- Document-level concurrency: WiredTiger uses optimistic concurrency control at the document level. Multiple writers can modify different documents in the same collection simultaneously without blocking each other. MMAPv1 only had collection-level locking, creating a bottleneck.
- Compression: WiredTiger compresses data at rest using snappy (default) or zlib. Index data is also compressed, reducing disk I/O and storage costs significantly.
- Write-ahead log (journal): all writes go to the journal first. If MongoDB crashes, the journal is replayed on startup to recover data written since the last checkpoint. Checkpoints occur every 60 seconds by default.
- Cache: WiredTiger maintains an in-memory cache. The default size is the larger of 50% of (RAM minus 1 GB) and 256 MB. Frequently accessed data stays in cache. Eviction runs in the background to keep cache usage within bounds.
- MVCC (Multi-Version Concurrency Control): WiredTiger uses MVCC for reads, similar to PostgreSQL. Readers see a consistent snapshot without blocking writers.
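As a quick check on the cache sizing rule, the default can be computed as the larger of 50% of (RAM minus 1 GB) and 256 MB. The helper below is a hypothetical sketch that works in gigabytes and ignores any explicit cache size set in configuration.

```javascript
// Default WiredTiger internal cache size, in GB:
// the larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultCacheSizeGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

console.log(defaultCacheSizeGB(16)); // 7.5
console.log(defaultCacheSizeGB(4));  // 1.5
console.log(defaultCacheSizeGB(1));  // 0.25
```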
Q17. What is a TTL index and when do you use it?
A TTL (Time-To-Live) index tells MongoDB to automatically delete documents after a specified duration. It is a single-field index on a Date field.
// Automatically delete sessions 7 days after creation
db.sessions.createIndex(
{ createdAt: 1 },
{ expireAfterSeconds: 604800 } // 7 days in seconds
);
// Session document
db.sessions.insertOne({
sessionId: "abc123",
userId: "user456",
data: { cart: [], preferences: {} },
createdAt: new Date() // TTL index field
});
// You can also set a specific expiry time per document
db.notifications.createIndex(
{ expiresAt: 1 },
{ expireAfterSeconds: 0 } // document expires exactly at the date stored in expiresAt
);
db.notifications.insertOne({
message: "Your trial expires soon",
userId: "user789",
expiresAt: new Date("2026-12-31T23:59:59Z") // this specific document expires here
});
MongoDB runs a background task every 60 seconds to delete expired documents. Deletion is not instantaneous: documents typically persist for up to 60 seconds after expiry, and longer when the server is under heavy load. This is acceptable for session cleanup, cache expiration, and temporary data, but not for exact-time deletion requirements.
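The expiry rule itself is simple arithmetic: a document becomes eligible for deletion once the indexed date plus `expireAfterSeconds` has passed. A minimal sketch, with hypothetical helper names `ttlEligibleAt` and `isExpired`:

```javascript
// When does a document become eligible for TTL deletion?
// Eligibility time = indexed date field + expireAfterSeconds.
function ttlEligibleAt(indexedDate, expireAfterSeconds) {
  return new Date(indexedDate.getTime() + expireAfterSeconds * 1000);
}

// Would a background pass running at `now` treat this document as expired?
function isExpired(indexedDate, expireAfterSeconds, now) {
  return now.getTime() >= ttlEligibleAt(indexedDate, expireAfterSeconds).getTime();
}

const createdAt = new Date("2026-03-15T10:30:00Z");
const sevenDays = 604800;

console.log(ttlEligibleAt(createdAt, sevenDays).toISOString());
// 2026-03-22T10:30:00.000Z
console.log(isExpired(createdAt, sevenDays, new Date("2026-03-22T10:29:59Z"))); // false
console.log(isExpired(createdAt, sevenDays, new Date("2026-03-22T10:30:30Z"))); // true
```

With `expireAfterSeconds: 0`, the eligibility time is just the date stored in the field, which is how the per-document `expiresAt` pattern works.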
Q18. What is a Change Stream in MongoDB?
A Change Stream is a real-time stream of change events (insertions, updates, deletions, collection drops) on a collection, database, or entire cluster. Change Streams use MongoDB's replication oplog under the hood and require a replica set or sharded cluster.
// Watch a collection for all changes
const changeStream = db.collection("orders").watch();
changeStream.on("change", (event) => {
console.log("Change type:", event.operationType); // insert, update, delete, replace
console.log("Document key:", event.documentKey._id);
console.log("Full document:", event.fullDocument); // present for insert and replace by default
console.log("Update description:", event.updateDescription); // for updates
});
// Filter to only watch for new completed orders
const pipeline = [
{ $match: {
operationType: "update",
"updateDescription.updatedFields.status": "completed"
}}
];
const filteredStream = db.collection("orders").watch(pipeline, {
fullDocument: "updateLookup" // include full document on update events
});
// Resume a stream after reconnection using a resume token
// Every change event includes a _id (resume token)
const resumeToken = lastProcessedEvent._id;
const resumedStream = collection.watch([], { resumeAfter: resumeToken });
Use cases: invalidating application caches when data changes, powering live dashboards, triggering notifications, feeding Kafka or other event streams, and building audit logs.
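For the cache invalidation use case, the handler logic can be sketched independently of the driver. The function below applies change events to an in-memory `Map`; the event shape mirrors the fields used above (`operationType`, `documentKey`, `fullDocument`), while `applyChange` and the cache itself are hypothetical.

```javascript
// Keep an in-memory cache in sync by applying change events to it.
function applyChange(cache, event) {
  const id = String(event.documentKey._id);
  switch (event.operationType) {
    case "insert":
    case "replace":
      cache.set(id, event.fullDocument);
      break;
    case "update":
      // fullDocument is only present on updates when the stream was opened
      // with fullDocument: "updateLookup"; otherwise drop the stale entry.
      if (event.fullDocument) cache.set(id, event.fullDocument);
      else cache.delete(id);
      break;
    case "delete":
      cache.delete(id);
      break;
    default:
      break; // other event types are ignored in this sketch
  }
  return cache;
}

const cache = new Map();
applyChange(cache, {
  operationType: "insert",
  documentKey: { _id: "o1" },
  fullDocument: { _id: "o1", status: "pending" },
});
applyChange(cache, {
  operationType: "update",
  documentKey: { _id: "o1" },
  fullDocument: { _id: "o1", status: "completed" },
});
applyChange(cache, { operationType: "delete", documentKey: { _id: "o2" } });

console.log(cache.get("o1").status); // completed
console.log(cache.size);             // 1
```

In a real service this function would be called from the `"change"` event listener, with the resume token persisted after each event so processing can continue after a restart.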
Q19. How do you diagnose and optimize a slow MongoDB query?
Step 1: use explain() to inspect the query plan.
// Run explain with executionStats for real numbers
db.orders.find({ userId: "abc123", status: "pending" })
.sort({ createdAt: -1 })
.explain("executionStats");
// Key things to look for in the output:
// winningPlan.stage: "COLLSCAN" = bad (no index), "IXSCAN" = good (using index)
// executionStats.totalDocsExamined vs nReturned:
// If examined is much greater than returned, the index is not selective enough
// executionStats.executionTimeMillis: how long the query took
Step 2: identify the issue and fix it.
- COLLSCAN on a frequently queried field: add an index (see code below)
- Too many documents examined: make the index more selective by adding more fields to a compound index
- Slow sort without index: add the sort field to the index (sort direction must match)
- Large document projection: use projection to return only needed fields (see code below)
// Fix a COLLSCAN by adding a compound index
db.orders.createIndex({ userId: 1, status: 1, createdAt: -1 });
// Use projection to return only needed fields
db.orders.find({ userId: "abc123" }, { _id: 1, total: 1, status: 1 });
// Returns only 3 fields instead of the entire document
Step 3: monitor in production with MongoDB Atlas Performance Advisor or the database profiler.
// Enable profiler to log slow queries (over 100ms)
db.setProfilingLevel(1, { slowms: 100 });
// View profiled queries
db.system.profile.find().sort({ ts: -1 }).limit(10);
Q20. What update operators does MongoDB support?
// $set: set field values
db.users.updateOne({ _id: id }, { $set: { name: "Updated Name", age: 30 } });
// $unset: remove a field
db.users.updateOne({ _id: id }, { $unset: { temporaryField: "" } });
// $inc: increment a numeric field
db.products.updateOne({ _id: id }, { $inc: { viewCount: 1, stock: -2 } });
// $push: append to an array
db.posts.updateOne({ _id: id }, { $push: { comments: { text: "Great post!", userId: "u1" } } });
// $addToSet: append to array only if value is not already present
db.users.updateOne({ _id: id }, { $addToSet: { tags: "verified" } });
// $pull: remove matching elements from array
db.users.updateOne({ _id: id }, { $pull: { tags: "spam" } });
// $pop: remove first (-1) or last (1) element of array
db.lists.updateOne({ _id: id }, { $pop: { items: 1 } }); // remove last
// $rename: rename a field
db.users.updateMany({}, { $rename: { "fullName": "name" } });
// $currentDate: set field to current date
db.orders.updateOne({ _id: id }, { $currentDate: { updatedAt: true } });
// $min / $max: update only if new value is lower/higher than current
db.scores.updateOne({ userId: id }, { $max: { highScore: 9500 } });
// Only updates if 9500 is greater than the current highScore
Category 3: Redis (Q21-Q31)
Redis questions probe two things: whether you know its data structures well enough to pick the right one for a given problem, and whether you understand its operational tradeoffs around persistence, eviction, and clustering. Expect rapid-fire "what would you use for X" scenarios.
Q21. What is Redis and what are its primary use cases?
Redis (Remote Dictionary Server) is an open-source, in-memory data structure store. All data lives in RAM, which is why reads and writes take microseconds. It supports persistence, replication, clustering, and over a dozen data structure types.
Primary use cases:
- Caching: store expensive database query results, API responses, or computed values in Redis with a TTL. Subsequent requests hit Redis instead of the database, reducing latency from milliseconds to microseconds.
- Session storage: store user session data with automatic expiry. Scales horizontally across multiple application servers without sticky sessions.
- Rate limiting: use Redis atomic increment operations to count requests per IP or user within a time window.
- Pub/Sub messaging: publish messages to channels so all subscribers receive them in real time. Used for live notifications, chat, and event broadcasting.
- Leaderboards and counters: Redis Sorted Sets are perfect for real-time rankings with O(log n) insertion and range queries.
- Job queues: use Redis Lists as FIFO queues. Libraries like BullMQ (Node.js) and Celery (Python) use Redis as their queue backend.
- Distributed locks: implement mutex locks across multiple application servers using atomic Redis operations (`SETNX` or the Redlock algorithm).
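The rate limiting use case is often implemented as a fixed-window counter. The sketch below shows the logic with a `Map` standing in for Redis: the counter update plays the role of an atomic `INCR`, and the window-scoped key plays the role of a key that would be given a TTL with `EXPIRE`. `FixedWindowLimiter` and the key format are hypothetical.

```javascript
// Fixed-window rate limiter with an in-memory Map standing in for Redis.
class FixedWindowLimiter {
  constructor(limit, windowSeconds) {
    this.limit = limit;
    this.windowSeconds = windowSeconds;
    this.counters = new Map(); // key -> request count in that window
  }

  allow(clientId, nowMs) {
    // All requests in the same window share one counter key.
    const window = Math.floor(nowMs / 1000 / this.windowSeconds);
    const key = `rate:${clientId}:${window}`;
    const count = (this.counters.get(key) ?? 0) + 1; // INCR key
    this.counters.set(key, count);
    return count <= this.limit;
  }
}

const limiter = new FixedWindowLimiter(3, 60);
const t0 = 1_700_000_040_000; // arbitrary timestamp (ms) on a window boundary

const results = [0, 1, 2, 3].map((i) => limiter.allow("1.2.3.4", t0 + i * 1000));
const nextWindow = limiter.allow("1.2.3.4", t0 + 60_000);

console.log(results);    // [ true, true, true, false ]
console.log(nextWindow); // true
```

Against a real Redis instance the increment must stay atomic across application servers, which is what `INCR` provides. A fixed window also allows short bursts around window boundaries, so sliding-window variants are a common refinement.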
Q22. What are Redis data structures and what is each used for?
Redis is not just a key-value store. The "value" can be one of many rich data types.
String: the most basic type. Can store text, serialized JSON, integers, or binary data. Integers can be incremented atomically.
SET user:1001:name "Alice Chen"
GET user:1001:name
SET page:views 0
INCR page:views # atomic increment, becomes 1
INCRBY page:views 10 # becomes 11
EXPIRE user:1001:name 3600 # expire in 1 hour
Hash: a map of field-value pairs inside a single key. Best for storing objects (user profiles, product details). More memory-efficient than separate string keys.
HSET user:1001 name "Alice" email "alice@example.com" age 28
HGET user:1001 email # "alice@example.com"
HGETALL user:1001 # all fields and values
HINCRBY user:1001 loginCount 1 # increment a numeric hash field
List: an ordered collection of strings (doubly linked list). Supports push/pop from both ends. Used for queues, timelines, activity feeds.
RPUSH jobs:queue "job:1" "job:2" "job:3" # push to right (tail)
LPOP jobs:queue # pop from left (head), FIFO queue
LRANGE timeline:user:1001 0 9 # get first 10 items
LLEN jobs:queue # list length
Set: unordered collection of unique strings. Supports set operations (union, intersection, difference). Used for tags, followers, unique visitors.
SADD article:1:tags "mongodb" "nosql" "database"
SMEMBERS article:1:tags
SISMEMBER article:1:tags "nosql" # 1 (exists) or 0 (not exists)
SINTER user:1:follows user:2:follows # mutual follows
SUNIONSTORE result:tags tag:A tag:B # store union of two sets
Sorted Set (ZSet): like a set but every member has a score (float). Members are sorted by score, with O(log n) operations. The go-to structure for leaderboards and ranking systems.
ZADD leaderboard 9500 "alice" 8750 "bob" 9200 "carol"
ZRANK leaderboard "alice" # 2 (ascending order: rank 0 is the lowest score, and alice has the highest of the three)
ZREVRANK leaderboard "alice" # 0 (descending order: the usual leaderboard position)
ZRANGE leaderboard 0 2 REV WITHSCORES # top 3 with scores
ZINCRBY leaderboard 250 "bob" # add 250 to bob's score
Bitmap: treat a string as a bit array. Extremely memory-efficient for boolean tracking (daily active users, feature flags per user ID).
SETBIT users:active:2026-06-08 1001 1 # user 1001 was active today
GETBIT users:active:2026-06-08 1001 # 1
BITCOUNT users:active:2026-06-08 # total active users today
HyperLogLog: a probabilistic data structure for cardinality estimation. Uses about 12KB of memory regardless of the number of unique elements, with roughly a 0.81% error rate.
PFADD page:visitors:today user1 user2 user3 user4
PFCOUNT page:visitors:today # estimated unique visitors
Streams: an append-only log of messages. Persistent, with consumer group support. More powerful than Pub/Sub for reliable message delivery.
XADD events:orders * userId 1001 total 49.99 status pending
XREAD COUNT 10 STREAMS events:orders 0
Q23. What are the Redis persistence options and how do you choose?
Redis supports two persistence mechanisms: RDB and AOF. You can use one, both, or neither (pure in-memory, data lost on restart).
RDB (Redis Database Backup / Snapshots):
- Periodically saves a point-in-time snapshot of the entire dataset to an .rdb file on disk
- Configured by save rules: save 900 1 saves if at least 1 key changed in 900 seconds
- A fork process handles the snapshot; the main process continues serving traffic
- Fast restarts: loading an RDB file is faster than replaying an AOF log
- Risk: data written since the last snapshot is lost on crash
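The way save rules combine can be sketched in a few lines. This is an illustrative model only, not Redis source code: shouldSnapshot and the shape of the rules array are hypothetical names, and the sketch assumes a snapshot fires as soon as any single rule has both its time threshold and its change threshold met.

```javascript
// Hypothetical helper: returns true when any `save <seconds> <changes>` rule is satisfied.
// rules is an array of { seconds, changes } objects (an assumed shape for this sketch).
function shouldSnapshot(rules, secondsSinceLastSave, changesSinceLastSave) {
  return rules.some(
    (rule) =>
      secondsSinceLastSave >= rule.seconds &&
      changesSinceLastSave >= rule.changes
  );
}

const rules = [
  { seconds: 900, changes: 1 },
  { seconds: 300, changes: 10 },
  { seconds: 60, changes: 10000 },
];

console.log(shouldSnapshot(rules, 120, 50));    // false: no rule has both thresholds met
console.log(shouldSnapshot(rules, 320, 50));    // true: 300 seconds passed with at least 10 changes
console.log(shouldSnapshot(rules, 61, 20000));  // true: 60 seconds passed with at least 10000 changes
```

The practical consequence is that a busy instance snapshots often and a quiet one rarely, which is why the worst-case data loss window depends on write volume.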
# redis.conf (comments go on their own lines; the config parser does not accept trailing comments)
# save if 1 change in 15 minutes
save 900 1
# save if 10 changes in 5 minutes
save 300 10
# save if 10000 changes in 1 minute
save 60 10000
AOF (Append Only File):
- Logs every write operation to an .aof file
- On restart, Redis replays the AOF to reconstruct the dataset
- Three fsync policies: always (fsync after every write: safest, slowest at roughly 1000 writes/sec), everysec (fsync every second: the default, at most 1 second of data loss), and no (the OS decides when to fsync: fastest, most data loss risk)
- AOF files grow continuously and are compacted via BGREWRITEAOF
# redis.conf
appendonly yes
# everysec is the recommended balance of safety and performance
appendfsync everysec
Both (recommended for production): run with both RDB and AOF enabled. AOF provides the durability guarantee, RDB provides faster restarts. Redis uses the AOF for recovery when both are present.
No persistence: pure cache use case where losing data on restart is acceptable. Highest performance, lowest disk I/O.
Q24. What are Redis eviction policies and how do you choose?
When Redis reaches its maxmemory limit, it uses an eviction policy to decide which keys to remove to make room for new data. Available policies as of Redis 7.x:
- noeviction: returns an error when memory is full. No keys are removed. Use when you cannot afford to lose data. This is the default.
- allkeys-lru: evicts the least recently used key from all keys. A good general-purpose caching policy when all keys are fair game.
- volatile-lru: evicts the least recently used key from keys with a TTL set. Keys without a TTL are never evicted. Use when you want to protect permanent data while caching TTL-bound data.
- allkeys-lfu: evicts the least frequently used key from all keys. Better than LRU for workloads where some data is accessed seasonally or in bursts.
- volatile-lfu: same as LFU but only on keys with TTL.
- allkeys-random: evicts a random key from all keys. Rarely the right choice.
- volatile-random: evicts a random key from keys with TTL.
- volatile-ttl: evicts the key with the nearest expiry time. Useful when you want to prioritize keeping longer-lived cached items.
# redis.conf
maxmemory 4gb
# allkeys-lru is recommended for general caching
maxmemory-policy allkeys-lru
For a cache where all data has a TTL and you want it to work automatically, use allkeys-lru or allkeys-lfu. For a cache where some keys must survive (session data mixed with volatile cache), use volatile-lru.
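To make "least recently used" concrete, here is a minimal in-memory sketch of the allkeys-lru idea. It is a simplification under stated assumptions: LruCache is a hypothetical class, it caps the number of entries rather than bytes of memory, and it tracks exact recency, whereas Redis approximates LRU by sampling a handful of keys.

```javascript
// Minimal exact-LRU cache. A Map iterates in insertion order, so the first key
// is always the least recently used one as long as reads re-insert the key.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, value); // re-insert to mark as most recently used
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxEntries) {
      const leastRecentlyUsed = this.entries.keys().next().value;
      this.entries.delete(leastRecentlyUsed); // evict to make room
    }
    this.entries.set(key, value);
  }

  keys() {
    return [...this.entries.keys()];
  }
}

const cache = new LruCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // "a" is now more recent than "b"
cache.set("c", 3); // cache is full, so "b" is evicted
console.log(cache.keys()); // [ 'a', 'c' ]
```

A volatile-lru variant would apply the same ordering but only consider keys that carry a TTL when choosing what to evict.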
Q25. What is Redis Pub/Sub and what are its limitations?
Pub/Sub is a messaging pattern where publishers send messages to channels without knowing who will receive them. Subscribers listen on channels and receive messages published to them.
# Publisher (in one Redis client)
PUBLISH notifications:user:1001 '{"type":"order_shipped","orderId":"ORD-789"}'
# Subscriber (in another Redis client)
SUBSCRIBE notifications:user:1001
# Subscribe to a pattern (all notification channels)
PSUBSCRIBE notifications:*
In Node.js:
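// The callback form of subscribe() below matches the node-redis v4 API;
// `redis` and `publisher` are assumed to be already-connected clients
const subscriber = redis.duplicate(); // use a dedicated connection
await subscriber.connect(); // in node-redis v4 a duplicated client must connect before subscribing
await subscriber.subscribe("notifications:user:1001", (message) => {
const event = JSON.parse(message);
sendWebSocketNotification(event);
});
// Publisher (using a separate connection)
await publisher.publish("notifications:user:1001",
JSON.stringify({ type: "order_shipped", orderId: "ORD-789" }));
Key limitations: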
- No message persistence: if no subscriber is listening when a message is published, the message is lost. There is no queue and no retention.
- No delivery guarantee: Pub/Sub is fire-and-forget. If a subscriber disconnects and reconnects, it misses all messages published during the gap.
- No consumer groups: all subscribers to a channel receive every message. You cannot have competing consumers where only one processes each message.
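These limitations follow directly from the delivery model, which a small in-memory stand-in can illustrate. This is not a Redis client: InMemoryPubSub is a hypothetical class that only mimics the semantics described above, where a message goes to whoever is subscribed at publish time and is never stored.

```javascript
// In-memory stand-in for Pub/Sub semantics (illustration only, no Redis involved).
class InMemoryPubSub {
  constructor() {
    this.channels = new Map(); // channel name -> Set of handler functions
  }

  // Registers a handler and returns a function that unsubscribes it.
  subscribe(channel, handler) {
    if (!this.channels.has(channel)) this.channels.set(channel, new Set());
    this.channels.get(channel).add(handler);
    return () => this.channels.get(channel).delete(handler);
  }

  // Delivers to current subscribers only and returns how many received it.
  publish(channel, message) {
    const handlers = this.channels.get(channel) ?? new Set();
    for (const handler of handlers) handler(message);
    return handlers.size;
  }
}

const bus = new InMemoryPubSub();
const received = { a: [], b: [] };

const lostCount = bus.publish("orders", "first"); // nobody is listening: dropped
const unsubscribeA = bus.subscribe("orders", (m) => received.a.push(m));
bus.subscribe("orders", (m) => received.b.push(m));
bus.publish("orders", "second"); // every subscriber gets a copy
unsubscribeA();
bus.publish("orders", "third");  // subscriber a misses this one

console.log(lostCount);  // 0
console.log(received.a); // [ 'second' ]
console.log(received.b); // [ 'second', 'third' ]
```

When any of these gaps matters (replay after reconnect, or exactly one worker per message), Redis Streams with consumer groups is the usual alternative.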
Q26. What is Redis pipelining and when do you use it?
Pipelining sends multiple commands to Redis in a single network round trip without waiting for each response. Without pipelining, each command incurs a network round trip, typically 0.5 to 2ms on a LAN. With pipelining, N commands use only one round trip.
// Without pipelining: N round trips (slow for large N)
for (const key of keys) {
await redis.get(key);
}
// With pipelining: 1 round trip for all commands
const pipeline = redis.pipeline();
for (const key of keys) {
pipeline.get(key);
}
const results = await pipeline.exec();
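// results is an array of [error, value] for each command
Use pipelining for bulk operations like warming a cache, loading initial data, or processing batches of updates where no command depends on the result of an earlier one and you do not need atomicity. Pipelined commands still execute in the order they were sent, but commands from other clients can interleave between them.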
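The round-trip arithmetic is worth being able to do on a whiteboard. The sketch below is a back-of-the-envelope estimate only: roundTripOverheadMs is a hypothetical helper, it counts network wait time and ignores server execution time, and the 1 ms round trip is an assumed LAN figure.

```javascript
// Estimates time spent waiting on the network for a batch of commands.
// batchSize is how many commands share one round trip (defaults to all of them).
function roundTripOverheadMs(commandCount, rttMs, batchSize = commandCount) {
  if (commandCount === 0) return 0;
  const roundTrips = Math.ceil(commandCount / batchSize);
  return roundTrips * rttMs;
}

console.log(roundTripOverheadMs(10000, 1, 1));    // 10000 ms: one round trip per command
console.log(roundTripOverheadMs(10000, 1));       // 1 ms: a single pipeline
console.log(roundTripOverheadMs(10000, 1, 1000)); // 10 ms: ten pipelines of 1000 commands
```

The middle ground in the last line is common in practice, since splitting very large jobs into several pipelines bounds how much reply data is buffered at once.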
Q27. What is Redis Cluster and how does it differ from Redis Sentinel?
Both provide high availability, but they serve different purposes.
Redis Sentinel manages a master-replica setup. It monitors the master, detects failure, performs automatic failover by promoting a replica to master, and notifies clients of the new master address. It provides high availability but not horizontal scaling: all data still lives on one master, and replicas are read-only copies.
# Sentinel configuration
# monitor the master named mymaster, with a quorum of 2
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
Redis Cluster shards data across multiple nodes. Each node holds a subset of the 16,384 hash slots. It provides both horizontal scaling (data spread across nodes) and high availability (each shard has replicas). Clients must be cluster-aware.
# Create a 6-node cluster (3 primary, 3 replica)
redis-cli --cluster create \
127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
--cluster-replicas 1
# Cluster info
redis-cli -p 7000 cluster info
redis-cli -p 7000 cluster nodes
- Dataset fits on one server, need HA: use Redis Sentinel
- Dataset too large for one server, need to scale writes: use Redis Cluster
- Managed cloud Redis (ElastiCache, Redis Cloud): cluster mode is usually a configuration toggle, handled for you
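Interviewers sometimes follow up by asking how a key maps to one of the 16,384 hash slots. Redis Cluster computes CRC16 of the key modulo 16384, and if the key contains a non-empty {hash tag}, only the text inside the braces is hashed. The sketch below is a minimal illustration of that rule; crc16 and hashSlot are names chosen for this example, not part of any client library.

```javascript
// CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), as used by Redis Cluster.
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Maps a key to a slot, honoring the first non-empty {hash tag} if present.
function hashSlot(key) {
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close !== -1 && close !== open + 1) {
      key = key.slice(open + 1, close); // hash only the tag contents
    }
  }
  return crc16(new TextEncoder().encode(key)) % 16384;
}

// Keys sharing a hash tag land in the same slot, so multi-key commands can use them together.
console.log(hashSlot("{user:1001}:profile") === hashSlot("{user:1001}:orders")); // true
console.log(hashSlot("user:1001"), hashSlot("user:1002")); // two slot numbers in the range 0..16383
```

Hash tags are the standard answer to "how do I run a multi-key command or a Lua script in cluster mode", since those operations require all keys to live in one slot.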
Q28. What is a distributed lock in Redis and how do you implement one?
A distributed lock allows multiple application instances to coordinate access to a shared resource, ensuring only one instance executes a critical section at a time.
Simple lock using SET NX EX (single Redis node): NX means only set if the key does not exist (atomic), and EX means expire in N seconds, which prevents the lock being held forever if a process crashes.
const orderId = "order-789"; // example order being processed
const lockKey = `lock:payment:${orderId}`;
const lockValue = crypto.randomUUID(); // unique value per lock holder
const ttl = 30; // seconds
const acquired = await redis.set(lockKey, lockValue, "NX", "EX", ttl);
if (acquired === "OK") {
try {
await processPayment(orderId);
} finally {
// Release: only delete if we still own it (Lua for atomicity)
const script = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("DEL", KEYS[1])
else
return 0
end
`;
await redis.eval(script, 1, lockKey, lockValue);
}
} else {
throw new Error("Could not acquire lock: another instance is processing");
}
Q29. How do you implement rate limiting with Redis?
The fixed-window counter using INCR and EXPIRE is the simplest pattern: each time bucket gets its own key, so a burst that straddles two buckets can briefly exceed the limit.
async function isRateLimited(userId, limit = 100, windowSeconds = 60) {
const key = `ratelimit:${userId}:${Math.floor(Date.now() / 1000 / windowSeconds)}`;
const count = await redis.incr(key);
if (count === 1) {
// First request in this window: set expiry
await redis.expire(key, windowSeconds * 2);
}
return count > limit;
}
// Usage in Express middleware
app.use(async (req, res, next) => {
const userId = req.user?.id || req.ip;
if (await isRateLimited(userId)) {
return res.status(429).json({ error: "Too many requests" });
}
next();
});
A more precise sliding window uses Sorted Sets.
async function isRateLimitedPrecise(userId, limit = 100, windowMs = 60000) {
const now = Date.now();
const windowStart = now - windowMs;
const key = `ratelimit:precise:${userId}`;
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`); // add current request
pipeline.zcard(key); // count requests in window
pipeline.expire(key, Math.ceil(windowMs / 1000)); // auto-cleanup
const results = await pipeline.exec();
const requestCount = results[2][1];
return requestCount > limit;
}
Q30. What is the cache stampede problem and how do you prevent it?
A cache stampede (also called thundering herd) occurs when a cached item expires and many requests all simultaneously find the cache miss and attempt to recompute the expensive value at the same time. Every one of those requests hits the database or API, causing a sudden spike in backend load.
Prevention strategies:
1. Cache locking (mutex): only one request recomputes, others wait
async function getWithLock(key, computeFn, ttl = 300) {
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);
const lockKey = `lock:${key}`;
const acquired = await redis.set(lockKey, "1", "NX", "EX", 10);
if (acquired) {
try {
const value = await computeFn();
await redis.setex(key, ttl, JSON.stringify(value));
return value;
} finally {
await redis.del(lockKey);
}
} else {
// Another process is computing: wait briefly and retry
await new Promise(r => setTimeout(r, 200));
return getWithLock(key, computeFn, ttl);
}
}
2. Probabilistic early expiration (XFetch): proactively recompute before expiry using a probabilistic algorithm that triggers a re-fetch earlier for expensive-to-compute values.
3. Stale-while-revalidate: always return cached (even stale) data immediately, then trigger background recomputation asynchronously. This way there's never a cold miss.
4. Pre-warming: proactively populate the cache before expiry using a background job.
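The probabilistic early expiration strategy is the one candidates most often struggle to explain, so a sketch may help. This is a minimal version under stated assumptions: shouldRecomputeEarly is a hypothetical name, recomputeMs is how long the last recomputation took, beta is a tuning knob (1 by default, higher means earlier refresh), and the random source is injectable only so the behavior can be shown deterministically.

```javascript
// XFetch-style decision: the closer the entry is to expiry, and the more expensive
// it is to recompute, the more likely a request volunteers to refresh it early.
function shouldRecomputeEarly(nowMs, expiryMs, recomputeMs, beta = 1, random = Math.random) {
  // 1 - random() lies in (0, 1], so Math.log never receives 0
  const headStartMs = -recomputeMs * beta * Math.log(1 - random());
  return nowMs + headStartMs >= expiryMs;
}

const expiry = 10000;   // entry expires at t = 10000 ms
const recompute = 500;  // the last recomputation took 500 ms

// With a fixed "random" draw that yields a head start of about 2 * recomputeMs:
const fixedDraw = () => 1 - Math.exp(-2);
console.log(shouldRecomputeEarly(9500, expiry, recompute, 1, fixedDraw)); // true: 9500 + ~1000 >= 10000
console.log(shouldRecomputeEarly(8000, expiry, recompute, 1, fixedDraw)); // false: 8000 + ~1000 < 10000
```

Because each request draws its own random number, refreshes are spread out over the period before expiry instead of all requests missing at the same instant.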
Q31. What is the difference between Redis and Memcached?
Both are in-memory key-value caches, but Redis is significantly more capable.
| Feature | Redis | Memcached |
|---|---|---|
| Data types | String, List, Hash, Set, ZSet, Stream, Bitmap, HLL | String only |
| Persistence | RDB + AOF | None (cache only) |
| Replication | Yes (master-replica, cluster) | Client-side sharding only |
| Pub/Sub | Yes | No |
| Atomic operations | Yes (INCR, LPUSH, ZADD, etc.) | Limited (INCR, DECR) |
| Lua scripting | Yes | No |
| Multi-threading | Single-threaded event loop (I/O threads in Redis 6+) | Multi-threaded |
| Max value size | 512MB | 1MB |
| Clustering | Redis Cluster (native) | Not native |
When would you choose Memcached over Redis? When you need pure caching with simple string values, maximum multi-threading performance, and the simplest possible operations. In practice, Redis has replaced Memcached in the vast majority of new projects because the feature set is so much richer with no meaningful performance tradeoff for most cache workloads.
Category 4: Amazon DynamoDB (Q32-Q42)
DynamoDB questions test a different muscle than MongoDB or Redis. There is no query language to recall and no persistence config to tune. Instead, interviewers want to know whether you can design a key schema and access patterns upfront, since DynamoDB punishes designs that get this wrong far more than a relational database does.
Q32. What is Amazon DynamoDB and what makes it different from other databases?
DynamoDB is AWS's fully managed, serverless NoSQL database. You do not manage servers, patches, backups, or replication: AWS handles everything. DynamoDB scales automatically from zero to millions of requests per second with single-digit millisecond latency at any scale.
Key characteristics:
- Serverless: no servers to provision or manage. Pay per read/write or use on-demand mode.
- Multi-region replication: Global Tables replicate data across multiple AWS regions automatically.
- Flexible data model: stores items (similar to documents) with a required primary key and any additional attributes. No fixed schema beyond the key.
- Single-table design: DynamoDB encourages storing multiple entity types in one table, using composite keys to separate them. This runs against the one-table-per-entity habit of relational modeling and requires careful upfront design.
- HTTP-based API: all operations (GetItem, PutItem, Query, Scan) are HTTPS API calls. No persistent database connection required.
Best for: serverless applications, applications with predictable access patterns, global applications requiring multi-region, IoT data ingestion, user session storage, and gaming leaderboards.
Q33. What is the difference between a partition key and a sort key?
Every DynamoDB table has a primary key, which can be one of two types.
- Simple primary key (partition key only): a single attribute that uniquely identifies each item. DynamoDB hashes the partition key to determine which internal partition stores the item.
- Composite primary key (partition key + sort key): two attributes together uniquely identify each item. All items with the same partition key are stored together (in the same partition), sorted by sort key. This enables efficient range queries within a partition.
// Table: Orders
// Partition key: userId | Sort key: orderId (sorts by time if you use a ULID)
// Items in the same partition (same userId):
{ userId: "user-1001", orderId: "ORD-2026-001", total: 49.99, status: "shipped" }
{ userId: "user-1001", orderId: "ORD-2026-002", total: 129.00, status: "pending" }
{ userId: "user-1001", orderId: "ORD-2026-003", total: 22.50, status: "delivered" }
// Query: get all orders for user-1001 sorted by orderId
// KeyConditionExpression: userId = :uid
// This is a single-partition query: extremely fast
Design principle: the partition key determines where data is stored and how well writes are distributed. The sort key determines how data within a partition is ordered and what range queries are possible.
Q34. What is the difference between a GSI and an LSI?
Both are secondary indexes that let you query data using attributes other than the primary key.
| Property | Global Secondary Index (GSI) | Local Secondary Index (LSI) |
|---|---|---|
| Partition key | Different from the base table | Same as the base table |
| Scope | Spans all partitions of the base table | Local to each partition (co-located with base table data) |
| Created | Anytime, after table creation too | Only at table creation, cannot be added later |
| Throughput | Its own provisioned throughput | Shares the base table's throughput |
| Read consistency | Eventually consistent only | Eventually or strongly consistent |
| Limit per table | Up to 20 | Up to 5 |
| Size limit | None beyond table limits | 10GB per partition key value (shared with base table) |
// Base table: UserOrders
// PK: userId | SK: orderId
// LSI: query by status within a user's orders
// Same PK: userId | SK: status
// Lets you query: all pending orders for user-1001
// GSI: query by status across ALL users
// New PK: status | SK: orderId
// Lets you query: all pending orders across the entire system
Q35. What is the difference between Query and Scan in DynamoDB?
Query retrieves items using the primary key or a secondary index. It must specify the partition key value, and returns items sorted by sort key within that partition. It is efficient because it only reads the relevant partition.
Scan reads every item in the table (or index) and then optionally filters. It reads all partitions, so it consumes RCUs proportional to the entire table size, not just the items returned.
// QUERY (efficient): get orders for a specific user
const result = await dynamoDB.query({
TableName: "Orders",
KeyConditionExpression: "userId = :uid AND orderId BETWEEN :start AND :end",
ExpressionAttributeValues: {
":uid": { S: "user-1001" },
":start": { S: "ORD-2026-001" },
":end": { S: "ORD-2026-999" },
},
}).promise();
// SCAN (expensive, reads all items): use only when necessary
const result2 = await dynamoDB.scan({
TableName: "Orders",
FilterExpression: "total > :minTotal",
ExpressionAttributeValues: { ":minTotal": { N: "100" } },
}).promise();
// FilterExpression runs AFTER reading all items, it does not reduce RCU cost
Scan is acceptable when you genuinely need to process every item in the table, for migration jobs, exports, or analysis. In those cases, use a parallel scan to split the work across multiple segments.
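// Parallel scan with 4 segments
// Each call returns one page; follow LastEvaluatedKey per segment to read the rest
const promises = [];
for (let segment = 0; segment < 4; segment++) {
promises.push(dynamoDB.scan({
TableName: "Orders",
TotalSegments: 4,
Segment: segment,
}).promise());
}
const pages = await Promise.all(promises);
Q36. How do you calculate RCUs and WCUs?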
RCU (Read Capacity Unit):
- 1 RCU = 1 strongly consistent read of up to 4KB
- 1 RCU = 2 eventually consistent reads of up to 4KB
- For items larger than 4KB, round up: a 9KB item costs 3 RCUs strongly consistent (ceil(9/4) = 3)
WCU (Write Capacity Unit):
- 1 WCU = 1 write of up to 1KB
- For items larger than 1KB, round up: a 3.5KB item costs 4 WCUs (ceil(3.5) = 4)
| Scenario | Calculation | Result |
|---|---|---|
| 1,000 reads/sec of 8KB items, strong consistency | ceil(8/4) * 1 * 1,000 | 2,000 RCU |
| 500 reads/sec of 8KB items, eventual consistency | ceil(8/4) * 0.5 * 500 | 500 RCU |
| 200 writes/sec of 2.5KB items | ceil(2.5) * 200 | 600 WCU |
| 200 transactional writes of 2.5KB items | 600 * 2 (transactions cost 2x) | 1,200 WCU |
On-demand mode means you pay per actual request with no capacity planning, good for unpredictable or spiky workloads. Provisioned mode means you set RCU and WCU limits and pay for the provisioned capacity whether it is used or not, with optional auto-scaling, good for predictable workloads.
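The capacity rules above reduce to two small formulas, sketched here so the table rows can be reproduced. The function names and option shapes are illustrative choices for this example, not part of the AWS SDK.

```javascript
// Reads are billed in 4KB blocks; eventually consistent reads cost half.
function readCapacityUnits(itemSizeKb, readsPerSecond, { consistent = true } = {}) {
  const blocks = Math.ceil(itemSizeKb / 4);
  const costPerBlock = consistent ? 1 : 0.5;
  return Math.ceil(blocks * costPerBlock * readsPerSecond);
}

// Writes are billed in 1KB blocks; transactional writes cost double.
function writeCapacityUnits(itemSizeKb, writesPerSecond, { transactional = false } = {}) {
  const blocks = Math.ceil(itemSizeKb);
  const multiplier = transactional ? 2 : 1;
  return blocks * multiplier * writesPerSecond;
}

console.log(readCapacityUnits(8, 1000));                        // 2000
console.log(readCapacityUnits(8, 500, { consistent: false }));  // 500
console.log(writeCapacityUnits(2.5, 200));                      // 600
console.log(writeCapacityUnits(2.5, 200, { transactional: true })); // 1200
```

Note that rounding happens per item before multiplying by the request rate, which is why shaving an item from 4.1KB to 4KB halves its read cost.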
Q37. What are DynamoDB consistency models?
DynamoDB offers two consistency options per read operation.
- Eventually consistent reads (default): returns data from any of the three AZ replicas, may return stale data if a recent write has not propagated yet, costs 0.5 RCU per 4KB. Best for product catalogs, game state, and most web application reads.
- Strongly consistent reads: returns data from the primary replica, reflecting all writes completed before the read. Costs 1 RCU per 4KB. Not available on Global Secondary Indexes (GSIs always use eventual consistency). Best for financial balances, inventory counts, and any read-your-own-writes requirement.
// Eventually consistent (default)
const result = await dynamoDB.getItem({
TableName: "Orders",
Key: { userId: { S: "user-1001" }, orderId: { S: "ORD-001" } },
}).promise();
// Strongly consistent
const strongResult = await dynamoDB.getItem({
TableName: "Orders",
Key: { userId: { S: "user-1001" }, orderId: { S: "ORD-001" } },
ConsistentRead: true, // double the RCU cost
}).promise();
Q38. What is the hot partition problem and how do you prevent it?
A hot partition occurs when too many requests target the same partition key value, sending all that traffic to a single physical partition. DynamoDB automatically splits partitions as they grow, but a single key's traffic is always routed to the same partition, so splitting does not help if every request is for one key.
Example: a global leaderboard table where every item shares a single partition key value such as "LEADERBOARD" means every write and read hits one partition.
// Instead of: { pk: "LEADERBOARD", score: 9500, userId: "alice" }
// Use: { pk: "LEADERBOARD#3", score: 9500, userId: "alice" }
// Rotate the suffix 0-9 on writes; query all 10 suffixes and merge on reads
- Shard the hot key: append a random suffix (0-9) to the partition key and distribute writes across 10 logical partitions. Reads query all 10 shards and aggregate.
- Design better partition keys: use high-cardinality attributes (userId, orderId, deviceId) rather than low-cardinality ones (country, status, type).
- Write sharding for counters: instead of incrementing one counter item, write to one of N shard counters at random. Read by summing all shards.
- DynamoDB adaptive capacity (automatic): DynamoDB automatically redistributes capacity to hot partitions within seconds. This helps with burst traffic, but it is not a substitute for good key design.
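The key-sharding strategy has two halves, a write side that picks a shard and a read side that fans out and merges. The sketch below shows only the key arithmetic and the merge, with no DynamoDB calls: shardedKey, allShardKeys, and mergeTopN are hypothetical helpers, and the random source is injectable so the example is repeatable.

```javascript
// Write side: pick one of shardCount suffixes at random.
function shardedKey(baseKey, shardCount, random = Math.random) {
  const shard = Math.floor(random() * shardCount);
  return `${baseKey}#${shard}`;
}

// Read side: list every shard key so each one can be queried.
function allShardKeys(baseKey, shardCount) {
  return Array.from({ length: shardCount }, (_, i) => `${baseKey}#${i}`);
}

// Merge per-shard results (arrays of { userId, score }) into a global top N.
function mergeTopN(shardResults, n) {
  return shardResults
    .flat()
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

console.log(shardedKey("LEADERBOARD", 10, () => 0.35)); // LEADERBOARD#3
console.log(allShardKeys("LEADERBOARD", 3)); // [ 'LEADERBOARD#0', 'LEADERBOARD#1', 'LEADERBOARD#2' ]

const top2 = mergeTopN(
  [
    [{ userId: "alice", score: 9500 }],
    [{ userId: "carol", score: 9200 }, { userId: "bob", score: 8750 }],
  ],
  2
);
console.log(top2.map((entry) => entry.userId)); // [ 'alice', 'carol' ]
```

The tradeoff is explicit in the code: writes get cheaper and more evenly spread, while every read now costs one Query per shard plus a merge in the application.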
Q39. What are DynamoDB Streams and what are common use cases?
DynamoDB Streams capture a time-ordered sequence of item-level changes in a DynamoDB table. Each stream record contains the item's before state, after state, or both, depending on the configured StreamViewType. Records are retained for 24 hours.
- KEYS_ONLY: only the key attributes of the changed item
- NEW_IMAGE: the entire item after the change
- OLD_IMAGE: the entire item before the change
- NEW_AND_OLD_IMAGES: both before and after
// Lambda trigger: processes stream records automatically
const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2, needed for Converter.unmarshall
exports.handler = async (event) => {
for (const record of event.Records) {
const { eventName, dynamodb } = record;
if (eventName === "INSERT") {
const newItem = AWS.DynamoDB.Converter.unmarshall(dynamodb.NewImage);
await sendWelcomeEmail(newItem.email);
}
if (eventName === "MODIFY") {
const oldItem = AWS.DynamoDB.Converter.unmarshall(dynamodb.OldImage);
const newItem = AWS.DynamoDB.Converter.unmarshall(dynamodb.NewImage);
if (oldItem.status !== newItem.status && newItem.status === "shipped") {
await sendShippingNotification(newItem);
}
}
}
};
Common use cases: triggering notifications on data change (order status, user signup), replicating data to other AWS services (OpenSearch for full-text search, Redshift for analytics, S3 for archival), invalidating a Redis cache when DynamoDB changes, audit logging, and cross-region replication (the mechanism behind Global Tables).
Q40. What is DynamoDB Accelerator (DAX)?
DAX is a fully managed, in-memory cache built specifically for DynamoDB. It is API-compatible: you swap the DynamoDB client for the DAX client and no other code changes are required.
- Read latency drops from single-digit milliseconds (DynamoDB) to microseconds (DAX)
- Write-through cache: writes pass through DAX to DynamoDB, and DAX updates its item cache once the write succeeds
- Automatic TTL for cached items
- Multi-AZ deployment for high availability
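// Without DAX: standard DynamoDB client (AWS SDK for JavaScript v2)
const AWS = require("aws-sdk");
const AmazonDaxClient = require("amazon-dax-client");
const ddb = new AWS.DynamoDB.DocumentClient();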
// With DAX: swap the client, same API
const dax = new AmazonDaxClient({ endpoints: ["dax-cluster.xxx.dax.amazonaws.com:8111"] });
const daxClient = new AWS.DynamoDB.DocumentClient({ service: dax });
// The rest of your code is identical
const result = await daxClient.get({
TableName: "Products",
Key: { productId: "prod-001" },
}).promise();
| Use DAX when | Avoid DAX when |
|---|---|
| Read-heavy workloads with the same items read repeatedly | The workload is write-heavy (DAX does not help writes much) |
| You need microsecond read latency | You need strongly consistent reads (DAX passes them through to DynamoDB and does not cache the results) |
| Read costs are high and cache hit rate would be high | Most reads are unique or infrequent (low cache hit rate) |
Q41. What is single-table design in DynamoDB and why is it recommended?
Single-table design means storing multiple entity types (users, orders, products, sessions) in one DynamoDB table using composite keys that encode the entity type and identifier.
The reason: DynamoDB does not support joins. With separate tables per entity type, retrieving a user and their orders requires two separate API calls. In a relational database, one JOIN handles this. In DynamoDB, the answer is to co-locate related data in the same table so you can fetch all of it in one Query.
// Users
{ PK: "USER#1001", SK: "METADATA", name: "Alice", email: "alice@example.com" }
// User's orders
{ PK: "USER#1001", SK: "ORDER#ORD-001", total: 49.99, status: "shipped" }
{ PK: "USER#1001", SK: "ORDER#ORD-002", total: 129.00, status: "pending" }
// Order's items
{ PK: "ORDER#ORD-001", SK: "ITEM#prod-1", qty: 2, price: 24.99 }
{ PK: "ORDER#ORD-001", SK: "ITEM#prod-2", qty: 1, price: 0 }
// Products
{ PK: "PRODUCT#001", SK: "METADATA", name: "Widget", price: 24.99 }
// GSI for querying orders by status across all users:
// GSI PK: status | GSI SK: SK (orderId)
// One Query fetches all of a user's orders
// (query on PK alone, without begins_with, to also return the user's METADATA item)
await dynamoDB.query({
TableName: "EcommerceTable",
KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk)",
ExpressionAttributeValues: {
":pk": { S: "USER#1001" },
":sk": { S: "ORDER#" },
},
}).promise();
// Returns: all ORDER# items for USER#1001
Q42. How do DynamoDB transactions work?
DynamoDB supports ACID transactions across multiple items within a single account and region, through two operations.
- TransactGetItems: atomically reads up to 100 items. All reads see a consistent snapshot.
- TransactWriteItems: atomically writes up to 100 items across multiple tables. Either all succeed or all fail. Costs 2x the normal WCU.
// Transfer credits between users atomically
await dynamoDB.transactWriteItems({
TransactItems: [
{
// Deduct from sender, but only if the balance covers the amount.
// The condition sits on the Update itself because a single transaction
// cannot contain two operations that target the same item.
Update: {
TableName: "Wallets",
Key: { userId: { S: "user-A" } },
UpdateExpression: "SET credits = credits - :amount",
ConditionExpression: "credits >= :amount",
ExpressionAttributeValues: { ":amount": { N: "100" } },
},
},
},
{
// Add to receiver
Update: {
TableName: "Wallets",
Key: { userId: { S: "user-B" } },
UpdateExpression: "SET credits = credits + :amount",
ExpressionAttributeValues: { ":amount": { N: "100" } },
},
},
{
// Record the transfer
Put: {
TableName: "TransferLog",
Item: {
transferId: { S: "TXN-001" },
from: { S: "user-A" },
to: { S: "user-B" },
amount: { N: "100" },
timestamp: { S: new Date().toISOString() },
},
},
},
],
}).promise();
// If ANY condition fails or write fails, nothing is committed
- Maximum 100 items per transaction
- 4MB total size limit for all items in the transaction
- Transactions cannot span AWS regions (Global Tables)
- 2x WCU cost compared to non-transactional writes
- Through DAX, transactional calls are passed on to DynamoDB rather than answered from the cache, so DAX adds no latency benefit for them
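Two of these rules can be checked before a request is ever sent: the 100-action cap, and DynamoDB's requirement that no two actions in one transaction target the same item (for example, a ConditionCheck and an Update on the same key are rejected). The sketch below is a hypothetical pre-flight check, not an SDK feature; it assumes a simplified action shape of { type, table, key } with the key flattened to a string, and it does not attempt the 4MB size check.

```javascript
// Returns a list of problems found in a planned transaction (empty list means it passes).
function validateTransaction(actions) {
  const problems = [];
  if (actions.length > 100) {
    problems.push(`too many actions: ${actions.length} (max 100)`);
  }
  const seenTargets = new Set();
  for (const action of actions) {
    const target = `${action.table}/${action.key}`;
    if (seenTargets.has(target)) {
      problems.push(`item targeted more than once: ${target}`);
    }
    seenTargets.add(target);
  }
  return problems;
}

const rejected = validateTransaction([
  { type: "ConditionCheck", table: "Wallets", key: "user-A" },
  { type: "Update", table: "Wallets", key: "user-A" },
  { type: "Update", table: "Wallets", key: "user-B" },
]);
console.log(rejected); // [ 'item targeted more than once: Wallets/user-A' ]

const accepted = validateTransaction([
  { type: "Update", table: "Wallets", key: "user-A" },
  { type: "Update", table: "Wallets", key: "user-B" },
  { type: "Put", table: "TransferLog", key: "TXN-001" },
]);
console.log(accepted); // []
```

The usual way around the same-item rule is to fold the condition into the write, by putting a ConditionExpression on the Update or Put that touches that item.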
Quick Reference: All 42 Questions at a Glance
Use this table to scan every question and its core concept in one pass. It is the fastest way to spot the topics you need to revisit before an interview.
| # | Question | Core concept |
|---|---|---|
| Q1 | NoSQL vs relational database | Schema flexibility, scaling, consistency tradeoffs |
| Q2 | Four NoSQL types | Document, key-value, column-family, graph |
| Q3 | CAP theorem | CP vs AP, partition tolerance is required |
| Q4 | BASE vs ACID | Basically Available, Soft state, Eventually consistent |
| Q5 | When to choose NoSQL | Variable schema, scale, access pattern clarity |
| Q6 | Eventual consistency in practice | Replica lag, tunable per operation |
| Q7 | MongoDB data model | Documents, collections, BSON, _id, ObjectId |
| Q8 | BSON vs JSON | Binary format, additional types, speed |
| Q9 | MongoDB CRUD operations | insertOne/Many, find, updateOne/Many, deleteOne/Many |
| Q10 | MongoDB index types | Single, compound, unique, sparse, TTL, text, 2dsphere |
| Q11 | Aggregation pipeline | $match, $lookup, $group, $sort, $project, $unwind |
| Q12 | Embed vs reference schema design | Access pattern decides, bounded vs unbounded arrays |
| Q13 | Sharding | Mongos, config servers, shard key design, scatter-gather |
| Q14 | Replica sets | Primary, secondary, arbiter, automatic failover |
| Q15 | ACID transactions in MongoDB | Multi-document (v4.0+), session API, 60s timeout |
| Q16 | WiredTiger storage engine | Document-level locking, compression, MVCC, journaling |
| Q17 | TTL index | Auto-delete documents, 60s check interval |
| Q18 | Change Streams | Real-time change events, resume tokens, Kafka integration |
| Q19 | Slow query diagnosis | explain executionStats, COLLSCAN vs IXSCAN, profiler |
| Q20 | MongoDB update operators | $set, $inc, $push, $addToSet, $pull, $unset, $min/$max |
| Q21 | Redis and its use cases | Caching, sessions, pub/sub, leaderboards, queues, locks |
| Q22 | Redis data structures | String, Hash, List, Set, ZSet, Bitmap, HLL, Stream |
| Q23 | RDB vs AOF persistence | Snapshots vs write log, fsync policies, use both |
| Q24 | Redis eviction policies | allkeys-lru, volatile-lru, allkeys-lfu, volatile-ttl |
| Q25 | Redis Pub/Sub and limitations | Fire-and-forget, no persistence, no consumer groups |
| Q26 | Redis pipelining | Batch commands, one round trip, not atomic |
| Q27 | Redis Cluster vs Sentinel | Sharding plus HA vs HA only, hash slots, failover |
| Q28 | Distributed locks in Redis | SET NX EX, Lua for atomic release, Redlock |
| Q29 | Rate limiting with Redis | INCR plus EXPIRE, sliding window with ZSet |
| Q30 | Cache stampede prevention | Mutex lock, XFetch, stale-while-revalidate |
| Q31 | Redis vs Memcached | Data types, persistence, pub/sub, cluster |
| Q32 | What is DynamoDB | Serverless, managed, auto-scaling, HTTP API |
| Q33 | Partition key vs sort key | Simple PK vs composite PK, range queries in partition |
| Q34 | GSI vs LSI | Different PK vs same PK, creation timing, consistency |
| Q35 | Query vs Scan | Targeted (efficient) vs full table (expensive) |
| Q36 | RCU and WCU calculation | 4KB per read, 1KB per write, consistency multiplier |
| Q37 | Consistency models | Eventually consistent (0.5 RCU) vs strongly consistent (1 RCU) |
| Q38 | Hot partition problem | Key sharding, suffix randomization, adaptive capacity |
| Q39 | DynamoDB Streams | Item-level changes, Lambda triggers, 24h retention |
| Q40 | DynamoDB Accelerator (DAX) | Microsecond reads, write-through, API-compatible |
| Q41 | Single-table design | Multiple entities one table, composite key patterns |
| Q42 | DynamoDB transactions | TransactWrite/Get, 100 items max, 2x WCU cost |
Frequently Asked Questions
What level of NoSQL knowledge do these 42 questions target?
This guide spans junior fundamentals through senior architecture. Questions 1 through 6 cover the theory every backend role expects: NoSQL types, CAP theorem, BASE vs ACID, and when NoSQL is the right call at all. Questions 7 through 42 go deep on MongoDB, Redis, and DynamoDB individually, the territory where mid-level and senior candidates are differentiated.
If you are early in your career, focus on Category 1 and the CRUD/data structure basics in Categories 2 and 3 first. If you are interviewing for a senior or staff backend or cloud role, the schema design questions (Q12, Q41), scaling questions (Q13, Q38), and operational questions (Q19, Q24, Q30) are where most of your prep time should go.
How do MongoDB, Redis, and DynamoDB compare, and how do I know which one an interviewer expects me to know?
| | MongoDB | Redis | DynamoDB |
|---|---|---|---|
| Type | Document store | In-memory key-value / data structure store | Managed key-value / wide-column |
| Primary strength | Flexible schema, rich queries, aggregation | Microsecond latency, rich data structures | Serverless scale, predictable performance |
| Typical role | Primary application database | Cache, queue, session store, rate limiter | Primary database for AWS-native and serverless apps |
| Scaling model | Sharding (horizontal) | Cluster (hash slots) or Sentinel (HA) | Automatic, fully managed |
Check the job description for the AWS services, frameworks, and stack the company uses. A team running on AWS Lambda and API Gateway almost certainly expects DynamoDB fluency. A team with a traditional Node.js or Python backend more likely expects MongoDB as the primary store, often with Redis in front of it as a cache. Most senior backend roles expect working knowledge of at least two of the three, so do not skip a database entirely just because the job title does not mention it.
How do interviewers actually test CAP theorem and eventual consistency knowledge beyond asking for definitions?
- Definition check - state what Consistency, Availability, and Partition Tolerance each mean (Q3).
- The real tradeoff - explain why partition tolerance is non-negotiable in any distributed system, so the actual choice is between C and A (Q3).
- Classify a real database - given MongoDB, DynamoDB, or Cassandra, say whether it leans CP or AP and why (Q3).
- Eventual consistency scenario - walk through what a user sees if they read immediately after a write to a replica that has not caught up yet (Q6).
- The fix - name the mechanism for reading your own writes: strongly consistent reads in DynamoDB, or readConcern: "majority" in MongoDB (Q6).
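The stale-read scenario in the list above can be made concrete with a toy simulation. This is a minimal sketch, not a real driver: `ToyReplicaSet`, its methods, and the `strong` flag are hypothetical names, and replication is triggered by hand so the lag is visible.

```python
class ToyReplicaSet:
    """In-memory stand-in for a primary with one lagging replica (not a real driver)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes accepted by the primary but not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply all pending writes to the replica, simulating replication catching up."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read(self, key, strong=False):
        # strong=True reads from the primary; otherwise read the possibly stale replica.
        source = self.primary if strong else self.replica
        return source.get(key)


db = ToyReplicaSet()
db.write("profile:42", "old name")
db.replicate()
db.write("profile:42", "new name")

stale = db.read("profile:42")               # replica has not caught up yet
fresh = db.read("profile:42", strong=True)  # primary already has the latest write
db.replicate()
converged = db.read("profile:42")           # replica now agrees with the primary
print(stale, fresh, converged)              # old name new name new name
```

In an interview, narrating these three reads in order (stale, fresh, converged) is usually enough to show you understand what "eventual" means in practice.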
How can I practice these MongoDB, Redis, and DynamoDB commands before an interview?
Run all three locally with Docker so you can type the exact commands from this guide and see real output:
# MongoDB
docker run -d -p 27017:27017 --name mongo mongo:7
# Redis
docker run -d -p 6379:6379 --name redis redis:7
# DynamoDB Local
docker run -d -p 8000:8000 amazon/dynamodb-local
Do I need to know single-table design for every DynamoDB interview?
Not always, but you should be able to explain it and why it exists (Q41). Some teams use DynamoDB with simple, multi-table designs for low-complexity workloads, and that is a legitimate choice an interviewer may want you to recognize as valid for smaller systems.
- Know the why - DynamoDB has no joins, so co-locating related items lets one Query replace what would be multiple round trips.
- Know the cost - single-table design requires defining every access pattern upfront, and changing the key schema later means migrating all data.
- Know when to skip it - small services with few entity types and low request volume may be simpler and cheaper to reason about with separate tables.
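To illustrate the "why" above, here is a small in-memory sketch of a single-table layout. It does not call DynamoDB: the `query` helper only mimics a Query with a partition key match and a sort key prefix, and the `CUSTOMER#` and `ORDER#` key formats are assumed naming conventions, not something the service requires.

```python
# Toy in-memory "table": every entity type shares the same PK/SK attributes.
TABLE = [
    {"PK": "CUSTOMER#c1", "SK": "PROFILE", "name": "Ada"},
    {"PK": "CUSTOMER#c1", "SK": "ORDER#2026-01-05", "total": 40},
    {"PK": "CUSTOMER#c1", "SK": "ORDER#2026-02-11", "total": 15},
    {"PK": "CUSTOMER#c2", "SK": "PROFILE", "name": "Lin"},
]


def query(pk, sk_prefix=""):
    """Mimic a Query: exact match on the partition key, optional prefix match on the sort key."""
    matches = [
        item for item in TABLE
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    ]
    # Items within a partition come back ordered by sort key.
    return sorted(matches, key=lambda item: item["SK"])


customer_and_orders = query("CUSTOMER#c1")      # profile plus orders in one call
orders_only = query("CUSTOMER#c1", "ORDER#")    # narrow to one entity type

print([item["SK"] for item in customer_and_orders])
print([item["SK"] for item in orders_only])
```

The first call returns the customer profile and both orders together, which is the round-trip saving that co-location buys. The cost shows up when a new access pattern does not fit the existing key format.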
What follow-up questions tend to come after these 42, once I have answered the basics?
- Migration scenarios - "How would you migrate this collection from embedded documents to references without downtime?" builds on Q12.
- Capacity planning - "Walk me through provisioning a DynamoDB table for this traffic pattern" builds on Q36 and Q38.
- Failure scenarios - "What happens to in-flight writes if the MongoDB primary fails mid-transaction?" builds on Q14 and Q15.
- Cache invalidation - "How do you keep a Redis cache in sync with MongoDB or DynamoDB writes?" builds on Q21, Q30, and Q39 (DynamoDB Streams as the invalidation trigger).
- System design tie-in - expect these concepts to resurface inside a larger system design question, such as designing a leaderboard, a session store, or a multi-tenant SaaS data layer.
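For the cache invalidation follow-up, one common answer is cache-aside with delete-on-write. The sketch below uses plain dictionaries as stand-ins for Redis and the primary database, so every name in it is hypothetical. It shows the ordering of operations only and ignores TTLs, races between concurrent writers, and stream-based invalidation.

```python
database = {"user:1": {"name": "Ada"}}  # stand-in for MongoDB or DynamoDB
cache = {}                              # stand-in for Redis


def read_user(key):
    """Cache-aside read: serve from the cache, fall back to the database on a miss."""
    if key not in cache:
        cache[key] = dict(database[key])  # copy so cached data is independent
    return cache[key]


def update_user(key, fields):
    """Write to the database first, then delete the cached copy so the next read reloads it."""
    database[key] = {**database[key], **fields}
    cache.pop(key, None)


first = read_user("user:1")               # miss: loads from the database
update_user("user:1", {"name": "Grace"})  # invalidates the cached entry
second = read_user("user:1")              # miss again: sees the new value
print(first["name"], second["name"])      # Ada Grace
```

Deleting the key rather than overwriting it keeps the write path simple; the tradeoff is one extra cache miss after every update.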