40 SQL Interview Questions and Answers (2026)
40 SQL and relational database interview questions covering joins, indexes, ACID, window functions, CTEs, MySQL vs PostgreSQL, and query challenges.
On this page
SQL is the most universally tested technical skill in developer interviews. It shows up in backend roles, full-stack positions, data engineering, and database administration. Interviewers use it to probe how you think about data, not just whether you can write a SELECT statement.
These 40 questions cover the full range: theory questions on ACID and normalization, query-writing challenges on joins and window functions, performance questions on indexes and EXPLAIN, and practical MySQL vs PostgreSQL differences. Questions are ordered from foundational to advanced within each category. If you're prepping for a broader interview, our Node.js interview questions guide pairs well with this one since most backend roles test both together.
Once you're comfortable writing raw SQL, the next step is understanding how that translates to schema changes in an ORM. Our guide to schema migrations in practice with Drizzle ORM shows how the constraints and indexes covered here map onto a TypeScript schema definition.
Category 1: Fundamentals and Core Concepts (Q1-Q8)
These questions establish whether you understand SQL's building blocks: how it's organized, how constraints work, and how the database actually processes a query. Almost every SQL interview opens here, regardless of seniority.
Q1. What is SQL and what are the four main sublanguages?
SQL (Structured Query Language) is the standard language for managing and querying relational databases. It is used by MySQL, PostgreSQL, SQL Server, Oracle, SQLite, and most other relational database systems.
SQL is divided into four sublanguages based on what operation you are performing.
- DDL (Data Definition Language): defines and modifies the structure of the database (CREATE, ALTER, DROP, TRUNCATE, RENAME). These change schema, not data. Most DDL statements auto-commit in MySQL. In PostgreSQL, DDL is transactional and can be rolled back.
- DML (Data Manipulation Language): works with data inside tables (SELECT, INSERT, UPDATE, DELETE). These are the statements you write most often.
- DCL (Data Control Language): manages access permissions (GRANT, REVOKE). Controls who can do what on which objects.
- TCL (Transaction Control Language): manages transactions (BEGIN, COMMIT, ROLLBACK, SAVEPOINT). Controls when changes become permanent.
-- DDL
CREATE TABLE users (id SERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE NOT NULL);
ALTER TABLE users ADD COLUMN created_at TIMESTAMPTZ DEFAULT NOW();
-- DML
INSERT INTO users (email) VALUES ('alice@example.com');
SELECT * FROM users WHERE email LIKE '%@example.com';
UPDATE users SET email = 'alice@new.com' WHERE id = 1;
DELETE FROM users WHERE id = 1;
-- DCL
GRANT SELECT ON users TO readonly_role;
REVOKE INSERT ON users FROM readonly_role;
-- TCL
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- or ROLLBACK if something went wrongQ2. What is the difference between PRIMARY KEY and UNIQUE constraint?
Both PRIMARY KEY and UNIQUE enforce uniqueness on a column or combination of columns, but they have important differences.
- PRIMARY KEY guarantees uniqueness and NOT NULL (cannot be empty), and each table can have exactly one primary key.
- PRIMARY KEY creates a clustered index in MySQL/InnoDB (rows are physically ordered by PK). In PostgreSQL, it creates a unique B-Tree index and adds a NOT NULL constraint.
- PRIMARY KEY should be a stable, narrow value: integer or UUID, not email or username.
- UNIQUE guarantees uniqueness but allows NULL values, and a table can have multiple UNIQUE constraints.
- UNIQUE creates a non-clustered index, and most databases allow multiple NULL values in a UNIQUE column because NULL is not equal to NULL in SQL's three-valued logic.
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- one PK, not nullable, clustered
email VARCHAR(255) NOT NULL UNIQUE, -- enforces uniqueness, allows index lookup
username VARCHAR(100) UNIQUE, -- allows ONE NULL in most databases
phone VARCHAR(20) UNIQUE -- multiple UNIQUE columns are fine
);Q3. How does NULL work in SQL and what are the common gotchas?
NULL represents the absence of a value: it is not zero, not an empty string, not false. NULL is unknown. This creates a three-valued logic: TRUE, FALSE, and NULL (unknown).
The most common gotcha is that comparisons with NULL always return NULL (unknown), not TRUE or FALSE.
SELECT NULL = NULL; -- NULL (not TRUE)
SELECT NULL != NULL; -- NULL (not TRUE)
SELECT NULL = 0; -- NULL
SELECT NULL IS NULL; -- TRUE (use IS NULL / IS NOT NULL)
SELECT NULL IS NOT NULL;-- FALSE
-- Common trap: WHERE column != 'value' does NOT return rows where column IS NULL
SELECT * FROM users WHERE status != 'inactive';
-- This MISSES rows where status IS NULL
-- Correct version:
SELECT * FROM users WHERE status != 'inactive' OR status IS NULL;Aggregate functions ignore NULLs. COUNT(*) counts all rows, COUNT(column) counts only non-NULL values. SUM, AVG, MAX, MIN all skip NULLs.
SELECT
COUNT(*) AS total_rows, -- includes NULL phone rows
COUNT(phone) AS rows_with_phone, -- excludes NULL phone rows
AVG(salary) AS avg_salary -- averages only non-NULL salaries
FROM employees;COALESCE(a, b, c) returns the first non-NULL value in the list. Use it to provide default values when NULLs are possible.
Q4. What is the difference between DELETE, TRUNCATE, and DROP?
All three remove data but at different scopes and with different behavior.
DELETE removes specific rows based on a WHERE clause. It is DML, can be rolled back inside a transaction, and fires row-level triggers. It is slower on large tables because each deleted row is logged.
DELETE FROM orders WHERE status = 'cancelled'; -- removes matching rows only
DELETE FROM orders; -- removes ALL rows but table structure remainsTRUNCATE removes all rows from a table instantly. It is DDL in most databases, and it cannot filter rows: it is always all-or-nothing. It's much faster than DELETE because it does not log individual rows.
In PostgreSQL, TRUNCATE is transactional and can be rolled back. In MySQL, TRUNCATE cannot be rolled back, and it does not fire row-level delete triggers.
TRUNCATE TABLE orders; -- empties the table, keeps structure and indexes
-- Also resets AUTO_INCREMENT counter in MySQL (DELETE does not)DROP removes the entire table including its structure, indexes, constraints, and all data. It is DDL. It cannot be rolled back in MySQL, but in PostgreSQL, DDL is transactional so it can be rolled back.
DROP TABLE orders; -- removes table entirely
DROP TABLE IF EXISTS orders; -- safer version, no error if table doesn't existQ5. What is the difference between WHERE and HAVING?
WHERE filters rows before aggregation. HAVING filters groups after aggregation. This matters because aggregate functions (COUNT, SUM, AVG, MAX, MIN) cannot be used in a WHERE clause because they have not been calculated yet.
-- WHERE: filters individual rows BEFORE GROUP BY
SELECT department_id, COUNT(*) AS headcount
FROM employees
WHERE salary > 50000 -- filters individual employees first
GROUP BY department_id;
-- HAVING: filters groups AFTER GROUP BY and aggregation
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10; -- filters departments with more than 10 people
-- Both together: WHERE and HAVING on the same query
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01' -- only consider employees hired after 2020
GROUP BY department_id
HAVING AVG(salary) > 70000; -- only return departments with high avg salaryThe order of SQL clause execution is FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. This is why you cannot reference a SELECT alias in a WHERE clause, but you can in ORDER BY.
Q6. What is the difference between UNION and UNION ALL?
Both combine the results of two SELECT statements vertically (stacking rows). The column count and data types must match.
UNION removes duplicate rows from the combined result. This requires sorting and comparing all rows, which adds overhead. UNION ALL keeps all rows including duplicates, and is faster because no deduplication is performed.
-- UNION: removes duplicates
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;
-- If alice@example.com is in both tables, she appears once
-- UNION ALL: keeps duplicates, faster
SELECT email FROM customers
UNION ALL
SELECT email FROM newsletter_subscribers;
-- alice@example.com appears twice if she is in both tables
-- Practical example: combine data from multiple tables for a report
SELECT 'customer' AS source, name, created_at FROM customers
UNION ALL
SELECT 'vendor' AS source, name, created_at FROM vendors
ORDER BY created_at DESC;Use UNION only when you need deduplication and can afford the cost. Use UNION ALL when duplicates are acceptable or when the sets are already distinct, since it is always faster.
Q7. What are the main types of constraints in SQL?
Constraints are rules enforced at the database level that maintain data integrity. They apply to columns or tables.
- NOT NULL: the column cannot store NULL. Applied to required fields.
- UNIQUE: all values in the column (or column combination) must be distinct. Allows NULL once, in most databases.
- PRIMARY KEY: combination of NOT NULL and UNIQUE. One per table.
- FOREIGN KEY: enforces referential integrity between tables. The value in the child column must exist in the parent table's referenced column.
- CHECK: enforces a boolean condition on a column's value.
- DEFAULT: provides a value when none is supplied on INSERT.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending','processing','shipped','cancelled')),
total NUMERIC(10,2) NOT NULL CHECK (total >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);ON DELETE CASCADE means: if the parent user is deleted, all their orders are automatically deleted. Other options include ON DELETE SET NULL (set foreign key to NULL) and ON DELETE RESTRICT (block the parent delete if children exist). For a practical look at how these constraints map onto a TypeScript schema, see our guide to schema migrations in practice.
Q8. What is the SQL query execution order?
The order you write a SQL query does not match the order the database executes it. Understanding execution order explains why certain things work or fail.
The logical execution order is:
- FROM / JOIN: identify all data sources and join them.
- WHERE: filter individual rows.
- GROUP BY: group remaining rows.
- HAVING: filter groups.
- SELECT: compute expressions, apply aliases.
- DISTINCT: remove duplicate rows (if specified).
- ORDER BY: sort the result (can reference SELECT aliases here).
- LIMIT/OFFSET: return a subset of rows.
SELECT
department_id,
AVG(salary) AS avg_sal -- step 5: computed here
FROM employees -- step 1: data source
WHERE hire_date > '2020-01-01' -- step 2: filter rows
GROUP BY department_id -- step 3: group
HAVING AVG(salary) > 60000 -- step 4: filter groups
ORDER BY avg_sal DESC -- step 7: can use alias from step 5
LIMIT 5; -- step 8: top 5 resultsCategory 2: Joins and Relationships (Q9-Q15)
JOINs are the heart of relational databases and the most query-heavy part of any SQL interview. Expect to write at least one join, anti-join, or correlated subquery live on a whiteboard or shared editor.
Q9. Explain all SQL JOIN types with examples.
JOINs combine rows from two or more tables based on a related column.
INNER JOIN returns only rows that have a match in both tables.
SELECT e.name, d.name AS dept
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
-- Employees with no department and departments with no employees are excludedLEFT JOIN (LEFT OUTER JOIN) returns all rows from the left table, plus matching rows from the right. NULL fills missing right-side columns.
SELECT e.name, d.name AS dept
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
-- Employees with no department appear with dept = NULLRIGHT JOIN is the opposite of LEFT JOIN: it returns all rows from the right table. It's rarely used since most developers rewrite it as a LEFT JOIN by swapping table order.
FULL OUTER JOIN returns all rows from both tables, with NULL filling missing columns on either side. MySQL does not support FULL OUTER JOIN directly: emulate it with a LEFT JOIN combined with UNION ALL and a RIGHT JOIN filtered for NULLs on the left.
-- PostgreSQL / SQL Server
SELECT e.name, d.name AS dept
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.id;CROSS JOIN returns the Cartesian product: every row from the left matched with every row from the right. The result row count equals left rows multiplied by right rows.
SELECT colors.name, sizes.name
FROM colors CROSS JOIN sizes;
-- 3 colors x 4 sizes = 12 combinations| JOIN type | Returns |
|---|---|
| INNER JOIN | Only rows with a match in both tables (intersection) |
| LEFT JOIN | All rows from the left table, matched rows from the right, NULL where no match |
| RIGHT JOIN | All rows from the right table, matched rows from the left, NULL where no match |
| FULL OUTER JOIN | All rows from both tables, NULL on whichever side has no match |
| CROSS JOIN | Every row from the left paired with every row from the right (Cartesian product) |
Q10. What is a self-join and when do you use it?
A self-join joins a table to itself. You need table aliases to distinguish the two instances. Use it when a table has a hierarchical or recursive relationship, most commonly an employees table where each employee has a manager who is also an employee.
-- Find each employee and their manager's name
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY m.name, e.name;
-- LEFT JOIN ensures employees with no manager (the CEO) still appearOther use cases include finding pairs in the same table (students in the same class), comparing rows within the same table (orders where amount exceeds the previous order), and bill-of-materials problems (parts that contain other parts).
Q11. What is the anti-join pattern and how do you write it?
An anti-join finds rows in one table that have no matching row in another table. There are three common ways to write it.
-- Method 1: LEFT JOIN with IS NULL check (most readable, usually fastest)
SELECT c.id, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;
-- Customers who have never placed an order
-- Method 2: NOT EXISTS (clear intent, often same performance as LEFT JOIN)
SELECT c.id, c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
-- Method 3: NOT IN (avoid with NULLs -- if subquery returns any NULL, the
-- entire NOT IN returns no rows)
SELECT id, email
FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NOT NULL);Q12. What is a correlated subquery?
A correlated subquery references a column from the outer query. It runs once per row of the outer query, which makes it potentially slow on large tables.
-- Find employees who earn more than the average salary in their own department
SELECT e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id -- references outer query's e
);
-- The inner query runs once for every employee rowCompare that to a non-correlated (simple) subquery, which runs only once.
-- Find employees who earn more than the company-wide average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Inner query runs once, result is reusedFor large datasets, rewrite correlated subqueries using a JOIN or window function. Both are typically faster because the engine can optimize them as a single scan.
-- Same result as the correlated subquery above, but written with a window function
SELECT name, salary, department_id
FROM (
SELECT name, salary, department_id,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
FROM employees
) sub
WHERE salary > dept_avg;Q13. What is a FOREIGN KEY and what does ON DELETE CASCADE do?
A FOREIGN KEY is a column (or group of columns) in a child table that references the PRIMARY KEY (or UNIQUE column) of a parent table. It enforces referential integrity: you cannot insert a child row that references a non-existent parent, and you cannot delete a parent row that has children unless you tell the database what to do.
ON DELETE behavior has four options.
- RESTRICT / NO ACTION (default in most databases): blocks the DELETE on the parent if any child rows reference it.
- CASCADE: automatically deletes all matching child rows when the parent is deleted.
- SET NULL: sets the foreign key column in child rows to NULL when the parent is deleted.
- SET DEFAULT: sets the foreign key column to its default value.
-- Schema with different ON DELETE behaviors
CREATE TABLE departments (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE employees (
id INT PRIMARY KEY,
department_id INT REFERENCES departments(id) ON DELETE SET NULL,
-- If department is deleted, employees become unassigned (NULL dept)
manager_id INT REFERENCES employees(id) ON DELETE RESTRICT
-- Cannot delete a manager who still has direct reports
);
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL REFERENCES customers(id) ON DELETE CASCADE
-- If customer is deleted, all their orders are deleted automatically
);Q14. What is the N+1 query problem?
The N+1 problem occurs when your code runs one query to fetch a list of N records, then runs N additional queries to fetch related data for each record. Total: N+1 database round trips instead of one or two.
Example: fetching 100 users and their most recent order.
users = db.query("SELECT * FROM users LIMIT 100") # 1 query
for user in users:
order = db.query("SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT 1", user.id)
# runs 100 times -- 100 queries
# Total: 101 queriesFix it with a JOIN and window function in a single query.
SELECT DISTINCT ON (u.id)
u.id, u.name, u.email,
o.id AS last_order_id, o.total, o.created_at
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
ORDER BY u.id, o.created_at DESC
LIMIT 100;
-- PostgreSQL DISTINCT ON keeps first row per group after ORDER BYOr in standard SQL using a window function.
SELECT u.id, u.name, u.email, o.id AS last_order_id, o.total
FROM users u
LEFT JOIN (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM orders
) o ON u.id = o.user_id AND o.rn = 1
LIMIT 100;ORMs solve this with eager loading: Sequelize's include, ActiveRecord's includes, or TypeORM's relations with eager: true. If you're working in NestJS, our NestJS interview questions guide covers how TypeORM's relation loading and query builder handle this in practice. In raw SQL, solve it with a JOIN or a single IN query.
Q15. What is the difference between a subquery and a JOIN?
Both retrieve related data from multiple tables, but they work differently.
A subquery is a SELECT statement nested inside another. It can appear in the SELECT, FROM, WHERE, or HAVING clause. It's easy to read and reason about, and the optimizer may or may not convert it to a join internally.
A JOIN combines tables into a single result set by matching rows. It's generally faster for large datasets because the query planner can choose efficient join algorithms (hash join, merge join, nested loop).
-- Subquery version: find customers who placed orders above average value
SELECT name FROM customers
WHERE id IN (
SELECT customer_id FROM orders WHERE total > (SELECT AVG(total) FROM orders)
);
-- JOIN version (often faster, no subquery overhead)
SELECT DISTINCT c.name
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > (SELECT AVG(total) FROM orders);Use subqueries when the logic is complex and readability matters. Use JOINs when performance is critical. Use EXPLAIN to confirm which approach the optimizer actually prefers for your specific data distribution.
Category 3: Indexes and Query Performance (Q16-Q21)
Index and performance questions separate candidates who memorized syntax from those who have actually debugged a slow production query. Expect questions that ask you to reason about tradeoffs, not just recite definitions.
Q16. What is a database index and how does a B-Tree index work?
An index is a separate data structure that the database maintains alongside your table to speed up lookups. Without an index, the database performs a full table scan: it reads every row to find matches. With an index on the searched column, it can jump directly to matching rows.
The most common index type is B-Tree (Balanced Tree). A B-Tree stores index keys in sorted order with pointers to the actual table rows. It is balanced: every leaf node is the same depth from the root, keeping lookups at O(log n) regardless of table size.
B-Tree indexes are efficient for exact match lookups (WHERE id = 42), range queries (WHERE created_at BETWEEN x AND y), ORDER BY on indexed columns (avoids a separate sort step), and LIKE 'prefix%' (but not LIKE '%suffix', since that cannot use the left side of the index).
-- Create a B-Tree index (default type in MySQL and PostgreSQL)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created ON orders(created_at);
-- Partial index (PostgreSQL) -- index only the rows you actually query
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Much smaller index, only covers pending orders
-- Check which indexes exist on a table (PostgreSQL)
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'orders';
-- MySQL
SHOW INDEXES FROM orders;Q17. What is the difference between a clustered and a non-clustered index?
A clustered index means the table data is physically stored in the order of the index key. There can be only one clustered index per table, since you can only sort the physical rows one way. In MySQL/InnoDB, the primary key is always the clustered index.
In PostgreSQL, there is no concept of a clustered index in the same sense: all indexes are non-clustered (heap-based), though you can run CLUSTER to physically reorder the table once.
A non-clustered index is a separate structure from the table. It contains the index key plus a pointer (row ID or primary key value) to the actual row. A table can have many non-clustered indexes. Lookups require two steps: find the key in the index, then follow the pointer to fetch the full row, sometimes called a heap fetch or bookmark lookup.
-- MySQL InnoDB: PK is always clustered
-- Secondary indexes store the PK value as the row pointer
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY, -- clustered
sku VARCHAR(50) NOT NULL,
name VARCHAR(200) NOT NULL,
price DECIMAL(10,2),
INDEX idx_sku (sku) -- non-clustered, points to id
);
-- PostgreSQL: CLUSTER command physically reorders table data by index
-- (one-time operation, not maintained automatically)
CLUSTER products USING idx_sku;Q18. What is a composite index and what is column order significance?
A composite (multi-column) index includes multiple columns in a single index structure. The column order in a composite index is critical: the index can only be used when queries filter on a left-prefix of the index columns.
-- Composite index on (department_id, hire_date, salary)
CREATE INDEX idx_emp_dept_hire_salary
ON employees(department_id, hire_date, salary);
-- CAN use the index:
-- WHERE department_id = 5 (leftmost column)
-- WHERE department_id = 5 AND hire_date > '2020-01-01' (leftmost two)
-- WHERE department_id = 5 AND hire_date = '2021-06-01' AND salary > 60000 (all three)
-- CANNOT fully use the index:
-- WHERE hire_date > '2020-01-01' (skips department_id)
-- WHERE salary > 60000 (skips both leading columns)The left-prefix rule means you should design composite indexes to match your most common query patterns. Put equality-filter columns first, then range-filter columns last.
High-cardinality columns (many distinct values) benefit most from indexing. Low-cardinality columns like boolean flags or a status with 3-4 values rarely justify their own index, since the optimizer may prefer a full scan anyway.
Q19. What is a covering index?
A covering index includes all the columns a query needs, so the database can answer the query entirely from the index without reading the actual table rows. This eliminates the heap fetch step and is significantly faster.
-- Query: find email and created_at for all active users
SELECT email, created_at FROM users WHERE status = 'active';
-- Without covering index: index lookup on status, then heap fetch for email + created_at
CREATE INDEX idx_status ON users(status);
-- Covering index: includes all columns needed by the SELECT
CREATE INDEX idx_status_covering ON users(status, email, created_at);
-- Now the query can be answered entirely from the index (no heap fetch)
-- PostgreSQL INCLUDE syntax (cleaner -- included columns are not part of the key)
CREATE INDEX idx_status_include ON users(status) INCLUDE (email, created_at);
-- MySQL does not support INCLUDE -- add columns to the key insteadEXPLAIN output will show 'Index Only Scan' (PostgreSQL) or 'Using index' (MySQL) when a covering index is in use. These are good signs.
Q20. How do you use EXPLAIN / EXPLAIN ANALYZE to diagnose a slow query?
EXPLAIN shows the query plan the optimizer will use. EXPLAIN ANALYZE actually runs the query and shows real execution stats alongside the estimates.
-- PostgreSQL
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Shows: estimated cost, rows, join type, index usage
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'shipped'
ORDER BY o.created_at DESC
LIMIT 20;
-- Shows: actual time, actual rows, loops, buffer cache hits/misses
-- MySQL
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE user_id = 42;Key things to look for in the EXPLAIN output:
- Seq Scan / ALL: a full table scan. Usually means a missing index.
- Index Scan: using an index. Good, but includes a heap fetch.
- Index Only Scan: using a covering index. Best case.
- Nested Loop / Hash Join / Merge Join: the join algorithm. Hash join is often best for large unsorted datasets; merge join for pre-sorted or indexed data.
- rows vs actual rows: large discrepancies mean stale statistics. Run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) to update stats.
Q21. When should you NOT add an index?
Indexes speed up reads but slow down writes, since INSERT, UPDATE, and DELETE must maintain every index on the table. Adding indexes without consideration creates maintenance overhead.
Do not add an index when:
- The table is small (under a few thousand rows). The optimizer prefers a full scan since it's cheaper than the index lookup overhead.
- The column has very low cardinality (few distinct values). A status column with values 'active' or 'inactive' is not a good candidate. The optimizer may skip the index and scan anyway because it would visit half the table.
- The table is write-heavy and read-light. A logging table that receives thousands of INSERTs per second but is rarely queried is harmed, not helped, by extra indexes.
- You already have a composite index with the column as the leftmost key. A separate single-column index on that column is redundant.
- The column is frequently updated. Every UPDATE to an indexed column requires updating the index, adding I/O overhead.
Category 4: Transactions, ACID, and Concurrency (Q22-Q27)
This is where interviewers separate candidates who have read about transactions from candidates who have debugged a production deadlock at 2am. Expect at least one or two questions from this category in any backend or database-heavy interview.
Q22. What does ACID stand for? Explain each property.
ACID is the set of properties that guarantee database transactions are processed reliably. It is one of the most common interview theory questions.
- Atomicity: a transaction is all-or-nothing. If any statement in a transaction fails, all preceding statements in that transaction are rolled back. You never get a half-completed transaction.
- Consistency: a transaction brings the database from one valid state to another valid state. All integrity constraints (NOT NULL, CHECK, FOREIGN KEY) must be satisfied after the transaction commits, or it is rolled back.
- Isolation: concurrent transactions do not see each other's intermediate state. The degree of isolation is configurable via isolation levels. Full isolation means transactions behave as if they ran serially.
- Durability: once a transaction commits, it is permanent, even if the server crashes immediately after. This is implemented via write-ahead logging (WAL): changes are written to a log on disk before being applied to the data files, and on restart the database replays the WAL to recover committed changes.
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- If the second UPDATE fails, the first is rolled back automatically
COMMIT;Q23. What are the four transaction isolation levels and what problems do they prevent?
SQL defines four isolation levels, ordered from least to most isolated. Higher isolation means fewer anomalies but more locking and less concurrency.
| Isolation level | Behavior | Default in |
|---|---|---|
| READ UNCOMMITTED | A transaction can read uncommitted changes from other transactions (dirty reads). Almost never used in practice. | Rarely the default anywhere |
| READ COMMITTED | A transaction can only read committed data. Prevents dirty reads, but allows non-repeatable reads and phantom reads. | PostgreSQL, Oracle |
| REPEATABLE READ | Repeated reads of the same row within a transaction return the same value (no non-repeatable reads). MySQL's implementation also prevents most phantom reads via gap locking. | MySQL InnoDB |
| SERIALIZABLE | Highest isolation. Transactions execute as if run one at a time, preventing all anomalies, at the cost of the most locking overhead. | Used selectively, rarely the default |
The three anomalies these levels guard against:
- Dirty read: reading uncommitted data from another transaction, data that might still be rolled back.
- Non-repeatable read: reading the same row twice in a transaction and getting different values because another transaction committed a change between the reads.
- Phantom read: running the same range query twice in a transaction and getting different rows because another transaction inserted or deleted rows in that range.
-- PostgreSQL
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT * FROM accounts WHERE user_id = 42; -- first read
-- another transaction commits a change to this user's account here
SELECT * FROM accounts WHERE user_id = 42; -- same result in REPEATABLE READ
COMMIT;
-- MySQL
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- ...
COMMIT;Q24. What is a deadlock and how do you prevent it?
A deadlock occurs when two or more transactions each hold a lock that the other needs, and are waiting for each other to release it. Neither can proceed.
Transaction A: locks row 1, then tries to lock row 2
Transaction B: locks row 2, then tries to lock row 1
-- Both wait foreverDatabases detect deadlocks automatically and kill one transaction (the victim) with an error, allowing the other to complete. The killed transaction must be retried by the application.
Prevention strategies:
- Acquire locks in a consistent order across all transactions. If every transaction locks
usersbeforeorders, circular waits cannot form. - Keep transactions short and fast. The longer a transaction holds locks, the higher the chance of a deadlock.
- Use `SELECT ... FOR UPDATE` to acquire row locks upfront instead of upgrading from shared locks later, since lock upgrades cause deadlocks.
- Use lower isolation levels where appropriate. READ COMMITTED releases shared locks faster than SERIALIZABLE.
- Retry on deadlock. Deadlocks are expected in concurrent systems, so build retry logic into application code.
-- Consistent lock ordering pattern
-- Transaction A and B both acquire the user lock before the account lock
BEGIN;
SELECT * FROM users WHERE id = 1 FOR UPDATE; -- acquire user lock first
SELECT * FROM accounts WHERE user_id = 1 FOR UPDATE; -- then account lock
-- do work
COMMIT;Q25. What is MVCC (Multi-Version Concurrency Control)?
MVCC is how PostgreSQL (and other databases like Oracle and MySQL InnoDB) implements transaction isolation without relying on read locks. Instead of locking rows so readers block writers and writers block readers, MVCC keeps multiple versions of each row.
When a transaction reads a row, it sees the version of that row as of when the transaction started, regardless of what other transactions do afterward. Writers create new row versions rather than overwriting in place, and readers see old versions. Readers and writers never block each other.
- Read-heavy workloads scale well, since reads are never blocked by writes.
- VACUUM / autovacuum is required in PostgreSQL to remove old row versions (dead tuples) that no longer matter to any transaction.
- Long-running transactions prevent VACUUM from cleaning up old versions, causing table bloat and eventually slowing queries.
Q26. What is the difference between optimistic and pessimistic locking?
Both are strategies for handling concurrent access to the same data, but they differ in when they check for conflicts.
Pessimistic locking assumes conflicts are likely. It locks the row when you first read it, preventing others from modifying it until you commit. It's a good fit for high-contention scenarios with short transactions.
-- PostgreSQL/MySQL: lock the row for update immediately on read
BEGIN;
SELECT * FROM inventory WHERE product_id = 99 FOR UPDATE; -- row is locked
-- check stock, decide to proceed
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 99;
COMMIT;Optimistic locking assumes conflicts are rare. It does not lock on read. Instead, the row includes a version number or timestamp, and the update checks that the version has not changed. If it has, the update fails and the operation must retry.
-- Read:
SELECT id, quantity, version FROM inventory WHERE product_id = 99;
-- Returns: id=99, quantity=10, version=5
-- Update (check version has not changed):
UPDATE inventory
SET quantity = quantity - 1, version = version + 1
WHERE product_id = 99 AND version = 5; -- fails if another transaction changed it
-- Check affected rows: if 0 rows updated, conflict detected, retryQ27. What is a savepoint and when do you use it?
A savepoint is a named checkpoint within a transaction. If something goes wrong after the savepoint, you can roll back to it without undoing the entire transaction.
BEGIN;
INSERT INTO orders (user_id, total) VALUES (1, 99.00); -- step 1
SAVEPOINT after_order;
INSERT INTO order_items (order_id, product_id, qty) VALUES (1001, 5, 2); -- step 2
-- Something goes wrong with step 2
ROLLBACK TO SAVEPOINT after_order; -- undo step 2, keep step 1
-- Try a different item
INSERT INTO order_items (order_id, product_id, qty) VALUES (1001, 7, 1);
COMMIT; -- commits step 1 and the corrected step 2Use savepoints for complex multi-step transactions where partial retry is more efficient than restarting entirely. They're common in batch processing and migrations where you want to keep successful chunks if one chunk fails.
MySQL supports savepoints in InnoDB tables. PostgreSQL supports them natively.
Category 5: Advanced SQL, Window Functions and CTEs (Q28-Q35)
This category is where senior and staff-level interviews live. Window functions and CTEs show up constantly in analytics-heavy roles, and fluency here signals you can write production-grade reporting queries without reaching for application code.
Q28. What are window functions and how do they differ from GROUP BY?
Window functions perform calculations across a set of rows related to the current row without collapsing those rows into groups. GROUP BY reduces N rows to fewer rows. Window functions return the same number of rows as the input, with an extra computed column.
-- GROUP BY collapses rows: one row per department
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id;
-- Window function: keeps every row, adds the dept average alongside
SELECT
name,
salary,
department_id,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg
FROM employees;
-- 200 employee rows in, 200 rows out, with per-department averages on every rowWindow functions are invaluable for rankings within groups, running totals, moving averages, finding previous or next row values with LAG and LEAD, and percent-of-total calculations.
Q29. What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?
All three assign numeric rankings to rows within a partition based on an ORDER BY clause. They differ in how they handle ties.
- ROW_NUMBER: assigns a unique sequential number regardless of ties. No row gets the same number.
- RANK: assigns the same rank to tied rows, then skips the next rank(s), creating gaps in the sequence.
- DENSE_RANK: assigns the same rank to tied rows, but does not skip ranks, so there are no gaps.
SELECT
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;| name | salary | row_num | rnk | dense_rnk |
|---|---|---|---|---|
| Alice | 90000 | 1 | 1 | 1 |
| Bob | 85000 | 2 | 2 | 2 |
| Carol | 85000 | 3 | 2 (same rank, no skip) | 2 |
| Dave | 80000 | 4 | 4 (RANK skips 3) | 3 (DENSE_RANK doesn't skip) |
Classic interview query: find the top 3 earners per department.
SELECT name, department_id, salary
FROM (
SELECT
name, department_id, salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dr
FROM employees
) ranked
WHERE dr <= 3;Q30. How do running totals and moving averages work with window functions?
Running totals use SUM() OVER with a frame that accumulates from the first row to the current row. The frame clause, ROWS BETWEEN ... AND ..., controls which rows are included in the window.
-- Running total of sales by date
SELECT
sale_date,
daily_total,
SUM(daily_total) OVER (ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS running_total
FROM daily_sales;
-- Moving 7-day average
SELECT
sale_date,
daily_total,
AVG(daily_total) OVER (ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS moving_avg_7d
FROM daily_sales;
-- Cumulative percentage of total
SELECT
product_id,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
ROUND(
100.0 * SUM(revenue) OVER (ORDER BY revenue DESC)
/ SUM(revenue) OVER (),
2
) AS cumulative_pct
FROM product_revenue
ORDER BY revenue DESC;Q31. What are LAG and LEAD functions and when do you use them?
LAG accesses a value from a previous row in the partition. LEAD accesses a value from a following row. Both are window functions, so no self-join is needed.
LAG(column, n, default) looks n rows back with an optional default if no prior row exists. LEAD(column, n, default) looks n rows forward with the same optional default.
-- Month-over-month revenue change
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month))
/ LAG(revenue, 1) OVER (ORDER BY month),
2
) AS pct_change
FROM monthly_revenue;
-- Find orders placed within 24 hours of the previous order per customer
SELECT
customer_id,
order_id,
created_at,
LAG(created_at) OVER (PARTITION BY customer_id ORDER BY created_at) AS prev_order_at,
created_at - LAG(created_at) OVER (PARTITION BY customer_id ORDER BY created_at)
AS time_since_last_order
FROM orders;Q32. What is a CTE (Common Table Expression) and how does it compare to a subquery?
A CTE is a named temporary result set defined with the WITH clause at the start of a query. It can be referenced one or more times within the main query, which makes deeply nested subqueries far more readable.
-- Subquery version (deeply nested, harder to read)
SELECT *
FROM (
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id
) dept_avgs
JOIN departments d ON dept_avgs.department_id = d.id
WHERE dept_avgs.dept_avg > 70000;
-- CTE version (flat, readable, same logic)
WITH dept_averages AS (
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id
)
SELECT d.name, da.dept_avg
FROM dept_averages da
JOIN departments d ON da.department_id = d.id
WHERE da.dept_avg > 70000;
-- Multiple CTEs chained together
WITH
active_users AS (
SELECT id, email FROM users WHERE status = 'active'
),
recent_orders AS (
SELECT user_id, COUNT(*) AS order_count
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT u.email, o.order_count
FROM active_users u
JOIN recent_orders o ON u.id = o.user_id
ORDER BY o.order_count DESC;- Readability: CTEs let you name complex subexpressions once and reuse them, instead of nesting subqueries multiple levels deep.
- Optimization fences: in PostgreSQL before version 12, CTEs ran exactly once and acted as optimization fences, which could help or hurt performance. From version 12 onward, the optimizer can inline them.
- Inlining: subqueries can sometimes be inlined more aggressively by the optimizer than CTEs.
- Performance: for simple cases, performance between a CTE and an equivalent subquery is usually identical.
Q33. What is a recursive CTE and when do you use it?
A recursive CTE calls itself to traverse hierarchical or graph-structured data. It has two parts: an anchor member (the base case) and a recursive member.
-- Employee hierarchy: find all reports under a given manager
WITH RECURSIVE org_chart AS (
-- Anchor: start with the top-level manager
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE id = 1 -- CEO or starting manager
UNION ALL
-- Recursive: join employees who report to the previous level
SELECT e.id, e.name, e.manager_id, oc.depth + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT id, name, depth,
REPEAT(' ', depth) || name AS indented_name -- visual hierarchy
FROM org_chart
ORDER BY depth, name;
-- Category tree: find all subcategories
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id, name AS path
FROM categories
WHERE parent_id IS NULL -- root categories
UNION ALL
SELECT c.id, c.name, c.parent_id,
ct.path || ' > ' || c.name
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT id, path FROM category_tree ORDER BY path;Q34. What is the difference between a stored procedure and a function?
Both are named blocks of SQL code stored in the database, but they differ in how they're called and what they can do.
| Stored procedure | Function | |
|---|---|---|
| Called with | CALL or EXEC | Inside a SQL expression, e.g. SELECT myfunc(col) FROM table |
| Return value | Optional, can return zero, one, or many values via OUT params, and multiple result sets | Must return a single scalar value or a table |
| Transaction control | Can COMMIT or ROLLBACK inside it | Cannot perform transaction control in most databases |
| Usable in SELECT/WHERE/JOIN | No | Yes |
-- PostgreSQL function: returns a scalar value
CREATE OR REPLACE FUNCTION get_dept_avg_salary(dept_id INT)
RETURNS NUMERIC AS $$
SELECT AVG(salary)
FROM employees
WHERE department_id = dept_id;
$$ LANGUAGE SQL;
-- Use it in a query like any other expression
SELECT name, salary, get_dept_avg_salary(department_id) AS dept_avg
FROM employees;
-- PostgreSQL stored procedure (Postgres 11+)
CREATE OR REPLACE PROCEDURE archive_old_orders(cutoff_date DATE)
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < cutoff_date;
DELETE FROM orders WHERE created_at < cutoff_date;
COMMIT; -- stored procedures can commit
END;
$$;
CALL archive_old_orders('2024-01-01');If your backend uses an ORM, you'll rarely write these by hand. Our NestJS interview questions guide covers how TypeORM repositories and query builders wrap this kind of logic at the application layer.
Q35. What is a materialized view and when should you use one?
A regular, non-materialized view is a saved query. Every time you query the view, the underlying query runs, so it always returns fresh data but adds latency.
A materialized view stores the query result as a physical table snapshot. Reads from it are instant since no query runs, but the data is stale until you refresh it manually or on a schedule.
-- PostgreSQL: create a materialized view for a heavy analytics query
CREATE MATERIALIZED VIEW monthly_revenue_summary AS
SELECT
DATE_TRUNC('month', o.created_at) AS month,
p.category,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
COUNT(DISTINCT o.id) AS order_count
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
GROUP BY 1, 2;
-- Create an index on the materialized view for fast lookups
CREATE INDEX ON monthly_revenue_summary(month, category);
-- Refresh when data changes (full refresh, blocks reads while running)
REFRESH MATERIALIZED VIEW monthly_revenue_summary;
-- Concurrent refresh: does not block readers (requires a UNIQUE index)
CREATE UNIQUE INDEX ON monthly_revenue_summary(month, category);
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue_summary;- The base query is slow and expensive, involving complex joins or large aggregations.
- The data does not need to be real-time. Hourly or daily refresh is acceptable.
- The view is read frequently, so the cost of maintaining a snapshot is worth it.
Category 6: MySQL vs PostgreSQL and Query Challenges (Q36-Q40)
The final stretch covers the practical differences between the two databases you're most likely to use day to day, plus the classic whiteboard queries almost every SQL interview ends with.
Q36. What are the key differences between MySQL and PostgreSQL?
Both are open-source relational databases, but they make meaningfully different tradeoffs.
| Area | MySQL | PostgreSQL |
|---|---|---|
| DDL transactions | Auto-commits, cannot be rolled back | Transactional, CREATE TABLE can be rolled back inside BEGIN |
| Default isolation level | REPEATABLE READ (InnoDB) | READ COMMITTED |
| JSON support | JSON type since 5.7, less powerful | Rich native JSONB with indexing and operators (@>, ?) |
| Index types | B-Tree, FULLTEXT, spatial | B-Tree, Hash, GIN, GiST, BRIN, partial and expression indexes |
| Full-text search | FULLTEXT index, simpler | tsvector/tsquery with dictionaries, stemming, and ranking |
| Extensions | Limited | PostGIS (geospatial), pgvector (AI embeddings), pg_trgm (fuzzy text), TimescaleDB |
| Upsert syntax | INSERT ... ON DUPLICATE KEY UPDATE | INSERT ... ON CONFLICT DO UPDATE |
MySQL tends to fit web apps with high write throughput and an existing LAMP or LEMP stack where setup simplicity matters. PostgreSQL tends to fit complex queries, analytics, JSON-heavy workloads, geographic data, and AI or ML pipelines built on pgvector.
If your data layer needs to go beyond either of these, our NoSQL interview questions guide covers MongoDB, Redis, and DynamoDB for when schema flexibility or horizontal scale matters more than relational integrity.
Q37. How do you perform an upsert in MySQL and PostgreSQL?
An upsert inserts a row if it does not exist, or updates it if it does. It's an atomic operation, so there's no race condition between the check and the insert.
-- PostgreSQL: ON CONFLICT DO UPDATE
INSERT INTO user_stats (user_id, login_count, last_login)
VALUES (42, 1, NOW())
ON CONFLICT (user_id) DO UPDATE
SET
login_count = user_stats.login_count + 1,
last_login = EXCLUDED.last_login;
-- EXCLUDED refers to the row that failed to insert
-- PostgreSQL: ignore duplicate (do nothing)
INSERT INTO tags (name)
VALUES ('javascript')
ON CONFLICT (name) DO NOTHING;
-- MySQL: ON DUPLICATE KEY UPDATE
INSERT INTO user_stats (user_id, login_count, last_login)
VALUES (42, 1, NOW())
ON DUPLICATE KEY UPDATE
login_count = login_count + 1,
last_login = VALUES(last_login);
-- VALUES(col) refers to the value that was being inserted
-- MySQL: REPLACE INTO (deletes + re-inserts, loses auto-increment continuity)
REPLACE INTO user_stats (user_id, login_count, last_login)
VALUES (42, 1, NOW());
-- Avoid REPLACE INTO: it deletes the row and re-inserts it, losing the original idQ38. How does JSON work in PostgreSQL and when would you use it?
PostgreSQL has two JSON types: json, stored as text and validated, and jsonb, stored as binary, indexed, and faster to query. Use jsonb for almost everything.
-- Store and query JSONB
CREATE TABLE events (
id SERIAL PRIMARY KEY,
event_type VARCHAR(50),
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO events (event_type, payload) VALUES
('purchase', '{"user_id": 42, "items": [{"sku": "ABC", "qty": 2}], "total": 49.99}'),
('login', '{"user_id": 42, "ip": "192.168.1.1", "device": "mobile"}');
-- Extract a field
SELECT payload->>'user_id' AS user_id, -- text
payload->'total' AS total -- jsonb
FROM events WHERE event_type = 'purchase';
-- Filter by JSON value
SELECT * FROM events WHERE payload->>'user_id' = '42';
SELECT * FROM events WHERE payload @> '{"event_type": "purchase"}'; -- containment
-- Index JSONB for fast filtering
CREATE INDEX idx_events_payload ON events USING GIN (payload);
-- Supports @>, ?, ?|, ?& operators efficiently
-- Index a specific JSONB path
CREATE INDEX idx_events_user_id ON events((payload->>'user_id'));Use JSONB when the schema is flexible or evolving, when you're storing event or audit logs, when you're receiving third-party API payloads in varying shapes, or when you need a document-like model alongside relational data.
Q39. Classic interview query: find the second-highest salary
This is one of the most commonly asked SQL query challenges, and there are several valid approaches.
-- Method 1: Window function (cleanest, handles ties correctly)
SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
FROM employees
) ranked
WHERE dr = 2
LIMIT 1;
-- Method 2: Subquery with MAX
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Method 3: OFFSET (simple but fragile with ties)
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Nth highest salary (generalized with window function)
SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
FROM employees
) ranked
WHERE dr = 5 -- change this to get any Nth highest
LIMIT 1;Q40. Classic interview queries: department with highest salary, duplicate detection, and customers with no orders
Interviewers ask these frequently. Know them cold.
-- 1. Find the department with the highest average salary
SELECT d.name, AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.id, d.name
ORDER BY avg_salary DESC
LIMIT 1;
-- Or using a CTE for clarity:
WITH dept_avgs AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT d.name, da.avg_salary
FROM dept_avgs da
JOIN departments d ON da.department_id = d.id
ORDER BY da.avg_salary DESC
LIMIT 1;
-- 2. Find duplicate emails in a users table
SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
-- Delete duplicates, keep the row with the lowest id (PostgreSQL)
DELETE FROM users
WHERE id NOT IN (
SELECT MIN(id) FROM users GROUP BY email
);
-- 3. Customers who have never placed an order
SELECT c.id, c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;
-- 4. Find employees earning more than their manager
SELECT e.name AS employee, e.salary,
m.name AS manager, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
-- 5. Running total and percentage of total in one query
SELECT
product_name,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS cumulative_revenue,
ROUND(100.0 * revenue / SUM(revenue) OVER (), 2) AS pct_of_total
FROM product_revenue
ORDER BY revenue DESC;Quick Reference: All 40 Questions at a Glance
Use this table to scan every question and its core concept in one pass. It's the fastest way to spot the topics you need to revisit before an interview.
| # | Question | Core concept |
|---|---|---|
| Q1 | SQL sublanguages (DDL/DML/DCL/TCL) | Core SQL categories |
| Q2 | PRIMARY KEY vs UNIQUE | Constraints, nullability, count |
| Q3 | NULL behavior and three-valued logic | IS NULL, COALESCE, COUNT gotchas |
| Q4 | DELETE vs TRUNCATE vs DROP | Scope, rollback, triggers |
| Q5 | WHERE vs HAVING | Filter before vs after GROUP BY |
| Q6 | UNION vs UNION ALL | Deduplication, performance |
| Q7 | SQL constraint types | NOT NULL, FK, CHECK, DEFAULT |
| Q8 | SQL query execution order | FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT |
| Q9 | All JOIN types with examples | INNER, LEFT, RIGHT, FULL, CROSS |
| Q10 | Self-join | Same table with aliases, hierarchy |
| Q11 | Anti-join pattern | LEFT JOIN IS NULL vs NOT EXISTS |
| Q12 | Correlated subquery | References outer query, runs per row |
| Q13 | FOREIGN KEY and ON DELETE options | Referential integrity, CASCADE, RESTRICT, SET NULL |
| Q14 | N+1 query problem | 1+N round trips, fix with JOIN or eager load |
| Q15 | Subquery vs JOIN | Performance, readability, optimizer |
| Q16 | B-Tree index mechanics | How indexes speed up reads, O(log n) |
| Q17 | Clustered vs non-clustered index | Physical order vs pointer-based |
| Q18 | Composite index and column order | Left-prefix rule, cardinality |
| Q19 | Covering index | Index-only scan, INCLUDE syntax |
| Q20 | EXPLAIN / EXPLAIN ANALYZE | Query plan, Seq Scan vs Index Scan |
| Q21 | When NOT to index | Low cardinality, write-heavy, small tables |
| Q22 | ACID properties | Atomicity, Consistency, Isolation, Durability |
| Q23 | Transaction isolation levels | 4 levels, 3 anomalies, MySQL vs PostgreSQL defaults |
| Q24 | Deadlocks and prevention | Circular waits, lock ordering, retry |
| Q25 | MVCC | Multi-version concurrency, no read locks |
| Q26 | Optimistic vs pessimistic locking | FOR UPDATE vs version column |
| Q27 | Savepoints | Partial rollback within a transaction |
| Q28 | Window functions vs GROUP BY | Non-aggregating, OVER(), PARTITION BY |
| Q29 | ROW_NUMBER vs RANK vs DENSE_RANK | Tie handling, gaps vs no gaps |
| Q30 | Running totals and moving averages | SUM OVER, ROWS BETWEEN frame |
| Q31 | LAG and LEAD functions | Prior/next row access, MoM changes |
| Q32 | CTEs vs subqueries | WITH clause, readability, optimizer inlining |
| Q33 | Recursive CTE | Hierarchy traversal, UNION ALL, depth |
| Q34 | Stored procedure vs function | Return type, CALL vs SELECT, transaction control |
| Q35 | Materialized view | Cached query result, REFRESH, use cases |
| Q36 | MySQL vs PostgreSQL differences | ACID, isolation, JSON, indexes, upsert |
| Q37 | Upsert patterns | ON CONFLICT vs ON DUPLICATE KEY UPDATE |
| Q38 | JSONB in PostgreSQL | Operators, GIN index, when to use |
| Q39 | Second-highest salary (query challenge) | DENSE_RANK, subquery MAX, OFFSET |
| Q40 | Classic query challenges | Dept max salary, duplicates, no-orders, manager comparison |
Frequently Asked Questions
What level of SQL knowledge do these 40 questions target?
This guide spans junior fundamentals through senior query optimization. Questions 1 through 15 cover the theory and joins that every backend role expects: constraints, NULL handling, and how queries actually execute. Questions 16 through 40 go deep on indexing, transactions, window functions, and the practical MySQL versus PostgreSQL differences that separate mid-level and senior candidates.
If you're early in your career, focus on Categories 1 and 2 first. If you're interviewing for a senior or staff backend, data engineering, or DBA role, the indexing questions (Q16 to Q21), the transaction and concurrency questions (Q22 to Q27), and the window function and CTE questions (Q28 to Q35) are where most of your prep time should go.
Should I prepare with MySQL or PostgreSQL syntax for interviews?
| MySQL | PostgreSQL | |
|---|---|---|
| Where it's used most | WordPress, LAMP/LEMP stacks, many startups | Analytics-heavy and data-engineering roles, Supabase, AI/ML stacks with pgvector |
| Syntax style in this guide | ON DUPLICATE KEY UPDATE, REPEATABLE READ default | ON CONFLICT, RETURNING, JSONB operators, READ COMMITTED default |
| Good default for interviews | If the job description mentions LAMP, WordPress, or a legacy PHP/Node stack | If the job description mentions analytics, data pipelines, JSON APIs, or AI features |
Core SQL concepts (joins, indexes, ACID, window functions) are identical between the two. The syntax differences in Q36 to Q38 are what shift, so check the job posting's stack and skim those three answers for the matching dialect the night before.
How can I practice these SQL queries before an interview?
Run both databases locally with Docker so you can type the exact queries from this guide and see real output and real EXPLAIN plans.
# MySQL
docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=pass --name mysql mysql:8
# PostgreSQL
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=pass --name postgres postgres:16What is the single most commonly asked SQL query challenge?
Finding the second-highest (or Nth-highest) salary (Q39) is the single most repeated SQL whiteboard question across companies. Interviewers use it to see whether you reach for DENSE_RANK and window functions or fall back to a fragile LIMIT/OFFSET approach that breaks on ties.
- Right after it: customers with no orders (Q11, Q40), which tests the anti-join pattern.
- Then: duplicate detection and department-level aggregation (Q40), which tests GROUP BY and HAVING together.
- For senior roles: the top-N-per-group pattern with DENSE_RANK and PARTITION BY (Q29) shows up inside larger system-design questions about leaderboards and reporting dashboards.
What follow-up questions tend to come after these 40, once I've answered the basics?
- Schema design scenarios: "design a schema for an e-commerce checkout flow" builds directly on the constraints and foreign key rules from Q7 and Q13.
- Query rewriting: "this query is slow, how would you speed it up?" expects you to reach for EXPLAIN (Q20), indexing strategy (Q16 to Q19), or rewriting a correlated subquery as a window function (Q12).
- Concurrency scenarios: "two requests update the same row at the same time, what happens?" builds on isolation levels (Q23), deadlocks (Q24), and optimistic locking (Q26).
- ORM integration: "how would this query look written with TypeORM or Prisma?" connects this guide to our NestJS interview questions and Drizzle ORM migrations guides.
- System design tie-in: expect these concepts to resurface inside a larger system design question, such as designing a rate limiter, an audit log, or a multi-tenant SaaS data layer.
Related Articles
42 NoSQL Database Interview Questions and Answers (2026)
42 NoSQL interview questions covering MongoDB, Redis, and DynamoDB: aggregation pipelines, data structures, GSI vs LSI, and CAP theorem. Updated for 2026.
30 Node.js Interview Questions and Answers (2026)
30 Node.js interview questions with full answers: event loop, streams, clustering, worker threads, memory leaks, and security. Updated for 2026.
Drizzle ORM Migrations: A Practical drizzle-kit Guide
Learn the full Drizzle ORM migration workflow: push vs migrate, drizzle-kit setup, Turso/libSQL config, team conflicts, and production best practices.