A query that takes 30 seconds becomes 50 milliseconds after adding one index.
No code change. No new hardware. One line of SQL.
Understanding why that works — and when it doesn’t — separates engineers who “add indexes when it’s slow” from engineers who design data models that never get slow.
This is the Mastery Guide to Database Indexing.
Part 1: Foundations (The Mental Model)
Without an Index = Reading Every Page
Imagine a library with 1,000,000 books. You want “The Great Gatsby.”
Without a catalog: You walk through every shelf, reading the spine of every book, until you find it. In the worst case, you check all 1,000,000 books. (Full Table Scan).
With a Card Catalog (Index): You walk to the “F” drawer → find “Fitzgerald” → it says “Shelf 42, Row 7”. You walk directly there. 3 steps instead of 1,000,000.
In a database, the index is the Card Catalog. It stores the column values in a sorted structure (a B-Tree) with pointers to the actual row on disk.
The B-Tree (How Indexes Are Stored Internally)
A B-Tree is a sorted, balanced tree. Think of it like a binary search on steroids.
1
2
3
4
5
6
7
8
9
| Searching for user_id = 7:
[1, 5, 10, 50]
│
▼ (7 is between 5 and 10)
[5, 6, 7, 8]
│
▼ (Found: 7 → points to row at disk block 482)
ROW DATA
|
- Reads: O(log N). Finding one record in 1 billion rows: ~30 comparisons.
- Writes: Slower. Every INSERT/UPDATE must also update the index tree.
Part 2: The Investigation (Index Types)
1. Single Column Index
1
2
3
4
5
6
7
8
| -- Slow: Full Table Scan. 1 million rows checked.
SELECT * FROM orders WHERE customer_id = 42;
-- Add index:
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Now fast: B-Tree lookup. ~30 comparisons.
SELECT * FROM orders WHERE customer_id = 42;
|
2. Compound Index (Multi-Column): The Order Matters!
A compound index on (a, b, c) is like a phone book sorted by: Last Name → First Name → City.
1
2
3
4
5
6
7
8
| CREATE INDEX idx_orders_status_date ON orders (status, created_at);
-- ✅ Uses the index (leftmost prefix rule)
SELECT * FROM orders WHERE status = 'paid';
SELECT * FROM orders WHERE status = 'paid' AND created_at > '2024-01-01';
-- ❌ Does NOT use the index (skips the leftmost column)
SELECT * FROM orders WHERE created_at > '2024-01-01';
|
The Leftmost Prefix Rule: An index on (a, b, c) can be used for queries on a, a+b, or a+b+c. Never for queries starting with only b or c.
3. Covering Index (The Perfect Index)
A covering index contains ALL columns the query needs. The database never touches the actual table rows.
1
2
3
4
5
6
7
| -- Query needs: status, created_at, total_amount
SELECT status, created_at, total_amount FROM orders WHERE status = 'paid';
-- Covering Index: includes all 3 columns
CREATE INDEX idx_covering ON orders (status, created_at, total_amount);
-- The index itself contains the answer. Zero table row reads!
|
Part 3: The Diagnosis (Queries That Kill)
EXPLAIN ANALYZE — Your X-Ray Vision
1
2
| EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
|
What to look for:
Seq Scan → Full Table Scan. Missing index. 🔴Index Scan → Using index to find rows, then fetching them. 🟡Index Only Scan → Using a covering index. Never touches table. 🟢
Query Patterns That Kill Indexes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| -- ❌ Function on left side: Index on `created_at` is IGNORED!
WHERE YEAR(created_at) = 2024
-- ✅ Fix:
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
-- ❌ Leading wildcard: Index IGNORED (can't sort by unknown prefix)
WHERE email LIKE '%@gmail.com'
-- ✅ Fix: Use full-text search, or prefix match:
WHERE email LIKE 'alice%'
-- ❌ OR condition across different columns: Often causes two index scans
WHERE status = 'paid' OR customer_id = 42
-- ✅ Fix: UNION
SELECT * FROM orders WHERE status = 'paid'
UNION
SELECT * FROM orders WHERE customer_id = 42
|
Part 4: The Resolution (Index Design Strategies)
1. The “ESR” Rule for Compound Indexes
Build compound indexes in this order: Equality → Sort → Range.
1
2
3
4
5
6
7
8
9
| -- Query: Find paid orders, sorted by date, in 2024
SELECT * FROM orders
WHERE status = 'paid' -- Equality
ORDER BY created_at DESC -- Sort
AND created_at > '2024-01-01'; -- Range
-- Best Index:
CREATE INDEX idx_esr ON orders (status, created_at);
-- Equality Sort+Range
|
2. The Partial Index
Index only the rows you care about. Smaller, faster.
1
2
3
| -- 99% of orders are 'completed'. You only query 'pending'.
CREATE INDEX idx_pending ON orders (created_at)
WHERE status = 'pending'; -- Only indexes the 1% of rows that matter!
|
3. Find Missing Indexes (PostgreSQL)
1
2
3
4
5
| -- Tables with lots of sequential scans (missing indexes!)
SELECT schemaname, tablename, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
ORDER BY seq_scan DESC;
|
Final Mental Model
1
2
3
4
5
6
7
8
| Full Table Scan -> Reading every book in the library.
B-Tree Index -> The Card Catalog. Sorted. O(log N).
Compound Index -> Phone book: sorted by LastName → FirstName → City.
Covering Index -> The answer IS in the catalog. Never touch the shelves.
Leftmost Prefix -> Index on (a,b,c) helps queries starting with 'a'. Not 'b' or 'c'.
Seq Scan -> 🔴 Missing index.
Index Only Scan -> 🟢 Perfect covering index.
|
Rules:
- Run
EXPLAIN ANALYZE before and after adding an index. - Never add indexes blindly — each index slows down writes.
- Design for your most frequent queries, not all possible queries.