Featured image of post Database Indexing: The 'Library Card Catalog' Mental Model

Database Indexing: The 'Library Card Catalog' Mental Model

Why does adding an index make a query 100x faster? A mastery guide to B-Trees, compound indexes, covering indexes, and the queries that kill your database.

A query that takes 30 seconds becomes 50 milliseconds after adding one index.

No code change. No new hardware. One line of SQL.

Understanding why that works — and when it doesn’t — separates engineers who “add indexes when it’s slow” from engineers who design data models that never get slow.

This is the Mastery Guide to Database Indexing.


Part 1: Foundations (The Mental Model)

Without an Index = Reading Every Page

Imagine a library with 1,000,000 books. You want “The Great Gatsby.”

Without a catalog: You walk through every shelf, reading the spine of every book, until you find it. In the worst case, you check all 1,000,000 books. (Full Table Scan).

With a Card Catalog (Index): You walk to the “F” drawer → find “Fitzgerald” → it says “Shelf 42, Row 7”. You walk directly there. 3 steps instead of 1,000,000.

In a database, the index is the Card Catalog. It stores the column values in a sorted structure (a B-Tree) with pointers to the actual row on disk.

The B-Tree (How Indexes Are Stored Internally)

A B-Tree is a sorted, balanced tree. Think of it like a binary search on steroids.

1
2
3
4
5
6
7
8
9
Searching for user_id = 7:

[1, 5, 10, 50]
    ▼ (7 is between 5 and 10)
[5, 6, 7, 8]
    ▼ (Found: 7 → points to row at disk block 482)
ROW DATA
  • Reads: O(log N). Finding one record in 1 billion rows: ~30 comparisons.
  • Writes: Slower. Every INSERT/UPDATE must also update the index tree.

Part 2: The Investigation (Index Types)

1. Single Column Index

1
2
3
4
5
6
7
8
-- Slow: Full Table Scan. 1 million rows checked.
SELECT * FROM orders WHERE customer_id = 42;

-- Add index:
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Now fast: B-Tree lookup. ~30 comparisons.
SELECT * FROM orders WHERE customer_id = 42;

2. Compound Index (Multi-Column): The Order Matters!

A compound index on (a, b, c) is like a phone book sorted by: Last Name → First Name → City.

1
2
3
4
5
6
7
8
CREATE INDEX idx_orders_status_date ON orders (status, created_at);

-- ✅ Uses the index (leftmost prefix rule)
SELECT * FROM orders WHERE status = 'paid';
SELECT * FROM orders WHERE status = 'paid' AND created_at > '2024-01-01';

-- ❌ Does NOT use the index (skips the leftmost column)
SELECT * FROM orders WHERE created_at > '2024-01-01';

The Leftmost Prefix Rule: An index on (a, b, c) can be used for queries on a, a+b, or a+b+c. Never for queries starting with only b or c.

3. Covering Index (The Perfect Index)

A covering index contains ALL columns the query needs. The database never touches the actual table rows.

1
2
3
4
5
6
7
-- Query needs: status, created_at, total_amount
SELECT status, created_at, total_amount FROM orders WHERE status = 'paid';

-- Covering Index: includes all 3 columns
CREATE INDEX idx_covering ON orders (status, created_at, total_amount);

-- The index itself contains the answer. Zero table row reads!

Part 3: The Diagnosis (Queries That Kill)

EXPLAIN ANALYZE — Your X-Ray Vision

1
2
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

What to look for:

  • Seq ScanFull Table Scan. Missing index. 🔴
  • Index Scan → Using index to find rows, then fetching them. 🟡
  • Index Only Scan → Using a covering index. Never touches table. 🟢

Query Patterns That Kill Indexes

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
-- ❌ Function on left side: Index on `created_at` is IGNORED!
WHERE YEAR(created_at) = 2024
-- ✅ Fix:
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'

-- ❌ Leading wildcard: Index IGNORED (can't sort by unknown prefix)
WHERE email LIKE '%@gmail.com'
-- ✅ Fix: Use full-text search, or prefix match:
WHERE email LIKE 'alice%'

-- ❌ OR condition across different columns: Often causes two index scans
WHERE status = 'paid' OR customer_id = 42
-- ✅ Fix: UNION
SELECT * FROM orders WHERE status = 'paid'
UNION
SELECT * FROM orders WHERE customer_id = 42

Part 4: The Resolution (Index Design Strategies)

1. The “ESR” Rule for Compound Indexes

Build compound indexes in this order: Equality → Sort → Range.

1
2
3
4
5
6
7
8
9
-- Query: Find paid orders, sorted by date, in 2024
SELECT * FROM orders
WHERE status = 'paid'        -- Equality
ORDER BY created_at DESC     -- Sort  
AND created_at > '2024-01-01'; -- Range

-- Best Index:
CREATE INDEX idx_esr ON orders (status, created_at);
--                               Equality  Sort+Range

2. The Partial Index

Index only the rows you care about. Smaller, faster.

1
2
3
-- 99% of orders are 'completed'. You only query 'pending'.
CREATE INDEX idx_pending ON orders (created_at)
WHERE status = 'pending';  -- Only indexes the 1% of rows that matter!

3. Find Missing Indexes (PostgreSQL)

1
2
3
4
5
-- Tables with lots of sequential scans (missing indexes!)
SELECT schemaname, tablename, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
ORDER BY seq_scan DESC;

Final Mental Model

1
2
3
4
5
6
7
8
Full Table Scan -> Reading every book in the library.
B-Tree Index    -> The Card Catalog. Sorted. O(log N).
Compound Index  -> Phone book: sorted by LastName → FirstName → City.
Covering Index  -> The answer IS in the catalog. Never touch the shelves.

Leftmost Prefix -> Index on (a,b,c) helps queries starting with 'a'. Not 'b' or 'c'.
Seq Scan        -> 🔴 Missing index.
Index Only Scan -> 🟢 Perfect covering index.

Rules:

  • Run EXPLAIN ANALYZE before and after adding an index.
  • Never add indexes blindly — each index slows down writes.
  • Design for your most frequent queries, not all possible queries.
Made with laziness love 🦥

Subscribe to My Newsletter