Spotify recommends a song you’ve never heard and you immediately love it.
Netflix shows you a movie that’s perfect for your mood.
Google finds the exact document you need even though you used different words.
All of these are powered by Embeddings and Vector Databases.
This is the Mastery Guide to the infrastructure powering modern AI — from semantic search to choosing the right vector DB for production.
## Part 1: Foundations (The Mental Model)
### Traditional Search = Finding an Exact Word
A traditional database search is lexical — it looks for exact character matches.
```sql
SELECT * FROM docs WHERE content LIKE '%refund policy%';
```
This finds “refund policy” but misses: “money back guarantee”, “cancellation terms”, or “how to return a product” — all meaning the same thing.
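You can reproduce the miss in a quick SQLite session (the rows are toy data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (content TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [
    ("Our refund policy lasts 30 days.",),
    ("We offer a money back guarantee.",),
])

# LIKE only matches the literal substring -- the synonym row is invisible
rows = conn.execute(
    "SELECT content FROM docs WHERE content LIKE '%refund policy%'"
).fetchall()
print(rows)  # only the first row; "money back guarantee" is missed
```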
### Embeddings = The Semantic Atlas
An Embedding is a mathematical translation of meaning into coordinates in a high-dimensional space (typically 768–3072 dimensions).
Think of it like a map (2D simplification):
```text
                "Quantum Physics"      "Astronomy"
                       │
"Machine Learning" ────┼──── "Neural Networks"
                       │
                "Deep Learning"

"Dog" ──── "Cat" ──── "Puppy"

"Democracy" ─ "Election" ─ "Politics"
```
Words/sentences that are semantically similar are close together on this map. Your query “money back” lands near “refund policy” — even though the words are different.
Key insight: A Vector DB doesn’t search for words. It searches for nearby coordinates on the semantic map.
## Part 2: The Investigation (How Similarity Works)
### Cosine Similarity
The most common way to measure “closeness” between two vectors:
- `similarity = 1.0` → Perfect match.
- `similarity ≈ 0.8` → Very similar ("dog" vs "canine").
- `similarity ≈ 0.1` → Very different ("dog" vs "database").
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding  # Returns [0.23, -0.87, ...] (1536 dims)

def cosine_similarity(a, b) -> float:
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embed("dog"), embed("puppy")))     # ~0.92
print(cosine_similarity(embed("dog"), embed("database")))  # ~0.15
```
### ANN Search (Approximate Nearest Neighbor)
Brute-force search through 10 million vectors is too slow. Vector DBs use HNSW (Hierarchical Navigable Small World) to find nearest neighbors in milliseconds — trading a tiny bit of accuracy for massive speed gains.
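To see what the index is saving you from, here is the brute-force baseline in plain NumPy — a toy sketch with made-up random vectors, not a real corpus. Every query scans all N vectors; HNSW replaces this full scan with a graph walk that touches only a small fraction of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical corpus: 10,000 random unit vectors, 128 dimensions
corpus = rng.normal(size=(10_000, 128)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def brute_force_top_k(query, vectors, k: int = 5):
    # Cosine similarity is just a dot product once vectors are normalized.
    # This scans every row: O(N * d) per query -- the cost HNSW avoids.
    scores = vectors @ (query / np.linalg.norm(query))
    top = np.argpartition(-scores, k)[:k]  # unordered top-k
    return top[np.argsort(-scores[top])]   # sorted by similarity

# Perturb a known vector slightly; exact search finds it as the top hit
query = corpus[42] + 0.01 * rng.normal(size=128)
print(brute_force_top_k(query, corpus)[0])  # 42
```

At 10 million vectors and thousands of queries per second, this linear scan becomes the bottleneck, which is why every production vector DB ships an ANN index.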
## Part 3: The Diagnosis (Choosing the Right Vector DB)
| Database | Best For | Key Feature |
|---|---|---|
| pgvector | Small-medium scale, existing Postgres users | Zero extra infra. SQL + vectors together. |
| Chroma | Local dev, prototyping | Easiest to start. In-memory mode. |
| Weaviate | Hybrid search (keyword + semantic) | Built-in BM25 + vector search. |
| Qdrant | High-performance, self-hosted | Fast, Rust-based, excellent filtering. |
| Pinecone | Managed, serverless, large scale | Zero ops. Expensive at scale. |
| Milvus | Billion-scale, open source | Most scalable open-source option. |
### Decision Guide
```text
Prototyping?             → Chroma   (local, zero setup)
Already use Postgres?    → pgvector (no new infra)
Need hybrid search?      → Weaviate
Need fast + self-hosted? → Qdrant
Need managed cloud?      → Pinecone
Need billion-scale OSS?  → Milvus
```
## Part 4: The Resolution (Python Cookbook)
### 1. pgvector (Simplest Production Setup)
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);

-- HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```
```python
import psycopg2

conn = psycopg2.connect("postgresql://...")

def index_document(content: str):
    embedding = embed(content)
    with conn.cursor() as cur:
        # Pass the vector in pgvector's text form and cast it server-side
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
            (content, str(embedding)),
        )
    conn.commit()

def search(query: str, top_k: int = 5) -> list[dict]:
    q_vec = str(embed(query))
    with conn.cursor() as cur:
        cur.execute("""
            SELECT content, 1 - (embedding <=> %s::vector) AS score
            FROM documents
            ORDER BY embedding <=> %s::vector  -- <=> is cosine distance
            LIMIT %s
        """, (q_vec, q_vec, top_k))
        return [{"content": r[0], "score": r[1]} for r in cur.fetchall()]

results = search("What is the refund policy?")
```
### 2. Qdrant (High-Performance, Self-Hosted)
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # or url="http://localhost:6333"

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Index documents (chunks is your list of document chunks)
chunks = ["Our refund policy lasts 30 days.", "Shipping takes 3-5 business days."]
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=embed(chunk), payload={"content": chunk})
        for i, chunk in enumerate(chunks)
    ],
)

# Search
results = client.search(
    collection_name="docs",
    query_vector=embed("refund policy"),
    limit=5,
)
for r in results:
    print(f"Score: {r.score:.3f} | {r.payload['content'][:100]}")
```
### 3. Chroma (Local Prototyping)
```python
import chromadb

client = chromadb.Client()  # In-memory, zero setup

collection = client.create_collection("docs")
collection.add(
    documents=chunks,  # your list of document chunks
    ids=[f"id_{i}" for i in range(len(chunks))],
    # Chroma auto-embeds if you don't provide embeddings
)

results = collection.query(query_texts=["refund policy"], n_results=5)
```
## Final Mental Model
```text
Traditional DB    → Finds the exact word "dog". Misses "canine", "puppy", "hound".
Vector DB         → Finds everything near the concept of "dog". Semantics, not syntax.
Embedding         → GPS coordinate of a meaning on the semantic map.
Cosine Similarity → Angle between two coordinate vectors (-1 = opposite, 1 = identical).
HNSW Index        → Fast navigation shortcut through the high-dimensional map.
pgvector          → Your Postgres DB grows wings. Start here.
Pinecone          → Someone else runs it. You pay more.
Qdrant            → The performance king for self-hosted.
Chroma            → Your local playground. Zero friction.
```
The AI Stack of 2026:
- Embeddings → Turn your data into semantic coordinates.
- Vector DB → Store and search those coordinates at scale.
- LLM + RAG → Reason over the retrieved semantic results.
This is the complete foundation. Three posts. One unified architecture.
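The three layers can be wired together in a few lines. A minimal sketch, assuming the `search` helper from the pgvector section and an OpenAI client; the prompt format and model name here are illustrative, not prescriptive:

```python
def build_prompt(query: str, contexts: list[str]) -> str:
    # Ground the LLM: stuff retrieved chunks into the prompt so it
    # answers from your data instead of its training set
    context_block = "\n\n".join(f"[{i+1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

# Hypothetical usage with the pgvector search() helper:
# hits = search("What is the refund policy?")
# answer = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user",
#                "content": build_prompt("What is the refund policy?",
#                                        [h["content"] for h in hits])}],
# )
print(build_prompt("refund?", ["Refunds within 30 days."]))
```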