
Khoj: The Open-Source AI Second Brain You Can Self-Host

Khoj is an open-source personal AI app that acts as your AI second brain — chat with any LLM, search your documents with semantic AI, build custom agents, and self-host it completely on your own machine.

Part 1: Foundations — The Mental Model

You probably have notes scattered across Obsidian, Notion, PDF research papers, and markdown files. You switch between ChatGPT, Claude, and Gemini tabs, pasting context in by hand. You want an AI that already knows everything you know — but all the big players lock your data into their cloud.

That is exactly the gap Khoj is built to fill.

Mental Model: Think of Khoj as a personal AI brain — not a chatbot, but an always-on knowledge assistant that has read every document you’ve ever written, can search the internet, can run autonomous agents, and can do all of this either on your own machine or on Khoj’s cloud, your choice.

Where most AI tools are stateless (each conversation starts empty), Khoj is stateful and knowledge-indexed. It is your AI that remembers.


Part 2: The Investigation — Architecture Deep Dive

The Big Picture

Khoj is a full-stack Python application with FastAPI at its core. Here is the high-level flow:

┌──────────────────────────────────────────────────────────────┐
│                      Khoj Clients                            │
│  Web App │ Obsidian Plugin │ Emacs Package │ Phone │ WhatsApp │
└─────────────────────────┬────────────────────────────────────┘
                          │ REST / WebSocket API
┌─────────────────────────▼────────────────────────────────────┐
│                     Khoj Server (FastAPI)                    │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │  Indexing   │  │  Conversation│  │     Agent Engine     │ │
│  │  Pipeline   │  │  Router      │  │  (Tool + Planner)    │ │
│  └──────┬──────┘  └──────┬───────┘  └──────────────────────┘ │
│         │                │                                    │
└─────────┼────────────────┼────────────────────────────────────┘
          │                │
┌─────────▼──────┐  ┌──────▼──────────────────────────────────┐
│  Vector Store  │  │            LLM Adapters                  │
│  (embeddings)  │  │  OpenAI │ Anthropic │ Google │ Ollama    │
└────────────────┘  └─────────────────────────────────────────┘

Source Code Structure

Khoj’s codebase under src/khoj/ is cleanly organized by concern:

Directory               | Purpose
------------------------|------------------------------------------------------------------
routers/                | FastAPI REST & WebSocket endpoints (chat, agents, search, files)
processor/conversation/ | LLM adapters, one per provider (OpenAI, Anthropic, Google, Ollama)
processor/content/      | Document parsers (PDF, Markdown, Notion, Org-mode, Word)
database/               | Django ORM models — conversations, agents, files, users
search/                 | Semantic search pipeline using sentence-transformers
routers/api_agents.py   | Full REST API for creating and managing agents

LLM Adapter Pattern

One of Khoj’s most elegant design choices is the LLM adapter pattern. Each provider gets its own module with the same interface:

# src/khoj/processor/conversation/anthropic/anthropic_chat.py
async def converse_anthropic(
    messages: List[ChatMessage],
    model: Optional[str] = "claude-3-7-sonnet-latest",
    api_key: Optional[str] = None,
    deepthought: Optional[bool] = False,
    tracer: dict = {},
) -> AsyncGenerator[ResponseWithThought, None]:
    """Converse with user using Anthropic's Claude"""
    async for chunk in anthropic_chat_completion_with_backoff(
        messages=messages,
        model_name=model,
        temperature=0.2,
        ...
    ):
        yield chunk

The same pattern is replicated for openai_chat.py, google_chat.py, and ollama_chat.py. The router picks the right adapter at runtime based on the user’s configured model — you swap from GPT-4o to Gemini to Llama 3 without changing any application code.
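The runtime dispatch can be pictured as a small registry keyed by provider name. The sketch below is illustrative only — the registry, decorator, and adapter bodies are placeholders, not Khoj’s actual router code:

```python
from typing import AsyncGenerator, Callable, Dict, List

# Hypothetical registry mapping provider names to adapter coroutines.
ADAPTERS: Dict[str, Callable] = {}

def register(provider: str):
    """Decorator that registers an adapter under a provider name."""
    def wrap(fn: Callable) -> Callable:
        ADAPTERS[provider] = fn
        return fn
    return wrap

@register("anthropic")
async def converse_anthropic(messages: List[str]) -> AsyncGenerator[str, None]:
    # Stand-in for the real Anthropic adapter shown above.
    yield f"[claude] {messages[-1]}"

@register("openai")
async def converse_openai(messages: List[str]) -> AsyncGenerator[str, None]:
    yield f"[gpt] {messages[-1]}"

async def converse(provider: str, messages: List[str]):
    """Pick the adapter for the user's configured provider at runtime."""
    adapter = ADAPTERS[provider]
    async for chunk in adapter(messages):
        yield chunk
```

Because every adapter exposes the same async-generator interface, switching models is a configuration change rather than a code change.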

Document Ingestion Pipeline

Khoj reads your knowledge base and indexes it into a vector store for semantic retrieval:

  • PDF → pypdf parser
  • Markdown / Org-mode → Plain text extraction
  • Notion → Official API integration
  • Word → Office XML parser
  • Images → Vision LLM description

Everything lands in an embedding vector index (sentence-transformers). When you ask a question, Khoj performs semantic similarity search over your corpus, retrieves the top-k relevant chunks, and passes them as context to the LLM — classic RAG, but deeply integrated.
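The retrieval step boils down to cosine similarity over embedding vectors. Here is a minimal, dependency-free sketch of that idea, with toy 3-dimensional vectors standing in for real sentence-transformers embeddings:

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: List[float], corpus: List[Tuple[str, List[float]]], k: int = 2):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine(query, emb), text) for text, emb in corpus]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy corpus: (chunk text, pretend embedding)
corpus = [
    ("transformers for forecasting", [0.9, 0.1, 0.0]),
    ("sourdough starter notes",      [0.0, 0.2, 0.9]),
    ("attention is all you need",    [0.8, 0.3, 0.1]),
]
```

The retrieved top-k chunks are then prepended to the LLM prompt as context — that is the whole RAG trick.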


Part 3: The Diagnosis — What It Does for Developers

Use Case 1: Personal Research Assistant

Load your entire research library — 300 PDFs, 1,000 Markdown notes, every Notion page — and chat with it:

# Sync a local folder of docs
khoj --content-file /path/to/research/

# Or via the web UI: Settings → Files → Upload

Ask: “Which of my papers mentions transformer-based architectures for time-series forecasting?” Khoj retrieves the relevant sections, cites them, and synthesizes a coherent answer.

Use Case 2: Custom AI Agents

Khoj’s agent system lets you create specialized AI personas with their own knowledge base, LLM, system prompt, and tools:

Settings → Agents → Create Agent
- Name: "Python Code Reviewer"
- Model: Llama 3.1 70B (local via Ollama)
- Knowledge Base: your company's internal codebase docs
- Tools: Web Search, Code Execution
- Persona: "You are a strict senior engineer. Review code for security and correctness."

Each agent gets its own chat endpoint. You could have a “Research Analyst” agent reading academic PDFs and a “Marketing Copywriter” agent reading brand guidelines — both running on the same Khoj server.
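Since agents are exposed over REST, talking to one from a script is just an HTTP call. The endpoint path and payload shape below are illustrative assumptions, not Khoj’s documented API — check routers/api_agents.py for the real contract:

```python
import json
from urllib import request

KHOJ_URL = "http://localhost:42110"  # default self-hosted port

def build_agent_request(agent_slug: str, message: str) -> request.Request:
    """Build a POST request to a (hypothetical) per-agent chat endpoint."""
    url = f"{KHOJ_URL}/api/agents/{agent_slug}/chat"
    body = json.dumps({"q": message}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

req = build_agent_request("python-code-reviewer",
                          "Review this diff for SQL injection.")
# request.urlopen(req) would send it to a running Khoj server.
```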

Use Case 3: Autonomous Research (Scheduled Jobs)

Khoj can act as a proactive assistant:

  • Set up a daily automated research task: “Every morning, search for news about AI safety and send me a summary newsletter”
  • It browses the web, synthesizes information, and delivers it to your configured channel (email, webhook, etc.)
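Under the hood this is cron-style scheduling: a job fires on an interval, does its research, and re-arms itself. A self-contained sketch of that loop using only the standard library (not Khoj’s implementation — the job body is a placeholder):

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
delivered = []

def research_and_deliver():
    """Placeholder for: browse the web, summarize, deliver to a channel."""
    delivered.append("AI safety digest for today")
    # In a real setup, re-arm for tomorrow:
    # scheduler.enter(24 * 3600, 1, research_and_deliver)

# Fire almost immediately for demonstration; a daily job would use a
# 24-hour delay instead.
scheduler.enter(0, 1, research_and_deliver)
scheduler.run()
```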

Use Case 4: Local-First Privacy

For developers who refuse to send data to third-party clouds:

# Run Llama 3 locally via Ollama
ollama run llama3.1

# Point Khoj to it
# In Khoj UI → Chat Models → Add Model → host: http://localhost:11434

Your documents stay on your disk. Your conversations are processed locally. Zero data leaves your machine.
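Ollama serves a plain HTTP API on port 11434, which is exactly what Khoj points at. A quick way to sanity-check your local model from Python — the helper only builds the request, and actually sending it assumes Ollama is running with the model pulled:

```python
import json
from urllib import request

def ollama_generate_request(prompt: str, model: str = "llama3.1") -> request.Request:
    """Build a non-streaming request to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = ollama_generate_request("Say hello in one word.")
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```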

Supported LLMs at a Glance

Type  | Provider           | Example Models
------|--------------------|----------------------------------
Cloud | OpenAI             | GPT-4o, o3-mini
Cloud | Anthropic          | Claude 3.7 Sonnet
Cloud | Google             | Gemini 1.5 Pro, Flash
Cloud | Cohere, Mistral AI | Command R, Mistral Large
Local | Ollama             | Llama 3.1, Qwen, Gemma, DeepSeek

Part 4: The Resolution — How to Get Started

Option A: Cloud (Zero Setup)

The fastest path — just go to app.khoj.dev and create a free account. No installation needed.

Option B: Self-Host with Docker

mkdir ~/.khoj && cd ~/.khoj

# Download the official compose file
wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml

# Start everything
docker-compose up -d

Open http://localhost:42110 and you’re in.

Option C: Self-Host with pip (Python Developers)

# Install with local LLM support (llama-cpp-python)
python -m pip install 'khoj[local]'

# Start the server
khoj

For GPU acceleration:

# NVIDIA CUDA
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 python -m pip install 'khoj[local]'

# Apple M1/M2/M3
CMAKE_ARGS="-DGGML_METAL=on" python -m pip install 'khoj[local]'

Add Your Knowledge Base

After starting Khoj:

  1. Web App: Go to Settings → Files → drag-and-drop your PDFs, Markdown files, or connect Notion
  2. Obsidian Plugin: Install the Khoj plugin → it indexes your vault automatically
  3. CLI sync:
khoj --content-file ~/notes/ --content-file ~/research/*.pdf

Connect Your Preferred LLM

In Settings → Chat Models:

  • Add your OpenAI key for GPT-4o
  • Add your Anthropic key for Claude
  • Point to http://localhost:11434 for Ollama local models

Khoj will route all conversations through whichever model you designate as default.


Final Mental Model

┌────────────────────────────────────────────────────────────┐
│                            Khoj                            │
│                                                            │
│  "Your open-source AI second brain"                        │
│                                                            │
│  What it IS:                                               │
│   • A self-hostable personal AI app (FastAPI + Python)     │
│   • An LLM-agnostic router (GPT, Claude, Gemini, Ollama)   │
│   • A RAG pipeline over YOUR documents                     │
│   • An agent builder with custom knowledge + tools         │
│                                                            │
│  What it SOLVES:                                           │
│   • Knowledge fragmented across files, apps, and tools     │
│   • Dependency on closed, cloud-only AI services           │
│   • Privacy: your data stays on your machine if you want   │
│                                                            │
│  What it ENABLES:                                          │
│   • Chat with 1,000s of your own documents                 │
│   • Local LLMs (Llama, Qwen, DeepSeek) via Ollama          │
│   • Autonomous agents that research and deliver newsletters│
│   • Multi-platform: Web, Obsidian, Emacs, Phone, WhatsApp  │
│                                                            │
│  Self-host: pip install khoj | docker-compose up           │
│  Cloud: app.khoj.dev (free tier available)                 │
└────────────────────────────────────────────────────────────┘

GitHub: khoj-ai/khoj
Docs: docs.khoj.dev
Live App: app.khoj.dev

Made with ~~laziness~~ love 🦥
