
Khoj: The Open-Source AI Second Brain You Can Self-Host

Khoj is an open-source personal AI app that acts as your AI second brain — chat with any LLM, search your documents with semantic AI, build custom agents, and self-host it completely on your own machine.

Part 1: Foundations — The Mental Model

You probably have notes scattered across Obsidian, Notion, PDF research papers, and markdown files. You switch between ChatGPT, Claude, and Gemini tabs, pasting context in by hand. You want an AI that already knows everything you know — but all the big players lock your data into their cloud.

That is exactly the gap Khoj is built to fill.

Mental Model: Think of Khoj as a personal AI brain — not a chatbot, but an always-on knowledge assistant that has read every document you’ve ever written, can search the internet, can run autonomous agents, and can do all of this either on your own machine or on Khoj’s cloud, your choice.

Where most AI tools are stateless (each conversation starts empty), Khoj is stateful and knowledge-indexed. It is your AI that remembers.


Part 2: The Investigation — Architecture Deep Dive

The Big Picture

Khoj is a full-stack Python application with FastAPI at its core. Here is the high-level flow:

┌──────────────────────────────────────────────────────────────┐
│                      Khoj Clients                            │
│  Web App │ Obsidian Plugin │ Emacs Package │ Phone │ WhatsApp │
└─────────────────────────┬────────────────────────────────────┘
                          │ REST / WebSocket API
┌─────────────────────────▼────────────────────────────────────┐
│                     Khoj Server (FastAPI)                    │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │  Indexing   │  │  Conversation│  │     Agent Engine     │ │
│  │  Pipeline   │  │  Router      │  │  (Tool + Planner)    │ │
│  └──────┬──────┘  └──────┬───────┘  └──────────────────────┘ │
│         │                │                                    │
└─────────┼────────────────┼────────────────────────────────────┘
          │                │
┌─────────▼──────┐  ┌──────▼──────────────────────────────────┐
│  Vector Store  │  │            LLM Adapters                  │
│  (embeddings)  │  │  OpenAI │ Anthropic │ Google │ Ollama    │
└────────────────┘  └─────────────────────────────────────────┘

Source Code Structure

Khoj’s codebase under src/khoj/ is cleanly organized by concern:

Directory               | Purpose
------------------------|------------------------------------------------------------------
routers/                | FastAPI REST & WebSocket endpoints (chat, agents, search, files)
processor/conversation/ | LLM adapters, one per provider (OpenAI, Anthropic, Google, Ollama)
processor/content/      | Document parsers (PDF, Markdown, Notion, Org-mode, Word)
database/               | Django ORM models — conversations, agents, files, users
search/                 | Semantic search pipeline using sentence-transformers
routers/api_agents.py   | Full REST API for creating and managing agents

LLM Adapter Pattern

One of Khoj’s most elegant design choices is the LLM adapter pattern. Each provider gets its own module with the same interface:

# src/khoj/processor/conversation/anthropic/anthropic_chat.py
async def converse_anthropic(
    messages: List[ChatMessage],
    model: Optional[str] = "claude-3-7-sonnet-latest",
    api_key: Optional[str] = None,
    deepthought: Optional[bool] = False,
    tracer: dict = {},
) -> AsyncGenerator[ResponseWithThought, None]:
    """Converse with user using Anthropic's Claude"""
    async for chunk in anthropic_chat_completion_with_backoff(
        messages=messages,
        model_name=model,
        temperature=0.2,
        ...
    ):
        yield chunk

The same pattern is replicated for openai_chat.py, google_chat.py, and ollama_chat.py. The router picks the right adapter at runtime based on the user’s configured model — you swap from GPT-4o to Gemini to Llama 3 without changing any application code.
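The runtime dispatch can be pictured as a small registry keyed by provider name. The sketch below is illustrative only — the registry, decorator, and adapter bodies are placeholders, not Khoj’s actual router code:

```python
from typing import AsyncGenerator, Callable, Dict, List

# Hypothetical registry mapping provider names to adapter coroutines.
ADAPTERS: Dict[str, Callable] = {}

def register(provider: str):
    """Decorator that registers an adapter under a provider name."""
    def wrap(fn: Callable) -> Callable:
        ADAPTERS[provider] = fn
        return fn
    return wrap

@register("anthropic")
async def converse_anthropic(messages: List[str]) -> AsyncGenerator[str, None]:
    # Stand-in for the real Anthropic adapter shown above.
    yield f"[claude] {messages[-1]}"

@register("openai")
async def converse_openai(messages: List[str]) -> AsyncGenerator[str, None]:
    yield f"[gpt] {messages[-1]}"

async def converse(provider: str, messages: List[str]):
    """Pick the adapter for the user's configured provider at runtime."""
    adapter = ADAPTERS[provider]
    async for chunk in adapter(messages):
        yield chunk
```

Because every adapter exposes the same async-generator interface, switching models is a configuration change rather than a code change.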

Document Ingestion Pipeline

Khoj reads your knowledge base and indexes it into a vector store for semantic retrieval:

  • PDF → pypdf parser
  • Markdown / Org-mode → Plain text extraction
  • Notion → Official API integration
  • Word → Office XML parser
  • Images → Vision LLM description

Everything lands in an embedding vector index (sentence-transformers). When you ask a question, Khoj performs semantic similarity search over your corpus, retrieves the top-k relevant chunks, and passes them as context to the LLM — classic RAG, but deeply integrated.
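The retrieval step boils down to cosine similarity over embedding vectors. Here is a minimal, dependency-free sketch of that idea, with toy 3-dimensional vectors standing in for real sentence-transformers embeddings:

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: List[float], corpus: List[Tuple[str, List[float]]], k: int = 2):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine(query, emb), text) for text, emb in corpus]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy corpus: (chunk text, pretend embedding)
corpus = [
    ("transformers for forecasting", [0.9, 0.1, 0.0]),
    ("sourdough starter notes",      [0.0, 0.2, 0.9]),
    ("attention is all you need",    [0.8, 0.3, 0.1]),
]
```

The retrieved top-k chunks are then prepended to the LLM prompt as context — that is the whole RAG trick.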


Part 3: The Diagnosis — What It Does for Developers

Use Case 1: Personal Research Assistant

Load your entire research library — 300 PDFs, 1,000 Markdown notes, every Notion page — and chat with it:

# Sync a local folder of docs
khoj --content-file /path/to/research/

# Or via the web UI: Settings → Files → Upload

Ask: “Which of my papers mentions transformer-based architectures for time-series forecasting?” Khoj retrieves the relevant sections, cites them, and synthesizes a coherent answer.

Use Case 2: Custom AI Agents

Khoj’s agent system lets you create specialized AI personas with their own knowledge base, LLM, system prompt, and tools:

Settings → Agents → Create Agent
- Name: "Python Code Reviewer"
- Model: Llama 3.1 70B (local via Ollama)
- Knowledge Base: your company's internal codebase docs
- Tools: Web Search, Code Execution
- Persona: "You are a strict senior engineer. Review code for security and correctness."

Each agent gets its own chat endpoint. You could have a “Research Analyst” agent reading academic PDFs and a “Marketing Copywriter” agent reading brand guidelines — both running on the same Khoj server.
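Since agents are exposed over REST, talking to one from a script is just an HTTP call. The endpoint path and payload shape below are illustrative assumptions, not Khoj’s documented API — check routers/api_agents.py for the real contract:

```python
import json
from urllib import request

KHOJ_URL = "http://localhost:42110"  # default self-hosted port

def build_agent_request(agent_slug: str, message: str) -> request.Request:
    """Build a POST request to a (hypothetical) per-agent chat endpoint."""
    url = f"{KHOJ_URL}/api/agents/{agent_slug}/chat"
    body = json.dumps({"q": message}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

req = build_agent_request("python-code-reviewer",
                          "Review this diff for SQL injection.")
# request.urlopen(req) would send it to a running Khoj server.
```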

Use Case 3: Autonomous Research (Scheduled Jobs)

Khoj can act as a proactive assistant:

  • Set up a daily automated research task: “Every morning, search for news about AI safety and send me a summary newsletter”
  • It browses the web, synthesizes information, and delivers it to your configured channel (email, webhook, etc.)
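Under the hood this is cron-style scheduling: a job fires on an interval, does its research, and re-arms itself. A self-contained sketch of that loop using only the standard library (not Khoj’s implementation — the job body is a placeholder):

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
delivered = []

def research_and_deliver():
    """Placeholder for: browse the web, summarize, deliver to a channel."""
    delivered.append("AI safety digest for today")
    # In a real setup, re-arm for tomorrow:
    # scheduler.enter(24 * 3600, 1, research_and_deliver)

# Fire almost immediately for demonstration; a daily job would use a
# 24-hour delay instead.
scheduler.enter(0, 1, research_and_deliver)
scheduler.run()
```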

Use Case 4: Local-First Privacy

For developers who refuse to send data to third-party clouds:

# Run Llama 3 locally via Ollama
ollama run llama3.1

# Point Khoj to it
# In Khoj UI → Chat Models → Add Model → host: http://localhost:11434

Your documents stay on your disk. Your conversations are processed locally. Zero data leaves your machine.
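Ollama serves a plain HTTP API on port 11434, which is exactly what Khoj points at. A quick way to sanity-check your local model from Python — the helper only builds the request, and actually sending it assumes Ollama is running with the model pulled:

```python
import json
from urllib import request

def ollama_generate_request(prompt: str, model: str = "llama3.1") -> request.Request:
    """Build a non-streaming request to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = ollama_generate_request("Say hello in one word.")
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```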

Supported LLMs at a Glance

Type  | Provider           | Example Models
------|--------------------|----------------------------------
Cloud | OpenAI             | GPT-4o, o3-mini
Cloud | Anthropic          | Claude 3.7 Sonnet
Cloud | Google             | Gemini 1.5 Pro, Flash
Cloud | Cohere, Mistral AI | Command R, Mistral Large
Local | Ollama             | Llama 3.1, Qwen, Gemma, DeepSeek

Part 4: The Resolution — How to Get Started

Option A: Cloud (Zero Setup)

The fastest path — just go to app.khoj.dev and create a free account. No installation needed.

Option B: Self-Host with Docker

mkdir ~/.khoj && cd ~/.khoj

# Download the official compose file
wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml

# Start everything
docker-compose up -d

Open http://localhost:42110 and you’re in.

Option C: Self-Host with pip (Python Developers)

# Install with local LLM support (llama-cpp-python)
python -m pip install 'khoj[local]'

# Start the server
khoj

For GPU acceleration:

# NVIDIA CUDA
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 python -m pip install 'khoj[local]'

# Apple M1/M2/M3
CMAKE_ARGS="-DGGML_METAL=on" python -m pip install 'khoj[local]'

Add Your Knowledge Base

After starting Khoj:

  1. Web App: Go to Settings → Files → drag-and-drop your PDFs, Markdown files, or connect Notion
  2. Obsidian Plugin: Install the Khoj plugin → it indexes your vault automatically
  3. CLI sync:
khoj --content-file ~/notes/ --content-file ~/research/*.pdf

Connect Your Preferred LLM

In Settings → Chat Models:

  • Add your OpenAI key for GPT-4o
  • Add your Anthropic key for Claude
  • Point to http://localhost:11434 for Ollama local models

Khoj will route all conversations through whichever model you designate as default.


Final Mental Model

┌────────────────────────────────────────────────────────────┐
│                            Khoj                            │
│                                                            │
│  "Your open-source AI second brain"                        │
│                                                            │
│  What it IS:                                               │
│   • A self-hostable personal AI app (FastAPI + Python)     │
│   • An LLM-agnostic router (GPT, Claude, Gemini, Ollama)   │
│   • A RAG pipeline over YOUR documents                     │
│   • An agent builder with custom knowledge + tools         │
│                                                            │
│  What it SOLVES:                                           │
│   • Knowledge fragmented across files, apps, and tools     │
│   • Dependency on closed, cloud-only AI services           │
│   • Privacy: your data stays on your machine if you want   │
│                                                            │
│  What it ENABLES:                                          │
│   • Chat with 1,000s of your own documents                 │
│   • Local LLMs (Llama, Qwen, DeepSeek) via Ollama          │
│   • Autonomous agents that research and deliver newsletters│
│   • Multi-platform: Web, Obsidian, Emacs, Phone, WhatsApp  │
│                                                            │
│  Self-host: pip install khoj | docker-compose up           │
│  Cloud: app.khoj.dev (free tier available)                 │
└────────────────────────────────────────────────────────────┘

GitHub: khoj-ai/khoj
Docs: docs.khoj.dev
Live App: app.khoj.dev

Made with ~~laziness~~ love 🦥
