Part 1: Foundations — The Mental Model
You probably have notes scattered across Obsidian, Notion, PDF research papers, and markdown files. You switch between ChatGPT, Claude, and Gemini tabs, pasting context in by hand. You want an AI that already knows everything you know — but all the big players lock your data into their cloud.
That is exactly the gap Khoj is built to fill.
Mental Model: Think of Khoj as a personal AI brain: not a chatbot, but an always-on knowledge assistant that has read every document you’ve ever written, can search the internet, can run autonomous agents, and can do all of this either on your own machine or on Khoj’s cloud, at your choice.
Where most AI tools are stateless (each conversation starts empty), Khoj is stateful and knowledge-indexed. It is your AI that remembers.
Part 2: The Investigation — Architecture Deep Dive
The Big Picture
Khoj is a full-stack Python application built around a FastAPI core. The high-level flow: a client (web app, Obsidian plugin, Emacs, or plain HTTP) sends a request to the FastAPI routers, which run semantic search over your indexed documents, attach the retrieved context, and pass everything to the configured LLM adapter, which returns the response.
Source Code Structure
Khoj’s codebase under src/khoj/ is cleanly organized by concern:
| Directory | Purpose |
|---|---|
| routers/ | FastAPI REST & WebSocket endpoints (chat, agents, search, files) |
| processor/conversation/ | LLM adapter per provider (OpenAI, Anthropic, Google, Ollama) |
| processor/content/ | Document parsers (PDF, Markdown, Notion, Org-mode, Word) |
| database/ | Django ORM models — conversations, agents, files, users |
| search/ | Semantic search pipeline using sentence-transformers |
| routers/api_agents.py | Full REST API for creating and managing agents |
LLM Adapter Pattern
One of Khoj’s most elegant design choices is the LLM adapter pattern: each provider gets its own module exposing the same chat interface, so the rest of the system never touches provider-specific SDKs.
The same pattern is replicated for openai_chat.py, google_chat.py, and ollama_chat.py. The router picks the right adapter at runtime based on the user’s configured model — you swap from GPT-4o to Gemini to Llama 3 without changing any application code.
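As a sketch of how such an adapter layer can look (class and method names here are illustrative, not Khoj’s actual code):

```python
from abc import ABC, abstractmethod

class ChatAdapter(ABC):
    """Common interface every provider module implements (names illustrative)."""
    @abstractmethod
    def send_message(self, messages: list[dict], model: str) -> str: ...

class OpenAIAdapter(ChatAdapter):
    def send_message(self, messages, model):
        # Would call the OpenAI SDK here; stubbed for illustration.
        return f"[openai:{model}] {messages[-1]['content']}"

class OllamaAdapter(ChatAdapter):
    def send_message(self, messages, model):
        # Would POST to the local Ollama server here; stubbed for illustration.
        return f"[ollama:{model}] {messages[-1]['content']}"

# The router picks the adapter at runtime from the user's configured model name.
ADAPTERS = {"gpt-4o": OpenAIAdapter(), "llama3.1": OllamaAdapter()}

def chat(model: str, prompt: str) -> str:
    adapter = ADAPTERS[model]
    return adapter.send_message([{"role": "user", "content": prompt}], model)
```

Because every adapter satisfies the same interface, swapping providers is a one-line configuration change, never an application-code change.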
Document Ingestion Pipeline
Khoj reads your knowledge base and indexes it into a vector store for semantic retrieval:
- PDF → pypdf parser
- Markdown / Org-mode → plain-text extraction
- Notion → official API integration
- Word → Office XML parser
- Images → vision-LLM description
Everything lands in an embedding vector index (sentence-transformers). When you ask a question, Khoj performs semantic similarity search over your corpus, retrieves the top-k relevant chunks, and passes them as context to the LLM — classic RAG, but deeply integrated.
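The retrieval step can be sketched in a few lines; toy 3-dimensional vectors stand in here for real sentence-transformers embeddings, which have hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs, as a vector index would store."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Toy index of embedded document chunks.
index = [
    ("notes on transformers", [0.9, 0.1, 0.0]),
    ("grocery list",          [0.0, 0.1, 0.9]),
    ("attention mechanisms",  [0.8, 0.2, 0.1]),
]

# Embed the question, pull the k most similar chunks, and prepend
# them to the LLM prompt — the "retrieval" in RAG.
context = top_k([1.0, 0.0, 0.0], index, k=2)
```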
Part 3: The Diagnosis — What It Does for Developers
Use Case 1: Personal Research Assistant
Load your entire research library, say 300 PDFs, 1,000 Markdown notes, and every Notion page, and chat with it.
Ask: “Which of my papers mentions transformer-based architectures for time-series forecasting?” Khoj retrieves the relevant sections, cites them, and synthesizes a coherent answer.
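A minimal client for that workflow might look like this, assuming the self-hosted server’s default port and a /api/chat endpoint that takes the query as a q parameter; the route and response field names are assumptions to verify against the API docs:

```python
import json
import urllib.parse
import urllib.request

KHOJ_URL = "http://localhost:42110"  # default self-hosted port

def chat_url(question: str) -> str:
    # Route and parameter name assumed; check docs.khoj.dev for the current API.
    return f"{KHOJ_URL}/api/chat?{urllib.parse.urlencode({'q': question})}"

def ask(question: str) -> str:
    with urllib.request.urlopen(chat_url(question)) as resp:
        return json.load(resp)["response"]  # response field name assumed

# Usage (requires a running Khoj server):
#   ask("Which of my papers mentions transformer-based time-series forecasting?")
```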
Use Case 2: Custom AI Agents
Khoj’s agent system lets you create specialized AI personas, each with its own knowledge base, LLM, system prompt, and tools.
Each agent gets its own chat endpoint. You could have a “Research Analyst” agent reading academic PDFs and a “Marketing Copywriter” agent reading brand guidelines — both running on the same Khoj server.
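A sketch of the request body such an API might accept; the field names are assumptions, and the actual schema lives in routers/api_agents.py:

```python
import json

def agent_payload(name, persona, chat_model, files):
    """Illustrative request body for an agents API (field names assumed)."""
    return {
        "name": name,
        "persona": persona,        # system prompt for this agent
        "chat_model": chat_model,  # which configured LLM it uses
        "files": files,            # knowledge-base files scoped to it
    }

body = agent_payload(
    name="Research Analyst",
    persona="You are a careful analyst. Cite sources from the indexed papers.",
    chat_model="gpt-4o",
    files=["papers/attention.pdf"],
)
print(json.dumps(body, indent=2))  # POST this JSON to the agents endpoint
```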
Use Case 3: Autonomous Research (Scheduled Jobs)
Khoj can act as a proactive assistant:
- Set up a daily automated research task: “Every morning, search for news about AI safety and send me a summary newsletter”
- It browses the web, synthesizes information, and delivers it to your configured channel (email, webhook, etc.)
Use Case 4: Local-First Privacy
For developers who refuse to send data to third-party clouds, Khoj supports a fully local stack: a self-hosted server paired with models served from Ollama.
Your documents stay on your disk. Your conversations are processed locally. Zero data leaves your machine.
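A hedged sketch of that local stack; the ollama commands are standard, while starting Khoj itself is covered in the setup options below:

```shell
# Pull a local model with Ollama (model tags change over time)
ollama pull llama3.1

# Start your self-hosted Khoj server (Docker or pip, see below),
# then point its chat-model settings at Ollama's local API:
#   http://localhost:11434
```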
Supported LLMs at a Glance
| Type | Provider | Example Models |
|---|---|---|
| Cloud | OpenAI | GPT-4o, o3-mini |
| Cloud | Anthropic | Claude 3.7 Sonnet |
| Cloud | Google | Gemini 1.5 Pro, Flash |
| Cloud | Cohere | Command R |
| Cloud | Mistral AI | Mistral Large |
| Local | Ollama | Llama 3.1, Qwen, Gemma, DeepSeek |
Part 4: The Resolution — How to Get Started
Option A: Cloud (Zero Setup)
The fastest path — just go to app.khoj.dev and create a free account. No installation needed.
Option B: Self-Host with Docker (Recommended)
Download the docker-compose.yml from the khoj-ai/khoj repository and bring the stack up with Docker Compose.
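One plausible sequence, assuming the compose file still lives at the repository root (verify against docs.khoj.dev):

```shell
mkdir -p ~/.khoj && cd ~/.khoj
wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml
docker compose up
```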
Open http://localhost:42110 and you’re in.
Option C: Self-Host with pip (Python Developers)
Install the khoj package from PyPI and start the server.
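A sketch of the install, assuming the current PyPI package name khoj (older releases shipped as khoj-assistant; check docs.khoj.dev for current flags):

```shell
pip install khoj
khoj   # starts the server on http://localhost:42110
```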
For GPU acceleration, install a CUDA-enabled build of PyTorch first so that embedding generation runs on your GPU.
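One common approach, shown with the cu121 wheel index as an example; match the index to your installed CUDA version:

```shell
pip install torch --index-url https://download.pytorch.org/whl/cu121
```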
Add Your Knowledge Base
After starting Khoj:
- Web App: Go to Settings → Files → drag-and-drop your PDFs, Markdown files, or connect Notion
- Obsidian Plugin: Install the Khoj plugin → it indexes your vault automatically
- CLI / API sync: push files to the server’s indexing API from your own scripts (see docs.khoj.dev for the current endpoint)
Connect Your Preferred LLM
In Settings → Chat Models:
- Add your OpenAI key for GPT-4o
- Add your Anthropic key for Claude
- Point to http://localhost:11434 for Ollama local models
Khoj will route all conversations through whichever model you designate as default.
Final Mental Model
Your documents + any LLM (cloud or local) + an always-on server = an AI that already knows what you know. Where other tools start every conversation empty, Khoj starts with your context loaded.
GitHub: khoj-ai/khoj
Docs: docs.khoj.dev
Live App: app.khoj.dev
