OpenKB — Open LLM Knowledge Base
Scale to long documents • Reasoning-based retrieval • Native multi-modality • No Vector DB
📑 What is OpenKB
OpenKB (Open Knowledge Base) is an open-source command-line tool that uses LLMs to compile raw documents into a structured, interlinked, wiki-style knowledge base. It is powered by PageIndex for vectorless retrieval over long documents.
The idea is based on a concept described by Andrej Karpathy: LLMs generate summaries, concept pages, and cross-references, all maintained automatically. Knowledge compounds over time instead of being re-derived on every query.
Why not traditional RAG?
Traditional RAG rediscovers knowledge from scratch on every query. Nothing accumulates. OpenKB compiles knowledge once into a persistent wiki, then keeps it current. Cross-references already exist. Contradictions are flagged. Synthesis reflects everything consumed.
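To make the "compile once, then keep current" idea concrete, here is a minimal sketch. It is not OpenKB's implementation: `WikiPage`, `ToyWiki`, `ingest`, and `lookup` are hypothetical names, and taking the first sentence stands in for the LLM summarization step. The point is only that summaries and cross-references are built at ingest time and stored, so a later lookup reads them instead of recomputing them.

```python
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    title: str
    summary: str
    links: set[str] = field(default_factory=set)  # titles of related pages


class ToyWiki:
    """Hypothetical store: pages are built when a document is ingested."""

    def __init__(self) -> None:
        self.pages: dict[str, WikiPage] = {}
        self._sources: dict[str, str] = {}  # raw text, kept for later cross-linking

    def ingest(self, title: str, text: str) -> None:
        # Stand-in for an LLM summary: keep the first sentence only.
        summary = text.split(".")[0].strip() + "."
        page = WikiPage(title, summary)
        # Link two pages when either one mentions the other's title.
        for other_title, other_text in self._sources.items():
            if other_title.lower() in text.lower() or title.lower() in other_text.lower():
                page.links.add(other_title)
                self.pages[other_title].links.add(title)
        self.pages[title] = page
        self._sources[title] = text

    def lookup(self, title: str) -> WikiPage:
        # No recomputation here: the summary and links already exist.
        return self.pages[title]


wiki = ToyWiki()
wiki.ingest("PageIndex", "PageIndex builds a tree index over long documents. It avoids vectors.")
wiki.ingest("OpenKB", "OpenKB compiles documents into a wiki. It relies on PageIndex for retrieval.")
print(wiki.lookup("PageIndex").links)  # the older page was updated when OpenKB arrived
```

Note that ingesting the second document also updates the first page's links, which is the "knowledge compounds" behavior in miniature.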
Features
- Broad format support — PDF, Word, Markdown, PowerPoint, HTML, Excel, text, and more via markitdown
- Scale to long documents — Long and complex documents are handled via PageIndex tree indexing, enabling accurate, vectorless long-context retrieval
- Native multi-modality — Retrieves and understands figures, tables, and images, not just text
- Compiled Wiki — LLM manages and compiles your documents into summaries, concept pages, and cross-links, all kept in sync
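As a rough illustration of what tree-based, vectorless retrieval can look like, the sketch below walks a table-of-contents-style tree from the root to the most relevant section. It does not use the PageIndex API: `Node`, `score`, and `retrieve` are hypothetical names, and the word-matching `score` function is only a placeholder for the LLM reasoning step that would choose which branch to follow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    pages: tuple[int, int]  # inclusive page range covered by this section
    children: list["Node"] = field(default_factory=list)


def score(query: str, node: Node) -> int:
    # Placeholder for LLM reasoning: count query words found in the title or summary.
    haystack = f"{node.title} {node.summary}".lower()
    return sum(word in haystack for word in query.lower().split())


def retrieve(query: str, root: Node) -> Node:
    # Walk down from the root, following the best-matching child at each level.
    node = root
    while node.children:
        best = max(node.children, key=lambda child: score(query, child))
        if score(query, best) == 0:
            break  # no child looks relevant, so stop at the current section
        node = best
    return node


report = Node("Annual Report", "Company results for the year", (1, 120), [
    Node("Financials", "Revenue, costs, and cash flow statements", (10, 60), [
        Node("Revenue", "Revenue by segment and region", (12, 25)),
        Node("Cash Flow", "Operating and investing cash flow", (40, 60)),
    ]),
    Node("Risk Factors", "Market, legal, and operational risk", (61, 90)),
])

hit = retrieve("revenue by region", report)
print(hit.title, hit.pages)
```

No embeddings or similarity index are involved: the search narrows by structure, and the result is a section with a page range that can then be read in full.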