PaddlePaddle
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
tesseract-ocr
Tesseract Open Source OCR Engine (main repository)
paperless-ngx
A community-supported, supercharged document management system: scan, index, and archive all your documents
naptha
Pure JavaScript OCR for more than 100 languages 📖🎉🖥
yusufkaraaslan
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
dataelement
BISHENG is an open LLM DevOps platform for next-generation enterprise AI applications. Its features include GenAI workflow, RAG, Agent, unified model management, evaluation, SFT, dataset management, enterprise-level system management, observability, and more.
CVHub520
Effortless data labeling with AI support from Segment Anything and other awesome models.
ballerine-io
Open-source infrastructure and data orchestration platform for risk decisioning
eclaire-labs
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.