PaddlePaddle
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
tesseract-ocr
Tesseract Open Source OCR Engine (main repository)
siyuan-note
Privacy-first, self-hosted, fully open-source personal knowledge management software, written in TypeScript and Go.
paperless-ngx
A community-supported, supercharged document management system: scan, index, and archive all your documents.
yusufkaraaslan
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
CVHub520
Effortless data labeling with AI support from Segment Anything and other awesome models.
eclaire-labs
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.