firecrawl
🔥 The Web Data API for AI - Power AI agents with clean web data
ScrapeGraphAI
Python scraper based on AI
apify
Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
spider-rs
Web crawler and scraper for Rust
microlinkhq
The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.
JustinBeckwith
Broken link checker that crawls websites and validates links. Find broken links, dead links, and invalid URLs in websites, documentation, and local files. Perfect for SEO audits and CI/CD.
0xMassi
Fast, local-first web content extraction for LLMs. Scrape, crawl, and extract structured data, all from Rust. CLI, REST API, and MCP server.
VIDA-NYU
ACHE is a web crawler for domain-specific search.
crwlrsoft
Library for rapid (web) crawler and scraper development
s0rg
The Unix-way web crawler