web-crawler

12 repos


firecrawl

🔥 The Web Data API for AI - Power AI agents with clean web data

Featured

ai, ai-agents, ai-crawler

TypeScript · 142.3K · 3.2K · 8.2K · 3h ago

Scrapegraph-ai

ScrapeGraphAI

Python scraper based on AI

Featured

ai-crawler, ai-scraping, ai-search

Python · 27.9K

crawlee

apify

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Featured

apify, automation, crawler

spider

spider-rs

Web crawler and scraper for Rust

ai-agent, automation, crawler

Rust · 2.6K · 122176d ago

browserless

microlinkhq

The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.

automation, browser-automation, chromium

JavaScript · 1.8K · 3911h ago

webclaw

0xMassi

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

ai, ai-agents, ai-scraping

Rust · 1.6K · 781771d ago

linkinator

JustinBeckwith

Broken link checker that crawls websites and validates links. Find broken links, dead links, and invalid URLs in websites, documentation, and local files. Perfect for SEO audits and CI/CD.

404, broken-link-checker, broken-links

TypeScript · 1.2K · 10110d ago

LibreCrawl

PhialsBasement

Free desktop SEO crawler - open source alternative to Screaming Frog and similar tools. Crawl websites, analyze links, extract SEO data, and export results without subscription fees. Fully customizable and extensible!

desktop-app, flask, free

Python73061521mo ago