firecrawl / firecrawl

The API to search, scrape, and interact with the web at scale. 🔥

markdown crawler scraper ai html-to-markdown web-crawler scraping web-scraper web-scraping data-extraction webscraping web-data-extraction ai-agents web-search ai-search web-data llm ai-crawler ai-scraping

Updated Jun 25, 2026
TypeScript

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

Updated Jun 23, 2026
Python

oxylabs / oxylabs-ai-studio-py

Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio python SDK for intelligent web data gathering.

web-scraping ai-search proxy-scraper python-ai ai-tools web-scraping-python ai-crawler web-scraping-api ai-web-scraper ai-scraping ai-scraper web-scraping-ai

Updated Dec 4, 2025
Python

oxylabs / ai-crawler-py

Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural language prompt.

ai web-crawler ai-agents ai-crawler ai-studio ai-scraping ai-web-crawler crawl-agent

Updated Apr 2, 2026

vakra-dev / reader

Open source web infrastructure for AI. Scrape, crawl, and automate the web, with clean markdown and browser sessions ready for your agents.

Updated Jun 22, 2026
TypeScript

BrowserCash / teracrawl

High-performance web crawler API optimized for LLMs. Turn any search or website into clean Markdown using remote browsers. A Firecrawl alternative.

html-to-markdown web-scraper web-scraping data-extraction browser-automation ai-agents web-search google-serp ai-search serpapi browser-agent ai-crawler antibot-bypass ai-scraping crawl4ai firecrawl-mcp firecrawl-api firecrawl-alternative

Updated Dec 3, 2025
TypeScript

oxylabs / oxylabs-ai-studio-js

Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio JS SDK for intelligent web data gathering.

web-scraping ai-search proxy-scraper python-ai aistudio ai-tools web-scraping-python ai-crawler ai-studio ai-web-scraper ai-scraping ai-scraper web-scraping-ai ai-map

Updated Dec 8, 2025
TypeScript

lennyerik / crawl4ai-proxy

A simple proxy server to integrate crawl4ai with OpenWebUI

adapter crawler machine-learning ai proxy web-crawler inference gpt llm ai-crawler llm-ui openwebui llm-crawler crawl4ai

Updated Nov 20, 2025
Go

agentmarkup / agentmarkup

Build-time llms.txt, JSON-LD, markdown mirrors, AI crawler controls, and validation for Vite, Astro, and Next.js websites.

seo schema-org astro robots-txt json-ld structured-data machine-readable vite-plugin ai-crawler llms-txt

Updated Jun 23, 2026
TypeScript

kr3t3n / smolagents-video-script-generator

A sophisticated system that uses multiple AI agents to research, create, and polish video scripts for social media platforms. The system employs specialized agents for research, script writing, polishing, and evaluation to ensure high-quality, engaging content.

ai-agents ai-crawler ai-writer smolagents

Updated Jan 2, 2025
Python

spidra-io / spidra-js

The official Node.js SDK for Spidra.

markdown crawler ai scraping web-scraper web-scraping data-extraction webscraping web-data-extraction scraping-api web-data llm ai-crawler ai-scraping

Updated Apr 19, 2026
TypeScript

maango-io / ai-policy.json

Machine-readable AI permissions for websites. A consolidated spec at /.well-known/ai-policy.json for declaring how AI agents may train on, search, or use your content.

json-schema spec web-standards robots-txt ai-agents well-known ai-policy ai-crawler gptbot content-usage ai-permissions maango

Updated Apr 18, 2026

JerryZhi / AI-Crawler-Detector

Tool for Fast Detection of Website/Server AI Crawler Blocking Policies (Not robots.txt)

seo geo robots ai-crawler ai-seo-tool

Updated Aug 20, 2025
Python

crawlcore / scp-protocol

A collection-based format for serving clean, structured web content to AI training systems and search engines through pre-generated collections.

web-standards specification internet-draft data-format ai-training ai-crawler web-protocol ai-crawling ietf-draft

Updated Jan 4, 2026
Python

motiv8-team / go-ua-parser

High-performance, zero-allocation HTTP User-Agent parser for Go — browser, OS, device, bot & AI crawler detection with Client Hints support

go golang user-agent user-agent-parser prometheus browser-detection http-middleware client-hints bot-detection opentelemetry ai-crawler

Updated May 8, 2026
Go

pixeldownward / ai-social-media-agent

🤖 Generate high-quality social media posts effortlessly with this AI agent that researches, drafts, critiques, and finalizes content for you.

python sales social-media mcp cryptocurrency learn web3 1password multi-agent-systems claude solana supabase ai-crawler ai-writer openai-integration

Updated Jun 25, 2026

kr3t3n / documentation-crawler

A powerful tool that crawls documentation websites and generates a clean, well-formatted markdown document. Built with FastAPI and support for multiple LLM providers (DeepSeek and Groq).

documentation crawler ai crawl crawler-python ai-crawler ai-documentation

Updated Jan 1, 2025
Python

breadrock1 / news-rss

A simple Rust project that scrapes and collects news using RSS feeds and an LLM API.

rss crawler scraper rss-feed html2text llm ai-crawler

Updated May 15, 2025
Rust

TryGeoSuite / ai-crawler-bots

See which AI crawlers can read your site — GPTBot, ClaudeBot, PerplexityBot & 20 more. Curated, operator-sourced bot list + a zero-dep CLI and GitHub Action to audit robots.txt, test reachability, read access logs, and gate it all in CI.

nodejs cli automation ai user-agent seo web-crawler robots-txt llm ai-crawler gptbot generative-engine-optimization perplexitybot claudebot

Updated Jun 24, 2026
JavaScript

Everlast-Consulting-GmbH / launchgrade-web

Stack-agnostic web template + 2026 standards (BFSG/GDPR/CWV/CSP/AI-crawler) as three Claude Code skills: setup → design → audit.

template csp accessibility seo web-standards wcag gdpr lighthouse core-web-vitals ai-crawler bfsg claude-code

Updated Jun 22, 2026
HTML

ai-crawler

Here are 42 public repositories matching this topic...
