GitHub - webpeel/webpeel: Web fetcher for AI agents. Smart escalation from HTTP to headless browser. MCP server included.

Web intelligence for AI agents.
Fetch any URL → clean markdown. YouTube transcripts. Reddit threads. Quick answers. No API keys needed.

Website · Docs · Playground · Dashboard · Discussions

WebPeel gives AI agents reliable web access in one call. It handles JavaScript rendering, bot detection, and content extraction automatically — your agent gets clean, structured data. 18 MCP tools, 1,098 tests, 100% open source.

🚀 Quick Start

npx webpeel "https://example.com"

More examples:

# YouTube transcript — no API key!
npx webpeel "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Ask any page a question — no LLM key!
npx webpeel "https://openai.com/pricing" -q "how much does GPT-4 cost?"

# Reddit thread — structured JSON
npx webpeel "https://reddit.com/r/programming/comments/..." --json

# Reader mode — strips all noise
npx webpeel "https://nytimes.com/article" --readable

No install needed. First 25 fetches work without signup. Get 500/week free →

MCP Server (for Claude, Cursor, VS Code, Windsurf)

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"]
    }
  }
}

REST API

curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer wp_YOUR_KEY"
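The same request can be built from any HTTP client. Here is a minimal Python sketch using only the standard library. The endpoint and Bearer header come from the curl command above; the helper names `build_fetch_request` and `fetch_page`, and the choice to percent-encode the target URL, are assumptions made for illustration. The response format is not described in this section, so the sketch returns the raw body.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.webpeel.dev/v1/fetch"  # endpoint from the curl example


def build_fetch_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request equivalent to the curl call."""
    # Percent-encode the target so its own "?" and "&" stay inside the url parameter.
    query = urllib.parse.urlencode({"url": target_url})
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_page(target_url: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    req = build_fetch_request(target_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request needs no network access or real key.
req = build_fetch_request("https://example.com", "wp_YOUR_KEY")
print(req.full_url)
```

Calling `fetch_page` requires network access and a valid key.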

✨ What can it do?

	Feature	What you get
🌐	Fetch	Any URL → clean markdown, text, or JSON. Auto-handles JS rendering, bot detection, CAPTCHAs
🎬	YouTube	Full video transcripts with timestamps. No API key
🐦	Twitter/Reddit/GitHub/HN	Structured data from social platforms via native APIs
❓	Quick Answer	Ask a question about any page. BM25 scoring, no LLM key
📖	Reader Mode	Browser Reader Mode for AI — strips nav, ads, cookies, 25+ noise patterns
🔍	Search	Web search across 27+ sites. Deep research with multi-hop analysis
📊	Extract	Pricing pages, products, contacts → structured JSON. CSS/JSON Schema/LLM extraction
🕵️	Stealth	Bypasses Cloudflare, PerimeterX, DataDome, Akamai. 28 auto-stealth domains
🏨	Hotels	Kayak + Booking + Google Travel + Expedia in parallel
🔄	Monitor	Watch URLs for changes, get webhook notifications
🕷️	Crawl	BFS/DFS site crawling, sitemap discovery, robots.txt compliance
📸	Screenshot	Full-page or viewport screenshots
🐍	Python SDK	`pip install webpeel` — sync + async client
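To make the Crawl row concrete: breadth-first crawling visits pages level by level, which is what gives a depth limit its meaning. The sketch below is a generic illustration in Python, not WebPeel's crawler. `get_links` is a hypothetical callback standing in for "fetch the page and extract its links", and the toy `SITE` graph replaces real network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def bfs_crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 100,
) -> List[str]:
    """Return URLs in breadth-first discovery order, honoring depth and page limits."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link in seen:
                continue  # skip pages that were already discovered
            if len(visited) >= max_pages:
                return visited
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited


# Toy link graph (made-up paths) used instead of real HTTP requests.
SITE: Dict[str, List[str]] = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/blog/post-1"],
    "/docs/api": ["/docs/api/v2"],
}

order = bfs_crawl("/", lambda u: SITE.get(u, []), max_depth=2)
print(order)
```

With `max_depth=2`, `/docs/api/v2` is never reached because it sits three links away from the start page.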

🏆 How does it compare?

Feature	WebPeel	Firecrawl	Crawl4AI	Jina Reader
YouTube transcripts	✅	❌	❌	❌
LLM-free Q&A	✅	❌	❌	❌
Reader mode	✅	❌	❌	❌
Domain extractors (Twitter, Reddit, GH, HN)	✅	❌	❌	❌
Auto-extract (pricing, products)	✅	❌	❌	❌
URL monitoring	✅	❌	❌	❌
Stealth / anti-bot	✅	⚡ Hosted only	✅	❌
MCP server	✅ 18 tools	✅ 4 tools	❌	❌
Deep research	✅	❌	❌	❌
Hotel search	✅	❌	❌	❌
Self-hostable	✅	✅	✅	❌
Free tier	500/week	500 credits	Unlimited	Unlimited
Open source	AGPL-3.0	AGPL-3.0	Apache-2.0	N/A

⚡ Benchmark

Evaluated on 30 real-world URLs across 6 categories (static, dynamic, SPA, protected, documents, international):

	WebPeel	Next best
Success rate	100% (30/30)	93.3%
Content quality	92.3%	83.2%

WebPeel is the only tool that extracted content from all 30 test URLs. Full methodology →

🤖 MCP Integration

WebPeel exposes 18 tools to your AI coding assistant:

Tool	What it does
`webpeel_fetch`	Fetch any URL → markdown. Smart escalation built in. Supports `readable: true` for reader mode
`webpeel_search`	Web search with structured results across 27+ sources
`webpeel_batch`	Fetch multiple URLs concurrently
`webpeel_crawl`	Crawl a site with depth/page limits
`webpeel_map`	Discover all URLs on a domain
`webpeel_extract`	Structured extraction (CSS, JSON Schema, or LLM)
`webpeel_screenshot`	Screenshot any page (full-page or viewport)
`webpeel_research`	Deep multi-hop research on a topic
`webpeel_summarize`	AI summary of any URL
`webpeel_answer`	Ask a question about a URL's content
`webpeel_change_track`	Detect changes between two fetches
`webpeel_brand`	Extract branding assets from a site
`webpeel_deep_fetch`	Search + batch fetch + merge — comprehensive research, no LLM key
`webpeel_youtube`	Extract YouTube video transcripts — all URL formats, no API key
`webpeel_auto_extract`	Heuristic structured data extraction — auto-detects pricing, products, contacts
`webpeel_quick_answer`	BM25-powered Q&A — ask any question about any page, no LLM key
`webpeel_watch`	Persistent URL change monitoring with webhook notifications
`webpeel_hotels`	Hotel search across Kayak, Booking.com, Google Travel, Expedia in parallel

Setup for Claude Desktop, Cursor, VS Code, Windsurf, Docker

Claude Desktop (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Cursor (Settings → MCP Servers):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

VS Code (.vscode/mcp.json in your workspace):

{
  "servers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Windsurf (~/.codeium/windsurf/mcp_config.json):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Docker (stdio):

{
  "mcpServers": {
    "webpeel": { "command": "docker", "args": ["run", "-i", "--rm", "webpeel/mcp"] }
  }
}

Hosted endpoint (no local server needed):

{
  "mcpServers": {
    "webpeel": {
      "url": "https://api.webpeel.dev/mcp",
      "headers": { "Authorization": "Bearer YOUR_API_KEY" }
    }
  }
}
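When a client already has other MCP servers configured, the WebPeel entry should be merged into the existing file rather than pasted over it. The Python sketch below shows one way to do that. The helper `add_webpeel_server` is hypothetical, and the top-level key is a parameter because the VS Code example above uses `servers` while the other clients use `mcpServers`.

```python
import copy
import json
from typing import Any, Dict

# Entry taken from the local (npx) configuration shown above.
WEBPEEL_ENTRY = {"command": "npx", "args": ["-y", "webpeel", "mcp"]}


def add_webpeel_server(config: Dict[str, Any], key: str = "mcpServers") -> Dict[str, Any]:
    """Return a copy of `config` with the webpeel entry added under `key`."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault(key, {})["webpeel"] = copy.deepcopy(WEBPEEL_ENTRY)
    return merged


# Example: a config that already lists another (made-up) server.
existing = {"mcpServers": {"other-tool": {"command": "other", "args": []}}}
print(json.dumps(add_webpeel_server(existing), indent=2))
```

To apply it to a real file, load the JSON with `json.load`, call the helper, and write the result back with `json.dump`.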

🔬 Deep Research

Multi-hop research that thinks like a researcher, not a search engine:

# Sources only — no API key needed
npx webpeel research "best practices for rate limiting APIs" --max-sources 8

# Full synthesis with an LLM (BYOK: bring your own key)
npx webpeel research "compare Firecrawl vs Crawl4AI vs WebPeel" --llm-key sk-...

Search → fetch top results → extract key passages (BM25) → follow the most relevant links → synthesize. No circular references, no duplicate content.
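The passage-extraction step relies on BM25, a keyword-based ranking function that needs no LLM. The sketch below is a textbook Okapi BM25 scorer in Python, meant to show the idea rather than WebPeel's actual code. The tokenizer, the `k1` and `b` defaults, the smoothed IDF variant, and the sample passages are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    query: str, passages: List[str], k1: float = 1.5, b: float = 0.75
) -> List[Tuple[float, str]]:
    """Score each passage against the query; return (score, passage), best first."""
    if not passages:
        return []
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for passage, doc in zip(passages, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF (always non-negative): rare terms weigh more.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, passage))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


passages = [
    "Token bucket rate limiting allows short bursts while capping the average request rate.",
    "Our team went hiking last weekend and the weather was great.",
    "Return HTTP 429 with a Retry-After header when a client exceeds its rate limit.",
]
for score, passage in bm25_rank("rate limiting bursts", passages):
    print(f"{score:.3f}  {passage}")
```

The passage with no query terms scores zero and sorts last, so a pipeline can drop it before following links or synthesizing.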

📦 Extraction

CSS Schema, JSON Schema, and LLM extraction

CSS Schema (zero config, auto-detected)

# Auto-detects Amazon and applies the built-in schema
npx webpeel "https://www.amazon.com/s?k=mechanical+keyboard" --json

# Force a specific schema
npx webpeel "https://www.booking.com/searchresults.html?city=Paris" --schema booking --json

# List all built-in schemas
npx webpeel --list-schemas

Built-in schemas: amazon · booking · ebay · expedia · hackernews · walmart · yelp

JSON Schema (type-safe structured extraction)

npx webpeel "https://example.com/product" \
  --extract-schema '{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}}}' \
  --llm-key sk-...

LLM Extraction (natural language, BYOK)

npx webpeel "https://hn.algolia.com" \
  --llm-extract "top 10 posts with title, score, and comment count" \
  --llm-key "$OPENAI_API_KEY" \
  --json

Note: WebPeel is an ESM-only package, so CommonJS require() is not supported. Use import syntax (import { peel } from 'webpeel';) or, in a CommonJS project, a dynamic import inside an async function: const { peel } = await import('webpeel');

import { peel } from 'webpeel';

// CSS selector extraction
const result = await peel('https://news.ycombinator.com', {
  extract: { selectors: { titles: '.titleline > a', scores: '.score' } }
});

// LLM extraction from a natural-language field list (BYOK)
const product = await peel('https://example.com/product', {
  llmExtract: 'title, price, rating, availability',
  llmKey: process.env.OPENAI_API_KEY,
});

🛡️ Stealth & Anti-Bot

Supported bot-protection vendors and auto-stealth domains

WebPeel detects 7 bot-protection vendors automatically:

Cloudflare (JS challenge, Turnstile, Bot Management)
PerimeterX / HUMAN (behavioral analysis)
DataDome (ML-based bot detection)
Akamai Bot Manager
Distil Networks
reCAPTCHA / hCaptcha
Generic challenge pages

28 high-protection domains (Amazon, LinkedIn, Glassdoor, Zillow, Ticketmaster, and more) automatically route through stealth mode — no flags needed.

# Explicitly enable stealth
npx webpeel "https://glassdoor.com/jobs" --stealth

# Auto-escalation (stealth triggers automatically on challenge detection)
npx webpeel "https://amazon.com/dp/ASIN"
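The auto-escalation idea can be sketched as a loop over progressively heavier fetch tiers that stops at the first response that does not look like a challenge page. Everything in the Python sketch below is illustrative: the marker strings, the status codes treated as blocks, and the stub fetchers are assumptions, not WebPeel's detection rules.

```python
from typing import Callable, List, Tuple

# Illustrative markers only; real challenge detection is vendor-specific.
CHALLENGE_MARKERS = ("just a moment", "verify you are human", "captcha")

Fetcher = Callable[[str], Tuple[int, str]]  # returns (status code, body)


def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: a blocking status code or a known challenge phrase in the body."""
    lowered = body.lower()
    return status in (403, 429, 503) or any(m in lowered for m in CHALLENGE_MARKERS)


def fetch_with_escalation(url: str, tiers: List[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each tier in order; return (tier name, body) for the first unblocked result."""
    for name, fetcher in tiers:
        status, body = fetcher(url)
        if not looks_blocked(status, body):
            return name, body
    raise RuntimeError(f"all tiers blocked for {url}")


# Stub fetchers standing in for plain HTTP, a headless browser, and a stealth browser.
def plain_http(url: str) -> Tuple[int, str]:
    return 403, "Just a moment..."


def headless(url: str) -> Tuple[int, str]:
    return 200, "<html>Verify you are human</html>"


def stealth(url: str) -> Tuple[int, str]:
    return 200, "<html><h1>Product page</h1></html>"


tiers = [("http", plain_http), ("browser", headless), ("stealth", stealth)]
tier_used, body = fetch_with_escalation("https://example.com/item", tiers)
print(tier_used)  # prints "stealth"
```

Ordering the tiers from cheapest to most expensive means unprotected pages never pay the cost of launching a browser.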

🏨 Hotel Search

Multi-source hotel search

Search Kayak, Booking.com, Google Travel, and Expedia in parallel — returns unified results in one call.

npx webpeel hotels "Paris" --check-in 2025-06-01 --check-out 2025-06-07 --guests 2 --json

Available as webpeel_hotels MCP tool and via the REST API.

💳 Pricing

Plan	Price	Weekly Fetches	Burst
Free	$0/mo	500/wk	50/hr
Pro	$9/mo	1,250/wk	100/hr
Max	$29/mo	6,250/wk	500/hr

All features on all plans. Pro/Max add pay-as-you-go extra usage. Quota resets every Monday.
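For client-side budgeting it can help to know when the weekly quota next resets. The Python sketch below computes the next Monday at 00:00 UTC. The table above only says the quota resets every Monday, so the UTC midnight reset time and the helper name `next_monday_reset` are assumptions.

```python
from datetime import datetime, timedelta, timezone


def next_monday_reset(now: datetime) -> datetime:
    """Return the next Monday 00:00 UTC strictly after `now` (assumed reset time).

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    # Monday is weekday 0; on a Monday, look ahead a full week.
    days_ahead = (7 - now.weekday()) % 7 or 7
    reset_day = (now + timedelta(days=days_ahead)).date()
    return datetime(reset_day.year, reset_day.month, reset_day.day, tzinfo=timezone.utc)


wednesday = datetime(2025, 6, 4, 15, 30, tzinfo=timezone.utc)
print(next_monday_reset(wednesday).isoformat())  # 2025-06-09T00:00:00+00:00
```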

Sign up free → · Compare with Firecrawl →

🐍 Python SDK

Python SDK usage

pip install webpeel

from webpeel import WebPeel

client = WebPeel(api_key="wp_...")  # or WEBPEEL_API_KEY env var

result = client.scrape("https://example.com")
print(result.content)    # Clean markdown
print(result.metadata)   # title, description, author, ...

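# Web search
results = client.search("latest AI research papers")
# Crawl a site with a page limit
job = client.crawl("https://docs.example.com", limit=100)
# JavaScript rendering plus stealth mode for protected sites
result = client.scrape("https://protected-site.com", render=True, stealth=True)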

Sync and async clients. Pure Python 3.8+, zero dependencies. Full SDK docs →

🐳 Self-Hosting

git clone https://github.com/webpeel/webpeel.git
cd webpeel && docker compose up

Full REST API at http://localhost:3000. AGPL-3.0 licensed. Self-hosting guide →

docker run -i webpeel/mcp          # MCP server only
docker run -p 3000:3000 webpeel/api  # API server only

🤝 Contributing

git clone https://github.com/webpeel/webpeel.git
cd webpeel
npm install && npm run build
npm test

Bug reports: Open an issue
Feature requests: Start a discussion
Code: See CONTRIBUTING.md for guidelines

The project has 1,098 tests. Please add tests for new features.

License

AGPL-3.0 — free to use, modify, and distribute. If you run a modified version as a network service, you must release your source under AGPL-3.0.

Need a commercial license? support@webpeel.dev

Versions 0.7.1 and earlier were released under MIT and remain MIT-licensed.

If WebPeel saves you time, ⭐ star the repo — it helps others find it.
