
Web fetcher for AI agents. Smart escalation from HTTP to headless browser. MCP server included.


WebPeel — Web fetching for AI agents


Web intelligence for AI agents.
Fetch any URL → clean markdown. YouTube transcripts. Reddit threads. Quick answers. No API keys needed.

Website · Docs · Playground · Dashboard · Discussions


WebPeel gives AI agents reliable web access in one call. It handles JavaScript rendering, bot detection, and content extraction automatically — your agent gets clean, structured data. 18 MCP tools, 1,098 tests, 100% open source.
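The automatic escalation described above works as a fallback chain. A cheap plain-HTTP fetch runs first, and a headless browser is used only when the response looks like an empty JavaScript shell or a challenge page. This is a hypothetical sketch, not WebPeel's code. `http_fetch`, `browser_fetch`, the challenge phrases, and the 200-character threshold are all assumptions.

```python
import re
from typing import Callable, Tuple

Fetcher = Callable[[str], str]  # url -> HTML (hypothetical fetcher signature)

# Assumed phrases that suggest a bot challenge rather than real content.
CHALLENGE_PHRASES = ("enable javascript", "checking your browser", "captcha")


def looks_incomplete(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: a challenge phrase, or too little visible text outside tags."""
    lowered = html.lower()
    if any(phrase in lowered for phrase in CHALLENGE_PHRASES):
        return True
    # Drop <script>/<style> bodies, then all remaining tags, and measure what is left.
    no_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    visible = re.sub(r"<[^>]+>", "", no_code)
    return len(visible.strip()) < min_text_chars


def fetch_with_escalation(
    url: str, http_fetch: Fetcher, browser_fetch: Fetcher
) -> Tuple[str, str]:
    """Return (method_used, html), escalating to the browser only when needed."""
    try:
        html = http_fetch(url)
        if not looks_incomplete(html):
            return "http", html
    except OSError:
        pass  # network errors also trigger escalation in this sketch
    return "browser", browser_fetch(url)
```

With this ordering the common case stays cheap, because static pages never launch a browser.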


🚀 Quick Start

npx webpeel "https://example.com"

More examples:

# YouTube transcript — no API key!
npx webpeel "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Ask any page a question — no LLM key!
npx webpeel "https://openai.com/pricing" -q "how much does GPT-4 cost?"

# Reddit thread — structured JSON
npx webpeel "https://reddit.com/r/programming/comments/..." --json

# Reader mode — strips all noise
npx webpeel "https://nytimes.com/article" --readable

No install needed. The first 25 fetches work without signing up, and a free account raises that to 500 fetches per week.

MCP Server (for Claude, Cursor, VS Code, Windsurf)

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"]
    }
  }
}


REST API

curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer wp_YOUR_KEY"
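The same call from Python needs only the standard library. This sketch builds the request without sending it. It assumes the endpoint accepts a percent-encoded `url` query parameter, since encoding protects target URLs that carry their own `?` or `&`. It assumes nothing about the response format.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.webpeel.dev/v1/fetch"  # endpoint from the curl example


def build_fetch_request(target_url: str, api_key: str) -> Request:
    """Build (without sending) the GET request that the curl command performs."""
    # Percent-encode the target so its own '?', '&', and '=' stay inside `url`.
    query = urlencode({"url": target_url})
    return Request(
        f"{API_BASE}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and inspect the body yourself.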

✨ What can it do?

| Feature | What you get |
|---|---|
| 🌐 Fetch | Any URL → clean markdown, text, or JSON. Auto-handles JS rendering, bot detection, CAPTCHAs |
| 🎬 YouTube | Full video transcripts with timestamps. No API key |
| 🐦 Twitter/Reddit/GitHub/HN | Structured data from social platforms via native APIs |
| Quick Answer | Ask a question about any page. BM25 scoring, no LLM key |
| 📖 Reader Mode | Browser Reader Mode for AI: strips nav, ads, cookies, 25+ noise patterns |
| 🔍 Search | Web search across 27+ sites. Deep research with multi-hop analysis |
| 📊 Extract | Pricing pages, products, contacts → structured JSON. CSS/JSON Schema/LLM extraction |
| 🕵️ Stealth | Bypasses Cloudflare, PerimeterX, DataDome, Akamai. 28 auto-stealth domains |
| 🏨 Hotels | Kayak + Booking + Google Travel + Expedia in parallel |
| 🔄 Monitor | Watch URLs for changes, get webhook notifications |
| 🕷️ Crawl | BFS/DFS site crawling, sitemap discovery, robots.txt compliance |
| 📸 Screenshot | Full-page or viewport screenshots |
| 🐍 Python SDK | pip install webpeel, sync + async client |
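The Quick Answer row mentions BM25 scoring. To show roughly how an LLM-free answer can work, the standard Okapi BM25 formula below ranks a page's passages against a question, and the best-scoring passage becomes the answer candidate. The tokenizer and the passage splitting are assumptions, and so are the common defaults `k1=1.5` and `b=0.75`. None of them are WebPeel's documented settings.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(question: str, passages: List[str], k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return the passages ordered by Okapi BM25 score for the question, best first."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty passages
    df = Counter(term for d in docs for term in set(d))  # passages containing each term
    query = tokenize(question)
    scored = []
    for passage, doc in zip(passages, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query:
            freq = tf[term]
            if freq == 0:
                continue
            # IDF variant with +1 inside the log, which keeps it positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scored.append((score, passage))
    # sorted() is stable, so ties keep their original page order.
    return [p for _, p in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```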

🏆 How does it compare?

Feature WebPeel Firecrawl Crawl4AI Jina Reader
YouTube transcripts
LLM-free Q&A
Reader mode
Domain extractors (Twitter, Reddit, GH, HN)
Auto-extract (pricing, products)
URL monitoring
Stealth / anti-bot ⚡ Hosted only
MCP server ✅ 18 tools ✅ 4 tools
Deep research
Hotel search
Self-hostable
Free tier 500/week 500 credits Unlimited Unlimited
Open source AGPL-3.0 AGPL-3.0 Apache-2.0 N/A

⚡ Benchmark

Evaluated on 30 real-world URLs across 6 categories (static, dynamic, SPA, protected, documents, international):

| Metric | WebPeel | Next best |
|---|---|---|
| Success rate | 100% (30/30) | 93.3% (28/30) |
| Content quality | 92.3% | 83.2% |

WebPeel was the only tool in the comparison that extracted content from all 30 test URLs.


🤖 MCP Integration

WebPeel exposes 18 tools to your AI coding assistant:

| Tool | What it does |
|---|---|
| webpeel_fetch | Fetch any URL → markdown. Smart escalation built in. Supports readable: true for reader mode |
| webpeel_search | Web search with structured results across 27+ sources |
| webpeel_batch | Fetch multiple URLs concurrently |
| webpeel_crawl | Crawl a site with depth/page limits |
| webpeel_map | Discover all URLs on a domain |
| webpeel_extract | Structured extraction (CSS, JSON Schema, or LLM) |
| webpeel_screenshot | Screenshot any page (full-page or viewport) |
| webpeel_research | Deep multi-hop research on a topic |
| webpeel_summarize | AI summary of any URL |
| webpeel_answer | Ask a question about a URL's content |
| webpeel_change_track | Detect changes between two fetches |
| webpeel_brand | Extract branding assets from a site |
| webpeel_deep_fetch | Search + batch fetch + merge: comprehensive research, no LLM key |
| webpeel_youtube | Extract YouTube video transcripts: all URL formats, no API key |
| webpeel_auto_extract | Heuristic structured data extraction: auto-detects pricing, products, contacts |
| webpeel_quick_answer | BM25-powered Q&A: ask any question about any page, no LLM key |
| webpeel_watch | Persistent URL change monitoring with webhook notifications |
| webpeel_hotels | Hotel search across Kayak, Booking.com, Google Travel, Expedia in parallel |
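webpeel_crawl takes depth and page limits, and the feature table lists BFS crawling with robots.txt compliance. A generic breadth-first crawl that applies both limits could look like the sketch below. `get_links` is a hypothetical callable that stands in for fetching a page and extracting its links. The robots.txt body is assumed to be downloaded already. The standard library's `urllib.robotparser` handles the Disallow matching.

```python
from collections import deque
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


def bfs_crawl(
    start_url: str,
    get_links: Callable[[str], List[str]],
    robots_txt: str,
    max_depth: int = 2,
    max_pages: int = 50,
    user_agent: str = "*",
) -> List[str]:
    """Breadth-first crawl of one host, honoring robots.txt and both limits."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        visited.append(url)
        if depth >= max_depth:
            continue  # page is kept, but its links are not followed
        for href in get_links(url):
            link = urldefrag(urljoin(url, href)).url  # resolve relative links, drop #fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Replacing `popleft()` with `pop()` turns the queue into a stack and gives a roughly depth-first order.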
Setup for Claude Desktop, Cursor, VS Code, Windsurf, and Docker:

Claude Desktop (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Cursor (Settings → MCP Servers):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

VS Code (.vscode/mcp.json in your workspace folder):

{
  "servers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Windsurf (~/.codeium/windsurf/mcp_config.json):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Docker (stdio):

{
  "mcpServers": {
    "webpeel": { "command": "docker", "args": ["run", "-i", "--rm", "webpeel/mcp"] }
  }
}

Hosted endpoint (no local server needed):

{
  "mcpServers": {
    "webpeel": {
      "url": "https://api.webpeel.dev/mcp",
      "headers": { "Authorization": "Bearer YOUR_API_KEY" }
    }
  }
}
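Pasting a snippet over an existing config file drops any other servers it lists. A small helper can merge the webpeel entry in place instead. The helper is hypothetical and not part of WebPeel. It uses the npx command shown above and the key names from these snippets: mcpServers for most clients and servers for VS Code.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# Same command/args as the config snippets above.
WEBPEEL_ENTRY = {"command": "npx", "args": ["-y", "webpeel", "mcp"]}


def add_webpeel_server(config_path: Union[str, Path], servers_key: str = "mcpServers") -> Dict[str, Any]:
    """Add or replace the "webpeel" entry, keeping every other configured server.

    Use servers_key="servers" for VS Code and "mcpServers" for the other clients above.
    """
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Copy the args list so the module-level template is never mutated.
    entry = {**WEBPEEL_ENTRY, "args": list(WEBPEEL_ENTRY["args"])}
    config.setdefault(servers_key, {})["webpeel"] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```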

🔬 Deep Research

Multi-hop research that thinks like a researcher, not a search engine:

# Sources only — no API key needed
npx webpeel research "best practices for rate limiting APIs" --max-sources 8

# Full synthesis with LLM (BYOK)
npx webpeel research "compare Firecrawl vs Crawl4AI vs WebPeel" --llm-key sk-...

Search → fetch top results → extract key passages (BM25) → follow the most relevant links → synthesize. No circular references, no duplicate content.
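The pipeline above can be sketched as a breadth-first loop with two dedup sets. The first tracks URLs, so links that point back to visited pages are never fetched again. The second tracks passage hashes, so a paragraph quoted on several sites is kept only once. All four callables are hypothetical stand-ins for search, fetching, passage ranking (BM25, for instance), and link selection. The final LLM synthesis step is omitted, and keeping the top three passages per page is an arbitrary choice.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, List, Tuple


def multi_hop_research(
    query: str,
    search: Callable[[str], List[str]],
    fetch: Callable[[str], Tuple[str, List[str]]],
    rank_passages: Callable[[str, str], List[str]],
    pick_links: Callable[[str, List[str]], List[str]],
    max_sources: int = 8,
    max_hops: int = 2,
) -> List[Dict[str, object]]:
    """Search, fetch, keep the best new passages, follow links; return the sources."""
    seen_urls: set = set()
    seen_passages: set = set()
    sources: List[Dict[str, object]] = []
    frontier = deque((url, 0) for url in search(query))
    while frontier and len(sources) < max_sources:
        url, hop = frontier.popleft()
        if url in seen_urls:
            continue  # never revisit a page, so link cycles cannot loop
        seen_urls.add(url)
        text, links = fetch(url)
        fresh = []
        for passage in rank_passages(query, text)[:3]:  # top 3 per page (arbitrary)
            digest = hashlib.sha256(passage.strip().lower().encode("utf-8")).hexdigest()
            if digest not in seen_passages:  # skip text already collected elsewhere
                seen_passages.add(digest)
                fresh.append(passage)
        if fresh:
            sources.append({"url": url, "passages": fresh})
        if hop < max_hops:
            frontier.extend((link, hop + 1) for link in pick_links(query, links))
    return sources
```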


📦 Extraction

WebPeel supports three extraction modes: CSS Schema, JSON Schema, and LLM extraction.

CSS Schema (zero config, auto-detected)

# Auto-detects Amazon and applies the built-in schema
npx webpeel "https://www.amazon.com/s?k=mechanical+keyboard" --json

# Force a specific schema
npx webpeel "https://www.booking.com/searchresults.html?city=Paris" --schema booking --json

# List all built-in schemas
npx webpeel --list-schemas

Built-in schemas: amazon · booking · ebay · expedia · hackernews · walmart · yelp

JSON Schema (type-safe structured extraction)

npx webpeel "https://example.com/product" \
  --extract-schema '{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}}}' \
  --llm-key sk-...
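JSON returned by an LLM should be checked against the schema you passed before you trust it. The sketch below checks only `type` and `required` on a flat object such as the one in the example above. This README does not show whether or how WebPeel validates LLM output. For nested schemas, a full JSON Schema validator is the better choice.

```python
import json
from typing import Any, Dict, List

# The schema from the --extract-schema example above.
EXAMPLE_SCHEMA = json.loads(
    '{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}}}'
)

# JSON Schema primitive types mapped to what json.loads produces.
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def check_flat_object(schema: Dict[str, Any], data: Any) -> List[str]:
    """Check 'required' and per-property 'type' for a flat object schema."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = [f"{name}: missing" for name in schema.get("required", []) if name not in data]
    for name, spec in schema.get("properties", {}).items():
        if name not in data or "type" not in spec:
            continue
        value, expected = data[name], spec["type"]
        # bool is a subclass of int in Python, so rule it out for numeric types.
        if isinstance(value, bool) and expected in ("number", "integer"):
            errors.append(f"{name}: expected {expected}, got boolean")
        elif not isinstance(value, TYPE_MAP[expected]):
            errors.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return errors
```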

LLM Extraction (natural language, BYOK)

npx webpeel "https://hn.algolia.com" \
  --llm-extract "top 10 posts with title, score, and comment count" \
  --llm-key $OPENAI_API_KEY \
  --json

Note: WebPeel is an ESM-only package. Use import syntax:

import { peel } from 'webpeel';

CommonJS require() is not supported. From a CommonJS project, load the package with a dynamic import inside an async function: const { peel } = await import('webpeel');

import { peel } from 'webpeel';

// CSS selector extraction
const result = await peel('https://news.ycombinator.com', {
  extract: { selectors: { titles: '.titleline > a', scores: '.score' } }
});

// LLM extraction from a natural-language field list (BYOK)
const product = await peel('https://example.com/product', {
  llmExtract: 'title, price, rating, availability',
  llmKey: process.env.OPENAI_API_KEY,
});

🛡️ Stealth & Anti-Bot


WebPeel automatically detects seven kinds of bot protection:

  • Cloudflare (JS challenge, Turnstile, Bot Management)
  • PerimeterX / HUMAN (behavioral analysis)
  • DataDome (ML-based bot detection)
  • Akamai Bot Manager
  • Distil Networks
  • reCAPTCHA / hCaptcha
  • Generic challenge pages

28 high-protection domains (Amazon, LinkedIn, Glassdoor, Zillow, Ticketmaster, and more) automatically route through stealth mode — no flags needed.

# Explicitly enable stealth
npx webpeel "https://glassdoor.com/jobs" --stealth

# Auto-escalation (stealth triggers automatically on challenge detection)
npx webpeel "https://amazon.com/dp/ASIN"
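Automatic routing by domain comes down to a hostname check. In this sketch the set holds only the five sites the README names, with their `.com` domains assumed, rather than the full list of 28. Matching subdomains by suffix is also an assumption about how routing works. The other trigger, challenge detection during a normal fetch, is not modeled.

```python
from urllib.parse import urlparse

# Illustrative subset only (assumed domains for the five sites named above).
AUTO_STEALTH_DOMAINS = {"amazon.com", "linkedin.com", "glassdoor.com", "zillow.com", "ticketmaster.com"}


def needs_stealth(url: str, explicit_flag: bool = False) -> bool:
    """True if --stealth was passed or the host is a listed domain or its subdomain."""
    if explicit_flag:
        return True
    host = (urlparse(url).hostname or "").lower()
    # Suffix match on ".domain" so "notamazon.com" does not count as amazon.com.
    return any(host == d or host.endswith("." + d) for d in AUTO_STEALTH_DOMAINS)
```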

🏨 Hotel Search


Search Kayak, Booking.com, Google Travel, and Expedia in parallel — returns unified results in one call.

npx webpeel hotels "Paris" --check-in 2025-06-01 --check-out 2025-06-07 --guests 2 --json

Available as webpeel_hotels MCP tool and via the REST API.
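Querying several sources in parallel and merging the results is a standard fan-out pattern. The sketch below uses a thread pool and records a failing source without failing the whole search. The per-source callables and the `name` and `price` fields are hypothetical, and sorting by price is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hotel = Dict[str, object]


def search_all(
    city: str, sources: Dict[str, Callable[[str], List[Hotel]]]
) -> Tuple[List[Hotel], Dict[str, str]]:
    """Run every source concurrently; return (merged hotels, errors by source)."""
    results: List[Hotel] = []
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, city) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                # Tag each result with the source it came from.
                results.extend({**hotel, "source": name} for hotel in future.result())
            except Exception as exc:  # one broken source should not sink the rest
                errors[name] = repr(exc)
    # Cheapest first; results without a numeric price go last.
    results.sort(key=lambda h: h["price"] if isinstance(h.get("price"), (int, float)) else float("inf"))
    return results, errors
```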


💳 Pricing

| Plan | Price | Weekly fetches | Burst |
|---|---|---|---|
| Free | $0/mo | 500/wk | 50/hr |
| Pro | $9/mo | 1,250/wk | 100/hr |
| Max | $29/mo | 6,250/wk | 500/hr |

Every plan includes every feature. Pro and Max also offer pay-as-you-go usage beyond the weekly quota. Quotas reset every Monday.
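The limits above combine a weekly quota with an hourly burst cap. The sketch below shows a client-side pre-check. The plan numbers come from the pricing table. The reset time (00:00 UTC on Monday) and the fixed one-hour burst windows are assumptions, and pay-as-you-go overage is not modeled.

```python
from datetime import datetime, timedelta, timezone

# (weekly fetches, burst per hour) from the pricing table above.
PLANS = {"free": (500, 50), "pro": (1250, 100), "max": (6250, 500)}


def next_weekly_reset(now: datetime) -> datetime:
    """Next Monday 00:00 UTC strictly after `now` (the reset time is an assumption)."""
    now_utc = now.astimezone(timezone.utc)
    days_ahead = (7 - now_utc.weekday()) % 7 or 7  # weekday(): Monday == 0
    monday = (now_utc + timedelta(days=days_ahead)).date()
    return datetime(monday.year, monday.month, monday.day, tzinfo=timezone.utc)


def can_fetch(plan: str, used_this_week: int, used_this_hour: int) -> bool:
    """True while both the weekly quota and the hourly burst cap have room."""
    weekly_limit, burst_limit = PLANS[plan]
    return used_this_week < weekly_limit and used_this_hour < burst_limit
```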



🐍 Python SDK

pip install webpeel
from webpeel import WebPeel

client = WebPeel(api_key="wp_...")  # or WEBPEEL_API_KEY env var

result = client.scrape("https://example.com")
print(result.content)    # Clean markdown
print(result.metadata)   # title, description, author, ...

results = client.search("latest AI research papers")
job = client.crawl("https://docs.example.com", limit=100)
result = client.scrape("https://protected-site.com", render=True, stealth=True)

Sync and async clients are included. Pure Python 3.8+, zero dependencies.


🐳 Self-Hosting

git clone https://github.com/webpeel/webpeel.git
cd webpeel && docker compose up

The full REST API is then served at http://localhost:3000. AGPL-3.0 licensed.

docker run -i webpeel/mcp          # MCP server only
docker run -p 3000:3000 webpeel/api  # API server only
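Scripts that drive a self-hosted instance, for example after `docker compose up -d`, should wait until port 3000 accepts connections before sending requests. This standard-library sketch only confirms that the port is open. It assumes nothing about WebPeel's endpoints, so the first real request may still need a retry while the API finishes starting.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 3000, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening; close the probe connection
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```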

🤝 Contributing

git clone https://github.com/webpeel/webpeel.git
cd webpeel
npm install && npm run build
npm test

The project has 1,098 tests. Please add tests for new features.



License

AGPL-3.0 — free to use, modify, and distribute. If you run a modified version as a network service, you must release your source under AGPL-3.0.

Need a commercial license? support@webpeel.dev

Versions 0.7.1 and earlier were released under MIT and remain MIT-licensed.


If WebPeel saves you time, ⭐ star the repo — it helps others find it.

© WebPeel