GitHub - crawlcore/qcrawl: qcrawl - fast async web crawling & scraping framework for Python.

qcrawl is a fast async web crawling and scraping framework for Python that extracts structured data from web pages. It is cross-platform and can be installed via pip or conda.

Follow the documentation.

qCrawl features

Async architecture - High-performance concurrent crawling based on asyncio
Performance optimized - Redis queue backend with direct delivery, MessagePack serialization, connection pooling, DNS caching
Powerful parsing - CSS/XPath selectors with lxml
Middleware system - Customizable request/response processing
Flexible export - Multiple output formats including JSON, CSV, XML
Flexible queue backends - Memory or Redis-based (+disk) schedulers for different scale requirements
Item pipelines - Data transformation, validation, and processing pipeline
Pluggable downloaders - HTTP (aiohttp) and Camoufox (stealth browser) for JavaScript rendering and anti-bot evasion
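
The async architecture and item pipeline features above can be illustrated with a small, standard-library-only sketch. This is not qcrawl's API: `fake_fetch`, `parse`, `clean_item`, `worker`, `crawl`, and `run_demo` are hypothetical names chosen for illustration, and the "download" is simulated so that no network access is needed. It only shows the general pattern of a bounded pool of asyncio workers draining an in-memory queue and passing each parsed item through a pipeline stage.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a downloader; simulates I/O without a network."""
    await asyncio.sleep(0.01)
    return f"<html><title>{url}</title></html>"


def parse(url: str, body: str) -> dict:
    """Toy parser: take the text between the <title> tags."""
    start = body.index("<title>") + len("<title>")
    end = body.index("</title>")
    return {"url": url, "title": body[start:end]}


def clean_item(item: dict) -> dict:
    """Toy pipeline stage: normalize the title to lowercase."""
    return {**item, "title": item["title"].strip().lower()}


async def worker(queue: asyncio.Queue, results: list) -> None:
    """Pull URLs from the queue until cancelled."""
    while True:
        url = await queue.get()
        try:
            body = await fake_fetch(url)
            results.append(clean_item(parse(url, body)))
        finally:
            queue.task_done()


async def crawl(urls: list, concurrency: int = 3) -> list:
    """Process all URLs with at most `concurrency` fetches in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    results: list = []
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(concurrency)
    ]
    await queue.join()  # wait until every queued URL has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


def run_demo() -> list:
    urls = [f"https://example.com/Page{i}" for i in range(5)]
    return asyncio.run(crawl(urls))
```

Calling `run_demo()` returns five item dictionaries, one per URL, in completion order rather than submission order. A real crawler would replace the in-memory queue with a persistent backend and the simulated fetch with an HTTP or browser-based downloader.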

