A distributed, recursive web crawler built in Rust. Feed it a URL and it maps all linked websites at a chosen depth, storing the graph in Neo4j and visualizing it in a React frontend.
- Recursive crawling — follow links up to a configurable depth
- Distributed workers — 8 feeder replicas process URLs in parallel
- Graph storage — Neo4j stores URL nodes and link relationships
- Real-time progress — WebSocket updates stream crawl status live
- Interactive graph visualization — force-directed graph with color-coded node status
- Crawl management — create, monitor, cancel, and review crawl statistics
- Kubernetes-native — Helm chart deploys all services with a single command
graph TD
User([User]) -->|Port 30080| Frontend
Frontend["Frontend<br/>(nginx + React SPA)"]
Frontend -->|"/api/*" reverse proxy| Manager
Manager["Manager<br/>(Axum REST API)"]
Manager -->|Reads / Writes| Neo4j
Neo4j[("Neo4j<br/>Graph Database")]
Feeder["Feeder (x8)<br/>(Background Workers)"]
Feeder -->|Claims / Updates| Neo4j
| Service | Tech | Role |
|---|---|---|
| Frontend | nginx + React/Vite/TypeScript | SPA UI, API reverse proxy |
| Manager | Rust + Axum | REST API, crawl initiation, WebSocket |
| Feeder | Rust | Background URL processing workers |
| Neo4j | Neo4j 5.x | Graph database for crawl data |
helm install web-crawler oci://ghcr.io/bluedotiya/web-crawler/charts/web-crawler \
--version 1.0.0 -n web-crawler --create-namespacekubectl rollout status statefulset/crawler-neo4j -n web-crawler
kubectl get pods -n web-crawler# Get your node IP
minikube ip # or use your cluster's external IP
# Open the frontend
open http://<NODE_IP>:30080Use the web UI at /new, or via the API:
curl -X POST http://<NODE_IP>:30080/api/v1/crawls \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com", "depth": 2}'Dashboard![]() |
Crawl List![]() |
New Crawl![]() |
Crawl Progress![]() |
Graph Visualization![]() |
Statistics![]() |
| Document | Description |
|---|---|
| Architecture | System design, data flow, concurrency model, WebSocket protocol |
| Neo4j Graph Model | Node labels, properties, relationships, indexes, example queries |
| API Reference | REST endpoints, request/response schemas, WebSocket protocol |
| Deployment Guide | Helm install, configuration reference, scaling, monitoring |
| Development Guide | Local setup, testing, CI/CD pipeline, conventional commits |
| Layer | Technology |
|---|---|
| Backend | Rust, Axum, Tokio, neo4rs, hickory-resolver |
| Frontend | React, TypeScript, Vite, Tailwind CSS |
| Database | Neo4j 5.x |
| Infra | Docker, Kubernetes, Helm, GitHub Actions |
| Registry | GitHub Container Registry (GHCR) |
- Fork the repository
- Create a feature branch from
master - Follow conventional commit format for PR titles
- Ensure all checks pass:
cargo test --workspace && cargo clippy --workspace -- -D warnings - For frontend changes:
cd frontend && npm run lint && npm run type-check - Open a pull request — PRs are squash-merged, and the title drives the version bump
See the Development Guide for detailed setup instructions.
- Report security vulnerabilities via GitHub Security Advisories
- All services run as non-root users in containers
- Neo4j credentials are stored in Kubernetes secrets
This project is licensed under the GPL-3.0 License.





