Self-hosted documentation and code indexing with MCP integration.
Give your AI assistant accurate, up-to-date context from your own sources.
ContextMine indexes your documentation and code repositories, making them searchable via the Model Context Protocol (MCP). Connect it to Claude Desktop, Cursor, or any MCP-compatible AI assistant to provide rich context for code understanding, documentation lookup, and codebase exploration.
Key features:
- Hybrid search - Full-text + vector similarity with RRF ranking for accurate retrieval
- Deep research agent - Multi-step AI agent with LSP and Tree-sitter for complex codebase questions
- Code intelligence - Symbol extraction, code outlines, and structural navigation via Tree-sitter
- Architecture Cockpit - Read-only extracted Twin views per collection/scenario (Overview, Topology, Deep Dive, C4 Diff, Exports)
- Strict real metrics - File-level LOC/complexity/coupling/coverage for GitHub sources with explicit availability status
- Web crawling - Index documentation sites automatically
- Git indexing - Index GitHub repositories with incremental updates
- Self-hosted - Your data stays on your infrastructure
The deep research agent goes beyond simple search to answer complex questions about your codebase. It uses an iterative approach with multiple tools:
| Tool | Description |
|---|---|
| Hybrid Search | BM25 + vector similarity search with RRF ranking |
| LSP Go to Definition | Jump to symbol definitions across files |
| LSP Find References | Find all usages of a symbol |
| LSP Hover | Get type information and documentation |
| Tree-sitter Outline | Extract file structure (classes, functions, methods) |
| Tree-sitter Find Symbol | Locate symbols by name pattern |
| Graph Traversal | Navigate call graphs and dependencies |
The agent collects evidence from multiple sources, verifies findings, and synthesizes a comprehensive answer with citations.
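To make the ranking step concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF), the technique the Hybrid Search tool is described as using to combine full-text and vector results. It illustrates the general method and is not taken from ContextMine's code: the function name `rrf_fuse`, the sample document IDs, and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k = 60 is a common default, assumed here.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a BM25 query and a vector query.
bm25_hits = ["auth.md", "deps.md", "intro.md"]
vector_hits = ["deps.md", "security.md", "auth.md"]

fused = rrf_fuse([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that rank well in both lists rise to the top, even if neither list puts them first. That is why RRF is a common choice when the two scoring scales (BM25 and cosine similarity) are not directly comparable.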
Choose your deployment method:
- Docker Compose (recommended for local development)
- Kubernetes (Helm) (recommended for production)
# Clone the repository
git clone https://github.com/mayflower/contextmine.git
cd contextmine
# Copy environment template and configure
cp .env.example .env
# Edit .env with your API keys (see Configuration section)
# Start all services
docker compose up -d
# Run database migrations
docker compose exec api sh -c "cd /app/packages/core && alembic upgrade head"
For production deployments, use the Helm chart from GHCR:
# Create a values file with your configuration
cat > my-values.yaml << EOF
api:
  image:
    repository: ghcr.io/mayflower/contextmine-api
    tag: latest
worker:
  image:
    repository: ghcr.io/mayflower/contextmine-worker
    tag: latest
config:
  publicBaseUrl: "https://contextmine.example.com"
secrets:
  github:
    clientId: "your-github-client-id"
    clientSecret: "your-github-client-secret"
  sessionSecret: "$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
  tokenEncryptionKey: "$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
  openaiApiKey: "sk-..."
EOF
# Install from OCI registry
helm install contextmine oci://ghcr.io/mayflower/contextmine -f my-values.yaml
# Access the application
kubectl port-forward svc/contextmine-api 8000:8000
See deploy/helm/contextmine/README.md for full configuration options.
- Open the admin UI at http://localhost:8000
- Log in with GitHub OAuth
- Create a new Collection (e.g., "My Docs")
- Add a Source:
  - Web: Enter a documentation URL (e.g., https://docs.python.org/3/)
  - GitHub: Enter owner/repo (e.g., fastapi/fastapi)
- Click Sync to start indexing
Configure your MCP client to connect to ContextMine. Authentication is handled via GitHub OAuth automatically.
Claude Desktop (~/.config/claude/claude_desktop_config.json on Linux, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "contextmine": {
      "url": "http://localhost:8000/mcp"
    }
  }
}
When you first connect, your MCP client will redirect to GitHub for authentication.
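If your client config file already lists other MCP servers, the ContextMine entry should be merged in rather than overwriting the file. The sketch below shows one way to do that on an in-memory config. The helper name `add_mcp_server` and the pre-existing `other` entry are hypothetical. Only the `mcpServers` / `url` shape comes from the configuration shown above.

```python
import json
from copy import deepcopy


def add_mcp_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of a client config with one MCP server entry added or updated."""
    updated = deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {"url": url}
    return updated


# Hypothetical existing config that already lists another server.
existing = {"mcpServers": {"other": {"url": "http://localhost:9000/mcp"}}}
merged = add_mcp_server(existing, "contextmine", "http://localhost:8000/mcp")
print(json.dumps(merged, indent=2))
```

Working on a copy leaves the original config untouched, so you can inspect the merged result before writing it back to disk.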
Cursor: Settings → MCP → Add server with URL http://localhost:8000/mcp
In your AI assistant, you can now:
Search the FastAPI docs for information about dependency injection
What authentication methods does this codebase support?
Show me the outline of src/auth/handlers.py
The web app includes an Architecture Cockpit for project/collection-level Twin inspection in the browser.
- Overview - City KPIs, hotspots, and cc.json preview.
- Topology - Layered architecture graph view.
- Deep Dive - Large graph slices for dependency/control-flow inspection.
- C4 Diff - AS-IS / TO-BE Mermaid compare.
- Exports - Generate cc_json, cx2, jgf, lpg_jsonl, mermaid_c4.
Overview uses GET /api/twin/collections/{collection_id}/views/city and reads:
{
  "metrics_status": {
    "status": "ready|unavailable",
    "reason": "ok|no_real_metrics|awaiting_ci_coverage|coverage_ingest_failed",
    "strict_mode": true
  }
}
Rules:
- ready: real metric snapshots are available.
- unavailable: no valid real metrics for the selected scenario.
- reason=awaiting_ci_coverage: structural metrics are ready, coverage has not been ingested yet.
- reason=coverage_ingest_failed: the latest coverage ingest job failed or was rejected.
- UI shows N/A for unavailable KPI values (not placeholder 0.00).
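A client that reads this payload might interpret it roughly as follows. This is a sketch, not ContextMine code: the helper names, the fallback values for missing fields, and the pairing of the `ok` reason with the "snapshots are available" message are assumptions. The field names and reason values are the ones listed above.

```python
from typing import Optional

# Messages paraphrase the rules above; pairing "ok" with the ready message is assumed.
REASON_MESSAGES = {
    "ok": "real metric snapshots are available",
    "no_real_metrics": "no structural metric snapshots were produced",
    "awaiting_ci_coverage": "structural metrics are ready, coverage has not been ingested yet",
    "coverage_ingest_failed": "the latest coverage ingest job failed or was rejected",
}


def describe_metrics_status(view: dict) -> str:
    """Summarize the metrics_status block of a city view response."""
    status = view.get("metrics_status", {})
    state = status.get("status", "unavailable")  # assumed fallback
    reason = status.get("reason", "no_real_metrics")  # assumed fallback
    message = REASON_MESSAGES.get(reason, f"unrecognized reason {reason!r}")
    return f"{state}: {message}"


def format_kpi(value: Optional[float]) -> str:
    """Render a KPI as N/A when it is unavailable, never as a placeholder 0.00."""
    return "N/A" if value is None else f"{value:.2f}"


sample = {
    "metrics_status": {
        "status": "unavailable",
        "reason": "awaiting_ci_coverage",
        "strict_mode": True,
    }
}
print(describe_metrics_status(sample))
print(format_kpi(None), format_kpi(0.5))
```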
Coverage is no longer discovered from repository files. CI pushes raw coverage reports to ContextMine.
- Identify your GitHub source ID (/api/collections/{collection_id}/sources).
- Rotate the ingest token once as source owner:
  POST /api/sources/{source_id}/metrics/coverage-ingest-token/rotate
- Store the returned token in GitHub Secrets as CONTEXTMINE_INGEST_TOKEN.
- Store the source ID in GitHub Secrets as CONTEXTMINE_SOURCE_ID.
- name: Push coverage to ContextMine
  if: always()
  env:
    CONTEXTMINE_URL: https://contextmine.example.com
    CONTEXTMINE_SOURCE_ID: ${{ secrets.CONTEXTMINE_SOURCE_ID }}
    CONTEXTMINE_INGEST_TOKEN: ${{ secrets.CONTEXTMINE_INGEST_TOKEN }}
  run: |
    curl --fail-with-body \
      -X POST "$CONTEXTMINE_URL/api/sources/$CONTEXTMINE_SOURCE_ID/metrics/coverage-ingest" \
      -H "X-ContextMine-Ingest-Token: $CONTEXTMINE_INGEST_TOKEN" \
      -F "commit_sha=${{ github.sha }}" \
      -F "branch=${{ github.ref_name }}" \
      -F "workflow_run_id=${{ github.run_id }}" \
      -F "provider=github_actions" \
      -F "reports=@coverage/lcov.info" \
      -F "reports=@coverage/coverage.xml"
Notes:
- commit_sha must exactly match the current source cursor SHA.
- Multiple report files are supported and merged by file-level average.
- Supported protocols (Core 6): lcov, Cobertura XML, JaCoCo XML, Clover/PHPUnit XML, OpenCover XML, generic-file-coverage-v1 JSON.
- Check job status via GET /api/sources/{source_id}/metrics/coverage-ingest/{job_id}.
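One plausible reading of "merged by file-level average" is that each file's coverage is averaged across the reports that mention it. The sketch below illustrates that reading only. It assumes each report has already been reduced to a mapping from file path to a coverage percentage. The function name and input shape are hypothetical, and the server's actual merge logic may differ, for example in how it treats files missing from some reports.

```python
from collections import defaultdict


def merge_by_file_average(reports: list[dict[str, float]]) -> dict[str, float]:
    """Average per-file coverage percentages across reports.

    A file is averaged only over the reports that mention it (an assumption).
    """
    values: dict[str, list[float]] = defaultdict(list)
    for report in reports:
        for path, percent in report.items():
            values[path].append(percent)
    return {path: sum(v) / len(v) for path, v in sorted(values.items())}


# Hypothetical per-file percentages extracted from two uploaded reports.
lcov_report = {"src/app.py": 80.0, "src/util.py": 50.0}
cobertura_report = {"src/app.py": 90.0}

print(merge_by_file_average([lcov_report, cobertura_report]))
```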
| Tool | Description |
|---|---|
| get_markdown | Primary search tool. Searches indexed content and returns relevant context as Markdown. Supports filtering by collection. |
| list_collections | List available documentation collections |
| list_documents | Browse documents in a collection |
| Tool | Description |
|---|---|
| outline | List all functions, classes, and methods in a file with line numbers |
| find_symbol | Get the source code of a specific function or class by name |
| definition | Jump to where a symbol is defined (requires LSP) |
| references | Find all usages of a symbol for impact analysis (requires LSP) |
| expand | Explore code relationships - what a function calls, what calls it, imports, etc. |
| Tool | Description |
|---|---|
| deep_research | Multi-step AI agent for complex questions. Autonomously searches, reads code, and builds answers with citations. |
Copy .env.example to .env and configure these variables:
| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL connection string (default works with docker compose) |
| GITHUB_CLIENT_ID | GitHub OAuth app client ID |
| GITHUB_CLIENT_SECRET | GitHub OAuth app secret |
| SESSION_SECRET | Secret for session cookies |
| TOKEN_ENCRYPTION_KEY | Key for encrypting stored tokens |
| OPENAI_API_KEY | OpenAI API key for embeddings |
| Variable | Description |
|---|---|
| GEMINI_API_KEY | Alternative to OpenAI for embeddings |
| ANTHROPIC_API_KEY | For deep_research agent (uses Claude) |
| MCP_ALLOWED_ORIGINS | CORS origins for MCP in production |
| POSTGRES_PLATFORM | Docker Compose postgres image platform override (default: linux/amd64) |
| METRICS_STRICT_MODE | Enforce strict real metrics gate for GitHub syncs (default: true) |
| METRICS_LANGUAGES | Metrics language scope (default: python,typescript,javascript,java,php) |
| COVERAGE_INGEST_MAX_PAYLOAD_MB | Max multipart payload size for CI coverage uploads (default: 25) |
| COVERAGE_INGEST_PREFECT_FLOW_NAME | Prefect flow name for async coverage ingest (default: ingest_coverage_metrics) |
- Go to https://github.com/settings/developers
- Click New OAuth App
- Fill in:
  - Application name: ContextMine (or your preferred name)
  - Homepage URL: http://localhost:8000
  - Authorization callback URL: http://localhost:8000/api/auth/callback
- Copy the Client ID and Client Secret to your .env
Note: Both the admin UI and MCP clients use the same callback URL. The server automatically routes OAuth flows to the appropriate handler.
# Generate session secret
python -c "import secrets; print(secrets.token_urlsafe(32))"
# Generate encryption key
python -c "import secrets; print(secrets.token_urlsafe(32))"
Best for: API docs, guides, reference documentation
- Create a collection in the admin UI
- Add a source with type Web
- Enter the base URL (e.g., https://docs.example.com/)
- The crawler follows links within the same domain
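The same-domain rule can be pictured with a small link filter. This is an illustrative sketch rather than the crawler's real logic. The function name is hypothetical, and "same domain" is interpreted here as an exact host match, so subdomains, redirects, and robots rules may be handled differently in practice.

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against base_url and keep those on the same host."""
    base_host = urlparse(base_url).netloc.lower()
    kept: list[str] = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == base_host:
            # Drop the fragment so "#section" variants are not crawled twice.
            cleaned = parsed._replace(fragment="").geturl()
            if cleaned not in kept:
                kept.append(cleaned)
    return kept


links = same_domain_links(
    "https://docs.example.com/",
    [
        "guide/intro.html",
        "/api/#auth",
        "https://other.example.org/x",
        "mailto:team@example.com",
    ],
)
print(links)
```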
Best for: Source code, README files, inline documentation
- Add a source with type GitHub
- Enter the repository as owner/repo
- Optionally specify:
  - Branch: defaults to the repository's default branch
  - Path filter: limit to specific directories (e.g., src/, docs/)
- Code files are parsed for symbols (functions, classes, methods)
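A path filter such as src/, docs/ can be thought of as a directory-prefix check on each file in the repository. The sketch below shows that idea under stated assumptions: the function name is hypothetical, an empty filter is assumed to match everything, and the real filter syntax (for example glob support) is not documented here.

```python
from pathlib import PurePosixPath


def matches_path_filter(path: str, prefixes: list[str]) -> bool:
    """Return True if path lies under one of the directory prefixes.

    An empty prefix list is treated as "no filter" (an assumption).
    """
    if not prefixes:
        return True
    parts = PurePosixPath(path).parts
    for prefix in prefixes:
        prefix_parts = PurePosixPath(prefix).parts
        # Compare whole path components so "src/" does not match "srcs/".
        if parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False


path_filter = ["src/", "docs/"]
for candidate in ["src/auth/handlers.py", "docs/index.md", "srcs/x.py", "README.md"]:
    print(candidate, matches_path_filter(candidate, path_filter))
```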
Supported languages for symbol extraction: Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby, PHP
Supported languages for strict real metrics (Twin/City): Python, TypeScript, JavaScript, Java, PHP
Strict metrics gate behavior for GitHub sources:
- Sync computes structural metrics (loc, complexity, coupling) without blocking on coverage.
- Coverage is ingested asynchronously from CI and bound to the exact commit SHA.
- Coverage ingest is strict: invalid token/payload/SHA mismatch/path mismatch fails the ingest job.
- City metrics become fully ready only after successful coverage ingest.
┌───────────────────────────────┐     ┌─────────────┐
│      FastAPI + React SPA      │────▶│ PostgreSQL  │
│  /api/* /mcp/* /* (frontend)  │     │    pg4ai    │
└───────────────────────────────┘     └─────────────┘
                │
         ┌──────┴──────┐
         ▼             ▼
    ┌─────────┐   ┌─────────┐
    │ Prefect │   │ spider  │
    │ Worker  │   │   _md   │
    └─────────┘   └─────────┘
- API (apps/api): FastAPI serving REST API at /api/*, MCP at /mcp/*, and React frontend at /*
- Web (apps/web): React admin console (built and served by API)
- Worker (apps/worker): Background sync jobs using Prefect
- Core (packages/core): Shared models, database, and utilities
- Python 3.12+
- Node.js 20+
- uv for Python dependency management
- Docker (for pg4ai: PostgreSQL + pgvector + Apache AGE)
# Start database
docker compose up -d postgres
# Optional: verify vector + graph capabilities in postgres
./scripts/docker/smoke-pg4ai.sh
# Install Python dependencies
uv sync --all-packages
# Run migrations
cd packages/core
DATABASE_URL=postgresql+asyncpg://contextmine:contextmine@localhost:5432/contextmine \
uv run alembic upgrade head
cd ../..
# Build frontend (one-time, or after frontend changes)
cd apps/web && npm install && npm run build && cd ../..
# Start API server (serves both API and frontend)
STATIC_DIR=apps/web/dist uv run uvicorn apps.api.app.main:app --reload --port 8000
For frontend development with hot reload, run the Vite dev server separately:
# Terminal 1: API server
uv run uvicorn apps.api.app.main:app --reload --port 8000
# Terminal 2: Frontend dev server (proxies API requests to :8000)
cd apps/web && npm run dev
# All tests
uv run pytest -v
# Specific test file
uv run pytest packages/core/tests/test_treesitter.py -v
# With coverage
uv run pytest --cov=contextmine_core --cov-report=term-missing
# Linting
uv run ruff check .
# Type checking
uvx ty check
# Auto-format
uv run ruff format .
# Pre-commit hooks
uv run pre-commit install
uv run pre-commit run --all-files
Pre-built images are available from GitHub Container Registry:
docker pull ghcr.io/mayflower/contextmine-api:latest
docker pull ghcr.io/mayflower/contextmine-worker:latest
docker pull ghcr.io/mayflower/contextmine-web:latest
- Ensure you've created at least one collection in the admin UI
- Check that the collection visibility is set to Global (or you're authenticated)
- Verify you've completed the GitHub OAuth flow when prompted by your MCP client
- Check the Prefect UI at http://localhost:4200 for job status
- For GitHub sources, ensure the repository is accessible
- For web sources, verify the URL is reachable and returns HTML
Symbol extraction works for supported languages only. Check that:
- The file has a recognized extension (.py, .ts, .js, .go, etc.)
- The sync has completed (symbols are extracted during sync)
- Inspect GET /api/twin/collections/{collection_id}/views/city.
- Check metrics_status.reason:
  - awaiting_ci_coverage: CI has not pushed coverage yet.
  - coverage_ingest_failed: review ingest job diagnostics.
  - no_real_metrics: no structural metric snapshots were produced.
- Verify the latest ingest job:
  GET /api/sources/{source_id}/metrics/coverage-ingest/{job_id}
- Re-run the CI upload with matching commit_sha=${{ github.sha }} and valid reports.
MIT License - see LICENSE for details.
