ContextMine

Self-hosted documentation and code indexing with MCP integration.
Give your AI assistant accurate, up-to-date context from your own sources.

What is ContextMine?

ContextMine indexes your documentation and code repositories, making them searchable via the Model Context Protocol (MCP). Connect it to Claude Desktop, Cursor, or any MCP-compatible AI assistant to provide rich context for code understanding, documentation lookup, and codebase exploration.

Key features:

Hybrid search - Full-text (BM25) + vector similarity, fused with Reciprocal Rank Fusion (RRF) for accurate retrieval
Deep research agent - Multi-step AI agent with LSP and Tree-sitter for complex codebase questions
Code intelligence - Symbol extraction, code outlines, and structural navigation via Tree-sitter
Architecture Cockpit - Read-only extracted Twin views per collection/scenario (Overview, Topology, Deep Dive, C4 Diff, Exports)
Strict real metrics - File-level LOC/complexity/coupling/coverage for GitHub sources with explicit availability status
Web crawling - Index documentation sites automatically
Git indexing - Index GitHub repositories with incremental updates
Self-hosted - Your data stays on your infrastructure

Deep Research Agent

The deep research agent goes beyond simple search to answer complex questions about your codebase. It uses an iterative approach with multiple tools:

Tool	Description
Hybrid Search	BM25 + vector similarity search with RRF ranking
LSP Go to Definition	Jump to symbol definitions across files
LSP Find References	Find all usages of a symbol
LSP Hover	Get type information and documentation
Tree-sitter Outline	Extract file structure (classes, functions, methods)
Tree-sitter Find Symbol	Locate symbols by name pattern
Graph Traversal	Navigate call graphs and dependencies

The agent collects evidence from multiple sources, verifies findings, and synthesizes a comprehensive answer with citations.
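
The RRF ranking used by the Hybrid Search tool can be illustrated with a short sketch. This is not ContextMine's implementation: the function name `rrf_fuse`, the constant `k = 60` (a commonly used default for Reciprocal Rank Fusion), and the sample file names are assumptions made for illustration only.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a BM25 query and a vector query.
bm25_hits = ["auth.py", "session.py", "README.md"]
vector_hits = ["session.py", "tokens.py", "auth.py"]

fused = rrf_fuse([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists rise to the top, which is why a fused ranking tends to be more robust than either search on its own.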

Quick Start

Choose your deployment method:

Docker Compose (recommended for local development)
Kubernetes (Helm) (recommended for production)

Docker Compose

# Clone the repository
git clone https://github.com/mayflower/contextmine.git
cd contextmine

# Copy environment template and configure
cp .env.example .env
# Edit .env with your API keys (see Configuration section)

# Start all services
docker compose up -d

# Run database migrations
docker compose exec api sh -c "cd /app/packages/core && alembic upgrade head"

Kubernetes (Helm)

For production deployments, use the Helm chart from GHCR:

# Create a values file with your configuration
cat > my-values.yaml << EOF
api:
  image:
    repository: ghcr.io/mayflower/contextmine-api
    tag: latest
worker:
  image:
    repository: ghcr.io/mayflower/contextmine-worker
    tag: latest
config:
  publicBaseUrl: "https://contextmine.example.com"
secrets:
  github:
    clientId: "your-github-client-id"
    clientSecret: "your-github-client-secret"
  sessionSecret: "$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
  tokenEncryptionKey: "$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
  openaiApiKey: "sk-..."
EOF

# Install from OCI registry
helm install contextmine oci://ghcr.io/mayflower/contextmine -f my-values.yaml

# Access the application
kubectl port-forward svc/contextmine-api 8000:8000

See deploy/helm/contextmine/README.md for full configuration options.

2. Create Your First Collection

Open the admin UI at http://localhost:8000
Log in with GitHub OAuth
Create a new Collection (e.g., "My Docs")
Add a Source:
- Web: Enter a documentation URL (e.g., https://docs.python.org/3/)
- GitHub: Enter owner/repo (e.g., fastapi/fastapi)
Click Sync to start indexing

3. Connect Your AI Assistant

Configure your MCP client to connect to ContextMine. Authentication is handled via GitHub OAuth automatically.

Claude Desktop (~/.config/claude/claude_desktop_config.json on Linux, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "contextmine": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

When you first connect, your MCP client will redirect to GitHub for authentication.

Cursor: Settings → MCP → Add server with URL http://localhost:8000/mcp
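
If your MCP client config already lists other servers, the ContextMine entry should be merged in rather than overwriting the file. A minimal sketch, assuming the config follows the `mcpServers` shape shown above; the helper name `add_contextmine_server` and the `other` server entry are hypothetical.

```python
import json


def add_contextmine_server(config: dict, url: str = "http://localhost:8000/mcp") -> dict:
    """Return a copy of an MCP client config with a 'contextmine' entry added.

    Existing servers are preserved and the input dict is left unmodified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["contextmine"] = {"url": url}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"url": "http://localhost:9000/mcp"}}}
updated = add_contextmine_server(existing)
print(json.dumps(updated, indent=2))
```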

4. Start Using It

In your AI assistant, you can now:

Search the FastAPI docs for information about dependency injection

What authentication methods does this codebase support?

Show me the outline of src/auth/handlers.py

Architecture Cockpit (Extracted Views)

The web app includes an Architecture Cockpit for project/collection-level Twin inspection in the browser.

Views

Overview - City KPIs, hotspots, and cc.json preview.
Topology - Layered architecture graph view.
Deep Dive - Large graph slices for dependency and control-flow inspection.
C4 Diff - AS-IS / TO-BE Mermaid comparison.
Exports - Generate cc_json, cx2, jgf, lpg_jsonl, mermaid_c4.

Real Metrics Semantics

Overview uses GET /api/twin/collections/{collection_id}/views/city and reads:

{
  "metrics_status": {
    "status": "ready|unavailable",
    "reason": "ok|no_real_metrics|awaiting_ci_coverage|coverage_ingest_failed",
    "strict_mode": true
  }
}

Rules:

ready: real metric snapshots are available.
unavailable: no valid real metrics for the selected scenario.
reason=awaiting_ci_coverage: structural metrics are ready, coverage has not been ingested yet.
reason=coverage_ingest_failed: the latest coverage ingest job failed or was rejected.
The UI shows N/A for unavailable KPI values instead of a placeholder 0.00.
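
A client consuming this payload could apply the rules roughly as follows. This is a sketch, not ContextMine's UI code: the helper names are hypothetical, and it assumes unavailable KPI values arrive as `null` (`None` in Python), which the source does not state.

```python
from typing import Optional

# Messages paraphrase the rules listed above, keyed by metrics_status.reason.
REASON_MESSAGES = {
    "ok": "Real metric snapshots are available.",
    "no_real_metrics": "No valid real metrics for the selected scenario.",
    "awaiting_ci_coverage": "Structural metrics are ready; coverage has not been ingested yet.",
    "coverage_ingest_failed": "The latest coverage ingest job failed or was rejected.",
}


def describe_metrics_status(payload: dict) -> str:
    """Translate the metrics_status block of a city view response into a message."""
    reason = payload.get("metrics_status", {}).get("reason", "")
    return REASON_MESSAGES.get(reason, f"Unknown reason: {reason!r}")


def format_kpi(value: Optional[float]) -> str:
    """Render a KPI value, showing N/A rather than 0.00 when it is unavailable."""
    return "N/A" if value is None else f"{value:.2f}"


# Example payload shaped like the response fragment above.
response = {
    "metrics_status": {
        "status": "unavailable",
        "reason": "awaiting_ci_coverage",
        "strict_mode": True,
    }
}
print(describe_metrics_status(response))
print(format_kpi(None), format_kpi(12.5))
```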

GitHub Actions Coverage Ingest (CI Push)

Coverage is no longer discovered from repository files. CI pushes raw coverage reports to ContextMine.

One-time setup

Identify your GitHub source ID (/api/collections/{collection_id}/sources).
Rotate the ingest token once as source owner:
- POST /api/sources/{source_id}/metrics/coverage-ingest-token/rotate
Store the returned token in GitHub Secrets as CONTEXTMINE_INGEST_TOKEN.
Store the source ID in GitHub Secrets as CONTEXTMINE_SOURCE_ID.

GitHub Actions example

- name: Push coverage to ContextMine
  if: always()
  env:
    CONTEXTMINE_URL: https://contextmine.example.com
    CONTEXTMINE_SOURCE_ID: ${{ secrets.CONTEXTMINE_SOURCE_ID }}
    CONTEXTMINE_INGEST_TOKEN: ${{ secrets.CONTEXTMINE_INGEST_TOKEN }}
  run: |
    curl --fail-with-body \
      -X POST "$CONTEXTMINE_URL/api/sources/$CONTEXTMINE_SOURCE_ID/metrics/coverage-ingest" \
      -H "X-ContextMine-Ingest-Token: $CONTEXTMINE_INGEST_TOKEN" \
      -F "commit_sha=${{ github.sha }}" \
      -F "branch=${{ github.ref_name }}" \
      -F "workflow_run_id=${{ github.run_id }}" \
      -F "provider=github_actions" \
      -F "reports=@coverage/lcov.info" \
      -F "reports=@coverage/coverage.xml"

Notes:

commit_sha must exactly match the current source cursor SHA.
Multiple report files are supported and merged by file-level average.
Supported report formats (Core 6): lcov, Cobertura XML, JaCoCo XML, Clover/PHPUnit XML, OpenCover XML, generic-file-coverage-v1 JSON.
Check job status via GET /api/sources/{source_id}/metrics/coverage-ingest/{job_id}.
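
To make "merged by file-level average" concrete, here is a small sketch of one plausible reading. It assumes each report has already been reduced to a mapping of file path to coverage percentage, and that a file is averaged only over the reports that mention it. Both points are assumptions; the server's actual merge logic may differ.

```python
from collections import defaultdict


def merge_coverage(reports: list[dict[str, float]]) -> dict[str, float]:
    """Average per-file coverage percentages across several reports.

    Assumption: a file is averaged only over the reports that contain it.
    """
    values: dict[str, list[float]] = defaultdict(list)
    for report in reports:
        for path, pct in report.items():
            values[path].append(pct)
    return {path: sum(pcts) / len(pcts) for path, pcts in values.items()}


# Hypothetical per-file percentages parsed from two uploaded reports.
lcov_report = {"src/auth.py": 80.0, "src/db.py": 50.0}
cobertura_report = {"src/auth.py": 90.0}

merged = merge_coverage([lcov_report, cobertura_report])
print(merged)
```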

Available MCP Tools

Context Retrieval

Tool	Description
`get_markdown`	Primary search tool. Searches indexed content and returns relevant context as Markdown. Supports filtering by collection.
`list_collections`	List available documentation collections
`list_documents`	Browse documents in a collection

Code Intelligence

Tool	Description
`outline`	List all functions, classes, and methods in a file with line numbers
`find_symbol`	Get the source code of a specific function or class by name
`definition`	Jump to where a symbol is defined (requires LSP)
`references`	Find all usages of a symbol for impact analysis (requires LSP)
`expand`	Explore code relationships - what a function calls, what calls it, imports, etc.

Advanced Research

Tool	Description
`deep_research`	Multi-step AI agent for complex questions. Autonomously searches, reads code, and builds answers with citations.

Configuration

Copy .env.example to .env and configure these variables:

Required

Variable	Description
`DATABASE_URL`	PostgreSQL connection string (default works with docker compose)
`GITHUB_CLIENT_ID`	GitHub OAuth app client ID
`GITHUB_CLIENT_SECRET`	GitHub OAuth app secret
`SESSION_SECRET`	Secret for session cookies
`TOKEN_ENCRYPTION_KEY`	Key for encrypting stored tokens
`OPENAI_API_KEY`	OpenAI API key for embeddings

Optional

Variable	Description
`GEMINI_API_KEY`	Alternative to OpenAI for embeddings
`ANTHROPIC_API_KEY`	For deep_research agent (uses Claude)
`MCP_ALLOWED_ORIGINS`	CORS origins for MCP in production
`POSTGRES_PLATFORM`	Docker Compose postgres image platform override (default: `linux/amd64`)
`METRICS_STRICT_MODE`	Enforce strict real metrics gate for GitHub syncs (default: `true`)
`METRICS_LANGUAGES`	Metrics language scope (default: `python,typescript,javascript,java,php`)
`COVERAGE_INGEST_MAX_PAYLOAD_MB`	Max multipart payload size for CI coverage uploads (default: `25`)
`COVERAGE_INGEST_PREFECT_FLOW_NAME`	Prefect flow name for async coverage ingest (default: `ingest_coverage_metrics`)
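
The defaults in the table above can be read as in the following sketch. The `MetricsSettings` class, the `load_metrics_settings` helper, and the set of strings accepted as "true" are assumptions for illustration; ContextMine's own settings loader may parse these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MetricsSettings:
    strict_mode: bool
    languages: tuple[str, ...]
    max_payload_mb: int


def load_metrics_settings(env: Optional[Mapping[str, str]] = None) -> MetricsSettings:
    """Read metrics-related variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumed truthy spellings; the real parser may accept a different set.
    strict = env.get("METRICS_STRICT_MODE", "true").strip().lower() in {"1", "true", "yes"}
    raw_languages = env.get("METRICS_LANGUAGES", "python,typescript,javascript,java,php")
    languages = tuple(part.strip() for part in raw_languages.split(",") if part.strip())
    max_payload_mb = int(env.get("COVERAGE_INGEST_MAX_PAYLOAD_MB", "25"))
    return MetricsSettings(strict, languages, max_payload_mb)


defaults = load_metrics_settings({})
custom = load_metrics_settings(
    {"METRICS_STRICT_MODE": "false", "METRICS_LANGUAGES": "python, java"}
)
print(defaults)
print(custom)
```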

Setting Up GitHub OAuth

Go to https://github.com/settings/developers
Click New OAuth App
Fill in:
- Application name: ContextMine (or your preferred name)
- Homepage URL: http://localhost:8000
- Authorization callback URL: http://localhost:8000/api/auth/callback
Copy the Client ID and Client Secret to your .env

Note: Both the admin UI and MCP clients use the same callback URL. The server automatically routes OAuth flows to the appropriate handler.

Generating Secure Keys

# Generate session secret
python -c "import secrets; print(secrets.token_urlsafe(32))"

# Generate encryption key
python -c "import secrets; print(secrets.token_urlsafe(32))"

Adding Sources

Web Documentation

Best for: API docs, guides, reference documentation

Create a collection in the admin UI
Add a source with type Web
Enter the base URL (e.g., https://docs.example.com/)
The crawler follows links within the same domain
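
The same-domain rule can be sketched as a link filter. This illustrates the idea only and is not the crawler's code: it assumes "same domain" means an exact hostname match, and the function name `same_domain_links` is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against base_url and keep only http(s) links on the same host."""
    base_host = urlparse(base_url).hostname
    kept = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname == base_host:
            kept.append(absolute)
    return kept


links = same_domain_links(
    "https://docs.example.com/",
    [
        "guide/intro.html",             # relative link, kept
        "/api/",                        # root-relative link, kept
        "https://other.example.org/x",  # different host, dropped
        "mailto:team@example.com",      # non-http scheme, dropped
    ],
)
print(links)
```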

GitHub Repositories

Best for: Source code, README files, inline documentation

Add a source with type GitHub
Enter the repository as owner/repo
Optionally specify:
- Branch: defaults to the default branch
- Path filter: limit to specific directories (e.g., src/, docs/)
Code files are parsed for symbols (functions, classes, methods)

Supported languages for symbol extraction: Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby, PHP

Supported languages for strict real metrics (Twin/City): Python, TypeScript, JavaScript, Java, PHP

Strict metrics gate behavior for GitHub sources:

Sync computes structural metrics (loc, complexity, coupling) without blocking on coverage.
Coverage is ingested asynchronously from CI and bound to exact commit SHA.
Coverage ingest is strict: invalid token/payload/SHA mismatch/path mismatch fails the ingest job.
City metrics become fully ready only after successful coverage ingest.

Architecture

┌───────────────────────────────┐     ┌─────────────┐
│     FastAPI + React SPA       │────▶│  PostgreSQL │
│  /api/* /mcp/* /* (frontend)  │     │    pg4ai    │
└───────────────────────────────┘     └─────────────┘
               │
        ┌──────┴──────┐
        ▼             ▼
  ┌─────────┐   ┌─────────┐
  │ Prefect │   │ spider  │
  │ Worker  │   │   _md   │
  └─────────┘   └─────────┘

API (apps/api): FastAPI serving REST API at /api/*, MCP at /mcp/*, and React frontend at /*
Web (apps/web): React admin console (built and served by API)
Worker (apps/worker): Background sync jobs using Prefect
Core (packages/core): Shared models, database, and utilities

Development

Prerequisites

Python 3.12+
Node.js 20+
uv for Python dependency management
Docker (for pg4ai: PostgreSQL + pgvector + Apache AGE)

Local Development Setup

# Start database
docker compose up -d postgres

# Optional: verify vector + graph capabilities in postgres
./scripts/docker/smoke-pg4ai.sh

# Install Python dependencies
uv sync --all-packages

# Run migrations
cd packages/core
DATABASE_URL=postgresql+asyncpg://contextmine:contextmine@localhost:5432/contextmine \
  uv run alembic upgrade head
cd ../..

# Build frontend (one-time, or after frontend changes)
cd apps/web && npm install && npm run build && cd ../..

# Start API server (serves both API and frontend)
STATIC_DIR=apps/web/dist uv run uvicorn apps.api.app.main:app --reload --port 8000

For frontend development with hot reload, run the Vite dev server separately:

# Terminal 1: API server
uv run uvicorn apps.api.app.main:app --reload --port 8000

# Terminal 2: Frontend dev server (proxies API requests to :8000)
cd apps/web && npm run dev

Running Tests

# All tests
uv run pytest -v

# Specific test file
uv run pytest packages/core/tests/test_treesitter.py -v

# With coverage
uv run pytest --cov=contextmine_core --cov-report=term-missing

Code Quality

# Linting
uv run ruff check .

# Type checking
uvx ty check

# Auto-format
uv run ruff format .

# Pre-commit hooks
uv run pre-commit install
uv run pre-commit run --all-files

Container Images

Pre-built images are available from GitHub Container Registry:

docker pull ghcr.io/mayflower/contextmine-api:latest
docker pull ghcr.io/mayflower/contextmine-worker:latest
docker pull ghcr.io/mayflower/contextmine-web:latest

Troubleshooting

"No collections found" in MCP client

Ensure you've created at least one collection in the admin UI
Check that the collection visibility is set to Global, or that you are authenticated
Verify you've completed the GitHub OAuth flow when prompted by your MCP client

Sync not finding documents

Check the Prefect UI at http://localhost:4200 for job status
For GitHub sources, ensure the repository is accessible
For web sources, verify the URL is reachable and returns HTML

Symbols not being extracted

Symbol extraction works for supported languages only. Check that:

The file has a recognized extension (.py, .ts, .js, .go, etc.)
The sync has completed (symbols are extracted during sync)
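
To check quickly whether a file is likely to be picked up, you can map its extension to one of the supported languages. In this sketch only `.py`, `.ts`, `.js`, and `.go` come from the list above; the remaining extensions are conventional guesses, and the mapping ContextMine actually uses may include more (for example `.tsx` or `.h`).

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed extension-to-language mapping for the supported languages.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".js": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".c": "C",
    ".cpp": "C++",
    ".rb": "Ruby",
    ".php": "PHP",
}


def symbol_language(path: str) -> Optional[str]:
    """Return the assumed language for a repository path, or None if unrecognized."""
    return EXTENSION_LANGUAGES.get(PurePosixPath(path).suffix.lower())


for candidate in ["src/auth/handlers.py", "README.md", "cmd/server/main.go"]:
    print(candidate, "->", symbol_language(candidate))
```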

Cockpit Overview shows `N/A` metrics

Inspect GET /api/twin/collections/{collection_id}/views/city.
Check metrics_status.reason:
- awaiting_ci_coverage: CI has not pushed coverage yet.
- coverage_ingest_failed: review ingest job diagnostics.
- no_real_metrics: no structural metric snapshots were produced.
Verify latest ingest job:
- GET /api/sources/{source_id}/metrics/coverage-ingest/{job_id}
Re-run the CI upload with a matching commit_sha=${{ github.sha }} and valid reports.

License

MIT License - see LICENSE for details.
