Your AI forgets everything the moment a conversation ends. Ask it tomorrow what you told it today: blank stare.

Cortex fixes this. It's a memory service that runs alongside any AI agent, silently learning who you are, what you care about, and how you work. It remembers your name, your preferences, your decisions, your projects, and recalls exactly the right context when you need it.
Think of it as upgrading your AI from a goldfish to a real assistant.
```
"My name is Alex, I'm a backend dev, I prefer Rust over Go."

  → Cortex extracts & stores
    [identity]   Alex, backend developer
    [preference] Prefers Rust over Go

... 3 weeks later, new conversation ...

"What language should I use for this new service?"

  → Cortex recalls
    "You've mentioned preferring Rust over Go for backend work."
```
```
WRITE PATH (every turn)

  Conversation ──▶ Fast Channel (regex, 0ms)
                 + Deep Channel (LLM, 2-5s)
                        │
                Extracted memories
                        │
       ┌─ 4-tier dedup ──────────────────┐
       │ exact dup        → skip         │
       │ near-exact       → auto-replace │
       │ semantic overlap → LLM judge    │
       │ new info         → insert       │
       └─────────────────────────────────┘
                        │
        Working (48h) or Core (permanent)

READ PATH (every turn)

  User message ──▶ Query Expansion (optional)
                        │
          BM25 + Vector → RRF Fusion
                        │
            LLM Reranker (optional)
                        │
      Priority inject ──▶ AI context
      (constraints & persona first)

LIFECYCLE (daily)

  Working ─promote▶ Core ─decay▶ Archive ─compress▶ back to Core
                                                    (nothing lost)
```
- Three-layer memory: Working (48h) → Core (permanent) → Archive (compressed back to Core)
- Dual-channel extraction: fast regex + deep LLM, with batched smart dedup
- 20 memory categories: identity, preferences, constraints, agent persona, and more
- Hybrid search: BM25 + vector with Reciprocal Rank Fusion
- Query expansion: LLM-generated search variants for better recall
- LLM reranker: re-scores results for improved relevance
- Entity relations: auto-extracted knowledge graph
- Extraction feedback: rate memories good/bad/corrected, track quality
- Multi-provider: OpenAI, Anthropic, Google Gemini, DeepSeek, OpenRouter, Ollama
- Multi-agent: per-agent config, isolated memory namespaces
- Dashboard: full management UI with search debug, lifecycle preview, extraction logs
- ~$0.55/month: with gpt-4o-mini + text-embedding-3-small at 50 conversations/day
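Hybrid search combines the BM25 and vector rankings with Reciprocal Rank Fusion. A minimal sketch of standard RRF for illustration (the constant `k = 60` is the conventional default from the RRF literature; Cortex's internal constant and weighting are not shown here):

```typescript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
// per document, so items ranked well in BOTH lists rise to the top.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Example: "m2" is near the top of both lists, so it outranks "m1",
// which is first in one list but only third in the other.
const fused = rrfFuse([
  ["m1", "m2", "m3"], // BM25 ranking
  ["m2", "m4", "m1"], // vector ranking
]);
const top = [...fused.entries()].sort((a, b) => b[1] - a[1])[0][0];
```

Because RRF works on ranks rather than raw scores, the two retrieval channels don't need their scores to be on the same scale.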
```sh
# Clone and start (Docker)
git clone https://github.com/rikouu/cortex.git
cd cortex
docker compose up -d
```

Open http://localhost:21100 → Dashboard → Settings, choose your LLM/Embedding provider, and enter your API key.

That's it. No .env files, no environment variables.
Or run from source (without Docker):

```sh
git clone https://github.com/rikouu/cortex.git
cd cortex && pnpm install
pnpm dev  # http://localhost:21100
```

OpenClaw is an open-source AI agent framework with built-in tool use, memory, and multi-channel support. Cortex has a dedicated bridge plugin for seamless integration.
```sh
# 1. Install the bridge plugin
openclaw plugins install @cortexmem/cortex-bridge

# 2. Set the Cortex URL (pick one)
echo 'CORTEX_URL=http://localhost:21100' >> .env
# or: openclaw env set CORTEX_URL http://localhost:21100
```

Done. Your agent now automatically recalls memories before every response and saves important facts after each conversation turn.
The bridge hooks into OpenClaw's lifecycle:
| Hook | When | What |
|---|---|---|
| `onBeforeResponse` | Before AI responds | Recalls & injects relevant memories |
| `onAfterResponse` | After AI responds | Extracts & saves key info |
| `onBeforeCompaction` | Before context compression | Emergency saves before info is lost |
Plus `cortex_recall` and `cortex_remember` tools for on-demand use.
See the full guide: OpenClaw Quick Start.
Open Settings → Developer → Edit Config, paste, and restart:

```json
{
  "mcpServers": {
    "cortex": {
      "command": "npx",
      "args": ["cortex-mcp", "--server-url", "http://localhost:21100"]
    }
  }
}
```

Cursor
Settings → MCP → + Add new global MCP server:

```json
{
  "mcpServers": {
    "cortex": {
      "command": "npx",
      "args": ["cortex-mcp"],
      "env": { "CORTEX_URL": "http://localhost:21100" }
    }
  }
}
```

Claude Code
```sh
claude mcp add cortex -- npx cortex-mcp --server-url http://localhost:21100
```

Windsurf / Cline / Other
Add to your client's MCP config:
```json
{
  "mcpServers": {
    "cortex": {
      "command": "npx",
      "args": ["cortex-mcp", "--server-url", "http://localhost:21100"],
      "env": { "CORTEX_AGENT_ID": "default" }
    }
  }
}
```

```sh
# Store a memory
curl -X POST http://localhost:21100/api/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{"user_message":"I love sushi","assistant_message":"Got it!","agent_id":"default"}'

# Recall memories
curl -X POST http://localhost:21100/api/v1/recall \
  -H "Content-Type: application/json" \
  -d '{"query":"What food do I like?","agent_id":"default"}'
```

Tell your AI something memorable (e.g., "My favorite color is blue"). Start a new conversation and ask "What's my favorite color?" If it answers correctly, Cortex is working.
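The same ingest call from TypeScript. This sketch only constructs the request; the URL, path, and payload fields mirror the curl examples above, and error handling is omitted:

```typescript
// Builds the ingest request shown in the curl example above.
// Assumes a Cortex server at the default port; adjust CORTEX_URL as needed.
const CORTEX_URL = "http://localhost:21100";

interface IngestPayload {
  user_message: string;
  assistant_message: string;
  agent_id: string;
}

function buildIngestRequest(p: IngestPayload): {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
} {
  return {
    url: `${CORTEX_URL}/api/v1/ingest`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(p),
    },
  };
}

// Usage (requires a running Cortex server):
// const { url, init } = buildIngestRequest({
//   user_message: "I love sushi", assistant_message: "Got it!", agent_id: "default",
// });
// await fetch(url, init);
```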
A complete beginner-friendly guide for adding persistent memory to your OpenClaw agent.
After following these steps, your OpenClaw agent will:
- Automatically recall relevant memories before every response
- Automatically save important facts from conversations
- Emergency save key info before context compression
- Have `cortex_recall` and `cortex_remember` tools available for on-demand use
If you haven't already, get Cortex running first:
```sh
# Option A: From source
git clone https://github.com/rikouu/cortex.git
cd cortex && pnpm install
cp .env.example .env  # add your OPENAI_API_KEY
pnpm dev

# Option B: Docker (one line)
OPENAI_API_KEY=sk-xxx docker compose up -d
```

Verify it's running:

```sh
curl http://localhost:21100/api/v1/health
# Should return: {"status":"ok", ...}
```

Then install the bridge plugin:

```sh
openclaw plugins install @cortexmem/cortex-bridge
```

That's it. No config files, no manual setup.
Pick one of the two methods:
Method A: .env file (recommended)

Add this line to your project's .env file:

```sh
CORTEX_URL=http://localhost:21100
```

Method B: shell profile

```sh
echo 'export CORTEX_URL=http://localhost:21100' >> ~/.zshrc
source ~/.zshrc
```

To verify the setup:

1. Start a conversation with your agent and say something memorable:

   "My favorite programming language is Rust and I work at Acme Corp."

2. Start a new conversation and ask:

   "What do you know about me?"

3. If the agent mentions Rust and Acme Corp, everything is working!
You can also type /cortex-status in OpenClaw to check the connection.
The plugin uses OpenClaw's `register(api)` interface to automatically set up:

| Hook | When | What it does |
|---|---|---|
| `onBeforeResponse` | Before AI responds | Recalls relevant memories and injects them as context |
| `onAfterResponse` | After AI responds | Extracts and saves important information (fire-and-forget) |
| `onBeforeCompaction` | Before context compression | Emergency saves key info before it's lost |

Two tools are also registered:

| Tool | What it does |
|---|---|
| `cortex_recall` | Agent can search memories on demand |
| `cortex_remember` | Agent can store important facts explicitly |
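The hook and tool wiring above can be pictured with a toy `register(api)` implementation. Everything here is illustrative: the hook and tool names come from this guide, but the `api` object shape and the injected `client` are assumptions, not the actual OpenClaw plugin API:

```typescript
// Illustrative only: the api surface below is assumed, not OpenClaw's real API.
type HookName = "onBeforeResponse" | "onAfterResponse" | "onBeforeCompaction";
type Hook = (ctx: { messages: string[] }) => void;

interface PluginApi {
  on(hook: HookName, fn: Hook): void;
  registerTool(name: string, fn: (input: string) => string): void;
}

// recall/remember are injected so the sketch stays testable without a server.
function register(
  api: PluginApi,
  client: { recall(q: string): string; remember(fact: string): void },
) {
  api.on("onBeforeResponse", (ctx) => {
    // Inject relevant memories as context before the model answers.
    ctx.messages.unshift(client.recall(ctx.messages[ctx.messages.length - 1] ?? ""));
  });
  api.on("onAfterResponse", (ctx) => {
    // Fire-and-forget save of the latest turn.
    client.remember(ctx.messages.join("\n"));
  });
  api.on("onBeforeCompaction", (ctx) => {
    // Emergency save before context compression drops information.
    client.remember(ctx.messages.join("\n"));
  });
  api.registerTool("cortex_recall", (q) => client.recall(q));
  api.registerTool("cortex_remember", (fact) => {
    client.remember(fact);
    return "saved";
  });
}
```

The real bridge plugin additionally handles timeouts, auth, and the Cortex HTTP calls; the point here is only the lifecycle shape: recall before, save after, flush before compaction.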
For a persistent setup (server + OpenClaw agent always running):

```sh
# 1. Run Cortex with Docker (auto-restarts, data persisted)
OPENAI_API_KEY=sk-xxx docker compose up -d

# 2. Optional: set an auth token for security
echo 'CORTEX_AUTH_TOKEN=your-secret-token' >> .env
docker compose up -d  # restart to apply

# 3. In your OpenClaw project, set the URL
echo 'CORTEX_URL=http://your-server-ip:21100' >> .env
```

Tip: If running Cortex and OpenClaw on the same machine, use http://localhost:21100. If on different machines, replace it with your server's IP or domain.
| Problem | Solution |
|---|---|
| Agent doesn't recall memories | Check that `curl http://localhost:21100/api/v1/health` returns OK |
| Plugin not loading | Run `openclaw plugins list` to verify `@cortexmem/cortex-bridge` is installed |
| Memories not saving after responses | Known upstream issue in streaming mode; see Known Issues |
| Connection refused | Make sure `CORTEX_URL` is set and Cortex is running |
```
┌─ Client Layer ──────────────────────────────────────────┐
│ OpenClaw (Bridge) │ Claude Desktop (MCP) │ Any (REST)   │
└───────────┬─────────────────┬─────────────────┬─────────┘
            ▼                 ▼                 ▼
┌─ Cortex Server (:21100) ────────────────────────────────┐
│ REST API │ MCP Server │ Dashboard                       │
│ Memory Gate (recall) │ Memory Sieve (ingest)            │
│ Memory Flush │ Lifecycle Engine                         │
│ SQLite + FTS5 │ Vector Backend │ Markdown Exporter      │
└─────────────────────────────────────────────────────────┘
```
The complete recall flow when your AI receives a message:
```
User message
     │
     ▼
Clean query (strip system tags, metadata)
     │
     ▼
Small-talk detection ──yes──▶ Skip (no search)
     │ no
     ▼
┌─ Query Expansion (1 LLM call) ───────────────┐
│ "how was server deployed"                    │
│   → variant 1: "server deployment steps"     │
│   → variant 2: "backend setup and config"    │
└──────────────────────────────────────────────┘
     │
     ▼  Each variant searched independently (no LLM)
┌──────────┐   ┌──────────────┐
│ BM25 FTS │   │ Vector embed │
│ keywords │   │ semantics    │
└────┬─────┘   └──────┬───────┘
     └── RRF Fusion ──┘
  layer weight × recency × access freq = finalScore
     │
     ▼
┌─ Merge & Deduplicate ────────────────────────┐
│ Same memory from multiple variants:          │
│   → keep highest finalScore as base          │
│   → multi-hit boost: +8% × ln(hits)          │
│     2 hits +5.5% / 3 hits +8.8%              │
│ Result: union of all variants (~30+ items)   │
└──────────────────────────────────────────────┘
     │
     ▼
┌─ LLM Reranker (1 LLM call) ──────────────────┐
│ All merged results → LLM scores 0-1          │
│ Final = rerankerScore × w                    │
│       + originalScore × (1-w)                │
│ w = 0.5 default, adjustable in Dashboard     │
│ Output: top 15 results                       │
└──────────────────────────────────────────────┘
     │
     ▼
Priority inject: constraint/persona first
  → fill remaining budget → inject into AI context

Total: 2 LLM calls, ~5-7s latency
```
Query Expansion (optional): The LLM generates 2-3 variant queries using synonyms and rephrasings. Each variant is searched separately, expanding the candidate pool. Memories hit by multiple variants receive a logarithmic boost (diminishing returns). Enable in Dashboard → Gate → Query Expansion.
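The multi-hit boost is logarithmic, so extra hits help with diminishing returns. A one-function sketch reproducing the numbers from the diagram (+8% × ln(hits); the function name is illustrative):

```typescript
// Memories retrieved by several query variants get a diminishing boost:
// boosted = base × (1 + 0.08 × ln(hits)). One hit means no boost (ln 1 = 0).
function multiHitBoost(baseScore: number, hits: number): number {
  return baseScore * (1 + 0.08 * Math.log(hits));
}

// Matches the diagram: 2 hits ≈ +5.5%, 3 hits ≈ +8.8%.
const twoHits = multiHitBoost(1, 2);
const threeHits = multiHitBoost(1, 3);
```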
LLM Reranker (optional): After merging all variant results, the LLM re-scores them for query-specific relevance. The final score fuses the reranker score with the original score using a configurable weight (default 50:50), preserving signals like layer priority, recency, and access frequency. Supports `llm` (extraction model) and `cohere` (Cohere Rerank API). Enable in Dashboard → Search → Reranker.
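The score fusion described above is a plain weighted blend; a minimal sketch (`w` defaults to 0.5, matching the 50:50 default in the Dashboard):

```typescript
// Final relevance blends the LLM reranker's 0-1 score with the original
// retrieval score, so layer priority / recency / access-frequency signals
// are diluted rather than discarded.
function fuseScores(rerankerScore: number, originalScore: number, w = 0.5): number {
  return rerankerScore * w + originalScore * (1 - w);
}

const final = fuseScores(0.9, 0.6); // 0.75 with the default 50:50 weight
```

Setting `w = 1` would trust the reranker completely; `w = 0` disables its influence.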
Priority injection: When formatting results for context injection, `constraint` and `agent_persona` memories are injected first to ensure critical rules and persona are never truncated by the token budget.
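Priority injection amounts to a stable partition before budget filling. A simplified sketch (the category names come from this README; the token accounting is a stand-in for the real budget logic):

```typescript
interface Memory {
  category: string;
  text: string;
  tokens: number;
}

// Constraints and agent persona go first so the token budget can never cut
// them; everything else fills the remaining budget in its existing
// (score-sorted) order. filter() preserves relative order, so this is a
// stable partition.
function injectWithPriority(memories: Memory[], budget: number): Memory[] {
  const priority = new Set(["constraint", "agent_persona"]);
  const ordered = [
    ...memories.filter((m) => priority.has(m.category)),
    ...memories.filter((m) => !priority.has(m.category)),
  ];
  const out: Memory[] = [];
  let used = 0;
  for (const m of ordered) {
    if (used + m.tokens > budget) continue; // skip what doesn't fit
    out.push(m);
    used += m.tokens;
  }
  return out;
}
```

Even if a constraint ranks last by relevance, it is considered for the budget first, so a tight budget drops preferences before rules.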
When connected via MCP, the AI automatically gets these tools:
| Tool | What it does |
|---|---|
| `cortex_recall` | Search memories with priority injection (constraints and persona first) |
| `cortex_remember` | Store a memory: user facts, constraints, policies, or agent self-observations |
| `cortex_forget` | Remove or correct a memory |
| `cortex_search_debug` | Debug search scoring details |
| `cortex_stats` | Get memory statistics |
| Provider | Models | Notes |
|---|---|---|
| OpenAI | gpt-4o-mini, gpt-4.1-nano/mini, gpt-4o, o3/o4-mini | Default. Best cost-performance ratio |
| Anthropic | claude-haiku-4-5, claude-sonnet-4-5, claude-opus-4-5 | Highest extraction quality |
| Google Gemini | gemini-2.5-flash/pro, gemini-2.0-flash | Free tier available on AI Studio |
| DeepSeek | deepseek-chat, deepseek-reasoner | Cheapest. OpenAI-compatible API |
| OpenRouter | 100+ models from all providers | Unified gateway |
| Ollama | qwen2.5, llama3.2, mistral, deepseek-r1, etc. | Fully local, no API key |
| Provider | Models | Notes |
|---|---|---|
| OpenAI | text-embedding-3-small/large | Default (1536d). Most reliable |
| Google Gemini | gemini-embedding-001, text-embedding-004 | Free on AI Studio |
| Voyage AI | voyage-3, voyage-3-lite, voyage-code-3 | High quality |
| Ollama | bge-m3, nomic-embed-text, mxbai-embed-large | Local, zero cost |
All providers are configurable via the Dashboard UI or cortex.json. See cortex-provider-reference.md for detailed model comparisons and pricing.
Warning: Changing embedding models
Each embedding model produces vectors of a specific dimension. If you switch to a model with different dimensions, all existing vectors become incompatible. After changing the embedding model or dimensions:
- Go to Dashboard → Settings → Data Management → Reindex Vectors
- This regenerates all vectors using the new model (requires API calls for every stored memory)
- Until reindexed, vector search (recall, dedup, smart update) will not work correctly
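Why reindexing is required: similarity math assumes equal-length vectors, so a query embedded by the new model cannot be compared with vectors stored by the old one. A minimal cosine-similarity sketch that fails fast on a dimension mismatch (illustrative, not Cortex's internal implementation):

```typescript
// Cosine similarity is undefined across dimensions: a 1536-d query vector
// cannot be compared with memories embedded at, say, 768 dimensions.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length} (reindex required)`);
  }
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Reindexing re-embeds every stored memory with the new model so all vectors share one dimension again.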
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/recall` | Search memories and get injection context |
| POST | `/api/v1/ingest` | Ingest conversation for memory extraction |
| POST | `/api/v1/flush` | Emergency flush before compaction |
| POST | `/api/v1/search` | Hybrid search with debug info |
| GET/POST/PATCH/DELETE | `/api/v1/memories` | Memory CRUD |
| GET/POST/DELETE | `/api/v1/relations` | Entity relation CRUD |
| GET/POST/PATCH/DELETE | `/api/v1/agents` | Agent management |
| GET | `/api/v1/agents/:id/config` | Agent merged configuration |
| GET | `/api/v1/extraction-logs` | Extraction quality audit logs |
| POST | `/api/v1/lifecycle/run` | Trigger lifecycle engine |
| GET | `/api/v1/lifecycle/preview` | Dry-run lifecycle preview |
| GET | `/api/v1/health` | Health check |
| GET | `/api/v1/stats` | Memory statistics |
| GET/PATCH | `/api/v1/config` | Configuration |
Cortex works out of the box with just an `OPENAI_API_KEY`. For advanced setups:
| Option | Description |
|---|---|
| LLM Provider | OpenAI, Anthropic, Google Gemini, DeepSeek, OpenRouter, Ollama |
| Embedding Provider | OpenAI, Google, Voyage AI, Ollama |
| Vector Backend | SQLite vec0 (default), Qdrant, Milvus |
| Per-Agent Config | Each agent can override global LLM/embedding settings |
| Offline Mode | Use Ollama for fully local, no-API-key setup |
See DESIGN.md for full configuration options and cortex-provider-reference.md for provider selection guide.
```
cortex/
├── packages/
│   ├── server/          # Core service (Fastify + SQLite)
│   ├── mcp-client/      # MCP stdio adapter (npm: @cortex/mcp-client)
│   ├── cortex-bridge/   # OpenClaw plugin (npm: @cortexmem/cortex-bridge)
│   └── dashboard/       # React management SPA
├── docker-compose.yml
├── DESIGN.md                      # Full technical design document
└── cortex-provider-reference.md   # LLM/Embedding provider guide
```
With default settings (gpt-4o-mini + text-embedding-3-small):
- ~$0.55/month at 50 conversations/day
- Scales linearly; even 3x usage stays under $2/month
- With DeepSeek + Google Embedding: as low as ~$0.10/month
Upstream bug: openclaw/openclaw#21863
Resolved: fixed upstream in commit 72d1d36. The `agent_end` hook now fires correctly in streaming mode. Automatic memory extraction works in all modes.
MIT