Phase 5: MCP Server
The MCP (Model Context Protocol) server exposes Mnemosyne's retrieval primitives — search, chunk fetch, and library/collection/item discovery — to LLM clients like Claude Desktop, Cursor, or any MCP-compatible agent.
This is intentionally a retrieval surface, not a RAG pipeline. The server returns ranked evidence; the calling LLM is responsible for synthesis, citation, and follow-up. If a "knowledge subagent" wrapper is ever wanted, it lives outside Mnemosyne as a thin client over these tools.
Architecture
┌──────────────────────────┐                     ┌─────────────────────┐
│ Claude Desktop / Cursor  │  Streamable HTTP    │ uvicorn :8001       │
│ (MCP client)             │ ─────────────────▶  │ mnemosyne.asgi:app  │
└──────────────────────────┘  /mcp/  /mcp/sse    └──────┬──────────────┘
                                                        │
                                                        ▼
                                                 ┌────────────────┐
                                                 │ FastMCP server │
                                                 │ + middleware   │
                                                 └──────┬─────────┘
                                                        │
                                     ┌──────────────────┼─────────────────┐
                                     ▼                  ▼                 ▼
                              ┌────────────────┐ ┌──────────────┐ ┌──────────────┐
                              │ SearchService  │ │ Neo4j Cypher │ │ S3 / MinIO   │
                              │ (Phase 3)      │ │ discovery    │ │ chunk text   │
                              └────────────────┘ └──────────────┘ └──────────────┘
The MCP server runs as a separate Uvicorn ASGI process alongside the existing Django/Gunicorn WSGI process. Both processes share the same Django settings, Postgres, Neo4j, and S3 — the MCP server is a thin protocol surface, not a duplicate stack.
Tool surface
| Tool | Purpose | Returns |
|---|---|---|
| search | Hybrid retrieval: vector + full-text + concept-graph + Synesis re-ranking | Ranked candidates with chunk_uid, text_preview, score, source |
| get_chunk | Fetch the full text of a chunk by chunk_uid (preview is only ~500 chars) | Full chunk text + parent item context |
| list_libraries | Discover libraries and their library_type | uid, name, library_type, description |
| list_collections | Discover collections, optional library_uid filter | uid, name, description, parent library |
| list_items | Discover indexed documents, optional collection / library filter | uid, title, item_type, chunk_count, embedding_status |
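Because search returns only a preview, a client will typically follow up with get_chunk for the hits it wants to read in full. The sketch below illustrates that two-step flow. `call_tool` is a hypothetical stand-in for whatever MCP client call you use, and the response shapes (a list of dicts from search, a dict with a `text` key from get_chunk) are assumptions made for illustration, not the server's confirmed schema.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def retrieve_full_text(call_tool: ToolCaller, query: str, top_n: int = 3) -> list[dict[str, Any]]:
    """Search, then expand the best hits into full chunk text."""
    hits = call_tool("search", {"query": query, "limit": 20})
    # Assumption: the server returns hits ranked best-first.
    expanded = []
    for hit in hits[:top_n]:
        chunk = call_tool("get_chunk", {"chunk_uid": hit["chunk_uid"]})
        expanded.append({
            "chunk_uid": hit["chunk_uid"],
            "score": hit["score"],
            "text": chunk["text"],  # assumed field name for the full text
        })
    return expanded


def fake_call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """In-memory stand-in so the sketch runs without a server."""
    store = {"c1": "full text one", "c2": "full text two"}
    if name == "search":
        return [
            {"chunk_uid": "c1", "text_preview": "full text...", "score": 0.9, "source": "doc-1"},
            {"chunk_uid": "c2", "text_preview": "full text...", "score": 0.7, "source": "doc-2"},
        ]
    if name == "get_chunk":
        return {"chunk_uid": arguments["chunk_uid"], "text": store[arguments["chunk_uid"]]}
    raise ValueError(f"unknown tool: {name}")


results = retrieve_full_text(fake_call_tool, "memory palaces", top_n=2)
for item in results:
    print(item["chunk_uid"], item["score"], item["text"])
```

Keeping the expansion step in the client matches the retrieval-only design: the calling LLM decides which hits are worth reading in full.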
search accepts these named arguments:
- query (required)
- library_uid, library_type, collection_uid: scoping filters (all optional, AND-combined)
- limit: default 20
- rerank: default True (Synesis cross-attention re-ranking when configured)
- include_images: default True
- search_types: default ["vector", "fulltext", "graph"]
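To make the defaults concrete, here is a minimal sketch of a helper that assembles the search arguments and wraps them in a JSON-RPC `tools/call` request, the message MCP uses to invoke a tool. The helper name `build_search_request`, its validation rules, and the example `library_type` value are illustrative assumptions; the argument names and defaults are the ones listed above.

```python
import json
from typing import Any, Optional

ALLOWED_SEARCH_TYPES = {"vector", "fulltext", "graph"}


def build_search_request(
    query: str,
    *,
    library_uid: Optional[str] = None,
    library_type: Optional[str] = None,
    collection_uid: Optional[str] = None,
    limit: int = 20,
    rerank: bool = True,
    include_images: bool = True,
    search_types: Optional[list[str]] = None,
    request_id: int = 1,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call message for the search tool."""
    if not query.strip():
        raise ValueError("query is required")
    types = search_types if search_types is not None else ["vector", "fulltext", "graph"]
    unknown = set(types) - ALLOWED_SEARCH_TYPES
    if unknown:
        raise ValueError(f"unknown search types: {sorted(unknown)}")

    arguments: dict[str, Any] = {
        "query": query,
        "limit": limit,
        "rerank": rerank,
        "include_images": include_images,
        "search_types": types,
    }
    # Scoping filters are optional; send only the ones that were set.
    for key, value in (
        ("library_uid", library_uid),
        ("library_type", library_type),
        ("collection_uid", collection_uid),
    ):
        if value is not None:
            arguments[key] = value

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


# "nonfiction" is a made-up library_type used only for this example.
request = build_search_request("stoic ethics", library_type="nonfiction", limit=5)
print(json.dumps(request, indent=2))
```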
Concept-graph traversal tools (list_concepts, get_concept_neighbors) are intentionally deferred — ship the search + discovery surface first, observe how clients use it, then expand.
Authentication
Tool calls require a Bearer token (MCPToken). Listing tools is unauthenticated so clients can discover the surface. Tokens are managed via Django admin or the management command:
python manage.py create_mcp_token --user r@helu.ca --name "Claude Desktop"
Optional flags:
- --tools search,get_chunk: restrict the token to a whitelist
- --expires-days 30: set an expiry
The token is printed once — there's no way to retrieve it later. Revoke or set expiry in the Django admin under MCP Server → MCP tokens.
For local development you can set MCP_REQUIRE_AUTH=False in your environment to skip auth entirely. Never disable auth in production.
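The checks described in this section (a bearer token, an optional tool whitelist, an optional expiry) can be sketched as a small resolver. This is an illustration, not the code in mcp_server/auth.py: the `TokenRecord` dataclass, the `authorize` function, the reason strings, and the choice to store only a SHA-256 digest of the token are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TokenRecord:
    """Hypothetical stand-in for a stored token row."""
    digest: str                                 # SHA-256 of the raw token (assumed scheme)
    allowed_tools: Optional[frozenset] = None   # None means every tool is allowed
    expires_at: Optional[datetime] = None       # None means the token never expires


def digest_of(raw_token: str) -> str:
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def authorize(header: Optional[str], tool: str, records: dict, now: datetime) -> Optional[str]:
    """Return None when the call is allowed, otherwise a short failure reason."""
    if not header or not header.startswith("Bearer "):
        return "missing_token"
    record = records.get(digest_of(header[len("Bearer "):].strip()))
    if record is None:
        return "invalid_token"
    if record.expires_at is not None and now >= record.expires_at:
        return "expired"
    if record.allowed_tools is not None and tool not in record.allowed_tools:
        return "tool_not_allowed"
    return None


now = datetime.now(timezone.utc)
record = TokenRecord(
    digest=digest_of("secret-token"),
    allowed_tools=frozenset({"search", "get_chunk"}),
    expires_at=now + timedelta(days=30),
)
records = {record.digest: record}

print(authorize("Bearer secret-token", "search", records, now))      # whitelisted tool
print(authorize("Bearer secret-token", "list_items", records, now))  # not on the whitelist
```

Storing a digest instead of the raw token is one way to get the "printed once, never retrievable" behavior, since the server can verify a presented token without being able to display it again.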
Running the server
# Development
uvicorn mnemosyne.asgi:app --host 127.0.0.1 --port 8001 --workers 1
# Health check
curl http://localhost:8001/mcp/health
# {"status":"ok"}
Single worker required. SSE transport keeps session state in worker memory; multi-worker deployments would route POSTs to the wrong worker.
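The failure mode is easy to reproduce in miniature. The toy model below gives each worker its own in-memory session table, as separate processes would have, and shows a POST landing on a worker that never saw the session. The `Worker` class and its methods are hypothetical and do not correspond to FastMCP internals.

```python
import uuid


class Worker:
    """Toy worker process: session state lives only in this object's memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: dict[str, list[str]] = {}

    def open_sse(self) -> str:
        """Client opens the SSE stream; this worker remembers the session."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = []
        return session_id

    def handle_post(self, session_id: str, message: str) -> bool:
        """Client POSTs a message that belongs to an existing session."""
        if session_id not in self.sessions:
            return False  # this worker has never heard of the session
        self.sessions[session_id].append(message)
        return True


worker_a, worker_b = Worker("a"), Worker("b")

# The SSE stream is opened on worker A.
session_id = worker_a.open_sse()

# Without session affinity, the follow-up POST may be routed to worker B.
delivered_to_b = worker_b.handle_post(session_id, "tools/call search")
delivered_to_a = worker_a.handle_post(session_id, "tools/call search")
print("worker a accepted:", delivered_to_a, "| worker b accepted:", delivered_to_b)
```

With `--workers 1` there is only one session table, so every POST finds its session.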
In production, run alongside the WSGI Django process and route via a reverse proxy:
location /mcp/ {
    proxy_pass http://127.0.0.1:8001;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_buffering off;      # required for SSE
    proxy_cache off;          # required for SSE
    proxy_read_timeout 300s;
}
Client configuration
Claude Desktop (claude_desktop_config.json):
{
  "mcpServers": {
    "mnemosyne": {
      "url": "http://localhost:8001/mcp/",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
For SSE transport, change the URL to http://localhost:8001/mcp/sse/.
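If you generate client configs for several machines, a small helper can produce either variant. This is only a convenience sketch: the function name `client_config` and the `transport` parameter are made up here, and the output simply mirrors the JSON structure and URLs shown in this section.

```python
import json


def client_config(token: str, transport: str = "http", base: str = "http://localhost:8001") -> dict:
    """Return an mcpServers entry for the chosen transport."""
    paths = {"http": "/mcp/", "sse": "/mcp/sse/"}
    if transport not in paths:
        raise ValueError(f"transport must be one of {sorted(paths)}")
    return {
        "mcpServers": {
            "mnemosyne": {
                "url": base + paths[transport],
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


config = client_config("YOUR_TOKEN_HERE", transport="sse")
print(json.dumps(config, indent=2))
```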
Observability
Prometheus metrics are exported on the WSGI Django side (/metrics):
| Metric | Labels | Purpose |
|---|---|---|
| mcp_tool_invocations_total | tool, status | Per-tool call counter |
| mcp_tool_duration_seconds | tool | Per-tool duration histogram |
| mcp_auth_failures_total | reason | Auth-rejection counter (missing token, expired, tool not allowed) |
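The per-tool counter and duration metric can be pictured as a wrapper around each tool function. The sketch below uses plain dictionaries instead of a Prometheus client so that it runs on its own. The `instrument` decorator and the `ok` / `error` status values are assumptions, and the tool is written as a synchronous function for brevity; the real tools may well be async.

```python
import functools
import time
from collections import defaultdict

# Stand-ins for mcp_tool_invocations_total and mcp_tool_duration_seconds.
invocations: dict = defaultdict(int)   # keyed by (tool, status)
durations: dict = defaultdict(list)    # keyed by tool, values in seconds


def instrument(tool_name: str):
    """Count calls by outcome and record how long each call took."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, after status is final.
                durations[tool_name].append(time.perf_counter() - start)
                invocations[(tool_name, status)] += 1
        return wrapper
    return decorator


@instrument("get_chunk")
def get_chunk(chunk_uid: str) -> str:
    if not chunk_uid:
        raise ValueError("chunk_uid is required")
    return f"text of {chunk_uid}"


get_chunk("c1")
try:
    get_chunk("")
except ValueError:
    pass

print(dict(invocations))
```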
Files
| Path | Purpose |
|---|---|
| mcp_server/models.py | MCPToken Django ORM model |
| mcp_server/auth.py | resolve_mcp_user, MCPAuthMiddleware |
| mcp_server/server.py | FastMCP instance + tool registration |
| mcp_server/tools/search.py | search, get_chunk |
| mcp_server/tools/discovery.py | list_libraries, list_collections, list_items |
| mcp_server/management/commands/create_mcp_token.py | Token bootstrap command |
| mnemosyne/asgi.py | Mounts FastMCP at /mcp and /mcp/sse |
| docs/Pattern_Django-MCP_V1-00.md | Underlying integration pattern (FastMCP + Django ASGI + bearer auth) |
Testing
TEST_NEO4J_ENABLED=0 python manage.py test mcp_server \
--testrunner=test_db_manager.django_integration.PostgreSQLTestRunner
The mcp_server test suite covers token model, auth resolution, tool registration, and the management command. It does not require Neo4j (set TEST_NEO4J_ENABLED=0) — only Postgres via the Docker-backed test runner.