
Robert Helewka 2df22941d2 feat: replace server-side RAG with MCP retrieval primitives

- Remove Phase 4 RAG pipeline in favor of retrieval-only architecture
- Add FastMCP server exposing search, get_chunk, list_libraries tools
- Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config
- Update README to clarify Mnemosyne is a retrieval engine, not RAG
- Let calling LLMs drive synthesis and iterative retrieval themselves

2026-04-26 15:34:26 -04:00


Phase 5: MCP Server

The MCP (Model Context Protocol) server exposes Mnemosyne's retrieval primitives — search, chunk fetch, and library/collection/item discovery — to LLM clients like Claude Desktop, Cursor, or any MCP-compatible agent.

This is intentionally a retrieval surface, not a RAG pipeline. The server returns ranked evidence; the calling LLM is responsible for synthesis, citation, and follow-up. If a "knowledge subagent" wrapper is ever wanted, it lives outside Mnemosyne as a thin client over these tools.

Architecture

┌──────────────────────────┐                    ┌─────────────────────┐
│ Claude Desktop / Cursor  │  Streamable HTTP   │  uvicorn :8001      │
│ (MCP client)             │ ─────────────────▶ │  mnemosyne.asgi:app │
└──────────────────────────┘   /mcp/  /mcp/sse  └──────┬──────────────┘
                                                       │
                                                       ▼
                                              ┌────────────────┐
                                              │ FastMCP server │
                                              │ + middleware   │
                                              └──────┬─────────┘
                                                     │
                                  ┌──────────────────┼─────────────────┐
                                  ▼                  ▼                 ▼
                         ┌────────────────┐  ┌──────────────┐  ┌──────────────┐
                         │ SearchService  │  │ Neo4j Cypher │  │ S3 / MinIO   │
                         │ (Phase 3)      │  │ discovery    │  │ chunk text   │
                         └────────────────┘  └──────────────┘  └──────────────┘

The MCP server runs as a separate Uvicorn ASGI process alongside the existing Django/Gunicorn WSGI process. Both processes share the same Django settings, Postgres, Neo4j, and S3 — the MCP server is a thin protocol surface, not a duplicate stack.
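As a rough illustration of what a "thin protocol surface" mounted under a path prefix can look like, here is a pure-ASGI sketch. It is not the project's `mnemosyne/asgi.py` (which, per the commit notes, mounts FastMCP via Starlette). `mcp_app`, `not_found_app`, `make_dispatcher`, and `call` are hypothetical names, and the 404 fallback is an assumption.

```python
import asyncio


async def mcp_app(scope, receive, send):
    # Hypothetical stand-in for the FastMCP ASGI application.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"mcp"})


async def not_found_app(scope, receive, send):
    # Assumption: anything outside the MCP prefix is rejected by this process.
    await send({"type": "http.response.start", "status": 404, "headers": []})
    await send({"type": "http.response.body", "body": b"not found"})


def make_dispatcher(prefix, mounted, fallback):
    """Return an ASGI app that forwards requests under `prefix` to `mounted`."""

    async def app(scope, receive, send):
        path = scope.get("path", "")
        under_prefix = path == prefix or path.startswith(prefix + "/")
        if scope["type"] == "http" and under_prefix:
            await mounted(scope, receive, send)
        else:
            await fallback(scope, receive, send)

    return app


app = make_dispatcher("/mcp", mcp_app, not_found_app)


def call(path):
    """Drive the ASGI app once with a fake request; return (status, body)."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    scope = {"type": "http", "method": "GET", "path": path}
    asyncio.run(app(scope, receive, send))
    return messages[0]["status"], messages[1]["body"]
```

In this sketch, `call("/mcp/sse")` reaches the mounted app while `call("/admin/")` falls through to the 404 handler, which mirrors the idea that the Uvicorn process only serves the MCP surface.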

Tool surface

| Tool | Purpose | Returns |
| --- | --- | --- |
| `search` | Hybrid retrieval: vector + full-text + concept-graph + Synesis re-ranking | Ranked candidates with `chunk_uid`, `text_preview`, score, source |
| `get_chunk` | Fetch the full text of a chunk by `chunk_uid` (preview is only ~500 chars) | Full chunk text + parent item context |
| `list_libraries` | Discover libraries and their `library_type` | uid, name, library_type, description |
| `list_collections` | Discover collections, optional `library_uid` filter | uid, name, description, parent library |
| `list_items` | Discover indexed documents, optional collection / library filter | uid, title, item_type, chunk_count, embedding_status |
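The table implies a two-step pattern: `search` returns short previews, and a client calls `get_chunk` when it needs the full text. The sketch below shows that loop against in-memory stand-ins. The corpus, the substring scoring, `gather_evidence`, and the exact `PREVIEW_CHARS` value are all hypothetical; only the field names `chunk_uid`, `text_preview`, and `score` come from the table.

```python
PREVIEW_CHARS = 500  # assumption: the table only says previews are "~500 chars"

# Hypothetical in-memory corpus standing in for the real chunk store.
CHUNKS = {
    "chunk-a": "Mnemosyne returns ranked evidence. " * 40,
    "chunk-b": "Short chunk about tokens.",
}


def search(query, limit=20):
    """Stand-in for the `search` tool: naive substring scoring, previews only."""
    hits = []
    for uid, text in CHUNKS.items():
        score = text.lower().count(query.lower())
        if score:
            hits.append(
                {"chunk_uid": uid, "text_preview": text[:PREVIEW_CHARS], "score": score}
            )
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:limit]


def get_chunk(chunk_uid):
    """Stand-in for the `get_chunk` tool: return the full chunk text."""
    return {"chunk_uid": chunk_uid, "text": CHUNKS[chunk_uid]}


def gather_evidence(query, top_k=3):
    """Search, then fetch full text for hits whose preview may be truncated."""
    evidence = []
    for hit in search(query)[:top_k]:
        text = hit["text_preview"]
        if len(text) >= PREVIEW_CHARS:
            # The preview hit the cap, so the chunk is probably longer.
            text = get_chunk(hit["chunk_uid"])["text"]
        evidence.append((hit["chunk_uid"], text))
    return evidence
```

A calling LLM would typically run this loop itself, deciding per hit whether the preview is enough or the full chunk is worth fetching.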

search accepts these named arguments:

query (required)
library_uid, library_type, collection_uid — scoping filters (all optional, AND-combined)
limit — default 20
rerank — default True (Synesis cross-attention re-ranking when configured)
include_images — default True
search_types — default ["vector", "fulltext", "graph"]
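On the wire, an MCP tool call is a JSON-RPC 2.0 request with method `tools/call`, carrying the tool name and its arguments. The following is a minimal sketch of building such a request for `search` from the arguments listed above. `build_search_call` is a hypothetical helper, and treating `vector`, `fulltext`, and `graph` as the only valid search types is an assumption drawn from the documented default.

```python
# Argument names taken from the list above.
KNOWN_OPTIONS = {
    "library_uid", "library_type", "collection_uid",
    "limit", "rerank", "include_images", "search_types",
}
# Assumption: the documented default list is also the full set of valid values.
ALLOWED_SEARCH_TYPES = {"vector", "fulltext", "graph"}


def build_search_call(query, request_id=1, **options):
    """Build a JSON-RPC `tools/call` request for the `search` tool.

    Omitted options are simply left out, so the server applies its defaults.
    """
    if not query:
        raise ValueError("query is required")
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown search arguments: {sorted(unknown)}")
    bad_types = set(options.get("search_types", [])) - ALLOWED_SEARCH_TYPES
    if bad_types:
        raise ValueError(f"unknown search types: {sorted(bad_types)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": {"query": query, **options}},
    }
```

For example, `build_search_call("bearer token expiry", limit=5, search_types=["vector"])` yields a request that narrows retrieval to vector search and caps the result count, leaving `rerank` and `include_images` at their server-side defaults.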

Concept-graph traversal tools (list_concepts, get_concept_neighbors) are intentionally deferred — ship the search + discovery surface first, observe how clients use it, then expand.

Authentication

Tool calls require a Bearer token (MCPToken). Listing tools is unauthenticated so clients can discover the surface. Tokens are managed via Django admin or the management command:

python manage.py create_mcp_token --user r@helu.ca --name "Claude Desktop"

Optional flags:

--tools search,get_chunk — restrict the token to a whitelist
--expires-days 30 — set an expiry

The token is printed once; there is no way to retrieve it later. To revoke a token or set its expiry, use the Django admin under MCP Server → MCP tokens.

For local development you can set MCP_REQUIRE_AUTH=False in your environment to skip auth entirely. Never disable auth in production.
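The rules above (token required, optional tool whitelist, optional expiry) can be sketched as a single authorization check. This is not the project's `MCPAuthMiddleware`; `TokenRecord`, `check_access`, and the reason strings are hypothetical, chosen to line up with the rejection reasons the document mentions (missing token, expired, tool not allowed).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical stand-in for an MCPToken row."""
    name: str
    allowed_tools: Optional[FrozenSet[str]] = None  # None means all tools
    expires_at: Optional[datetime] = None           # None means no expiry


def check_access(token, tool, now=None):
    """Return None if the call is allowed, otherwise a short rejection reason."""
    now = now or datetime.now(timezone.utc)
    if token is None:
        return "missing_token"
    if token.expires_at is not None and token.expires_at <= now:
        return "expired"
    if token.allowed_tools is not None and tool not in token.allowed_tools:
        return "tool_not_allowed"
    return None


# Roughly what `--tools search,get_chunk --expires-days 30` would describe.
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
desktop_token = TokenRecord(
    name="Claude Desktop",
    allowed_tools=frozenset({"search", "get_chunk"}),
    expires_at=issued + timedelta(days=30),
)
```

Returning a reason string rather than a bare boolean makes it straightforward to label an auth-failure counter by cause.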

Running the server

# Development
uvicorn mnemosyne.asgi:app --host 127.0.0.1 --port 8001 --workers 1

# Health check
curl http://localhost:8001/mcp/health
# {"status":"ok"}

Single worker required. The SSE transport keeps session state in worker memory, so with multiple workers a client's POSTs could be routed to a worker that does not hold its session.
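A toy simulation makes the failure mode concrete. Each `Worker` below keeps its own in-memory session table, the SSE stream is opened on one worker, and follow-up POSTs are spread round-robin. The class, the round-robin policy, and the 202/404 status codes are illustrative assumptions, not the server's actual behavior.

```python
import itertools


class Worker:
    """Toy worker process with its own in-memory session table."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> messages received for that session

    def open_sse(self, session_id):
        # The long-lived SSE stream registers the session on this worker only.
        self.sessions[session_id] = []

    def handle_post(self, session_id, message):
        if session_id not in self.sessions:
            return 404  # this worker has never seen the session
        self.sessions[session_id].append(message)
        return 202


def simulate(worker_count, posts=4):
    """Open one SSE session on worker 0, then round-robin POSTs over all workers."""
    workers = [Worker(f"w{i}") for i in range(worker_count)]
    workers[0].open_sse("session-1")
    balancer = itertools.cycle(workers)  # one possible load-balancing policy
    return [next(balancer).handle_post("session-1", f"msg-{n}") for n in range(posts)]
```

With one worker every POST finds its session; with two, every second POST lands on a worker that has no record of it.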

In production, run alongside the WSGI Django process and route via a reverse proxy:

location /mcp/ {
    proxy_pass http://127.0.0.1:8001;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_buffering off;            # required for SSE
    proxy_cache off;                # required for SSE
    proxy_read_timeout 300s;
}

Client configuration

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "mnemosyne": {
      "url": "http://localhost:8001/mcp/",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}

For SSE transport, change the URL to http://localhost:8001/mcp/sse/.
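If the client configuration is generated rather than hand-edited, a small helper can produce either variant. This is a sketch only: `build_client_config` is a hypothetical name, and the two paths are copied from the configuration and SSE note above.

```python
import json

# Paths as given in the client configuration above.
TRANSPORT_PATHS = {"http": "/mcp/", "sse": "/mcp/sse/"}


def build_client_config(token, base_url="http://localhost:8001", transport="http"):
    """Return a config dict in the same shape as the JSON shown above."""
    if transport not in TRANSPORT_PATHS:
        raise ValueError(f"unknown transport: {transport!r}")
    return {
        "mcpServers": {
            "mnemosyne": {
                "url": base_url.rstrip("/") + TRANSPORT_PATHS[transport],
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


def render_client_config(token, **kwargs):
    """Serialize the config as indented JSON, ready to write to a file."""
    return json.dumps(build_client_config(token, **kwargs), indent=2)
```

Calling `render_client_config("YOUR_TOKEN_HERE", transport="sse")` gives the same document as above with the URL pointing at the SSE endpoint.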

Observability

Prometheus metrics are exported on the WSGI Django side (/metrics):

| Metric | Labels | Purpose |
| --- | --- | --- |
| `mcp_tool_invocations_total` | tool, status | Per-tool call counter |
| `mcp_tool_duration_seconds` | tool | Per-tool duration histogram |
| `mcp_auth_failures_total` | reason | Auth-rejection counter (missing token, expired, tool not allowed) |
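To show how the invocation counter and duration histogram relate to a tool call, here is a standard-library stand-in for the instrumentation. It deliberately avoids a real Prometheus client: `invocations` and `durations` are plain containers, `instrumented` is a hypothetical decorator, and the `ok` / `error` status values are assumptions.

```python
import functools
import time
from collections import Counter, defaultdict

# Stand-ins for mcp_tool_invocations_total{tool,status} and
# mcp_tool_duration_seconds{tool}; not real Prometheus metric objects.
invocations = Counter()
durations = defaultdict(list)


def instrumented(tool_name):
    """Record one count and one duration sample per call of the wrapped tool."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, so every call is measured.
                durations[tool_name].append(time.perf_counter() - start)
                invocations[(tool_name, status)] += 1

        return wrapper

    return decorator


@instrumented("get_chunk")
def get_chunk(chunk_uid):
    # Hypothetical tool body, just enough to exercise both outcomes.
    if not chunk_uid:
        raise ValueError("chunk_uid is required")
    return {"chunk_uid": chunk_uid, "text": "..."}
```

Recording in a `finally` block means failed calls still contribute a duration sample, which keeps the histogram and the counter consistent with each other.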

Files

| Path | Purpose |
| --- | --- |
| `mcp_server/models.py` | `MCPToken` Django ORM model |
| `mcp_server/auth.py` | `resolve_mcp_user`, `MCPAuthMiddleware` |
| `mcp_server/server.py` | FastMCP instance + tool registration |
| `mcp_server/tools/search.py` | `search`, `get_chunk` |
| `mcp_server/tools/discovery.py` | `list_libraries`, `list_collections`, `list_items` |
| `mcp_server/management/commands/create_mcp_token.py` | Token bootstrap command |
| `mnemosyne/asgi.py` | Mounts FastMCP at `/mcp` and `/mcp/sse` |
| `docs/Pattern_Django-MCP_V1-00.md` | Underlying integration pattern (FastMCP + Django ASGI + bearer auth) |

Testing

TEST_NEO4J_ENABLED=0 python manage.py test mcp_server \
    --testrunner=test_db_manager.django_integration.PostgreSQLTestRunner

The mcp_server test suite covers the token model, auth resolution, tool registration, and the management command. It does not require Neo4j (hence TEST_NEO4J_ENABLED=0), only Postgres via the Docker-backed test runner.
