# Phase 5: MCP Server

The MCP (Model Context Protocol) server exposes Mnemosyne's retrieval primitives (search, chunk fetch, and library/collection/item discovery) to LLM clients like Claude Desktop, Cursor, or any MCP-compatible agent.

This is intentionally a **retrieval surface, not a RAG pipeline**. The server returns ranked evidence; the calling LLM is responsible for synthesis, citation, and follow-up. If a "knowledge subagent" wrapper is ever wanted, it lives outside Mnemosyne as a thin client over these tools.

## Architecture

```
┌──────────────────────────┐                      ┌─────────────────────┐
│ Claude Desktop / Cursor  │  Streamable HTTP     │ uvicorn :8001       │
│ (MCP client)             │  ─────────────────▶  │ mnemosyne.asgi:app  │
└──────────────────────────┘  /mcp/  /mcp/sse     └──────┬──────────────┘
                                                         │
                                                         ▼
                                                  ┌────────────────┐
                                                  │ FastMCP server │
                                                  │ + middleware   │
                                                  └──────┬─────────┘
                                                         │
                                      ┌──────────────────┼─────────────────┐
                                      ▼                  ▼                 ▼
                              ┌────────────────┐  ┌──────────────┐  ┌──────────────┐
                              │ SearchService  │  │ Neo4j Cypher │  │ S3 / MinIO   │
                              │ (Phase 3)      │  │ discovery    │  │ chunk text   │
                              └────────────────┘  └──────────────┘  └──────────────┘
```

The MCP server runs as a **separate Uvicorn ASGI process** alongside the existing Django/Gunicorn WSGI process. Both processes share the same Django settings, Postgres, Neo4j, and S3: the MCP server is a thin protocol surface, not a duplicate stack.

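
To make the "thin protocol surface" idea concrete, here is a minimal, standard-library-only sketch of an ASGI entry point that answers a health route itself and hands every other `/mcp` path to a mounted sub-application. It is not the project's `mnemosyne/asgi.py`: `mcp_placeholder`, `_respond`, and `call` are hypothetical names, and the real FastMCP app is replaced by a stub so the example runs without third-party packages.

```python
import asyncio
import json


async def _respond(send, status, payload):
    """Send a small JSON response over the ASGI `send` channel."""
    body = json.dumps(payload).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


async def mcp_placeholder(scope, receive, send):
    """Hypothetical stand-in for the mounted FastMCP ASGI app."""
    await _respond(send, 501, {"error": "FastMCP app not wired in this sketch"})


async def app(scope, receive, send):
    """Answer /mcp/health directly; delegate every other /mcp path."""
    if scope["type"] != "http":
        return  # lifespan and websocket scopes are ignored in this sketch
    path = scope["path"]
    if path == "/mcp/health":
        await _respond(send, 200, {"status": "ok"})
    elif path.startswith("/mcp"):
        await mcp_placeholder(scope, receive, send)
    else:
        await _respond(send, 404, {"error": "not found"})


def call(path):
    """Drive the app once without a server; return (status, parsed JSON body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": path}, receive, send))
    return sent[0]["status"], json.loads(sent[1]["body"])


print(call("/mcp/health"))  # → (200, {'status': 'ok'})
```

Because the entry point is a plain ASGI callable, the same object could be served by Uvicorn while the WSGI Django process keeps running unchanged next to it.
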
## Tool surface

| Tool | Purpose | Returns |
|------|---------|---------|
| `search` | Hybrid retrieval: vector + full-text + concept-graph + Synesis re-ranking | Ranked candidates with `chunk_uid`, `text_preview`, score, source |
| `get_chunk` | Fetch the full text of a chunk by `chunk_uid` (preview is only ~500 chars) | Full chunk text + parent item context |
| `list_libraries` | Discover libraries and their `library_type` | uid, name, library_type, description |
| `list_collections` | Discover collections, optional `library_uid` filter | uid, name, description, parent library |
| `list_items` | Discover indexed documents, optional collection / library filter | uid, title, item_type, chunk_count, embedding_status |

`search` accepts these named arguments:

- `query` (required)
- `library_uid`, `library_type`, `collection_uid`: scoping filters (all optional, AND-combined)
- `limit`: default 20
- `rerank`: default `True` (Synesis cross-attention re-ranking when configured)
- `include_images`: default `True`
- `search_types`: default `["vector", "fulltext", "graph"]`

Concept-graph traversal tools (`list_concepts`, `get_concept_neighbors`) are intentionally deferred: ship the search + discovery surface first, observe how clients use it, then expand.

## Authentication

Tool calls require a Bearer token (`MCPToken`). Listing tools is unauthenticated so clients can discover the surface.

Tokens are managed via Django admin or the management command:

```bash
python manage.py create_mcp_token --user r@helu.ca --name "Claude Desktop"
```

Optional flags:

- `--tools search,get_chunk`: restrict the token to a whitelist
- `--expires-days 30`: set an expiry

The token is printed once; there's no way to retrieve it later. Revoke or set expiry in the Django admin under **MCP Server → MCP tokens**.

For local development you can set `MCP_REQUIRE_AUTH=False` in your environment to skip auth entirely.
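
When auth is enabled, a tool call carries the token in the `Authorization` header and the tool arguments in a JSON-RPC `tools/call` body. The sketch below only builds such a request for `search`, using the argument defaults listed in this document; it does not send it. `build_search_request` and `MCP_URL` are hypothetical names, and a real MCP client (or SDK) would first perform the protocol's `initialize` handshake, which is omitted here.

```python
import json
import urllib.request

# Assumed endpoint: the local development URL used elsewhere in this document.
MCP_URL = "http://localhost:8001/mcp/"


def build_search_request(token, query, *, limit=20, rerank=True,
                         include_images=True, search_types=None,
                         library_uid=None, library_type=None,
                         collection_uid=None, request_id=1):
    """Build (but do not send) an authenticated JSON-RPC `tools/call` request."""
    arguments = {
        "query": query,
        "limit": limit,
        "rerank": rerank,
        "include_images": include_images,
        "search_types": search_types or ["vector", "fulltext", "graph"],
    }
    # Scoping filters are optional; include only the ones that were provided.
    for key, value in (("library_uid", library_uid),
                       ("library_type", library_type),
                       ("collection_uid", collection_uid)):
        if value is not None:
            arguments[key] = value

    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
    )


request = build_search_request("YOUR_TOKEN_HERE", "example query",
                               collection_uid="COLLECTION_UID")
print(request.get_header("Authorization"))  # → Bearer YOUR_TOKEN_HERE
```

Leaving unset filters out of the payload keeps the AND-combined scoping explicit: a filter applies only when the caller names it.
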
**Never disable auth in production.**

## Running the server

```bash
# Development
uvicorn mnemosyne.asgi:app --host 127.0.0.1 --port 8001 --workers 1

# Health check
curl http://localhost:8001/mcp/health
# {"status":"ok"}
```

**Single worker required.** SSE transport keeps session state in worker memory; multi-worker deployments would route POSTs to the wrong worker.

In production, run alongside the WSGI Django process and route via a reverse proxy:

```nginx
location /mcp/ {
    proxy_pass http://127.0.0.1:8001;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_buffering off;      # required for SSE
    proxy_cache off;          # required for SSE
    proxy_read_timeout 300s;
}
```

## Client configuration

Claude Desktop (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "mnemosyne": {
      "url": "http://localhost:8001/mcp/",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
```

For SSE transport, change the URL to `http://localhost:8001/mcp/sse/`.

## Observability

Prometheus metrics are exported on the WSGI Django side (`/metrics`):

| Metric | Labels | Purpose |
|--------|--------|---------|
| `mcp_tool_invocations_total` | tool, status | Per-tool call counter |
| `mcp_tool_duration_seconds` | tool | Per-tool duration histogram |
| `mcp_auth_failures_total` | reason | Auth-rejection counter (missing token, expired, tool not allowed) |

## Files

| Path | Purpose |
|------|---------|
| `mcp_server/models.py` | `MCPToken` Django ORM model |
| `mcp_server/auth.py` | `resolve_mcp_user`, `MCPAuthMiddleware` |
| `mcp_server/server.py` | FastMCP instance + tool registration |
| `mcp_server/tools/search.py` | `search`, `get_chunk` |
| `mcp_server/tools/discovery.py` | `list_libraries`, `list_collections`, `list_items` |
| `mcp_server/management/commands/create_mcp_token.py` | Token bootstrap command |
| `mnemosyne/asgi.py` | Mounts FastMCP at `/mcp` and `/mcp/sse` |
| `docs/Pattern_Django-MCP_V1-00.md` | Underlying integration pattern (FastMCP + Django ASGI + bearer auth) |

## Testing

```bash
TEST_NEO4J_ENABLED=0 python manage.py test mcp_server \
    --testrunner=test_db_manager.django_integration.PostgreSQLTestRunner
```

The `mcp_server` test suite covers token model, auth resolution, tool registration, and the management command. It does not require Neo4j (set `TEST_NEO4J_ENABLED=0`), only Postgres via the Docker-backed test runner.
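
For a feel of the auth-resolution cases such a suite exercises, here is a stand-alone sketch that needs neither Django nor Postgres. `TokenSketch` and `check_token` are hypothetical stand-ins, not the `MCPToken` model or `resolve_mcp_user`, and the reason strings are assumed names modelled on the rejection causes listed for `mcp_auth_failures_total`.

```python
import unittest
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TokenSketch:
    """Hypothetical stand-in for a token: optional tool whitelist and expiry."""
    allowed_tools: Optional[frozenset] = None  # None means every tool is allowed
    expires_at: Optional[datetime] = None      # None means the token never expires


def check_token(token, tool, now=None):
    """Return None when the call is allowed, else an (assumed) rejection reason."""
    now = now or datetime.now(timezone.utc)
    if token is None:
        return "missing_token"
    if token.expires_at is not None and now >= token.expires_at:
        return "expired"
    if token.allowed_tools is not None and tool not in token.allowed_tools:
        return "tool_not_allowed"
    return None


class CheckTokenTests(unittest.TestCase):
    # Fixed clock so the expiry case does not depend on when the test runs.
    NOW = datetime(2025, 1, 1, tzinfo=timezone.utc)

    def test_missing_token_is_rejected(self):
        self.assertEqual(check_token(None, "search", self.NOW), "missing_token")

    def test_expired_token_is_rejected(self):
        token = TokenSketch(expires_at=self.NOW - timedelta(days=1))
        self.assertEqual(check_token(token, "search", self.NOW), "expired")

    def test_whitelist_restricts_tools(self):
        token = TokenSketch(allowed_tools=frozenset({"search", "get_chunk"}))
        self.assertIsNone(check_token(token, "get_chunk", self.NOW))
        self.assertEqual(check_token(token, "list_items", self.NOW),
                         "tool_not_allowed")
```

Saved as a module, the sketch runs with `python -m unittest`; passing the clock in as `now` is what keeps the expiry test deterministic.
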