feat: replace server-side RAG with MCP retrieval primitives

- Remove Phase 4 RAG pipeline in favor of retrieval-only architecture
- Add FastMCP server exposing search, get_chunk, list_libraries tools
- Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config
- Update README to clarify Mnemosyne is a retrieval engine, not RAG
- Let calling LLMs drive synthesis and iterative retrieval themselves
2026-04-26 15:34:26 -04:00
parent 388b37e471
commit 2df22941d2
30 changed files with 1180 additions and 126 deletions

docs/PHASE_5_MCP_SERVER.md Normal file

@@ -0,0 +1,144 @@
# Phase 5: MCP Server
The MCP (Model Context Protocol) server exposes Mnemosyne's retrieval primitives — search, chunk fetch, and library/collection/item discovery — to LLM clients like Claude Desktop, Cursor, or any MCP-compatible agent.
This is intentionally a **retrieval surface, not a RAG pipeline**. The server returns ranked evidence; the calling LLM is responsible for synthesis, citation, and follow-up. If a "knowledge subagent" wrapper is ever wanted, it lives outside Mnemosyne as a thin client over these tools.
## Architecture
```
┌──────────────────────────┐                      ┌─────────────────────┐
│ Claude Desktop / Cursor  │   Streamable HTTP    │ uvicorn :8001       │
│ (MCP client)             │  ─────────────────▶  │ mnemosyne.asgi:app  │
└──────────────────────────┘   /mcp/  /mcp/sse    └──────┬──────────────┘
                                                  ┌────────────────┐
                                                  │ FastMCP server │
                                                  │ + middleware   │
                                                  └──────┬─────────┘
                                      ┌──────────────────┼─────────────────┐
                                      ▼                  ▼                 ▼
                             ┌────────────────┐  ┌──────────────┐  ┌──────────────┐
                             │ SearchService  │  │ Neo4j Cypher │  │ S3 / MinIO   │
                             │ (Phase 3)      │  │ discovery    │  │ chunk text   │
                             └────────────────┘  └──────────────┘  └──────────────┘
```
The MCP server runs as a **separate Uvicorn ASGI process** alongside the existing Django/Gunicorn WSGI process. Both processes share the same Django settings, Postgres, Neo4j, and S3 — the MCP server is a thin protocol surface, not a duplicate stack.
## Tool surface
| Tool | Purpose | Returns |
|------|---------|---------|
| `search` | Hybrid retrieval: vector + full-text + concept-graph + Synesis re-ranking | Ranked candidates with `chunk_uid`, `text_preview`, score, source |
| `get_chunk` | Fetch the full text of a chunk by `chunk_uid` (preview is only ~500 chars) | Full chunk text + parent item context |
| `list_libraries` | Discover libraries and their `library_type` | uid, name, library_type, description |
| `list_collections` | Discover collections, optional `library_uid` filter | uid, name, description, parent library |
| `list_items` | Discover indexed documents, optional collection / library filter | uid, title, item_type, chunk_count, embedding_status |
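A client might combine these tools by calling `search` and then fetching full text only for the best-scoring previews. The Python sketch below illustrates that flow under assumptions: `call_tool` is a hypothetical stand-in for whatever MCP client session is in use, and the result shape (a list of dicts with `chunk_uid` and `score` keys) is inferred from the table above rather than taken from the server code.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def gather_evidence(call_tool: ToolCaller, query: str, top_k: int = 3) -> list[Any]:
    """Search, then expand the best-scoring previews into full chunks."""
    candidates = call_tool("search", {"query": query})
    # Assumption: each candidate is a dict carrying `chunk_uid` and `score`.
    best = sorted(candidates, key=lambda c: c["score"], reverse=True)[:top_k]
    return [call_tool("get_chunk", {"chunk_uid": c["chunk_uid"]}) for c in best]
```

Keeping the loop on the client side matches the retrieval-only design: the server never decides how much evidence is enough.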
`search` accepts these named arguments:
- `query` (required)
- `library_uid`, `library_type`, `collection_uid` — scoping filters (all optional, AND-combined)
- `limit` — default 20
- `rerank` — default `True` (Synesis cross-attention re-ranking when configured)
- `include_images` — default `True`
- `search_types` — default `["vector", "fulltext", "graph"]`
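As a rough illustration of the argument list above, the following Python sketch assembles a `search` argument dict with the documented defaults. The helper name `build_search_arguments` and the early rejection of unknown names are assumptions made for this example, not part of the server.

```python
from typing import Any

# Defaults as documented in the argument list above.
SEARCH_DEFAULTS: dict[str, Any] = {
    "limit": 20,
    "rerank": True,
    "include_images": True,
    "search_types": ["vector", "fulltext", "graph"],
}
SCOPE_FILTERS = ("library_uid", "library_type", "collection_uid")


def build_search_arguments(query: str, **overrides: Any) -> dict[str, Any]:
    """Return a `search` argument dict; unknown names raise early (hypothetical helper)."""
    allowed = set(SEARCH_DEFAULTS) | set(SCOPE_FILTERS)
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown search arguments: {sorted(unknown)}")
    args: dict[str, Any] = {"query": query, **SEARCH_DEFAULTS}
    # Copy the list default so callers cannot mutate the shared one.
    args["search_types"] = list(SEARCH_DEFAULTS["search_types"])
    args.update(overrides)
    return args
```

For instance, `build_search_arguments("query text", limit=5)` keeps every other default and only lowers the result count.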
Concept-graph traversal tools (`list_concepts`, `get_concept_neighbors`) are intentionally deferred — ship the search + discovery surface first, observe how clients use it, then expand.
## Authentication
Tool calls require a Bearer token (`MCPToken`). Listing tools is unauthenticated so clients can discover the surface. Tokens are managed via Django admin or the management command:
```bash
python manage.py create_mcp_token --user r@helu.ca --name "Claude Desktop"
```
Optional flags:
- `--tools search,get_chunk` — restrict the token to a whitelist
- `--expires-days 30` — set an expiry
The token is printed once — there's no way to retrieve it later. Revoke or set expiry in the Django admin under **MCP Server → MCP tokens**.
For local development you can set `MCP_REQUIRE_AUTH=False` in your environment to skip auth entirely. **Never disable auth in production.**
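The checks described in this section (bearer header present, token not expired, tool on the whitelist) could be ordered roughly as in the Python sketch below. It is not the code in `mcp_server/auth.py`: the function names, the reason strings, and the rule that an empty whitelist means "all tools" are assumptions, and the database lookup of the token is left out.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_bearer(header: Optional[str]) -> Optional[str]:
    """Extract the token from an `Authorization: Bearer <token>` header value."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def rejection_reason(
    header: Optional[str],
    tool: str,
    allowed_tools: Optional[set[str]] = None,
    expires_at: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return None if the call may proceed, else a short (hypothetical) reason label."""
    if parse_bearer(header) is None:
        return "missing_token"
    # A real implementation would look the token up here; this sketch skips that step.
    now = now or datetime.now(timezone.utc)
    if expires_at is not None and expires_at <= now:
        return "expired"
    # Assumption: no whitelist (None or empty) means every tool is allowed.
    if allowed_tools and tool not in allowed_tools:
        return "tool_not_allowed"
    return None
```

Returning a reason label rather than a bare boolean makes it straightforward to count rejections by cause.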
## Running the server
```bash
# Development
uvicorn mnemosyne.asgi:app --host 127.0.0.1 --port 8001 --workers 1
# Health check
curl http://localhost:8001/mcp/health
# {"status":"ok"}
```
**Single worker required.** SSE transport keeps session state in worker memory, so a multi-worker deployment can route a session's POSTs to a worker that does not hold that session.
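A toy Python model of that failure mode, under stated assumptions: each `Worker` object stands in for one Uvicorn worker process, and requests are spread round-robin, which is a simplification of how connections are really distributed.

```python
import itertools


class Worker:
    """Stand-in for one worker process: sessions live only in this object's memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: dict[str, list[str]] = {}

    def open_stream(self, session_id: str) -> None:
        self.sessions[session_id] = []

    def post_message(self, session_id: str, message: str) -> bool:
        """Return True if this worker owns the session and accepted the message."""
        if session_id not in self.sessions:
            return False
        self.sessions[session_id].append(message)
        return True


def simulate(worker_count: int, posts: int = 4) -> list[bool]:
    """Open one session on the first worker, then round-robin POSTs across all workers."""
    workers = [Worker(f"w{i}") for i in range(worker_count)]
    workers[0].open_stream("s1")
    rotation = itertools.cycle(workers)
    return [next(rotation).post_message("s1", f"m{i}") for i in range(posts)]
```

With one worker every POST finds its session; with two, every second POST in this model lands on a worker that has never seen the session.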
In production, run alongside the WSGI Django process and route via a reverse proxy:
```nginx
location /mcp/ {
proxy_pass http://127.0.0.1:8001;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_buffering off; # required for SSE
proxy_cache off; # required for SSE
proxy_read_timeout 300s;
}
```
## Client configuration
Claude Desktop (`claude_desktop_config.json`):
```json
{
"mcpServers": {
"mnemosyne": {
"url": "http://localhost:8001/mcp/",
"headers": {
"Authorization": "Bearer YOUR_TOKEN_HERE"
}
}
}
}
```
For SSE transport, change the URL to `http://localhost:8001/mcp/sse/`.
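If the client entry needs to be generated rather than hand-edited, a small Python helper along these lines could render it. The function name and the `transport` parameter are hypothetical; the URL paths and header format are the ones shown above.

```python
import json


def build_client_config(
    token: str,
    base_url: str = "http://localhost:8001",
    transport: str = "http",
) -> str:
    """Render the `mcpServers` entry shown above as a JSON string."""
    paths = {"http": "/mcp/", "sse": "/mcp/sse/"}
    if transport not in paths:
        raise ValueError(f"transport must be one of {sorted(paths)}")
    config = {
        "mcpServers": {
            "mnemosyne": {
                "url": base_url.rstrip("/") + paths[transport],
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }
    return json.dumps(config, indent=2)
```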
## Observability
Prometheus metrics are exported on the WSGI Django side (`/metrics`):
| Metric | Labels | Purpose |
|--------|--------|---------|
| `mcp_tool_invocations_total` | tool, status | Per-tool call counter |
| `mcp_tool_duration_seconds` | tool | Per-tool duration histogram |
| `mcp_auth_failures_total` | reason | Auth-rejection counter (missing token, expired, tool not allowed) |
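To show what the first two metrics measure, here is a stdlib-only Python sketch of a decorator that records one count per `(tool, status)` pair and one duration per call. Plain dictionaries stand in for the Prometheus counter and histogram, and the `"ok"` / `"error"` status values are assumptions; the real label values are not given in this document.

```python
import time
from collections import Counter, defaultdict
from functools import wraps

# Stand-ins for the Prometheus metric objects, kept as plain containers.
invocations: Counter = Counter()   # keyed by (tool, status)
durations = defaultdict(list)      # keyed by tool, values in seconds


def instrumented(tool: str):
    """Wrap a tool function so each call updates the two stand-in metrics."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, so every call is counted once.
                invocations[(tool, status)] += 1
                durations[tool].append(time.perf_counter() - start)
        return wrapper
    return decorator
```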
## Files
| Path | Purpose |
|------|---------|
| `mcp_server/models.py` | `MCPToken` Django ORM model |
| `mcp_server/auth.py` | `resolve_mcp_user`, `MCPAuthMiddleware` |
| `mcp_server/server.py` | FastMCP instance + tool registration |
| `mcp_server/tools/search.py` | `search`, `get_chunk` |
| `mcp_server/tools/discovery.py` | `list_libraries`, `list_collections`, `list_items` |
| `mcp_server/management/commands/create_mcp_token.py` | Token bootstrap command |
| `mnemosyne/asgi.py` | Mounts FastMCP at `/mcp` and `/mcp/sse` |
| `docs/Pattern_Django-MCP_V1-00.md` | Underlying integration pattern (FastMCP + Django ASGI + bearer auth) |
## Testing
```bash
TEST_NEO4J_ENABLED=0 python manage.py test mcp_server \
--testrunner=test_db_manager.django_integration.PostgreSQLTestRunner
```
The `mcp_server` test suite covers the token model, auth resolution, tool registration, and the management command. It does not require Neo4j (set `TEST_NEO4J_ENABLED=0`), only Postgres via the Docker-backed test runner.


@@ -1,96 +0,0 @@
## Red Panda Approval™
This project follows Red Panda Approval standards - our gold standard for Django application quality. Code must be elegant, reliable, and maintainable to earn the approval of our adorable red panda judges.
### The 5 Sacred Django Criteria
1. **Fresh Migration Test** - Clean migrations from empty database
2. **Elegant Simplicity** - No unnecessary complexity
3. **Observable & Debuggable** - Proper logging and error handling
4. **Consistent Patterns** - Follow Django conventions
5. **Actually Works** - Passes all checks and serves real user needs
### Standards
# Environment
Virtual environment: ~/env/PROJECT/bin/activate
Python version: 3.12
# Code Organization
Maximum file length: 1000 lines
CSS: External .css files only (no inline/embedded)
JS: External .js files only (no inline/embedded)
# Required Packages
- Bootstrap 5.x (no custom CSS unless absolutely necessary)
- Bootstrap Icons (no emojis)
- django-crispy-forms + crispy-bootstrap5
- django-allauth
# Testing
Framework: Django TestCase (not pytest)
Minimum coverage: XX%? (optional)
### Database Conventions
# Development vs Production
- Development: SQLite
- Production: PostgreSQL
- Use dj-database-url for configuration
# Model Naming
- Model names: singular PascalCase (User, BlogPost, OrderItem)
- Related names: plural snake_case with proper English pluralization
- user.blog_posts, order.items
- category.industries (not industrys)
- person.children (not childs)
- analysis.analyses (not analysiss)
- Through tables: describe relationship (ProjectMembership, CourseEnrollment)
# Field Naming
- Foreign keys: singular without _id suffix (author, category, parent)
- Boolean fields: use prefixes (is_active, has_permission, can_edit)
- Date fields: use suffixes (created_at, updated_at, published_on)
- Avoid abbreviations (use description, not desc)
# Required Model Fields
All models should include:
- created_at = models.DateTimeField(auto_now_add=True)
- updated_at = models.DateTimeField(auto_now=True)
Consider adding:
- id = models.UUIDField(primary_key=True) for public-facing models
- is_active = models.BooleanField(default=True) for soft deletes
# Indexing
- Add db_index=True to frequently queried fields
- Use Meta.indexes for composite indexes
- Document why each index exists
# Migrations
- Never edit migrations that have been deployed
- Use meaningful migration names: --name add_email_to_profile
- One logical change per migration when possible
- Test migrations both forward and backward
# Queries
- Use select_related() for foreign keys
- Use prefetch_related() for reverse relations and M2M
- Avoid queries in loops (N+1 problem)
- Use .only() and .defer() for large models
- Add comments explaining complex querysets
## Monitoring & Health Check Endpoints
Follow standard Kubernetes health check endpoints for container orchestration:
### /ready/ - Readiness probe checks if the application is ready to serve traffic
- Validates database connectivity
- Validates cache connectivity
- Returns 200 if ready, 503 if dependencies are unavailable
- Used by load balancers to determine if pod should receive traffic
### /live/ - Liveness probe checks if the application process is alive
- Simple health check with minimal logic
- Returns 200 if Django is responding to requests
- Used by Kubernetes to determine if pod should be restarted
Note: For detailed metrics and monitoring, use Prometheus and Alloy integration rather than custom health endpoints.
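The status-code rules for the two probes can be sketched in framework-free Python as follows. The function names are hypothetical, and the dependency checks are passed in as callables so the sketch makes no claim about how database or cache connectivity is actually tested.

```python
from typing import Callable, Iterable


def readiness_status(checks: Iterable[Callable[[], bool]]) -> int:
    """Return 200 when every dependency check passes, else 503."""
    for check in checks:
        try:
            if not check():
                return 503
        except Exception:
            # A check that raises counts as an unavailable dependency.
            return 503
    return 200


def liveness_status() -> int:
    """Liveness has minimal logic: answering at all means the process is alive."""
    return 200
```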