feat: replace server-side RAG with MCP retrieval primitives

- Remove Phase 4 RAG pipeline in favor of retrieval-only architecture
- Add FastMCP server exposing search, get_chunk, list_libraries tools
- Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config
- Update README to clarify Mnemosyne is a retrieval engine, not RAG
- Let calling LLMs drive synthesis and iterative retrieval themselves
144
docs/PHASE_5_MCP_SERVER.md
Normal file
@@ -0,0 +1,144 @@
# Phase 5: MCP Server

The MCP (Model Context Protocol) server exposes Mnemosyne's retrieval primitives (search, chunk fetch, and library/collection/item discovery) to LLM clients like Claude Desktop, Cursor, or any MCP-compatible agent.

This is intentionally a **retrieval surface, not a RAG pipeline**. The server returns ranked evidence; the calling LLM is responsible for synthesis, citation, and follow-up. If a "knowledge subagent" wrapper is ever wanted, it lives outside Mnemosyne as a thin client over these tools.

## Architecture

```
┌──────────────────────────┐                      ┌─────────────────────┐
│ Claude Desktop / Cursor  │  Streamable HTTP     │ uvicorn :8001       │
│ (MCP client)             │ ─────────────────▶   │ mnemosyne.asgi:app  │
└──────────────────────────┘  /mcp/   /mcp/sse    └──────┬──────────────┘
                                                         │
                                                         ▼
                                                  ┌────────────────┐
                                                  │ FastMCP server │
                                                  │ + middleware   │
                                                  └──────┬─────────┘
                                                         │
                                      ┌──────────────────┼─────────────────┐
                                      ▼                  ▼                 ▼
                              ┌────────────────┐ ┌──────────────┐ ┌──────────────┐
                              │ SearchService  │ │ Neo4j Cypher │ │ S3 / MinIO   │
                              │ (Phase 3)      │ │ discovery    │ │ chunk text   │
                              └────────────────┘ └──────────────┘ └──────────────┘
```

The MCP server runs as a **separate Uvicorn ASGI process** alongside the existing Django/Gunicorn WSGI process. Both processes share the same Django settings, Postgres, Neo4j, and S3: the MCP server is a thin protocol surface, not a duplicate stack.

## Tool surface

| Tool | Purpose | Returns |
|------|---------|---------|
| `search` | Hybrid retrieval: vector + full-text + concept-graph + Synesis re-ranking | Ranked candidates with `chunk_uid`, `text_preview`, score, source |
| `get_chunk` | Fetch the full text of a chunk by `chunk_uid` (preview is only ~500 chars) | Full chunk text + parent item context |
| `list_libraries` | Discover libraries and their `library_type` | uid, name, library_type, description |
| `list_collections` | Discover collections, optional `library_uid` filter | uid, name, description, parent library |
| `list_items` | Discover indexed documents, optional collection / library filter | uid, title, item_type, chunk_count, embedding_status |

`search` accepts these named arguments:

- `query` (required)
- `library_uid`, `library_type`, `collection_uid`: scoping filters (all optional, AND-combined)
- `limit`: default 20
- `rerank`: default `True` (Synesis cross-attention re-ranking when configured)
- `include_images`: default `True`
- `search_types`: default `["vector", "fulltext", "graph"]`

Concept-graph traversal tools (`list_concepts`, `get_concept_neighbors`) are intentionally deferred: ship the search + discovery surface first, observe how clients use it, then expand.
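
To make the `search` argument list concrete, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` request an MCP client would send for this tool. The helper name `build_search_call` and the example `library_type` value are hypothetical; the argument names and defaults are the ones documented above. Sending the defaults explicitly is optional and is done here only to show the full shape.

```python
import json
from typing import Any


def search_defaults() -> dict[str, Any]:
    """Documented defaults for the `search` tool (fresh copy on every call)."""
    return {
        "limit": 20,
        "rerank": True,
        "include_images": True,
        "search_types": ["vector", "fulltext", "graph"],
    }


# Optional scoping filters; the server AND-combines whichever are supplied.
SCOPE_FILTERS = {"library_uid", "library_type", "collection_uid"}


def build_search_call(query: str, request_id: int = 1, **overrides: Any) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for `search` (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    defaults = search_defaults()
    unknown = set(overrides) - set(defaults) - SCOPE_FILTERS
    if unknown:
        raise ValueError(f"unknown search arguments: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search",
            "arguments": {"query": query, **defaults, **overrides},
        },
    }


if __name__ == "__main__":
    # "technical" is an illustrative library_type, not a documented value.
    request = build_search_call("graph databases", library_type="technical", limit=5)
    print(json.dumps(request, indent=2))
```

A client would typically follow a `search` call with `get_chunk` on the most promising `chunk_uid` values, since results carry only a preview.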

## Authentication

Tool calls require a Bearer token (`MCPToken`). Listing tools is unauthenticated so clients can discover the surface. Tokens are managed via Django admin or the management command:

```bash
python manage.py create_mcp_token --user r@helu.ca --name "Claude Desktop"
```

Optional flags:

- `--tools search,get_chunk`: restrict the token to a whitelist of tools
- `--expires-days 30`: set an expiry

The token is printed once; there is no way to retrieve it later. Revoke a token or set its expiry in the Django admin under **MCP Server → MCP tokens**.

For local development you can set `MCP_REQUIRE_AUTH=False` in your environment to skip auth entirely. **Never disable auth in production.**
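
The rules above (token required, optional expiry, optional tool whitelist, revocation) can be sketched as a single decision function. This is an illustration only, not the code in `mcp_server/auth.py`: `TokenRecord`, its field names, and the reason strings are assumptions chosen to mirror the behaviour described in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical stand-in for an MCPToken row; field names are assumptions."""
    is_active: bool = True                       # False once revoked in the admin
    expires_at: Optional[datetime] = None        # None means no expiry
    allowed_tools: frozenset[str] = frozenset()  # empty means every tool is allowed


def check_tool_access(
    token: Optional[TokenRecord],
    tool: str,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return None when the call is allowed, otherwise a short rejection reason."""
    if token is None:
        return "missing_token"
    if not token.is_active:
        return "revoked"
    now = now or datetime.now(timezone.utc)
    if token.expires_at is not None and token.expires_at <= now:
        return "expired"
    if token.allowed_tools and tool not in token.allowed_tools:
        return "tool_not_allowed"
    return None


if __name__ == "__main__":
    # Equivalent in spirit to: --tools search,get_chunk --expires-days 30
    token = TokenRecord(
        expires_at=datetime.now(timezone.utc) + timedelta(days=30),
        allowed_tools=frozenset({"search", "get_chunk"}),
    )
    print(check_tool_access(token, "search"))      # allowed
    print(check_tool_access(token, "list_items"))  # rejected by the whitelist
```

Returning a reason string rather than a boolean makes it straightforward to label an auth-failure counter by cause.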

## Running the server

```bash
# Development
uvicorn mnemosyne.asgi:app --host 127.0.0.1 --port 8001 --workers 1

# Health check
curl http://localhost:8001/mcp/health
# {"status":"ok"}
```

**Single worker required.** SSE transport keeps session state in worker memory, so in a multi-worker deployment a client's POSTs could be routed to a worker that does not hold its session.

In production, run alongside the WSGI Django process and route via a reverse proxy:

```nginx
location /mcp/ {
    proxy_pass http://127.0.0.1:8001;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_buffering off;      # required for SSE
    proxy_cache off;          # required for SSE
    proxy_read_timeout 300s;
}
```

## Client configuration

Claude Desktop (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "mnemosyne": {
      "url": "http://localhost:8001/mcp/",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
```

For SSE transport, change the URL to `http://localhost:8001/mcp/sse/`.

## Observability

Prometheus metrics are exported on the WSGI Django side (`/metrics`):

| Metric | Labels | Purpose |
|--------|--------|---------|
| `mcp_tool_invocations_total` | tool, status | Per-tool call counter |
| `mcp_tool_duration_seconds` | tool | Per-tool duration histogram |
| `mcp_auth_failures_total` | reason | Auth-rejection counter (missing token, expired, tool not allowed) |

## Files

| Path | Purpose |
|------|---------|
| `mcp_server/models.py` | `MCPToken` Django ORM model |
| `mcp_server/auth.py` | `resolve_mcp_user`, `MCPAuthMiddleware` |
| `mcp_server/server.py` | FastMCP instance + tool registration |
| `mcp_server/tools/search.py` | `search`, `get_chunk` |
| `mcp_server/tools/discovery.py` | `list_libraries`, `list_collections`, `list_items` |
| `mcp_server/management/commands/create_mcp_token.py` | Token bootstrap command |
| `mnemosyne/asgi.py` | Mounts FastMCP at `/mcp` and `/mcp/sse` |
| `docs/Pattern_Django-MCP_V1-00.md` | Underlying integration pattern (FastMCP + Django ASGI + bearer auth) |

## Testing

```bash
TEST_NEO4J_ENABLED=0 python manage.py test mcp_server \
    --testrunner=test_db_manager.django_integration.PostgreSQLTestRunner
```

The `mcp_server` test suite covers the token model, auth resolution, tool registration, and the management command. It does not require Neo4j (set `TEST_NEO4J_ENABLED=0`), only Postgres via the Docker-backed test runner.
@@ -1,96 +0,0 @@
## Red Panda Approval™

This project follows Red Panda Approval standards - our gold standard for Django application quality. Code must be elegant, reliable, and maintainable to earn the approval of our adorable red panda judges.

### The 5 Sacred Django Criteria

1. **Fresh Migration Test** - Clean migrations from empty database
2. **Elegant Simplicity** - No unnecessary complexity
3. **Observable & Debuggable** - Proper logging and error handling
4. **Consistent Patterns** - Follow Django conventions
5. **Actually Works** - Passes all checks and serves real user needs

### Standards

# Environment
Virtual environment: ~/env/PROJECT/bin/activate
Python version: 3.12

# Code Organization
Maximum file length: 1000 lines
CSS: External .css files only (no inline/embedded)
JS: External .js files only (no inline/embedded)

# Required Packages
- Bootstrap 5.x (no custom CSS unless absolutely necessary)
- Bootstrap Icons (no emojis)
- django-crispy-forms + crispy-bootstrap5
- django-allauth

# Testing
Framework: Django TestCase (not pytest)
Minimum coverage: XX%? (optional)

### Database Conventions

# Development vs Production
- Development: SQLite
- Production: PostgreSQL
- Use dj-database-url for configuration

# Model Naming
- Model names: singular PascalCase (User, BlogPost, OrderItem)
- Related names: plural snake_case with proper English pluralization
  - user.blog_posts, order.items
  - category.industries (not industrys)
  - person.children (not childs)
  - analysis.analyses (not analysiss)
- Through tables: describe relationship (ProjectMembership, CourseEnrollment)

# Field Naming
- Foreign keys: singular without _id suffix (author, category, parent)
- Boolean fields: use prefixes (is_active, has_permission, can_edit)
- Date fields: use suffixes (created_at, updated_at, published_on)
- Avoid abbreviations (use description, not desc)

# Required Model Fields
All models should include:
- created_at = models.DateTimeField(auto_now_add=True)
- updated_at = models.DateTimeField(auto_now=True)

Consider adding:
- id = models.UUIDField(primary_key=True) for public-facing models
- is_active = models.BooleanField(default=True) for soft deletes

# Indexing
- Add db_index=True to frequently queried fields
- Use Meta.indexes for composite indexes
- Document why each index exists

# Migrations
- Never edit migrations that have been deployed
- Use meaningful migration names: --name add_email_to_profile
- One logical change per migration when possible
- Test migrations both forward and backward

# Queries
- Use select_related() for foreign keys
- Use prefetch_related() for reverse relations and M2M
- Avoid queries in loops (N+1 problem)
- Use .only() and .defer() for large models
- Add comments explaining complex querysets

## Monitoring & Health Check Endpoints
Follow standard Kubernetes health check endpoints for container orchestration:

### /ready/ - Readiness probe checks if the application is ready to serve traffic
- Validates database connectivity
- Validates cache connectivity
- Returns 200 if ready, 503 if dependencies are unavailable
- Used by load balancers to determine if pod should receive traffic

### /live/ - Liveness probe checks if the application process is alive
- Simple health check with minimal logic
- Returns 200 if Django is responding to requests
- Used by Kubernetes to determine if pod should be restarted

Note: For detailed metrics and monitoring, use Prometheus and Alloy integration rather than custom health endpoints.
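
The difference between the two probes can be sketched in framework-agnostic Python: liveness does no dependency work, while readiness runs each dependency check and reports 503 if any of them fails. The function names, the response bodies, and the idea of passing checks as callables are assumptions made for illustration; in a Django project these would be wrapped in views and the checks would ping the real database and cache.

```python
from typing import Callable, Mapping

# A check is any callable that raises an exception when its dependency is down.
Check = Callable[[], None]


def liveness() -> tuple[int, dict]:
    """Liveness: minimal logic, succeeds whenever the process can respond."""
    return 200, {"status": "alive"}


def readiness(checks: Mapping[str, Check]) -> tuple[int, dict]:
    """Readiness: 200 only if every dependency check passes, otherwise 503."""
    failures: dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # any failure marks the dependency unavailable
            failures[name] = str(exc)
    if failures:
        return 503, {"status": "unavailable", "failures": failures}
    return 200, {"status": "ready"}


if __name__ == "__main__":
    def cache_down() -> None:
        raise ConnectionError("cache unreachable")

    print(liveness())
    print(readiness({"database": lambda: None, "cache": cache_down}))
```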