feat: replace server-side RAG with MCP retrieval primitives
- Remove Phase 4 RAG pipeline in favor of retrieval-only architecture
- Add FastMCP server exposing search, get_chunk, list_libraries tools
- Mount MCP endpoints (streamable HTTP + SSE) via Starlette in ASGI config
- Update README to clarify Mnemosyne is a retrieval engine, not RAG
- Let calling LLMs drive synthesis and iterative retrieval themselves
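For orientation, the three tools named in the commit message could look roughly like the sketch below. It is stdlib-only and is not the code in this commit: the `Chunk` fields, the in-memory `_CHUNKS` store, the function signatures, and the term-overlap scoring are all assumptions made for illustration. The actual server registers its tools with FastMCP and ranks results with the search and re-ranking stack described in the README.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Chunk:
    # Hypothetical chunk record; the field names are illustrative only.
    chunk_id: str
    library: str
    text: str


# Stand-in for the real Neo4j-backed store (an assumption of this sketch).
_CHUNKS = [
    Chunk("c1", "fiction", "The lighthouse keeper logged every passing ship."),
    Chunk("c2", "manuals", "Reset the router by holding the button for ten seconds."),
    Chunk("c3", "manuals", "The router firmware can be updated over USB."),
]


def list_libraries() -> List[str]:
    """Return the names of all libraries that hold at least one chunk."""
    return sorted({c.library for c in _CHUNKS})


def search(query: str, library: Optional[str] = None, limit: int = 5) -> List[Dict]:
    """Return ranked chunk previews plus metadata; no answer is synthesized."""
    terms = set(query.lower().split())
    hits = []
    for c in _CHUNKS:
        if library is not None and c.library != library:
            continue
        # Toy relevance score: how many query terms occur in the chunk.
        score = len(terms & set(c.text.lower().rstrip(".").split()))
        if score:
            hits.append({"chunk_id": c.chunk_id, "library": c.library,
                         "score": score, "preview": c.text[:40]})
    # Best score first; chunk ID breaks ties so the order is stable.
    hits.sort(key=lambda h: (-h["score"], h["chunk_id"]))
    return hits[:limit]


def get_chunk(chunk_id: str) -> Dict:
    """Return full text and metadata for one chunk, or raise KeyError."""
    for c in _CHUNKS:
        if c.chunk_id == chunk_id:
            return asdict(c)
    raise KeyError(chunk_id)
```

The point of the shape is that `search` returns ranked previews and IDs only, and the caller decides whether to pull full text with `get_chunk`.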
15
README.md
@@ -47,8 +47,8 @@ This **content-type awareness** flows through every layer: chunking strategy, em

 ```
 Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
-→ Candidate Fusion → Qwen3-VL Re-ranking → Content-Type Context Injection
-→ LLM Response with Citations
+→ Candidate Fusion → Qwen3-VL Re-ranking → Ranked Chunks + Metadata
+→ MCP tool result (the calling LLM does its own synthesis)
 ```

 ## Heritage
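Note on the "Candidate Fusion" step in the diagram above: the README does not say how the vector, graph, and full-text candidates are merged. One common option is reciprocal rank fusion (RRF), sketched here as an illustration only. The function name, the sample chunk IDs, and the choice of RRF itself are assumptions, not a description of Mnemosyne's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the conventional default for RRF.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical candidate lists from the three retrieval paths.
vector_hits = ["c3", "c1", "c7"]
graph_hits = ["c1", "c9"]
fulltext_hits = ["c1", "c3"]
fused = reciprocal_rank_fusion([vector_hits, graph_hits, fulltext_hits])
```

RRF needs only rank positions, so it sidesteps the problem that vector similarity, graph, and full-text scores are not on comparable scales.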
@@ -82,14 +82,21 @@ celery -A mnemosyne flower --port=5555 # Web monitoring UI

 See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for full details on queues, reliability settings, and task progress tracking.

+## Architecture Note: Retrieval, Not Synthesis
+
+Mnemosyne is a **retrieval engine**, not a RAG pipeline. It stores, embeds, and ranks — it does not synthesize answers.
+
+The earlier roadmap had a server-side RAG layer that took a query and returned a written answer with citations. That layer has been removed. Calling LLMs (Claude via MCP, principally) are perfectly capable of driving iterative retrieval themselves when given the right primitives, and a server-side synthesis hop adds latency, cost, and a place where errors are harder to debug. Letting the calling LLM see chunks directly — and follow citations, pivot mid-search, or call `get_chunk` for full text — beats pre-digesting them.
+
+If a "knowledge subagent" is ever wanted (a wrapper that takes a question and returns a written answer), it lives **outside** Mnemosyne as a thin client over the MCP tools, with its own system prompt. No coupling, no extra inference hop inside the server, and the subagent's behavior can iterate independently.
+
 ## Documentation

 - **[Architecture Documentation](docs/mnemosyne.html)** — Full system architecture with diagrams
 - **[Phase 1: Foundation](docs/PHASE_1_FOUNDATION.md)** — Project skeleton, Neo4j data model, content-type system
 - **[Phase 2: Embedding Pipeline](docs/PHASE_2_EMBEDDING_PIPELINE.md)** — Qwen3-VL multimodal embedding
 - **[Phase 3: Search & Re-ranking](docs/PHASE_3_SEARCH_AND_RERANKING.md)** — Hybrid search + re-ranker
-- **[Phase 4: RAG Pipeline](docs/PHASE_4_RAG_PIPELINE.md)** — Content-type-aware generation
-- **[Phase 5: MCP Server](docs/PHASE_5_MCP_SERVER.md)** — LLM integration interface
+- **[Phase 5: MCP Server](docs/PHASE_5_MCP_SERVER.md)** — Retrieval primitives for LLMs (`search`, `get_chunk`, `list_libraries`, …)
 - **[Phase 6: Backport to Spelunker](docs/PHASE_6_BACKPORT_TO_SPELUNKER.md)** — Proven patterns flowing back
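Note on the Architecture Note added in this diff: the "thin client" it describes, together with the iterative retrieval it expects callers to drive, might be sketched as follows. Everything here is hypothetical: the `KnowledgeSubagent` class, the dictionary shapes, and the stub functions standing in for MCP tool calls and for the LLM are illustration only, and no real MCP client or model API is used.

```python
class KnowledgeSubagent:
    """Hypothetical thin client that lives outside the retrieval server."""

    def __init__(self, search, get_chunk, synthesize, system_prompt, max_rounds=3):
        self.search = search            # wraps the `search` tool
        self.get_chunk = get_chunk      # wraps the `get_chunk` tool
        self.synthesize = synthesize    # wraps the calling LLM
        self.system_prompt = system_prompt
        self.max_rounds = max_rounds

    def answer(self, question):
        query, seen = question, {}
        for _ in range(self.max_rounds):
            # Fetch full text for every new hit from this round.
            for hit in self.search(query):
                cid = hit["chunk_id"]
                if cid not in seen:
                    seen[cid] = self.get_chunk(cid)["text"]
            # The model either answers or asks for another search.
            result = self.synthesize(self.system_prompt, question, seen)
            if "answer" in result:
                return {"answer": result["answer"], "citations": sorted(seen)}
            query = result["next_query"]
        return {"answer": None, "citations": sorted(seen)}


# Stubs standing in for the MCP tools and the LLM (assumptions).
DOCS = {"c1": "Reset: hold the button ten seconds.",
        "c2": "Firmware updates use USB."}
INDEX = {"reset": ["c1"], "firmware": ["c2"]}


def fake_search(query):
    ids = []
    for word in query.lower().split():
        for cid in INDEX.get(word, []):
            if cid not in ids:
                ids.append(cid)
    return [{"chunk_id": cid} for cid in ids]


def fake_get_chunk(chunk_id):
    return {"chunk_id": chunk_id, "text": DOCS[chunk_id]}


def fake_llm(system_prompt, question, chunks):
    # Pivot once to a follow-up query, then answer from what was gathered.
    if len(chunks) < 2:
        return {"next_query": "firmware"}
    return {"answer": " ".join(chunks[c] for c in sorted(chunks))}


agent = KnowledgeSubagent(fake_search, fake_get_chunk, fake_llm,
                          "Answer only from the supplied chunks.")
result = agent.answer("how do I reset")
```

Because the wrapper depends only on the tool results, its prompt and loop policy can change without touching the server, which is the decoupling the note argues for.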