# Mnemosyne

*"The electric light did not come from the continuous improvement of candles."* — Oren Harari

**The memory of everything you know.**

Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI models. Named after the Titan goddess of memory and mother of the nine Muses, Mnemosyne doesn't just store your knowledge: it understands what kind of knowledge it is, connects it through relationships, and makes it all searchable through text, images, and natural language.
## What Makes This Different

Most knowledge base tools treat all documents identically: text in, chunks out, vectors stored. A novel and a PostgreSQL manual get the same treatment.

Mnemosyne knows the difference:

- **A textbook** has chapters, an index, technical terminology, and pedagogical structure. It is chunked accordingly, and when an LLM retrieves results, it knows this is instructional content.
- **A novel** has narrative flow, characters, plot arcs, and dialogue. The LLM knows to interpret results as creative fiction.
- **Album artwork** is a visual asset tied to an artist, genre, and era. It is embedded multimodally, so it is searchable by both image similarity and text description.
- **A journal entry** is personal, temporal, and reflective. The LLM treats it differently from a reference manual.

This **content-type awareness** flows through every layer: chunking strategy, embedding instructions, re-ranking, and the metadata returned with each result to the calling LLM.
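A minimal sketch of how a per-content-type profile could carry the chunking strategy and embedding instruction through the pipeline and label results for the calling LLM. The `ContentTypeProfile` class, its fields, the chunk sizes, the instruction strings, and the `Instruct:` prompt format are all hypothetical illustrations, not Mnemosyne's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentTypeProfile:
    """Hypothetical per-library settings carried through every pipeline stage."""
    library: str
    chunk_strategy: str         # how documents are split (illustrative labels)
    max_chunk_tokens: int       # illustrative size budget, not a tuned value
    embedding_instruction: str  # task instruction for an instruction-aware embedder
    result_hint: str            # how the calling LLM should interpret retrieved chunks


PROFILES = {
    "technical": ContentTypeProfile(
        library="technical",
        chunk_strategy="section",
        max_chunk_tokens=512,
        embedding_instruction="Represent this technical documentation passage for retrieval.",
        result_hint="instructional content",
    ),
    "fiction": ContentTypeProfile(
        library="fiction",
        chunk_strategy="scene",
        max_chunk_tokens=1024,
        embedding_instruction="Represent this narrative fiction passage for retrieval.",
        result_hint="creative fiction",
    ),
    "journals": ContentTypeProfile(
        library="journals",
        chunk_strategy="entry",
        max_chunk_tokens=768,
        embedding_instruction="Represent this personal journal entry for retrieval.",
        result_hint="personal, dated reflection",
    ),
}


def embedding_input(profile: ContentTypeProfile, text: str) -> str:
    """Prefix chunk text with the profile's instruction before embedding.

    The "Instruct: ..." layout is an assumption; use the format the
    embedding model actually documents.
    """
    return f"Instruct: {profile.embedding_instruction}\n{text}"


def result_metadata(profile: ContentTypeProfile, chunk_id: str) -> dict:
    """Attach content-type context to a search hit returned to the calling LLM."""
    return {"chunk_id": chunk_id, "library": profile.library, "interpret_as": profile.result_hint}
```

Keeping the profile as one immutable object means the chunker, embedder, and search response all read the same source of truth instead of re-deriving content type at each stage.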
## Core Architecture

| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Knowledge Graph** | Neo4j 5.x | Relationships + vector storage (no dimension limits) |
| **Multimodal Embeddings** | Qwen3-VL-Embedding-8B | Text + image + video in unified vector space (4096d) |
| **Multimodal Re-ranking** | Synesis (Qwen3-VL-Reranker-2B) | Cross-attention precision scoring via `/v1/rerank` |
| **Web Framework** | Django 5.x + DRF | Auth, admin, API, content management |
| **Object Storage** | S3/MinIO | Original content + chunk text storage |
| **Async Processing** | Celery + RabbitMQ | Document embedding, graph construction |
| **LLM Interface** | MCP Server | Primary interface for Claude, Copilot, etc. |
| **GPU Serving** | vLLM + llama.cpp | Local model inference |
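A minimal, text-only client sketch for the re-ranking stage. It assumes Synesis accepts the Cohere/Jina-style body that vLLM's `/v1/rerank` endpoint uses (`model`, `query`, `documents`, `top_n`) and replies with a `results` list of `index`/`relevance_score` items; the function names, base URL, and model name are hypothetical, and image inputs are not covered.

```python
import json
import urllib.request


def build_rerank_payload(query: str, documents: list, model: str, top_n: int) -> dict:
    """Request body in the Cohere/Jina-compatible rerank shape (assumed for Synesis)."""
    return {"model": model, "query": query, "documents": documents, "top_n": top_n}


def apply_rerank(candidates: list, response: dict) -> list:
    """Reorder candidate chunks by the re-ranker's scores.

    Each item in response["results"] is assumed to carry "index" (a position
    in the documents list that was sent) and "relevance_score".
    """
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [{**candidates[r["index"]], "rerank_score": r["relevance_score"]} for r in ranked]


def rerank(base_url: str, model: str, query: str, candidates: list, top_n: int = 10) -> list:
    """POST candidate chunk texts to a /v1/rerank endpoint and return them re-ordered."""
    payload = build_rerank_payload(query, [c["text"] for c in candidates], model, top_n)
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return apply_rerank(candidates, json.load(resp))
```

Splitting payload construction and result handling out of the HTTP call keeps the ranking logic testable without a running GPU server.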
## Library Types

| Library | Example Content | Multimodal? | Graph Relationships |
|---------|----------------|-------------|-------------------|
| **Fiction** | Novels, short stories | Cover art | Author → Book → Character → Theme |
| **Technical** | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
| **Music** | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
| **Film** | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
| **Art** | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
| **Journals** | Personal entries | Photos | Date → Entry → Topic → Person/Place |
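As an illustration of how one of these relationship chains could be written to Neo4j, here is a sketch that turns a chain such as Artist → Album → Track → Genre into a single parameterized Cypher `MERGE` statement. The helper function, the relationship type names, and matching nodes on a `name` property are hypothetical; the actual graph model is described in the Phase 1 documentation.

```python
def chain_merge_cypher(labels: list, rel_types: list) -> str:
    """Build a MERGE statement linking one node per label, in order.

    Node values are passed as query parameters ($p0, $p1, ...), never
    interpolated. Labels and relationship types cannot be parameterized in
    Cypher, so they must come from trusted code, not user input.
    """
    if len(rel_types) != len(labels) - 1:
        raise ValueError("need exactly one relationship type between each pair of labels")
    lines = [f"MERGE (n{i}:{label} {{name: $p{i}}})" for i, label in enumerate(labels)]
    lines += [f"MERGE (n{i})-[:{rel}]->(n{i + 1})" for i, rel in enumerate(rel_types)]
    return "\n".join(lines)


# Music library: Artist → Album → Track → Genre (relationship names are illustrative)
query = chain_merge_cypher(
    ["Artist", "Album", "Track", "Genre"],
    ["RELEASED", "CONTAINS", "IN_GENRE"],
)
params = {"p0": "Miles Davis", "p1": "Kind of Blue", "p2": "So What", "p3": "Modal Jazz"}
```

The resulting string and `params` dict could be passed to the official Neo4j Python driver via `session.run(query, params)`. Merging on `name` alone is a simplification; a real model would more likely key tracks and albums on stable IDs to avoid collisions.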
## Search Pipeline

```
Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
      → Candidate Fusion → Qwen3-VL Re-ranking → Ranked Chunks + Metadata
      → MCP tool result (the calling LLM does its own synthesis)
```
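The pipeline names a "Candidate Fusion" step without specifying the method. One common way to merge ranked lists from vector, graph, and full-text retrieval is Reciprocal Rank Fusion (RRF); the sketch below assumes RRF with the conventional constant k = 60 and is an illustration, not necessarily what Mnemosyne implements. The chunk IDs are made up.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], k: int = 60, limit: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Merge several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in
    (rank starts at 1), so chunks found by several retrievers rise to the
    top without comparing raw scores across retrievers.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:limit] if limit is not None else fused


vector_hits = ["c7", "c2", "c9"]
graph_hits = ["c2", "c4"]
fulltext_hits = ["c9", "c2", "c1"]
candidates = reciprocal_rank_fusion([vector_hits, graph_hits, fulltext_hits])
# c2 ranks first: all three retrievers returned it
```

RRF only looks at ranks, which suits this pipeline because cosine similarities, graph hop counts, and full-text scores live on incomparable scales; the fused list is then what the re-ranker scores precisely.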
## Heritage

Mnemosyne's retrieval architecture is inspired by [Spelunker](https://git.helu.ca/r/spelunker), an enterprise RFP response platform. Spelunker's proven patterns (hybrid search, citation-based retrieval, and async document processing) are carried forward and enhanced with multimodal capabilities and knowledge graph relationships. Its two-stage RAG layer (responder + reviewer) is not: Mnemosyne leaves synthesis to the calling LLM.
## Running Celery Workers

Mnemosyne uses Celery with RabbitMQ for async document embedding. Run the following from the `mnemosyne/` directory:

```bash
# Development: single worker, all queues
celery -A mnemosyne worker -l info -Q celery,embedding,batch

# Or skip workers entirely with eager mode (.env):
CELERY_TASK_ALWAYS_EAGER=True
```
**Production (separate workers):**

```bash
celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h  # GPU-bound embedding
celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h          # Batch orchestration
celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h       # LLM API validation
```
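These workers only split the load if tasks are routed to the matching queues. A minimal Django settings sketch, assuming the usual `app.config_from_object("django.conf:settings", namespace="CELERY")` setup so that `CELERY_`-prefixed settings map onto Celery options. The task paths are hypothetical placeholders, and the reliability values are common choices for long-running GPU tasks, not Mnemosyne's documented configuration.

```python
# settings.py (fragment). Task paths below are hypothetical placeholders.

# Route each task to the queue consumed by the matching worker's -Q flag.
CELERY_TASK_ROUTES = {
    "library.tasks.embed_document": {"queue": "embedding"},  # GPU-bound, single-concurrency worker
    "library.tasks.embed_batch": {"queue": "batch"},         # batch orchestration
    "llm_manager.tasks.validate_api": {"queue": "celery"},   # default queue
}
CELERY_TASK_DEFAULT_QUEUE = "celery"

# Long GPU tasks: acknowledge only after the task finishes, requeue it if the
# worker process dies mid-task, and reserve one message at a time per process.
CELERY_TASK_ACKS_LATE = True
CELERY_TASK_REJECT_ON_WORKER_LOST = True
CELERY_WORKER_PREFETCH_MULTIPLIER = 1
```

A prefetch multiplier of 1 matters most for the `-c 1` embedding worker: otherwise it can reserve a backlog of slow GPU jobs that other workers could have picked up.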
**Scheduler & Monitoring:**

```bash
celery -A mnemosyne beat -l info          # Periodic task scheduler
celery -A mnemosyne flower --port=5555    # Web monitoring UI
```
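`celery beat` only dispatches work once periodic tasks are declared. A sketch in the same Django-settings style, using `datetime.timedelta` intervals (Celery also accepts `crontab` schedules); the entry names, task paths, and intervals are hypothetical.

```python
# settings.py (fragment). Entry names, task paths, and intervals are hypothetical.
from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    # Re-queue documents whose embedding failed or stalled.
    "retry-failed-embeddings": {
        "task": "library.tasks.retry_failed_embeddings",
        "schedule": timedelta(minutes=15),
    },
    # Periodically confirm the model endpoints still respond.
    "check-model-endpoints": {
        "task": "llm_manager.tasks.check_endpoints",
        "schedule": timedelta(hours=1),
        "options": {"queue": "celery"},  # passed through to apply_async
    },
}
```

Run exactly one `beat` process per deployment; several would each dispatch every entry.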
See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for full details on queues, reliability settings, and task progress tracking.
## Architecture Note: Retrieval, Not Synthesis

Mnemosyne is a **retrieval engine**, not a RAG pipeline. It stores, embeds, and ranks; it does not synthesize answers.

The earlier roadmap had a server-side RAG layer that took a query and returned a written answer with citations. That layer has been removed. Calling LLMs (principally Claude via MCP) are perfectly capable of driving iterative retrieval themselves when given the right primitives, and a server-side synthesis hop adds latency, cost, and another place where errors are hard to debug. Letting the calling LLM see chunks directly (and follow citations, pivot mid-search, or call `get_chunk` for full text) beats pre-digesting them.

If a "knowledge subagent" is ever wanted (a wrapper that takes a question and returns a written answer), it lives **outside** Mnemosyne as a thin client over the MCP tools, with its own system prompt. That means no coupling, no extra inference hop inside the server, and the subagent's behavior can evolve independently.
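A minimal sketch of such an external subagent. It depends only on an interface shaped like the MCP tools (`search`, `get_chunk`), so it can change without touching the server. The `Hit` fields, the `RetrievalTools` protocol, the system prompt, and the `llm` callable are hypothetical stand-ins for an MCP client session and a model API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol


@dataclass
class Hit:
    """One ranked search result (fields are assumptions about the tool's output)."""
    chunk_id: str
    library: str
    score: float
    snippet: str


class RetrievalTools(Protocol):
    """All the subagent needs from Mnemosyne: the retrieval tools, nothing more."""
    def search(self, query: str, library: Optional[str] = None, limit: int = 10) -> List[Hit]: ...
    def get_chunk(self, chunk_id: str) -> str: ...


SYSTEM_PROMPT = "Answer only from the provided chunks and cite chunk IDs in [brackets]."


def answer(question: str, tools: RetrievalTools, llm: Callable[[str, str], str], top_k: int = 3) -> str:
    """Search, fetch full text for the top hits, then let an external LLM write the answer."""
    hits = tools.search(question, limit=top_k)
    context = "\n\n".join(
        f"[{h.chunk_id}] ({h.library})\n{tools.get_chunk(h.chunk_id)}" for h in hits
    )
    # Synthesis happens here, in the client, never inside Mnemosyne.
    return llm(SYSTEM_PROMPT, f"Question: {question}\n\nChunks:\n{context}")
```

Because `RetrievalTools` is a structural protocol, the same function works against a real MCP client wrapper or an in-memory fake in tests.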
## Documentation

- **[Architecture Documentation](docs/mnemosyne.html)**: Full system architecture with diagrams
- **[Phase 1: Foundation](docs/PHASE_1_FOUNDATION.md)**: Project skeleton, Neo4j data model, content-type system
- **[Phase 2: Embedding Pipeline](docs/PHASE_2_EMBEDDING_PIPELINE.md)**: Qwen3-VL multimodal embedding
- **[Phase 3: Search & Re-ranking](docs/PHASE_3_SEARCH_AND_RERANKING.md)**: Hybrid search + re-ranker
- **[Phase 5: MCP Server](docs/PHASE_5_MCP_SERVER.md)**: Retrieval primitives for LLMs (`search`, `get_chunk`, `list_libraries`, …)
- **[Phase 6: Backport to Spelunker](docs/PHASE_6_BACKPORT_TO_SPELUNKER.md)**: Proven patterns flowing back