"The electric light did not come from the continuous improvement of candles." — Oren Harari
Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI. Named after the Titan goddess of memory, it understands what kind of knowledge it holds and makes it searchable through text, images, and natural language.
Mnemosyne treats content type as a first-class concept. Unlike generic knowledge bases that handle all documents identically, it understands the difference between a novel, a technical manual, album artwork, and a journal entry — and adjusts its chunking, embedding, search, and LLM prompting accordingly.
Mnemosyne's RAG pipeline architecture is inspired by Spelunker, an enterprise RFP response platform built on Django, PostgreSQL/pgvector, and LangChain. Its proven patterns — hybrid search, two-stage RAG, citation-based retrieval, async document processing, and SME-approved knowledge bases — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships; patterns proven in Mnemosyne will in turn be backported to Spelunker.
mnemosyne/
├── mnemosyne/ # Django settings, URLs, WSGI/ASGI
├── core/ # Users, auth, profiles
├── library/ # Neo4j models (Library, Collection, Item, Chunk, Concept)
├── engine/ # RAG pipeline services
│ ├── embeddings.py # Qwen3-VL embedding client
│ ├── reranker.py # Qwen3-VL reranker client
│ ├── search.py # Hybrid search (vector + graph + full-text)
│ ├── pipeline.py # Two-stage RAG (responder + reviewer)
│ ├── llm_client.py # OpenAI-compatible LLM client
│ └── content_types.py # Library type definitions
├── mcp_server/ # MCP tool definitions
├── importers/ # Content import tools
├── llm_manager/ # LLM API/model config (ported from Spelunker)
├── static/
├── templates/
├── docker-compose.yml
├── pyproject.toml
└── manage.py
Neo4j stores all content knowledge: libraries, collections, items, chunks, concepts, and their relationships + vector embeddings. PostgreSQL stores only Django operational data: users, auth, LLM configurations, analytics, and Celery results. Content never lives in PostgreSQL.
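The split can be made concrete in Django settings. A minimal sketch, assuming standard Django database configuration; the `NEO4J` settings dict and its key names are project-specific illustrations, not Django built-ins:

```python
# PostgreSQL: Django operational data only (users, auth, LLM config,
# analytics, Celery results). No content ever lands here.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mnemosyne",
        "HOST": "postgres",
        "PORT": 5432,
    }
}

# Neo4j: all content knowledge — libraries, collections, items, chunks,
# concepts, relationships, and vector embeddings. Key names are
# illustrative; a driver or OGM layer would read them at startup.
NEO4J = {
    "BOLT_URL": "bolt://neo4j:7687",
    "DATABASE": "mnemosyne",  # separate database from Spelunker's
}
```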
| Node | Key Properties | Vector? |
|---|---|---|
| Library | name, library_type, chunking_config, embedding_instruction, llm_context_prompt | No |
| Collection | name, description, metadata | No |
| Item | title, item_type, s3_key, content_hash, metadata, created_at | No |
| Chunk | chunk_index, chunk_s3_key, chunk_size, embedding (4096d) | Yes |
| Concept | name, concept_type, embedding (4096d) | Yes |
| Image | s3_key, image_type, description, metadata | No |
| ImageEmbedding | embedding (4096d multimodal) | Yes |
| Relationship | From → To | Properties |
|---|---|---|
| CONTAINS | Library → Collection | — |
| CONTAINS | Collection → Item | position |
| HAS_CHUNK | Item → Chunk | — |
| HAS_IMAGE | Item → Image | image_role |
| HAS_EMBEDDING | Image → ImageEmbedding | — |
| REFERENCES | Item → Concept | relevance |
| MENTIONS | Chunk → Concept | — |
| RELATED_TO | Item → Item | relationship_type, weight |
| RELATED_TO | Concept → Concept | relationship_type |
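To show how the node and relationship tables above fit together at ingest time, here is a stdlib-only sketch that assembles a Cypher statement wiring a new Item under a Collection and attaching its Chunks. The function name and its string-building approach are illustrative; real code would go through the Neo4j driver with parameters:

```python
def ingest_item_cypher(n_chunks: int) -> str:
    """Build a parameterised Cypher statement that creates one Item,
    links it under a Collection (CONTAINS, with position), and attaches
    its Chunks (HAS_CHUNK), following the relationship table above."""
    lines = [
        "MATCH (col:Collection {name: $collection})",
        "MERGE (i:Item {title: $title})",
        "MERGE (col)-[:CONTAINS {position: $position}]->(i)",
    ]
    for k in range(n_chunks):
        # One Chunk node per text segment; embeddings are set separately.
        lines.append(f"MERGE (c{k}:Chunk {{chunk_index: {k}}})")
        lines.append(f"MERGE (i)-[:HAS_CHUNK]->(c{k})")
    return "\n".join(lines)

query = ingest_item_cypher(2)
```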
// Chunk text+image embeddings (4096 dimensions, no pgvector limits!)
CREATE VECTOR INDEX chunk_embedding FOR (c:Chunk)
ON (c.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Concept embeddings for semantic concept search
CREATE VECTOR INDEX concept_embedding FOR (con:Concept)
ON (con.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Image multimodal embeddings
CREATE VECTOR INDEX image_embedding FOR (ie:ImageEmbedding)
ON (ie.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Full-text index for keyword/BM25-style search
CREATE FULLTEXT INDEX chunk_fulltext FOR (c:Chunk) ON EACH [c.text_preview]
Each Library has a library_type that defines how content is chunked, what embedding instructions are sent to Qwen3-VL, what re-ranking instructions are used, and what context prompt is injected when the LLM generates answers. This is configured per library in the database — not hardcoded.
Chunking: Chapter-aware, preserve dialogue blocks, narrative flow
Embedding Instruction: "Represent the narrative passage for literary retrieval, capturing themes, characters, and plot elements"
Reranker Instruction: "Score relevance of this fiction excerpt to the query, considering narrative themes and character arcs"
LLM Context: "The following excerpts are from fiction. Interpret as narrative — consider themes, symbolism, character development."
Multimodal: Cover art, illustrations
Graph: Author → Book → Character → Theme
Chunking: Section/heading-aware, preserve code blocks and tables as atomic units
Embedding Instruction: "Represent the technical documentation for precise procedural retrieval"
Reranker Instruction: "Score relevance of this technical documentation to the query, prioritizing procedural accuracy"
LLM Context: "The following excerpts are from technical documentation. Provide precise, actionable instructions."
Multimodal: Diagrams, screenshots, wiring diagrams
Graph: Product → Manual → Section → Procedure → Tool
Chunking: Song-level (lyrics as one chunk), verse/chorus segmentation
Embedding Instruction: "Represent the song lyrics and album context for music discovery and thematic analysis"
Reranker Instruction: "Score relevance considering lyrical themes, musical context, and artist style"
LLM Context: "The following excerpts are song lyrics and music metadata. Interpret in musical and cultural context."
Multimodal: Album artwork, liner note images
Graph: Artist → Album → Track → Genre; Track → SAMPLES → Track
Chunking: Scene-level for scripts, paragraph-level for synopses
Embedding Instruction: "Represent the film content for cinematic retrieval, capturing visual and narrative elements"
Multimodal: Movie stills, posters, screenshots
Graph: Director → Film → Scene → Actor; Film → BASED_ON → Book
Chunking: Description-level, catalog entry as unit
Embedding Instruction: "Represent the artwork and its description for visual and stylistic retrieval"
Multimodal: The artwork itself — primary content is visual
Graph: Artist → Piece → Style → Movement; Piece → INSPIRED_BY → Piece
Chunking: Entry-level (one entry = one chunk), paragraph split for long entries
Embedding Instruction: "Represent the personal journal entry for temporal and reflective retrieval"
Multimodal: Photos, sketches attached to entries
Graph: Date → Entry → Topic; Entry → MENTIONS → Person/Place
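Since these per-type behaviours live on the Library node rather than in code, `engine/content_types.py` might only need a small value object for loading and validating them. A sketch, assuming a dataclass shape (the class and field names are illustrative); the fiction strings are taken verbatim from the configuration above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LibraryTypeConfig:
    """Per-library-type behaviour. In Mnemosyne this is stored on the
    Library node (chunking_config, embedding_instruction,
    llm_context_prompt), not hardcoded; this object just carries it."""
    name: str
    chunking: str
    embedding_instruction: str
    reranker_instruction: str
    llm_context: str

FICTION = LibraryTypeConfig(
    name="fiction",
    chunking="chapter-aware; preserve dialogue blocks and narrative flow",
    embedding_instruction=(
        "Represent the narrative passage for literary retrieval, "
        "capturing themes, characters, and plot elements"
    ),
    reranker_instruction=(
        "Score relevance of this fiction excerpt to the query, "
        "considering narrative themes and character arcs"
    ),
    llm_context=(
        "The following excerpts are from fiction. Interpret as narrative — "
        "consider themes, symbolism, character development."
    ),
)
```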
Stage 1 — Embedding (Qwen3-VL-Embedding-8B): Generates 4096-dimensional vectors from text, images, screenshots, and video in a unified semantic space. Accepts content-type-specific instructions for optimized representations.
Stage 2 — Re-ranking (Qwen3-VL-Reranker-8B): Takes (query, document) pairs — where both can be multimodal — and outputs precise relevance scores via cross-attention. Dramatically sharpens retrieval accuracy.
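The two stages map onto two HTTP payloads. A hedged sketch: vLLM serves pooling models behind an OpenAI-compatible `/v1/embeddings` endpoint and a `/score` endpoint for (query, document) scoring, but the instruction-wrapping format shown here is one common convention for instruction-aware embedders, not a documented Qwen3-VL contract — the exact template is model-specific:

```python
def embedding_request(text: str, instruction: str) -> dict:
    """Payload for the embedding service (:8002). Prepending the
    library's embedding instruction steers the representation; the
    'Instruct:/Query:' framing is an assumed convention."""
    return {
        "model": "Qwen3-VL-Embedding-8B",
        "input": f"Instruct: {instruction}\nQuery: {text}",
    }

def rerank_request(query: str, documents: list[str]) -> dict:
    """Payload for the reranker's score endpoint (:8001): text_1 is the
    query, text_2 the candidate documents; the service returns one
    relevance score per pair."""
    return {
        "model": "Qwen3-VL-Reranker-8B",
        "text_1": query,
        "text_2": documents,
    }
```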
Traditional RAG systems OCR images and diagrams, producing garbled text; multimodal embedding understands the visual content directly.
Cosine similarity via Neo4j vector index on Chunk and ImageEmbedding nodes.
CALL db.index.vector.queryNodes(
'chunk_embedding', 30,
$query_vector
) YIELD node, score
WHERE score > $threshold
RETURN node, score
Walk relationships to find contextually related content that vector search alone would miss.
MATCH (c:Chunk)<-[:HAS_CHUNK]-(i:Item)
-[:REFERENCES]->(con:Concept)
-[:RELATED_TO]-(con2:Concept)
<-[:REFERENCES]-(i2:Item)
-[:HAS_CHUNK]->(c2:Chunk)
RETURN c2, i2
Neo4j native full-text index for keyword matching (BM25-equivalent).
CALL db.index.fulltext.queryNodes(
'chunk_fulltext',
$query_text
) YIELD node, score
RETURN node, score
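The three modalities return separately scored lists whose scores are not directly comparable. One simple, scale-free way to merge them before re-ranking is reciprocal rank fusion; a stdlib sketch (the function name is illustrative, and whether Mnemosyne uses RRF or weighted scores is an implementation choice):

```python
def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per
    item, so agreement across vector, graph, and full-text search floats
    a chunk upward without normalising incomparable scores."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse([
    ["c1", "c2", "c3"],   # vector search
    ["c2", "c4"],         # graph traversal
    ["c2", "c1"],         # full-text
])
# "c2" appears in all three lists, so it ranks first.
```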
Mnemosyne exposes its capabilities as MCP tools, making the entire knowledge base accessible to Claude, Copilot, and any MCP-compatible LLM client. The MCP server is a primary interface, not an afterthought.
| Tool | Description |
|---|---|
| search_library | Semantic + graph + full-text search with re-ranking. Filters by library, collection, content type. |
| ask_about | Full RAG pipeline — search, re-rank, content-type context injection, LLM response with citations. |
| find_similar | Find items similar to a given item using vector similarity. Optionally search across libraries. |
| search_by_image | Multimodal search — find content matching an uploaded image. |
| explore_connections | Traverse knowledge graph from an item — find related concepts, authors, themes. |
| Tool | Description |
|---|---|
| browse_libraries | List all libraries with their content types and item counts. |
| browse_collections | List collections within a library. |
| get_item | Get detailed info about a specific item, including metadata and graph connections. |
| add_content | Add new content to a library — triggers async embedding + graph construction. |
| get_concepts | List extracted concepts for an item or across a library. |
| | Reranker Service | Embedding Service |
|---|---|---|
| Model | Qwen3-VL-Reranker-8B | Qwen3-VL-Embedding-8B |
| VRAM (bf16) | ~18GB | ~18GB |
| Serving | vLLM --runner pooling | vLLM --runner pooling |
| Port | :8001 | :8002 |
| Role | Multimodal re-ranking | Multimodal embedding |
| Headroom | ~14GB for chat model | ~6GB |
Text-only Qwen3-Reranker-0.6B GGUF served via llama-server on existing systemd/Ansible infrastructure. Managed by the same playbooks, monitored by the same Grafana dashboards. Used when vLLM services are down or for text-only workloads.
Mnemosyne and Spelunker share: GPU model services (llama.cpp + vLLM), MinIO/S3 (separate buckets), Neo4j (separate databases), RabbitMQ (separate vhosts), and Grafana monitoring. Each is its own Docker Compose stack but points to shared infra.
Mnemosyne proves the architecture with no legacy constraints. Once validated, proven components flow back to Spelunker to enhance its RFP workflow with multimodal understanding and re-ranking precision.
| Component | Mnemosyne (Prove) | Spelunker (Backport) |
|---|---|---|
| RerankerService | Qwen3-VL multimodal + llama.cpp text | Drop into rag/services/reranker.py |
| Multimodal Embedding | Qwen3-VL-Embedding via vLLM | Add alongside OpenAI embeddings, MRL@1536d for pgvector compat |
| Diagram Understanding | Image pages embedded multimodally | PDF diagrams in RFP docs become searchable |
| MCP Server | Primary interface from day one | Add as secondary interface to Spelunker |
| Neo4j (optional) | Primary vector + graph store | Could replace pgvector, or run alongside |
| Content-Type Config | Library type definitions | Adapt as document classification in Spelunker |
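The "MRL@1536d" note above assumes the embeddings are Matryoshka-trained, so a 4096d vector can be truncated and re-normalised for pgvector, whose index types cap out at 2000 dimensions. A sketch of the backport shim (function name illustrative; whether Qwen3-VL embeddings are actually MRL-trained should be verified against the model card):

```python
import math

def mrl_truncate(vec: list[float], dim: int = 1536) -> list[float]:
    """Truncate a Matryoshka (MRL) embedding to its first `dim`
    components and re-normalise to unit length, keeping cosine
    similarity meaningful at the reduced dimensionality."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

short = mrl_truncate([0.01] * 4096)
```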
This document describes the target architecture for Mnemosyne. Phase implementation documents provide detailed build plans.