Mnemosyne Architecture

"The electric light did not come from the continuous improvement of candles." — Oren Harari

Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI. Named after the Titan goddess of memory, it understands what kind of knowledge it holds and makes it searchable through text, images, and natural language.

Overview

Purpose

Mnemosyne is a personal knowledge management system that treats content type as a first-class concept. Unlike generic knowledge bases that treat all documents identically, Mnemosyne understands the difference between a novel, a technical manual, album artwork, and a journal entry — and adjusts its chunking, embedding, search, and LLM prompting accordingly.

Knowledge Graph

  • Neo4j stores relationships between content, not just vectors
  • Author → Book → Character → Theme traversals
  • Artist → Album → Track → Genre connections
  • No vector dimension limits (full 4096d Qwen3-VL)
  • Graph + vector + full-text search in one database

Multimodal AI

  • Qwen3-VL-Embedding: text + images + video in one vector space
  • Qwen3-VL-Reranker: cross-attention scoring across modalities
  • Album art, diagrams, screenshots become searchable
  • Local GPU inference (5090 + 3090) — zero API costs
  • llama.cpp text fallback via existing Ansible/systemd infra

Content-Type Awareness

  • Library types define chunking, embedding, and prompt behavior
  • Fiction: narrative-aware chunking, character extraction
  • Technical: section-aware, code block preservation
  • Music: lyrics as primary, metadata-heavy (genre, mood)
  • Each type injects context into the LLM prompt

Key Differentiators

  • Content-type-aware pipeline — chunking, embedding instructions, re-ranking instructions, and LLM context all adapt per library type
  • Neo4j knowledge graph — traversable relationships, not just flat vector similarity
  • Full multimodal — Qwen3-VL processes images, diagrams, album art alongside text in a unified vector space
  • No dimension limits — Neo4j handles 4096d vectors natively (pgvector caps at 2000)
  • MCP-first interface — designed for LLM integration from day one
  • Proven RAG architecture — two-stage responder/reviewer pattern inherited from Spelunker
  • Local GPU inference — zero ongoing API costs via vLLM + llama.cpp on RTX 5090/3090

Heritage

Mnemosyne's RAG pipeline architecture is inspired by Spelunker, an enterprise RFP response platform built on Django, PostgreSQL/pgvector, and LangChain. The proven patterns — hybrid search, two-stage RAG, citation-based retrieval, async document processing, and SME-approved knowledge bases — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships. In turn, patterns that Mnemosyne proves out will be backported to Spelunker.

System Architecture

High-Level Architecture

graph TB
  subgraph Clients["Client Layer"]
    MCP["MCP Clients (Claude, Copilot, etc.)"]
    UI["Django Web UI"]
    API["REST API (DRF)"]
  end
  subgraph App["Application Layer — Django"]
    Core["core/ — Users, Auth"]
    Library["library/ — Libraries, Collections, Items"]
    Engine["engine/ — Embedding, Search, Reranker, RAG"]
    MCPServer["mcp_server/ — MCP Tool Interface"]
    Importers["importers/ — File, Calibre, Web"]
  end
  subgraph Data["Data Layer"]
    Neo4j["Neo4j 5.x — Knowledge Graph + Vectors"]
    PG["PostgreSQL — Auth, Config, Analytics"]
    S3["S3/MinIO — Content + Chunks"]
    RMQ["RabbitMQ — Task Queue"]
  end
  subgraph GPU["GPU Services"]
    vLLM_E["vLLM — Qwen3-VL-Embedding-8B (Multimodal Embed)"]
    vLLM_R["vLLM — Qwen3-VL-Reranker-8B (Multimodal Rerank)"]
    LCPP["llama.cpp — Qwen3-Reranker-0.6B (Text Fallback)"]
    LCPP_C["llama.cpp — Qwen3 Chat (RAG Responder)"]
  end
  MCP --> MCPServer
  UI --> Core
  API --> Library
  API --> Engine
  MCPServer --> Engine
  MCPServer --> Library
  Library --> Neo4j
  Engine --> Neo4j
  Engine --> S3
  Core --> PG
  Engine --> vLLM_E
  Engine --> vLLM_R
  Engine --> LCPP
  Engine --> LCPP_C
  Library --> RMQ

Django Apps

  • core/ — Users, authentication, profiles, permissions
  • library/ — Libraries, Collections, Items, Chunks, Concepts (Neo4j models)
  • engine/ — Embedding, search, reranker, RAG pipeline services
  • mcp_server/ — MCP tool definitions and server interface
  • importers/ — Content acquisition (file upload, Calibre, web scrape)
  • llm_manager/ — LLM API/model config, usage tracking (from Spelunker)

Technology Stack

  • Django 5.x, Python ≥3.12, Django REST Framework
  • Neo4j 5.x + django-neomodel — knowledge graph + vector index
  • PostgreSQL — Django auth, config, analytics only
  • S3/MinIO — all content and chunk storage
  • Celery + RabbitMQ — async embedding and graph construction
  • vLLM ≥0.14 — Qwen3-VL multimodal serving
  • llama.cpp — text model serving (existing Ansible infra)
  • MCP SDK — Model Context Protocol server

Project Structure

mnemosyne/
├── mnemosyne/          # Django settings, URLs, WSGI/ASGI
├── core/               # Users, auth, profiles
├── library/            # Neo4j models (Library, Collection, Item, Chunk, Concept)
├── engine/             # RAG pipeline services
│   ├── embeddings.py   # Qwen3-VL embedding client
│   ├── reranker.py     # Qwen3-VL reranker client
│   ├── search.py       # Hybrid search (vector + graph + full-text)
│   ├── pipeline.py     # Two-stage RAG (responder + reviewer)
│   ├── llm_client.py   # OpenAI-compatible LLM client
│   └── content_types.py # Library type definitions
├── mcp_server/         # MCP tool definitions
├── importers/          # Content import tools
├── llm_manager/        # LLM API/model config (ported from Spelunker)
├── static/
├── templates/
├── docker-compose.yml
├── pyproject.toml
└── manage.py

Data Model — Neo4j Knowledge Graph

Dual Database Strategy

Neo4j stores all content knowledge: libraries, collections, items, chunks, concepts, and their relationships + vector embeddings. PostgreSQL stores only Django operational data: users, auth, LLM configurations, analytics, and Celery results. Content never lives in PostgreSQL.
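
A minimal sketch of how this split might look in Django settings. The database names, hosts, and credentials are illustrative; django-neomodel is configured through a bolt URL setting rather than through `DATABASES`:

```python
# Sketch of the dual-database wiring (values illustrative).
# PostgreSQL is the only relational database Django knows about;
# Neo4j is reached via django-neomodel's bolt URL, not via DATABASES.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mnemosyne",
        "HOST": "postgres",
        "PORT": 5432,
    }
}

# django-neomodel reads this setting to configure the neomodel driver.
NEOMODEL_NEO4J_BOLT_URL = "bolt://neo4j:password@neo4j:7687"
```

With this split, neomodel classes (Library, Item, Chunk, …) persist to Neo4j while ordinary Django ORM models (users, LLM configs) persist to PostgreSQL.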

Graph Schema

graph LR
  L["Library (fiction, technical, music, art, journal)"]
  Col["Collection (genre, author, artist, project)"]
  I["Item (book, manual, album, film, entry)"]
  Ch["Chunk (text + optional image + 4096d vector)"]
  Con["Concept (person, topic, technique, theme)"]
  Img["Image (cover, diagram, artwork, still)"]
  ImgE["ImageEmbedding (4096d multimodal vector)"]
  L -->|CONTAINS| Col
  Col -->|CONTAINS| I
  I -->|HAS_CHUNK| Ch
  I -->|REFERENCES| Con
  I -->|RELATED_TO| I
  Con -->|RELATED_TO| Con
  Ch -->|MENTIONS| Con
  I -->|HAS_IMAGE| Img
  Img -->|HAS_EMBEDDING| ImgE

Core Nodes

Node            Key Properties                                                                  Vector?
Library         name, library_type, chunking_config, embedding_instruction, llm_context_prompt  No
Collection      name, description, metadata                                                     No
Item            title, item_type, s3_key, content_hash, metadata, created_at                    No
Chunk           chunk_index, chunk_s3_key, chunk_size, embedding (4096d)                        Yes
Concept         name, concept_type, embedding (4096d)                                           Yes
Image           s3_key, image_type, description, metadata                                       No
ImageEmbedding  embedding (4096d multimodal)                                                    Yes

Relationships

Relationship   From → To               Properties
CONTAINS       Library → Collection    —
CONTAINS       Collection → Item       position
HAS_CHUNK      Item → Chunk            —
HAS_IMAGE      Item → Image            image_role
HAS_EMBEDDING  Image → ImageEmbedding  —
REFERENCES     Item → Concept          relevance
MENTIONS       Chunk → Concept         —
RELATED_TO     Item → Item             relationship_type, weight
RELATED_TO     Concept → Concept       relationship_type

Neo4j Vector Indexes

// Chunk text+image embeddings (4096 dimensions, no pgvector limits!)
CREATE VECTOR INDEX chunk_embedding FOR (c:Chunk)
ON (c.embedding) OPTIONS {indexConfig: {
  `vector.dimensions`: 4096,
  `vector.similarity_function`: 'cosine'
}}

// Concept embeddings for semantic concept search
CREATE VECTOR INDEX concept_embedding FOR (con:Concept)
ON (con.embedding) OPTIONS {indexConfig: {
  `vector.dimensions`: 4096,
  `vector.similarity_function`: 'cosine'
}}

// Image multimodal embeddings
CREATE VECTOR INDEX image_embedding FOR (ie:ImageEmbedding)
ON (ie.embedding) OPTIONS {indexConfig: {
  `vector.dimensions`: 4096,
  `vector.similarity_function`: 'cosine'
}}

// Full-text index for keyword/BM25-style search
CREATE FULLTEXT INDEX chunk_fulltext FOR (c:Chunk) ON EACH [c.text_preview]

Content Type System

The Core Innovation

Each Library has a library_type that defines how content is chunked, what embedding instructions are sent to Qwen3-VL, what re-ranking instructions are used, and what context prompt is injected when the LLM generates answers. This is configured per library in the database — not hardcoded.
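The per-type configuration can be sketched as a small registry. The dataclass and field names here are illustrative, not Mnemosyne's actual code; the instruction strings are taken from the Fiction and Technical profiles below:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LibraryTypeConfig:
    chunking: str                 # chunking strategy identifier
    embedding_instruction: str    # sent to Qwen3-VL-Embedding
    reranker_instruction: str     # sent to Qwen3-VL-Reranker
    llm_context_prompt: str       # injected before LLM generation

LIBRARY_TYPES = {
    "fiction": LibraryTypeConfig(
        chunking="chapter_aware",
        embedding_instruction=(
            "Represent the narrative passage for literary retrieval, "
            "capturing themes, characters, and plot elements"
        ),
        reranker_instruction=(
            "Score relevance of this fiction excerpt to the query, "
            "considering narrative themes and character arcs"
        ),
        llm_context_prompt=(
            "The following excerpts are from fiction. Interpret as narrative — "
            "consider themes, symbolism, character development."
        ),
    ),
    "technical": LibraryTypeConfig(
        chunking="section_aware",
        embedding_instruction=(
            "Represent the technical documentation for precise procedural retrieval"
        ),
        reranker_instruction=(
            "Score relevance of this technical documentation to the query, "
            "prioritizing procedural accuracy"
        ),
        llm_context_prompt=(
            "The following excerpts are from technical documentation. "
            "Provide precise, actionable instructions."
        ),
    ),
}

def config_for(library_type: str) -> LibraryTypeConfig:
    """Look up the pipeline configuration for a library type."""
    return LIBRARY_TYPES[library_type]
```

Because the config travels with the Library node, adding a new content type is a data change, not a code change.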

Fiction

Chunking: Chapter-aware, preserve dialogue blocks, narrative flow

Embedding Instruction: "Represent the narrative passage for literary retrieval, capturing themes, characters, and plot elements"

Reranker Instruction: "Score relevance of this fiction excerpt to the query, considering narrative themes and character arcs"

LLM Context: "The following excerpts are from fiction. Interpret as narrative — consider themes, symbolism, character development."

Multimodal: Cover art, illustrations

Graph: Author → Book → Character → Theme

Technical

Chunking: Section/heading-aware, preserve code blocks and tables as atomic units

Embedding Instruction: "Represent the technical documentation for precise procedural retrieval"

Reranker Instruction: "Score relevance of this technical documentation to the query, prioritizing procedural accuracy"

LLM Context: "The following excerpts are from technical documentation. Provide precise, actionable instructions."

Multimodal: Diagrams, screenshots, wiring diagrams

Graph: Product → Manual → Section → Procedure → Tool

Music

Chunking: Song-level (lyrics as one chunk), verse/chorus segmentation

Embedding Instruction: "Represent the song lyrics and album context for music discovery and thematic analysis"

Reranker Instruction: "Score relevance considering lyrical themes, musical context, and artist style"

LLM Context: "The following excerpts are song lyrics and music metadata. Interpret in musical and cultural context."

Multimodal: Album artwork, liner note images

Graph: Artist → Album → Track → Genre; Track → SAMPLES → Track

Film

Chunking: Scene-level for scripts, paragraph-level for synopses

Embedding Instruction: "Represent the film content for cinematic retrieval, capturing visual and narrative elements"

Multimodal: Movie stills, posters, screenshots

Graph: Director → Film → Scene → Actor; Film → BASED_ON → Book

Art

Chunking: Description-level, catalog entry as unit

Embedding Instruction: "Represent the artwork and its description for visual and stylistic retrieval"

Multimodal: The artwork itself — primary content is visual

Graph: Artist → Piece → Style → Movement; Piece → INSPIRED_BY → Piece

Journals

Chunking: Entry-level (one entry = one chunk), paragraph split for long entries

Embedding Instruction: "Represent the personal journal entry for temporal and reflective retrieval"

Multimodal: Photos, sketches attached to entries

Graph: Date → Entry → Topic; Entry → MENTIONS → Person/Place
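
The chunking strategies above can be sketched as a dispatch table keyed by library type. The chunker implementations here are naive stand-ins for illustration, not Mnemosyne's actual chunkers:

```python
def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Greedy paragraph packer: split on blank lines, pack up to max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def chunk_journal_entries(text: str) -> list[str]:
    """Journal libraries: one entry per chunk (entries assumed '---'-separated)."""
    return [entry.strip() for entry in text.split("---") if entry.strip()]

CHUNKERS = {
    "journal": chunk_journal_entries,
    "fiction": chunk_paragraphs,    # stand-in for chapter-aware chunking
    "technical": chunk_paragraphs,  # stand-in for section-aware chunking
}

def chunk_for_library(library_type: str, text: str) -> list[str]:
    """Pick the chunker for a library type, falling back to paragraph packing."""
    return CHUNKERS.get(library_type, chunk_paragraphs)(text)
```

The real chunkers would additionally respect chapter boundaries, code blocks, and verse/chorus structure as described per type above.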

Multimodal Embedding & Re-ranking Pipeline

Two-Stage Multimodal Pipeline

Stage 1 — Embedding (Qwen3-VL-Embedding-8B): Generates 4096-dimensional vectors from text, images, screenshots, and video in a unified semantic space. Accepts content-type-specific instructions for optimized representations.

Stage 2 — Re-ranking (Qwen3-VL-Reranker-8B): Takes (query, document) pairs — where both can be multimodal — and outputs precise relevance scores via cross-attention. Dramatically sharpens retrieval accuracy.

Embedding & Ingestion Flow

flowchart TD
  A["New Content (file upload, import)"] --> B{"Content Type?"}
  B -->|"Text (PDF, DOCX, MD)"| C["Parse Text + Extract Images"]
  B -->|"Image (art, photo)"| D["Image Only"]
  B -->|"Mixed (manual + diagrams)"| E["Parse Text + Keep Page Images"]
  C --> F["Chunk Text (content-type-aware)"]
  D --> G["Image to S3"]
  E --> F
  E --> G
  F --> H["Store Chunks in S3"]
  H --> I["Qwen3-VL-Embedding (text + instruction)"]
  G --> J["Qwen3-VL-Embedding (image + instruction)"]
  I --> K["4096d Vector"]
  J --> K
  K --> L["Store in Neo4j Chunk/ImageEmbedding Node"]
  L --> M["Extract Concepts (LLM entity extraction)"]
  M --> N["Create Concept Nodes + REFERENCES/MENTIONS edges"]

Qwen3-VL-Embedding-8B

  • Dimensions: 4096 (full), or MRL truncation to 3072/2048/1536/1024
  • Input: Text, images, screenshots, video, or any mix
  • Instruction-aware: Content-type instruction improves quality 1–5%
  • Quantization: Int8 (~8GB VRAM), Int4 (~4GB VRAM)
  • Serving: vLLM with --runner pooling
  • Languages: 30+ languages supported
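
A sketch of an embedding client against the vLLM OpenAI-compatible endpoint, plus MRL truncation (keep a prefix of the vector, then L2-renormalize). The URL path, model name, and "Instruct:/Query:" prompt format are assumptions, not confirmed API details:

```python
import json
import math
import urllib.request

EMBED_URL = "http://localhost:8002/v1/embeddings"  # port from the GPU services section

def embed(text: str, instruction: str) -> list[float]:
    """Call the vLLM OpenAI-compatible embeddings endpoint (shapes assumed)."""
    payload = json.dumps({
        "model": "Qwen3-VL-Embedding-8B",
        "input": f"Instruct: {instruction}\nQuery: {text}",  # format assumed
    }).encode()
    req = urllib.request.Request(
        EMBED_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def mrl_truncate(vec: list[float], dims: int) -> list[float]:
    """MRL-style truncation: keep the first `dims` components, renormalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]
```

`mrl_truncate` is what would let a backport target like pgvector consume a 1536d prefix of the same 4096d embedding.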

Qwen3-VL-Reranker-8B

  • Architecture: Single-tower cross-attention (deep query↔document interaction)
  • Input: (query, document) pairs — both can be multimodal
  • Output: Relevance score (sigmoid of yes/no token probabilities)
  • Instruction-aware: Custom re-ranking instructions per content type
  • Serving: vLLM with --runner pooling + score endpoint
  • Fallback: Qwen3-Reranker-0.6B via llama.cpp (text-only)
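
A sketch of a reranker client. The scoring endpoint's request and response shapes are assumptions modeled on vLLM's score API, and the sigmoid helper mirrors the yes/no scoring described above:

```python
import json
import math
import urllib.request

RERANK_URL = "http://localhost:8001/score"  # port from the GPU services section; path assumed

def rerank(query: str, documents: list[str], instruction: str) -> list[tuple[int, float]]:
    """Score (query, document) pairs via the serving endpoint (shapes assumed);
    returns (document index, score) pairs sorted by descending relevance."""
    payload = json.dumps({
        "model": "Qwen3-VL-Reranker-8B",
        "text_1": f"{instruction}\n{query}",
        "text_2": documents,
    }).encode()
    req = urllib.request.Request(
        RERANK_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        scores = [d["score"] for d in json.load(resp)["data"]]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

def yes_no_to_score(yes_logit: float, no_logit: float) -> float:
    """Relevance score as the sigmoid of the yes/no logit margin."""
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))
```

Swapping `RERANK_URL` to the llama.cpp fallback service is the intended failover path for text-only workloads.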

Why Multimodal Matters

Traditional RAG systems OCR images and diagrams, producing garbled text. Multimodal embedding understands the visual content directly:

  • Technical diagrams: Wiring diagrams, network topologies, architecture diagrams — searchable by visual content, not OCR garbage
  • Album artwork: "psychedelic album covers from the 70s" finds matching art via visual similarity
  • Art: The actual painting/sculpture becomes the searchable content, not just its text description
  • PDF pages: Image-only PDF pages with charts and tables are embedded as images, not skipped

Search Pipeline — GraphRAG + Vector + Re-rank

Search Flow

flowchart TD
  Q["User Query"] --> E["Embed Query (Qwen3-VL-Embedding)"]
  E --> VS["1. Vector Search (Neo4j vector index) — Top-K × 3 oversample"]
  E --> GT["2. Graph Traversal (Cypher queries) — Concept + relationship walks"]
  Q --> FT["3. Full-Text Search (Neo4j fulltext index) — Keyword matching"]
  VS --> F["Candidate Fusion + Deduplication"]
  GT --> F
  FT --> F
  F --> RR["4. Re-Rank (Qwen3-VL-Reranker) — Cross-attention scoring"]
  RR --> TK["Top-K Results"]
  TK --> CTX["Inject Content-Type Context Prompt"]
  CTX --> LLM["5. LLM Responder (Two-stage RAG)"]
  LLM --> REV["6. LLM Reviewer (Quality + citation check)"]
  REV --> ANS["Final Answer with Citations"]

1. Vector Search

Cosine similarity via Neo4j vector index on Chunk and ImageEmbedding nodes.

CALL db.index.vector.queryNodes(
  'chunk_embedding', 30,
  $query_vector
) YIELD node, score
WHERE score > $threshold
RETURN node, score

2. Graph Traversal

Walk relationships to find contextually related content that vector search alone would miss.

MATCH (c:Chunk)-[:HAS_CHUNK]-(i:Item)
  -[:REFERENCES]->(con:Concept)
  -[:RELATED_TO]-(con2:Concept)
  <-[:REFERENCES]-(i2:Item)
  -[:HAS_CHUNK]->(c2:Chunk)
RETURN c2, i2

3. Full-Text Search

Neo4j native full-text index for keyword matching (BM25-equivalent).

CALL db.index.fulltext.queryNodes(
  'chunk_fulltext',
  $query_text
) YIELD node, score
RETURN node, score
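
The three candidate lists are then fused and deduplicated before re-ranking. Reciprocal rank fusion (RRF) is one common way to do this, sketched here over ranked lists of chunk IDs (the constant k = 60 is the conventional default, not a Mnemosyne-specific choice):

```python
def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked candidate lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per candidate; duplicates
    accumulate score, so items found by multiple searches rise.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused, deduplicated list is what gets handed to the Qwen3-VL-Reranker for precise cross-attention scoring.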

MCP Server Interface

MCP-First Design

Mnemosyne exposes its capabilities as MCP tools, making the entire knowledge base accessible to Claude, Copilot, and any MCP-compatible LLM client. The MCP server is a primary interface, not an afterthought.

Search & Retrieval Tools

  • search_library — Semantic + graph + full-text search with re-ranking. Filters by library, collection, content type.
  • ask_about — Full RAG pipeline: search, re-rank, content-type context injection, LLM response with citations.
  • find_similar — Find items similar to a given item using vector similarity. Optionally search across libraries.
  • search_by_image — Multimodal search: find content matching an uploaded image.
  • explore_connections — Traverse the knowledge graph from an item to find related concepts, authors, themes.

Management & Navigation Tools

  • browse_libraries — List all libraries with their content types and item counts.
  • browse_collections — List collections within a library.
  • get_item — Get detailed info about a specific item, including metadata and graph connections.
  • add_content — Add new content to a library; triggers async embedding + graph construction.
  • get_concepts — List extracted concepts for an item or across a library.

GPU Services

RTX 5090 (32GB VRAM)

  • Model: Qwen3-VL-Reranker-8B
  • VRAM (bf16): ~18GB
  • Serving: vLLM --runner pooling
  • Port: 8001
  • Role: Multimodal re-ranking
  • Headroom: ~14GB for chat model

RTX 3090 (24GB VRAM)

  • Model: Qwen3-VL-Embedding-8B
  • VRAM (bf16): ~18GB
  • Serving: vLLM --runner pooling
  • Port: 8002
  • Role: Multimodal embedding
  • Headroom: ~6GB

Fallback: llama.cpp (Existing Ansible Infra)

Text-only Qwen3-Reranker-0.6B GGUF served via llama-server on existing systemd/Ansible infrastructure. Managed by the same playbooks, monitored by the same Grafana dashboards. Used when vLLM services are down or for text-only workloads.

Deployment

Core Services

  • web: Django app (Gunicorn)
  • postgres: PostgreSQL (auth/config only)
  • neo4j: Neo4j 5.x (knowledge graph + vectors)
  • rabbitmq: Celery broker

Async Processing

  • celery-worker: Embedding, graph construction
  • celery-beat: Scheduled re-sync tasks

Storage & Proxy

  • minio: S3-compatible content storage
  • nginx: Static/proxy
  • mcp-server: MCP interface process

Shared Infrastructure with Spelunker

Mnemosyne and Spelunker share: GPU model services (llama.cpp + vLLM), MinIO/S3 (separate buckets), Neo4j (separate databases), RabbitMQ (separate vhosts), and Grafana monitoring. Each is its own Docker Compose stack but points to shared infra.

Backport Strategy to Spelunker

Build Forward, Backport Back

Mnemosyne proves the architecture with no legacy constraints. Once validated, proven components flow back to Spelunker to enhance its RFP workflow with multimodal understanding and re-ranking precision.

  • RerankerService — prove: Qwen3-VL multimodal + llama.cpp text; backport: drop into rag/services/reranker.py
  • Multimodal Embedding — prove: Qwen3-VL-Embedding via vLLM; backport: add alongside OpenAI embeddings, MRL@1536d for pgvector compat
  • Diagram Understanding — prove: image pages embedded multimodally; backport: PDF diagrams in RFP docs become searchable
  • MCP Server — prove: primary interface from day one; backport: add as secondary interface to Spelunker
  • Neo4j (optional) — prove: primary vector + graph store; backport: could replace pgvector, or run alongside
  • Content-Type Config — prove: library type definitions; backport: adapt as document classification in Spelunker

Documentation Complete

This document describes the target architecture for Mnemosyne. Phase implementation documents provide detailed build plans.