<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Mnemosyne — Architecture Documentation</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.0/font/bootstrap-icons.css" rel="stylesheet">
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<script>mermaid.initialize({ startOnLoad: true, theme: 'default' });</script>
</head>
<body>
<div class="container-fluid">
<nav class="navbar navbar-dark bg-dark rounded mb-4">
<div class="container-fluid">
<a class="navbar-brand" href="#"><i class="bi bi-book"></i> Mnemosyne — Architecture Documentation</a>
<div class="navbar-nav d-flex flex-row">
<a class="nav-link me-3" href="#overview">Overview</a>
<a class="nav-link me-3" href="#architecture">Architecture</a>
<a class="nav-link me-3" href="#data-model">Data Model</a>
<a class="nav-link me-3" href="#content-types">Content Types</a>
<a class="nav-link me-3" href="#multimodal-pipeline">Multimodal</a>
<a class="nav-link me-3" href="#search-pipeline">Search</a>
<a class="nav-link me-3" href="#mcp-interface">MCP</a>
<a class="nav-link me-3" href="#gpu-services">GPU</a>
<a class="nav-link" href="#deployment">Deployment</a>
</div>
</div>
</nav>
<div class="row">
<div class="col-12">
<h1 class="display-4 mb-2"><i class="bi bi-book-fill"></i> Mnemosyne <span class="badge bg-primary">Architecture</span></h1>
<p class="lead text-muted fst-italic">"The electric light did not come from the continuous improvement of candles." — Oren Harari</p>
<p class="lead">Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI. Named after the Titan goddess of memory, it understands <em>what kind</em> of knowledge it holds and makes it searchable through text, images, and natural language.</p>
</div>
</div>
<!-- SECTION: OVERVIEW -->
<section id="overview" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-info-circle"></i> Overview</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>Purpose</h3>
<p><strong>Mnemosyne</strong> is a personal knowledge management system that treats content type as a first-class concept. Unlike generic knowledge bases that treat all documents identically, Mnemosyne understands the difference between a novel, a technical manual, album artwork, and a journal entry — and adjusts its chunking, embedding, search, and LLM prompting accordingly.</p>
</div>
<div class="row g-4 mb-4">
<div class="col-lg-4">
<div class="card h-100">
<div class="card-body">
<h3 class="card-title text-primary"><i class="bi bi-diagram-3"></i> Knowledge Graph</h3>
<ul class="mb-0">
<li>Neo4j stores relationships between content, not just vectors</li>
<li>Author → Book → Character → Theme traversals</li>
<li>Artist → Album → Track → Genre connections</li>
<li>No vector dimension limits (full 4096d Qwen3-VL)</li>
<li>Graph + vector + full-text search in one database</li>
</ul>
</div>
</div>
</div>
<div class="col-lg-4">
<div class="card h-100">
<div class="card-body">
<h3 class="card-title text-primary"><i class="bi bi-eye"></i> Multimodal AI</h3>
<ul class="mb-0">
<li>Qwen3-VL-Embedding: text + images + video in one vector space</li>
<li>Qwen3-VL-Reranker: cross-attention scoring across modalities</li>
<li>Album art, diagrams, screenshots become searchable</li>
<li>Local GPU inference (5090 + 3090) — zero API costs</li>
<li>llama.cpp text fallback via existing Ansible/systemd infra</li>
</ul>
</div>
</div>
</div>
<div class="col-lg-4">
<div class="card h-100">
<div class="card-body">
<h3 class="card-title text-primary"><i class="bi bi-tags"></i> Content-Type Awareness</h3>
<ul class="mb-0">
<li>Library types define chunking, embedding, and prompt behavior</li>
<li>Fiction: narrative-aware chunking, character extraction</li>
<li>Technical: section-aware, code block preservation</li>
<li>Music: lyrics as primary, metadata-heavy (genre, mood)</li>
<li>Each type injects context into the LLM prompt</li>
</ul>
</div>
</div>
</div>
</div>
<div class="alert alert-info border-start border-4 border-info">
<h3>Key Differentiators</h3>
<ul class="mb-0">
<li><strong>Content-type-aware pipeline</strong> — chunking, embedding instructions, re-ranking instructions, and LLM context all adapt per library type</li>
<li><strong>Neo4j knowledge graph</strong> — traversable relationships, not just flat vector similarity</li>
<li><strong>Full multimodal</strong> — Qwen3-VL processes images, diagrams, album art alongside text in a unified vector space</li>
<li><strong>No dimension limits</strong> — Neo4j handles 4096d vectors natively (pgvector caps at 2000)</li>
<li><strong>MCP-first interface</strong> — designed for LLM integration from day one</li>
<li><strong>Proven RAG architecture</strong> — two-stage responder/reviewer pattern inherited from Spelunker</li>
<li><strong>Local GPU inference</strong> — zero ongoing API costs via vLLM + llama.cpp on RTX 5090/3090</li>
</ul>
</div>
<div class="alert alert-secondary border-start border-4 border-secondary">
<h3>Heritage</h3>
<p class="mb-0">Mnemosyne's RAG pipeline architecture is inspired by <strong>Spelunker</strong>, an enterprise RFP response platform built on Django, PostgreSQL/pgvector, and LangChain. The proven patterns — hybrid search, two-stage RAG, citation-based retrieval, async document processing, and SME-approved knowledge bases — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships. In turn, patterns proven in Mnemosyne will be backported to Spelunker.</p>
</div>
</section>
<!-- SECTION: ARCHITECTURE -->
<section id="architecture" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-diagram-3"></i> System Architecture</h2>
<div class="card mb-4">
<div class="card-header bg-primary text-white"><h3 class="mb-0"><i class="bi bi-diagram-3"></i> High-Level Architecture</h3></div>
<div class="card-body">
<div class="mermaid">
graph TB
subgraph Clients["Client Layer"]
MCP["MCP Clients<br/>(Claude, Copilot, etc.)"]
UI["Django Web UI"]
API["REST API (DRF)"]
end
subgraph App["Application Layer — Django"]
Core["core/<br/>Users, Auth"]
Library["library/<br/>Libraries, Collections, Items"]
Engine["engine/<br/>Embedding, Search, Reranker, RAG"]
MCPServer["mcp_server/<br/>MCP Tool Interface"]
Importers["importers/<br/>File, Calibre, Web"]
end
subgraph Data["Data Layer"]
Neo4j["Neo4j 5.x<br/>Knowledge Graph + Vectors"]
PG["PostgreSQL<br/>Auth, Config, Analytics"]
S3["S3/MinIO<br/>Content + Chunks"]
RMQ["RabbitMQ<br/>Task Queue"]
end
subgraph GPU["GPU Services"]
vLLM_E["vLLM<br/>Qwen3-VL-Embedding-8B<br/>(Multimodal Embed)"]
vLLM_R["vLLM<br/>Qwen3-VL-Reranker-8B<br/>(Multimodal Rerank)"]
LCPP["llama.cpp<br/>Qwen3-Reranker-0.6B<br/>(Text Fallback)"]
LCPP_C["llama.cpp<br/>Qwen3 Chat<br/>(RAG Responder)"]
end
MCP --> MCPServer
UI --> Core
API --> Library
API --> Engine
MCPServer --> Engine
MCPServer --> Library
Library --> Neo4j
Engine --> Neo4j
Engine --> S3
Core --> PG
Engine --> vLLM_E
Engine --> vLLM_R
Engine --> LCPP
Engine --> LCPP_C
Library --> RMQ
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card">
<div class="card-header bg-primary text-white"><h4 class="mb-0"><i class="bi bi-folder"></i> Django Apps</h4></div>
<div class="card-body">
<ul class="list-group list-group-flush">
<li class="list-group-item"><strong>core/</strong> — Users, authentication, profiles, permissions</li>
<li class="list-group-item"><strong>library/</strong> — Libraries, Collections, Items, Chunks, Concepts (Neo4j models)</li>
<li class="list-group-item"><strong>engine/</strong> — Embedding, search, reranker, RAG pipeline services</li>
<li class="list-group-item"><strong>mcp_server/</strong> — MCP tool definitions and server interface</li>
<li class="list-group-item"><strong>importers/</strong> — Content acquisition (file upload, Calibre, web scrape)</li>
<li class="list-group-item"><strong>llm_manager/</strong> — LLM API/model config, usage tracking (from Spelunker)</li>
</ul>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card">
<div class="card-header bg-success text-white"><h4 class="mb-0"><i class="bi bi-stack"></i> Technology Stack</h4></div>
<div class="card-body">
<ul>
<li><strong>Django 5.x</strong>, Python ≥3.12, Django REST Framework</li>
<li><strong>Neo4j 5.x</strong> + django-neomodel — knowledge graph + vector index</li>
<li><strong>PostgreSQL</strong> — Django auth, config, analytics only</li>
<li><strong>S3/MinIO</strong> — all content and chunk storage</li>
<li><strong>Celery + RabbitMQ</strong> — async embedding and graph construction</li>
<li><strong>vLLM ≥0.14</strong> — Qwen3-VL multimodal serving</li>
<li><strong>llama.cpp</strong> — text model serving (existing Ansible infra)</li>
<li><strong>MCP SDK</strong> — Model Context Protocol server</li>
</ul>
</div>
</div>
</div>
</div>
<h3 class="mt-4">Project Structure</h3>
<pre class="bg-light p-3 rounded"><code>mnemosyne/
├── mnemosyne/            # Django settings, URLs, WSGI/ASGI
├── core/                 # Users, auth, profiles
├── library/              # Neo4j models (Library, Collection, Item, Chunk, Concept)
├── engine/               # RAG pipeline services
│   ├── embeddings.py     # Qwen3-VL embedding client
│   ├── reranker.py       # Qwen3-VL reranker client
│   ├── search.py         # Hybrid search (vector + graph + full-text)
│   ├── pipeline.py       # Two-stage RAG (responder + reviewer)
│   ├── llm_client.py     # OpenAI-compatible LLM client
│   └── content_types.py  # Library type definitions
├── mcp_server/           # MCP tool definitions
├── importers/            # Content import tools
├── llm_manager/          # LLM API/model config (ported from Spelunker)
├── static/
├── templates/
├── docker-compose.yml
├── pyproject.toml
└── manage.py</code></pre>
</section>
<!-- SECTION: DATA MODEL -->
<section id="data-model" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-database"></i> Data Model — Neo4j Knowledge Graph</h2>
<div class="alert alert-info border-start border-4 border-info">
<h3>Dual Database Strategy</h3>
<p class="mb-0"><strong>Neo4j</strong> stores all content knowledge: libraries, collections, items, chunks, concepts, and their relationships + vector embeddings. <strong>PostgreSQL</strong> stores only Django operational data: users, auth, LLM configurations, analytics, and Celery results. Content never lives in PostgreSQL.</p>
</div>
<div class="card mb-4">
<div class="card-header bg-primary text-white"><h3 class="mb-0"><i class="bi bi-diagram-2"></i> Graph Schema</h3></div>
<div class="card-body">
<div class="mermaid">
graph LR
L["Library<br/>(fiction, technical,<br/>music, art, journal)"] -->|CONTAINS| Col["Collection<br/>(genre, author,<br/>artist, project)"]
Col -->|CONTAINS| I["Item<br/>(book, manual,<br/>album, film, entry)"]
I -->|HAS_CHUNK| Ch["Chunk<br/>(text + optional image<br/>+ 4096d vector)"]
I -->|REFERENCES| Con["Concept<br/>(person, topic,<br/>technique, theme)"]
I -->|RELATED_TO| I
Con -->|RELATED_TO| Con
Ch -->|MENTIONS| Con
I -->|HAS_IMAGE| Img["Image<br/>(cover, diagram,<br/>artwork, still)"]
Img -->|HAS_EMBEDDING| ImgE["ImageEmbedding<br/>(4096d multimodal<br/>vector)"]
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">Core Nodes</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Node</th><th>Key Properties</th><th>Vector?</th></tr></thead>
<tbody>
<tr><td><strong>Library</strong></td><td>name, library_type, chunking_config, embedding_instruction, llm_context_prompt</td><td>No</td></tr>
<tr><td><strong>Collection</strong></td><td>name, description, metadata</td><td>No</td></tr>
<tr><td><strong>Item</strong></td><td>title, item_type, s3_key, content_hash, metadata, created_at</td><td>No</td></tr>
<tr><td><strong>Chunk</strong></td><td>chunk_index, chunk_s3_key, chunk_size, embedding (4096d)</td><td><strong>Yes</strong></td></tr>
<tr><td><strong>Concept</strong></td><td>name, concept_type, embedding (4096d)</td><td><strong>Yes</strong></td></tr>
<tr><td><strong>Image</strong></td><td>s3_key, image_type, description, metadata</td><td>No</td></tr>
<tr><td><strong>ImageEmbedding</strong></td><td>embedding (4096d multimodal)</td><td><strong>Yes</strong></td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">Relationships</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Relationship</th><th>From → To</th><th>Properties</th></tr></thead>
<tbody>
<tr><td><strong>CONTAINS</strong></td><td>Library → Collection</td><td></td></tr>
<tr><td><strong>CONTAINS</strong></td><td>Collection → Item</td><td>position</td></tr>
<tr><td><strong>HAS_CHUNK</strong></td><td>Item → Chunk</td><td></td></tr>
<tr><td><strong>HAS_IMAGE</strong></td><td>Item → Image</td><td>image_role</td></tr>
<tr><td><strong>HAS_EMBEDDING</strong></td><td>Image → ImageEmbedding</td><td></td></tr>
<tr><td><strong>REFERENCES</strong></td><td>Item → Concept</td><td>relevance</td></tr>
<tr><td><strong>MENTIONS</strong></td><td>Chunk → Concept</td><td></td></tr>
<tr><td><strong>RELATED_TO</strong></td><td>Item → Item</td><td>relationship_type, weight</td></tr>
<tr><td><strong>RELATED_TO</strong></td><td>Concept → Concept</td><td>relationship_type</td></tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
<div class="alert alert-warning border-start border-4 border-warning">
<h4><i class="bi bi-lightning"></i> Neo4j Vector Indexes</h4>
<pre class="bg-light p-3 rounded mb-0"><code>// Chunk text+image embeddings (4096 dimensions, no pgvector limits!)
CREATE VECTOR INDEX chunk_embedding FOR (c:Chunk)
ON (c.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Concept embeddings for semantic concept search
CREATE VECTOR INDEX concept_embedding FOR (con:Concept)
ON (con.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Image multimodal embeddings
CREATE VECTOR INDEX image_embedding FOR (ie:ImageEmbedding)
ON (ie.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Full-text index for keyword/BM25-style search
CREATE FULLTEXT INDEX chunk_fulltext FOR (c:Chunk) ON EACH [c.text_preview]</code></pre>
</div>
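<p>With these indexes in place, <code>engine/search.py</code> can issue parameterized vector queries against the chunk index. A sketch of a query builder (the helper names, return fields, and oversampling factor are illustrative, not the shipped code):</p>

```python
# Sketch of how engine/search.py might parameterize the chunk vector query.
# The Cypher mirrors the chunk_embedding index definition above.

CHUNK_VECTOR_QUERY = """
CALL db.index.vector.queryNodes('chunk_embedding', $k, $query_vector)
YIELD node, score
WHERE score > $threshold
RETURN node.chunk_s3_key AS chunk_key, score
"""

def chunk_query_params(query_vector, top_k=10, oversample=3, threshold=0.5):
    """Build query parameters, oversampling candidates for the re-rank stage."""
    return {
        "k": top_k * oversample,  # fetch extra candidates; the reranker trims later
        "query_vector": query_vector,
        "threshold": threshold,
    }
```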
</section>
<!-- SECTION: CONTENT TYPES -->
<section id="content-types" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-tags"></i> Content Type System</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>The Core Innovation</h3>
<p class="mb-0">Each Library has a <strong>library_type</strong> that defines how content is chunked, what embedding instructions are sent to Qwen3-VL, what re-ranking instructions are used, and what context prompt is injected when the LLM generates answers. This is configured per library in the database — not hardcoded.</p>
</div>
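<p>As a concrete illustration, the per-library configuration might be modeled as a small registry in <code>engine/content_types.py</code>. The class and field names below are illustrative; the instruction strings are taken from the type cards that follow:</p>

```python
# engine/content_types.py -- illustrative sketch, not the shipped module.
# Field names mirror the Library node properties described in the data model.
from dataclasses import dataclass

@dataclass(frozen=True)
class LibraryType:
    name: str
    chunking: str               # chunking strategy key
    embedding_instruction: str  # sent to Qwen3-VL-Embedding
    reranker_instruction: str   # sent to Qwen3-VL-Reranker
    llm_context_prompt: str     # injected before RAG generation

LIBRARY_TYPES = {
    t.name: t for t in [
        LibraryType(
            name="fiction",
            chunking="chapter_aware",
            embedding_instruction=(
                "Represent the narrative passage for literary retrieval, "
                "capturing themes, characters, and plot elements"),
            reranker_instruction=(
                "Score relevance of this fiction excerpt to the query, "
                "considering narrative themes and character arcs"),
            llm_context_prompt=(
                "The following excerpts are from fiction. Interpret as narrative "
                "— consider themes, symbolism, character development."),
        ),
        LibraryType(
            name="technical",
            chunking="section_aware",
            embedding_instruction=(
                "Represent the technical documentation for precise "
                "procedural retrieval"),
            reranker_instruction=(
                "Score relevance of this technical documentation to the query, "
                "prioritizing procedural accuracy"),
            llm_context_prompt=(
                "The following excerpts are from technical documentation. "
                "Provide precise, actionable instructions."),
        ),
    ]
}

def config_for(library_type: str) -> LibraryType:
    """Look up the pipeline configuration for a library's declared type."""
    return LIBRARY_TYPES[library_type]
```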
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100 border-primary">
<div class="card-header bg-primary text-white"><h5 class="mb-0"><i class="bi bi-book"></i> Fiction</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Chapter-aware, preserve dialogue blocks, narrative flow</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the narrative passage for literary retrieval, capturing themes, characters, and plot elements"</em></p>
<p><strong>Reranker Instruction:</strong> <em>"Score relevance of this fiction excerpt to the query, considering narrative themes and character arcs"</em></p>
<p><strong>LLM Context:</strong> <em>"The following excerpts are from fiction. Interpret as narrative — consider themes, symbolism, character development."</em></p>
<p><strong>Multimodal:</strong> Cover art, illustrations</p>
<p><strong>Graph:</strong> Author → Book → Character → Theme</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-success">
<div class="card-header bg-success text-white"><h5 class="mb-0"><i class="bi bi-gear"></i> Technical</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Section/heading-aware, preserve code blocks and tables as atomic units</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the technical documentation for precise procedural retrieval"</em></p>
<p><strong>Reranker Instruction:</strong> <em>"Score relevance of this technical documentation to the query, prioritizing procedural accuracy"</em></p>
<p><strong>LLM Context:</strong> <em>"The following excerpts are from technical documentation. Provide precise, actionable instructions."</em></p>
<p><strong>Multimodal:</strong> Diagrams, screenshots, wiring diagrams</p>
<p><strong>Graph:</strong> Product → Manual → Section → Procedure → Tool</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-info">
<div class="card-header bg-info text-white"><h5 class="mb-0"><i class="bi bi-music-note-beamed"></i> Music</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Song-level (lyrics as one chunk), verse/chorus segmentation</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the song lyrics and album context for music discovery and thematic analysis"</em></p>
<p><strong>Reranker Instruction:</strong> <em>"Score relevance considering lyrical themes, musical context, and artist style"</em></p>
<p><strong>LLM Context:</strong> <em>"The following excerpts are song lyrics and music metadata. Interpret in musical and cultural context."</em></p>
<p><strong>Multimodal:</strong> Album artwork, liner note images</p>
<p><strong>Graph:</strong> Artist → Album → Track → Genre; Track → SAMPLES → Track</p>
</div>
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100 border-warning">
<div class="card-header bg-warning text-dark"><h5 class="mb-0"><i class="bi bi-film"></i> Film</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Scene-level for scripts, paragraph-level for synopses</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the film content for cinematic retrieval, capturing visual and narrative elements"</em></p>
<p><strong>Multimodal:</strong> Movie stills, posters, screenshots</p>
<p><strong>Graph:</strong> Director → Film → Scene → Actor; Film → BASED_ON → Book</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-danger">
<div class="card-header bg-danger text-white"><h5 class="mb-0"><i class="bi bi-palette"></i> Art</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Description-level, catalog entry as unit</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the artwork and its description for visual and stylistic retrieval"</em></p>
<p><strong>Multimodal:</strong> <strong>The artwork itself</strong> — primary content is visual</p>
<p><strong>Graph:</strong> Artist → Piece → Style → Movement; Piece → INSPIRED_BY → Piece</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-secondary">
<div class="card-header bg-secondary text-white"><h5 class="mb-0"><i class="bi bi-journal-text"></i> Journals</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Entry-level (one entry = one chunk), paragraph split for long entries</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the personal journal entry for temporal and reflective retrieval"</em></p>
<p><strong>Multimodal:</strong> Photos, sketches attached to entries</p>
<p><strong>Graph:</strong> Date → Entry → Topic; Entry → MENTIONS → Person/Place</p>
</div>
</div>
</div>
</div>
</section>
<!-- SECTION: MULTIMODAL PIPELINE -->
<section id="multimodal-pipeline" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-eye-fill"></i> Multimodal Embedding &amp; Re-ranking Pipeline</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>Two-Stage Multimodal Pipeline</h3>
<p><strong>Stage 1 — Embedding (Qwen3-VL-Embedding-8B):</strong> Generates 4096-dimensional vectors from text, images, screenshots, and video in a unified semantic space. Accepts content-type-specific instructions for optimized representations.</p>
<p class="mb-0"><strong>Stage 2 — Re-ranking (Qwen3-VL-Reranker-8B):</strong> Takes (query, document) pairs — where both can be multimodal — and outputs precise relevance scores via cross-attention, dramatically sharpening retrieval accuracy.</p>
</div>
<div class="card mb-4">
<div class="card-header bg-success text-white"><h3 class="mb-0"><i class="bi bi-flow-chart"></i> Embedding &amp; Ingestion Flow</h3></div>
<div class="card-body">
<div class="mermaid">
flowchart TD
A["New Content<br/>(file upload, import)"] --> B{"Content Type?"}
B -->|"Text (PDF, DOCX, MD)"| C["Parse Text<br/>+ Extract Images"]
B -->|"Image (art, photo)"| D["Image Only"]
B -->|"Mixed (manual + diagrams)"| E["Parse Text<br/>+ Keep Page Images"]
C --> F["Chunk Text<br/>(content-type-aware)"]
D --> G["Image to S3"]
E --> F
E --> G
F --> H["Store Chunks in S3"]
H --> I["Qwen3-VL-Embedding<br/>(text + instruction)"]
G --> J["Qwen3-VL-Embedding<br/>(image + instruction)"]
I --> K["4096d Vector"]
J --> K
K --> L["Store in Neo4j<br/>Chunk/ImageEmbedding Node"]
L --> M["Extract Concepts<br/>(LLM entity extraction)"]
M --> N["Create Concept Nodes<br/>+ REFERENCES/MENTIONS edges"]
</div>
</div>
</div>
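<p>The "Chunk Text (content-type-aware)" step dispatches on the library type. A minimal sketch for the journal case, assuming entries arrive delimited by <code>---</code> separators (the delimiter, function name, and size limit are all assumptions for illustration):</p>

```python
# Illustrative content-type-aware chunking for journal libraries:
# one entry = one chunk, paragraph-split only when an entry is too long.
def chunk_journal(text: str, max_chars: int = 2000) -> list[str]:
    entries = [e.strip() for e in text.split("\n\n---\n\n") if e.strip()]
    chunks: list[str] = []
    for entry in entries:
        if len(entry) <= max_chars:
            chunks.append(entry)  # entry-level chunk
        else:
            # long entry: fall back to paragraph-level chunks
            chunks.extend(p for p in entry.split("\n\n") if p.strip())
    return chunks
```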
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-info text-white"><h4 class="mb-0">Qwen3-VL-Embedding-8B</h4></div>
<div class="card-body">
<ul>
<li><strong>Dimensions:</strong> 4096 (full), or MRL truncation to 3072/2048/1536/1024</li>
<li><strong>Input:</strong> Text, images, screenshots, video, or any mix</li>
<li><strong>Instruction-aware:</strong> content-type instructions improve retrieval quality by roughly 15%</li>
<li><strong>Quantization:</strong> Int8 (~8GB VRAM), Int4 (~4GB VRAM)</li>
<li><strong>Serving:</strong> vLLM with <code>--runner pooling</code></li>
<li><strong>Languages:</strong> 30+ supported</li>
</ul>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-warning text-dark"><h4 class="mb-0">Qwen3-VL-Reranker-8B</h4></div>
<div class="card-body">
<ul>
<li><strong>Architecture:</strong> Single-tower cross-attention (deep query↔document interaction)</li>
<li><strong>Input:</strong> (query, document) pairs — both can be multimodal</li>
<li><strong>Output:</strong> Relevance score (sigmoid of yes/no token probabilities)</li>
<li><strong>Instruction-aware:</strong> Custom re-ranking instructions per content type</li>
<li><strong>Serving:</strong> vLLM with <code>--runner pooling</code> + score endpoint</li>
<li><strong>Fallback:</strong> Qwen3-Reranker-0.6B via llama.cpp (text-only)</li>
</ul>
</div>
</div>
</div>
</div>
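<p>A sketch of the embedding client side of <code>engine/embeddings.py</code>. The vLLM endpoint is OpenAI-compatible; the <code>Instruct: …\nQuery: …</code> prompt format follows the published Qwen3-Embedding convention and is assumed here to carry over to the VL variant. Host, port, and model name are placeholders:</p>

```python
import math

EMBED_URL = "http://gpu-3090:8002/v1/embeddings"  # illustrative host/port

def embed_payload(text: str, instruction: str,
                  model: str = "Qwen3-VL-Embedding-8B") -> dict:
    """Build the request body for an instruction-aware text embedding."""
    return {"model": model, "input": [f"Instruct: {instruction}\nQuery: {text}"]}

def mrl_truncate(vector: list[float], dims: int) -> list[float]:
    """Matryoshka (MRL) truncation: keep the leading dims, re-normalize to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]
```

<p><code>mrl_truncate(vec, 1536)</code> is how a full 4096d Qwen vector could be squeezed under pgvector's 2000-dimension index cap for the Spelunker backport.</p>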
<div class="alert alert-info border-start border-4 border-info">
<h4><i class="bi bi-image"></i> Why Multimodal Matters</h4>
<p>Traditional RAG systems OCR images and diagrams, producing garbled text. Multimodal embedding understands the <em>visual content</em> directly:</p>
<ul class="mb-0">
<li><strong>Technical diagrams:</strong> Wiring diagrams, network topologies, architecture diagrams — searchable by visual content, not OCR garbage</li>
<li><strong>Album artwork:</strong> "psychedelic album covers from the 70s" finds matching art via visual similarity</li>
<li><strong>Art:</strong> The actual painting/sculpture becomes the searchable content, not just its text description</li>
<li><strong>PDF pages:</strong> Image-only PDF pages with charts and tables are embedded as images, not skipped</li>
</ul>
</div>
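<p>The reranker card above describes the relevance score as a sigmoid over the model's yes/no token probabilities. A minimal sketch of that score computation (the logits themselves would come from the serving endpoint):</p>

```python
import math

def rerank_score(yes_logit: float, no_logit: float) -> float:
    """Softmax over the yes/no logits, i.e. sigmoid of their difference."""
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))
```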
</section>
<!-- SECTION: SEARCH PIPELINE -->
<section id="search-pipeline" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-search"></i> Search Pipeline — GraphRAG + Vector + Re-rank</h2>
<div class="card mb-4">
<div class="card-header bg-primary text-white"><h3 class="mb-0"><i class="bi bi-flow-chart"></i> Search Flow</h3></div>
<div class="card-body">
<div class="mermaid">
flowchart TD
Q["User Query"] --> E["Embed Query<br/>(Qwen3-VL-Embedding)"]
E --> VS["1. Vector Search<br/>(Neo4j vector index)<br/>Top-K × 3 oversample"]
E --> GT["2. Graph Traversal<br/>(Cypher queries)<br/>Concept + relationship walks"]
Q --> FT["3. Full-Text Search<br/>(Neo4j fulltext index)<br/>Keyword matching"]
VS --> F["Candidate Fusion<br/>+ Deduplication"]
GT --> F
FT --> F
F --> RR["4. Re-Rank<br/>(Qwen3-VL-Reranker)<br/>Cross-attention scoring"]
RR --> TK["Top-K Results"]
TK --> CTX["Inject Content-Type<br/>Context Prompt"]
CTX --> LLM["5. LLM Responder<br/>(Two-stage RAG)"]
LLM --> REV["6. LLM Reviewer<br/>(Quality + citation check)"]
REV --> ANS["Final Answer<br/>with Citations"]
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h5 class="mb-0">1. Vector Search</h5></div>
<div class="card-body">
<p>Cosine similarity via Neo4j vector index on Chunk and ImageEmbedding nodes.</p>
<pre class="bg-light p-2 rounded"><code>CALL db.index.vector.queryNodes(
'chunk_embedding', 30,
$query_vector
) YIELD node, score
WHERE score > $threshold
RETURN node, score</code></pre>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-success text-white"><h5 class="mb-0">2. Graph Traversal</h5></div>
<div class="card-body">
<p>Walk relationships to find contextually related content that vector search alone would miss.</p>
<pre class="bg-light p-2 rounded"><code>MATCH (c:Chunk)&lt;-[:HAS_CHUNK]-(i:Item)
-[:REFERENCES]->(con:Concept)
-[:RELATED_TO]-(con2:Concept)
&lt;-[:REFERENCES]-(i2:Item)
-[:HAS_CHUNK]->(c2:Chunk)
RETURN c2, i2</code></pre>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-info text-white"><h5 class="mb-0">3. Full-Text Search</h5></div>
<div class="card-body">
<p>Neo4j native full-text index for keyword matching (BM25-equivalent).</p>
<pre class="bg-light p-2 rounded"><code>CALL db.index.fulltext.queryNodes(
'chunk_fulltext',
$query_text
) YIELD node, score
RETURN node, score</code></pre>
</div>
</div>
</div>
</div>
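<p>The "Candidate Fusion + Deduplication" step is unspecified above; one reasonable approach is reciprocal rank fusion (RRF), which merges the vector, graph, and full-text candidate lists by rank alone, so the three incomparable score scales never need calibration. A sketch (chunk IDs and the constant <code>k=60</code> are illustrative):</p>

```python
# Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank).
# Deduplication falls out naturally, since each ID accumulates one score.
def fuse_candidates(*ranked_lists: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```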
</section>
<!-- SECTION: MCP INTERFACE -->
<section id="mcp-interface" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-plug"></i> MCP Server Interface</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>MCP-First Design</h3>
<p class="mb-0">Mnemosyne exposes its capabilities as MCP tools, making the entire knowledge base accessible to Claude, Copilot, and any MCP-compatible LLM client. The MCP server is a primary interface, not an afterthought.</p>
</div>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">Search &amp; Retrieval Tools</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Tool</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>search_library</code></td><td>Semantic + graph + full-text search with re-ranking. Filters by library, collection, content type.</td></tr>
<tr><td><code>ask_about</code></td><td>Full RAG pipeline — search, re-rank, content-type context injection, LLM response with citations.</td></tr>
<tr><td><code>find_similar</code></td><td>Find items similar to a given item using vector similarity. Optionally search across libraries.</td></tr>
<tr><td><code>search_by_image</code></td><td>Multimodal search — find content matching an uploaded image.</td></tr>
<tr><td><code>explore_connections</code></td><td>Traverse knowledge graph from an item — find related concepts, authors, themes.</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">Management &amp; Navigation Tools</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Tool</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>browse_libraries</code></td><td>List all libraries with their content types and item counts.</td></tr>
<tr><td><code>browse_collections</code></td><td>List collections within a library.</td></tr>
<tr><td><code>get_item</code></td><td>Get detailed info about a specific item, including metadata and graph connections.</td></tr>
<tr><td><code>add_content</code></td><td>Add new content to a library — triggers async embedding + graph construction.</td></tr>
<tr><td><code>get_concepts</code></td><td>List extracted concepts for an item or across a library.</td></tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</section>
<!-- SECTION: GPU SERVICES -->
<section id="gpu-services" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-gpu-card"></i> GPU Services</h2>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">RTX 5090 (32GB VRAM)</h4></div>
<div class="card-body">
<table class="table table-sm">
<tbody>
<tr><td><strong>Model</strong></td><td>Qwen3-VL-Reranker-8B</td></tr>
<tr><td><strong>VRAM (bf16)</strong></td><td>~18GB</td></tr>
<tr><td><strong>Serving</strong></td><td>vLLM <code>--runner pooling</code></td></tr>
<tr><td><strong>Port</strong></td><td>:8001</td></tr>
<tr><td><strong>Role</strong></td><td>Multimodal re-ranking</td></tr>
<tr><td><strong>Headroom</strong></td><td>~14GB for chat model</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">RTX 3090 (24GB VRAM)</h4></div>
<div class="card-body">
<table class="table table-sm">
<tbody>
<tr><td><strong>Model</strong></td><td>Qwen3-VL-Embedding-8B</td></tr>
<tr><td><strong>VRAM (bf16)</strong></td><td>~18GB</td></tr>
<tr><td><strong>Serving</strong></td><td>vLLM <code>--runner pooling</code></td></tr>
<tr><td><strong>Port</strong></td><td>:8002</td></tr>
<tr><td><strong>Role</strong></td><td>Multimodal embedding</td></tr>
<tr><td><strong>Headroom</strong></td><td>~6GB</td></tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
<div class="alert alert-info border-start border-4 border-info">
<h4><i class="bi bi-arrow-repeat"></i> Fallback: llama.cpp (Existing Ansible Infra)</h4>
<p class="mb-0">Text-only Qwen3-Reranker-0.6B GGUF served via <code>llama-server</code> on existing systemd/Ansible infrastructure. Managed by the same playbooks, monitored by the same Grafana dashboards. Used when vLLM services are down or for text-only workloads.</p>
</div>
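<p>Concretely, the services above might be launched as follows. Model identifiers, the GGUF path, and flags are illustrative; verify the pooling-runner options against the vLLM version in use:</p>

```shell
# RTX 5090: multimodal re-ranking (port 8001)
vllm serve Qwen/Qwen3-VL-Reranker-8B --runner pooling --port 8001

# RTX 3090: multimodal embedding (port 8002)
vllm serve Qwen/Qwen3-VL-Embedding-8B --runner pooling --port 8002

# Fallback: text-only reranker on the existing llama.cpp infra
llama-server -m /models/qwen3-reranker-0.6b-q8_0.gguf --port 8003
```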
</section>
<!-- SECTION: DEPLOYMENT -->
<section id="deployment" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-box-seam"></i> Deployment</h2>
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">Core Services</h4></div>
<div class="card-body">
<ul class="mb-0">
<li><strong>web:</strong> Django app (Gunicorn)</li>
<li><strong>postgres:</strong> PostgreSQL (auth/config only)</li>
<li><strong>neo4j:</strong> Neo4j 5.x (knowledge graph + vectors)</li>
<li><strong>rabbitmq:</strong> Celery broker</li>
</ul>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">Async Processing</h4></div>
<div class="card-body">
<ul class="mb-0">
<li><strong>celery-worker:</strong> Embedding, graph construction</li>
<li><strong>celery-beat:</strong> Scheduled re-sync tasks</li>
</ul>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-info text-white"><h4 class="mb-0">Storage &amp; Proxy</h4></div>
<div class="card-body">
<ul class="mb-0">
<li><strong>minio:</strong> S3-compatible content storage</li>
<li><strong>nginx:</strong> Static/proxy</li>
<li><strong>mcp-server:</strong> MCP interface process</li>
</ul>
</div>
</div>
</div>
</div>
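<p>A trimmed <code>docker-compose.yml</code> sketch of the service graph above (image tags, commands, and credentials are placeholders, not the production file):</p>

```yaml
services:
  web:
    build: .
    command: gunicorn mnemosyne.wsgi
    depends_on: [postgres, neo4j, rabbitmq]
  postgres:
    image: postgres:16
  neo4j:
    image: neo4j:5
    environment:
      NEO4J_AUTH: neo4j/changeme
  rabbitmq:
    image: rabbitmq:3-management
  minio:
    image: minio/minio
    command: server /data
  celery-worker:
    build: .
    command: celery -A mnemosyne worker
    depends_on: [rabbitmq, neo4j]
  celery-beat:
    build: .
    command: celery -A mnemosyne beat
```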
<div class="alert alert-secondary border-start border-4 border-secondary">
<h4>Shared Infrastructure with Spelunker</h4>
<p class="mb-0">Mnemosyne and Spelunker share: GPU model services (llama.cpp + vLLM), MinIO/S3 (separate buckets), Neo4j (separate databases), RabbitMQ (separate vhosts), and Grafana monitoring. Each is its own Docker Compose stack but points to shared infra.</p>
</div>
</section>
<!-- SECTION: BACKPORT -->
<section id="backport" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-arrow-left-right"></i> Backport Strategy to Spelunker</h2>
<div class="alert alert-warning border-start border-4 border-warning">
<h3>Build Forward, Backport Back</h3>
<p class="mb-0">Mnemosyne proves the architecture with no legacy constraints. Once validated, proven components flow back to Spelunker to enhance its RFP workflow with multimodal understanding and re-ranking precision.</p>
</div>
<table class="table table-bordered">
<thead class="table-dark"><tr><th>Component</th><th>Mnemosyne (Prove)</th><th>Spelunker (Backport)</th></tr></thead>
<tbody>
<tr><td><strong>RerankerService</strong></td><td>Qwen3-VL multimodal + llama.cpp text</td><td>Drop into <code>rag/services/reranker.py</code></td></tr>
<tr><td><strong>Multimodal Embedding</strong></td><td>Qwen3-VL-Embedding via vLLM</td><td>Add alongside OpenAI embeddings, MRL@1536d for pgvector compat</td></tr>
<tr><td><strong>Diagram Understanding</strong></td><td>Image pages embedded multimodally</td><td>PDF diagrams in RFP docs become searchable</td></tr>
<tr><td><strong>MCP Server</strong></td><td>Primary interface from day one</td><td>Add as secondary interface to Spelunker</td></tr>
<tr><td><strong>Neo4j (optional)</strong></td><td>Primary vector + graph store</td><td>Could replace pgvector, or run alongside</td></tr>
<tr><td><strong>Content-Type Config</strong></td><td>Library type definitions</td><td>Adapt as document classification in Spelunker</td></tr>
</tbody>
</table>
</section>
<div class="alert alert-success border-start border-4 border-success mt-5">
<h3><i class="bi bi-check-circle"></i> Documentation Complete</h3>
<p class="mb-0">This document describes the target architecture for Mnemosyne. Phase implementation documents provide detailed build plans.</p>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
</body>
</html>