<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Mnemosyne — Architecture Documentation</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.0/font/bootstrap-icons.css" rel="stylesheet">
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<script>mermaid.initialize({ startOnLoad: true, theme: 'default' });</script>
</head>
<body>
<div class="container-fluid">
<nav class="navbar navbar-dark bg-dark rounded mb-4">
<div class="container-fluid">
<a class="navbar-brand" href="#"><i class="bi bi-book"></i> Mnemosyne — Architecture Documentation</a>
<div class="navbar-nav d-flex flex-row">
<a class="nav-link me-3" href="#overview">Overview</a>
<a class="nav-link me-3" href="#architecture">Architecture</a>
<a class="nav-link me-3" href="#data-model">Data Model</a>
<a class="nav-link me-3" href="#content-types">Content Types</a>
<a class="nav-link me-3" href="#multimodal-pipeline">Multimodal</a>
<a class="nav-link me-3" href="#search-pipeline">Search</a>
<a class="nav-link me-3" href="#mcp-interface">MCP</a>
<a class="nav-link me-3" href="#gpu-services">GPU</a>
<a class="nav-link" href="#deployment">Deployment</a>
</div>
</div>
</nav>
<div class="row">
<div class="col-12">
<h1 class="display-4 mb-2"><i class="bi bi-book-fill"></i> Mnemosyne <span class="badge bg-primary">Architecture</span></h1>
<p class="lead text-muted fst-italic">"The electric light did not come from the continuous improvement of candles." — Oren Harari</p>
<p class="lead">Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI. Named after the Titan goddess of memory, it understands <em>what kind</em> of knowledge it holds and makes it searchable through text, images, and natural language.</p>
</div>
</div>
<!-- SECTION: OVERVIEW -->
<section id="overview" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-info-circle"></i> Overview</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>Purpose</h3>
<p><strong>Mnemosyne</strong> is a personal knowledge management system that treats content type as a first-class concept. Unlike generic knowledge bases that treat all documents identically, Mnemosyne understands the difference between a novel, a technical manual, album artwork, and a journal entry — and adjusts its chunking, embedding, search, and LLM prompting accordingly.</p>
</div>
<div class="row g-4 mb-4">
<div class="col-lg-4">
<div class="card h-100">
<div class="card-body">
<h3 class="card-title text-primary"><i class="bi bi-diagram-3"></i> Knowledge Graph</h3>
<ul class="mb-0">
<li>Neo4j stores relationships between content, not just vectors</li>
<li>Author → Book → Character → Theme traversals</li>
<li>Artist → Album → Track → Genre connections</li>
<li>No vector dimension limits (full 4096d Qwen3-VL)</li>
<li>Graph + vector + full-text search in one database</li>
</ul>
</div>
</div>
</div>
<div class="col-lg-4">
<div class="card h-100">
<div class="card-body">
<h3 class="card-title text-primary"><i class="bi bi-eye"></i> Multimodal AI</h3>
<ul class="mb-0">
<li>Qwen3-VL-Embedding: text + images + video in one vector space</li>
<li>Qwen3-VL-Reranker: cross-attention scoring across modalities</li>
<li>Album art, diagrams, screenshots become searchable</li>
<li>Local GPU inference (5090 + 3090) — zero API costs</li>
<li>llama.cpp text fallback via existing Ansible/systemd infra</li>
</ul>
</div>
</div>
</div>
<div class="col-lg-4">
<div class="card h-100">
<div class="card-body">
<h3 class="card-title text-primary"><i class="bi bi-tags"></i> Content-Type Awareness</h3>
<ul class="mb-0">
<li>Library types define chunking, embedding, and prompt behavior</li>
<li>Fiction: narrative-aware chunking, character extraction</li>
<li>Technical: section-aware, code block preservation</li>
<li>Music: lyrics as primary, metadata-heavy (genre, mood)</li>
<li>Each type injects context into the LLM prompt</li>
</ul>
</div>
</div>
</div>
</div>
<div class="alert alert-info border-start border-4 border-info">
<h3>Key Differentiators</h3>
<ul class="mb-0">
<li><strong>Content-type-aware pipeline</strong> — chunking, embedding instructions, re-ranking instructions, and LLM context all adapt per library type</li>
<li><strong>Neo4j knowledge graph</strong> — traversable relationships, not just flat vector similarity</li>
<li><strong>Full multimodal</strong> — Qwen3-VL processes images, diagrams, album art alongside text in a unified vector space</li>
<li><strong>No dimension limits</strong> — Neo4j handles 4096d vectors natively (pgvector caps at 2000)</li>
<li><strong>MCP-first interface</strong> — designed for LLM integration from day one</li>
<li><strong>Proven RAG architecture</strong> — two-stage responder/reviewer pattern inherited from Spelunker</li>
<li><strong>Local GPU inference</strong> — zero ongoing API costs via vLLM + llama.cpp on RTX 5090/3090</li>
</ul>
</div>
<div class="alert alert-secondary border-start border-4 border-secondary">
<h3>Heritage</h3>
<p class="mb-0">Mnemosyne's RAG pipeline architecture is inspired by <strong>Spelunker</strong>, an enterprise RFP response platform built on Django, PostgreSQL/pgvector, and LangChain. The proven patterns — hybrid search, two-stage RAG, citation-based retrieval, async document processing, and SME-approved knowledge bases — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships. In turn, patterns proven in Mnemosyne will be backported to Spelunker.</p>
</div>
</section>
<!-- SECTION: ARCHITECTURE -->
<section id="architecture" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-diagram-3"></i> System Architecture</h2>
<div class="card mb-4">
<div class="card-header bg-primary text-white"><h3 class="mb-0"><i class="bi bi-diagram-3"></i> High-Level Architecture</h3></div>
<div class="card-body">
<div class="mermaid">
graph TB
subgraph Clients["Client Layer"]
MCP["MCP Clients<br/>(Claude, Copilot, etc.)"]
UI["Django Web UI"]
API["REST API (DRF)"]
end
subgraph App["Application Layer — Django"]
Core["core/<br/>Users, Auth"]
Library["library/<br/>Libraries, Collections, Items"]
Engine["engine/<br/>Embedding, Search, Reranker, RAG"]
MCPServer["mcp_server/<br/>MCP Tool Interface"]
Importers["importers/<br/>File, Calibre, Web"]
end
subgraph Data["Data Layer"]
Neo4j["Neo4j 5.x<br/>Knowledge Graph + Vectors"]
PG["PostgreSQL<br/>Auth, Config, Analytics"]
S3["S3/MinIO<br/>Content + Chunks"]
RMQ["RabbitMQ<br/>Task Queue"]
end
subgraph GPU["GPU Services"]
vLLM_E["vLLM<br/>Qwen3-VL-Embedding-8B<br/>(Multimodal Embed)"]
vLLM_R["vLLM<br/>Qwen3-VL-Reranker-8B<br/>(Multimodal Rerank)"]
LCPP["llama.cpp<br/>Qwen3-Reranker-0.6B<br/>(Text Fallback)"]
LCPP_C["llama.cpp<br/>Qwen3 Chat<br/>(RAG Responder)"]
end
MCP --> MCPServer
UI --> Core
API --> Library
API --> Engine
MCPServer --> Engine
MCPServer --> Library
Library --> Neo4j
Engine --> Neo4j
Engine --> S3
Core --> PG
Engine --> vLLM_E
Engine --> vLLM_R
Engine --> LCPP
Engine --> LCPP_C
Library --> RMQ
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card">
<div class="card-header bg-primary text-white"><h4 class="mb-0"><i class="bi bi-folder"></i> Django Apps</h4></div>
<div class="card-body">
<ul class="list-group list-group-flush">
<li class="list-group-item"><strong>core/</strong> — Users, authentication, profiles, permissions</li>
<li class="list-group-item"><strong>library/</strong> — Libraries, Collections, Items, Chunks, Concepts (Neo4j models)</li>
<li class="list-group-item"><strong>engine/</strong> — Embedding, search, reranker, RAG pipeline services</li>
<li class="list-group-item"><strong>mcp_server/</strong> — MCP tool definitions and server interface</li>
<li class="list-group-item"><strong>importers/</strong> — Content acquisition (file upload, Calibre, web scrape)</li>
<li class="list-group-item"><strong>llm_manager/</strong> — LLM API/model config, usage tracking (from Spelunker)</li>
</ul>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card">
<div class="card-header bg-success text-white"><h4 class="mb-0"><i class="bi bi-stack"></i> Technology Stack</h4></div>
<div class="card-body">
<ul>
<li><strong>Django 5.x</strong>, Python ≥3.12, Django REST Framework</li>
<li><strong>Neo4j 5.x</strong> + django-neomodel — knowledge graph + vector index</li>
<li><strong>PostgreSQL</strong> — Django auth, config, analytics only</li>
<li><strong>S3/MinIO</strong> — all content and chunk storage</li>
<li><strong>Celery + RabbitMQ</strong> — async embedding and graph construction</li>
<li><strong>vLLM ≥0.14</strong> — Qwen3-VL multimodal serving</li>
<li><strong>llama.cpp</strong> — text model serving (existing Ansible infra)</li>
<li><strong>MCP SDK</strong> — Model Context Protocol server</li>
</ul>
</div>
</div>
</div>
</div>
<h3 class="mt-4">Project Structure</h3>
<pre class="bg-light p-3 rounded"><code>mnemosyne/
├── mnemosyne/            # Django settings, URLs, WSGI/ASGI
├── core/                 # Users, auth, profiles
├── library/              # Neo4j models (Library, Collection, Item, Chunk, Concept)
├── engine/               # RAG pipeline services
│   ├── embeddings.py     # Qwen3-VL embedding client
│   ├── reranker.py       # Qwen3-VL reranker client
│   ├── search.py         # Hybrid search (vector + graph + full-text)
│   ├── pipeline.py       # Two-stage RAG (responder + reviewer)
│   ├── llm_client.py     # OpenAI-compatible LLM client
│   └── content_types.py  # Library type definitions
├── mcp_server/           # MCP tool definitions
├── importers/            # Content import tools
├── llm_manager/          # LLM API/model config (ported from Spelunker)
├── static/
├── templates/
├── docker-compose.yml
├── pyproject.toml
└── manage.py</code></pre>
</section>
<!-- SECTION: DATA MODEL -->
<section id="data-model" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-database"></i> Data Model — Neo4j Knowledge Graph</h2>
<div class="alert alert-info border-start border-4 border-info">
<h3>Dual Database Strategy</h3>
<p class="mb-0"><strong>Neo4j</strong> stores all content knowledge: libraries, collections, items, chunks, concepts, and their relationships + vector embeddings. <strong>PostgreSQL</strong> stores only Django operational data: users, auth, LLM configurations, analytics, and Celery results. Content never lives in PostgreSQL.</p>
</div>
<div class="card mb-4">
<div class="card-header bg-primary text-white"><h3 class="mb-0"><i class="bi bi-diagram-2"></i> Graph Schema</h3></div>
<div class="card-body">
<div class="mermaid">
graph LR
L["Library<br/>(fiction, technical,<br/>music, art, journal)"] -->|CONTAINS| Col["Collection<br/>(genre, author,<br/>artist, project)"]
Col -->|CONTAINS| I["Item<br/>(book, manual,<br/>album, film, entry)"]
I -->|HAS_CHUNK| Ch["Chunk<br/>(text + optional image<br/>+ 4096d vector)"]
I -->|REFERENCES| Con["Concept<br/>(person, topic,<br/>technique, theme)"]
I -->|RELATED_TO| I
Con -->|RELATED_TO| Con
Ch -->|MENTIONS| Con
I -->|HAS_IMAGE| Img["Image<br/>(cover, diagram,<br/>artwork, still)"]
Img -->|HAS_EMBEDDING| ImgE["ImageEmbedding<br/>(4096d multimodal<br/>vector)"]
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">Core Nodes</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Node</th><th>Key Properties</th><th>Vector?</th></tr></thead>
<tbody>
<tr><td><strong>Library</strong></td><td>name, library_type, chunking_config, embedding_instruction, llm_context_prompt</td><td>No</td></tr>
<tr><td><strong>Collection</strong></td><td>name, description, metadata</td><td>No</td></tr>
<tr><td><strong>Item</strong></td><td>title, item_type, s3_key, content_hash, metadata, created_at</td><td>No</td></tr>
<tr><td><strong>Chunk</strong></td><td>chunk_index, chunk_s3_key, chunk_size, embedding (4096d)</td><td><strong>Yes</strong></td></tr>
<tr><td><strong>Concept</strong></td><td>name, concept_type, embedding (4096d)</td><td><strong>Yes</strong></td></tr>
<tr><td><strong>Image</strong></td><td>s3_key, image_type, description, metadata</td><td>No</td></tr>
<tr><td><strong>ImageEmbedding</strong></td><td>embedding (4096d multimodal)</td><td><strong>Yes</strong></td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">Relationships</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Relationship</th><th>From → To</th><th>Properties</th></tr></thead>
<tbody>
<tr><td><strong>CONTAINS</strong></td><td>Library → Collection</td><td></td></tr>
<tr><td><strong>CONTAINS</strong></td><td>Collection → Item</td><td>position</td></tr>
<tr><td><strong>HAS_CHUNK</strong></td><td>Item → Chunk</td><td></td></tr>
<tr><td><strong>HAS_IMAGE</strong></td><td>Item → Image</td><td>image_role</td></tr>
<tr><td><strong>HAS_EMBEDDING</strong></td><td>Image → ImageEmbedding</td><td></td></tr>
<tr><td><strong>REFERENCES</strong></td><td>Item → Concept</td><td>relevance</td></tr>
<tr><td><strong>MENTIONS</strong></td><td>Chunk → Concept</td><td></td></tr>
<tr><td><strong>RELATED_TO</strong></td><td>Item → Item</td><td>relationship_type, weight</td></tr>
<tr><td><strong>RELATED_TO</strong></td><td>Concept → Concept</td><td>relationship_type</td></tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
<div class="alert alert-warning border-start border-4 border-warning">
<h4><i class="bi bi-lightning"></i> Neo4j Vector Indexes</h4>
<pre class="bg-light p-3 rounded mb-0"><code>// Chunk text+image embeddings (4096 dimensions, no pgvector limits!)
CREATE VECTOR INDEX chunk_embedding FOR (c:Chunk)
ON (c.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Concept embeddings for semantic concept search
CREATE VECTOR INDEX concept_embedding FOR (con:Concept)
ON (con.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Image multimodal embeddings
CREATE VECTOR INDEX image_embedding FOR (ie:ImageEmbedding)
ON (ie.embedding) OPTIONS {indexConfig: {
`vector.dimensions`: 4096,
`vector.similarity_function`: 'cosine'
}}
// Full-text index for keyword/BM25-style search
CREATE FULLTEXT INDEX chunk_fulltext FOR (c:Chunk) ON EACH [c.text_preview]</code></pre>
</div>
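<p>With these indexes in place, <code>engine/search.py</code> can issue parameterized vector queries against the chunk index. A sketch of a query builder (the helper names, return fields, and oversampling factor are illustrative, not the shipped code):</p>

```python
# Sketch of how engine/search.py might parameterize the chunk vector query.
# The Cypher mirrors the chunk_embedding index definition above.

CHUNK_VECTOR_QUERY = """
CALL db.index.vector.queryNodes('chunk_embedding', $k, $query_vector)
YIELD node, score
WHERE score > $threshold
RETURN node.chunk_s3_key AS chunk_key, score
"""

def chunk_query_params(query_vector, top_k=10, oversample=3, threshold=0.5):
    """Build query parameters, oversampling candidates for the re-rank stage."""
    return {
        "k": top_k * oversample,  # fetch extra candidates; the reranker trims later
        "query_vector": query_vector,
        "threshold": threshold,
    }
```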
</section>
<!-- SECTION: CONTENT TYPES -->
<section id="content-types" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-tags"></i> Content Type System</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>The Core Innovation</h3>
<p class="mb-0">Each Library has a <strong>library_type</strong> that defines how content is chunked, what embedding instructions are sent to Qwen3-VL, what re-ranking instructions are used, and what context prompt is injected when the LLM generates answers. This is configured per library in the database — not hardcoded.</p>
</div>
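<p>As a concrete illustration, the per-library configuration might be modeled as a small registry in <code>engine/content_types.py</code>. The class and field names below are illustrative; the instruction strings are taken from the type cards that follow:</p>

```python
# engine/content_types.py -- illustrative sketch, not the shipped module.
# Field names mirror the Library node properties described in the data model.
from dataclasses import dataclass

@dataclass(frozen=True)
class LibraryType:
    name: str
    chunking: str               # chunking strategy key
    embedding_instruction: str  # sent to Qwen3-VL-Embedding
    reranker_instruction: str   # sent to Qwen3-VL-Reranker
    llm_context_prompt: str     # injected before RAG generation

LIBRARY_TYPES = {
    t.name: t for t in [
        LibraryType(
            name="fiction",
            chunking="chapter_aware",
            embedding_instruction=(
                "Represent the narrative passage for literary retrieval, "
                "capturing themes, characters, and plot elements"),
            reranker_instruction=(
                "Score relevance of this fiction excerpt to the query, "
                "considering narrative themes and character arcs"),
            llm_context_prompt=(
                "The following excerpts are from fiction. Interpret as narrative "
                "— consider themes, symbolism, character development."),
        ),
        LibraryType(
            name="technical",
            chunking="section_aware",
            embedding_instruction=(
                "Represent the technical documentation for precise "
                "procedural retrieval"),
            reranker_instruction=(
                "Score relevance of this technical documentation to the query, "
                "prioritizing procedural accuracy"),
            llm_context_prompt=(
                "The following excerpts are from technical documentation. "
                "Provide precise, actionable instructions."),
        ),
    ]
}

def config_for(library_type: str) -> LibraryType:
    """Look up the pipeline configuration for a library's declared type."""
    return LIBRARY_TYPES[library_type]
```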
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100 border-primary">
<div class="card-header bg-primary text-white"><h5 class="mb-0"><i class="bi bi-book"></i> Fiction</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Chapter-aware, preserve dialogue blocks, narrative flow</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the narrative passage for literary retrieval, capturing themes, characters, and plot elements"</em></p>
<p><strong>Reranker Instruction:</strong> <em>"Score relevance of this fiction excerpt to the query, considering narrative themes and character arcs"</em></p>
<p><strong>LLM Context:</strong> <em>"The following excerpts are from fiction. Interpret as narrative — consider themes, symbolism, character development."</em></p>
<p><strong>Multimodal:</strong> Cover art, illustrations</p>
<p><strong>Graph:</strong> Author → Book → Character → Theme</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-success">
<div class="card-header bg-success text-white"><h5 class="mb-0"><i class="bi bi-gear"></i> Technical</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Section/heading-aware, preserve code blocks and tables as atomic units</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the technical documentation for precise procedural retrieval"</em></p>
<p><strong>Reranker Instruction:</strong> <em>"Score relevance of this technical documentation to the query, prioritizing procedural accuracy"</em></p>
<p><strong>LLM Context:</strong> <em>"The following excerpts are from technical documentation. Provide precise, actionable instructions."</em></p>
<p><strong>Multimodal:</strong> Diagrams, screenshots, wiring diagrams</p>
<p><strong>Graph:</strong> Product → Manual → Section → Procedure → Tool</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-info">
<div class="card-header bg-info text-white"><h5 class="mb-0"><i class="bi bi-music-note-beamed"></i> Music</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Song-level (lyrics as one chunk), verse/chorus segmentation</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the song lyrics and album context for music discovery and thematic analysis"</em></p>
<p><strong>Reranker Instruction:</strong> <em>"Score relevance considering lyrical themes, musical context, and artist style"</em></p>
<p><strong>LLM Context:</strong> <em>"The following excerpts are song lyrics and music metadata. Interpret in musical and cultural context."</em></p>
<p><strong>Multimodal:</strong> Album artwork, liner note images</p>
<p><strong>Graph:</strong> Artist → Album → Track → Genre; Track → SAMPLES → Track</p>
</div>
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100 border-warning">
<div class="card-header bg-warning text-dark"><h5 class="mb-0"><i class="bi bi-film"></i> Film</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Scene-level for scripts, paragraph-level for synopses</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the film content for cinematic retrieval, capturing visual and narrative elements"</em></p>
<p><strong>Multimodal:</strong> Movie stills, posters, screenshots</p>
<p><strong>Graph:</strong> Director → Film → Scene → Actor; Film → BASED_ON → Book</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-danger">
<div class="card-header bg-danger text-white"><h5 class="mb-0"><i class="bi bi-palette"></i> Art</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Description-level, catalog entry as unit</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the artwork and its description for visual and stylistic retrieval"</em></p>
<p><strong>Multimodal:</strong> <strong>The artwork itself</strong> — primary content is visual</p>
<p><strong>Graph:</strong> Artist → Piece → Style → Movement; Piece → INSPIRED_BY → Piece</p>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100 border-secondary">
<div class="card-header bg-secondary text-white"><h5 class="mb-0"><i class="bi bi-journal-text"></i> Journals</h5></div>
<div class="card-body">
<p><strong>Chunking:</strong> Entry-level (one entry = one chunk), paragraph split for long entries</p>
<p><strong>Embedding Instruction:</strong> <em>"Represent the personal journal entry for temporal and reflective retrieval"</em></p>
<p><strong>Multimodal:</strong> Photos, sketches attached to entries</p>
<p><strong>Graph:</strong> Date → Entry → Topic; Entry → MENTIONS → Person/Place</p>
</div>
</div>
</div>
</div>
</section>
<!-- SECTION: MULTIMODAL PIPELINE -->
<section id="multimodal-pipeline" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-eye-fill"></i> Multimodal Embedding &amp; Re-ranking Pipeline</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>Two-Stage Multimodal Pipeline</h3>
<p><strong>Stage 1 — Embedding (Qwen3-VL-Embedding-8B):</strong> Generates 4096-dimensional vectors from text, images, screenshots, and video in a unified semantic space. Accepts content-type-specific instructions for optimized representations.</p>
<p class="mb-0"><strong>Stage 2 — Re-ranking (Qwen3-VL-Reranker-8B):</strong> Takes (query, document) pairs — where both can be multimodal — and outputs precise relevance scores via cross-attention, dramatically sharpening retrieval accuracy.</p>
</div>
<div class="card mb-4">
<div class="card-header bg-success text-white"><h3 class="mb-0"><i class="bi bi-flow-chart"></i> Embedding &amp; Ingestion Flow</h3></div>
<div class="card-body">
<div class="mermaid">
flowchart TD
A["New Content<br/>(file upload, import)"] --> B{"Content Type?"}
B -->|"Text (PDF, DOCX, MD)"| C["Parse Text<br/>+ Extract Images"]
B -->|"Image (art, photo)"| D["Image Only"]
B -->|"Mixed (manual + diagrams)"| E["Parse Text<br/>+ Keep Page Images"]
C --> F["Chunk Text<br/>(content-type-aware)"]
D --> G["Image to S3"]
E --> F
E --> G
F --> H["Store Chunks in S3"]
H --> I["Qwen3-VL-Embedding<br/>(text + instruction)"]
G --> J["Qwen3-VL-Embedding<br/>(image + instruction)"]
I --> K["4096d Vector"]
J --> K
K --> L["Store in Neo4j<br/>Chunk/ImageEmbedding Node"]
L --> M["Extract Concepts<br/>(LLM entity extraction)"]
M --> N["Create Concept Nodes<br/>+ REFERENCES/MENTIONS edges"]
</div>
</div>
</div>
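<p>The "Chunk Text (content-type-aware)" step dispatches on the library type. A minimal sketch for the journal case, assuming entries arrive delimited by <code>---</code> separators (the delimiter, function name, and size limit are all assumptions for illustration):</p>

```python
# Illustrative content-type-aware chunking for journal libraries:
# one entry = one chunk, paragraph-split only when an entry is too long.
def chunk_journal(text: str, max_chars: int = 2000) -> list[str]:
    entries = [e.strip() for e in text.split("\n\n---\n\n") if e.strip()]
    chunks: list[str] = []
    for entry in entries:
        if len(entry) <= max_chars:
            chunks.append(entry)  # entry-level chunk
        else:
            # long entry: fall back to paragraph-level chunks
            chunks.extend(p for p in entry.split("\n\n") if p.strip())
    return chunks
```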
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-info text-white"><h4 class="mb-0">Qwen3-VL-Embedding-8B</h4></div>
<div class="card-body">
<ul>
<li><strong>Dimensions:</strong> 4096 (full), or MRL truncation to 3072/2048/1536/1024</li>
<li><strong>Input:</strong> Text, images, screenshots, video, or any mix</li>
<li><strong>Instruction-aware:</strong> content-type instructions improve retrieval quality by roughly 15%</li>
<li><strong>Quantization:</strong> Int8 (~8GB VRAM), Int4 (~4GB VRAM)</li>
<li><strong>Serving:</strong> vLLM with <code>--runner pooling</code></li>
<li><strong>Languages:</strong> 30+ supported</li>
</ul>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-warning text-dark"><h4 class="mb-0">Qwen3-VL-Reranker-8B</h4></div>
<div class="card-body">
<ul>
<li><strong>Architecture:</strong> Single-tower cross-attention (deep query↔document interaction)</li>
<li><strong>Input:</strong> (query, document) pairs — both can be multimodal</li>
<li><strong>Output:</strong> Relevance score (sigmoid of yes/no token probabilities)</li>
<li><strong>Instruction-aware:</strong> Custom re-ranking instructions per content type</li>
<li><strong>Serving:</strong> vLLM with <code>--runner pooling</code> + score endpoint</li>
<li><strong>Fallback:</strong> Qwen3-Reranker-0.6B via llama.cpp (text-only)</li>
</ul>
</div>
</div>
</div>
</div>
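<p>A sketch of the embedding client side of <code>engine/embeddings.py</code>. The vLLM endpoint is OpenAI-compatible; the <code>Instruct: …\nQuery: …</code> prompt format follows the published Qwen3-Embedding convention and is assumed here to carry over to the VL variant. Host, port, and model name are placeholders:</p>

```python
import math

EMBED_URL = "http://gpu-3090:8002/v1/embeddings"  # illustrative host/port

def embed_payload(text: str, instruction: str,
                  model: str = "Qwen3-VL-Embedding-8B") -> dict:
    """Build the request body for an instruction-aware text embedding."""
    return {"model": model, "input": [f"Instruct: {instruction}\nQuery: {text}"]}

def mrl_truncate(vector: list[float], dims: int) -> list[float]:
    """Matryoshka (MRL) truncation: keep the leading dims, re-normalize to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]
```

<p><code>mrl_truncate(vec, 1536)</code> is how a full 4096d Qwen vector could be squeezed under pgvector's 2000-dimension index cap for the Spelunker backport.</p>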
<div class="alert alert-info border-start border-4 border-info">
<h4><i class="bi bi-image"></i> Why Multimodal Matters</h4>
<p>Traditional RAG systems OCR images and diagrams, producing garbled text. Multimodal embedding understands the <em>visual content</em> directly:</p>
<ul class="mb-0">
<li><strong>Technical diagrams:</strong> Wiring diagrams, network topologies, architecture diagrams — searchable by visual content, not OCR garbage</li>
<li><strong>Album artwork:</strong> "psychedelic album covers from the 70s" finds matching art via visual similarity</li>
<li><strong>Art:</strong> The actual painting/sculpture becomes the searchable content, not just its text description</li>
<li><strong>PDF pages:</strong> Image-only PDF pages with charts and tables are embedded as images, not skipped</li>
</ul>
</div>
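<p>The reranker card above describes the relevance score as a sigmoid over the model's yes/no token probabilities. A minimal sketch of that score computation (the logits themselves would come from the serving endpoint):</p>

```python
import math

def rerank_score(yes_logit: float, no_logit: float) -> float:
    """Softmax over the yes/no logits, i.e. sigmoid of their difference."""
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))
```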
</section>
<!-- SECTION: SEARCH PIPELINE -->
<section id="search-pipeline" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-search"></i> Search Pipeline — GraphRAG + Vector + Re-rank</h2>
<div class="card mb-4">
<div class="card-header bg-primary text-white"><h3 class="mb-0"><i class="bi bi-flow-chart"></i> Search Flow</h3></div>
<div class="card-body">
<div class="mermaid">
flowchart TD
Q["User Query"] --> E["Embed Query<br/>(Qwen3-VL-Embedding)"]
E --> VS["1. Vector Search<br/>(Neo4j vector index)<br/>Top-K × 3 oversample"]
E --> GT["2. Graph Traversal<br/>(Cypher queries)<br/>Concept + relationship walks"]
Q --> FT["3. Full-Text Search<br/>(Neo4j fulltext index)<br/>Keyword matching"]
VS --> F["Candidate Fusion<br/>+ Deduplication"]
GT --> F
FT --> F
F --> RR["4. Re-Rank<br/>(Qwen3-VL-Reranker)<br/>Cross-attention scoring"]
RR --> TK["Top-K Results"]
TK --> CTX["Inject Content-Type<br/>Context Prompt"]
CTX --> LLM["5. LLM Responder<br/>(Two-stage RAG)"]
LLM --> REV["6. LLM Reviewer<br/>(Quality + citation check)"]
REV --> ANS["Final Answer<br/>with Citations"]
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h5 class="mb-0">1. Vector Search</h5></div>
<div class="card-body">
<p>Cosine similarity via Neo4j vector index on Chunk and ImageEmbedding nodes.</p>
<pre class="bg-light p-2 rounded"><code>CALL db.index.vector.queryNodes(
'chunk_embedding', 30,
$query_vector
) YIELD node, score
WHERE score > $threshold
RETURN node, score</code></pre>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-success text-white"><h5 class="mb-0">2. Graph Traversal</h5></div>
<div class="card-body">
<p>Walk relationships to find contextually related content that vector search alone would miss.</p>
<pre class="bg-light p-2 rounded"><code>MATCH (c:Chunk)&lt;-[:HAS_CHUNK]-(i:Item)
-[:REFERENCES]->(con:Concept)
-[:RELATED_TO]-(con2:Concept)
&lt;-[:REFERENCES]-(i2:Item)
-[:HAS_CHUNK]->(c2:Chunk)
RETURN c2, i2</code></pre>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-info text-white"><h5 class="mb-0">3. Full-Text Search</h5></div>
<div class="card-body">
<p>Neo4j native full-text index for keyword matching (BM25-equivalent).</p>
<pre class="bg-light p-2 rounded"><code>CALL db.index.fulltext.queryNodes(
'chunk_fulltext',
$query_text
) YIELD node, score
RETURN node, score</code></pre>
</div>
</div>
</div>
</div>
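<p>The "Candidate Fusion + Deduplication" step is unspecified above; one reasonable approach is reciprocal rank fusion (RRF), which merges the vector, graph, and full-text candidate lists by rank alone, so the three incomparable score scales never need calibration. A sketch (chunk IDs and the constant <code>k=60</code> are illustrative):</p>

```python
# Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank).
# Deduplication falls out naturally, since each ID accumulates one score.
def fuse_candidates(*ranked_lists: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```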
</section>
<!-- SECTION: MCP INTERFACE -->
<section id="mcp-interface" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-plug"></i> MCP Server Interface</h2>
<div class="alert alert-primary border-start border-4 border-primary">
<h3>MCP-First Design</h3>
<p class="mb-0">Mnemosyne exposes its capabilities as MCP tools, making the entire knowledge base accessible to Claude, Copilot, and any MCP-compatible LLM client. The MCP server is a primary interface, not an afterthought.</p>
</div>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">Search &amp; Retrieval Tools</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Tool</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>search_library</code></td><td>Semantic + graph + full-text search with re-ranking. Filters by library, collection, content type.</td></tr>
<tr><td><code>ask_about</code></td><td>Full RAG pipeline — search, re-rank, content-type context injection, LLM response with citations.</td></tr>
<tr><td><code>find_similar</code></td><td>Find items similar to a given item using vector similarity. Optionally search across libraries.</td></tr>
<tr><td><code>search_by_image</code></td><td>Multimodal search — find content matching an uploaded image.</td></tr>
<tr><td><code>explore_connections</code></td><td>Traverse knowledge graph from an item — find related concepts, authors, themes.</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">Management &amp; Navigation Tools</h4></div>
<div class="card-body">
<table class="table table-sm">
<thead><tr><th>Tool</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>browse_libraries</code></td><td>List all libraries with their content types and item counts.</td></tr>
<tr><td><code>browse_collections</code></td><td>List collections within a library.</td></tr>
<tr><td><code>get_item</code></td><td>Get detailed info about a specific item, including metadata and graph connections.</td></tr>
<tr><td><code>add_content</code></td><td>Add new content to a library — triggers async embedding + graph construction.</td></tr>
<tr><td><code>get_concepts</code></td><td>List extracted concepts for an item or across a library.</td></tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</section>
<!-- SECTION: GPU SERVICES -->
<section id="gpu-services" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-gpu-card"></i> GPU Services</h2>
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">RTX 5090 (32GB VRAM)</h4></div>
<div class="card-body">
<table class="table table-sm">
<tbody>
<tr><td><strong>Model</strong></td><td>Qwen3-VL-Reranker-8B</td></tr>
<tr><td><strong>VRAM (bf16)</strong></td><td>~18GB</td></tr>
<tr><td><strong>Serving</strong></td><td>vLLM <code>--runner pooling</code></td></tr>
<tr><td><strong>Port</strong></td><td>:8001</td></tr>
<tr><td><strong>Role</strong></td><td>Multimodal re-ranking</td></tr>
<tr><td><strong>Headroom</strong></td><td>~14GB for chat model</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">RTX 3090 (24GB VRAM)</h4></div>
<div class="card-body">
<table class="table table-sm">
<tbody>
<tr><td><strong>Model</strong></td><td>Qwen3-VL-Embedding-8B</td></tr>
<tr><td><strong>VRAM (bf16)</strong></td><td>~18GB</td></tr>
<tr><td><strong>Serving</strong></td><td>vLLM <code>--runner pooling</code></td></tr>
<tr><td><strong>Port</strong></td><td>:8002</td></tr>
<tr><td><strong>Role</strong></td><td>Multimodal embedding</td></tr>
<tr><td><strong>Headroom</strong></td><td>~6GB</td></tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
<div class="alert alert-info border-start border-4 border-info">
<h4><i class="bi bi-arrow-repeat"></i> Fallback: llama.cpp (Existing Ansible Infra)</h4>
<p class="mb-0">Text-only Qwen3-Reranker-0.6B GGUF served via <code>llama-server</code> on existing systemd/Ansible infrastructure. Managed by the same playbooks, monitored by the same Grafana dashboards. Used when vLLM services are down or for text-only workloads.</p>
</div>
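<p>Concretely, the services above might be launched as follows. Model identifiers, the GGUF path, and flags are illustrative; verify the pooling-runner options against the vLLM version in use:</p>

```shell
# RTX 5090: multimodal re-ranking (port 8001)
vllm serve Qwen/Qwen3-VL-Reranker-8B --runner pooling --port 8001

# RTX 3090: multimodal embedding (port 8002)
vllm serve Qwen/Qwen3-VL-Embedding-8B --runner pooling --port 8002

# Fallback: text-only reranker on the existing llama.cpp infra
llama-server -m /models/qwen3-reranker-0.6b-q8_0.gguf --port 8003
```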
</section>
<!-- SECTION: DEPLOYMENT -->
<section id="deployment" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-box-seam"></i> Deployment</h2>
<div class="row g-4 mb-4">
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-primary text-white"><h4 class="mb-0">Core Services</h4></div>
<div class="card-body">
<ul class="mb-0">
<li><strong>web:</strong> Django app (Gunicorn)</li>
<li><strong>postgres:</strong> PostgreSQL (auth/config only)</li>
<li><strong>neo4j:</strong> Neo4j 5.x (knowledge graph + vectors)</li>
<li><strong>rabbitmq:</strong> Celery broker</li>
</ul>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-success text-white"><h4 class="mb-0">Async Processing</h4></div>
<div class="card-body">
<ul class="mb-0">
<li><strong>celery-worker:</strong> Embedding, graph construction</li>
<li><strong>celery-beat:</strong> Scheduled re-sync tasks</li>
</ul>
</div>
</div>
</div>
<div class="col-md-4">
<div class="card h-100">
<div class="card-header bg-info text-white"><h4 class="mb-0">Storage &amp; Proxy</h4></div>
<div class="card-body">
<ul class="mb-0">
<li><strong>minio:</strong> S3-compatible content storage</li>
<li><strong>nginx:</strong> Static/proxy</li>
<li><strong>mcp-server:</strong> MCP interface process</li>
</ul>
</div>
</div>
</div>
</div>
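<p>A trimmed <code>docker-compose.yml</code> sketch of the service graph above (image tags, commands, and credentials are placeholders, not the production file):</p>

```yaml
services:
  web:
    build: .
    command: gunicorn mnemosyne.wsgi
    depends_on: [postgres, neo4j, rabbitmq]
  postgres:
    image: postgres:16
  neo4j:
    image: neo4j:5
    environment:
      NEO4J_AUTH: neo4j/changeme
  rabbitmq:
    image: rabbitmq:3-management
  minio:
    image: minio/minio
    command: server /data
  celery-worker:
    build: .
    command: celery -A mnemosyne worker
    depends_on: [rabbitmq, neo4j]
  celery-beat:
    build: .
    command: celery -A mnemosyne beat
```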
<div class="alert alert-secondary border-start border-4 border-secondary">
<h4>Shared Infrastructure with Spelunker</h4>
<p class="mb-0">Mnemosyne and Spelunker share: GPU model services (llama.cpp + vLLM), MinIO/S3 (separate buckets), Neo4j (separate databases), RabbitMQ (separate vhosts), and Grafana monitoring. Each is its own Docker Compose stack but points to shared infra.</p>
</div>
</section>
<!-- SECTION: BACKPORT -->
<section id="backport" class="mb-5">
<h2 class="h2 mb-4"><i class="bi bi-arrow-left-right"></i> Backport Strategy to Spelunker</h2>
<div class="alert alert-warning border-start border-4 border-warning">
<h3>Build Forward, Backport Back</h3>
<p class="mb-0">Mnemosyne proves the architecture with no legacy constraints. Once validated, proven components flow back to Spelunker to enhance its RFP workflow with multimodal understanding and re-ranking precision.</p>
</div>
<table class="table table-bordered">
<thead class="table-dark"><tr><th>Component</th><th>Mnemosyne (Prove)</th><th>Spelunker (Backport)</th></tr></thead>
<tbody>
<tr><td><strong>RerankerService</strong></td><td>Qwen3-VL multimodal + llama.cpp text</td><td>Drop into <code>rag/services/reranker.py</code></td></tr>
<tr><td><strong>Multimodal Embedding</strong></td><td>Qwen3-VL-Embedding via vLLM</td><td>Add alongside OpenAI embeddings, MRL@1536d for pgvector compat</td></tr>
<tr><td><strong>Diagram Understanding</strong></td><td>Image pages embedded multimodally</td><td>PDF diagrams in RFP docs become searchable</td></tr>
<tr><td><strong>MCP Server</strong></td><td>Primary interface from day one</td><td>Add as secondary interface to Spelunker</td></tr>
<tr><td><strong>Neo4j (optional)</strong></td><td>Primary vector + graph store</td><td>Could replace pgvector, or run alongside</td></tr>
<tr><td><strong>Content-Type Config</strong></td><td>Library type definitions</td><td>Adapt as document classification in Spelunker</td></tr>
</tbody>
</table>
</section>
<div class="alert alert-success border-start border-4 border-success mt-5">
<h3><i class="bi bi-check-circle"></i> Documentation Complete</h3>
<p class="mb-0">This document describes the target architecture for Mnemosyne. Phase implementation documents provide detailed build plans.</p>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
</body>
</html>