
Robert Helewka 81426327bf feat(mcp): store MCP tokens as SHA-256 hashes instead of plaintext

Replace plaintext token storage with SHA-256 hashes so leaked database
contents cannot be used to authenticate. Plaintext is generated, shown
once at creation time, and never persisted.

- Add `hash_token()` helper and `MCPTokenManager.create_token()` that
  returns `(instance, plaintext)`.
- Replace `token` field with indexed `token_hash`; look up bearers by
  hashing the incoming value.
- Update dashboard, management command, and admin to surface plaintext
  only at creation. Disable admin "add" since it cannot reveal plaintext.
- Migration drops the old `token` column and adds `token_hash`;
  pre-existing tokens are invalidated and must be reissued.
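
The scheme described in this commit can be sketched without Django. This is a minimal illustration under stated assumptions, not the project's code: `hash_token()` and `create_token()` are named in the commit, but their bodies here are guesses, and the in-memory `TokenStore` class and its `find_by_bearer()` method are hypothetical stand-ins for the model manager and the indexed `token_hash` column.

```python
import hashlib
import secrets
from typing import Optional


def hash_token(plaintext: str) -> str:
    """Return the hex SHA-256 digest of a token (assumed implementation)."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


class TokenStore:
    """Hypothetical in-memory stand-in for the token table and its manager."""

    def __init__(self) -> None:
        # Maps token_hash -> record; the plaintext is never stored.
        self._by_hash = {}

    def create_token(self, name: str) -> tuple:
        """Create a token and return (record, plaintext); plaintext is shown once."""
        plaintext = secrets.token_urlsafe(32)
        record = {"name": name, "token_hash": hash_token(plaintext)}
        self._by_hash[record["token_hash"]] = record
        return record, plaintext

    def find_by_bearer(self, bearer: str) -> Optional[dict]:
        """Look up an incoming bearer value by hashing it first."""
        return self._by_hash.get(hash_token(bearer))


store = TokenStore()
record, plaintext = store.create_token("laptop")
```

Because only the digest is kept, a copy of the stored records is not enough to authenticate: a caller has to present the original plaintext, which exists only in the return value of `create_token()`.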

2026-04-27 09:01:36 -04:00

| Name | Last commit | Date |
| --- | --- | --- |
| docs | feat: replace server-side RAG with MCP retrieval primitives | 2026-04-26 15:34:26 -04:00 |
| mnemosyne | feat(mcp): store MCP tokens as SHA-256 hashes instead of plaintext | 2026-04-27 09:01:36 -04:00 |
| .gitignore | Add Themis application with custom widgets, views, and utilities | 2026-03-21 02:00:18 +00:00 |
| LICENSE | Add Themis application with custom widgets, views, and utilities | 2026-03-21 02:00:18 +00:00 |
| pyproject.toml | feat: replace server-side RAG with MCP retrieval primitives | 2026-04-26 15:34:26 -04:00 |
| README.md | feat: replace server-side RAG with MCP retrieval primitives | 2026-04-26 15:34:26 -04:00 |

README.md

Mnemosyne

"The electric light did not come from the continuous improvement of candles." — Oren Harari

The memory of everything you know.

Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI models. Named after the Titan goddess of memory and mother of the nine Muses, Mnemosyne doesn't just store your knowledge — it understands what kind of knowledge it is, connects it through relationships, and makes it all searchable through text, images, and natural language.

What Makes This Different

Most knowledge base tools treat all documents identically: text in, chunks out, vectors stored. A novel and a PostgreSQL manual get the same treatment.

Mnemosyne knows the difference:

- A textbook has chapters, an index, technical terminology, and pedagogical structure. It's chunked accordingly, and when an LLM retrieves results, it knows this is instructional content.
- A novel has narrative flow, characters, plot arcs, and dialogue. The LLM knows to interpret results as creative fiction.
- Album artwork is a visual asset tied to an artist, genre, and era. It's embedded multimodally, so it is searchable by both image similarity and text description.
- A journal entry is personal, temporal, and reflective. The LLM treats it differently from a reference manual.

This content-type awareness flows through every layer: chunking strategy, embedding instructions, re-ranking, and the final LLM prompt.
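
One way to picture a content type flowing through each layer is a per-library profile that every stage consults. The sketch below is illustrative only: `ContentProfile`, `PROFILES`, `profile_for()`, and all of the strategy names and instruction strings are hypothetical, not taken from the Mnemosyne codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentProfile:
    """Hypothetical per-library settings consulted by each pipeline stage."""
    chunking: str               # how documents are split
    embedding_instruction: str  # instruction passed alongside the text to embed
    prompt_hint: str            # context handed to the calling LLM with results


# Illustrative values only; the project's real settings may differ.
PROFILES = {
    "technical": ContentProfile(
        chunking="by_section",
        embedding_instruction="Represent this instructional passage for retrieval.",
        prompt_hint="Results are instructional reference material.",
    ),
    "fiction": ContentProfile(
        chunking="by_scene",
        embedding_instruction="Represent this narrative passage for retrieval.",
        prompt_hint="Results are creative fiction.",
    ),
}

DEFAULT = ContentProfile("fixed_window", "Represent this passage for retrieval.", "")


def profile_for(library_type: str) -> ContentProfile:
    """Return the profile for a library type, falling back to a generic one."""
    return PROFILES.get(library_type.lower(), DEFAULT)
```

Keeping the settings in one record per library type means the chunker, embedder, and result formatter can stay generic and read only the fields they need.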

Core Architecture

| Component | Technology | Purpose |
| --- | --- | --- |
| Knowledge Graph | Neo4j 5.x | Relationships + vector storage (no dimension limits) |
| Multimodal Embeddings | Qwen3-VL-Embedding-8B | Text + image + video in unified vector space (4096d) |
| Multimodal Re-ranking | Synesis (Qwen3-VL-Reranker-2B) | Cross-attention precision scoring via `/v1/rerank` |
| Web Framework | Django 5.x + DRF | Auth, admin, API, content management |
| Object Storage | S3/MinIO | Original content + chunk text storage |
| Async Processing | Celery + RabbitMQ | Document embedding, graph construction |
| LLM Interface | MCP Server | Primary interface for Claude, Copilot, etc. |
| GPU Serving | vLLM + llama.cpp | Local model inference |

Library Types

| Library | Example Content | Multimodal? | Graph Relationships |
| --- | --- | --- | --- |
| Fiction | Novels, short stories | Cover art | Author → Book → Character → Theme |
| Technical | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
| Music | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
| Film | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
| Art | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
| Journals | Personal entries | Photos | Date → Entry → Topic → Person/Place |

Search Pipeline

Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
  → Candidate Fusion → Qwen3-VL Re-ranking → Ranked Chunks + Metadata
    → MCP tool result (the calling LLM does its own synthesis)
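
The README does not say how candidate fusion is performed. As one plausible illustration, the sketch below merges the three candidate lists with reciprocal rank fusion (RRF), a common technique swapped in here as an assumption. The function name, the constant `k=60`, and the chunk IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs; a higher fused score ranks earlier."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks near the top.
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["c3", "c1", "c7"]      # from vector search
graph_hits = ["c1", "c9"]             # from graph traversal
fulltext_hits = ["c1", "c3", "c4"]    # from full-text search
fused = reciprocal_rank_fusion([vector_hits, graph_hits, fulltext_hits])
```

Here `c1` comes first because all three sources return it, and the fused list would then be handed to the re-ranker for precision scoring.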

Heritage

Mnemosyne's RAG pipeline architecture is inspired by Spelunker, an enterprise RFP response platform. The proven patterns — hybrid search, two-stage RAG (responder + reviewer), citation-based retrieval, and async document processing — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships.

Running Celery Workers

Mnemosyne uses Celery with RabbitMQ for async document embedding. From the mnemosyne/ directory:

# Development — single worker, all queues
celery -A mnemosyne worker -l info -Q celery,embedding,batch

# Or skip workers entirely with eager mode (.env):
CELERY_TASK_ALWAYS_EAGER=True

Production — separate workers:

celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h    # GPU-bound embedding
celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h            # Batch orchestration
celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h         # LLM API validation

Scheduler & Monitoring:

celery -A mnemosyne beat -l info            # Periodic task scheduler
celery -A mnemosyne flower --port=5555      # Web monitoring UI

See Phase 2: Celery Workers & Scheduler for full details on queues, reliability settings, and task progress tracking.
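
For tasks to reach the `embedding`, `batch`, and `celery` queues named in the commands above, something has to route them. Celery accepts router functions in its `task_routes` setting. The sketch below shows one such function. The dotted task names it matches on are hypothetical, and the project may route tasks differently (for example with a static mapping or per-task `queue` options).

```python
def route_task(name, args, kwargs, options, task=None, **kw):
    """Pick a queue from the task's dotted name (hypothetical naming scheme)."""
    if ".embedding." in name:
        return {"queue": "embedding"}   # GPU-bound work, single-concurrency worker
    if ".batch." in name:
        return {"queue": "batch"}       # batch orchestration
    return {"queue": "celery"}          # Celery's default queue name


# In the Celery configuration this would be registered as:
#   task_routes = (route_task,)
```

A router function keeps the queue policy in one place, so adding a task under an existing package needs no extra configuration.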

Architecture Note: Retrieval, Not Synthesis

Mnemosyne is a retrieval engine, not a RAG pipeline. It stores, embeds, and ranks — it does not synthesize answers.

The earlier roadmap had a server-side RAG layer that took a query and returned a written answer with citations. That layer has been removed. Calling LLMs (Claude via MCP, principally) are perfectly capable of driving iterative retrieval themselves when given the right primitives, and a server-side synthesis hop adds latency, cost, and a place where errors are harder to debug. Letting the calling LLM see chunks directly — and follow citations, pivot mid-search, or call get_chunk for full text — beats pre-digesting them.

If a "knowledge subagent" is ever wanted (a wrapper that takes a question and returns a written answer), it lives outside Mnemosyne as a thin client over the MCP tools, with its own system prompt. No coupling, no extra inference hop inside the server, and the subagent's behavior can iterate independently.
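
Such a wrapper could be very small. The sketch below assumes the MCP tools are exposed to the client as plain callables. The tool names `search` and `get_chunk` come from this README, but their signatures, the `chunk_id` result field, the prompt wording, and the `answer_question()` function itself are hypothetical.

```python
from typing import Callable


def answer_question(
    question: str,
    search: Callable[[str], list],
    get_chunk: Callable[[str], str],
    llm: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Retrieve chunks via the tools, then ask an LLM to write the answer."""
    hits = search(question)[:top_k]
    # Fetch full text for each hit; "chunk_id" is an assumed result field.
    passages = [get_chunk(hit["chunk_id"]) for hit in hits]
    prompt = "Answer using only these passages:\n\n"
    prompt += "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    prompt += f"\n\nQuestion: {question}"
    return llm(prompt)


# Stub tools so the sketch runs without a server or a model.
chunks = {"a": "Neo4j stores the graph.", "b": "Celery runs embedding jobs."}
result = answer_question(
    "What stores the graph?",
    search=lambda q: [{"chunk_id": "a"}, {"chunk_id": "b"}],
    get_chunk=chunks.__getitem__,
    llm=lambda prompt: prompt,  # echo the prompt instead of calling a model
)
```

All synthesis logic, including the system prompt, lives in the client, so it can change without touching the server.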

Documentation

- Architecture Documentation (full system architecture with diagrams)
- Phase 1: Foundation (project skeleton, Neo4j data model, content-type system)
- Phase 2: Embedding Pipeline (Qwen3-VL multimodal embedding)
- Phase 3: Search & Re-ranking (hybrid search + re-ranker)
- Phase 5: MCP Server (retrieval primitives for LLMs: search, get_chunk, list_libraries, …)
- Phase 6: Backport to Spelunker (proven patterns flowing back)
