Mnemosyne

"The electric light did not come from the continuous improvement of candles." — Oren Harari

The memory of everything you know.

Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI models. Named after the Titan goddess of memory and mother of the nine Muses, Mnemosyne doesn't just store your knowledge — it understands what kind of knowledge it is, connects it through relationships, and makes it all searchable through text, images, and natural language.

What Makes This Different

Most existing knowledge base tools treat all documents identically: text in, chunks out, vectors stored. A novel and a PostgreSQL manual get the same treatment.

Mnemosyne knows the difference:

  • A textbook has chapters, an index, technical terminology, and pedagogical structure. It's chunked accordingly, and when an LLM retrieves results, it knows this is instructional content.
  • A novel has narrative flow, characters, plot arcs, dialogue. The LLM knows to interpret results as creative fiction.
  • Album artwork is a visual asset tied to an artist, genre, and era. It's embedded multimodally — searchable by both image similarity and text description.
  • A journal entry is personal, temporal, reflective. The LLM treats it differently than a reference manual.

This content-type awareness flows through every layer: chunking strategy, embedding instructions, re-ranking, and the final LLM prompt.
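One way to picture how a library type could steer each layer is a small per-type profile. The sketch below is illustrative only: `ContentProfile`, its fields, and the example values are hypothetical and are not taken from Mnemosyne's code (the real defaults are loaded from LIBRARY_TYPE_DEFAULTS, whose shape is not shown in this README).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentProfile:
    """Hypothetical per-library-type settings (names and values are illustrative)."""
    chunking: str               # how the source document is split
    embedding_instruction: str  # instruction supplied alongside text at embed time
    llm_context: str            # hint returned with retrieved chunks


PROFILES = {
    "technical": ContentProfile(
        chunking="by_section",
        embedding_instruction="Represent this instructional passage for retrieval.",
        llm_context="Instructional reference material.",
    ),
    "fiction": ContentProfile(
        chunking="by_scene",
        embedding_instruction="Represent this narrative passage for retrieval.",
        llm_context="Creative fiction; treat events as narrative, not fact.",
    ),
}


def describe_chunk(library_type: str, chunk_text: str) -> dict:
    """Attach the content-type hint to a retrieved chunk."""
    profile = PROFILES[library_type]
    return {"text": chunk_text, "content_type": library_type, "hint": profile.llm_context}
```

The point of the design is that the same key (the library type) selects behavior at every stage, so a chunk retrieved from a novel arrives at the calling LLM already labeled as fiction.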

Core Architecture

| Component | Technology | Purpose |
|---|---|---|
| Knowledge Graph | Neo4j 5.x | Relationships + vector storage (no dimension limits) |
| Multimodal Embeddings | Qwen3-VL-Embedding-8B | Text + image + video in unified vector space (4096d) |
| Multimodal Re-ranking | Synesis (Qwen3-VL-Reranker-2B) | Cross-attention precision scoring via /v1/rerank |
| Web Framework | Django 5.x + DRF | Auth, admin, API, content management |
| Object Storage | S3/MinIO | Original content + chunk text storage |
| Async Processing | Celery + RabbitMQ | Document embedding, graph construction |
| LLM Interface | MCP Server | Primary interface for Claude, Copilot, etc. |
| GPU Serving | vLLM + llama.cpp | Local model inference |

Library Types

| Library | Example Content | Multimodal? | Graph Relationships |
|---|---|---|---|
| Fiction | Novels, short stories | Cover art | Author → Book → Character → Theme |
| Nonfiction | History, biography, science writing | Photos, charts | Author → Work → Topic → Person/Place |
| Technical | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
| Music | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
| Film | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
| Art | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
| Journal | Personal entries, plans, observations | Photos | Date → Entry → Topic → Person/Place |
| Business | Proposals, marketing, strategy | Logos, charts | Client → Engagement → Deliverable |
| Finance | Statements, tax, market commentary | Charts, statement scans | Account → Instrument → Period |

Search Pipeline

Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
  → Candidate Fusion → Qwen3-VL Re-ranking → Ranked Chunks + Metadata
    → MCP tool result (the calling LLM does its own synthesis)
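The README does not say how the Candidate Fusion step merges the three result lists. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common choice for hybrid search that is swapped in here as an assumption; the function name, the `k=60` constant, and the sample chunk IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical candidate lists from the three retrieval paths.
vector_hits = ["c1", "c2", "c3"]
graph_hits = ["c2", "c4"]
fulltext_hits = ["c3", "c2"]

# "c2" comes out first because all three paths returned it.
fused = reciprocal_rank_fusion([vector_hits, graph_hits, fulltext_hits])
```

RRF needs only ranks, not comparable scores, which suits a pipeline that mixes vector similarity, graph traversal, and full-text relevance. The fused list would then go to the re-ranker for precision scoring.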

Heritage

Mnemosyne's RAG pipeline architecture is inspired by Spelunker, an enterprise RFP response platform. The proven patterns — hybrid search, two-stage RAG (responder + reviewer), citation-based retrieval, and async document processing — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships.

Running Mnemosyne

Mnemosyne runs as three cooperating processes: the Django web app (REST API + admin), the MCP server (LLM-facing tools), and one or more Celery workers (async embedding + ingest). All three read configuration from mnemosyne/.env (copy from mnemosyne/.env example and fill in secrets).

Hosts in the Ouranos lab:

  • Postgres: portia.incus:5432 (Django ORM: users, IngestJob)
  • Neo4j: ariel.incus:25554 (knowledge graph + vectors)
  • RabbitMQ: oberon.incus:5672 (Celery broker)
  • MinIO: nyx.helu.ca:8555 (S3-compatible; mnemosyne-content and daedalus buckets)
  • Memcached: 127.0.0.1:11211 (task progress)

One-time setup

cd mnemosyne/
python manage.py migrate                       # Apply Django ORM migrations
python manage.py setup_neo4j_indexes           # Create Neo4j vector + full-text indexes
python manage.py load_library_types            # Load LIBRARY_TYPE_DEFAULTS into Neo4j

Start the web app

The Django REST API serves /library/api/* (libraries, collections, items, search, workspaces, ingest) and Django admin. Use Gunicorn in production; runserver for dev.

cd mnemosyne/

# Development
python manage.py runserver 0.0.0.0:8000

# Production
gunicorn --bind 0.0.0.0:8000 --workers 3 mnemosyne.wsgi:application

Start the MCP server

The MCP server exposes the LLM-facing tools (search, get_chunk, list_libraries, list_collections, list_items, get_health) over Streamable HTTP at /mcp and SSE at /mcp/sse. Run it as a separate Uvicorn process on its own port so that it can be reverse-proxied or scaled independently of the Django app.

cd mnemosyne/

# Single command: ASGI server hosting the FastMCP app
uvicorn mnemosyne.asgi:app --host 0.0.0.0 --port 22091 --workers 1

The mcp_server/asgi.py mounts FastMCP at /mcp (Streamable HTTP) and /mcp/sse (SSE), with a /mcp/health JSON probe for HAProxy/Pallas.
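A quick way to confirm the server is up is to call the health probe from Python. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the shape of the JSON payload is not documented here, so the code returns it without interpreting it.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the probe URL from the server's base URL."""
    return base_url.rstrip("/") + "/mcp/health"


def check_health(base_url, opener=urllib.request.urlopen, timeout=5.0):
    """Return (ok, payload); payload is the parsed JSON body, or None on failure."""
    try:
        with opener(health_url(base_url), timeout=timeout) as response:
            return True, json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection errors, timeouts, and non-2xx responses
        # (urllib's HTTPError/URLError); ValueError covers a non-JSON body.
        return False, None
```

With the server started on port 22091 as shown above, `check_health("http://localhost:22091")` should return `(True, {...})`. The `opener` parameter exists only so the function can be exercised without a live server.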

Start a Celery worker

The commands below cover a single worker that handles all queues (development) and focused per-queue workers for production, including the embedding queue that Daedalus depends on (its ingest task lives there).

cd mnemosyne/

# Development — one worker, all queues
celery -A mnemosyne worker -l info -Q celery,embedding,batch

# Production — embedding queue (handles Daedalus ingest + embed_item)
celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h

# Production — batch queue (collection/library bulk operations)
celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h

# Production — default queue (LLM validation, misc)
celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h

Daedalus's POST /library/api/ingest/ dispatches library.tasks.ingest_from_daedalus to the embedding queue. If you only run one worker, make sure it consumes embedding or that task will sit in the broker.
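For readers unfamiliar with Celery routing, the sketch below shows one way a task can be pinned to a queue in Django settings. It is an assumption about the mechanism, not an excerpt from the project: it presumes the Celery app loads settings with the `CELERY_` namespace, and the project may instead set the queue on the task itself. `queue_for` is a hypothetical helper that only mirrors the lookup for illustration.

```python
# Hypothetical excerpt of Django settings; the project may route differently.
CELERY_TASK_ROUTES = {
    "library.tasks.ingest_from_daedalus": {"queue": "embedding"},
}
CELERY_TASK_DEFAULT_QUEUE = "celery"


def queue_for(task_name: str) -> str:
    """Resolve which queue a task name lands on under the routes above."""
    route = CELERY_TASK_ROUTES.get(task_name, {})
    return route.get("queue", CELERY_TASK_DEFAULT_QUEUE)
```

Under this configuration, a worker started with `-Q celery` alone never sees the ingest task, which is the failure mode described above.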

To bypass workers in dev/test, set CELERY_TASK_ALWAYS_EAGER=True in .env.

Scheduler & monitoring (optional):

celery -A mnemosyne beat -l info            # Periodic task scheduler
celery -A mnemosyne flower --port=5555      # Web monitoring UI

See Phase 2: Celery Workers & Scheduler for queue tuning, reliability settings, and task progress tracking.

Daedalus integration endpoints

These endpoints are used by the Daedalus FastAPI backend (HTTP Basic auth). All under /library/api/:

| Method | Route | Purpose |
|---|---|---|
| POST | /workspaces/ | Create a workspace (idempotent on workspace_id); body: {workspace_id, name, library_type, description?} |
| GET | /workspaces/{workspace_id}/ | Workspace status (item/chunk counts) |
| DELETE | /workspaces/{workspace_id}/ | Delete workspace + reachable content; preserves shared concepts |
| POST | /ingest/ | Queue a file for ingestion + embedding |
| GET | /jobs/{job_id}/ | Poll ingest job status |
| POST | /jobs/{job_id}/retry/ | Re-dispatch a failed job |
| GET | /jobs/?status=&library_uid= | List recent jobs |

See docs/mnemosyne_integration.md for the full Daedalus contract.
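As a rough illustration of the workspace-creation call, the sketch below builds (but does not send) the request with HTTP Basic auth using only the standard library. The function name is hypothetical, a JSON request body is an assumption, and the accepted `library_type` values are not listed here; docs/mnemosyne_integration.md remains the authoritative contract.

```python
import base64
import json
import urllib.request


def build_create_workspace_request(base_url, username, password,
                                   workspace_id, name, library_type,
                                   description=None):
    """Build (but do not send) a POST /library/api/workspaces/ request."""
    body = {"workspace_id": workspace_id, "name": name, "library_type": library_type}
    if description is not None:
        body["description"] = description  # optional field per the endpoint table

    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/library/api/workspaces/",
        data=json.dumps(body).encode("utf-8"),  # JSON encoding is an assumption
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Because creation is idempotent on workspace_id, repeating the call with the same ID should be safe.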

Architecture Note: Retrieval, Not Synthesis

Mnemosyne is a retrieval engine, not a RAG pipeline. It stores, embeds, and ranks — it does not synthesize answers.

The earlier roadmap had a server-side RAG layer that took a query and returned a written answer with citations. That layer has been removed. Calling LLMs (Claude via MCP, principally) are perfectly capable of driving iterative retrieval themselves when given the right primitives, and a server-side synthesis hop adds latency, cost, and a place where errors are harder to debug. Letting the calling LLM see chunks directly — and follow citations, pivot mid-search, or call get_chunk for full text — beats pre-digesting them.

If a "knowledge subagent" is ever wanted (a wrapper that takes a question and returns a written answer), it lives outside Mnemosyne as a thin client over the MCP tools, with its own system prompt. No coupling, no extra inference hop inside the server, and the subagent's behavior can iterate independently.
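To make the "thin client" idea concrete, here is a minimal sketch of such a wrapper. Everything in it is hypothetical: `search` stands in for a call to the MCP search tool, `synthesize` stands in for an external LLM call with its own system prompt, and neither reflects an actual Mnemosyne API.

```python
from typing import Callable, Dict, List


def answer_question(
    question: str,
    search: Callable[[str], List[Dict]],
    synthesize: Callable[[str, List[Dict]], str],
    max_chunks: int = 5,
) -> str:
    """Retrieve ranked chunks, then hand them to an external LLM for synthesis."""
    chunks = search(question)[:max_chunks]
    return synthesize(question, chunks)
```

Both dependencies are injected, so the wrapper holds no retrieval logic of its own and can change its prompt or model without touching the server.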

Documentation
