
# Mnemosyne

*"The electric light did not come from the continuous improvement of candles."* — Oren Harari

**The memory of everything you know.**

Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI models. Named after the Titan goddess of memory and mother of the nine Muses, Mnemosyne doesn't just store your knowledge — it understands what kind of knowledge it is, connects it through relationships, and makes it all searchable through text, images, and natural language.

## What Makes This Different

Most knowledge base tools treat every document identically: text in, chunks out, vectors stored. A novel and a PostgreSQL manual get the same treatment.

Mnemosyne knows the difference:

- **A textbook** has chapters, an index, technical terminology, and pedagogical structure. It's chunked accordingly, and when an LLM retrieves results, it knows this is instructional content.
- **A novel** has narrative flow, characters, plot arcs, dialogue. The LLM knows to interpret results as creative fiction.
- **Album artwork** is a visual asset tied to an artist, genre, and era. It's embedded multimodally — searchable by both image similarity and text description.
- **A journal entry** is personal, temporal, reflective. The LLM treats it differently than a reference manual.

This **content-type awareness** flows through every layer: chunking strategy, embedding instructions, re-ranking, and the metadata returned to the calling LLM.
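
To make the idea concrete, here is a minimal sketch of how a per-library-type profile could steer chunking, the embedding instruction, and the hint returned with each hit. Everything in it is hypothetical: the `LibraryTypeProfile` fields, the instruction wording, the chunk sizes, and the prompt layout are illustrative and do not reproduce the schema of Mnemosyne's `LIBRARY_TYPE_DEFAULTS`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryTypeProfile:
    """Hypothetical per-library-type settings; not Mnemosyne's real schema."""
    chunk_words: int        # target chunk size (words stand in for tokens here)
    embed_instruction: str  # prepended to chunk text before embedding
    result_hint: str        # tells the calling LLM how to read a hit


# Illustrative values only.
PROFILES = {
    "technical": LibraryTypeProfile(
        256, "Represent this technical documentation passage for retrieval.",
        "instructional or reference content"),
    "fiction": LibraryTypeProfile(
        768, "Represent this passage of narrative fiction for retrieval.",
        "creative fiction; not a factual source"),
    "journal": LibraryTypeProfile(
        384, "Represent this personal journal entry for retrieval.",
        "personal, dated, reflective writing"),
}


def chunk(library_type: str, text: str) -> list[str]:
    """Split text into fixed-size word windows chosen by content type."""
    size = PROFILES[library_type].chunk_words
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embedding_input(library_type: str, chunk_text: str) -> str:
    """Prefix a chunk with its content-type-specific embedding instruction."""
    return f"{PROFILES[library_type].embed_instruction}\n\n{chunk_text}"


def annotate_hit(library_type: str, chunk_text: str) -> dict:
    """Attach the content-type hint that travels back with a search result."""
    return {
        "library_type": library_type,
        "interpretation": PROFILES[library_type].result_hint,
        "text": chunk_text,
    }
```

The same chunk text therefore produces different chunk boundaries, different embedding inputs, and a different interpretation hint depending on which library it belongs to.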

## Core Architecture

| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Knowledge Graph** | Neo4j 5.x | Relationships + vector storage (no dimension limits) |
| **Multimodal Embeddings** | Qwen3-VL-Embedding-8B | Text + image + video in unified vector space (4096d) |
| **Multimodal Re-ranking** | Synesis (Qwen3-VL-Reranker-2B) | Cross-attention precision scoring via `/v1/rerank` |
| **Web Framework** | Django 5.x + DRF | Auth, admin, API, content management |
| **Object Storage** | S3/MinIO | Original content + chunk text storage |
| **Async Processing** | Celery + RabbitMQ | Document embedding, graph construction |
| **LLM Interface** | MCP Server | Primary interface for Claude, Copilot, etc. |
| **GPU Serving** | vLLM + llama.cpp | Local model inference |

## Library Types

| Library | Example Content | Multimodal? | Graph Relationships |
|---------|----------------|-------------|-------------------|
| **Fiction** | Novels, short stories | Cover art | Author → Book → Character → Theme |
| **Nonfiction** | History, biography, science writing | Photos, charts | Author → Work → Topic → Person/Place |
| **Technical** | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
| **Music** | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
| **Film** | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
| **Art** | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
| **Journal** | Personal entries, plans, observations | Photos | Date → Entry → Topic → Person/Place |
| **Business** | Proposals, marketing, strategy | Logos, charts | Client → Engagement → Deliverable |
| **Finance** | Statements, tax, market commentary | Charts, statement scans | Account → Instrument → Period |

## Search Pipeline

```
Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
  → Candidate Fusion → Qwen3-VL Re-ranking → Ranked Chunks + Metadata
    → MCP tool result (the calling LLM does its own synthesis)
```
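
The diagram leaves the fusion step unspecified. One common choice is Reciprocal Rank Fusion (RRF), which merges ranked lists using only rank positions, so the incomparable scores of vector, graph, and full-text search never need normalizing. The sketch below is an assumed illustration, not Mnemosyne's confirmed implementation; `k=60` is the constant commonly used with RRF, and the chunk IDs are made up.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Iterable[Sequence[str]], k: int = 60, limit: int = 50
) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one candidate list.

    Each list adds 1 / (k + rank) for every chunk it contains (rank starts
    at 1), so chunks found by several retrievers accumulate higher scores.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:limit]]


vector_hits = ["c3", "c1", "c7"]    # from the Neo4j vector index
graph_hits = ["c1", "c9"]           # from Cypher graph traversal
fulltext_hits = ["c7", "c1", "c2"]  # from the full-text index

# The fused list is what would be handed to the Qwen3-VL re-ranker.
candidates = reciprocal_rank_fusion([vector_hits, graph_hits, fulltext_hits])
print(candidates)
```

A chunk surfaced by several retrievers (here `c1`) outranks one that tops a single list, which makes RRF a cheap, reasonable pre-filter before the more expensive cross-attention re-ranking.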

## Heritage

Mnemosyne's retrieval architecture is inspired by [Spelunker](https://git.helu.ca/r/spelunker), an enterprise RFP response platform. Spelunker's proven patterns (hybrid search, citation-based retrieval, and async document processing) are carried forward and extended with multimodal capabilities and knowledge graph relationships. Its two-stage RAG (responder + reviewer) is not: answer synthesis is left to the calling LLM (see [Architecture Note: Retrieval, Not Synthesis](#architecture-note-retrieval-not-synthesis)).

## Running Mnemosyne

Mnemosyne runs as three cooperating processes: the Django web app (REST API + admin), the MCP server (LLM-facing tools), and one or more Celery workers (async embedding + ingest). All three read configuration from `mnemosyne/.env` (copy `mnemosyne/.env.example` and fill in the secrets).

Hosts in the Ouranos lab:
- **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
- **Neo4j** — `umbriel.incus:7687` (Bolt; knowledge graph + vectors; dedicated instance, see the note below; HTTP Browser on `umbriel.incus:25555`)
- **RabbitMQ** — `oberon.incus:5672` (Celery broker)
- **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
- **Memcached** — `127.0.0.1:11211` (task progress)

> **Neo4j must be dedicated to Mnemosyne.** Don't share the instance with Spelunker or any other graph workload. Mnemosyne owns the `Library`, `Collection`, `Item`, `Chunk`, and `Concept` labels and runs its own indexes (`chunk_embedding_index`, full-text indexes per library_type) and schema migrations (`setup_neo4j_indexes`, `load_library_types`). The Phase-1 workspace-delete path runs label-scoped `DETACH DELETE` over those labels, and a workspace_id-scoped subgraph is the unit of isolation; both assume single-tenancy. A shared instance risks:
>
> 1. label/property collisions corrupting the other tenant's graph;
> 2. vector-index memory contention degrading search latency for both apps;
> 3. management commands mutating schema another tenant depends on;
> 4. backup/restore that can't be reasoned about per app.
>
> Neo4j Community Edition is sufficient. Multiple databases per server is an Enterprise-only feature, so isolation has to come from running a separate server process. Run a dedicated instance per environment (one for staging, one for production) and point each environment at its instance via `NEOMODEL_NEO4J_BOLT_URL` in that environment's `.env` (`mnemosyne/.env` for bare-Python runs, the repo-root `.env` under Docker Compose).

### One-time setup

```bash
cd mnemosyne/
python manage.py migrate                       # Apply Django ORM migrations
python manage.py load_library_types            # Load LIBRARY_TYPE_DEFAULTS into Neo4j
# --- seed the system embedding model in /admin/llm_manager/llmmodel/ here ---
python manage.py setup_neo4j_indexes           # Create Neo4j vector + full-text indexes
```

> **Seed the embedding model before running `setup_neo4j_indexes`.** Vector
> index dimensions are read from the row in `llm_manager_llmmodel` that has
> `is_system_embedding_model=True` and a non-null `vector_dimensions`. There
> is deliberately no hardcoded fallback, because an index built at the wrong
> dimension silently breaks every search; if no such row exists, the command
> exits non-zero with a clear error. For the same reason the `docker compose`
> `init` sidecar does **not** run `setup_neo4j_indexes`. It runs only
> `migrate` and `load_library_types`; you then configure the system embedding
> model in `/admin/` and run
> `docker compose exec app python manage.py setup_neo4j_indexes` once by hand.
> Until that last step runs, vector search returns empty results and
> `library/apps.py` logs a readiness warning. See
> [Docker bootstrap order](#docker-bootstrap-order) below for the full flow.
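
A minimal sketch of the dimension lookup described above, assuming the model rows have already been fetched as plain dicts (the real command presumably goes through the Django ORM). It refuses to guess and emits Neo4j 5.x `CREATE VECTOR INDEX` DDL. The `chunk_embedding_index` name and `Chunk` label come from this README; the `embedding` property, the cosine similarity function, and the helper and exception names are assumptions.

```python
from typing import Iterable, Mapping


class EmbeddingModelNotConfigured(RuntimeError):
    """Raised when no usable system embedding model row exists (name is illustrative)."""


def system_vector_dimensions(rows: Iterable[Mapping]) -> int:
    """Return vector_dimensions from the system embedding model row.

    Follows the rule above: the row needs is_system_embedding_model=True and a
    non-null vector_dimensions. There is deliberately no default.
    """
    for row in rows:
        if row.get("is_system_embedding_model") and row.get("vector_dimensions") is not None:
            return int(row["vector_dimensions"])
    raise EmbeddingModelNotConfigured(
        "No LLMModel row with is_system_embedding_model=True and vector_dimensions set; "
        "configure one in /admin/llm_manager/llmmodel/ first."
    )


def chunk_vector_index_ddl(dimensions: int) -> str:
    """Cypher for the chunk vector index (property name and similarity are assumed)."""
    return (
        "CREATE VECTOR INDEX chunk_embedding_index IF NOT EXISTS "
        "FOR (c:Chunk) ON c.embedding "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        "`vector.similarity_function`: 'cosine'}}"
    )


# Example rows as they might come back from llm_manager_llmmodel.
rows = [
    {"name": "reranker", "is_system_embedding_model": False, "vector_dimensions": None},
    {"name": "Qwen3-VL-Embedding-8B", "is_system_embedding_model": True, "vector_dimensions": 4096},
]
print(chunk_vector_index_ddl(system_vector_dimensions(rows)))
```

Keeping the lookup strict means a misconfigured deployment fails loudly at index-creation time instead of quietly returning empty vector searches later.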

### Start the web app

The Django REST API serves `/library/api/*` (libraries, collections, items, search, workspaces, ingest) and Django admin. Use Gunicorn in production; `runserver` for dev.

```bash
cd mnemosyne/

# Development
python manage.py runserver 0.0.0.0:8000

# Production
gunicorn --bind 0.0.0.0:8000 --workers 3 mnemosyne.wsgi:application
```

### Start the MCP server

The MCP server exposes the LLM-facing tools (`search`, `get_chunk`, `list_libraries`, `list_collections`, `list_items`, `get_health`) over Streamable HTTP at `/mcp` and SSE at `/mcp/sse`. Run it as a separate Uvicorn process on its own port so it can be reverse-proxied or scaled independently of the Django app.

```bash
cd mnemosyne/

# Single command: ASGI server hosting the FastMCP app
uvicorn mnemosyne.asgi:app --host 0.0.0.0 --port 22091 --workers 1
```

`mcp_server/asgi.py` mounts FastMCP at `/mcp` (Streamable HTTP) and `/mcp/sse` (SSE), and adds a `/mcp/health` JSON probe for HAProxy/Pallas.
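
For a quick readiness check from a deploy script, the sketch below calls `/mcp/health` using only the Python standard library. The response body's fields are not documented here, so it only requires HTTP 200 and a body that parses as JSON; `mcp_healthy` and its timeout default are illustrative, not part of Mnemosyne.

```python
import json
import urllib.error
import urllib.request


def mcp_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/mcp/health answers 200 with a JSON body."""
    url = base_url.rstrip("/") + "/mcp/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.loads(resp.read().decode("utf-8"))  # body must parse as JSON
            return True
    except (urllib.error.URLError, ValueError, OSError):
        # Connection refused, timeout, non-2xx (HTTPError), or a non-JSON body.
        return False
```

For example, `mcp_healthy("http://127.0.0.1:22091")` targets the internal port used by the production `mcp` container.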

### Start a Celery worker

In development, one worker can consume every queue. In production, run a dedicated worker per queue; Daedalus depends on the `embedding` worker, because that is where the Daedalus ingest task is routed.

```bash
cd mnemosyne/

# Development — one worker, all queues
celery -A mnemosyne worker -l info -Q celery,embedding,batch

# Production — embedding queue (handles Daedalus ingest + embed_item)
celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h

# Production — batch queue (collection/library bulk operations)
celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h

# Production — default queue (LLM validation, misc)
celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h
```

Daedalus's `POST /library/api/ingest/` dispatches `library.tasks.ingest_from_daedalus` to the **embedding** queue. If you only run one worker, make sure it consumes `embedding` or that task will sit in the broker.
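
Queue coverage is easy to get wrong once workers are split. The sketch below assumes routes are declared with Celery's standard `task_routes` option, exposed in Django settings as `CELERY_TASK_ROUTES` under the usual `CELERY_` namespace. Only `library.tasks.ingest_from_daedalus` is a task path taken from this README; the other entries and the `orphaned_tasks` helper are hypothetical. It reports routed tasks that no running worker consumes.

```python
# Hypothetical routing table. With the usual CELERY_ settings namespace,
# Celery reads this Django setting as its task_routes option.
CELERY_TASK_ROUTES = {
    "library.tasks.ingest_from_daedalus": {"queue": "embedding"},
    "library.tasks.embed_item": {"queue": "embedding"},        # module path assumed
    "library.tasks.embed_collection": {"queue": "batch"},      # task name assumed
}
DEFAULT_QUEUE = "celery"  # Celery's default queue for unrouted tasks


def orphaned_tasks(routes: dict, worker_queues: list[list[str]]) -> list[str]:
    """Return routed task names whose queue no worker consumes.

    worker_queues has one entry per worker: the queue names given to its -Q flag.
    """
    consumed = {queue for queues in worker_queues for queue in queues}
    return sorted(
        task for task, route in routes.items()
        if route.get("queue", DEFAULT_QUEUE) not in consumed
    )


# Only the batch and default workers are running, so embedding-queue tasks
# (including Daedalus ingest) wait in RabbitMQ until an embedding worker starts.
print(orphaned_tasks(CELERY_TASK_ROUTES, [["batch"], ["celery"]]))
# ['library.tasks.embed_item', 'library.tasks.ingest_from_daedalus']
```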

To bypass workers in dev/test, set `CELERY_TASK_ALWAYS_EAGER=True` in `mnemosyne/.env`; tasks then run synchronously in the calling process.

**Scheduler & monitoring (optional):**
```bash
celery -A mnemosyne beat -l info            # Periodic task scheduler
celery -A mnemosyne flower --port=5555      # Web monitoring UI
```

See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for queue tuning, reliability settings, and task progress tracking.

### Daedalus integration endpoints

These endpoints are used by the Daedalus FastAPI backend (HTTP Basic auth). All under `/library/api/`:

| Method | Route | Purpose |
|--------|-------|---------|
| POST | `/workspaces/` | Create a workspace (idempotent on `workspace_id`); body: `{workspace_id, name, library_type, description?}` |
| GET | `/workspaces/{workspace_id}/` | Workspace status (item/chunk counts) |
| DELETE | `/workspaces/{workspace_id}/` | Delete workspace + reachable content; preserves shared concepts |
| POST | `/ingest/` | Queue a file for ingestion + embedding |
| GET | `/jobs/{job_id}/` | Poll ingest job status |
| POST | `/jobs/{job_id}/retry/` | Re-dispatch a failed job |
| GET | `/jobs/?status=&library_uid=` | List recent jobs |

See [docs/mnemosyne_integration.md](docs/mnemosyne_integration.md) for the full Daedalus contract.
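
A minimal client sketch for polling an ingest job, standard library only. It assumes JSON responses and a `status` field on the job payload whose terminal values are `completed` and `failed`; those field names and values, and the `basic_auth_getter` / `wait_for_job` helpers, are assumptions to check against docs/mnemosyne_integration.md. `wait_for_job` accepts any `get_json` callable, so the polling logic can be exercised without a live server.

```python
import base64
import json
import time
import urllib.request
from typing import Callable


def basic_auth_getter(base_url: str, user: str, password: str) -> Callable[[str], dict]:
    """Return a function that GETs a path under /library/api/ and decodes JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def get_json(path: str) -> dict:
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}/library/api/{path.lstrip('/')}",
            headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))

    return get_json


def wait_for_job(get_json: Callable[[str], dict], job_id: str,
                 interval: float = 2.0, timeout: float = 600.0,
                 terminal: tuple = ("completed", "failed")) -> dict:
    """Poll GET /jobs/{job_id}/ until the job reports a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_json(f"jobs/{job_id}/")
        if job.get("status") in terminal:  # status values are assumed
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)


# get_json = basic_auth_getter("https://mnemosyne.ouranos.helu.ca", "daedalus", "<password>")
# job = wait_for_job(get_json, "<job_id returned by POST /ingest/>")
```

A `failed` result can then be re-dispatched with `POST /jobs/{job_id}/retry/`.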

## Production Deployment

Production runs as four containers from a single image (built and pushed by [`.gitea/workflows/cve-scan-docker-build.yml`](.gitea/workflows/cve-scan-docker-build.yml) on every push to `main`):

| Service | Role | Port |
|---------|------|------|
| `app` | Django REST API + admin (gunicorn) | internal :8000 |
| `mcp` | FastMCP server (uvicorn) | internal :22091 |
| `worker` | Celery worker — embedding/ingest/batch | — |
| `web` | Reverse proxy + static files (nginx) | host :23090 |

Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.

External services (NOT spun up by compose): Postgres on Portia, Neo4j on Umbriel (dedicated Mnemosyne instance), RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network.

### Environment scoping

Each compose service declares *only* the environment variables it actually needs — there is no shared `env_file:`. The rationale:

- The MCP server (the most exposed surface, because it talks to outside LLMs) should never see the Celery broker URL or the LLM API encryption key. It only needs Postgres, Neo4j, Memcached, S3, and the MCP-specific auth toggle.
- The Celery worker has no business knowing `ALLOWED_HOSTS`, `CSRF_TRUSTED_ORIGINS`, `MCP_REQUIRE_AUTH`, or the email backend — it doesn't serve HTTP.
- The Django app doesn't need the Daedalus S3 credentials — only the ingest Celery task reads that bucket.
- When a shared secret (like the broker password) is mis-configured, the blast radius is limited to the services that actually need that secret, so you can still observe the rest of the stack while debugging.

Values are interpolated from a `.env` file at the **repo root** (not `mnemosyne/.env`, which is the dev config for bare-Python runs). Copy `.env.example` to `.env` and fill in the blanks, or — in production — have your Ansible role render `.env` from a Jinja2 template with secrets from the vault.

```bash
cp .env.example .env
$EDITOR .env       # fill in SECRET_KEY, DB/RabbitMQ/S3 creds, LLM_API_SECRETS_ENCRYPTION_KEY
```

The per-service surface is defined by the `environment:` blocks in `docker-compose.yaml`; `.env.example` documents every variable with which service(s) consume it.

> **Broker URL gotcha.** If the RabbitMQ password contains any of `@ : / # % + ? & =` or a space, it must be percent-encoded in `CELERY_BROKER_URL`. Kombu's URL parser is strict, and this is the most common cause of a `PLAIN 403 ACCESS_REFUSED` at worker startup when the same credentials work fine under bare-Python `celery` invocations (because you were probably passing them as kwargs, not a URL).
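
A sketch of building the URL safely with the standard library. `urllib.parse.quote` leaves `/` unencoded by default, so `safe=""` is passed to encode every reserved character. The user, password, and vhost are placeholders, the `oberon.incus` host comes from the host list above, and the `amqp_broker_url` helper itself is hypothetical.

```python
from urllib.parse import quote


def amqp_broker_url(user: str, password: str, host: str,
                    port: int = 5672, vhost: str = "/") -> str:
    """Build an amqp:// URL with user, password, and vhost percent-encoded."""
    # safe="" forces '/', '@', ':', '#', '?', etc. to be encoded as well.
    return (f"amqp://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(vhost, safe='')}")


print(amqp_broker_url("mnemosyne", "p@ss:w/rd#1", "oberon.incus"))
# amqp://mnemosyne:p%40ss%3Aw%2Frd%231@oberon.incus:5672/%2F
```

The trailing `/%2F` selects RabbitMQ's default `/` vhost; pass a different `vhost` if Mnemosyne uses its own.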

### Docker bootstrap order

Bootstrapping takes three steps. The first and third run from the shell;
the middle one is a manual visit to `/admin/` to configure the system
embedding model. `setup_neo4j_indexes` is **not** run automatically: it
reads vector dimensions from that admin row and hard-fails if the row is
missing, so bundling it into the `init` sidecar would make `app`
unreachable on first boot. Running it by hand after the admin step breaks
that chicken-and-egg dependency.

```bash
# 1. Generate the root .env from the template (or let Ansible do it),
#    pull the image, and bring the stack up. The `init` sidecar runs
#    `migrate` + `load_library_types` and exits; `app`, `mcp`, and
#    `worker` come up healthy.
cp .env.example .env && $EDITOR .env
docker compose pull
docker compose up -d

# 2. Browse to /admin/llm_manager/llmapi/ and add the embedding provider
#    (e.g. Pan Synesis, with the right base URL and API key). Then
#    /admin/llm_manager/llmmodel/ and add one row for the embedding model:
#       - api             = the api you just created
#       - name            = the provider's model name
#       - vector_dimensions = whatever your embedding provider returns
#       - is_system_embedding_model = True
#    Save, then come back to the shell.

# 3. Create Neo4j vector + full-text indexes at the right dimensions.
#    Idempotent — re-run after an embedding-model swap with `--drop` to
#    rebuild, which requires re-embedding all content.
docker compose exec app python manage.py setup_neo4j_indexes
```

Until step 3 runs, vector search returns empty results and
`library/apps.py` logs a readiness warning each time the app boots. This
is deliberate: an index built at the wrong dimension silently breaks
every search, so loud failure beats quiet misconfiguration.

### Day-to-day

```bash
docker compose ps                  # service status + health
docker compose logs -f app         # tail Django app logs
docker compose logs -f web         # tail nginx logs
docker compose logs -f worker      # tail Celery worker logs
docker compose restart mcp         # restart just the MCP server

# After a new image is published:
docker compose pull && docker compose up -d
```

### Things to verify in `.env` before bringing up

The root `.env` (the one compose interpolates from — not `mnemosyne/.env`) needs the following set for a working production deploy:

- `DEBUG=False`
- `USE_LOCAL_STORAGE=False`
- `KVDB_LOCATION=<external-memcached-host>:11211` — inside a container, `127.0.0.1` is the container itself, not the Memcached host
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` filled in (Mnemosyne's own MinIO bucket)
- `DAEDALUS_S3_ACCESS_KEY_ID` / `DAEDALUS_S3_SECRET_ACCESS_KEY` filled in for cross-bucket ingest reads
- `CELERY_BROKER_URL` with the RabbitMQ password **percent-encoded** if it contains URL-special characters
- `ALLOWED_HOSTS` includes the public hostname HAProxy routes to (e.g. `mnemosyne.ouranos.helu.ca`)
- `CSRF_TRUSTED_ORIGINS` includes `https://<same-hostname>`
- `LLM_API_SECRETS_ENCRYPTION_KEY` set to a real Fernet key (generated once per environment)
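
A Fernet key is 32 random bytes encoded as URL-safe base64 (44 characters). With the `cryptography` package installed, `Fernet.generate_key()` produces one; the standard-library version below builds the same format, and `looks_like_fernet_key` is a hypothetical convenience for sanity-checking a rendered `.env`.

```python
import base64
import binascii
import os


def generate_fernet_key() -> str:
    """Same format as cryptography's Fernet.generate_key(): URL-safe base64 of 32 random bytes."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(value: str) -> bool:
    """True if value decodes (URL-safe base64) to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except (binascii.Error, ValueError):
        return False


# Generate once per environment and store it in the vault; rotating it makes
# previously encrypted LLM API secrets unreadable.
print(generate_fernet_key())
```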

### Verifying the environment reached a container

If a service misbehaves on startup — typically the worker with an `AccessRefused` from RabbitMQ, or the app with a DB auth error — the fastest diagnostic is to print what Django actually parsed, since that removes every layer of env-file / interpolation / URL-encoding ambiguity:

```bash
# What broker URL did the worker actually receive?
docker compose run --rm --no-deps worker \
    python -c "from django.conf import settings; print(repr(settings.CELERY_BROKER_URL))"

# What DB host/user?
docker compose run --rm --no-deps app \
    python -c "from django.conf import settings; print(settings.DATABASES['default'])"
```

The `repr(...)` form surfaces CRLF, trailing whitespace, stray quotes, or characters that should have been percent-encoded.
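
The same inspection can be scripted. This hypothetical `env_value_problems` helper takes the raw string value (not its `repr`) and flags what `repr(...)` is meant to reveal: CR/LF, surrounding whitespace, literal quotes, and, for URL-valued settings such as `CELERY_BROKER_URL`, signs that an unencoded character in the password split the URL. It is a heuristic, not a full URL validator.

```python
from urllib.parse import urlsplit


def env_value_problems(value: str, is_url: bool = False) -> list[str]:
    """Return human-readable problems found in a raw environment value."""
    problems = []
    if "\r" in value or "\n" in value:
        problems.append("contains CR/LF (file saved with Windows line endings?)")
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        problems.append("wrapped in literal quotes")
    if is_url:
        parts = urlsplit(value.strip())
        # An unencoded '/', '#' or '?' in the password ends the host part
        # early, so what urlsplit sees as the port is no longer a number.
        try:
            _ = parts.port
        except ValueError:
            problems.append("port is not numeric; percent-encode the password")
        # An unencoded '@' in the password leaves more than one '@' in netloc.
        if parts.netloc.count("@") > 1:
            problems.append("more than one '@' before the host; encode '@' as %40")
        if not parts.hostname:
            problems.append("no hostname could be parsed")
    return problems
```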

### Health probes

| Endpoint | Probes | Auth |
|----------|--------|------|
| `GET /live/` | Django process alive (always 200 if gunicorn is up) | None |
| `GET /ready/` | PostgreSQL + Memcached reachable (503 if either is down) | None |
| `GET /healthz` | MCP server `/mcp/health` — used as the HAProxy `health_path` | None |
| `GET /metrics` | Prometheus scrape | Internal networks only |

> **Trailing slashes matter.** Always use `/live/` and `/ready/` (with the trailing slash). The un-slashed forms (`/live`, `/ready`) trigger Django's `APPEND_SLASH` 301 redirect — health check clients that don't follow redirects will report a failure even when the service is healthy.

## Architecture Note: Retrieval, Not Synthesis

Mnemosyne is a **retrieval engine**, not a RAG pipeline. It stores, embeds, and ranks — it does not synthesize answers.

The earlier roadmap had a server-side RAG layer that took a query and returned a written answer with citations. That layer has been removed. Calling LLMs (Claude via MCP, principally) are perfectly capable of driving iterative retrieval themselves when given the right primitives, and a server-side synthesis hop adds latency, cost, and a place where errors are harder to debug. Letting the calling LLM see chunks directly — and follow citations, pivot mid-search, or call `get_chunk` for full text — beats pre-digesting them.

If a "knowledge subagent" is ever wanted (a wrapper that takes a question and returns a written answer), it lives **outside** Mnemosyne as a thin client over the MCP tools, with its own system prompt. No coupling, no extra inference hop inside the server, and the subagent's behavior can iterate independently.

## Documentation

- **[Architecture Documentation](docs/mnemosyne.html)** — Full system architecture with diagrams
- **[Phase 1: Foundation](docs/PHASE_1_FOUNDATION.md)** — Project skeleton, Neo4j data model, content-type system
- **[Phase 2: Embedding Pipeline](docs/PHASE_2_EMBEDDING_PIPELINE.md)** — Qwen3-VL multimodal embedding
- **[Phase 3: Search & Re-ranking](docs/PHASE_3_SEARCH_AND_RERANKING.md)** — Hybrid search + re-ranker
- **[Phase 5: MCP Server](docs/PHASE_5_MCP_SERVER.md)** — Retrieval primitives for LLMs (`search`, `get_chunk`, `list_libraries`, …)
- **[Phase 6: Backport to Spelunker](docs/PHASE_6_BACKPORT_TO_SPELUNKER.md)** — Proven patterns flowing back