# Mnemosyne

*"The electric light did not come from the continuous improvement of candles."* — Oren Harari

**The memory of everything you know.**

Mnemosyne is a content-type-aware, multimodal personal knowledge management system built on Neo4j knowledge graphs and Qwen3-VL multimodal AI models. Named after the Titan goddess of memory and mother of the nine Muses, Mnemosyne doesn't just store your knowledge — it understands what kind of knowledge it is, connects it through relationships, and makes it all searchable through text, images, and natural language.
## What Makes This Different

Most knowledge base tools treat all documents identically: text in, chunks out, vectors stored. A novel and a PostgreSQL manual get the same treatment.

Mnemosyne knows the difference:

- **A textbook** has chapters, an index, technical terminology, and pedagogical structure. It's chunked accordingly, and when an LLM retrieves results, it knows this is instructional content.
- **A novel** has narrative flow, characters, plot arcs, dialogue. The LLM knows to interpret results as creative fiction.
- **Album artwork** is a visual asset tied to an artist, genre, and era. It's embedded multimodally — searchable by both image similarity and text description.
- **A journal entry** is personal, temporal, reflective. The LLM treats it differently than a reference manual.

This **content-type awareness** flows through every layer: chunking strategy, embedding instructions, re-ranking, and the final LLM prompt.
## Core Architecture

| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Knowledge Graph** | Neo4j 5.x | Relationships + vector storage (no dimension limits) |
| **Multimodal Embeddings** | Qwen3-VL-Embedding-8B | Text + image + video in unified vector space (4096d) |
| **Multimodal Re-ranking** | Synesis (Qwen3-VL-Reranker-2B) | Cross-attention precision scoring via `/v1/rerank` |
| **Web Framework** | Django 5.x + DRF | Auth, admin, API, content management |
| **Object Storage** | S3/MinIO | Original content + chunk text storage |
| **Async Processing** | Celery + RabbitMQ | Document embedding, graph construction |
| **LLM Interface** | MCP Server | Primary interface for Claude, Copilot, etc. |
| **GPU Serving** | vLLM + llama.cpp | Local model inference |
## Library Types

| Library | Example Content | Multimodal? | Graph Relationships |
|---------|----------------|-------------|-------------------|
| **Fiction** | Novels, short stories | Cover art | Author → Book → Character → Theme |
| **Nonfiction** | History, biography, science writing | Photos, charts | Author → Work → Topic → Person/Place |
| **Technical** | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
| **Music** | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
| **Film** | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
| **Art** | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
| **Journal** | Personal entries, plans, observations | Photos | Date → Entry → Topic → Person/Place |
| **Business** | Proposals, marketing, strategy | Logos, charts | Client → Engagement → Deliverable |
| **Finance** | Statements, tax, market commentary | Charts, statement scans | Account → Instrument → Period |
## Search Pipeline

```
Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search
      → Candidate Fusion → Qwen3-VL Re-ranking → Ranked Chunks + Metadata
      → MCP tool result (the calling LLM does its own synthesis)
```
## Heritage

Mnemosyne's RAG pipeline architecture is inspired by [Spelunker](https://git.helu.ca/r/spelunker), an enterprise RFP response platform. The proven patterns — hybrid search, two-stage RAG (responder + reviewer), citation-based retrieval, and async document processing — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships.
## Running Mnemosyne

Mnemosyne runs as three cooperating processes: the Django web app (REST API + admin), the MCP server (LLM-facing tools), and one or more Celery workers (async embedding + ingest). All three read configuration from `mnemosyne/.env`; copy `mnemosyne/.env.example` to that path and fill in the secrets.

Hosts in the Ouranos lab:

- **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
- **Neo4j** — `umbriel.incus:7687` (Bolt; knowledge graph + vectors; dedicated instance, see the note below; Neo4j Browser over HTTP on `umbriel.incus:25555`)
- **RabbitMQ** — `oberon.incus:5672` (Celery broker)
- **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
- **Memcached** — `127.0.0.1:11211` (task progress)

> **Neo4j must be dedicated to Mnemosyne.** Don't share the instance with Spelunker or any other graph workload. Mnemosyne owns the `Library`, `Collection`, `Item`, `Chunk`, and `Concept` labels and runs its own indexes (`chunk_embedding_index`, full-text indexes per `library_type`) and schema migrations (`setup_neo4j_indexes`, `load_library_types`). The Phase-1 workspace-delete path runs label-scoped `DETACH DELETE` over those labels, and a `workspace_id`-scoped subgraph is the unit of isolation; both assume single-tenancy. A shared instance risks:
>
> 1. label/property collisions corrupting the other tenant's graph;
> 2. vector-index memory contention degrading search latency for both apps;
> 3. management commands mutating schema another tenant depends on;
> 4. backup/restore that can't be reasoned about per app.
>
> Neo4j Community Edition is sufficient: the multi-database feature is Enterprise-only, so isolation has to come from running a separate server process. Run a dedicated instance per environment (one for staging, one for production) and point each environment's `mnemosyne/.env` at its own instance via `NEOMODEL_NEO4J_BOLT_URL`.
### One-time setup

```bash
cd mnemosyne/
python manage.py migrate              # Apply Django ORM migrations
python manage.py setup_neo4j_indexes  # Create Neo4j vector + full-text indexes
python manage.py load_library_types   # Load LIBRARY_TYPE_DEFAULTS into Neo4j
```
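Because these commands talk to remote lab hosts, it can help to confirm that Postgres and Neo4j accept TCP connections first, so a network problem shows up as a clear timeout instead of a driver traceback. A minimal bash sketch, assuming bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command are available; `wait_for_tcp` and `one_time_setup` are illustrative names, not repository scripts:

```bash
# Hypothetical helpers, not shipped with Mnemosyne; hosts come from the Ouranos lab list.

# wait_for_tcp HOST PORT [TRIES]: return 0 once HOST:PORT accepts a TCP connection.
wait_for_tcp() {
  local host=$1 port=$2 tries=${3:-30} i
  for ((i = 1; i <= tries; i++)); do
    # bash's /dev/tcp pseudo-device opens the connection; `timeout` caps each attempt at 3 s.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      return 0
    fi
    if ((i < tries)); then sleep 1; fi
  done
  echo "gave up waiting for ${host}:${port}" >&2
  return 1
}

# one_time_setup [DIR]: wait for Postgres and Neo4j, then run the setup commands in order,
# stopping at the first one that fails.
one_time_setup() {
  local dir=${1:-mnemosyne}
  wait_for_tcp portia.incus 5432 || return 1
  wait_for_tcp umbriel.incus 7687 || return 1
  (
    cd "$dir" || exit 1
    python manage.py migrate &&
      python manage.py setup_neo4j_indexes &&
      python manage.py load_library_types
  )
}
```

Source the file and call `one_time_setup mnemosyne`. Because the three commands are chained with `&&`, a failed `migrate` stops before anything touches Neo4j.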
### Start the web app

The Django REST API serves `/library/api/*` (libraries, collections, items, search, workspaces, ingest) and Django admin. Use Gunicorn in production; `runserver` for dev.

```bash
cd mnemosyne/

# Development
python manage.py runserver 0.0.0.0:8000

# Production
gunicorn --bind 0.0.0.0:8000 --workers 3 mnemosyne.wsgi:application
```
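The `--workers 3` above is a fixed count. Gunicorn's documentation suggests `(2 × CPU cores) + 1` as a starting point for sync workers, so a launcher could derive the number from the host instead. A minimal sketch, assuming coreutils `nproc` is available; `WEB_WORKERS`, `gunicorn_workers`, and `start_web_prod` are hypothetical names:

```bash
# Hypothetical launcher, not part of the repository; WEB_WORKERS is an illustrative override.

# gunicorn_workers: print the worker count, honoring WEB_WORKERS if it is set.
gunicorn_workers() {
  if [[ -n ${WEB_WORKERS:-} ]]; then
    echo "$WEB_WORKERS"
  else
    # Gunicorn's suggested starting point for sync workers: (2 x cores) + 1.
    echo $((2 * $(nproc) + 1))
  fi
}

# start_web_prod: the production command above, with a derived worker count.
start_web_prod() {
  cd mnemosyne/ || return 1
  exec gunicorn --bind 0.0.0.0:8000 --workers "$(gunicorn_workers)" mnemosyne.wsgi:application
}
```

On a single core the formula gives the same `--workers 3` shown above. Each sync worker is a separate process with its own copy of the Django app, so a memory-constrained container may want fewer.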
### Start the MCP server

The MCP server exposes the LLM-facing tools (`search`, `get_chunk`, `list_libraries`, `list_collections`, `list_items`, `get_health`) over Streamable HTTP at `/mcp` and SSE at `/mcp/sse`. Run it as a separate Uvicorn process on its own port so it can be reverse-proxied or scaled independently of the Django app.

```bash
cd mnemosyne/

# Single command: ASGI server hosting the FastMCP app
uvicorn mnemosyne.asgi:app --host 0.0.0.0 --port 22091 --workers 1
```
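Once Uvicorn is running, the JSON health endpoint at `/mcp/health` makes a quick smoke test. A sketch using `curl`, assuming the server listens on `localhost:22091` as in the command above; `mcp_health` is an illustrative name. Call it as `mcp_health`, or pass another base URL such as `http://mcp:22091` from inside the compose network:

```bash
# Hypothetical helper, not part of the repository.
# mcp_health [BASE_URL]: print the /mcp/health JSON; exit non-zero on HTTP errors or timeouts.
mcp_health() {
  local base=${1:-http://localhost:22091}
  # --fail makes HTTP 4xx/5xx a non-zero exit; --max-time bounds a hung server.
  curl --fail --silent --show-error --max-time 5 "${base%/}/mcp/health"
}
```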
`mcp_server/asgi.py` mounts FastMCP at `/mcp` (Streamable HTTP) and `/mcp/sse` (SSE), with a `/mcp/health` JSON probe for HAProxy/Pallas.
### Start a Celery worker

In development, one worker can consume every queue. In production, run a focused worker per queue; the `embedding` worker is the one Daedalus depends on, because the Daedalus ingest task is routed to that queue.

```bash
cd mnemosyne/

# Development — one worker, all queues
celery -A mnemosyne worker -l info -Q celery,embedding,batch

# Production — embedding queue (handles Daedalus ingest + embed_item)
celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h

# Production — batch queue (collection/library bulk operations)
celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h

# Production — default queue (LLM validation, misc)
celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h
```
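Since the Daedalus ingest task only runs if some worker consumes `embedding`, it can be worth checking the live workers before suspecting the API. `celery inspect active_queues` asks every running worker which queues it consumes; the sketch below assumes Celery 5's `--json` and `--timeout` options, and the grep pattern is an assumption about how that JSON is formatted. `has_embedding_consumer` is a hypothetical name:

```bash
# Hypothetical helper, not part of the repository.
# has_embedding_consumer: succeed if at least one live worker consumes the `embedding` queue.
has_embedding_consumer() {
  local replies
  # celery exits non-zero when no worker replies within the timeout (seconds).
  replies=$(celery -A mnemosyne inspect active_queues --json --timeout 5) || return 1
  # Assumes each queue appears in the reply as an object containing "name": "<queue>".
  grep -Eq '"name": ?"embedding"' <<<"$replies"
}
```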
Daedalus's `POST /library/api/ingest/` dispatches `library.tasks.ingest_from_daedalus` to the **embedding** queue. If you only run one worker, make sure it consumes `embedding` or that task will sit in the broker.

To bypass workers in dev/test, set `CELERY_TASK_ALWAYS_EAGER=True` in `.env`.

**Scheduler & monitoring (optional):**

```bash
celery -A mnemosyne beat -l info           # Periodic task scheduler
celery -A mnemosyne flower --port=5555     # Web monitoring UI
```
See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for queue tuning, reliability settings, and task progress tracking.
### Daedalus integration endpoints

These endpoints are used by the Daedalus FastAPI backend (HTTP Basic auth). All under `/library/api/`:

| Method | Route | Purpose |
|--------|-------|---------|
| POST | `/workspaces/` | Create a workspace (idempotent on `workspace_id`); body: `{workspace_id, name, library_type, description?}` |
| GET | `/workspaces/{workspace_id}/` | Workspace status (item/chunk counts) |
| DELETE | `/workspaces/{workspace_id}/` | Delete workspace + reachable content; preserves shared concepts |
| POST | `/ingest/` | Queue a file for ingestion + embedding |
| GET | `/jobs/{job_id}/` | Poll ingest job status |
| POST | `/jobs/{job_id}/retry/` | Re-dispatch a failed job |
| GET | `/jobs/?status=&library_uid=` | List recent jobs |

See [docs/mnemosyne_integration.md](docs/mnemosyne_integration.md) for the full Daedalus contract.
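
As a rough illustration of the Basic-auth contract, two of the calls in the table can be wrapped with `curl`. The default base URL and the `MNEMOSYNE_URL`, `MNEMOSYNE_USER`, and `MNEMOSYNE_PASSWORD` variables are hypothetical; the JSON is built with `printf` and does no escaping; and the `/ingest/` request is omitted because its body schema isn't described here:

```bash
# Hypothetical client helpers; variable and function names are illustrative.
: "${MNEMOSYNE_URL:=http://localhost:23090}"
mnemosyne_api="${MNEMOSYNE_URL%/}/library/api"

# create_workspace ID NAME LIBRARY_TYPE: POST /workspaces/ (idempotent on workspace_id).
create_workspace() {
  local body
  # Values are interpolated as-is; quotes or backslashes in them would break the JSON.
  body=$(printf '{"workspace_id": "%s", "name": "%s", "library_type": "%s"}' "$1" "$2" "$3")
  curl --fail --silent --show-error \
    --user "${MNEMOSYNE_USER}:${MNEMOSYNE_PASSWORD}" \
    --header 'Content-Type: application/json' \
    --data "$body" \
    "${mnemosyne_api}/workspaces/"
}

# get_job JOB_ID: GET /jobs/{job_id}/ (every route in the table ends with a slash).
get_job() {
  curl --fail --silent --show-error \
    --user "${MNEMOSYNE_USER}:${MNEMOSYNE_PASSWORD}" \
    "${mnemosyne_api}/jobs/$1/"
}
```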
## Production Deployment

Production runs as four containers from a single image (built and pushed by [`.gitea/workflows/cve-scan-docker-build.yml`](.gitea/workflows/cve-scan-docker-build.yml) on every push to `main`):

| Service | Role | Port |
|---------|------|------|
| `app` | Django REST API + admin (gunicorn) | internal :8000 |
| `mcp` | FastMCP server (uvicorn) | internal :22091 |
| `worker` | Celery worker — embedding/ingest/batch | — |
| `web` | Reverse proxy + static files (nginx) | host :23090 |

Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.

External services (NOT spun up by compose): Postgres on Portia, Neo4j on Umbriel (dedicated Mnemosyne instance), RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
### First-time bring-up

```bash
# Pull the image (or build locally with `docker compose build`)
docker compose pull

# DB migrations (one-shot)
docker compose run --rm app migrate

# Neo4j indexes + library_type defaults (one-shot)
docker compose run --rm app setup

# Bring the stack up
docker compose up -d
```
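Before the first `docker compose up -d`, a pre-flight check can catch the development leftovers that matter most in production (the checklist appears under "Things to verify" below). A minimal sketch: it only understands simple `KEY=value` lines, it skips the `DAEDALUS_S3_*` settings because their exact names aren't listed here, and `check_prod_env` / `_env_value` are hypothetical names, not repository scripts:

```bash
# Hypothetical pre-flight check, not part of the repository. Only simple KEY=value
# lines are understood; `export`, inline comments, and multi-line values are not.

# _env_value FILE KEY: print the last value assigned to KEY, minus surrounding quotes.
_env_value() {
  sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | sed -e "s/^[\"']//" -e "s/[\"']\$//"
}

# check_prod_env [FILE]: print one line per problem; return non-zero if any were found.
check_prod_env() {
  local file=${1:-mnemosyne/.env} problems=0 key value
  local loopback='^(127\.0\.0\.1|localhost)(:|$)'
  [[ -r $file ]] || { echo "cannot read $file"; return 1; }

  for key in DEBUG USE_LOCAL_STORAGE; do
    case $(_env_value "$file" "$key") in
      False | false | 0) ;;
      *) echo "$key should be False"; problems=$((problems + 1)) ;;
    esac
  done

  # Inside a container, 127.0.0.1 and localhost are the container itself, not the Memcached host.
  if [[ $(_env_value "$file" KVDB_LOCATION) =~ $loopback ]]; then
    echo "KVDB_LOCATION points at loopback"; problems=$((problems + 1))
  fi

  # Presence checks only: key formats and the hostname list are not validated.
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ALLOWED_HOSTS LLM_API_SECRETS_ENCRYPTION_KEY; do
    value=$(_env_value "$file" "$key")
    [[ -n $value ]] || { echo "$key is empty"; problems=$((problems + 1)); }
  done

  ((problems == 0))
}
```

Chained as `check_prod_env mnemosyne/.env && docker compose up -d`, it refuses to start the stack while any listed problem remains.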
### Day-to-day

```bash
docker compose ps                 # service status + health
docker compose logs -f app        # tail Django app logs
docker compose logs -f web        # tail nginx logs
docker compose logs -f worker     # tail Celery worker logs
docker compose restart mcp        # restart just the MCP server

# After a new image is published:
docker compose pull && docker compose up -d
```
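After a deploy, a smoke test of the endpoints listed under "Health probes" below can confirm the stack is serving. A sketch that prints the HTTP status for each path; the default `http://localhost:23090` assumes it runs on the Docker host against the `web` service's host port from the table above, and `smoke_test` is a hypothetical name. Note the trailing slashes on `/live/` and `/ready/`:

```bash
# Hypothetical smoke test, not part of the repository.
# smoke_test [BASE_URL]: print "<status> <path>" for each probe; fail if any is not 200.
smoke_test() {
  local base=${1:-http://localhost:23090} path code failed=0
  for path in /live/ /ready/ /healthz; do
    # --output discards the body; --write-out prints only the status (000 if unreachable).
    code=$(curl --silent --max-time 5 --output /dev/null --write-out '%{http_code}' "${base%/}${path}")
    echo "${code} ${path}"
    [[ $code == 200 ]] || failed=1
  done
  return "$failed"
}
```

`/metrics` is left out because it is restricted to internal networks.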
### Things to verify in `mnemosyne/.env` before bringing up

The development `.env` has a few values that need adjusting for production:

- `DEBUG=False`
- `USE_LOCAL_STORAGE=False` (already set; just confirm)
- `KVDB_LOCATION=<external-memcached-host>:11211` (inside a container, `127.0.0.1` is the container's own loopback, not the Memcached host)
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` filled in
- `DAEDALUS_S3_*` filled in for cross-bucket reads from the Daedalus bucket
- `ALLOWED_HOSTS` includes the public hostname HAProxy routes to (e.g. `mnemosyne.ouranos.helu.ca`)
- `LLM_API_SECRETS_ENCRYPTION_KEY` set to a real Fernet key
### Health probes

| Endpoint | Probes | Auth |
|----------|--------|------|
| `GET /live/` | Django process alive (always 200 if gunicorn is up) | None |
| `GET /ready/` | PostgreSQL + Memcached reachable (503 if either is down) | None |
| `GET /healthz` | MCP server `/mcp/health` — used as the HAProxy `health_path` | None |
| `GET /metrics` | Prometheus scrape | Internal networks only |

> **Trailing slashes matter.** Always use `/live/` and `/ready/` (with the trailing slash). The un-slashed forms (`/live`, `/ready`) trigger Django's `APPEND_SLASH` 301 redirect — health check clients that don't follow redirects will report a failure even when the service is healthy.
## Architecture Note: Retrieval, Not Synthesis

Mnemosyne is a **retrieval engine**, not a RAG pipeline. It stores, embeds, and ranks — it does not synthesize answers.

The earlier roadmap had a server-side RAG layer that took a query and returned a written answer with citations. That layer has been removed. Calling LLMs (Claude via MCP, principally) are perfectly capable of driving iterative retrieval themselves when given the right primitives, and a server-side synthesis hop adds latency, cost, and a place where errors are harder to debug. Letting the calling LLM see chunks directly — and follow citations, pivot mid-search, or call `get_chunk` for full text — beats pre-digesting them.

If a "knowledge subagent" is ever wanted (a wrapper that takes a question and returns a written answer), it lives **outside** Mnemosyne as a thin client over the MCP tools, with its own system prompt. No coupling, no extra inference hop inside the server, and the subagent's behavior can iterate independently.
## Documentation

- **[Architecture Documentation](docs/mnemosyne.html)** — Full system architecture with diagrams
- **[Phase 1: Foundation](docs/PHASE_1_FOUNDATION.md)** — Project skeleton, Neo4j data model, content-type system
- **[Phase 2: Embedding Pipeline](docs/PHASE_2_EMBEDDING_PIPELINE.md)** — Qwen3-VL multimodal embedding
- **[Phase 3: Search & Re-ranking](docs/PHASE_3_SEARCH_AND_RERANKING.md)** — Hybrid search + re-ranker
- **[Phase 5: MCP Server](docs/PHASE_5_MCP_SERVER.md)** — Retrieval primitives for LLMs (`search`, `get_chunk`, `list_libraries`, …)
- **[Phase 6: Backport to Spelunker](docs/PHASE_6_BACKPORT_TO_SPELUNKER.md)** — Proven patterns flowing back