mnemosyne/docs/deploy.md
Robert Helewka 409da7d109
docs: replace daedalus-service basic auth with per-user DRF tokens
2026-05-22 22:59:59 -04:00

Mnemosyne — Ansible Deployment Reference

This document gives the Ansible author everything needed to write and maintain the Mnemosyne deployment role. All implementation decisions are already locked in docker-compose.yaml and nginx/mnemosyne.conf; this document explains the why behind each decision and provides the authoritative list of variables, one-time steps, and verification checks.


1. Host & Stack Overview

| Item | Value |
| --- | --- |
| Deploy target | puck.incus (Incus container, 10.10.0.0/24) |
| Compose project directory | /srv/mnemosyne |
| Image registry | git.helu.ca/r/mnemosyne:latest |
| Public host port | 23181 (nginx → HAProxy on Titania → https://mnemosyne.ouranos.helu.ca) |
| Internal app port | app:8000 (Django/gunicorn) |
| Internal MCP port | mcp:8001 (FastMCP/uvicorn) |

The four compose services (app, mcp, worker, web) all run from the same image. A one-shot static-init service seeds the nginx static-file volume on every up, so static-file changes propagate on each deploy without manual intervention.


2. External Dependencies (NOT managed by this role)

These services must exist before Mnemosyne can start. The role only consumes credentials; it does not provision these hosts.

| Service | Host | Notes |
| --- | --- | --- |
| PostgreSQL | portia.incus:5432 | Database mnemosyne, user mnemosyne |
| Neo4j | umbriel.incus:7687 | Bolt protocol. Must be dedicated to Mnemosyne — do not share with Spelunker or any other graph workload (see README §Note on Neo4j). HTTP browser on umbriel.incus:25555. |
| RabbitMQ | oberon.incus:5672 | vhost mnemosyne, user mnemosyne |
| MinIO (Mnemosyne bucket) | nyx.helu.ca:8555 | Bucket mnemosyne-content. Credentials scoped read+write. |
| MinIO (Daedalus bucket) | nyx.helu.ca:8555 | Bucket daedalus. Read-only cross-bucket credentials for the ingest worker. |
| Memcached | oberon.incus:11211 | Shared; prefix mnemosyne avoids collisions. |
| Embedder (Qwen3-VL-Embedding) | Configured via EMBEDDING_* vars in settings | GPU host on Nyx; not managed here. |
| Reranker (Synesis) | Configured via RERANKER_* vars in settings | GPU host on Nyx; not managed here. |

3. Role Tasks

3.1 Directory & file layout

/srv/mnemosyne/
├── docker-compose.yaml        ← copied from repo (or symlinked via git pull)
├── nginx/
│   └── mnemosyne.conf         ← copied from repo nginx/mnemosyne.conf
└── .env                       ← rendered from Jinja2 template + vault secrets

The role should:

  1. Create /srv/mnemosyne/ and nginx/ (owner: root, mode 0750).
  2. Render .env from the vault-sourced Jinja2 template (mode 0600, owner root).
  3. Copy (or git pull) docker-compose.yaml and nginx/mnemosyne.conf from the repo.
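
As a rough post-provisioning check for the steps above, the sketch below confirms permission bits with POSIX find. The has_mode helper is hypothetical and not part of the role; it tests only the mode, not ownership.

```shell
# Hypothetical helper: succeeds when the path has exactly the given octal mode.
# -prune stops find from descending, so only the path itself is tested.
has_mode() {
    # $1 = path, $2 = octal mode such as 600
    [ -n "$(find "$1" -prune -perm "$2" 2>/dev/null)" ]
}

# Example (commented out so the block has no side effects when sourced):
# has_mode /srv/mnemosyne 750 && has_mode /srv/mnemosyne/.env 600
```

Adding find's -user root predicate to the same expression would cover the ownership requirement as well.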

3.2 Pull & start

- name: Pull latest image
  community.docker.docker_compose_v2:
    project_src: /srv/mnemosyne
    pull: always

- name: Bring stack up
  community.docker.docker_compose_v2:
    project_src: /srv/mnemosyne
    state: present

This triggers static-init automatically on every up — no separate handler needed.

3.3 One-time setup (run once on first deploy, idempotent thereafter)

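These management commands are safe to re-run; they do nothing if the target state already exists. Run them as a post-start task gated on a creates: sentinel or an explicit when: mnemosyne_first_deploy flag. The gate matters most for seed_signing_key: the rotation procedure described after the commands re-runs that same command with --retire-other, so an ungated re-run may replace the active key.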

# Apply Django ORM migrations (PostgreSQL schema)
docker compose -f /srv/mnemosyne/docker-compose.yaml \
    run --rm app migrate

# Create Neo4j vector + full-text indexes and load library-type defaults
docker compose -f /srv/mnemosyne/docker-compose.yaml \
    run --rm app setup

# Seed the MCPSigningKey used to sign long-lived Pallas team JWTs.
# --retire-other deactivates any previously-active key.  The hex
# emitted to stdout is persisted in Mnemosyne's database and is
# not re-injected from the vault — no operator action required
# beyond running this command once per fresh deployment.
docker compose -f /srv/mnemosyne/docker-compose.yaml \
    run --rm app \
    python manage.py seed_signing_key --kid daedalus-1 --retire-other

# Create Django groups for SSO role mapping (View Only / Staff / SME / Admin).
# Safe to re-run — idempotent.
docker compose -f /srv/mnemosyne/docker-compose.yaml \
    run --rm app \
    python manage.py create_sso_groups

The seed_signing_key command prints the generated secret once to stdout — it is safe to discard that output after the command succeeds. Mnemosyne persists the active key inside MCPSigningKey and reads it directly when minting each team JWT; Daedalus never sees this value. To rotate, re-run the command with --retire-other and then rotate every Pallas team JWT via the Daedalus admin UI so consumers pick up bearers signed with the new key.
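
The sentinel gating mentioned in 3.3 can be sketched in plain shell as follows. The run_once helper and the sentinel path in the example are hypothetical; inside the role the same effect would come from creates: or the when: flag.

```shell
# Hypothetical helper: run a command only if the sentinel file is absent,
# and create the sentinel only when the command succeeds.
run_once() (
    sentinel=$1
    shift
    if [ -e "$sentinel" ]; then
        echo "skip: $sentinel exists"
        return 0
    fi
    "$@" && touch "$sentinel"
)

# Example (commented out; the sentinel path is an assumption):
# run_once /srv/mnemosyne/.signing-key-seeded \
#     docker compose -f /srv/mnemosyne/docker-compose.yaml run --rm app \
#     python manage.py seed_signing_key --kid daedalus-1 --retire-other
```

A failed command leaves no sentinel behind, so the next run retries it.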


4. Environment Variables (.env template)

All variables are consumed by docker-compose.yaml for interpolation into the relevant service environment: blocks. The per-service scoping is defined in docker-compose.yaml; the .env file just provides values.
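
Compose substitutes an empty string (with only a warning) for any variable that is missing from .env, so a pre-flight check on the rendered file can catch template gaps early. The sketch below is one way to do that; missing_env_vars is a hypothetical helper, and the list of names to check is left to the caller.

```shell
# Hypothetical helper: print every name that has no non-empty
# NAME=value line in the given env file.
missing_env_vars() (
    file=$1
    shift
    for name in "$@"; do
        grep -q "^${name}=." "$file" || printf '%s\n' "$name"
    done
)

# Example (commented out):
# missing_env_vars /srv/mnemosyne/.env SECRET_KEY APP_DB_PASSWORD CELERY_BROKER_URL
```

Empty output means every listed variable has a value.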

Django core — app, mcp, worker

| Variable | Example / default | Notes |
| --- | --- | --- |
| SECRET_KEY | {{ vault_mnemosyne_secret_key }} | Fernet-safe; never rotate without re-encrypting stored API keys first |
| DEBUG | False | |
| TIME_ZONE | UTC | |
| LANGUAGE_CODE | en-us | |

HTTP surface — app (CSRF), app + mcp (ALLOWED_HOSTS)

| Variable | Example |
| --- | --- |
| ALLOWED_HOSTS | localhost,127.0.0.1,mnemosyne.ouranos.helu.ca |
| CSRF_TRUSTED_ORIGINS | https://mnemosyne.ouranos.helu.ca |

PostgreSQL — app, mcp, worker

| Variable | Example |
| --- | --- |
| APP_DB_NAME | mnemosyne |
| APP_DB_USER | mnemosyne |
| APP_DB_PASSWORD | {{ vault_mnemosyne_db_password }} |
| DB_HOST | portia.incus |
| DB_PORT | 5432 |

Neo4j — app, mcp, worker

| Variable | Example |
| --- | --- |
| NEOMODEL_NEO4J_BOLT_URL | bolt://neo4j:{{ vault_neo4j_password }}@umbriel.incus:7687 |

URL-encode the password if it contains @ : / # % + ? & = or a space. The Bolt URL parser is strict.
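
For pre-encoding a password before it goes into the vault, a minimal percent-encoder can be written in POSIX shell. This is a sketch that assumes an ASCII password; urlencode is a hypothetical helper, not something shipped with Mnemosyne.

```shell
# Hypothetical helper: percent-encode everything except the RFC 3986
# unreserved characters (letters, digits, '.', '_', '~', '-').
# Assumes ASCII input.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)

# Example (commented out):
# urlencode 'p@ss:w/rd'
```

If the vault value is pre-encoded this way, it should not be passed through an encoding filter again in the template, or the percent signs will be encoded a second time.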

Memcached — app, mcp, worker

| Variable | Example |
| --- | --- |
| KVDB_LOCATION | oberon.incus:11211 |
| KVDB_PREFIX | mnemosyne |

S3 / MinIO (Mnemosyne bucket) — app, mcp, worker

| Variable | Example |
| --- | --- |
| USE_LOCAL_STORAGE | False |
| AWS_ACCESS_KEY_ID | {{ vault_mnemosyne_s3_key }} |
| AWS_SECRET_ACCESS_KEY | {{ vault_mnemosyne_s3_secret }} |
| AWS_STORAGE_BUCKET_NAME | mnemosyne-content |
| AWS_S3_ENDPOINT_URL | https://nyx.helu.ca:8555 |
| AWS_S3_USE_SSL | True |
| AWS_S3_VERIFY | False (self-signed cert on Nyx) |
| AWS_S3_REGION_NAME | us-east-1 |

Daedalus S3 (cross-bucket reads) — worker only

| Variable | Example |
| --- | --- |
| DAEDALUS_S3_ENDPOINT_URL | https://nyx.helu.ca:8555 |
| DAEDALUS_S3_ACCESS_KEY_ID | {{ vault_daedalus_s3_read_key }} |
| DAEDALUS_S3_SECRET_ACCESS_KEY | {{ vault_daedalus_s3_read_secret }} |
| DAEDALUS_S3_BUCKET_NAME | daedalus |
| DAEDALUS_S3_REGION_NAME | us-east-1 |
| DAEDALUS_S3_USE_SSL | True |
| DAEDALUS_S3_VERIFY | True |

Celery / RabbitMQ — app (producer), worker (consumer)

Variable Example
CELERY_BROKER_URL amqp://mnemosyne:{{ vault_rabbitmq_password | urlencode }}@oberon.incus:5672/mnemosyne
CELERY_RESULT_BACKEND rpc://
CELERY_TASK_ALWAYS_EAGER False

Percent-encode the RabbitMQ password in the broker URL if it contains any URL-special characters. Use Ansible's urlencode filter or pre-encode in the vault variable. An unencoded password is the most common cause of PLAIN 403 ACCESS_REFUSED at worker startup.
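
One way to catch an unencoded password before the worker does is to lint the rendered broker URL. The sketch below is a heuristic under stated assumptions: userinfo_is_encoded is a hypothetical helper, and it treats everything between the scheme and the last @ as credentials.

```shell
# Hypothetical helper: succeeds when the password part of a URL contains
# only unreserved characters or %XX escapes.
userinfo_is_encoded() (
    url=$1
    rest=${url#*://}                  # drop the scheme
    case $rest in
        *@*) userinfo=${rest%@*} ;;   # everything before the last '@'
        *) return 0 ;;                # no credentials in the URL
    esac
    password=${userinfo#*:}           # text after the first ':'
    printf '%s\n' "$password" |
        LC_ALL=C grep -Eq '^([A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})*$'
)

# Example (commented out; reads the value from the rendered .env):
# userinfo_is_encoded "$(sed -n 's/^CELERY_BROKER_URL=//p' /srv/mnemosyne/.env)"
```

A non-zero exit status suggests the password still contains a raw special character such as @, : or /.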

Worker tuning — worker only

| Variable | Default | Notes |
| --- | --- | --- |
| CELERY_QUEUES | celery,embedding,batch | Override per host for dedicated queue workers |
| CELERY_CONCURRENCY | 2 | Number of worker processes |

MCP server — mcp only

| Variable | Production value |
| --- | --- |
| MCP_REQUIRE_AUTH | True |

SSO / Casdoor — app only

| Variable | Example / default | Notes |
| --- | --- | --- |
| CASDOOR_ENABLED | True | Set False to disable SSO and show only local login |
| CASDOOR_ORIGIN | https://casdoor.ouranos.helu.ca | Backend URL used for OIDC discovery (/.well-known/openid-configuration) |
| CASDOOR_ORIGIN_FRONTEND | https://casdoor.ouranos.helu.ca | Frontend URL shown to the browser (may differ behind a reverse proxy) |
| CASDOOR_CLIENT_ID | {{ vault_mnemosyne_casdoor_client_id }} | OAuth client ID from the Casdoor application |
| CASDOOR_CLIENT_SECRET | {{ vault_mnemosyne_casdoor_client_secret }} | OAuth client secret from the Casdoor application |
| CASDOOR_ORG_NAME | ouranos | Default organisation slug in Casdoor |
| CASDOOR_SSL_VERIFY | true | true in production; false only in sandboxes with self-signed certs |
| ALLOW_LOCAL_LOGIN | False | Show the local username/password form to non-superusers. Superusers always see it regardless of this flag. |

Register the OIDC callback URL in the Casdoor application before enabling SSO:

https://mnemosyne.ouranos.helu.ca/accounts/oidc/casdoor/login/callback/

LLM API encryption — app, worker

| Variable | Notes |
| --- | --- |
| LLM_API_SECRETS_ENCRYPTION_KEY | Fernet key. Generate once: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())". Never rotate without re-encrypting all stored provider keys first. |
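
A Fernet key is 32 bytes in URL-safe base64, which always comes out as 44 characters ending in =. The sketch below uses that to spot a truncated or mis-pasted vault value; looks_like_fernet_key is a hypothetical helper and checks the shape only, not whether the key decrypts anything.

```shell
# Hypothetical helper: succeeds when the argument has the shape of a
# Fernet key (43 URL-safe base64 characters followed by one '=').
looks_like_fernet_key() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9_-]{43}=$'
}

# Example (commented out; reads the value from the rendered .env):
# looks_like_fernet_key "$(sed -n 's/^LLM_API_SECRETS_ENCRYPTION_KEY=//p' /srv/mnemosyne/.env)"
```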

Email — app only

| Variable | Example |
| --- | --- |
| EMAIL_HOST | oberon.incus |
| EMAIL_PORT | 22025 |
| EMAIL_USE_TLS | False |

Embedding pipeline — worker only

| Variable | Default |
| --- | --- |
| EMBEDDING_BATCH_SIZE | 8 |
| EMBEDDING_TIMEOUT | 120 |

Search & re-ranker — app, mcp

| Variable | Default |
| --- | --- |
| SEARCH_VECTOR_TOP_K | 50 |
| SEARCH_FULLTEXT_TOP_K | 30 |
| SEARCH_GRAPH_MAX_DEPTH | 2 |
| SEARCH_RRF_K | 60 |
| SEARCH_DEFAULT_LIMIT | 20 |
| RERANKER_MAX_CANDIDATES | 32 |
| RERANKER_TIMEOUT | 30 |

Logging — app, mcp, worker

| Variable | Default |
| --- | --- |
| LOGGING_LEVEL | INFO |
| DJANGO_LOGGING_LEVEL | WARNING |
| CELERY_LOGGING_LEVEL | INFO |

5. Health Probes & Verification

After docker compose up -d, wait for all services to report healthy:

docker compose -f /srv/mnemosyne/docker-compose.yaml ps

Expected: app, mcp, worker, web all healthy; static-init exited (0).

Per-service probes

| Service | Healthcheck command | Expected |
| --- | --- | --- |
| app | curl -f http://localhost:8000/live/ | 200 |
| mcp | curl -f http://localhost:8001/mcp/health | 200 JSON |
| web | curl -f http://localhost/live/ | 200 (proxied to app) |
| worker | celery -A mnemosyne inspect ping -d celery@$HOSTNAME | pong |

External checks (from inside the 10.10.0.0/24 network)

# Django liveness (via nginx)
curl -f http://puck.incus:23181/live/

# Django readiness (Postgres + Memcached)
curl -f http://puck.incus:23181/ready/

# MCP health (proxied from /healthz → mcp:8001/mcp/health)
curl -f http://puck.incus:23181/healthz

# Prometheus metrics (internal only)
curl http://puck.incus:23181/metrics | head -5
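
Services can take a little while to report healthy after up -d, so a deploy script may want to poll these endpoints instead of probing once. The retry helper below is a hypothetical sketch; the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, sleeping DELAY seconds between tries,
# and fails after ATTEMPTS unsuccessful tries.
retry() (
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
)

# Example (commented out): poll readiness for up to about a minute.
# retry 30 2 curl -fsS -o /dev/null http://puck.incus:23181/ready/
```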

Verify Daedalus auth (per-user API token)

Daedalus now authenticates as a Mnemosyne user via the DRF token shown on /profile/settings/. To smoke-test from a deploy host:

curl -H "Authorization: Token <user-api-token>" \
    https://mnemosyne.ouranos.helu.ca/library/api/workspaces/ws_smoke/ \
    -o /dev/null -w "%{http_code}"
# Expect: 200 if the workspace exists for that user, 404 otherwise.

Verify MCP connectivity (from a client with a valid MCPToken)

curl -H "Authorization: Bearer <token>" \
    https://mnemosyne.ouranos.helu.ca/mcp/health
# Expect: {"status": "ok", ...}

6. Upgrade Procedure

A standard upgrade (new image pushed to git.helu.ca/r/mnemosyne:latest):

cd /srv/mnemosyne
docker compose pull
docker compose up -d          # static-init re-seeds; running containers replaced
docker compose run --rm app migrate   # no-op if no new migrations

The static-init service runs to completion on every up, propagating static file changes without manual volume reset.
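
When the upgrade is scripted, chaining the steps so that each runs only if the previous one succeeded keeps migrations from running after a failed pull or up. The sketch below assumes two hypothetical environment variables, DOCKER and COMPOSE_DIR, which exist only to allow a dry run.

```shell
# Hypothetical wrapper around the upgrade steps; stops at the first failure.
# DOCKER and COMPOSE_DIR are assumptions made for dry runs and testing.
upgrade() (
    docker_cmd=${DOCKER:-docker}
    cd "${COMPOSE_DIR:-/srv/mnemosyne}" &&
    $docker_cmd compose pull &&
    $docker_cmd compose up -d &&
    $docker_cmd compose run --rm app migrate
)

# Dry run (commented out): print the commands instead of executing them.
# DOCKER=echo upgrade
```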


7. Rollback

# Pull the known-good image by digest
# (docker compose pull takes service names, not image references)
docker pull git.helu.ca/r/mnemosyne@sha256:<digest>
# Edit docker-compose.yaml image: line to use the digest, then:
docker compose up -d

Alternatively, tag good images in the registry before each deploy and reference the tag.
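
Rollback is easier if the digest of the running image is recorded before each upgrade. The sketch below shows one possible way to capture it and to sanity-check a reference before writing it into docker-compose.yaml; is_digest_ref is a hypothetical helper.

```shell
# Hypothetical helper: succeeds when the argument is an image reference
# pinned by digest, i.e. <repository>@sha256:<64 lowercase hex characters>.
is_digest_ref() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Example (commented out): read the digest of the currently pulled image.
# ref=$(docker image inspect --format '{{index .RepoDigests 0}}' \
#       git.helu.ca/r/mnemosyne:latest)
# is_digest_ref "$ref" && echo "$ref"
```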


8. HAProxy / Titania Configuration Notes

Titania terminates TLS and forwards to puck.incus:23181. The nginx config preserves X-Forwarded-Proto: https so Django's request.is_secure(), secure cookies, and build_absolute_uri() work correctly.

The HAProxy health_path for this backend should be /healthz (not /live/ or /ready/) — /healthz short-circuits directly to the FastMCP health endpoint without touching Django, so it can confirm the MCP server is up even if Django is momentarily unhealthy.

If a health check is pointed at the Django endpoints instead, use /live/ and /ready/ with the trailing slash. The un-slashed forms (/live, /ready) trigger Django's APPEND_SLASH 301 redirect, which a health checker that does not follow redirects reports as a failure.


9. Vault Variables Summary

| Vault variable | Used in .env as |
| --- | --- |
| vault_mnemosyne_secret_key | SECRET_KEY |
| vault_mnemosyne_db_password | APP_DB_PASSWORD |
| vault_neo4j_password | embedded in NEOMODEL_NEO4J_BOLT_URL |
| vault_mnemosyne_s3_key | AWS_ACCESS_KEY_ID |
| vault_mnemosyne_s3_secret | AWS_SECRET_ACCESS_KEY |
| vault_daedalus_s3_read_key | DAEDALUS_S3_ACCESS_KEY_ID |
| vault_daedalus_s3_read_secret | DAEDALUS_S3_SECRET_ACCESS_KEY |
| vault_rabbitmq_password | embedded in CELERY_BROKER_URL |
| vault_mnemosyne_llm_encryption_key | LLM_API_SECRETS_ENCRYPTION_KEY |
| vault_mnemosyne_casdoor_client_id | CASDOOR_CLIENT_ID |
| vault_mnemosyne_casdoor_client_secret | CASDOOR_CLIENT_SECRET |