From de0d7a4317ab69ebce134cf33aa68b42eae5cc6b Mon Sep 17 00:00:00 2001
From: Robert Helewka
Date: Mon, 4 May 2026 08:56:49 -0400
Subject: [PATCH] docs(mnemosyne): update integration doc for container deployment

---
 docs/deploy.md                | 375 ++++++++++++++++++++++++++++++++++
 docs/mnemosyne_integration.md |  26 +--
 2 files changed, 389 insertions(+), 12 deletions(-)
 create mode 100644 docs/deploy.md

diff --git a/docs/deploy.md b/docs/deploy.md
new file mode 100644
index 0000000..9fcfcd6
--- /dev/null
+++ b/docs/deploy.md
@@ -0,0 +1,375 @@
+# Mnemosyne — Ansible Deployment Reference
+
+This document gives the Ansible author everything needed to write and maintain the
+Mnemosyne deployment role. All implementation decisions are already locked in
+`docker-compose.yaml` and `nginx/mnemosyne.conf`; this document explains the
+*why* behind each decision and provides the authoritative list of variables,
+one-time steps, and verification checks.
+
+---
+
+## 1. Host & Stack Overview
+
+| Item | Value |
+|------|-------|
+| Deploy target | `puck.incus` (Incus container, 10.10.0.0/24) |
+| Compose project directory | `/opt/mnemosyne` |
+| Image registry | `git.helu.ca/r/mnemosyne:latest` |
+| Public host port | **23181** (nginx `web`; HAProxy on Titania reverse-proxies it as `https://mnemosyne.ouranos.helu.ca`) |
+| Internal app port | `app:8000` (Django/gunicorn) |
+| Internal MCP port | `mcp:8001` (FastMCP/uvicorn) |
+
+The four compose services (`app`, `mcp`, `worker`, `web`) all run from the same
+image. A one-shot `static-init` service seeds the nginx static-file volume on
+every `up` so static-file changes propagate automatically on deploy without
+manual intervention.
+
+---
+
+## 2. External Dependencies (NOT managed by this role)
+
+These services must exist before Mnemosyne can start. The role only consumes
+credentials; it does not provision these hosts.
+
+| Service | Host | Notes |
+|---------|------|-------|
+| PostgreSQL | `portia.incus:5432` | Database `mnemosyne`, user `mnemosyne` |
+| Neo4j | `umbriel.incus:7687` | Bolt protocol. **Must be dedicated to Mnemosyne** — do not share with Spelunker or any other graph workload (see README §Note on Neo4j). HTTP browser on `umbriel.incus:25555`. |
+| RabbitMQ | `oberon.incus:5672` | vhost `mnemosyne`, user `mnemosyne` |
+| MinIO (Mnemosyne bucket) | `nyx.helu.ca:8555` | Bucket `mnemosyne-content`. Credentials scoped read+write. |
+| MinIO (Daedalus bucket) | `nyx.helu.ca:8555` | Bucket `daedalus`. **Read-only** cross-bucket credentials for the ingest worker. |
+| Memcached | `oberon.incus:11211` | Shared; prefix `mnemosyne` avoids collisions. |
+| Embedder (Qwen3-VL-Embedding) | Configured via `EMBEDDING_*` vars in settings | GPU host on Nyx; not managed here. |
+| Reranker (Synesis) | Configured via `RERANKER_*` vars in settings | GPU host on Nyx; not managed here. |
+
+---
+
+## 3. Role Tasks
+
+### 3.1 Directory & file layout
+
+```
+/opt/mnemosyne/
+├── docker-compose.yaml   ← copied from repo (or symlinked via git pull)
+├── nginx/
+│   └── mnemosyne.conf    ← copied from repo nginx/mnemosyne.conf
+└── .env                  ← rendered from Jinja2 template + vault secrets
+```
+
+The role should:
+1. Create `/opt/mnemosyne/` and `nginx/` (owner: `root`, mode `0750`).
+2. Render `.env` from the vault-sourced Jinja2 template (mode `0600`, owner `root`).
+3. Copy (or `git pull`) `docker-compose.yaml` and `nginx/mnemosyne.conf` from the repo.
+
+### 3.2 Pull & start
+
+```yaml
+- name: Pull latest image and bring stack up
+  community.docker.docker_compose_v2:
+    project_src: /opt/mnemosyne
+    pull: always      # pull images before every up
+    state: present    # module default; runs `docker compose up --detach`
+```
+
+Because `state` defaults to `present`, a separate "pull" task would already bring the
+stack up, so a single task covers both steps.
+This triggers `static-init` automatically on every `up` — no separate handler needed.
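+
+The three layout steps in §3.1 could be expressed as tasks like the following. This is
+a minimal sketch: the role-relative sources (`templates/env.j2`, `files/docker-compose.yaml`,
+`files/nginx/mnemosyne.conf`) and the `0644` mode for the copied files are assumptions,
+not paths or settings defined by this repo.
+
+```yaml
+- name: Create project directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    owner: root
+    group: root
+    mode: "0750"
+  loop:
+    - /opt/mnemosyne
+    - /opt/mnemosyne/nginx
+
+- name: Render .env from the vault-backed template
+  ansible.builtin.template:
+    src: env.j2                 # hypothetical file in the role's templates/ dir
+    dest: /opt/mnemosyne/.env
+    owner: root
+    group: root
+    mode: "0600"
+  no_log: true                  # keep rendered secrets out of task output
+
+- name: Copy compose file and nginx config
+  ansible.builtin.copy:
+    src: "{{ item }}"           # hypothetical files in the role's files/ dir
+    dest: "/opt/mnemosyne/{{ item }}"
+    owner: root
+    group: root
+    mode: "0644"                # assumed; §3.1 does not specify a file mode
+  loop:
+    - docker-compose.yaml
+    - nginx/mnemosyne.conf
+```
+
+Running these ahead of the pull-and-start task means `docker compose` always reads the
+current compose file and `.env`.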
+
+### 3.3 One-time setup (run once on first deploy)
+
+The first three commands are safe to re-run; they do nothing if the target state
+already exists. Treat `seed_signing_key` differently: it generates a secret and, with
+`--retire-other`, deactivates any previously-active key, so a re-run can rotate the key
+out from under the secret already stored in vault. Run the block as a post-start task
+gated on a `creates:` sentinel or an explicit `when: mnemosyne_first_deploy` flag.
+
+```bash
+# Apply Django ORM migrations (PostgreSQL schema)
+docker compose -f /opt/mnemosyne/docker-compose.yaml \
+  run --rm app migrate
+
+# Create Neo4j vector + full-text indexes and load library-type defaults
+docker compose -f /opt/mnemosyne/docker-compose.yaml \
+  run --rm app setup
+
+# Create the daedalus-service user (HTTP Basic auth for ingest API)
+# Pass --password from vault; idempotent if user already exists.
+docker compose -f /opt/mnemosyne/docker-compose.yaml \
+  run --rm app \
+  python manage.py ensure_service_user \
+    --username daedalus-service \
+    --password "{{ vault_mnemosyne_daedalus_service_password }}"
+
+# Seed the MCP signing key (for Phase 2 per-turn JWT auth)
+# --retire-other deactivates any previously-active key.
+# Prints secret_hex once; store it in vault as vault_mnemosyne_signing_secret.
+docker compose -f /opt/mnemosyne/docker-compose.yaml \
+  run --rm app \
+  python manage.py seed_signing_key --kid daedalus-1 --retire-other
+```
+
+The `seed_signing_key` command prints the generated secret once to stdout; capture
+it and store it in vault as `vault_mnemosyne_signing_secret`. The Daedalus role reads
+this secret from the same vault variable to mint per-turn tokens (Phase 2).
+
+---
+
+## 4. Environment Variables (`.env` template)
+
+All variables are consumed by `docker-compose.yaml` for interpolation into the
+relevant service `environment:` blocks. The per-service scoping is defined in
+`docker-compose.yaml`; the `.env` file just provides values.
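+
+Ansible already fails when the `.env` template references an undefined variable, but a
+vault entry that exists with an empty value renders silently into an empty setting. A
+minimal pre-flight guard, assuming it runs before the `.env` template task, could check
+the §9 vault variables (`vault_mnemosyne_signing_secret` is left out because it only
+exists after `seed_signing_key` has run):
+
+```yaml
+- name: Assert required vault variables are defined and non-empty
+  ansible.builtin.assert:
+    that:
+      # the vars lookup returns '' for an undefined name, so both cases fail the check
+      - lookup('vars', item, default='') | string | length > 0
+    fail_msg: "{{ item }} is missing or empty"
+    quiet: true
+  loop:
+    - vault_mnemosyne_secret_key
+    - vault_mnemosyne_db_password
+    - vault_neo4j_password
+    - vault_mnemosyne_s3_key
+    - vault_mnemosyne_s3_secret
+    - vault_daedalus_s3_read_key
+    - vault_daedalus_s3_read_secret
+    - vault_rabbitmq_password
+    - vault_mnemosyne_llm_encryption_key
+    - vault_mnemosyne_daedalus_service_password
+```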
+
+### Django core — `app`, `mcp`, `worker`
+
+| Variable | Example / default | Notes |
+|----------|-------------------|-------|
+| `SECRET_KEY` | `{{ vault_mnemosyne_secret_key }}` | Fernet-safe; never rotate without re-encrypting stored API keys first |
+| `DEBUG` | `False` | |
+| `TIME_ZONE` | `UTC` | |
+| `LANGUAGE_CODE` | `en-us` | |
+
+### HTTP surface — `app` (CSRF), `app` + `mcp` (ALLOWED_HOSTS)
+
+| Variable | Example |
+|----------|---------|
+| `ALLOWED_HOSTS` | `localhost,127.0.0.1,mnemosyne.ouranos.helu.ca` |
+| `CSRF_TRUSTED_ORIGINS` | `https://mnemosyne.ouranos.helu.ca` |
+
+### PostgreSQL — `app`, `mcp`, `worker`
+
+| Variable | Example |
+|----------|---------|
+| `APP_DB_NAME` | `mnemosyne` |
+| `APP_DB_USER` | `mnemosyne` |
+| `APP_DB_PASSWORD` | `{{ vault_mnemosyne_db_password }}` |
+| `DB_HOST` | `portia.incus` |
+| `DB_PORT` | `5432` |
+
+### Neo4j — `app`, `mcp`, `worker`
+
+| Variable | Example |
+|----------|---------|
+| `NEOMODEL_NEO4J_BOLT_URL` | `bolt://neo4j:{{ vault_neo4j_password }}@umbriel.incus:7687` |
+
+> **URL-encode the password** if it contains `@ : / # % + ? & =` or a space.
+> The Bolt URL parser is strict.
+
+### Memcached — `app`, `mcp`, `worker`
+
+| Variable | Example |
+|----------|---------|
+| `KVDB_LOCATION` | `oberon.incus:11211` |
+| `KVDB_PREFIX` | `mnemosyne` |
+
+### S3 / MinIO (Mnemosyne bucket) — `app`, `mcp`, `worker`
+
+| Variable | Example |
+|----------|---------|
+| `USE_LOCAL_STORAGE` | `False` |
+| `AWS_ACCESS_KEY_ID` | `{{ vault_mnemosyne_s3_key }}` |
+| `AWS_SECRET_ACCESS_KEY` | `{{ vault_mnemosyne_s3_secret }}` |
+| `AWS_STORAGE_BUCKET_NAME` | `mnemosyne-content` |
+| `AWS_S3_ENDPOINT_URL` | `https://nyx.helu.ca:8555` |
+| `AWS_S3_USE_SSL` | `True` |
+| `AWS_S3_VERIFY` | `False` (self-signed cert on Nyx) |
+| `AWS_S3_REGION_NAME` | `us-east-1` |
+
+### Daedalus S3 (cross-bucket reads) — `worker` only
+
+| Variable | Example |
+|----------|---------|
+| `DAEDALUS_S3_ENDPOINT_URL` | `https://nyx.helu.ca:8555` |
+| `DAEDALUS_S3_ACCESS_KEY_ID` | `{{ vault_daedalus_s3_read_key }}` |
+| `DAEDALUS_S3_SECRET_ACCESS_KEY` | `{{ vault_daedalus_s3_read_secret }}` |
+| `DAEDALUS_S3_BUCKET_NAME` | `daedalus` |
+| `DAEDALUS_S3_REGION_NAME` | `us-east-1` |
+| `DAEDALUS_S3_USE_SSL` | `True` |
+| `DAEDALUS_S3_VERIFY` | `True` |
+
+### Celery / RabbitMQ — `app` (producer), `worker` (consumer)
+
+| Variable | Example |
+|----------|---------|
+| `CELERY_BROKER_URL` | `amqp://mnemosyne:{{ vault_rabbitmq_password \| urlencode }}@oberon.incus:5672/mnemosyne` |
+| `CELERY_RESULT_BACKEND` | `rpc://` |
+| `CELERY_TASK_ALWAYS_EAGER` | `False` |
+
+> **Percent-encode** the RabbitMQ password in the broker URL if it contains any
+> URL-special characters. Use Ansible's `urlencode` filter or pre-encode the value in
+> the vault variable. Note that `urlencode` leaves `/` unescaped, so chain
+> `| replace('/', '%2F')` after it if the password can contain a slash. An unencoded
+> password is the most common cause of `PLAIN 403 ACCESS_REFUSED` at worker startup.
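+
+One way to keep the encoding in a single place is to build the URL as a fact and
+reference it from the `.env` template. A minimal sketch, where
+`mnemosyne_celery_broker_url` is a hypothetical fact name and the matching template
+line would be `CELERY_BROKER_URL={{ mnemosyne_celery_broker_url }}`:
+
+```yaml
+- name: Build the percent-encoded Celery broker URL
+  ansible.builtin.set_fact:
+    # urlencode runs first (turning '%' into '%25'), then '/' is escaped explicitly
+    mnemosyne_celery_broker_url: >-
+      amqp://mnemosyne:{{ vault_rabbitmq_password | urlencode | replace('/', '%2F') }}@oberon.incus:5672/mnemosyne
+  no_log: true   # the fact embeds the RabbitMQ password
+```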
+
+### Worker tuning — `worker` only
+
+| Variable | Default | Notes |
+|----------|---------|-------|
+| `CELERY_QUEUES` | `celery,embedding,batch` | Override per host for dedicated queue workers |
+| `CELERY_CONCURRENCY` | `2` | Number of worker processes |
+
+### MCP server — `mcp` only
+
+| Variable | Production value |
+|----------|-----------------|
+| `MCP_REQUIRE_AUTH` | `True` |
+
+### LLM API encryption — `app`, `worker`
+
+| Variable | Notes |
+|----------|-------|
+| `LLM_API_SECRETS_ENCRYPTION_KEY` | Fernet key. Generate once: `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"`. Never rotate without re-encrypting all stored provider keys first. |
+
+### Email — `app` only
+
+| Variable | Example |
+|----------|---------|
+| `EMAIL_HOST` | `oberon.incus` |
+| `EMAIL_PORT` | `22025` |
+| `EMAIL_USE_TLS` | `False` |
+
+### Embedding pipeline — `worker` only
+
+| Variable | Default |
+|----------|---------|
+| `EMBEDDING_BATCH_SIZE` | `8` |
+| `EMBEDDING_TIMEOUT` | `120` |
+
+### Search & re-ranker — `app`, `mcp`
+
+| Variable | Default |
+|----------|---------|
+| `SEARCH_VECTOR_TOP_K` | `50` |
+| `SEARCH_FULLTEXT_TOP_K` | `30` |
+| `SEARCH_GRAPH_MAX_DEPTH` | `2` |
+| `SEARCH_RRF_K` | `60` |
+| `SEARCH_DEFAULT_LIMIT` | `20` |
+| `RERANKER_MAX_CANDIDATES` | `32` |
+| `RERANKER_TIMEOUT` | `30` |
+
+### Logging — `app`, `mcp`, `worker`
+
+| Variable | Default |
+|----------|---------|
+| `LOGGING_LEVEL` | `INFO` |
+| `DJANGO_LOGGING_LEVEL` | `WARNING` |
+| `CELERY_LOGGING_LEVEL` | `INFO` |
+
+---
+
+## 5. Health Probes & Verification
+
+After `docker compose up -d`, wait for all services to report healthy:
+
+```bash
+docker compose -f /opt/mnemosyne/docker-compose.yaml ps -a
+```
+
+Expected: `app`, `mcp`, `worker`, `web` all `healthy`; `static-init` `exited (0)`.
+The `-a` flag is needed to see `static-init`, because `docker compose ps` lists only
+running containers by default.
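+
+The same wait can be scripted in the role. A minimal sketch, assuming the tasks run on
+`puck.incus` itself (so nginx's host port is reachable as `localhost:23181`); the retry
+count and delay are arbitrary choices:
+
+```yaml
+- name: Wait for Mnemosyne probes to answer through nginx
+  ansible.builtin.uri:
+    url: "http://localhost:23181{{ item }}"
+    status_code: 200
+  register: probe
+  until: probe.status == 200   # connection errors report status -1 and are retried
+  retries: 30
+  delay: 5
+  loop:
+    - /live/      # Django liveness
+    - /ready/     # Django readiness (Postgres + Memcached)
+    - /healthz    # MCP health, proxied to mcp:8001/mcp/health
+```
+
+Each loop item is retried on its own, so a failure names the specific probe that never
+returned 200.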
+
+### Per-service probes
+
+| Service | Healthcheck command | Expected |
+|---------|---------------------|----------|
+| `app` | `curl -f http://localhost:8000/live/` | 200 |
+| `mcp` | `curl -f http://localhost:8001/mcp/health` | 200 JSON |
+| `web` | `curl -f http://localhost/live/` | 200 (proxied to app) |
+| `worker` | `celery -A mnemosyne inspect ping -d celery@$HOSTNAME` | `pong` |
+
+### External checks (from inside the 10.10.0.0/24 network)
+
+```bash
+# Django liveness (via nginx)
+curl -f http://puck.incus:23181/live/
+
+# Django readiness (Postgres + Memcached)
+curl -f http://puck.incus:23181/ready/
+
+# MCP health (proxied from /healthz → mcp:8001/mcp/health)
+curl -f http://puck.incus:23181/healthz
+
+# Prometheus metrics (internal only)
+curl http://puck.incus:23181/metrics | head -5
+```
+
+### Verify the daedalus-service account
+
+```bash
+# <password> is the value of vault_mnemosyne_daedalus_service_password
+curl -u 'daedalus-service:<password>' \
+  https://mnemosyne.ouranos.helu.ca/library/api/workspaces/ \
+  -o /dev/null -w "%{http_code}"
+# Expect: 200
+```
+
+### Verify MCP connectivity (from a client with a valid MCPToken)
+
+```bash
+curl -H "Authorization: Bearer <token>" \
+  https://mnemosyne.ouranos.helu.ca/mcp/health
+# Expect: {"status": "ok", ...}
+```
+
+---
+
+## 6. Upgrade Procedure
+
+A standard upgrade (new image pushed to `git.helu.ca/r/mnemosyne:latest`):
+
+```bash
+cd /opt/mnemosyne
+docker compose pull
+docker compose up -d                  # static-init re-seeds; running containers replaced
+docker compose run --rm app migrate   # no-op if no new migrations
+```
+
+The `static-init` service runs to completion on every `up`, propagating static
+file changes without manual volume reset.
+
+---
+
+## 7. Rollback
+
+```bash
+# Pin to a specific digest. `docker compose pull` accepts service names only,
+# so fetch the image reference with plain `docker pull`.
+docker pull git.helu.ca/r/mnemosyne@sha256:<digest>
+# Edit every image: line in docker-compose.yaml that references
+# git.helu.ca/r/mnemosyne to use the digest, then:
+docker compose up -d
+```
+
+Alternatively, tag good images in the registry before each deploy and reference
+the tag.
+
+---
+
+## 8. HAProxy / Titania Configuration Notes
+
+Titania terminates TLS and forwards to `puck.incus:23181`. The nginx config
+preserves `X-Forwarded-Proto: https` so Django's `request.is_secure()`, secure
+cookies, and `build_absolute_uri()` work correctly.
+
+The HAProxy `health_path` for this backend should be `/healthz` (not `/live/` or
+`/ready/`) — `/healthz` short-circuits directly to the FastMCP health endpoint
+without touching Django, so it can confirm the MCP server is up even if Django
+is momentarily unhealthy.
+
+If you point a health checker at the Django probes instead, use `/live/` and
+`/ready/` **with** the trailing slash. The un-slashed forms (`/live`, `/ready`)
+trigger Django's `APPEND_SLASH` 301 redirect, which checkers that don't follow
+redirects will report as a failure.
+
+---
+
+## 9. Vault Variables Summary
+
+| Vault variable | Used in `.env` as |
+|----------------|-------------------|
+| `vault_mnemosyne_secret_key` | `SECRET_KEY` |
+| `vault_mnemosyne_db_password` | `APP_DB_PASSWORD` |
+| `vault_neo4j_password` | embedded in `NEOMODEL_NEO4J_BOLT_URL` |
+| `vault_mnemosyne_s3_key` | `AWS_ACCESS_KEY_ID` |
+| `vault_mnemosyne_s3_secret` | `AWS_SECRET_ACCESS_KEY` |
+| `vault_daedalus_s3_read_key` | `DAEDALUS_S3_ACCESS_KEY_ID` |
+| `vault_daedalus_s3_read_secret` | `DAEDALUS_S3_SECRET_ACCESS_KEY` |
+| `vault_rabbitmq_password` | embedded in `CELERY_BROKER_URL` |
+| `vault_mnemosyne_llm_encryption_key` | `LLM_API_SECRETS_ENCRYPTION_KEY` |
+| `vault_mnemosyne_daedalus_service_password` | passed to `ensure_service_user --password` |
+| `vault_mnemosyne_signing_secret` | (Phase 2) printed by `seed_signing_key`, stored here, consumed by Daedalus role |
diff --git a/docs/mnemosyne_integration.md b/docs/mnemosyne_integration.md
index d1f3df1..d9eee56 100644
--- a/docs/mnemosyne_integration.md
+++ b/docs/mnemosyne_integration.md
@@ -25,10 +25,13 @@ Mnemosyne exposes two interfaces for the wider Ouranos ecosystem:
 
 ### Port & URL
 
-| Endpoint | Internal | Public (via nginx) |
+| Endpoint | Internal (container) | Public (via nginx on host port 23181) |
 |---|---|---|
-| MCP server | `http://mcp:22091/mcp/` | `http://puck.incus:23090/mcp/` |
-| Health check | `http://mcp:22091/mcp/health` | `http://puck.incus:23090/healthz` |
+| Django REST API | `http://app:8000/` | `https://mnemosyne.ouranos.helu.ca/` |
+| MCP server | `http://mcp:8001/mcp/` | `https://mnemosyne.ouranos.helu.ca/mcp/` |
+| MCP health | `http://mcp:8001/mcp/health` | `https://mnemosyne.ouranos.helu.ca/healthz` |
+| Django liveness | `http://app:8000/live/` | internal only |
+| Django readiness | `http://app:8000/ready/` | internal only |
 
 ### Project structure (as built)
 
@@ -71,17 +74,16 @@ The `workspace_id` parameter is present on every search/discovery tool but is **
 
 ### Deployment
 
-Separate Uvicorn process alongside Django's Gunicorn:
+Production runs as four containers from a single image via `docker-compose.yaml`. The nginx `web` container is the only publicly exposed service, listening on **host port 23181**, which HAProxy on Titania reverse-proxies as `https://mnemosyne.ouranos.helu.ca`.
 
-```bash
-# Django WSGI (existing)
-gunicorn --bind :22090 --workers 3 mnemosyne.wsgi
+| Container | Internal port | Role |
+|-----------|--------------|------|
+| `app` | 8000 | Django REST API + admin (gunicorn) |
+| `mcp` | 8001 | FastMCP ASGI server (uvicorn) |
+| `worker` | — | Celery worker (embedding/ingest/batch) |
+| `web` | 80 → host **23181** | nginx reverse proxy + static files |
 
-# MCP ASGI (new)
-uvicorn mcp_server.asgi:app --host 0.0.0.0 --port 22091 --workers 1
-```
-
-Auth is disabled (`MCP_REQUIRE_AUTH=False`) since all traffic is internal (10.10.0.0/24).
+Auth is controlled by `MCP_REQUIRE_AUTH` in `.env`. Production sets it to `True`; the internal validator and ad-hoc testing may use `False` on an isolated network.
 
 ### ⚠️ DEBUG LOG Points — MCP Server