- Change app healthcheck from /live/ to /ready/ to verify full readiness including dependencies (DB, Neo4j, S3) - Increase healthcheck timeout from 5s to 10s to accommodate dependency checks - Add S3 bucket connectivity check to readiness probe - Update deployment documentation to use /srv/mnemosyne instead of /opt/mnemosyne as the compose project directory
# Mnemosyne — Ansible Deployment Reference

This document gives the Ansible author everything needed to write and maintain the
Mnemosyne deployment role. All implementation decisions are already locked in
`docker-compose.yaml` and `nginx/mnemosyne.conf`; this document explains the
*why* behind each decision and provides the authoritative list of variables,
one-time steps, and verification checks.

---

## 1. Host & Stack Overview

| Item | Value |
|------|-------|
| Deploy target | `puck.incus` (Incus container, 10.10.0.0/24) |
| Compose project directory | `/srv/mnemosyne` |
| Image registry | `git.helu.ca/r/mnemosyne:latest` |
| Public host port | **23181** (nginx → HAProxy on Titania → `https://mnemosyne.ouranos.helu.ca`) |
| Internal app port | `app:8000` (Django/gunicorn) |
| Internal MCP port | `mcp:8001` (FastMCP/uvicorn) |

The four compose services (`app`, `mcp`, `worker`, `web`) all run from the same
image. A one-shot `static-init` service seeds the nginx static-file volume on
every `up` so static-file changes propagate automatically on deploy without
manual intervention.

---

## 2. External Dependencies (NOT managed by this role)

These services must exist before Mnemosyne can start. The role only consumes
credentials; it does not provision these hosts.

| Service | Host | Notes |
|---------|------|-------|
| PostgreSQL | `portia.incus:5432` | Database `mnemosyne`, user `mnemosyne` |
| Neo4j | `umbriel.incus:7687` | Bolt protocol. **Must be dedicated to Mnemosyne** — do not share with Spelunker or any other graph workload (see README §Note on Neo4j). HTTP browser on `umbriel.incus:25555`. |
| RabbitMQ | `oberon.incus:5672` | vhost `mnemosyne`, user `mnemosyne` |
| MinIO (Mnemosyne bucket) | `nyx.helu.ca:8555` | Bucket `mnemosyne-content`. Credentials scoped read+write. |
| MinIO (Daedalus bucket) | `nyx.helu.ca:8555` | Bucket `daedalus`. **Read-only** cross-bucket credentials for the ingest worker. |
| Memcached | `oberon.incus:11211` | Shared; prefix `mnemosyne` avoids collisions. |
| Embedder (Qwen3-VL-Embedding) | Configured via `EMBEDDING_*` vars in settings | GPU host on Nyx; not managed here. |
| Reranker (Synesis) | Configured via `RERANKER_*` vars in settings | GPU host on Nyx; not managed here. |

---

## 3. Role Tasks

### 3.1 Directory & file layout

```
/srv/mnemosyne/
├── docker-compose.yaml     ← copied from repo (or symlinked via git pull)
├── nginx/
│   └── mnemosyne.conf      ← copied from repo nginx/mnemosyne.conf
└── .env                    ← rendered from Jinja2 template + vault secrets
```

The role should:

1. Create `/srv/mnemosyne/` and `nginx/` (owner: `root`, mode `0750`).
2. Render `.env` from the vault-sourced Jinja2 template (mode `0600`, owner `root`).
3. Copy (or `git pull`) `docker-compose.yaml` and `nginx/mnemosyne.conf` from the repo.

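The three steps above could be sketched as the following tasks. This is a minimal illustration, not the role's actual task file: the template name `env.j2`, the source paths under the role's `files/` directory, and the `0640` mode on the copied files are assumptions that the text above does not specify.

```yaml
# Sketch only: env.j2, the files/ source layout, and mode 0640 are assumptions.
- name: Create the compose project directory and nginx subdirectory
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    owner: root
    mode: "0750"
  loop:
    - /srv/mnemosyne
    - /srv/mnemosyne/nginx

- name: Render .env from the vault-sourced template
  ansible.builtin.template:
    src: env.j2                      # hypothetical template name
    dest: /srv/mnemosyne/.env
    owner: root
    mode: "0600"
  no_log: true                       # the rendered file contains secrets

- name: Copy docker-compose.yaml and the nginx config from the repo
  ansible.builtin.copy:
    src: "{{ item }}"                # resolved relative to the role's files/ directory
    dest: "/srv/mnemosyne/{{ item }}"
    owner: root
    mode: "0640"                     # assumed; the text above only fixes the modes for dirs and .env
  loop:
    - docker-compose.yaml
    - nginx/mnemosyne.conf
```
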
### 3.2 Pull & start

```yaml
- name: Pull latest image
  community.docker.docker_compose_v2:
    project_src: /srv/mnemosyne
    pull: always
    # state defaults to present, so this task also runs `up` after pulling

- name: Bring stack up
  community.docker.docker_compose_v2:
    project_src: /srv/mnemosyne
    state: present
```

This triggers `static-init` automatically on every `up` — no separate handler needed.

### 3.3 One-time setup (run once on first deploy, idempotent thereafter)

These management commands are safe to re-run; they do nothing if the target state
already exists. Run them as a post-start task gated on a `creates:` sentinel or
an explicit `when: mnemosyne_first_deploy` flag.

```bash
# Apply Django ORM migrations (PostgreSQL schema)
docker compose -f /srv/mnemosyne/docker-compose.yaml \
  run --rm app migrate

# Create Neo4j vector + full-text indexes and load library-type defaults
docker compose -f /srv/mnemosyne/docker-compose.yaml \
  run --rm app setup

# Create the daedalus-service user (HTTP Basic auth for ingest API)
# Pass --password from vault; idempotent if user already exists.
docker compose -f /srv/mnemosyne/docker-compose.yaml \
  run --rm app \
  python manage.py ensure_service_user \
    --username daedalus-service \
    --password "{{ vault_mnemosyne_daedalus_service_password }}"

# Seed the MCP signing key (for Phase 2 per-turn JWT auth)
# --retire-other deactivates any previously-active key.
# Print the secret_hex and store in vault as vault_mnemosyne_signing_secret.
docker compose -f /srv/mnemosyne/docker-compose.yaml \
  run --rm app \
  python manage.py seed_signing_key --kid daedalus-1 --retire-other
```

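One way to wire the `creates:` gating around the commands above is sketched here. The sentinel path `/srv/mnemosyne/.first-deploy-done` and the registered variable name are hypothetical, and `ensure_service_user` is left out for brevity; it would follow the same pattern with the vault password passed in `argv`.

```yaml
# Sketch only: the sentinel path and the register name are assumptions.
- name: Run one-time setup commands (skipped once the sentinel exists)
  ansible.builtin.command:
    argv:
      - docker
      - compose
      - -f
      - /srv/mnemosyne/docker-compose.yaml
      - run
      - --rm
      - app
      - "{{ item }}"
    creates: /srv/mnemosyne/.first-deploy-done
  loop:
    - migrate
    - setup

- name: Seed the MCP signing key and capture the printed secret
  ansible.builtin.command:
    argv:
      - docker
      - compose
      - -f
      - /srv/mnemosyne/docker-compose.yaml
      - run
      - --rm
      - app
      - python
      - manage.py
      - seed_signing_key
      - --kid
      - daedalus-1
      - --retire-other
    creates: /srv/mnemosyne/.first-deploy-done
  register: mnemosyne_signing_key    # stdout carries the secret for a later task
  no_log: true                       # keep the secret out of Ansible output

- name: Write the sentinel so later runs skip the tasks above
  ansible.builtin.copy:
    content: ""
    dest: /srv/mnemosyne/.first-deploy-done
    owner: root
    mode: "0600"
    force: false                     # never rewrite an existing sentinel
```
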
The `seed_signing_key` command prints the generated secret once to stdout —
capture it and store in the vault. The Daedalus role reads this secret from the
same vault variable to mint per-turn tokens (Phase 2).

---

## 4. Environment Variables (`.env` template)

All variables are consumed by `docker-compose.yaml` for interpolation into the
relevant service `environment:` blocks. The per-service scoping is defined in
`docker-compose.yaml`; the `.env` file just provides values.

### Django core — `app`, `mcp`, `worker`

| Variable | Example / default | Notes |
|----------|-------------------|-------|
| `SECRET_KEY` | `{{ vault_mnemosyne_secret_key }}` | Fernet-safe; never rotate without re-encrypting stored API keys first |
| `DEBUG` | `False` | |
| `TIME_ZONE` | `UTC` | |
| `LANGUAGE_CODE` | `en-us` | |

### HTTP surface — `app` (CSRF), `app` + `mcp` (ALLOWED_HOSTS)

| Variable | Example |
|----------|---------|
| `ALLOWED_HOSTS` | `localhost,127.0.0.1,mnemosyne.ouranos.helu.ca` |
| `CSRF_TRUSTED_ORIGINS` | `https://mnemosyne.ouranos.helu.ca` |

### PostgreSQL — `app`, `mcp`, `worker`

| Variable | Example |
|----------|---------|
| `APP_DB_NAME` | `mnemosyne` |
| `APP_DB_USER` | `mnemosyne` |
| `APP_DB_PASSWORD` | `{{ vault_mnemosyne_db_password }}` |
| `DB_HOST` | `portia.incus` |
| `DB_PORT` | `5432` |

### Neo4j — `app`, `mcp`, `worker`

| Variable | Example |
|----------|---------|
| `NEOMODEL_NEO4J_BOLT_URL` | `bolt://neo4j:{{ vault_neo4j_password }}@umbriel.incus:7687` |

> **URL-encode the password** if it contains `@ : / # % + ? & =` or a space.
> The Bolt URL parser is strict.

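A hedged sketch of doing the encoding in Ansible rather than by hand: the variable name below is hypothetical. To my understanding, the `urlencode` filter treats `/` as safe and leaves it unescaped, so the sketch escapes it explicitly; verify this against the Jinja2 version in use.

```yaml
# Hypothetical role variable (name is illustrative), rendered into .env by the template.
# `urlencode` is assumed to leave "/" untouched, hence the extra replace().
mnemosyne_neo4j_bolt_url: >-
  bolt://neo4j:{{ vault_neo4j_password | urlencode | replace('/', '%2F') }}@umbriel.incus:7687
```

Encoding at render time keeps the vault value in its raw form, so the same secret can be reused by tools that expect the unencoded password.
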
### Memcached — `app`, `mcp`, `worker`

| Variable | Example |
|----------|---------|
| `KVDB_LOCATION` | `oberon.incus:11211` |
| `KVDB_PREFIX` | `mnemosyne` |

### S3 / MinIO (Mnemosyne bucket) — `app`, `mcp`, `worker`

| Variable | Example |
|----------|---------|
| `USE_LOCAL_STORAGE` | `False` |
| `AWS_ACCESS_KEY_ID` | `{{ vault_mnemosyne_s3_key }}` |
| `AWS_SECRET_ACCESS_KEY` | `{{ vault_mnemosyne_s3_secret }}` |
| `AWS_STORAGE_BUCKET_NAME` | `mnemosyne-content` |
| `AWS_S3_ENDPOINT_URL` | `https://nyx.helu.ca:8555` |
| `AWS_S3_USE_SSL` | `True` |
| `AWS_S3_VERIFY` | `False` (self-signed cert on Nyx) |
| `AWS_S3_REGION_NAME` | `us-east-1` |

### Daedalus S3 (cross-bucket reads) — `worker` only

| Variable | Example |
|----------|---------|
| `DAEDALUS_S3_ENDPOINT_URL` | `https://nyx.helu.ca:8555` |
| `DAEDALUS_S3_ACCESS_KEY_ID` | `{{ vault_daedalus_s3_read_key }}` |
| `DAEDALUS_S3_SECRET_ACCESS_KEY` | `{{ vault_daedalus_s3_read_secret }}` |
| `DAEDALUS_S3_BUCKET_NAME` | `daedalus` |
| `DAEDALUS_S3_REGION_NAME` | `us-east-1` |
| `DAEDALUS_S3_USE_SSL` | `True` |
| `DAEDALUS_S3_VERIFY` | `True` |

### Celery / RabbitMQ — `app` (producer), `worker` (consumer)

| Variable | Example |
|----------|---------|
| `CELERY_BROKER_URL` | `amqp://mnemosyne:{{ vault_rabbitmq_password \| urlencode }}@oberon.incus:5672/mnemosyne` |
| `CELERY_RESULT_BACKEND` | `rpc://` |
| `CELERY_TASK_ALWAYS_EAGER` | `False` |

> **Percent-encode** the RabbitMQ password in the broker URL if it contains any
> URL-special characters. Use Ansible's `urlencode` filter or pre-encode in the
> vault variable. An unencoded password is the most common cause of
> `PLAIN 403 ACCESS_REFUSED` at worker startup.

### Worker tuning — `worker` only

| Variable | Default | Notes |
|----------|---------|-------|
| `CELERY_QUEUES` | `celery,embedding,batch` | Override per host for dedicated queue workers |
| `CELERY_CONCURRENCY` | `2` | Number of worker processes |

### MCP server — `mcp` only

| Variable | Production value |
|----------|-----------------|
| `MCP_REQUIRE_AUTH` | `True` |

### LLM API encryption — `app`, `worker`

| Variable | Notes |
|----------|-------|
| `LLM_API_SECRETS_ENCRYPTION_KEY` | Fernet key. Generate once: `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"`. Never rotate without re-encrypting all stored provider keys first. |

### Email — `app` only

| Variable | Example |
|----------|---------|
| `EMAIL_HOST` | `oberon.incus` |
| `EMAIL_PORT` | `22025` |
| `EMAIL_USE_TLS` | `False` |

### Embedding pipeline — `worker` only

| Variable | Default |
|----------|---------|
| `EMBEDDING_BATCH_SIZE` | `8` |
| `EMBEDDING_TIMEOUT` | `120` |

### Search & re-ranker — `app`, `mcp`

| Variable | Default |
|----------|---------|
| `SEARCH_VECTOR_TOP_K` | `50` |
| `SEARCH_FULLTEXT_TOP_K` | `30` |
| `SEARCH_GRAPH_MAX_DEPTH` | `2` |
| `SEARCH_RRF_K` | `60` |
| `SEARCH_DEFAULT_LIMIT` | `20` |
| `RERANKER_MAX_CANDIDATES` | `32` |
| `RERANKER_TIMEOUT` | `30` |

### Logging — `app`, `mcp`, `worker`

| Variable | Default |
|----------|---------|
| `LOGGING_LEVEL` | `INFO` |
| `DJANGO_LOGGING_LEVEL` | `WARNING` |
| `CELERY_LOGGING_LEVEL` | `INFO` |

---

## 5. Health Probes & Verification

After `docker compose up -d`, wait for all services to report healthy:

```bash
docker compose -f /srv/mnemosyne/docker-compose.yaml ps
```

Expected: `app`, `mcp`, `worker`, `web` all `healthy`; `static-init` `exited (0)`.

### Per-service probes

| Service | Healthcheck command | Expected |
|---------|---------------------|----------|
| `app` | `curl -f http://localhost:8000/ready/` | 200 |
| `mcp` | `curl -f http://localhost:8001/mcp/health` | 200 JSON |
| `web` | `curl -f http://localhost/live/` | 200 (proxied to app) |
| `worker` | `celery -A mnemosyne inspect ping -d celery@$HOSTNAME` | `pong` |

### External checks (from inside the 10.10.0.0/24 network)

```bash
# Django liveness (via nginx)
curl -f http://puck.incus:23181/live/

# Django readiness (Postgres, Neo4j, S3, Memcached)
curl -f http://puck.incus:23181/ready/

# MCP health (proxied from /healthz → mcp:8001/mcp/health)
curl -f http://puck.incus:23181/healthz

# Prometheus metrics (internal only)
curl http://puck.incus:23181/metrics | head -5
```

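Inside the role, the same checks could be expressed as a retrying task. This is a sketch under stated assumptions: the registered variable name and the retry budget (12 tries, 5 seconds apart) are illustrative, and the task is assumed to run from a host that can resolve `puck.incus`.

```yaml
# Sketch only: register name, retries, and delay are assumptions.
- name: Wait for the stack to answer its health endpoints
  ansible.builtin.uri:
    url: "http://puck.incus:23181{{ item }}"
    status_code: 200
    follow_redirects: none           # a 301 from a missing trailing slash counts as a failure
  register: mnemosyne_probe
  until: mnemosyne_probe.status == 200
  retries: 12
  delay: 5
  loop:
    - /live/
    - /ready/
    - /healthz
```
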
### Verify the daedalus-service account

```bash
curl -u daedalus-service:<password> \
  https://mnemosyne.ouranos.helu.ca/library/api/workspaces/ \
  -o /dev/null -w "%{http_code}"
# Expect: 200
```

### Verify MCP connectivity (from a client with a valid MCPToken)

```bash
curl -H "Authorization: Bearer <token>" \
  https://mnemosyne.ouranos.helu.ca/mcp/health
# Expect: {"status": "ok", ...}
```

---

## 6. Upgrade Procedure

A standard upgrade (new image pushed to `git.helu.ca/r/mnemosyne:latest`):

```bash
cd /srv/mnemosyne
docker compose pull
docker compose up -d                  # static-init re-seeds; running containers replaced
docker compose run --rm app migrate   # no-op if no new migrations
```

The `static-init` service runs to completion on every `up`, propagating static
file changes without manual volume reset.

---

## 7. Rollback

```bash
cd /srv/mnemosyne
# Pin to a specific digest (`docker compose pull` takes service names, so pull the image directly)
docker pull git.helu.ca/r/mnemosyne@sha256:<digest>
# Edit docker-compose.yaml image: line to use the digest, then:
docker compose up -d
```

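If the rollback is driven from Ansible instead of by hand, the edit to the `image:` line could be sketched as below. The variable `mnemosyne_rollback_digest` is hypothetical and is assumed to hold a value of the form `sha256:<digest>`. Note that a later run that re-copies `docker-compose.yaml` from the repo would undo the pin.

```yaml
# Sketch only: mnemosyne_rollback_digest is a hypothetical variable ("sha256:<digest>").
- name: Pin every service image line to the known-good digest
  ansible.builtin.replace:
    path: /srv/mnemosyne/docker-compose.yaml
    regexp: 'image:\s*git\.helu\.ca/r/mnemosyne\S*'
    replace: "image: git.helu.ca/r/mnemosyne@{{ mnemosyne_rollback_digest }}"
  when: mnemosyne_rollback_digest is defined

- name: Recreate the stack on the pinned image
  community.docker.docker_compose_v2:
    project_src: /srv/mnemosyne
    state: present
  when: mnemosyne_rollback_digest is defined
```
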
Alternatively, tag good images in the registry before each deploy and reference
the tag.

---

## 8. HAProxy / Titania Configuration Notes

Titania terminates TLS and forwards to `puck.incus:23181`. The nginx config
preserves `X-Forwarded-Proto: https` so Django's `request.is_secure()`, secure
cookies, and `build_absolute_uri()` work correctly.

The HAProxy `health_path` for this backend should be `/healthz` (not `/live/` or
`/ready/`) — `/healthz` short-circuits directly to the FastMCP health endpoint
without touching Django, so it can confirm the MCP server is up even if Django
is momentarily unhealthy.

If HAProxy checks don't follow redirects, use `/live/` and `/ready/` **with** the
trailing slash. The un-slashed forms (`/live`, `/ready`) trigger Django's
`APPEND_SLASH` 301 redirect, which health checkers that don't follow redirects
will report as a failure.

---

## 9. Vault Variables Summary

| Vault variable | Used in `.env` as |
|----------------|-------------------|
| `vault_mnemosyne_secret_key` | `SECRET_KEY` |
| `vault_mnemosyne_db_password` | `APP_DB_PASSWORD` |
| `vault_neo4j_password` | embedded in `NEOMODEL_NEO4J_BOLT_URL` |
| `vault_mnemosyne_s3_key` | `AWS_ACCESS_KEY_ID` |
| `vault_mnemosyne_s3_secret` | `AWS_SECRET_ACCESS_KEY` |
| `vault_daedalus_s3_read_key` | `DAEDALUS_S3_ACCESS_KEY_ID` |
| `vault_daedalus_s3_read_secret` | `DAEDALUS_S3_SECRET_ACCESS_KEY` |
| `vault_rabbitmq_password` | embedded in `CELERY_BROKER_URL` |
| `vault_mnemosyne_llm_encryption_key` | `LLM_API_SECRETS_ENCRYPTION_KEY` |
| `vault_mnemosyne_daedalus_service_password` | passed to `ensure_service_user --password` |
| `vault_mnemosyne_signing_secret` | (Phase 2) printed by `seed_signing_key`, stored here, consumed by Daedalus role |