feat(deploy): production docker compose stack + Gitea CI image build

Adds a complete deployment surface for production:

  Dockerfile               multi-stage 3.12-slim build, collectstatic
                           baked into the image, runs as non-root mnemosyne
                           uid/gid 1000.
  docker/entrypoint.sh     dispatches `web | mcp | worker | beat | migrate
                           | setup | shell` from a single image, so every
                           service in compose runs the same artifact.
  docker-compose.yaml      five services: static-init (one-shot; copies
                           statics into the shared volume on every up),
                           web (gunicorn), mcp (uvicorn), worker (celery),
                           nginx. External services (Postgres, Neo4j,
                           RabbitMQ, S3, Memcached, embedder, reranker)
                           reached over the 10.10.0.0/24 internal network
                           and configured via mnemosyne/.env.
  nginx/mnemosyne.conf     reverse proxy: /library/* and /admin/* → web,
                           /mcp/* → mcp, /static/* → volume, /metrics
                           internal-network-only (127/8 + RFC1918), /healthz
                           proxies to /mcp/health for liveness probes.
  .gitea/workflows/        CVE scan + image build, image pushed to
                           git.helu.ca/r/mnemosyne. Trivy scans pyproject
                           extras (dev/test/lint/docs) and the built image.
  pyproject.toml           adds [test], [lint], [docs] extras so the CI
                           pip-compile step has something to resolve.

README documents the bring-up flow (`docker compose run --rm web migrate`,
then `setup`, then `up -d`), day-to-day commands, and the env-var values
that need adjusting for production (DEBUG=False, KVDB_LOCATION pointing
at the external memcached, AWS keys filled in, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 12:05:23 -04:00
parent 1cd556c3f6
commit 236d9e2e74
7 changed files with 547 additions and 0 deletions


@@ -153,6 +153,66 @@ These endpoints are used by the Daedalus FastAPI backend (HTTP Basic auth). All
See [docs/mnemosyne_integration.md](docs/mnemosyne_integration.md) for the full Daedalus contract.
## Production Deployment
Production runs as four containers from a single image (built and pushed by [`.gitea/workflows/cve-scan-docker-build.yml`](.gitea/workflows/cve-scan-docker-build.yml) on every push to `main`):
| Service | Role | Port |
|---------|------|------|
| `web` | Django REST API + admin (gunicorn) | internal :8000 |
| `mcp` | FastMCP server (uvicorn) | internal :22091 |
| `worker` | Celery worker — embedding/ingest/batch | — |
| `nginx` | Reverse proxy + static files | host :23090 |

Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.

External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel, RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All are reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
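
The `static-init` step amounts to a directory sync from the image into the shared volume. Below is a minimal sketch of that copy. It assumes a placeholder destination path (`/srv/static`) and chooses to clear stale files before copying; neither detail is taken from the actual `docker-compose.yaml`.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of the static-init one-shot: mirror the statics baked into
# the image by collectstatic into the volume nginx serves. The default
# destination path is a placeholder, not the path used in docker-compose.yaml.

sync_static() {
  local src="${1:-/app/staticfiles}" dest="${2:-/srv/static}"
  if [[ ! -d "$src" ]]; then
    echo "sync_static: source directory $src not found" >&2
    return 1
  fi
  if [[ -z "$dest" || "$dest" == "/" ]]; then
    echo "sync_static: refusing to clear '$dest'" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Clear the previous deploy's files so deleted or renamed assets do not linger.
  find "$dest" -mindepth 1 -delete
  # Copy the directory's contents (the trailing /.), preserving modes and timestamps.
  cp -a "$src"/. "$dest"/
}
```

Clearing first means nginx can briefly see an empty directory while the copy runs. A plain `cp -a` over the old files avoids that gap, but leaves stale assets behind.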
### First-time bring-up
```bash
# Pull the image (or build locally with `docker compose build`)
docker compose pull
# DB migrations (one-shot)
docker compose run --rm web migrate
# Neo4j indexes + library_type defaults (one-shot)
docker compose run --rm web setup
# Bring the stack up
docker compose up -d
```
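
Both one-shot steps go through the image's `docker/entrypoint.sh`, which dispatches `web | mcp | worker | beat | migrate | setup | shell` so that every service runs the same image. The script itself is not reproduced here. What follows is a hypothetical sketch of how such a dispatcher could look. The gunicorn/uvicorn/celery module paths and flags, and the `setup` management-command name, are assumptions. Only the ports (8000 and 22091) come from the table above.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL dispatcher in the spirit of docker/entrypoint.sh; the real script
# is not shown here. Module paths, flags and the `setup` command name are assumptions.

# Print the command line for one role (one role per compose service).
build_command() {
  case "${1:-}" in
    web)     echo "gunicorn mnemosyne.wsgi:application --bind 0.0.0.0:8000" ;;
    mcp)     echo "uvicorn mnemosyne.mcp_app:app --host 0.0.0.0 --port 22091" ;;  # hypothetical ASGI path
    worker)  echo "celery -A mnemosyne worker --loglevel=INFO" ;;
    beat)    echo "celery -A mnemosyne beat --loglevel=INFO" ;;
    migrate) echo "python manage.py migrate --noinput" ;;
    setup)   echo "python manage.py setup" ;;  # hypothetical management command
    shell)   echo "python manage.py shell" ;;
    *)       return 1 ;;
  esac
}

# Replace the shell with the chosen process so `docker compose stop` signals
# reach gunicorn/uvicorn/celery directly instead of a wrapper shell.
dispatch() {
  local cmd
  local -a argv
  if ! cmd="$(build_command "${1:-}")"; then
    echo "usage: entrypoint.sh {web|mcp|worker|beat|migrate|setup|shell}" >&2
    return 64
  fi
  read -r -a argv <<< "$cmd"   # split on spaces; none of the commands need quoting
  exec "${argv[@]}"
}
```

A real entrypoint would end with `dispatch "$@"`. It is left out above so the functions can be sourced without starting a server. With an `ENTRYPOINT` like this, `docker compose run --rm web migrate` replaces the `web` service's command with `migrate`, and the same dispatcher then routes it.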
### Day-to-day
```bash
docker compose ps # service status + health
docker compose logs -f web # tail web logs
docker compose logs -f worker # tail Celery worker logs
docker compose restart mcp # restart just the MCP server
# After a new image is published:
docker compose pull && docker compose up -d
```
### Things to verify in `mnemosyne/.env` before bringing up
The development `.env` has a few values that need adjusting for production:
- `DEBUG=False`
- `USE_LOCAL_STORAGE=False` (already set; just confirm)
- `KVDB_LOCATION=<external-memcached-host>:11211` (inside a container, `127.0.0.1` is the container's own loopback interface, not the memcached host)
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` filled in
- `DAEDALUS_S3_*` filled in for cross-bucket reads from the Daedalus bucket
- `ALLOWED_HOSTS` includes the public hostname HAProxy routes to (e.g. `mnemosyne.ouranos.helu.ca`)
- `LLM_API_SECRETS_ENCRYPTION_KEY` set to a real Fernet key
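
The checklist above can be partly automated. The following is a hypothetical pre-flight script, not something shipped in the repository. It reads `KEY=VALUE` lines without sourcing the file. It does not check the `DAEDALUS_S3_*` variables, because their exact names are not listed here. It also checks only that `ALLOWED_HOSTS` is non-empty, not that it contains the public hostname.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL pre-flight check for mnemosyne/.env; not part of the repository.
# Parses plain KEY=VALUE lines instead of sourcing the file, so nothing in .env
# is executed. `export KEY=...` and `KEY = value` forms are not recognised.

# Print the last value assigned to KEY in FILE, minus one pair of surrounding quotes.
env_get() {
  local file="$1" key="$2" line value=""
  local dq='^"(.*)"$' sq="^'(.*)'$"
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" == "$key="* ]]; then
      value="${line#"$key="}"
    fi
  done < "$file"
  if [[ "$value" =~ $dq || "$value" =~ $sq ]]; then
    value="${BASH_REMATCH[1]}"
  fi
  printf '%s' "$value"
}

# Print one line per production misconfiguration; return 0 only if none found.
check_prod_env() {
  local file="$1" problems=0 v key
  # A Fernet key is 32 bytes, url-safe base64 encoded: 43 characters plus one '='.
  local fernet='^[A-Za-z0-9_-]{43}=$'
  # Booleans that must be off in production.
  for key in DEBUG USE_LOCAL_STORAGE; do
    v="$(env_get "$file" "$key")"
    case "$v" in
      False|false|0) ;;
      *) echo "$key must be False (got '$v')"; problems=$((problems + 1)) ;;
    esac
  done
  # Values that only need to be non-empty.
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ALLOWED_HOSTS; do
    if [[ -z "$(env_get "$file" "$key")" ]]; then
      echo "$key is empty"; problems=$((problems + 1))
    fi
  done
  # Loopback addresses point at the container itself, not the memcached host.
  v="$(env_get "$file" KVDB_LOCATION)"
  if [[ -z "$v" || "$v" == 127.* || "$v" == localhost* ]]; then
    echo "KVDB_LOCATION must point at the external memcached (got '$v')"
    problems=$((problems + 1))
  fi
  v="$(env_get "$file" LLM_API_SECRETS_ENCRYPTION_KEY)"
  if [[ ! "$v" =~ $fernet ]]; then
    echo "LLM_API_SECRETS_ENCRYPTION_KEY does not look like a Fernet key"
    problems=$((problems + 1))
  fi
  [[ "$problems" -eq 0 ]]
}
```

After sourcing the functions, `check_prod_env mnemosyne/.env` prints one line per problem and returns non-zero if it found any.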
### Health probes
- `GET http://nginx-host:23090/healthz` → proxies to `/mcp/health`, returns `{"status":"ok"}` when the MCP server is up
- `GET http://nginx-host:23090/metrics` → Prometheus scrape endpoint, internal-network-only
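
After `docker compose up -d`, a deploy script can poll `/healthz` until it reports ok. The sketch below assumes `curl` is installed on the machine running it. The default retry count and delay are arbitrary choices, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL post-deploy probe against the nginx /healthz endpoint; assumes curl.

# True if a response body reports "status": "ok" (whitespace around ':' tolerated).
body_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"' <<< "$1"
}

# Poll URL until it reports ok or ATTEMPTS run out, sleeping DELAY seconds between tries.
wait_for_healthz() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" body i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: fail on HTTP errors, -sS: quiet but show errors, --max-time bounds each try.
    if body="$(curl -fsS --max-time 5 "$url" 2>/dev/null)" && body_is_ok "$body"; then
      return 0
    fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  return 1
}
```

For example: `wait_for_healthz http://nginx-host:23090/healthz 30 2`. Because `/healthz` proxies to `/mcp/health`, a pass shows only that nginx and the MCP server are answering. It does not exercise `web` or `worker`.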
## Architecture Note: Retrieval, Not Synthesis
Mnemosyne is a **retrieval engine**, not a RAG pipeline. It stores, embeds, and ranks — it does not synthesize answers.