feat(docker): rename web service to app, add nginx as web

Reorganize Docker Compose services: the Django/gunicorn container is now
`app` and nginx is `web`, better reflecting their roles. Add a dedicated
gunicorn configuration and install curl in the runtime image for health
checks.
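
The gunicorn configuration itself is not shown in this view. As a rough sketch only, a dedicated config file might look like the following. The file name `gunicorn.conf.py` (gunicorn's default config name) and the worker count, timeout, and log targets are assumptions for illustration, not the values from this commit; only the internal port 8000 comes from the service table in the README.

```python
# gunicorn.conf.py (hypothetical file name and values; not taken from this commit)
import multiprocessing

# Listen on the internal port that the nginx `web` service proxies to.
bind = "0.0.0.0:8000"

# Common starting point from the gunicorn docs: (2 x cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is killed and restarted (assumed value).
timeout = 60

# "-" sends the access log to stdout and the error log to stderr,
# so both show up in `docker compose logs -f app`.
accesslog = "-"
errorlog = "-"
```

Keeping these settings in a file rather than on the command line means the compose service definition does not need to change when tuning values do.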

Update documentation to reflect:
- Neo4j migration from ariel.incus to a dedicated umbriel.incus instance
- Rationale for requiring a dedicated Neo4j instance (single-tenancy
  assumptions, label/index isolation, schema ownership)
- New service naming in compose commands and log tailing examples
2026-05-03 19:35:27 -04:00
parent a2c885cf34
commit 7185d326eb
10 changed files with 163 additions and 38 deletions

@@ -64,11 +64,13 @@ Mnemosyne runs as three cooperating processes: the Django web app (REST API + ad
Hosts in the Ouranos lab:
- **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
-- **Neo4j** — `ariel.incus:25554` (knowledge graph + vectors)
+- **Neo4j** — `umbriel.incus:7687` (Bolt; dedicated instance — see note below — knowledge graph + vectors; HTTP Browser on `umbriel.incus:25555`)
- **RabbitMQ** — `oberon.incus:5672` (Celery broker)
- **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
- **Memcached** — `127.0.0.1:11211` (task progress)
+> **Neo4j must be dedicated to Mnemosyne.** Don't share the instance with Spelunker or any other graph workload. Mnemosyne owns the `Library`, `Collection`, `Item`, `Chunk`, and `Concept` labels and runs its own indexes (`chunk_embedding_index`, full-text indexes per library_type) and schema migrations (`setup_neo4j_indexes`, `load_library_types`). The Phase-1 workspace-delete path runs label-scoped `DETACH DELETE` over those labels, and a workspace_id-scoped subgraph is the unit of isolation — both assume single-tenancy. A shared instance risks (1) label/property collisions corrupting the other tenant's graph, (2) vector-index memory contention degrading search latency for both apps, (3) management commands mutating schema another tenant depends on, and (4) backup/restore that can't be reasoned about per-app. Neo4j Community Edition is sufficient — the multi-database feature is Enterprise-only, so isolation has to come from running a separate server process. Run a dedicated instance per environment (one for staging, one for production); point each via `NEOMODEL_NEO4J_BOLT_URL` in that environment's `mnemosyne/.env`.
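
A pre-deploy guard could catch an environment still pointing at the old shared host. The sketch below is illustrative only: the helper names are hypothetical and not part of Mnemosyne, and the expected host and port are taken from the Ouranos lab host list above (a staging or production instance would pass its own values).

```python
import os
from urllib.parse import urlsplit

# Assumed defaults, copied from the lab host list; other environments differ.
EXPECTED_HOST = "umbriel.incus"
EXPECTED_PORT = 7687


def bolt_target(url):
    """Return (hostname, port) parsed from a Bolt URL; credentials are ignored."""
    parts = urlsplit(url)
    return parts.hostname, parts.port


def points_at_dedicated_instance(url, host=EXPECTED_HOST, port=EXPECTED_PORT):
    """True only when the URL targets the expected dedicated Neo4j server."""
    return bolt_target(url) == (host, port)


def check_environment(env=os.environ):
    """Check NEOMODEL_NEO4J_BOLT_URL without ever printing its credentials."""
    return points_at_dedicated_instance(env.get("NEOMODEL_NEO4J_BOLT_URL", ""))
```

Calling `check_environment()` from a deploy script and aborting on `False` is one way to use it. It only compares host and port, so it cannot tell whether the server behind that address is actually unshared.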
### One-time setup
```bash
@@ -159,14 +161,14 @@ Production runs as four containers from a single image (built and pushed by [`.g
| Service | Role | Port |
|---------|------|------|
-| `web` | Django REST API + admin (gunicorn) | internal :8000 |
+| `app` | Django REST API + admin (gunicorn) | internal :8000 |
| `mcp` | FastMCP server (uvicorn) | internal :22091 |
| `worker` | Celery worker — embedding/ingest/batch | — |
-| `nginx` | Reverse proxy + static files | host :23090 |
+| `web` | Reverse proxy + static files (nginx) | host :23090 |
Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.
-External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel, RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
+External services (NOT spun up by compose): Postgres on Portia, Neo4j on Umbriel (dedicated Mnemosyne instance), RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
### First-time bring-up
@@ -175,10 +177,10 @@ External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel,
docker compose pull
# DB migrations (one-shot)
-docker compose run --rm web migrate
+docker compose run --rm app migrate
# Neo4j indexes + library_type defaults (one-shot)
-docker compose run --rm web setup
+docker compose run --rm app setup
# Bring the stack up
docker compose up -d
@@ -188,7 +190,8 @@ docker compose up -d
```bash
docker compose ps # service status + health
-docker compose logs -f web # tail web logs
+docker compose logs -f app # tail Django app logs
+docker compose logs -f web # tail nginx logs
docker compose logs -f worker # tail Celery worker logs
docker compose restart mcp # restart just the MCP server
@@ -210,8 +213,14 @@ The development `.env` has a few values that need adjusting for production:
### Health probes
-- `GET http://nginx-host:23090/healthz` → proxies to `/mcp/health`, returns `{"status":"ok"}` when the MCP server is up
-- `GET http://nginx-host:23090/metrics` → Prometheus scrape endpoint, internal-network-only
+| Endpoint | Probes | Auth |
+|----------|--------|------|
+| `GET /live/` | Django process alive (always 200 if gunicorn is up) | None |
+| `GET /ready/` | PostgreSQL + Memcached reachable (503 if either is down) | None |
+| `GET /healthz` | MCP server `/mcp/health` — used as the HAProxy `health_path` | None |
+| `GET /metrics` | Prometheus scrape | Internal networks only |
+> **Trailing slashes matter.** Always use `/live/` and `/ready/` (with the trailing slash). The un-slashed forms (`/live`, `/ready`) trigger Django's `APPEND_SLASH` 301 redirect — health check clients that don't follow redirects will report a failure even when the service is healthy.
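
To make the trailing-slash note concrete, here is a minimal sketch of a probe client that, like many health checkers, refuses to follow redirects. The function and class names are hypothetical and the timeout is an assumed value; nothing here is taken from the Mnemosyne codebase.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines every redirect, so a 3xx surfaces as an error."""

    def redirect_request(self, req, fp, code, msg, headers):
        return None


def with_trailing_slash(path):
    """Return the path in the slashed form that Django serves without redirecting."""
    return path if path.endswith("/") else path + "/"


def is_healthy(status):
    """Only a plain 200 counts; a 301 from APPEND_SLASH or a 503 is a failure."""
    return status == 200


def probe(url, timeout=3.0):
    """Return the HTTP status for url, or None if the host could not be reached."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # includes the un-followed 301
    except OSError:
        return None  # connection refused, DNS failure, timeout
```

With a client like this, `probe(base + "/live")` would be expected to return 301 and be reported unhealthy, while `probe(base + with_trailing_slash("/live"))` reaches the view directly. Here `base` stands for the proxy address, such as the `http://nginx-host:23090` placeholder used in the earlier README text.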
## Architecture Note: Retrieval, Not Synthesis