feat(docker): rename web service to app, add nginx as web

Reorganize Docker Compose services: the Django/gunicorn container is now `app` and nginx is `web`, better reflecting their roles. Add a dedicated gunicorn configuration and install curl in the runtime image for health checks.

Update documentation to reflect:
- Neo4j migration from ariel.incus to a dedicated umbriel.incus instance
- Rationale for requiring a dedicated Neo4j instance (single-tenancy assumptions, label/index isolation, schema ownership)
- New service naming in compose commands and log tailing examples
27
README.md
@@ -64,11 +64,13 @@ Mnemosyne runs as three cooperating processes: the Django web app (REST API + ad
 
 Hosts in the Ouranos lab:
 - **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
-- **Neo4j** — `ariel.incus:25554` (knowledge graph + vectors)
+- **Neo4j** — `umbriel.incus:7687` (Bolt; dedicated instance — see note below — knowledge graph + vectors; HTTP Browser on `umbriel.incus:25555`)
 - **RabbitMQ** — `oberon.incus:5672` (Celery broker)
 - **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
 - **Memcached** — `127.0.0.1:11211` (task progress)
 
+> **Neo4j must be dedicated to Mnemosyne.** Don't share the instance with Spelunker or any other graph workload. Mnemosyne owns the `Library`, `Collection`, `Item`, `Chunk`, and `Concept` labels and runs its own indexes (`chunk_embedding_index`, full-text indexes per library_type) and schema migrations (`setup_neo4j_indexes`, `load_library_types`). The Phase-1 workspace-delete path runs label-scoped `DETACH DELETE` over those labels, and a workspace_id-scoped subgraph is the unit of isolation — both assume single-tenancy. A shared instance risks (1) label/property collisions corrupting the other tenant's graph, (2) vector-index memory contention degrading search latency for both apps, (3) management commands mutating schema another tenant depends on, and (4) backup/restore that can't be reasoned about per-app. Neo4j Community Edition is sufficient — the multi-database feature is Enterprise-only, so isolation has to come from running a separate server process. Run a dedicated instance per environment (one for staging, one for production); point each via `NEOMODEL_NEO4J_BOLT_URL` in that environment's `mnemosyne/.env`.
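Review note: the dedicated-instance requirement above could be checked at startup rather than left to convention. The following is a minimal sketch, not code from this commit. The helper name `check_dedicated_bolt_url` and the deny-list `SHARED_NEO4J_HOSTS` are hypothetical, and listing `ariel.incus` as a shared host is an assumption based only on the migration away from it.

```python
from urllib.parse import urlparse

# Hypothetical deny-list of hosts assumed to serve other graph workloads.
# `ariel.incus` appears here only because the README change moves off it.
SHARED_NEO4J_HOSTS = frozenset({"ariel.incus"})

# URI schemes accepted by the Neo4j drivers.
BOLT_SCHEMES = frozenset(
    {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}
)


def check_dedicated_bolt_url(bolt_url, shared_hosts=SHARED_NEO4J_HOSTS):
    """Return (host, port) for a Bolt URL, or raise if it names a shared host."""
    parsed = urlparse(bolt_url)
    if parsed.scheme not in BOLT_SCHEMES:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname  # urlparse lowercases the hostname
    if not host:
        raise ValueError("Bolt URL has no host")
    if host in shared_hosts:
        raise ValueError(
            f"{host} is a shared Neo4j host; Mnemosyne needs a dedicated instance"
        )
    # Fall back to the default Bolt port when none is given.
    return host, parsed.port or 7687
```

In a Django settings module this would presumably be called with the value of `NEOMODEL_NEO4J_BOLT_URL` read from the environment, so a misconfigured staging or production `.env` fails fast instead of writing into another tenant's graph.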
+
 ### One-time setup
 
 ```bash
@@ -159,14 +161,14 @@ Production runs as four containers from a single image (built and pushed by [`.g
 
 | Service | Role | Port |
 |---------|------|------|
-| `web` | Django REST API + admin (gunicorn) | internal :8000 |
+| `app` | Django REST API + admin (gunicorn) | internal :8000 |
 | `mcp` | FastMCP server (uvicorn) | internal :22091 |
 | `worker` | Celery worker — embedding/ingest/batch | — |
-| `nginx` | Reverse proxy + static files | host :23090 |
+| `web` | Reverse proxy + static files (nginx) | host :23090 |
 
 Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.
 
-External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel, RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
+External services (NOT spun up by compose): Postgres on Portia, Neo4j on Umbriel (dedicated Mnemosyne instance), RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
 
 ### First-time bring-up
 
@@ -175,10 +177,10 @@ External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel,
 docker compose pull
 
 # DB migrations (one-shot)
-docker compose run --rm web migrate
+docker compose run --rm app migrate
 
 # Neo4j indexes + library_type defaults (one-shot)
-docker compose run --rm web setup
+docker compose run --rm app setup
 
 # Bring the stack up
 docker compose up -d
@@ -188,7 +190,8 @@ docker compose up -d
 
 ```bash
 docker compose ps                  # service status + health
-docker compose logs -f web         # tail web logs
+docker compose logs -f app         # tail Django app logs
+docker compose logs -f web         # tail nginx logs
 docker compose logs -f worker      # tail Celery worker logs
 docker compose restart mcp         # restart just the MCP server
 
@@ -210,8 +213,14 @@ The development `.env` has a few values that need adjusting for production:
 
 ### Health probes
 
-- `GET http://nginx-host:23090/healthz` → proxies to `/mcp/health`, returns `{"status":"ok"}` when the MCP server is up
-- `GET http://nginx-host:23090/metrics` → Prometheus scrape endpoint, internal-network-only
+| Endpoint | Probes | Auth |
+|----------|--------|------|
+| `GET /live/` | Django process alive (always 200 if gunicorn is up) | None |
+| `GET /ready/` | PostgreSQL + Memcached reachable (503 if either is down) | None |
+| `GET /healthz` | MCP server `/mcp/health` — used as the HAProxy `health_path` | None |
+| `GET /metrics` | Prometheus scrape | Internal networks only |
+
+> **Trailing slashes matter.** Always use `/live/` and `/ready/` (with the trailing slash). The un-slashed forms (`/live`, `/ready`) trigger Django's `APPEND_SLASH` 301 redirect — health check clients that don't follow redirects will report a failure even when the service is healthy.
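Review note: the trailing-slash pitfall above can be reproduced without a running stack. The sketch below is illustrative only. `StubProbeHandler` is a hypothetical stand-in that mimics the `APPEND_SLASH` redirect with Python's standard-library HTTP server; it is not Mnemosyne's Django app, and `probe_status` is an assumed helper that behaves like a health checker that does not follow redirects.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubProbeHandler(BaseHTTPRequestHandler):
    """Stand-in for Django: 200 on /live/, 301 on /live (like APPEND_SLASH)."""

    def do_GET(self):
        if self.path == "/live/":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        elif self.path == "/live":
            self.send_response(301)
            self.send_header("Location", "/live/")
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence per-request logging for the demo.
        pass


def probe_status(host, port, path):
    """GET `path` without following redirects and return the raw status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def run_demo():
    """Start the stub on a free local port and probe both URL forms."""
    server = HTTPServer(("127.0.0.1", 0), StubProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        slashed = probe_status("127.0.0.1", port, "/live/")
        unslashed = probe_status("127.0.0.1", port, "/live")
        return slashed, unslashed
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` returns `(200, 301)`: a checker that treats only 2xx as healthy would mark the un-slashed probe as failing even though the process behind it is up.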
 
 ## Architecture Note: Retrieval, Not Synthesis
 