feat(docker): rename web service to app, add nginx as web
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 53s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m0s

Reorganize Docker Compose services: the Django/gunicorn container is now
`app` and nginx is `web`, better reflecting their roles. Add a dedicated
gunicorn configuration and install curl in the runtime image for health
checks.

Update documentation to reflect:
- Neo4j migration from ariel.incus to a dedicated umbriel.incus instance
- Rationale for requiring a dedicated Neo4j instance (single-tenancy
  assumptions, label/index isolation, schema ownership)
- New service naming in compose commands and log tailing examples
This commit is contained in:
2026-05-03 19:35:27 -04:00
parent a2c885cf34
commit 7185d326eb
10 changed files with 163 additions and 38 deletions

13
.env.example Normal file
View File

@@ -0,0 +1,13 @@
# =============================================================================
# Mnemosyne — Docker Compose environment
# =============================================================================
# This file documents variables consumed by docker-compose.yaml itself
# (image tags, port overrides, etc.). It is NOT the application config.
#
# Application config lives in mnemosyne/.env — copy mnemosyne/.env\ example
# to mnemosyne/.env and fill in your values before running `docker compose up`.
#
# This file has no required variables for a default deployment: the compose
# file uses a fixed image tag and port. Add overrides here if you parameterise
# those in docker-compose.yaml (e.g. MNEMOSYNE_IMAGE, MNEMOSYNE_PORT).
# =============================================================================

View File

@@ -62,6 +62,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
zlib1g \ zlib1g \
libssl3 \ libssl3 \
ca-certificates \ ca-certificates \
curl \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
ENV PYTHONDONTWRITEBYTECODE=1 \ ENV PYTHONDONTWRITEBYTECODE=1 \
@@ -77,6 +78,7 @@ COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app WORKDIR /app
COPY --from=builder /build/mnemosyne /app COPY --from=builder /build/mnemosyne /app
COPY docker/entrypoint.sh /usr/local/bin/entrypoint.sh COPY docker/entrypoint.sh /usr/local/bin/entrypoint.sh
COPY docker/gunicorn.conf.py /app/docker/gunicorn.conf.py
RUN chmod +x /usr/local/bin/entrypoint.sh RUN chmod +x /usr/local/bin/entrypoint.sh
# Non-root user for everything that runs in this image. uid:gid 1000:1000 # Non-root user for everything that runs in this image. uid:gid 1000:1000

View File

@@ -64,11 +64,13 @@ Mnemosyne runs as three cooperating processes: the Django web app (REST API + ad
Hosts in the Ouranos lab: Hosts in the Ouranos lab:
- **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob) - **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
- **Neo4j** — `ariel.incus:25554` (knowledge graph + vectors) - **Neo4j** — `umbriel.incus:7687` (Bolt; dedicated instance — see note below — knowledge graph + vectors; HTTP Browser on `umbriel.incus:25555`)
- **RabbitMQ** — `oberon.incus:5672` (Celery broker) - **RabbitMQ** — `oberon.incus:5672` (Celery broker)
- **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets) - **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
- **Memcached** — `127.0.0.1:11211` (task progress) - **Memcached** — `127.0.0.1:11211` (task progress)
> **Neo4j must be dedicated to Mnemosyne.** Don't share the instance with Spelunker or any other graph workload. Mnemosyne owns the `Library`, `Collection`, `Item`, `Chunk`, and `Concept` labels and runs its own indexes (`chunk_embedding_index`, full-text indexes per library_type) and schema migrations (`setup_neo4j_indexes`, `load_library_types`). The Phase-1 workspace-delete path runs label-scoped `DETACH DELETE` over those labels, and a workspace_id-scoped subgraph is the unit of isolation — both assume single-tenancy. A shared instance risks (1) label/property collisions corrupting the other tenant's graph, (2) vector-index memory contention degrading search latency for both apps, (3) management commands mutating schema another tenant depends on, and (4) backup/restore that can't be reasoned about per-app. Neo4j Community Edition is sufficient — the multi-database feature is Enterprise-only, so isolation has to come from running a separate server process. Run a dedicated instance per environment (one for staging, one for production); point each via `NEOMODEL_NEO4J_BOLT_URL` in that environment's `mnemosyne/.env`.
### One-time setup ### One-time setup
```bash ```bash
@@ -159,14 +161,14 @@ Production runs as four containers from a single image (built and pushed by [`.g
| Service | Role | Port | | Service | Role | Port |
|---------|------|------| |---------|------|------|
| `web` | Django REST API + admin (gunicorn) | internal :8000 | | `app` | Django REST API + admin (gunicorn) | internal :8000 |
| `mcp` | FastMCP server (uvicorn) | internal :22091 | | `mcp` | FastMCP server (uvicorn) | internal :22091 |
| `worker` | Celery worker — embedding/ingest/batch | — | | `worker` | Celery worker — embedding/ingest/batch | — |
| `nginx` | Reverse proxy + static files | host :23090 | | `web` | Reverse proxy + static files (nginx) | host :23090 |
Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention. Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.
External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel, RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`. External services (NOT spun up by compose): Postgres on Portia, Neo4j on Umbriel (dedicated Mnemosyne instance), RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
### First-time bring-up ### First-time bring-up
@@ -175,10 +177,10 @@ External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel,
docker compose pull docker compose pull
# DB migrations (one-shot) # DB migrations (one-shot)
docker compose run --rm web migrate docker compose run --rm app migrate
# Neo4j indexes + library_type defaults (one-shot) # Neo4j indexes + library_type defaults (one-shot)
docker compose run --rm web setup docker compose run --rm app setup
# Bring the stack up # Bring the stack up
docker compose up -d docker compose up -d
@@ -188,7 +190,8 @@ docker compose up -d
```bash ```bash
docker compose ps # service status + health docker compose ps # service status + health
docker compose logs -f web # tail web logs docker compose logs -f app # tail Django app logs
docker compose logs -f web # tail nginx logs
docker compose logs -f worker # tail Celery worker logs docker compose logs -f worker # tail Celery worker logs
docker compose restart mcp # restart just the MCP server docker compose restart mcp # restart just the MCP server
@@ -210,8 +213,14 @@ The development `.env` has a few values that need adjusting for production:
### Health probes ### Health probes
- `GET http://nginx-host:23090/healthz` → proxies to `/mcp/health`, returns `{"status":"ok"}` when the MCP server is up | Endpoint | Probes | Auth |
- `GET http://nginx-host:23090/metrics` → Prometheus scrape endpoint, internal-network-only |----------|--------|------|
| `GET /live/` | Django process alive (always 200 if gunicorn is up) | None |
| `GET /ready/` | PostgreSQL + Memcached reachable (503 if either is down) | None |
| `GET /healthz` | MCP server `/mcp/health` — used as the HAProxy `health_path` | None |
| `GET /metrics` | Prometheus scrape | Internal networks only |
> **Trailing slashes matter.** Always use `/live/` and `/ready/` (with the trailing slash). The un-slashed forms (`/live`, `/ready`) trigger Django's `APPEND_SLASH` 301 redirect — health check clients that don't follow redirects will report a failure even when the service is healthy.
## Architecture Note: Retrieval, Not Synthesis ## Architecture Note: Retrieval, Not Synthesis

View File

@@ -2,20 +2,20 @@
# Mnemosyne — production deployment # Mnemosyne — production deployment
# ============================================================================= # =============================================================================
# Four services, all from the same image: # Four services, all from the same image:
# web — Django REST API + admin (gunicorn, port 8000) # app — Django REST API + admin (gunicorn, port 8000)
# mcp — FastMCP server (uvicorn, port 22091) # mcp — FastMCP server (uvicorn, port 22091)
# worker — Celery worker (embedding/ingest/batch queues) # worker — Celery worker (embedding/ingest/batch queues)
# nginx — reverse proxy, public port 23090 # web — reverse proxy, public port 23090 (nginx)
# #
# External services (NOT spun up here): Postgres on Portia, Neo4j on Ariel, # External services (NOT spun up here): Postgres on Portia, Neo4j on Umbriel,
# RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached on its own host, embedder # RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached on its own host, embedder
# and reranker on Nyx, smtp4dev on Oberon. All reached over the internal # and reranker on Nyx, smtp4dev on Oberon. All reached over the internal
# 10.10.0.0/24 network. # 10.10.0.0/24 network.
# #
# Run: # Run:
# docker compose up -d # docker compose up -d
# docker compose run --rm web migrate # one-shot DB migrate # docker compose run --rm app migrate # one-shot DB migrate
# docker compose run --rm web setup # Neo4j indexes + library types # docker compose run --rm app setup # Neo4j indexes + library types
# ============================================================================= # =============================================================================
services: services:
@@ -31,8 +31,8 @@ services:
- mnemosyne-static:/shared-static - mnemosyne-static:/shared-static
restart: "no" restart: "no"
# ── Web app: Django REST API + admin ─────────────────────────────────────── # ── App: Django REST API + admin ──────────────────────────────────────────
web: app:
image: git.helu.ca/r/mnemosyne:latest image: git.helu.ca/r/mnemosyne:latest
command: ["web"] command: ["web"]
env_file: mnemosyne/.env env_file: mnemosyne/.env
@@ -45,7 +45,7 @@ services:
expose: expose:
- "8000" - "8000"
healthcheck: healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/admin/login/').read()"] test: ["CMD", "curl", "-f", "http://localhost:8000/live/"]
interval: 30s interval: 30s
timeout: 5s timeout: 5s
retries: 3 retries: 3
@@ -62,7 +62,7 @@ services:
expose: expose:
- "22091" - "22091"
healthcheck: healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:22091/mcp/health').read()"] test: ["CMD", "curl", "-f", "http://localhost:22091/mcp/health"]
interval: 30s interval: 30s
timeout: 5s timeout: 5s
retries: 3 retries: 3
@@ -74,6 +74,9 @@ services:
command: ["worker"] command: ["worker"]
env_file: mnemosyne/.env env_file: mnemosyne/.env
restart: unless-stopped restart: unless-stopped
depends_on:
app:
condition: service_healthy
volumes: volumes:
- mnemosyne-media:/app/media - mnemosyne-media:/app/media
healthcheck: healthcheck:
@@ -83,26 +86,28 @@ services:
retries: 3 retries: 3
start_period: 60s start_period: 60s
# ── nginx: reverse proxy, public port 23090 ──────────────────────────────── # ── Web: nginx reverse proxy, public port 23090 ───────────────────────────
nginx: web:
image: nginx:alpine image: nginx:alpine
restart: unless-stopped restart: unless-stopped
depends_on: depends_on:
- web app:
- mcp condition: service_healthy
mcp:
condition: service_healthy
ports: ports:
- "23090:80" - "23090:80"
volumes: volumes:
- ./nginx/mnemosyne.conf:/etc/nginx/conf.d/default.conf:ro - ./nginx/mnemosyne.conf:/etc/nginx/conf.d/default.conf:ro
- mnemosyne-static:/var/www/static:ro - mnemosyne-static:/var/www/static:ro
healthcheck: healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost/healthz"] test: ["CMD", "curl", "-f", "http://localhost/live/"]
interval: 30s interval: 30s
timeout: 5s timeout: 5s
retries: 3 retries: 3
volumes: volumes:
# Static files baked into the image at /app/staticfiles. The web service # Static files baked into the image at /app/staticfiles. The app service
# mounts this volume, populating it on first start; nginx reads from it. # mounts this volume, populating it on first start; nginx reads from it.
mnemosyne-static: mnemosyne-static:
# Local FileSystemStorage fallback. Production uses USE_LOCAL_STORAGE=False # Local FileSystemStorage fallback. Production uses USE_LOCAL_STORAGE=False

View File

@@ -10,6 +10,7 @@ case "$1" in
web) web)
# Django REST API + admin (gunicorn → wsgi). # Django REST API + admin (gunicorn → wsgi).
exec gunicorn \ exec gunicorn \
--config /app/docker/gunicorn.conf.py \
--bind 0.0.0.0:8000 \ --bind 0.0.0.0:8000 \
--workers "${GUNICORN_WORKERS:-3}" \ --workers "${GUNICORN_WORKERS:-3}" \
--access-logfile - \ --access-logfile - \

27
docker/gunicorn.conf.py Normal file
View File

@@ -0,0 +1,27 @@
import logging
import re
_PROBE_PATH = re.compile(
r"^(?:/live|/ready|/metrics|/healthz|/health[^ ]*|/ping)/?(?:\?|$)"
)
class _ProbePathFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
request = getattr(record, "args", None)
if isinstance(request, dict):
path = request.get("U") or request.get("r", "")
else:
path = record.getMessage()
return not _PROBE_PATH.search(path)
_filter = _ProbePathFilter()
def on_starting(server):
logging.getLogger("gunicorn.access").addFilter(_filter)
def post_worker_init(worker):
logging.getLogger("gunicorn.access").addFilter(_filter)

View File

@@ -18,7 +18,9 @@ DB_HOST=portia.incus
DB_PORT=5432 DB_PORT=5432
# --- Neo4j Graph Database --- # --- Neo4j Graph Database ---
NEOMODEL_NEO4J_BOLT_URL=bolt://neo4j:password@ariel.incus:25554 # Dedicated Mnemosyne instance on Umbriel — do not share with Spelunker or any
# other graph workload. See README.md for the full rationale.
NEOMODEL_NEO4J_BOLT_URL=bolt://neo4j:password@umbriel.incus:7687
# --- Memcached --- # --- Memcached ---
KVDB_LOCATION=127.0.0.1:11211 KVDB_LOCATION=127.0.0.1:11211

View File

@@ -8,6 +8,9 @@ from django.urls import include, path
from . import views from . import views
urlpatterns = [ urlpatterns = [
# Health checks — no auth, must return 200 (not redirect) — use trailing slash
path("live/", views.live, name="live"),
path("ready/", views.ready, name="ready"),
# Landing / Dashboard # Landing / Dashboard
path("", views.landing, name="landing"), path("", views.landing, name="landing"),
path("dashboard/", views.dashboard, name="dashboard"), path("dashboard/", views.dashboard, name="dashboard"),

View File

@@ -3,6 +3,9 @@ Mnemosyne project-level views — landing page and dashboard.
""" """
from django.contrib.auth.decorators import login_required from django.contrib.auth.decorators import login_required
from django.core.cache import cache
from django.db import connection
from django.http import JsonResponse
from django.shortcuts import render from django.shortcuts import render
from llm_manager.models import LLMApi, LLMModel from llm_manager.models import LLMApi, LLMModel
@@ -43,3 +46,22 @@ def dashboard(request):
context["library_count"] = None context["library_count"] = None
return render(request, "mnemosyne/dashboard.html", context) return render(request, "mnemosyne/dashboard.html", context)
def live(request):
return JsonResponse({"status": "ok"})
def ready(request):
errors = {}
try:
connection.ensure_connection()
except Exception as e:
errors["db"] = str(e)
try:
cache.get("__readiness_probe__")
except Exception as e:
errors["cache"] = str(e)
if errors:
return JsonResponse({"status": "error", "errors": errors}, status=503)
return JsonResponse({"status": "ok"})

View File

@@ -2,9 +2,22 @@
# and the FastMCP server. HAProxy on Titania terminates TLS and routes by # and the FastMCP server. HAProxy on Titania terminates TLS and routes by
# hostname; this nginx is plain HTTP on the internal network. # hostname; this nginx is plain HTTP on the internal network.
# Suppress probe paths from the access log (health checks, Prometheus scrapes).
# These fire every 1530 s and would drown out real traffic in Loki.
map $request_uri $loggable {
default 1;
~^/live(/|\?|$) 0;
~^/ready(/|\?|$) 0;
~^/metrics(/|\?|$) 0;
~^/healthz(/|\?|$) 0;
~^/health 0;
~^/mcp/health(/|\?|$) 0;
~^/ping(/|\?|$) 0;
}
# Map of upstreams to give us readable proxy_pass targets and easy retries. # Map of upstreams to give us readable proxy_pass targets and easy retries.
upstream mnemosyne_web { upstream mnemosyne_app {
server web:8000 max_fails=3 fail_timeout=30s; server app:8000 max_fails=3 fail_timeout=30s;
} }
upstream mnemosyne_mcp { upstream mnemosyne_mcp {
@@ -15,16 +28,50 @@ server {
listen 80 default_server; listen 80 default_server;
server_name _; server_name _;
access_log /var/log/nginx/access.log combined if=$loggable;
# Reasonable limits — file uploads to the ingest endpoint can be big, # Reasonable limits — file uploads to the ingest endpoint can be big,
# but the bulk path is S3-direct from Daedalus. 64 MB covers admin # but the bulk path is S3-direct from Daedalus. 64 MB covers admin
# uploads and direct REST POST /library/api/items/upload. # uploads and direct REST POST /library/api/items/upload.
client_max_body_size 64m; client_max_body_size 64m;
client_body_timeout 120s; client_body_timeout 120s;
# Liveness probe — always 200 if the Django process is up.
# Use the trailing-slash form: /live/ returns 200 directly.
# /live (no slash) triggers Django's APPEND_SLASH 301 redirect, which
# will cause health check clients that don't follow redirects to fail.
location = /live/ {
proxy_pass http://mnemosyne_app;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
access_log off;
}
# Readiness probe — 200 only when PostgreSQL + Memcached are reachable.
# Same trailing-slash rule applies.
location = /ready/ {
proxy_pass http://mnemosyne_app;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
access_log off;
}
# HAProxy liveness probe — proxies through to the MCP health endpoint.
location = /healthz {
proxy_pass http://mnemosyne_mcp/mcp/health;
access_log off;
}
# Mnemosyne's REST API — Django REST Framework views + admin. # Mnemosyne's REST API — Django REST Framework views + admin.
# Under /library/api/* per mnemosyne/urls.py and /admin/* per Django. # Under /library/api/* per mnemosyne/urls.py and /admin/* per Django.
location /library/ { location /library/ {
proxy_pass http://mnemosyne_web; proxy_pass http://mnemosyne_app;
proxy_http_version 1.1; proxy_http_version 1.1;
proxy_set_header Host $host; proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Real-IP $remote_addr;
@@ -34,7 +81,7 @@ server {
} }
location /admin/ { location /admin/ {
proxy_pass http://mnemosyne_web; proxy_pass http://mnemosyne_app;
proxy_http_version 1.1; proxy_http_version 1.1;
proxy_set_header Host $host; proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Real-IP $remote_addr;
@@ -59,7 +106,7 @@ server {
} }
# Static files baked into the image at /app/staticfiles, mounted into # Static files baked into the image at /app/staticfiles, mounted into
# this nginx via a named volume populated by the web service. # this nginx via a named volume populated by the app service.
location /static/ { location /static/ {
alias /var/www/static/; alias /var/www/static/;
access_log off; access_log off;
@@ -67,20 +114,14 @@ server {
} }
# Prometheus scrape endpoint — internal networks only. # Prometheus scrape endpoint — internal networks only.
# Allows: localhost + RFC1918 private ranges (10/8, 172.16/12, 192.168/16). # Allows: loopback + all RFC1918 private ranges.
location /metrics { location /metrics {
allow 127.0.0.0/8; allow 127.0.0.0/8;
allow 10.0.0.0/8; allow 10.0.0.0/8;
allow 172.16.0.0/12; allow 172.16.0.0/12;
allow 192.168.0.0/16; allow 192.168.0.0/16;
deny all; deny all;
proxy_pass http://mnemosyne_web; proxy_pass http://mnemosyne_app;
access_log off;
}
# Liveness probe — proxies through to the MCP health endpoint.
location = /healthz {
proxy_pass http://mnemosyne_mcp/mcp/health;
access_log off; access_log off;
} }
} }