feat(docker): rename web service to app, add nginx as web
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 53s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m0s

Reorganize Docker Compose services: the Django/gunicorn container is now
`app` and nginx is `web`, better reflecting their roles. Add a dedicated
gunicorn configuration and install curl in the runtime image for health
checks.

Update documentation to reflect:
- Neo4j migration from ariel.incus to a dedicated umbriel.incus instance
- Rationale for requiring a dedicated Neo4j instance (single-tenancy
  assumptions, label/index isolation, schema ownership)
- New service naming in compose commands and log tailing examples
This commit is contained in:
2026-05-03 19:35:27 -04:00
parent a2c885cf34
commit 7185d326eb
10 changed files with 163 additions and 38 deletions

13
.env.example Normal file
View File

@@ -0,0 +1,13 @@
# =============================================================================
# Mnemosyne — Docker Compose environment
# =============================================================================
# This file documents variables consumed by docker-compose.yaml itself
# (image tags, port overrides, etc.). It is NOT the application config.
#
# Application config lives in mnemosyne/.env — copy mnemosyne/.env\ example
# to mnemosyne/.env and fill in your values before running `docker compose up`.
#
# This file has no required variables for a default deployment: the compose
# file uses a fixed image tag and port. Add overrides here if you parameterise
# those in docker-compose.yaml (e.g. MNEMOSYNE_IMAGE, MNEMOSYNE_PORT).
# =============================================================================

View File

@@ -62,6 +62,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
zlib1g \
libssl3 \
ca-certificates \
curl \
&& rm -rf /var/lib/apt/lists/*
ENV PYTHONDONTWRITEBYTECODE=1 \
@@ -77,6 +78,7 @@ COPY --from=builder /usr/local/bin /usr/local/bin
WORKDIR /app
COPY --from=builder /build/mnemosyne /app
COPY docker/entrypoint.sh /usr/local/bin/entrypoint.sh
COPY docker/gunicorn.conf.py /app/docker/gunicorn.conf.py
RUN chmod +x /usr/local/bin/entrypoint.sh
# Non-root user for everything that runs in this image. uid:gid 1000:1000

View File

@@ -64,11 +64,13 @@ Mnemosyne runs as three cooperating processes: the Django web app (REST API + ad
Hosts in the Ouranos lab:
- **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
- **Neo4j** — `ariel.incus:25554` (knowledge graph + vectors)
- **Neo4j** — `umbriel.incus:7687` (Bolt; dedicated instance — see note below — knowledge graph + vectors; HTTP Browser on `umbriel.incus:25555`)
- **RabbitMQ** — `oberon.incus:5672` (Celery broker)
- **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
- **Memcached** — `127.0.0.1:11211` (task progress)
> **Neo4j must be dedicated to Mnemosyne.** Don't share the instance with Spelunker or any other graph workload. Mnemosyne owns the `Library`, `Collection`, `Item`, `Chunk`, and `Concept` labels and runs its own indexes (`chunk_embedding_index`, full-text indexes per library_type) and schema migrations (`setup_neo4j_indexes`, `load_library_types`). The Phase-1 workspace-delete path runs label-scoped `DETACH DELETE` over those labels, and a workspace_id-scoped subgraph is the unit of isolation — both assume single-tenancy. A shared instance risks (1) label/property collisions corrupting the other tenant's graph, (2) vector-index memory contention degrading search latency for both apps, (3) management commands mutating schema another tenant depends on, and (4) backup/restore that can't be reasoned about per-app. Neo4j Community Edition is sufficient — the multi-database feature is Enterprise-only, so isolation has to come from running a separate server process. Run a dedicated instance per environment (one for staging, one for production); point each via `NEOMODEL_NEO4J_BOLT_URL` in that environment's `mnemosyne/.env`.
### One-time setup
```bash
@@ -159,14 +161,14 @@ Production runs as four containers from a single image (built and pushed by [`.g
| Service | Role | Port |
|---------|------|------|
| `web` | Django REST API + admin (gunicorn) | internal :8000 |
| `app` | Django REST API + admin (gunicorn) | internal :8000 |
| `mcp` | FastMCP server (uvicorn) | internal :22091 |
| `worker` | Celery worker — embedding/ingest/batch | — |
| `nginx` | Reverse proxy + static files | host :23090 |
| `web` | Reverse proxy + static files (nginx) | host :23090 |
Plus a one-shot `static-init` service that copies `/app/staticfiles` (baked into the image at build time via `collectstatic`) into the shared volume nginx reads from. It runs to completion on every `up`, so static-file changes propagate on each deploy without manual intervention.
External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel, RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
External services (NOT spun up by compose): Postgres on Portia, Neo4j on Umbriel (dedicated Mnemosyne instance), RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached, embedder + reranker. All reached over the internal 10.10.0.0/24 network. URLs and credentials live in `mnemosyne/.env`.
### First-time bring-up
@@ -175,10 +177,10 @@ External services (NOT spun up by compose): Postgres on Portia, Neo4j on Ariel,
docker compose pull
# DB migrations (one-shot)
docker compose run --rm web migrate
docker compose run --rm app migrate
# Neo4j indexes + library_type defaults (one-shot)
docker compose run --rm web setup
docker compose run --rm app setup
# Bring the stack up
docker compose up -d
@@ -188,7 +190,8 @@ docker compose up -d
```bash
docker compose ps # service status + health
docker compose logs -f web # tail web logs
docker compose logs -f app # tail Django app logs
docker compose logs -f web # tail nginx logs
docker compose logs -f worker # tail Celery worker logs
docker compose restart mcp # restart just the MCP server
@@ -210,8 +213,14 @@ The development `.env` has a few values that need adjusting for production:
### Health probes
- `GET http://nginx-host:23090/healthz` → proxies to `/mcp/health`, returns `{"status":"ok"}` when the MCP server is up
- `GET http://nginx-host:23090/metrics` → Prometheus scrape endpoint, internal-network-only
| Endpoint | Probes | Auth |
|----------|--------|------|
| `GET /live/` | Django process alive (always 200 if gunicorn is up) | None |
| `GET /ready/` | PostgreSQL + Memcached reachable (503 if either is down) | None |
| `GET /healthz` | MCP server `/mcp/health` — used as the HAProxy `health_path` | None |
| `GET /metrics` | Prometheus scrape | Internal networks only |
> **Trailing slashes matter.** Always use `/live/` and `/ready/` (with the trailing slash). The un-slashed forms (`/live`, `/ready`) trigger Django's `APPEND_SLASH` 301 redirect — health check clients that don't follow redirects will report a failure even when the service is healthy.
## Architecture Note: Retrieval, Not Synthesis

View File

@@ -2,20 +2,20 @@
# Mnemosyne — production deployment
# =============================================================================
# Four services, all from the same image:
# web — Django REST API + admin (gunicorn, port 8000)
# app — Django REST API + admin (gunicorn, port 8000)
# mcp — FastMCP server (uvicorn, port 22091)
# worker — Celery worker (embedding/ingest/batch queues)
# nginx — reverse proxy, public port 23090
# web — reverse proxy, public port 23090 (nginx)
#
# External services (NOT spun up here): Postgres on Portia, Neo4j on Ariel,
# External services (NOT spun up here): Postgres on Portia, Neo4j on Umbriel,
# RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached on its own host, embedder
# and reranker on Nyx, smtp4dev on Oberon. All reached over the internal
# 10.10.0.0/24 network.
#
# Run:
# docker compose up -d
# docker compose run --rm web migrate # one-shot DB migrate
# docker compose run --rm web setup # Neo4j indexes + library types
# docker compose run --rm app migrate # one-shot DB migrate
# docker compose run --rm app setup # Neo4j indexes + library types
# =============================================================================
services:
@@ -31,8 +31,8 @@ services:
- mnemosyne-static:/shared-static
restart: "no"
# ── Web app: Django REST API + admin ───────────────────────────────────────
web:
# ── App: Django REST API + admin ──────────────────────────────────────────
app:
image: git.helu.ca/r/mnemosyne:latest
command: ["web"]
env_file: mnemosyne/.env
@@ -45,7 +45,7 @@ services:
expose:
- "8000"
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/admin/login/').read()"]
test: ["CMD", "curl", "-f", "http://localhost:8000/live/"]
interval: 30s
timeout: 5s
retries: 3
@@ -62,7 +62,7 @@ services:
expose:
- "22091"
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:22091/mcp/health').read()"]
test: ["CMD", "curl", "-f", "http://localhost:22091/mcp/health"]
interval: 30s
timeout: 5s
retries: 3
@@ -74,6 +74,9 @@ services:
command: ["worker"]
env_file: mnemosyne/.env
restart: unless-stopped
depends_on:
app:
condition: service_healthy
volumes:
- mnemosyne-media:/app/media
healthcheck:
@@ -83,26 +86,28 @@ services:
retries: 3
start_period: 60s
# ── nginx: reverse proxy, public port 23090 ────────────────────────────────
nginx:
# ── Web: nginx reverse proxy, public port 23090 ───────────────────────────
web:
image: nginx:alpine
restart: unless-stopped
depends_on:
- web
- mcp
app:
condition: service_healthy
mcp:
condition: service_healthy
ports:
- "23090:80"
volumes:
- ./nginx/mnemosyne.conf:/etc/nginx/conf.d/default.conf:ro
- mnemosyne-static:/var/www/static:ro
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost/healthz"]
test: ["CMD", "curl", "-f", "http://localhost/live/"]
interval: 30s
timeout: 5s
retries: 3
volumes:
# Static files baked into the image at /app/staticfiles. The web service
# Static files baked into the image at /app/staticfiles. The app service
# mounts this volume, populating it on first start; nginx reads from it.
mnemosyne-static:
# Local FileSystemStorage fallback. Production uses USE_LOCAL_STORAGE=False

View File

@@ -10,6 +10,7 @@ case "$1" in
web)
# Django REST API + admin (gunicorn → wsgi).
exec gunicorn \
--config /app/docker/gunicorn.conf.py \
--bind 0.0.0.0:8000 \
--workers "${GUNICORN_WORKERS:-3}" \
--access-logfile - \

27
docker/gunicorn.conf.py Normal file
View File

@@ -0,0 +1,27 @@
import logging
import re
_PROBE_PATH = re.compile(
r"^(?:/live|/ready|/metrics|/healthz|/health[^ ]*|/ping)/?(?:\?|$)"
)
class _ProbePathFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
request = getattr(record, "args", None)
if isinstance(request, dict):
path = request.get("U") or request.get("r", "")
else:
path = record.getMessage()
return not _PROBE_PATH.search(path)
_filter = _ProbePathFilter()
def on_starting(server):
logging.getLogger("gunicorn.access").addFilter(_filter)
def post_worker_init(worker):
logging.getLogger("gunicorn.access").addFilter(_filter)

View File

@@ -18,7 +18,9 @@ DB_HOST=portia.incus
DB_PORT=5432
# --- Neo4j Graph Database ---
NEOMODEL_NEO4J_BOLT_URL=bolt://neo4j:password@ariel.incus:25554
# Dedicated Mnemosyne instance on Umbriel — do not share with Spelunker or any
# other graph workload. See README.md for the full rationale.
NEOMODEL_NEO4J_BOLT_URL=bolt://neo4j:password@umbriel.incus:7687
# --- Memcached ---
KVDB_LOCATION=127.0.0.1:11211

View File

@@ -8,6 +8,9 @@ from django.urls import include, path
from . import views
urlpatterns = [
# Health checks — no auth, must return 200 (not redirect) — use trailing slash
path("live/", views.live, name="live"),
path("ready/", views.ready, name="ready"),
# Landing / Dashboard
path("", views.landing, name="landing"),
path("dashboard/", views.dashboard, name="dashboard"),

View File

@@ -3,6 +3,9 @@ Mnemosyne project-level views — landing page and dashboard.
"""
from django.contrib.auth.decorators import login_required
from django.core.cache import cache
from django.db import connection
from django.http import JsonResponse
from django.shortcuts import render
from llm_manager.models import LLMApi, LLMModel
@@ -43,3 +46,22 @@ def dashboard(request):
context["library_count"] = None
return render(request, "mnemosyne/dashboard.html", context)
def live(request):
return JsonResponse({"status": "ok"})
def ready(request):
errors = {}
try:
connection.ensure_connection()
except Exception as e:
errors["db"] = str(e)
try:
cache.get("__readiness_probe__")
except Exception as e:
errors["cache"] = str(e)
if errors:
return JsonResponse({"status": "error", "errors": errors}, status=503)
return JsonResponse({"status": "ok"})

View File

@@ -2,9 +2,22 @@
# and the FastMCP server. HAProxy on Titania terminates TLS and routes by
# hostname; this nginx is plain HTTP on the internal network.
# Suppress probe paths from the access log (health checks, Prometheus scrapes).
# These fire every 1530 s and would drown out real traffic in Loki.
map $request_uri $loggable {
default 1;
~^/live(/|\?|$) 0;
~^/ready(/|\?|$) 0;
~^/metrics(/|\?|$) 0;
~^/healthz(/|\?|$) 0;
~^/health 0;
~^/mcp/health(/|\?|$) 0;
~^/ping(/|\?|$) 0;
}
# Map of upstreams to give us readable proxy_pass targets and easy retries.
upstream mnemosyne_web {
server web:8000 max_fails=3 fail_timeout=30s;
upstream mnemosyne_app {
server app:8000 max_fails=3 fail_timeout=30s;
}
upstream mnemosyne_mcp {
@@ -15,16 +28,50 @@ server {
listen 80 default_server;
server_name _;
access_log /var/log/nginx/access.log combined if=$loggable;
# Reasonable limits — file uploads to the ingest endpoint can be big,
# but the bulk path is S3-direct from Daedalus. 64 MB covers admin
# uploads and direct REST POST /library/api/items/upload.
client_max_body_size 64m;
client_body_timeout 120s;
# Liveness probe — always 200 if the Django process is up.
# Use the trailing-slash form: /live/ returns 200 directly.
# /live (no slash) triggers Django's APPEND_SLASH 301 redirect, which
# will cause health check clients that don't follow redirects to fail.
location = /live/ {
proxy_pass http://mnemosyne_app;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
access_log off;
}
# Readiness probe — 200 only when PostgreSQL + Memcached are reachable.
# Same trailing-slash rule applies.
location = /ready/ {
proxy_pass http://mnemosyne_app;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
access_log off;
}
# HAProxy liveness probe — proxies through to the MCP health endpoint.
location = /healthz {
proxy_pass http://mnemosyne_mcp/mcp/health;
access_log off;
}
# Mnemosyne's REST API — Django REST Framework views + admin.
# Under /library/api/* per mnemosyne/urls.py and /admin/* per Django.
location /library/ {
proxy_pass http://mnemosyne_web;
proxy_pass http://mnemosyne_app;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -34,7 +81,7 @@ server {
}
location /admin/ {
proxy_pass http://mnemosyne_web;
proxy_pass http://mnemosyne_app;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
@@ -59,7 +106,7 @@ server {
}
# Static files baked into the image at /app/staticfiles, mounted into
# this nginx via a named volume populated by the web service.
# this nginx via a named volume populated by the app service.
location /static/ {
alias /var/www/static/;
access_log off;
@@ -67,20 +114,14 @@ server {
}
# Prometheus scrape endpoint — internal networks only.
# Allows: localhost + RFC1918 private ranges (10/8, 172.16/12, 192.168/16).
# Allows: loopback + all RFC1918 private ranges.
location /metrics {
allow 127.0.0.0/8;
allow 10.0.0.0/8;
allow 172.16.0.0/12;
allow 192.168.0.0/16;
deny all;
proxy_pass http://mnemosyne_web;
access_log off;
}
# Liveness probe — proxies through to the MCP health endpoint.
location = /healthz {
proxy_pass http://mnemosyne_mcp/mcp/health;
proxy_pass http://mnemosyne_app;
access_log off;
}
}