From afcbee88192c9dcd0c74a0ecd2afa080212dc105 Mon Sep 17 00:00:00 2001
From: Robert Helewka
Date: Sun, 10 May 2026 16:15:28 -0400
Subject: [PATCH] docs(bootstrap): clarify three-step Docker first-boot flow

Rework README and docker-compose comments to document the deliberate
chicken-and-egg escape: the `init` sidecar now only runs `migrate` and
`load_library_types`, leaving `setup_neo4j_indexes` as a manual step
after the system embedding model is configured in `/admin/`. This avoids
making `app` unreachable on first boot when no embedding model row
exists yet, while preserving loud failure on dimension mismatch.
---
 README.md                      |  57 +++++++++++------
 docker-compose.yaml            |  45 ++++++++-----
 docker/entrypoint.sh           |  26 ++++++--
 docs/mnemosyne.html            |   2 +-
 mnemosyne/docker/entrypoint.sh | 111 ---------------------------------
 mnemosyne/library/apps.py      |  16 +++--
 6 files changed, 102 insertions(+), 155 deletions(-)
 delete mode 100644 mnemosyne/docker/entrypoint.sh

diff --git a/README.md b/README.md
index be05745..ca64842 100644
--- a/README.md
+++ b/README.md
@@ -86,12 +86,15 @@ python manage.py setup_neo4j_indexes # Create Neo4j vector + full-text
 > has ``is_system_embedding_model=True`` and a non-null ``vector_dimensions``.
 > There is deliberately no hardcoded fallback: an index built at the wrong
 > dimension silently breaks every search. The command will exit non-zero
-> with a clear error if no such row exists, which is also why the
-> ``docker compose`` ``init`` sidecar treats vector-index creation as
-> best-effort on first boot — the stack starts healthy, migrations and
-> library-type seed data land, and you run
-> ``docker compose exec app python manage.py setup_neo4j_indexes`` once
-> the embedding-model row is in place.
+> with a clear error if no such row exists, which is why the
+> ``docker compose`` ``init`` sidecar does **not** run
+> ``setup_neo4j_indexes`` — the stack brings up `migrate` +
+> `load_library_types` only, you land in `/admin/` to configure the system
+> embedding model, and then you run
+> ``docker compose exec app python manage.py setup_neo4j_indexes`` manually
+> once. Until that last step runs, vector search returns empty results and
+> `library/apps.py` logs a readiness warning. See
+> [Docker bootstrap order](#docker-bootstrap-order) below for the full flow.
 
 ### Start the web app
 
@@ -203,27 +206,45 @@ The per-service surface is defined by the `environment:` blocks in `docker-compo
 
 > **Broker URL gotcha.** If the RabbitMQ password contains any of `@ : / # % + ? & =` or a space, it must be percent-encoded in `CELERY_BROKER_URL`. Kombu's URL parser is strict, and this is the most common cause of a `PLAIN 403 ACCESS_REFUSED` at worker startup when the same credentials work fine under bare-Python `celery` invocations (because you were probably passing them as kwargs, not a URL).
 
-### First-time bring-up
+### Docker bootstrap order
+
+Three steps — the first and third are one-liners, the middle step is a
+manual sit-down in `/admin/` to configure the system embedding model.
+`setup_neo4j_indexes` is **not** run automatically: it reads vector
+dimensions from that admin row and hard-fails if the row is missing, so
+bundling it into the `init` sidecar would make `app` unreachable on
+first boot. Running it manually after admin configuration is the
+chicken-and-egg escape.
 
 ```bash
-# Generate the root .env from the template (or let Ansible do it)
+# 1. Generate the root .env from the template (or let Ansible do it),
+#    pull the image, and bring the stack up. The `init` sidecar runs
+#    `migrate` + `load_library_types` and exits; `app`, `mcp`, and
+#    `worker` come up healthy.
 cp .env.example .env && $EDITOR .env
-
-# Pull the image (or build locally with `docker compose build`)
 docker compose pull
-
-# Bring the stack up — the `init` sidecar runs migrations + library_type
-# defaults automatically. Vector indexes are deferred until you seed the
-# system embedding model (see below) — the sidecar logs a clear notice
-# and exits 0 either way, so the stack comes up healthy on first boot.
 docker compose up -d
 
-# Seed the system embedding model at /admin/llm_manager/llmmodel/
-# (mark one row `is_system_embedding_model=True` with `vector_dimensions`
-# set to whatever your embedding provider returns), then:
+# 2. Browse to /admin/llm_manager/llmapi/ and add the embedding provider
+#    (e.g. Pan Synesis, with the right base URL and API key). Then
+#    /admin/llm_manager/llmmodel/ and add one row for the embedding model:
+#      - api = the api you just created
+#      - name = the provider's model name
+#      - vector_dimensions = whatever your embedding provider returns
+#      - is_system_embedding_model = True
+#    Save, then come back to the shell.
+
+# 3. Create Neo4j vector + full-text indexes at the right dimensions.
+#    Idempotent — re-run after an embedding-model swap with `--drop` to
+#    rebuild, which requires re-embedding all content.
 docker compose exec app python manage.py setup_neo4j_indexes
 ```
 
+Until step 3 runs, vector search returns empty results and
+`library/apps.py` logs a readiness warning each time the app boots. This
+is deliberate: an index built at the wrong dimension silently breaks
+every search, so loud failure beats quiet misconfiguration.
+
 ### Day-to-day
 
 ```bash
diff --git a/docker-compose.yaml b/docker-compose.yaml
index 39cc0f0..bdc52e1 100644
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -25,13 +25,26 @@
 # Run:
 #   docker compose up -d
 #
-# The `init` sidecar (below) runs Postgres migrations, Neo4j index setup,
-# and library-type seeding on every `up`. Long-running services wait for
-# it via `depends_on: init: service_completed_successfully` — so a failure
-# there (missing embedding model, dimension mismatch, unreachable DB)
-# blocks the stack rather than letting it serve silent zero-result
-# searches. The standalone `migrate` / `setup` entrypoint commands remain
-# available for ad-hoc ops work.
+# The `init` sidecar (below) runs Postgres migrations and library-type
+# seeding on every `up`. Long-running services wait for it via
+# `depends_on: init: service_completed_successfully` — so a failure there
+# (unreachable DB, broken migration) blocks the stack.
+#
+# Neo4j vector-index creation is deliberately NOT bundled into `init`.
+# `setup_neo4j_indexes` requires a system embedding model configured in
+# the admin, which only exists after first boot — an operator has to land
+# in /admin/, pick an embedding API + model, and set its vector_dimensions
+# value. Bootstrap order is therefore:
+#
+#   1. docker compose up   # init sidecar: migrate + load_library_types
+#   2. browse to /admin/ → llm_manager → configure system embedding model
+#   3. docker compose exec app python manage.py setup_neo4j_indexes
+#
+# Until step 3, vector search returns empty results. library/apps.py logs
+# a readiness warning when indexes are missing, so this is visible.
+# The standalone `migrate` / `setup` entrypoint commands remain available
+# for ad-hoc ops work (`setup` runs setup_neo4j_indexes + load_library_types
+# and is the typical re-run target after embedding-model changes).
 # =============================================================================
 
 
@@ -48,13 +61,15 @@ services:
       - mnemosyne-static:/shared-static
     restart: "no"
 
-  # ── Init sidecar: one-shot Postgres migrate + Neo4j index setup + library
-  # type seed. Runs on every `up` and exits. Long-running services below
-  # depend on `service_completed_successfully`, so a failure here (no system
-  # embedding model configured, dimension mismatch, unreachable DB) blocks
-  # `app`/`mcp`/`worker` from starting — which is the whole point. All three
-  # commands are idempotent: re-running is a no-op unless state actually
-  # needs to change.
+  # ── Init sidecar: one-shot Postgres migrate + library-type seed. Runs on
+  # every `up` and exits. Long-running services below depend on
+  # `service_completed_successfully`, so a failure here (unreachable DB,
+  # broken migration) blocks `app`/`mcp`/`worker` from starting. Both
+  # commands are idempotent.
+  #
+  # Neo4j vector-index setup is NOT run here — see the header comment for
+  # the operator bootstrap flow. Only library_type seeding touches Neo4j
+  # from this sidecar, and it does not depend on any embedding model.
   #
   # This sidecar only needs Postgres, Neo4j, and logging env — no S3, no
   # Celery, no LLM encryption key. Keep it that way.
@@ -75,7 +90,7 @@ services:
       - APP_DB_PASSWORD=${APP_DB_PASSWORD}
       - DB_HOST=${DB_HOST}
       - DB_PORT=${DB_PORT}
-      # Neo4j (setup_neo4j_indexes + load_library_types)
+      # Neo4j (load_library_types writes Library defaults into the graph)
       - NEOMODEL_NEO4J_BOLT_URL=${NEOMODEL_NEO4J_BOLT_URL}
       # Logging
       - LOGGING_LEVEL=${LOGGING_LEVEL}
diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index 2af9b0d..e3ea590 100644
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -50,7 +50,9 @@ case "$1" in
     ;;
 
   setup)
-    # One-shot init — Neo4j indexes + library_type seed data.
+    # One-shot init — Neo4j indexes + library_type seed data. Run this
+    # manually after the system embedding model has been configured in the
+    # admin (setup_neo4j_indexes reads vector dimensions from that row).
     python manage.py setup_neo4j_indexes
     python manage.py load_library_types
     ;;
@@ -58,12 +60,26 @@ case "$1" in
   init)
     # Bundled one-shot init run by the `init` sidecar on every
-    # `docker compose up`. Idempotent: re-runs are no-ops unless migrations
-    # or indexes need to change. A non-zero exit here blocks `app`, `mcp`,
-    # and `worker` from starting, which is the point — we'd rather fail
-    # loudly than serve silent zero-result searches.
+    # `docker compose up`. Idempotent: re-runs are no-ops unless migrations
+    # or library_type defaults need to change. A non-zero exit here blocks
+    # `app`, `mcp`, and `worker` from starting.
+    #
+    # Neo4j vector-index creation is *deliberately not* bundled here. That
+    # command (``setup_neo4j_indexes``) requires a system embedding model
+    # with a configured ``vector_dimensions`` value, and that model is
+    # data an operator configures through the Django admin after first
+    # boot. On a fresh stack there is no such row yet, so blocking the
+    # whole stack on it would make the admin unreachable — a chicken-and-
+    # egg. Operator bootstrap flow:
+    #
+    #   1. docker compose up   # init sidecar: migrate + load_library_types
+    #   2. browse to admin, configure system embedding model
+    #   3. docker compose exec app python manage.py setup_neo4j_indexes
+    #
+    # Until step 3 runs, vector search will return empty results — the
+    # readiness check in library/apps.py logs a warning when indexes are
+    # missing so this is visible, not silent.
     set -e
     python manage.py migrate --noinput
-    python manage.py setup_neo4j_indexes
     python manage.py load_library_types
     ;;
 
diff --git a/docs/mnemosyne.html b/docs/mnemosyne.html
index 34c7aee..407d337 100644
--- a/docs/mnemosyne.html
+++ b/docs/mnemosyne.html
@@ -295,7 +295,7 @@ graph LR

Neo4j Indexes (managed by setup_neo4j_indexes)

-Created by the init sidecar on every docker compose up. Vector dimensions come from the system embedding model's vector_dimensions field — the command fails if no model is configured. Current production model: Pan Synesis · qwen3-vl-embedding-2b · 2048d.
+Run manually after the first docker compose up, once the system embedding model has been configured in /admin/llm_manager/llmmodel/: docker compose exec app python manage.py setup_neo4j_indexes. Vector dimensions come from the model's vector_dimensions field — the command hard-fails if no such row exists, which is why it is not bundled into the init sidecar (doing so would make the admin unreachable on first boot). Current production model: Pan Synesis · qwen3-vl-embedding-2b · 2048d.
// Chunk text+image embeddings (dimensions read from system embedding model)
 CREATE VECTOR INDEX chunk_embedding_index FOR (c:Chunk)
 ON (c.embedding) OPTIONS {indexConfig: {
diff --git a/mnemosyne/docker/entrypoint.sh b/mnemosyne/docker/entrypoint.sh
deleted file mode 100644
index c8fc456..0000000
--- a/mnemosyne/docker/entrypoint.sh
+++ /dev/null
@@ -1,111 +0,0 @@
-#!/bin/sh
-# Mnemosyne container entrypoint.
-#
-# The same image runs all three processes — the compose service supplies
-# `web`, `mcp`, `worker`, or `migrate` as CMD.
-
-set -e
-
-case "$1" in
-  web)
-    # Django REST API + admin (gunicorn → wsgi).
-    exec gunicorn \
-      --config /app/docker/gunicorn.conf.py \
-      --bind 0.0.0.0:8000 \
-      --workers "${GUNICORN_WORKERS:-3}" \
-      --access-logfile - \
-      --error-logfile - \
-      mnemosyne.wsgi:application
-    ;;
-
-  mcp)
-    # FastMCP over Streamable HTTP at /mcp/, mounted by mnemosyne.asgi.
-    exec uvicorn \
-      --host 0.0.0.0 \
-      --port 8001 \
-      --workers "${UVICORN_WORKERS:-1}" \
-      mnemosyne.asgi:app
-    ;;
-
-  worker)
-    # Celery worker covering embedding + ingest + batch + default queues.
-    # In production you may want to split these onto separate worker
-    # services for queue-level isolation; one process is fine to start.
-    exec celery -A mnemosyne worker \
-      --loglevel="${CELERY_LOG_LEVEL:-info}" \
-      --queues="${CELERY_QUEUES:-celery,embedding,batch}" \
-      --concurrency="${CELERY_CONCURRENCY:-2}"
-    ;;
-
-  beat)
-    # Celery scheduled tasks (only needed if/when periodic jobs are wired).
-    exec celery -A mnemosyne beat \
-      --loglevel="${CELERY_LOG_LEVEL:-info}"
-    ;;
-
-  migrate)
-    # One-shot DB migration runner — invoke before bringing services up
-    # for the first time or after a deploy.
-    exec python manage.py migrate --noinput
-    ;;
-
-  setup)
-    # One-shot init — Neo4j indexes + library_type seed data.
-    python manage.py setup_neo4j_indexes
-    python manage.py load_library_types
-    ;;
-
-  init)
-    # Bundled one-shot init run by the `init` sidecar on every
-    # `docker compose up`. Idempotent: re-runs are no-ops unless
-    # migrations or library-type seed data need to change.
-    #
-    # Vector-index creation intentionally runs in *best-effort* mode:
-    # ``setup_neo4j_indexes`` requires a system embedding model with a
-    # configured ``vector_dimensions`` value, and that model is data an
-    # operator seeds via the admin UI after the stack comes up for the
-    # first time. Blocking the whole stack on first boot would force
-    # every new deployer through a manual dance with the init sidecar's
-    # entrypoint; instead we log loudly and carry on, and the operator
-    # runs the command once post-boot:
-    #
-    #     docker compose exec app python manage.py setup_neo4j_indexes
-    #
-    # Full-text and neomodel constraint indexes are created by the same
-    # command and are *not* dimension-sensitive, but they also only land
-    # after the operator re-runs it — acceptable because search against
-    # an empty graph is itself a no-op.
-    set -e
-    python manage.py migrate --noinput
-    python manage.py load_library_types
-    if ! python manage.py setup_neo4j_indexes; then
-      echo ""
-      echo "============================================================"
-      echo "NOTICE: Neo4j index creation was skipped."
-      echo ""
-      echo "This is expected on a fresh deployment — vector indexes"
-      echo "require a system embedding model with vector_dimensions set."
-      echo ""
-      echo "Seed the embedding model in the Django admin"
-      echo "  (/admin/llm_manager/llmmodel/, mark one row as"
-      echo "   is_system_embedding_model=True with vector_dimensions set),"
-      echo "then run:"
-      echo ""
-      echo "  docker compose exec app python manage.py setup_neo4j_indexes"
-      echo ""
-      echo "Search endpoints will return empty results until this is done."
-      echo "============================================================"
-      echo ""
-    fi
-    ;;
-
-  shell)
-    # Drop into the management shell for ad-hoc work.
-    exec python manage.py shell
-    ;;
-
-  *)
-    # Fall through: run whatever was passed (e.g. `manage.py `).
-    exec "$@"
-    ;;
-esac
diff --git a/mnemosyne/library/apps.py b/mnemosyne/library/apps.py
index cc69117..162d10f 100644
--- a/mnemosyne/library/apps.py
+++ b/mnemosyne/library/apps.py
@@ -11,8 +11,13 @@ the stderr of a different container.
 
 The probe is deliberately best-effort: it cannot crash the process even if
 Neo4j is unreachable, because a transient DB blip on startup should not
-take down the whole app. The `init` sidecar is the hard gate; this is the
-second line of defence for long-running containers.
+take down the whole app. Nothing hard-gates on the vector indexes — the
+``init`` sidecar only runs ``migrate`` + ``load_library_types`` (vector
+indexes cannot be created before the system embedding model is configured
+in the admin, which is a manual step after first boot). This probe is the
+only way an operator learns that the manual
+``setup_neo4j_indexes`` step was skipped or fell out of sync with the
+current system model.
 """
 
 import logging
@@ -149,9 +154,10 @@ def _run_startup_probe():
     for name in _EXPECTED_VECTOR_INDEXES:
         if name not in present:
             logger.error(
-                "Neo4j vector index '%s' is missing. Run "
-                "'docker compose run --rm init' (or 'python manage.py "
-                "setup_neo4j_indexes') to rebuild.",
+                "Neo4j vector index '%s' is missing. Configure the system "
+                "embedding model in /admin/llm_manager/llmmodel/, then run "
+                "'docker compose exec app python manage.py "
+                "setup_neo4j_indexes' to create it.",
                 name,
             )
             continue
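
Reviewer note: the missing-index branch of the startup probe patched above can be exercised without a live Neo4j. The following is a minimal sketch, not the code in `library/apps.py`. The helper names `missing_vector_indexes` and `report_readiness`, the logger name, and the one-element `EXPECTED_VECTOR_INDEXES` tuple are assumptions made for illustration; only `chunk_embedding_index` and the wording of the log message come from this patch.

```python
import logging

# Hypothetical logger name; the real module's logger is not shown in this patch.
logger = logging.getLogger("bootstrap_sketch")

# Assumption: only chunk_embedding_index is named in the patch; the real
# _EXPECTED_VECTOR_INDEXES may list more entries.
EXPECTED_VECTOR_INDEXES = ("chunk_embedding_index",)

REMEDY = (
    "Configure the system embedding model in /admin/llm_manager/llmmodel/, "
    "then run 'docker compose exec app python manage.py "
    "setup_neo4j_indexes' to create it."
)


def missing_vector_indexes(present, expected=EXPECTED_VECTOR_INDEXES):
    """Return the expected index names that are absent, in declaration order."""
    present = set(present)
    return [name for name in expected if name not in present]


def report_readiness(present, expected=EXPECTED_VECTOR_INDEXES):
    """Log one error per missing index and never raise.

    Returns True when every expected index is present.
    """
    missing = missing_vector_indexes(present, expected)
    for name in missing:
        logger.error("Neo4j vector index '%s' is missing. %s", name, REMEDY)
    return not missing
```

Calling `report_readiness([])` models a fresh stack before step 3 of the bootstrap flow: it logs the remediation hint and returns `False` instead of raising, which matches the best-effort behaviour the module docstring describes.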