docs(bootstrap): clarify three-step Docker first-boot flow

Rework README and docker-compose comments to document the deliberate chicken-and-egg escape: the `init` sidecar now only runs `migrate` and `load_library_types`, leaving `setup_neo4j_indexes` as a manual step after the system embedding model is configured in `/admin/`. This avoids making `app` unreachable on first boot when no embedding model row exists yet, while preserving loud failure on dimension mismatch.
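
For reviewers, a minimal sketch of the "no hardcoded fallback" rule that forces the manual step. It is an illustration under stated assumptions, not the command's actual code: `EmbeddingModelRow`, `MissingSystemEmbeddingModel`, and `resolve_vector_dimensions` are hypothetical names, and the real lookup inside `setup_neo4j_indexes` is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EmbeddingModelRow:
    """Hypothetical stand-in for a model row configured in the admin."""

    name: str
    is_system_embedding_model: bool = False
    vector_dimensions: Optional[int] = None


class MissingSystemEmbeddingModel(RuntimeError):
    """Raised when no usable system embedding model row exists."""


def resolve_vector_dimensions(rows: Iterable[EmbeddingModelRow]) -> int:
    """Return the system embedding model's dimensions, or raise.

    There is no default value on purpose: guessing a dimension would
    build an index that quietly matches nothing.
    """
    for row in rows:
        if row.is_system_embedding_model and row.vector_dimensions is not None:
            return row.vector_dimensions
    raise MissingSystemEmbeddingModel(
        "No row has is_system_embedding_model=True with vector_dimensions "
        "set; configure one in /admin/ and re-run setup_neo4j_indexes."
    )
```

On a fresh stack the row list is empty, so a lookup like this raises. That is the failure that would have blocked `app` had the command stayed in the `init` sidecar.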
2026-05-10 16:15:28 -04:00
parent 19e2aee91c
commit afcbee8819
6 changed files with 102 additions and 155 deletions
--- a/README.md
+++ b/README.md
@@ -86,12 +86,15 @@ python manage.py setup_neo4j_indexes           # Create Neo4j vector + full-text
 > has ``is_system_embedding_model=True`` and a non-null ``vector_dimensions``.
 > There is deliberately no hardcoded fallback: an index built at the wrong
 > dimension silently breaks every search. The command will exit non-zero
-> with a clear error if no such row exists, which is also why the
-> ``docker compose`` ``init`` sidecar treats vector-index creation as
-> best-effort on first boot — the stack starts healthy, migrations and
-> library-type seed data land, and you run
-> ``docker compose exec app python manage.py setup_neo4j_indexes`` once
-> the embedding-model row is in place.
+> with a clear error if no such row exists, which is why the
+> ``docker compose`` ``init`` sidecar does **not** run
+> ``setup_neo4j_indexes``. The sidecar runs only ``migrate`` and
+> ``load_library_types``; you then configure the system embedding model
+> in ``/admin/`` and run
+> ``docker compose exec app python manage.py setup_neo4j_indexes`` manually
+> once. Until that last step runs, vector search returns empty results and
+> ``library/apps.py`` logs a readiness warning. See
+> [Docker bootstrap order](#docker-bootstrap-order) below for the full flow.

 ### Start the web app

@@ -203,27 +206,45 @@ The per-service surface is defined by the `environment:` blocks in `docker-compo

 > **Broker URL gotcha.** If the RabbitMQ password contains any of `@ : / # % + ? & =` or a space, it must be percent-encoded in `CELERY_BROKER_URL`. Kombu's URL parser is strict, and this is the most common cause of a `PLAIN 403 ACCESS_REFUSED` at worker startup when the same credentials work fine under bare-Python `celery` invocations (because you were probably passing them as kwargs, not a URL).

-### First-time bring-up
+### Docker bootstrap order
+
+Three steps: the first and third are plain shell commands, and the
+middle one is a manual sit-down in `/admin/` to configure the system
+embedding model. `setup_neo4j_indexes` is **not** run automatically: it
+reads vector dimensions from that admin row and hard-fails if the row is
+missing, and `app` only starts once `init` exits successfully, so
+bundling it into the sidecar would make `app` unreachable on first boot.
+Running it manually after admin configuration is the chicken-and-egg escape.

 ```bash
-# Generate the root .env from the template (or let Ansible do it)
+# 1. Generate the root .env from the template (or let Ansible do it),
+#    pull the image, and bring the stack up. The `init` sidecar runs
+#    `migrate` + `load_library_types` and exits; `app`, `mcp`, and
+#    `worker` come up healthy.
 cp .env.example .env && $EDITOR .env
-
-# Pull the image (or build locally with `docker compose build`)
 docker compose pull
-
-# Bring the stack up — the `init` sidecar runs migrations + library_type
-# defaults automatically. Vector indexes are deferred until you seed the
-# system embedding model (see below) — the sidecar logs a clear notice
-# and exits 0 either way, so the stack comes up healthy on first boot.
 docker compose up -d

-# Seed the system embedding model at /admin/llm_manager/llmmodel/
-# (mark one row `is_system_embedding_model=True` with `vector_dimensions`
-# set to whatever your embedding provider returns), then:
+# 2. Browse to /admin/llm_manager/llmapi/ and add the embedding provider
+#    (e.g. Pan Synesis, with the right base URL and API key). Then
+#    /admin/llm_manager/llmmodel/ and add one row for the embedding model:
+#       - api                       = the API you just created
+#       - name                      = the provider's model name
+#       - vector_dimensions         = whatever your embedding provider returns
+#       - is_system_embedding_model = True
+#    Save, then come back to the shell.
+
+# 3. Create Neo4j vector + full-text indexes at the right dimensions.
+#    Idempotent. After an embedding-model swap, re-run with `--drop` to
+#    rebuild the indexes, which requires re-embedding all content.
 docker compose exec app python manage.py setup_neo4j_indexes
 ```

+Until step 3 runs, vector search returns empty results and
+`library/apps.py` logs a readiness warning each time the app boots. This
+is deliberate: an index built at the wrong dimension silently breaks
+every search, so loud failure beats quiet misconfiguration.
+
 ### Day-to-day

 ```bash
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -25,13 +25,26 @@
 # Run:
 #   docker compose up -d
 #
-# The `init` sidecar (below) runs Postgres migrations, Neo4j index setup,
-# and library-type seeding on every `up`. Long-running services wait for
-# it via `depends_on: init: service_completed_successfully` — so a failure
-# there (missing embedding model, dimension mismatch, unreachable DB)
-# blocks the stack rather than letting it serve silent zero-result
-# searches. The standalone `migrate` / `setup` entrypoint commands remain
-# available for ad-hoc ops work.
+# The `init` sidecar (below) runs Postgres migrations and library-type
+# seeding on every `up`. Long-running services wait for it via
+# `depends_on: init: service_completed_successfully` — so a failure there
+# (unreachable DB, broken migration) blocks the stack.
+#
+# Neo4j vector-index creation is deliberately NOT bundled into `init`.
+# `setup_neo4j_indexes` requires a system embedding model configured in
+# the admin, which only exists after first boot — an operator has to land
+# in /admin/, pick an embedding API + model, and set its vector_dimensions
+# value. Bootstrap order is therefore:
+#
+#   1. docker compose up                                          # init sidecar: migrate + load_library_types
+#   2. browse to /admin/ → llm_manager → configure system embedding model
+#   3. docker compose exec app python manage.py setup_neo4j_indexes
+#
+# Until step 3, vector search returns empty results. library/apps.py logs
+# a readiness warning when indexes are missing, so this is visible.
+# The standalone `migrate` / `setup` entrypoint commands remain available
+# for ad-hoc ops work (`setup` runs setup_neo4j_indexes + load_library_types
+# and is the typical re-run target after embedding-model changes).
 # =============================================================================


@@ -48,13 +61,15 @@ services:
      - mnemosyne-static:/shared-static
    restart: "no"

-  # ── Init sidecar: one-shot Postgres migrate + Neo4j index setup + library
-  # type seed. Runs on every `up` and exits. Long-running services below
-  # depend on `service_completed_successfully`, so a failure here (no system
-  # embedding model configured, dimension mismatch, unreachable DB) blocks
-  # `app`/`mcp`/`worker` from starting — which is the whole point. All three
-  # commands are idempotent: re-running is a no-op unless state actually
-  # needs to change.
+  # ── Init sidecar: one-shot Postgres migrate + library-type seed. Runs on
+  # every `up` and exits. Long-running services below depend on
+  # `service_completed_successfully`, so a failure here (unreachable DB,
+  # broken migration) blocks `app`/`mcp`/`worker` from starting. Both
+  # commands are idempotent.
+  #
+  # Neo4j vector-index setup is NOT run here — see the header comment for
+  # the operator bootstrap flow. Only library_type seeding touches Neo4j
+  # from this sidecar, and it does not depend on any embedding model.
  #
  # This sidecar only needs Postgres, Neo4j, and logging env — no S3, no
  # Celery, no LLM encryption key. Keep it that way.
@@ -75,7 +90,7 @@ services:
      - APP_DB_PASSWORD=${APP_DB_PASSWORD}
      - DB_HOST=${DB_HOST}
      - DB_PORT=${DB_PORT}
-      # Neo4j (setup_neo4j_indexes + load_library_types)
+      # Neo4j (load_library_types writes Library defaults into the graph)
      - NEOMODEL_NEO4J_BOLT_URL=${NEOMODEL_NEO4J_BOLT_URL}
      # Logging
      - LOGGING_LEVEL=${LOGGING_LEVEL}
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -50,7 +50,9 @@ case "$1" in
    ;;

  setup)
-    # One-shot init — Neo4j indexes + library_type seed data.
+    # One-shot init — Neo4j indexes + library_type seed data. Run this
+    # manually after the system embedding model has been configured in the
+    # admin (setup_neo4j_indexes reads vector dimensions from that row).
    python manage.py setup_neo4j_indexes
    python manage.py load_library_types
    ;;
@@ -58,12 +60,26 @@ case "$1" in
  init)
    # Bundled one-shot init run by the `init` sidecar on every
    # `docker compose up`. Idempotent: re-runs are no-ops unless migrations
-    # or indexes need to change. A non-zero exit here blocks `app`, `mcp`,
-    # and `worker` from starting, which is the point — we'd rather fail
-    # loudly than serve silent zero-result searches.
+    # or library_type defaults need to change. A non-zero exit here blocks
+    # `app`, `mcp`, and `worker` from starting.
+    #
+    # Neo4j vector-index creation is *deliberately not* bundled here. That
+    # command (``setup_neo4j_indexes``) requires a system embedding model
+    # with a configured ``vector_dimensions`` value, and that model is
+    # data an operator configures through the Django admin after first
+    # boot. On a fresh stack there is no such row yet, so blocking the
+    # whole stack on it would make the admin unreachable — a chicken-and-
+    # egg. Operator bootstrap flow:
+    #
+    #   1. docker compose up           # init sidecar: migrate + load_library_types
+    #   2. browse to admin, configure system embedding model
+    #   3. docker compose exec app python manage.py setup_neo4j_indexes
+    #
+    # Until step 3 runs, vector search will return empty results — the
+    # readiness check in library/apps.py logs a warning when indexes are
+    # missing so this is visible, not silent.
    set -e
    python manage.py migrate --noinput
-    python manage.py setup_neo4j_indexes
    python manage.py load_library_types
    ;;

--- a/docs/mnemosyne.html
+++ b/docs/mnemosyne.html
@@ -295,7 +295,7 @@ graph LR

            <div class="alert alert-warning border-start border-4 border-warning">
                <h4><i class="bi bi-lightning"></i> Neo4j Indexes (managed by <code>setup_neo4j_indexes</code>)</h4>
-                <p>Created by the <code>init</code> sidecar on every <code>docker compose up</code>. Vector dimensions come from the system embedding model's <code>vector_dimensions</code> field — the command fails if no model is configured. Current production model: <strong>Pan Synesis · qwen3-vl-embedding-2b · 2048d</strong>.</p>
+                <p>Run manually after the first <code>docker compose up</code>, once the system embedding model has been configured in <code>/admin/llm_manager/llmmodel/</code>: <code>docker compose exec app python manage.py setup_neo4j_indexes</code>. Vector dimensions come from the model's <code>vector_dimensions</code> field — the command hard-fails if no such row exists, which is why it is <em>not</em> bundled into the <code>init</code> sidecar (doing so would make the admin unreachable on first boot). Current production model: <strong>Pan Synesis · qwen3-vl-embedding-2b · 2048d</strong>.</p>
                <pre class="bg-light p-3 rounded mb-0"><code>// Chunk text+image embeddings (dimensions read from system embedding model)
 CREATE VECTOR INDEX chunk_embedding_index FOR (c:Chunk)
 ON (c.embedding) OPTIONS {indexConfig: {
--- a/mnemosyne/docker/entrypoint.sh
+++ b/mnemosyne/docker/entrypoint.sh
@@ -1,111 +0,0 @@
-#!/bin/sh
-# Mnemosyne container entrypoint.
-#
-# The same image runs all three processes — the compose service supplies
-# `web`, `mcp`, `worker`, or `migrate` as CMD.
-
-set -e
-
-case "$1" in
-  web)
-    # Django REST API + admin (gunicorn → wsgi).
-    exec gunicorn \
-      --config /app/docker/gunicorn.conf.py \
-      --bind 0.0.0.0:8000 \
-      --workers "${GUNICORN_WORKERS:-3}" \
-      --access-logfile - \
-      --error-logfile - \
-      mnemosyne.wsgi:application
-    ;;
-
-  mcp)
-    # FastMCP over Streamable HTTP at /mcp/, mounted by mnemosyne.asgi.
-    exec uvicorn \
-      --host 0.0.0.0 \
-      --port 8001 \
-      --workers "${UVICORN_WORKERS:-1}" \
-      mnemosyne.asgi:app
-    ;;
-
-  worker)
-    # Celery worker covering embedding + ingest + batch + default queues.
-    # In production you may want to split these onto separate worker
-    # services for queue-level isolation; one process is fine to start.
-    exec celery -A mnemosyne worker \
-      --loglevel="${CELERY_LOG_LEVEL:-info}" \
-      --queues="${CELERY_QUEUES:-celery,embedding,batch}" \
-      --concurrency="${CELERY_CONCURRENCY:-2}"
-    ;;
-
-  beat)
-    # Celery scheduled tasks (only needed if/when periodic jobs are wired).
-    exec celery -A mnemosyne beat \
-      --loglevel="${CELERY_LOG_LEVEL:-info}"
-    ;;
-
-  migrate)
-    # One-shot DB migration runner — invoke before bringing services up
-    # for the first time or after a deploy.
-    exec python manage.py migrate --noinput
-    ;;
-
-  setup)
-    # One-shot init — Neo4j indexes + library_type seed data.
-    python manage.py setup_neo4j_indexes
-    python manage.py load_library_types
-    ;;
-
-  init)
-    # Bundled one-shot init run by the `init` sidecar on every
-    # `docker compose up`. Idempotent: re-runs are no-ops unless
-    # migrations or library-type seed data need to change.
-    #
-    # Vector-index creation intentionally runs in *best-effort* mode:
-    # ``setup_neo4j_indexes`` requires a system embedding model with a
-    # configured ``vector_dimensions`` value, and that model is data an
-    # operator seeds via the admin UI after the stack comes up for the
-    # first time. Blocking the whole stack on first boot would force
-    # every new deployer through a manual dance with the init sidecar's
-    # entrypoint; instead we log loudly and carry on, and the operator
-    # runs the command once post-boot:
-    #
-    #     docker compose exec app python manage.py setup_neo4j_indexes
-    #
-    # Full-text and neomodel constraint indexes are created by the same
-    # command and are *not* dimension-sensitive, but they also only land
-    # after the operator re-runs it — acceptable because search against
-    # an empty graph is itself a no-op.
-    set -e
-    python manage.py migrate --noinput
-    python manage.py load_library_types
-    if ! python manage.py setup_neo4j_indexes; then
-      echo ""
-      echo "============================================================"
-      echo "NOTICE: Neo4j index creation was skipped."
-      echo ""
-      echo "This is expected on a fresh deployment — vector indexes"
-      echo "require a system embedding model with vector_dimensions set."
-      echo ""
-      echo "Seed the embedding model in the Django admin"
-      echo "  (/admin/llm_manager/llmmodel/, mark one row as"
-      echo "   is_system_embedding_model=True with vector_dimensions set),"
-      echo "then run:"
-      echo ""
-      echo "  docker compose exec app python manage.py setup_neo4j_indexes"
-      echo ""
-      echo "Search endpoints will return empty results until this is done."
-      echo "============================================================"
-      echo ""
-    fi
-    ;;
-
-  shell)
-    # Drop into the management shell for ad-hoc work.
-    exec python manage.py shell
-    ;;
-
-  *)
-    # Fall through: run whatever was passed (e.g. `manage.py <cmd>`).
-    exec "$@"
-    ;;
-esac
--- a/mnemosyne/library/apps.py
+++ b/mnemosyne/library/apps.py
@@ -11,8 +11,13 @@ the stderr of a different container.

 The probe is deliberately best-effort: it cannot crash the process even if
 Neo4j is unreachable, because a transient DB blip on startup should not
-take down the whole app. The `init` sidecar is the hard gate; this is the
-second line of defence for long-running containers.
+take down the whole app. Nothing hard-gates on the vector indexes — the
+``init`` sidecar only runs ``migrate`` + ``load_library_types`` (vector
+indexes cannot be created before the system embedding model is configured
+in the admin, which is a manual step after first boot). This probe is the
+only way an operator learns that the manual
+``setup_neo4j_indexes`` step was skipped or fell out of sync with the
+current system model.
 """

 import logging
@@ -149,9 +154,10 @@ def _run_startup_probe():
    for name in _EXPECTED_VECTOR_INDEXES:
        if name not in present:
            logger.error(
-                "Neo4j vector index '%s' is missing. Run "
-                "'docker compose run --rm init' (or 'python manage.py "
-                "setup_neo4j_indexes') to rebuild.",
+                "Neo4j vector index '%s' is missing. Configure the system "
+                "embedding model in /admin/llm_manager/llmmodel/, then run "
+                "'docker compose exec app python manage.py "
+                "setup_neo4j_indexes' to create it.",
                name,
            )
            continue
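
For reviewers, a rough sketch of the report-only behaviour the startup probe relies on now that nothing hard-gates on the vector indexes. It assumes the probe can see each index's built dimensions. `find_index_problems` and its `present` mapping are hypothetical and are not the code in `library/apps.py`.

```python
import logging
from typing import Dict, Iterable, List

logger = logging.getLogger("library.readiness_sketch")


def find_index_problems(
    expected: Iterable[str],
    present: Dict[str, int],
    wanted_dimensions: int,
) -> List[str]:
    """Log and return the names of missing or wrongly sized vector indexes.

    `present` maps an index name to the dimensions it was built with.
    A bad index is only reported, never raised, so the caller keeps running.
    """
    problems = []
    for name in expected:
        if name not in present:
            logger.error("Neo4j vector index '%s' is missing.", name)
            problems.append(name)
            continue
        if present[name] != wanted_dimensions:
            logger.error(
                "Neo4j vector index '%s' has %d dimensions, expected %d.",
                name,
                present[name],
                wanted_dimensions,
            )
            problems.append(name)
    return problems
```

An empty return value means every expected index exists at the wanted size. Any other result corresponds to the case where the operator still has to run `setup_neo4j_indexes`.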