docs(bootstrap): clarify three-step Docker first-boot flow
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 51s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m31s

Rework README and docker-compose comments to document the deliberate
chicken-and-egg escape: the `init` sidecar now only runs `migrate` and
`load_library_types`, leaving `setup_neo4j_indexes` as a manual step
after the system embedding model is configured in `/admin/`. This
avoids making `app` unreachable on first boot when no embedding model
row exists yet, while preserving loud failure on dimension mismatch.
2026-05-10 16:15:28 -04:00
parent 19e2aee91c
commit afcbee8819
6 changed files with 102 additions and 155 deletions

@@ -86,12 +86,15 @@ python manage.py setup_neo4j_indexes # Create Neo4j vector + full-text
> has ``is_system_embedding_model=True`` and a non-null ``vector_dimensions``.
> There is deliberately no hardcoded fallback: an index built at the wrong
> dimension silently breaks every search. The command will exit non-zero
> with a clear error if no such row exists, which is also why the
> ``docker compose`` ``init`` sidecar treats vector-index creation as
> best-effort on first boot — the stack starts healthy, migrations and
> library-type seed data land, and you run
> ``docker compose exec app python manage.py setup_neo4j_indexes`` once
> the embedding-model row is in place.
> with a clear error if no such row exists, which is why the
> ``docker compose`` ``init`` sidecar does **not** run
> ``setup_neo4j_indexes``. On first boot the sidecar runs only `migrate` and
> `load_library_types`; you then configure the system embedding model in
> `/admin/` and run
> ``docker compose exec app python manage.py setup_neo4j_indexes`` once,
> by hand. Until that step runs, vector search returns empty results and
> `library/apps.py` logs a readiness warning. See
> [Docker bootstrap order](#docker-bootstrap-order) below for the full flow.
### Start the web app
@@ -203,27 +206,45 @@ The per-service surface is defined by the `environment:` blocks in `docker-compo
> **Broker URL gotcha.** If the RabbitMQ password contains any of `@ : / # % + ? & =` or a space, it must be percent-encoded in `CELERY_BROKER_URL`. Kombu's URL parser is strict, and this is the most common cause of a `PLAIN 403 ACCESS_REFUSED` at worker startup when the same credentials work fine under bare-Python `celery` invocations (because you were probably passing them as kwargs, not a URL).
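The percent-encoding step can be sketched with the standard library. This is a minimal illustration, not part of the project: the `build_broker_url` helper, the user name, and the host name are hypothetical, and the URL omits a vhost path for brevity.

```python
from urllib.parse import quote


def build_broker_url(user: str, password: str, host: str, port: int = 5672) -> str:
    """Assemble an AMQP URL with the credentials percent-encoded.

    Hypothetical helper for illustration only. safe="" makes quote()
    encode every reserved character, including "/" (which quote()
    leaves alone by default).
    """
    return f"amqp://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# A password containing "@", ":", "/", a space, and "#".
print(build_broker_url("mnemosyne", "p@ss:w/rd #1", "rabbitmq"))
# amqp://mnemosyne:p%40ss%3Aw%2Frd%20%231@rabbitmq:5672
```

The resulting string is what would go into `CELERY_BROKER_URL`; the raw password stays unchanged wherever it is passed outside a URL.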
### First-time bring-up
### Docker bootstrap order
Three steps: the first and third are short shell steps, and the middle one
is a manual session in `/admin/` to configure the system embedding model.
`setup_neo4j_indexes` is **not** run automatically: it reads vector
dimensions from that admin row and hard-fails if the row is missing, so
bundling it into the `init` sidecar would make `app` unreachable on
first boot. Running it manually after admin configuration is the
chicken-and-egg escape.
```bash
# Generate the root .env from the template (or let Ansible do it)
# 1. Generate the root .env from the template (or let Ansible do it),
# pull the image, and bring the stack up. The `init` sidecar runs
# `migrate` + `load_library_types` and exits; `app`, `mcp`, and
# `worker` come up healthy.
cp .env.example .env && $EDITOR .env
# Pull the image (or build locally with `docker compose build`)
docker compose pull
# Bring the stack up — the `init` sidecar runs migrations + library_type
# defaults automatically. Vector indexes are deferred until you seed the
# system embedding model (see below) — the sidecar logs a clear notice
# and exits 0 either way, so the stack comes up healthy on first boot.
docker compose up -d
# Seed the system embedding model at /admin/llm_manager/llmmodel/
# (mark one row `is_system_embedding_model=True` with `vector_dimensions`
# set to whatever your embedding provider returns), then:
# 2. Browse to /admin/llm_manager/llmapi/ and add the embedding provider
# (e.g. Pan Synesis, with the right base URL and API key). Then
# /admin/llm_manager/llmmodel/ and add one row for the embedding model:
# - api = the api you just created
# - name = the provider's model name
# - vector_dimensions = whatever your embedding provider returns
# - is_system_embedding_model = True
# Save, then come back to the shell.
# 3. Create Neo4j vector + full-text indexes at the right dimensions.
# Idempotent — re-run after an embedding-model swap with `--drop` to
# rebuild, which requires re-embedding all content.
docker compose exec app python manage.py setup_neo4j_indexes
```
Until step 3 runs, vector search returns empty results and
`library/apps.py` logs a readiness warning each time the app boots. This
is deliberate: an index built at the wrong dimension silently breaks
every search, so loud failure beats quiet misconfiguration.
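The "no hardcoded fallback" rule can be sketched as follows. This is a simplified stand-in, not the real management command: `ModelRow`, `MissingSystemEmbeddingModel`, and `resolve_vector_dimensions` are hypothetical names, and plain dataclasses take the place of the `llm_manager` model rows.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ModelRow:
    """Hypothetical stand-in for a row in /admin/llm_manager/llmmodel/."""
    name: str
    is_system_embedding_model: bool = False
    vector_dimensions: Optional[int] = None


class MissingSystemEmbeddingModel(RuntimeError):
    """Raised when no usable system embedding model row exists."""


def resolve_vector_dimensions(rows: Iterable[ModelRow]) -> int:
    """Return the system embedding model's dimensions, or fail loudly.

    There is deliberately no default value: guessing a dimension would
    build an index that silently breaks every search.
    """
    for row in rows:
        if row.is_system_embedding_model and row.vector_dimensions is not None:
            return row.vector_dimensions
    raise MissingSystemEmbeddingModel(
        "No row has is_system_embedding_model=True with vector_dimensions set; "
        "configure one in /admin/llm_manager/llmmodel/ first."
    )
```

On a fresh stack the row list is empty, so the function raises instead of returning a guess. That is the behaviour that made bundling the command into `init` block first boot.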
### Day-to-day
```bash

@@ -25,13 +25,26 @@
# Run:
# docker compose up -d
#
# The `init` sidecar (below) runs Postgres migrations, Neo4j index setup,
# and library-type seeding on every `up`. Long-running services wait for
# it via `depends_on: init: service_completed_successfully` — so a failure
# there (missing embedding model, dimension mismatch, unreachable DB)
# blocks the stack rather than letting it serve silent zero-result
# searches. The standalone `migrate` / `setup` entrypoint commands remain
# available for ad-hoc ops work.
# The `init` sidecar (below) runs Postgres migrations and library-type
# seeding on every `up`. Long-running services wait for it via
# `depends_on: init: service_completed_successfully` — so a failure there
# (unreachable DB, broken migration) blocks the stack.
#
# Neo4j vector-index creation is deliberately NOT bundled into `init`.
# `setup_neo4j_indexes` requires a system embedding model configured in
# the admin, which only exists after first boot — an operator has to land
# in /admin/, pick an embedding API + model, and set its vector_dimensions
# value. Bootstrap order is therefore:
#
# 1. docker compose up # init sidecar: migrate + load_library_types
# 2. browse to /admin/ → llm_manager → configure system embedding model
# 3. docker compose exec app python manage.py setup_neo4j_indexes
#
# Until step 3, vector search returns empty results. library/apps.py logs
# a readiness warning when indexes are missing, so this is visible.
# The standalone `migrate` / `setup` entrypoint commands remain available
# for ad-hoc ops work (`setup` runs setup_neo4j_indexes + load_library_types
# and is the typical re-run target after embedding-model changes).
# =============================================================================
@@ -48,13 +61,15 @@ services:
- mnemosyne-static:/shared-static
restart: "no"
# ── Init sidecar: one-shot Postgres migrate + Neo4j index setup + library
# type seed. Runs on every `up` and exits. Long-running services below
# depend on `service_completed_successfully`, so a failure here (no system
# embedding model configured, dimension mismatch, unreachable DB) blocks
# `app`/`mcp`/`worker` from starting — which is the whole point. All three
# commands are idempotent: re-running is a no-op unless state actually
# needs to change.
# ── Init sidecar: one-shot Postgres migrate + library-type seed. Runs on
# every `up` and exits. Long-running services below depend on
# `service_completed_successfully`, so a failure here (unreachable DB,
# broken migration) blocks `app`/`mcp`/`worker` from starting. Both
# commands are idempotent.
#
# Neo4j vector-index setup is NOT run here — see the header comment for
# the operator bootstrap flow. Only library_type seeding touches Neo4j
# from this sidecar, and it does not depend on any embedding model.
#
# This sidecar only needs Postgres, Neo4j, and logging env — no S3, no
# Celery, no LLM encryption key. Keep it that way.
@@ -75,7 +90,7 @@ services:
- APP_DB_PASSWORD=${APP_DB_PASSWORD}
- DB_HOST=${DB_HOST}
- DB_PORT=${DB_PORT}
# Neo4j (setup_neo4j_indexes + load_library_types)
# Neo4j (load_library_types writes Library defaults into the graph)
- NEOMODEL_NEO4J_BOLT_URL=${NEOMODEL_NEO4J_BOLT_URL}
# Logging
- LOGGING_LEVEL=${LOGGING_LEVEL}

@@ -50,7 +50,9 @@ case "$1" in
;;
setup)
# One-shot init — Neo4j indexes + library_type seed data.
# One-shot init — Neo4j indexes + library_type seed data. Run this
# manually after the system embedding model has been configured in the
# admin (setup_neo4j_indexes reads vector dimensions from that row).
python manage.py setup_neo4j_indexes
python manage.py load_library_types
;;
@@ -58,12 +60,26 @@ case "$1" in
init)
# Bundled one-shot init run by the `init` sidecar on every
# `docker compose up`. Idempotent: re-runs are no-ops unless migrations
# or indexes need to change. A non-zero exit here blocks `app`, `mcp`,
# and `worker` from starting, which is the point — we'd rather fail
# loudly than serve silent zero-result searches.
# or library_type defaults need to change. A non-zero exit here blocks
# `app`, `mcp`, and `worker` from starting.
#
# Neo4j vector-index creation is *deliberately not* bundled here. That
# command (``setup_neo4j_indexes``) requires a system embedding model
# with a configured ``vector_dimensions`` value, and that model is
# data an operator configures through the Django admin after first
# boot. On a fresh stack there is no such row yet, so blocking the
# whole stack on it would make the admin unreachable: a
# chicken-and-egg problem. Operator bootstrap flow:
#
# 1. docker compose up # init sidecar: migrate + load_library_types
# 2. browse to admin, configure system embedding model
# 3. docker compose exec app python manage.py setup_neo4j_indexes
#
# Until step 3 runs, vector search will return empty results — the
# readiness check in library/apps.py logs a warning when indexes are
# missing so this is visible, not silent.
set -e
python manage.py migrate --noinput
python manage.py setup_neo4j_indexes
python manage.py load_library_types
;;

@@ -295,7 +295,7 @@ graph LR
<div class="alert alert-warning border-start border-4 border-warning">
<h4><i class="bi bi-lightning"></i> Neo4j Indexes (managed by <code>setup_neo4j_indexes</code>)</h4>
<p>Created by the <code>init</code> sidecar on every <code>docker compose up</code>. Vector dimensions come from the system embedding model's <code>vector_dimensions</code> field — the command fails if no model is configured. Current production model: <strong>Pan Synesis · qwen3-vl-embedding-2b · 2048d</strong>.</p>
<p>Run manually after the first <code>docker compose up</code>, once the system embedding model has been configured in <code>/admin/llm_manager/llmmodel/</code>: <code>docker compose exec app python manage.py setup_neo4j_indexes</code>. Vector dimensions come from the model's <code>vector_dimensions</code> field — the command hard-fails if no such row exists, which is why it is <em>not</em> bundled into the <code>init</code> sidecar (doing so would make the admin unreachable on first boot). Current production model: <strong>Pan Synesis · qwen3-vl-embedding-2b · 2048d</strong>.</p>
<pre class="bg-light p-3 rounded mb-0"><code>// Chunk text+image embeddings (dimensions read from system embedding model)
CREATE VECTOR INDEX chunk_embedding_index FOR (c:Chunk)
ON (c.embedding) OPTIONS {indexConfig: {

@@ -1,111 +0,0 @@
#!/bin/sh
# Mnemosyne container entrypoint.
#
# The same image runs all three processes — the compose service supplies
# `web`, `mcp`, `worker`, or `migrate` as CMD.
set -e
case "$1" in
web)
# Django REST API + admin (gunicorn → wsgi).
exec gunicorn \
--config /app/docker/gunicorn.conf.py \
--bind 0.0.0.0:8000 \
--workers "${GUNICORN_WORKERS:-3}" \
--access-logfile - \
--error-logfile - \
mnemosyne.wsgi:application
;;
mcp)
# FastMCP over Streamable HTTP at /mcp/, mounted by mnemosyne.asgi.
exec uvicorn \
--host 0.0.0.0 \
--port 8001 \
--workers "${UVICORN_WORKERS:-1}" \
mnemosyne.asgi:app
;;
worker)
# Celery worker covering embedding + ingest + batch + default queues.
# In production you may want to split these onto separate worker
# services for queue-level isolation; one process is fine to start.
exec celery -A mnemosyne worker \
--loglevel="${CELERY_LOG_LEVEL:-info}" \
--queues="${CELERY_QUEUES:-celery,embedding,batch}" \
--concurrency="${CELERY_CONCURRENCY:-2}"
;;
beat)
# Celery scheduled tasks (only needed if/when periodic jobs are wired).
exec celery -A mnemosyne beat \
--loglevel="${CELERY_LOG_LEVEL:-info}"
;;
migrate)
# One-shot DB migration runner — invoke before bringing services up
# for the first time or after a deploy.
exec python manage.py migrate --noinput
;;
setup)
# One-shot init — Neo4j indexes + library_type seed data.
python manage.py setup_neo4j_indexes
python manage.py load_library_types
;;
init)
# Bundled one-shot init run by the `init` sidecar on every
# `docker compose up`. Idempotent: re-runs are no-ops unless
# migrations or library-type seed data need to change.
#
# Vector-index creation intentionally runs in *best-effort* mode:
# ``setup_neo4j_indexes`` requires a system embedding model with a
# configured ``vector_dimensions`` value, and that model is data an
# operator seeds via the admin UI after the stack comes up for the
# first time. Blocking the whole stack on first boot would force
# every new deployer through a manual dance with the init sidecar's
# entrypoint; instead we log loudly and carry on, and the operator
# runs the command once post-boot:
#
# docker compose exec app python manage.py setup_neo4j_indexes
#
# Full-text and neomodel constraint indexes are created by the same
# command and are *not* dimension-sensitive, but they also only land
# after the operator re-runs it — acceptable because search against
# an empty graph is itself a no-op.
set -e
python manage.py migrate --noinput
python manage.py load_library_types
if ! python manage.py setup_neo4j_indexes; then
echo ""
echo "============================================================"
echo "NOTICE: Neo4j index creation was skipped."
echo ""
echo "This is expected on a fresh deployment — vector indexes"
echo "require a system embedding model with vector_dimensions set."
echo ""
echo "Seed the embedding model in the Django admin"
echo " (/admin/llm_manager/llmmodel/, mark one row as"
echo " is_system_embedding_model=True with vector_dimensions set),"
echo "then run:"
echo ""
echo " docker compose exec app python manage.py setup_neo4j_indexes"
echo ""
echo "Search endpoints will return empty results until this is done."
echo "============================================================"
echo ""
fi
;;
shell)
# Drop into the management shell for ad-hoc work.
exec python manage.py shell
;;
*)
# Fall through: run whatever was passed (e.g. `manage.py <cmd>`).
exec "$@"
;;
esac

@@ -11,8 +11,13 @@ the stderr of a different container.
The probe is deliberately best-effort: it cannot crash the process even if
Neo4j is unreachable, because a transient DB blip on startup should not
take down the whole app. The `init` sidecar is the hard gate; this is the
second line of defence for long-running containers.
take down the whole app. Nothing hard-gates on the vector indexes — the
``init`` sidecar only runs ``migrate`` + ``load_library_types`` (vector
indexes cannot be created before the system embedding model is configured
in the admin, which is a manual step after first boot). This probe is the
only way an operator learns that the manual
``setup_neo4j_indexes`` step was skipped or fell out of sync with the
current system model.
"""
import logging
@@ -149,9 +154,10 @@ def _run_startup_probe():
for name in _EXPECTED_VECTOR_INDEXES:
if name not in present:
logger.error(
"Neo4j vector index '%s' is missing. Run "
"'docker compose run --rm init' (or 'python manage.py "
"setup_neo4j_indexes') to rebuild.",
"Neo4j vector index '%s' is missing. Configure the system "
"embedding model in /admin/llm_manager/llmmodel/, then run "
"'docker compose exec app python manage.py "
"setup_neo4j_indexes' to create it.",
name,
)
continue
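
The best-effort behaviour described in the module docstring can be sketched in isolation. This is an assumption-laden illustration rather than the code in `library/apps.py`: `run_probe`, `report_missing_indexes`, the `fetch_present` callable, and the logger name are hypothetical, and the expected-index tuple lists only the one index name that appears elsewhere in this change.

```python
import logging

logger = logging.getLogger("library.readiness")  # hypothetical logger name

# Only `chunk_embedding_index` is named in the docs; the real tuple may hold more.
EXPECTED_VECTOR_INDEXES = ("chunk_embedding_index",)


def report_missing_indexes(present, expected=EXPECTED_VECTOR_INDEXES):
    """Log one error per absent index and return the missing names."""
    missing = [name for name in expected if name not in present]
    for name in missing:
        logger.error(
            "Neo4j vector index '%s' is missing. Configure the system "
            "embedding model in /admin/llm_manager/llmmodel/, then run "
            "'docker compose exec app python manage.py "
            "setup_neo4j_indexes' to create it.",
            name,
        )
    return missing


def run_probe(fetch_present):
    """Best-effort wrapper: a failing lookup is logged, never raised."""
    try:
        present = set(fetch_present())
    except Exception:
        # A transient Neo4j outage at startup must not take the app down.
        logger.warning("Neo4j readiness probe skipped", exc_info=True)
        return None
    return report_missing_indexes(present)
```

Passing the index lookup in as a callable keeps the sketch testable without a running Neo4j instance. The real probe presumably queries the database directly.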