docs(bootstrap): clarify three-step Docker first-boot flow
Rework README and docker-compose comments to document the deliberate chicken-and-egg escape: the `init` sidecar now only runs `migrate` and `load_library_types`, leaving `setup_neo4j_indexes` as a manual step after the system embedding model is configured in `/admin/`. This avoids making `app` unreachable on first boot when no embedding model row exists yet, while preserving loud failure on dimension mismatch.
57
README.md
@@ -86,12 +86,15 @@ python manage.py setup_neo4j_indexes # Create Neo4j vector + full-text
 > has ``is_system_embedding_model=True`` and a non-null ``vector_dimensions``.
 > There is deliberately no hardcoded fallback: an index built at the wrong
 > dimension silently breaks every search. The command will exit non-zero
-> with a clear error if no such row exists, which is also why the
-> ``docker compose`` ``init`` sidecar treats vector-index creation as
-> best-effort on first boot — the stack starts healthy, migrations and
-> library-type seed data land, and you run
-> ``docker compose exec app python manage.py setup_neo4j_indexes`` once
-> the embedding-model row is in place.
+> with a clear error if no such row exists, which is why the
+> ``docker compose`` ``init`` sidecar does **not** run
+> ``setup_neo4j_indexes`` — the stack brings up `migrate` +
+> `load_library_types` only, you land in `/admin/` to configure the system
+> embedding model, and then you run
+> ``docker compose exec app python manage.py setup_neo4j_indexes`` manually
+> once. Until that last step runs, vector search returns empty results and
+> `library/apps.py` logs a readiness warning. See
+> [Docker bootstrap order](#docker-bootstrap-order) below for the full flow.
 
 ### Start the web app
 
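The no-fallback rule described in this hunk can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's `setup_neo4j_indexes` implementation: `EmbeddingModelRow`, `resolve_vector_dimensions`, and `MissingSystemEmbeddingModel` are hypothetical names, and only the field names `is_system_embedding_model` and `vector_dimensions` come from the README text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class MissingSystemEmbeddingModel(RuntimeError):
    """Hypothetical error: no usable system embedding model row exists."""


@dataclass
class EmbeddingModelRow:
    # Hypothetical stand-in for the admin row. Only the two field names
    # below are taken from the README; everything else is illustrative.
    name: str
    is_system_embedding_model: bool = False
    vector_dimensions: Optional[int] = None


def resolve_vector_dimensions(rows: Iterable[EmbeddingModelRow]) -> int:
    """Return the system embedding model's dimensions, or fail loudly."""
    for row in rows:
        if row.is_system_embedding_model and row.vector_dimensions is not None:
            return row.vector_dimensions
    # Deliberately no default: guessing a dimension here is exactly the
    # silent misconfiguration the README warns about.
    raise MissingSystemEmbeddingModel(
        "no row has is_system_embedding_model=True and a non-null vector_dimensions"
    )
```

A caller that wants the "exit non-zero with a clear error" behaviour would catch `MissingSystemEmbeddingModel`, print the message, and exit with a non-zero status instead of substituting a default.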
@@ -203,27 +206,45 @@ The per-service surface is defined by the `environment:` blocks in `docker-compo
 
 > **Broker URL gotcha.** If the RabbitMQ password contains any of `@ : / # % + ? & =` or a space, it must be percent-encoded in `CELERY_BROKER_URL`. Kombu's URL parser is strict, and this is the most common cause of a `PLAIN 403 ACCESS_REFUSED` at worker startup when the same credentials work fine under bare-Python `celery` invocations (because you were probably passing them as kwargs, not a URL).
 
-### First-time bring-up
+### Docker bootstrap order
 
+Three steps — the first and third are one-liners, the middle step is a
+manual sit-down in `/admin/` to configure the system embedding model.
+`setup_neo4j_indexes` is **not** run automatically: it reads vector
+dimensions from that admin row and hard-fails if the row is missing, so
+bundling it into the `init` sidecar would make `app` unreachable on
+first boot. Running it manually after admin configuration is the
+chicken-and-egg escape.
+
 ```bash
-# Generate the root .env from the template (or let Ansible do it)
+# 1. Generate the root .env from the template (or let Ansible do it),
+# pull the image, and bring the stack up. The `init` sidecar runs
+# `migrate` + `load_library_types` and exits; `app`, `mcp`, and
+# `worker` come up healthy.
 cp .env.example .env && $EDITOR .env
-
-# Pull the image (or build locally with `docker compose build`)
 docker compose pull
-
-# Bring the stack up — the `init` sidecar runs migrations + library_type
-# defaults automatically. Vector indexes are deferred until you seed the
-# system embedding model (see below) — the sidecar logs a clear notice
-# and exits 0 either way, so the stack comes up healthy on first boot.
 docker compose up -d
 
-# Seed the system embedding model at /admin/llm_manager/llmmodel/
-# (mark one row `is_system_embedding_model=True` with `vector_dimensions`
-# set to whatever your embedding provider returns), then:
+# 2. Browse to /admin/llm_manager/llmapi/ and add the embedding provider
+# (e.g. Pan Synesis, with the right base URL and API key). Then
+# /admin/llm_manager/llmmodel/ and add one row for the embedding model:
+# - api = the api you just created
+# - name = the provider's model name
+# - vector_dimensions = whatever your embedding provider returns
+# - is_system_embedding_model = True
+# Save, then come back to the shell.
+
+# 3. Create Neo4j vector + full-text indexes at the right dimensions.
+# Idempotent — re-run after an embedding-model swap with `--drop` to
+# rebuild, which requires re-embedding all content.
 docker compose exec app python manage.py setup_neo4j_indexes
 ```
 
+Until step 3 runs, vector search returns empty results and
+`library/apps.py` logs a readiness warning each time the app boots. This
+is deliberate: an index built at the wrong dimension silently breaks
+every search, so loud failure beats quiet misconfiguration.
+
 ### Day-to-day
 
 ```bash
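The broker URL gotcha quoted as context in this hunk comes down to percent-encoding the credentials before they are placed in `CELERY_BROKER_URL`. A minimal sketch using only the standard library follows; `build_broker_url` is a hypothetical helper, and the user name, host, and password are made-up example values rather than anything from the project.

```python
from urllib.parse import quote


def build_broker_url(user, password, host, port=5672, vhost=""):
    """Build an amqp:// URL with credentials and vhost percent-encoded.

    Hypothetical helper for illustration; not part of the project.
    """
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    encoded_vhost = quote(vhost, safe="")
    return f"amqp://{encoded_user}:{encoded_password}@{host}:{port}/{encoded_vhost}"


# Example values only: the password contains "@", ":", "/" and "#".
print(build_broker_url("mnemosyne", "p@ss:w/rd#1", "rabbitmq"))
# prints amqp://mnemosyne:p%40ss%3Aw%2Frd%231@rabbitmq:5672/
```

The resulting string is what would go into `.env`; the raw password stays usable wherever credentials are passed as separate arguments rather than as a URL.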
@@ -25,13 +25,26 @@
 # Run:
 # docker compose up -d
 #
-# The `init` sidecar (below) runs Postgres migrations, Neo4j index setup,
-# and library-type seeding on every `up`. Long-running services wait for
-# it via `depends_on: init: service_completed_successfully` — so a failure
-# there (missing embedding model, dimension mismatch, unreachable DB)
-# blocks the stack rather than letting it serve silent zero-result
-# searches. The standalone `migrate` / `setup` entrypoint commands remain
-# available for ad-hoc ops work.
+# The `init` sidecar (below) runs Postgres migrations and library-type
+# seeding on every `up`. Long-running services wait for it via
+# `depends_on: init: service_completed_successfully` — so a failure there
+# (unreachable DB, broken migration) blocks the stack.
+#
+# Neo4j vector-index creation is deliberately NOT bundled into `init`.
+# `setup_neo4j_indexes` requires a system embedding model configured in
+# the admin, which only exists after first boot — an operator has to land
+# in /admin/, pick an embedding API + model, and set its vector_dimensions
+# value. Bootstrap order is therefore:
+#
+# 1. docker compose up # init sidecar: migrate + load_library_types
+# 2. browse to /admin/ → llm_manager → configure system embedding model
+# 3. docker compose exec app python manage.py setup_neo4j_indexes
+#
+# Until step 3, vector search returns empty results. library/apps.py logs
+# a readiness warning when indexes are missing, so this is visible.
+# The standalone `migrate` / `setup` entrypoint commands remain available
+# for ad-hoc ops work (`setup` runs setup_neo4j_indexes + load_library_types
+# and is the typical re-run target after embedding-model changes).
 # =============================================================================
 
 
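The bootstrap order in the header comment above can be read as a small decision rule: once `docker compose up` has run the `init` sidecar, what an operator does next depends only on whether the embedding-model row and the indexes exist. A minimal sketch, assuming those two facts are already known as booleans; `BootstrapStep` and `next_bootstrap_step` are hypothetical names, not project code.

```python
from enum import Enum


class BootstrapStep(Enum):
    # The values are the operator actions named in the header comment.
    CONFIGURE_EMBEDDING_MODEL = "configure the system embedding model in /admin/"
    CREATE_INDEXES = "docker compose exec app python manage.py setup_neo4j_indexes"
    DONE = "nothing to do"


def next_bootstrap_step(has_system_embedding_model: bool,
                        indexes_present: bool) -> BootstrapStep:
    """Return the next manual step, assuming step 1 (`up`) already succeeded."""
    if not has_system_embedding_model:
        # Step 2: index creation cannot work until the admin row exists.
        return BootstrapStep.CONFIGURE_EMBEDDING_MODEL
    if not indexes_present:
        # Step 3: the row exists, so the index command can read its dimensions.
        return BootstrapStep.CREATE_INDEXES
    return BootstrapStep.DONE


print(next_bootstrap_step(False, False).value)
```

On a fresh stack both inputs are false, so the sketch points at the admin step first, which is the ordering the comment block describes.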
@@ -48,13 +61,15 @@ services:
 - mnemosyne-static:/shared-static
 restart: "no"
 
-# ── Init sidecar: one-shot Postgres migrate + Neo4j index setup + library
-# type seed. Runs on every `up` and exits. Long-running services below
-# depend on `service_completed_successfully`, so a failure here (no system
-# embedding model configured, dimension mismatch, unreachable DB) blocks
-# `app`/`mcp`/`worker` from starting — which is the whole point. All three
-# commands are idempotent: re-running is a no-op unless state actually
-# needs to change.
+# ── Init sidecar: one-shot Postgres migrate + library-type seed. Runs on
+# every `up` and exits. Long-running services below depend on
+# `service_completed_successfully`, so a failure here (unreachable DB,
+# broken migration) blocks `app`/`mcp`/`worker` from starting. Both
+# commands are idempotent.
+#
+# Neo4j vector-index setup is NOT run here — see the header comment for
+# the operator bootstrap flow. Only library_type seeding touches Neo4j
+# from this sidecar, and it does not depend on any embedding model.
 #
 # This sidecar only needs Postgres, Neo4j, and logging env — no S3, no
 # Celery, no LLM encryption key. Keep it that way.
@@ -75,7 +90,7 @@ services:
 - APP_DB_PASSWORD=${APP_DB_PASSWORD}
 - DB_HOST=${DB_HOST}
 - DB_PORT=${DB_PORT}
-# Neo4j (setup_neo4j_indexes + load_library_types)
+# Neo4j (load_library_types writes Library defaults into the graph)
 - NEOMODEL_NEO4J_BOLT_URL=${NEOMODEL_NEO4J_BOLT_URL}
 # Logging
 - LOGGING_LEVEL=${LOGGING_LEVEL}
@@ -50,7 +50,9 @@ case "$1" in
     ;;
 
   setup)
-    # One-shot init — Neo4j indexes + library_type seed data.
+    # One-shot init — Neo4j indexes + library_type seed data. Run this
+    # manually after the system embedding model has been configured in the
+    # admin (setup_neo4j_indexes reads vector dimensions from that row).
     python manage.py setup_neo4j_indexes
     python manage.py load_library_types
     ;;
@@ -58,12 +60,26 @@ case "$1" in
   init)
     # Bundled one-shot init run by the `init` sidecar on every
     # `docker compose up`. Idempotent: re-runs are no-ops unless migrations
-    # or indexes need to change. A non-zero exit here blocks `app`, `mcp`,
-    # and `worker` from starting, which is the point — we'd rather fail
-    # loudly than serve silent zero-result searches.
+    # or library_type defaults need to change. A non-zero exit here blocks
+    # `app`, `mcp`, and `worker` from starting.
+    #
+    # Neo4j vector-index creation is *deliberately not* bundled here. That
+    # command (``setup_neo4j_indexes``) requires a system embedding model
+    # with a configured ``vector_dimensions`` value, and that model is
+    # data an operator configures through the Django admin after first
+    # boot. On a fresh stack there is no such row yet, so blocking the
+    # whole stack on it would make the admin unreachable — a chicken-and-
+    # egg. Operator bootstrap flow:
+    #
+    # 1. docker compose up # init sidecar: migrate + load_library_types
+    # 2. browse to admin, configure system embedding model
+    # 3. docker compose exec app python manage.py setup_neo4j_indexes
+    #
+    # Until step 3 runs, vector search will return empty results — the
+    # readiness check in library/apps.py logs a warning when indexes are
+    # missing so this is visible, not silent.
     set -e
     python manage.py migrate --noinput
-    python manage.py setup_neo4j_indexes
     python manage.py load_library_types
     ;;
 
@@ -295,7 +295,7 @@ graph LR
 
 <div class="alert alert-warning border-start border-4 border-warning">
 <h4><i class="bi bi-lightning"></i> Neo4j Indexes (managed by <code>setup_neo4j_indexes</code>)</h4>
-<p>Created by the <code>init</code> sidecar on every <code>docker compose up</code>. Vector dimensions come from the system embedding model's <code>vector_dimensions</code> field — the command fails if no model is configured. Current production model: <strong>Pan Synesis · qwen3-vl-embedding-2b · 2048d</strong>.</p>
+<p>Run manually after the first <code>docker compose up</code>, once the system embedding model has been configured in <code>/admin/llm_manager/llmmodel/</code>: <code>docker compose exec app python manage.py setup_neo4j_indexes</code>. Vector dimensions come from the model's <code>vector_dimensions</code> field — the command hard-fails if no such row exists, which is why it is <em>not</em> bundled into the <code>init</code> sidecar (doing so would make the admin unreachable on first boot). Current production model: <strong>Pan Synesis · qwen3-vl-embedding-2b · 2048d</strong>.</p>
 <pre class="bg-light p-3 rounded mb-0"><code>// Chunk text+image embeddings (dimensions read from system embedding model)
 CREATE VECTOR INDEX chunk_embedding_index FOR (c:Chunk)
 ON (c.embedding) OPTIONS {indexConfig: {
@@ -1,111 +0,0 @@
-#!/bin/sh
-# Mnemosyne container entrypoint.
-#
-# The same image runs all three processes — the compose service supplies
-# `web`, `mcp`, `worker`, or `migrate` as CMD.
-
-set -e
-
-case "$1" in
-  web)
-    # Django REST API + admin (gunicorn → wsgi).
-    exec gunicorn \
-      --config /app/docker/gunicorn.conf.py \
-      --bind 0.0.0.0:8000 \
-      --workers "${GUNICORN_WORKERS:-3}" \
-      --access-logfile - \
-      --error-logfile - \
-      mnemosyne.wsgi:application
-    ;;
-
-  mcp)
-    # FastMCP over Streamable HTTP at /mcp/, mounted by mnemosyne.asgi.
-    exec uvicorn \
-      --host 0.0.0.0 \
-      --port 8001 \
-      --workers "${UVICORN_WORKERS:-1}" \
-      mnemosyne.asgi:app
-    ;;
-
-  worker)
-    # Celery worker covering embedding + ingest + batch + default queues.
-    # In production you may want to split these onto separate worker
-    # services for queue-level isolation; one process is fine to start.
-    exec celery -A mnemosyne worker \
-      --loglevel="${CELERY_LOG_LEVEL:-info}" \
-      --queues="${CELERY_QUEUES:-celery,embedding,batch}" \
-      --concurrency="${CELERY_CONCURRENCY:-2}"
-    ;;
-
-  beat)
-    # Celery scheduled tasks (only needed if/when periodic jobs are wired).
-    exec celery -A mnemosyne beat \
-      --loglevel="${CELERY_LOG_LEVEL:-info}"
-    ;;
-
-  migrate)
-    # One-shot DB migration runner — invoke before bringing services up
-    # for the first time or after a deploy.
-    exec python manage.py migrate --noinput
-    ;;
-
-  setup)
-    # One-shot init — Neo4j indexes + library_type seed data.
-    python manage.py setup_neo4j_indexes
-    python manage.py load_library_types
-    ;;
-
-  init)
-    # Bundled one-shot init run by the `init` sidecar on every
-    # `docker compose up`. Idempotent: re-runs are no-ops unless
-    # migrations or library-type seed data need to change.
-    #
-    # Vector-index creation intentionally runs in *best-effort* mode:
-    # ``setup_neo4j_indexes`` requires a system embedding model with a
-    # configured ``vector_dimensions`` value, and that model is data an
-    # operator seeds via the admin UI after the stack comes up for the
-    # first time. Blocking the whole stack on first boot would force
-    # every new deployer through a manual dance with the init sidecar's
-    # entrypoint; instead we log loudly and carry on, and the operator
-    # runs the command once post-boot:
-    #
-    # docker compose exec app python manage.py setup_neo4j_indexes
-    #
-    # Full-text and neomodel constraint indexes are created by the same
-    # command and are *not* dimension-sensitive, but they also only land
-    # after the operator re-runs it — acceptable because search against
-    # an empty graph is itself a no-op.
-    set -e
-    python manage.py migrate --noinput
-    python manage.py load_library_types
-    if ! python manage.py setup_neo4j_indexes; then
-      echo ""
-      echo "============================================================"
-      echo "NOTICE: Neo4j index creation was skipped."
-      echo ""
-      echo "This is expected on a fresh deployment — vector indexes"
-      echo "require a system embedding model with vector_dimensions set."
-      echo ""
-      echo "Seed the embedding model in the Django admin"
-      echo " (/admin/llm_manager/llmmodel/, mark one row as"
-      echo " is_system_embedding_model=True with vector_dimensions set),"
-      echo "then run:"
-      echo ""
-      echo " docker compose exec app python manage.py setup_neo4j_indexes"
-      echo ""
-      echo "Search endpoints will return empty results until this is done."
-      echo "============================================================"
-      echo ""
-    fi
-    ;;
-
-  shell)
-    # Drop into the management shell for ad-hoc work.
-    exec python manage.py shell
-    ;;
-
-  *)
-    # Fall through: run whatever was passed (e.g. `manage.py <cmd>`).
-    exec "$@"
-    ;;
-esac
@@ -11,8 +11,13 @@ the stderr of a different container.
 
 The probe is deliberately best-effort: it cannot crash the process even if
 Neo4j is unreachable, because a transient DB blip on startup should not
-take down the whole app. The `init` sidecar is the hard gate; this is the
-second line of defence for long-running containers.
+take down the whole app. Nothing hard-gates on the vector indexes — the
+``init`` sidecar only runs ``migrate`` + ``load_library_types`` (vector
+indexes cannot be created before the system embedding model is configured
+in the admin, which is a manual step after first boot). This probe is the
+only way an operator learns that the manual
+``setup_neo4j_indexes`` step was skipped or fell out of sync with the
+current system model.
 """
 
 import logging
@@ -149,9 +154,10 @@ def _run_startup_probe():
     for name in _EXPECTED_VECTOR_INDEXES:
         if name not in present:
             logger.error(
-                "Neo4j vector index '%s' is missing. Run "
-                "'docker compose run --rm init' (or 'python manage.py "
-                "setup_neo4j_indexes') to rebuild.",
+                "Neo4j vector index '%s' is missing. Configure the system "
+                "embedding model in /admin/llm_manager/llmmodel/, then run "
+                "'docker compose exec app python manage.py "
+                "setup_neo4j_indexes' to create it.",
                 name,
             )
             continue
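The best-effort probe that this docstring and hunk describe can be sketched without Django or Neo4j. This is a minimal sketch under stated assumptions: `run_probe`, `report_missing_vector_indexes`, and the `list_indexes` callable are hypothetical, `chunk_embedding_index` is the index name shown in the HTML hunk above, and `other_embedding_index` is a made-up second name. The real `_run_startup_probe` is only partly visible in the diff and may differ.

```python
import logging

logger = logging.getLogger("readiness_probe_sketch")


def report_missing_vector_indexes(expected, present):
    """Log one error per expected index that is absent and return their names."""
    missing = [name for name in expected if name not in present]
    for name in missing:
        logger.error(
            "Neo4j vector index '%s' is missing. Configure the system "
            "embedding model, then run setup_neo4j_indexes to create it.",
            name,
        )
    return missing


def run_probe(expected, list_indexes):
    """Best-effort wrapper: a failing lookup is logged, never raised."""
    try:
        present = set(list_indexes())
    except Exception:
        # A transient database problem at startup must not crash the app.
        logger.warning("Could not list Neo4j indexes; skipping probe.", exc_info=True)
        return None
    return report_missing_vector_indexes(expected, present)


# list_indexes is injected so the sketch runs without a database.
expected = ["chunk_embedding_index", "other_embedding_index"]
print(run_probe(expected, lambda: ["chunk_embedding_index"]))
# prints ['other_embedding_index']
```

Returning `None` for "could not check" and a list for "checked" keeps an unreachable database distinguishable from a healthy one with no missing indexes.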