docs(readme): clarify embedding model seed order for Neo4j indexes

Document that the system embedding model must be seeded before running `setup_neo4j_indexes`, since vector index dimensions are read from the system embedding model's row in `llm_manager_llmmodel`. Update Docker instructions to reflect the `init` sidecar behavior, which now runs migrations and loads library_type defaults automatically while deferring vector index creation.
2026-05-10 14:02:41 -04:00
parent bbd65b1300
commit 19e2aee91c
2 changed files with 134 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -76,10 +76,23 @@ Hosts in the Ouranos lab:
 ```bash
 cd mnemosyne/
 python manage.py migrate                       # Apply Django ORM migrations
-python manage.py setup_neo4j_indexes           # Create Neo4j vector + full-text indexes
 python manage.py load_library_types            # Load LIBRARY_TYPE_DEFAULTS into Neo4j
+# --- seed the system embedding model in /admin/llm_manager/llmmodel/ here ---
+python manage.py setup_neo4j_indexes           # Create Neo4j vector + full-text indexes
 ```

+> **Seed the embedding model before running `setup_neo4j_indexes`.** Vector
+> index dimensions are read from the row in ``llm_manager_llmmodel`` that
+> has ``is_system_embedding_model=True`` and a non-null ``vector_dimensions``.
+> There is deliberately no hardcoded fallback: an index built at the wrong
+> dimension silently breaks every search. If no such row exists, the
+> command exits non-zero with a clear error. That is also why the
+> ``docker compose`` ``init`` sidecar treats vector-index creation as
+> best-effort on first boot: the stack starts healthy, migrations and
+> library-type seed data land, and you then run
+> ``docker compose exec app python manage.py setup_neo4j_indexes`` once
+> the embedding-model row is in place.
+
 ### Start the web app

 The Django REST API serves `/library/api/*` (libraries, collections, items, search, workspaces, ingest) and Django admin. Use Gunicorn in production; `runserver` for dev.
@@ -199,14 +212,16 @@ cp .env.example .env && $EDITOR .env
 # Pull the image (or build locally with `docker compose build`)
 docker compose pull

-# DB migrations (one-shot)
-docker compose run --rm app migrate
-
-# Neo4j indexes + library_type defaults (one-shot)
-docker compose run --rm app setup
-
-# Bring the stack up
+# Bring the stack up. The `init` sidecar runs migrations and loads
+# library_type defaults automatically. Vector indexes are deferred until you
+# seed the system embedding model (see below); the sidecar logs a clear
+# notice and exits 0 either way, so the stack comes up healthy on first boot.
 docker compose up -d
+
+# Seed the system embedding model at /admin/llm_manager/llmmodel/
+# (mark one row `is_system_embedding_model=True` with `vector_dimensions`
+# set to whatever your embedding provider returns), then:
+docker compose exec app python manage.py setup_neo4j_indexes
 ```

 ### Day-to-day
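
Reviewer note: the behavior this README change documents (no hardcoded fallback dimension, a non-zero exit from the command, an exit 0 from the `init` sidecar) can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's implementation. The field names `is_system_embedding_model` and `vector_dimensions` come from the diff. `LLMModelRow`, `EmbeddingModelNotSeeded`, `resolve_vector_dimensions`, `command_exit_code`, and `init_sidecar_exit_code` are hypothetical names, and the value 1024 is an arbitrary example dimension.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LLMModelRow:
    """Hypothetical stand-in for one row of llm_manager_llmmodel."""
    name: str
    is_system_embedding_model: bool = False
    vector_dimensions: Optional[int] = None


class EmbeddingModelNotSeeded(RuntimeError):
    """Raised when no usable system embedding model row exists."""


def resolve_vector_dimensions(rows: Sequence[LLMModelRow]) -> int:
    """Return the dimension of the seeded system embedding model.

    No default is returned on purpose: a guessed dimension would produce
    an index that does not match the stored embeddings.
    """
    for row in rows:
        if row.is_system_embedding_model and row.vector_dimensions is not None:
            return row.vector_dimensions
    raise EmbeddingModelNotSeeded(
        "no row with is_system_embedding_model=True and a non-null "
        "vector_dimensions; seed one before creating vector indexes"
    )


def command_exit_code(rows: Sequence[LLMModelRow]) -> int:
    """Strict mode (assumed for the management command): fail loudly."""
    try:
        resolve_vector_dimensions(rows)
    except EmbeddingModelNotSeeded as exc:
        print(f"error: {exc}")
        return 1
    return 0


def init_sidecar_exit_code(rows: Sequence[LLMModelRow]) -> int:
    """Best-effort mode (assumed for the sidecar): log a notice, exit 0."""
    if command_exit_code(rows) != 0:
        print("notice: vector indexes deferred until the embedding model is seeded")
    return 0
```

With an empty table the strict path returns 1 while the sidecar path still returns 0, which matches the first-boot behavior described in the Docker instructions: the stack comes up, and index creation is rerun by hand once a row is seeded.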