docs(readme): document operations + Daedalus integration endpoints

Adds a "Running Mnemosyne" section with the three commands needed to operate the system: Django web app (gunicorn), MCP server (uvicorn on :22091), and Celery worker, with notes on the embedding queue that the Daedalus ingest task depends on.

Adds the Ouranos host map (Portia / Ariel / Oberon / Nyx / Memcached), one-time setup commands (migrate, setup_neo4j_indexes, load_library_types), the Daedalus integration endpoints table, and the two new library types (business, finance) in the existing Library Types table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 06:27:46 -04:00
parent 5527cf6bdb
commit 2a8a3d75b4
1 changed files with 85 additions and 14 deletions
--- a/README.md
+++ b/README.md
@@ -37,11 +37,14 @@ This **content-type awareness** flows through every layer: chunking strategy, em
 | Library | Example Content | Multimodal? | Graph Relationships |
 |---------|----------------|-------------|-------------------|
 | **Fiction** | Novels, short stories | Cover art | Author → Book → Character → Theme |
+| **Nonfiction** | History, biography, science writing | Photos, charts | Author → Work → Topic → Person/Place |
 | **Technical** | Textbooks, manuals, docs | Diagrams, screenshots | Product → Manual → Section → Procedure |
 | **Music** | Lyrics, liner notes | Album artwork | Artist → Album → Track → Genre |
 | **Film** | Scripts, synopses | Stills, posters | Director → Film → Scene → Actor |
 | **Art** | Descriptions, catalogs | The artwork itself | Artist → Piece → Style → Movement |
-| **Journals** | Personal entries | Photos | Date → Entry → Topic → Person/Place |
+| **Journal** | Personal entries, plans, observations | Photos | Date → Entry → Topic → Person/Place |
+| **Business** | Proposals, marketing, strategy | Logos, charts | Client → Engagement → Deliverable |
+| **Finance** | Statements, tax, market commentary | Charts, statement scans | Account → Instrument → Period |

 ## Search Pipeline

@@ -55,32 +58,100 @@ Query → Vector Search (Neo4j) + Graph Traversal (Cypher) + Full-Text Search

 Mnemosyne's RAG pipeline architecture is inspired by [Spelunker](https://git.helu.ca/r/spelunker), an enterprise RFP response platform. The proven patterns — hybrid search, two-stage RAG (responder + reviewer), citation-based retrieval, and async document processing — are carried forward and enhanced with multimodal capabilities and knowledge graph relationships.

-## Running Celery Workers
+## Running Mnemosyne

-Mnemosyne uses Celery with RabbitMQ for async document embedding. From the `mnemosyne/` directory:
+Mnemosyne runs as three cooperating processes: the Django web app (REST API + admin), the MCP server (LLM-facing tools), and one or more Celery workers (async embedding + ingest). All three read configuration from `mnemosyne/.env` (copy from `mnemosyne/.env.example` and fill in secrets).
+
+Hosts in the Ouranos lab:
+- **Postgres** — `portia.incus:5432` (Django ORM: users, IngestJob)
+- **Neo4j** — `ariel.incus:25554` (knowledge graph + vectors)
+- **RabbitMQ** — `oberon.incus:5672` (Celery broker)
+- **MinIO** — `nyx.helu.ca:8555` (S3-compatible; `mnemosyne-content` and `daedalus` buckets)
+- **Memcached** — `127.0.0.1:11211` (task progress)
+
+### One-time setup

 ```bash
-# Development — single worker, all queues
+cd mnemosyne/
+python manage.py migrate                       # Apply Django ORM migrations
+python manage.py setup_neo4j_indexes           # Create Neo4j vector + full-text indexes
+python manage.py load_library_types            # Load LIBRARY_TYPE_DEFAULTS into Neo4j
+```
+
+### Start the web app
+
+The Django REST API serves `/library/api/*` (libraries, collections, items, search, workspaces, ingest) and Django admin. Use Gunicorn in production; `runserver` for dev.
+
+```bash
+cd mnemosyne/
+
+# Development
+python manage.py runserver 0.0.0.0:8000
+
+# Production
+gunicorn --bind 0.0.0.0:8000 --workers 3 mnemosyne.wsgi:application
+```
+
+### Start the MCP server
+
+The MCP server exposes the LLM-facing tools (`search`, `get_chunk`, `list_libraries`, `list_collections`, `list_items`, `get_health`) over Streamable HTTP at `/mcp` and SSE at `/mcp/sse`. Run it as a separate Uvicorn process on its own port so it can be reverse-proxied or scaled independently of the Django app.
+
+```bash
+cd mnemosyne/
+
+# Single command: ASGI server hosting the FastMCP app
+uvicorn mcp_server.asgi:app --host 0.0.0.0 --port 22091 --workers 1
+```
+
+The `mcp_server/asgi.py` module mounts FastMCP at `/mcp` (Streamable HTTP) and `/mcp/sse` (SSE), and exposes a `/mcp/health` JSON probe for HAProxy/Pallas.
+
+### Start a Celery worker
+
+The commands below start a single worker that consumes all queues (development), followed by dedicated per-queue workers for production. Daedalus depends on the `embedding` queue, where its ingest task runs.
+
+```bash
+cd mnemosyne/
+
+# Development — one worker, all queues
 celery -A mnemosyne worker -l info -Q celery,embedding,batch

-# Or skip workers entirely with eager mode (.env):
-CELERY_TASK_ALWAYS_EAGER=True
+# Production — embedding queue (handles Daedalus ingest + embed_item)
+celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h
+
+# Production — batch queue (collection/library bulk operations)
+celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h
+
+# Production — default queue (LLM validation, misc)
+celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h
 ```

-**Production — separate workers:**
-```bash
-celery -A mnemosyne worker -l info -Q embedding -c 1 -n embedding@%h    # GPU-bound embedding
-celery -A mnemosyne worker -l info -Q batch -c 2 -n batch@%h            # Batch orchestration
-celery -A mnemosyne worker -l info -Q celery -c 2 -n default@%h         # LLM API validation
-```
+Daedalus's `POST /library/api/ingest/` dispatches `library.tasks.ingest_from_daedalus` to the **embedding** queue. If you only run one worker, make sure it consumes `embedding`; otherwise that task will sit in the broker unprocessed.

-**Scheduler & Monitoring:**
+To bypass workers in dev/test, set `CELERY_TASK_ALWAYS_EAGER=True` in `.env`.
+
+**Scheduler & monitoring (optional):**
 ```bash
 celery -A mnemosyne beat -l info            # Periodic task scheduler
 celery -A mnemosyne flower --port=5555      # Web monitoring UI
 ```

-See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for full details on queues, reliability settings, and task progress tracking.
+See [Phase 2: Celery Workers & Scheduler](docs/PHASE_2_EMBEDDING_PIPELINE.md#celery-workers--scheduler) for queue tuning, reliability settings, and task progress tracking.
+
+### Daedalus integration endpoints
+
+These endpoints are used by the Daedalus FastAPI backend and require HTTP Basic auth. All routes are relative to `/library/api/`:
+
+| Method | Route | Purpose |
+|--------|-------|---------|
+| POST | `/workspaces/` | Create a workspace (idempotent on `workspace_id`); body: `{workspace_id, name, library_type, description?}` |
+| GET | `/workspaces/{workspace_id}/` | Workspace status (item/chunk counts) |
+| DELETE | `/workspaces/{workspace_id}/` | Delete workspace + reachable content; preserves shared concepts |
+| POST | `/ingest/` | Queue a file for ingestion + embedding |
+| GET | `/jobs/{job_id}/` | Poll ingest job status |
+| POST | `/jobs/{job_id}/retry/` | Re-dispatch a failed job |
+| GET | `/jobs/?status=&library_uid=` | List recent jobs |
+
+See [docs/mnemosyne_integration.md](docs/mnemosyne_integration.md) for the full Daedalus contract.

 ## Architecture Note: Retrieval, Not Synthesis