31 Commits

Author SHA1 Message Date
2af72d6e82 ci: build only on push to main, not on pull_request
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m30s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m33s
Drop the pull_request:[main] trigger so the CVE scan + Docker build runs
only when changes land on main, not when a PR is opened against it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 06:14:52 -04:00
70b1fc510b Merge pull request 'fix(tests): repair stale mock.patch targets after service refactors' (#2) from fix/stale-test-patch-targets into main
Some checks failed
CVE Scan & Docker Build / security-scan (push) Has been cancelled
CVE Scan & Docker Build / build-and-push (push) Has been cancelled
Build & Deploy Docs / build-and-deploy (push) Successful in 1m11s
Reviewed-on: #2
2026-06-18 02:01:25 +00:00
46ca2a934d Merge pull request 'feat/workspace-name-conflict-409' (#1) from feat/workspace-name-conflict-409 into main
Some checks failed
CVE Scan & Docker Build / security-scan (push) Has been cancelled
CVE Scan & Docker Build / build-and-push (push) Has been cancelled
Build & Deploy Docs / build-and-deploy (push) Has been cancelled
Reviewed-on: #1
2026-06-18 02:00:55 +00:00
dd06f923cd feat(workspaces): return 409 name_conflict instead of 500 on Library name clash
Some checks failed
CVE Scan & Docker Build / security-scan (pull_request) Successful in 3m49s
CVE Scan & Docker Build / build-and-push (pull_request) Has been cancelled
A recreate of a workspace whose Mnemosyne Library was orphaned (left behind
by a failed Daedalus delete-propagate) collides on the global Library.name
unique constraint. neomodel raised UniqueProperty unguarded, so workspace_create
500'd and ingest then 404'd forever — the queue froze silently.

Guard lib.save() and return a structured 409 with a machine code so Daedalus
can classify the failure without string-matching:
- name_conflict   — the new name-collision case
- owner_conflict, library_type_immutable — codes added to the two existing 409s

Cypher-touching paths stay covered by the manual end-to-end plan, per the
test module's stated convention.
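The guard described above can be sketched roughly as follows. This is a minimal illustration, not the project's view code: `UniqueProperty` here is a local stand-in for the neomodel exception, and `ApiResponse`, `create_library`, and the `save` callable are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


class UniqueProperty(Exception):
    """Local stand-in for neomodel's uniqueness error (assumption)."""


@dataclass
class ApiResponse:
    status: int
    body: dict


def create_library(save: Callable[[], None], name: str) -> ApiResponse:
    """Call save() and translate a uniqueness clash into a structured 409."""
    try:
        save()
    except UniqueProperty:
        # The machine-readable code lets a caller classify the failure
        # without string-matching on the human-readable detail.
        return ApiResponse(
            409,
            {"code": "name_conflict", "detail": f"Library name {name!r} already exists."},
        )
    return ApiResponse(201, {"name": name})


def clashing_save() -> None:
    """Simulate the unique-constraint violation (hypothetical helper)."""
    raise UniqueProperty("Library.name")
```

A caller can then branch on `body["code"]` rather than on the wording of the error message.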

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 20:26:43 -04:00
539d9b6c34 fix(tests): repair stale mock.patch targets after service refactors
All checks were successful
CVE Scan & Docker Build / security-scan (pull_request) Successful in 5m24s
CVE Scan & Docker Build / build-and-push (pull_request) Successful in 2m58s
Several library tests patched symbols at import paths that no longer
expose them, so they errored (AttributeError) instead of testing anything
— giving false confidence. The underlying code is correct; only the test
patch targets were stale after earlier refactors moved imports
function-local.

- test_pipeline: patch source modules (library.models.Item,
  llm_manager.models.LLMModel, library.services.parsers.DocumentParser,
  .chunker.ContentTypeChunker, .embedding_client.EmbeddingClient,
  .vision.VisionAnalyzer, .concepts.ConceptExtractor) since pipeline.py
  imports them inside methods. default_storage stays (still module-level).
- test_search_api: patch library.services.search.SearchService (the view
  imports it function-local).
- test_tasks: patch library.services.pipeline.EmbeddingPipeline (tasks.py
  imports it function-local).
- test_search_views_admin_scope: patch library.utils.neo4j_available; the
  guard moved to utils when views._all_library_uids became a thin alias.
- test_concepts: remove SampleIndexSelectionTests — _select_sample_indices
  was deleted in the document-level concept-extraction refactor (dead test).
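Why the patch target has to move when an import becomes function-local can be shown with a small, self-contained sketch. The module name `demo_services` and the `view` function are hypothetical; only `unittest.mock.patch` is the real API.

```python
import sys
import types
from unittest import mock

# Hypothetical source module that exposes a service class.
services = types.ModuleType("demo_services")


class SearchService:
    def run(self) -> str:
        return "real"


services.SearchService = SearchService
sys.modules["demo_services"] = services


def view() -> str:
    # Function-local import: the name is resolved from demo_services on every
    # call, so the caller's module has no SearchService attribute to patch.
    from demo_services import SearchService

    return SearchService().run()


def call_view_with_fake() -> str:
    # Patch the source module, which is where the lookup actually happens.
    with mock.patch("demo_services.SearchService") as fake:
        fake.return_value.run.return_value = "patched"
        return view()
```

Patching the caller's module path instead would raise `AttributeError`, which is the failure mode the commit message describes.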

Not addressed here: SearchAPIAuthTest / SearchAPIValidationTest return 302
instead of 401/400. Static analysis ruled out routing, middleware, and DRF
config; reproducing needs a running server (DB-backed). Flagged for sandbox
diagnosis — not a stale-patch issue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 20:12:46 -04:00
142e9675b5 feat(library): allow admin delete of Daedalus-managed library via shared cascade
Admin/HTML library delete previously hard-blocked workspace-scoped
(Daedalus-managed) libraries, leaving no way to clear an orphaned Library
node — e.g. one left behind when a Daedalus workspace delete failed to
propagate. A recreate of that workspace then collides on the global
Library.name unique constraint and 500s, freezing ingest.

Allow the delete behind the existing confirm warning (low risk: source
content lives in Daedalus and is recreated + re-embedded on next sync),
and route both the API and HTML delete paths through one shared cascade.

- Add library/services/library_delete.delete_library_cascade(lib), keyed on
  Library uid so it covers global and workspace-scoped libraries. It removes
  Chunks, Images/ImageEmbeddings, Items, Collections, the Library, then GCs
  orphan-only Concepts (verbatim from the API view, re-keyed workspace_id->uid).
- workspace_detail_or_delete (API) now calls the shared helper.
- library_delete (HTML) no longer blocks workspace_id libraries; it calls the
  cascade instead of a bare lib.delete() (which leaked child nodes — also a
  latent bug for global libraries with content).
- Confirm-delete template shows a caution banner for Daedalus-managed libraries.

No migration: Mnemosyne library data is in Neo4j (neomodel); no schema change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 19:37:58 -04:00
a90c6e7479 feat(metrics): add scrape-time system model health collector
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m49s
Build & Deploy Docs / build-and-deploy (push) Successful in 1m9s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m32s
Add a Prometheus custom collector that probes the four system-default
models (chat, vision, embedding, reranker) at /metrics scrape time and
emits up/down, configured, and probe-latency gauges. This complements
the ingest-pipeline counters in the Celery worker, which only move
during active ingests and cannot signal model outages on an idle queue.

- New `library/health_collector.py` registers a custom collector with
  a 55s in-process cache to avoid hammering GPU endpoints on rapid
  scrapes or across multiple gunicorn workers.
- New `library/services/model_health.py` centralises the probe logic,
  resolving system-default models via SystemSettings and dispatching
  to chat/embedding/rerank endpoints with a short timeout.
- Register the collector only in the web process (gunicorn/runserver)
  via `LibraryConfig.ready`, excluding Celery, pytest, and management
  commands to prevent duplicate registration and stray probes.
- Add unit tests covering the collector cache, metric shape, and
  per-role probe dispatch.
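The caching idea can be sketched without any Prometheus dependency. The class below is an illustration under stated assumptions: `CachedProbe`, its `probe` callable, and the injectable `clock` are hypothetical names, and the real collector may be structured differently.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedProbe:
    """Reuse the last probe result until it is older than ttl seconds."""

    def __init__(
        self,
        probe: Callable[[], Dict[str, bool]],
        ttl: float = 55.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._cached: Optional[Tuple[float, Dict[str, bool]]] = None

    def collect(self) -> Dict[str, bool]:
        now = self._clock()
        if self._cached is not None and now - self._cached[0] < self._ttl:
            # Still fresh: answer from memory, do not touch the endpoints.
            return self._cached[1]
        result = self._probe()
        self._cached = (now, result)
        return result
```

Injecting the clock keeps the cache testable without sleeping.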
2026-06-17 09:06:11 -04:00
4dde063299 fix(web): trust XFF for real client IP and correct port to 23081
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m41s
Build & Deploy Docs / build-and-deploy (push) Successful in 1m9s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m29s
- Configure nginx `set_real_ip_from` for RFC1918 ranges and enable
  `real_ip_recursive` so allowlists evaluate the true client IP
  instead of Docker's NAT gateway, preventing public exposure of
  `/metrics` and `/nginx_status`
- Update published port from 23181 to 23081 in docker-compose
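The effect of recursive real-IP resolution can be approximated in Python. This is a sketch of the idea only, not nginx's implementation: the function names are hypothetical, the trusted list is limited to the three IPv4 RFC 1918 ranges, and the behaviour when every hop is trusted is an assumption.

```python
import ipaddress

# IPv4 RFC 1918 private ranges, treated here as trusted proxies.
TRUSTED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_trusted(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED)


def real_client_ip(remote_addr: str, x_forwarded_for: str) -> str:
    """Pick the right-most untrusted hop, mimicking recursive resolution."""
    if not is_trusted(remote_addr):
        # The header is only honoured when the direct peer is a trusted proxy.
        return remote_addr
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Assumption: if every hop is trusted, fall back to the left-most one.
    return hops[0] if hops else remote_addr
```

With a Docker gateway such as `172.18.0.1` as the direct peer, an allowlist evaluated against the returned address sees the external client rather than the gateway.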
2026-06-17 06:58:36 -04:00
ec4f12d601 feat(ingest): source-bucket registry keyed on ingest source
Generalises the Daedalus-only cross-bucket fetch into a registry
(SOURCE_S3_BUCKETS) keyed on the IngestJob `source` field, so new
upstream sources (Spelunker) can ingest from their own buckets. The
ingest task now calls fetch_from_source(job.source, job.s3_key) and
falls back to "daedalus" for blank/unknown sources (backwards compatible).

Adds SPELUNKER_S3_* env vars and worker env scoping. Replaces
daedalus_s3.py with source_s3.py.
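A registry with a backwards-compatible fallback might look like the sketch below. The dictionary contents and the helper names `resolve_source` and `bucket_for` are assumptions for illustration; only `SOURCE_S3_BUCKETS` and the "daedalus" fallback come from the commit message.

```python
from typing import Dict, Optional

# Hypothetical registry contents; real values would be read from env vars.
SOURCE_S3_BUCKETS: Dict[str, Dict[str, str]] = {
    "daedalus": {"bucket": "daedalus"},
    "spelunker": {"bucket": "spelunker"},
}
DEFAULT_SOURCE = "daedalus"


def resolve_source(source: Optional[str]) -> str:
    """Map blank or unknown sources to the default registry key."""
    key = (source or "").strip()
    return key if key in SOURCE_S3_BUCKETS else DEFAULT_SOURCE


def bucket_for(source: Optional[str]) -> str:
    return SOURCE_S3_BUCKETS[resolve_source(source)]["bucket"]
```

Adding a new upstream source then becomes a one-entry change to the registry rather than a new fetch module.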

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:30:08 -04:00
75013ebfc3 refactor(concepts): document-level extraction with one chat call per item
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m20s
Build & Deploy Docs / build-and-deploy (push) Successful in 1m8s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m49s
Concept extraction was making up to 10 LLM calls per item by sampling
chunks, which produced redundant work (the same concept reappears in
multiple chunks), context-loss bugs (chunk boundaries cut mid-thought),
and on a 35B model dominated per-item wall time (~3 min/item).

Concepts are document-level semantic objects; chunks are retrieval
units. Extract once per item from the first 100KB of parsed document
text, then connect each chunk to the concepts it explicitly mentions
via case-insensitive substring match — no extra LLM calls. Drops the
sample-indices selector that the old per-chunk loop relied on.
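The chunk-to-concept linking step needs no model calls, as this small sketch suggests. The function name and the return shape (chunk index mapped to concept names) are hypothetical; the matching rule is the case-insensitive substring test described above.

```python
from typing import Dict, List


def link_chunks_to_concepts(chunks: List[str], concepts: List[str]) -> Dict[int, List[str]]:
    """Return, per chunk index, the concepts whose names appear in the chunk."""
    # Pre-fold concept names once; skip empty names, which would match everything.
    folded = [(name, name.casefold()) for name in concepts if name.strip()]
    links: Dict[int, List[str]] = {}
    for index, text in enumerate(chunks):
        haystack = text.casefold()
        links[index] = [name for name, needle in folded if needle in haystack]
    return links
```

Plain substring matching can over-match short names (for example "graph" inside "paragraph"), which may or may not be acceptable for a given corpus.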

Stage 7 is currently dormant in production because the configured
chat model is a reasoning-mode Qwen variant that returns empty content
on every call (output stuck in reasoning_content). Re-enables cleanly
once a non-reasoning instruct model is set as is_system_chat_model.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 21:52:51 -04:00
bc80d90b38 fix(llm_manager): fail Test & Discover when openai base_url is missing /v1
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m20s
Build & Deploy Docs / build-and-deploy (push) Successful in 1m7s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m35s
The OpenAI SDK used by _discover_openai_models tolerates a base_url
without /v1 (it auto-adds it for the probe), but every runtime client
(embedding_client, vision, concepts, reranker) treats base_url as the
/v1 root and appends path-only segments. A non-conforming base_url
silently passed Test & Discover and then 404'd at embed/chat/rerank
time.

Add _check_openai_v1_convention() which probes {base_url}/v1/models
when the URL doesn't end in /v1; on 200, fail the test with an
explicit "set base_url to .../v1 and re-test" message that points at
the exact bare-vs-/v1 mismatch.
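The check can be sketched as a pure function with the HTTP call injected. The signature below is an assumption (the real `_check_openai_v1_convention()` may differ), as are the `probe_status` callable and the trailing-slash handling.

```python
from typing import Callable, Optional


def check_openai_v1_convention(
    base_url: str, probe_status: Callable[[str], int]
) -> Optional[str]:
    """Return an error message if base_url looks like a bare root, else None."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        return None
    # If {base_url}/v1/models answers 200, the configured URL is the bare root
    # and runtime clients that append path-only segments would 404.
    if probe_status(f"{trimmed}/v1/models") == 200:
        return f"set base_url to {trimmed}/v1 and re-test"
    return None
```

Passing the probe in as a callable keeps the rule testable without a live endpoint.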

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 21:21:26 -04:00
7d95133c74 chore(docker): close neomodel driver on gunicorn worker exit
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m9s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m38s
2026-05-23 19:51:25 -04:00
93639188d3 feat: rework auth model with UserToken and Daedalus/Pallas integration
Some checks failed
CVE Scan & Docker Build / build-and-push (push) Has been cancelled
CVE Scan & Docker Build / security-scan (push) Has been cancelled
Build & Deploy Docs / build-and-deploy (push) Successful in 1m10s
- Rename MCPToken to UserToken across models, views, and tests
- Update URL names from mcp-token-* to token-*
- Add Daedalus/Pallas integration design doc (v2)
- Switch docker-compose to build local mnemosyne:local image via shared
  build config instead of pulling from git.helu.ca
2026-05-23 19:50:29 -04:00
735eb9de1a Reset Migrations
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m8s
Build & Deploy Docs / build-and-deploy (push) Successful in 1m12s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m24s
2026-05-23 07:14:23 -04:00
5bf9fa89cf feat: add nginx-prometheus-exporter sidecar for web metrics
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m13s
CVE Scan & Docker Build / build-and-push (push) Successful in 47s
2026-05-23 07:05:18 -04:00
8b2dcf01c1 ci(docs): rename deploy secrets/vars to CLIO_* naming
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 3m6s
Build & Deploy Docs / build-and-deploy (push) Successful in 1m13s
CVE Scan & Docker Build / build-and-push (push) Successful in 48s
2026-05-23 06:28:11 -04:00
f8a2cf0c3d docs: add Sphinx documentation build and deploy workflow
Some checks failed
CVE Scan & Docker Build / security-scan (push) Successful in 3m12s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m38s
Build & Deploy Docs / build-and-deploy (push) Failing after 1m31s
- Add Gitea Actions workflow to build and deploy docs on push to main
- Generate Sphinx reference documentation for all apps and modules
- Deploy versioned and latest docs via rsync over SSH
2026-05-23 06:11:05 -04:00
50dffe688b feat(library): register IngestJob admin and link Neo4j views
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 52s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m24s
- Add read-only ModelAdmin for IngestJob with filters, search, and
  date hierarchy for operational visibility
- Inject proxy entries into the admin index for Neo4j-backed entities
  (Libraries, Concepts, Search, Embedding pipeline) that link to
  existing CRUD views in library/views.py
- Makes library content discoverable from /admin/ without pretending
  neomodel StructuredNodes are Django ORM models
2026-05-22 23:54:10 -04:00
409da7d109 docs: replace daedalus-service basic auth with per-user DRF tokens
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 56s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m30s
2026-05-22 22:59:59 -04:00
7296b8c42f CLAUDE.md added 2026-05-22 21:17:01 -04:00
55551fe9af Docs: Mnemosyne MCP
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 50s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m39s
2026-05-21 05:55:45 -04:00
e1545139ab Bug: Another attempt at fixing static.
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m11s
CVE Scan & Docker Build / build-and-push (push) Successful in 1m23s
2026-05-17 15:47:21 -04:00
9f6176c478 feat(models): increase max_length for source and file_type fields
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m0s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m4s
Increase max_length for source and file_type fields in IngestJob model from 50 to 100.
This prevents data truncation for longer source references or file type strings.
2026-05-16 19:25:12 -04:00
f88ec30110 feat: enable environment variable overrides for static and media roots
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 50s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m25s
Update STATIC_ROOT and MEDIA_ROOT in settings.py to read from
environment variables with default fallbacks to BASE_DIR paths.
This allows flexible deployment configurations without modifying
source code for different environments.
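One way to express the override-with-fallback pattern is sketched below. The helper `resolve_root`, the `/app` base directory, and the default directory names are assumptions; the actual `settings.py` may read the variables differently.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_root(env_name: str, default: Path, environ: Mapping[str, str] = os.environ) -> Path:
    """Use the environment value when set and non-blank, else the default."""
    value = environ.get(env_name, "").strip()
    return Path(value) if value else default


BASE_DIR = Path("/app")  # hypothetical project root for the example
STATIC_ROOT = resolve_root("STATIC_ROOT", BASE_DIR / "staticfiles")
MEDIA_ROOT = resolve_root("MEDIA_ROOT", BASE_DIR / "media")
```

Treating a blank value as unset avoids an empty path when the variable is present but empty in a compose file.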
2026-05-16 19:12:20 -04:00
4fb3676204 chore(docker): migrate static and media to managed, update comments
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m11s
CVE Scan & Docker Build / build-and-push (push) Successful in 48s
The static volume is now Docker-managed, removing the need for Ansible to create the host path. Media volume comments updated to reflect S3 storage usage (USE_LOCAL_STORAGE=False) and that the volume is effectively unused in production.
2026-05-16 19:00:16 -04:00
2a45cb2622 chore: add /mcp/health filter and configure uvicorn.access logging
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 53s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m29s
Add /mcp/health to suppress paths in log_filters.py to demote health
probe logs to DEBUG level. Configure uvicorn.access logger in settings.py
to manage access logs directly instead of relying on mcp_server internal
filters. Update comments to reflect that uvicorn access is now managed
in project settings.
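A logging filter that demotes health-probe lines could look like this sketch. The class name, the `threshold` parameter, and the substring match on the message are assumptions; the project's `log_filters.py` may work differently.

```python
import logging

SUPPRESSED_PATHS = ("/mcp/health",)


class DemoteHealthProbes(logging.Filter):
    """Relabel matching records as DEBUG and drop them below the threshold."""

    def __init__(self, threshold: int = logging.INFO) -> None:
        super().__init__()
        self.threshold = threshold

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        if any(path in message for path in SUPPRESSED_PATHS):
            record.levelno = logging.DEBUG
            record.levelname = "DEBUG"
            # Filters run after the logger's own level check, so the demoted
            # record has to be dropped explicitly when DEBUG is not wanted.
            return logging.DEBUG >= self.threshold
        return True
```

Such a filter would typically be attached to the `uvicorn.access` logger or its handler through the `LOGGING` dict in settings.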
2026-05-16 18:19:58 -04:00
9629ca595d refactor(startup): move startup probe to gunicorn worker init
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 51s
CVE Scan & Docker Build / build-and-push (push) Successful in 2m48s
Move probe execution from Django app ready() to gunicorn.conf.py
Remove threading implementation to simplify startup sequence
Ensure probe runs in worker process context with proper error handling
2026-05-15 10:50:35 -04:00
a3d017a70d refactor: move startup probe to daemon thread with 10s timeout
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 1m1s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m15s
Move the _run_startup_probe logic into a separate daemon thread
within LibraryConfig.ready. This prevents indefinite blocking on
startup while maintaining a 10-second wait for the probe result.
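The daemon-thread-with-timeout approach can be sketched as follows. `run_probe_with_timeout` and its `probe` argument are hypothetical names; the 10-second default mirrors the commit message.

```python
import threading
from typing import Callable, List, Optional


def run_probe_with_timeout(probe: Callable[[], str], timeout: float = 10.0) -> Optional[str]:
    """Run probe in a daemon thread; return its result, or None on timeout."""
    result: List[str] = []

    def target() -> None:
        result.append(probe())

    # daemon=True means a hung probe cannot keep the process alive at exit.
    thread = threading.Thread(target=target, name="startup-probe", daemon=True)
    thread.start()
    thread.join(timeout)
    return result[0] if result else None
```

Startup waits at most `timeout` seconds; a probe that finishes later is simply ignored.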
2026-05-15 10:05:09 -04:00
ba3ab3d855 refactor(docker): consolidate static file init service and update ports
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 50s
CVE Scan & Docker Build / build-and-push (push) Successful in 1m1s
Remove dedicated static-init service and run collectstatic in the init sidecar instead.
Static files baked into the image are copied to /mnt/static for nginx serving on each
deployment. Also update MCP and nginx ports and refresh external service hostnames
in comments.
2026-05-14 06:31:34 -04:00
ef733cb7bf SSO Pattern update
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 51s
CVE Scan & Docker Build / build-and-push (push) Successful in 46s
2026-05-13 06:31:00 -04:00
88afd5d307 docs(auth): add SSO signup template docs and update allauth imports 2026-05-13 06:30:59 -04:00
196 changed files with 4945 additions and 1351 deletions


@@ -78,6 +78,19 @@ DAEDALUS_S3_REGION_NAME=us-east-1
 DAEDALUS_S3_USE_SSL=True
 DAEDALUS_S3_VERIFY=True
+# --- Spelunker S3 (cross-bucket reads for ingest, source="spelunker") ---
+# Consumed by: worker only
+# Spelunker scrapes web/git documents into its own bucket and posts ingest
+# requests with source="spelunker". These creds should be scoped read-only
+# to the Spelunker bucket in your secret manager.
+SPELUNKER_S3_ENDPOINT_URL=https://nyx.helu.ca:8555
+SPELUNKER_S3_ACCESS_KEY_ID=
+SPELUNKER_S3_SECRET_ACCESS_KEY=
+SPELUNKER_S3_BUCKET_NAME=spelunker
+SPELUNKER_S3_REGION_NAME=us-east-1
+SPELUNKER_S3_USE_SSL=True
+SPELUNKER_S3_VERIFY=True
 # --- Celery / RabbitMQ (Oberon) ---------------------------------------------
 # Consumed by: app (producer), worker (consumer). NOT mcp.
 # Remember to percent-encode any password characters that have meaning in a


@@ -3,8 +3,6 @@ name: CVE Scan & Docker Build
 on:
   push:
     branches: [main]
-  pull_request:
-    branches: [main]
 env:
   REGISTRY: git.helu.ca

102
.gitea/workflows/docs.yml Normal file

@@ -0,0 +1,102 @@
name: Build & Deploy Docs
on:
  push:
    branches: [main]
    paths:
      - 'mnemosyne/**'
      - 'docs/**'
      - 'pyproject.toml'
      - '.gitea/workflows/docs.yml'
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install package + docs deps
        run: |
          pip install --upgrade pip
          pip install -e ".[docs]"
      - name: Read version from pyproject.toml
        id: version
        run: |
          VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
      # ─── Failure-debug trio (REQUIRED) ─────────────────────────────────
      - name: Build HTML
        id: build_html
        run: |
          cd docs
          ./regenerate_docs.sh
        continue-on-error: true
      - name: Print Sphinx error log on failure
        if: steps.build_html.outcome == 'failure'
        run: |
          echo "=== Sphinx error log ==="
          cat /tmp/sphinx-err-*.log 2>/dev/null || echo "(no sphinx error log found)"
      - name: Fail if build failed
        if: steps.build_html.outcome == 'failure'
        run: exit 1
      # ───────────────────────────────────────────────────────────────────
      - name: Install rsync + openssh
        run: |
          apt-get update
          apt-get install -y --no-install-recommends rsync openssh-client
      - name: Configure SSH
        run: |
          mkdir -p ~/.ssh
          printf '%s\n' "${{ secrets.CLIO_DOCS_KEY }}" > ~/.ssh/id_ed25519
          chmod 600 ~/.ssh/id_ed25519
          ssh-keyscan -p ${{ vars.CLIO_PORT }} ${{ vars.CLIO_HOST }} >> ~/.ssh/known_hosts
      - name: Test SSH connectivity
        run: |
          ssh -o BatchMode=yes -o ConnectTimeout=10 \
            -p ${{ vars.CLIO_PORT }} -i ~/.ssh/id_ed25519 \
            git@${{ vars.CLIO_HOST }} "id && echo 'SSH OK'"
      - name: Rsync to versioned path
        run: |
          rsync -av --delete \
            -e "ssh -p ${{ vars.CLIO_PORT }} -i ~/.ssh/id_ed25519" \
            docs/_build/html/ \
            git@${{ vars.CLIO_HOST }}:/var/www/docs/mnemosyne/${{ steps.version.outputs.version }}/
      - name: Rsync to latest
        run: |
          rsync -av --delete \
            -e "ssh -p ${{ vars.CLIO_PORT }} -i ~/.ssh/id_ed25519" \
            docs/_build/html/ \
            git@${{ vars.CLIO_HOST }}:/var/www/docs/mnemosyne/latest/
      - name: Regenerate versions index
        run: |
          ssh -p ${{ vars.CLIO_PORT }} -i ~/.ssh/id_ed25519 git@${{ vars.CLIO_HOST }} \
          'python3 - <<PY
          import pathlib
          root = pathlib.Path("/var/www/docs/mnemosyne")
          versions = sorted(
              (p.name for p in root.iterdir() if p.is_dir()),
              reverse=True,
          )
          html = ["<!DOCTYPE html><html><head><title>Mnemosyne Docs</title></head><body>",
                  "<h1>Mnemosyne Documentation</h1><ul>"]
          for v in versions:
              html.append(f"<li><a href=\"{v}/\">{v}</a></li>")
          html.append("</ul></body></html>")
          (root / "index.html").write_text("\n".join(html))
          PY'

188
CLAUDE.md Normal file

@@ -0,0 +1,188 @@
## 🐾 Red Panda Approval™
The standard every change is judged against. Don't satisfy a checklist —
satisfy the red pandas. Ask of each change: *does this earn approval?*
1. **Fresh Migration Test** — migrations apply cleanly from an empty database.
2. **Elegant Simplicity** — no unnecessary complexity; the obvious solution, done well.
3. **Observable & Debuggable** — proper logging; failures say what broke and why.
4. **Consistent Patterns** — follows Django conventions and the patterns already in this repo.
5. **Actually Works** — passes all checks *and* serves a real user need.
Criteria 1 and 5 are **externally verifiable** — migrations apply or they
don't; checks pass or they don't. Verify them, don't assert them. Criteria
2–4 are judgement calls: when in doubt, match what the repo already does
rather than grading your own elegance.
> If a paw print isn't leading the response, the rest of this file probably
> isn't being honoured either. Lead with one. 🐾
---
## Conventions (always-on)
These are the rubric made concrete for the common case — writing models,
views, forms, templates, and queries.
### Models
- Names: singular PascalCase (`User`, `BlogPost`, `OrderItem`).
- Every model defines `__str__` and `get_absolute_url`.
- Every model has `created_at = DateTimeField(auto_now_add=True)` and
`updated_at = DateTimeField(auto_now=True)`.
- `TextChoices` for status fields.
- `related_name` on every `ForeignKey`; plural snake_case with correct
English pluralisation.
- Public-facing models: consider `UUIDField` primary key and
`is_active` for soft deletes.
### Field naming
- Foreign keys: singular, no `_id` suffix (`author`, `category`, `parent`).
- Booleans: prefixed (`is_active`, `has_permission`, `can_edit`).
- Dates: suffixed (`created_at`, `updated_at`, `published_on`).
- No abbreviations (`description`, not `desc`).
### Views
- **Function-based views exclusively.** Explicit logic over implicit
inheritance. Extract shared logic into utility functions.
- Business logic lives in service functions, not views and not `save()`.
### Forms
- `ModelForm` with an explicit `fields` list — never `__all__`, never `exclude`.
- Validate at the boundary; never trust client-side validation alone.
### Queries
- `select_related()` for FKs; `prefetch_related()` for reverse and M2M.
- No queries inside loops (N+1). No `.all()` when you need a subset.
- `.only()` / `.defer()` for large models. Comment non-obvious querysets.
### URLs & identifiers
- Public URLs use 12-char short UUIDs via `shortuuid`. Never expose
sequential IDs (enumeration risk). Internal refs may use PKs.
- Resource-based, namespaced URL names per app, trailing slashes, flat
structure preferred.
### Docstrings
- **Google style.** Document public classes, functions, methods, modules.
- Imperative one-line summary. `Args:`/`Returns:`/`Raises:` only when the
signature doesn't already convey it. Don't restate type hints in prose.
- Skip obvious one-liners and standard Django overrides.
### Code organisation
- PEP 8 import ordering (stdlib, third-party, local). Type hints on params.
- CSS and JS in external files only — no inline styles, `<style>`,
inline handlers, or `<script>` blocks.
- File length: split by domain concept past ~500 lines; hard ceiling 1000.
### Testing
- Django `TestCase` (not pytest). Separate files per module:
`test_models.py`, `test_views.py`, `test_forms.py`.
# Add to always-on Django CLAUDE.md — Conventions section
Insert this block under "Conventions (always-on)", as its own subsection.
It is the universal Django definition-of-done. It fires for *every* app,
not just registered tools.
### An app isn't done until it's reachable
`django-admin startapp` builds an **island**. A complete-from-its-own-boundary
app — models, views, urls, templates, tests all present and passing — is
still *unfinished* if nothing in the running site links to it. "It works in
isolation" is not done; **"a user can reach it from the running site" is done.**
Before reporting a new app complete, wire it into the site:
1. **`INSTALLED_APPS`** — add the app's config.
2. **Root URLconf** — `include()` the app's `urls.py` in `config/urls.py`.
An app whose URLconf isn't included has unreachable views, full stop.
3. **Navigation / discovery** — register the app so it surfaces wherever
this project expects apps to appear. This project uses an **app
registry** (see Project Setup): the app registers itself in its own
`apps.py.ready()` and the navigation template tag picks it up. Do **not**
hand-edit nav templates or central list views — they read from the
registry.
4. **Verify reachability** — confirm the app's main page actually loads
from the running site (not just that its tests pass). Per Red Panda
criterion 5, this is externally verifiable: load the page, don't assert
it works.
Why this rule exists: an LLM reasons locally and closes the visible task at
the app's own boundary. The wiring that makes an app reachable lives in
*other* files (`config/urls.py`, `INSTALLED_APPS`, the registry) with no
signal inside the new app pointing to them. Without this rule, the
near-certain result is a fully-built, completely inaccessible app. The
registry exists precisely so that "surface it" happens *inside* the app's
own boundary (a `register()` call in `ready()`) — collapsing the wiring
into the one place local reasoning will actually look.
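The registry idea can be sketched in plain Python. Everything here is hypothetical (`AppEntry`, `register`, `nav_entries`, and the field names); the project's actual registry API is not shown in this file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AppEntry:
    name: str
    label: str
    url_name: str


_REGISTRY: Dict[str, AppEntry] = {}


def register(entry: AppEntry) -> None:
    """Record an app; re-registering the same name simply overwrites it."""
    _REGISTRY[entry.name] = entry


def nav_entries() -> List[AppEntry]:
    """What a navigation template tag would iterate over, sorted by label."""
    return sorted(_REGISTRY.values(), key=lambda entry: entry.label)
```

In this sketch an app's `ready()` would call `register(AppEntry(...))`, so surfacing the app requires no edit outside the app's own package.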
> The same principle generalises beyond Django: a new route that isn't
> mounted, a CLI subcommand not added to the dispatcher, a handler not
> registered — all the same failure. Done means *connected*, not *written*.
---
## Always-on anti-patterns
The cross-cutting tripwires worth carrying everywhere. File-specific
landmines (nginx, compose, broker) are in path-scoped rules.
- **Models:** no `.get()` without handling `DoesNotExist`; no `null=True`
on `CharField`/`TextField` (use `blank=True, default=""`); always specify
`on_delete`; don't override `save()` for business logic; no
`Meta.ordering` on large tables.
- **Security:** secrets via env vars, never in `settings.py`; never commit
`.env`; never `DEBUG=True` in production; never `mark_safe()` on
user-supplied content; never disable CSRF.
- **Templates:** `{% url %}` not `{{ variable }}` for URLs; no logic in
templates; `{% csrf_token %}` in every form.
- **Imports/style:** no `import *`; no mutable default args; no bare
`except:`; don't silence linter warnings without a documented reason.
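The mutable-default-argument tripwire in the list above is easy to demonstrate. Both function names are illustrative only.

```python
from typing import List, Optional


def add_tag_buggy(tag: str, tags: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, and then shared
    # by every call that omits the argument.
    tags.append(tag)
    return tags


def add_tag(tag: str, tags: Optional[List[str]] = None) -> List[str]:
    # Use None as the sentinel and build a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Two successive calls to `add_tag_buggy` without a `tags` argument return the same list object, so the second result also contains the first tag.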
---
## Environment
- Virtual environment: `~/env/PROJECT/bin/activate` (replace PROJECT).
- `pyproject.toml` for config — no `setup.py`, no `requirements.txt`.
- Dependencies floor-pinned with ceiling (`Django>=5.2,<6.0`). Exact `==`
pins only in application lock files, never in reusable packages.
- Dev DB: SQLite. Production DB: PostgreSQL.
---
## Path-scoped rules to create (`.claude/rules/`)
These hold the landmines extracted from the standards doc. Each loads only
when its `paths` match, keeping this file lean. Frontmatter shown.
- **`nginx.md`** — `paths: ["nginx/**", "**/*.conf"]` — reverse-proxy
reference config: Docker DNS resolver + variable `proxy_pass`,
`$proxy_x_forwarded_proto` map, access-log filtering, RFC1918 allowlists
(all four ranges), `always` security headers.
- **`docker-compose.md`** — `paths: ["docker-compose*.y*ml", ".env*"]` —
per-service `environment:` scoping (no shared `env_file:`), `${VAR}`
interpolation, `.env.example` annotation convention, the `repr()` parse
diagnostic.
- **`celery-tasks.md`** — `paths: ["**/tasks.py"]` — idempotency, retry
logic, pass IDs not instances, synchronous-by-default, broker URL
percent-encoding, progress pattern `{app}:task:{task_id}:progress`.
- **`migrations.md`** — `paths: ["**/migrations/**"]` — never edit deployed
migrations; `RunPython` needs a reverse; no non-nullable field without a
default; meaningful `--name`; test forward and backward.
- **`memcached.md`** — `paths: ["**/settings.py", ".env*"]` — bind
`0.0.0.0` not localhost; container can't reach `127.0.0.1`; LAN hostname
in `KVDB_LOCATION`; key pattern `{app}:{model}:{identifier}:{field}`.
- **`frontend.md`** — `paths: ["**/templates/**", "**/static/**"]` — DaisyUI +
Tailwind for new projects / Bootstrap 5 for existing; extend
`themis/base.html`; no inline styles or scripts.
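The broker-URL percent-encoding rule listed under `celery-tasks.md` can be illustrated with the standard library. The helper name `broker_url`, the default port, and the example credentials are assumptions for this sketch.

```python
from urllib.parse import quote


def broker_url(user: str, password: str, host: str, port: int = 5672, vhost: str = "/") -> str:
    """Build an AMQP URL with every reserved character percent-encoded."""
    # safe="" also encodes "/", which would otherwise be read as a path separator.
    return (
        f"amqp://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(vhost, safe='')}"
    )
```

A password such as `p@ss/w:rd#1` would otherwise be split at the `@`, `/`, `:` or `#` when the URL is parsed.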
## Reference docs (consult on demand, don't inline)
- `docs/` gotcha writeups: broker-URL/Kombu parsing, env-file parsing
differences, nginx IP-caching. State the rule in the rule file; link the
*why* here.
- Preferred-packages list and per-app architecture: keep in `docs/`, not in
this always-on file.


@@ -85,8 +85,8 @@ RUN chmod +x /usr/local/bin/entrypoint.sh
 # matches the convention for a single-application container.
 RUN groupadd --gid 1000 mnemosyne \
     && useradd --uid 1000 --gid mnemosyne --home /app --no-create-home --shell /sbin/nologin mnemosyne \
-    && mkdir -p /app/media /app/logs \
-    && chown -R mnemosyne:mnemosyne /app
+    && mkdir -p /app/media /app/logs /mnt/static /mnt/media \
+    && chown -R mnemosyne:mnemosyne /app /mnt/static /mnt/media
 USER mnemosyne
 # The compose file overrides this per service. Default = Django web.


@@ -118,7 +118,7 @@ The MCP server exposes the LLM-facing tools (`search`, `get_chunk`, `list_librar
 cd mnemosyne/
 # Single command: ASGI server hosting the FastMCP app
-uvicorn mnemosyne.asgi:app --host 0.0.0.0 --port 22091 --workers 1
+uvicorn mnemosyne.asgi:app --host 0.0.0.0 --port 231s91 --workers 1
 ```
 The `mcp_server/asgi.py` mounts FastMCP at `/mcp` (Streamable HTTP) and `/mcp/sse` (SSE), with a `/mcp/health` JSON probe for HAProxy/Pallas.


@@ -1,32 +1,40 @@
# ============================================================================= # =============================================================================
# Mnemosyne — production deployment # Mnemosyne — production deployment
# ============================================================================= # =============================================================================
# Four services, all from the same image: # Five services:
# init — one-shot sidecar: migrate + collectstatic + load_library_types
# app — Django REST API + admin (gunicorn, port 8000) # app — Django REST API + admin (gunicorn, port 8000)
# mcp — FastMCP server (uvicorn, port 22091) # mcp — FastMCP server (uvicorn, port 8001)
# worker — Celery worker (embedding/ingest/batch queues) # worker — Celery worker (embedding/ingest/batch queues)
# web — reverse proxy, public port 23090 (nginx) # web — reverse proxy, public port 23081 (nginx)
# #
# External services (NOT spun up here): Postgres on Portia, Neo4j on Umbriel, # External services (NOT spun up here): Postgres on Despina, Neo4j on Naiad,
# RabbitMQ on Oberon, S3/MinIO on Nyx, Memcached on its own host, embedder # RabbitMQ on Thalassa, S3/MinIO on Perseus, Memcached on host. All reached
# and reranker on Nyx, smtp4dev on Oberon. All reached over the internal # over the internal network.
# 10.10.0.0/24 network.
# #
# Environment scoping # Environment scoping
# ------------------- # -------------------
# Every service lists ONLY the environment variables it actually needs, with # Every service lists ONLY the environment variables it actually needs, with
# values interpolated from the shell (typically `.env` at the project root, # values interpolated from the shell (the .env at the project root is
# which an Ansible role generates from a j2 template + vault secrets). No # generated by Ansible from a j2 template + vault secrets). No `env_file:`
# `env_file:` sharing — a compromised MCP container should not see the Celery # sharing — a compromised MCP container should not see the Celery broker
# broker creds or the LLM API encryption key, and the Celery worker has no # creds or the LLM API encryption key, and the Celery worker has no business
# business knowing `ALLOWED_HOSTS`. If you add a new Django setting, decide # knowing `ALLOWED_HOSTS`. If you add a new Django setting, decide which
# which services need it and add it only to those `environment:` blocks. # services need it and add it only to those `environment:` blocks.
#
# Static files
# ------------
# collectstatic is run by the `init` sidecar on every `up`. Static files are
# baked into the image at build time (/app/staticfiles by collectstatic in
# the Dockerfile builder stage), then copied to STATIC_ROOT (/mnt/static) by
# the init sidecar. nginx serves them directly from that bind-mounted path.
# --clear removes stale files from the previous deploy on each run.
# #
# Run: # Run:
# docker compose up -d # docker compose up -d
# #
# The `init` sidecar (below) runs Postgres migrations and library-type # The `init` sidecar runs migrate + collectstatic + load_library_types on
# seeding on every `up`. Long-running services wait for it via # every `up`. Long-running services wait for it via
# `depends_on: init: service_completed_successfully` — so a failure there # `depends_on: init: service_completed_successfully` — so a failure there
# (unreachable DB, broken migration) blocks the stack. # (unreachable DB, broken migration) blocks the stack.
# #
@@ -36,7 +44,7 @@
# in /admin/, pick an embedding API + model, and set its vector_dimensions # in /admin/, pick an embedding API + model, and set its vector_dimensions
# value. Bootstrap order is therefore: # value. Bootstrap order is therefore:
# #
# 1. docker compose up # init sidecar: migrate + load_library_types # 1. docker compose up # init sidecar: migrate + collectstatic + load_library_types
# 2. browse to /admin/ → llm_manager → configure system embedding model # 2. browse to /admin/ → llm_manager → configure system embedding model
# 3. docker compose exec app python manage.py setup_neo4j_indexes # 3. docker compose exec app python manage.py setup_neo4j_indexes
# #
@@ -61,36 +69,37 @@ x-logging: &default-logging
max-size: "10m" max-size: "10m"
max-file: "5" max-file: "5"
# -----------------------------------------------------------------------------
# Shared build config — build the Mnemosyne image locally from ./Dockerfile
# instead of pulling from git.helu.ca. All four Mnemosyne services
# (init/app/mcp/worker) share `image: mnemosyne:local`, so Compose builds
# once and reuses the resulting image across them.
# -----------------------------------------------------------------------------
x-mnemosyne-build: &mnemosyne-build
context: .
dockerfile: Dockerfile
services: services:
# ── Static-file seeder: copies /app/staticfiles into the shared volume on # ── Init sidecar: one-shot Postgres migrate + collectstatic + library-type seed. Runs on
# every `up`. Runs once and exits. Without this, the named volume is only
# seeded the first time it's empty, so static updates between deploys
# would not propagate to nginx.
static-init:
image: git.helu.ca/r/mnemosyne:latest
command: ["sh", "-c", "cp -a /app/staticfiles/. /shared-static/"]
user: "0:0"
volumes:
- mnemosyne-static:/shared-static
restart: "no"
logging: *default-logging
# ── Init sidecar: one-shot Postgres migrate + library-type seed. Runs on
# every `up` and exits. Long-running services below depend on # every `up` and exits. Long-running services below depend on
# `service_completed_successfully`, so a failure here (unreachable DB, # `service_completed_successfully`, so a failure here (unreachable DB,
# broken migration) blocks `app`/`mcp`/`worker` from starting. Both # broken migration) blocks `app`/`mcp`/`worker` from starting. All
# commands are idempotent. # commands are idempotent.
# #
# collectstatic copies static files baked into the image (/app/staticfiles)
# into STATIC_ROOT (/mnt/static) so nginx can serve them. --clear removes
# stale files from the previous deploy on each run.
#
# Neo4j vector-index setup is NOT run here — see the header comment for # Neo4j vector-index setup is NOT run here — see the header comment for
# the operator bootstrap flow. Only library_type seeding touches Neo4j # the operator bootstrap flow. Only library_type seeding touches Neo4j
# from this sidecar, and it does not depend on any embedding model. # from this sidecar, and it does not depend on any embedding model.
# #
# This sidecar only needs Postgres, Neo4j, and logging env — no S3, no # This sidecar only needs Postgres, Neo4j, static files, and logging env —
# Celery, no LLM encryption key. Keep it that way. # no S3, no Celery, no LLM encryption key. Keep it that way.
init: init:
image: git.helu.ca/r/mnemosyne:latest image: mnemosyne:local
pull_policy: always build: *mnemosyne-build
command: ["init"] command: ["init"]
environment: environment:
# Django core (settings import) # Django core (settings import)
@@ -107,15 +116,16 @@ services:
- DB_PORT=${DB_PORT} - DB_PORT=${DB_PORT}
# Neo4j (load_library_types writes Library defaults into the graph) # Neo4j (load_library_types writes Library defaults into the graph)
- NEOMODEL_NEO4J_BOLT_URL=${NEOMODEL_NEO4J_BOLT_URL} - NEOMODEL_NEO4J_BOLT_URL=${NEOMODEL_NEO4J_BOLT_URL}
# Logging (MNEMOSYNE_COMPONENT is injected by settings.py into every # Static files (collectstatic destination)
# log line as a static JSON field; Alloy on puck reads the compose - STATIC_ROOT=/mnt/static
# service name directly off the Docker label and uses that as the - USE_LOCAL_STORAGE=True
# Loki `component` label, but we still set it here so operators # Logging
# tail-ing ``docker logs`` see the same attribution)
- MNEMOSYNE_COMPONENT=init - MNEMOSYNE_COMPONENT=init
- LOGGING_LEVEL=${LOGGING_LEVEL} - LOGGING_LEVEL=${LOGGING_LEVEL}
- DJANGO_LOGGING_LEVEL=${DJANGO_LOGGING_LEVEL} - DJANGO_LOGGING_LEVEL=${DJANGO_LOGGING_LEVEL}
restart: "no" restart: "no"
volumes:
- static:/mnt/static
logging: *default-logging logging: *default-logging
@@ -124,8 +134,8 @@ services:
# Celery tasks (hence CELERY_BROKER_URL is required here too — Django is # Celery tasks (hence CELERY_BROKER_URL is required here too — Django is
# the producer, the worker is the consumer). # the producer, the worker is the consumer).
app: app:
image: git.helu.ca/r/mnemosyne:latest image: mnemosyne:local
pull_policy: always build: *mnemosyne-build
command: ["web"] command: ["web"]
environment: environment:
# Django core # Django core
@@ -136,6 +146,8 @@ services:
- CSRF_TRUSTED_ORIGINS=${CSRF_TRUSTED_ORIGINS} - CSRF_TRUSTED_ORIGINS=${CSRF_TRUSTED_ORIGINS}
- TIME_ZONE=${TIME_ZONE} - TIME_ZONE=${TIME_ZONE}
- LANGUAGE_CODE=${LANGUAGE_CODE} - LANGUAGE_CODE=${LANGUAGE_CODE}
- STATIC_ROOT=/mnt/static
- MEDIA_ROOT=/mnt/media
# Postgres (Django ORM) # Postgres (Django ORM)
- APP_DB_NAME=${APP_DB_NAME} - APP_DB_NAME=${APP_DB_NAME}
- APP_DB_USER=${APP_DB_USER} - APP_DB_USER=${APP_DB_USER}
@@ -191,12 +203,11 @@ services:
restart: unless-stopped restart: unless-stopped
logging: *default-logging logging: *default-logging
depends_on: depends_on:
static-init:
condition: service_completed_successfully
init: init:
condition: service_completed_successfully condition: service_completed_successfully
volumes: volumes:
- mnemosyne-media:/app/media - static:/mnt/static
- media:/mnt/media
healthcheck: healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/ready/"] test: ["CMD", "curl", "-f", "http://localhost:8000/ready/"]
interval: 30s interval: 30s
@@ -219,8 +230,8 @@ services:
# the S3 key here only matters if someone exploits a write path in the # the S3 key here only matters if someone exploits a write path in the
# future — keep the credential scoped to read-only in your secret manager. # future — keep the credential scoped to read-only in your secret manager.
mcp: mcp:
image: git.helu.ca/r/mnemosyne:latest image: mnemosyne:local
pull_policy: always build: *mnemosyne-build
command: ["mcp"] command: ["mcp"]
environment: environment:
# Django core (ASGI still imports settings) # Django core (ASGI still imports settings)
@@ -230,6 +241,8 @@ services:
- ALLOWED_HOSTS=${ALLOWED_HOSTS} - ALLOWED_HOSTS=${ALLOWED_HOSTS}
- TIME_ZONE=${TIME_ZONE} - TIME_ZONE=${TIME_ZONE}
- LANGUAGE_CODE=${LANGUAGE_CODE} - LANGUAGE_CODE=${LANGUAGE_CODE}
- STATIC_ROOT=/mnt/static
- MEDIA_ROOT=/mnt/media
# Postgres (McpToken lookup lives in Django ORM) # Postgres (McpToken lookup lives in Django ORM)
- APP_DB_NAME=${APP_DB_NAME} - APP_DB_NAME=${APP_DB_NAME}
- APP_DB_USER=${APP_DB_USER} - APP_DB_USER=${APP_DB_USER}
@@ -270,7 +283,7 @@ services:
init: init:
condition: service_completed_successfully condition: service_completed_successfully
volumes: volumes:
- mnemosyne-media:/app/media - media:/mnt/media
healthcheck: healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/mcp/health"] test: ["CMD", "curl", "-f", "http://localhost:8001/mcp/health"]
interval: 30s interval: 30s
@@ -286,8 +299,8 @@ services:
# backend. Does NOT need HTTP-layer settings (ALLOWED_HOSTS, CSRF, MCP auth) # backend. Does NOT need HTTP-layer settings (ALLOWED_HOSTS, CSRF, MCP auth)
# or search tuning (the worker never serves queries). # or search tuning (the worker never serves queries).
worker: worker:
image: git.helu.ca/r/mnemosyne:latest image: mnemosyne:local
pull_policy: always build: *mnemosyne-build
command: ["worker"] command: ["worker"]
environment: environment:
# Django core (Celery imports settings) # Django core (Celery imports settings)
@@ -296,6 +309,8 @@ services:
- DEBUG=${DEBUG} - DEBUG=${DEBUG}
- TIME_ZONE=${TIME_ZONE} - TIME_ZONE=${TIME_ZONE}
- LANGUAGE_CODE=${LANGUAGE_CODE} - LANGUAGE_CODE=${LANGUAGE_CODE}
- STATIC_ROOT=/mnt/static
- MEDIA_ROOT=/mnt/media
# Postgres # Postgres
- APP_DB_NAME=${APP_DB_NAME} - APP_DB_NAME=${APP_DB_NAME}
- APP_DB_USER=${APP_DB_USER} - APP_DB_USER=${APP_DB_USER}
@@ -324,6 +339,13 @@ services:
- DAEDALUS_S3_REGION_NAME=${DAEDALUS_S3_REGION_NAME} - DAEDALUS_S3_REGION_NAME=${DAEDALUS_S3_REGION_NAME}
- DAEDALUS_S3_USE_SSL=${DAEDALUS_S3_USE_SSL} - DAEDALUS_S3_USE_SSL=${DAEDALUS_S3_USE_SSL}
- DAEDALUS_S3_VERIFY=${DAEDALUS_S3_VERIFY} - DAEDALUS_S3_VERIFY=${DAEDALUS_S3_VERIFY}
- SPELUNKER_S3_ENDPOINT_URL=${SPELUNKER_S3_ENDPOINT_URL}
- SPELUNKER_S3_ACCESS_KEY_ID=${SPELUNKER_S3_ACCESS_KEY_ID}
- SPELUNKER_S3_SECRET_ACCESS_KEY=${SPELUNKER_S3_SECRET_ACCESS_KEY}
- SPELUNKER_S3_BUCKET_NAME=${SPELUNKER_S3_BUCKET_NAME}
- SPELUNKER_S3_REGION_NAME=${SPELUNKER_S3_REGION_NAME}
- SPELUNKER_S3_USE_SSL=${SPELUNKER_S3_USE_SSL}
- SPELUNKER_S3_VERIFY=${SPELUNKER_S3_VERIFY}
# Celery / RabbitMQ # Celery / RabbitMQ
- CELERY_BROKER_URL=${CELERY_BROKER_URL} - CELERY_BROKER_URL=${CELERY_BROKER_URL}
- CELERY_RESULT_BACKEND=${CELERY_RESULT_BACKEND} - CELERY_RESULT_BACKEND=${CELERY_RESULT_BACKEND}
@@ -347,7 +369,7 @@ services:
app: app:
condition: service_healthy condition: service_healthy
volumes: volumes:
- mnemosyne-media:/app/media - media:/mnt/media
healthcheck: healthcheck:
test: ["CMD", "celery", "-A", "mnemosyne", "inspect", "ping", "-d", "celery@$$HOSTNAME"] test: ["CMD", "celery", "-A", "mnemosyne", "inspect", "ping", "-d", "celery@$$HOSTNAME"]
interval: 60s interval: 60s
@@ -355,7 +377,7 @@ services:
retries: 3 retries: 3
start_period: 60s start_period: 60s
# ── Web: nginx reverse proxy, public port 23181 ──────────────────────────── # ── Web: nginx reverse proxy, public port 23081 ────────────────────────────
# No Django env — nginx only knows how to route. Public listener is # No Django env — nginx only knows how to route. Public listener is
# templated into the conf file by Ansible if the port ever needs to change. # templated into the conf file by Ansible if the port ever needs to change.
web: web:
@@ -368,23 +390,42 @@ services:
mcp: mcp:
condition: service_healthy condition: service_healthy
ports: ports:
- "23181:80" - "23081:80"
volumes: volumes:
- ./nginx/mnemosyne.conf:/etc/nginx/conf.d/default.conf:ro - ./nginx/mnemosyne.conf:/etc/nginx/conf.d/default.conf:ro
- mnemosyne-static:/var/www/static:ro - static:/var/www/static:ro
- mnemosyne-media:/var/www/media:ro - media:/var/www/media:ro
healthcheck: healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/live/"] test: ["CMD", "curl", "-f", "http://localhost/live/"]
interval: 30s interval: 30s
timeout: 5s timeout: 5s
retries: 3 retries: 3
# ── Web metrics: nginx-prometheus-exporter ─────────────────────────────────
# Scrapes the `web` container's stub_status endpoint and re-exposes it in
# Prometheus format on 9113. Prospero (Sao) scrapes this; see
# virgo/ansible/pplg/prometheus.yml.j2 → job_name: 'mnemosyne'.
# The Django /metrics endpoint (django-prometheus + custom pipeline metrics
# in mcp_server/metrics.py and library/metrics.py) is reached separately
# via nginx at /metrics — no sidecar needed for that.
web-metrics:
image: nginx/nginx-prometheus-exporter:latest
command:
- --nginx.scrape-uri
- http://web:80/nginx_status
depends_on:
web:
condition: service_started
ports:
- "23191:9113"
restart: unless-stopped
logging: *default-logging
volumes: volumes:
# Static files baked into the image at /app/staticfiles. The static-init # Static files written by collectstatic (run by the init sidecar on every
# service seeds this volume on every `up`, so nginx always serves the # `up`). Docker-managed volume — no host path needed; storage is minimal
# current image's static bundle. # and auto-regenerated on every `up`.
mnemosyne-static: static:
# Local FileSystemStorage fallback. Production uses USE_LOCAL_STORAGE=False # Media files. Production uses USE_LOCAL_STORAGE=False (S3) so this volume
# so this is mostly empty — kept for parity with dev and for any path # is effectively unused — kept so the mount points in services don't break.
# that writes to MEDIA_ROOT directly. media:
mnemosyne-media:


@@ -63,6 +63,11 @@ case "$1" in
# or library_type defaults need to change. A non-zero exit here blocks # or library_type defaults need to change. A non-zero exit here blocks
# `app`, `mcp`, and `worker` from starting. # `app`, `mcp`, and `worker` from starting.
# #
# collectstatic copies the static files baked into the image at build
# time (/app/staticfiles) into STATIC_ROOT (/mnt/static), which nginx
# serves directly. --clear removes any stale files from the previous
# deploy before copying, so deleted assets don't linger.
#
# Neo4j vector-index creation is *deliberately not* bundled here. That # Neo4j vector-index creation is *deliberately not* bundled here. That
# command (``setup_neo4j_indexes``) requires a system embedding model # command (``setup_neo4j_indexes``) requires a system embedding model
# with a configured ``vector_dimensions`` value, and that model is # with a configured ``vector_dimensions`` value, and that model is
@@ -71,7 +76,7 @@ case "$1" in
# whole stack on it would make the admin unreachable — a chicken-and- # whole stack on it would make the admin unreachable — a chicken-and-
# egg. Operator bootstrap flow: # egg. Operator bootstrap flow:
# #
# 1. docker compose up # init sidecar: migrate + load_library_types # 1. docker compose up # init sidecar: migrate + collectstatic + load_library_types
# 2. browse to admin, configure system embedding model # 2. browse to admin, configure system embedding model
# 3. docker compose exec app python manage.py setup_neo4j_indexes # 3. docker compose exec app python manage.py setup_neo4j_indexes
# #
@@ -80,6 +85,7 @@ case "$1" in
# missing so this is visible, not silent. # missing so this is visible, not silent.
set -e set -e
python manage.py migrate --noinput python manage.py migrate --noinput
python manage.py collectstatic --noinput --clear
python manage.py load_library_types python manage.py load_library_types
;; ;;


@@ -25,3 +25,28 @@ def on_starting(server):
def post_worker_init(worker): def post_worker_init(worker):
logging.getLogger("gunicorn.access").addFilter(_filter) logging.getLogger("gunicorn.access").addFilter(_filter)
    from library.apps import _run_startup_probe, _should_skip_probe

    if not _should_skip_probe():
        try:
            _run_startup_probe()
        except Exception as exc:
            logging.getLogger("library.apps").warning(
                "Startup probe crashed: %s", exc, exc_info=True
            )


def worker_exit(server, worker):
    # Neomodel lazily creates a neo4j.Driver on first cypher_query and
    # holds it for the process lifetime. Newer neo4j drivers warn (and
    # will eventually fail to clean up) if the driver is destroyed
    # without an explicit close. Close it here so each gunicorn worker
    # shuts down cleanly.
    try:
        from neomodel import db

        db.close_connection()
    except Exception as exc:
        logging.getLogger("neomodel").warning(
            "Failed to close neomodel driver on worker exit: %s", exc
        )


@@ -367,9 +367,12 @@ Mnemosyne validates the JWT against `MCPSigningKey` keyed by `kid`.
## 7. REST API — Mnemosyne team lifecycle ## 7. REST API — Mnemosyne team lifecycle
All endpoints live under `/mcp_server/api/teams/` and are protected All endpoints live under `/mcp_server/api/teams/` and are authenticated
by the existing `daedalus-service` HTTP Basic account (same auth as as the Mnemosyne user the team belongs to via a per-user DRF token
`/library/api/workspaces/` and `/library/api/ingest/`). (`Authorization: Token <key>`, surfaced on `/profile/settings/`). Each
team has an `owner` FK; non-owners receive 404 (never 403) so a team's
existence isn't disclosed across users. `/library/api/workspaces/` and
`/library/api/ingest/` use the same per-user auth model.
### 7.1 `POST /mcp_server/api/teams/` ### 7.1 `POST /mcp_server/api/teams/`
Create a team. Create a team.
@@ -733,7 +736,8 @@ escape hatch for hard compartmentalization.
* `TeamWorkspaceAssignment` PUT is idempotent and replaces, not * `TeamWorkspaceAssignment` PUT is idempotent and replaces, not
unions. unions.
* `/mcp_server/api/teams/` endpoints: create, delete, rotate, * `/mcp_server/api/teams/` endpoints: create, delete, rotate,
workspaces PUT, all authenticated as `daedalus-service`. workspaces PUT, all authenticated with a per-user DRF token and
scoped to the team's `owner` (non-owner requests return 404).
### 14.2 Daedalus test surface ### 14.2 Daedalus test surface
* `on_pallas_registered` populates `team_jwt_encrypted` and transitions * `on_pallas_registered` populates `team_jwt_encrypted` and transitions


@@ -0,0 +1,658 @@
# Daedalus ↔ Pallas ↔ Mnemosyne Integration — v2
**Status:** Approved design — supersedes
[`DAEDALUS_PALLAS_INTEGRATION_v1.md`](DAEDALUS_PALLAS_INTEGRATION_v1.md).
**Authoritative home:** `mnemosyne/docs/DAEDALUS_PALLAS_INTEGRATION_v2.md`
**Versioning:** subsequent major revisions ship as `..._v3.md` etc.
alongside this file. Cross-service docs (Daedalus, Pallas) link here.
---
## 1. Summary
This document describes the end-state authentication / authorization
model connecting three services:
* **Mnemosyne** — knowledge platform. Owns Libraries, users, and the
MCP surface third-party clients query.
* **Daedalus** — workspace + file-lifecycle UI. Registers Pallas
instances, syncs file content to Mnemosyne, drives chat. Acts on
behalf of one Mnemosyne user per Daedalus instance.
* **Pallas** — FastAgent-backed MCP host that exposes agent teams
(Kottos, Mentor, Iolaus, …) as HTTP MCP servers.
**What changed from v1:**
* **Single token model.** The two-token split in v1 (DRF `authtoken`
for REST, `MCPToken` for `/mcp/`) is gone. One model —
[`UserToken`](../mnemosyne/mcp_server/models.py) — authenticates both
surfaces, managed from one UI at `/profile/tokens/`. The DRF
`authtoken` app has been removed from `INSTALLED_APPS`.
* **Per-user authorization on the REST surface.** The Daedalus-facing
endpoints (`/library/api/*`, `/mcp_server/api/teams/*`) are no longer
open to any authenticated account. Each `Team` has an `owner` FK and
each workspace-scoped `Library` has an `owner_username` property; the
endpoints scope by these and return 404 for non-owners. The
`daedalus-service` shared account has been retired.
* **Per-turn JWT path retired.** The legacy `iss=daedalus` JWT flow
(v1 §5.1, §6.2) is gone. Mnemosyne now only validates one JWT shape:
`typ=team`, `iss=mnemosyne`. The `_resolve_jwt_actor` service-user
fallback is also gone, and the replay cache is no longer exercised by
any live code path (see §5.1).
* **Authorization headers normalised to `Bearer`.** DRF
`TokenAuthentication` (and its `Token` keyword) is replaced by
[`UserTokenAuthentication`](../mnemosyne/mcp_server/drf_auth.py),
which accepts `Authorization: Bearer <plaintext>`. Anonymous
requests get **401 + `WWW-Authenticate: Bearer`** (RFC 7235).
Everything else in v1 — the resolved-library abstraction, team JWT
shape, Pallas's static-bearer configuration, the workspace ↔ Team
attachment model in Daedalus, agent picker UX, signing-key model — is
unchanged.
---
## 2. Motivation
v1 closed the per-turn JWT forwarding hairball by introducing static
team JWTs. v2 finishes the cleanup pass: it deletes the per-turn JWT
path entirely (now that Daedalus has migrated off it), collapses the
remaining two-token muddle into a single `UserToken` system, and tightens
the REST surface so authentication-as-user is sufficient for access
control without a shared service account.
---
## 3. Architecture
### 3.1 Services and responsibilities
| Service | Role in auth model |
|---|---|
| **Mnemosyne** | Owns Libraries, Library memberships, `UserToken`s, Teams, `TeamWorkspaceAssignment`s, signing keys. Validates bearers. Resolves every authenticated request to a Library set. |
| **Daedalus** | Control plane. Registers Pallas instances as Teams in Mnemosyne. Manages workspace ↔ team attachments. Stores team JWTs for copying into Pallas deployment configs. Acts as a single Mnemosyne user via a `UserToken`. |
| **Pallas** | Stateless MCP host. Holds a static team JWT in `fastagent.secrets.yaml`. No custom auth-forwarding code. |
### 3.2 Two credential types
Every authenticated request to Mnemosyne presents a Bearer token of
exactly one of these shapes:
| # | Credential | `iss` | Issuer | Lifetime | Used on | Library scope source |
|---|---|---|---|---|---|---|
| 1 | **Opaque `UserToken`** | n/a | The Mnemosyne user, via `/profile/tokens/` | Until revoked / expiry | `/mcp/` and DRF REST | MCP: `allowed_libraries`. REST: ignored (owner-scoped). |
| 2 | **Team JWT** | `mnemosyne` | Mnemosyne (`/mcp_server/api/teams/`) | 10 years | `/mcp/` only | Live DB lookup via `TeamWorkspaceAssignment → Library` |
The v1 per-turn JWT (category 2 in v1) has been retired and is no
longer accepted by `resolve_mcp_jwt`.
### 3.3 Scope split by surface
A `UserToken` carries optional `allowed_libraries` / `allowed_tools`
fields. These are honoured **only on the MCP surface** (`/mcp/`):
* **`/mcp/`** — `MCPAuthMiddleware` enforces `allowed_libraries`
(fail-closed: empty list = zero libraries) and `allowed_tools` (empty
list = any tool). This is the surface third-party clients (Claude
Desktop, Cline) use.
* **`/library/api/*`, `/mcp_server/api/teams/*`** — The DRF auth class
resolves *who* is calling. Access is gated by `Team.owner`
(mcp_server) and `Library.owner_username` (library workspaces). The
scope claims are ignored. Daedalus tokens are therefore
unrestricted; the user identity plus owner-scope is the access model.
The rationale: enforcing `allowed_libraries` on the REST endpoints
would force Daedalus to mint an effectively-unrestricted token (since
it manages the whole workspace lifecycle), which would defeat the
field. Owner-scope already encodes the right access pattern there.
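
The two scope fields have opposite empty-list semantics on `/mcp/`, which is easy to get backwards. A minimal sketch of the check, assuming a hypothetical `TokenScope` stand-in for the two `UserToken` fields (the real enforcement lives in `MCPAuthMiddleware` and may be structured differently):

```python
from dataclasses import dataclass, field


@dataclass
class TokenScope:
    """Hypothetical stand-in for the scope fields carried by a UserToken."""

    allowed_libraries: list = field(default_factory=list)
    allowed_tools: list = field(default_factory=list)


def library_visible(scope: TokenScope, library_uid: str) -> bool:
    # Fail-closed: an empty allowed_libraries list exposes zero libraries.
    return library_uid in scope.allowed_libraries


def tool_allowed(scope: TokenScope, tool_name: str) -> bool:
    # An empty allowed_tools list means any tool may be called.
    return not scope.allowed_tools or tool_name in scope.allowed_tools
```

On the REST surface neither function would be consulted; owner-scope decides access there.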
### 3.4 Resolved-library abstraction (MCP)
Mnemosyne's MCP auth middleware populates a single
`resolved_libraries: list[str]` per request. Downstream code (search,
get_chunk, …) only reads that list.
```
Bearer → classify → dispatch
  ├─ Opaque UserToken     → token.allowed_libraries (JSON list of UIDs)
  └─ team JWT (typ=team)  → live DB join:
                              TeamWorkspaceAssignment.workspace_id
                                → Library.workspace_id → Library.uid
          ↓
  resolved_libraries: list[str]
          ↓
  downstream tools
```
Fail-closed: empty resolution → no libraries visible.
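
The dispatch above can be sketched in plain Python. This is an illustration under stated assumptions, not the middleware itself: `OpaqueToken`, `TeamCredential`, and the in-memory `assignments` / `libraries` arguments are hypothetical stand-ins for the ORM and Neo4j lookups.

```python
from dataclasses import dataclass


@dataclass
class OpaqueToken:
    """Hypothetical: an already-authenticated UserToken."""

    allowed_libraries: list


@dataclass
class TeamCredential:
    """Hypothetical: an already-validated team JWT."""

    team_id: str


def resolve_libraries(credential, assignments, libraries):
    """Return the library UIDs visible to this credential.

    assignments: dict mapping team_id -> set of workspace_ids
    libraries:   list of (library_uid, workspace_id) pairs
    """
    if isinstance(credential, OpaqueToken):
        return list(credential.allowed_libraries or [])
    if isinstance(credential, TeamCredential):
        workspace_ids = assignments.get(credential.team_id, set())
        return [uid for uid, ws in libraries if ws in workspace_ids]
    # Unknown credential shape: fail closed.
    return []
```

Because the team branch reads `assignments` on every call, changing a team's workspaces takes effect without reissuing its JWT.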
---
## 4. Data model
### 4.1 Mnemosyne
#### `UserToken` (renamed from `MCPToken`)
[`mnemosyne/mcp_server/models.py`](../mnemosyne/mcp_server/models.py).
Per-user opaque bearer. Hashed at rest (SHA-256, 64-char hex).
```python
class UserToken(models.Model):
    user = FK(User, related_name="api_tokens")
    token_hash = CharField(64, unique=True, db_index=True)
    name = CharField(100)
    is_active = BooleanField(default=True)
    expires_at = DateTimeField(null=True, blank=True)
    last_used_at = DateTimeField(null=True, blank=True)
    allowed_tools = JSONField(default=list, blank=True)
    allowed_libraries = JSONField(default=list, blank=True)
    created_at, updated_at =
```
* Plaintext shown once at mint via
[`UserTokenManager.create_token`](../mnemosyne/mcp_server/models.py);
never persisted.
* Display masking via `get_masked_token()` returns `tok_…<hash[:8]>`.
* `allowed_*` fields apply only on `/mcp/` — see §3.3.
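
A minimal sketch of the hash-at-rest and masking behaviour described above. The function names and the use of `secrets.token_urlsafe` for the plaintext are assumptions for illustration; only the SHA-256 hex digest and the `tok_…<hash[:8]>` mask format come from this document.

```python
import hashlib
import secrets


def hash_token(plaintext: str) -> str:
    # SHA-256 hex digest is always 64 characters, matching CharField(64).
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def mint_token() -> tuple:
    # Assumption: plaintext is a random URL-safe string. It is returned to
    # the caller once; only the hash would be persisted.
    plaintext = secrets.token_urlsafe(32)
    return plaintext, hash_token(plaintext)


def mask_token(token_hash: str) -> str:
    # Display form: a fixed prefix plus the first 8 hex chars of the hash.
    return f"tok_…{token_hash[:8]}"
```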
#### `LibraryMembership`
Unchanged from v1. Roles `owner` / `manager` / `reader` over Neo4j
Libraries (joined by `uid` string since Library is a neomodel node).
#### `Team`
v1 + new non-null `owner` FK:
```python
class Team(models.Model):
    id = UUIDField(primary_key=True, editable=False)
    name = CharField(200)
    owner = FK(User, on_delete=PROTECT, related_name="teams")
    active = BooleanField(default=True)
    active_jti = UUIDField(null=True)
    created_at, updated_at =
```
`Team.owner` is set on creation in
[`team_create`](../mnemosyne/mcp_server/api/teams.py) from
`request.user`. All other team endpoints filter by `(pk, owner=request.user)`;
non-owners receive 404, never 403, so a team's existence isn't
disclosed across users.
Soft-delete via `Team.active = False` is unchanged.
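
The owner filter can be illustrated without Django. In this sketch, `TeamRecord` and the dict-backed store are hypothetical; the point is that "missing" and "owned by someone else" collapse into the same 404.

```python
from dataclasses import dataclass


@dataclass
class TeamRecord:
    """Hypothetical in-memory stand-in for a Team row."""

    id: str
    owner: str
    active: bool = True


def lookup_team(teams: dict, team_id: str, username: str):
    """Mimic filtering by (pk, owner=request.user): return (status, team)."""
    team = teams.get(team_id)
    if team is None or team.owner != username:
        # Same response for both cases, so existence is never disclosed.
        return 404, None
    return 200, team
```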
#### `TeamWorkspaceAssignment`
Unchanged from v1. Live-queried per request; `PUT /workspaces/`
replaces the assignment set.
#### `MCPSigningKey`
Unchanged. Signs team JWTs.
#### `Library.owner_username` (new neomodel property)
[`mnemosyne/library/models.py`](../mnemosyne/library/models.py). For
workspace-scoped libraries (i.e. those with `workspace_id` set), the
Mnemosyne username of the creating user. Null for global libraries.
Indexed.
```python
owner_username = StringProperty(required=False, index=True)
```
The workspace endpoints (`/library/api/workspaces/…`) set this on
create and require `lib.owner_username == request.user.username` for
all mutations and reads; non-owners get 404 on GET/PUT and 204 on
DELETE (idempotent).
### 4.2 Daedalus (informational — managed in the Daedalus repo)
Unchanged from v1 except:
* `vault_mnemosyne_daedalus_service_password` is **gone**. Daedalus
authenticates to Mnemosyne with a `UserToken` plaintext minted at
`/profile/tokens/`, stored in whatever secret the operator wires
(suggestion: `vault_mnemosyne_user_token`).
* Daedalus's HTTP client sends `Authorization: Bearer <plaintext>` to
every Mnemosyne endpoint (`/library/api/*`, `/mcp_server/api/teams/*`,
`/mcp/`). The `Token <key>` keyword is no longer accepted anywhere.
### 4.3 Pallas
Unchanged from v1. Static `Authorization: Bearer <team-jwt>` in
`fastagent.secrets.yaml`.
---
## 5. JWT claim shapes
Only one JWT shape remains — the team JWT from v1 §5.2:
```json
{
  "iss": "mnemosyne",
  "aud": "mnemosyne",
  "sub": "team:<pallas_instance_uuid>",
  "typ": "team",
  "iat": 1715000000,
  "exp": 1976000000,
  "jti": "uuid4"
}
```
[`mnemosyne/mcp_server/teams.py:mint_team_jwt`](../mnemosyne/mcp_server/teams.py).
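
For orientation, a sketch of how a claim set of this shape could be assembled before signing. `build_team_claims` and `TEN_YEARS_SECONDS` are hypothetical names, and computing the 10-year lifetime as 3650 days is an assumption; the signing step performed by `mint_team_jwt` is deliberately left out.

```python
import time
import uuid

# Assumption: "10 years" approximated as 3650 days.
TEN_YEARS_SECONDS = 10 * 365 * 24 * 60 * 60


def build_team_claims(team_id: str, now=None) -> dict:
    """Assemble the unsigned claim set for a team JWT."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": "mnemosyne",
        "aud": "mnemosyne",
        "sub": f"team:{team_id}",
        "typ": "team",
        "iat": issued_at,
        "exp": issued_at + TEN_YEARS_SECONDS,
        # A fresh jti per mint; the server would store it as Team.active_jti.
        "jti": str(uuid.uuid4()),
    }
```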
### 5.1 Validator changes vs v1
[`mnemosyne/mcp_server/auth.py`](../mnemosyne/mcp_server/auth.py):
* `resolve_mcp_jwt` no longer accepts `iss=daedalus`. The `_JTI_CACHE`
replay cache still exists but is exercised by no live code path —
scheduled for removal in a follow-up cleanup commit.
* `_resolve_jwt_actor` resolves to `team.owner` (the Mnemosyne user
that created the team) rather than a synthetic service user. Audit
log / usage accounting now correctly attribute each turn to the
acting user.
```python
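def _resolve_jwt_actor(claims: dict):
    if claims.get("typ") != "team":
        raise MCPAuthError("Per-turn JWTs are no longer accepted; mint a team JWT.")
    # The team UUID travels in `sub` as "team:<uuid>"; there is no
    # separate team_id claim in the shape shown in §5.
    team_id = claims["sub"].removeprefix("team:")
    team = Team.objects.select_related("owner").get(pk=team_id)
    if not team.active:
        raise MCPAuthError("Team JWT references an inactive team.")
    if not team.owner.is_active:
        raise MCPAuthError("Team owner is disabled.")
    # Audit and usage accounting attribute the turn to this user.
    return team.owner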
```
---
## 6. Auth flow
### 6.1 Third-party MCP client with `UserToken`
1. Client sends `Authorization: Bearer <plaintext>` to `/mcp/`.
2. `MCPAuthMiddleware` hashes → looks up `UserToken` → validates
active/expired/user-active.
3. `resolved_libraries = list(token.allowed_libraries or [])`.
4. Fails closed if empty.
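
The four steps above can be sketched as a single lookup function. Everything here is illustrative: `resolve_user_token` and the dict-shaped token records are hypothetical, and the real middleware works against the `UserToken` model.

```python
import hashlib
from datetime import datetime, timezone


def resolve_user_token(plaintext, tokens_by_hash, now=None):
    """Return the resolved library list, or None if the bearer is rejected.

    tokens_by_hash: dict mapping SHA-256 hex digest -> record dict with keys
    is_active, user_active, expires_at (aware datetime or None),
    allowed_libraries.
    """
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    record = tokens_by_hash.get(digest)
    if record is None or not record["is_active"] or not record["user_active"]:
        return None
    expires_at = record.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return None
    # An empty list is a valid result: the caller simply sees no libraries.
    return list(record.get("allowed_libraries") or [])
```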
### 6.2 Agent team (Kottos / Mentor / Iolaus / Daedalus-chat-team)
1. Pallas sends `Authorization: Bearer <team-jwt>` to `/mcp/`.
2. Middleware validates signature, `iss=mnemosyne`, `typ=team`.
3. Loads `Team` by UUID from `sub`. Verifies `active=True` and
`jti == active_jti`.
4. Expands to `resolved_libraries` via `TeamWorkspaceAssignment` →
`Library.workspace_id`.
5. The acting user (for audit, usage accounting) is `team.owner`.
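
A sketch of steps 2, 3, and 5 as they might run after signature verification has already succeeded. `AuthError`, `check_team_claims`, and the dict-backed `teams` store are hypothetical names for illustration; signature and `exp` checks are assumed to happen in the JWT library beforehand.

```python
class AuthError(Exception):
    """Hypothetical error type; any failure here maps to a 401."""


def check_team_claims(claims: dict, teams: dict) -> str:
    """Validate decoded team-JWT claims and return the acting username.

    teams: dict mapping team UUID string -> dict with keys
    active (bool), active_jti (str), owner (str).
    """
    if claims.get("iss") != "mnemosyne":
        raise AuthError("unexpected issuer")
    if claims.get("typ") != "team":
        raise AuthError("per-turn JWTs no longer accepted")
    sub = claims.get("sub", "")
    if not sub.startswith("team:"):
        raise AuthError("malformed sub")
    team = teams.get(sub[len("team:"):])
    if team is None or not team["active"]:
        raise AuthError("team unknown or inactive")
    if team["active_jti"] != claims.get("jti"):
        # A rotated or deleted team invalidates every older JWT.
        raise AuthError("stale jti")
    return team["owner"]
```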
### 6.3 Daedalus REST control / ingest
1. Daedalus sends `Authorization: Bearer <user-token-plaintext>` to
`/library/api/*` or `/mcp_server/api/teams/*`.
2. DRF `UserTokenAuthentication` (first in the auth stack) resolves
the token to its user.
3. Endpoint scopes by `Team.owner` (mcp_server) or
`Library.owner_username` (library). Non-owner ⇒ 404.
### 6.4 Browser / web session
SessionAuthentication runs second; cookie-authenticated users hit the
DRF browsable API as themselves with no special handling.
### 6.5 Failure modes
| Condition | Response |
|---|---|
| No `Authorization` header | 401 + `WWW-Authenticate: Bearer` |
| `Authorization: Token …` (legacy DRF keyword) | 401 (not consumed by any auth class) |
| Invalid bearer plaintext | 401 + `WWW-Authenticate: Bearer` |
| Inactive / expired token | 401 |
| Disabled user | 401 |
| JWT signature invalid | 401 + `WWW-Authenticate: Bearer` |
| JWT `exp` past (+30s leeway) | 401 |
| JWT `iss` not `mnemosyne` | 401 |
| JWT `typ` not `team` (legacy per-turn) | 401 ("per-turn JWTs no longer accepted") |
| Team inactive / unknown / `jti` stale | 401 |
| Team endpoint, non-owner caller | 404 |
| Workspace endpoint, non-owner caller (GET/PUT) | 404 |
| Workspace endpoint, non-owner caller (DELETE) | 204 (idempotent) |
---
## 7. REST API — Mnemosyne team lifecycle
Endpoints under `/mcp_server/api/teams/` are authenticated as the
Mnemosyne user the team belongs to via a per-user `UserToken`
(`Authorization: Bearer <plaintext>`, minted at `/profile/tokens/`).
Each team has an `owner` FK; non-owners receive 404 (never 403) so a
team's existence isn't disclosed across users.
### 7.1 `POST /mcp_server/api/teams/`
Create a team. `Team.owner` is set to `request.user`.
**Request**
```json
{ "id": "a3f1…", "name": "Kottos" }
```
**Response 201** — fresh id
```json
{ "id": "a3f1…", "name": "Kottos", "jwt": "eyJhbGci…" }
```
**Response 200** — same id, same owner (idempotent; no new JWT issued).
**Response 409** — same id, different owner ("Team id is already in use.").
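
The three outcomes can be sketched as a small decision function. This is an illustration only: `create_team` and the in-memory `teams` dict are hypothetical, the `detail` key and the shape of the 200 body are assumptions (the text above only says no new JWT is issued), and the soft-deleted case is not covered.

```python
def create_team(teams: dict, team_id: str, name: str, caller: str):
    """Return (status, body) for POST /mcp_server/api/teams/."""
    existing = teams.get(team_id)
    if existing is None:
        # Fresh id: register the team under the caller and mint its first JWT.
        teams[team_id] = {"name": name, "owner": caller}
        return 201, {"id": team_id, "name": name, "jwt": "<minted JWT>"}
    if existing["owner"] == caller:
        # Same id, same owner: idempotent hit, no new JWT.
        return 200, {"id": team_id, "name": existing["name"]}
    # Same id, different owner.
    return 409, {"detail": "Team id is already in use."}
```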
### 7.2 `DELETE /mcp_server/api/teams/{id}/`
Soft-delete (`active=False`, clear `active_jti`). Old JWT invalid on
next call. Non-owner ⇒ 404.
### 7.3 `PUT /mcp_server/api/teams/{id}/workspaces/`
Replace the team's workspace assignment set. Idempotent.
```json
{ "workspace_ids": ["ws_abc", "ws_def"] }
```
### 7.4 `POST /mcp_server/api/teams/{id}/rotate/`
Generate a fresh `jti` and JWT, replace `active_jti`. Old JWT invalid
immediately.
**Upsert-on-missing.** If no `Team` exists for `id`, rotate creates one
owned by the caller (with `name = str(id)`) and mints its first JWT —
the operator clicks "Rotate JWT" in Daedalus settings and things just
work even if Daedalus's `provision_teams` workflow never ran for this
PallasInstance. The placeholder name can be edited via admin.
| Response | Condition |
|---|---|
| **200** + `jwt` | Same-owner id (rotates) or fresh id (upserts + mints) |
| **409** | `id` exists under a different owner (`"Team id is already in use."`) |
| **409** | Team is inactive (soft-deleted) — explicit recreate required |
The upsert path logs `team_rotate upserted_missing team_id=… owner=…`
at INFO. Surfacing this in metrics is a useful drift signal: Daedalus
and Mnemosyne fell out of sync on team provisioning.
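
A sketch of the rotate decision table, including the upsert path. The names (`rotate_team`, the `teams` dict, the logger name, the `detail` key, the inactive-team message) are hypothetical, and the order in which the owner and inactive checks run is an assumption; only the status codes, the owner-conflict message, and the log line format come from the text above.

```python
import logging
import uuid

logger = logging.getLogger("mcp_server.teams")  # logger name is an assumption


def rotate_team(teams: dict, team_id: str, caller: str):
    """Return (status, body) for POST /mcp_server/api/teams/{id}/rotate/."""
    team = teams.get(team_id)
    if team is None:
        # Upsert-on-missing: create a placeholder team owned by the caller.
        team = {"name": str(team_id), "owner": caller, "active": True, "active_jti": None}
        teams[team_id] = team
        logger.info("team_rotate upserted_missing team_id=%s owner=%s", team_id, caller)
    elif team["owner"] != caller:
        return 409, {"detail": "Team id is already in use."}
    elif not team["active"]:
        return 409, {"detail": "Team is inactive; recreate it explicitly."}
    # A fresh jti means the previous JWT no longer matches active_jti.
    team["active_jti"] = uuid.uuid4().hex
    return 200, {"jwt": f"<signed JWT carrying jti={team['active_jti']}>"}
```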
### 7.5 `GET /mcp_server/api/teams/{id}/`
Read-only detail (no JWT). Used by the Daedalus reconciler.
### 7.6 `/library/api/ingest/` and `/library/api/jobs/…`
Same owner-scope model as the workspace endpoints: every ingest write,
job read, retry, and list call is filtered by
`Library.owner_username == request.user.username` (global libraries
with null `owner_username` remain shared). Cross-user calls get 404
with the same "not registered" wording as a genuinely missing
workspace — existence is not disclosed across users. The list endpoint
silently filters; a `library_uid` the caller has no access to returns
an empty list rather than 404.
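
The difference between the detail and list behaviours can be sketched with plain dicts. Everything here is illustrative: `can_access`, `get_job`, `list_jobs`, and the response body are hypothetical, and the real endpoints filter in the database rather than in Python.

```python
NOT_FOUND = (404, {"detail": "not registered"})  # body shape is an assumption


def can_access(library: dict, username: str) -> bool:
    # Global libraries (null owner) are shared; otherwise the owner must match.
    owner = library["owner_username"]
    return owner is None or owner == username


def get_job(jobs: dict, libraries: dict, job_id: str, username: str):
    """Detail read: a cross-user job looks exactly like a missing one."""
    job = jobs.get(job_id)
    if job is None or not can_access(libraries[job["library_uid"]], username):
        return NOT_FOUND
    return 200, job


def list_jobs(jobs: dict, libraries: dict, username: str, library_uid=None):
    """List read: silently filters, so an inaccessible library_uid yields []."""
    visible = [
        job for job in jobs.values()
        if can_access(libraries[job["library_uid"]], username)
    ]
    if library_uid is not None:
        visible = [job for job in visible if job["library_uid"] == library_uid]
    return 200, visible
```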
---
## 8. Daedalus lifecycle hooks
Unchanged from v1 §8 except the HTTP client now sends
`Authorization: Bearer <UserToken-plaintext>` and Daedalus's config
exposes one `UserToken` plaintext (one per Mnemosyne user the Daedalus
instance acts on behalf of, in deployments that multiplex).
---
## 9. Operator workflows
### 9.1 Register a new Pallas deployment
Unchanged from v1 §9.1.
### 9.2 Attach a Pallas team to a workspace
Unchanged from v1 §9.2.
### 9.3 Retire a Pallas deployment
Unchanged from v1 §9.3.
### 9.4 Rotate a compromised team JWT
Unchanged from v1 §9.4.
### 9.5 Provision Mnemosyne integration on a fresh Daedalus instance
Replaces v1 §9.5 (`provision_teams`) and the deleted
`ensure_service_user` flow:
1. **Mint a `UserToken` for the Mnemosyne user** Daedalus will act as:
`/profile/tokens/add/` (UI) or
`python manage.py create_user_token --user <username> --name "Daedalus"`.
Copy the plaintext (shown once).
2. **Stage the plaintext in Daedalus's config** as the bearer for all
Mnemosyne calls.
3. **Run Daedalus's `provision_teams`** to materialize a `Team` row in
Mnemosyne for every existing `PallasInstance`.
4. **Distribute team JWTs** to each Pallas deployment as v1 §9.5
describes.
### 9.6 Issue a `UserToken` for a third-party MCP client
1. User logs in to Mnemosyne, navigates to `/profile/tokens/`, clicks
"Generate API Token".
2. (Optional) opens the "Restrictions (optional)" section to set
`allowed_tools` / `allowed_libraries` — these apply only on
`/mcp/`; for purely REST use they can stay empty.
3. Plaintext is shown once on the response page.
4. User pastes plaintext into the third-party client's config (Claude
Desktop, Cline, etc.) with `Authorization: Bearer …`.
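
For scripts and CI, attaching the token is a one-line header. A minimal sketch using only the standard library; `build_request` is a hypothetical helper, the host name is a placeholder, and no request is actually sent here.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str, method: str = "GET"):
    """Build (but do not send) a request carrying the UserToken as a bearer."""
    request = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder host and token; read the real plaintext from a secret store.
example = build_request("https://mnemosyne.example.com/", "/mcp/", "plaintext-token")
```

Passing the result to `urllib.request.urlopen` would send it; the same header works for `/mcp/` and the REST endpoints.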
The same UI and command (`create_user_token`) mint tokens for any
purpose — Daedalus, MCP clients, scripts, CI. There is no separate
"DRF token" category.
---
## 10. UX changes in Daedalus
Unchanged from v1 §10.
---
## 11. Migration
### 11.1 State at the start of v2
* Mnemosyne is not in a production deployment; migrations are reset on
schema changes and the project assumes a clean DB on the next
release.
* Daedalus has already migrated to `Authorization: Bearer <plaintext>`
and is configured to use a per-user token; the v1 DRF-token shim is
no longer used at runtime.
* No live Pallas deployments authenticate via per-turn JWT (the path
is removed).
### 11.2 Order of operations
1. **Mnemosyne v2 deploys.** New `UserTokenAuthentication`, owner-scoped
REST endpoints, retired per-turn JWT validation, removed
`authtoken` app. Operator mints a `UserToken` for Daedalus's
Mnemosyne account before deploy.
2. **Daedalus's config swap.** Operator points Daedalus at the new
`UserToken` plaintext. (If Daedalus was still sending
`Authorization: Token …`, switch to `Authorization: Bearer …` at
the same time.)
3. **Existing Teams.** None expected at the v2 cutover (migrations are
reset). If any existed, `Team.owner` would need backfill; not in
scope.
### 11.3 Rollback
Mnemosyne v2 is a coordinated cutover with Daedalus's bearer-header
swap. Rolling Mnemosyne back to v1 without rolling Daedalus back too
means Daedalus's `Authorization: Bearer …` won't be recognised on
`/library/api/*` (v1 only accepted `Token`). Plan the deploy as a
single window.
---
## 12. Deprecated / removed in v2
### Mnemosyne
* `rest_framework.authtoken` (removed from `INSTALLED_APPS`).
Generated migration drops the `authtoken_token` table on next migrate;
on a reset schema there's nothing to drop.
* `rest_framework.authentication.TokenAuthentication` and
`BasicAuthentication` (removed from
`REST_FRAMEWORK["DEFAULT_AUTHENTICATION_CLASSES"]`).
* "API Token" card on `/profile/settings/` (removed). The whole
`api_token_regenerate` view + URL are gone.
* `mcp_server.management.commands.ensure_service_user` (deleted).
* `daedalus-service` user (no longer provisioned by Mnemosyne; no
longer assumed by any endpoint).
* `MCP_JWT_SERVICE_USERNAME` setting (no longer read by
`_resolve_jwt_actor`).
* Per-turn JWT path in
[`mcp_server/auth.py`](../mnemosyne/mcp_server/auth.py) — accepted
shapes shrink to `typ=team` only. `_JTI_CACHE` is now exercised by
no live path; scheduled for cleanup.
* `MCPToken` (renamed to `UserToken`); `MCPTokenManager`,
`MCPTokenAdmin`, `MCPTokenCreateForm`, `MCPTokenEditForm` (renamed
in lockstep). The `mcp_…` masked-token prefix becomes `tok_…`.
* `create_mcp_token` management command (renamed `create_user_token`).
* `/profile/mcp-tokens/` URL prefix (renamed `/profile/tokens/`); URL
names `mcp-token-*` (renamed `token-*`).
### Daedalus
* `vault_mnemosyne_daedalus_service_password` (no longer needed; the
service user is gone).
* Any code path that distinguished DRF-`Token` from MCP-`Bearer` — one
bearer header for everything now.
### Pallas
No changes from v1.
---
## 13. Security
### 13.1 Token lifetimes
* **`UserToken`**: until revoked (user) or `expires_at`. Rotation is
manual via the `/profile/tokens/` dashboard.
* **Team JWT**: 10 years. Revocation via `Team.active`,
`Team.active_jti`, or key rotation.
### 13.2 Revocation levers
1. `PUT /teams/{id}/workspaces/` with `[]` — team sees nothing, JWT
still validates. Useful for pausing without redistributing tokens.
2. `DELETE /teams/{id}/` — team inactive, all its JWTs rejected.
3. `POST /teams/{id}/rotate/`: `active_jti` changes; leaked JWT
stops working.
4. **Revoke a `UserToken`**: `/profile/tokens/{id}/revoke/` flips
`is_active=False`; immediate effect for both `/mcp/` and REST.
5. `MCPSigningKey.retire()` — nuclear option for team JWTs.
### 13.3 At-rest protection
* `UserToken.token_hash`: SHA-256 of plaintext; plaintext never
stored.
* `MCPSigningKey.secret_hex`: 256-bit hex secret stored in Mnemosyne
DB only.
* `PallasInstance.team_jwt_encrypted`: Fernet-encrypted by Daedalus.
### 13.4 Audit attribution
Every authenticated request resolves to a real Mnemosyne user:
* Opaque `UserToken` → `token.user`.
* Team JWT → `team.owner`.
Both flow through to usage accounting (`LLMUsage`, search metrics) and
the audit log. The synthetic `daedalus-service` actor is gone; nothing
in the audit trail is attributed to a non-user account.
Notable audit events:
* `team_create created team_id=… name=…` — fresh team registered.
* `team_create idempotent_hit team_id=…` — same-owner re-POST.
* `team_create owner_conflict team_id=… caller=…` — id collision.
* `team_rotate team_id=… new_jti=…` — explicit rotation.
* `team_rotate upserted_missing team_id=… owner=…` — rotate created a
missing team on the fly. Useful drift signal: Daedalus and
Mnemosyne fell out of sync on team provisioning.
* `team_delete team_id=…` — soft-delete.
### 13.5 Isolation model
Unchanged from v1 §13.5.
---
## 14. Testing
### 14.1 Mnemosyne test surface (relevant to v2)
* `resolve_mcp_jwt` rejects `iss=daedalus` / non-`team` payloads.
* `_resolve_jwt_actor` resolves to `team.owner`; rejects per-turn JWTs
and inactive owners. See
[`test_auth.py::ResolveJWTActorTest`](../mnemosyne/mcp_server/tests/test_auth.py).
* `UserTokenAuthentication` issues 401 + `WWW-Authenticate: Bearer`
for anonymous and rejected-token cases; 200 for valid bearer; stashes
the `UserToken` on `request.auth`. See
[`test_drf_auth.py`](../mnemosyne/mcp_server/tests/test_drf_auth.py).
* `Team` endpoints scope by `owner`; cross-user GET/DELETE/PUT return
404; same-id different-owner POST/rotate returns 409. `rotate`
upserts a missing team owned by the caller. See
[`test_teams_api.py`](../mnemosyne/mcp_server/tests/test_teams_api.py).
* Ingest endpoints (`POST /library/api/ingest/`,
`GET/POST /library/api/jobs/…`) scope by `Library.owner_username`.
Cross-user writes/reads return 404; list silently filters. The
Cypher-touching paths require Neo4j, so the scoping is exercised by
the manual e2e plan in §14.3 rather than unit tests.
* `UserToken` model: hash-at-rest, `tok_…` masked prefix,
`allowed_libraries` round-trip. See
[`test_token.py`](../mnemosyne/mcp_server/tests/test_token.py),
[`test_models.py`](../mnemosyne/mcp_server/tests/test_models.py).
### 14.2 Daedalus test surface
Unchanged from v1 §14.2 except:
* HTTP client uses `Authorization: Bearer …` against every Mnemosyne
endpoint.
* Provisioning command depends on a configured `UserToken`, not the
retired `daedalus-service` Basic-auth credential.
### 14.3 Integration
* End-to-end: MCP client with `UserToken` → search scoped to
`token.allowed_libraries`.
* End-to-end: Pallas with team JWT → search scoped to team's attached
workspaces.
* End-to-end: Daedalus REST call with `UserToken` → workspace
mutation succeeds only for the owning user; cross-user attempts get
404.
* End-to-end: ingest as one user, then a *different* user attempts
`POST /library/api/ingest/`, `GET /jobs/{id}/`, `POST /jobs/{id}/retry/`
and `GET /jobs/?library_uid=<theirs>` — first three return 404, the
list returns an empty array.
* End-to-end: anonymous REST call → 401 + `WWW-Authenticate: Bearer`.
* End-to-end: `POST /mcp_server/api/teams/{fresh-uuid}/rotate/` on a
team Mnemosyne has never seen → 200 + JWT, `Team` row created with
`owner=request.user`. Second rotate on the same id → 200 with a
fresh `active_jti`. Rotate on an id owned by a different user → 409.
---
## 15. Phased delivery
| # | Phase | Surface | Status |
|---|---|---|---|
| 1 | Design v1 | [`DAEDALUS_PALLAS_INTEGRATION_v1.md`](DAEDALUS_PALLAS_INTEGRATION_v1.md) | Superseded |
| 2 | Mnemosyne core | `LibraryMembership`, `MCPToken`, `Team`, `TeamWorkspaceAssignment`, `/mcp_server/api/teams/`, team JWT mint | Implemented (v1) |
| 3 | Pallas cleanup | Remove `_fastagent_patch.py` internals | Implemented (v1) |
| 4 | Daedalus integration | Lifecycle hooks, reconciler, `provision_teams`, attached-teams UI | Implemented (v1) |
| 5 | Per-user REST authorization | `Team.owner`, `Library.owner_username`, owner-scope on all Daedalus-facing endpoints, `_resolve_jwt_actor` → `team.owner` | Implemented (v2) |
| 6 | Token consolidation | Rename `MCPToken` → `UserToken`, `UserTokenAuthentication` DRF class, drop `authtoken` + DRF Token UI, retire per-turn JWT, `Bearer`-first auth stack | Implemented (v2) |
| 7 | Documentation | This file; updates to [`mnemosyne_integration.md`](mnemosyne_integration.md) and [`deploy.md`](deploy.md) | Implemented (v2) |
---
## 16. Open items (v2)
* `_JTI_CACHE` in [`auth.py`](../mnemosyne/mcp_server/auth.py) is dead
code (the per-turn replay path is gone). Cleanup commit pending; not
blocking.
* `BasicAuthentication` is removed from the DRF default stack. If any
internal tooling relied on it, that path is now broken and will need
an explicit re-add to the relevant viewset's `authentication_classes`
rather than the global default.
---
## 17. Cross-references
* Mnemosyne MCP auth: [`mnemosyne/mcp_server/auth.py`](../mnemosyne/mcp_server/auth.py).
* Mnemosyne DRF auth class: [`mnemosyne/mcp_server/drf_auth.py`](../mnemosyne/mcp_server/drf_auth.py).
* Mnemosyne token model: [`mnemosyne/mcp_server/models.py`](../mnemosyne/mcp_server/models.py) (`UserToken`).
* Mnemosyne team REST: [`mnemosyne/mcp_server/api/teams.py`](../mnemosyne/mcp_server/api/teams.py).
* Mnemosyne workspace REST: [`mnemosyne/library/api/workspaces.py`](../mnemosyne/library/api/workspaces.py).
* Token self-service dashboard: [`mnemosyne/mcp_server/views.py`](../mnemosyne/mcp_server/views.py), [`urls.py`](../mnemosyne/mcp_server/urls.py).
* `create_user_token` management command: [`mnemosyne/mcp_server/management/commands/create_user_token.py`](../mnemosyne/mcp_server/management/commands/create_user_token.py).
* v1 design (superseded but kept for history): [`DAEDALUS_PALLAS_INTEGRATION_v1.md`](DAEDALUS_PALLAS_INTEGRATION_v1.md).

docs/Makefile Normal file

@@ -0,0 +1,22 @@
# Minimal Sphinx Makefile.

SPHINXOPTS  ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR   = source
BUILDDIR    = _build

.PHONY: help clean html livehtml Makefile

help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

clean:
	rm -rf $(BUILDDIR) $(SOURCEDIR)/reference/apps

html:
	@$(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

livehtml:
	sphinx-autobuild "$(SOURCEDIR)" "$(BUILDDIR)/html" $(SPHINXOPTS) $(O)

%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


@@ -1,4 +1,4 @@
-# SSO with Allauth & Casdoor Pattern v1.0.0
+# SSO with Allauth & Casdoor Pattern v1.02
Standardizes OIDC-based Single Sign-On using Django Allauth and Casdoor, covering adapter customization, user provisioning, group mapping, superuser protection, and configurable local-login fallback. Used by the `core` Django application.
@@ -35,6 +35,7 @@ Every SSO implementation following this pattern must provide these files:
| Local account adapter | `<app>/adapters.py` | Disable local signup, authentication logging |
| Management command | `<app>/management/commands/create_sso_groups.py` | Idempotent group + permission creation |
| Login template | `templates/account/login.html` | SSO button + conditional local login form |
| SSO signup template | `templates/socialaccount/signup.html` | Email confirmation step for first-time SSO users |
| Context processor | `<app>/context_processors.py` | Expose `CASDOOR_ENABLED` / `ALLOW_LOCAL_LOGIN` to templates |
| SSL patch (optional) | `<app>/ssl_patch.py` | Development-only SSL bypass |
@@ -194,7 +195,7 @@ The social account adapter is the core of the pattern. It handles user provision
```python ```python
from allauth.socialaccount.adapter import DefaultSocialAccountAdapter
-from allauth.exceptions import ImmediateHttpResponse
+from allauth.core.exceptions import ImmediateHttpResponse
from django.contrib.auth.models import User, Group
from django.contrib import messages
from django.shortcuts import redirect
@@ -440,6 +441,73 @@ The login template shows an SSO button when Casdoor is enabled and conditionally
---
## SSO Signup Template
When a new SSO user has no existing account, allauth redirects them to `accounts/3rdparty/signup/` to confirm their email before the account is created. Without a custom template this page renders with no styling.
Create `templates/socialaccount/signup.html` extending the project base:
```html
{% extends "<app>/base.html" %}
{% block title %}Complete Sign Up — {{ themis_app_name }}{% endblock %}
{% block content %}
<div class="flex justify-center items-center min-h-[60vh]">
  <div class="card bg-base-200 shadow-xl w-full max-w-md">
    <div class="card-body">
      <h2 class="card-title text-2xl justify-center mb-2">Complete Sign Up</h2>
      <p class="text-center text-base-content/70 mb-4">
        Confirm your email address to finish signing in with SSO.
      </p>
      {% if form.errors %}
      <div class="alert alert-error mb-4">
        <span>Please correct the errors below.</span>
      </div>
      {% endif %}
      <form method="post" action="{{ action_url }}">
        {% csrf_token %}
        <div class="form-control mb-6">
          <label class="label" for="id_email">
            <span class="label-text">Email</span>
          </label>
          <input type="email" name="email" id="id_email"
                 class="input input-bordered w-full{% if form.email.errors %} input-error{% endif %}"
                 value="{{ form.email.value|default:'' }}"
                 autocomplete="email" required>
          {% if form.email.errors %}
          <label class="label">
            <span class="label-text-alt text-error">{{ form.email.errors|join:", " }}</span>
          </label>
          {% endif %}
        </div>
        <div class="form-control mt-2">
          <button type="submit" class="btn btn-primary w-full">Complete Sign Up</button>
        </div>
      </form>
    </div>
  </div>
</div>
{% endblock %}
```
Key context variables allauth provides to this template:
| Variable | Description |
|----------|-------------|
| `form` | `SignupForm` with a single `email` field pre-populated from the OIDC claim |
| `action_url` | POST target (`/accounts/3rdparty/signup/`) — always use this, not a hard-coded path |
| `sociallogin` | The in-progress social login object (rarely needed in the template) |
> **Why this page exists:** `SOCIALACCOUNT_AUTO_SIGNUP = True` skips it when the IdP provides a valid email. It only appears when allauth cannot confirm the email (e.g. the IdP omitted it or there is a conflict with an existing account).
---
## Context Processor
Exposes SSO settings to every template:
@@ -701,7 +769,7 @@ class CasdoorAdapterTest(TestCase):
    def test_superuser_sso_login_blocked(self):
        """pre_social_login must raise ImmediateHttpResponse for superusers."""
-        from allauth.exceptions import ImmediateHttpResponse
+        from allauth.core.exceptions import ImmediateHttpResponse
        user = User.objects.create_superuser(
            'admin@example.com', 'admin@example.com', 'pass'
        )


@@ -0,0 +1,521 @@
# Sphinx Documentation Pattern v1.0.0
Standardizes how Django projects build, configure, and deploy Sphinx documentation under a single `settings.py` — using the `TESTING` env-var flag to relax required-secret checks so docs build cleanly in CI without a real `.env`.
## 🐾 Red Panda Approval™
This pattern follows Red Panda Approval standards.
---
## Why a Pattern, Not a Shared Implementation
Every Django project has its own:
- **Required env vars** — one project needs `MCP_JWT_SECRET`, another needs `SLACK_TOKEN`, a third needs neither.
- **App layout** — `apps/` vs. top-level packages; some projects ship one app, others fifteen.
- **Autodoc-poisoning attributes** — DRF projects have class-level `queryset = Model.objects.filter(...)`; pure-Django projects may not.
- **Deploy target** — different hosts, ports, paths, and SSH key names per environment.
A shared library can't paper over those differences. Instead, this pattern defines:
- **Required interface** — the four files every project must have.
- **Recommended behaviours** — what most projects should include.
- **Extension guidelines** — what to add or skip per project.
- **Standard Sphinx extension set** — for consistency across projects.
---
## Required Interface
The non-negotiable minimum every Django project must provide.
### 1. `settings.py` — TESTING-gated safe defaults
Every required env var (those without a `default=`) must have a `TESTING`-mode fallback. Read `TESTING` **first**, then branch every required `env('X')` call:
```python
# Test mode flag — read first so it can relax required-env-var checks below.
TESTING = env.bool('TESTING', default=False)
DEBUG = env.bool('DEBUG', default=False)

# In TESTING mode (unit tests, docs build) required keys fall back to safe
# dummies so the settings module imports without a real .env. In production
# they remain required — missing values fail loud.
if TESTING:
    SECRET_KEY = env('SECRET_KEY', default='testing-insecure-key')
    ALLOWED_HOSTS = env.list('ALLOWED_HOSTS', default=['testserver', 'localhost', '127.0.0.1'])
    CSRF_TRUSTED_ORIGINS = env.list('CSRF_TRUSTED_ORIGINS', default=['http://localhost'])
    # ...any other required secrets get a 'testing-insecure-*' default here
else:
    SECRET_KEY = env('SECRET_KEY')
    ALLOWED_HOSTS = env.list('ALLOWED_HOSTS')
    CSRF_TRUSTED_ORIGINS = env.list('CSRF_TRUSTED_ORIGINS')
    # ...and the production no-default form here
```
Rule: **every** required env var read in `settings.py` (anything that uses `env('X')` without `default=`) gets paired branches like above. Production fails loud on missing; TESTING falls back.
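
For projects with many required keys, the paired branches can be folded into a helper. The sketch below is one possible shape, not part of the pattern: `required_env` and `MissingSetting` are hypothetical names, and it reads a plain mapping instead of django-environ's `env` so that it runs on its own.

```python
import os


class MissingSetting(Exception):
    """Stand-in for the error a missing production secret should raise."""


def required_env(name, *, testing, testing_default, environ=None):
    """Return environ[name]; fall back to testing_default only in TESTING mode."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if testing:
        return testing_default
    # Production: fail loud on a missing value.
    raise MissingSetting(f"{name} is required outside TESTING mode")
```

In `settings.py` this would read as `SECRET_KEY = required_env('SECRET_KEY', testing=TESTING, testing_default='testing-insecure-key')`. The explicit `if TESTING:` / `else:` form remains the reference; the helper only reduces repetition.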
### 2. Database choice gated on `TESTING`
```python
if TESTING:
    # Test/docs build: in-memory SQLite. No real DB needed.
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': ':memory:',
        }
    }
elif env('APP_DB_NAME', default=None):
    # Production: PostgreSQL (or whatever the project uses)
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': env('APP_DB_NAME'),
            'USER': env('APP_DB_USER'),
            'PASSWORD': env('APP_DB_PASSWORD'),
            'HOST': env('DB_HOST'),
            'PORT': env('DB_PORT'),
        }
    }
else:
    # Local development: SQLite file
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': BASE_DIR / 'db.sqlite3',
        }
    }
```
### 3. `docs/source/conf.py` — boot Django in TESTING mode + neuter QuerySet repr
```python
import os
import sys

import django

# Adjust this path to point at your Django package directory.
sys.path.insert(0, os.path.abspath('../../<project_package>'))
os.environ.setdefault('DJANGO_SETTINGS_MODULE', '<project_package>.settings')

# Load real .env if present (local dev). In CI there is none and that's fine.
_repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))
_env_file = os.path.join(_repo_root, '.env')
if os.path.exists(_env_file):
    with open(_env_file) as _f:
        for _line in _f:
            _line = _line.strip()
            if not _line or _line.startswith('#') or '=' not in _line:
                continue
            _key, _val = _line.split('=', 1)
            os.environ.setdefault(_key.strip(), _val.strip())

# Force TESTING mode so settings.py uses its safe dummy defaults and the
# in-memory SQLite database. The docs build never serves traffic or touches
# real data, so the production "fail loud on missing secret" contract does
# not apply here.
os.environ['TESTING'] = 'true'
django.setup()

# Sphinx 9 autodoc calls repr() on every class attribute it documents.
# Django's QuerySet.__repr__ executes a SELECT against the database — which
# documentation has no business doing. Intercept object_description so
# QuerySet instances render as a static string instead.
from django.db.models.query import QuerySet  # noqa: E402
import sphinx.util.inspect as _sphinx_inspect  # noqa: E402

_orig_object_description = _sphinx_inspect.object_description


def _safe_object_description(obj, *args, **kwargs):
    if isinstance(obj, QuerySet):
        return f'<QuerySet [{obj.model.__name__}]>'
    return _orig_object_description(obj, *args, **kwargs)


_sphinx_inspect.object_description = _safe_object_description

# ── Sphinx configuration below ────────────────────────────────────────────
project = '<Project Name>'
copyright = '<year>, <Project Team>'
author = '<Project Team>'
release = '1.0'

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'sphinx.ext.napoleon',
    'sphinx.ext.intersphinx',
    'sphinx_autodoc_typehints',
    'sphinxcontrib.httpdomain',
    'sphinxcontrib.mermaid',
    'myst_parser',
]

source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'}
myst_enable_extensions = ['colon_fence', 'deflist', 'tasklist', 'attrs_inline']
myst_heading_anchors = 4

autodoc_default_options = {
    'members': True,
    'member-order': 'bysource',
    'special-members': '__init__',
    'undoc-members': True,
    'exclude-members': '__weakref__',
}
autodoc_inherit_docstrings = False
napoleon_use_ivar = True

html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
html_theme_options = {
    'navigation_depth': 4,
    'collapse_navigation': False,
    'sticky_navigation': True,
    'includehidden': True,
    'titles_only': False,
}
```
### 4. `.gitea/workflows/docs.yml` — build + failure-debug + deploy
The failure-debug trio (`continue-on-error` + log dump + explicit fail) is **required** — without it, the Sphinx `ValueError` traceback in `/tmp/sphinx-err-*.log` is invisible in the Gitea UI and the build is effectively undiagnosable.
```yaml
name: Build & Deploy Docs

on:
  push:
    branches: [main]
    paths:
      - '<project_package>/**'
      - 'docs/**'
      - 'pyproject.toml'
      - '.gitea/workflows/docs.yml'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install package + docs deps
        run: |
          pip install --upgrade pip
          pip install -e ".[docs]"

      - name: Read version from pyproject.toml
        id: version
        run: |
          VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
          echo "version=$VERSION" >> "$GITHUB_OUTPUT"

      # ─── Failure-debug trio (REQUIRED) ─────────────────────────────────
      - name: Build HTML
        id: build_html
        run: |
          cd docs
          ./regenerate_docs.sh
        continue-on-error: true

      - name: Print Sphinx error log on failure
        if: steps.build_html.outcome == 'failure'
        run: |
          echo "=== Sphinx error log ==="
          cat /tmp/sphinx-err-*.log 2>/dev/null || echo "(no sphinx error log found)"

      - name: Fail if build failed
        if: steps.build_html.outcome == 'failure'
        run: exit 1
      # ───────────────────────────────────────────────────────────────────

      - name: Install rsync + openssh
        run: |
          apt-get update
          apt-get install -y --no-install-recommends rsync openssh-client

      - name: Configure SSH
        run: |
          mkdir -p ~/.ssh
          printf '%s\n' "${{ secrets.DOCS_DEPLOY_KEY }}" > ~/.ssh/id_ed25519
          chmod 600 ~/.ssh/id_ed25519
          ssh-keyscan -p ${{ vars.DOCS_HOST_PORT }} ${{ vars.DOCS_HOST }} >> ~/.ssh/known_hosts

      - name: Test SSH connectivity
        run: |
          ssh -o BatchMode=yes -o ConnectTimeout=10 \
            -p ${{ vars.DOCS_HOST_PORT }} -i ~/.ssh/id_ed25519 \
            git@${{ vars.DOCS_HOST }} "id && echo 'SSH OK'"

      - name: Rsync to versioned path
        run: |
          rsync -av --delete \
            -e "ssh -p ${{ vars.DOCS_HOST_PORT }} -i ~/.ssh/id_ed25519" \
            docs/_build/html/ \
            git@${{ vars.DOCS_HOST }}:/var/www/docs/<project_slug>/${{ steps.version.outputs.version }}/

      - name: Rsync to latest
        run: |
          rsync -av --delete \
            -e "ssh -p ${{ vars.DOCS_HOST_PORT }} -i ~/.ssh/id_ed25519" \
            docs/_build/html/ \
            git@${{ vars.DOCS_HOST }}:/var/www/docs/<project_slug>/latest/

      - name: Regenerate versions index
        run: |
          ssh -p ${{ vars.DOCS_HOST_PORT }} -i ~/.ssh/id_ed25519 git@${{ vars.DOCS_HOST }} \
            'python3 - <<PY
          import pathlib
          root = pathlib.Path("/var/www/docs/<project_slug>")
          versions = sorted(
              (p.name for p in root.iterdir() if p.is_dir()),
              reverse=True,
          )
          html = ["<!DOCTYPE html><html><head><title><Project> Docs</title></head><body>",
                  "<h1><Project> Documentation</h1><ul>"]
          for v in versions:
              html.append(f"<li><a href=\"{v}/\">{v}</a></li>")
          html.append("</ul></body></html>")
          (root / "index.html").write_text("\n".join(html))
          PY'
```
Required Gitea secrets/variables:
- `secrets.DOCS_DEPLOY_KEY` — SSH private key authorised on the deploy host.
- `vars.DOCS_HOST` — deploy host (e.g. `docs.example.com`).
- `vars.DOCS_HOST_PORT` — SSH port (typically `22`).
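
One thing to watch in the "Regenerate versions index" step: `sorted(..., reverse=True)` compares directory names as strings, so `1.9.0` is listed ahead of `1.10.0`. If that ordering matters for a project, a numeric sort key is one option. The sketch below assumes versions are dot-separated integers and that `latest` should come first; `version_key` is a hypothetical helper, not part of the workflow above.

```python
def version_key(name: str):
    """Sort key: 'latest' first, then numeric versions, then anything else."""
    if name == "latest":
        return (2, ())
    try:
        return (1, tuple(int(part) for part in name.split(".")))
    except ValueError:
        # Non-numeric directory names sort last, alphabetically among themselves.
        return (0, (name,))


names = ["1.9.0", "latest", "1.10.0", "1.2.3"]
ordered = sorted(names, key=version_key, reverse=True)
```

The leading integer in each key keeps the three groups apart, so tuples of integers are never compared against tuples of strings.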
---
## Standard Sphinx Extensions
Use this exact extension set for consistency across projects:
```python
extensions = [
    'sphinx.ext.autodoc',        # Pull docs from Python docstrings
    'sphinx.ext.viewcode',       # "[source]" links to highlighted source
    'sphinx.ext.napoleon',       # Google / NumPy style docstring support
    'sphinx.ext.intersphinx',    # Cross-link to other projects' Sphinx docs
    'sphinx_autodoc_typehints',  # Render PEP 484 type hints in docs
    'sphinxcontrib.httpdomain',  # ".. http:get::" etc. for REST APIs
    'sphinxcontrib.mermaid',     # Mermaid diagrams in Markdown / RST
    'myst_parser',               # Markdown source files alongside RST
]
```
And the matching `pyproject.toml` extras group:
```toml
[project.optional-dependencies]
docs = [
    "sphinx",
    "sphinx-rtd-theme",
    "sphinx-autodoc-typehints",
    "sphinx-autobuild",
    "sphinxcontrib-httpdomain",
    "sphinxcontrib-mermaid",
    "myst-parser",
]
```
---
## Recommended Behaviours
Behaviours that most projects should include but are not strictly required:
- **Live rebuild during authoring** — `make livehtml` (via `sphinx-autobuild`) for hot-reload editing.
- **One-shot regen script** — `docs/regenerate_docs.sh` runs `make clean`, `sphinx-apidoc` over every app, then `make html`. Drives both local development and the CI pipeline.
- **Mermaid for diagrams** — text-based, diffable, lives in the `.md` / `.rst` source. Avoid binary diagram assets.
- **Static images in `source/_static/`** — referenced with relative paths.
- **Hand-written prose in Markdown (MyST)** alongside autogenerated reference docs in RST. The two coexist via `myst_parser` + `source_suffix`.
- **Project root `CLAUDE.md` (or equivalent) names docs as the single source of truth** — discourage parallel READMEs that drift.
---
## Pattern Variant 1: DRF / QuerySet Autodoc Poisoning
**Problem.** Sphinx 9 autodoc renders class attributes by calling `repr()` on the live object. Django's `QuerySet.__repr__` triggers `_fetch_all()`, which opens a database connection and runs a `SELECT`. For DRF viewsets like:
```python
class CurrencyViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Currency.objects.filter(is_active=True)  # ← autodoc tries to execute this
    serializer_class = CurrencySerializer
```
…the docs build crashes with `psycopg.OperationalError: failed to resolve host 'postgres'` (or whatever DB hostname is configured), even in TESTING mode where the in-memory SQLite has no tables.
**Solution.** Monkey-patch `sphinx.util.inspect.object_description` in `conf.py` to short-circuit QuerySets before `repr()` is called:
```python
from django.db.models.query import QuerySet
import sphinx.util.inspect as _sphinx_inspect

# Keep a reference to Sphinx's original implementation for the fallback path.
_orig_object_description = _sphinx_inspect.object_description

def _safe_object_description(obj, *args, **kwargs):
    # Describe QuerySets by model name so repr() never evaluates the query.
    if isinstance(obj, QuerySet):
        return f'<QuerySet [{obj.model.__name__}]>'
    return _orig_object_description(obj, *args, **kwargs)

_sphinx_inspect.object_description = _safe_object_description
```
This must run **after** `django.setup()` (so `QuerySet` can be imported) but **before** Sphinx starts processing documents.
---
## Pattern Variant 2: Settings-Driven TESTING Mode
**Problem.** Docs build needs to import `settings.py` but has no real `.env` in CI. Production-mode `env('SECRET_KEY')` calls (no default) raise `ImproperlyConfigured` and the build crashes before Sphinx even starts.
**Solution.** Read `TESTING` first in `settings.py`, then gate every required `env('X')` behind it:
```python
TESTING = env.bool('TESTING', default=False)
if TESTING:
    SECRET_KEY = env('SECRET_KEY', default='testing-insecure-key')
else:
    SECRET_KEY = env('SECRET_KEY')
```
`conf.py` flips the switch:
```python
import os
import django

os.environ['TESTING'] = 'true'  # must be set before settings.py is imported
django.setup()
```
**Bonus.** This also fixes a latent bug where `python manage.py test` would fail in any environment without `.env`. The same defaults that unblock the docs build now unblock the test suite — one mechanism, two payoffs.
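The gating rule can also be expressed as one helper instead of repeated `if`/`else` branches. The sketch below is a minimal illustration that uses only the standard library; `required` and `MissingSetting` are hypothetical names, not part of django-environ or of this project.

```python
import os


class MissingSetting(RuntimeError):
    """Raised when a required variable is absent outside TESTING mode."""


def required(name, testing_default, *, testing, environ=os.environ):
    """Return environ[name]; fall back to testing_default only in TESTING mode."""
    if name in environ:
        return environ[name]
    if testing:
        return testing_default
    raise MissingSetting(f"Set the {name} environment variable")


# Simulated CI environment: TESTING is on and no secrets are defined.
ci_env = {"TESTING": "true"}
testing = ci_env.get("TESTING", "false").lower() == "true"
secret_key = required("SECRET_KEY", "testing-insecure-key",
                      testing=testing, environ=ci_env)
```

A real value in the environment still wins over the dummy, which matches the behaviour of `env('SECRET_KEY', default=...)` in the TESTING branch.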
---
## Pattern Variant 3: Gitea Actions Deploy Workflow
The workflow has four logical phases:
1. **Setup** — checkout, Python, `pip install -e ".[docs]"`, read version from `pyproject.toml`.
2. **Build with failure visibility** — the three-step trio shown above. The `continue-on-error: true` on the build step plus `if: steps.build_html.outcome == 'failure'` on the log-dump and fail steps ensures the Sphinx traceback reaches the Gitea log even when the build crashes.
3. **SSH setup** — write the deploy key to `~/.ssh/id_ed25519`, scan the host into `known_hosts`, verify connectivity.
4. **Deploy** — rsync to `/var/www/docs/<project>/<version>/`, rsync to `…/latest/`, regenerate the versions index page on the remote host via a heredoc Python script.
The deploy host is expected to serve `/var/www/docs/` over HTTPS via nginx or similar. Each pushed version gets its own directory; `latest/` is a copy of the most recent build. The versions index lists every directory alphabetically.
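The versions index step could look roughly like the sketch below. It is not the workflow's actual heredoc script; the function names and the HTML layout are assumptions made for illustration.

```python
import html
from pathlib import Path


def build_versions_index(docs_root: Path) -> str:
    """Return an HTML page linking every version directory, sorted by name."""
    versions = sorted(p.name for p in docs_root.iterdir() if p.is_dir())
    items = "\n".join(
        f'  <li><a href="{html.escape(v, quote=True)}/">{html.escape(v)}</a></li>'
        for v in versions
    )
    return f"<!doctype html>\n<title>Versions</title>\n<ul>\n{items}\n</ul>\n"


def write_versions_index(docs_root: Path) -> Path:
    """Write index.html next to the version directories and return its path."""
    index = docs_root / "index.html"
    index.write_text(build_versions_index(docs_root), encoding="utf-8")
    return index
```

Plain string sorting places `1.10.0` before `1.2.0`. If that ordering matters, a version-aware sort key would be needed.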
---
## Domain Extension Examples
### Project without DRF / class-level QuerySets
If your project has no `queryset = Model.objects.filter(...)` attributes at module load time, the `_safe_object_description` monkey-patch is unnecessary. You can omit it. The `TESTING=true` switch is still required because settings.py still has required env vars.
### Project with extra required secrets
Add each extra key to the TESTING branch in `settings.py`:
```python
if TESTING:
    SECRET_KEY = env('SECRET_KEY', default='testing-insecure-key')
    SLACK_TOKEN = env('SLACK_TOKEN', default='testing-insecure-slack')
    STRIPE_API_KEY = env('STRIPE_API_KEY', default='testing-insecure-stripe')
else:
    SECRET_KEY = env('SECRET_KEY')
    SLACK_TOKEN = env('SLACK_TOKEN')
    STRIPE_API_KEY = env('STRIPE_API_KEY')
```
No changes needed to `conf.py` — the single `TESTING=true` flip covers them all.
### Project on a non-Postgres database (MySQL, MariaDB)
No special handling needed. The `if TESTING:` branch in `settings.py` switches to in-memory SQLite regardless of what production uses. The MySQL driver is never imported during a docs build.
---
## Anti-Patterns
- **Don't load `.env.example` as a runtime fallback.** It's a documentation file with placeholder values like `DB_HOST=postgres`; those placeholders will poison the docs build by making `settings.py` believe Postgres is available.
- **Don't override `settings.DATABASES` after `django.setup()`.** Django's `ConnectionHandler.databases` is a `@cached_property` populated during app loading; mutating `settings.DATABASES` afterwards has no effect.
- **Don't add a separate `settings_docs.py`.** Env-var toggles are the project convention. A separate settings module fragments the config surface and forces every dev to remember which settings file applies in which context.
- **Don't hand-edit `docs/source/reference/apps/`.** That tree is regenerated by `sphinx-apidoc` on every CI run. Hand-edits get overwritten.
- **Don't suppress build errors in CI without dumping `/tmp/sphinx-err-*.log` first.** Sphinx writes its full traceback there and nowhere else; without the dump, the Gitea UI shows a one-line `ValueError` with no useful context.
- **Don't use `os.environ.setdefault('TESTING', 'true')` in `conf.py`.** A user with `TESTING=false` in their local `.env` will see the setdefault skipped and hit production-mode behaviour during docs build. Use plain `os.environ['TESTING'] = 'true'` so it always wins.
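The `setdefault` pitfall in the last item is easy to reproduce. The sketch below uses a plain dict in place of `os.environ` (which has the same mapping methods) so that it does not touch the real process environment; the two function names are illustrative only.

```python
def flip_with_setdefault(environ):
    """setdefault only writes when the key is missing."""
    environ.setdefault("TESTING", "true")
    return environ["TESTING"]


def flip_with_assignment(environ):
    """Plain assignment always overrides the existing value."""
    environ["TESTING"] = "true"
    return environ["TESTING"]


local_env = {"TESTING": "false"}               # a developer's local .env value
print(flip_with_setdefault(dict(local_env)))   # false: docs build runs in production mode
print(flip_with_assignment(dict(local_env)))   # true: docs build runs in TESTING mode
```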
---
## Settings
Document the `TESTING` env var contract:
```python
# settings.py
TESTING = env.bool('TESTING', default=False)
# When true, gates safe-default branches for:
# - Required secrets (SECRET_KEY and any other env('X') with no default)
# - Required lists (ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS)
# - DATABASES → in-memory SQLite
# - CACHES → dummy backend
# - DRF throttling → disabled
# - MIGRATION_MODULES → disabled (no DB schema)
# - PASSWORD_HASHERS → fast hashers
# - LOGGING → minimal
#
# Set true for: pytest, manage.py test, docs build.
# Set false (or unset) for: production, local dev with real services.
```
---
## Testing
Four verification recipes every project should run before pushing.
### 1. Local build with real `.env`
```bash
cd docs
make clean && make html
```
Expected: `build succeeded.` with zero warnings. Open `_build/html/index.html` to spot-check rendering.
### 2. CI simulation (no `.env`)
```bash
mv .env .env.bak
cd docs && make clean && make html
cd .. && mv .env.bak .env
```
Expected: `build succeeded.` again. `settings.py` uses TESTING-mode dummies; the in-memory SQLite has no tables, but autodoc never queries it because the monkey-patch short-circuits the QuerySet `repr()`.
### 3. Latent test-suite bug check
```bash
mv .env .env.bak
python manage.py test --keepdb 2>&1 | head -5
mv .env.bak .env
```
Expected: tests start running normally (not `ImproperlyConfigured: Set the SECRET_KEY environment variable`). This confirms the TESTING-mode defaults are wired into `settings.py` correctly — the docs build and the test suite share the same fallback mechanism.
### 4. CI dry-run (Gitea Actions)
Push to a feature branch. The workflow's failure-debug trio means any crash surfaces with a full traceback in the Gitea Actions log. Read the trace, fix the cause, push again.


@@ -85,21 +85,12 @@ an explicit `when: mnemosyne_first_deploy` flag.
 ```bash
 # Apply Django ORM migrations (PostgreSQL schema)
-docker compose -f /srv/mnemosyne/docker-compose.yaml \
-run --rm app migrate
+docker compose -f /srv/mnemosyne/docker-compose.yaml run --rm app migrate
 # Create Neo4j vector + full-text indexes and load library-type defaults
 docker compose -f /srv/mnemosyne/docker-compose.yaml \
 run --rm app setup
-# Create the daedalus-service user (HTTP Basic auth for ingest API)
-# Pass --password from vault; idempotent if user already exists.
-docker compose -f /srv/mnemosyne/docker-compose.yaml \
-run --rm app \
-python manage.py ensure_service_user \
---username daedalus-service \
---password "{{ vault_mnemosyne_daedalus_service_password }}"
 # Seed the MCPSigningKey used to sign long-lived Pallas team JWTs.
 # --retire-other deactivates any previously-active key. The hex
 # emitted to stdout is persisted in Mnemosyne's database and is
@@ -321,16 +312,20 @@ curl -f http://puck.incus:23181/healthz
 curl http://puck.incus:23181/metrics | head -5
 ```
-### Verify the daedalus-service account
+### Verify Daedalus auth (per-user API token)
+Daedalus now authenticates as a Mnemosyne user via a `UserToken` minted
+at `/profile/tokens/`. To smoke-test from a deploy host:
 ```bash
-curl -u daedalus-service:<password> \
-https://mnemosyne.ouranos.helu.ca/library/api/workspaces/ \
+curl -H "Authorization: Bearer <user-token-plaintext>" \
+https://mnemosyne.ouranos.helu.ca/library/api/workspaces/ws_smoke/ \
 -o /dev/null -w "%{http_code}"
-# Expect: 200
+# Expect: 200 if the workspace exists for that user, 404 otherwise.
+# An anonymous request gets 401 with `WWW-Authenticate: Bearer`.
 ```
-### Verify MCP connectivity (from a client with a valid MCPToken)
+### Verify MCP connectivity (from a client with a valid UserToken)
 ```bash
 curl -H "Authorization: Bearer <token>" \
@@ -401,6 +396,5 @@ will report as a failure.
 | `vault_daedalus_s3_read_secret` | `DAEDALUS_S3_SECRET_ACCESS_KEY` |
 | `vault_rabbitmq_password` | embedded in `CELERY_BROKER_URL` |
 | `vault_mnemosyne_llm_encryption_key` | `LLM_API_SECRETS_ENCRYPTION_KEY` |
-| `vault_mnemosyne_daedalus_service_password` | passed to `ensure_service_user --password` |
 | `vault_mnemosyne_casdoor_client_id` | `CASDOOR_CLIENT_ID` |
 | `vault_mnemosyne_casdoor_client_secret` | `CASDOOR_CLIENT_SECRET` |


@@ -8,7 +8,7 @@ This document describes Mnemosyne's role in the Daedalus + Pallas architecture a
 Mnemosyne exposes two interfaces for the wider Ouranos ecosystem:
-1. **REST API** (`/library/api/*`) — consumed by the Daedalus backend (HTTP Basic auth, service account `daedalus-service`) for workspace lifecycle and asynchronous file ingestion. Phase 1, **implemented**.
+1. **REST API** (`/library/api/*`) — consumed by the Daedalus backend authenticated as the owning Mnemosyne user via a per-user `UserToken` (`Authorization: Bearer <plaintext>`, minted at `/profile/tokens/`) for workspace lifecycle and asynchronous file ingestion. Phase 1, **implemented**.
 2. **MCP Server** (port 22091 internal, `/mcp/` via nginx on 23090) — exposes search, browse, and retrieval tools. Phase 5 of Mnemosyne's own roadmap, **implemented** with workspace-scoped access control via long-lived team JWTs. Consumed by Pallas FastAgents in production (Daedalus integration Phase 2, **implemented** — see [Phase 3 of this doc](#3-phase-3-long-lived-team-jwt-access-control-for-pallas-instances)).
 ### Phase status
@@ -105,7 +105,7 @@ Auth is controlled by `MCP_REQUIRE_AUTH` in `.env`. Production sets it to `True`
 ## 2. REST API for Daedalus
-All endpoints require HTTP Basic auth as `daedalus-service`. They are consumed by the Daedalus FastAPI backend only — not by any frontend.
+All endpoints require an `Authorization: Bearer <plaintext>` header carrying a `UserToken` belonging to the Mnemosyne user the workspace belongs to (minted at `/profile/tokens/`). Workspaces are scoped to their creating user via the `Library.owner_username` property; cross-user access returns 404. Anonymous requests get 401 with `WWW-Authenticate: Bearer`. These endpoints are consumed by the Daedalus FastAPI backend only — not by any frontend.
 ### Workspace lifecycle
@@ -354,7 +354,7 @@ mnemosyne_s3_operations_total{operation,status} counter
 - [x] `GET /library/api/jobs/{job_id}/`, `POST .../retry/`, `GET /library/api/jobs/`
 - [x] `library.tasks.ingest_from_daedalus` Celery task with content-hash-aware supersede logic
 - [x] `library.services.daedalus_s3` cross-bucket fetch + copy
-- [x] HTTP Basic auth via `daedalus-service` user
+- [x] Per-user `UserToken` auth (`Authorization: Bearer <plaintext>`, minted at `/profile/tokens/`); workspaces scoped to the owning user via `Library.owner_username`
 ### Phase 2 — MCP Server (Mnemosyne roadmap Phase 5) ✅ Implemented
 - [x] `mcp_server/` module following the [Django MCP Pattern](Pattern_Django-MCP_V1-00.md)

docs/mnemosyne_mcp.md Normal file

@@ -0,0 +1,240 @@
# Mnemosyne MCP Server Tools
Mnemosyne exposes a retrieval surface via the [Model Context Protocol](https://modelcontextprotocol.io/) using [FastMCP](https://github.com/jlowin/fastmcp). The server is a **retrieval surface, not a RAG pipeline**: it returns ranked evidence and the calling LLM is responsible for synthesis and citation.
## Concepts
**Library** — the top-level container. Each library has a `library_type` that drives chunking, embedding, and re-ranking strategy:
| `library_type` | Content |
|---|---|
| `fiction` | Novels, short stories. Cover art available. |
| `nonfiction` | General non-fiction prose. |
| `technical` | Manuals, textbooks, docs. Diagrams and code-like content. |
| `music` | Lyrics, liner notes, album artwork. |
| `film` | Scripts, synopses, stills. |
| `art` | Catalogs, descriptions, artwork itself. |
| `journal` | Personal entries; temporal/reflective. |
| `business` | Proposals, marketing, sales, strategy. Commercial context. |
| `finance` | Statements, tax, market commentary. Quote figures exactly. |
**Collection** — a named group of items inside a library (e.g. a novel series, a multi-volume manual).
**Item** — an indexed document or file. Only items with `embedding_status = "completed"` appear in search results.
**Chunk** — a text segment of an item, stored in S3. Search returns a `text_preview` (~500 chars); use `get_chunk` to fetch the full text.
## Recommended Workflow
```
list_libraries
→ search(query, library_type=..., library_uid=...)
→ get_chunk(chunk_uid) # only when text_preview is insufficient
```
---
## Tools
### `search`
Hybrid retrieval: vector + full-text + concept-graph candidates fused by RRF (Reciprocal Rank Fusion), with optional Synesis re-ranking.
**Parameters**
| Name | Type | Default | Description |
|---|---|---|---|
| `query` | `str` | required | The search query. |
| `library_uid` | `str \| None` | `None` | Restrict to one library by UID. |
| `library_type` | `str \| None` | `None` | Restrict by library type (see table above). |
| `collection_uid` | `str \| None` | `None` | Restrict to one collection by UID. |
| `limit` | `int` | `20` | Maximum candidates to return. |
| `rerank` | `bool` | `True` | Apply Synesis re-ranking. Set `False` to skip. |
| `include_images` | `bool` | `True` | Include matching images in the response. |
| `search_types` | `list[str] \| None` | `["vector", "fulltext", "graph"]` | Which retrieval strategies to run. |
**Response**
```json
{
"query": "...",
"candidates": [
{
"chunk_uid": "...",
"item_uid": "...",
"item_title": "...",
"library_type": "...",
"text_preview": "... (~500 chars) ...",
"score": 0.92,
"source": "vector|fulltext|graph"
}
],
"images": [...],
"total_candidates": 42,
"search_time_ms": 85,
"reranker_used": true,
"reranker_model": "...",
"search_types_used": ["vector", "fulltext", "graph"]
}
```
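The fusion step named in the `search` description can be sketched as follows. This is generic Reciprocal Rank Fusion, not Mnemosyne's implementation: the constant `k=60` is the value commonly used in the literature, and the server's actual constant, weighting, and tie-breaking are not stated in this document.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk UIDs; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked_uids in rankings.values():
        for rank, chunk_uid in enumerate(ranked_uids, start=1):
            scores[chunk_uid] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by UID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


fused = rrf_fuse({
    "vector": ["c1", "c2", "c3"],
    "fulltext": ["c2", "c1"],
    "graph": ["c2"],
})
print([uid for uid, _ in fused])  # ['c2', 'c1', 'c3']
```

A chunk that ranks moderately well in several strategies (`c2` here) can outrank one that is first in only a single strategy, which is the usual motivation for rank fusion over raw score comparison.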
---
### `get_chunk`
Fetch the full text of a single chunk by its UID. Use this when the `text_preview` returned by `search` is not enough.
**Parameters**
| Name | Type | Description |
|---|---|---|
| `chunk_uid` | `str` | The chunk UID from a `search` result. |
**Response**
```json
{
"chunk_uid": "...",
"chunk_index": 3,
"item_uid": "...",
"item_title": "...",
"library_type": "...",
"text": "Full chunk text..."
}
```
---
### `list_libraries`
Enumerate libraries the caller is authorized to read. Use the returned `uid` or `library_type` to scope a subsequent `search`.
**Parameters**
| Name | Type | Default | Description |
|---|---|---|---|
| `limit` | `int` | `50` | Max libraries to return (capped at 200). |
| `offset` | `int` | `0` | Pagination offset. |
**Response**
```json
{
"libraries": [
{
"uid": "...",
"name": "...",
"library_type": "fiction",
"description": "..."
}
],
"limit": 50,
"offset": 0
}
```
---
### `list_collections`
Enumerate collections, optionally filtered to a single library. Use the returned `uid` to scope `search` or `list_items` to one collection.
**Parameters**
| Name | Type | Default | Description |
|---|---|---|---|
| `library_uid` | `str \| None` | `None` | Filter to one parent library. |
| `limit` | `int` | `50` | Max collections to return (capped at 200). |
| `offset` | `int` | `0` | Pagination offset. |
**Response**
```json
{
"collections": [
{
"uid": "...",
"name": "...",
"description": "...",
"library_uid": "...",
"library_name": "..."
}
],
"limit": 50,
"offset": 0
}
```
---
### `list_items`
Enumerate indexed documents/files, optionally filtered by library or collection. Check `embedding_status` before searching — only `"completed"` items appear in search results. Use `chunk_count` to gauge document size.
**Parameters**
| Name | Type | Default | Description |
|---|---|---|---|
| `collection_uid` | `str \| None` | `None` | Filter to one collection. |
| `library_uid` | `str \| None` | `None` | Filter to one library. |
| `limit` | `int` | `50` | Max items to return (capped at 200). |
| `offset` | `int` | `0` | Pagination offset. |
**Response**
```json
{
"items": [
{
"uid": "...",
"title": "...",
"item_type": "...",
"file_type": "...",
"chunk_count": 120,
"image_count": 4,
"embedding_status": "completed"
}
],
"limit": 50,
"offset": 0
}
```
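Because the three `list_*` tools share the same `limit`/`offset` contract, a client can walk all pages with one loop. The sketch below assumes a hypothetical `call_tool(name, **arguments)` callable that returns the decoded JSON response; it stands in for whatever MCP client is actually used, and `fake_call_tool` exists only to make the example runnable.

```python
def iter_all(call_tool, tool_name, result_key, page_size=200, **filters):
    """Yield every row from a paginated list_* tool, one page at a time."""
    offset = 0
    while True:
        page = call_tool(tool_name, limit=page_size, offset=offset, **filters)
        rows = page[result_key]
        yield from rows
        if len(rows) < page_size:  # a short page means the listing is exhausted
            return
        offset += page_size


def fake_call_tool(tool_name, limit, offset, **filters):
    """Hypothetical stand-in for a real MCP client call (five fixed items)."""
    data = [{"uid": f"item-{i}", "embedding_status": "completed"} for i in range(5)]
    return {"items": data[offset:offset + limit], "limit": limit, "offset": offset}


searchable = [
    item["uid"]
    for item in iter_all(fake_call_tool, "list_items", "items", page_size=2)
    if item["embedding_status"] == "completed"
]
```

Requesting the documented cap of 200 per page keeps the number of round trips low.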
---
### `get_health`
Health check for infrastructure pollers (Pallas, Daedalus). Does not require authentication.
Returns a Pallas-compatible status object. `neo4j` and `s3` failures result in `"error"` (critical). A missing or unconfigured embedding model results in `"degraded"` (non-critical).
**Parameters:** none
**Response**
```json
{
"status": "ok | degraded | error",
"checks": {
"neo4j": { "status": "ok", "duration_ms": 2.1 },
"s3": { "status": "ok", "duration_ms": 8.4 },
"embedding": { "status": "ok", "model": "...", "duration_ms": 0.3 }
}
}
```
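The roll-up rule described above (critical checks produce `"error"`, non-critical ones produce `"degraded"`) can be sketched as below. The function name and the treatment of any failed non-critical check as `"degraded"` are assumptions; the server's own logic may differ in detail.

```python
CRITICAL_CHECKS = {"neo4j", "s3"}


def overall_status(checks):
    """Collapse per-dependency checks into ok / degraded / error."""
    failed = {name for name, check in checks.items() if check.get("status") != "ok"}
    if failed & CRITICAL_CHECKS:
        return "error"      # a critical dependency is down
    if failed:
        return "degraded"   # only non-critical checks (e.g. embedding) failed
    return "ok"


status = overall_status({
    "neo4j": {"status": "ok", "duration_ms": 2.1},
    "s3": {"status": "ok", "duration_ms": 8.4},
    "embedding": {"status": "error", "duration_ms": 0.3},
})
print(status)  # degraded
```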
---
## Authentication
All tools except `get_health` require a `Bearer` token in the `Authorization` header. Three credential types are accepted:
| Type | Issued by | Lifetime | Scope |
|---|---|---|---|
| **Opaque `MCPToken`** | Mnemosyne admin | Long-lived (optional expiry) | `allowed_libraries` list on the token row. Per-tool ACL available. |
| **Per-turn JWT** (`iss=daedalus`) | Daedalus chat | ≤10 minutes | `libs` claim (list of Library UIDs). |
| **Team JWT** (`iss=mnemosyne`, `typ=team`) | Mnemosyne | 10-year lifetime | Resolved live from `TeamWorkspaceAssignment` → Neo4j `Library.workspace_id`. Revoked via `active_jti` rotation. |
Every authenticated request resolves to a `resolved_libraries` list — the set of Library UIDs the caller may read. Tools enforce this list at the query layer; an empty list means the caller is authenticated but sees nothing (fail-closed). `None` (no auth) is also fail-closed.
The `MCP_REQUIRE_AUTH` Django setting (default `True`) controls whether unauthenticated requests are rejected.
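The fail-closed rule for `resolved_libraries` can be illustrated with a small scoping helper. This is a sketch of the idea only; `visible_libraries` is a hypothetical name and the real enforcement happens inside the server's query layer.

```python
def visible_libraries(requested_uid, resolved_libraries):
    """Return the Library UIDs a query may touch for this caller."""
    if not resolved_libraries:  # None (no auth) or an empty list: fail closed
        return []
    if requested_uid is None:   # no explicit filter: everything the caller may read
        return list(resolved_libraries)
    # An explicit filter is honoured only if it is inside the caller's scope.
    return [requested_uid] if requested_uid in resolved_libraries else []


print(visible_libraries(None, ["lib-a", "lib-b"]))     # ['lib-a', 'lib-b']
print(visible_libraries("lib-c", ["lib-a", "lib-b"]))  # []
print(visible_libraries("lib-a", None))                # []
```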


@@ -1,557 +0,0 @@
# Ouranos Lab
Infrastructure-as-Code project managing the **Ouranos Lab** — a development sandbox at [ouranos.helu.ca](https://ouranos.helu.ca). Uses **Terraform** for container provisioning and **Ansible** for configuration management, themed around the moons of Uranus.
---
## Project Overview
| Component | Purpose |
|-----------|---------|
| **Terraform** | Provisions 10 specialised Incus containers (LXC) with DNS-resolved networking, security policies, and resource dependencies |
| **Ansible** | Deploys Docker, databases (PostgreSQL, Neo4j), observability stack (Prometheus, Grafana, Loki), and application runtimes across all hosts |
> **DNS Domain**: Incus resolves containers via the `.incus` domain suffix (e.g., `oberon.incus`, `portia.incus`). IPv4 addresses are dynamically assigned — always use DNS names, never hardcode IPs.
---
## Uranian Host Architecture
All containers are named after moons of Uranus and resolved via the `.incus` DNS suffix.
| Name | Role | Description | Nesting |
|------|------|-------------|---------|
| **ariel** | graph_database | Neo4j — Ethereal graph connections | ✔ |
| **caliban** | agent_automation | Agent S MCP Server with MATE Desktop | ✔ |
| **miranda** | mcp_docker_host | Dedicated Docker Host for MCP Servers | ✔ |
| **oberon** | container_orchestration | Docker Host — MCP Switchboard, RabbitMQ, Open WebUI | ✔ |
| **portia** | database | PostgreSQL — Relational database host | ❌ |
| **prospero** | observability | PPLG stack — Prometheus, Grafana, Loki, PgAdmin | ❌ |
| **puck** | application_runtime | Python App Host — JupyterLab, Django apps, Gitea Runner | ✔ |
| **rosalind** | collaboration | Gitea, LobeChat, Nextcloud, AnythingLLM | ✔ |
| **sycorax** | language_models | Arke LLM Proxy | ✔ |
| **titania** | proxy_sso | HAProxy TLS termination + Casdoor SSO | ✔ |
| **umbriel** | graph_database | Neo4j (Mnemosyne) — dedicated memory graph | ✔ |
### puck — Project Application Runtime
Shape-shifting trickster embodying Python's versatility.
This is the host that runs Python projects in the Ouranos sandbox.
It has an RDP server and is generally where application development happens.
Each project has a number that is used to determine port numbers.
- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Django Projects: Zelus (221), Angelia (222), Athena (224), Kairos (225), Icarlos (226), MCP Switchboard (227), Spelunker (228), Peitho (229), Mnemosyne (230)
- FastAgent Projects: Pallas (240)
- FastAPI Projects: Daedalus (200), Arke (201), Kernos (202), Rommie (203), Orpheus (204), Periplus (205), Nike (206), Stentor (207)
### caliban — Agent Automation
Autonomous computer agent learning through environmental interaction.
- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22062)
- Rommie MCP Server (port 22061) — agent-to-agent GUI automation via Agent S
- FreeCAD Robust MCP Server (port 22063) — CAD automation via FreeCAD XML-RPC
- GPU passthrough
- RDP access (port 25521)
### oberon — Container Orchestration & Dockerized Shared Services
King of the Fairies orchestrating containers and managing MCP infrastructure.
- Docker engine
- MCP Switchboard (port 22781) — Django app routing MCP tool calls
- RabbitMQ message queue
- smtp4dev SMTP test server (port 22025)
### portia — Relational Database
Intelligent and resourceful — the reliability of relational databases.
- PostgreSQL 17 (port 5432)
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `mnemosyne`, `nextcloud`, `openwebui`, `periplus`, `spelunker`
### ariel — Graph Database
Air spirit — ethereal, interconnected nature mirroring graph relationships.
- Neo4j 5.26.0 (Docker)
- HTTP API: port 25554
- Bolt: port 7687 (reached as `ariel.incus:7687` on the internal network)
### umbriel — Graph Database (Mnemosyne)
Dusky melancholy sprite from Pope's *Rape of the Lock* — keeper of the Cave of
Spleen, naturally paired with Mnemosyne the Titan of memory. Dedicated Neo4j
instance so Mnemosyne's `Library`/`Collection`/`Item`/`Chunk`/`Concept` labels,
vector indexes, and schema migrations can't collide with another tenant's
graph on Ariel.
- Neo4j 5.26.0 (Docker)
- HTTP Browser: port 25555
- Bolt: port 7687 (reached as `umbriel.incus:7687` on the internal network)
### miranda — MCP Docker Host
Curious bridge between worlds — hosting MCP server containers.
- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy (port 22071)
- Argos MCP Server — web search via SearXNG (port 22062)
- Grafana MCP Server (port 22063)
- Neo4j MCP Server (port 22064)
- Gitea MCP Server (port 22065)
### prospero — Observability Stack
Master magician observing all events.
- PPLG stack via Docker Compose: Prometheus, Loki, Grafana, PgAdmin
- Internal HAProxy with OAuth2-Proxy for all dashboards
- AlertManager with Pushover notifications
- Prometheus metrics collection (`node-exporter`, HAProxy, Loki)
- Loki log aggregation via Alloy (all hosts)
- Grafana dashboard suite with Casdoor SSO integration
### rosalind — Third Party Applications for testing and evaluation
Witty and resourceful moon for PHP, Go, and Node.js runtimes.
- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia)
- Home Assistant (port 8123)
### sycorax — Language Models
Original magical power wielding language magic.
- Arke LLM API Proxy (port 25540)
- Multi-provider support (OpenAI, Anthropic, etc.)
- Session management with Memcached
- Database backend on Portia
### titania — Proxy & SSO Services
Queen of the Fairies managing access control and authentication.
- HAProxy 3.x with TLS termination (port 443)
- Let's Encrypt wildcard certificate via certbot DNS-01 (Namecheap)
- HTTP to HTTPS redirect (port 80)
- Gitea SSH proxy (port 22022)
- Casdoor SSO (port 22081, local PostgreSQL)
- Prometheus metrics at `:8404/metrics`
---
## Port Numbering
Services that run on a well-known port may keep it: PostgreSQL 5432, Prometheus metrics 9100.
Inside a Docker project, however, ports must follow the `XXXYZ` numbering plan to avoid port conflicts and confusion:
| Digits | Meaning |
|--------|---------|
| `XXX` | Project number, or `220` for an external project |
| `Y` | Service: 0 reserved, 1-4 flexible, 5 database, 6 MCP, 7 API, 8 Web App, 9 Prometheus metrics |
| `Z` | Instance: the running instance of this app on the same host, starting at 1. May also be used to handle exceptions. |
Two prefixes are reserved:
- `255`: Incus port forwarding. Ports in this range are forwarded from the Incus host to Incus containers (defined in Terraform).
- `514ZZ`: syslog. Docker containers send their syslog to an Alloy syslog collector port. `ZZ` is the application instance; it only needs to be unique on the same host and increments from 01.
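The `XXXYZ` plan can be checked mechanically. The helper below is a sketch with hypothetical names (`compose_port`, `split_port`, `SERVICE_DIGITS`); the service digits are taken from the plan above, and the example uses Angelia (project 222), whose web app is routed to port 22281.

```python
SERVICE_DIGITS = {"database": 5, "mcp": 6, "api": 7, "web": 8, "metrics": 9}


def compose_port(project: int, service: str, instance: int = 1) -> int:
    """Build an XXXYZ port from project number, service kind, and instance."""
    if not 0 <= instance <= 9:
        raise ValueError("instance must be a single digit")
    port = project * 100 + SERVICE_DIGITS[service] * 10 + instance
    if not 1024 <= port <= 65535:
        raise ValueError(f"{port} is outside the usable port range")
    return port


def split_port(port: int) -> tuple[int, int, int]:
    """Split an XXXYZ port back into (project, service digit, instance)."""
    project, rest = divmod(port, 100)
    service, instance = divmod(rest, 10)
    return project, service, instance


print(compose_port(222, "web"))  # 22281
print(split_port(22281))         # (222, 8, 1)
```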
---
## Application Conventions
Standards that all services deployed in Ouranos MUST follow. For full logging standards and anti-patterns, see [red_panda_standards.md](red_panda_standards.md).
### Health Check Endpoints
All services MUST expose Kubernetes-style health endpoints:
| Endpoint | Purpose | Auth |
|----------|---------|------|
| `GET /live` | **Liveness** — process is running and accepting connections | None |
| `GET /ready` | **Readiness** — process is running AND all dependencies (DB, cache, upstream APIs) are healthy | None |
| `GET /metrics` | Prometheus metrics (see below) | IP-restricted |
- HAProxy checks `health_path` (typically `/ready/`) for backend health — return HTTP 200 when healthy
- Health endpoints MUST NOT require authentication (no JWT, no session)
- Third-party services use their native health paths (e.g., `/api/health`, `/api/healthz`, `/-/healthy`)
### Health Checks in Docker Compose
Use `curl -f` for Docker Compose healthchecks. Install curl in images if needed.
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/live"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
### Logging Conventions
Log output flows through: **App → syslog (RFC3164) → Alloy → Loki → Grafana**
| Level | Usage |
|-------|-------|
| **ERROR** | Broken state requiring human action — always include `exc_info=True`, error type, and context |
| **WARNING** | Degraded but recovering — client disconnects, performance outliers, client-side exceptions, leaked markup |
| **INFO** | Lifecycle events — service start/stop, connections, requests completed, jobs finished |
| **DEBUG** | Diagnostic detail — SSE events, keepalive pings, health check 200 responses, negotiation steps |
**Health check responses MUST be logged at DEBUG only.** HAProxy and Prometheus probe endpoints every 15-30 seconds. Logging these at INFO floods syslog with thousands of identical `200 OK` lines per hour, burying real events.
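One way to apply the DEBUG-only rule is to choose the log level per request in the access logger. The sketch below uses only the standard `logging` module; the function names and the inclusion of `/metrics` among the probe paths are assumptions, and how a failed probe should be logged is not specified here (the sketch leaves it at INFO).

```python
import logging

# Probe paths from the conventions above; /metrics is included as an assumption.
HEALTH_PATHS = {"/live", "/ready", "/metrics"}


def access_log_level(path: str, status: int) -> int:
    """Return DEBUG for successful probe requests, INFO for everything else."""
    if path.rstrip("/") in HEALTH_PATHS and status == 200:
        return logging.DEBUG
    return logging.INFO


def log_request(logger: logging.Logger, method: str, path: str, status: int) -> None:
    """Emit one access-log line at the level chosen above."""
    logger.log(access_log_level(path, status), "%s %s %d", method, path, status)
```

With the logger level set to INFO in production, probe lines are dropped before they reach syslog while ordinary request lines still flow through.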
### Protected vs Unprotected Endpoints
| Protected (require valid JWT) | Unprotected |
|-------------------------------|-------------|
| All `/api/v1/*` routes | `GET /live` |
| | `GET /ready` |
| | `GET /metrics` (IP-restricted to internal networks) |
| | `GET /api/auth/login-url` |
| | `POST /api/auth/token` |
| | `POST /api/v1/telemetry` (sendBeacon cannot set headers) |
### Prometheus Metrics
All services SHOULD expose `GET /metrics` in Prometheus exposition format, scraped by Prospero's Prometheus (default 15s interval).
- **IP-restricted** to internal networks only (`10.10.0.0/24`, `172.16.0.0/12`, `127.0.0.0/8`)
- Consider exposing: request counts/durations, error rates, active connections, queue depths, dependency health
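The IP restriction can be enforced in application code with the standard `ipaddress` module, as in the sketch below. The network list is copied from the bullet above; `may_scrape` is a hypothetical helper name, and in practice the same rule might be applied at the proxy instead.

```python
import ipaddress

ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.10.0.0/24", "172.16.0.0/12", "127.0.0.0/8")
]


def may_scrape(remote_addr: str) -> bool:
    """Return True if the client address belongs to an internal network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparseable addresses are rejected
    return any(addr in network for network in ALLOWED_NETWORKS)


print(may_scrape("10.10.0.7"))  # True
print(may_scrape("8.8.8.8"))    # False
```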
### Browser Telemetry
Frontend/browser code MUST send telemetry data and errors back to the application's telemetry API:
- `POST /api/v1/telemetry` — unprotected (browser `sendBeacon` cannot set Authorization headers)
- Capture and report: JavaScript exceptions, performance metrics, user-facing errors
- Client-side exceptions should log as **WARNING** on the server (they indicate a problem but not a server-side failure)
### Docker Networking
- Use the **default Docker bridge network** for simple deployments
- Add additional named networks only when required (e.g., isolating database traffic) or explicitly requested
- Do not create custom network definitions for single-service Docker Compose stacks
---
## External Access via HAProxy
Titania provides TLS termination and reverse proxy for all services.
- **Base domain**: `ouranos.helu.ca`
- **HTTPS**: port 443 (standard)
- **HTTP**: port 80 (redirects to HTTPS)
- **Certificate**: Let's Encrypt wildcard via certbot DNS-01
### Route Table
| Subdomain | Backend | Service |
|-----------|---------|---------|
| `ouranos.helu.ca` (root) | puck.incus:22281 | Angelia (Django) |
| `alertmanager.ouranos.helu.ca` | prospero.incus:443 (SSL) | AlertManager |
| `angelia.ouranos.helu.ca` | puck.incus:22281 | Angelia (Django) |
| `anythingllm.ouranos.helu.ca` | rosalind.incus:22084 | AnythingLLM |
| `arke.ouranos.helu.ca` | sycorax.incus:25540 | Arke LLM Proxy |
| `athena.ouranos.helu.ca` | puck.incus:22481 | Athena (Django) |
| `gitea.ouranos.helu.ca` | rosalind.incus:22082 | Gitea |
| `grafana.ouranos.helu.ca` | prospero.incus:443 (SSL) | Grafana |
| `hass.ouranos.helu.ca` | oberon.incus:8123 | Home Assistant |
| `id.ouranos.helu.ca` | titania.incus:22081 | Casdoor SSO |
| `icarlos.ouranos.helu.ca` | puck.incus:22681 | Icarlos (Django) |
| `jupyterlab.ouranos.helu.ca` | puck.incus:22071 | JupyterLab (OAuth2-Proxy) |
| `kairos.ouranos.helu.ca` | puck.incus:22581 | Kairos (Django) |
| `lobechat.ouranos.helu.ca` | rosalind.incus:22081 | LobeChat |
| `loki.ouranos.helu.ca` | prospero.incus:443 (SSL) | Loki |
| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22781 | MCP Switchboard |
| `nextcloud.ouranos.helu.ca` | rosalind.incus:22083 | Nextcloud |
| `openwebui.ouranos.helu.ca` | oberon.incus:22088 | Open WebUI |
| `peitho.ouranos.helu.ca` | puck.incus:22981 | Peitho (Django) |
| `periplus.ouranos.helu.ca` | puck.incus:20681 | Periplus (FastAPI + MCP via nginx) |
| `pgadmin.ouranos.helu.ca` | prospero.incus:443 (SSL) | PgAdmin 4 |
| `prometheus.ouranos.helu.ca` | prospero.incus:443 (SSL) | Prometheus |
| `searxng.ouranos.helu.ca` | oberon.incus:22073 | SearXNG (OAuth2-Proxy) |
| `smtp4dev.ouranos.helu.ca` | oberon.incus:22085 | smtp4dev |
| `spelunker.ouranos.helu.ca` | puck.incus:22881 | Spelunker (Django) |
---
## Infrastructure Management
### Quick Start
```bash
# Provision containers
cd terraform
terraform init
terraform plan
terraform apply
# Start all containers
cd ../ansible
source ~/env/ouranos/bin/activate
ansible-playbook sandbox_up.yml
# Deploy all services
ansible-playbook site.yml
# Stop all containers
ansible-playbook sandbox_down.yml
```
### Python Virtual Environment Setup
The Ansible automation requires a Python virtual environment with the `ansible` package installed. Create and activate the environment from the `~` directory:
```bash
# Create virtual environment
cd ~
python3 -m venv env/ouranos
# Activate environment
source ~/env/ouranos/bin/activate
# Install Ansible and the PostgreSQL collection
pip install ansible
pip install ansible-core
ansible-galaxy collection install community.postgresql
```
### Ansible Playbook Syntax Check
Before running playbooks, use the `apsc.sh` utility (in PATH) to quickly validate YAML syntax:
```bash
# From the ansible directory
apsc.sh
# This will check all YAML files in the current directory for syntax errors
```
### Terraform Workflow
1. **Define** — Containers, networks, and resources in `*.tf` files
2. **Plan** — Review changes with `terraform plan`
3. **Apply** — Provision with `terraform apply`
4. **Verify** — Check outputs and container status
### Terraform Import
When containers or other resources are created manually (outside Terraform) or need to be re-imported after recreation, use `terraform import` to sync the Terraform state with existing infrastructure.
#### Import Syntax
The correct import format for Incus resources requires quoting resource addresses with `for_each` keys and using the full ID including image fingerprints:
```bash
# Import a container with correct syntax
terraform import 'incus_instance.uranian_hosts["<name>"]' ouranos/<name>,image=<fingerprint>
```
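To avoid hand-typing the import ID, the command can be assembled programmatically. This is an illustrative sketch: `build_import_command` is a hypothetical helper, not part of the repository, and it assumes the `ouranos` project and the `uranian_hosts` resource name shown above.

```python
import shlex


def build_import_command(name: str, fingerprint: str, project: str = "ouranos") -> str:
    """Build a shell-safe `terraform import` command for one Incus instance."""
    # The for_each key contains brackets and double quotes, so it must be quoted.
    address = f'incus_instance.uranian_hosts["{name}"]'
    # The import ID carries the image fingerprint to avoid replacement on apply.
    import_id = f"{project}/{name},image={fingerprint}"
    return f"terraform import {shlex.quote(address)} {shlex.quote(import_id)}"
```

Looping over a mapping of host names to fingerprints would then produce one command per container.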
#### Getting Image Fingerprints
First, get the fingerprint of the image resource from Terraform state:
```bash
cd terraform
terraform state show incus_image.noble | grep fingerprint
# Output: fingerprint = "75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644"
terraform state show incus_image.questing | grep fingerprint
# Output: fingerprint = "e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf"
```
#### Importing All Uranian Hosts
Import containers that are missing from state (or re-import them after manual recreation):
```bash
# Containers using noble image
terraform import 'incus_instance.uranian_hosts["ariel"]' ouranos/ariel,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["miranda"]' ouranos/miranda,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["oberon"]' ouranos/oberon,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["portia"]' ouranos/portia,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["prospero"]' ouranos/prospero,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["rosalind"]' ouranos/rosalind,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["sycorax"]' ouranos/sycorax,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["titania"]' ouranos/titania,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["umbriel"]' ouranos/umbriel,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
# Containers using questing image
terraform import 'incus_instance.uranian_hosts["caliban"]' ouranos/caliban,image=e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf
terraform import 'incus_instance.uranian_hosts["puck"]' ouranos/puck,image=e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf
```
#### Storage Bucket Import
For storage buckets, use the `<project>/<pool>/<name>` format:
```bash
terraform import incus_storage_bucket.<name> ouranos/default/<bucket-name>
```
#### Common Issues
1. **Import ID format errors**: Use quotes around resource addresses with `for_each` keys: `'incus_instance.uranian_hosts["name"]'`
2. **Image replacement on import**: Importing without specifying the image fingerprint will cause Terraform to replace the container on next apply. Always include `image=<fingerprint>` in the import ID.
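3. **Stale or tainted state**: If a resource shows "will be created" but already exists, Terraform is not tracking it in state; if it shows "is tainted, so must be replaced", the state entry is marked tainted. In either case, remove any existing entry from state and re-import: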
```bash
terraform state rm 'incus_instance.uranian_hosts["name"]'
terraform import 'incus_instance.uranian_hosts["name"]' ouranos/name,image=<fingerprint>
```
#### Verify Import
After importing, verify with `terraform plan`:
```bash
terraform plan
# Should show: Plan: 0 to add, 0 to change, 0 to destroy
# (Minor "update in-place" changes are normal for state sync of computed attributes)
```
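For scripted verification, `terraform plan` accepts `-detailed-exitcode`, which returns 0 for an empty diff, 1 for an error, and 2 when changes are present. A small sketch of interpreting that code follows; `describe_plan_exit_code` is a hypothetical helper name.

```python
def describe_plan_exit_code(code: int) -> str:
    """Map an exit code from `terraform plan -detailed-exitcode` to a verdict."""
    if code == 0:
        return "clean"  # plan succeeded with an empty diff
    if code == 2:
        return "drift"  # plan succeeded, but changes are present
    return "error"      # exit code 1 (or anything unexpected)
```

A caller could pass it `subprocess.run(["terraform", "plan", "-detailed-exitcode"]).returncode`. In-place updates of computed attributes also count as changes, so a freshly imported resource may report `drift` until the next apply.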
### Ansible Workflow
1. **Bootstrap** — Update packages, install essentials (`apt_update.yml`)
2. **Agents** — Deploy Alloy (log/metrics) and Node Exporter on all hosts
3. **Services** — Configure databases, Docker, applications, observability
4. **Verify** — Check service health and connectivity
### Vault Management
```bash
# Edit secrets
ansible-vault edit inventory/group_vars/all/vault.yml
# View secrets
ansible-vault view inventory/group_vars/all/vault.yml
# Encrypt a new file
ansible-vault encrypt new_secrets.yml
```
---
## S3 Storage Provisioning
Terraform provisions Incus S3 buckets for services requiring object storage:
| Service | Host | Purpose |
|---------|------|---------|
| **Casdoor** | Titania | User avatars and SSO resource storage |
| **LobeChat** | Rosalind | File uploads and attachments |
> S3 credentials (access key, secret key, endpoint) are stored as sensitive Terraform outputs and managed in Ansible Vault with the `vault_*_s3_*` prefix.
---
## Ansible Automation
### Full Deployment (`site.yml`)
Playbooks run in dependency order:
| Playbook | Hosts | Purpose |
|----------|-------|---------|
| `apt_update.yml` | All | Update packages and install essentials |
| `alloy/deploy.yml` | All | Grafana Alloy log/metrics collection |
| `prometheus/node_deploy.yml` | All | Node Exporter metrics |
| `docker/deploy.yml` | Oberon, Ariel, Miranda, Puck, Rosalind, Sycorax, Caliban, Titania | Docker engine |
| `smtp4dev/deploy.yml` | Oberon | SMTP test server |
| `pplg/deploy.yml` | Prospero | Full observability stack + HAProxy + OAuth2-Proxy |
| `postgresql/deploy.yml` | Portia | PostgreSQL with all databases |
| `postgresql_ssl/deploy.yml` | Titania | Dedicated PostgreSQL for Casdoor |
| `neo4j/deploy.yml` | Ariel, Umbriel | Neo4j graph database (Umbriel is the dedicated Mnemosyne instance) |
| `searxng/deploy.yml` | Oberon | SearXNG privacy search |
| `haproxy/deploy.yml` | Titania | HAProxy TLS termination and routing |
| `casdoor/deploy.yml` | Titania | Casdoor SSO |
| `mcpo/deploy.yml` | Miranda | MCPO MCP proxy |
| `openwebui/deploy.yml` | Oberon | Open WebUI LLM interface |
| `hass/deploy.yml` | Oberon | Home Assistant |
| `gitea/deploy.yml` | Rosalind | Gitea self-hosted Git |
| `nextcloud/deploy.yml` | Rosalind | Nextcloud collaboration |
### Individual Service Deployments
Services with standalone deploy playbooks (not in `site.yml`):
| Playbook | Host | Service |
|----------|------|---------|
| `anythingllm/deploy.yml` | Rosalind | AnythingLLM document AI |
| `arke/deploy.yml` | Sycorax | Arke LLM proxy |
| `argos/deploy.yml` | Miranda | Argos MCP web search server |
| `caliban/deploy.yml` | Caliban | Agent S MCP Server |
| `certbot/deploy.yml` | Titania | Let's Encrypt certificate renewal |
| `gitea_mcp/deploy.yml` | Miranda | Gitea MCP Server |
| `gitea_runner/deploy.yml` | Puck | Gitea CI/CD runner |
| `grafana_mcp/deploy.yml` | Miranda | Grafana MCP Server |
| `jupyterlab/deploy.yml` | Puck | JupyterLab + OAuth2-Proxy |
| `kernos/deploy.yml` | Caliban | Kernos MCP shell server |
| `lobechat/deploy.yml` | Rosalind | LobeChat AI chat |
| `rommie/deploy.yml` | Caliban | Rommie MCP server (Agent S GUI automation) |
| `neo4j_mcp/deploy.yml` | Miranda | Neo4j MCP Server |
| `freecad_mcp/deploy.yml` | Caliban | FreeCAD Robust MCP Server |
| `rabbitmq/deploy.yml` | Oberon | RabbitMQ message queue |
### Lifecycle Playbooks
| Playbook | Purpose |
|----------|---------|
| `sandbox_up.yml` | Start all Uranian host containers |
| `sandbox_down.yml` | Gracefully stop all containers |
| `apt_update.yml` | Update packages on all hosts |
| `site.yml` | Full deployment orchestration |
---
## Data Flow Architecture
### Observability Pipeline
```
All Hosts                  Prospero                        Alerts
Alloy + Node Exporter   →  Prometheus + Loki + Grafana  →  AlertManager + Pushover
collect metrics & logs     storage & visualisation         notifications
```
### Integration Points
| Consumer | Provider | Connection |
|----------|----------|-----------|
| All LLM apps | Arke (Sycorax) | `http://sycorax.incus:25540` |
| Open WebUI, Arke, Gitea, Nextcloud, LobeChat | PostgreSQL (Portia) | `portia.incus:5432` |
| Neo4j MCP | Neo4j (Ariel) | `ariel.incus:7687` (Bolt) |
| Mnemosyne | Neo4j (Umbriel) | `umbriel.incus:7687` (Bolt) — dedicated tenant |
| MCP Switchboard | Docker API (Miranda) | `tcp://miranda.incus:2375` |
| MCP Switchboard | RabbitMQ (Oberon) | `oberon.incus:5672` |
| Kairos, Spelunker | RabbitMQ (Oberon) | `oberon.incus:5672` |
| All apps (SMTP) | smtp4dev (Oberon) | `oberon.incus:22025` |
| All hosts | Loki (Prospero) | `http://prospero.incus:3100` |
| All hosts | Prometheus (Prospero) | `http://prospero.incus:9090` |
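As a rough way to confirm that the endpoints in this table are reachable from a given host, a TCP probe can be sketched with the standard `socket` module. The function name `is_reachable` and the 2-second default timeout are assumptions; the probe only checks that the port accepts connections, not that the service behind it is healthy.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `is_reachable("portia.incus", 5432)` would probe the PostgreSQL endpoint listed above.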
---
## Important Notes
⚠️ **Alloy Host Variables Required** — Every host with `alloy` in its `services` list must define `alloy_log_level` in `inventory/host_vars/<host>.incus.yml`. The playbook will fail with an undefined variable error if this is missing.
⚠️ **Alloy Syslog Listeners Required for Docker Services** — Any Docker Compose service using the syslog logging driver must have a corresponding `loki.source.syslog` listener in the host's Alloy config template (`ansible/alloy/<hostname>/config.alloy.j2`). Missing listeners cause Docker containers to fail on start.
⚠️ **Local Terraform State** — This project uses local Terraform state (no remote backend). Do not run `terraform apply` from multiple machines simultaneously.
⚠️ **Nested Docker** — Docker runs nested inside Incus containers, which requires `security.nesting = true` and an AppArmor override (`lxc.apparmor.profile=unconfined`) on all Docker-enabled hosts.
⚠️ **Deployment Order** — Prospero (observability) must be fully deployed before other hosts, as Alloy on every host pushes logs and metrics to `prospero.incus`. Run `pplg/deploy.yml` before `site.yml` on a fresh environment.

42
docs/regenerate_docs.sh Executable file

@@ -0,0 +1,42 @@
#!/usr/bin/env bash
# Regenerate Sphinx API reference for every Mnemosyne app, then build HTML.
# Drives both local development and the CI pipeline.
set -euo pipefail
cd "$(dirname "$0")"
APPS=(themis library llm_manager mcp_server)
SOURCE_REF=source/reference/apps
PACKAGE_ROOT=../mnemosyne
make clean
mkdir -p "$SOURCE_REF"
# Per-app subdir so each app gets its own modules.rst (sphinx-apidoc
# overwrites the file otherwise, leaving only the last app in the index).
for app in "${APPS[@]}"; do
    sphinx-apidoc \
        --force \
        --separate \
        --module-first \
        --output-dir "$SOURCE_REF/$app" \
        "$PACKAGE_ROOT/$app" \
        "$PACKAGE_ROOT/$app/migrations" \
        "$PACKAGE_ROOT/$app/tests"
done
# Write a top-level index.rst whose toctree lists every app's modules.rst.
{
    echo "Applications"
    echo "============"
    echo
    echo ".. toctree::"
    echo "   :maxdepth: 2"
    echo
    for app in "${APPS[@]}"; do
        echo "   $app/modules"
    done
} > "$SOURCE_REF/index.rst"
make html


97
docs/source/conf.py Normal file

@@ -0,0 +1,97 @@
import os
import sys
import tomllib
# The Django package lives at <repo>/mnemosyne/<inner mnemosyne>/. Adding the
# outer mnemosyne/ directory to sys.path lets autodoc resolve every app
# (themis, library, llm_manager, mcp_server) and the project settings module.
sys.path.insert(0, os.path.abspath('../../mnemosyne'))
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mnemosyne.settings')
# Load real .env if present (local dev). In CI there is none and that's fine —
# settings.py provides a default for every env var it reads, so the import
# succeeds either way.
_repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))
_env_file = os.path.join(_repo_root, 'mnemosyne', '.env')
if os.path.exists(_env_file):
    with open(_env_file) as _f:
        for _line in _f:
            _line = _line.strip()
            if not _line or _line.startswith('#') or '=' not in _line:
                continue
            _key, _val = _line.split('=', 1)
            os.environ.setdefault(_key.strip(), _val.strip())
import django # noqa: E402
django.setup()
# Sphinx autodoc calls repr() on every class attribute it documents. Django's
# QuerySet.__repr__ executes a SELECT against the database — which doc builds
# have no business doing. Intercept object_description so QuerySet instances
# render as a static string instead. Mnemosyne's themis app has at least one
# DRF viewset with a class-level queryset attribute, so this matters.
from django.db.models.query import QuerySet # noqa: E402
import sphinx.util.inspect as _sphinx_inspect # noqa: E402
_orig_object_description = _sphinx_inspect.object_description
def _safe_object_description(obj, *args, **kwargs):
    if isinstance(obj, QuerySet):
        return f'<QuerySet [{obj.model.__name__}]>'
    return _orig_object_description(obj, *args, **kwargs)
_sphinx_inspect.object_description = _safe_object_description
# ── Sphinx configuration ──────────────────────────────────────────────────
project = 'Mnemosyne'
copyright = '2026, Mnemosyne Team'
author = 'Mnemosyne Team'
with open(os.path.join(_repo_root, 'pyproject.toml'), 'rb') as _f:
    release = tomllib.load(_f)['project']['version']
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'sphinx.ext.napoleon',
    'sphinx.ext.intersphinx',
    'sphinx_autodoc_typehints',
    'sphinxcontrib.httpdomain',
    'sphinxcontrib.mermaid',
    'myst_parser',
]
source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'}
myst_enable_extensions = ['colon_fence', 'deflist', 'tasklist', 'attrs_inline']
myst_heading_anchors = 4
autodoc_default_options = {
    'members': True,
    'member-order': 'bysource',
    'special-members': '__init__',
    'undoc-members': True,
    'exclude-members': '__weakref__',
}
autodoc_inherit_docstrings = False
napoleon_use_ivar = True
intersphinx_mapping = {
    'python': ('https://docs.python.org/3', None),
    'django': ('https://docs.djangoproject.com/en/stable/',
               'https://docs.djangoproject.com/en/stable/_objects/'),
}
html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
html_theme_options = {
    'navigation_depth': 4,
    'collapse_navigation': False,
    'sticky_navigation': True,
    'includehidden': True,
    'titles_only': False,
}

17
docs/source/index.rst Normal file

@@ -0,0 +1,17 @@
Mnemosyne Documentation
=======================

Content-type-aware, multimodal personal knowledge management system.

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   reference/apps/index

Indices
-------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@@ -0,0 +1,10 @@
Applications
============

.. toctree::
   :maxdepth: 2

   themis/modules
   library/modules
   llm_manager/modules
   mcp_server/modules


@@ -0,0 +1,7 @@
library.admin module
====================

.. automodule:: library.admin
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,18 @@
library.api package
===================

.. automodule:: library.api
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   library.api.serializers
   library.api.urls
   library.api.views
   library.api.workspaces


@@ -0,0 +1,7 @@
library.api.serializers module
==============================

.. automodule:: library.api.serializers
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.api.urls module
=======================

.. automodule:: library.api.urls
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.api.views module
========================

.. automodule:: library.api.views
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.api.workspaces module
=============================

.. automodule:: library.api.workspaces
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.apps module
===================

.. automodule:: library.apps
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.content\_types module
=============================

.. automodule:: library.content_types
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.forms module
====================

.. automodule:: library.forms
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.management.commands.embed\_collection module
====================================================

.. automodule:: library.management.commands.embed_collection
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.management.commands.embed\_item module
==============================================

.. automodule:: library.management.commands.embed_item
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.management.commands.embedding\_status module
====================================================

.. automodule:: library.management.commands.embedding_status
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.management.commands.load\_library\_types module
=======================================================

.. automodule:: library.management.commands.load_library_types
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,21 @@
library.management.commands package
===================================

.. automodule:: library.management.commands
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   library.management.commands.embed_collection
   library.management.commands.embed_item
   library.management.commands.embedding_status
   library.management.commands.load_library_types
   library.management.commands.search
   library.management.commands.search_stats
   library.management.commands.setup_neo4j_indexes


@@ -0,0 +1,7 @@
library.management.commands.search module
=========================================

.. automodule:: library.management.commands.search
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.management.commands.search\_stats module
================================================

.. automodule:: library.management.commands.search_stats
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.management.commands.setup\_neo4j\_indexes module
========================================================

.. automodule:: library.management.commands.setup_neo4j_indexes
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,15 @@
library.management package
==========================

.. automodule:: library.management
   :members:
   :show-inheritance:
   :undoc-members:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   library.management.commands


@@ -0,0 +1,7 @@
library.metrics module
======================

.. automodule:: library.metrics
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.models module
=====================

.. automodule:: library.models
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,34 @@
library package
===============

.. automodule:: library
   :members:
   :show-inheritance:
   :undoc-members:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   library.api
   library.management
   library.services

Submodules
----------

.. toctree::
   :maxdepth: 4

   library.admin
   library.apps
   library.content_types
   library.forms
   library.metrics
   library.models
   library.tasks
   library.urls
   library.utils
   library.views


@@ -0,0 +1,7 @@
library.services.chunker module
===============================

.. automodule:: library.services.chunker
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.concepts module
================================

.. automodule:: library.services.concepts
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.daedalus\_s3 module
====================================

.. automodule:: library.services.daedalus_s3
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.embedding\_client module
=========================================

.. automodule:: library.services.embedding_client
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.fusion module
==============================

.. automodule:: library.services.fusion
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.parsers module
===============================

.. automodule:: library.services.parsers
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.pipeline module
================================

.. automodule:: library.services.pipeline
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.reranker module
================================

.. automodule:: library.services.reranker
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,25 @@
library.services package
========================

.. automodule:: library.services
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   library.services.chunker
   library.services.concepts
   library.services.daedalus_s3
   library.services.embedding_client
   library.services.fusion
   library.services.parsers
   library.services.pipeline
   library.services.reranker
   library.services.search
   library.services.text_utils
   library.services.vision


@@ -0,0 +1,7 @@
library.services.search module
==============================

.. automodule:: library.services.search
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.text\_utils module
===================================

.. automodule:: library.services.text_utils
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.services.vision module
==============================

.. automodule:: library.services.vision
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.tasks module
====================

.. automodule:: library.tasks
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.urls module
===================

.. automodule:: library.urls
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.utils module
====================

.. automodule:: library.utils
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library.views module
====================

.. automodule:: library.views
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
library
=======

.. toctree::
   :maxdepth: 4

   library


@@ -0,0 +1,7 @@
llm\_manager.admin module
=========================

.. automodule:: llm_manager.admin
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,17 @@
llm\_manager.api package
========================

.. automodule:: llm_manager.api
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   llm_manager.api.serializers
   llm_manager.api.urls
   llm_manager.api.views


@@ -0,0 +1,7 @@
llm\_manager.api.serializers module
===================================

.. automodule:: llm_manager.api.serializers
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.api.urls module
============================

.. automodule:: llm_manager.api.urls
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.api.views module
=============================

.. automodule:: llm_manager.api.views
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.apps module
========================

.. automodule:: llm_manager.apps
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.encryption module
==============================

.. automodule:: llm_manager.encryption
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.forms module
=========================

.. automodule:: llm_manager.forms
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.management.commands.load\_default\_llm\_models module
==================================================================

.. automodule:: llm_manager.management.commands.load_default_llm_models
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,15 @@
llm\_manager.management.commands package
========================================

.. automodule:: llm_manager.management.commands
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   llm_manager.management.commands.load_default_llm_models


@@ -0,0 +1,15 @@
llm\_manager.management package
===============================

.. automodule:: llm_manager.management
   :members:
   :show-inheritance:
   :undoc-members:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   llm_manager.management.commands


@@ -0,0 +1,7 @@
llm\_manager.models module
==========================

.. automodule:: llm_manager.models
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,32 @@
llm\_manager package
====================

.. automodule:: llm_manager
   :members:
   :show-inheritance:
   :undoc-members:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   llm_manager.api
   llm_manager.management

Submodules
----------

.. toctree::
   :maxdepth: 4

   llm_manager.admin
   llm_manager.apps
   llm_manager.encryption
   llm_manager.forms
   llm_manager.models
   llm_manager.services
   llm_manager.tasks
   llm_manager.urls
   llm_manager.views


@@ -0,0 +1,7 @@
llm\_manager.services module
============================

.. automodule:: llm_manager.services
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.tasks module
=========================

.. automodule:: llm_manager.tasks
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.urls module
========================

.. automodule:: llm_manager.urls
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm\_manager.views module
=========================

.. automodule:: llm_manager.views
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
llm_manager
===========

.. toctree::
   :maxdepth: 4

   llm_manager


@@ -0,0 +1,7 @@
mcp\_server.admin module
========================

.. automodule:: mcp_server.admin
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,17 @@
mcp\_server.api package
=======================

.. automodule:: mcp_server.api
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   mcp_server.api.serializers
   mcp_server.api.teams
   mcp_server.api.urls


@@ -0,0 +1,7 @@
mcp\_server.api.serializers module
==================================

.. automodule:: mcp_server.api.serializers
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.api.teams module
============================

.. automodule:: mcp_server.api.teams
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.api.urls module
===========================

.. automodule:: mcp_server.api.urls
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.apps module
=======================

.. automodule:: mcp_server.apps
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.auth module
=======================

.. automodule:: mcp_server.auth
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.context module
==========================

.. automodule:: mcp_server.context
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.forms module
========================

.. automodule:: mcp_server.forms
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.management.commands.backfill\_library\_memberships module
=====================================================================

.. automodule:: mcp_server.management.commands.backfill_library_memberships
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.management.commands.create\_mcp\_token module
=========================================================

.. automodule:: mcp_server.management.commands.create_mcp_token
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,17 @@
mcp\_server.management.commands package
=======================================

.. automodule:: mcp_server.management.commands
   :members:
   :show-inheritance:
   :undoc-members:

Submodules
----------

.. toctree::
   :maxdepth: 4

   mcp_server.management.commands.backfill_library_memberships
   mcp_server.management.commands.create_mcp_token
   mcp_server.management.commands.seed_signing_key


@@ -0,0 +1,7 @@
mcp\_server.management.commands.seed\_signing\_key module
=========================================================

.. automodule:: mcp_server.management.commands.seed_signing_key
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,15 @@
mcp\_server.management package
==============================

.. automodule:: mcp_server.management
   :members:
   :show-inheritance:
   :undoc-members:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   mcp_server.management.commands


@@ -0,0 +1,7 @@
mcp\_server.metrics module
==========================

.. automodule:: mcp_server.metrics
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.models module
=========================

.. automodule:: mcp_server.models
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,35 @@
mcp\_server package
===================

.. automodule:: mcp_server
   :members:
   :show-inheritance:
   :undoc-members:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   mcp_server.api
   mcp_server.management
   mcp_server.tools

Submodules
----------

.. toctree::
   :maxdepth: 4

   mcp_server.admin
   mcp_server.apps
   mcp_server.auth
   mcp_server.context
   mcp_server.forms
   mcp_server.metrics
   mcp_server.models
   mcp_server.server
   mcp_server.teams
   mcp_server.urls
   mcp_server.views


@@ -0,0 +1,7 @@
mcp\_server.server module
=========================

.. automodule:: mcp_server.server
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.teams module
========================

.. automodule:: mcp_server.teams
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.tools.discovery module
==================================

.. automodule:: mcp_server.tools.discovery
   :members:
   :show-inheritance:
   :undoc-members:


@@ -0,0 +1,7 @@
mcp\_server.tools.health module
===============================

.. automodule:: mcp_server.tools.health
   :members:
   :show-inheritance:
   :undoc-members:

Some files were not shown because too many files have changed in this diff.