feat: rework auth model with UserToken and Daedalus/Pallas integration

- Rename MCPToken to UserToken across models, views, and tests
- Update URL names from mcp-token-* to token-*
- Add Daedalus/Pallas integration design doc (v2)
- Switch docker-compose to build local mnemosyne:local image via shared build config instead of pulling from git.helu.ca
2026-05-23 19:50:29 -04:00
parent 735eb9de1a
commit 93639188d3
44 changed files with 1305 additions and 865 deletions
--- a/docs/DAEDALUS_PALLAS_INTEGRATION_v2.md
+++ b/docs/DAEDALUS_PALLAS_INTEGRATION_v2.md
@@ -0,0 +1,658 @@
+# Daedalus ↔ Pallas ↔ Mnemosyne Integration — v2
+
+**Status:** Approved design — supersedes
+[`DAEDALUS_PALLAS_INTEGRATION_v1.md`](DAEDALUS_PALLAS_INTEGRATION_v1.md).
+**Authoritative home:** `mnemosyne/docs/DAEDALUS_PALLAS_INTEGRATION_v2.md`
+**Versioning:** subsequent major revisions ship as `..._v3.md` etc.
+alongside this file. Cross-service docs (Daedalus, Pallas) link here.
+
+---
+
+## 1. Summary
+
+This document describes the end-state authentication / authorization
+model connecting three services:
+
+* **Mnemosyne** — knowledge platform. Owns Libraries, users, and the
+  MCP surface third-party clients query.
+* **Daedalus** — workspace + file-lifecycle UI. Registers Pallas
+  instances, syncs file content to Mnemosyne, drives chat. Acts on
+  behalf of one Mnemosyne user per Daedalus instance.
+* **Pallas** — FastAgent-backed MCP host that exposes agent teams
+  (Kottos, Mentor, Iolaus, …) as HTTP MCP servers.
+
+**What changed from v1:**
+
+* **Single token model.** The two-token split in v1 (DRF `authtoken`
+  for REST, `MCPToken` for `/mcp/`) is gone. One model —
+  [`UserToken`](../mnemosyne/mcp_server/models.py) — authenticates both
+  surfaces, managed from one UI at `/profile/tokens/`. The DRF
+  `authtoken` app has been removed from `INSTALLED_APPS`.
+* **Per-user authorization on the REST surface.** The Daedalus-facing
+  endpoints (`/library/api/*`, `/mcp_server/api/teams/*`) are no longer
+  open to any authenticated account. Each `Team` has an `owner` FK and
+  each workspace-scoped `Library` has an `owner_username` property; the
+  endpoints scope by these and return 404 for non-owners. The
+  `daedalus-service` shared account has been retired.
+* **Per-turn JWT path retired.** The legacy `iss=daedalus` JWT flow
+  (v1 §5.1, §6.2) is gone. Mnemosyne now validates only one JWT shape:
+  `typ=team`, `iss=mnemosyne`. The `_resolve_jwt_actor` service-user
+  fallback is also gone, and the `_JTI_CACHE` replay cache is dead code
+  pending removal (see §16).
+* **Authorization headers normalised to `Bearer`.** DRF
+  `TokenAuthentication` (and its `Token` keyword) is replaced by
+  [`UserTokenAuthentication`](../mnemosyne/mcp_server/drf_auth.py),
+  which accepts `Authorization: Bearer <plaintext>`. Anonymous
+  requests get **401 + `WWW-Authenticate: Bearer`** (RFC 7235).
+
+Everything else in v1 — the resolved-library abstraction, team JWT
+shape, Pallas's static-bearer configuration, the workspace ↔ Team
+attachment model in Daedalus, agent picker UX, signing-key model — is
+unchanged.
+
+---
+
+## 2. Motivation
+
+v1 untangled per-turn JWT forwarding by introducing static team JWTs.
+v2 finishes the cleanup: it deletes the per-turn JWT path entirely
+(now that Daedalus has migrated off it), collapses the remaining
+two-token split into a single `UserToken` system, and tightens the
+REST surface so that authenticating as a user, combined with owner
+scoping, is sufficient for access control without a shared service
+account.
+
+---
+
+## 3. Architecture
+
+### 3.1 Services and responsibilities
+
+| Service | Role in auth model |
+|---|---|
+| **Mnemosyne** | Owns Libraries, Library memberships, `UserToken`s, Teams, `TeamWorkspaceAssignment`s, signing keys. Validates bearers. Resolves every authenticated request to a Library set. |
+| **Daedalus** | Control plane. Registers Pallas instances as Teams in Mnemosyne. Manages workspace ↔ team attachments. Stores team JWTs for copying into Pallas deployment configs. Acts as a single Mnemosyne user via a `UserToken`. |
+| **Pallas** | Stateless MCP host. Holds a static team JWT in `fastagent.secrets.yaml`. No custom auth-forwarding code. |
+
+### 3.2 Two credential types
+
+Every authenticated request to Mnemosyne presents a Bearer token of
+exactly one of these shapes:
+
+| # | Credential | `iss` | Issuer | Lifetime | Used on | Library scope source |
+|---|---|---|---|---|---|---|
+| 1 | **Opaque `UserToken`** | n/a | The Mnemosyne user, via `/profile/tokens/` | Until revoked / expiry | `/mcp/` and DRF REST | MCP: `allowed_libraries`. REST: ignored (owner-scoped). |
+| 2 | **Team JWT** | `mnemosyne` | Mnemosyne (`/mcp_server/api/teams/`) | 10 years | `/mcp/` only | Live DB lookup via `TeamWorkspaceAssignment → Library` |
+
+The v1 per-turn JWT (category 2 in v1) has been retired and is no
+longer accepted by `resolve_mcp_jwt`.
+
+### 3.3 Scope split by surface
+
+A `UserToken` carries optional `allowed_libraries` / `allowed_tools`
+fields. These are honoured **only on the MCP surface** (`/mcp/`):
+
+* **`/mcp/`** — `MCPAuthMiddleware` enforces `allowed_libraries`
+  (fail-closed: empty list = zero libraries) and `allowed_tools` (empty
+  list = any tool). This is the surface third-party clients (Claude
+  Desktop, Cline) use.
+* **`/library/api/*`, `/mcp_server/api/teams/*`** — The DRF auth class
+  resolves *who* is calling. Access is gated by `Team.owner`
+  (mcp_server) and `Library.owner_username` (library workspaces). The
+  `allowed_*` fields are ignored on this surface, so Daedalus tokens
+  are effectively unrestricted; user identity plus owner-scope is the
+  access model.
+
+The rationale: enforcing `allowed_libraries` on the REST endpoints
+would force Daedalus to mint an effectively-unrestricted token (since
+it manages the whole workspace lifecycle), which would defeat the
+field. Owner-scope already encodes the right access pattern there.
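+
+The two MCP-side rules above have opposite defaults, which is easy to
+get backwards. A minimal sketch of the semantics, assuming plain lists
+in place of the model's JSON fields (the function names are
+hypothetical and are not taken from `MCPAuthMiddleware`):
+
+```python
+from typing import Optional
+
+
+def visible_libraries(allowed_libraries: Optional[list]) -> list:
+    """Fail closed: an empty or missing list grants zero libraries."""
+    return list(allowed_libraries or [])
+
+
+def tool_permitted(allowed_tools: Optional[list], tool: str) -> bool:
+    """An empty or missing list permits any tool; otherwise membership decides."""
+    return not allowed_tools or tool in allowed_tools
+
+
+print(visible_libraries([]), tool_permitted([], "search"))
+print(tool_permitted(["search"], "get_chunk"))
+```
+
+Neither function would run on the REST surface, where owner-scope
+alone decides access.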
+
+### 3.4 Resolved-library abstraction (MCP)
+
+Mnemosyne's MCP auth middleware populates a single
+`resolved_libraries: list[str]` per request. Downstream code (search,
+get_chunk, …) only reads that list.
+
+```
+Bearer → classify → dispatch
+                     ├─ Opaque UserToken    → token.allowed_libraries (JSON list of UIDs)
+                     └─ team JWT (typ=team) → live DB join:
+                                                TeamWorkspaceAssignment.workspace_id
+                                                → Library.workspace_id → Library.uid
+                                 ↓
+                   resolved_libraries: list[str]
+                                 ↓
+                         downstream tools
+```
+
+Fail-closed: empty resolution → no libraries visible.
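+
+The classify-and-dispatch step in the diagram can be sketched in plain
+Python. This is an illustration only: `TokenRow`, `classify`, and
+`resolve_libraries` are hypothetical names, the dict stands in for the
+live `Library.workspace_id` lookup, and telling a JWT from an opaque
+token by counting dots is an assumption rather than the documented
+behaviour of the middleware.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class TokenRow:
+    """Stand-in for a UserToken row (hypothetical)."""
+    allowed_libraries: list = field(default_factory=list)
+
+
+def classify(bearer: str) -> str:
+    """Assumption: a JWT has three dot-separated segments; anything else is opaque."""
+    return "team_jwt" if bearer.count(".") == 2 else "user_token"
+
+
+def resolve_libraries(kind, token=None, team_workspaces=(), library_by_workspace=None):
+    """Return the library UIDs a request may see; an empty list means none."""
+    if kind == "user_token":
+        return list(token.allowed_libraries or [])
+    # Team JWT: join assigned workspace ids to library UIDs at request time.
+    mapping = library_by_workspace or {}
+    return [mapping[ws] for ws in team_workspaces if ws in mapping]
+
+
+print(resolve_libraries(classify("opaque-token"), token=TokenRow(["lib1"])))
+print(resolve_libraries(classify("a.b.c"), team_workspaces=["ws_abc"],
+                        library_by_workspace={"ws_abc": "lib9"}))
+```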
+
+---
+
+## 4. Data model
+
+### 4.1 Mnemosyne
+
+#### `UserToken` (renamed from `MCPToken`)
+[`mnemosyne/mcp_server/models.py`](../mnemosyne/mcp_server/models.py).
+Per-user opaque bearer. Hashed at rest (SHA-256, 64-char hex).
+
+```python
+class UserToken(models.Model):
+    user              = FK(User, related_name="api_tokens")
+    token_hash        = CharField(64, unique=True, db_index=True)
+    name              = CharField(100)
+    is_active         = BooleanField(default=True)
+    expires_at        = DateTimeField(null=True, blank=True)
+    last_used_at      = DateTimeField(null=True, blank=True)
+    allowed_tools     = JSONField(default=list, blank=True)
+    allowed_libraries = JSONField(default=list, blank=True)
+    created_at, updated_at = …
+```
+
+* Plaintext shown once at mint via
+  [`UserTokenManager.create_token`](../mnemosyne/mcp_server/models.py);
+  never persisted.
+* Display masking via `get_masked_token()` returns `tok_…<hash[:8]>`.
+* `allowed_*` fields apply only on `/mcp/` — see §3.3.
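+
+For readers unfamiliar with hash-at-rest tokens, the lifecycle
+described above can be sketched with the standard library. The helper
+names and the plaintext format (`secrets.token_urlsafe`) are
+assumptions made for illustration; the real logic lives in
+`UserTokenManager.create_token` and may differ.
+
+```python
+import hashlib
+import secrets
+
+
+def mint_token():
+    """Return (plaintext, token_hash); only the hash would be persisted."""
+    plaintext = secrets.token_urlsafe(32)  # hypothetical plaintext format
+    token_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
+    return plaintext, token_hash
+
+
+def mask_token(token_hash: str) -> str:
+    """Mimic the `tok_…<hash[:8]>` display form."""
+    return f"tok_…{token_hash[:8]}"
+
+
+def matches(presented: str, stored_hash: str) -> bool:
+    """Hash the presented bearer and compare in constant time."""
+    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
+    return secrets.compare_digest(candidate, stored_hash)
+
+
+plaintext, stored = mint_token()
+print(mask_token(stored), matches(plaintext, stored), matches("wrong", stored))
+```
+
+Because only the digest is stored, a lost plaintext cannot be
+recovered; the user has to mint a new token.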
+
+#### `LibraryMembership`
+Unchanged from v1. Roles `owner` / `manager` / `reader` over Neo4j
+Libraries (joined by `uid` string since Library is a neomodel node).
+
+#### `Team`
+v1 + new non-null `owner` FK:
+
+```python
+class Team(models.Model):
+    id          = UUIDField(primary_key=True, editable=False)
+    name        = CharField(200)
+    owner       = FK(User, on_delete=PROTECT, related_name="teams")
+    active      = BooleanField(default=True)
+    active_jti  = UUIDField(null=True)
+    created_at, updated_at = …
+```
+
+`Team.owner` is set on creation in
+[`team_create`](../mnemosyne/mcp_server/api/teams.py) from
+`request.user`. All other team endpoints filter by `(pk, owner=request.user)`;
+non-owners receive 404, never 403, so a team's existence isn't
+disclosed across users.
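+
+The 404-not-403 rule amounts to treating "missing" and "owned by
+someone else" as the same outcome. A minimal sketch with an in-memory
+dict in place of the ORM (`TeamRow`, `NotFound`, and `get_owned_team`
+are hypothetical names, not the view code in `api/teams.py`):
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class TeamRow:
+    """Stand-in for the Team model."""
+    id: str
+    owner: str
+    active: bool = True
+
+
+class NotFound(Exception):
+    """Would map to an HTTP 404 response."""
+
+
+def get_owned_team(teams: dict, team_id: str, caller: str) -> TeamRow:
+    team = teams.get(team_id)
+    # Missing and foreign-owned teams raise the same error,
+    # so a caller cannot probe for another user's team ids.
+    if team is None or team.owner != caller:
+        raise NotFound("Not found.")
+    return team
+
+
+teams = {"a3f1": TeamRow(id="a3f1", owner="alice")}
+print(get_owned_team(teams, "a3f1", "alice").owner)
+```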
+
+Soft-delete via `Team.active = False` is unchanged.
+
+#### `TeamWorkspaceAssignment`
+Unchanged from v1. Live-queried per request; `PUT /workspaces/`
+replaces the assignment set.
+
+#### `MCPSigningKey`
+Unchanged. Signs team JWTs.
+
+#### `Library.owner_username` (new neomodel property)
+[`mnemosyne/library/models.py`](../mnemosyne/library/models.py). For
+workspace-scoped libraries (i.e. those with `workspace_id` set), the
+Mnemosyne username of the creating user. Null for global libraries.
+Indexed.
+
+```python
+owner_username = StringProperty(required=False, index=True)
+```
+
+The workspace endpoints (`/library/api/workspaces/…`) set this on
+create and require `lib.owner_username == request.user.username` for
+all mutations and reads; non-owners get 404 on GET/PUT and 204 on
+DELETE (idempotent).
+
+### 4.2 Daedalus (informational — managed in the Daedalus repo)
+
+Unchanged from v1 except:
+
+* `vault_mnemosyne_daedalus_service_password` is **gone**. Daedalus
+  authenticates to Mnemosyne with a `UserToken` plaintext minted at
+  `/profile/tokens/`, stored in whatever secret the operator wires
+  (suggestion: `vault_mnemosyne_user_token`).
+* Daedalus's HTTP client sends `Authorization: Bearer <plaintext>` to
+  every Mnemosyne endpoint (`/library/api/*`, `/mcp_server/api/teams/*`,
+  `/mcp/`). The `Token <key>` keyword is no longer accepted anywhere.
+
+### 4.3 Pallas
+Unchanged from v1. Static `Authorization: Bearer <team-jwt>` in
+`fastagent.secrets.yaml`.
+
+---
+
+## 5. JWT claim shapes
+
+Only one JWT shape remains — the team JWT from v1 §5.2:
+
+```json
+{
+  "iss":  "mnemosyne",
+  "aud":  "mnemosyne",
+  "sub":  "team:<pallas_instance_uuid>",
+  "typ":  "team",
+  "iat":  1715000000,
+  "exp":  2030360000,
+  "jti":  "uuid4"
+}
+```
+
+[`mnemosyne/mcp_server/teams.py:mint_team_jwt`](../mnemosyne/mcp_server/teams.py).
+
+### 5.1 Validator changes vs v1
+
+[`mnemosyne/mcp_server/auth.py`](../mnemosyne/mcp_server/auth.py):
+
+* `resolve_mcp_jwt` no longer accepts `iss=daedalus`. The `_JTI_CACHE`
+  replay cache still exists but is exercised by no live code path —
+  scheduled for removal in a follow-up cleanup commit.
+* `_resolve_jwt_actor` resolves to `team.owner` (the Mnemosyne user
+  that created the team) rather than a synthetic service user. Audit
+  log / usage accounting now correctly attribute each turn to the
+  acting user.
+
+```python
+def _resolve_jwt_actor(claims: dict):
+    if claims.get("typ") != "team":
+        raise MCPAuthError("Per-turn JWTs are no longer accepted; mint a team JWT.")
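+    # The team UUID travels in `sub` as "team:<uuid>"; there is no `team_id` claim.
+    team_id = claims["sub"].removeprefix("team:")
+    team = Team.objects.select_related("owner").get(pk=team_id)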
+    if not team.active:
+        raise MCPAuthError("Team JWT references an inactive team.")
+    if not team.owner.is_active:
+        raise MCPAuthError("Team owner is disabled.")
+    return team.owner
+```
+
+---
+
+## 6. Auth flow
+
+### 6.1 Third-party MCP client with `UserToken`
+1. Client sends `Authorization: Bearer <plaintext>` to `/mcp/`.
+2. `MCPAuthMiddleware` hashes → looks up `UserToken` → validates
+   active/expired/user-active.
+3. `resolved_libraries = list(token.allowed_libraries or [])`.
+4. Fails closed if empty.
+
+### 6.2 Agent team (Kottos / Mentor / Iolaus / Daedalus-chat-team)
+1. Pallas sends `Authorization: Bearer <team-jwt>` to `/mcp/`.
+2. Middleware validates signature, `iss=mnemosyne`, `typ=team`.
+3. Loads `Team` by UUID from `sub`. Verifies `active=True` and
+   `jti == active_jti`.
+4. Expands to `resolved_libraries` via `TeamWorkspaceAssignment` →
+   `Library.workspace_id`.
+5. The acting user (for audit, usage accounting) is `team.owner`.
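+
+Steps 2 and 3 can be sketched with the standard library alone. This
+assumes HS256 signing with the `MCPSigningKey` secret, which this
+document does not state explicitly; `sign` and `verify_team_jwt` are
+hypothetical helpers, the `aud` check and the `Team` lookup are
+omitted, and a real deployment would more likely rely on a maintained
+JWT library.
+
+```python
+import base64
+import hashlib
+import hmac
+import json
+import time
+
+LEEWAY_SECONDS = 30  # mirrors the +30s leeway in the failure-mode table
+
+
+def _b64url(data: bytes) -> str:
+    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
+
+
+def _b64url_decode(text: str) -> bytes:
+    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))
+
+
+def sign(claims: dict, secret: bytes) -> str:
+    """Mint an HS256 token (for the demo only)."""
+    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
+    body = _b64url(json.dumps(claims).encode())
+    mac = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
+    return f"{header}.{body}.{_b64url(mac)}"
+
+
+def verify_team_jwt(token: str, secret: bytes, active_jti: str, now=None) -> str:
+    """Return the team UUID from `sub`, or raise ValueError (would map to 401)."""
+    header, body, sig = token.split(".")
+    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
+        raise ValueError("unexpected alg")
+    expected = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
+    if not hmac.compare_digest(expected, _b64url_decode(sig)):
+        raise ValueError("bad signature")
+    claims = json.loads(_b64url_decode(body))
+    now = time.time() if now is None else now
+    if claims.get("exp", 0) + LEEWAY_SECONDS < now:
+        raise ValueError("expired")
+    if claims.get("iss") != "mnemosyne" or claims.get("typ") != "team":
+        raise ValueError("wrong iss or typ")
+    if claims.get("jti") != active_jti:
+        raise ValueError("stale jti")
+    sub = claims.get("sub", "")
+    if not sub.startswith("team:"):
+        raise ValueError("bad sub")
+    return sub[len("team:"):]
+
+
+secret = bytes.fromhex("11" * 32)  # placeholder 256-bit secret
+claims = {"iss": "mnemosyne", "aud": "mnemosyne", "sub": "team:abc",
+          "typ": "team", "iat": 1000, "exp": 2000, "jti": "j1"}
+print(verify_team_jwt(sign(claims, secret), secret, "j1", now=1500))
+```
+
+Checking `jti` against `active_jti` is what makes rotation take effect
+immediately even though the JWT itself is long-lived.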
+
+### 6.3 Daedalus REST control / ingest
+1. Daedalus sends `Authorization: Bearer <user-token-plaintext>` to
+   `/library/api/*` or `/mcp_server/api/teams/*`.
+2. DRF `UserTokenAuthentication` (first in the auth stack) resolves
+   the token to its user.
+3. Endpoint scopes by `Team.owner` (mcp_server) or
+   `Library.owner_username` (library). Non-owner ⇒ 404.
+
+### 6.4 Browser / web session
+SessionAuthentication runs second; cookie-authenticated users hit the
+DRF browsable API as themselves with no special handling.
+
+### 6.5 Failure modes
+
+| Condition | Response |
+|---|---|
+| No `Authorization` header | 401 + `WWW-Authenticate: Bearer` |
+| `Authorization: Token …` (legacy DRF keyword) | 401 (not consumed by any auth class) |
+| Invalid bearer plaintext | 401 + `WWW-Authenticate: Bearer` |
+| Inactive / expired token | 401 |
+| Disabled user | 401 |
+| JWT signature invalid | 401 + `WWW-Authenticate: Bearer` |
+| JWT `exp` past (+30s leeway) | 401 |
+| JWT `iss` not `mnemosyne` | 401 |
+| JWT `typ` not `team` (legacy per-turn) | 401 ("per-turn JWTs no longer accepted") |
+| Team inactive / unknown / `jti` stale | 401 |
+| Team endpoint, non-owner caller | 404 |
+| Workspace endpoint, non-owner caller (GET/PUT) | 404 |
+| Workspace endpoint, non-owner caller (DELETE) | 204 (idempotent) |
+
+---
+
+## 7. REST API — Mnemosyne team lifecycle
+
+Endpoints under `/mcp_server/api/teams/` are authenticated as the
+Mnemosyne user the team belongs to via a per-user `UserToken`
+(`Authorization: Bearer <plaintext>`, minted at `/profile/tokens/`).
+Each team has an `owner` FK; non-owners receive 404 (never 403) so a
+team's existence isn't disclosed across users.
+
+### 7.1 `POST /mcp_server/api/teams/`
+Create a team. `Team.owner` is set to `request.user`.
+
+**Request**
+```json
+{ "id": "a3f1…", "name": "Kottos" }
+```
+
+**Response 201** — fresh id
+```json
+{ "id": "a3f1…", "name": "Kottos", "jwt": "eyJhbGci…" }
+```
+
+**Response 200** — same id, same owner (idempotent; no new JWT issued).
+**Response 409** — same id, different owner ("Team id is already in use.").
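+
+The three outcomes can be read as one decision over the existing row.
+A rough sketch against an in-memory dict follows; the `team_create`
+signature here, the `detail` key, the shape of the 200 body, and the
+JWT placeholder are assumptions for illustration, not the view in
+`api/teams.py`.
+
+```python
+def team_create(teams: dict, team_id: str, name: str, caller: str):
+    """Return (status, body) for a create request against an in-memory store."""
+    existing = teams.get(team_id)
+    if existing is None:
+        teams[team_id] = {"id": team_id, "name": name, "owner": caller}
+        # Only a fresh registration carries a JWT; the value is a placeholder.
+        return 201, {"id": team_id, "name": name, "jwt": "<minted-jwt>"}
+    if existing["owner"] != caller:
+        return 409, {"detail": "Team id is already in use."}
+    # Same id, same owner: idempotent hit, no new JWT is issued.
+    return 200, {"id": team_id, "name": existing["name"]}
+
+
+store = {}
+print(team_create(store, "a3f1", "Kottos", "alice")[0])
+print(team_create(store, "a3f1", "Kottos", "alice")[0])
+print(team_create(store, "a3f1", "Kottos", "bob")[0])
+```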
+
+### 7.2 `DELETE /mcp_server/api/teams/{id}/`
+Soft-delete (`active=False`, clear `active_jti`). Old JWT invalid on
+next call. Non-owner ⇒ 404.
+
+### 7.3 `PUT /mcp_server/api/teams/{id}/workspaces/`
+Replace the team's workspace assignment set. Idempotent.
+
+```json
+{ "workspace_ids": ["ws_abc", "ws_def"] }
+```
+
+### 7.4 `POST /mcp_server/api/teams/{id}/rotate/`
+Generate a fresh `jti` and JWT, replace `active_jti`. Old JWT invalid
+immediately.
+
+**Upsert-on-missing.** If no `Team` exists for `id`, rotate creates one
+owned by the caller (with `name = str(id)`) and mints its first JWT —
+the operator clicks "Rotate JWT" in Daedalus settings and things just
+work even if Daedalus's `provision_teams` workflow never ran for this
+PallasInstance. The placeholder name can be edited via admin.
+
+| Response | Condition |
+|---|---|
+| **200** + `jwt` | Same-owner id (rotates) or fresh id (upserts + mints) |
+| **409** | `id` exists under a different owner (`"Team id is already in use."`) |
+| **409** | Team is inactive (soft-deleted) — explicit recreate required |
+
+The upsert path logs `team_rotate upserted_missing team_id=… owner=…`
+at INFO. Surfacing this in metrics is a useful drift signal: Daedalus
+and Mnemosyne fell out of sync on team provisioning.
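+
+The rotate decision table, including the upsert branch, can be
+sketched the same way. This is an in-memory illustration: the dict
+store, the `detail` messages for inactive teams, and the JWT
+placeholder are assumptions, and the order in which the two 409
+conditions are checked is not specified by this document.
+
+```python
+import uuid
+
+
+def team_rotate(teams: dict, team_id: str, caller: str):
+    """Return (status, body) for a rotate request against an in-memory store."""
+    team = teams.get(team_id)
+    if team is None:
+        # Upsert-on-missing: create a team owned by the caller, named after its id.
+        team = teams[team_id] = {"name": str(team_id), "owner": caller,
+                                 "active": True, "active_jti": None}
+    elif team["owner"] != caller:
+        return 409, {"detail": "Team id is already in use."}
+    elif not team["active"]:
+        return 409, {"detail": "Team is inactive."}
+    # A fresh jti replaces active_jti, so the previous JWT stops validating.
+    team["active_jti"] = str(uuid.uuid4())
+    return 200, {"jwt": f"<jwt for jti {team['active_jti']}>"}
+
+
+store = {}
+print(team_rotate(store, "a3f1", "alice")[0], store["a3f1"]["name"])
+```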
+
+### 7.5 `GET /mcp_server/api/teams/{id}/`
+Read-only detail (no JWT). Used by the Daedalus reconciler.
+
+### 7.6 `/library/api/ingest/` and `/library/api/jobs/…`
+Same owner-scope model as the workspace endpoints: every ingest write,
+job read, retry, and list call is filtered by
+`Library.owner_username == request.user.username` (global libraries
+with null `owner_username` remain shared). Cross-user calls get 404
+with the same "not registered" wording as a genuinely missing
+workspace — existence is not disclosed across users. The list endpoint
+silently filters; a `library_uid` the caller has no access to returns
+an empty list rather than 404.
+
+---
+
+## 8. Daedalus lifecycle hooks
+
+Unchanged from v1 §8 except that the HTTP client now sends
+`Authorization: Bearer <UserToken-plaintext>` and Daedalus's config
+holds a single `UserToken` plaintext (or one per Mnemosyne user in
+deployments where one Daedalus instance acts for several users).
+
+---
+
+## 9. Operator workflows
+
+### 9.1 Register a new Pallas deployment
+Unchanged from v1 §9.1.
+
+### 9.2 Attach a Pallas team to a workspace
+Unchanged from v1 §9.2.
+
+### 9.3 Retire a Pallas deployment
+Unchanged from v1 §9.3.
+
+### 9.4 Rotate a compromised team JWT
+Unchanged from v1 §9.4.
+
+### 9.5 Provision Mnemosyne integration on a fresh Daedalus instance
+Replaces v1 §9.5 (`provision_teams`) and the deleted
+`ensure_service_user` flow:
+
+1. **Mint a `UserToken` for the Mnemosyne user** Daedalus will act as:
+   `/profile/tokens/add/` (UI) or
+   `python manage.py create_user_token --user <username> --name "Daedalus"`.
+   Copy the plaintext (shown once).
+2. **Stage the plaintext in Daedalus's config** as the bearer for all
+   Mnemosyne calls.
+3. **Run Daedalus's `provision_teams`** to materialize a `Team` row in
+   Mnemosyne for every existing `PallasInstance`.
+4. **Distribute team JWTs** to each Pallas deployment as v1 §9.5
+   describes.
+
+### 9.6 Issue a `UserToken` for a third-party MCP client
+1. User logs in to Mnemosyne, navigates to `/profile/tokens/`, clicks
+   "Generate API Token".
+2. (Optional) opens the "Restrictions (optional)" section to set
+   `allowed_tools` / `allowed_libraries` — these apply only on
+   `/mcp/`; for purely REST use they can stay empty.
+3. Plaintext is shown once on the response page.
+4. User pastes plaintext into the third-party client's config (Claude
+   Desktop, Cline, etc.) with `Authorization: Bearer …`.
+
+The same UI and command (`create_user_token`) mint tokens for any
+purpose — Daedalus, MCP clients, scripts, CI. There is no separate
+"DRF token" category.
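+
+For script or CI use, the client side is just a header. A small sketch
+using only the standard library; the host name and token value are
+placeholders, `mnemosyne_request` is a hypothetical helper, and the
+request is built but not sent:
+
+```python
+from urllib.request import Request
+
+
+def mnemosyne_request(base_url: str, path: str, token_plaintext: str) -> Request:
+    """Build a request carrying the UserToken as a Bearer credential."""
+    return Request(
+        base_url.rstrip("/") + path,
+        headers={"Authorization": f"Bearer {token_plaintext}"},
+    )
+
+
+req = mnemosyne_request(
+    "https://mnemosyne.example",          # placeholder host
+    "/library/api/workspaces/ws_smoke/",
+    "PLAINTEXT-SHOWN-ONCE",               # placeholder token
+)
+print(req.get_method(), req.full_url, req.get_header("Authorization"))
+```
+
+Passing the result to `urllib.request.urlopen` would send it; a
+missing or rejected token should then surface as an HTTP 401.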
+
+---
+
+## 10. UX changes in Daedalus
+
+Unchanged from v1 §10.
+
+---
+
+## 11. Migration
+
+### 11.1 State at the start of v2
+
+* Mnemosyne is not in a production deployment; migrations are reset on
+  schema changes and the project assumes a clean DB on the next
+  release.
+* Daedalus has already migrated to `Authorization: Bearer <plaintext>`
+  and is configured to use a per-user token; the v1 DRF-token shim is
+  no longer used at runtime.
+* No live Pallas deployments authenticate via per-turn JWT (the path
+  is removed).
+
+### 11.2 Order of operations
+
+1. **Mnemosyne v2 deploys.** New `UserTokenAuthentication`, owner-scoped
+   REST endpoints, retired per-turn JWT validation, removed
+   `authtoken` app. Operator mints a `UserToken` for Daedalus's
+   Mnemosyne account before deploy.
+2. **Daedalus's config swap.** Operator points Daedalus at the new
+   `UserToken` plaintext. (If Daedalus was still sending
+   `Authorization: Token …`, switch to `Authorization: Bearer …` at
+   the same time.)
+3. **Existing Teams.** None expected at the v2 cutover (migrations are
+   reset). If any existed, `Team.owner` would need backfill; not in
+   scope.
+
+### 11.3 Rollback
+Mnemosyne v2 is a coordinated cutover with Daedalus's bearer-header
+swap. Rolling Mnemosyne back to v1 without rolling Daedalus back too
+means Daedalus's `Authorization: Bearer …` won't be recognised on
+`/library/api/*` (v1 only accepted `Token`). Plan the deploy as a
+single window.
+
+---
+
+## 12. Deprecated / removed in v2
+
+### Mnemosyne
+* `rest_framework.authtoken` (removed from `INSTALLED_APPS`).
+  Generated migration drops the `authtoken_token` table on next migrate;
+  on a reset schema there's nothing to drop.
+* `rest_framework.authentication.TokenAuthentication` and
+  `BasicAuthentication` (removed from
+  `REST_FRAMEWORK["DEFAULT_AUTHENTICATION_CLASSES"]`).
+* "API Token" card on `/profile/settings/` (removed). The whole
+  `api_token_regenerate` view + URL are gone.
+* `mcp_server.management.commands.ensure_service_user` (deleted).
+* `daedalus-service` user (no longer provisioned by Mnemosyne; no
+  longer assumed by any endpoint).
+* `MCP_JWT_SERVICE_USERNAME` setting (no longer read by
+  `_resolve_jwt_actor`).
+* Per-turn JWT path in
+  [`mcp_server/auth.py`](../mnemosyne/mcp_server/auth.py) — accepted
+  shapes shrink to `typ=team` only. `_JTI_CACHE` is now exercised by
+  no live path; scheduled for cleanup.
+* `MCPToken` (renamed to `UserToken`); `MCPTokenManager`,
+  `MCPTokenAdmin`, `MCPTokenCreateForm`, `MCPTokenEditForm` (renamed
+  in lockstep). The `mcp_…` masked-token prefix becomes `tok_…`.
+* `create_mcp_token` management command (renamed `create_user_token`).
+* `/profile/mcp-tokens/` URL prefix (renamed `/profile/tokens/`); URL
+  names `mcp-token-*` (renamed `token-*`).
+
+### Daedalus
+* `vault_mnemosyne_daedalus_service_password` (no longer needed; the
+  service user is gone).
+* Any code path that distinguished DRF-`Token` from MCP-`Bearer` — one
+  bearer header for everything now.
+
+### Pallas
+No changes from v1.
+
+---
+
+## 13. Security
+
+### 13.1 Token lifetimes
+* **`UserToken`**: valid until the user revokes it or `expires_at`
+  passes. Rotation is manual via the `/profile/tokens/` dashboard.
+* **Team JWT**: 10 years. Revocation via `Team.active`,
+  `Team.active_jti`, or key rotation.
+
+### 13.2 Revocation levers
+1. `PUT /teams/{id}/workspaces/` with `[]` — team sees nothing, JWT
+   still validates. Useful for pausing without redistributing tokens.
+2. `DELETE /teams/{id}/` — team inactive, all its JWTs rejected.
+3. `POST /teams/{id}/rotate/` — `active_jti` changes; leaked JWT
+   stops working.
+4. **Revoke a `UserToken`** — `/profile/tokens/{id}/revoke/` flips
+   `is_active=False`; immediate effect for both `/mcp/` and REST.
+5. `MCPSigningKey.retire()` — nuclear option for team JWTs.
+
+### 13.3 At-rest protection
+* `UserToken.token_hash`: SHA-256 of plaintext; plaintext never
+  stored.
+* `MCPSigningKey.secret_hex`: 256-bit hex secret stored in Mnemosyne
+  DB only.
+* `PallasInstance.team_jwt_encrypted`: Fernet-encrypted by Daedalus.
+
+### 13.4 Audit attribution
+Every authenticated request resolves to a real Mnemosyne user:
+
+* Opaque `UserToken` → `token.user`.
+* Team JWT → `team.owner`.
+
+Both flow through to usage accounting (`LLMUsage`, search metrics) and
+the audit log. The synthetic `daedalus-service` actor is gone; nothing
+in the audit trail is attributed to a non-user account.
+
+Notable audit events:
+
+* `team_create created team_id=… name=…` — fresh team registered.
+* `team_create idempotent_hit team_id=…` — same-owner re-POST.
+* `team_create owner_conflict team_id=… caller=…` — id collision.
+* `team_rotate team_id=… new_jti=…` — explicit rotation.
+* `team_rotate upserted_missing team_id=… owner=…` — rotate created a
+  missing team on the fly. Useful drift signal: Daedalus and
+  Mnemosyne fell out of sync on team provisioning.
+* `team_delete team_id=…` — soft-delete.
+
+### 13.5 Isolation model
+Unchanged from v1 §13.5.
+
+---
+
+## 14. Testing
+
+### 14.1 Mnemosyne test surface (relevant to v2)
+* `resolve_mcp_jwt` rejects `iss=daedalus` / non-`team` payloads.
+* `_resolve_jwt_actor` resolves to `team.owner`; rejects per-turn JWTs
+  and inactive owners. See
+  [`test_auth.py::ResolveJWTActorTest`](../mnemosyne/mcp_server/tests/test_auth.py).
+* `UserTokenAuthentication` issues 401 + `WWW-Authenticate: Bearer`
+  for anonymous and rejected-token cases; 200 for valid bearer; stashes
+  the `UserToken` on `request.auth`. See
+  [`test_drf_auth.py`](../mnemosyne/mcp_server/tests/test_drf_auth.py).
+* `Team` endpoints scope by `owner`; cross-user GET/DELETE/PUT return
+  404; same-id different-owner POST/rotate returns 409. `rotate`
+  upserts a missing team owned by the caller. See
+  [`test_teams_api.py`](../mnemosyne/mcp_server/tests/test_teams_api.py).
+* Ingest endpoints (`POST /library/api/ingest/`,
+  `GET/POST /library/api/jobs/…`) scope by `Library.owner_username`.
+  Cross-user writes/reads return 404; list silently filters. The
+  Cypher-touching paths require Neo4j, so the scoping is exercised by
+  the manual e2e plan in §14.3 rather than unit tests.
+* `UserToken` model: hash-at-rest, `tok_…` masked prefix,
+  `allowed_libraries` round-trip. See
+  [`test_token.py`](../mnemosyne/mcp_server/tests/test_token.py),
+  [`test_models.py`](../mnemosyne/mcp_server/tests/test_models.py).
+
+### 14.2 Daedalus test surface
+Unchanged from v1 §14.2 except:
+* HTTP client uses `Authorization: Bearer …` against every Mnemosyne
+  endpoint.
+* Provisioning command depends on a configured `UserToken`, not the
+  retired `daedalus-service` Basic-auth credential.
+
+### 14.3 Integration
+* End-to-end: MCP client with `UserToken` → search scoped to
+  `token.allowed_libraries`.
+* End-to-end: Pallas with team JWT → search scoped to team's attached
+  workspaces.
+* End-to-end: Daedalus REST call with `UserToken` → workspace
+  mutation succeeds only for the owning user; cross-user attempts get
+  404.
+* End-to-end: ingest as one user, then a *different* user attempts
+  `POST /library/api/ingest/`, `GET /jobs/{id}/`, `POST /jobs/{id}/retry/`
+  and `GET /jobs/?library_uid=<theirs>` — first three return 404, the
+  list returns an empty array.
+* End-to-end: anonymous REST call → 401 + `WWW-Authenticate: Bearer`.
+* End-to-end: `POST /mcp_server/api/teams/{fresh-uuid}/rotate/` on a
+  team Mnemosyne has never seen → 200 + JWT, `Team` row created with
+  `owner=request.user`. Second rotate on the same id → 200 with a
+  fresh `active_jti`. Rotate on an id owned by a different user → 409.
+
+---
+
+## 15. Phased delivery
+
+| # | Phase | Surface | Status |
+|---|---|---|---|
+| 1 | Design v1 | [`DAEDALUS_PALLAS_INTEGRATION_v1.md`](DAEDALUS_PALLAS_INTEGRATION_v1.md) | Superseded |
+| 2 | Mnemosyne core | `LibraryMembership`, `MCPToken`, `Team`, `TeamWorkspaceAssignment`, `/mcp_server/api/teams/`, team JWT mint | Implemented (v1) |
+| 3 | Pallas cleanup | Remove `_fastagent_patch.py` internals | Implemented (v1) |
+| 4 | Daedalus integration | Lifecycle hooks, reconciler, `provision_teams`, attached-teams UI | Implemented (v1) |
+| 5 | Per-user REST authorization | `Team.owner`, `Library.owner_username`, owner-scope on all Daedalus-facing endpoints, `_resolve_jwt_actor` → `team.owner` | Implemented (v2) |
+| 6 | Token consolidation | Rename `MCPToken` → `UserToken`, `UserTokenAuthentication` DRF class, drop `authtoken` + DRF Token UI, retire per-turn JWT, `Bearer`-first auth stack | Implemented (v2) |
+| 7 | Documentation | This file; updates to [`mnemosyne_integration.md`](mnemosyne_integration.md) and [`deploy.md`](deploy.md) | Implemented (v2) |
+
+---
+
+## 16. Open items (v2)
+
+* `_JTI_CACHE` in [`auth.py`](../mnemosyne/mcp_server/auth.py) is dead
+  code (the per-turn replay path is gone). Cleanup commit pending; not
+  blocking.
+* `BasicAuthentication` is removed from the DRF default stack. If any
+  internal tooling relied on it, that path is now broken and will need
+  an explicit re-add to the relevant viewset's `authentication_classes`
+  rather than the global default.
+
+---
+
+## 17. Cross-references
+
+* Mnemosyne MCP auth: [`mnemosyne/mcp_server/auth.py`](../mnemosyne/mcp_server/auth.py).
+* Mnemosyne DRF auth class: [`mnemosyne/mcp_server/drf_auth.py`](../mnemosyne/mcp_server/drf_auth.py).
+* Mnemosyne token model: [`mnemosyne/mcp_server/models.py`](../mnemosyne/mcp_server/models.py) (`UserToken`).
+* Mnemosyne team REST: [`mnemosyne/mcp_server/api/teams.py`](../mnemosyne/mcp_server/api/teams.py).
+* Mnemosyne workspace REST: [`mnemosyne/library/api/workspaces.py`](../mnemosyne/library/api/workspaces.py).
+* Token self-service dashboard: [`mnemosyne/mcp_server/views.py`](../mnemosyne/mcp_server/views.py), [`urls.py`](../mnemosyne/mcp_server/urls.py).
+* `create_user_token` management command: [`mnemosyne/mcp_server/management/commands/create_user_token.py`](../mnemosyne/mcp_server/management/commands/create_user_token.py).
+* v1 design (superseded but kept for history): [`DAEDALUS_PALLAS_INTEGRATION_v1.md`](DAEDALUS_PALLAS_INTEGRATION_v1.md).
--- a/docs/deploy.md
+++ b/docs/deploy.md
@@ -85,8 +85,7 @@ an explicit `when: mnemosyne_first_deploy` flag.

 ```bash
 # Apply Django ORM migrations (PostgreSQL schema)
-docker compose -f /srv/mnemosyne/docker-compose.yaml \
-    run --rm app migrate
+docker compose -f /srv/mnemosyne/docker-compose.yaml run --rm app migrate

 # Create Neo4j vector + full-text indexes and load library-type defaults
 docker compose -f /srv/mnemosyne/docker-compose.yaml \
@@ -315,17 +314,18 @@ curl http://puck.incus:23181/metrics | head -5

 ### Verify Daedalus auth (per-user API token)

-Daedalus now authenticates as a Mnemosyne user via the DRF token shown
-on `/profile/settings/`. To smoke-test from a deploy host:
+Daedalus now authenticates as a Mnemosyne user via a `UserToken` minted
+at `/profile/tokens/`. To smoke-test from a deploy host:

 ```bash
-curl -H "Authorization: Token <user-api-token>" \
+curl -H "Authorization: Bearer <user-token-plaintext>" \
    https://mnemosyne.ouranos.helu.ca/library/api/workspaces/ws_smoke/ \
    -o /dev/null -w "%{http_code}"
 # Expect: 200 if the workspace exists for that user, 404 otherwise.
+# An anonymous request gets 401 with `WWW-Authenticate: Bearer`.
 ```

-### Verify MCP connectivity (from a client with a valid MCPToken)
+### Verify MCP connectivity (from a client with a valid UserToken)

 ```bash
 curl -H "Authorization: Bearer <token>" \
--- a/docs/mnemosyne_integration.md
+++ b/docs/mnemosyne_integration.md
@@ -8,7 +8,7 @@ This document describes Mnemosyne's role in the Daedalus + Pallas architecture a

 Mnemosyne exposes two interfaces for the wider Ouranos ecosystem:

-1. **REST API** (`/library/api/*`) — consumed by the Daedalus backend authenticated as the owning Mnemosyne user via a per-user DRF token (`Authorization: Token <key>`, surfaced on `/profile/settings/`) for workspace lifecycle and asynchronous file ingestion. Phase 1, **implemented**.
+1. **REST API** (`/library/api/*`) — consumed by the Daedalus backend authenticated as the owning Mnemosyne user via a per-user `UserToken` (`Authorization: Bearer <plaintext>`, minted at `/profile/tokens/`) for workspace lifecycle and asynchronous file ingestion. Phase 1, **implemented**.
 2. **MCP Server** (port 22091 internal, `/mcp/` via nginx on 23090) — exposes search, browse, and retrieval tools. Phase 5 of Mnemosyne's own roadmap, **implemented** with workspace-scoped access control via long-lived team JWTs. Consumed by Pallas FastAgents in production (Daedalus integration Phase 2, **implemented** — see [Phase 3 of this doc](#3-phase-3-long-lived-team-jwt-access-control-for-pallas-instances)).

 ### Phase status
@@ -105,7 +105,7 @@ Auth is controlled by `MCP_REQUIRE_AUTH` in `.env`. Production sets it to `True`

 ## 2. REST API for Daedalus

-All endpoints require an `Authorization: Token <key>` header carrying the DRF token of the Mnemosyne user the workspace belongs to (surfaced on `/profile/settings/`). Workspaces are scoped to their creating user via the `Library.owner_username` property; cross-user access returns 404. They are consumed by the Daedalus FastAPI backend only — not by any frontend.
+All endpoints require an `Authorization: Bearer <plaintext>` header carrying a `UserToken` belonging to the Mnemosyne user the workspace belongs to (minted at `/profile/tokens/`). Workspaces are scoped to their creating user via the `Library.owner_username` property; cross-user access returns 404. Anonymous requests get 401 with `WWW-Authenticate: Bearer`. These endpoints are consumed by the Daedalus FastAPI backend only — not by any frontend.

 ### Workspace lifecycle

@@ -354,7 +354,7 @@ mnemosyne_s3_operations_total{operation,status}                counter
 - [x] `GET /library/api/jobs/{job_id}/`, `POST .../retry/`, `GET /library/api/jobs/`
 - [x] `library.tasks.ingest_from_daedalus` Celery task with content-hash-aware supersede logic
 - [x] `library.services.daedalus_s3` cross-bucket fetch + copy
- [x] Per-user DRF token auth (`Authorization: Token <key>`); workspaces scoped to the owning user via `Library.owner_username`
+- [x] Per-user `UserToken` auth (`Authorization: Bearer <plaintext>`, minted at `/profile/tokens/`); workspaces scoped to the owning user via `Library.owner_username`

 ### Phase 2 — MCP Server (Mnemosyne roadmap Phase 5)  ✅ Implemented
 - [x] `mcp_server/` module following the [Django MCP Pattern](Pattern_Django-MCP_V1-00.md)
--- a/docs/ouranos.md
+++ b/docs/ouranos.md
@@ -1,557 +0,0 @@
-# Ouranos Lab
-
-Infrastructure-as-Code project managing the **Ouranos Lab** — a development sandbox at [ouranos.helu.ca](https://ouranos.helu.ca). Uses **Terraform** for container provisioning and **Ansible** for configuration management, themed around the moons of Uranus.
-
---
-
-## Project Overview
-
-| Component | Purpose |
-|-----------|---------|
-| **Terraform** | Provisions 10 specialised Incus containers (LXC) with DNS-resolved networking, security policies, and resource dependencies |
-| **Ansible** | Deploys Docker, databases (PostgreSQL, Neo4j), observability stack (Prometheus, Grafana, Loki), and application runtimes across all hosts |
-
-> **DNS Domain**: Incus resolves containers via the `.incus` domain suffix (e.g., `oberon.incus`, `portia.incus`). IPv4 addresses are dynamically assigned — always use DNS names, never hardcode IPs.
-
---
-
-## Uranian Host Architecture
-
-All containers are named after moons of Uranus and resolved via the `.incus` DNS suffix.
-
-| Name | Role | Description | Nesting |
-|------|------|-------------|---------|
-| **ariel** | graph_database | Neo4j — Ethereal graph connections | ✔ |
-| **caliban** | agent_automation | Agent S MCP Server with MATE Desktop | ✔ |
-| **miranda** | mcp_docker_host | Dedicated Docker Host for MCP Servers | ✔ |
-| **oberon** | container_orchestration | Docker Host — MCP Switchboard, RabbitMQ, Open WebUI | ✔ |
-| **portia** | database | PostgreSQL — Relational database host | ❌ |
-| **prospero** | observability | PPLG stack — Prometheus, Grafana, Loki, PgAdmin | ❌ |
-| **puck** | application_runtime | Python App Host — JupyterLab, Django apps, Gitea Runner | ✔ |
-| **rosalind** | collaboration | Gitea, LobeChat, Nextcloud, AnythingLLM | ✔ |
-| **sycorax** | language_models | Arke LLM Proxy | ✔ |
-| **titania** | proxy_sso | HAProxy TLS termination + Casdoor SSO | ✔ |
-| **umbriel** | graph_database | Neo4j (Mnemosyne) — dedicated memory graph | ✔ |
-
-### puck — Project Application Runtime
-
-Shape-shifting trickster embodying Python's versatility.
-This is the host that runs Python projects in the Ouranos sandbox.
-It has an RDP server and is generally where application development happens.
-Each project has a number that is used to determine port numbers.
-
- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Django Projects: Zelus (221), Angelia (222), Athena (224), Kairos (225), Icarlos (226), MCP Switchboard (227), Spelunker (228), Peitho (229), Mnemosyne (230)
- FastAgent Projects: Pallas (240)
- FastAPI Projects: Daedalus (200), Arke (201), Kernos (202), Rommie (203), Orpheus (204), Periplus (205), Nike (206), Stentor (207)
-
-### caliban — Agent Automation
-
-Autonomous computer agent learning through environmental interaction.
-
- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22062)
- Rommie MCP Server (port 22061) — agent-to-agent GUI automation via Agent S
- FreeCAD Robust MCP Server (port 22063) — CAD automation via FreeCAD XML-RPC
- GPU passthrough
- RDP access (port 25521)
-
-### oberon — Container Orchestration & Dockerized Shared Services
-
-King of the Fairies orchestrating containers and managing MCP infrastructure.
-
- Docker engine
- MCP Switchboard (port 22781) — Django app routing MCP tool calls
- RabbitMQ message queue
- smtp4dev SMTP test server (port 22025)
-
-### portia — Relational Database
-
-Intelligent and resourceful — the reliability of relational databases.
-
- PostgreSQL 17 (port 5432)
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `mnemosyne`, `nextcloud`, `openwebui`, `periplus`, `spelunker`
-
-### ariel — Graph Database
-
-Air spirit — ethereal, interconnected nature mirroring graph relationships.
-
- Neo4j 5.26.0 (Docker)
- HTTP API: port 25554
- Bolt: port 7687 (reached as `ariel.incus:7687` on the internal network)
-
-### umbriel — Graph Database (Mnemosyne)
-
-Dusky melancholy sprite from Pope's *Rape of the Lock* — keeper of the Cave of
-Spleen, naturally paired with Mnemosyne the Titan of memory. Dedicated Neo4j
-instance so Mnemosyne's `Library`/`Collection`/`Item`/`Chunk`/`Concept` labels,
-vector indexes, and schema migrations can't collide with another tenant's
-graph on Ariel.
-
- Neo4j 5.26.0 (Docker)
- HTTP Browser: port 25555
- Bolt: port 7687 (reached as `umbriel.incus:7687` on the internal network)
-
-### miranda — MCP Docker Host
-
-Curious bridge between worlds — hosting MCP server containers.
-
- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy (port 22071)
- Argos MCP Server — web search via SearXNG (port 22062)
- Grafana MCP Server (port 22063)
- Neo4j MCP Server (port 22064)
- Gitea MCP Server (port 22065)
-
-### prospero — Observability Stack
-
-Master magician observing all events.
-
- PPLG stack via Docker Compose: Prometheus, Loki, Grafana, PgAdmin
- Internal HAProxy with OAuth2-Proxy for all dashboards
- AlertManager with Pushover notifications
- Prometheus metrics collection (`node-exporter`, HAProxy, Loki)
- Loki log aggregation via Alloy (all hosts)
- Grafana dashboard suite with Casdoor SSO integration
-
-### rosalind — Third Party Applications for testing and evaluation
-
-Witty and resourceful moon for PHP, Go, and Node.js runtimes.
-
- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia)
- Home Assistant (port 8123)
-
-### sycorax — Language Models
-
-Original magical power wielding language magic.
-
- Arke LLM API Proxy (port 25540)
- Multi-provider support (OpenAI, Anthropic, etc.)
- Session management with Memcached
- Database backend on Portia
-
-### titania — Proxy & SSO Services
-
-Queen of the Fairies managing access control and authentication.
-
- HAProxy 3.x with TLS termination (port 443)
- Let's Encrypt wildcard certificate via certbot DNS-01 (Namecheap)
- HTTP to HTTPS redirect (port 80)
- Gitea SSH proxy (port 22022)
- Casdoor SSO (port 22081, local PostgreSQL)
- Prometheus metrics at `:8404/metrics`
-
---
-
-## Port Numbering
-
-Services may run on their well-known ports: PostgreSQL 5432, Prometheus metrics 9100.
-
-However, inside a Docker project, the number plan must be followed to avoid port conflicts and confusion:
-XXXYZ
-XXX Project Number or 220 for external project 
-Y Service: 0 reserved, 1-4 flexible, 5 database, 6 MCP, 7 API, 8 Web App, 9 Prometheus metrics
-Z Instance: The running instance of this app on the same host, starting at 1.  May also be used to handle exceptions.
-
-255 Incus port forwarding: Ports in this range are forwarded from the Incus host to Incus containers (defined in Terraform)
-
-514ZZ is the syslog port.  Docker containers send their syslog to an Alloy syslog collector port.  ZZ is the application instance; values only need to be unique on a given host and increment from 01.
-
---
-
-## Application Conventions
-
-Standards that all services deployed in Ouranos MUST follow. For full logging standards and anti-patterns, see [red_panda_standards.md](red_panda_standards.md).
-
-### Health Check Endpoints
-
-All services MUST expose Kubernetes-style health endpoints:
-
-| Endpoint | Purpose | Auth |
-|----------|---------|------|
-| `GET /live` | **Liveness** — process is running and accepting connections | None |
-| `GET /ready` | **Readiness** — process is running AND all dependencies (DB, cache, upstream APIs) are healthy | None |
-| `GET /metrics` | Prometheus metrics (see below) | IP-restricted |
-
- HAProxy checks `health_path` (typically `/ready/`) for backend health — return HTTP 200 when healthy
- Health endpoints MUST NOT require authentication (no JWT, no session)
- Third-party services use their native health paths (e.g., `/api/health`, `/api/healthz`, `/-/healthy`)
-
-### Health Checks in Docker Compose
-
-Use `curl -f` for Docker Compose healthchecks. Install curl in images if needed.
-
-```yaml
-healthcheck:
-  test: ["CMD", "curl", "-f", "http://localhost:8000/live"]
-  interval: 30s
-  timeout: 10s
-  retries: 3
-  start_period: 40s
-```
-
-### Logging Conventions
-
-Log output flows through: **App → syslog (RFC3164) → Alloy → Loki → Grafana**
-
-| Level | Usage |
-|-------|-------|
-| **ERROR** | Broken state requiring human action — always include `exc_info=True`, error type, and context |
-| **WARNING** | Degraded but recovering — client disconnects, performance outliers, client-side exceptions, leaked markup |
-| **INFO** | Lifecycle events — service start/stop, connections, requests completed, jobs finished |
-| **DEBUG** | Diagnostic detail — SSE events, keepalive pings, health check 200 responses, negotiation steps |
-
-**Health check responses MUST be logged at DEBUG only.** HAProxy and Prometheus probe endpoints every 15-30 seconds. Logging these at INFO floods syslog with thousands of identical `200 OK` lines per hour, burying real events.
-
-### Protected vs Unprotected Endpoints
-
-| Protected (require valid JWT) | Unprotected |
-|-------------------------------|-------------|
-| All `/api/v1/*` routes | `GET /live` |
-| | `GET /ready` |
-| | `GET /metrics` (IP-restricted to internal networks) |
-| | `GET /api/auth/login-url` |
-| | `POST /api/auth/token` |
-| | `POST /api/v1/telemetry` (sendBeacon cannot set headers) |
-
-### Prometheus Metrics
-
-All services SHOULD expose `GET /metrics` in Prometheus exposition format, scraped by Prospero's Prometheus (default 15s interval).
-
- **IP-restricted** to internal networks only (`10.10.0.0/24`, `172.16.0.0/12`, `127.0.0.0/8`)
- Consider exposing: request counts/durations, error rates, active connections, queue depths, dependency health
-
-### Browser Telemetry
-
-Frontend/browser code MUST send telemetry data and errors back to the application's telemetry API:
-
- `POST /api/v1/telemetry` — unprotected (browser `sendBeacon` cannot set Authorization headers)
- Capture and report: JavaScript exceptions, performance metrics, user-facing errors
- Client-side exceptions should log as **WARNING** on the server (they indicate a problem but not a server-side failure)
-
-### Docker Networking
-
- Use the **default Docker bridge network** for simple deployments
- Add additional named networks only when required (e.g., isolating database traffic) or explicitly requested
- Do not create custom network definitions for single-service Docker Compose stacks
-
---
-
-## External Access via HAProxy
-
-Titania provides TLS termination and reverse proxy for all services.
-
- **Base domain**: `ouranos.helu.ca`
- **HTTPS**: port 443 (standard)
- **HTTP**: port 80 (redirects to HTTPS)
- **Certificate**: Let's Encrypt wildcard via certbot DNS-01
-
-### Route Table
-
-| Subdomain | Backend | Service |
-|-----------|---------|---------|
-| `ouranos.helu.ca` (root) | puck.incus:22281 | Angelia (Django) |
-| `alertmanager.ouranos.helu.ca` | prospero.incus:443 (SSL) | AlertManager |
-| `angelia.ouranos.helu.ca` | puck.incus:22281 | Angelia (Django) |
-| `anythingllm.ouranos.helu.ca` | rosalind.incus:22084 | AnythingLLM |
-| `arke.ouranos.helu.ca` | sycorax.incus:25540 | Arke LLM Proxy |
-| `athena.ouranos.helu.ca` | puck.incus:22481 | Athena (Django) |
-| `gitea.ouranos.helu.ca` | rosalind.incus:22082 | Gitea |
-| `grafana.ouranos.helu.ca` | prospero.incus:443 (SSL) | Grafana |
-| `hass.ouranos.helu.ca` | oberon.incus:8123 | Home Assistant |
-| `id.ouranos.helu.ca` | titania.incus:22081 | Casdoor SSO |
-| `icarlos.ouranos.helu.ca` | puck.incus:22681 | Icarlos (Django) |
-| `jupyterlab.ouranos.helu.ca` | puck.incus:22071 | JupyterLab (OAuth2-Proxy) |
-| `kairos.ouranos.helu.ca` | puck.incus:22581 | Kairos (Django) |
-| `lobechat.ouranos.helu.ca` | rosalind.incus:22081 | LobeChat |
-| `loki.ouranos.helu.ca` | prospero.incus:443 (SSL) | Loki |
-| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22781 | MCP Switchboard |
-| `nextcloud.ouranos.helu.ca` | rosalind.incus:22083 | Nextcloud |
-| `openwebui.ouranos.helu.ca` | oberon.incus:22088 | Open WebUI |
-| `peitho.ouranos.helu.ca` | puck.incus:22981 | Peitho (Django) |
-| `periplus.ouranos.helu.ca` | puck.incus:20681 | Periplus (FastAPI + MCP via nginx) |
-| `pgadmin.ouranos.helu.ca` | prospero.incus:443 (SSL) | PgAdmin 4 |
-| `prometheus.ouranos.helu.ca` | prospero.incus:443 (SSL) | Prometheus |
-| `searxng.ouranos.helu.ca` | oberon.incus:22073 | SearXNG (OAuth2-Proxy) |
-| `smtp4dev.ouranos.helu.ca` | oberon.incus:22085 | smtp4dev |
-| `spelunker.ouranos.helu.ca` | puck.incus:22881 | Spelunker (Django) |
-
---
-
-## Infrastructure Management
-
-### Quick Start
-
-```bash
-# Provision containers
-cd terraform
-terraform init
-terraform plan
-terraform apply
-
-# Start all containers
-cd ../ansible
-source ~/env/ouranos/bin/activate
-ansible-playbook sandbox_up.yml
-
-# Deploy all services
-ansible-playbook site.yml
-
-# Stop all containers
-ansible-playbook sandbox_down.yml
-```
-
-### Python Virtual Environment Setup
-
-The Ansible automation requires a Python virtual environment with the `ansible` package installed. Create and activate the environment from the `~` directory:
-
-```bash
-# Create virtual environment
-cd ~
-python3 -m venv env/ouranos
-
-# Activate environment
-source ~/env/ouranos/bin/activate
-
-# Install Ansible
-pip install ansible
-pip install ansible-core
-ansible-galaxy collection install community.postgresql
-```
-
-### Ansible Playbook Syntax Check
-
-Before running playbooks, use the `apsc.sh` utility (in PATH) to quickly validate YAML syntax:
-
-```bash
-# From the ansible directory
-apsc.sh
-
-# This will check all YAML files in the current directory for syntax errors
-```
-
-### Terraform Workflow
-
-1. **Define** — Containers, networks, and resources in `*.tf` files
-2. **Plan** — Review changes with `terraform plan`
-3. **Apply** — Provision with `terraform apply`
-4. **Verify** — Check outputs and container status
-
-### Terraform Import
-
-When containers or other resources are created manually (outside Terraform) or need to be re-imported after recreation, use `terraform import` to sync the Terraform state with existing infrastructure.
-
-#### Import Syntax
-
-The correct import format for Incus resources requires quoting resource addresses with `for_each` keys and using the full ID including image fingerprints:
-
-```bash
-# Import a container with correct syntax
-terraform import 'incus_instance.uranian_hosts["<name>"]' ouranos/<name>,image=<fingerprint>
-```
-
-#### Getting Image Fingerprints
-
-First, get the fingerprint of the image resource from Terraform state:
-
-```bash
-cd terraform
-terraform state show incus_image.noble | grep fingerprint
-# Output: fingerprint = "75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644"
-
-terraform state show incus_image.questing | grep fingerprint
-# Output: fingerprint = "e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf"
-```
-
-#### Importing All Uranian Hosts
-
-Replace containers missing from state (or re-import after manual recreation):
-
-```bash
-# Containers using noble image
-terraform import 'incus_instance.uranian_hosts["ariel"]' ouranos/ariel,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["miranda"]' ouranos/miranda,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["oberon"]' ouranos/oberon,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["portia"]' ouranos/portia,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["prospero"]' ouranos/prospero,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["rosalind"]' ouranos/rosalind,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["sycorax"]' ouranos/sycorax,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["titania"]' ouranos/titania,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-terraform import 'incus_instance.uranian_hosts["umbriel"]' ouranos/umbriel,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
-
-# Containers using questing image
-terraform import 'incus_instance.uranian_hosts["caliban"]' ouranos/caliban,image=e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf
-terraform import 'incus_instance.uranian_hosts["puck"]' ouranos/puck,image=e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf
-```
-
-#### Storage Bucket Import
-
-For storage buckets, use the `<project>/<pool>/<name>` format:
-
-```bash
-terraform import incus_storage_bucket.<name> ouranos/default/<bucket-name>
-```
-
-#### Common Issues
-
-1. **Import ID format errors**: Use quotes around resource addresses with `for_each` keys: `'incus_instance.uranian_hosts["name"]'`
-
-2. **Image replacement on import**: Importing without specifying the image fingerprint will cause Terraform to replace the container on next apply. Always include `image=<fingerprint>` in the import ID.
-
-3. **Tainted state**: If a resource shows "will be created" but already exists, it may be tainted. Remove from state and re-import:
-   ```bash
-   terraform state rm 'incus_instance.uranian_hosts["name"]'
-   terraform import 'incus_instance.uranian_hosts["name"]' ouranos/name,image=<fingerprint>
-   ```
-
-#### Verify Import
-
-After importing, verify with `terraform plan`:
-
-```bash
-terraform plan
-# Should show: Plan: 0 to add, 0 to change, 0 to destroy
-# (Minor "update in-place" changes are normal for state sync of computed attributes)
-```
-
-### Ansible Workflow
-
-1. **Bootstrap** — Update packages, install essentials (`apt_update.yml`)
-2. **Agents** — Deploy Alloy (log/metrics) and Node Exporter on all hosts
-3. **Services** — Configure databases, Docker, applications, observability
-4. **Verify** — Check service health and connectivity
-
-### Vault Management
-
-```bash
-# Edit secrets
-ansible-vault edit inventory/group_vars/all/vault.yml
-
-# View secrets
-ansible-vault view inventory/group_vars/all/vault.yml
-
-# Encrypt a new file
-ansible-vault encrypt new_secrets.yml
-```
-
---
-
-## S3 Storage Provisioning
-
-Terraform provisions Incus S3 buckets for services requiring object storage:
-
-| Service | Host | Purpose |
-|---------|------|---------|
-| **Casdoor** | Titania | User avatars and SSO resource storage |
-| **LobeChat** | Rosalind | File uploads and attachments |
-
-> S3 credentials (access key, secret key, endpoint) are stored as sensitive Terraform outputs and managed in Ansible Vault with the `vault_*_s3_*` prefix.
-
---
-
-## Ansible Automation
-
-### Full Deployment (`site.yml`)
-
-Playbooks run in dependency order:
-
-| Playbook | Hosts | Purpose |
-|----------|-------|---------|
-| `apt_update.yml` | All | Update packages and install essentials |
-| `alloy/deploy.yml` | All | Grafana Alloy log/metrics collection |
-| `prometheus/node_deploy.yml` | All | Node Exporter metrics |
-| `docker/deploy.yml` | Oberon, Ariel, Miranda, Puck, Rosalind, Sycorax, Caliban, Titania | Docker engine |
-| `smtp4dev/deploy.yml` | Oberon | SMTP test server |
-| `pplg/deploy.yml` | Prospero | Full observability stack + HAProxy + OAuth2-Proxy |
-| `postgresql/deploy.yml` | Portia | PostgreSQL with all databases |
-| `postgresql_ssl/deploy.yml` | Titania | Dedicated PostgreSQL for Casdoor |
-| `neo4j/deploy.yml` | Ariel, Umbriel | Neo4j graph database (Umbriel is the dedicated Mnemosyne instance) |
-| `searxng/deploy.yml` | Oberon | SearXNG privacy search |
-| `haproxy/deploy.yml` | Titania | HAProxy TLS termination and routing |
-| `casdoor/deploy.yml` | Titania | Casdoor SSO |
-| `mcpo/deploy.yml` | Miranda | MCPO MCP proxy |
-| `openwebui/deploy.yml` | Oberon | Open WebUI LLM interface |
-| `hass/deploy.yml` | Oberon | Home Assistant |
-| `gitea/deploy.yml` | Rosalind | Gitea self-hosted Git |
-| `nextcloud/deploy.yml` | Rosalind | Nextcloud collaboration |
-
-### Individual Service Deployments
-
-Services with standalone deploy playbooks (not in `site.yml`):
-
-| Playbook | Host | Service |
-|----------|------|---------|
-| `anythingllm/deploy.yml` | Rosalind | AnythingLLM document AI |
-| `arke/deploy.yml` | Sycorax | Arke LLM proxy |
-| `argos/deploy.yml` | Miranda | Argos MCP web search server |
-| `caliban/deploy.yml` | Caliban | Agent S MCP Server |
-| `certbot/deploy.yml` | Titania | Let's Encrypt certificate renewal |
-| `gitea_mcp/deploy.yml` | Miranda | Gitea MCP Server |
-| `gitea_runner/deploy.yml` | Puck | Gitea CI/CD runner |
-| `grafana_mcp/deploy.yml` | Miranda | Grafana MCP Server |
-| `jupyterlab/deploy.yml` | Puck | JupyterLab + OAuth2-Proxy |
-| `kernos/deploy.yml` | Caliban | Kernos MCP shell server |
-| `lobechat/deploy.yml` | Rosalind | LobeChat AI chat |
-| `rommie/deploy.yml` | Caliban | Rommie MCP server (Agent S GUI automation) |
-| `neo4j_mcp/deploy.yml` | Miranda | Neo4j MCP Server |
-| `freecad_mcp/deploy.yml` | Caliban | FreeCAD Robust MCP Server |
-| `rabbitmq/deploy.yml` | Oberon | RabbitMQ message queue |
-
-### Lifecycle Playbooks
-
-| Playbook | Purpose |
-|----------|---------|
-| `sandbox_up.yml` | Start all Uranian host containers |
-| `sandbox_down.yml` | Gracefully stop all containers |
-| `apt_update.yml` | Update packages on all hosts |
-| `site.yml` | Full deployment orchestration |
-
---
-
-## Data Flow Architecture
-
-### Observability Pipeline
-
-```
-All Hosts                      Prospero                         Alerts
-Alloy + Node Exporter     →   Prometheus + Loki + Grafana   →  AlertManager + Pushover
-collect metrics & logs         storage & visualisation           notifications
-```
-
-### Integration Points
-
-| Consumer | Provider | Connection |
-|----------|----------|-----------|
-| All LLM apps | Arke (Sycorax) | `http://sycorax.incus:25540` |
-| Open WebUI, Arke, Gitea, Nextcloud, LobeChat | PostgreSQL (Portia) | `portia.incus:5432` |
-| Neo4j MCP | Neo4j (Ariel) | `ariel.incus:7687` (Bolt) |
-| Mnemosyne | Neo4j (Umbriel) | `umbriel.incus:7687` (Bolt) — dedicated tenant |
-| MCP Switchboard | Docker API (Miranda) | `tcp://miranda.incus:2375` |
-| MCP Switchboard | RabbitMQ (Oberon) | `oberon.incus:5672` |
-| Kairos, Spelunker | RabbitMQ (Oberon) | `oberon.incus:5672` |
-| SMTP (all apps) | smtp4dev (Oberon) | `oberon.incus:22025` |
-| All hosts | Loki (Prospero) | `http://prospero.incus:3100` |
-| All hosts | Prometheus (Prospero) | `http://prospero.incus:9090` |
-
---
-
-## Important Notes
-
-⚠️ **Alloy Host Variables Required** — Every host with `alloy` in its `services` list must define `alloy_log_level` in `inventory/host_vars/<host>.incus.yml`. The playbook will fail with an undefined variable error if this is missing.
-
-⚠️ **Alloy Syslog Listeners Required for Docker Services** — Any Docker Compose service using the syslog logging driver must have a corresponding `loki.source.syslog` listener in the host's Alloy config template (`ansible/alloy/<hostname>/config.alloy.j2`). Missing listeners cause Docker containers to fail on start.
-
-⚠️ **Local Terraform State** — This project uses local Terraform state (no remote backend). Do not run `terraform apply` from multiple machines simultaneously.
-
-⚠️ **Nested Docker** — Docker runs inside Incus containers (nested), requiring `security.nesting = true` and `lxc.apparmor.profile=unconfined` AppArmor override on all Docker-enabled hosts.
-
-⚠️ **Deployment Order** — Prospero (observability) must be fully deployed before other hosts, as Alloy on every host pushes logs and metrics to `prospero.incus`. Run `pplg/deploy.yml` before `site.yml` on a fresh environment.