Fix bearer forwarding across anyio TaskGroup boundary

The Mnemosyne Authorization: Bearer token was being dropped on outbound MCP calls because fast-agent runs downstream transports inside a long-lived anyio TaskGroup whose context is snapshotted at manager startup — request_bearer_token.get() inside _prepare_headers_and_auth therefore always resolved to None even when the request handler had just set it. Fix: * pallas/_fastagent_patch.py - add _pending_bearers registry keyed by id(server_config) with a threading.Lock; publish_bearer / revoke_bearer helpers. - patched _prepare_headers_and_auth reads the registry first, falls back to the ContextVar for non-persistent probe paths. - emit INFO log on install() so the journal shows the patch ran; verbose flow logs at DEBUG on pallas.forward. * pallas/multimodal_server.py - send_message resolves the agent's opted-in downstreams, publishes the inbound bearer for each, and revokes them all in the finally. - bearer/header diagnostics go to pallas.auth (DEBUG) instead of /tmp/pallas-bearer.log which is invisible under systemd PrivateTmp. * pallas/log.py - honour PALLAS_LOG_LEVEL env var (default INFO) so operators can flip the forward/auth diagnostics on without a code change. * docs/pallas.md, docs/mnemosyne_integration.md - document the registry-based forwarding and the task-group ContextVar constraint that forced it.
2026-05-05 12:09:51 -04:00
parent 24c7374f3d
commit 679a809f66
5 changed files with 220 additions and 48 deletions
--- a/docs/mnemosyne_integration.md
+++ b/docs/mnemosyne_integration.md
@@ -74,11 +74,14 @@ async def _shawn():

 2. Daedalus calls Pallas's `send_message` tool with `Authorization: Bearer <token>` in the HTTP request headers.

-3. Pallas's `MultimodalAgentMCPServer` captures the token via FastMCP's `get_access_token()` into the `request_bearer_token` context variable (see `pallas/multimodal_server.py`).
+3. Pallas's `MultimodalAgentMCPServer` captures the token by reading the request's `Authorization` header directly through `fastmcp.server.dependencies.get_http_request()` — `get_access_token()` returns `None` because Pallas runs without the FastMCP auth middleware. The token is pushed into the `request_bearer_token` ContextVar (for LLM-provider passthrough) and **also** registered in a per-request bearer registry keyed by each opted-in downstream's `MCPServerSettings` object.

-4. The fast-agent patch in `pallas/_fastagent_patch.py` (installed at import time in `pallas/__init__.py`) wraps `_prepare_headers_and_auth`. When a server config has `forward_inbound_auth: true`, the patch reads `request_bearer_token.get()` and injects `Authorization: Bearer <token>` into the outgoing HTTP headers for that MCP call.
+4. The fast-agent patch in `pallas/_fastagent_patch.py` (installed at import time in `pallas/__init__.py`) wraps `_prepare_headers_and_auth`. When a server config has `forward_inbound_auth: true`, the patch reads the bearer out of the per-request registry (with the ContextVar as a fallback) and injects `Authorization: Bearer <token>` into the outgoing HTTP headers for that MCP call. The registry is required because fast-agent's `MCPConnectionManager` runs the transport in its own anyio `TaskGroup`, which does not inherit the request handler's `contextvars.Context`.
+
+5. The request handler's `finally` clause revokes every bearer it published, so per-request tokens never outlive the call and no stale credentials can be reused.
+
+6. Mnemosyne receives the same token, validates the HMAC signature against its `MCPSigningKey` table, and scopes all search Cypher queries to `ws` from the claims.

-5. Mnemosyne receives the same token, validates the HMAC signature against its `MCPSigningKey` table, and scopes all search Cypher queries to `ws` from the claims.

 The `forward_inbound_auth` flag is **per-server** — other servers in the same agent (`argos`, `neo4j_cypher`, `time`, etc.) never receive the bearer.