Fix bearer forwarding across anyio TaskGroup boundary

The Mnemosyne Authorization: Bearer token was being dropped on outbound MCP calls because fast-agent runs downstream transports inside a long-lived anyio TaskGroup whose context is snapshotted at manager startup; request_bearer_token.get() inside _prepare_headers_and_auth therefore always resolved to None even when the request handler had just set it.

Fix:

* pallas/_fastagent_patch.py
  - add _pending_bearers registry keyed by id(server_config) with a threading.Lock; publish_bearer / revoke_bearer helpers.
  - patched _prepare_headers_and_auth reads the registry first, falls back to the ContextVar for non-persistent probe paths.
  - emit INFO log on install() so the journal shows the patch ran; verbose flow logs at DEBUG on pallas.forward.

* pallas/multimodal_server.py
  - send_message resolves the agent's opted-in downstreams, publishes the inbound bearer for each, and revokes them all in the finally.
  - bearer/header diagnostics go to pallas.auth (DEBUG) instead of /tmp/pallas-bearer.log, which is invisible under systemd PrivateTmp.

* pallas/log.py
  - honour PALLAS_LOG_LEVEL env var (default INFO) so operators can flip the forward/auth diagnostics on without a code change.

* docs/pallas.md, docs/mnemosyne_integration.md
  - document the registry-based forwarding and the task-group ContextVar constraint that forced it.
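
The failure mode described in this message can be reproduced in a few lines. The sketch below is an illustration only: it uses stdlib asyncio instead of anyio, and `transport_worker`, `handle_request`, and the queues are hypothetical stand-ins for fast-agent's transport task and the Pallas request handler. It relies on the fact that a task copies its `contextvars.Context` when it is created, so a worker spawned at startup never observes a value that a later request handler sets.

```python
import asyncio
import contextvars

# Same name as the ContextVar in the commit; everything else is hypothetical.
request_bearer_token = contextvars.ContextVar("request_bearer_token", default=None)


async def transport_worker(jobs, results):
    # Long-lived task: its context was copied when the task was created,
    # before any request had set the token.
    while True:
        await jobs.get()
        await results.put(request_bearer_token.get())


async def handle_request(jobs, results, token):
    # The set() only affects this handler task's own context copy.
    request_bearer_token.set(token)
    await jobs.put("outbound-call")
    return await results.get()


async def main():
    jobs, results = asyncio.Queue(), asyncio.Queue()
    # Spawned at "manager startup", i.e. before the request arrives.
    worker = asyncio.create_task(transport_worker(jobs, results))
    try:
        return await asyncio.create_task(handle_request(jobs, results, "tok-123"))
    finally:
        worker.cancel()


def run_demo():
    # Returns what the worker saw for the token during the request.
    return asyncio.run(main())
```

In this sketch the worker reports `None` for the token even though the handler set it immediately before enqueueing the call, which mirrors the symptom seen in `_prepare_headers_and_auth`.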
2026-05-05 12:09:51 -04:00
parent 24c7374f3d
commit 679a809f66
5 changed files with 220 additions and 48 deletions
--- a/docs/mnemosyne_integration.md
+++ b/docs/mnemosyne_integration.md
@@ -74,11 +74,14 @@ async def _shawn():

 2. Daedalus calls Pallas's `send_message` tool with `Authorization: Bearer <token>` in the HTTP request headers.

-3. Pallas's `MultimodalAgentMCPServer` captures the token via FastMCP's `get_access_token()` into the `request_bearer_token` context variable (see `pallas/multimodal_server.py`).
+3. Pallas's `MultimodalAgentMCPServer` captures the token by reading the request's `Authorization` header directly through `fastmcp.server.dependencies.get_http_request()` — `get_access_token()` returns `None` because Pallas runs without the FastMCP auth middleware. The token is pushed into the `request_bearer_token` ContextVar (for LLM-provider passthrough) and **also** registered in a per-request bearer registry keyed by each opted-in downstream's `MCPServerSettings` object.

-4. The fast-agent patch in `pallas/_fastagent_patch.py` (installed at import time in `pallas/__init__.py`) wraps `_prepare_headers_and_auth`. When a server config has `forward_inbound_auth: true`, the patch reads `request_bearer_token.get()` and injects `Authorization: Bearer <token>` into the outgoing HTTP headers for that MCP call.
+4. The fast-agent patch in `pallas/_fastagent_patch.py` (installed at import time in `pallas/__init__.py`) wraps `_prepare_headers_and_auth`. When a server config has `forward_inbound_auth: true`, the patch reads the bearer out of the per-request registry (with the ContextVar as a fallback) and injects `Authorization: Bearer <token>` into the outgoing HTTP headers for that MCP call. The registry is required because fast-agent's `MCPConnectionManager` runs the transport in its own anyio `TaskGroup`, which does not inherit the request handler's `contextvars.Context`.
+
+5. The request handler's `finally` clause revokes every bearer it published, so per-request tokens never outlive the call and no stale credentials can be reused.
+
+6. Mnemosyne receives the same token, validates the HMAC signature against its `MCPSigningKey` table, and scopes all search Cypher queries to `ws` from the claims.

-5. Mnemosyne receives the same token, validates the HMAC signature against its `MCPSigningKey` table, and scopes all search Cypher queries to `ws` from the claims.

 The `forward_inbound_auth` flag is **per-server** — other servers in the same agent (`argos`, `neo4j_cypher`, `time`, etc.) never receive the bearer.

--- a/docs/pallas.md
+++ b/docs/pallas.md
@@ -417,10 +417,19 @@ For agents with `instance_scope != "request"`, a `{agent}_history` prompt is reg

 ### Bearer Token Propagation

-The server captures the authenticated bearer token from the incoming MCP request into the `request_bearer_token` context variable. Two consumers read it:
+The server captures the authenticated bearer token from the incoming MCP request's `Authorization: Bearer …` header via `fastmcp.server.dependencies.get_http_request()` (FastMCP's `get_access_token()` returns `None` because Pallas runs without the auth middleware). Two consumers read it:
+
+- **LLM-provider passthrough** — the token is also pushed into the `request_bearer_token` ContextVar for the agent's LLM provider key manager to pick up automatically (used by HuggingFace and any other token-passthrough providers). The ContextVar works here because the LLM call runs in a child task of the request handler.
+- **Downstream MCP servers (opt-in)** — outgoing MCP calls inherit the same bearer when the downstream server is marked `forward_inbound_auth: true` in `fastagent.config.yaml`. Without that flag, the inbound bearer is **not** forwarded to MCP transport calls — `server_config.headers` is the only header source.
+
+The forwarding is per-server so a FastAgent attached to both a credentialed downstream (e.g. Mnemosyne) and an unrelated public server doesn't leak the bearer to the latter.
+
+#### Why a simple ContextVar forward isn't enough
+
+fast-agent's `MCPConnectionManager` runs each downstream transport inside a long-lived `anyio.TaskGroup` created at manager startup. `TaskGroup.start_soon` snapshots the owner's `contextvars.Context` at spawn time — the request-handler's context is invisible to the transport task. A straight `request_bearer_token.get()` inside `_prepare_headers_and_auth` therefore always resolves to `None` even when the inbound handler has `set` the token a few frames up. The persistent connection is additionally reused across requests, so the first-call context (often empty) would be cached forever.
+
+Pallas works around this in `pallas._fastagent_patch` by maintaining a process-wide `_pending_bearers` registry keyed by `id(server_config)`. `multimodal_server.send_message` calls `publish_bearer(cfg, token)` for every opted-in downstream the agent is allowed to reach; the patched `_prepare_headers_and_auth` looks it up there (with the ContextVar as a fallback for non-persistent probe paths); and the request handler's `finally` block calls `revoke_bearer(cfg)` to clear the entry. Per-request bearers therefore survive the task-group boundary without any mutation of shared config.

- **LLM-provider passthrough** — the agent's LLM provider key manager picks it up automatically (used by HuggingFace and any other token-passthrough providers).
- **Downstream MCP servers (opt-in)** — outgoing MCP calls inherit the same bearer when the downstream server is marked `forward_inbound_auth: true` in `fastagent.config.yaml`. Without that flag, `request_bearer_token` is **not** forwarded to MCP transport calls — `server_config.headers` is the only header source. This is implemented as a fast-agent monkey-patch in `pallas._fastagent_patch` and is per-server so a FastAgent attached to both a credentialed downstream (e.g. Mnemosyne) and an unrelated public server doesn't leak the bearer to the latter.

 Example: