Fix bearer forwarding across anyio TaskGroup boundary
The Mnemosyne Authorization: Bearer token was being dropped on outbound MCP
calls because fast-agent runs downstream transports inside a long-lived
anyio TaskGroup whose context is snapshotted at manager startup —
request_bearer_token.get() inside _prepare_headers_and_auth therefore
always resolved to None even when the request handler had just set it.
Fix:
* pallas/_fastagent_patch.py
- add _pending_bearers registry keyed by id(server_config) with a
threading.Lock; publish_bearer / revoke_bearer helpers.
- patched _prepare_headers_and_auth reads the registry first, falls
back to the ContextVar for non-persistent probe paths.
- emit INFO log on install() so the journal shows the patch ran;
verbose flow logs at DEBUG on pallas.forward.
* pallas/multimodal_server.py
- send_message resolves the agent's opted-in downstreams, publishes
the inbound bearer for each, and revokes them all in the finally.
- bearer/header diagnostics go to pallas.auth (DEBUG) instead of
/tmp/pallas-bearer.log which is invisible under systemd PrivateTmp.
* pallas/log.py
- honour PALLAS_LOG_LEVEL env var (default INFO) so operators can
flip the forward/auth diagnostics on without a code change.
* docs/pallas.md, docs/mnemosyne_integration.md
- document the registry-based forwarding and the task-group
ContextVar constraint that forced it.
This commit is contained in:
@@ -74,11 +74,14 @@ async def _shawn():
|
||||
|
||||
2. Daedalus calls Pallas's `send_message` tool with `Authorization: Bearer <token>` in the HTTP request headers.
|
||||
|
||||
3. Pallas's `MultimodalAgentMCPServer` captures the token via FastMCP's `get_access_token()` into the `request_bearer_token` context variable (see `pallas/multimodal_server.py`).
|
||||
3. Pallas's `MultimodalAgentMCPServer` captures the token by reading the request's `Authorization` header directly through `fastmcp.server.dependencies.get_http_request()` — `get_access_token()` returns `None` because Pallas runs without the FastMCP auth middleware. The token is pushed into the `request_bearer_token` ContextVar (for LLM-provider passthrough) and **also** registered in a per-request bearer registry keyed by each opted-in downstream's `MCPServerSettings` object.
|
||||
|
||||
4. The fast-agent patch in `pallas/_fastagent_patch.py` (installed at import time in `pallas/__init__.py`) wraps `_prepare_headers_and_auth`. When a server config has `forward_inbound_auth: true`, the patch reads `request_bearer_token.get()` and injects `Authorization: Bearer <token>` into the outgoing HTTP headers for that MCP call.
|
||||
4. The fast-agent patch in `pallas/_fastagent_patch.py` (installed at import time in `pallas/__init__.py`) wraps `_prepare_headers_and_auth`. When a server config has `forward_inbound_auth: true`, the patch reads the bearer out of the per-request registry (with the ContextVar as a fallback) and injects `Authorization: Bearer <token>` into the outgoing HTTP headers for that MCP call. The registry is required because fast-agent's `MCPConnectionManager` runs the transport in its own anyio `TaskGroup`, which does not inherit the request handler's `contextvars.Context`.
|
||||
|
||||
5. The request handler's `finally` clause revokes every bearer it published, so per-request tokens never outlive the call and no stale credentials can be reused.
|
||||
|
||||
6. Mnemosyne receives the same token, validates the HMAC signature against its `MCPSigningKey` table, and scopes all search Cypher queries to `ws` from the claims.
|
||||
|
||||
5. Mnemosyne receives the same token, validates the HMAC signature against its `MCPSigningKey` table, and scopes all search Cypher queries to `ws` from the claims.
|
||||
|
||||
The `forward_inbound_auth` flag is **per-server** — other servers in the same agent (`argos`, `neo4j_cypher`, `time`, etc.) never receive the bearer.
|
||||
|
||||
|
||||
@@ -417,10 +417,19 @@ For agents with `instance_scope != "request"`, a `{agent}_history` prompt is reg
|
||||
|
||||
### Bearer Token Propagation
|
||||
|
||||
The server captures the authenticated bearer token from the incoming MCP request into the `request_bearer_token` context variable. Two consumers read it:
|
||||
The server captures the authenticated bearer token from the incoming MCP request's `Authorization: Bearer …` header via `fastmcp.server.dependencies.get_http_request()` (FastMCP's `get_access_token()` returns `None` because Pallas runs without the auth middleware). Two consumers read it:
|
||||
|
||||
- **LLM-provider passthrough** — the token is also pushed into the `request_bearer_token` ContextVar for the agent's LLM provider key manager to pick up automatically (used by HuggingFace and any other token-passthrough providers). The ContextVar works here because the LLM call runs in a child task of the request handler.
|
||||
- **Downstream MCP servers (opt-in)** — outgoing MCP calls inherit the same bearer when the downstream server is marked `forward_inbound_auth: true` in `fastagent.config.yaml`. Without that flag, the inbound bearer is **not** forwarded to MCP transport calls — `server_config.headers` is the only header source.
|
||||
|
||||
The forwarding is per-server so a FastAgent attached to both a credentialed downstream (e.g. Mnemosyne) and an unrelated public server doesn't leak the bearer to the latter.
|
||||
|
||||
#### Why a simple ContextVar forward isn't enough
|
||||
|
||||
fast-agent's `MCPConnectionManager` runs each downstream transport inside a long-lived `anyio.TaskGroup` created at manager startup. `TaskGroup.start_soon` snapshots the owner's `contextvars.Context` at spawn time — the request-handler's context is invisible to the transport task. A straight `request_bearer_token.get()` inside `_prepare_headers_and_auth` therefore always resolves to `None` even when the inbound handler has `set` the token a few frames up. The persistent connection is additionally reused across requests, so the first-call context (often empty) would be cached forever.
|
||||
|
||||
Pallas works around this in `pallas._fastagent_patch` by maintaining a process-wide `_pending_bearers` registry keyed by `id(server_config)`. `multimodal_server.send_message` calls `publish_bearer(cfg, token)` for every opted-in downstream the agent is allowed to reach; the patched `_prepare_headers_and_auth` looks it up there (with the ContextVar as a fallback for non-persistent probe paths); and the request handler's `finally` block calls `revoke_bearer(cfg)` to clear the entry. Per-request bearers therefore survive the task-group boundary without any mutation of shared config.
|
||||
|
||||
- **LLM-provider passthrough** — the agent's LLM provider key manager picks it up automatically (used by HuggingFace and any other token-passthrough providers).
|
||||
- **Downstream MCP servers (opt-in)** — outgoing MCP calls inherit the same bearer when the downstream server is marked `forward_inbound_auth: true` in `fastagent.config.yaml`. Without that flag, `request_bearer_token` is **not** forwarded to MCP transport calls — `server_config.headers` is the only header source. This is implemented as a fast-agent monkey-patch in `pallas._fastagent_patch` and is per-server so a FastAgent attached to both a credentialed downstream (e.g. Mnemosyne) and an unrelated public server doesn't leak the bearer to the latter.
|
||||
|
||||
Example:
|
||||
|
||||
|
||||
Reference in New Issue
Block a user