fast-agent's MCPAgentClientSession.send_request catches every downstream
transport exception, logs the one-line 'send_request failed: <str(e)>'
WITHOUT exc_info=True, then re-raises. The exception then propagates
up to the agent loop where its message is serialised as the tool result
string ('object NoneType can't be used in an await expression' being
the canonical symptom) and the traceback is lost forever.

Wrap send_request so Pallas emits logger.exception() with the full
stack against the 'pallas.forward.trace' logger before re-raising.
No behavioural change — we re-raise the same exception; we just get
one extra log record with the frames attached, which pallas.log now
preserves thanks to the _JSONFormatter traceback field.

This will surface the real origin of the NoneType-await that's
currently being served as Harper's mnemosyne tool result even though
Mnemosyne itself returns 200 OK.
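
A minimal sketch of the wrapper described above. FakeSession is a
hypothetical stand-in for MCPAgentClientSession, and wrap_send_request is
an illustrative helper name, not the actual patch code:

```python
import functools
import logging

trace_log = logging.getLogger("pallas.forward.trace")


def wrap_send_request(session_cls):
    """Patch session_cls.send_request to log the full stack, then re-raise."""
    original = session_cls.send_request
    if getattr(original, "_pallas_wrapped", False):
        return  # already patched; keep the wrap idempotent

    @functools.wraps(original)
    async def send_request(self, *args, **kwargs):
        try:
            return await original(self, *args, **kwargs)
        except Exception:
            # logger.exception() attaches exc_info, so the frames survive.
            trace_log.exception("send_request raised")
            raise  # same exception object, behaviour unchanged

    send_request._pallas_wrapped = True
    session_cls.send_request = send_request


class FakeSession:
    """Hypothetical stand-in for the real session class."""

    async def send_request(self, request):
        raise TypeError("simulated downstream failure")
```

The bare raise keeps the original exception and traceback intact, so
callers see exactly what they saw before the wrap.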
The existing setup only attached the file/stderr handlers to the
'pallas' namespace, so every record emitted by fast-agent, fastmcp,
the MCP SDK, Anthropic, uvicorn etc. disappeared into Rich's progress
display and never hit pallas.log. When one of those libraries raised
and logged 'something failed' via logger.error(..., exc_info=True),
we ended up grepping a Rich-overwritten TTY for a traceback that was
already long gone -- exactly the situation blocking the current
Mnemosyne debug.

This patch:
* Extends _JSONFormatter to serialise exc_info/stack_info as a
  'traceback' field when present, so Loki/grep sees the full stack.
* Attaches the same file+stderr handlers to the *root* logger so
  every library's records (including any tracebacks logged with
  exc_info=True) land in pallas.log with the stack attached.
* Keeps the 'pallas' logger's own handlers (propagate=False) so our
  records are unaffected by any later root-handler manipulation.
* Tags our handlers with _pallas_attached so repeated setup_logging()
  calls are idempotent -- important because uvicorn workers and
  fast-agent subagent subprocesses each reinitialise logging.

httpx/httpcore stay at WARNING so we don't flood the log with
per-request body traces on a DEBUG deployment. Demote third-party
namespaces further in a follow-up if needed.
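
For illustration, a sketch of the two mechanisms in this patch: a JSON
formatter that keeps the stack in a 'traceback' field, and a tagged,
idempotent attach. The class and function names and the payload keys
other than 'traceback' are assumptions, not the real pallas.log code:

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    """Sketch of a JSON formatter that keeps the stack as 'traceback'."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        parts = []
        if record.exc_info:
            parts.append(self.formatException(record.exc_info))
        if record.stack_info:
            parts.append(self.formatStack(record.stack_info))
        if parts:
            payload["traceback"] = "\n".join(parts)
        return json.dumps(payload)


def attach_once(logger, handler):
    """Attach handler unless a previously tagged one is already present."""
    if any(getattr(h, "_pallas_attached", False) for h in logger.handlers):
        return False
    handler._pallas_attached = True
    handler.setFormatter(JSONFormatter())
    logger.addHandler(handler)
    return True
```

Calling attach_once(logging.getLogger(), handler) a second time is a
no-op, which is the property repeated setup_logging() calls rely on.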
The async_auth_flow override was being driven via 'await' in httpx's
async dispatcher, which yielded 'NoneType is not awaitable' because a
plain generator yielding a Request doesn't produce an awaitable.
httpx.Auth has three hooks: sync_auth_flow, async_auth_flow, and the
generic auth_flow. The default sync/async implementations delegate to
auth_flow when subclasses override only that one, which is exactly the
behaviour we want: one plain-generator implementation shared across
sync and async clients. Override auth_flow, drop sync/async overrides.
The previous static-header approach only ran at handshake time, and
persistent MCP connections reuse the open socket for every subsequent
tools/call. The first startup probe had no bearer, so every later
tool call inherited an empty Authorization header — Mnemosyne saw
no credentials and returned 'Authentication required'.
Fix: swap the static header for a _DynamicBearerAuth(httpx.Auth) that
httpx consults per-request via async_auth_flow. We look up the current
_pending_bearers entry for this server_config and stamp Authorization
on each outgoing request individually — no stale caching, no
handshake/tool-call skew.
Verified chain now runs:
  bearer.captured   (inbound)
  forward.published (registry key)
  forward.bound     (auth object installed at connect time)
  forward.applied   (stamped per request via async_auth_flow)

Root-cause: fast-agent's Settings(**merged_settings) validation pipeline
silently drops unknown keys on nested MCPServerSettings instances — even
after flipping extra='allow' and calling model_rebuild(force=True). The
culprit is Settings(nested_model_default_partial_update=True), which takes
a model_construct path that discards model_extra on the nested model.
Verified live: MCPServerSettings.model_validate({'forward_inbound_auth': True})
preserves the field (model_extra={'forward_inbound_auth': True}), but
get_settings().mcp.servers['mnemosyne'] returns an instance where the
attribute is MISSING and model_extra is None.
Fix: parse fastagent.config.yaml ourselves at patch-install time and
record the set of opted-in server names in _FORWARD_SERVERS. The patch
and multimodal_server's forwardable-config resolver both key off the
server name — stable, authoritative, and completely sidesteps Pydantic's
extras handling.
fast-agent's progress_display installs a Rich Live renderer on stdout/stderr;
plain StreamHandler records get swallowed mid-render, making the bearer-
forwarding DEBUG logs invisible on the console.
Route every pallas.* record to two sinks:
1. ~/.local/state/pallas/pallas.log (rotating, 10 MiB x 5): durable
   capture regardless of who owns the TTY. Overridable via
   PALLAS_LOG_FILE.
2. sys.__stderr__: the original stderr FD captured before Rich could
   grab it, so records still reach the TTY / journal when DEBUG is on.

Avoids /tmp deliberately: systemd PrivateTmp=yes made /tmp/pallas-bearer.log
invisible during the original debug saga.
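
The two sinks can be sketched with the standard library as follows.
build_handlers and DEFAULT_LOG_FILE are illustrative names; the real
setup code may differ:

```python
import logging
import os
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path

DEFAULT_LOG_FILE = Path.home() / ".local" / "state" / "pallas" / "pallas.log"


def build_handlers(environ=None):
    """Return [file_handler, stderr_handler] for the 'pallas' logger."""
    environ = os.environ if environ is None else environ
    log_path = Path(environ.get("PALLAS_LOG_FILE") or DEFAULT_LOG_FILE)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # 10 MiB per file, five rotated backups.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    # sys.__stderr__ is the interpreter's original stderr object, untouched
    # by anything that later rebinds sys.stderr.
    stderr_handler = logging.StreamHandler(sys.__stderr__)
    return [file_handler, stderr_handler]
```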
The Mnemosyne Authorization: Bearer token was being dropped on outbound MCP
calls because fast-agent runs downstream transports inside a long-lived
anyio TaskGroup whose context is snapshotted at manager startup —
request_bearer_token.get() inside _prepare_headers_and_auth therefore
always resolved to None even when the request handler had just set it.
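
The snapshot behaviour can be reproduced with plain asyncio, used here as
a stand-in for the anyio TaskGroup. The worker and queue are hypothetical;
only the ContextVar name comes from the code above:

```python
import asyncio
import contextvars

request_bearer_token = contextvars.ContextVar("request_bearer_token", default=None)


async def transport_worker(queue, seen):
    # Long-lived task: its context was copied when the task was created.
    await queue.get()
    seen.append(request_bearer_token.get())


async def main():
    queue = asyncio.Queue()
    seen = []
    # "Manager startup": the worker's context is snapshotted here.
    worker = asyncio.create_task(transport_worker(queue, seen))
    # "Request handler": sets the token later, in its own context.
    request_bearer_token.set("secret")
    await queue.put("tools/call")
    await worker
    return seen, request_bearer_token.get()


# asyncio.run(main()) returns ([None], "secret"): the worker never sees
# the value set after it was started.
```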
Fix:
* pallas/_fastagent_patch.py
  - add _pending_bearers registry keyed by id(server_config) with a
    threading.Lock; publish_bearer / revoke_bearer helpers.
  - patched _prepare_headers_and_auth reads the registry first, falls
    back to the ContextVar for non-persistent probe paths.
  - emit INFO log on install() so the journal shows the patch ran;
    verbose flow logs at DEBUG on pallas.forward.
* pallas/multimodal_server.py
  - send_message resolves the agent's opted-in downstreams, publishes
    the inbound bearer for each, and revokes them all in the finally.
  - bearer/header diagnostics go to pallas.auth (DEBUG) instead of
    /tmp/pallas-bearer.log which is invisible under systemd PrivateTmp.
* pallas/log.py
  - honour PALLAS_LOG_LEVEL env var (default INFO) so operators can
    flip the forward/auth diagnostics on without a code change.
* docs/pallas.md, docs/mnemosyne_integration.md
  - document the registry-based forwarding and the task-group
    ContextVar constraint that forced it.
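
A minimal sketch of the registry, assuming a hypothetical resolve_bearer
helper in place of the patched _prepare_headers_and_auth:

```python
import contextvars
import threading

request_bearer_token = contextvars.ContextVar("request_bearer_token", default=None)

_pending_bearers = {}
_lock = threading.Lock()


def publish_bearer(server_config, token):
    """Make token visible to transports bound to this server_config."""
    with _lock:
        # id() keys are only meaningful while server_config stays alive.
        _pending_bearers[id(server_config)] = token


def revoke_bearer(server_config):
    with _lock:
        _pending_bearers.pop(id(server_config), None)


def resolve_bearer(server_config):
    """Registry first, then the ContextVar for non-persistent probe paths."""
    with _lock:
        token = _pending_bearers.get(id(server_config))
    return token if token is not None else request_bearer_token.get()
```

A caller in the shape of send_message would publish before the agent call
and revoke in a finally block, so a token never outlives its request.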
Replace stdlib logger calls for inbound bearer token capture and forward
decisions with a `_diag_write` helper that appends to
`/tmp/pallas-bearer.log`. This ensures diagnostic output is reliably
captured regardless of logger configuration, while swallowing any write
errors to avoid impacting request handling.
Add info-level logging to trace bearer token capture and forwarding
through fastagent, including token length/prefix and reasons for
skipping forward (existing user auth, oauth, or missing inbound token).
Also log warnings on bearer extraction errors instead of silently
swallowing exceptions.
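
For illustration, the decision logging might look like the sketch below.
The function names, parameter names, and the order in which the skip
reasons are checked are assumptions; only the three reasons and the
length/prefix detail come from the description above:

```python
import logging

log = logging.getLogger("pallas.forward")


def describe_token(token, prefix_len=6):
    """Summarise a token for logs without printing it in full."""
    if not token:
        return "absent"
    return f"len={len(token)} prefix={token[:prefix_len]}..."


def skip_reason(inbound_token, has_user_auth, uses_oauth):
    """Return why forwarding is skipped, or None when it should proceed."""
    if has_user_auth:
        return "existing user auth"
    if uses_oauth:
        return "oauth"
    if not inbound_token:
        return "missing inbound token"
    return None


def log_forward_decision(server_name, inbound_token, has_user_auth=False, uses_oauth=False):
    reason = skip_reason(inbound_token, has_user_auth, uses_oauth)
    if reason:
        log.info("forward skipped for %s: %s", server_name, reason)
    else:
        log.info("forwarding bearer to %s (%s)", server_name, describe_token(inbound_token))
    return reason is None
```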
get_access_token() requires FastMCP auth middleware to populate
AuthenticatedUser in the request scope — Pallas runs without auth
middleware so it always returned None. Read the Authorization header
directly from the ASGI request instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
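
Reading the Authorization header straight from the ASGI scope, as the
commit above describes, can be sketched like this. bearer_from_scope is a
hypothetical helper name:

```python
def bearer_from_scope(scope):
    """Extract a bearer token from an ASGI scope, or return None."""
    # ASGI delivers headers as an iterable of (name, value) byte pairs.
    for name, value in scope.get("headers", []):
        if name.lower() == b"authorization":
            scheme, _, token = value.decode("latin-1").partition(" ")
            token = token.strip()
            if scheme.lower() == "bearer" and token:
                return token
    return None
```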
Introduce per-server `forward_inbound_auth` flag that controls whether the
inbound MCP bearer token is propagated to outbound MCP transport calls.
Implemented as a fast-agent monkey-patch auto-installed on package import,
preventing accidental credential leakage to unrelated downstream servers.
Update docs to describe the two bearer token consumers (LLM provider
passthrough and opt-in downstream MCP forwarding) with a config example.

Make Pallas truly stateless per the 'Pallas is ephemeral' contract.
BREAKING (behavioural, not API):
* instance_scope changes from 'shared' to 'request' in pallas.server.
  Each MCP tools/call now acquires a freshly created fast-agent instance
  via the existing create_instance / dispose_instance factories and
  disposes it immediately after the response.

With 'shared' mode:
* Every MCP caller saw the same agent.message_history, so different
  Daedalus conversations leaked into each other.
* Mid-chat context was silently truncated once the model window filled.
* Restarting the Pallas process wiped all in-flight conversation state,
  even though Daedalus had it persisted in Postgres.
With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.
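
A rough sketch of the per-request lifecycle. The factory signatures are
assumptions (the real create_instance / dispose_instance may take other
arguments), and FakeInstance is a hypothetical stand-in for a fast-agent
instance:

```python
import contextlib


@contextlib.asynccontextmanager
async def request_scoped(create_instance, dispose_instance):
    """Yield a fresh instance and always dispose it afterwards."""
    instance = await create_instance()
    try:
        yield instance
    finally:
        await dispose_instance(instance)


class FakeInstance:
    def __init__(self):
        self.message_history = []


created, disposed = [], []


async def fake_create():
    instance = FakeInstance()
    created.append(instance)
    return instance


async def fake_dispose(instance):
    disposed.append(instance)


async def handle_call(text):
    async with request_scoped(fake_create, fake_dispose) as instance:
        instance.message_history.append(text)
        return len(instance.message_history)
```

Each handle_call starts from an empty message_history, which is the
isolation that 'shared' mode lacked.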
send_message gains two optional arguments:
* history: list[{role, content, images?}] in chronological order,
  converted to PromptMessageExtended and seeded onto the fresh
  instance's message_history before agent.send().
* conversation_id: opaque string, logged for trace correlation only;
  Pallas never interprets or persists it.
Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.
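
The skip-with-warning behaviour might be sketched as below. The accepted
roles, the exact checks, and the function names are assumptions, and the
conversion to PromptMessageExtended is left out:

```python
import logging

log = logging.getLogger("pallas.history")

VALID_ROLES = {"user", "assistant"}  # assumed set of accepted roles


def _problem_with(entry):
    """Return a description of what is wrong with entry, or None."""
    if not isinstance(entry, dict):
        return "not a mapping"
    if entry.get("role") not in VALID_ROLES:
        return f"bad role {entry.get('role')!r}"
    if not isinstance(entry.get("content"), str):
        return "content is not a string"
    for image in entry.get("images") or []:
        if not isinstance(image, dict) or not image.get("data") or not image.get("mime_type"):
            return "image missing data or mime_type"
    return None


def clean_history(history):
    """Keep well-formed entries in order; warn about and skip the rest."""
    cleaned = []
    for index, entry in enumerate(history or []):
        problem = _problem_with(entry)
        if problem:
            log.warning("skipping history[%d]: %s", index, problem)
            continue
        cleaned.append(entry)
    return cleaned
```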
The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.
Version bumped to 0.2.0.

Extend `_HealthAccessFilter` to also drop uvicorn access log lines for
successful `POST /mcp` requests, in addition to the existing
`/live`, `/ready`, and `/metrics` health probes.
**Why:** Every Daedalus health poll and tool call hits the single `/mcp`
route. Pallas already emits structured `mcp_request_start` /
`mcp_request_complete` logs at the agent layer, making the uvicorn
access line pure duplication and noise in syslog.
**How:**
- Replace the simple substring list `_HEALTH_PATHS` with compiled regex
  patterns (`_HEALTH_PATH_RE`, `_MCP_RE`) for more precise path matching
- Add `_SUCCESS_STATUS_RE` to only suppress 1xx/2xx/3xx responses;
  non-successful responses (4xx, 5xx) still pass through as real signals
- Update docstring to document the new suppression rules clearly
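
A sketch of a filter in this shape. The regexes are illustrative guesses
at the patterns, written against uvicorn's access message layout
(`client - "METHOD path HTTP/x" status`), and applying the status check
to the health probes as well is an assumption:

```python
import logging
import re

_HEALTH_PATH_RE = re.compile(r'"[A-Z]+ /(?:live|ready|metrics)/?(?:\?\S*)? HTTP/')
_MCP_RE = re.compile(r'"POST /mcp/?(?:\?\S*)? HTTP/')
_SUCCESS_STATUS_RE = re.compile(r'" [123]\d\d\b')


class HealthAccessFilter(logging.Filter):
    """Drop successful health-probe and POST /mcp access lines."""

    def filter(self, record):
        message = record.getMessage()
        if not _SUCCESS_STATUS_RE.search(message):
            return True  # 4xx/5xx always pass through
        if _HEALTH_PATH_RE.search(message) or _MCP_RE.search(message):
            return False
        return True
```

Such a filter would typically be added to the `uvicorn.access` logger
with `addFilter()`.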
Swap out the standard `MCPToolProgressManager` from fast-agent with
the local `EnrichedMCPToolProgressManager` from `pallas.progress` to
provide richer progress reporting during tool execution in the
multimodal MCP server.

Add optional `model` and `model_capabilities` fields to agent definitions
in agents.yaml, allowing each agent to target a different model/provider
with its own capability parameters (vision, context_window, etc.).
- Refactor `_build_agents_table` to return rich dicts instead of tuples
- Extract `_register_one_model` from `_register_unknown_models` for reuse
- Register per-agent models in addition to the global default_model,
  falling back to top-level model_capabilities when agent-specific ones
  are not provided
- Override `AgentConfig.model` at startup when an agent declares a model
- Thread deployment_config through `_preflight` and `_start_agent`
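
A sketch of the fallback rules, assuming a parsed agents.yaml with a
top-level `agents` mapping next to `default_model` and
`model_capabilities`. The layout, the function name, and the model names
are hypothetical:

```python
def build_agents_table(config):
    """Return one dict per agent with its effective model and capabilities."""
    default_model = config.get("default_model")
    shared_caps = config.get("model_capabilities") or {}
    table = []
    for name, spec in (config.get("agents") or {}).items():
        spec = spec or {}
        table.append(
            {
                "name": name,
                # Per-agent model wins; otherwise use the global default.
                "model": spec.get("model") or default_model,
                # Agent-specific capabilities win; otherwise top-level ones.
                "model_capabilities": spec.get("model_capabilities") or shared_caps,
            }
        )
    return table


example = {
    "default_model": "provider-a.text-model",
    "model_capabilities": {"vision": False, "context_window": 32768},
    "agents": {
        "harper": {"model": "provider-b.vision-model",
                   "model_capabilities": {"vision": True, "context_window": 131072}},
        "scribe": {},
    },
}
```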