The existing setup only attached the file/stderr handlers to the
'pallas' namespace, so every record emitted by fast-agent, fastmcp,
the MCP SDK, Anthropic, uvicorn etc. disappeared into Rich's progress
display and never hit pallas.log. When one of those libraries raised
and logged 'something failed' via logger.error(..., exc_info=True),
we ended up grepping a Rich-overwritten TTY for a traceback that was
already long gone -- exactly the situation blocking the current
Mnemosyne debugging work.
This patch:
* Extends _JSONFormatter to serialise exc_info/stack_info as a
'traceback' field when present, so Loki/grep sees the full stack.
* Attaches the same file+stderr handlers to the *root* logger so
every library's records (and any uncaught logger.error tracebacks)
land in pallas.log with the stack attached.
* Keeps the 'pallas' logger's own handlers (propagate=False) so our
records are unaffected by any later root-handler manipulation.
* Tags our handlers with _pallas_attached so repeated setup_logging()
calls are idempotent -- important because uvicorn workers and
fast-agent subagent subprocesses each reinitialise logging.
httpx/httpcore stay at WARNING so we don't flood the log with per-
request body traces on a DEBUG deployment. Demote third-party
namespaces further in a follow-up if needed.
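A minimal sketch of the three ideas above (traceback field, root + 'pallas'
handlers, idempotent tagging), using only the stdlib. The class and helper
names here are illustrative, the file handler is omitted for brevity, and the
real _JSONFormatter almost certainly emits more fields than this.

```python
import json
import logging
import sys


class JSONFormatter(logging.Formatter):
    """Render a record as one JSON object, adding the stack when present."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        parts = []
        if record.exc_info:
            parts.append(self.formatException(record.exc_info))
        if record.stack_info:
            parts.append(self.formatStack(record.stack_info))
        if parts:
            payload["traceback"] = "\n".join(parts)
        return json.dumps(payload)


def attach_once(logger: logging.Logger, handler: logging.Handler) -> bool:
    """Attach handler unless this logger already carries a tagged one."""
    if any(getattr(h, "_pallas_attached", False) for h in logger.handlers):
        return False
    handler._pallas_attached = True  # marker checked on later calls
    logger.addHandler(handler)
    return True


def setup_logging(level: int = logging.INFO) -> None:
    """Safe to call repeatedly: handlers are attached at most once."""
    for name, propagate in (("", True), ("pallas", False)):
        logger = logging.getLogger(name)  # "" is the root logger
        handler = logging.StreamHandler(sys.__stderr__)
        handler.setFormatter(JSONFormatter())
        attach_once(logger, handler)
        logger.setLevel(level)
        logger.propagate = propagate
    for noisy in ("httpx", "httpcore"):
        logging.getLogger(noisy).setLevel(logging.WARNING)
```

Calling setup_logging() twice leaves exactly one tagged handler on each of
the two loggers, which is the property the worker/subprocess case relies on.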
The async_auth_flow override was written as a plain generator, but
httpx's async dispatcher drives that hook with 'await'. A plain
generator yielding a Request doesn't produce an awaitable, so the call
failed with 'NoneType is not awaitable'.
httpx.Auth has three hooks: sync_auth_flow, async_auth_flow, and the
generic auth_flow. The default sync/async implementations delegate to
auth_flow when subclasses override only that one, which is exactly the
behaviour we want: one plain-generator implementation shared across
sync and async clients. Override auth_flow, drop sync/async overrides.
The previous static-header approach only ran at handshake time, and
persistent MCP connections reuse the open socket for every subsequent
tools/call. The first startup probe had no bearer, so every later
tool call inherited an empty Authorization header — Mnemosyne saw
no credentials and returned 'Authentication required'.
Fix: swap the static header for a _DynamicBearerAuth(httpx.Auth) that
httpx consults per-request via async_auth_flow. We look up the current
_pending_bearers entry for this server_config and stamp Authorization
on each outgoing request individually — no stale caching, no
handshake/tool-call skew.
Verified chain now runs:
bearer.captured (inbound)
forward.published (registry key)
forward.bound (auth object installed at connect time)
forward.applied (stamped per request via async_auth_flow)
Root cause: fast-agent's Settings(**merged_settings) validation pipeline
silently drops unknown keys on nested MCPServerSettings instances — even
after flipping extra='allow' and calling model_rebuild(force=True). The
culprit is Settings(nested_model_default_partial_update=True) which takes
a model_construct path that discards model_extra on the nested model.
Verified live: MCPServerSettings.model_validate({'forward_inbound_auth': True})
preserves the field (model_extra={'forward_inbound_auth': True}), but
get_settings().mcp.servers['mnemosyne'] returns an instance where the
attribute is MISSING and model_extra is None.
Fix: parse fastagent.config.yaml ourselves at patch-install time and
record the set of opted-in server names in _FORWARD_SERVERS. The patch
and multimodal_server's forwardable-config resolver both key off the
server name — stable, authoritative, and completely sidesteps Pydantic's
extras handling.
fast-agent's progress_display installs a Rich Live renderer on stdout/stderr;
plain StreamHandler records get swallowed mid-render, making the bearer-
forwarding DEBUG logs invisible on the console.
Route every pallas.* record to two sinks:
  1. ~/.local/state/pallas/pallas.log (rotating, 10 MiB x 5): durable capture
     regardless of who owns the TTY. Overridable via PALLAS_LOG_FILE.
  2. sys.__stderr__: the original stderr stream, saved by the interpreter
     before Rich could replace sys.stderr, so records still reach the
     TTY / journal when DEBUG is on.
Avoids /tmp deliberately: systemd PrivateTmp=yes made /tmp/pallas-bearer.log
invisible during the original debug saga.
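A minimal stdlib sketch of the two sinks. The build_sinks name is
illustrative, and reading "10MiB x5" as maxBytes=10 MiB with backupCount=5
is an assumption about the actual settings.

```python
from __future__ import annotations

import logging
import os
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path
from typing import Mapping

DEFAULT_LOG_FILE = Path.home() / ".local" / "state" / "pallas" / "pallas.log"


def build_sinks(env: Mapping[str, str] = os.environ) -> list[logging.Handler]:
    """Return [file_handler, stderr_handler] for the pallas logger."""
    path = Path(env.get("PALLAS_LOG_FILE") or DEFAULT_LOG_FILE)
    path.parent.mkdir(parents=True, exist_ok=True)
    file_handler = RotatingFileHandler(
        path,
        maxBytes=10 * 1024 * 1024,  # 10 MiB per file (assumed reading)
        backupCount=5,              # keep five rotated files (assumed reading)
        encoding="utf-8",
    )
    # sys.__stderr__ is the stream the interpreter started with, so it is
    # unaffected by anything that later rebinds sys.stderr.
    stderr_handler = logging.StreamHandler(sys.__stderr__)
    return [file_handler, stderr_handler]
```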
The Mnemosyne Authorization: Bearer token was being dropped on outbound MCP
calls because fast-agent runs downstream transports inside a long-lived
anyio TaskGroup whose context is snapshotted at manager startup —
request_bearer_token.get() inside _prepare_headers_and_auth therefore
always resolved to None even when the request handler had just set it.
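The snapshot effect can be reproduced with the stdlib alone. This sketch
uses a plain asyncio task in place of fast-agent's anyio TaskGroup (an
analogy, not the actual code path): the worker is created before the handler
sets the ContextVar, so it keeps seeing the default.

```python
import asyncio
import contextlib
from contextvars import ContextVar
from typing import Optional

request_bearer_token: ContextVar[Optional[str]] = ContextVar(
    "request_bearer_token", default=None
)


async def worker(jobs: asyncio.Queue, results: asyncio.Queue) -> None:
    """Long-lived task; its context was copied when the task was created."""
    while True:
        await jobs.get()
        await results.put(request_bearer_token.get())


async def demo() -> tuple:
    jobs: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(worker(jobs, results))  # "manager startup"

    request_bearer_token.set("secret")  # later, inside the request handler
    await jobs.put(None)
    seen_by_worker = await results.get()
    seen_by_handler = request_bearer_token.get()

    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return seen_by_worker, seen_by_handler


print(asyncio.run(demo()))  # (None, 'secret')
```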
Fix:
* pallas/_fastagent_patch.py
  - add _pending_bearers registry keyed by id(server_config) with a
    threading.Lock; publish_bearer / revoke_bearer helpers.
  - patched _prepare_headers_and_auth reads the registry first, falls
    back to the ContextVar for non-persistent probe paths.
  - emit INFO log on install() so the journal shows the patch ran;
    verbose flow logs at DEBUG on pallas.forward.
* pallas/multimodal_server.py
  - send_message resolves the agent's opted-in downstreams, publishes
    the inbound bearer for each, and revokes them all in the finally.
  - bearer/header diagnostics go to pallas.auth (DEBUG) instead of
    /tmp/pallas-bearer.log which is invisible under systemd PrivateTmp.
* pallas/log.py
  - honour PALLAS_LOG_LEVEL env var (default INFO) so operators can
    flip the forward/auth diagnostics on without a code change.
* docs/pallas.md, docs/mnemosyne_integration.md
  - document the registry-based forwarding and the task-group
    ContextVar constraint that forced it.
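A minimal sketch of the registry described in this commit. publish_bearer,
revoke_bearer, _pending_bearers and request_bearer_token are named in the
message; resolve_bearer and the ServerConfig stand-in are hypothetical and
only illustrate the "registry first, ContextVar second" lookup order.

```python
from __future__ import annotations

import threading
from contextvars import ContextVar
from typing import Optional

request_bearer_token: ContextVar[Optional[str]] = ContextVar(
    "request_bearer_token", default=None
)

_pending_bearers: dict[int, str] = {}
_lock = threading.Lock()


def publish_bearer(server_config: object, token: str) -> None:
    with _lock:
        _pending_bearers[id(server_config)] = token


def revoke_bearer(server_config: object) -> None:
    with _lock:
        _pending_bearers.pop(id(server_config), None)


def resolve_bearer(server_config: object) -> Optional[str]:
    """Registry first, then the ContextVar (hypothetical helper name)."""
    with _lock:
        token = _pending_bearers.get(id(server_config))
    return token if token is not None else request_bearer_token.get()


class ServerConfig:
    """Stand-in for a downstream server's config object."""


mnemosyne = ServerConfig()
publish_bearer(mnemosyne, "inbound-token")
try:
    print(resolve_bearer(mnemosyne))  # inbound-token
finally:
    revoke_bearer(mnemosyne)  # always revoked, as send_message does
print(resolve_bearer(mnemosyne))  # None
```

Keying by id() only works while the config object stays alive, since ids can
be reused after garbage collection; this sketch assumes the configs outlive
every request that publishes against them.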
Replace stdlib logger calls for inbound bearer token capture and forward
decisions with a `_diag_write` helper that appends to
`/tmp/pallas-bearer.log`. This ensures diagnostic output is reliably
captured regardless of logger configuration, while swallowing any write
errors to avoid impacting request handling.
Add info-level logging to trace bearer token capture and forwarding
through fastagent, including token length/prefix and reasons for
skipping forward (existing user auth, oauth, or missing inbound token).
Also log warnings on bearer extraction errors instead of silently
swallowing exceptions.
get_access_token() requires FastMCP auth middleware to populate
AuthenticatedUser in the request scope — Pallas runs without auth
middleware so it always returned None. Read the Authorization header
directly from the ASGI request instead.
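As a rough illustration, reading the header from a raw ASGI scope can look
like the sketch below. The function name is hypothetical, and the real code
may go through a request object instead of the scope dict.

```python
from typing import Any, Mapping, Optional


def extract_bearer(scope: Mapping[str, Any]) -> Optional[str]:
    """Return the bearer token from an ASGI scope, or None if absent."""
    # ASGI delivers headers as (name, value) pairs of bytes.
    for name, value in scope.get("headers", []):
        if name.lower() == b"authorization":
            scheme, _, token = value.decode("latin-1").partition(" ")
            token = token.strip()
            if scheme.lower() == "bearer" and token:
                return token
            return None
    return None


scope = {"type": "http", "headers": [(b"authorization", b"Bearer abc123")]}
print(extract_bearer(scope))  # abc123
```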
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduce a per-server `forward_inbound_auth` flag that controls whether the
inbound MCP bearer token is propagated to outbound MCP transport calls.
Forwarding is opt-in per server, which prevents accidental credential
leakage to unrelated downstream servers. Implemented as a fast-agent
monkey-patch that is auto-installed on package import.
Update docs to describe the two bearer token consumers (LLM provider
passthrough and opt-in downstream MCP forwarding) with a config example.
Make Pallas truly stateless per the 'Pallas is ephemeral' contract.
BREAKING (behavioural, not API):
* instance_scope changes from 'shared' to 'request' in pallas.server.
Each MCP tools/call now acquires a freshly-created fast-agent instance
via the existing create_instance / dispose_instance factories and
disposes it immediately after the response.
With 'shared' mode:
* Every MCP caller saw the same agent.message_history, so different
Daedalus conversations leaked into each other.
* Mid-chat context was silently truncated once the model window filled.
* Restarting the Pallas process wiped all in-flight conversation state,
even though Daedalus had it persisted in Postgres.
With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.
send_message gains two optional arguments:
* history: list[{role, content, images?}] in chronological order,
converted to PromptMessageExtended and seeded onto the fresh
instance's message_history before agent.send().
* conversation_id: opaque string, logged for trace correlation only —
Pallas never interprets or persists it.
Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.
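A minimal sketch of the skip-with-warning behaviour, validating plain dicts
only. The helper names, the logger name and the set of accepted roles are
assumptions; the conversion to PromptMessageExtended is left out.

```python
from __future__ import annotations

import logging
from typing import Any, Iterable, Optional

logger = logging.getLogger("pallas.history")

_VALID_ROLES = {"user", "assistant"}  # assumed set of accepted roles


def _find_problem(entry: Any) -> Optional[str]:
    """Return a short reason if the entry is malformed, else None."""
    if not isinstance(entry, dict):
        return "not a mapping"
    role = entry.get("role")
    if not isinstance(role, str) or role not in _VALID_ROLES:
        return f"bad role {role!r}"
    if not isinstance(entry.get("content"), str):
        return "missing content"
    for image in entry.get("images") or []:
        if (
            not isinstance(image, dict)
            or not image.get("data")
            or not image.get("mime_type")
        ):
            return "image missing data or mime_type"
    return None


def clean_history(history: Iterable[Any]) -> list[dict]:
    """Keep well-formed entries in order; warn about and skip the rest."""
    cleaned = []
    for index, entry in enumerate(history):
        problem = _find_problem(entry)
        if problem:
            logger.warning("skipping history entry %d: %s", index, problem)
            continue
        cleaned.append(entry)
    return cleaned
```

One bad row costs only that row: the surrounding entries are still seeded in
chronological order.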
The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.
Version bumped to 0.2.0.
Extend `_HealthAccessFilter` to also drop uvicorn access log lines for
successful `POST /mcp` requests, in addition to the existing
`/live`, `/ready`, and `/metrics` health probes.
**Why:** Every Daedalus health poll and tool call hits the single `/mcp`
route. Pallas already emits structured `mcp_request_start` /
`mcp_request_complete` logs at the agent layer, making the uvicorn
access line pure duplication and noise in syslog.
**How:**
- Replace the simple substring list `_HEALTH_PATHS` with compiled regex
patterns (`_HEALTH_PATH_RE`, `_MCP_RE`) for more precise path matching
- Add `_SUCCESS_STATUS_RE` to only suppress 1xx/2xx/3xx responses;
non-successful responses (4xx, 5xx) still pass through as real signals
- Update docstring to document the new suppression rules clearly
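A sketch of what such a filter can look like. The three regex names come
from the list above, but the patterns themselves are guesses; they assume the
access message has the shape `client - "METHOD /path HTTP/x.y" status`, and
that the success check applies to the health paths as well as `/mcp`.

```python
import logging
import re

# Assumed patterns; the real ones may differ.
_HEALTH_PATH_RE = re.compile(r'"[A-Z]+ /(?:live|ready|metrics)(?:\?\S*)? HTTP/')
_MCP_RE = re.compile(r'"POST /mcp/?(?:\?\S*)? HTTP/')
_SUCCESS_STATUS_RE = re.compile(r'" [123]\d\d\b')


class HealthAccessFilter(logging.Filter):
    """Drop access lines for successful health probes and POST /mcp calls."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        noisy_path = _HEALTH_PATH_RE.search(message) or _MCP_RE.search(message)
        if noisy_path and _SUCCESS_STATUS_RE.search(message):
            return False  # suppress duplicated success noise
        return True  # 4xx/5xx and every other route still get logged


# Usage: attach to the access logger once at startup.
logging.getLogger("uvicorn.access").addFilter(HealthAccessFilter())
```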
Replace fast-agent's standard `MCPToolProgressManager` with the local
`EnrichedMCPToolProgressManager` from `pallas.progress` to provide
richer progress reporting during tool execution in the multimodal MCP
server.
Add optional `model` and `model_capabilities` fields to agent definitions
in agents.yaml, allowing each agent to target a different model/provider
with its own capability parameters (vision, context_window, etc.).
- Refactor `_build_agents_table` to return rich dicts instead of tuples
- Extract `_register_one_model` from `_register_unknown_models` for reuse
- Register per-agent models in addition to the global default_model,
falling back to top-level model_capabilities when agent-specific ones
are not provided
- Override `AgentConfig.model` at startup when an agent declares a model
- Thread deployment_config through `_preflight` and `_start_agent`
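The capability fallback can be sketched as below. The helper name and the
config layout (top-level `model_capabilities` plus an `agents` mapping) are
assumptions about agents.yaml, not its documented schema.

```python
from __future__ import annotations

from typing import Any, Mapping


def resolve_agent_models(deployment: Mapping[str, Any]) -> dict[str, dict]:
    """Map agent name -> model and capabilities, for agents that declare a model."""
    top_level_caps = deployment.get("model_capabilities") or {}
    table: dict[str, dict] = {}
    for name, agent in (deployment.get("agents") or {}).items():
        agent = agent or {}
        model = agent.get("model")
        if not model:
            continue  # no override; the agent keeps the global default_model
        table[name] = {
            "model": model,
            # Agent-specific capabilities win; otherwise use the top-level ones.
            "model_capabilities": agent.get("model_capabilities") or top_level_caps,
        }
    return table


deployment = {
    "default_model": "provider-a.small",
    "model_capabilities": {"vision": False, "context_window": 32000},
    "agents": {
        "vision_agent": {
            "model": "provider-b.large",
            "model_capabilities": {"vision": True, "context_window": 128000},
        },
        "text_agent": {"model": "provider-a.medium"},
        "default_agent": {},
    },
}
print(resolve_agent_models(deployment)["text_agent"]["model_capabilities"])
# {'vision': False, 'context_window': 32000}
```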