Introduces `pallas.loop_guard` module that detects and halts agentic loops
where the same `(tool, args) → result` repeats consecutively, preventing
wasted LLM turns when upstream MCP servers return contradictory data.
- Add per-request `ToolRunnerHooks` tracking rolling tool-call signatures
- Halt loop after `loop_repeat_threshold` consecutive repeats (default 3)
- Collapse `max_iterations` on halt to terminate without further LLM call
- Append user-facing explanation to the turn with `stop_reason=endTurn`
- Expose `pallas_agent_loop_aborted_total{agent,reason}` counter
- Add per-agent `max_iterations` and `loop_repeat_threshold` config
- Document guard behavior, metric, and alerting query
Add two new sections to the Pallas documentation:
- Sampling parameters: explain that temperature/top_p/top_k are
configured via the fast-agent decorator's `request_params`, with a
provider support matrix and a note on Claude Opus 4.7 stripping these
params in favor of `output_config.effort`.
- Metrics: document the Prometheus `/metrics` endpoint exposed on the
registry port, including scrape config, full metrics reference table,
and notes on where each metric is captured.
Introduce `model_capabilities.mantle` flag that installs a provider-specific
override in fast-agent's `ModelDatabase._PROVIDER_MODEL_OVERRIDES` to strip
features the AWS Bedrock Mantle endpoint rejects (beta headers, extended
thinking, task budgets, web tools, prompt caching).
Without this override, fast-agent sends default beta headers and `thinking`
parameters for modern Claude models that Mantle rejects with a misleading
404 "model does not exist" error.
Add comprehensive model card for Anthropic's Claude Haiku 4.5 on AWS
Bedrock, including model details, capabilities, pricing, programmatic
access examples, and regional availability information.
Capture the five debugging chapters from the bearer-forwarding rollout
so the knowledge doesn't live only in chat history:
1. Per-request bearer across anyio.TaskGroup boundary (ContextVar
snapshot semantics, httpx auth-header caching on persistent
connections, forward_inbound_auth pydantic-drop workaround).
2. install() idempotency guard shadowing three newly-added
monkey-patches — each patch now owns its own sentinel.
3. FastMCP on_call_tool context shape: context.message.name, not
context.message.params.name. Extractor returning None silently
killed the _PUBLIC_TOOLS bypass and downstream dispatch "await
None(...)" produced the terse 'object NoneType can't be used in
await expression' string that blocked Harper<->Mnemosyne.
4. Rich-TUI corruption by DEBUG openai/sse_starlette/mcp via root
logger inheriting logger.level=debug + our stderr StreamHandler.
Fixed by PALLAS_LOG_STDERR gate and PALLAS_ROOT_LOG_LEVEL split.
5. Current state table of PALLAS_LOG_* knobs + jq tail recipe.
Also add pallas.log and pallas._fastagent_patch to the Module Reference
table.
The Mnemosyne Authorization: Bearer token was being dropped on outbound MCP
calls because fast-agent runs downstream transports inside a long-lived
anyio TaskGroup whose context is snapshotted at manager startup —
request_bearer_token.get() inside _prepare_headers_and_auth therefore
always resolved to None even when the request handler had just set it.
Fix:
* pallas/_fastagent_patch.py
- add _pending_bearers registry keyed by id(server_config) with a
threading.Lock; publish_bearer / revoke_bearer helpers.
- patched _prepare_headers_and_auth reads the registry first, falls
back to the ContextVar for non-persistent probe paths.
- emit INFO log on install() so the journal shows the patch ran;
verbose flow logs at DEBUG on pallas.forward.
* pallas/multimodal_server.py
- send_message resolves the agent's opted-in downstreams, publishes
the inbound bearer for each, and revokes them all in the finally.
- bearer/header diagnostics go to pallas.auth (DEBUG) instead of
/tmp/pallas-bearer.log which is invisible under systemd PrivateTmp.
* pallas/log.py
- honour PALLAS_LOG_LEVEL env var (default INFO) so operators can
flip the forward/auth diagnostics on without a code change.
* docs/pallas.md, docs/mnemosyne_integration.md
- document the registry-based forwarding and the task-group
ContextVar constraint that forced it.
Introduce per-server `forward_inbound_auth` flag that controls whether the
inbound MCP bearer token is propagated to outbound MCP transport calls.
Implemented as a fast-agent monkey-patch auto-installed on package import,
preventing accidental credential leakage to unrelated downstream servers.
Update docs to describe the two bearer token consumers (LLM provider
passthrough and opt-in downstream MCP forwarding) with a config example.
Make Pallas truly stateless per the 'Pallas is ephemeral' contract.
BREAKING (behavioural, not API):
* instance_scope changes from 'shared' to 'request' in pallas.server.
Each MCP tools/call now acquires a freshly-created fast-agent instance
via the existing create_instance / dispose_instance factories and
disposes it immediately after the response.
With 'shared' mode:
* Every MCP caller saw the same agent.message_history, so different
Daedalus conversations leaked into each other.
* Mid-chat context was silently truncated once the model window filled.
* Restarting the Pallas process wiped all in-flight conversation state,
even though Daedalus had it persisted in Postgres.
With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.
send_message gains two optional arguments:
* history: list[{role, content, images?}] in chronological order,
converted to PromptMessageExtended and seeded onto the fresh
instance's message_history before agent.send().
* conversation_id: opaque string, logged for trace correlation only —
Pallas never interprets or persists it.
Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.
The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.
Version bumped to 0.2.0.