Pallas — FastAgent MCP Bridge
Pallas is the generic runtime that turns fast-agent agent definitions into StreamableHTTP MCP servers.
It is completely deployment-agnostic: all environment-specific values (agent names, ports, hosts, model) live in the calling project's agents.yaml and fastagent.config.yaml.
Installation
pip install git+ssh://git@git.helu.ca:22022/r/pallas.git
Or as a project dependency in pyproject.toml:
dependencies = [
    "pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git",
]
Usage
Pallas reads configuration from the working directory at runtime.
my-project/
├── agents/
│   ├── __init__.py
│   └── jarvis.py           # FastAgent definitions
├── agents.yaml             # Deployment topology
├── fastagent.config.yaml   # FastAgent + model config
└── fastagent.secrets.yaml  # API keys (gitignored)
Run from your project root:
pallas # start all agents + registry
pallas --agent jarvis # start a single agent
Or via python -m:
python -m pallas.server
agents.yaml format
name: my-project              # used in log prefixes and registry names
version: "1.0.0"
host: my-host.example.com     # hostname for registry URLs
namespace: com.example.my-project
registry_port: 8200
agents:
  jarvis:
    module: agents.jarvis     # importable Python module path
    port: 8201
    title: Jarvis
    description: "My assistant agent"
    depends_on: [research]    # optional: start these first
  research:
    module: agents.research
    port: 8250
    title: Research Agent
    description: "Web search and knowledge graph"
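The depends_on field implies a start order in which dependencies come up before the agents that need them. The following is a minimal sketch of that ordering, not Pallas's actual scheduler: the `start_order` helper and the in-memory `agents` dict are hypothetical stand-ins for the parsed `agents:` mapping, and the ordering uses the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory form of the `agents:` mapping from agents.yaml.
agents = {
    "jarvis": {"module": "agents.jarvis", "port": 8201, "depends_on": ["research"]},
    "research": {"module": "agents.research", "port": 8250},
}


def start_order(agents: dict) -> list:
    """Return agent names so that every dependency precedes its dependents."""
    # Map each agent to the set of agents it must wait for.
    graph = {name: set(cfg.get("depends_on", [])) for name, cfg in agents.items()}
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"depends_on references undefined agents: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


print(start_order(agents))  # ['research', 'jarvis']
```

With the example topology, research is started before jarvis because jarvis lists it under depends_on.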
Loop safeguards
Three optional fields bound how long an agent's tool-call loop can run:
| Field | Type | Default | Purpose |
|---|---|---|---|
| max_iterations | int | 15 | Maximum tool calls in a single agent turn |
| streaming_timeout | float | 120 | Max idle seconds between streaming events |
| turn_timeout | float | 300 | Hard wall-clock limit for a full turn (seconds) |
All three are optional. Agents that omit them use the defaults shown above.
agents:
  research:
    module: agents.research
    port: 8250
    max_iterations: 10      # this agent only needs a few search calls
    streaming_timeout: 60   # fail fast on a slow search MCP
    turn_timeout: 120       # research turns should not take more than 2 min
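To make the defaulting and bounding behaviour concrete, here is a rough sketch of how the three fields could be merged with their defaults and how the iteration and wall-clock limits might be applied. `LoopLimits` and `run_turn` are hypothetical names for illustration only, not Pallas's API; the idle streaming timeout is carried as a value but not modelled in the loop.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopLimits:
    # Defaults mirror the table above.
    max_iterations: int = 15
    streaming_timeout: float = 120.0
    turn_timeout: float = 300.0

    @classmethod
    def from_agent(cls, cfg: dict) -> "LoopLimits":
        """Pick up only the safeguard fields an agent entry actually sets."""
        fields = ("max_iterations", "streaming_timeout", "turn_timeout")
        return cls(**{k: cfg[k] for k in fields if k in cfg})


async def run_turn(step, limits: LoopLimits) -> int:
    """Await `step()` until it returns True; return the iteration count.

    Raises RuntimeError after max_iterations steps and
    asyncio.TimeoutError once turn_timeout seconds have elapsed.
    """
    async def loop() -> int:
        for i in range(1, limits.max_iterations + 1):
            if await step():
                return i
        raise RuntimeError(f"exceeded max_iterations={limits.max_iterations}")

    return await asyncio.wait_for(loop(), timeout=limits.turn_timeout)
```

An agent entry that sets only max_iterations keeps the default values for the two timeouts.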
fastagent.config.yaml extensions
Pallas reads two extra keys beyond the standard fast-agent config:
default_model: openai.my-custom-model-name
# Explicit capability declarations — avoids brittle name-regex heuristics
model_capabilities:
  vision: false
  context_window: 200000
  max_output_tokens: 32000
Capabilities are published in the registry and used to register unknown models
with fast-agent's ModelDatabase.
AWS Bedrock Mantle — automatic shims
When anthropic.base_url points at a Bedrock Mantle endpoint
(https://bedrock-mantle.{region}.api.aws/anthropic), Pallas auto-detects it
at startup and installs two compatibility shims via pallas.mantle_shims.
No config flag is required.
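As an illustration of what such auto-detection could look like, the sketch below matches a base URL against the endpoint shape given above. The `mantle_region` function and its regular expression are assumptions made for this example; the real check in pallas.mantle_shims may differ.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Host shape taken from the URL pattern above; the region charset is an assumption.
_MANTLE_HOST = re.compile(r"^bedrock-mantle\.(?P<region>[a-z0-9-]+)\.api\.aws$")


def mantle_region(base_url: Optional[str]) -> Optional[str]:
    """Return the region if base_url looks like a Mantle Anthropic endpoint, else None."""
    if not base_url:
        return None
    parsed = urlparse(base_url)
    match = _MANTLE_HOST.match(parsed.hostname or "")
    if parsed.scheme != "https" or match is None:
        return None
    # Accept the path with or without a trailing slash.
    if not parsed.path.rstrip("/").endswith("/anthropic"):
        return None
    return match.group("region")
```

A caller would install the shims only when this returns a region, so deployments that talk to api.anthropic.com directly are left untouched.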
Shim 1 — wire-name prefix. Mantle requires the full anthropic.<name>
wire id (e.g. anthropic.claude-opus-4-7). Fast-agent's model-spec parser
would otherwise strip the anthropic. prefix, causing a misleading
404 "The model '...' does not exist". The shim registers the prefixed
forms in ModelDatabase._PROVIDER_WIRE_MODEL_NAMES.
Shim 2 — strip caller: null from replayed tool_use blocks. Anthropic
SDK 0.100.x leaks caller: null onto serialised BetaToolUseBlock params
(upstream issue #1454).
api.anthropic.com silently tolerates the extra field; Mantle rejects it
with tool_use.caller: Input should be a valid dictionary or object, which
breaks the MCP tool-use loop on the second turn. The shim monkeypatches
AnthropicConverter._deserialize_assistant_raw_blocks and
_append_server_tool_channel_blocks to pop the field before history is
re-sent.
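The core of Shim 2 is a small filtering step. The sketch below shows that step on plain dictionaries, assuming content blocks are dicts with a type key; `strip_null_caller` is a hypothetical helper for illustration and is not the monkeypatch that pallas.mantle_shims actually installs.

```python
from typing import Any, Dict, List


def strip_null_caller(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return content blocks with a null `caller` removed from tool_use blocks.

    Blocks are copied only when they need changing; the input is not mutated.
    """
    cleaned = []
    for block in blocks:
        is_tool_use = block.get("type") == "tool_use"
        if is_tool_use and "caller" in block and block["caller"] is None:
            # Drop only the null field; a populated caller is kept as-is.
            block = {k: v for k, v in block.items() if k != "caller"}
        cleaned.append(block)
    return cleaned
```

Only a null caller is dropped, so a block that carries a real caller object would pass through unchanged.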
See docs/bedrock.md for the full configuration walkthrough.
Environment variable
| Variable | Default | Purpose |
|---|---|---|
| PALLAS_AGENTS_CONFIG | agents.yaml | Override path to deployment config |
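A lookup along these lines would give the behaviour in the table. This is a sketch only: `agents_config_path` is a hypothetical helper, and how Pallas treats an empty value is not covered here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def agents_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the deployment config path, falling back to agents.yaml."""
    env = os.environ if env is None else env
    # A relative path is interpreted against the current working directory.
    return Path(env.get("PALLAS_AGENTS_CONFIG", "agents.yaml"))
```

Passing a mapping instead of reading os.environ directly keeps the helper easy to exercise in tests.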
What Pallas provides
| Module | Purpose |
|---|---|
| pallas.server | CLI entry point and agent orchestration |
| pallas.registry | GET /.well-known/mcp/server.json registry server |
| pallas.multimodal_server | MultimodalAgentMCPServer: AgentMCPServer subclass with image + history support |
| pallas.health | LLM preflight validation + get_health MCP tool |
| pallas._fastagent_patch | Traceback-capture wrappers around three opaque fast-agent catch-sites (debug-only) |
Authentication
Pallas is transparent to downstream authentication. Whatever the operator
places under each downstream MCP server's headers: block in
fastagent.config.yaml (typically loaded from fastagent.secrets.yaml) is what
fast-agent sends — Pallas does not intercept, rewrite, or forward the inbound
Authorization header of the MCP request that triggered the agent turn.
For agents that talk to Mnemosyne, the convention is a long-lived team JWT
minted from Mnemosyne's admin UI and pasted into the agent project's
fastagent.secrets.yaml:
mcp:
  servers:
    mnemosyne:
      transport: http
      url: https://mnemosyne.example.com/mcp/
      headers:
        Authorization: "Bearer eyJ…team-jwt…"
See
mnemosyne/docs/DAEDALUS_PALLAS_INTEGRATION_v1.md
for the three credential types Mnemosyne recognises, how team JWTs are
minted and rotated, and the data model that ties a team to a set of
libraries.
Earlier versions of Pallas shipped a forward_inbound_auth: true mechanism that captured the per-turn Authorization header and propagated it to opted-in downstream servers. That mechanism has been retired; opt-in flags in old fastagent.config.yaml files are now silently ignored and can be removed at your convenience.