Make Pallas truly stateless per the 'Pallas is ephemeral' contract.
BREAKING (behavioural, not API):
* instance_scope changes from 'shared' to 'request' in pallas.server.
Each MCP tools/call now acquires a freshly-created fast-agent instance
via the existing create_instance / dispose_instance factories and
disposes it immediately after the response.
With 'shared' mode:
* Every MCP caller saw the same agent.message_history, so different
Daedalus conversations leaked into each other.
* Mid-chat context was silently truncated once the model window filled.
* Restarting the Pallas process wiped all in-flight conversation state,
even though Daedalus had it persisted in Postgres.
With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.
send_message gains two optional arguments:
* history: list[{role, content, images?}] in chronological order,
converted to PromptMessageExtended and seeded onto the fresh
instance's message_history before agent.send().
* conversation_id: opaque string, logged for trace correlation only —
Pallas never interprets or persists it.
Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.
The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.
Version bumped to 0.2.0.
Pallas MCP Interface Specification
This document defines the contract between Daedalus (MCP client / web UI) and Pallas (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a registry endpoint for agent discovery, a get_health MCP tool on each agent for health monitoring, and progress notifications for real-time feedback during agent execution.
Architecture Overview
                        Pallas Instance (puck.incus)
                 ┌────────────────────────────────────────┐
                 │                                        │
                 │  Registry (port 23030)                 │
Daedalus ──GET──▶│  /.well-known/mcp/server.json          │
                 │                                        │
                 │  Agent: Research (port 23031)          │
Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Argos, Neo4j
                 │    └─ get_health tool                  │
                 │                                        │
                 │  Agent: Engineering (port 23032)       │
Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Kernos, Gitea
                 │    └─ get_health tool                  │
                 │                                        │
                 │  Agent: Orchestrator (port 23033)      │
Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Research, Infra
                 │    └─ get_health tool                  │
                 └────────────────────────────────────────┘
A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a get_health MCP tool that FastAgent intercepts programmatically — no LLM invocation.
Daedalus registers the registry URL once in global settings. Everything else is automatic.
1. Registry Endpoint
GET {registry_url}/.well-known/mcp/server.json
The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the MCP Server Schema.
Request
GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json
No authentication. No query parameters.
Response
{
"servers": [
{
"server": {
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
"name": "ca.helu.ouranos/pallas-research",
"title": "Research Agent",
"description": "Web search via Argos and knowledge graph via Neo4j",
"version": "1.0.0",
"icons": [
{ "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
],
"remotes": [
{ "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
],
"capabilities": {
"model": "qwen3-8b-q5",
"vision": false,
"context_window": 200000,
"max_output_tokens": 32000
}
},
"_meta": {
"io.modelcontextprotocol.registry/official": {
"status": "active",
"updatedAt": "2026-03-12T10:00:00Z",
"isLatest": true
}
}
},
{
"server": {
"name": "ca.helu.ouranos/pallas-infra",
"title": "Engineering Agent",
"description": "Shell access via Kernos and repository management via Gitea",
"version": "1.0.0",
"remotes": [
{ "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
],
"capabilities": {
"model": "qwen3-8b-q5",
"vision": false,
"context_window": 200000,
"max_output_tokens": 32000
}
},
"_meta": {
"io.modelcontextprotocol.registry/official": {
"status": "active",
"updatedAt": "2026-03-12T10:00:00Z",
"isLatest": true
}
}
}
]
}
Schema
| Field | Type | Required | Description |
|---|---|---|---|
| servers | array | yes | List of server entries |
| servers[].server.name | string | yes | Reverse-domain identifier (e.g. ca.helu.ouranos/pallas-research). Daedalus derives server_id from the segment after the last /. |
| servers[].server.title | string | no | Human-readable display name. Falls back to name if absent. |
| servers[].server.description | string | no | One-line description shown in Daedalus UI. |
| servers[].server.version | string | no | Semver version string. |
| servers[].server.icons | array | no | Array of { src, sizes }. Daedalus uses the first entry. |
| servers[].server.remotes | array | yes | Connection endpoints. Daedalus looks for type: "streamable-http" and uses its url. |
| servers[].server.capabilities | object | no | Model capabilities. Contains model (string), vision (bool), context_window (int), max_output_tokens (int). Published when model_capabilities is configured in fastagent.config.yaml. |
| servers[]._meta | object | no | Registry metadata. Informational only; Daedalus does not act on it. |
Behaviour
- The response must reflect the current set of registered agents. If an agent is added or removed from Pallas, subsequent requests must reflect the change.
- Content-Type must be application/json.
- Every entry in remotes with type: "streamable-http" is treated as an MCP endpoint Daedalus can connect to.
- The icons[].src URL may be absolute or relative. Daedalus stores it as-is.
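The schema rules above can be illustrated with a small client-side parser. This is a minimal sketch, not Daedalus's actual code: `AgentEntry` and `parse_registry` are hypothetical names, and the choices to use the first streamable-http remote and to skip entries that lack a name or a usable remote are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentEntry:
    server_id: str                 # segment after the last "/" in server.name
    title: str                     # falls back to name when title is absent
    url: str                       # first streamable-http remote (assumption)
    icon: Optional[str]            # first icons[].src, stored as-is
    capabilities: dict[str, Any]   # empty when the agent publishes none


def parse_registry(doc: dict[str, Any]) -> list[AgentEntry]:
    """Turn a registry response into connection records (illustrative only)."""
    agents = []
    for item in doc.get("servers", []):
        server = item.get("server", {})
        name = server.get("name")
        remotes = [r for r in server.get("remotes", [])
                   if r.get("type") == "streamable-http" and r.get("url")]
        if not name or not remotes:
            continue  # name and remotes are required; skip unusable entries
        icons = server.get("icons") or []
        agents.append(AgentEntry(
            server_id=name.rsplit("/", 1)[-1],
            title=server.get("title") or name,
            url=remotes[0]["url"],
            icon=icons[0].get("src") if icons else None,
            capabilities=server.get("capabilities") or {},
        ))
    return agents
```

Because the registry is dynamic, a caller would re-run this on every fetch rather than caching the parsed list indefinitely.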
2. Conversation State & History (Daedalus-owned)
Pallas is stateless. As of version 0.2.0, every MCP tools/call is
handled by a freshly-created fast-agent instance that is disposed immediately
after the response. The Pallas process holds no per-conversation memory
between calls. This is enforced by instance_scope="request" in
pallas.server — do not override it.
Conversation history is owned by the client (Daedalus). It must be replayed
on every turn through the history argument on send_message.
send_message Arguments
Each agent's MCP tool accepts:
| Parameter | Type | Required | Description |
|---|---|---|---|
| message | str | yes | The new user turn as plain text. |
| images | list[dict] | no | Images attached to this turn only: [{"data": base64, "mime_type": "image/png"}]. Requires a vision-capable model. |
| history | list[dict] | no | Prior conversation history in chronological order. Entries have shape {"role": "user" \| "assistant", "content": str, "images"?: [...]}. When present, seeds the freshly-created agent's message_history before the new turn is executed. |
| conversation_id | str | no | Opaque identifier logged by Pallas for trace correlation. Pallas does not interpret or persist it. |
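As an illustration of the argument shapes in the table, the sketch below assembles the arguments dict for one call. `build_send_message_args` is a hypothetical helper, and rejecting malformed history entries on the client before sending is an assumption of this example rather than a requirement of the contract.

```python
from typing import Any, Optional

VALID_ROLES = {"user", "assistant"}


def build_send_message_args(
    message: str,
    history: Optional[list[dict[str, Any]]] = None,
    images: Optional[list[dict[str, str]]] = None,
    conversation_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the arguments dict for one send_message call."""
    args: dict[str, Any] = {"message": message}
    if images:
        args["images"] = images  # applies to the new turn only
    if history:
        for entry in history:
            if entry.get("role") not in VALID_ROLES or not isinstance(entry.get("content"), str):
                raise ValueError(f"malformed history entry: {entry!r}")
        args["history"] = list(history)  # chronological order, oldest first
    if conversation_id is not None:
        args["conversation_id"] = conversation_id  # opaque; used for log correlation
    return args
```

Optional keys are omitted when unset, so a call with only `message` produces the zero-history turn that older clients already send.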
Rationale
| Problem with shared state | Behaviour with instance_scope="request" |
|---|---|
| Every caller sees the same agent.message_history, so different conversations leak into each other. | Each call gets a fresh, isolated instance. No cross-conversation bleed. |
| Process restart wipes all in-flight context. | There was no in-flight context to wipe — Daedalus reseeds it on the next turn. |
| Context-window trimming happens invisibly inside fast-agent. | Daedalus decides what history to send and how much, based on capabilities.context_window from the registry. |
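One way Daedalus could decide how much history to send is sketched below. It assumes a crude estimate of about four characters per token and a caller-chosen `reserve_tokens` for the new message and the reply; both are illustrative assumptions, and `trim_history` is a hypothetical name. A real client would likely use the model's tokenizer.

```python
from typing import Any


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); an assumption, not a tokenizer.
    return max(1, len(text) // 4)


def trim_history(
    history: list[dict[str, Any]],
    context_window: int,
    reserve_tokens: int,
) -> list[dict[str, Any]]:
    """Keep the most recent entries that fit in context_window - reserve_tokens."""
    budget = context_window - reserve_tokens
    kept: list[dict[str, Any]] = []
    for entry in reversed(history):          # walk newest to oldest
        cost = estimate_tokens(entry["content"])
        if cost > budget:
            break                            # this entry and everything older is dropped
        kept.append(entry)
        budget -= cost
    kept.reverse()                           # restore chronological order
    return kept
```

Here `context_window` would come from `capabilities.context_window` in the registry response.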
{agent}_history Prompt
Under instance_scope="request" the {agent}_history MCP prompt is still
registered for backward compatibility but always returns [] — history lives
on the client and there is no authoritative server-side copy. Existing
callers that invoke this prompt will not error, but should migrate to
tracking history client-side.
Backward Compatibility
All new arguments are optional. A client that calls send_message(message=...)
with no history and no conversation_id gets a zero-history turn (the
agent sees only the current message). This is correct stateless behaviour —
it is never "the last conversation's context". Existing fast-agent MCP
clients that do not know about history will produce one-shot responses,
which is the appropriate and visible failure mode.
3. Health Tool
MCP tool: get_health
Each agent's MCP server must expose a tool named get_health. FastAgent intercepts this tool programmatically — it does not route through the LLM. This keeps health checks fast (~ms) and free of inference cost.
Tool Definition
The tool should appear in session.list_tools() with:
{
"name": "get_health",
"description": "Returns the health status of this agent and its downstream dependencies.",
"inputSchema": {
"type": "object",
"properties": {},
"additionalProperties": false
}
}
No input arguments.
Invocation
Daedalus calls this via the standard MCP SDK:
result = await session.call_tool("get_health")
Response
The tool returns a single text content block containing a JSON object:
{
"status": "ok",
"timestamp": "2026-03-12T15:42:00Z"
}
Status Values
| Status | Meaning | Daedalus Behaviour |
|---|---|---|
| ok | Agent healthy, all downstream MCP servers reachable | Green badge. Normal operation. |
| degraded | Agent responds but with issues (slow responses, partial downstream outage) | Yellow badge + warning banner. Chat allowed. |
| error | Agent cannot process requests | Red badge. Chat disabled; the user cannot send messages. |
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| status | "ok" \| "degraded" \| "error" | yes | Current health state |
| timestamp | string (ISO 8601) | no | When the health check was performed |
| message | string | no | Human-readable explanation. Required when status is degraded or error. Shown in Daedalus UI tooltips and warning banners. |
Examples
Healthy:
{
"status": "ok",
"timestamp": "2026-03-12T15:42:00Z"
}
Degraded:
{
"status": "degraded",
"timestamp": "2026-03-12T15:42:00Z",
"message": "Avg response 12s — Neo4j connection slow"
}
Error:
{
"status": "error",
"timestamp": "2026-03-12T15:42:00Z",
"message": "Argos MCP server unreachable"
}
Implementation Guidance
The get_health tool checks connectivity to all downstream MCP servers the agent depends on using the MCP initialize handshake — the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.
For each downstream MCP server:
- POST an MCP initialize request to the server URL (with auth headers and Accept: application/json, text/event-stream)
- On success, tear down the session by sending DELETE with the returned Mcp-Session-Id header to avoid leaking server-side state
- On failure (HTTP error, timeout, connection refused), record the server as unreachable
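The probe steps above might look roughly like this using only the standard library. This is a sketch under assumptions, not Pallas's implementation: `probe` and `build_initialize_request` are hypothetical names, the `protocolVersion` string is an assumed value, and the client name is a placeholder.

```python
import http.client
import json
import urllib.request
from typing import Optional

ACCEPT = "application/json, text/event-stream"


def build_initialize_request(client_name: str = "pallas-health") -> bytes:
    """JSON-RPC body for an MCP initialize request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; use the revision your servers speak
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.0.0"},
        },
    }).encode("utf-8")


def probe(url: str, auth_headers: Optional[dict[str, str]] = None, timeout: float = 3.0) -> bool:
    """Return True if the server answers initialize, False if it is unreachable."""
    auth = auth_headers or {}
    headers = {"Content-Type": "application/json", "Accept": ACCEPT, **auth}
    try:
        req = urllib.request.Request(url, data=build_initialize_request(),
                                     headers=headers, method="POST")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            session_id = resp.headers.get("Mcp-Session-Id")
    except (OSError, ValueError, http.client.HTTPException):
        return False  # HTTP error, timeout, connection refused, or bad URL
    if session_id:
        # Best-effort teardown so the probe does not leak a server-side session.
        try:
            delete = urllib.request.Request(
                url, headers={**auth, "Mcp-Session-Id": session_id}, method="DELETE")
            urllib.request.urlopen(delete, timeout=timeout).close()
        except (OSError, http.client.HTTPException):
            pass
    return True
```

A failed teardown is deliberately ignored here: the server was reachable, which is all the health check needs to know.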
Result mapping:
- All downstream servers reachable and active LLM provider healthy → ok
- Some downstream servers unreachable, or active LLM provider failed preflight → degraded with explanation
- Agent failed to start or cannot process requests → error with explanation
The tool must not invoke the LLM. It should normally complete in under 1 second; each downstream probe is capped by a 3-second timeout.
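The result mapping can be expressed as a small pure function. The sketch below is illustrative: `build_health`, its parameters, and the exact wording of the generated messages are assumptions, while the status values and payload fields follow the tables above.

```python
from datetime import datetime, timezone
from typing import Any


def build_health(probes: dict[str, bool], llm_ok: bool = True,
                 agent_ok: bool = True) -> dict[str, Any]:
    """Map probe results to the get_health payload.

    probes maps a downstream server name to True (reachable) or False.
    """
    problems = [f"{name} MCP server unreachable" for name, ok in probes.items() if not ok]
    if not llm_ok:
        problems.append("LLM provider failed preflight")
    if not agent_ok:
        status = "error"
        problems.insert(0, "agent cannot process requests")
    elif problems:
        status = "degraded"
    else:
        status = "ok"
    payload: dict[str, Any] = {
        "status": status,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if problems:
        payload["message"] = "; ".join(problems)  # required for degraded and error
    return payload
```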
4. Daedalus Consumption
Registration Flow
- User enters registry URL in Daedalus global settings (e.g. http://puck.incus:23030)
- Daedalus GETs {url}/.well-known/mcp/server.json
- Daedalus stores the PallasInstance with its registry URL
- Discovered agents are shown with metadata (title, description, icon)
Workspace Attachment
- User selects a registered Pallas instance in workspace settings
- Daedalus re-fetches the registry and creates AgentConnection rows for every agent in the instance
- All agents from the instance become available in the workspace
- Detaching removes all agent connections for that instance from the workspace
Health Polling
- Daedalus polls get_health on connected agents at a configurable interval (DAEDALUS_MCP_HEALTH_INTERVAL, default 60 seconds)
- Health is cached in memory and exposed via the agent status API
- Prometheus gauge daedalus_agent_health{instance, agent} tracks health (1.0=ok, 0.5=degraded, 0.0=error)
- If health check fails entirely (connection error, timeout), status is treated as error
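A minimal sketch of how a poller might turn the tool's text block into a cached status and a gauge value follows. `parse_health` and `gauge_value` are hypothetical names, and treating an unparseable or unknown payload as error is an assumption that extends the "fails entirely" rule above.

```python
import json
from typing import Any

GAUGE_VALUES = {"ok": 1.0, "degraded": 0.5, "error": 0.0}


def parse_health(text: str) -> dict[str, Any]:
    """Parse the JSON text block returned by get_health; anything invalid becomes error."""
    try:
        payload = json.loads(text)
        status = payload.get("status") if isinstance(payload, dict) else None
        if not isinstance(status, str) or status not in GAUGE_VALUES:
            raise ValueError("missing or unknown status")
        return payload
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return {"status": "error", "message": f"invalid health payload: {exc}"}


def gauge_value(health: dict[str, Any]) -> float:
    """Value for the daedalus_agent_health gauge."""
    return GAUGE_VALUES[health["status"]]
```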
Chat Blocking
- If the target agent's cached health is error, the chat endpoint returns HTTP 503 and the UI disables the message input
- If degraded, a warning bar appears but chat is allowed
- Users can create a workspace and attach an instance with unhealthy agents; health only blocks sending messages
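The blocking rules reduce to a small decision function. The sketch below is illustrative only: `ChatGate`, `gate_chat`, and the fallback warning strings are hypothetical and not part of the Daedalus API.

```python
from typing import NamedTuple, Optional


class ChatGate(NamedTuple):
    http_status: int         # 200 = proceed, 503 = blocked
    warning: Optional[str]   # text for the warning bar or error banner, if any


def gate_chat(cached_status: str, message: Optional[str] = None) -> ChatGate:
    """Decide whether a chat request may proceed, based on cached health."""
    if cached_status == "error":
        return ChatGate(503, message or "Agent unavailable")
    if cached_status == "degraded":
        return ChatGate(200, message or "Agent degraded")
    return ChatGate(200, None)
```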
5. Agent Progress Notifications
Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.
MCP provides a built-in mechanism for this: notifications/progress. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a progressToken and rendering the notifications it receives.
How It Works
Daedalus Pallas (harper, port 24101)
│ │
│── tools/call ─────────────────────────▶│ { message: "...", _meta: { progressToken: "abc123" } }
│ │
│ │── LLM generates text + tool calls ──▶
│ │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
│ │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
│ │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
│ │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
│ │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
│ │
│◀── tools/call result ─────────────────│ { content: [{ type: "text", text: "..." }] }
│ │
All messages flow over the existing SSE connection established by MCP Streamable HTTP. No additional transport is needed.
Daedalus Requirements
Sending the Progress Token
When calling any agent tool (except get_health), Daedalus must include a progressToken in the request's _meta:
result = await session.call_tool(
"harper",
arguments={"message": user_input},
request_params={"_meta": {"progressToken": str(uuid4())}},
)
Without the progressToken, Pallas skips all progress notifications and Daedalus receives nothing until the final result.
Handling Progress Notifications
Daedalus receives notifications/progress messages on the SSE stream during the tool call. Each notification contains:
| Field | Type | Description |
|---|---|---|
| progressToken | string/int | Matches the token sent in the request |
| progress | float | Monotonically increasing step counter |
| total | float \| null | null = indeterminate (loop in progress), 1.0 = task finished |
| message | string \| null | Human-readable status text |
Message Format
Progress messages follow predictable patterns:
| Pattern | Meaning | Example |
|---|---|---|
| {server}/{tool}: started | Tool invocation began | research/research__research: started |
| {server}/{tool}: completed | Tool invocation finished | tech_research/tech_research__tech_research: completed |
| {server}/{tool}: failed | Tool invocation failed | argos/search_web: failed |
| {agent} step N (llm) | Agent loop: LLM turn | harper step 2 (llm) |
| {agent} step N (tool) | Agent loop: tool execution | harper step 3 (tool) |
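If Daedalus wants structured data rather than raw text, the patterns in the table can be matched with two regular expressions. This is a sketch: `parse_progress_message` and the returned dict shape are assumptions, and unrecognised messages fall through unchanged so that future formats still display.

```python
import re
from typing import Any, Optional

TOOL_RE = re.compile(r"^(?P<server>[^/\s]+)/(?P<tool>\S+): (?P<state>started|completed|failed)$")
STEP_RE = re.compile(r"^(?P<agent>\S+) step (?P<step>\d+) \((?P<kind>llm|tool)\)$")


def parse_progress_message(message: Optional[str]) -> dict[str, Any]:
    """Classify a progress message as a tool event, a loop step, or other text."""
    if message:
        m = TOOL_RE.match(message)
        if m:
            return {"type": "tool", **m.groupdict()}
        m = STEP_RE.match(message)
        if m:
            return {"type": "step", "agent": m["agent"],
                    "step": int(m["step"]), "kind": m["kind"]}
    return {"type": "other", "text": message}
```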
Rendering Guidance
- Display the message as a status line beneath the "thinking" indicator
- Replace the previous status on each new notification (not appended)
- When total is null, show an indeterminate progress indicator (spinner)
- When total equals progress (typically 1.0/1.0), the specific tool/sub-task has completed, but the overall tool call may still be in progress
- Clear the progress indicator when the final tools/call result arrives
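The rendering rules can be modelled as a tiny state holder that a UI layer reads from. `ProgressView` and its fields are hypothetical names used only for illustration; how the state is drawn is left to the front end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressView:
    status_line: Optional[str] = None   # text under the "thinking" indicator
    spinner: bool = False               # progress indicator visible
    subtask_done: bool = False          # last notification closed a sub-task

    def on_progress(self, progress: float, total: Optional[float],
                    message: Optional[str]) -> None:
        if message is not None:
            self.status_line = message  # replace, never append
        # total is None means indeterminate; total == progress means one
        # sub-task finished, but the overall tool call may still be running.
        self.subtask_done = total is not None and progress >= total
        self.spinner = True

    def on_result(self) -> None:
        # The final tools/call result clears all progress state.
        self.status_line = None
        self.spinner = False
        self.subtask_done = False
```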
Pallas Guarantees
- Progress notifications are emitted automatically by FastAgent's MCPToolProgressManager; no additional server-side configuration is needed
- Notifications are only sent when the client provides a progressToken
- At minimum, on_tool_start (progress 0) and on_tool_complete (progress 1/1) are emitted for every downstream tool invocation
- Loop step notifications are emitted when emit_loop_progress=True (the default for all Pallas agents)
- Progress notifications are best-effort: if one fails to send, the agent loop continues unaffected
Limitations
- LLM intermediate text is not streamed as progress. When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
- Parallel tool calls emit interleaved progress messages. Each message includes a tool-specific prefix ({server}/{tool}), so Daedalus can track them independently if desired, or simply display the most recent message.
6. Why MCP (Not REST)
Pallas wraps each FastAgent instance in a MultimodalAgentMCPServer and serves it over StreamableHTTP. The MCP transport gives Daedalus:
- Tool discovery: session.list_tools() returns the full capability manifest
- Streaming: MCP Streamable HTTP handles streaming natively
- Health checks: get_health is just another tool call, no separate API surface
- Protocol alignment: MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.
The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.