Robert Helewka 95fa6e6fc0 feat!: stateless per-request agents; add history + conversation_id to send_message

Make Pallas truly stateless per the 'Pallas is ephemeral' contract.

BREAKING (behavioural, not API):
  * instance_scope changes from 'shared' to 'request' in pallas.server.
    Each MCP tools/call now acquires a freshly-created fast-agent instance
    via the existing create_instance / dispose_instance factories and
    disposes it immediately after the response.

With 'shared' mode:
  * Every MCP caller saw the same agent.message_history, so different
    Daedalus conversations leaked into each other.
  * Mid-chat context was silently truncated once the model window filled.
  * Restarting the Pallas process wiped all in-flight conversation state,
    even though Daedalus had it persisted in Postgres.

With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.

send_message gains two optional arguments:
  * history: list[{role, content, images?}] in chronological order,
    converted to PromptMessageExtended and seeded onto the fresh
    instance's message_history before agent.send().
  * conversation_id: opaque string, logged for trace correlation only —
    Pallas never interprets or persists it.

Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.

The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.

Version bumped to 0.2.0.

2026-04-27 08:16:59 -04:00

Pallas MCP Interface Specification

This document defines the contract between Daedalus (MCP client / web UI) and Pallas (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a registry endpoint for agent discovery, a stateless send_message tool whose conversation history is supplied by the caller, a get_health MCP tool on each agent for health monitoring, and progress notifications for real-time feedback during agent execution.

Architecture Overview

```text
                         Pallas Instance (puck.incus)
                    ┌────────────────────────────────────────┐
                    │                                        │
                    │   Registry (port 23030)                │
   Daedalus ──GET──▶│   /.well-known/mcp/server.json        │
                    │                                        │
                    │   Agent: Research (port 23031)         │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Argos, Neo4j
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Engineering (port 23032)      │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Kernos, Gitea
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Orchestrator (port 23033)     │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Research, Infra
                    │     └─ get_health tool                 │
                    └────────────────────────────────────────┘
```

A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a get_health MCP tool that FastAgent intercepts programmatically — no LLM invocation.

Daedalus registers the registry URL once in global settings. Everything else is automatic.

1. Registry Endpoint

`GET {registry_url}/.well-known/mcp/server.json`

The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the MCP Server Schema.

Request

```http
GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json
```

No authentication. No query parameters.

Response

```json
{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "ca.helu.ouranos/pallas-research",
        "title": "Research Agent",
        "description": "Web search via Argos and knowledge graph via Neo4j",
        "version": "1.0.0",
        "icons": [
          { "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
        ],
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    },
    {
      "server": {
        "name": "ca.helu.ouranos/pallas-infra",
        "title": "Engineering Agent",
        "description": "Shell access via Kernos and repository management via Gitea",
        "version": "1.0.0",
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}
```

Schema

| Field | Type | Required | Description |
|---|---|---|---|
| `servers` | array | yes | List of server entries |
| `servers[].server.name` | string | yes | Reverse-domain identifier (e.g. `ca.helu.ouranos/pallas-research`). Daedalus derives `server_id` from the segment after the last `/`. |
| `servers[].server.title` | string | no | Human-readable display name. Falls back to `name` if absent. |
| `servers[].server.description` | string | no | One-line description shown in Daedalus UI. |
| `servers[].server.version` | string | no | Semver version string. |
| `servers[].server.icons` | array | no | Array of `{ src, sizes }`. Daedalus uses the first entry. |
| `servers[].server.remotes` | array | yes | Connection endpoints. Daedalus looks for `type: "streamable-http"` and uses its `url`. |
| `servers[].server.capabilities` | object | no | Model capabilities. Contains `model` (string), `vision` (bool), `context_window` (int), `max_output_tokens` (int). Published when `model_capabilities` is configured in `fastagent.config.yaml`. |
| `servers[]._meta` | object | no | Registry metadata. Informational only; Daedalus does not act on it. |

Behaviour

- The response must reflect the current set of registered agents. If an agent is added to or removed from Pallas, subsequent requests must reflect the change.
- Content-Type must be application/json.
- Every entry in remotes with type: "streamable-http" is treated as an MCP endpoint Daedalus can connect to.
- The icons[].src URL may be absolute or relative. Daedalus stores it as-is.
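A minimal sketch of how a client might turn a registry document into agent records, following the schema rules above. `DiscoveredAgent` and `parse_registry` are hypothetical names, and choosing the first `streamable-http` remote is an assumption; the function works on an already-decoded JSON document, so no HTTP client is involved.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DiscoveredAgent:  # hypothetical record type
    server_id: str
    title: str
    description: Optional[str]
    icon: Optional[str]
    mcp_url: Optional[str]
    capabilities: dict = field(default_factory=dict)


def parse_registry(doc: dict) -> list:
    """Convert a server.json registry document into DiscoveredAgent records."""
    agents = []
    for entry in doc.get("servers", []):
        server: dict[str, Any] = entry["server"]
        name = server["name"]
        # server_id is the segment after the last "/" of the reverse-domain name.
        server_id = name.rsplit("/", 1)[-1]
        # Assumption: use the first streamable-http remote as the MCP endpoint.
        mcp_url = next(
            (r["url"] for r in server.get("remotes", []) if r.get("type") == "streamable-http"),
            None,
        )
        icons = server.get("icons") or []
        agents.append(
            DiscoveredAgent(
                server_id=server_id,
                title=server.get("title") or name,  # fall back to name
                description=server.get("description"),
                icon=icons[0]["src"] if icons else None,  # stored as-is, may be relative
                mcp_url=mcp_url,
                capabilities=server.get("capabilities", {}),
            )
        )
    return agents
```

`_meta` is deliberately ignored, matching the "informational only" rule in the schema table.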

2. Conversation State & History (Daedalus-owned)

Pallas is stateless. As of version 0.2.0, every MCP tools/call is handled by a freshly-created fast-agent instance that is disposed immediately after the response. The Pallas process holds no per-conversation memory between calls. This is enforced by instance_scope="request" in pallas.server — do not override it.

Conversation history is owned by the client (Daedalus). It must be replayed on every turn through the history argument on send_message.

`send_message` Arguments

Each agent's MCP tool accepts:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `message` | `str` | yes | The new user turn as plain text. |
| `images` | `list[dict]` | no | Images attached to this turn only: `[{"data": base64, "mime_type": "image/png"}]`. Requires a vision-capable model. |
| `history` | `list[dict]` | no | Prior conversation history in chronological order. Entries have shape `{"role": "user" \| "assistant", "content": str, "images"?: [...]}`. When present, seeds the freshly-created agent's `message_history` before the new turn is executed. |
| `conversation_id` | `str` | no | Opaque identifier logged by Pallas for trace correlation. Pallas does not interpret or persist it. |
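The 0.2.0 change notes state that malformed history entries (bad role, missing image data/mime_type) are skipped with a warning rather than failing the whole call. A sketch of that filtering, assuming plain dicts in place of fast-agent's `PromptMessageExtended`; `sanitize_history` and the logger name are hypothetical.

```python
import logging
from typing import Optional

log = logging.getLogger("pallas.history")  # hypothetical logger name

VALID_ROLES = {"user", "assistant"}


def sanitize_history(history: Optional[list]) -> list:
    """Keep well-formed history entries; log and skip malformed ones."""
    clean = []
    for i, entry in enumerate(history or []):
        if not isinstance(entry, dict):
            log.warning("history[%d]: not an object, skipped", i)
            continue
        role, content = entry.get("role"), entry.get("content")
        if not isinstance(role, str) or role not in VALID_ROLES:
            log.warning("history[%d]: bad role %r, skipped", i, role)
            continue
        if not isinstance(content, str):
            log.warning("history[%d]: content is not a string, skipped", i)
            continue
        images = entry.get("images") or []
        # Every image must carry both base64 data and a mime_type.
        if not isinstance(images, list) or not all(
            isinstance(img, dict) and img.get("data") and img.get("mime_type")
            for img in images
        ):
            log.warning("history[%d]: image missing data/mime_type, skipped", i)
            continue
        clean.append({"role": role, "content": content, "images": images})
    return clean
```

Skipping per entry keeps the rest of the conversation intact when one stored row is corrupt.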

Rationale

| Problem with shared state | Behaviour with `instance_scope="request"` |
|---|---|
| Every caller sees the same `agent.message_history`, so different conversations leak into each other. | Each call gets a fresh, isolated instance. No cross-conversation bleed. |
| Process restart wipes all in-flight context. | There was no in-flight context to wipe; Daedalus reseeds it on the next turn. |
| Context-window trimming happens invisibly inside fast-agent. | Daedalus decides what history to send and how much, based on `capabilities.context_window` from the registry. |
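Because trimming is now the client's job, Daedalus needs some budget rule. The sketch below keeps the newest turns that fit within `context_window` minus `max_output_tokens` and a safety reserve. The 4-characters-per-token estimate, the reserve size, and the function names are all assumptions, and image tokens are ignored.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(
    history: list,
    message: str,
    context_window: int,
    max_output_tokens: int = 0,
    reserve: int = 1024,
) -> list:
    """Drop the oldest turns until history plus the new message fit the budget."""
    budget = context_window - max_output_tokens - reserve - estimate_tokens(message)
    kept = []
    used = 0
    # Walk newest-to-oldest so the most recent context survives.
    for entry in reversed(history):
        cost = estimate_tokens(entry["content"])
        if used + cost > budget:
            break
        kept.append(entry)
        used += cost
    kept.reverse()  # restore the chronological order send_message expects
    return kept
```

Dropping from the oldest end can leave an assistant turn first; a client may prefer to drop whole user/assistant pairs instead.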

`{agent}_history` Prompt

Under instance_scope="request" the {agent}_history MCP prompt is still registered for backward compatibility but always returns [] — history lives on the client and there is no authoritative server-side copy. Existing callers that invoke this prompt will not error, but should migrate to tracking history client-side.

Backward Compatibility

All new arguments are optional. A client that calls send_message(message=...) with no history and no conversation_id gets a zero-history turn (the agent sees only the current message). This is correct stateless behaviour — it is never "the last conversation's context". Existing fast-agent MCP clients that do not know about history will produce one-shot responses, which is the appropriate and visible failure mode.

3. Health Tool

MCP tool: `get_health`

Each agent's MCP server must expose a tool named get_health. FastAgent intercepts this tool programmatically — it does not route through the LLM. This keeps health checks fast (~ms) and free of inference cost.

Tool Definition

The tool should appear in session.list_tools() with:

```json
{
  "name": "get_health",
  "description": "Returns the health status of this agent and its downstream dependencies.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "additionalProperties": false
  }
}
```

No input arguments.

Invocation

Daedalus calls this via the standard MCP SDK:

```python
result = await session.call_tool("get_health")
```

Response

The tool returns a single text content block containing a JSON object:

```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```

Status Values

| Status | Meaning | Daedalus Behaviour |
|---|---|---|
| `ok` | Agent healthy, all downstream MCP servers reachable | Green badge. Normal operation. |
| `degraded` | Agent responds but with issues (slow responses, partial downstream outage) | Yellow badge + warning banner. Chat allowed. |
| `error` | Agent cannot process requests | Red badge. Chat disabled; user cannot send messages. |

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `status` | `"ok" \| "degraded" \| "error"` | yes | Current health state |
| `timestamp` | string (ISO 8601) | no | When the health check was performed |
| `message` | string | when `status` is `degraded` or `error` | Human-readable explanation. Shown in Daedalus UI tooltips and warning banners. |
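A sketch of how a client might normalise the text block returned by `get_health`. Treating an unparseable or unknown payload as `error` is an assumption that extends the "failed check counts as error" rule from Health Polling; `parse_health` is a hypothetical helper.

```python
import json
from typing import Optional

VALID_STATUSES = ("ok", "degraded", "error")


def parse_health(text: Optional[str]) -> dict:
    """Normalise a get_health text block into {status, timestamp, message}."""
    try:
        payload = json.loads(text or "")
    except json.JSONDecodeError:
        return {"status": "error", "timestamp": None, "message": "unparseable health response"}
    if not isinstance(payload, dict) or payload.get("status") not in VALID_STATUSES:
        return {"status": "error", "timestamp": None, "message": "invalid health status"}
    status = payload["status"]
    message = payload.get("message")
    if status != "ok" and not message:
        # message is required for degraded/error; substitute a generic one.
        message = f"agent reported {status} without explanation"
    return {"status": status, "timestamp": payload.get("timestamp"), "message": message}
```

In Daedalus the input would be the `text` of the first content block in the `call_tool("get_health")` result.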

Examples

Healthy:

```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```

Degraded:

```json
{
  "status": "degraded",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Avg response 12s — Neo4j connection slow"
}
```

Error:

```json
{
  "status": "error",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Argos MCP server unreachable"
}
```

Implementation Guidance

The get_health tool checks connectivity to all downstream MCP servers the agent depends on using the MCP initialize handshake — the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.

For each downstream MCP server:

- POST an MCP initialize request to the server URL (with auth headers and Accept: application/json, text/event-stream)
- On success, tear down the session by sending DELETE with the returned Mcp-Session-Id header to avoid leaking server-side state
- On failure (HTTP error, timeout, connection refused), record the server as unreachable
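A standard-library sketch of one probe, assuming a blocking `urllib` client (an agent would more likely run probes concurrently inside its async runtime). The `protocolVersion` value and `clientInfo` name are assumptions, and `probe_mcp_server` is a hypothetical helper.

```python
import http.client
import json
import urllib.request
from typing import Optional

PROBE_TIMEOUT_S = 3.0  # per-probe timeout from the guidance in this section

INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",  # assumption: match your servers' version
        "capabilities": {},
        "clientInfo": {"name": "pallas-health-probe", "version": "0.2.0"},  # hypothetical
    },
}


def probe_mcp_server(url: str, auth_headers: Optional[dict] = None) -> bool:
    """Return True if the server completes an MCP initialize request."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        **(auth_headers or {}),
    }
    req = urllib.request.Request(
        url, data=json.dumps(INITIALIZE).encode(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=PROBE_TIMEOUT_S) as resp:
            session_id = resp.headers.get("Mcp-Session-Id")
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
    if session_id:
        # Tear the session down so the probe does not leak server-side state.
        delete = urllib.request.Request(
            url, headers={**headers, "Mcp-Session-Id": session_id}, method="DELETE"
        )
        try:
            urllib.request.urlopen(delete, timeout=PROBE_TIMEOUT_S).close()
        except (OSError, http.client.HTTPException):
            pass  # best-effort cleanup; the server was still reachable
    return True
```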

Result mapping:

- All downstream servers reachable and active LLM provider healthy → ok
- Some downstream servers unreachable, or active LLM provider failed preflight → degraded with explanation
- Agent failed to start or cannot process requests → error with explanation
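The mapping above can be sketched as a pure function over probe results. `map_health` and its parameters are hypothetical, and treating any unreachable downstream server as `degraded` (rather than distinguishing "some" from "all") is a simplifying assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def map_health(
    probes: dict,
    provider_ok: bool = True,
    started: bool = True,
    startup_error: Optional[str] = None,
) -> dict:
    """Combine downstream probe results and LLM preflight into a health payload."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if not started:
        return {"status": "error", "timestamp": now,
                "message": startup_error or "agent failed to start"}
    # probes maps a downstream server name to True (reachable) or False.
    problems = [f"{name} MCP server unreachable"
                for name, ok in sorted(probes.items()) if not ok]
    if not provider_ok:
        problems.append("LLM provider failed preflight")
    if problems:
        return {"status": "degraded", "timestamp": now, "message": "; ".join(problems)}
    return {"status": "ok", "timestamp": now}
```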

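The tool must not invoke the LLM. Each downstream probe has a 3-second timeout; when all dependencies are healthy the whole check should complete in under 1 second, but an unreachable server can hold a probe open until its timeout expires.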

4. Daedalus Consumption

Registration Flow

1. User enters registry URL in Daedalus global settings (e.g. http://puck.incus:23030)
2. Daedalus GETs {url}/.well-known/mcp/server.json
3. Daedalus stores the PallasInstance with its registry URL
4. Discovered agents are shown with metadata (title, description, icon)

Workspace Attachment

- User selects a registered Pallas instance in workspace settings
- Daedalus re-fetches the registry and creates AgentConnection rows for every agent in the instance
- All agents from the instance become available in the workspace
- Detaching removes all agent connections for that instance from the workspace

Health Polling

- Daedalus polls get_health on connected agents at a configurable interval (DAEDALUS_MCP_HEALTH_INTERVAL, default 60 seconds)
- Health is cached in memory and exposed via the agent status API
- Prometheus gauge daedalus_agent_health{instance, agent} tracks health (1.0=ok, 0.5=degraded, 0.0=error)
- If health check fails entirely (connection error, timeout), status is treated as error
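A sketch of one polling round under stated assumptions: each agent is represented by a hypothetical `check` coroutine that returns a status string (in Daedalus it would call `get_health` and parse the result), and plain dicts stand in for the in-memory cache and the Prometheus gauge. With `prometheus_client`, the gauge update would be `Gauge(...).labels(instance=..., agent=...).set(value)`.

```python
import asyncio

GAUGE_VALUE = {"ok": 1.0, "degraded": 0.5, "error": 0.0}


async def poll_once(agents: dict, cache: dict, gauge: dict, timeout_s: float = 10.0) -> None:
    """One health round: call every agent's check concurrently, update cache and gauge."""

    async def one(key, check):
        try:
            status = await asyncio.wait_for(check(), timeout_s)
        except Exception:
            status = "error"  # connection error or timeout counts as error
        if status not in GAUGE_VALUE:
            status = "error"
        cache[key] = status
        gauge[key] = GAUGE_VALUE[status]

    await asyncio.gather(*(one(key, check) for key, check in agents.items()))


async def poll_forever(agents: dict, cache: dict, gauge: dict, interval_s: float = 60.0) -> None:
    # interval_s mirrors DAEDALUS_MCP_HEALTH_INTERVAL (default 60 seconds).
    while True:
        await poll_once(agents, cache, gauge)
        await asyncio.sleep(interval_s)
```

Keys are `(instance, agent)` tuples so they line up with the gauge labels.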

Chat Blocking

- If the target agent's cached health is error, the chat endpoint returns HTTP 503 and the UI disables the message input
- If degraded, a warning bar appears but chat is allowed
- Users can create a workspace and attach an instance with unhealthy agents — health only blocks sending messages
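The blocking rule reduces to a small decision on the cached status. `chat_gate` is a hypothetical helper, and allowing chat when no health has been cached yet is an assumption the rules above do not settle.

```python
from typing import Optional


def chat_gate(cached_status: Optional[str]) -> tuple:
    """Return (http_status, warning) for a chat request given cached health."""
    if cached_status == "error":
        return 503, None  # endpoint refuses; the UI disables the input
    if cached_status == "degraded":
        return 200, "Agent is degraded; responses may be slow or incomplete."
    # ok, or no health cached yet (assumption: allow).
    return 200, None
```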

5. Agent Progress Notifications

Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.

MCP provides a built-in mechanism for this: notifications/progress. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a progressToken and rendering the notifications it receives.

How It Works

```text
Daedalus                              Pallas (harper, port 24101)
   │                                        │
   │── tools/call ─────────────────────────▶│  { message: "...", _meta: { progressToken: "abc123" } }
   │                                        │
   │                                        │── LLM generates text + tool calls ──▶
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
   │                                        │
   │◀── tools/call result ─────────────────│  { content: [{ type: "text", text: "..." }] }
   │                                        │
```

All messages flow over the existing SSE connection established by MCP Streamable HTTP. No additional transport is needed.

Daedalus Requirements

Sending the Progress Token

When calling any agent tool (except get_health), Daedalus must include a progressToken in the request's _meta:

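```python
async def show_progress(progress: float, total: float | None, message: str | None) -> None:
    # Invoked once per notifications/progress message tied to this request.
    ...

result = await session.call_tool(
    "harper",
    arguments={"message": user_input},
    # With progress_callback set, the MCP Python SDK adds a progressToken to
    # the request's _meta and routes matching notifications to the callback.
    progress_callback=show_progress,
)
```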

Without the progressToken, Pallas skips all progress notifications and Daedalus receives nothing until the final result.

Handling Progress Notifications

Daedalus receives notifications/progress messages on the SSE stream during the tool call. Each notification contains:

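| Field | Type | Description |
|---|---|---|
| `progressToken` | string/int | Matches the token sent in the request |
| `progress` | float | Step value. Loop-step notifications count up with the step number; tool start/complete notifications report `0` and `1` respectively, so the value is not monotonic across the whole stream |
| `total` | float \| null | `null` = indeterminate (loop in progress), `1.0` = task finished |
| `message` | string \| null | Human-readable status text |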

Message Format

Progress messages follow predictable patterns:

| Pattern | Meaning | Example |
|---|---|---|
| `{server}/{tool}: started` | Tool invocation began | `research/research__research: started` |
| `{server}/{tool}: completed` | Tool invocation finished | `tech_research/tech_research__tech_research: completed` |
| `{server}/{tool}: failed` | Tool invocation failed | `argos/search_web: failed` |
| `{agent} step N (llm)` | Agent loop: LLM turn | `harper step 2 (llm)` |
| `{agent} step N (tool)` | Agent loop: tool execution | `harper step 3 (tool)` |
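Because the patterns are fixed, a client can classify messages with two regular expressions. This is a sketch; `ProgressEvent` and `parse_progress` are hypothetical names, and anything that matches neither pattern is kept as plain text.

```python
import re
from dataclasses import dataclass
from typing import Optional

TOOL_RE = re.compile(r"^(?P<server>[^/\s]+)/(?P<tool>\S+): (?P<event>started|completed|failed)$")
STEP_RE = re.compile(r"^(?P<agent>\S+) step (?P<step>\d+) \((?P<kind>llm|tool)\)$")


@dataclass
class ProgressEvent:
    kind: str                        # "tool", "step" or "other"
    server: Optional[str] = None
    tool: Optional[str] = None
    event: Optional[str] = None      # started / completed / failed
    agent: Optional[str] = None
    step: Optional[int] = None
    step_kind: Optional[str] = None  # llm / tool
    text: str = ""


def parse_progress(message: Optional[str]) -> ProgressEvent:
    text = message or ""
    if m := TOOL_RE.match(text):
        return ProgressEvent("tool", server=m["server"], tool=m["tool"],
                             event=m["event"], text=text)
    if m := STEP_RE.match(text):
        return ProgressEvent("step", agent=m["agent"], step=int(m["step"]),
                             step_kind=m["kind"], text=text)
    return ProgressEvent("other", text=text)  # unknown format: display verbatim
```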

Rendering Guidance

- Display the message as a status line beneath the "thinking" indicator
- Replace the previous status on each new notification (not appended)
- When total is null, show an indeterminate progress indicator (spinner)
- When total equals progress (typically 1.0/1.0), the specific tool/sub-task has completed — but the overall tool call may still be in progress
- Clear the progress indicator when the final tools/call result arrives
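The rendering rules map onto a small piece of UI state. This sketch uses a hypothetical `StatusLine` class whose `on_progress` arguments mirror the notification fields; it keeps the spinner active until the final result, since a completed sub-task does not end the tool call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusLine:
    """UI state for the status line under the "thinking" indicator."""
    text: Optional[str] = None
    active: bool = False
    subtask_done: bool = False

    def on_progress(self, progress: float, total: Optional[float], message: Optional[str]) -> None:
        self.active = True  # keep the spinner until the final result arrives
        if message is not None:
            self.text = message  # replace the previous status, never append
        # total == progress means one sub-task finished, not the whole call.
        self.subtask_done = total is not None and progress >= total

    def on_result(self) -> None:
        # Final tools/call result arrived: clear the indicator entirely.
        self.text, self.active, self.subtask_done = None, False, False
```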

Pallas Guarantees

- Progress notifications are emitted automatically by FastAgent's MCPToolProgressManager — no additional server-side configuration is needed
- Notifications are only sent when the client provides a progressToken
- At minimum, on_tool_start (progress 0) and on_tool_complete (progress 1/1) are emitted for every downstream tool invocation
- Loop step notifications are emitted when emit_loop_progress=True (the default for all Pallas agents)
- Progress notifications are best-effort — if one fails to send, the agent loop continues unaffected

Limitations

- LLM intermediate text is not streamed as progress. When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
- Parallel tool calls emit interleaved progress messages. Each message includes a tool-specific prefix ({server}/{tool}), so Daedalus can track them independently if desired, or simply display the most recent message.
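For clients that choose to track parallel calls independently, a counter keyed by the `{server}/{tool}` prefix is enough. `InFlightTools` is a hypothetical helper; it ignores loop-step messages and tolerates a `completed` without a matching `started`.

```python
from typing import Optional


class InFlightTools:
    """Track concurrent downstream tool calls by their {server}/{tool} prefix."""

    SUFFIXES = (": started", ": completed", ": failed")

    def __init__(self) -> None:
        self.running: dict = {}  # prefix -> number of concurrent calls

    def update(self, message: Optional[str]) -> None:
        if not message:
            return
        for suffix in self.SUFFIXES:
            if message.endswith(suffix):
                prefix = message[: -len(suffix)]
                if suffix == ": started":
                    self.running[prefix] = self.running.get(prefix, 0) + 1
                elif prefix in self.running:
                    self.running[prefix] -= 1
                    if self.running[prefix] == 0:
                        del self.running[prefix]
                return
        # Loop-step messages ("harper step N (llm)") carry no tool prefix.

    def summary(self) -> str:
        return ", ".join(sorted(self.running)) or "idle"
```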

6. Why MCP (Not REST)

Pallas wraps each FastAgent instance in a MultimodalAgentMCPServer and serves it over StreamableHTTP. The MCP transport gives Daedalus:

- Tool discovery — session.list_tools() returns the full capability manifest
- Streaming — MCP Streamable HTTP handles streaming natively
- Health checks — get_health is just another tool call, no separate API surface
- Protocol alignment — MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.

The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.
