Make Pallas truly stateless per the 'Pallas is ephemeral' contract.

BREAKING (behavioural, not API):

* instance_scope changes from 'shared' to 'request' in pallas.server.
  Each MCP tools/call now acquires a freshly-created fast-agent instance
  via the existing create_instance / dispose_instance factories and
  disposes it immediately after the response.

With 'shared' mode:

* Every MCP caller saw the same agent.message_history, so different
  Daedalus conversations leaked into each other.
* Mid-chat context was silently truncated once the model window filled.
* Restarting the Pallas process wiped all in-flight conversation state,
  even though Daedalus had it persisted in Postgres.

With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.

send_message gains two optional arguments:

* history: list[{role, content, images?}] in chronological order,
  converted to PromptMessageExtended and seeded onto the fresh
  instance's message_history before agent.send().
* conversation_id: opaque string, logged for trace correlation only;
  Pallas never interprets or persists it.

Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.

The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.

Version bumped to 0.2.0.

# Pallas MCP Interface Specification

This document defines the contract between **Daedalus** (MCP client / web UI) and **Pallas** (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a **registry endpoint** for agent discovery, the **`send_message` history contract** that keeps Pallas stateless, a **`get_health` MCP tool** on each agent for health monitoring, and **progress notifications** for real-time feedback during agent execution.

---

## Architecture Overview

```
                    Pallas Instance (puck.incus)
                   ┌────────────────────────────────────────┐
                   │                                        │
                   │  Registry (port 23030)                 │
  Daedalus ──GET──▶│  /.well-known/mcp/server.json          │
                   │                                        │
                   │  Agent: Research (port 23031)          │
  Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Argos, Neo4j
                   │    └─ get_health tool                  │
                   │                                        │
                   │  Agent: Engineering (port 23032)       │
  Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Kernos, Gitea
                   │    └─ get_health tool                  │
                   │                                        │
                   │  Agent: Orchestrator (port 23033)      │
  Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Research, Infra
                   │    └─ get_health tool                  │
                   └────────────────────────────────────────┘
```

A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a `get_health` MCP tool that FastAgent intercepts programmatically, with no LLM invocation.

Daedalus registers the registry URL once in global settings. Agent discovery and health polling then follow automatically from that single URL.

---

## 1. Registry Endpoint

### `GET {registry_url}/.well-known/mcp/server.json`

The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the [MCP Server Schema](https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json).

#### Request

```
GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json
```

No authentication. No query parameters.

#### Response

```json
{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "ca.helu.ouranos/pallas-research",
        "title": "Research Agent",
        "description": "Web search via Argos and knowledge graph via Neo4j",
        "version": "1.0.0",
        "icons": [
          { "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
        ],
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    },
    {
      "server": {
        "name": "ca.helu.ouranos/pallas-infra",
        "title": "Engineering Agent",
        "description": "Shell access via Kernos and repository management via Gitea",
        "version": "1.0.0",
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}
```

#### Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `servers` | array | yes | List of server entries |
| `servers[].server.name` | string | yes | Reverse-domain identifier (e.g. `ca.helu.ouranos/pallas-research`). Daedalus derives `server_id` from the segment after the last `/`. |
| `servers[].server.title` | string | no | Human-readable display name. Falls back to `name` if absent. |
| `servers[].server.description` | string | no | One-line description shown in Daedalus UI. |
| `servers[].server.version` | string | no | Semver version string. |
| `servers[].server.icons` | array | no | Array of `{ src, sizes }`. Daedalus uses the first entry. |
| `servers[].server.remotes` | array | yes | Connection endpoints. Daedalus looks for `type: "streamable-http"` and uses its `url`. |
| `servers[].server.capabilities` | object | no | Model capabilities. Contains `model` (string), `vision` (bool), `context_window` (int), `max_output_tokens` (int). Published when `model_capabilities` is configured in `fastagent.config.yaml`. |
| `servers[]._meta` | object | no | Registry metadata. Informational only; Daedalus does not act on it. |

#### Behaviour

- The response **must** reflect the current set of registered agents. If an agent is added or removed from Pallas, subsequent requests must reflect the change.
- Content-Type **must** be `application/json`.
- Every entry in `remotes` with `type: "streamable-http"` is treated as an MCP endpoint Daedalus can connect to.
- The `icons[].src` URL may be absolute or relative. Daedalus stores it as-is.
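The schema rules above can be sketched as a small client-side parser. This is a minimal illustration, not the Daedalus implementation: `DiscoveredAgent` and `parse_registry` are hypothetical names, and using only the first `streamable-http` remote of each entry is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DiscoveredAgent:
    """Hypothetical container for the fields Daedalus reads from one entry."""
    server_id: str
    title: str
    url: str
    context_window: Optional[int]


def parse_registry(doc: Dict[str, Any]) -> List[DiscoveredAgent]:
    """Turn a decoded server.json document into a list of connectable agents."""
    agents: List[DiscoveredAgent] = []
    for entry in doc.get("servers", []):
        server = entry["server"]
        name = server["name"]
        # Assumption: take the first streamable-http remote; skip entries without one.
        url = next(
            (r["url"] for r in server.get("remotes", [])
             if r.get("type") == "streamable-http"),
            None,
        )
        if url is None:
            continue
        capabilities = server.get("capabilities") or {}
        agents.append(
            DiscoveredAgent(
                # server_id is the segment after the last "/" in the name.
                server_id=name.rsplit("/", 1)[-1],
                # title falls back to name when absent.
                title=server.get("title") or name,
                url=url,
                context_window=capabilities.get("context_window"),
            )
        )
    return agents
```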

---

## 2. Conversation State & History (Daedalus-owned)

**Pallas is stateless.** As of version `0.2.0`, every MCP `tools/call` is
handled by a freshly-created fast-agent instance that is disposed immediately
after the response. The Pallas process holds **no per-conversation memory
between calls**. This is enforced by `instance_scope="request"` in
`pallas.server`; do not override it.

Conversation history is owned by the client (Daedalus). It must be replayed
on every turn through the `history` argument on `send_message`.

### `send_message` Arguments

Each agent's MCP tool accepts:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `message` | `str` | yes | The new user turn as plain text. |
| `images` | `list[dict]` | no | Images attached to this turn only: `[{"data": base64, "mime_type": "image/png"}]`. Requires a vision-capable model. |
| `history` | `list[dict]` | no | Prior conversation history in chronological order. Entries have shape `{"role": "user" \| "assistant", "content": str, "images"?: [...]}`. When present, seeds the freshly-created agent's `message_history` *before* the new turn is executed. |
| `conversation_id` | `str` | no | Opaque identifier logged by Pallas for trace correlation. Pallas does not interpret or persist it. |
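The `history` entry shape can be checked with a few lines of validation. The sketch below shows one way to skip malformed entries with a warning instead of raising. It is not the Pallas implementation: `clean_history` and the logger name are hypothetical, and dropping the whole entry when any of its images is malformed is an assumption.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("pallas.history")  # hypothetical logger name

VALID_ROLES = {"user", "assistant"}


def _valid_image(image: Any) -> bool:
    """An image needs non-empty string `data` and `mime_type` fields."""
    return (
        isinstance(image, dict)
        and isinstance(image.get("data"), str) and bool(image["data"])
        and isinstance(image.get("mime_type"), str) and bool(image["mime_type"])
    )


def clean_history(history: Optional[List[Any]]) -> List[Dict[str, Any]]:
    """Return the well-formed entries in order; log and skip the rest."""
    cleaned: List[Dict[str, Any]] = []
    for index, entry in enumerate(history or []):
        if not isinstance(entry, dict):
            logger.warning("history[%d] skipped: not an object", index)
            continue
        role, content = entry.get("role"), entry.get("content")
        images = entry.get("images") or []
        if role not in VALID_ROLES or not isinstance(content, str):
            logger.warning("history[%d] skipped: bad role or content", index)
            continue
        # Assumption: one malformed image invalidates the whole entry.
        if not isinstance(images, list) or not all(_valid_image(i) for i in images):
            logger.warning("history[%d] skipped: malformed images", index)
            continue
        item: Dict[str, Any] = {"role": role, "content": content}
        if images:
            item["images"] = images
        cleaned.append(item)
    return cleaned
```

The same function is usable on the Daedalus side before sending, so that rows Pallas would skip are caught before they leave the client.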

### Rationale

| Problem with shared state | Behaviour with `instance_scope="request"` |
|---------------------------|-------------------------------------------|
| Every caller sees the same `agent.message_history`, so different conversations leak into each other. | Each call gets a fresh, isolated instance. No cross-conversation bleed. |
| Process restart wipes all in-flight context. | There is no in-flight context to wipe; Daedalus reseeds it on the next turn. |
| Context-window trimming happens invisibly inside fast-agent. | Daedalus decides what history to send and how much, based on `capabilities.context_window` from the registry. |
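The last row moves trimming to the client. One possible way for Daedalus to pick how much history to send is sketched below. The function names are hypothetical, and the four-characters-per-token estimate is a rough assumption standing in for whatever tokenizer-aware count the real client uses.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude size estimate (assumption): roughly four characters per token."""
    return max(1, len(text) // 4)


def select_history(
    history: List[Dict[str, str]],
    new_message: str,
    context_window: int,
    max_output_tokens: int,
) -> List[Dict[str, str]]:
    """Keep the most recent entries that fit in the remaining token budget."""
    budget = context_window - max_output_tokens - estimate_tokens(new_message)
    kept: List[Dict[str, str]] = []
    # Walk backwards so the newest turns are kept first.
    for entry in reversed(history):
        cost = estimate_tokens(entry["content"])
        if cost > budget:
            break
        kept.append(entry)
        budget -= cost
    kept.reverse()  # restore chronological order for the `history` argument
    return kept
```

Because the selection is explicit, Daedalus can show the user when older turns were dropped, which the shared-state mode could not do.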

### `{agent}_history` Prompt

Under `instance_scope="request"` the `{agent}_history` MCP prompt is still
registered for backward compatibility but always returns `[]`: history lives
on the client and there is no authoritative server-side copy. Existing
callers that invoke this prompt will not error, but should migrate to
tracking history client-side.

### Backward Compatibility

All new arguments are optional. A client that calls `send_message(message=...)`
with no `history` and no `conversation_id` gets a *zero-history* turn (the
agent sees only the current message). This is correct stateless behaviour:
the agent never sees "the last conversation's context". Existing fast-agent MCP
clients that do not know about `history` will produce one-shot responses,
which is the appropriate and visible failure mode.

---

## 3. Health Tool

### MCP tool: `get_health`

Each agent's MCP server **must** expose a tool named `get_health`. FastAgent intercepts this tool programmatically and does not route it through the LLM. This keeps health checks fast (on the order of milliseconds) and free of inference cost.

#### Tool Definition

The tool should appear in `session.list_tools()` with:

```json
{
  "name": "get_health",
  "description": "Returns the health status of this agent and its downstream dependencies.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "additionalProperties": false
  }
}
```

No input arguments.

#### Invocation

Daedalus calls this via the standard MCP SDK:

```python
result = await session.call_tool("get_health")
```

#### Response

The tool returns a single `text` content block containing a JSON object:

```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```

##### Status Values

| Status | Meaning | Daedalus Behaviour |
|--------|---------|-------------------|
| `ok` | Agent healthy, all downstream MCP servers reachable | Green badge. Normal operation. |
| `degraded` | Agent responds but with issues (slow responses, partial downstream outage) | Yellow badge + warning banner. Chat allowed. |
| `error` | Agent cannot process requests | Red badge. Chat disabled: the user cannot send messages. |

##### Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `status` | `"ok" \| "degraded" \| "error"` | yes | Current health state |
| `timestamp` | string (ISO 8601) | no | When the health check was performed |
| `message` | string | conditional | Human-readable explanation. Required when `status` is `degraded` or `error`, optional otherwise. Shown in Daedalus UI tooltips and warning banners. |

##### Examples

**Healthy:**
```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```

**Degraded:**
```json
{
  "status": "degraded",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Avg response 12s - Neo4j connection slow"
}
```

**Error:**
```json
{
  "status": "error",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Argos MCP server unreachable"
}
```
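On the client side, the text block has to be decoded and checked against the field rules above. The following is a minimal sketch under stated assumptions: `Health` and `parse_health` are hypothetical names, and mapping an unparsable or unknown payload to `error` is an assumption that mirrors how Daedalus treats a failed health check.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = ("ok", "degraded", "error")


@dataclass
class Health:
    status: str
    message: Optional[str] = None
    timestamp: Optional[str] = None


def parse_health(text: str) -> Health:
    """Decode the JSON carried in the tool's single text content block."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return Health("error", "health response was not valid JSON")
    if not isinstance(payload, dict):
        return Health("error", "health response was not a JSON object")
    status = payload.get("status")
    if status not in VALID_STATUSES:
        return Health("error", f"unknown health status: {status!r}")
    message = payload.get("message")
    # `message` is required for degraded/error; substitute a placeholder if missing.
    if status != "ok" and not message:
        message = "no explanation provided"
    return Health(status, message, payload.get("timestamp"))
```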

#### Implementation Guidance

The `get_health` tool checks connectivity to all downstream MCP servers the agent depends on using the MCP `initialize` handshake, the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.

For each downstream MCP server:

1. `POST` an MCP `initialize` request to the server URL (with auth headers and `Accept: application/json, text/event-stream`)
2. On success, tear down the session by sending `DELETE` with the returned `Mcp-Session-Id` header to avoid leaking server-side state
3. On failure (HTTP error, timeout, connection refused), record the server as unreachable

Result mapping:

- All downstream servers reachable and active LLM provider healthy → `ok`
- Some downstream servers unreachable, or active LLM provider failed preflight → `degraded` with explanation
- Agent failed to start or cannot process requests → `error` with explanation
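The result mapping can be written as a pure function over the probe outcomes. This is a sketch only: `derive_status` and its parameters are hypothetical, the probes themselves are assumed to have run already, and treating "all downstream servers unreachable" the same as "some unreachable" is an assumption the mapping above does not settle.

```python
from datetime import datetime, timezone
from typing import Dict


def derive_status(
    downstream: Dict[str, bool],
    llm_ok: bool,
    agent_ready: bool = True,
) -> Dict[str, str]:
    """Map probe results (server name -> reachable) to a get_health payload."""
    result = {
        "status": "ok",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if not agent_ready:
        result["status"] = "error"
        result["message"] = "Agent cannot process requests"
        return result
    problems = [
        f"{name} MCP server unreachable"
        for name, reachable in sorted(downstream.items())
        if not reachable
    ]
    if not llm_ok:
        problems.append("LLM provider failed preflight")
    if problems:
        result["status"] = "degraded"
        result["message"] = "; ".join(problems)
    return result
```

Keeping the mapping separate from the network probes makes it straightforward to unit-test each status without a live downstream server.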

The tool **must not** invoke the LLM. It should normally complete in under 1 second; each downstream probe is bounded by a 3-second timeout.

---

## 4. Daedalus Consumption

### Registration Flow

1. User enters registry URL in Daedalus global settings (e.g. `http://puck.incus:23030`)
2. Daedalus `GET`s `{url}/.well-known/mcp/server.json`
3. Daedalus stores the `PallasInstance` with its registry URL
4. Discovered agents are shown with metadata (title, description, icon)

### Workspace Attachment

1. User selects a registered Pallas instance in workspace settings
2. Daedalus re-fetches the registry and creates `AgentConnection` rows for every agent in the instance
3. All agents from the instance become available in the workspace
4. Detaching removes all agent connections for that instance from the workspace

### Health Polling

- Daedalus polls `get_health` on connected agents at a configurable interval (`DAEDALUS_MCP_HEALTH_INTERVAL`, default 60 seconds)
- Health is cached in memory and exposed via the agent status API
- Prometheus gauge `daedalus_agent_health{instance, agent}` tracks health (1.0=ok, 0.5=degraded, 0.0=error)
- If the health check fails entirely (connection error, timeout), status is treated as `error`
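The caching and gauge rules above fit in a small in-memory structure. The sketch below is illustrative: `HealthCache` is a hypothetical name, and the returned float stands in for the value that would be written to the `daedalus_agent_health` gauge, so no Prometheus client is needed to run it.

```python
from typing import Dict, Optional, Tuple

# Gauge values as listed in the polling rules.
GAUGE_VALUES = {"ok": 1.0, "degraded": 0.5, "error": 0.0}


class HealthCache:
    """In-memory cache keyed by (instance, agent)."""

    def __init__(self) -> None:
        self._status: Dict[Tuple[str, str], str] = {}

    def record(self, instance: str, agent: str, status: Optional[str]) -> float:
        """Store a poll result and return the matching gauge value.

        Pass status=None when the poll failed entirely (connection error,
        timeout); it is stored as "error", as is any unknown status string.
        """
        normalized = status if status in GAUGE_VALUES else "error"
        self._status[(instance, agent)] = normalized
        return GAUGE_VALUES[normalized]

    def get(self, instance: str, agent: str) -> Optional[str]:
        """Return the cached status, or None if the agent was never polled."""
        return self._status.get((instance, agent))
```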

### Chat Blocking

- If the target agent's cached health is `error`, the chat endpoint returns HTTP 503 and the UI disables the message input
- If `degraded`, a warning bar appears but chat is allowed
- Users **can** create a workspace and attach an instance with unhealthy agents; health only blocks sending messages
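These rules reduce to a single decision on the cached status. A minimal sketch, assuming the status has already been normalized to one of the three documented values; `ChatGate` and `chat_gate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChatGate:
    allowed: bool               # whether the message may be sent
    http_status: Optional[int]  # 503 when blocked, otherwise None
    show_warning: bool          # whether the UI shows the warning bar


def chat_gate(cached_status: str) -> ChatGate:
    """Decide whether a chat message may be sent to the target agent."""
    if cached_status == "error":
        return ChatGate(allowed=False, http_status=503, show_warning=False)
    if cached_status == "degraded":
        return ChatGate(allowed=True, http_status=None, show_warning=True)
    if cached_status == "ok":
        return ChatGate(allowed=True, http_status=None, show_warning=False)
    # Anything else should have been normalized before reaching this point.
    raise ValueError(f"unexpected health status: {cached_status!r}")
```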

---

## 5. Agent Progress Notifications

Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop (calling sub-agents, searching the web, querying knowledge graphs, etc.). During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.

MCP provides a built-in mechanism for this: `notifications/progress`. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a `progressToken` and rendering the notifications it receives.

### How It Works

```
Daedalus                                  Pallas (harper, port 24101)
   │                                        │
   │── tools/call ─────────────────────────▶│  { message: "...", _meta: { progressToken: "abc123" } }
   │                                        │
   │                                        │── LLM generates text + tool calls ──▶
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
   │                                        │
   │◀── tools/call result ─────────────────│  { content: [{ type: "text", text: "..." }] }
   │                                        │
```

All messages flow over the existing SSE connection established by MCP Streamable HTTP. No additional transport is needed.

### Daedalus Requirements

#### Sending the Progress Token

When calling any agent tool (except `get_health`), Daedalus **must** include a `progressToken` in the request's `_meta`:

```python
from uuid import uuid4

# A fresh progressToken per call opts this request in to notifications/progress.
result = await session.call_tool(
    "harper",
    arguments={"message": user_input},
    request_params={"_meta": {"progressToken": str(uuid4())}},
)
```

Without the `progressToken`, Pallas skips all progress notifications and Daedalus receives nothing until the final result.

#### Handling Progress Notifications

Daedalus receives `notifications/progress` messages on the SSE stream during the tool call. Each notification contains:

| Field | Type | Description |
|-------|------|-------------|
| `progressToken` | string/int | Matches the token sent in the request |
| `progress` | float | Step counter. It increases within a single tool invocation or agent loop; values from interleaved sequences are independent, as in the flow above. |
| `total` | float \| null | `null` = indeterminate (loop in progress), `1.0` = task finished |
| `message` | string \| null | Human-readable status text |

#### Message Format

Progress messages follow predictable patterns:

| Pattern | Meaning | Example |
|---------|---------|---------|
| `{server}/{tool}: started` | Tool invocation began | `research/research__research: started` |
| `{server}/{tool}: completed` | Tool invocation finished | `tech_research/tech_research__tech_research: completed` |
| `{server}/{tool}: failed` | Tool invocation failed | `argos/search_web: failed` |
| `{agent} step N (llm)` | Agent loop: LLM turn | `harper step 2 (llm)` |
| `{agent} step N (tool)` | Agent loop: tool execution | `harper step 3 (tool)` |
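Because the patterns are regular, a client can turn each `message` into a structured event. The sketch below uses two regular expressions derived from the table above; `ProgressEvent` and `parse_progress_message` are hypothetical names, and messages that match neither pattern are returned as `None` so the caller can display them verbatim.

```python
import re
from dataclasses import dataclass
from typing import Optional

TOOL_RE = re.compile(
    r"^(?P<server>[^/\s]+)/(?P<tool>\S+): (?P<state>started|completed|failed)$"
)
STEP_RE = re.compile(r"^(?P<agent>\S+) step (?P<step>\d+) \((?P<phase>llm|tool)\)$")


@dataclass
class ProgressEvent:
    kind: str                    # "tool" or "step"
    source: str                  # server name (tool events) or agent name (step events)
    tool: Optional[str] = None   # set for tool events
    state: Optional[str] = None  # started / completed / failed
    step: Optional[int] = None   # set for step events
    phase: Optional[str] = None  # llm / tool


def parse_progress_message(message: str) -> Optional[ProgressEvent]:
    """Classify a progress message; return None for unrecognised text."""
    match = TOOL_RE.match(message)
    if match:
        return ProgressEvent(
            kind="tool",
            source=match["server"],
            tool=match["tool"],
            state=match["state"],
        )
    match = STEP_RE.match(message)
    if match:
        return ProgressEvent(
            kind="step",
            source=match["agent"],
            step=int(match["step"]),
            phase=match["phase"],
        )
    return None
```

Keying tool events on `(source, tool)` lets a client track parallel tool calls independently when their messages arrive interleaved.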

#### Rendering Guidance

- Display the `message` as a status line beneath the "thinking" indicator
- Replace the previous status on each new notification (do not append)
- When `total` is `null`, show an indeterminate progress indicator (spinner)
- When `total` equals `progress` (typically `1.0/1.0`), the specific tool/sub-task has completed, but the overall tool call may still be in progress
- Clear the progress indicator when the final `tools/call` result arrives
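The rendering rules amount to a small state reducer. The sketch below keeps UI state in an immutable value; `ProgressView`, `on_progress`, and `on_result` are hypothetical names, and keeping the previous status line when a notification carries no `message` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProgressView:
    active: bool = False               # whether any indicator is shown
    status_line: Optional[str] = None  # text beneath the "thinking" indicator
    indeterminate: bool = True         # spinner vs. determinate display


def on_progress(
    view: ProgressView,
    progress: float,
    total: Optional[float],
    message: Optional[str],
) -> ProgressView:
    """Apply one notifications/progress message to the view state."""
    return ProgressView(
        # total == progress only marks a finished sub-task, so stay active.
        active=True,
        # Replace, never append; keep the old line if no message was sent.
        status_line=message if message is not None else view.status_line,
        indeterminate=total is None,
    )


def on_result(view: ProgressView) -> ProgressView:
    """Clear the indicator when the final tools/call result arrives."""
    return ProgressView()
```

A caller would fold each incoming notification through `on_progress` and call `on_result` once when the tool call returns.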

### Pallas Guarantees

- Progress notifications are emitted automatically by FastAgent's `MCPToolProgressManager`; no additional server-side configuration is needed
- Notifications are only sent when the client provides a `progressToken`
- At minimum, `on_tool_start` (progress 0) and `on_tool_complete` (progress 1/1) are emitted for every downstream tool invocation
- Loop step notifications are emitted when `emit_loop_progress=True` (the default for all Pallas agents)
- Progress notifications are best-effort: if one fails to send, the agent loop continues unaffected

### Limitations

- **LLM intermediate text is not streamed as progress.** When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
- **Parallel tool calls** emit interleaved progress messages. Each message includes a tool-specific prefix (`{server}/{tool}`), so Daedalus can track them independently if desired, or simply display the most recent message.

---

## 6. Why MCP (Not REST)

Pallas wraps each FastAgent instance in a `MultimodalAgentMCPServer` and serves it over StreamableHTTP. The MCP transport gives Daedalus:

- **Tool discovery**: `session.list_tools()` returns the full capability manifest
- **Streaming**: MCP Streamable HTTP handles streaming natively
- **Health checks**: `get_health` is just another tool call, no separate API surface
- **Protocol alignment**: MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.

The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.