
# Pallas MCP Interface Specification

This document defines the contract between **Daedalus** (MCP client / web UI) and **Pallas** (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a **registry endpoint** for agent discovery, a **`get_health` MCP tool** on each agent for health monitoring, and **progress notifications** for real-time feedback during agent execution.

---

## Architecture Overview

```
                         Pallas Instance (puck.incus)
                    ┌────────────────────────────────────────┐
                    │                                        │
                    │   Registry (port 23030)                │
   Daedalus ──GET──▶│   /.well-known/mcp/server.json        │
                    │                                        │
                    │   Agent: Research (port 23031)         │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Argos, Neo4j
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Engineering (port 23032)      │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Kernos, Gitea
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Orchestrator (port 23033)     │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Research, Infra
                    │     └─ get_health tool                 │
                    └────────────────────────────────────────┘
```

A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a `get_health` MCP tool that FastAgent intercepts programmatically — no LLM invocation.

Daedalus registers the registry URL once in global settings. Everything else is automatic.

---

## 1. Registry Endpoint

### `GET {registry_url}/.well-known/mcp/server.json`

The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the [MCP Server Schema](https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json).

#### Request

```
GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json
```

No authentication. No query parameters.

#### Response

```json
{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "ca.helu.ouranos/pallas-research",
        "title": "Research Agent",
        "description": "Web search via Argos and knowledge graph via Neo4j",
        "version": "1.0.0",
        "icons": [
          { "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
        ],
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    },
    {
      "server": {
        "name": "ca.helu.ouranos/pallas-infra",
        "title": "Engineering Agent",
        "description": "Shell access via Kernos and repository management via Gitea",
        "version": "1.0.0",
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}
```

#### Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `servers` | array | yes | List of server entries |
| `servers[].server.name` | string | yes | Reverse-domain identifier (e.g. `ca.helu.ouranos/pallas-research`). Daedalus derives `server_id` from the segment after the last `/`. |
| `servers[].server.title` | string | no | Human-readable display name. Falls back to `name` if absent. |
| `servers[].server.description` | string | no | One-line description shown in Daedalus UI. |
| `servers[].server.version` | string | no | Semver version string. |
| `servers[].server.icons` | array | no | Array of `{ src, sizes }`. Daedalus uses the first entry. |
| `servers[].server.remotes` | array | yes | Connection endpoints. Daedalus looks for `type: "streamable-http"` and uses its `url`. |
| `servers[].server.capabilities` | object | no | Model capabilities. Contains `model` (string), `vision` (bool), `context_window` (int), `max_output_tokens` (int). Published when `model_capabilities` is configured in `fastagent.config.yaml`. |
| `servers[]._meta` | object | no | Registry metadata. Informational only — Daedalus does not act on it. |

#### Behaviour

- The response **must** reflect the current set of registered agents. If an agent is added to or removed from Pallas, subsequent requests must reflect the change.
- Content-Type **must** be `application/json`.
- Every entry in `remotes` with `type: "streamable-http"` is treated as an MCP endpoint Daedalus can connect to.
- The `icons[].src` URL may be absolute or relative. Daedalus stores it as-is.
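
The field rules above can be sketched as a small parsing function. This is a minimal illustration, not Daedalus's actual code: the function name `parse_registry` and the output keys are hypothetical, and two behaviours are assumptions (using only the first `streamable-http` remote, and skipping entries that have none).

```python
from typing import Any, Optional


def parse_registry(payload: dict[str, Any]) -> list[dict[str, Optional[str]]]:
    """Flatten a registry response into the per-agent fields described above."""
    agents: list[dict[str, Optional[str]]] = []
    for entry in payload.get("servers", []):
        server = entry["server"]
        name = server["name"]
        # Assumption: use the first remote advertised as streamable-http.
        url = next(
            (r["url"] for r in server.get("remotes", []) if r.get("type") == "streamable-http"),
            None,
        )
        if url is None:
            continue  # Assumption: entries without a usable endpoint are skipped.
        icons = server.get("icons") or []
        agents.append({
            # server_id is the segment after the last "/" in the name.
            "server_id": name.rsplit("/", 1)[-1],
            # title falls back to name when absent.
            "title": server.get("title") or name,
            "url": url,
            # Only the first icon is used; its src is stored as-is.
            "icon": icons[0]["src"] if icons else None,
        })
    return agents
```

Applied to the example response above, the first entry would yield a `server_id` of `pallas-research` and the URL `http://puck.incus:23031/mcp`.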

---

## 2. Conversation State & History (Daedalus-owned)

**Pallas is stateless.** As of version `0.2.0`, every MCP `tools/call` is
handled by a freshly-created fast-agent instance that is disposed immediately
after the response. The Pallas process holds **no per-conversation memory
between calls**. This is enforced by `instance_scope="request"` in
`pallas.server` — do not override it.

Conversation history is owned by the client (Daedalus). It must be replayed
on every turn through the `history` argument on `send_message`.

### `send_message` Arguments

Each agent's MCP tool accepts:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `message` | `str` | yes | The new user turn as plain text. |
| `images` | `list[dict]` | no | Images attached to this turn only: `[{"data": base64, "mime_type": "image/png"}]`. Requires a vision-capable model. |
| `history` | `list[dict]` | no | Prior conversation history in chronological order. Entries have shape `{"role": "user" \| "assistant", "content": str, "images"?: [...]}`. When present, seeds the freshly-created agent's `message_history` *before* the new turn is executed. |
| `conversation_id` | `str` | no | Opaque identifier logged by Pallas for trace correlation. Pallas does not interpret or persist it. |

### Rationale

| Problem with shared state | Behaviour with `instance_scope="request"` |
|---------------------------|-------------------------------------------|
| Every caller sees the same `agent.message_history`, so different conversations leak into each other. | Each call gets a fresh, isolated instance. No cross-conversation bleed. |
| Process restart wipes all in-flight context. | There was no in-flight context to wipe — Daedalus reseeds it on the next turn. |
| Context-window trimming happens invisibly inside fast-agent. | Daedalus decides what history to send and how much, based on `capabilities.context_window` from the registry. |
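
Because Daedalus decides how much history to replay, the client-side assembly of `send_message` arguments might look like the sketch below. The helper name `build_send_message_args` is hypothetical, and the sizing rule (roughly four characters per token, with no reserve for output tokens or images) is a deliberately crude assumption rather than anything Pallas prescribes.

```python
from typing import Any, Optional


def build_send_message_args(
    message: str,
    history: list[dict[str, Any]],
    context_window: int,
    conversation_id: Optional[str] = None,
    chars_per_token: int = 4,  # Assumption: rough character-to-token ratio.
) -> dict[str, Any]:
    """Build send_message arguments, dropping the oldest turns that do not fit."""
    budget = context_window * chars_per_token - len(message)
    kept: list[dict[str, Any]] = []
    # Walk newest-to-oldest so the most recent turns survive trimming.
    for entry in reversed(history):
        cost = len(entry["content"])
        if cost > budget:
            break
        budget -= cost
        kept.append(entry)
    kept.reverse()  # Restore chronological order, as the history argument requires.

    args: dict[str, Any] = {"message": message}
    if kept:
        args["history"] = kept
    if conversation_id is not None:
        args["conversation_id"] = conversation_id
    return args
```

The `context_window` value would come from `capabilities.context_window` in the registry entry for the target agent.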

### `{agent}_history` Prompt

Under `instance_scope="request"` the `{agent}_history` MCP prompt is still
registered for backward compatibility but always returns `[]` — history lives
on the client and there is no authoritative server-side copy. Existing
callers that invoke this prompt will not error, but should migrate to
tracking history client-side.

### Backward Compatibility

All new arguments are optional. A client that calls `send_message(message=...)`
with no `history` and no `conversation_id` gets a *zero-history* turn (the
agent sees only the current message). This is correct stateless behaviour —
it is never "the last conversation's context". Existing fast-agent MCP
clients that do not know about `history` will produce one-shot responses,
which is the appropriate and visible failure mode.

---

## 3. Health Tool

### MCP tool: `get_health`

Each agent's MCP server **must** expose a tool named `get_health`. FastAgent intercepts this tool programmatically and does not route it through the LLM. This keeps health checks fast (on the order of milliseconds) and free of inference cost.

#### Tool Definition

The tool should appear in `session.list_tools()` with:

```json
{
  "name": "get_health",
  "description": "Returns the health status of this agent and its downstream dependencies.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "additionalProperties": false
  }
}
```

No input arguments.

#### Invocation

Daedalus calls this via the standard MCP SDK:

```python
result = await session.call_tool("get_health")
```

#### Response

The tool returns a single `text` content block containing a JSON object:

```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```

##### Status Values

| Status | Meaning | Daedalus Behaviour |
|--------|---------|-------------------|
| `ok` | Agent healthy, all downstream MCP servers reachable | Green badge. Normal operation. |
| `degraded` | Agent responds but with issues (slow responses, partial downstream outage) | Yellow badge + warning banner. Chat allowed. |
| `error` | Agent cannot process requests | Red badge. Chat disabled — user cannot send messages. |

##### Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `status` | `"ok" \| "degraded" \| "error"` | yes | Current health state |
| `timestamp` | string (ISO 8601) | no | When the health check was performed |
| `message` | string | conditional | Human-readable explanation. Required when `status` is `degraded` or `error`; optional otherwise. Shown in Daedalus UI tooltips and warning banners. |

##### Examples

**Healthy:**
```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```

**Degraded:**
```json
{
  "status": "degraded",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Avg response 12s — Neo4j connection slow"
}
```

**Error:**
```json
{
  "status": "error",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Argos MCP server unreachable"
}
```
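
On the client side, the text block can be validated before it reaches the UI. The sketch below is illustrative only: `parse_health` is a hypothetical helper, and mapping a malformed payload to `error` (and supplying a placeholder `message`) is an assumption, chosen to match how a failed health check is treated.

```python
import json
from typing import Any

VALID_STATUSES = ("ok", "degraded", "error")


def parse_health(text: str) -> dict[str, Any]:
    """Parse the text content of a get_health result into a validated dict."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: an unparseable payload is reported as an error state.
        return {"status": "error", "message": "Health response was not valid JSON"}
    if not isinstance(data, dict) or data.get("status") not in VALID_STATUSES:
        return {"status": "error", "message": "Health response had no recognised status"}

    health: dict[str, Any] = {"status": data["status"]}
    for key in ("timestamp", "message"):
        if isinstance(data.get(key), str):
            health[key] = data[key]
    # message is required for degraded/error; fall back to a placeholder if missing.
    if health["status"] != "ok" and "message" not in health:
        health["message"] = "No details provided"
    return health
```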

#### Implementation Guidance

The `get_health` tool checks connectivity to all downstream MCP servers the agent depends on using the MCP `initialize` handshake — the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.

For each downstream MCP server:

1. `POST` an MCP `initialize` request to the server URL (with auth headers and `Accept: application/json, text/event-stream`)
2. On success, tear down the session by sending `DELETE` with the returned `Mcp-Session-Id` header to avoid leaking server-side state
3. On failure (HTTP error, timeout, connection refused), record the server as unreachable

Result mapping:

- All downstream servers reachable and active LLM provider healthy → `ok`
- Some downstream servers unreachable, or active LLM provider failed preflight → `degraded` with explanation
- Agent failed to start or cannot process requests → `error` with explanation

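The tool **must not** invoke the LLM. A check against healthy dependencies should complete in under 1 second; each downstream probe has a 3-second timeout, which bounds how long an unreachable server can delay the result.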
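
The probe steps and result mapping above can be sketched with the standard library alone. This is not the Pallas implementation: `probe` and `aggregate` are hypothetical names, the `protocolVersion` and `clientInfo` values are assumptions, and a real async server would more likely use an async HTTP client than blocking `urllib` calls. The `error` state (agent failed to start) is outside the scope of this sketch.

```python
import http.client
import json
import urllib.request
from typing import Optional

INITIALIZE_BODY = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",  # Assumption: a version the server accepts.
        "capabilities": {},
        "clientInfo": {"name": "pallas-health", "version": "0.0.0"},  # Hypothetical.
    },
}


def probe(url: str, auth_headers: Optional[dict[str, str]] = None, timeout: float = 3.0) -> bool:
    """Return True if the downstream MCP server answers an initialize request."""
    auth = auth_headers or {}
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        **auth,
    }
    request = urllib.request.Request(
        url, data=json.dumps(INITIALIZE_BODY).encode(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            session_id = response.headers.get("Mcp-Session-Id")
    except (OSError, http.client.HTTPException):
        return False  # HTTP error, timeout, or connection refused.
    if session_id:
        # Tear the session down so the probe leaves no server-side state behind.
        teardown = urllib.request.Request(
            url, headers={**auth, "Mcp-Session-Id": session_id}, method="DELETE"
        )
        try:
            urllib.request.urlopen(teardown, timeout=timeout).close()
        except (OSError, http.client.HTTPException):
            pass  # Teardown is best-effort; the server was reachable.
    return True


def aggregate(reachable: dict[str, bool], llm_ok: bool = True) -> dict[str, str]:
    """Map per-server probe results and the LLM preflight to ok/degraded."""
    problems = [f"{name} MCP server unreachable"
                for name, ok in sorted(reachable.items()) if not ok]
    if not llm_ok:
        problems.append("LLM provider failed preflight")
    if problems:
        return {"status": "degraded", "message": "; ".join(problems)}
    return {"status": "ok"}
```

A caller would run `probe` once per downstream server and pass the collected results to `aggregate`, adding a `timestamp` before returning the payload.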

---

## 4. Daedalus Consumption

### Registration Flow

1. User enters registry URL in Daedalus global settings (e.g. `http://puck.incus:23030`)
2. Daedalus `GET`s `{url}/.well-known/mcp/server.json`
3. Daedalus stores the `PallasInstance` with its registry URL
4. Discovered agents are shown with metadata (title, description, icon)

### Workspace Attachment

1. User selects a registered Pallas instance in workspace settings
2. Daedalus re-fetches the registry and creates `AgentConnection` rows for every agent in the instance
3. All agents from the instance become available in the workspace
4. Detaching removes all agent connections for that instance from the workspace

### Health Polling

- Daedalus polls `get_health` on connected agents at a configurable interval (`DAEDALUS_MCP_HEALTH_INTERVAL`, default 60 seconds)
- Health is cached in memory and exposed via the agent status API
- Prometheus gauge `daedalus_agent_health{instance, agent}` tracks health (1.0=ok, 0.5=degraded, 0.0=error)
- If a health check fails entirely (connection error, timeout), the status is treated as `error`
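
The caching and gauge mapping can be illustrated with a small helper. `record_health` and the plain-dict cache are hypothetical stand-ins for whatever Daedalus uses internally; passing `None` represents a check that failed outright.

```python
from typing import Optional

# Gauge values for daedalus_agent_health, as listed above.
GAUGE_VALUES = {"ok": 1.0, "degraded": 0.5, "error": 0.0}


def record_health(cache: dict[str, str], agent: str, status: Optional[str]) -> float:
    """Cache the latest status for an agent and return the gauge value to export.

    A status of None (connection error, timeout) or any unknown value is
    treated as "error".
    """
    effective = status if status in GAUGE_VALUES else "error"
    cache[agent] = effective
    return GAUGE_VALUES[effective]
```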

### Chat Blocking

- If the target agent's cached health is `error`, the chat endpoint returns HTTP 503 and the UI disables the message input
- If `degraded`, a warning bar appears but chat is allowed
- Users **can** create a workspace and attach an instance with unhealthy agents — health only blocks sending messages
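
As a rough sketch, the blocking rules reduce to one decision per cached status. The `ChatGate` type and `chat_gate` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChatGate:
    reject_status: Optional[int]  # HTTP status to return instead of chatting, if any.
    input_enabled: bool           # Whether the UI message input is usable.
    show_warning: bool            # Whether the warning bar is shown.


def chat_gate(cached_health: str) -> ChatGate:
    """Decide how the chat endpoint and UI react to the cached health status."""
    if cached_health == "error":
        return ChatGate(reject_status=503, input_enabled=False, show_warning=False)
    if cached_health == "degraded":
        return ChatGate(reject_status=None, input_enabled=True, show_warning=True)
    return ChatGate(reject_status=None, input_enabled=True, show_warning=False)
```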

---

## 5. Agent Progress Notifications

Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.

MCP provides a built-in mechanism for this: `notifications/progress`. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a `progressToken` and rendering the notifications it receives.

### How It Works

```
Daedalus                              Pallas (harper, port 24101)
   │                                        │
   │── tools/call ─────────────────────────▶│  { message: "...", _meta: { progressToken: "abc123" } }
   │                                        │
   │                                        │── LLM generates text + tool calls ──▶
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
   │                                        │
   │◀── tools/call result ─────────────────│  { content: [{ type: "text", text: "..." }] }
   │                                        │
```

All messages flow over the existing SSE connection established by MCP Streamable HTTP. No additional transport is needed.

### Daedalus Requirements

#### Sending the Progress Token

When calling any agent tool (except `get_health`), Daedalus **must** include a `progressToken` in the request's `_meta`:

```python
from uuid import uuid4

result = await session.call_tool(
    "harper",
    arguments={"message": user_input},
    # A fresh token per call lets Daedalus match notifications to this request.
    request_params={"_meta": {"progressToken": str(uuid4())}},
)
```

Without the `progressToken`, Pallas skips all progress notifications and Daedalus receives nothing until the final result.

#### Handling Progress Notifications

Daedalus receives `notifications/progress` messages on the SSE stream during the tool call. Each notification contains:

| Field | Type | Description |
|-------|------|-------------|
| `progressToken` | string/int | Matches the token sent in the request |
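| `progress` | float | Step counter. Increases monotonically within one source (a single tool invocation or the agent loop); values from different sources interleave under the same token. |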
| `total` | float \| null | `null` = indeterminate (loop in progress), `1.0` = task finished |
| `message` | string \| null | Human-readable status text |

#### Message Format

Progress messages follow predictable patterns:

| Pattern | Meaning | Example |
|---------|---------|---------|
| `{server}/{tool}: started` | Tool invocation began | `research/research__research: started` |
| `{server}/{tool}: completed` | Tool invocation finished | `tech_research/tech_research__tech_research: completed` |
| `{server}/{tool}: failed` | Tool invocation failed | `argos/search_web: failed` |
| `{agent} step N (llm)` | Agent loop: LLM turn | `harper step 2 (llm)` |
| `{agent} step N (tool)` | Agent loop: tool execution | `harper step 3 (tool)` |

#### Rendering Guidance

- Display the `message` as a status line beneath the "thinking" indicator
- Replace the previous status on each new notification rather than appending to it
- When `total` is `null`, show an indeterminate progress indicator (spinner)
- When `total` equals `progress` (typically `1.0/1.0`), the specific tool/sub-task has completed — but the overall tool call may still be in progress
- Clear the progress indicator when the final `tools/call` result arrives
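
The rendering rules above amount to a small piece of UI state. The following sketch models that state in plain Python; `ProgressView` and its field names are hypothetical, and the three handler arguments simply mirror the notification fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressView:
    status_line: Optional[str] = None  # Text shown beneath the "thinking" indicator.
    indeterminate: bool = False        # True while total is null (spinner).
    visible: bool = False              # True from the first notification to the result.

    def on_progress(self, progress: float, total: Optional[float], message: Optional[str]) -> None:
        """Apply one notifications/progress message."""
        self.visible = True
        if message is not None:
            self.status_line = message  # Replace the previous status; never append.
        self.indeterminate = total is None
        # total == progress only marks a sub-task as finished, so the view
        # stays visible until the final tools/call result arrives.

    def on_result(self) -> None:
        """Clear the indicator when the final tools/call result arrives."""
        self.status_line = None
        self.indeterminate = False
        self.visible = False
```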

### Pallas Guarantees

- Progress notifications are emitted automatically by FastAgent's `MCPToolProgressManager` — no additional server-side configuration is needed
- Notifications are only sent when the client provides a `progressToken`
- At minimum, `on_tool_start` (progress 0) and `on_tool_complete` (progress 1/1) are emitted for every downstream tool invocation
- Loop step notifications are emitted when `emit_loop_progress=True` (the default for all Pallas agents)
- Progress notifications are best-effort — if one fails to send, the agent loop continues unaffected

### Limitations

- **LLM intermediate text is not streamed as progress.** When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
- **Parallel tool calls** emit interleaved progress messages. Each message includes a tool-specific prefix (`{server}/{tool}`), so Daedalus can track them independently if desired, or simply display the most recent message.

---

## 6. Why MCP (Not REST)

Pallas wraps each FastAgent instance in a `MultimodalAgentMCPServer` and serves it over Streamable HTTP. The MCP transport gives Daedalus:

- **Tool discovery** — `session.list_tools()` returns the full capability manifest
- **Streaming** — MCP Streamable HTTP handles streaming natively
- **Health checks** — `get_health` is just another tool call, no separate API surface
- **Protocol alignment** — MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.

The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.