pallas/docs/pallas_integration.md
Robert Helewka 0cea5ece3a feat: add /healthz and /metrics endpoints, replace print with logging
- Add /healthz endpoint returning LLM provider validation status
- Add /metrics endpoint serving Prometheus metrics via prometheus_client
- Replace all print() calls in health.py with proper logging module
- Remove _PREFIX variable in favor of structured logger context
2026-04-10 11:22:26 +00:00


# Pallas MCP Interface Specification
This document defines the contract between **Daedalus** (MCP client / web UI) and **Pallas** (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a **registry endpoint** for agent discovery, a **`get_health` MCP tool** on each agent for health monitoring, and **progress notifications** for real-time feedback during agent execution.
---
## Architecture Overview
```
                        Pallas Instance (puck.incus)
                 ┌────────────────────────────────────────┐
                 │                                        │
                 │  Registry (port 23030)                 │
Daedalus ──GET──▶│  /.well-known/mcp/server.json          │
                 │                                        │
                 │  Agent: Research (port 23031)          │
Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Argos, Neo4j
                 │    └─ get_health tool                  │
                 │                                        │
                 │  Agent: Engineering (port 23032)       │
Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Kernos, Gitea
                 │    └─ get_health tool                  │
                 │                                        │
                 │  Agent: Orchestrator (port 23033)      │
Daedalus ──MCP──▶│  MultimodalAgentMCPServer              │──MCP──▶ Research, Infra
                 │    └─ get_health tool                  │
                 └────────────────────────────────────────┘
```
A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a `get_health` MCP tool that FastAgent intercepts programmatically — no LLM invocation.
Daedalus registers the registry URL once in global settings. Everything else is automatic.
---
## 1. Registry Endpoint
### `GET {registry_url}/.well-known/mcp/server.json`
The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the [MCP Server Schema](https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json).
#### Request
```
GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json
```
No authentication. No query parameters.
#### Response
```json
{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "ca.helu.ouranos/pallas-research",
        "title": "Research Agent",
        "description": "Web search via Argos and knowledge graph via Neo4j",
        "version": "1.0.0",
        "icons": [
          { "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
        ],
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    },
    {
      "server": {
        "name": "ca.helu.ouranos/pallas-infra",
        "title": "Engineering Agent",
        "description": "Shell access via Kernos and repository management via Gitea",
        "version": "1.0.0",
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}
```
#### Schema
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `servers` | array | yes | List of server entries |
| `servers[].server.name` | string | yes | Reverse-domain identifier (e.g. `ca.helu.ouranos/pallas-research`). Daedalus derives `server_id` from the segment after the last `/`. |
| `servers[].server.title` | string | no | Human-readable display name. Falls back to `name` if absent. |
| `servers[].server.description` | string | no | One-line description shown in Daedalus UI. |
| `servers[].server.version` | string | no | Semver version string. |
| `servers[].server.icons` | array | no | Array of `{ src, sizes }`. Daedalus uses the first entry. |
| `servers[].server.remotes` | array | yes | Connection endpoints. Daedalus looks for `type: "streamable-http"` and uses its `url`. |
| `servers[].server.capabilities` | object | no | Model capabilities. Contains `model` (string), `vision` (bool), `context_window` (int), `max_output_tokens` (int). Published when `model_capabilities` is configured in `fastagent.config.yaml`. |
| `servers[]._meta` | object | no | Registry metadata. Informational only — Daedalus does not act on it. |
#### Behaviour
- The response **must** reflect the current set of registered agents. If an agent is added or removed from Pallas, subsequent requests must reflect the change.
- Content-Type **must** be `application/json`.
- Every entry in `remotes` with `type: "streamable-http"` is treated as an MCP endpoint Daedalus can connect to.
- The `icons[].src` URL may be absolute or relative. Daedalus stores it as-is.
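
As an illustration of how a client might consume this response, the sketch below applies the rules from the schema table: `server_id` is the segment after the last `/`, `title` falls back to `name`, and only `streamable-http` remotes are used. The `DiscoveredAgent` type and `parse_registry` function are hypothetical names, not part of Daedalus, and taking only the first matching remote is a simplification.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiscoveredAgent:
    """Hypothetical container for one connectable agent."""
    server_id: str
    title: str
    url: str
    description: Optional[str] = None


def parse_registry(body: str) -> List[DiscoveredAgent]:
    """Extract connectable agents from a registry response body."""
    agents = []
    for entry in json.loads(body)["servers"]:
        server = entry["server"]
        name = server["name"]
        # server_id is the segment after the last "/" in the name.
        server_id = name.rsplit("/", 1)[-1]
        # Simplification: use the first streamable-http remote only.
        url = next(
            (r["url"] for r in server["remotes"] if r.get("type") == "streamable-http"),
            None,
        )
        if url is None:
            continue  # nothing Daedalus could connect to
        agents.append(
            DiscoveredAgent(
                server_id=server_id,
                title=server.get("title", name),  # title falls back to name
                url=url,
                description=server.get("description"),
            )
        )
    return agents
```

Entries without a `streamable-http` remote are skipped here rather than rejected; a stricter client could treat them as a schema violation instead.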
---
## 2. Health Tool
### MCP tool: `get_health`
Each agent's MCP server **must** expose a tool named `get_health`. FastAgent intercepts this tool programmatically and does not route it through the LLM, which keeps health checks fast (on the order of milliseconds) and free of inference cost.
#### Tool Definition
The tool should appear in `session.list_tools()` with:
```json
{
  "name": "get_health",
  "description": "Returns the health status of this agent and its downstream dependencies.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "additionalProperties": false
  }
}
```
No input arguments.
#### Invocation
Daedalus calls this via the standard MCP SDK:
```python
result = await session.call_tool("get_health")
```
#### Response
The tool returns a single `text` content block containing a JSON object:
```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```
##### Status Values
| Status | Meaning | Daedalus Behaviour |
|--------|---------|-------------------|
| `ok` | Agent healthy, all downstream MCP servers reachable | Green badge. Normal operation. |
| `degraded` | Agent responds but with issues (slow responses, partial downstream outage) | Yellow badge + warning banner. Chat allowed. |
| `error` | Agent cannot process requests | Red badge. Chat disabled — user cannot send messages. |
##### Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `status` | `"ok" \| "degraded" \| "error"` | yes | Current health state |
| `timestamp` | string (ISO 8601) | no | When the health check was performed |
| `message` | string | conditional | Human-readable explanation. Required when `status` is `degraded` or `error`; optional when `ok`. Shown in Daedalus UI tooltips and warning banners. |
##### Examples
**Healthy:**
```json
{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}
```
**Degraded:**
```json
{
  "status": "degraded",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Avg response 12s — Neo4j connection slow"
}
```
**Error:**
```json
{
  "status": "error",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Argos MCP server unreachable"
}
```
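
A client-side reading of these payloads could look like the sketch below. It is only an illustration: `parse_health` is a hypothetical helper, and treating a malformed or unrecognised payload as `error` (and filling in a placeholder `message`) is an assumption, not something this specification mandates.

```python
import json

VALID_STATUSES = ("ok", "degraded", "error")


def parse_health(text):
    """Parse the JSON text block returned by get_health.

    Assumption: anything that cannot be interpreted is reported as "error".
    """
    try:
        payload = json.loads(text)
    except (TypeError, ValueError):
        return {"status": "error", "message": "Malformed health payload"}
    if not isinstance(payload, dict) or payload.get("status") not in VALID_STATUSES:
        return {"status": "error", "message": "Unknown or missing health status"}

    health = {"status": payload["status"]}
    # Copy the optional string fields when present.
    for key in ("timestamp", "message"):
        if isinstance(payload.get(key), str):
            health[key] = payload[key]
    # message is required for degraded/error; use a placeholder if the agent omitted it.
    if health["status"] != "ok" and "message" not in health:
        health["message"] = "No explanation provided by agent"
    return health
```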
#### Implementation Guidance
The `get_health` tool checks connectivity to all downstream MCP servers the agent depends on using the MCP `initialize` handshake — the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.
For each downstream MCP server:
1. `POST` an MCP `initialize` request to the server URL (with auth headers and `Accept: application/json, text/event-stream`)
2. On success, tear down the session by sending `DELETE` with the returned `Mcp-Session-Id` header to avoid leaking server-side state
3. On failure (HTTP error, timeout, connection refused), record the server as unreachable
Result mapping:
- All downstream servers reachable and active LLM provider healthy → `ok`
- Some downstream servers unreachable, or active LLM provider failed preflight → `degraded` with explanation
- Agent failed to start or cannot process requests → `error` with explanation
The tool **must not** invoke the LLM. In the normal case it should complete in under 1 second; each downstream probe has a 3-second timeout, which bounds how long a probe can wait on an unresponsive server.
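
The result mapping can be expressed as a small pure function. The sketch below is one possible reading of the three rules; `aggregate_health` and its parameters are hypothetical, and the wording of the generated `message` is an assumption.

```python
from typing import Dict, Mapping


def aggregate_health(
    downstream: Mapping[str, bool],
    llm_provider_ok: bool,
    agent_running: bool = True,
) -> Dict[str, str]:
    """Map probe results to a get_health payload (timestamp omitted).

    downstream maps a server name to whether its probe succeeded.
    """
    if not agent_running:
        return {"status": "error", "message": "Agent cannot process requests"}

    problems = [f"{name} unreachable" for name, ok in sorted(downstream.items()) if not ok]
    if not llm_provider_ok:
        problems.append("LLM provider failed preflight")

    if problems:
        return {"status": "degraded", "message": "; ".join(problems)}
    return {"status": "ok"}
```

For example, `aggregate_health({"argos": False, "neo4j": True}, True)` yields a `degraded` payload whose message names `argos`.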
---
## 3. Daedalus Consumption
### Registration Flow
1. User enters registry URL in Daedalus global settings (e.g. `http://puck.incus:23030`)
2. Daedalus `GET`s `{url}/.well-known/mcp/server.json`
3. Daedalus stores the `PallasInstance` with its registry URL
4. Discovered agents are shown with metadata (title, description, icon)
### Workspace Attachment
1. User selects a registered Pallas instance in workspace settings
2. Daedalus re-fetches the registry and creates `AgentConnection` rows for every agent in the instance
3. All agents from the instance become available in the workspace
4. Detaching removes all agent connections for that instance from the workspace
### Health Polling
- Daedalus polls `get_health` on connected agents at a configurable interval (`DAEDALUS_MCP_HEALTH_INTERVAL`, default 60 seconds)
- Health is cached in memory and exposed via the agent status API
- Prometheus gauge `daedalus_agent_health{instance, agent}` tracks health (1.0=ok, 0.5=degraded, 0.0=error)
- If health check fails entirely (connection error, timeout), status is treated as `error`
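
A minimal sketch of the in-memory cache and gauge mapping described above follows. `HealthCache` is a hypothetical class, and reporting agents that have never been polled as `error` is an assumption; wiring the value into an actual Prometheus gauge is left out to keep the example dependency-free.

```python
from typing import Dict, Tuple

# Gauge values for daedalus_agent_health, as listed above.
GAUGE_VALUES = {"ok": 1.0, "degraded": 0.5, "error": 0.0}


class HealthCache:
    """Hypothetical in-memory cache keyed by (instance, agent)."""

    def __init__(self) -> None:
        self._statuses: Dict[Tuple[str, str], str] = {}

    def record(self, instance: str, agent: str, status: str) -> None:
        # Unrecognised statuses are stored as "error".
        self._statuses[(instance, agent)] = status if status in GAUGE_VALUES else "error"

    def record_failure(self, instance: str, agent: str) -> None:
        # Connection error or timeout: the check failed entirely.
        self.record(instance, agent, "error")

    def status(self, instance: str, agent: str) -> str:
        # Assumption: an agent that was never polled is reported as "error".
        return self._statuses.get((instance, agent), "error")

    def gauge(self, instance: str, agent: str) -> float:
        return GAUGE_VALUES[self.status(instance, agent)]
```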
### Chat Blocking
- If the target agent's cached health is `error`, the chat endpoint returns HTTP 503 and the UI disables the message input
- If `degraded`, a warning bar appears but chat is allowed
- Users **can** create a workspace and attach an instance with unhealthy agents — health only blocks sending messages
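
The blocking rules reduce to a lookup on the cached status. The sketch below is illustrative only: `ChatDecision` and `decide_chat` are hypothetical names, and treating an unrecognised status like `error` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChatDecision:
    allowed: bool                # may the user send a message?
    show_warning: bool           # show the warning bar?
    http_status: Optional[int]   # 503 when blocked, None when the request proceeds


def decide_chat(cached_status: str) -> ChatDecision:
    """Apply the chat-blocking rules to a cached health status."""
    if cached_status == "ok":
        return ChatDecision(allowed=True, show_warning=False, http_status=None)
    if cached_status == "degraded":
        return ChatDecision(allowed=True, show_warning=True, http_status=None)
    # "error", or anything unrecognised (assumption), blocks the chat.
    return ChatDecision(allowed=False, show_warning=False, http_status=503)
```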
---
## 4. Agent Progress Notifications
Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.
MCP provides a built-in mechanism for this: `notifications/progress`. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a `progressToken` and rendering the notifications it receives.
### How It Works
```
Daedalus                                 Pallas (harper, port 24101)
│                                        │
│── tools/call ─────────────────────────▶│ { message: "...", _meta: { progressToken: "abc123" } }
│                                        │
│                                        │── LLM generates text + tool calls ──▶
│                                        │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
│                                        │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
│                                        │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
│                                        │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
│                                        │
│◀── notifications/progress ─────────────│ { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
│                                        │
│◀── tools/call result ──────────────────│ { content: [{ type: "text", text: "..." }] }
│                                        │
```
All messages flow over the existing SSE connection established by MCP Streamable HTTP. No additional transport is needed.
### Daedalus Requirements
#### Sending the Progress Token
When calling any agent tool (except `get_health`), Daedalus **must** include a `progressToken` in the request's `_meta`:
```python
from uuid import uuid4

# A fresh token per call lets Daedalus match notifications to this request.
result = await session.call_tool(
    "harper",
    arguments={"message": user_input},
    request_params={"_meta": {"progressToken": str(uuid4())}},
)
```
Without the `progressToken`, Pallas skips all progress notifications and Daedalus receives nothing until the final result.
#### Handling Progress Notifications
Daedalus receives `notifications/progress` messages on the SSE stream during the tool call. Each notification contains:
| Field | Type | Description |
|-------|------|-------------|
| `progressToken` | string/int | Matches the token sent in the request |
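| `progress` | float | Step counter. Increases within a single tool invocation or agent loop; notifications from different sub-tasks share the token, so values can interleave. |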
| `total` | float \| null | `null` = indeterminate (loop in progress), `1.0` = task finished |
| `message` | string \| null | Human-readable status text |
#### Message Format
Progress messages follow predictable patterns:
| Pattern | Meaning | Example |
|---------|---------|---------|
| `{server}/{tool}: started` | Tool invocation began | `research/research__research: started` |
| `{server}/{tool}: completed` | Tool invocation finished | `tech_research/tech_research__tech_research: completed` |
| `{server}/{tool}: failed` | Tool invocation failed | `argos/search_web: failed` |
| `{agent} step N (llm)` | Agent loop: LLM turn | `harper step 2 (llm)` |
| `{agent} step N (tool)` | Agent loop: tool execution | `harper step 3 (tool)` |
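
Because the patterns are regular, a client can classify messages with two regular expressions. The sketch below is one way to do it; `parse_progress_message` and the shape of the returned dictionaries are hypothetical, and messages that match neither pattern are passed through unchanged.

```python
import re
from typing import Optional

# "{server}/{tool}: started|completed|failed"
TOOL_EVENT = re.compile(
    r"^(?P<server>[^/\s]+)/(?P<tool>\S+): (?P<state>started|completed|failed)$"
)
# "{agent} step N (llm|tool)"
LOOP_STEP = re.compile(r"^(?P<agent>\S+) step (?P<step>\d+) \((?P<phase>llm|tool)\)$")


def parse_progress_message(message: Optional[str]) -> dict:
    """Classify a progress message according to the patterns in the table above."""
    if message:
        match = TOOL_EVENT.match(message)
        if match:
            return {"kind": "tool_event", **match.groupdict()}
        match = LOOP_STEP.match(message)
        if match:
            return {
                "kind": "loop_step",
                "agent": match["agent"],
                "step": int(match["step"]),
                "phase": match["phase"],
            }
    # Unknown or missing text: keep it so the UI can still display it.
    return {"kind": "other", "text": message}
```

Keying tool events by `(server, tool)` is enough to follow parallel tool calls independently when the messages arrive interleaved.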
#### Rendering Guidance
- Display the `message` as a status line beneath the "thinking" indicator
- Replace the previous status on each new notification rather than appending to it
- When `total` is `null`, show an indeterminate progress indicator (spinner)
- When `total` equals `progress` (typically `1.0/1.0`), the specific tool/sub-task has completed — but the overall tool call may still be in progress
- Clear the progress indicator when the final `tools/call` result arrives
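
These rules amount to a small piece of UI state. The sketch below models it in Python for illustration; `ProgressStatus` and its field names are hypothetical, and a real front end would hold the same state in its own component model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressStatus:
    """Hypothetical view state for the status line under the "thinking" indicator."""
    active: bool = False              # is a tool call in flight?
    message: Optional[str] = None     # latest status text (replaced, never appended)
    indeterminate: bool = True        # True while total is null
    subtask_completed: bool = False   # last notification reported total == progress

    def on_notification(
        self,
        progress: float,
        total: Optional[float] = None,
        message: Optional[str] = None,
    ) -> None:
        self.active = True
        if message is not None:
            self.message = message  # replace the previous status line
        self.indeterminate = total is None
        # A finished sub-task does not end the overall call, so stay active.
        self.subtask_completed = total is not None and progress == total

    def on_result(self) -> None:
        # The final tools/call result arrived: clear the indicator.
        self.active = False
        self.message = None
        self.indeterminate = True
        self.subtask_completed = False
```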
### Pallas Guarantees
- Progress notifications are emitted automatically by FastAgent's `MCPToolProgressManager` — no additional server-side configuration is needed
- Notifications are only sent when the client provides a `progressToken`
- At minimum, `on_tool_start` (progress 0) and `on_tool_complete` (progress 1/1) are emitted for every downstream tool invocation
- Loop step notifications are emitted when `emit_loop_progress=True` (the default for all Pallas agents)
- Progress notifications are best-effort — if one fails to send, the agent loop continues unaffected
### Limitations
- **LLM intermediate text is not streamed as progress.** When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
- **Parallel tool calls** emit interleaved progress messages. Each message includes a tool-specific prefix (`{server}/{tool}`), so Daedalus can track them independently if desired, or simply display the most recent message.
---
## 5. Why MCP (Not REST)
Pallas wraps each FastAgent instance in a `MultimodalAgentMCPServer` and serves it over Streamable HTTP. The MCP transport gives Daedalus:
- **Tool discovery** — `session.list_tools()` returns the full capability manifest
- **Streaming** — MCP Streamable HTTP handles streaming natively
- **Health checks** — `get_health` is just another tool call, no separate API surface
- **Protocol alignment** — MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.
The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.