pallas/docs/pallas_integration.md
Robert Helewka 0cea5ece3a feat: add /healthz and /metrics endpoints, replace print with logging
- Add /healthz endpoint returning LLM provider validation status
- Add /metrics endpoint serving Prometheus metrics via prometheus_client
- Replace all print() calls in health.py with proper logging module
- Remove _PREFIX variable in favor of structured logger context
2026-04-10 11:22:26 +00:00

Pallas MCP Interface Specification

This document defines the contract between Daedalus (MCP client / web UI) and Pallas (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a registry endpoint for agent discovery, a get_health MCP tool on each agent for health monitoring, and progress notifications for real-time feedback during agent execution.


Architecture Overview

                         Pallas Instance (puck.incus)
                    ┌────────────────────────────────────────┐
                    │                                        │
                    │   Registry (port 23030)                │
   Daedalus ──GET──▶│   /.well-known/mcp/server.json         │
                    │                                        │
                    │   Agent: Research (port 23031)         │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer             │──MCP──▶ Argos, Neo4j
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Engineering (port 23032)      │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer             │──MCP──▶ Kernos, Gitea
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Orchestrator (port 23033)     │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer             │──MCP──▶ Research, Infra
                    │     └─ get_health tool                 │
                    └────────────────────────────────────────┘

A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a get_health MCP tool that FastAgent intercepts programmatically — no LLM invocation.

Daedalus registers the registry URL once in global settings. Everything else is automatic.


1. Registry Endpoint

GET {registry_url}/.well-known/mcp/server.json

The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the MCP Server Schema.

Request

GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json

No authentication. No query parameters.

Response

{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "ca.helu.ouranos/pallas-research",
        "title": "Research Agent",
        "description": "Web search via Argos and knowledge graph via Neo4j",
        "version": "1.0.0",
        "icons": [
          { "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
        ],
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    },
    {
      "server": {
        "name": "ca.helu.ouranos/pallas-infra",
        "title": "Engineering Agent",
        "description": "Shell access via Kernos and repository management via Gitea",
        "version": "1.0.0",
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}

Schema

| Field | Type | Required | Description |
|---|---|---|---|
| servers | array | yes | List of server entries |
| servers[].server.name | string | yes | Reverse-domain identifier (e.g. ca.helu.ouranos/pallas-research). Daedalus derives server_id from the segment after the last /. |
| servers[].server.title | string | no | Human-readable display name. Falls back to name if absent. |
| servers[].server.description | string | no | One-line description shown in the Daedalus UI. |
| servers[].server.version | string | no | Semver version string. |
| servers[].server.icons | array | no | Array of { src, sizes }. Daedalus uses the first entry. |
| servers[].server.remotes | array | yes | Connection endpoints. Daedalus looks for type: "streamable-http" and uses its url. |
| servers[].server.capabilities | object | no | Model capabilities: model (string), vision (bool), context_window (int), max_output_tokens (int). Published when model_capabilities is configured in fastagent.config.yaml. |
| servers[]._meta | object | no | Registry metadata. Informational only; Daedalus does not act on it. |

Behaviour

  • The response must reflect the current set of registered agents. If an agent is added to or removed from Pallas, subsequent requests must reflect the change.
  • Content-Type must be application/json.
  • Every entry in remotes with type: "streamable-http" is treated as an MCP endpoint Daedalus can connect to.
  • The icons[].src URL may be absolute or relative. Daedalus stores it as-is.
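A minimal sketch of a registry that follows these rules, using only the Python standard library. The agent record shape (id, title, description, version, port, capabilities), the list_agents callable, REGISTRY_HOST, and the helper names are all hypothetical. The point is that the payload is rebuilt from the live agent set on every request and served as application/json.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

REGISTRY_PATH = "/.well-known/mcp/server.json"
REGISTRY_HOST = "puck.incus"  # hypothetical: host advertised in remotes[].url


def build_registry(agents: list[dict], host: str = REGISTRY_HOST) -> dict:
    """Build the registry document from the agents that are currently running."""
    servers = []
    for agent in agents:
        server = {
            "name": f"ca.helu.ouranos/pallas-{agent['id']}",
            "remotes": [
                {"type": "streamable-http", "url": f"http://{host}:{agent['port']}/mcp"}
            ],
        }
        # Optional fields are only published when the agent defines them.
        for key in ("title", "description", "version", "capabilities"):
            if key in agent:
                server[key] = agent[key]
        servers.append({"server": server})
    return {"servers": servers}


def make_handler(list_agents: Callable[[], list[dict]]) -> type[BaseHTTPRequestHandler]:
    class RegistryHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != REGISTRY_PATH:
                self.send_error(404)
                return
            # Rebuild on every request so added/removed agents show up immediately.
            body = json.dumps(build_registry(list_agents())).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return RegistryHandler


def serve(list_agents: Callable[[], list[dict]], port: int = 23030) -> HTTPServer:
    """Create (but do not start) the registry server."""
    return HTTPServer(("0.0.0.0", port), make_handler(list_agents))
```

Usage would be along the lines of serve(lambda: current_agents).serve_forever(). Passing a callable rather than a fixed list is what keeps the response in step with agents being added or removed.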

2. Health Tool

MCP tool: get_health

Each agent's MCP server must expose a tool named get_health. FastAgent intercepts this tool programmatically; it does not route through the LLM. This keeps health checks fast (milliseconds rather than seconds) and free of inference cost.

Tool Definition

The tool should appear in session.list_tools() with:

{
  "name": "get_health",
  "description": "Returns the health status of this agent and its downstream dependencies.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "additionalProperties": false
  }
}

No input arguments.

Invocation

Daedalus calls this via the standard MCP SDK:

result = await session.call_tool("get_health")

Response

The tool returns a single text content block containing a JSON object:

{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}

Status Values

| Status | Meaning | Daedalus Behaviour |
|---|---|---|
| ok | Agent healthy, all downstream MCP servers reachable | Green badge. Normal operation. |
| degraded | Agent responds but with issues (slow responses, partial downstream outage) | Yellow badge and warning banner. Chat allowed. |
| error | Agent cannot process requests | Red badge. Chat disabled: the user cannot send messages. |

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| status | string: one of "ok", "degraded", "error" | yes | Current health state |
| timestamp | string (ISO 8601) | no | When the health check was performed |
| message | string | when status is degraded or error | Human-readable explanation. Shown in Daedalus UI tooltips and warning banners. |

Examples

Healthy:

{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}

Degraded:

{
  "status": "degraded",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Avg response 12s — Neo4j connection slow"
}

Error:

{
  "status": "error",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Argos MCP server unreachable"
}

Implementation Guidance

The get_health tool checks connectivity to all downstream MCP servers the agent depends on using the MCP initialize handshake — the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.

For each downstream MCP server:

  1. POST an MCP initialize request to the server URL (with auth headers and Accept: application/json, text/event-stream)
  2. On success, tear down the session by sending DELETE with the returned Mcp-Session-Id header to avoid leaking server-side state
  3. On failure (HTTP error, timeout, connection refused), record the server as unreachable

Result mapping:

  • All downstream servers reachable and active LLM provider healthy → ok
  • Some downstream servers unreachable, or active LLM provider failed preflight → degraded with explanation
  • Agent failed to start or cannot process requests → error with explanation

The tool must not invoke the LLM. It should complete in under 1 second when dependencies respond promptly; each downstream probe is capped by a 3-second timeout.


3. Daedalus Consumption

Registration Flow

  1. User enters registry URL in Daedalus global settings (e.g. http://puck.incus:23030)
  2. Daedalus GETs {url}/.well-known/mcp/server.json
  3. Daedalus stores the PallasInstance with its registry URL
  4. Discovered agents are shown with metadata (title, description, icon)
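Steps 2 to 4 can be sketched as a fetch plus a pure parsing function. The output record keys and function names are hypothetical (Daedalus's internal model is not specified here); the server_id derivation, the title fallback, the first-icon rule, and the streamable-http lookup follow the schema table. Where an entry lists several streamable-http remotes, this sketch simply takes the first.

```python
import json
import urllib.request

WELL_KNOWN = "/.well-known/mcp/server.json"


def parse_registry(doc: dict) -> list[dict]:
    """Turn a registry document into the agent records shown in the UI."""
    agents = []
    for entry in doc.get("servers", []):
        server = entry["server"]
        url = next((r["url"] for r in server.get("remotes", [])
                    if r.get("type") == "streamable-http"), None)
        if url is None:
            continue  # no endpoint Daedalus can connect to
        icons = server.get("icons") or []
        agents.append({
            # server_id is the segment after the last "/" of the name
            "server_id": server["name"].rsplit("/", 1)[-1],
            "title": server.get("title") or server["name"],
            "description": server.get("description", ""),
            "icon": icons[0]["src"] if icons else None,  # stored as-is
            "url": url,
            "capabilities": server.get("capabilities"),
        })
    return agents


def fetch_registry(registry_url: str, timeout: float = 5.0) -> list[dict]:
    """GET {registry_url}/.well-known/mcp/server.json and parse it."""
    request = urllib.request.Request(registry_url.rstrip("/") + WELL_KNOWN,
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_registry(json.load(response))
```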

Workspace Attachment

  1. User selects a registered Pallas instance in workspace settings
  2. Daedalus re-fetches the registry and creates AgentConnection rows for every agent in the instance
  3. All agents from the instance become available in the workspace
  4. Detaching removes all agent connections for that instance from the workspace

Health Polling

  • Daedalus polls get_health on connected agents at a configurable interval (DAEDALUS_MCP_HEALTH_INTERVAL, default 60 seconds)
  • Health is cached in memory and exposed via the agent status API
  • Prometheus gauge daedalus_agent_health{instance, agent} tracks health (1.0=ok, 0.5=degraded, 0.0=error)
  • If health check fails entirely (connection error, timeout), status is treated as error

Chat Blocking

  • If the target agent's cached health is error, the chat endpoint returns HTTP 503 and the UI disables the message input
  • If degraded, a warning bar appears but chat is allowed
  • Users can create a workspace and attach an instance with unhealthy agents — health only blocks sending messages

4. Agent Progress Notifications

Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.

MCP provides a built-in mechanism for this: notifications/progress. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a progressToken and rendering the notifications it receives.

How It Works

Daedalus                              Pallas (harper, port 24101)
   │                                        │
   │── tools/call ─────────────────────────▶│  { message: "...", _meta: { progressToken: "abc123" } }
   │                                        │
   │                                        │── LLM generates text + tool calls ──▶
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
   │                                        │
   │◀── tools/call result ─────────────────│  { content: [{ type: "text", text: "..." }] }
   │                                        │

All messages flow over the SSE stream that MCP Streamable HTTP opens in response to the tools/call POST. No additional transport is needed.

Daedalus Requirements

Sending the Progress Token

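When calling any agent tool (except get_health), Daedalus must include a progressToken in the request's _meta. With the MCP Python SDK, passing a progress_callback to ClientSession.call_tool does this: the SDK uses the request ID as the progressToken and routes matching notifications/progress messages to the callback.

async def on_progress(progress: float, total: float | None, message: str | None) -> None:
    # Replace the status line under the "thinking" indicator with `message`
    ...

result = await session.call_tool(
    "harper",
    arguments={"message": user_input},
    progress_callback=on_progress,
)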

Without the progressToken, Pallas skips all progress notifications and Daedalus receives nothing until the final result.

Handling Progress Notifications

Daedalus receives notifications/progress messages on the SSE stream during the tool call. Each notification contains:

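| Field | Type | Description |
|---|---|---|
| progressToken | string or int | Matches the token sent in the request |
| progress | float | Step indicator: 0 when a downstream tool starts, 1 (with total 1) when it completes, N for loop step N. Interleaved events can repeat or lower the value, so do not treat it as an overall percentage. |
| total | float or null | null = indeterminate (loop in progress), 1.0 = the reported tool or sub-task finished |
| message | string or null | Human-readable status text |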

Message Format

Progress messages follow predictable patterns:

| Pattern | Meaning | Example |
|---|---|---|
| {server}/{tool}: started | Tool invocation began | research/research__research: started |
| {server}/{tool}: completed | Tool invocation finished | tech_research/tech_research__tech_research: completed |
| {server}/{tool}: failed | Tool invocation failed | argos/search_web: failed |
| {agent} step N (llm) | Agent loop: LLM turn | harper step 2 (llm) |
| {agent} step N (tool) | Agent loop: tool execution | harper step 3 (tool) |

Rendering Guidance

  • Display the message as a status line beneath the "thinking" indicator
  • Replace the previous status on each new notification (not appended)
  • When total is null, show an indeterminate progress indicator (spinner)
  • When total equals progress (typically 1.0/1.0), the specific tool/sub-task has completed — but the overall tool call may still be in progress
  • Clear the progress indicator when the final tools/call result arrives

Pallas Guarantees

  • Progress notifications are emitted automatically by FastAgent's MCPToolProgressManager — no additional server-side configuration is needed
  • Notifications are only sent when the client provides a progressToken
  • At minimum, on_tool_start (progress 0) and on_tool_complete (progress 1/1) are emitted for every downstream tool invocation
  • Loop step notifications are emitted when emit_loop_progress=True (the default for all Pallas agents)
  • Progress notifications are best-effort — if one fails to send, the agent loop continues unaffected

Limitations

  • LLM intermediate text is not streamed as progress. When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
  • Parallel tool calls emit interleaved progress messages. Each message includes a tool-specific prefix ({server}/{tool}), so Daedalus can track them independently if desired, or simply display the most recent message.

5. Why MCP (Not REST)

Pallas wraps each FastAgent instance in a MultimodalAgentMCPServer and serves it over StreamableHTTP. The MCP transport gives Daedalus:

  • Tool discovery: session.list_tools() returns the full capability manifest
  • Streaming: MCP Streamable HTTP handles streaming natively
  • Health checks: get_health is just another tool call, with no separate API surface
  • Protocol alignment: MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.

The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.