Robert Helewka 0cea5ece3a feat: add /healthz and /metrics endpoints, replace print with logging

- Add /healthz endpoint returning LLM provider validation status
- Add /metrics endpoint serving Prometheus metrics via prometheus_client
- Replace all print() calls in health.py with proper logging module
- Remove _PREFIX variable in favor of structured logger context

2026-04-10 11:22:26 +00:00

Pallas MCP Interface Specification

This document defines the contract between Daedalus (MCP client / web UI) and Pallas (FastAgent MCP servers). It specifies the interfaces Pallas must expose: a registry endpoint for agent discovery, a get_health MCP tool on each agent for health monitoring, and progress notifications for real-time feedback during agent execution.

Architecture Overview

                         Pallas Instance (puck.incus)
                    ┌────────────────────────────────────────┐
                    │                                        │
                    │   Registry (port 23030)                │
   Daedalus ──GET──▶│   /.well-known/mcp/server.json        │
                    │                                        │
                    │   Agent: Research (port 23031)         │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Argos, Neo4j
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Engineering (port 23032)      │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Kernos, Gitea
                    │     └─ get_health tool                 │
                    │                                        │
                    │   Agent: Orchestrator (port 23033)     │
   Daedalus ──MCP──▶│   MultimodalAgentMCPServer            │──MCP──▶ Research, Infra
                    │     └─ get_health tool                 │
                    └────────────────────────────────────────┘

A single Pallas instance hosts multiple FastAgent agents, each on its own port. The registry runs on a dedicated port (e.g. 23030) and provides a catalogue of all agents. Each agent exposes a get_health MCP tool that FastAgent intercepts programmatically — no LLM invocation.

Daedalus registers the registry URL once in global settings. Agent endpoints and metadata are then discovered from the registry; no per-agent configuration is needed.

1. Registry Endpoint

`GET {registry_url}/.well-known/mcp/server.json`

The registry is a plain HTTP endpoint (not MCP) served on a dedicated port. It returns a dynamic list of all agents currently provided by the Pallas instance, following the MCP Server Schema.

Request

GET http://puck.incus:23030/.well-known/mcp/server.json
Accept: application/json

No authentication. No query parameters.
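
As a minimal sketch of the request above, the following builds the well-known URL from a configured registry base URL and fetches it with the Python standard library. The helper names (`registry_endpoint`, `fetch_registry`) and the 5-second timeout are illustrative assumptions, not part of Daedalus.

```python
import json
import urllib.request

WELL_KNOWN_PATH = "/.well-known/mcp/server.json"


def registry_endpoint(registry_url: str) -> str:
    """Join the configured base URL and the well-known path, tolerating a trailing slash."""
    return registry_url.rstrip("/") + WELL_KNOWN_PATH


def fetch_registry(registry_url: str, timeout: float = 5.0) -> dict:
    """GET the registry document. No authentication or query parameters are sent."""
    request = urllib.request.Request(
        registry_endpoint(registry_url),
        headers={"Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_registry("http://puck.incus:23030")` would request the URL shown in the Request block.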

Response

{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "ca.helu.ouranos/pallas-research",
        "title": "Research Agent",
        "description": "Web search via Argos and knowledge graph via Neo4j",
        "version": "1.0.0",
        "icons": [
          { "src": "https://daedalus.ouranos.helu.ca/icons/research.svg", "sizes": "any" }
        ],
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23031/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    },
    {
      "server": {
        "name": "ca.helu.ouranos/pallas-infra",
        "title": "Engineering Agent",
        "description": "Shell access via Kernos and repository management via Gitea",
        "version": "1.0.0",
        "remotes": [
          { "type": "streamable-http", "url": "http://puck.incus:23032/mcp" }
        ],
        "capabilities": {
          "model": "qwen3-8b-q5",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-03-12T10:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}

Schema

Field	Type	Required	Description
`servers`	array	yes	List of server entries
`servers[].server.name`	string	yes	Reverse-domain identifier (e.g. `ca.helu.ouranos/pallas-research`). Daedalus derives `server_id` from the segment after the last `/`.
`servers[].server.title`	string	no	Human-readable display name. Falls back to `name` if absent.
`servers[].server.description`	string	no	One-line description shown in Daedalus UI.
`servers[].server.version`	string	no	Semver version string.
`servers[].server.icons`	array	no	Array of `{ src, sizes }`. Daedalus uses the first entry.
`servers[].server.remotes`	array	yes	Connection endpoints. Daedalus looks for `type: "streamable-http"` and uses its `url`.
`servers[].server.capabilities`	object	no	Model capabilities. Contains `model` (string), `vision` (bool), `context_window` (int), `max_output_tokens` (int). Published when `model_capabilities` is configured in `fastagent.config.yaml`.
`servers[]._meta`	object	no	Registry metadata. Informational only — Daedalus does not act on it.
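
The field rules in the table can be sketched as a small parser. This is an illustration only: the `DiscoveredAgent` type and `parse_registry` function are hypothetical names, and skipping entries that have no `streamable-http` remote is an assumption the spec does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscoveredAgent:
    server_id: str
    title: str
    description: Optional[str]
    url: str
    icon: Optional[str]


def parse_registry(document: dict) -> list[DiscoveredAgent]:
    """Turn a registry document into the fields Daedalus is described as using."""
    agents = []
    for entry in document["servers"]:
        server = entry["server"]
        name = server["name"]
        # Use the first remote that advertises the streamable-http transport.
        url = next(
            (r["url"] for r in server["remotes"] if r.get("type") == "streamable-http"),
            None,
        )
        if url is None:
            continue  # assumption: entries without a usable remote are ignored
        icons = server.get("icons") or []
        agents.append(
            DiscoveredAgent(
                server_id=name.rsplit("/", 1)[-1],  # segment after the last "/"
                title=server.get("title") or name,  # title falls back to name
                description=server.get("description"),
                url=url,
                icon=icons[0]["src"] if icons else None,  # first icon only
            )
        )
    return agents
```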

Behaviour

The response must reflect the current set of registered agents. If an agent is added or removed from Pallas, subsequent requests must reflect the change.
Content-Type must be application/json.
Every entry in remotes with type: "streamable-http" is treated as an MCP endpoint Daedalus can connect to.
The icons[].src URL may be absolute or relative. Daedalus stores it as-is.
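
On the Pallas side, the "current set" requirement suggests building the response at request time rather than caching it at startup. A minimal sketch, assuming a hypothetical in-memory table `RUNNING_AGENTS` that is updated as agents start and stop:

```python
import json

# Hypothetical table of running agents, keyed by MCP server name.
RUNNING_AGENTS: dict[str, dict] = {}


def registry_response() -> tuple[dict, bytes]:
    """Build headers and body from the agents registered at the time of the request."""
    servers = [
        {
            "server": {
                "name": name,
                "title": info["title"],
                "remotes": [{"type": "streamable-http", "url": info["url"]}],
            }
        }
        for name, info in sorted(RUNNING_AGENTS.items())
    ]
    body = json.dumps({"servers": servers}).encode("utf-8")
    return {"Content-Type": "application/json"}, body
```

Because the list is rebuilt on every call, adding or removing an entry in the table is visible on the next request.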

2. Health Tool

MCP tool: `get_health`

Each agent's MCP server must expose a tool named get_health. FastAgent intercepts this tool programmatically — it does not route through the LLM. This keeps health checks fast (~ms) and free of inference cost.

Tool Definition

The tool should appear in session.list_tools() with:

{
  "name": "get_health",
  "description": "Returns the health status of this agent and its downstream dependencies.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "additionalProperties": false
  }
}

No input arguments.
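
A client could verify this contract after listing tools. The sketch below works on plain dictionaries shaped like the JSON above; the function name `exposes_get_health` is hypothetical, and a real SDK may return tool objects rather than dictionaries.

```python
def exposes_get_health(tools: list[dict]) -> bool:
    """Return True if a get_health tool is listed and declares no input properties."""
    for tool in tools:
        if tool.get("name") != "get_health":
            continue
        schema = tool.get("inputSchema", {})
        return schema.get("type") == "object" and not schema.get("properties")
    return False
```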

Invocation

Daedalus calls this via the standard MCP SDK:

result = await session.call_tool("get_health")

Response

The tool returns a single text content block containing a JSON object:

{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}

Status Values

Status	Meaning	Daedalus Behaviour
`ok`	Agent healthy, all downstream MCP servers reachable	Green badge. Normal operation.
`degraded`	Agent responds but with issues (slow responses, partial downstream outage)	Yellow badge + warning banner. Chat allowed.
`error`	Agent cannot process requests	Red badge. Chat disabled — user cannot send messages.

Fields

Field	Type	Required	Description
`status`	`"ok" \| "degraded" \| "error"`	yes	Current health state
`timestamp`	string (ISO 8601)	no	When the health check was performed
`message`	string	conditional	Human-readable explanation. Required when `status` is `degraded` or `error`; optional when `status` is `ok`. Shown in Daedalus UI tooltips and warning banners.
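
A sketch of how a client might parse the text content block into a typed value. The `HealthReport` type, the function name, and the choice to treat malformed payloads as `error` are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = {"ok", "degraded", "error"}


@dataclass
class HealthReport:
    status: str
    timestamp: Optional[str] = None
    message: Optional[str] = None


def parse_health_text(text: str) -> HealthReport:
    """Parse the JSON carried in the tool's single text content block."""
    try:
        payload = json.loads(text)
        status = payload["status"]
    except (ValueError, KeyError, TypeError):
        # Assumption: an unparseable response is reported as an error state.
        return HealthReport("error", message="Malformed get_health response")
    if not isinstance(status, str) or status not in VALID_STATUSES:
        return HealthReport("error", message=f"Unknown status: {status!r}")
    message = payload.get("message")
    if status != "ok" and not message:
        message = "No explanation provided"  # the spec requires a message here
    return HealthReport(status, payload.get("timestamp"), message)
```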

Examples

Healthy:

{
  "status": "ok",
  "timestamp": "2026-03-12T15:42:00Z"
}

Degraded:

{
  "status": "degraded",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Avg response 12s — Neo4j connection slow"
}

Error:

{
  "status": "error",
  "timestamp": "2026-03-12T15:42:00Z",
  "message": "Argos MCP server unreachable"
}

Implementation Guidance

The get_health tool checks connectivity to all downstream MCP servers the agent depends on using the MCP initialize handshake — the only MCP method that works without a pre-established session. This avoids burning LLM tokens on health checks.

For each downstream MCP server:

POST an MCP initialize request to the server URL (with auth headers and Accept: application/json, text/event-stream)
On success, tear down the session by sending DELETE with the returned Mcp-Session-Id header to avoid leaking server-side state
On failure (HTTP error, timeout, connection refused), record the server as unreachable
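
The three probe steps above can be sketched with the standard library as follows. This is not the Pallas implementation: the function names, the `opener` parameter (added so the probe can be exercised without a network), the protocol version string, and the `clientInfo` values are all assumptions.

```python
import json
import urllib.error
import urllib.request

PROBE_TIMEOUT_S = 3.0  # per-probe timeout stated in the spec
ACCEPT = "application/json, text/event-stream"


def build_initialize_request(url, auth_headers=None):
    """Build the JSON-RPC initialize POST. Version and clientInfo values are assumptions."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            "clientInfo": {"name": "pallas-health-probe", "version": "0.0.0"},
        },
    }
    headers = {"Content-Type": "application/json", "Accept": ACCEPT, **(auth_headers or {})}
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def probe_mcp_server(url, auth_headers=None, opener=urllib.request.urlopen):
    """Return True if the server answers initialize, False if it is unreachable."""
    try:
        with opener(build_initialize_request(url, auth_headers), timeout=PROBE_TIMEOUT_S) as resp:
            session_id = resp.headers.get("Mcp-Session-Id")
    except OSError:  # URLError, HTTPError, timeouts and refused connections derive from OSError
        return False
    if session_id:
        # Tear the session down so the probe does not leak server-side state.
        delete = urllib.request.Request(
            url, headers={"Mcp-Session-Id": session_id, **(auth_headers or {})}, method="DELETE"
        )
        try:
            opener(delete, timeout=PROBE_TIMEOUT_S).close()
        except OSError:
            pass  # teardown is best-effort; the server was reachable
    return True
```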

Result mapping:

All downstream servers reachable and active LLM provider healthy → ok
Some downstream servers unreachable, or active LLM provider failed preflight → degraded with explanation
Agent failed to start or cannot process requests → error with explanation
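
The result mapping can be expressed as a small pure function. The name `build_health`, its parameters, and the message wording are illustrative assumptions; only the three status values and the mapping rules come from this document.

```python
from datetime import datetime, timezone


def build_health(unreachable: list[str], llm_ok: bool, agent_ready: bool = True) -> dict:
    """Map probe results to the get_health payload described above."""
    report = {
        "status": "ok",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if not agent_ready:
        report.update(status="error", message="Agent cannot process requests")
        return report
    problems = [f"{name} MCP server unreachable" for name in unreachable]
    if not llm_ok:
        problems.append("LLM provider failed preflight")
    if problems:
        # Partial outages degrade the agent but do not block chat.
        report.update(status="degraded", message="; ".join(problems))
    return report
```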

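The tool must not invoke the LLM. When downstream servers respond promptly it should complete in under 1 second. Each downstream probe is bounded by a 3-second timeout, so unresponsive servers can push a check past the 1-second target.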
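
One way to keep total latency close to the slowest single probe, rather than the sum of all probes, is to run them concurrently with a per-probe timeout. The sketch below assumes each probe is an async callable returning `True` when the server is reachable; `probe_all` is a hypothetical name, and whether Pallas probes concurrently is not stated in this document.

```python
import asyncio
from typing import Awaitable, Callable


async def probe_all(
    probes: dict[str, Callable[[], Awaitable[bool]]], timeout: float = 3.0
) -> list[str]:
    """Run every probe concurrently and return the names that failed or timed out."""

    async def guarded(name: str, probe: Callable[[], Awaitable[bool]]):
        try:
            ok = await asyncio.wait_for(probe(), timeout=timeout)
        except Exception:  # timeout, connection error, or any probe failure
            ok = False
        return name, ok

    results = await asyncio.gather(*(guarded(n, p) for n, p in probes.items()))
    return [name for name, ok in results if not ok]
```

The returned list of names can feed directly into the result mapping above.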

3. Daedalus Consumption

Registration Flow

User enters registry URL in Daedalus global settings (e.g. http://puck.incus:23030)
Daedalus GETs {url}/.well-known/mcp/server.json
Daedalus stores the PallasInstance with its registry URL
Discovered agents are shown with metadata (title, description, icon)

Workspace Attachment

User selects a registered Pallas instance in workspace settings
Daedalus re-fetches the registry and creates AgentConnection rows for every agent in the instance
All agents from the instance become available in the workspace
Detaching removes all agent connections for that instance from the workspace
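
The attach and detach rules can be illustrated with a small in-memory model. `PallasInstance` and `AgentConnection` are named in this document, but their fields here, the `Workspace` class, and the replace-on-reattach behaviour are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PallasInstance:
    registry_url: str


@dataclass(frozen=True)
class AgentConnection:
    workspace_id: str
    registry_url: str
    server_id: str
    url: str


@dataclass
class Workspace:
    workspace_id: str
    connections: list[AgentConnection] = field(default_factory=list)

    def attach(self, instance: PallasInstance, registry: dict) -> None:
        """Create one connection per agent in a freshly fetched registry document."""
        self.detach(instance)  # assumption: re-attaching replaces existing rows
        for entry in registry["servers"]:
            server = entry["server"]
            for remote in server["remotes"]:
                if remote.get("type") == "streamable-http":
                    server_id = server["name"].rsplit("/", 1)[-1]
                    self.connections.append(
                        AgentConnection(
                            self.workspace_id, instance.registry_url, server_id, remote["url"]
                        )
                    )
                    break

    def detach(self, instance: PallasInstance) -> None:
        """Remove every connection that came from this instance."""
        self.connections = [
            c for c in self.connections if c.registry_url != instance.registry_url
        ]
```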

Health Polling

Daedalus polls get_health on connected agents at a configurable interval (DAEDALUS_MCP_HEALTH_INTERVAL, default 60 seconds)
Health is cached in memory and exposed via the agent status API
Prometheus gauge daedalus_agent_health{instance, agent} tracks health (1.0=ok, 0.5=degraded, 0.0=error)
If the health check fails entirely (connection error, timeout), the status is treated as error
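
A sketch of one polling step under these rules. Plain dictionaries stand in for the in-memory cache and the `daedalus_agent_health{instance, agent}` gauge; `poll_once`, the `check` callable, and the 10-second call timeout are assumptions.

```python
import asyncio
from typing import Awaitable, Callable

GAUGE_VALUES = {"ok": 1.0, "degraded": 0.5, "error": 0.0}

health_cache: dict[tuple[str, str], str] = {}
gauge: dict[tuple[str, str], float] = {}  # stand-in for the labelled Prometheus gauge


async def poll_once(
    instance: str,
    agent: str,
    check: Callable[[], Awaitable[str]],
    timeout: float = 10.0,
) -> str:
    """Call get_health once (via `check`) and record the result."""
    try:
        status = await asyncio.wait_for(check(), timeout=timeout)
    except Exception:
        status = "error"  # connection errors and timeouts count as error
    if not isinstance(status, str) or status not in GAUGE_VALUES:
        status = "error"
    health_cache[(instance, agent)] = status
    gauge[(instance, agent)] = GAUGE_VALUES[status]
    return status
```

A scheduler would call `poll_once` for each connected agent every `DAEDALUS_MCP_HEALTH_INTERVAL` seconds.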

Chat Blocking

If the target agent's cached health is error, the chat endpoint returns HTTP 503 and the UI disables the message input
If degraded, a warning bar appears but chat is allowed
Users can create a workspace and attach an instance with unhealthy agents — health only blocks sending messages
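
The blocking rule reduces to a lookup on the cached status. In this sketch, `chat_gate` and the warning text are hypothetical, and allowing chat when no health has been cached yet is an assumption the spec does not cover.

```python
from typing import Optional


def chat_gate(cached_status: Optional[str]) -> tuple[int, Optional[str]]:
    """Return (http_status, warning) for a chat request given the cached health."""
    if cached_status == "error":
        return 503, None  # chat disabled
    if cached_status == "degraded":
        return 200, "Agent is degraded; responses may be slow or incomplete."
    return 200, None  # ok, or no health recorded yet (assumption)
```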

4. Agent Progress Notifications

Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.

MCP provides a built-in mechanism for this: notifications/progress. Pallas already emits these notifications during agent execution. Daedalus must opt in by sending a progressToken and rendering the notifications it receives.

How It Works

Daedalus                              Pallas (harper, port 24101)
   │                                        │
   │── tools/call ─────────────────────────▶│  { message: "...", _meta: { progressToken: "abc123" } }
   │                                        │
   │                                        │── LLM generates text + tool calls ──▶
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 0, message: "research/research__research: started" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, message: "harper step 1 (tool)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 2, message: "harper step 2 (llm)" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "research/research__research: completed" }
   │                                        │
   │◀── notifications/progress ─────────────│  { progressToken: "abc123", progress: 1, total: 1, message: "tech_research/tech_research__tech_research: completed" }
   │                                        │
   │◀── tools/call result ─────────────────│  { content: [{ type: "text", text: "..." }] }
   │                                        │

All messages flow over the existing SSE connection established by MCP Streamable HTTP. No additional transport is needed.

Daedalus Requirements

Sending the Progress Token

When calling any agent tool (except get_health), Daedalus must include a progressToken in the request's _meta:

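from uuid import uuid4

# A fresh token per call lets Daedalus correlate notifications with this request.
result = await session.call_tool(
    "harper",
    arguments={"message": user_input},
    request_params={"_meta": {"progressToken": str(uuid4())}},
)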

Without the progressToken, Pallas skips all progress notifications and Daedalus receives nothing until the final result.

Handling Progress Notifications

Daedalus receives notifications/progress messages on the SSE stream during the tool call. Each notification contains:

Field	Type	Description
`progressToken`	string/int	Matches the token sent in the request
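`progress`	float	Step counter. Increases monotonically within a sub-task; counters from different sub-tasks can interleave on the same token, as in the diagram above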
`total`	float \| null	`null` = indeterminate (loop in progress), `1.0` = task finished
`message`	string \| null	Human-readable status text

Message Format

Progress messages follow predictable patterns:

Pattern	Meaning	Example
`{server}/{tool}: started`	Tool invocation began	`research/research__research: started`
`{server}/{tool}: completed`	Tool invocation finished	`tech_research/tech_research__tech_research: completed`
`{server}/{tool}: failed`	Tool invocation failed	`argos/search_web: failed`
`{agent} step N (llm)`	Agent loop: LLM turn	`harper step 2 (llm)`
`{agent} step N (tool)`	Agent loop: tool execution	`harper step 3 (tool)`
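
Because the patterns are regular, a client can classify messages with two regular expressions. This is a sketch: the function name and the returned dictionary shape are assumptions, and unknown messages are passed through unchanged.

```python
import re
from typing import Optional

TOOL_EVENT = re.compile(
    r"^(?P<server>[^/\s]+)/(?P<tool>\S+): (?P<state>started|completed|failed)$"
)
LOOP_STEP = re.compile(r"^(?P<agent>\S+) step (?P<step>\d+) \((?P<kind>llm|tool)\)$")


def parse_progress_message(message: Optional[str]) -> dict:
    """Classify a progress message as a tool event, a loop step, or unknown."""
    if message:
        match = TOOL_EVENT.match(message)
        if match:
            return {"type": "tool", **match.groupdict()}
        match = LOOP_STEP.match(message)
        if match:
            return {
                "type": "loop",
                "agent": match["agent"],
                "step": int(match["step"]),
                "kind": match["kind"],
            }
    return {"type": "unknown", "text": message}
```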

Rendering Guidance

Display the message as a status line beneath the "thinking" indicator
Replace the previous status on each new notification (do not append)
When total is null, show an indeterminate progress indicator (spinner)
When total equals progress (typically 1.0/1.0), the specific tool/sub-task has completed — but the overall tool call may still be in progress
Clear the progress indicator when the final tools/call result arrives
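
The rendering rules can be modelled as a small piece of UI state. The `ProgressView` class and its attribute names are hypothetical; the sketch only shows which transitions the guidance implies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressView:
    status_line: Optional[str] = None
    indeterminate: bool = False
    visible: bool = False

    def on_progress(
        self, progress: float, total: Optional[float] = None, message: Optional[str] = None
    ) -> None:
        """Apply one notifications/progress message."""
        self.visible = True
        if message is not None:
            self.status_line = message  # replace the previous status, never append
        # total is None while a loop is running. total == progress means one
        # sub-task finished, but the overall call continues, so stay visible.
        self.indeterminate = total is None

    def on_result(self) -> None:
        """Clear everything when the final tools/call result arrives."""
        self.status_line = None
        self.indeterminate = False
        self.visible = False
```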

Pallas Guarantees

Progress notifications are emitted automatically by FastAgent's MCPToolProgressManager — no additional server-side configuration is needed
Notifications are only sent when the client provides a progressToken
At minimum, on_tool_start (progress 0) and on_tool_complete (progress 1/1) are emitted for every downstream tool invocation
Loop step notifications are emitted when emit_loop_progress=True (the default for all Pallas agents)
Progress notifications are best-effort — if one fails to send, the agent loop continues unaffected

Limitations

LLM intermediate text is not streamed as progress. When the agent says "Let me look into that..." before calling tools, this text is generated server-side during the LLM streaming step but is not forwarded as a progress notification. The text is included in the final tool result. A future enhancement may stream LLM text deltas as progress messages with a distinguishable prefix.
Parallel tool calls emit interleaved progress messages. Each message includes a tool-specific prefix ({server}/{tool}), so Daedalus can track them independently if desired, or simply display the most recent message.
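
If Daedalus chooses to track parallel tool calls independently, the `{server}/{tool}` prefix can serve as the key. A minimal sketch, with `track_tool_states` as a hypothetical helper that ignores loop-step messages:

```python
TOOL_STATES = {"started", "completed", "failed"}


def track_tool_states(states: dict[str, str], message: str) -> dict[str, str]:
    """Record the latest state per {server}/{tool} prefix from interleaved messages."""
    prefix, separator, state = message.rpartition(": ")
    if separator and "/" in prefix and state in TOOL_STATES:
        states[prefix] = state
    return states
```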

5. Why MCP (Not REST)

Pallas wraps each FastAgent instance in a MultimodalAgentMCPServer and serves it over Streamable HTTP. The MCP transport gives Daedalus:

Tool discovery — session.list_tools() returns the full capability manifest
Streaming — MCP Streamable HTTP handles streaming natively
Health checks — get_health is just another tool call, no separate API surface
Protocol alignment — MCP is the abstraction boundary both above and below Pallas. No MCP→REST→MCP translation layer.

The alternative (REST between Daedalus and Pallas) would require building a custom API layer in Pallas that reimplements what the MCP server already provides, with no simplification on the Daedalus side.
