# Pallas: Technical Reference

Pallas is the generic runtime that turns [fast-agent](https://github.com/evalstate/fast-agent) agent definitions into StreamableHTTP MCP servers. It is **completely deployment-agnostic**: all environment-specific values (agent names, ports, hosts, model) live in the calling project's configuration files, not in Pallas itself.

---

## Solution Architecture

Pallas occupies the middle tier of a three-layer MCP architecture. It bridges a web-facing client (Daedalus) and a constellation of specialised downstream MCP servers.

```
┌──────────────────────────────────┐
│  Daedalus                        │  Web UI / FastAPI / MCP client
│  Workspace management, chat,     │  Discovers agents via registry
│  health monitoring, progress     │  Calls agent tools via MCP
└──────────┬───────────────────────┘
           │  MCP over Streamable HTTP
           ▼
┌──────────────────────────────────┐
│  Pallas (FastAgent MCP Bridge)   │  Python runtime
│                                  │
│  ┌─ Registry (port N)            │  GET /.well-known/mcp/server.json
│  ├─ Agent: Research (port N+1)   │  Chains, routers, sub-agents
│  ├─ Agent: Engineering (port N+2)│  Orchestrators, tool pipelines
│  └─ Agent: Orchestrator (N+3)    │  Delegates across agents
│                                  │
│  Each agent exposes:             │
│    • send_message tool           │
│    • get_health tool             │
│    • {agent}_history prompt      │
└──────────┬───────────────────────┘
           │  MCP over Streamable HTTP
           ▼
┌──────────────────────────────────┐
│  Downstream MCP Servers          │
│                                  │
│  Argos        - web search       │
│  Neo4j        - knowledge graph  │
│  Mnemosyne    - content library  │
│  Kernos       - shell execution  │
│  Gitea        - repository mgmt  │
│  Grafana      - monitoring       │
│  Rommie       - system management│
└──────────────────────────────────┘
```

### Daedalus → Pallas

| Interaction | Mechanism |
|---|---|
| Agent discovery | `GET {registry}/.well-known/mcp/server.json`: plain HTTP, returns all agents with MCP endpoint URLs |
| Agent communication | MCP `tools/call` on `send_message` (text + optional images) |
| Health monitoring | MCP `tools/call` on `get_health` (programmatic, no LLM invocation) |
| Progress feedback | MCP `notifications/progress`, streamed over SSE during long-running tool calls |
| Conversation history | MCP `prompts/get` on `{agent}_history`, which retrieves stored message history |

### Pallas → Downstream

Pallas agents call downstream MCP servers via standard MCP tool calls. Each agent declares its servers in its fast-agent definition (`servers=["argos", "neo4j_cypher", ...]`). The server URLs and auth headers are configured in the consuming project's `fastagent.config.yaml`.

### Mnemosyne's Role

Mnemosyne provides a content-type-aware knowledge graph with hybrid search (vector + full-text + graph). Agents with `mnemosyne` in their `servers` list gain access to tools for searching documents, browsing libraries and collections, retrieving items, and traversing the concept graph. It complements Neo4j (graph topology and relationships) with content-focused retrieval and re-ranking.

### Why MCP End-to-End

Pallas is the protocol boundary: MCP above (from Daedalus) and MCP below (to downstream servers). This eliminates any MCP→REST→MCP translation layer.
A single `fast.start_server(transport="http")` call exposes a complete agent as a StreamableHTTP MCP endpoint, giving Daedalus:

- **Tool discovery** via `session.list_tools()`
- **Native streaming** via MCP Streamable HTTP / SSE
- **Health checks** as ordinary tool calls, with no separate API surface
- **Progress notifications** built into the protocol

---

## Pallas Internal Architecture

Pallas is four modules, composed at startup:

```
server.py main()
  │
  ├─ _load_deployment_config()        parse agents.yaml
  ├─ _build_agents_table()            {name: (module, port)}
  ├─ _build_agent_deps()              dependency graph
  │
  ├─ _start_all() or _run_single()
  │    │
  │    ├─ _preflight()
  │    │    ├─ _register_unknown_models()    model registration
  │    │    └─ validate_llm_providers()      LLM API key + model checks
  │    │
  │    ├─ start subagents (depends_on)
  │    ├─ wait for subagent readiness
  │    ├─ start top-level agents
  │    │    │
  │    │    └─ _start_agent(name)
  │    │         ├─ import agent module
  │    │         ├─ MultimodalAgentMCPServer(...)
  │    │         ├─ _resolve_downstream_servers()
  │    │         ├─ _preflight_mcp_servers()   warn on missing auth
  │    │         ├─ register_health_tool()
  │    │         └─ server.run_async()
  │    │
  │    └─ run_registry()              Starlette app on registry port
  │
  └─ asyncio.run(...)
```

| Module | Purpose |
|---|---|
| `pallas.server` | CLI entry point, configuration loading, agent lifecycle orchestration, model registration |
| `pallas.registry` | Starlette app serving `GET /.well-known/mcp/server.json`; builds the agent catalogue from `agents.yaml` + `fastagent.config.yaml` |
| `pallas.multimodal_server` | `MultimodalAgentMCPServer`, an `AgentMCPServer` subclass adding image attachment support and conversation history prompts |
| `pallas.health` | Two-layer health: startup LLM preflight validation + runtime `get_health` MCP tool with downstream server probing |

---

## Installation

```bash
pip install git+ssh://git@git.helu.ca:22022/r/pallas.git
```

Or as a project dependency:

```toml
dependencies = [
    "pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git",
]
```

Requires Python ≥ 3.13. Key dependencies: `fast-agent-mcp`, `httpx`, `pyyaml`, `starlette`, `uvicorn`.

---

## Project Layout

Pallas reads configuration from the **working directory** at runtime. A consuming project looks like:

```
my-project/
├── agents/
│   ├── __init__.py
│   └── jarvis.py              # FastAgent definitions
├── agents.yaml                # Deployment topology
├── fastagent.config.yaml      # FastAgent + model config
├── fastagent.secrets.yaml     # API keys (gitignored)
└── .env                       # Secret values (gitignored)
```

Pallas itself contains no agent definitions, model names, ports, or hostnames. Everything is injected by the consuming project.

---

## Configuration Reference

### `agents.yaml`

Single source of truth for deployment topology.

```yaml
name: my-project                  # log prefixes and registry names
version: "1.0.0"                  # published in registry entries
host: my-host.example.com         # hostname for registry URLs
namespace: com.example.project    # reverse-domain prefix for registry names
registry_port: 8200               # port for the registry server

agents:
  jarvis:
    module: agents.jarvis         # importable Python module path
    port: 8201                    # StreamableHTTP port for this agent
    title: Jarvis                 # human-readable name (registry)
    description: "My assistant"   # one-line description (registry)
    depends_on: [research]        # optional: start these agents first

  research:
    module: agents.research
    port: 8250
    title: Research Agent
    description: "Web search and knowledge graph"
```

| Field | Required | Description |
|---|---|---|
| `name` | yes | Project name, used in log prefixes (`[my-project]`) and CLI help |
| `version` | no | Semver string published in registry entries. Default: `"1.0.0"` |
| `host` | no | Hostname used in registry `remotes[].url`. Default: `"localhost"` |
| `namespace` | no | Reverse-domain prefix for registry `server.name` (e.g. `com.example/jarvis`) |
| `registry_port` | no | Port for the registry server. Default: `24200` |
| `agents.<name>.module` | yes | Importable Python module path containing a `fast` instance |
| `agents.<name>.port` | yes | Port for this agent's StreamableHTTP MCP server |
| `agents.<name>.title` | no | Display name in registry. Default: `name.title()` |
| `agents.<name>.description` | no | Description in registry |
| `agents.<name>.depends_on` | no | List of agent names that must start and become ready before this agent |

### `fastagent.config.yaml` Extensions

Pallas reads two keys beyond the standard fast-agent config:

```yaml
default_model: openai.my-model-name

model_capabilities:
  vision: false
  context_window: 200000
  max_output_tokens: 32000
```

| Key | Description |
|---|---|
| `default_model` | `provider.model-name` format. The provider prefix (`anthropic` or `openai`) determines which LLM provider is active for health checks. |
| `model_capabilities.vision` | `true` registers the model with multimodal tokenization; `false` registers as text-only. Default: `false` |
| `model_capabilities.context_window` | Context window size in tokens. Default: `131072` |
| `model_capabilities.max_output_tokens` | Max output token limit. Default: `16384` |

Capabilities are declared explicitly rather than inferred from the model name, because naming conventions vary across model families and make regex heuristics brittle. These values are used both to register unknown models with fast-agent's `ModelDatabase` and to populate the registry response.

### `fastagent.secrets.yaml`

```yaml
anthropic:
  api_key: "${ANTHROPIC_API_KEY}"

openai:
  api_key: "${OPENAI_API_KEY}"
  base_url: "${OPENAI_BASE_URL}"
```

`${ENV_VAR}` placeholders are expanded at runtime from environment variables.

### `.env`

Pallas loads `.env` from the working directory into `os.environ` without overwriting existing variables. This supports both local development and systemd deployments:

```dotenv
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=http://my-llm-server:8080/v1
```

`OPENAI_BASE_URL` defaults to `https://api.openai.com/v1` if unset. For local llama-cpp, vLLM, or other OpenAI-compatible servers, set it to their endpoint.

### Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `PALLAS_AGENTS_CONFIG` | `agents.yaml` | Override path to deployment config |

---

## Running Pallas

### CLI

```bash
pallas                     # start all agents + registry
pallas --agent jarvis      # start a single agent (no registry)
python -m pallas.server    # equivalent to `pallas`
```

### Startup Sequence

**All agents mode** (`pallas`):

1. Load `agents.yaml`, build agents table and dependency graph
2. **Preflight**: register unknown models with `ModelDatabase`, validate LLM provider API keys and model availability
3. Start the registry server on `registry_port`
4. Start **subagents** (agents listed in other agents' `depends_on`)
5. Wait for each subagent to become ready (HTTP probe on `/mcp`, 60s timeout)
6. Start **top-level agents** (everything not a subagent)
7. All servers run concurrently via `asyncio.gather`

**Single agent mode** (`pallas --agent <name>`):

1. Load `agents.yaml`
2. Preflight
3. Start the named agent (no registry, no dependency resolution)

### Per-Agent Startup

For each agent:

1. Import the agent module (`agents.<name>`) and obtain its `fast` instance
2. Enter the `fast.run()` context, which initialises the fast-agent runtime
3. Create a `MultimodalAgentMCPServer` wrapping the primary agent instance
4. Resolve downstream MCP server configs from the fast-agent configuration
5. Warn if any downstream auth headers reference unset environment variables
6. Register the `get_health` MCP tool with downstream server info
7. Bind to `0.0.0.0:<port>` and serve StreamableHTTP

---

## Daedalus Integration

This section describes the contract from Pallas's perspective. The full client-side specification is in `docs/pallas_integration.md`.

### Registration Flow

1. Daedalus stores a registry URL (e.g. `http://puck.incus:23030`)
2. Fetches `GET {url}/.well-known/mcp/server.json`
3. Discovers all agents with their MCP endpoint URLs, titles, and descriptions
4. Creates connections to each agent

### Health Polling

Daedalus calls `get_health` on each connected agent at a configurable interval (default 60s). The response maps to UI indicators:

| `status` | Daedalus behaviour |
|---|---|
| `ok` | Green badge, normal operation |
| `degraded` | Yellow badge + warning banner showing `message`. Chat allowed. |
| `error` | Red badge. Chat disabled. |

### Progress Notifications

Long-running agent tool calls (agentic loops, sub-agent delegation) emit MCP `notifications/progress` on the SSE stream.
Daedalus must include a `progressToken` in the `_meta` of `tools/call` requests to opt in:

```python
from uuid import uuid4

# `session` is an already-initialised MCP client session;
# `user_input` holds the chat text to send to the agent.
result = await session.call_tool(
    "jarvis",
    arguments={"message": user_input},
    request_params={"_meta": {"progressToken": str(uuid4())}},
)
```

Progress notification fields:

| Field | Description |
|---|---|
| `progressToken` | Matches the token sent in the request |
| `progress` | Monotonically increasing step counter |
| `total` | `null` = indeterminate (loop in progress), `1.0` = sub-task finished |
| `message` | Status text: `{server}/{tool}: started\|completed\|failed` or `{agent} step N (llm\|tool)` |

Without a `progressToken`, Pallas skips all progress notifications and the client receives nothing until the final result.

### Chat Blocking

If the target agent's cached health is `error`, Daedalus returns HTTP 503 and disables the message input. `degraded` shows a warning but allows chat.

---

## Registry Server

### Endpoint

```
GET {host}:{registry_port}/.well-known/mcp/server.json
```

Plain HTTP, not MCP. No authentication. Returns `application/json`.

### Response Structure

Built dynamically from `agents.yaml` + `fastagent.config.yaml`:

```json
{
  "servers": [
    {
      "server": {
        "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
        "name": "com.example.project/jarvis",
        "title": "Jarvis",
        "description": "My assistant",
        "version": "1.0.0",
        "remotes": [
          {
            "type": "streamable-http",
            "url": "http://my-host.example.com:8201/mcp"
          }
        ],
        "capabilities": {
          "model": "my-model-name",
          "vision": false,
          "context_window": 200000,
          "max_output_tokens": 32000
        }
      },
      "_meta": {
        "io.modelcontextprotocol.registry/official": {
          "status": "active",
          "updatedAt": "2026-01-01T00:00:00Z",
          "isLatest": true
        }
      }
    }
  ]
}
```

### Registry Name Construction

The registry name has the form `{namespace}/{slug}`, where `slug` is the agent key with underscores replaced by hyphens.

Example: namespace `com.example.project` + agent key `tech_research` → `com.example.project/tech-research`.

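The naming rule above, together with the `remotes[].url` shape shown in the response example, can be sketched in a few lines of Python. This is an illustration only, not Pallas's actual implementation: the function names `registry_name` and `remote_url` are hypothetical, and the `http://` scheme and `/mcp` path are assumptions taken from the sample response.

```python
def registry_name(namespace: str, agent_key: str) -> str:
    """Build '{namespace}/{slug}'; the slug swaps underscores for hyphens."""
    slug = agent_key.replace("_", "-")
    return f"{namespace}/{slug}"


def remote_url(host: str, port: int) -> str:
    """Build an agent endpoint URL in the shape used by the sample response.

    Assumption: plain HTTP and an '/mcp' path, as in the example JSON.
    """
    return f"http://{host}:{port}/mcp"


if __name__ == "__main__":
    print(registry_name("com.example.project", "tech_research"))
    print(remote_url("my-host.example.com", 8201))
```

Agent keys without underscores pass through unchanged, so `jarvis` under the same namespace becomes `com.example.project/jarvis`, matching the `name` field in the response example.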
### Capabilities

If `model_capabilities` is defined in `fastagent.config.yaml`, each registry entry includes a `capabilities` object with model name, vision support, context window, and max output tokens. Clients can use it to decide what an agent can handle, for example whether to offer image attachments.

---

## Multimodal Support

`MultimodalAgentMCPServer` extends fast-agent's `AgentMCPServer` with image attachment support.

### `send_message` Tool

Each agent's MCP tool accepts:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `message` | `str` | yes | Text message to the agent |
| `images` | `list[dict]` | no | Base64-encoded images: `[{"data": "...", "mime_type": "image/png"}]` |

When `images` is provided, the message is sent as a `PromptMessageExtended` containing both `TextContent` and `ImageContent` parts; the agent's underlying model must support vision.

### Conversation History Prompt

For agents with `instance_scope != "request"`, a `{agent}_history` prompt is registered that returns the full conversation history as FastMCP `Message` objects. This allows clients to retrieve the stored context.

### Bearer Token Propagation

The server captures the authenticated bearer token from the incoming MCP request and propagates it to downstream calls via the `request_bearer_token` context variable.

---

## Health System

Health checking has two layers: a **startup preflight** validates LLM providers before agents launch, and a **runtime `get_health` tool** reports ongoing status.

### Startup Preflight

Runs once before any agents start. Validates all LLM providers that have API keys configured.

| Provider | Active (`default_model` matches) | Key set, not active |
|---|---|---|
| **Anthropic** | `GET /v1/models/{model}`: confirms model exists and key is valid | `GET /v1/models/claude-sonnet-4-5`: verifies API access |
| **OpenAI** | `GET {base_url}/models`: lists models, confirms configured model is present | `GET {base_url}/models`: lists available models |

- **Warn-only**: never blocks startup. Agents start regardless.
- **5-second timeout** per provider API call.
- Loads `.env` before checking.

### Runtime `get_health` Tool

Registered on each agent's MCP server. Checks:

1. **Downstream MCP servers**: sends an MCP `initialize` handshake to each server URL. It uses `initialize` because that is the only MCP method that works without a pre-established session. After success, it sends `DELETE` with the returned `Mcp-Session-Id` to tear down the session cleanly. 3-second timeout.
2. **Active LLM provider**: includes the preflight result for the provider that `default_model` points to. Only the active provider affects health status.

### Response Format

```json
{
  "status": "ok",
  "timestamp": "2026-01-01T00:00:00Z"
}
```

```json
{
  "status": "degraded",
  "timestamp": "2026-01-01T00:00:00Z",
  "message": "Unreachable: neo4j_cypher; LLM: openai: model 'bad-model' not found"
}
```

| Status | Meaning |
|---|---|
| `ok` | All downstream servers reachable and active LLM provider healthy |
| `degraded` | One or more downstream servers unreachable, or active LLM provider failed |

---

## Model Registration

At startup, Pallas registers any model that is not in fast-agent's built-in `ModelDatabase`, using the explicit capability declarations from `fastagent.config.yaml`.

The process:

1. Read `default_model` and `model_capabilities` from config
2. Extract the model name (the portion after the provider prefix dot)
3. Check if `ModelDatabase` already knows this model; if so, skip
4. Register with `ModelDatabase.register_runtime_model_params()`:
   - `vision: true` → multimodal tokenization (`QWEN_MULTIMODAL`)
   - `vision: false` → text-only tokenization (`TEXT_ONLY`)
   - `context_window` and `max_output_tokens` from config (defaults: `131072` and `16384`)

This avoids the brittle pattern of inferring capabilities from model name substrings, which breaks for custom or fine-tuned models with non-standard names.

---

## Module Reference

| Module | File | Purpose |
|---|---|---|
| `pallas.server` | `server.py` | CLI entry point (`pallas` command), configuration loading, agent lifecycle orchestration, dependency ordering, model registration |
| `pallas.registry` | `registry.py` | Starlette app serving `GET /.well-known/mcp/server.json`; agent catalogue built from config |
| `pallas.multimodal_server` | `multimodal_server.py` | `MultimodalAgentMCPServer`, which extends `AgentMCPServer` with image support, conversation history prompts, bearer token propagation |
| `pallas.health` | `health.py` | LLM provider preflight validation, downstream MCP server probing, `get_health` tool registration |
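
As a closing illustration of step 2 of the Model Registration process (extracting the model name after the provider prefix), here is a minimal sketch. It is not taken from `pallas.server`: the function name `split_default_model` is hypothetical, and splitting on the first dot only is an assumption, made so that model names which themselves contain dots stay intact.

```python
def split_default_model(default_model: str) -> tuple[str, str]:
    """Split a 'provider.model-name' string into (provider, model_name).

    Only the first dot separates the provider prefix, so any later dots
    remain part of the model name.
    """
    provider, sep, model_name = default_model.partition(".")
    if not sep or not provider or not model_name:
        raise ValueError(
            f"expected 'provider.model-name', got {default_model!r}"
        )
    return provider, model_name


if __name__ == "__main__":
    provider, model = split_default_model("openai.my-model-name")
    print(provider, model)
```

With the configuration example from this document, `openai.my-model-name` yields the provider `openai` and the model name `my-model-name`, which is the value published under `capabilities.model` in the registry response.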