feat!: stateless per-request agents; add history + conversation_id to send_message

Make Pallas truly stateless per the 'Pallas is ephemeral' contract.

BREAKING (behavioural, not API):
  * instance_scope changes from 'shared' to 'request' in pallas.server.
    Each MCP tools/call now acquires a freshly-created fast-agent instance
    via the existing create_instance / dispose_instance factories and
    disposes it immediately after the response.

With 'shared' mode:
  * Every MCP caller saw the same agent.message_history, so different
    Daedalus conversations leaked into each other.
  * Mid-chat context was silently truncated once the model window filled.
  * Restarting the Pallas process wiped all in-flight conversation state,
    even though Daedalus had it persisted in Postgres.

With 'request' mode the Pallas process holds no per-conversation state;
the caller (Daedalus) owns history and reseeds it on every turn.

send_message gains two optional arguments:
  * history: list[{role, content, images?}] in chronological order,
    converted to PromptMessageExtended and seeded onto the fresh
    instance's message_history before agent.send().
  * conversation_id: opaque string, logged for trace correlation only —
    Pallas never interprets or persists it.

Malformed history entries (bad role, missing image data/mime_type, etc.)
are skipped with a warning rather than raising, so a single bad row
cannot wipe a whole conversation.

The {agent}_history MCP prompt is still registered under 'request'
scope for backward compatibility but always returns []; history lives
on the client.

Version bumped to 0.2.0.
2026-04-27 08:16:59 -04:00
parent a5b4650dff
commit 95fa6e6fc0
4 changed files with 215 additions and 20 deletions


@@ -132,7 +132,56 @@ No authentication. No query parameters.
---
## 2. Health Tool
## 2. Conversation State & History (Daedalus-owned)
**Pallas is stateless.** As of version `0.2.0`, every MCP `tools/call` is
handled by a freshly-created fast-agent instance that is disposed immediately
after the response. The Pallas process holds **no per-conversation memory
between calls**. This is enforced by `instance_scope="request"` in
`pallas.server` — do not override it.
Conversation history is owned by the client (Daedalus). It must be replayed
on every turn through the `history` argument on `send_message`.
### `send_message` Arguments
Each agent's MCP tool accepts:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `message` | `str` | yes | The new user turn as plain text. |
| `images` | `list[dict]` | no | Images attached to this turn only: `[{"data": base64, "mime_type": "image/png"}]`. Requires a vision-capable model. |
| `history` | `list[dict]` | no | Prior conversation history in chronological order. Entries have shape `{"role": "user" \| "assistant", "content": str, "images"?: [...]}`. When present, seeds the freshly-created agent's `message_history` *before* the new turn is executed. |
| `conversation_id` | `str` | no | Opaque identifier logged by Pallas for trace correlation. Pallas does not interpret or persist it. |
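
The entry shape above can be checked before seeding. The following is a minimal sketch of that filtering, not the code in `pallas.server`: the function name `clean_history`, the logger name, and the exact validation rules are assumptions made for illustration. It keeps chronological order and skips malformed entries with a warning instead of raising.

```python
from __future__ import annotations

import logging
from typing import Any

# Hypothetical logger name; Pallas's real logger is not shown in this document.
logger = logging.getLogger("history_sketch")

VALID_ROLES = {"user", "assistant"}


def _images_ok(images: Any) -> bool:
    """Every image must be a dict carrying non-empty data and mime_type."""
    if not isinstance(images, list):
        return False
    return all(
        isinstance(img, dict) and img.get("data") and img.get("mime_type")
        for img in images
    )


def clean_history(history: list[Any] | None) -> list[dict[str, Any]]:
    """Return the well-formed entries of `history`, preserving their order."""
    cleaned: list[dict[str, Any]] = []
    for index, entry in enumerate(history or []):
        if not isinstance(entry, dict):
            logger.warning("history[%d] skipped: not a dict", index)
        elif entry.get("role") not in VALID_ROLES:
            logger.warning("history[%d] skipped: bad role", index)
        elif not isinstance(entry.get("content"), str):
            logger.warning("history[%d] skipped: content is not a string", index)
        elif "images" in entry and not _images_ok(entry["images"]):
            logger.warning("history[%d] skipped: malformed images", index)
        else:
            cleaned.append(entry)
    return cleaned
```

Skipping rather than raising means one bad row costs a single turn of context instead of the whole conversation.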
### Rationale
| Problem with shared state | Behaviour with `instance_scope="request"` |
|---------------------------|-------------------------------------------|
| Every caller sees the same `agent.message_history`, so different conversations leak into each other. | Each call gets a fresh, isolated instance. No cross-conversation bleed. |
| Process restart wipes all in-flight context. | There is no in-flight context to wipe; Daedalus reseeds history on the next turn. |
| Context-window trimming happens invisibly inside fast-agent. | Daedalus decides what history to send and how much, based on `capabilities.context_window` from the registry. |
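
Because trimming is now the client's decision, a client-side budget check might look like the sketch below. It is illustrative only: `trim_history`, `estimate_tokens`, the four-characters-per-token heuristic, and the `reserve` default are all assumptions, and image payloads are ignored in the estimate. The only value taken from this document is the context window size, which the client would read from `capabilities.context_window`.

```python
from __future__ import annotations

from typing import Any


def estimate_tokens(text: str) -> int:
    """Crude size estimate (assumption: roughly 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(
    history: list[dict[str, Any]],
    context_window: int,
    reserve: int = 1024,
) -> list[dict[str, Any]]:
    """Keep the newest entries that fit in `context_window - reserve` tokens.

    `reserve` leaves room for the new user turn and the model's reply.
    The result stays in chronological order.
    """
    budget = context_window - reserve
    kept: list[dict[str, Any]] = []
    # Walk from newest to oldest so the most recent context survives.
    for entry in reversed(history):
        cost = estimate_tokens(entry["content"])
        if cost > budget:
            break
        kept.append(entry)
        budget -= cost
    kept.reverse()
    return kept
```

Dropping the oldest turns first is only one policy; a client could equally summarise older turns or pin a system-style preamble, since Pallas accepts whatever history it is sent.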
### `{agent}_history` Prompt
Under `instance_scope="request"` the `{agent}_history` MCP prompt is still
registered for backward compatibility but always returns `[]` — history lives
on the client and there is no authoritative server-side copy. Existing
callers that invoke this prompt will not error, but should migrate to
tracking history client-side.
### Backward Compatibility
All new arguments are optional. A client that calls `send_message(message=...)`
with no `history` and no `conversation_id` gets a *zero-history* turn: the
agent sees only the current message. This is correct stateless behaviour; the
agent never falls back to "the last conversation's context". Existing
fast-agent MCP clients that do not pass `history` will therefore get one-shot
responses with no memory of earlier turns, which is the appropriate and
visible failure mode.
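
The two call shapes can be compared side by side. This is a sketch under assumptions: `build_send_message_args` is a hypothetical client-side helper, not part of Pallas or fast-agent, and the resulting dict is meant to be passed as the `arguments` of an MCP `tools/call` request by whatever MCP client Daedalus uses.

```python
from __future__ import annotations

from typing import Any


def build_send_message_args(
    message: str,
    history: list[dict[str, Any]] | None = None,
    conversation_id: str | None = None,
) -> dict[str, Any]:
    """Build the tool arguments, omitting optional keys that are not set."""
    args: dict[str, Any] = {"message": message}
    if history:
        # Copy so later edits to the caller's list do not alter the request.
        args["history"] = list(history)
    if conversation_id is not None:
        args["conversation_id"] = conversation_id
    return args


# Legacy-style call: a zero-history turn, the agent sees only this message.
one_shot = build_send_message_args("What is the capital of France?")

# Stateless multi-turn call: the client replays prior turns itself.
follow_up = build_send_message_args(
    "And its population?",
    history=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ],
    conversation_id="conv-123",  # hypothetical identifier, used for logs only
)
```

Without the `history` key, the second question would reach the agent with no referent for "its", which is exactly the visible one-shot behaviour described above.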
---
## 3. Health Tool
### MCP tool: `get_health`
@@ -239,7 +288,7 @@ The tool **must not** invoke the LLM. It should complete in under 1 second (3-se
---
## 3. Daedalus Consumption
## 4. Daedalus Consumption
### Registration Flow
@@ -270,7 +319,7 @@ The tool **must not** invoke the LLM. It should complete in under 1 second (3-se
---
## 4. Agent Progress Notifications
## 5. Agent Progress Notifications
Agent tool calls can take tens of seconds to minutes when the agent enters an agentic loop — calling sub-agents, searching the web, querying knowledge graphs, etc. During this time, the MCP tool call has not yet returned. Without progress feedback, the user sees a dead spinner.
@@ -363,7 +412,7 @@ Progress messages follow predictable patterns:
---
## 5. Why MCP (Not REST)
## 6. Why MCP (Not REST)
Pallas wraps each FastAgent instance in a `MultimodalAgentMCPServer` and serves it over StreamableHTTP. The MCP transport gives Daedalus: