feat: add Mantle override for AWS Bedrock Anthropic endpoint

Introduce a `model_capabilities.mantle` flag that installs a provider-specific override in fast-agent's `ModelDatabase._PROVIDER_MODEL_OVERRIDES`. The override strips the features the AWS Bedrock Mantle endpoint rejects (beta headers, extended thinking, task budgets, web tools, prompt caching). Without it, fast-agent sends default beta headers and `thinking` parameters for modern Claude models, which Mantle rejects with a misleading 404 "model does not exist" error.
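
For reviewers, the clone-and-strip step can be sketched roughly as follows. This is only an illustration under stated assumptions: `ParamsSketch` and `mantle_override` are hypothetical stand-ins (the real code calls `model_copy(update=...)` on fast-agent's `ModelParameters`), and the sample values are placeholders rather than fast-agent's actual entries.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParamsSketch:
    """Hypothetical stand-in for fast-agent's ModelParameters (subset of fields)."""
    context_window: int
    max_output_tokens: int
    anthropic_required_betas: Optional[Tuple[str, ...]] = None
    reasoning: Optional[str] = None
    reasoning_effort_spec: Optional[str] = None
    anthropic_task_budget_supported: bool = False
    anthropic_web_fetch_version: Optional[str] = None
    anthropic_web_search_version: Optional[str] = None
    cache_ttl: Optional[str] = None


def mantle_override(base: ParamsSketch, capabilities: dict) -> ParamsSketch:
    """Return a copy of ``base`` with the Mantle-incompatible features stripped."""
    return replace(
        base,
        # Limits come from the declared capabilities, with the same fallbacks
        # that _register_one_model uses.
        context_window=capabilities.get("context_window", 131072),
        max_output_tokens=capabilities.get("max_output_tokens", 16384),
        anthropic_required_betas=None,          # no anthropic-beta header
        reasoning=None,                         # no extended-thinking request
        reasoning_effort_spec=None,
        anthropic_task_budget_supported=False,  # no task budget
        anthropic_web_fetch_version=None,       # no web tools
        anthropic_web_search_version=None,
        cache_ttl=None,                         # prompt caching disabled
    )


# Placeholder values for illustration only.
base = ParamsSketch(
    context_window=200000,
    max_output_tokens=32000,
    anthropic_required_betas=("example-beta",),
    reasoning="example-reasoning-mode",
    anthropic_task_budget_supported=True,
    cache_ttl="example-ttl",
)
override = mantle_override(
    base, {"context_window": 1000000, "max_output_tokens": 128000, "mantle": True}
)

# The base entry is left untouched; only the override loses the features.
print(base.anthropic_required_betas, override.anthropic_required_betas)
```

Copying rather than mutating keeps the built-in entry intact, so the same model name can still be used with its full feature set through a provider other than the overridden one.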
2026-05-12 07:41:41 -04:00
parent 4b954ed842
commit fe94f6a9a8
3 changed files with 482 additions and 17 deletions
--- a/README.md
+++ b/README.md
@@ -89,11 +89,32 @@ model_capabilities:
  vision: false
  context_window: 200000
  max_output_tokens: 32000
+  mantle: false               # optional — see "Mantle override" below
 ```

 Capabilities are published in the registry and used to register unknown models
 with fast-agent's `ModelDatabase`.

+### Mantle override (`model_capabilities.mantle: true`)
+
+Set this when `anthropic.base_url` points at the AWS Bedrock **Mantle**
+endpoint (`https://bedrock-mantle.{region}.api.aws/anthropic`). Pallas then
+installs a provider-specific override for `(Provider.ANTHROPIC, model_name)`
+in fast-agent's `ModelDatabase._PROVIDER_MODEL_OVERRIDES` that clones the
+model's base parameters but strips the features Mantle rejects:
+
+- `anthropic_required_betas` — no `anthropic-beta: ...` header
+- `reasoning` / `reasoning_effort_spec` — no extended-thinking request
+- `anthropic_task_budget_supported` — no task budget
+- `anthropic_web_fetch_version` / `anthropic_web_search_version` — no web tools
+- `cache_ttl` — prompt caching disabled
+
+Without this flag, fast-agent sends its default beta headers and `thinking`
+parameters for modern Claude models (e.g. Opus 4.7, Haiku 4.5), which Mantle
+rejects with a misleading `404 "The model '...' does not exist"`. See
+`docs/bedrock.md` for the full configuration walkthrough.
+
+
 ---

 ## Environment variable
--- a/docs/bedrock.md
+++ b/docs/bedrock.md
@@ -0,0 +1,392 @@
+# AWS Bedrock Integration
+
+Pallas supports AWS Bedrock through three integration paths, depending on the model and endpoint:
+
+| Path | fast-agent provider | Auth | Use when |
+|---|---|---|---|
+| [Direct Bedrock](#path-1-direct-bedrock-converse-api) | `bedrock` | AWS IAM / long-term key | Any Bedrock model; required for Sonnet 4.6 |
+| [Mantle → Anthropic](#path-2-mantle-anthropic-messages-api) | `anthropic` | Bedrock long-term API key | Claude models with Mantle support (Haiku 4.5, Opus 4.7) |
+| [Mantle → OpenAI](#path-3-mantle-openai-chat-completions) | `openai` | Bedrock long-term API key | Non-Anthropic models on Mantle (MiniMax M2.5, etc.) |
+
+**Mantle** is AWS's OpenAI-compatible and Anthropic-compatible gateway for Bedrock. It simplifies authentication (one long-term API key instead of IAM credential management) and is the recommended path when the target model supports it.
+
+---
+
+## Supported Models
+
+| Model | Bedrock model ID | Direct Bedrock | Mantle |
+|---|---|---|---|
+| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | ✓ | ✓ (Anthropic Messages API) |
+| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | ✓ | ✗ |
+| Claude Opus 4.7 | `anthropic.claude-opus-4-7` | ✓ | ✓ (Anthropic Messages API) |
+| MiniMax M2.5 | `minimax.minimax-m2.5` | ✓ | ✓ (OpenAI Chat Completions) |
+
+Cross-region inference IDs (e.g. `us.anthropic.claude-opus-4-7`, `eu.anthropic.claude-sonnet-4-6`) can be used as the model ID for the `bedrock` provider to route across regions within a geography for higher throughput.
+
+---
+
+## Path 1: Direct Bedrock (Converse API)
+
+Fast-agent's `bedrock` provider calls the AWS Bedrock Converse API via `boto3`. This path works for all Bedrock models and is the only option for models without Mantle support (e.g. Claude Sonnet 4.6).
+
+### Prerequisites
+
+1. **Install `boto3`** — not included in fast-agent by default:
+
+   ```toml
+   # pyproject.toml
+   dependencies = [
+       "pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git",
+       "boto3",
+   ]
+   ```
+
+2. **AWS credentials** — the Bedrock provider uses the standard AWS credential chain in priority order:
+   - `AWS_BEARER_TOKEN_BEDROCK` environment variable (long-term Bedrock API key — see below)
+   - `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` environment variables
+   - `~/.aws/credentials` file (named profile or `default`)
+   - IAM instance role (EC2, ECS, Lambda)
+
+   The simplest approach for a server deployment is a **long-term Bedrock API key** generated from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create). Set it as `AWS_BEARER_TOKEN_BEDROCK`.
+
+3. **Enable model access** in the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/modelaccess) for your target region.
+
+### `fastagent.config.yaml`
+
+```yaml
+default_model: bedrock.us.anthropic.claude-sonnet-4-6
+
+# ── Model Capabilities ──────────────────────────────────────────────────────
+# Required: Bedrock model IDs are not in fast-agent's ModelDatabase.
+model_capabilities:
+  vision: true                  # true for Claude models (image input supported)
+  context_window: 1000000       # 1M for Sonnet 4.6
+  max_output_tokens: 64000
+
+# ── Bedrock provider ─────────────────────────────────────────────────────────
+bedrock:
+  region: us-east-1             # or set AWS_REGION / AWS_DEFAULT_REGION
+  profile: default              # optional; or set AWS_PROFILE
+  reasoning: medium             # optional: minimal | low | medium | high
+```
+
+The `default_model` format is `bedrock.<model-id>`. Use a cross-region inference ID (e.g. `us.anthropic.claude-sonnet-4-6`) for geo-distributed routing, or the plain model ID (e.g. `anthropic.claude-sonnet-4-6`) for in-region only.
+
+### `fastagent.secrets.yaml`
+
+No API key entry is needed: credentials come from the AWS credential chain. If you are using a long-term Bedrock API key, set it in `.env` or the environment rather than in this file, which can stay empty:
+
+```yaml
+# fastagent.secrets.yaml — nothing required for Bedrock credentials
+# AWS credentials are read from environment variables or ~/.aws/credentials
+```
+
+### `.env`
+
+```dotenv
+# Long-term Bedrock API key (recommended for server deployments)
+AWS_BEARER_TOKEN_BEDROCK=your-bedrock-api-key
+
+# Or use IAM access keys
+# AWS_ACCESS_KEY_ID=AKIA...
+# AWS_SECRET_ACCESS_KEY=...
+
+AWS_REGION=us-east-1
+```
+
+### `agents.yaml`
+
+No Bedrock-specific changes are needed. The `default_model` in `fastagent.config.yaml` is picked up automatically:
+
+```yaml
+name: my-project
+version: "1.0.0"
+host: my-host.example.com
+registry_port: 8200
+
+agents:
+  jarvis:
+    module: agents.jarvis
+    port: 8201
+    title: Jarvis
+    description: "My assistant"
+```
+
+To use a different Bedrock model for a specific agent, set `model` on the agent entry:
+
+```yaml
+agents:
+  jarvis:
+    module: agents.jarvis
+    port: 8201
+    model: bedrock.us.anthropic.claude-haiku-4-5-20251001-v1:0
+    model_capabilities:
+      vision: true
+      context_window: 200000
+      max_output_tokens: 64000
+```
+
+### Model capability reference
+
+| Model | `vision` | `context_window` | `max_output_tokens` |
+|---|---|---|---|
+| Claude Haiku 4.5 | `true` | `200000` | `64000` |
+| Claude Sonnet 4.6 | `true` | `1000000` | `64000` |
+| Claude Opus 4.7 | `true` | `1000000` | `128000` |
+| MiniMax M2.5 | `false` | `196000` | `8000` |
+
+### IAM permissions
+
+The IAM principal (user, role, or instance profile) needs:
+
+```json
+{
+  "Effect": "Allow",
+  "Action": [
+    "bedrock:InvokeModel",
+    "bedrock:InvokeModelWithResponseStream"
+  ],
+  "Resource": "arn:aws:bedrock:*::foundation-model/*"
+}
+```
+
+For cross-region inference, also allow:
+
+```json
+{
+  "Effect": "Allow",
+  "Action": [
+    "bedrock:InvokeModel",
+    "bedrock:InvokeModelWithResponseStream"
+  ],
+  "Resource": "arn:aws:bedrock:*:*:inference-profile/*"
+}
+```
+
+### Terraform snippet
+
+```hcl
+resource "aws_iam_policy" "bedrock_invoke" {
+  name = "bedrock-invoke"
+
+  policy = jsonencode({
+    Version = "2012-10-17"
+    Statement = [
+      {
+        Effect = "Allow"
+        Action = [
+          "bedrock:InvokeModel",
+          "bedrock:InvokeModelWithResponseStream",
+        ]
+        Resource = [
+          "arn:aws:bedrock:*::foundation-model/*",
+          "arn:aws:bedrock:*:*:inference-profile/*",
+        ]
+      }
+    ]
+  })
+}
+```
+
+---
+
+## Path 2: Mantle — Anthropic Messages API
+
+Mantle exposes the Anthropic Messages API for supported Claude models. Fast-agent's `anthropic` provider uses the Anthropic Python SDK (`AsyncAnthropic`), which calls `/v1/messages` — exactly what Mantle serves at `https://bedrock-mantle.{region}.api.aws/anthropic`.
+
+**Supported models:** Claude Haiku 4.5, Claude Opus 4.7. Claude Sonnet 4.6 does **not** have a Mantle endpoint and must use [Path 1](#path-1-direct-bedrock-converse-api).
+
+> **Note on Opus 4.7 and Chat Completions:** The AWS model card notes that Opus 4.7 does not support Chat Completions on Mantle. This does not affect fast-agent — the `anthropic` provider uses the Anthropic Messages API, not Chat Completions.
+
+### Prerequisites
+
+1. **Generate a long-term Bedrock API key** from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create).
+
+2. **Enable model access** in the Bedrock console for your target region.
+
+3. No additional Python packages are needed: `anthropic` is already a fast-agent dependency.
+
+### `fastagent.config.yaml`
+
+```yaml
+default_model: anthropic.claude-opus-4-7
+
+# ── Model Capabilities ──────────────────────────────────────────────────────
+# mantle: true is REQUIRED — it installs a Pallas-level provider override that
+# strips the features the Mantle endpoint rejects (anthropic-beta headers,
+# extended thinking, task budget, web tools, prompt caching). Without this
+# flag fast-agent sends those features and Mantle returns a misleading
+# 404 "model does not exist" error.
+model_capabilities:
+  vision: true
+  context_window: 1000000
+  max_output_tokens: 128000
+  mantle: true
+
+# ── Anthropic provider pointing at Mantle ────────────────────────────────────
+anthropic:
+  base_url: "https://bedrock-mantle.us-east-1.api.aws/anthropic"
+```
+
+The Anthropic SDK appends `/v1/messages` to `base_url` automatically.
+
+> **Why `mantle: true` is required.** Fast-agent's built-in `ModelDatabase`
+> entries for Claude Opus 4.7 and Haiku 4.5 declare features that the
+> Anthropic API supports but the Mantle endpoint rejects —
+> `anthropic-beta: code-execution-web-tools-...` headers, extended thinking,
+> task budget, web search/fetch tools, and prompt caching in some
+> configurations. When Mantle sees a request carrying those features it
+> responds with a confusingly generic `{"type": "not_found_error",
+> "message": "The model '...' does not exist"}`. Pallas reads the `mantle`
+> flag and writes an entry into fast-agent's `_PROVIDER_MODEL_OVERRIDES`
+> dict for `(Provider.ANTHROPIC, <model>)` that strips those fields, so
+> fast-agent sends a plain Messages API request that Mantle accepts.
+
+
+### `fastagent.secrets.yaml`
+
+```yaml
+anthropic:
+  api_key: "${BEDROCK_API_KEY}"
+```
+
+### `.env`
+
+```dotenv
+BEDROCK_API_KEY=your-bedrock-long-term-api-key
+```
+
+### `agents.yaml`
+
+No Bedrock-specific changes are needed. Example:
+
+```yaml
+name: my-project
+version: "1.0.0"
+host: my-host.example.com
+registry_port: 8200
+
+agents:
+  jarvis:
+    module: agents.jarvis
+    port: 8201
+    title: Jarvis
+    description: "My assistant"
+```
+
+### IAM permissions
+
+No separate IAM policy has to be written when using a long-term Bedrock API key; the key carries the access granted to the IAM identity it was generated for. If you need to restrict which models the key can invoke, narrow the permissions on that identity.
+
+---
+
+## Path 3: Mantle — OpenAI Chat Completions
+
+Mantle exposes an OpenAI-compatible Chat Completions endpoint (`/v1`) for non-Anthropic models such as MiniMax M2.5. Fast-agent's `openai` provider (or `generic` provider) can point at this endpoint.
+
+**Supported models:** MiniMax M2.5 (`minimax.minimax-m2.5`), and any other Bedrock model that Mantle exposes via Chat Completions.
+
+### Prerequisites
+
+1. **Generate a long-term Bedrock API key** from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create).
+
+2. **Enable model access** in the Bedrock console for your target region.
+
+### `fastagent.config.yaml`
+
+```yaml
+default_model: openai.minimax.minimax-m2.5
+
+# ── Model Capabilities ──────────────────────────────────────────────────────
+model_capabilities:
+  vision: false
+  context_window: 196000
+  max_output_tokens: 8000
+
+# ── OpenAI provider pointing at Mantle ───────────────────────────────────────
+openai:
+  base_url: "https://bedrock-mantle.us-east-1.api.aws/v1"
+```
+
+### `fastagent.secrets.yaml`
+
+```yaml
+openai:
+  api_key: "${BEDROCK_API_KEY}"
+```
+
+### `.env`
+
+```dotenv
+BEDROCK_API_KEY=your-bedrock-long-term-api-key
+```
+
+---
+
+## Health Checks
+
+### Startup preflight
+
+Pallas's `validate_llm_providers()` runs at startup and checks:
+
+| Provider | What is checked |
+|---|---|
+| `anthropic` | `GET {base_url}/v1/models/{model}` — confirms model exists and key is valid |
+| `openai` | `GET {base_url}/models` — lists models, confirms configured model is present |
+| `bedrock` | **No preflight check** — credential errors surface on the first inference call |
+
+For the `bedrock` provider, startup will succeed even with missing or invalid credentials. The first agent call will raise a `ProviderKeyError` with a message directing you to configure AWS credentials.
+
+### Runtime `get_health` tool
+
+The `get_health` MCP tool probes downstream MCP servers regardless of which LLM provider is active. LLM provider health (from the startup preflight) is included in the response for `anthropic` and `openai` providers. For `bedrock`, the LLM section of the health response will be absent.
+
+---
+
+## Troubleshooting
+
+### `NoCredentialsError` / `ProviderKeyError: AWS credentials not found`
+
+The `bedrock` provider could not find AWS credentials. Check in the same order the provider does:
+
+1. Is `AWS_BEARER_TOKEN_BEDROCK` set in `.env` or the environment?
+2. Are `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` both set?
+3. Is `~/.aws/credentials` present and does it contain the expected profile?
+4. Is the IAM role attached to the instance/container?
+
+### Model not found in `ModelDatabase`
+
+```
+KeyError: 'anthropic.claude-sonnet-4-6'
+```
+
+Pallas requires `model_capabilities` in `fastagent.config.yaml` for any model not in fast-agent's built-in database. All Bedrock model IDs fall into this category. Add:
+
+```yaml
+model_capabilities:
+  vision: true          # or false
+  context_window: 1000000
+  max_output_tokens: 64000
+```
+
+### `ValidationError` on `default_model`
+
+The `default_model` format must be `provider.model-id`. Examples:
+
+```yaml
+default_model: bedrock.us.anthropic.claude-sonnet-4-6   # Direct Bedrock, geo inference
+default_model: bedrock.anthropic.claude-sonnet-4-6       # Direct Bedrock, in-region
+default_model: anthropic.claude-opus-4-7                 # Mantle via Anthropic provider
+default_model: openai.minimax.minimax-m2.5               # Mantle via OpenAI provider
+```
+
+### Cross-region inference access denied
+
+If you use a geo inference ID (e.g. `us.anthropic.claude-sonnet-4-6`) and receive an access denied error, ensure the IAM policy includes `arn:aws:bedrock:*:*:inference-profile/*` in the `Resource` list. In-region model IDs do not require this.
+
+### Mantle 401 Unauthorized
+
+The Bedrock long-term API key is invalid or expired. Regenerate it from the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create) and update `BEDROCK_API_KEY` in `.env`.
+
+### Claude Sonnet 4.6 on Mantle returns 404
+
+Claude Sonnet 4.6 does not have a Mantle endpoint. Use the `bedrock` provider (Path 1) with model ID `anthropic.claude-sonnet-4-6` or the geo inference ID `us.anthropic.claude-sonnet-4-6`.
--- a/pallas/server.py
+++ b/pallas/server.py
@@ -123,18 +123,44 @@ def _preflight_mcp_servers(agent_name: str, servers: dict[str, dict]) -> None:
 # ── Model registration ────────────────────────────────────────────────────────

 def _register_one_model(model_spec: str, capabilities: dict) -> None:
-    """Register a single model with fast-agent's ModelDatabase if unknown."""
+    """Register a single model with fast-agent's ModelDatabase.
+
+    Two cases:
+
+    1. **Unknown model** — if fast-agent has no built-in entry for this model,
+       register a minimal ``ModelParameters`` with the declared capabilities.
+
+    2. **Mantle-hosted model** (``capabilities.mantle: true``) — regardless of
+       whether the model has a built-in entry, install a provider-specific
+       override for ``(Provider.ANTHROPIC, model_name)`` in
+       ``_PROVIDER_MODEL_OVERRIDES`` that strips the features the AWS Bedrock
+       Mantle endpoint rejects:
+
+       - ``anthropic_required_betas`` (no ``anthropic-beta`` header)
+       - ``reasoning`` / ``reasoning_effort_spec`` (no extended-thinking request)
+       - ``anthropic_task_budget_supported``
+       - ``anthropic_web_fetch_version`` / ``anthropic_web_search_version``
+       - ``cache_ttl`` (prompt caching is not advertised as supported on
+         Mantle for every model; disable the cache planner by default)
+
+       Without this override fast-agent sends beta headers and ``thinking``
+       parameters that Mantle rejects with a misleading ``"model does not
+       exist"`` 404.
+    """
    from fast_agent.llm.model_database import ModelDatabase, ModelParameters
+    from fast_agent.llm.provider_types import Provider

    model_name = model_spec.split(".", 1)[-1] if "." in model_spec else model_spec

-    if ModelDatabase.get_model_params(model_name) is not None:
-        return
-
    is_vision = capabilities.get("vision", False)
    context_window = capabilities.get("context_window", 131072)
    max_output_tokens = capabilities.get("max_output_tokens", 16384)
+    is_mantle = capabilities.get("mantle", False)

+    existing = ModelDatabase.get_model_params(model_name)
+
+    if existing is None:
+        # Unknown model — register a fresh runtime entry.
        if is_vision:
            tokenizes = list(ModelDatabase.QWEN_MULTIMODAL)
            logger.info("Registered model '%s' with vision capabilities", model_name)
@@ -150,6 +176,32 @@ def _register_one_model(model_spec: str, capabilities: dict) -> None:
                tokenizes=tokenizes,
            ),
        )
+        base_params = ModelDatabase.get_model_params(model_name)
+    else:
+        base_params = existing
+
+    if is_mantle and base_params is not None:
+        # Clone the base params and strip Mantle-incompatible features.
+        override = base_params.model_copy(
+            update={
+                "context_window": context_window,
+                "max_output_tokens": max_output_tokens,
+                "anthropic_required_betas": None,
+                "reasoning": None,
+                "reasoning_effort_spec": None,
+                "anthropic_task_budget_supported": False,
+                "anthropic_web_fetch_version": None,
+                "anthropic_web_search_version": None,
+                "cache_ttl": None,
+            }
+        )
+        normalized = ModelDatabase.normalize_model_name(model_name)
+        ModelDatabase._PROVIDER_MODEL_OVERRIDES[(Provider.ANTHROPIC, normalized)] = override
+        logger.info(
+            "Registered Mantle override for anthropic/'%s' (strips beta headers, thinking, web tools, caching)",
+            model_name,
+        )
+


 def _register_unknown_models(deployment_config: dict) -> None: