diff --git a/README.md b/README.md index 195e63d..60838f7 100644 --- a/README.md +++ b/README.md @@ -89,11 +89,32 @@ model_capabilities: vision: false context_window: 200000 max_output_tokens: 32000 + mantle: false # optional — see "Mantle override" below ``` Capabilities are published in the registry and used to register unknown models with fast-agent's `ModelDatabase`. +### Mantle override (`model_capabilities.mantle: true`) + +Set this when the `anthropic.base_url` points at the AWS Bedrock **Mantle** +endpoint (`https://bedrock-mantle.{region}.api.aws/anthropic`). Pallas then +installs a provider-specific override for `(Provider.ANTHROPIC, model_name)` +in fast-agent's `ModelDatabase._PROVIDER_MODEL_OVERRIDES` that clones the +model's base parameters but strips the features Mantle rejects: + +- `anthropic_required_betas` — no `anthropic-beta: ...` header +- `reasoning` / `reasoning_effort_spec` — no extended-thinking request +- `anthropic_task_budget_supported` — no task budget +- `anthropic_web_fetch_version` / `anthropic_web_search_version` — no web tools +- `cache_ttl` — prompt caching disabled + +Without this flag, fast-agent sends its default beta headers and `thinking` +parameters for modern Claude models (e.g. Opus 4.7, Sonnet 4.6) which Mantle +rejects with a misleading `404 "The model '...' does not exist"`. See +`docs/bedrock.md` for the full configuration walkthrough. + + --- ## Environment variable diff --git a/docs/bedrock.md b/docs/bedrock.md new file mode 100644 index 0000000..cc8b46c --- /dev/null +++ b/docs/bedrock.md @@ -0,0 +1,392 @@ +# AWS Bedrock Integration + +Pallas supports AWS Bedrock through three integration paths, depending on the model and endpoint: + +| Path | fast-agent provider | Auth | Use when | +|---|---|---|---| +| [Direct Bedrock](#path-1-direct-bedrock-converse-api) | `bedrock` | AWS IAM / long-term key | Any Bedrock model; required for Sonnet 4.6 | +| [Mantle → Anthropic](#path-2-mantle-anthropic-messages-api) | `anthropic` | Bedrock long-term API key | Claude models with Mantle support (Haiku 4.5, Opus 4.7) | +| [Mantle → OpenAI](#path-3-mantle-openai-chat-completions) | `openai` | Bedrock long-term API key | Non-Anthropic models on Mantle (MiniMax M2.5, etc.) | + +**Mantle** is AWS's OpenAI-compatible and Anthropic-compatible gateway for Bedrock. It simplifies authentication (one long-term API key instead of IAM credential management) and is the recommended path when the target model supports it. + +--- + +## Supported Models + +| Model | Bedrock model ID | Direct Bedrock | Mantle | +|---|---|---|---| +| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | ✓ | ✓ (Anthropic Messages API) | +| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | ✓ | ✗ | +| Claude Opus 4.7 | `anthropic.claude-opus-4-7` | ✓ | ✓ (Anthropic Messages API) | +| MiniMax M2.5 | `minimax.minimax-m2.5` | ✓ | ✓ (OpenAI Chat Completions) | + +Cross-region inference IDs (e.g. `us.anthropic.claude-opus-4-7`, `eu.anthropic.claude-sonnet-4-6`) can be used as the model ID for the `bedrock` provider to route across regions within a geography for higher throughput. + +--- + +## Path 1: Direct Bedrock (Converse API) + +Fast-agent's `bedrock` provider calls the AWS Bedrock Converse API via `boto3`. This path works for all Bedrock models and is the only option for models without Mantle support (e.g. Claude Sonnet 4.6). + +### Prerequisites + +1. **Install `boto3`** — not included in fast-agent by default: + + ```toml + # pyproject.toml + dependencies = [ + "pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git", + "boto3", + ] + ``` + +2. **AWS credentials** — the Bedrock provider uses the standard AWS credential chain in priority order: + - `AWS_BEARER_TOKEN_BEDROCK` environment variable (long-term Bedrock API key — see below) + - `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` environment variables + - `~/.aws/credentials` file (named profile or `default`) + - IAM instance role (EC2, ECS, Lambda) + + The simplest approach for a server deployment is a **long-term Bedrock API key** generated from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create). Set it as `AWS_BEARER_TOKEN_BEDROCK`. + +3. **Enable model access** in the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/modelaccess) for your target region. + +### `fastagent.config.yaml` + +```yaml +default_model: bedrock.us.anthropic.claude-sonnet-4-6 + +# ── Model Capabilities ────────────────────────────────────────────────────── +# Required: Bedrock model IDs are not in fast-agent's ModelDatabase. +model_capabilities: + vision: true # true for Claude models (image input supported) + context_window: 1000000 # 1M for Sonnet 4.6 + max_output_tokens: 64000 + +# ── Bedrock provider ───────────────────────────────────────────────────────── +bedrock: + region: us-east-1 # or set AWS_REGION / AWS_DEFAULT_REGION + profile: default # optional; or set AWS_PROFILE + reasoning: medium # optional: minimal | low | medium | high +``` + +The `default_model` format is `bedrock.`. Use a cross-region inference ID (e.g. `us.anthropic.claude-sonnet-4-6`) for geo-distributed routing, or the plain model ID (e.g. `anthropic.claude-sonnet-4-6`) for in-region only. + +### `fastagent.secrets.yaml` + +No API key entry is needed — credentials come from the AWS credential chain. If you are using a long-term Bedrock API key, set it in `.env` or the environment: + +```yaml +# fastagent.secrets.yaml — nothing required for Bedrock credentials +# AWS credentials are read from environment variables or ~/.aws/credentials +``` + +### `.env` + +```dotenv +# Long-term Bedrock API key (recommended for server deployments) +AWS_BEARER_TOKEN_BEDROCK=your-bedrock-api-key + +# Or use IAM access keys +# AWS_ACCESS_KEY_ID=AKIA... +# AWS_SECRET_ACCESS_KEY=... + +AWS_REGION=us-east-1 +``` + +### `agents.yaml` + +No Bedrock-specific changes are needed. The `default_model` in `fastagent.config.yaml` is picked up automatically: + +```yaml +name: my-project +version: "1.0.0" +host: my-host.example.com +registry_port: 8200 + +agents: + jarvis: + module: agents.jarvis + port: 8201 + title: Jarvis + description: "My assistant" +``` + +To use a different Bedrock model for a specific agent, set `model` on the agent entry: + +```yaml +agents: + jarvis: + module: agents.jarvis + port: 8201 + model: bedrock.us.anthropic.claude-haiku-4-5-20251001-v1:0 + model_capabilities: + vision: true + context_window: 200000 + max_output_tokens: 64000 +``` + +### Model capability reference + +| Model | `vision` | `context_window` | `max_output_tokens` | +|---|---|---|---| +| Claude Haiku 4.5 | `true` | `200000` | `64000` | +| Claude Sonnet 4.6 | `true` | `1000000` | `64000` | +| Claude Opus 4.7 | `true` | `1000000` | `128000` | +| MiniMax M2.5 | `false` | `196000` | `8000` | + +### IAM permissions + +The IAM principal (user, role, or instance profile) needs: + +```json +{ + "Effect": "Allow", + "Action": [ + "bedrock:InvokeModel", + "bedrock:InvokeModelWithResponseStream" + ], + "Resource": "arn:aws:bedrock:*::foundation-model/*" +} +``` + +For cross-region inference, also allow: + +```json +{ + "Effect": "Allow", + "Action": [ + "bedrock:InvokeModel", + "bedrock:InvokeModelWithResponseStream" + ], + "Resource": "arn:aws:bedrock:*:*:inference-profile/*" +} +``` + +### Terraform snippet + +```hcl +resource "aws_iam_policy" "bedrock_invoke" { + name = "bedrock-invoke" + + policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Effect = "Allow" + Action = [ + "bedrock:InvokeModel", + "bedrock:InvokeModelWithResponseStream", + ] + Resource = [ + "arn:aws:bedrock:*::foundation-model/*", + "arn:aws:bedrock:*:*:inference-profile/*", + ] + } + ] + }) +} +``` + +--- + +## Path 2: Mantle — Anthropic Messages API + +Mantle exposes the Anthropic Messages API for supported Claude models. Fast-agent's `anthropic` provider uses the Anthropic Python SDK (`AsyncAnthropic`), which calls `/v1/messages` — exactly what Mantle serves at `https://bedrock-mantle.{region}.api.aws/anthropic`. + +**Supported models:** Claude Haiku 4.5, Claude Opus 4.7. Claude Sonnet 4.6 does **not** have a Mantle endpoint and must use [Path 1](#path-1-direct-bedrock-converse-api). + +> **Note on Opus 4.7 and Chat Completions:** The AWS model card notes that Opus 4.7 does not support Chat Completions on Mantle. This does not affect fast-agent — the `anthropic` provider uses the Anthropic Messages API, not Chat Completions. + +### Prerequisites + +1. **Generate a long-term Bedrock API key** from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create). + +2. **Enable model access** in the Bedrock console for your target region. + +3. No additional Python packages needed — `anthropic` is already a fast-agent dependency. + +### `fastagent.config.yaml` + +```yaml +default_model: anthropic.claude-opus-4-7 + +# ── Model Capabilities ────────────────────────────────────────────────────── +# mantle: true is REQUIRED — it installs a Pallas-level provider override that +# strips the features the Mantle endpoint rejects (anthropic-beta headers, +# extended thinking, task budget, web tools, prompt caching). Without this +# flag fast-agent sends those features and Mantle returns a misleading +# 404 "model does not exist" error. +model_capabilities: + vision: true + context_window: 1000000 + max_output_tokens: 128000 + mantle: true + +# ── Anthropic provider pointing at Mantle ──────────────────────────────────── +anthropic: + base_url: "https://bedrock-mantle.us-east-1.api.aws/anthropic" +``` + +The Anthropic SDK appends `/v1/messages` to `base_url` automatically. + +> **Why `mantle: true` is required.** Fast-agent's built-in `ModelDatabase` +> entries for Claude Opus 4.7 and Haiku 4.5 declare features that the +> Anthropic API supports but the Mantle endpoint rejects — +> `anthropic-beta: code-execution-web-tools-...` headers, extended thinking, +> task budget, web search/fetch tools, and prompt caching in some +> configurations. When Mantle sees a request carrying those features it +> responds with a confusingly generic `{"type": "not_found_error", +> "message": "The model '...' does not exist"}`. Pallas reads the `mantle` +> flag and writes an entry into fast-agent's `_PROVIDER_MODEL_OVERRIDES` +> dict for `(Provider.ANTHROPIC, )` that strips those fields, so +> fast-agent sends a plain Messages API request that Mantle accepts. + + +### `fastagent.secrets.yaml` + +```yaml +anthropic: + api_key: "${BEDROCK_API_KEY}" +``` + +### `.env` + +```dotenv +BEDROCK_API_KEY=your-bedrock-long-term-api-key +``` + +### `agents.yaml` + +No Bedrock-specific changes needed. Example: + +```yaml +name: my-project +version: "1.0.0" +host: my-host.example.com +registry_port: 8200 + +agents: + jarvis: + module: agents.jarvis + port: 8201 + title: Jarvis + description: "My assistant" +``` + +### IAM permissions + +No IAM permissions are required when using a long-term Bedrock API key. The key itself carries the necessary access. If you need to restrict which models the key can invoke, use resource-based policies in the Bedrock console. + +--- + +## Path 3: Mantle — OpenAI Chat Completions + +Mantle exposes an OpenAI-compatible Chat Completions endpoint (`/v1`) for non-Anthropic models such as MiniMax M2.5. Fast-agent's `openai` provider (or `generic` provider) can point at this endpoint. + +**Supported models:** MiniMax M2.5 (`minimax.minimax-m2.5`), and any other Bedrock model that Mantle exposes via Chat Completions. + +### Prerequisites + +1. **Generate a long-term Bedrock API key** from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create). + +2. **Enable model access** in the Bedrock console for your target region. + +### `fastagent.config.yaml` + +```yaml +default_model: openai.minimax.minimax-m2.5 + +# ── Model Capabilities ────────────────────────────────────────────────────── +model_capabilities: + vision: false + context_window: 196000 + max_output_tokens: 8000 + +# ── OpenAI provider pointing at Mantle ─────────────────────────────────────── +openai: + base_url: "https://bedrock-mantle.us-east-1.api.aws/v1" +``` + +### `fastagent.secrets.yaml` + +```yaml +openai: + api_key: "${BEDROCK_API_KEY}" +``` + +### `.env` + +```dotenv +BEDROCK_API_KEY=your-bedrock-long-term-api-key +``` + +--- + +## Health Checks + +### Startup preflight + +Pallas's `validate_llm_providers()` runs at startup and checks: + +| Provider | What is checked | +|---|---| +| `anthropic` | `GET {base_url}/v1/models/{model}` — confirms model exists and key is valid | +| `openai` | `GET {base_url}/models` — lists models, confirms configured model is present | +| `bedrock` | **No preflight check** — credential errors surface on the first inference call | + +For the `bedrock` provider, startup will succeed even with missing or invalid credentials. The first agent call will raise a `ProviderKeyError` with a message directing you to configure AWS credentials. + +### Runtime `get_health` tool + +The `get_health` MCP tool probes downstream MCP servers regardless of which LLM provider is active. LLM provider health (from the startup preflight) is included in the response for `anthropic` and `openai` providers. For `bedrock`, the LLM section of the health response will be absent. + +--- + +## Troubleshooting + +### `NoCredentialsError` / `ProviderKeyError: AWS credentials not found` + +The `bedrock` provider could not find AWS credentials. Check in order: + +1. Is `AWS_BEARER_TOKEN_BEDROCK` set in `.env` or the environment? +2. Is `~/.aws/credentials` present and does it contain the expected profile? +3. Is the IAM role attached to the instance/container? + +### Model not found in `ModelDatabase` + +``` +KeyError: 'anthropic.claude-sonnet-4-6' +``` + +Pallas requires `model_capabilities` in `fastagent.config.yaml` for any model not in fast-agent's built-in database. All Bedrock model IDs fall into this category. Add: + +```yaml +model_capabilities: + vision: true # or false + context_window: 1000000 + max_output_tokens: 64000 +``` + +### `ValidationError` on `default_model` + +The `default_model` format must be `provider.model-id`. Examples: + +```yaml +default_model: bedrock.us.anthropic.claude-sonnet-4-6 # Direct Bedrock, geo inference +default_model: bedrock.anthropic.claude-sonnet-4-6 # Direct Bedrock, in-region +default_model: anthropic.claude-opus-4-7 # Mantle via Anthropic provider +default_model: openai.minimax.minimax-m2.5 # Mantle via OpenAI provider +``` + +### Cross-region inference access denied + +If you use a geo inference ID (e.g. `us.anthropic.claude-sonnet-4-6`) and receive an access denied error, ensure the IAM policy includes `arn:aws:bedrock:*:*:inference-profile/*` in the `Resource` list. In-region model IDs do not require this. + +### Mantle 401 Unauthorized + +The Bedrock long-term API key is invalid or expired. Regenerate it from the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create) and update `BEDROCK_API_KEY` in `.env`. + +### Claude Sonnet 4.6 on Mantle returns 404 + +Claude Sonnet 4.6 does not have a Mantle endpoint. Use the `bedrock` provider (Path 1) with model ID `anthropic.claude-sonnet-4-6` or the geo inference ID `us.anthropic.claude-sonnet-4-6`. diff --git a/pallas/server.py b/pallas/server.py index be88a8f..e25c1a5 100644 --- a/pallas/server.py +++ b/pallas/server.py @@ -123,33 +123,85 @@ def _preflight_mcp_servers(agent_name: str, servers: dict[str, dict]) -> None: # ── Model registration ──────────────────────────────────────────────────────── def _register_one_model(model_spec: str, capabilities: dict) -> None: - """Register a single model with fast-agent's ModelDatabase if unknown.""" + """Register a single model with fast-agent's ModelDatabase. + + Two cases: + + 1. **Unknown model** — if fast-agent has no built-in entry for this model, + register a minimal ``ModelParameters`` with the declared capabilities. + + 2. **Mantle-hosted model** (``capabilities.mantle: true``) — regardless of + whether the model has a built-in entry, install a provider-specific + override for ``(Provider.ANTHROPIC, model_name)`` in + ``_PROVIDER_MODEL_OVERRIDES`` that strips the features the AWS Bedrock + Mantle endpoint rejects: + + - ``anthropic_required_betas`` (no ``anthropic-beta`` header) + - ``reasoning`` / ``reasoning_effort_spec`` (no extended-thinking request) + - ``anthropic_task_budget_supported`` + - ``anthropic_web_fetch_version`` / ``anthropic_web_search_version`` + - ``cache_ttl`` (prompt caching is not advertised as supported on + Mantle for every model; disable the cache planner by default) + + Without this override fast-agent sends beta headers and ``thinking`` + parameters that Mantle rejects with a misleading ``"model does not + exist"`` 404. + """ from fast_agent.llm.model_database import ModelDatabase, ModelParameters + from fast_agent.llm.provider_types import Provider model_name = model_spec.split(".", 1)[-1] if "." in model_spec else model_spec - if ModelDatabase.get_model_params(model_name) is not None: - return - is_vision = capabilities.get("vision", False) context_window = capabilities.get("context_window", 131072) max_output_tokens = capabilities.get("max_output_tokens", 16384) + is_mantle = capabilities.get("mantle", False) - if is_vision: - tokenizes = list(ModelDatabase.QWEN_MULTIMODAL) - logger.info("Registered model '%s' with vision capabilities", model_name) + existing = ModelDatabase.get_model_params(model_name) + + if existing is None: + # Unknown model — register a fresh runtime entry. + if is_vision: + tokenizes = list(ModelDatabase.QWEN_MULTIMODAL) + logger.info("Registered model '%s' with vision capabilities", model_name) + else: + tokenizes = list(ModelDatabase.TEXT_ONLY) + logger.info("Registered model '%s' as text-only", model_name) + + ModelDatabase.register_runtime_model_params( + model_name, + ModelParameters( + context_window=context_window, + max_output_tokens=max_output_tokens, + tokenizes=tokenizes, + ), + ) + base_params = ModelDatabase.get_model_params(model_name) else: - tokenizes = list(ModelDatabase.TEXT_ONLY) - logger.info("Registered model '%s' as text-only", model_name) + base_params = existing + + if is_mantle and base_params is not None: + # Clone the base params and strip Mantle-incompatible features. + override = base_params.model_copy( + update={ + "context_window": context_window, + "max_output_tokens": max_output_tokens, + "anthropic_required_betas": None, + "reasoning": None, + "reasoning_effort_spec": None, + "anthropic_task_budget_supported": False, + "anthropic_web_fetch_version": None, + "anthropic_web_search_version": None, + "cache_ttl": None, + } + ) + normalized = ModelDatabase.normalize_model_name(model_name) + ModelDatabase._PROVIDER_MODEL_OVERRIDES[(Provider.ANTHROPIC, normalized)] = override + logger.info( + "Registered Mantle override for anthropic/'%s' (strips beta headers, thinking, web tools, caching)", + model_name, + ) - ModelDatabase.register_runtime_model_params( - model_name, - ModelParameters( - context_window=context_window, - max_output_tokens=max_output_tokens, - tokenizes=tokenizes, - ), - ) def _register_unknown_models(deployment_config: dict) -> None: