pallas/docs/bedrock.md

# AWS Bedrock Integration

Pallas supports AWS Bedrock through three integration paths, depending on the model and endpoint:

| Path | fast-agent provider | Auth | Use when |
|---|---|---|---|
| [Direct Bedrock](#path-1-direct-bedrock-converse-api) | `bedrock` | AWS IAM / long-term key | Any Bedrock model; required for Sonnet 4.6 |
| [Mantle → Anthropic](#path-2-mantle-anthropic-messages-api) | `anthropic` | Bedrock long-term API key | Claude models with Mantle support (Haiku 4.5, Opus 4.7) |
| [Mantle → OpenAI](#path-3-mantle-openai-chat-completions) | `openai` | Bedrock long-term API key | Non-Anthropic models on Mantle (MiniMax M2.5, etc.) |

**Mantle** is AWS's OpenAI-compatible and Anthropic-compatible gateway for Bedrock. It simplifies authentication (one long-term API key instead of IAM credential management) and is the recommended path when the target model supports it.

---

## Supported Models

| Model | Bedrock model ID | Direct Bedrock | Mantle |
|---|---|---|---|
| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | ✓ | ✓ (Anthropic Messages API) |
| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | ✓ | ✗ |
| Claude Opus 4.7 | `anthropic.claude-opus-4-7` | ✓ | ✓ (Anthropic Messages API) |
| MiniMax M2.5 | `minimax.minimax-m2.5` | ✓ | ✓ (OpenAI Chat Completions) |

Cross-region inference IDs (e.g. `us.anthropic.claude-opus-4-7`, `eu.anthropic.claude-sonnet-4-6`) can be used as the model ID for the `bedrock` provider to route across regions within a geography for higher throughput.

---

## Path 1: Direct Bedrock (Converse API)

Fast-agent's `bedrock` provider calls the AWS Bedrock Converse API via `boto3`. This path works for all Bedrock models and is the only option for models without Mantle support (e.g. Claude Sonnet 4.6).

### Prerequisites

1. **Install `boto3`** — not included in fast-agent by default:

   ```toml
   # pyproject.toml
   dependencies = [
       "pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git",
       "boto3",
   ]
   ```

2. **AWS credentials** — the Bedrock provider uses the standard AWS credential chain in priority order:
   - `AWS_BEARER_TOKEN_BEDROCK` environment variable (long-term Bedrock API key — see below)
   - `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` environment variables
   - `~/.aws/credentials` file (named profile or `default`)
   - IAM instance role (EC2, ECS, Lambda)

   The simplest approach for a server deployment is a **long-term Bedrock API key** generated from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create). Set it as `AWS_BEARER_TOKEN_BEDROCK`.

3. **Enable model access** in the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/modelaccess) for your target region.

### `fastagent.config.yaml`

```yaml
default_model: bedrock.us.anthropic.claude-sonnet-4-6

# ── Model Capabilities ──────────────────────────────────────────────────────
# Required: Bedrock model IDs are not in fast-agent's ModelDatabase.
model_capabilities:
  vision: true                  # true for Claude models (image input supported)
  context_window: 1000000       # 1M for Sonnet 4.6
  max_output_tokens: 64000

# ── Bedrock provider ─────────────────────────────────────────────────────────
bedrock:
  region: us-east-1             # or set AWS_REGION / AWS_DEFAULT_REGION
  profile: default              # optional; or set AWS_PROFILE
  reasoning: medium             # optional: minimal | low | medium | high
```

The `default_model` format is `bedrock.<model-id>`. Use a cross-region inference ID (e.g. `us.anthropic.claude-sonnet-4-6`) for geo-distributed routing, or the plain model ID (e.g. `anthropic.claude-sonnet-4-6`) for in-region only.

### `fastagent.secrets.yaml`

No API key entry is needed — credentials come from the AWS credential chain. If you are using a long-term Bedrock API key, set it in `.env` or the environment:

```yaml
# fastagent.secrets.yaml — nothing required for Bedrock credentials
# AWS credentials are read from environment variables or ~/.aws/credentials
```

### `.env`

```dotenv
# Long-term Bedrock API key (recommended for server deployments)
AWS_BEARER_TOKEN_BEDROCK=your-bedrock-api-key

# Or use IAM access keys
# AWS_ACCESS_KEY_ID=AKIA...
# AWS_SECRET_ACCESS_KEY=...

AWS_REGION=us-east-1
```

### `agents.yaml`

No Bedrock-specific changes are needed. The `default_model` in `fastagent.config.yaml` is picked up automatically:

```yaml
name: my-project
version: "1.0.0"
host: my-host.example.com
registry_port: 8200

agents:
  jarvis:
    module: agents.jarvis
    port: 8201
    title: Jarvis
    description: "My assistant"
```

To use a different Bedrock model for a specific agent, set `model` on the agent entry:

```yaml
agents:
  jarvis:
    module: agents.jarvis
    port: 8201
    model: bedrock.us.anthropic.claude-haiku-4-5-20251001-v1:0
    model_capabilities:
      vision: true
      context_window: 200000
      max_output_tokens: 64000
```

### Model capability reference

| Model | `vision` | `context_window` | `max_output_tokens` |
|---|---|---|---|
| Claude Haiku 4.5 | `true` | `200000` | `64000` |
| Claude Sonnet 4.6 | `true` | `1000000` | `64000` |
| Claude Opus 4.7 | `true` | `1000000` | `128000` |
| MiniMax M2.5 | `false` | `196000` | `8000` |

### IAM permissions

The IAM principal (user, role, or instance profile) needs:

```json
{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": "arn:aws:bedrock:*::foundation-model/*"
}
```

For cross-region inference, also allow:

```json
{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": "arn:aws:bedrock:*:*:inference-profile/*"
}
```

### Terraform snippet

```hcl
resource "aws_iam_policy" "bedrock_invoke" {
  name = "bedrock-invoke"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream",
        ]
        Resource = [
          "arn:aws:bedrock:*::foundation-model/*",
          "arn:aws:bedrock:*:*:inference-profile/*",
        ]
      }
    ]
  })
}
```

---

## Path 2: Mantle — Anthropic Messages API

Mantle exposes the Anthropic Messages API for supported Claude models. Fast-agent's `anthropic` provider uses the Anthropic Python SDK (`AsyncAnthropic`), which calls `/v1/messages` — exactly what Mantle serves at `https://bedrock-mantle.{region}.api.aws/anthropic`.

**Supported models:** Claude Haiku 4.5, Claude Opus 4.7. Claude Sonnet 4.6 does **not** have a Mantle endpoint and must use [Path 1](#path-1-direct-bedrock-converse-api).

> **Note on Opus 4.7 and Chat Completions:** The AWS model card notes that Opus 4.7 does not support Chat Completions on Mantle. This does not affect fast-agent — the `anthropic` provider uses the Anthropic Messages API, not Chat Completions.

### Prerequisites

1. **Generate a long-term Bedrock API key** from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create).

2. **Enable model access** in the Bedrock console for your target region.

3. No additional Python packages needed — `anthropic` is already a fast-agent dependency.

### `fastagent.config.yaml`

```yaml
default_model: anthropic.claude-opus-4-7

# ── Anthropic provider pointing at Mantle ────────────────────────────────────
anthropic:
  base_url: "https://bedrock-mantle.us-east-1.api.aws/anthropic"
```

That's the whole configuration. Pallas auto-detects the
`bedrock-mantle` hostname in `anthropic.base_url` at startup and installs
two compatibility shims so fast-agent's default request shape matches
what Mantle expects (see `pallas/mantle_shims.py`):

1. **Wire-name prefix** — re-adds the `anthropic.` prefix that fast-agent's
   parser strips off, because Mantle requires the full
   `anthropic.claude-opus-4-7` wire id. Without this shim you get
   `404 "The model '...' does not exist"`.

2. **`caller: null` strip** — drops the stray `caller` field Anthropic
   SDK 0.100.x leaks onto replayed `tool_use` blocks (upstream issue
   [anthropics/anthropic-sdk-python#1454](https://github.com/anthropics/anthropic-sdk-python/issues/1454)).
   Mantle's validator rejects `caller: null` with `"tool_use.caller:
   Input should be a valid dictionary or object"`, which would otherwise
   break the MCP tool-use loop on the second turn.

The Anthropic SDK appends `/v1/messages` to `base_url` automatically.

**Feature support.** Mantle accepts the same Messages API request shape
as `api.anthropic.com` once the shims are in place, including full MCP
tool use (`tools`, `tool_use`/`tool_result` content blocks). Extended
thinking, task budget, web_fetch/web_search server tools, and explicit
prompt caching (`cache_control`) are not available via Mantle and should
be left off in agent code when targeting Mantle — fast-agent's
`ModelDatabase` entries already disable the ones the Anthropic SDK 0.100.x
would otherwise auto-attach.


### `fastagent.secrets.yaml`

```yaml
anthropic:
  api_key: "${BEDROCK_API_KEY}"
```

### `.env`

```dotenv
BEDROCK_API_KEY=your-bedrock-long-term-api-key
```

### `agents.yaml`

No Bedrock-specific changes needed. Example:

```yaml
name: my-project
version: "1.0.0"
host: my-host.example.com
registry_port: 8200

agents:
  jarvis:
    module: agents.jarvis
    port: 8201
    title: Jarvis
    description: "My assistant"
```

### IAM permissions

No IAM permissions are required when using a long-term Bedrock API key. The key itself carries the necessary access. If you need to restrict which models the key can invoke, use resource-based policies in the Bedrock console.

---

## Path 3: Mantle — OpenAI Chat Completions

Mantle exposes an OpenAI-compatible Chat Completions endpoint (`/v1`) for non-Anthropic models such as MiniMax M2.5. Fast-agent's `openai` provider (or `generic` provider) can point at this endpoint.

**Supported models:** MiniMax M2.5 (`minimax.minimax-m2.5`), and any other Bedrock model that Mantle exposes via Chat Completions.

### Prerequisites

1. **Generate a long-term Bedrock API key** from the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create).

2. **Enable model access** in the Bedrock console for your target region.

### `fastagent.config.yaml`

```yaml
default_model: openai.minimax.minimax-m2.5

# ── Model Capabilities ──────────────────────────────────────────────────────
model_capabilities:
  vision: false
  context_window: 196000
  max_output_tokens: 8000

# ── OpenAI provider pointing at Mantle ───────────────────────────────────────
openai:
  base_url: "https://bedrock-mantle.us-east-1.api.aws/v1"
```

### `fastagent.secrets.yaml`

```yaml
openai:
  api_key: "${BEDROCK_API_KEY}"
```

### `.env`

```dotenv
BEDROCK_API_KEY=your-bedrock-long-term-api-key
```

---

## Health Checks

### Startup preflight

Pallas's `validate_llm_providers()` runs at startup and caches a status for the *active* provider (the one named by `default_model`). The cached value is read back by `get_health()` on every MCP `get_health` tool call, so Daedalus (or any headless consumer) can see *why* an agent is degraded when there's no fast-agent TUI to surface it.

Preflight probes are deliberately chosen to be **free of inference tokens**. Each provider has a dedicated probe:

| Provider | Probe |
|---|---|
| `anthropic` (direct — `api.anthropic.com` or empty `base_url`) | `GET {base_url}/models/{model}` — confirms model exists and the API key is valid |
| `anthropic` (Mantle — `bedrock-mantle.{region}.api.aws/anthropic`) | `GET {region_root}/v1/models/{wire_model}` — Mantle serves its model catalogue at the **region root**, not under `/anthropic`; Pallas strips the `/anthropic` suffix and applies `pallas.mantle_shims.MANTLE_WIRE_NAMES` to turn `claude-opus-4-7` into `anthropic.claude-opus-4-7`. The IAM policy for the long-term Bedrock API key must include `bedrock-mantle:ListModels` / `bedrock-mantle:GetModel` for this probe to return 200. |
| `openai` | `GET {base_url}/models` — lists models, confirms configured model is present |
| `generic` | `GET {base_url}/models` — status-code-only probe (body is not inspected). llama.cpp's `/v1/models` response isn't strictly OpenAI-shaped and users hot-swap models by name, so a 200 is enough |
| `bedrock` | **No HTTP request.** `ok` when any of `AWS_BEARER_TOKEN_BEDROCK`, `AWS_ACCESS_KEY_ID`+`AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`, or `~/.aws/credentials` is present; `error` otherwise. Bedrock's Converse API has no cheap health endpoint and the first inference call will surface any real credential problem within seconds |
| Unknown / malformed provider | No HTTP request; `error: unknown provider 'X' in default_model`. Prevents silent "looks degraded" lies when `default_model` is mistyped |

API key resolution for every provider goes through `fast_agent.llm.provider_key_manager.ProviderKeyManager.get_api_key`, so the preflight reads keys from the exact same place the real LLM client does — config file, env var, Codex OAuth, HF hub, etc. Duplicate key-loading logic inside `pallas.health` has been removed.

### Runtime `get_health` tool

The `get_health` MCP tool probes downstream MCP servers on every call and includes the cached LLM preflight status in the response. If the active provider's cached status isn't `ok`, `get_health` returns `status: degraded` with an `LLM: <provider>: <message>` prefix appended to the `message` field.

---

## Troubleshooting

### `NoCredentialsError` / `ProviderKeyError: AWS credentials not found`

The `bedrock` provider could not find AWS credentials. Check in order:

1. Is `AWS_BEARER_TOKEN_BEDROCK` set in `.env` or the environment?
2. Is `~/.aws/credentials` present and does it contain the expected profile?
3. Is the IAM role attached to the instance/container?

### Model not found in `ModelDatabase`

```
KeyError: 'anthropic.claude-sonnet-4-6'
```

Pallas requires `model_capabilities` in `fastagent.config.yaml` for any model not in fast-agent's built-in database. All Bedrock model IDs fall into this category. Add:

```yaml
model_capabilities:
  vision: true          # or false
  context_window: 1000000
  max_output_tokens: 64000
```

### `ValidationError` on `default_model`

The `default_model` format must be `provider.model-id`. Examples:

```yaml
default_model: bedrock.us.anthropic.claude-sonnet-4-6   # Direct Bedrock, geo inference
default_model: bedrock.anthropic.claude-sonnet-4-6       # Direct Bedrock, in-region
default_model: anthropic.claude-opus-4-7                 # Mantle via Anthropic provider
default_model: openai.minimax.minimax-m2.5               # Mantle via OpenAI provider
```

### Cross-region inference access denied

If you use a geo inference ID (e.g. `us.anthropic.claude-sonnet-4-6`) and receive an access denied error, ensure the IAM policy includes `arn:aws:bedrock:*:*:inference-profile/*` in the `Resource` list. In-region model IDs do not require this.

### Mantle 401 Unauthorized

The Bedrock long-term API key is invalid or expired. Regenerate it from the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/api-keys/long-term/create) and update `BEDROCK_API_KEY` in `.env`.

### Claude Sonnet 4.6 on Mantle returns 404

Claude Sonnet 4.6 does not have a Mantle endpoint. Use the `bedrock` provider (Path 1) with model ID `anthropic.claude-sonnet-4-6` or the geo inference ID `us.anthropic.claude-sonnet-4-6`.