docs(pallas): expand LLM preflight docs and refactor health probes

2026-05-12 15:04:57 -04:00
parent 75d529cf16
commit df40d32d80
4 changed files with 3574 additions and 104 deletions
--- a/docs/bedrock.md
+++ b/docs/bedrock.md
@@ -329,19 +329,24 @@ BEDROCK_API_KEY=your-bedrock-long-term-api-key

 ### Startup preflight

-Pallas's `validate_llm_providers()` runs at startup and checks:
+Pallas's `validate_llm_providers()` runs at startup and caches a status for the *active* provider (the one named by `default_model`). The cached value is read back by `get_health()` on every MCP `get_health` tool call, so Daedalus (or any headless consumer) can see *why* an agent is degraded when there's no fast-agent TUI to surface it.

-| Provider | What is checked |
+Preflight probes are deliberately chosen to be **free of inference tokens**. Each provider has a dedicated probe:
+
+| Provider | Probe |
 |---|---|
-| `anthropic` | `GET {base_url}/v1/models/{model}` — confirms model exists and key is valid |
+| `anthropic` (direct — `api.anthropic.com` or empty `base_url`) | `GET {base_url}/models/{model}` — confirms model exists and the API key is valid |
+| `anthropic` (Mantle — `bedrock-mantle.{region}.api.aws/anthropic`) | `GET {region_root}/v1/models/{wire_model}` — Mantle serves its model catalogue at the **region root**, not under `/anthropic`; Pallas strips the `/anthropic` suffix and applies `pallas.mantle_shims.MANTLE_WIRE_NAMES` to turn `claude-opus-4-7` into `anthropic.claude-opus-4-7`. The IAM policy for the long-term Bedrock API key must include `bedrock-mantle:ListModels` / `bedrock-mantle:GetModel` for this probe to return 200. |
 | `openai` | `GET {base_url}/models` — lists models, confirms configured model is present |
-| `bedrock` | **No preflight check** — credential errors surface on the first inference call |
+| `generic` | `GET {base_url}/models` — status-code-only probe (body is not inspected). llama.cpp's `/v1/models` response isn't strictly OpenAI-shaped and users hot-swap models by name, so a 200 is enough |
+| `bedrock` | **No HTTP request.** `ok` when any of `AWS_BEARER_TOKEN_BEDROCK`, `AWS_ACCESS_KEY_ID`+`AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`, or `~/.aws/credentials` is present; `error` otherwise. Bedrock's Converse API has no cheap health endpoint and the first inference call will surface any real credential problem within seconds |
+| Unknown / malformed provider | No HTTP request; `error: unknown provider 'X' in default_model`. A mistyped `default_model` is reported as an explicit error instead of silently looking like an ordinary degraded provider |

-For the `bedrock` provider, startup will succeed even with missing or invalid credentials. The first agent call will raise a `ProviderKeyError` with a message directing you to configure AWS credentials.
+API key resolution for every provider goes through `fast_agent.llm.provider_key_manager.ProviderKeyManager.get_api_key`, so the preflight reads keys from the exact same place the real LLM client does — config file, env var, Codex OAuth, HF hub, etc. Duplicate key-loading logic inside `pallas.health` has been removed.

 ### Runtime `get_health` tool

-The `get_health` MCP tool probes downstream MCP servers regardless of which LLM provider is active. LLM provider health (from the startup preflight) is included in the response for `anthropic` and `openai` providers. For `bedrock`, the LLM section of the health response will be absent.
+The `get_health` MCP tool probes downstream MCP servers on every call and includes the cached LLM preflight status in the response. If the active provider's cached status isn't `ok`, `get_health` returns `status: degraded` and adds an `LLM: <provider>: <message>` entry to the `message` field.

 ---
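
To make two of the probe rules in the table concrete, here is a minimal sketch of the Mantle URL derivation and the Bedrock credential-presence check. It is not the code from `pallas.health`: the function names, the `WIRE_NAMES` dict (a stand-in for `pallas.mantle_shims.MANTLE_WIRE_NAMES`), and the choice to pass the environment and credentials path as arguments are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for pallas.mantle_shims.MANTLE_WIRE_NAMES.
WIRE_NAMES = {"claude-opus-4-7": "anthropic.claude-opus-4-7"}


def mantle_probe_url(base_url: str, model: str) -> str:
    """Build the model-lookup URL at the region root for a Mantle base_url."""
    parts = urlsplit(base_url.rstrip("/"))
    path = parts.path
    # The catalogue lives at the region root, so drop the /anthropic suffix.
    if path.endswith("/anthropic"):
        path = path[: -len("/anthropic")]
    # Map the configured model name to its wire name; unknown names pass through.
    wire_model = WIRE_NAMES.get(model, model)
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}/v1/models/{wire_model}", "", "")
    )


def bedrock_credentials_present(env: Mapping[str, str], credentials_file: Path) -> bool:
    """Return True when any credential source from the table is present.

    No HTTP request is made; this only checks that something is configured.
    """
    if env.get("AWS_BEARER_TOKEN_BEDROCK"):
        return True
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return True
    if env.get("AWS_PROFILE"):
        return True
    return credentials_file.is_file()
```

In real use the second function would presumably be called with `os.environ` and `Path.home() / ".aws" / "credentials"`; taking both as parameters keeps the sketch easy to test without touching the host's AWS configuration.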