docs(pallas): expand LLM preflight docs and refactor health probes

2026-05-12 15:04:57 -04:00
parent 75d529cf16
commit df40d32d80
4 changed files with 3574 additions and 104 deletions


@@ -329,19 +329,24 @@ BEDROCK_API_KEY=your-bedrock-long-term-api-key
 ### Startup preflight

-Pallas's `validate_llm_providers()` runs at startup and checks:
+Pallas's `validate_llm_providers()` runs at startup and caches a status for the *active* provider (the one named by `default_model`). The cached value is read back by `get_health()` on every MCP `get_health` tool call, so Daedalus (or any headless consumer) can see *why* an agent is degraded when there's no fast-agent TUI to surface it.
+
+Preflight probes are deliberately chosen to be **free of inference tokens**. Each provider has a dedicated probe:

-| Provider | What is checked |
+| Provider | Probe |
 |---|---|
-| `anthropic` | `GET {base_url}/v1/models/{model}` — confirms model exists and key is valid |
+| `anthropic` (direct — `api.anthropic.com` or empty `base_url`) | `GET {base_url}/models/{model}` — confirms model exists and the API key is valid |
+| `anthropic` (Mantle — `bedrock-mantle.{region}.api.aws/anthropic`) | `GET {region_root}/v1/models/{wire_model}` — Mantle serves its model catalogue at the **region root**, not under `/anthropic`; Pallas strips the `/anthropic` suffix and applies `pallas.mantle_shims.MANTLE_WIRE_NAMES` to turn `claude-opus-4-7` into `anthropic.claude-opus-4-7`. The IAM policy for the long-term Bedrock API key must include `bedrock-mantle:ListModels` / `bedrock-mantle:GetModel` for this probe to return 200. |
 | `openai` | `GET {base_url}/models` — lists models, confirms configured model is present |
-| `bedrock` | **No preflight check** — credential errors surface on the first inference call |
+| `generic` | `GET {base_url}/models` — status-code-only probe (body is not inspected). llama.cpp's `/v1/models` response isn't strictly OpenAI-shaped and users hot-swap models by name, so a 200 is enough |
+| `bedrock` | **No HTTP request.** `ok` when any of `AWS_BEARER_TOKEN_BEDROCK`, `AWS_ACCESS_KEY_ID`+`AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE`, or `~/.aws/credentials` is present; `error` otherwise. Bedrock's Converse API has no cheap health endpoint and the first inference call will surface any real credential problem within seconds |
+| Unknown / malformed provider | No HTTP request; `error: unknown provider 'X' in default_model`. Prevents silent "looks degraded" lies when `default_model` is mistyped |

-For the `bedrock` provider, startup will succeed even with missing or invalid credentials. The first agent call will raise a `ProviderKeyError` with a message directing you to configure AWS credentials.
+API key resolution for every provider goes through `fast_agent.llm.provider_key_manager.ProviderKeyManager.get_api_key`, so the preflight reads keys from the exact same place the real LLM client does — config file, env var, Codex OAuth, HF hub, etc. Duplicate key-loading logic inside `pallas.health` has been removed.

 ### Runtime `get_health` tool

-The `get_health` MCP tool probes downstream MCP servers regardless of which LLM provider is active. LLM provider health (from the startup preflight) is included in the response for `anthropic` and `openai` providers. For `bedrock`, the LLM section of the health response will be absent.
+The `get_health` MCP tool probes downstream MCP servers on every call and includes the cached LLM preflight status in the response. If the active provider's cached status isn't `ok`, `get_health` returns `status: degraded` with an `LLM: <provider>: <message>` prefix appended to the `message` field.

 ---
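The probe table above can be read as a small routing function. The sketch below is a hypothetical reconstruction from the table, not code from this commit: `probe_request`, `DEFAULT_BASES`, the one-entry `WIRE_NAMES` subset, and the substring test for Mantle URLs are assumptions (the real code calls `pallas.mantle_shims.is_mantle_base_url` and `MANTLE_WIRE_NAMES`, whose bodies are not shown here, and also consults `*_BASE_URL` env vars). It returns the HTTP method and URL a preflight would hit, or `None` where the table says no request is made.

```python
from typing import Optional, Tuple

# Defaults copied from the _*_DEFAULT_API constants in this commit; the real
# code also consults config files and *_BASE_URL env vars, omitted here.
DEFAULT_BASES = {
    "anthropic": "https://api.anthropic.com/v1",
    "openai": "https://api.openai.com/v1",
    "generic": "http://localhost:11434/v1",
}
# Hypothetical one-entry subset of pallas.mantle_shims.MANTLE_WIRE_NAMES.
WIRE_NAMES = {"claude-opus-4-7": "anthropic.claude-opus-4-7"}


def probe_request(provider: str, base_url: str, model: str) -> Optional[Tuple[str, str]]:
    """Return (method, url) for the token-free preflight, or None when no HTTP call is made."""
    base = base_url.rstrip("/") or DEFAULT_BASES.get(provider, "")
    if provider == "anthropic":
        # Assumption: a substring test stands in for is_mantle_base_url().
        if "bedrock-mantle." in base:
            # The Mantle catalogue lives at the region root, so drop /anthropic.
            if base.endswith("/anthropic"):
                base = base[: -len("/anthropic")]
            return "GET", f"{base}/v1/models/{WIRE_NAMES.get(model, model)}"
        return "GET", f"{base}/models/{model}"
    if provider in ("openai", "generic"):
        return "GET", f"{base}/models"
    # bedrock and unknown/malformed providers: no HTTP request at all.
    return None
```

For example, `probe_request("anthropic", "https://bedrock-mantle.us-east-1.api.aws/anthropic", "claude-opus-4-7")` gives `("GET", "https://bedrock-mantle.us-east-1.api.aws/v1/models/anthropic.claude-opus-4-7")`, the same URL the Mantle tests in this commit expect.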


@@ -1,8 +1,32 @@
 """
 Health check module for Pallas.

-Probes downstream MCP server connectivity and exposes a get_health MCP tool.
-Validates LLM provider API keys and model availability at startup.
+Probes downstream MCP server connectivity and exposes a ``get_health`` MCP
+tool. At startup, :func:`validate_llm_providers` runs a cheap, per-provider
+preflight so Daedalus (or any headless consumer) can see *why* an agent
+might be degraded when there is no fast-agent TUI to surface it — see
+``docs/pallas_integration.md`` § Runtime get_health tool.
+
+Preflight dispatch matrix:
+
+========================== ======================================== =======================================
+Provider                   Probe                                    Success criterion
+========================== ======================================== =======================================
+``anthropic`` (direct)     ``GET {base_url}/models/{model}``        HTTP 200
+``anthropic`` (Mantle)     ``GET {mantle_root}/v1/models/{wire}``   HTTP 200 (wire-name via mantle_shims)
+``openai``                 ``GET {base_url}/models``                HTTP 200 and active model in list
+``generic``                ``GET {base_url}/models``                HTTP 200 (body not inspected)
+``bedrock``                none                                     ok if AWS creds resolvable
+unknown provider           none                                     error — surfaces honestly to Daedalus
+========================== ======================================== =======================================
+
+API keys are resolved via :class:`fast_agent.llm.provider_key_manager.ProviderKeyManager`
+so this module sees identical secret-loading behaviour to the real LLM
+client path. We never duplicate key-resolution logic here.
+
+Endpoints that cost inference tokens are deliberately avoided. Mantle's
+anthropic probe uses the (token-free) model catalogue at the region root
+(``/v1/models/{wire}``), *not* ``POST /anthropic/v1/messages``.
 """

 import asyncio
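The `bedrock` row of the matrix is a presence check, not a probe. A minimal, testable sketch of that rule follows. Unlike `_preflight_bedrock` in this commit, which reads `os.environ` and `Path.home()` directly, the hypothetical `bedrock_credentials_present` takes the environment and the credentials-file path as parameters so it can be exercised without touching the real machine.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def bedrock_credentials_present(
    env: Mapping[str, str] = os.environ,
    credentials_file: Optional[Path] = None,
) -> bool:
    """True if any credential source from the bedrock row is present.

    Presence only: nothing is validated against STS or Bedrock.
    """
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    have_bearer = bool(env.get("AWS_BEARER_TOKEN_BEDROCK"))
    # Both halves of the key pair are required; one alone does not count.
    have_key_pair = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    # An empty AWS_PROFILE is treated as unset.
    have_profile = bool(env.get("AWS_PROFILE"))
    return have_bearer or have_key_pair or have_profile or credentials_file.exists()
```

Because only presence is checked, an empty credentials file or a profile name that does not exist still counts as "ok"; that matches the trade-off stated in the docs, where the first inference call surfaces any real credential problem.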
@@ -12,6 +36,7 @@ import os
 import re
 from datetime import datetime, timezone
 from pathlib import Path
+from typing import Any

 import httpx
 import yaml
@@ -35,16 +60,20 @@ def _load_deployment_name() -> str:
 _DEPLOY_NAME = _load_deployment_name()

-# ── Provider API endpoints ───────────────────────────────────────────────────
-_ANTHROPIC_API = "https://api.anthropic.com/v1"
+# ── Default endpoints (only used when the provider section is missing) ───────
+_ANTHROPIC_DEFAULT_API = "https://api.anthropic.com/v1"
 _OPENAI_DEFAULT_API = "https://api.openai.com/v1"
+_GENERIC_DEFAULT_API = "http://localhost:11434/v1"

 # Populated by validate_llm_providers() at startup, read by get_health()
 _llm_status: dict[str, dict] = {}
 _active_provider: str = ""


+# ── Config loading ───────────────────────────────────────────────────────────
+
+
 def _load_dotenv() -> None:
     """Load .env file into os.environ (without overwriting existing vars)."""
     env_path = _config_root() / ".env"
@@ -70,6 +99,17 @@ def _expand_env(value: str) -> str:
     )


+def _expand_env_in_tree(obj: Any) -> Any:
+    """Recursively expand ${ENV_VAR} placeholders inside a parsed YAML tree."""
+    if isinstance(obj, str):
+        return _expand_env(obj)
+    if isinstance(obj, dict):
+        return {k: _expand_env_in_tree(v) for k, v in obj.items()}
+    if isinstance(obj, list):
+        return [_expand_env_in_tree(v) for v in obj]
+    return obj
+
+
 def _load_config() -> tuple[dict, dict]:
     """Load fastagent config and secrets YAML from the working directory."""
     root = _config_root()
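`_expand_env_in_tree` and `_merge_for_key_manager` (added in the next hunk) together produce the dict handed to `ProviderKeyManager`. The sketch below mirrors that pipeline under stated assumptions: `_expand_env`'s body is outside this diff, so `expand_env` here assumes `${NAME}` placeholders that expand to an empty string when unset, and every function takes an explicit `env` mapping (a hypothetical parameter) so it runs without touching the real environment.

```python
import os
import re
from typing import Any, Dict, Mapping

# Assumption: ${NAME} placeholders; an unset variable expands to "".
# The real _expand_env() body is not part of this diff.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str] = os.environ) -> str:
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), ""), value)


def expand_env_in_tree(obj: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Rebuild a parsed YAML tree with every string value expanded (keys are left alone)."""
    if isinstance(obj, str):
        return expand_env(obj, env)
    if isinstance(obj, dict):
        return {k: expand_env_in_tree(v, env) for k, v in obj.items()}
    if isinstance(obj, list):
        return [expand_env_in_tree(v, env) for v in obj]
    return obj  # ints, floats, bools and None pass through unchanged


def merge_for_key_manager(
    config: Dict[str, Any], secrets: Dict[str, Any], env: Mapping[str, str] = os.environ
) -> Dict[str, Any]:
    """Shallow per-provider merge (secrets applied last, so they win), then expand."""
    merged: Dict[str, Dict[str, Any]] = {}
    for source in (config or {}, secrets or {}):
        for name, settings in source.items():
            if isinstance(settings, dict):  # scalars like default_model are skipped
                # setdefault yields a fresh dict per provider, so inputs are not mutated.
                merged.setdefault(name, {}).update(settings)
    return expand_env_in_tree(merged, env)
```

The merge is shallow: a provider key present in both files takes the secrets value, and nested dicts inside a provider section are replaced rather than merged.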
@@ -79,11 +119,40 @@ def _load_config() -> tuple[dict, dict]:
     return config, secrets


-async def _check_anthropic(client: httpx.AsyncClient, api_key: str, model_id: str) -> str | None:
-    """Validate an Anthropic model. Returns None on success, error message on failure."""
+def _merge_for_key_manager(config: dict, secrets: dict) -> dict:
+    """Produce the merged, env-expanded dict that ProviderKeyManager expects.
+
+    ``ProviderKeyManager.get_config_file_key`` looks for ``<provider>.api_key``
+    in a single flat dict. It does not apply ``${ENV_VAR}`` expansion itself,
+    so we pre-expand the whole tree.
+    """
+    merged: dict = {}
+    for source in (config or {}, secrets or {}):
+        for provider_name, settings in (source or {}).items():
+            if isinstance(settings, dict):
+                target = merged.setdefault(provider_name, {})
+                if isinstance(target, dict):
+                    target.update(settings)
+    return _expand_env_in_tree(merged)
+
+
+# ── Per-provider probes ──────────────────────────────────────────────────────
+
+
+async def _check_anthropic(
+    client: httpx.AsyncClient, api_key: str, model_id: str, base_url: str
+) -> str | None:
+    """Validate an Anthropic model via ``GET {base_url}/models/{model_id}``.
+
+    Works for both the public ``api.anthropic.com`` endpoint and for AWS
+    Bedrock Mantle's region-root model catalogue
+    (``https://bedrock-mantle.{region}.api.aws/v1/models/{wire_id}``).
+
+    The caller is responsible for passing the correct ``base_url`` and the
+    wire-name form of ``model_id`` for Mantle — see ``pallas.mantle_shims``.
+    """
     try:
         resp = await client.get(
-            f"{_ANTHROPIC_API}/models/{model_id}",
+            f"{base_url.rstrip('/')}/models/{model_id}",
             headers={
                 "x-api-key": api_key,
                 "anthropic-version": "2023-06-01",
@@ -98,24 +167,6 @@ async def _check_anthropic(client: httpx.AsyncClient, api_key: str, model_id: st
     return f"API request failed ({resp.status_code})"


-async def _check_openai(
-    client: httpx.AsyncClient, api_key: str, model_id: str, base_url: str
-) -> str | None:
-    """Validate an OpenAI-compatible model. Returns None on success, error message on failure."""
-    try:
-        resp = await client.get(
-            f"{base_url.rstrip('/')}/models/{model_id}",
-            headers={"Authorization": f"Bearer {api_key}"},
-        )
-    except Exception as exc:
-        return f"API unreachable ({type(exc).__name__})"
-    if resp.status_code == 200:
-        return None
-    if resp.status_code == 404:
-        return f"model '{model_id}' not found"
-    return f"API request failed ({resp.status_code})"
-
-
 async def _list_openai_models(
     client: httpx.AsyncClient, api_key: str, base_url: str
 ) -> tuple[str | None, list[str]]:
@@ -129,96 +180,255 @@ async def _list_openai_models(
         return f"API unreachable ({type(exc).__name__})", []
     if resp.status_code != 200:
         return f"API request failed ({resp.status_code})", []
-    data = resp.json()
-    models = [m["id"] for m in data.get("data", []) if "id" in m]
+    try:
+        data = resp.json()
+    except Exception:
+        return "response was not valid JSON", []
+    models = [m["id"] for m in data.get("data", []) if isinstance(m, dict) and "id" in m]
     return None, models


+async def _check_generic(
+    client: httpx.AsyncClient, base_url: str
+) -> str | None:
+    """Status-code-only probe against ``{base_url}/models``.
+
+    The generic provider targets local/on-prem OpenAI-compatible servers
+    (llama.cpp, Ollama, vLLM, …) whose ``/v1/models`` payloads are not all
+    identical — llama.cpp mixes an Ollama-style ``models`` list with the
+    OpenAI ``data`` list, for example. We deliberately don't require the
+    configured model name to appear in the response because users hot-swap
+    models by name all the time; as long as the server is up and returns
+    200 for its catalogue we call it ok.
+    """
+    try:
+        resp = await client.get(f"{base_url.rstrip('/')}/models")
+    except Exception as exc:
+        return f"API unreachable ({type(exc).__name__})"
+    if resp.status_code == 200:
+        return None
+    return f"API request failed ({resp.status_code})"
+
+
+# ── Mantle helpers ───────────────────────────────────────────────────────────
+
+
+def _mantle_root_from_anthropic_base(base: str) -> str:
+    """Return the region root for a Mantle anthropic base_url.
+
+    Mantle publishes its inference path under ``/anthropic`` but the
+    catalogue (``GET /v1/models/{wire_id}``) lives at the region root.
+
+    Example:
+        ``https://bedrock-mantle.us-east-1.api.aws/anthropic``
+        → ``https://bedrock-mantle.us-east-1.api.aws``
+
+    Any other trailing paths are returned untouched.
+    """
+    stripped = base.rstrip("/")
+    if stripped.endswith("/anthropic"):
+        return stripped[: -len("/anthropic")]
+    return stripped
+
+
+# ── Preflight orchestration ──────────────────────────────────────────────────
+
+
+async def _preflight_anthropic(
+    client: httpx.AsyncClient, config: dict, secrets: dict, active_model: str
+) -> dict:
+    from fast_agent.core.exceptions import ProviderKeyError
+    from fast_agent.llm.provider_key_manager import ProviderKeyManager
+    from pallas.mantle_shims import MANTLE_WIRE_NAMES, is_mantle_base_url
+
+    merged = _merge_for_key_manager(config, secrets)
+    try:
+        api_key = ProviderKeyManager.get_api_key("anthropic", merged)
+    except ProviderKeyError as exc:
+        return {"status": "error", "message": str(exc)}
+
+    anthropic_base = _expand_env(
+        (config.get("anthropic", {}) or {}).get("base_url", "")
+    ) or os.environ.get("ANTHROPIC_BASE_URL", "") or _ANTHROPIC_DEFAULT_API
+
+    if is_mantle_base_url(anthropic_base):
+        # Mantle hosts the model catalogue at the region root, not under
+        # /anthropic. Wire-name translation (claude-opus-4-7 →
+        # anthropic.claude-opus-4-7) keeps us consistent with mantle_shims.
+        probe_base = f"{_mantle_root_from_anthropic_base(anthropic_base)}/v1"
+        wire_id = MANTLE_WIRE_NAMES.get(active_model, active_model)
+        err = await _check_anthropic(client, api_key, wire_id, probe_base)
+        if err:
+            logger.warning("anthropic (mantle, %s): %s", anthropic_base, err)
+            return {"status": "error", "model": wire_id, "message": err}
+        logger.info("anthropic (mantle, %s): %s ready", anthropic_base, wire_id)
+        return {"status": "ok", "model": wire_id}
+
+    err = await _check_anthropic(client, api_key, active_model, anthropic_base)
+    if err:
+        logger.warning("anthropic (%s): %s", anthropic_base, err)
+        return {"status": "error", "model": active_model, "message": err}
+    logger.info("anthropic (%s): %s ready", anthropic_base, active_model)
+    return {"status": "ok", "model": active_model}
+
+
+async def _preflight_openai(
+    client: httpx.AsyncClient, config: dict, secrets: dict, active_model: str
+) -> dict:
+    from fast_agent.core.exceptions import ProviderKeyError
+    from fast_agent.llm.provider_key_manager import ProviderKeyManager
+
+    merged = _merge_for_key_manager(config, secrets)
+    try:
+        api_key = ProviderKeyManager.get_api_key("openai", merged)
+    except ProviderKeyError as exc:
+        return {"status": "error", "message": str(exc)}
+
+    openai_base = _expand_env(
+        (config.get("openai", {}) or {}).get("base_url", "")
+    ) or os.environ.get("OPENAI_BASE_URL", "") or _OPENAI_DEFAULT_API
+
+    err, models = await _list_openai_models(client, api_key, openai_base)
+    if err:
+        logger.warning("openai (%s): %s", openai_base, err)
+        return {"status": "error", "message": err}
+    if active_model and active_model not in models:
+        label = ", ".join(models) if models else "none"
+        msg = f"model '{active_model}' not found (available: {label})"
+        logger.warning("openai (%s): %s", openai_base, msg)
+        return {"status": "error", "model": active_model, "message": msg}
+    logger.info("openai (%s): %s ready", openai_base, active_model or "(any)")
+    return {"status": "ok", "model": active_model, "models": models}
+
+
+async def _preflight_generic(
+    client: httpx.AsyncClient, config: dict, secrets: dict, active_model: str
+) -> dict:
+    # generic has a synthetic "ollama" key via ProviderKeyManager, so there's
+    # nothing to authenticate against — we just need the endpoint.
+    generic_base = _expand_env(
+        (config.get("generic", {}) or {}).get("base_url", "")
+    ) or os.environ.get("GENERIC_BASE_URL", "") or _GENERIC_DEFAULT_API
+    err = await _check_generic(client, generic_base)
+    if err:
+        logger.warning("generic (%s): %s", generic_base, err)
+        return {"status": "error", "model": active_model, "message": err}
+    logger.info("generic (%s): %s ready", generic_base, active_model or "(any)")
+    return {"status": "ok", "model": active_model}
+
+
+def _preflight_bedrock(config: dict, secrets: dict, active_model: str) -> dict:
+    """Bedrock uses the AWS credential chain — no outbound HTTP here.
+
+    We report ``ok`` whenever any of the usual credential sources is present
+    (long-term bedrock key, explicit access key pair, or a nonempty AWS
+    profile). If nothing is set we mark it degraded so Daedalus shows the
+    operator *why* the first real request will fail; we don't actually call
+    STS or Bedrock ourselves.
+    """
+    have_bearer = bool(os.environ.get("AWS_BEARER_TOKEN_BEDROCK"))
+    have_access_key = bool(os.environ.get("AWS_ACCESS_KEY_ID")) and bool(
+        os.environ.get("AWS_SECRET_ACCESS_KEY")
+    )
+    have_profile = bool(os.environ.get("AWS_PROFILE"))
+    creds_path = Path.home() / ".aws" / "credentials"
+    have_file = creds_path.exists()
+    if have_bearer or have_access_key or have_profile or have_file:
+        logger.info("bedrock: credentials resolvable (no preflight request issued)")
+        return {"status": "ok", "model": active_model}
+    msg = "no AWS credentials found (set AWS_BEARER_TOKEN_BEDROCK or configure AWS CLI)"
+    logger.warning("bedrock: %s", msg)
+    return {"status": "error", "model": active_model, "message": msg}
+
+
 async def validate_llm_providers(timeout: float = 5.0) -> dict[str, dict]:
     """
-    Validate configured LLM provider API keys and model availability.
+    Validate the configured LLM provider and populate the module-level cache
+    read by :func:`get_health` on every MCP ``get_health`` tool call.

-    Reads fastagent.config.yaml for default_model and fastagent.secrets.yaml
-    for API keys. Checks all providers that have keys configured.
+    Only the *active* provider (the one named by ``default_model``) is
+    preflighted — that's the one whose failure would actually break the
+    agent, and it keeps the startup surface small. Other provider sections
+    are ignored here even if they're configured.

-    Returns a dict keyed by provider name with validation results.
+    Returns a dict keyed by provider name with validation results. Shape:
+
+    .. code-block:: python
+
+        {"anthropic": {"status": "ok", "model": "anthropic.claude-opus-4-7"}}
+        {"generic": {"status": "ok", "model": "Qwen3.5-..."}}
+        {"openai": {"status": "error", "message": "API request failed (401)"}}
+        {"unknown": {"status": "error", "message": "unknown provider 'foo'"}}
     """
+    global _active_provider
+
     _load_dotenv()
     config, secrets = _load_config()
-    default_model = config.get("default_model", "")
+    default_model = config.get("default_model", "") or ""

-    # Parse provider and model from "provider.model-name" format
-    active_provider = default_model.split(".")[0] if "." in default_model else ""
-    active_model = default_model.split(".", 1)[1] if "." in default_model else default_model
-
-    # Resolve API keys from secrets (expanding ${ENV_VAR} references), falling
-    # back to env vars directly so that .env alone is sufficient.
-    anthropic_key = _expand_env(secrets.get("anthropic", {}).get("api_key", "")) or os.environ.get("ANTHROPIC_API_KEY", "")
-    openai_key = _expand_env(secrets.get("openai", {}).get("api_key", "")) or os.environ.get("OPENAI_API_KEY", "")
-    openai_base = (
-        _expand_env(secrets.get("openai", {}).get("base_url", ""))
-        or config.get("openai", {}).get("base_url", "")
-        or os.environ.get("OPENAI_BASE_URL", "")
-        or _OPENAI_DEFAULT_API
-    )
+    if "." not in default_model:
+        msg = (
+            f"default_model '{default_model}' is missing a provider prefix "
+            "(expected '<provider>.<model>')"
+        )
+        logger.warning(msg)
+        results = {"unknown": {"status": "error", "message": msg}}
+        _llm_status.clear()
+        _llm_status.update(results)
+        _active_provider = "unknown"
+        return results
+
+    active_provider, active_model = default_model.split(".", 1)

     results: dict[str, dict] = {}
     async with httpx.AsyncClient(timeout=timeout) as client:
-        # ── Anthropic ────────────────────────────────────────────────────
-        if anthropic_key:
-            model_id = active_model if active_provider == "anthropic" else None
-            if model_id:
-                err = await _check_anthropic(client, anthropic_key, model_id)
-                if err:
-                    results["anthropic"] = {"status": "error", "model": model_id, "message": err}
-                    logger.warning("anthropic: %s", err)
-                else:
-                    results["anthropic"] = {"status": "ok", "model": model_id}
-                    logger.info("anthropic: %s ready", model_id)
-            else:
-                # Key is set but Anthropic isn't the active provider — just verify API access
-                err = await _check_anthropic(client, anthropic_key, "claude-sonnet-4-5")
-                if err and "not found" not in err:
-                    results["anthropic"] = {"status": "error", "message": err}
-                    logger.warning("anthropic: %s", err)
-                else:
-                    results["anthropic"] = {"status": "ok"}
-                    logger.info("anthropic: API key valid")
-        elif active_provider == "anthropic":
-            results["anthropic"] = {"status": "error", "message": "API key not configured"}
-            logger.warning("anthropic: API key not configured")
-
-        # ── OpenAI ───────────────────────────────────────────────────────
-        if openai_key:
-            model_id = active_model if active_provider == "openai" else None
-            err, models = await _list_openai_models(client, openai_key, openai_base)
-            if err:
-                results["openai"] = {"status": "error", "message": err}
-                logger.warning("openai (%s): %s", openai_base, err)
-            elif model_id:
-                if model_id in models:
-                    results["openai"] = {"status": "ok", "model": model_id}
-                    logger.info("openai (%s): %s ready", openai_base, model_id)
-                else:
-                    label = ", ".join(models) if models else "none"
-                    results["openai"] = {"status": "error", "model": model_id, "message": f"model '{model_id}' not found (available: {label})"}
-                    logger.warning("openai (%s): model '%s' not found (available: %s)", openai_base, model_id, label)
-            else:
-                results["openai"] = {"status": "ok", "models": models}
-                label = ", ".join(models) if models else "no models loaded"
-                logger.info("openai (%s): %s", openai_base, label)
+        if active_provider == "anthropic":
+            results["anthropic"] = await _preflight_anthropic(
+                client, config, secrets, active_model
+            )
         elif active_provider == "openai":
-            results["openai"] = {"status": "error", "message": "API key not configured"}
-            logger.warning("openai: API key not configured")
+            results["openai"] = await _preflight_openai(
+                client, config, secrets, active_model
+            )
+        elif active_provider == "generic":
+            results["generic"] = await _preflight_generic(
+                client, config, secrets, active_model
+            )
+        elif active_provider == "bedrock":
+            results["bedrock"] = _preflight_bedrock(config, secrets, active_model)
+        else:
+            # Known to fast-agent? Surface that gap explicitly rather than
+            # silently reporting "error" from an empty dict lookup later.
+            try:
+                from fast_agent.llm.provider_types import Provider
+                Provider(active_provider)  # raises ValueError if unknown
+                msg = (
+                    f"preflight for provider '{active_provider}' is not "
+                    "implemented in pallas.health; LLM health will be "
+                    "validated on first inference call"
+                )
+                logger.info("%s: %s", active_provider, msg)
+                results[active_provider] = {
+                    "status": "ok",
+                    "model": active_model,
+                    "message": msg,
+                }
+            except ValueError:
+                msg = f"unknown provider '{active_provider}' in default_model"
+                logger.warning(msg)
+                results[active_provider] = {"status": "error", "message": msg}

     _llm_status.clear()
     _llm_status.update(results)
-    global _active_provider
     _active_provider = active_provider
     return results


+# ── Downstream MCP server probing ────────────────────────────────────────────
+
+
 async def check_downstream_health(
     servers: dict[str, dict], timeout: float = 3.0
 ) -> dict:
@@ -313,9 +523,24 @@ def register_health_tool(mcp_server, servers: dict[str, dict]) -> None:
     async def get_health() -> str:
         result = await check_downstream_health(servers)
         # Include LLM provider status from startup preflight (active provider only)
-        active = _llm_status.get(_active_provider, {})
-        if active.get("status") != "ok" and _active_provider:
-            err_msg = f"LLM: {_active_provider}: {active.get('message', 'error')}"
-            result["status"] = "degraded"
-            existing = result.get("message", "")
-            result["message"] = f"{existing}; {err_msg}" if existing else err_msg
+        if _active_provider:
+            active = _llm_status.get(_active_provider)
+            if active is None:
+                # Should be unreachable after the rewrite (validate_llm_providers
+                # always populates _llm_status for _active_provider). Keep a
+                # belt-and-braces path so a future refactor can't regress into
+                # silently reporting "error".
+                err_msg = (
+                    f"LLM: {_active_provider}: provider not preflighted"
+                )
+                result["status"] = "degraded"
+                existing = result.get("message", "")
+                result["message"] = f"{existing}; {err_msg}" if existing else err_msg
+            elif active.get("status") != "ok":
+                err_msg = (
+                    f"LLM: {_active_provider}: "
+                    f"{active.get('message', 'unknown error')}"
+                )
+                result["status"] = "degraded"
+                existing = result.get("message", "")
+                result["message"] = f"{existing}; {err_msg}" if existing else err_msg
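The new `get_health` body folds the cached preflight entry into the downstream result. Pulled out as a pure function, the hypothetical `merge_llm_status` below shows the three cases: no cache entry, a non-`ok` entry, and an `ok` entry that leaves the result untouched. It assumes `result` is a dict with optional `status` and `message` keys, since the shape returned by `check_downstream_health` is not shown in this diff. The LLM text is appended after any existing downstream message, separated by `; `.

```python
from typing import Any, Dict, Optional


def merge_llm_status(
    result: Dict[str, Any], provider: str, cached: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Fold a cached preflight entry into a downstream health result, in place."""
    if not provider:
        return result  # preflight never ran: leave the result alone
    if cached is None:
        err_msg = f"LLM: {provider}: provider not preflighted"
    elif cached.get("status") != "ok":
        err_msg = f"LLM: {provider}: {cached.get('message', 'unknown error')}"
    else:
        return result  # active provider is healthy
    result["status"] = "degraded"
    existing = result.get("message", "")
    result["message"] = f"{existing}; {err_msg}" if existing else err_msg
    return result
```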

tests/test_health.py (new file, 522 lines added)

@@ -0,0 +1,522 @@
+"""Tests for pallas.health — per-provider preflight dispatch.
+
+Covers the matrix documented in ``pallas/pallas/health.py``:
+
+- ``anthropic`` (direct, Mantle)
+- ``openai``
+- ``generic``
+- ``bedrock`` (presence-only, no HTTP)
+- unknown / malformed provider name
+
+All HTTP is faked with ``httpx.MockTransport`` so nothing touches the network.
+Tests use ``asyncio.run`` directly to match the existing convention in
+``tests/test_mantle_shims.py`` (pallas has no pytest-asyncio dependency).
+"""
+
+from __future__ import annotations
+
+import asyncio
+from pathlib import Path
+
+import httpx
+import pytest
+
+from pallas import health
+
+
+def _run(coro):
+    return asyncio.run(coro)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────────────
+
+
+def _patch_httpx(monkeypatch: pytest.MonkeyPatch, handler) -> None:
+    """Replace ``health.httpx.AsyncClient`` so validate_llm_providers uses the mock."""
+    original_client = httpx.AsyncClient
+
+    def patched_client(*args, **kwargs):
+        kwargs["transport"] = httpx.MockTransport(handler)
+        return original_client(*args, **kwargs)
+
+    monkeypatch.setattr(health.httpx, "AsyncClient", patched_client)
+
+
+def _patch_httpx_raising(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Install a transport that raises on any request — used to prove that
+    bedrock / unknown paths make no HTTP call at all."""
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        raise AssertionError(
+            f"no HTTP call should be made, but got {request.method} {request.url}"
+        )
+
+    _patch_httpx(monkeypatch, handler)
+
+
+@pytest.fixture
+def workspace(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
+    """Chdir into a clean temp workspace and isolate env variables.
+
+    ``validate_llm_providers`` reads ``fastagent.config.yaml`` /
+    ``fastagent.secrets.yaml`` from cwd and also consults env vars for
+    fallback; each test starts with a clean slate.
+    """
+    monkeypatch.chdir(tmp_path)
+    for var in (
+        "ANTHROPIC_API_KEY",
+        "ANTHROPIC_BASE_URL",
+        "OPENAI_API_KEY",
+        "OPENAI_BASE_URL",
+        "GENERIC_API_KEY",
+        "GENERIC_BASE_URL",
+        "AWS_BEARER_TOKEN_BEDROCK",
+        "AWS_ACCESS_KEY_ID",
+        "AWS_SECRET_ACCESS_KEY",
+        "AWS_PROFILE",
+    ):
+        monkeypatch.delenv(var, raising=False)
+    return tmp_path
+
+
+# ── _mantle_root_from_anthropic_base ────────────────────────────────────────
+
+
+@pytest.mark.parametrize(
+    "base,expected",
+    [
+        (
+            "https://bedrock-mantle.us-east-1.api.aws/anthropic",
+            "https://bedrock-mantle.us-east-1.api.aws",
+        ),
+        (
+            "https://bedrock-mantle.us-east-1.api.aws/anthropic/",
+            "https://bedrock-mantle.us-east-1.api.aws",
+        ),
+        (
+            "https://bedrock-mantle.us-east-1.api.aws",
+            "https://bedrock-mantle.us-east-1.api.aws",
+        ),
+        (
+            "https://example.com/proxy/anthropic",
+            "https://example.com/proxy",
+        ),
+    ],
+)
+def test_mantle_root_from_anthropic_base(base: str, expected: str) -> None:
+    assert health._mantle_root_from_anthropic_base(base) == expected
+
+
+# ── _check_anthropic (direct + Mantle share this probe) ──────────────────────
+
+
+def test_check_anthropic_success_direct() -> None:
+    captured: list[httpx.Request] = []
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        captured.append(request)
+        return httpx.Response(200, json={"id": "claude-sonnet-4-5"})
+
+    async def go() -> str | None:
+        async with httpx.AsyncClient(transport=httpx.MockTransport(handler)) as client:
+            return await health._check_anthropic(
+                client,
+                "sk-ant-real",
+                "claude-sonnet-4-5",
+                "https://api.anthropic.com/v1",
+            )
+
+    assert _run(go()) is None
+    assert str(captured[0].url) == "https://api.anthropic.com/v1/models/claude-sonnet-4-5"
+    assert captured[0].headers["x-api-key"] == "sk-ant-real"
+    assert captured[0].headers["anthropic-version"] == "2023-06-01"
+
+
+def test_check_anthropic_success_mantle_root() -> None:
+    captured: list[httpx.Request] = []
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        captured.append(request)
+        return httpx.Response(200, json={"id": "anthropic.claude-opus-4-7"})
+
+    async def go() -> str | None:
+        async with httpx.AsyncClient(transport=httpx.MockTransport(handler)) as client:
+            return await health._check_anthropic(
+                client,
+                "sk-bedrock-fake",
+                "anthropic.claude-opus-4-7",
+                "https://bedrock-mantle.us-east-1.api.aws/v1",
+            )
+
+    assert _run(go()) is None
+    # Must hit the Mantle region root, not `/anthropic/v1/models/...`.
+    assert str(captured[0].url) == (
+        "https://bedrock-mantle.us-east-1.api.aws/v1"
+        "/models/anthropic.claude-opus-4-7"
+    )
+
+
+def test_check_anthropic_401() -> None:
+    def handler(request: httpx.Request) -> httpx.Response:
+        return httpx.Response(401, json={"error": "invalid_api_key"})
+
+    async def go() -> str | None:
+        async with httpx.AsyncClient(transport=httpx.MockTransport(handler)) as client:
+            return await health._check_anthropic(
+                client,
+                "bad-key",
+                "claude-sonnet-4-5",
+                "https://api.anthropic.com/v1",
+            )
+
+    assert _run(go()) == "API request failed (401)"
+
+
+def test_check_anthropic_404_model_missing() -> None:
+    def handler(request: httpx.Request) -> httpx.Response:
+        return httpx.Response(404, json={})
+
+    async def go() -> str | None:
+        async with httpx.AsyncClient(transport=httpx.MockTransport(handler)) as client:
+            return await health._check_anthropic(
+                client,
+                "key",
+                "claude-foo",
+                "https://api.anthropic.com/v1",
+            )
+
+    assert _run(go()) == "model 'claude-foo' not found"
+
+
+# ── validate_llm_providers: anthropic direct ─────────────────────────────────
+
+
+def test_validate_anthropic_direct_ok(
+    workspace: Path, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    (workspace / "fastagent.config.yaml").write_text(
+        "default_model: anthropic.claude-sonnet-4-5\n"
+    )
+    (workspace / "fastagent.secrets.yaml").write_text(
+        'anthropic:\n api_key: "sk-ant-real"\n'
+    )
+
+    captured: list[httpx.Request] = []
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        captured.append(request)
+        return httpx.Response(200, json={"id": "claude-sonnet-4-5"})
+
+    _patch_httpx(monkeypatch, handler)
+
+    assert _run(health.validate_llm_providers(timeout=1.0)) == {
+        "anthropic": {"status": "ok", "model": "claude-sonnet-4-5"}
+    }
+    assert str(captured[0].url) == "https://api.anthropic.com/v1/models/claude-sonnet-4-5"
+
+
+def test_validate_anthropic_missing_key(
+    workspace: Path, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    (workspace / "fastagent.config.yaml").write_text(
+        "default_model: anthropic.claude-sonnet-4-5\n"
+    )
+    # No secrets file at all → ProviderKeyManager raises ProviderKeyError.
+    _patch_httpx_raising(monkeypatch)
+
+    results = _run(health.validate_llm_providers(timeout=1.0))
+    assert results["anthropic"]["status"] == "error"
+    assert "API key" in results["anthropic"]["message"]
+
+
+# ── validate_llm_providers: anthropic via Mantle ─────────────────────────────
+
+
+def test_validate_anthropic_mantle_uses_region_root(
+    workspace: Path, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    (workspace / "fastagent.config.yaml").write_text(
+        "default_model: anthropic.claude-opus-4-7\n"
+        "anthropic:\n"
+        ' base_url: "https://bedrock-mantle.us-east-1.api.aws/anthropic"\n'
+    )
+    (workspace / "fastagent.secrets.yaml").write_text(
+        'anthropic:\n api_key: "sk-bedrock-fake"\n'
+    )
+
+    captured: list[httpx.Request] = []
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        captured.append(request)
+        return httpx.Response(200, json={"id": "anthropic.claude-opus-4-7"})
+
+    _patch_httpx(monkeypatch, handler)
+
+    assert _run(health.validate_llm_providers(timeout=1.0)) == {
+        "anthropic": {"status": "ok", "model": "anthropic.claude-opus-4-7"}
+    }
+    # Must strip the `/anthropic` suffix AND apply the wire-name prefix.
+    assert str(captured[0].url) == (
+        "https://bedrock-mantle.us-east-1.api.aws/v1"
+        "/models/anthropic.claude-opus-4-7"
+    )
+
+
+def test_validate_anthropic_mantle_401(
+    workspace: Path, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    (workspace / "fastagent.config.yaml").write_text(
+        "default_model: anthropic.claude-opus-4-7\n"
+        "anthropic:\n"
+        ' base_url: "https://bedrock-mantle.us-east-1.api.aws/anthropic"\n'
+    )
+    (workspace / "fastagent.secrets.yaml").write_text(
+        'anthropic:\n api_key: "sk-bogus"\n'
+    )
+
+    captured: list[httpx.Request] = []
+
+    def handler(request: httpx.Request) -> httpx.Response:
+        captured.append(request)
+        return httpx.Response(401, json={"error": "unauthorized"})
+
+    _patch_httpx(monkeypatch, handler)
+
+    results = _run(health.validate_llm_providers(timeout=1.0))
+    assert results == {
+        "anthropic": {
+            "status": "error",
+            "model": "anthropic.claude-opus-4-7",
+            "message": "API request failed (401)",
+        }
+    }
+    assert "bedrock-mantle" in str(captured[0].url)


# ── validate_llm_providers: openai ───────────────────────────────────────────


def test_validate_openai_model_in_list(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: openai.gpt-4o-mini\n"
    )
    (workspace / "fastagent.secrets.yaml").write_text(
        'openai:\n api_key: "sk-openai-real"\n'
    )

    def handler(request: httpx.Request) -> httpx.Response:
        return httpx.Response(
            200,
            json={"data": [{"id": "gpt-4o-mini"}, {"id": "gpt-4o"}]},
        )

    _patch_httpx(monkeypatch, handler)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["openai"]["status"] == "ok"
    assert results["openai"]["model"] == "gpt-4o-mini"


def test_validate_openai_model_missing_from_list(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: openai.gpt-nonexistent\n"
    )
    (workspace / "fastagent.secrets.yaml").write_text(
        'openai:\n api_key: "sk-openai-real"\n'
    )

    def handler(request: httpx.Request) -> httpx.Response:
        return httpx.Response(200, json={"data": [{"id": "gpt-4o-mini"}]})

    _patch_httpx(monkeypatch, handler)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["openai"]["status"] == "error"
    assert "gpt-nonexistent" in results["openai"]["message"]
    assert "gpt-4o-mini" in results["openai"]["message"]  # message lists available models


# ── validate_llm_providers: generic ──────────────────────────────────────────


def test_validate_generic_ok_regardless_of_body(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    """llama.cpp's ``/v1/models`` payload is not strictly OpenAI-shaped; we only
    care that the endpoint responds 200."""
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf\n"
        "generic:\n"
        ' base_url: "http://nyx.helu.ca:22079/v1"\n'
    )
    # generic requires no api_key; no secrets file needed.
    captured: list[httpx.Request] = []

    def handler(request: httpx.Request) -> httpx.Response:
        captured.append(request)
        # Match llama.cpp's shape (Ollama-style `models` alongside OpenAI `data`).
        return httpx.Response(
            200,
            json={
                "models": [{"name": "Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf"}],
                "object": "list",
                "data": [{"id": "Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf"}],
            },
        )

    _patch_httpx(monkeypatch, handler)
    assert _run(health.validate_llm_providers(timeout=1.0)) == {
        "generic": {
            "status": "ok",
            "model": "Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf",
        }
    }
    assert str(captured[0].url) == "http://nyx.helu.ca:22079/v1/models"


def test_validate_generic_unreachable(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf\n"
        "generic:\n"
        ' base_url: "http://nyx.helu.ca:22079/v1"\n'
    )

    def handler(request: httpx.Request) -> httpx.Response:
        raise httpx.ConnectError("connection refused")

    _patch_httpx(monkeypatch, handler)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["generic"]["status"] == "error"
    assert "unreachable" in results["generic"]["message"].lower()


def test_validate_generic_503(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf\n"
        "generic:\n"
        ' base_url: "http://nyx.helu.ca:22079/v1"\n'
    )

    def handler(request: httpx.Request) -> httpx.Response:
        return httpx.Response(503)

    _patch_httpx(monkeypatch, handler)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["generic"] == {
        "status": "error",
        "model": "Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf",
        "message": "API request failed (503)",
    }


# ── validate_llm_providers: bedrock (no HTTP) ────────────────────────────────


def test_validate_bedrock_ok_with_bearer(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: bedrock.anthropic.claude-sonnet-4-6\n"
    )
    monkeypatch.setenv("AWS_BEARER_TOKEN_BEDROCK", "abs-fake")
    _patch_httpx_raising(monkeypatch)  # any HTTP call is a test failure
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["bedrock"]["status"] == "ok"
    assert results["bedrock"]["model"] == "anthropic.claude-sonnet-4-6"


def test_validate_bedrock_no_credentials(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: bedrock.anthropic.claude-sonnet-4-6\n"
    )
    # A developer machine may have a real ~/.aws/credentials file, which would
    # cause a false positive; redirect HOME so Path.home() / ".aws" does not exist.
    monkeypatch.setenv("HOME", str(workspace))
    # Likewise, drop credential env vars the developer's shell may export.
    for var in ("AWS_BEARER_TOKEN_BEDROCK", "AWS_ACCESS_KEY_ID", "AWS_PROFILE"):
        monkeypatch.delenv(var, raising=False)
    _patch_httpx_raising(monkeypatch)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["bedrock"]["status"] == "error"
    assert "AWS credentials" in results["bedrock"]["message"]


# ── validate_llm_providers: malformed / unknown provider ─────────────────────


def test_validate_default_model_missing_prefix(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text("default_model: just-a-name\n")
    _patch_httpx_raising(monkeypatch)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["unknown"]["status"] == "error"
    assert "provider prefix" in results["unknown"]["message"]


def test_validate_unknown_provider(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: imaginary.some-model\n"
    )
    _patch_httpx_raising(monkeypatch)
    results = _run(health.validate_llm_providers(timeout=1.0))
    assert results["imaginary"]["status"] == "error"
    assert "unknown provider" in results["imaginary"]["message"]


# ── get_health() payload ─────────────────────────────────────────────────────


def test_get_health_reports_generic_ok(
    workspace: Path, monkeypatch: pytest.MonkeyPatch
) -> None:
    """End-to-end: after a successful generic preflight, get_health() should
    return status=ok with no LLM error in the message. This is the exact
    regression case that showed up as ``LLM: generic: error`` in Daedalus.
    """
    (workspace / "fastagent.config.yaml").write_text(
        "default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf\n"
        "generic:\n"
        ' base_url: "http://nyx.helu.ca:22079/v1"\n'
    )

    def handler(request: httpx.Request) -> httpx.Response:
        return httpx.Response(200, json={"data": []})

    _patch_httpx(monkeypatch, handler)
    _run(health.validate_llm_providers(timeout=1.0))

    # Simulate the MCP `get_health` tool by calling check_downstream_health
    # with an empty server map and composing the message the same way
    # register_health_tool does.
    async def call() -> dict:
        result = await health.check_downstream_health({}, timeout=1.0)
        active = health._llm_status.get(health._active_provider)
        if active is not None and active.get("status") != "ok":
            result["status"] = "degraded"
            existing = result.get("message", "")
            msg = (
                f"LLM: {health._active_provider}: "
                f"{active.get('message', 'unknown error')}"
            )
            result["message"] = f"{existing}; {msg}" if existing else msg
        return result

    final = _run(call())
    assert final["status"] == "ok"
    assert "LLM" not in final.get("message", "")
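
The composition step that `test_get_health_reports_generic_ok` inlines can be factored into a pure function, so the degrade-and-append rule is testable without touching `health._llm_status` or `health._active_provider`. The sketch below, `compose_health`, is a hypothetical helper written to mirror the test's inline logic; it is not the actual code in `register_health_tool`, whose exact shape is not shown in this diff.

```python
from typing import Any


def compose_health(
    downstream: dict[str, Any],
    llm_status: dict[str, dict[str, str]],
    active_provider: str,
) -> dict[str, Any]:
    """Fold the active provider's cached preflight status into a health payload.

    Hypothetical sketch mirroring the test's inline logic.
    """
    # Shallow copy so the caller's downstream result is left untouched.
    result = dict(downstream)
    active = llm_status.get(active_provider)
    # Only a cached, non-"ok" status for the *active* provider degrades health;
    # a missing entry (preflight never ran) is treated as "nothing to report".
    if active is not None and active.get("status") != "ok":
        result["status"] = "degraded"
        msg = f"LLM: {active_provider}: {active.get('message', 'unknown error')}"
        existing = result.get("message", "")
        # Append to any downstream message rather than overwriting it.
        result["message"] = f"{existing}; {msg}" if existing else msg
    return result
```

Returning a new dict instead of mutating the downstream result in place is a design choice of this sketch; the test's inline version mutates `result` directly, which is equivalent there because the dict is freshly created by `check_downstream_health`.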

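The anthropic direct and Mantle tests together pin down how the anthropic probe URL is built: the direct API appends `/models/{model}` to a base URL that already ends in `/v1`, while Mantle drops the `/anthropic` suffix, probes the region root under `/v1/models/`, and maps the model to its wire name. A minimal sketch of that derivation follows. It assumes Mantle is detected by a `bedrock-mantle.` hostname plus a trailing `/anthropic` path, that an empty `base_url` means `https://api.anthropic.com/v1` (as `test_validate_anthropic_direct_ok` implies), and it uses a hypothetical one-entry `WIRE_NAMES` table in place of `pallas.mantle_shims.MANTLE_WIRE_NAMES`, whose full contents are not shown.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for pallas.mantle_shims.MANTLE_WIRE_NAMES.
WIRE_NAMES = {"claude-opus-4-7": "anthropic.claude-opus-4-7"}

# Assumed default when base_url is empty (direct Anthropic API).
DEFAULT_ANTHROPIC_BASE = "https://api.anthropic.com/v1"


def anthropic_probe_url(base_url: str, model: str) -> str:
    """Build the zero-token model-lookup URL for the anthropic provider."""
    base = (base_url or DEFAULT_ANTHROPIC_BASE).rstrip("/")
    host = urlsplit(base).hostname or ""
    if host.startswith("bedrock-mantle.") and base.endswith("/anthropic"):
        # Mantle serves its model catalogue at the region root, not under
        # /anthropic, and expects the prefixed wire name.
        root = base[: -len("/anthropic")]
        return f"{root}/v1/models/{WIRE_NAMES.get(model, model)}"
    # Direct API: base_url already ends in /v1.
    return f"{base}/models/{model}"
```

For example, `anthropic_probe_url("https://bedrock-mantle.us-east-1.api.aws/anthropic", "claude-opus-4-7")` yields the same URL that `test_validate_anthropic_mantle_uses_region_root` asserts on.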