feat: add loop guard to halt repeated-identical tool call loops

Introduces `pallas.loop_guard` module that detects and halts agentic loops
where the same `(tool, args) → result` repeats consecutively, preventing
wasted LLM turns when upstream MCP servers return contradictory data.

- Add per-request `ToolRunnerHooks` tracking rolling tool-call signatures
- Halt loop after `loop_repeat_threshold` consecutive repeats (default 3)
- Collapse `max_iterations` on halt to terminate without further LLM call
- Append user-facing explanation to the turn with `stop_reason=endTurn`
- Expose `pallas_agent_loop_aborted_total{agent,reason}` counter
- Add per-agent `max_iterations` and `loop_repeat_threshold` config
- Document guard behavior, metric, and alerting query
@@ -193,6 +193,8 @@ agents:
| `agents.<name>.title` | no | Display name in registry. Default: `name.title()` |
| `agents.<name>.description` | no | Description in registry |
| `agents.<name>.depends_on` | no | List of agent names that must start and become ready before this agent |
| `agents.<name>.max_iterations` | no | Hard cap on agentic-loop turns per `send_message`. Default: `15`. Once the cap is exceeded, fast-agent returns a partial answer |
| `agents.<name>.loop_repeat_threshold` | no | Halt the loop after this many consecutive identical `(tool, args) → result` rounds. Default: `3`. `0` disables the guard |

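As a rough illustration of how the two per-agent limits and their documented defaults might be read from one agent's config mapping, here is a minimal sketch. `AgentLoopLimits` and `limits_from_config` are hypothetical names, not part of Pallas, and the range checks are an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AgentLoopLimits:
    """Hypothetical holder for the two per-agent loop settings."""

    max_iterations: int = 15         # documented default
    loop_repeat_threshold: int = 3   # documented default; 0 disables the guard

    @property
    def guard_enabled(self) -> bool:
        return self.loop_repeat_threshold > 0


def limits_from_config(agent_cfg: Mapping[str, Any]) -> AgentLoopLimits:
    """Read the limits from one `agents.<name>` mapping, applying defaults."""
    limits = AgentLoopLimits(
        max_iterations=int(agent_cfg.get("max_iterations", 15)),
        loop_repeat_threshold=int(agent_cfg.get("loop_repeat_threshold", 3)),
    )
    # Assumed validation: the README does not say how invalid values are handled.
    if limits.max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    if limits.loop_repeat_threshold < 0:
        raise ValueError("loop_repeat_threshold must be 0 (disabled) or positive")
    return limits
```

An agent that sets neither key gets `max_iterations=15` and `loop_repeat_threshold=3`; setting `loop_repeat_threshold: 0` leaves only the iteration cap in effect.
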
### `fastagent.config.yaml` Extensions
@@ -530,6 +532,39 @@ Registered on each agent's MCP server. Checks:

---

## Loop Guard

A small model occasionally gets stuck emitting the *identical* tool call on every
iteration, usually because an upstream MCP server returned a contradictory or
malformed result that it keeps trying to reconcile. Left alone, the loop burns LLM
turns and context until the client times out and the user sees
`empty_response`.

`pallas.loop_guard` installs per-request `ToolRunnerHooks` (composed on top of
the assistant-stream hooks) that track a rolling signature of
`(tool, normalized_args) → result_hash`. When the same signature repeats
`loop_repeat_threshold` times consecutively (default **3**), the loop is
**halted immediately**. The runtime does *not* ask the model to troubleshoot,
because the fault is almost always upstream and self-recovery is slow,
unpredictable, and token-hungry. On halt it:

- collapses the request's `max_iterations` to the current iteration, so
  fast-agent's own `_iteration > max_iterations` check terminates the turn
  after the current tool result with **no further LLM call**;
- appends an honest, user-facing explanation to the returned turn (and sets
  `stop_reason = endTurn`) so the client gets a real message instead of an
  empty or truncated one;
- logs the offending tool, arguments, and result at WARNING (`event=loop_halt`
  in `pallas.loop_guard`) so the upstream bug can be fixed durably; and
- increments `pallas_agent_loop_aborted_total{reason="repeat"}`.

This fires well before the `max_iterations` cap (a 3-round repeat halts within
~3 turns regardless of the configured ceiling), which is the point: the cap is
a backstop, the guard is the fast path. Set `loop_repeat_threshold: 0` on an
agent to disable it.
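
The repeat detection described in this section can be sketched roughly as follows. This is not the `pallas.loop_guard` implementation: `signature` and `RepeatDetector` are hypothetical names, and the choice of sorted-key JSON for argument normalization and SHA-256 for the result hash is an assumption. The sketch covers only the consecutive-repeat counting, not the hook wiring or the halt actions.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

Signature = Tuple[str, str, str]


def signature(tool: str, args: Dict[str, Any], result: str) -> Signature:
    """Build a comparable (tool, normalized_args, result_hash) signature."""
    # Assumed normalization: sorted keys, so argument order does not matter.
    normalized_args = json.dumps(args, sort_keys=True, separators=(",", ":"), default=str)
    # Assumed hash: any stable digest of the result text would do.
    result_hash = hashlib.sha256(result.encode("utf-8")).hexdigest()
    return tool, normalized_args, result_hash


@dataclass
class RepeatDetector:
    """Counts consecutive identical signatures for a single request."""

    threshold: int = 3  # 0 disables the guard
    _last: Optional[Signature] = None
    _streak: int = 0

    def observe(self, tool: str, args: Dict[str, Any], result: str) -> bool:
        """Record one tool round; return True when the loop should halt."""
        if self.threshold <= 0:
            return False
        sig = signature(tool, args, result)
        if sig == self._last:
            self._streak += 1
        else:
            # Any different call or result breaks the streak.
            self._last, self._streak = sig, 1
        return self._streak >= self.threshold


detector = RepeatDetector(threshold=3)
rounds = [
    detector.observe("get_ticket", {"id": 7, "expand": True}, "status=open;status=closed")
    for _ in range(3)
]
print(rounds)  # [False, False, True]
```

Because the streak resets whenever the tool, arguments, or result change, a model that is making progress (even slowly) never trips the detector; only a truly identical round repeated `threshold` times does.
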

---

## Metrics

Pallas exposes Prometheus metrics for scraping and alerting. One scrape target per Pallas deployment is sufficient: all agents run as coroutines in a single process under `asyncio.gather`, so metrics are process-global.
@@ -570,6 +605,7 @@ scrape_configs:
| `pallas_downstream_up` | gauge | `agent`, `server` | `1` when the named downstream MCP server passed the last `get_health` probe |
| `pallas_llm_provider_up` | gauge | `provider` | `1` when the active LLM provider passed its last preflight or runtime re-probe |
| `pallas_agent_health_status` | gauge | `agent` | Aggregate from the last `get_health`: `1`=ok, `0.5`=degraded, `0`=error |
| `pallas_agent_loop_aborted_total` | counter | `agent`, `reason` | Agentic loops force-stopped by a runtime guard. `reason` ∈ `repeat` (identical-tool-call loop detected) |

Standard process metrics (RSS, CPU, GC, open FDs) are emitted by `prometheus-client`'s default collectors on the same endpoint.

@@ -616,6 +652,7 @@ pallas_llm_provider_up == 0
| Agent error rate elevated | `rate(pallas_send_message_total{outcome="error"}[10m]) > 0.1` | >10% errors over 10 min |
| Latency regression | `histogram_quantile(0.95, sum by (agent, le) (rate(pallas_send_message_duration_seconds_bucket[10m]))) > 60` | p95 over 60 s |
| Token burn | `sum(rate(pallas_llm_tokens_total{kind="output"}[1h])) > N` | Set N to your budget |
| Agent loop halted | `increase(pallas_agent_loop_aborted_total[15m]) > 0` | A repeated-tool-call loop was force-stopped; investigate the upstream tool/data |

---

@@ -645,6 +682,7 @@ This avoids the brittle pattern of inferring capabilities from model name substr
| `pallas.registry` | `registry.py` | Starlette app serving `GET /.well-known/mcp/server.json`; agent catalogue built from config |
| `pallas.multimodal_server` | `multimodal_server.py` | `MultimodalAgentMCPServer`: extends `AgentMCPServer` with image support, conversation history prompts, bearer token propagation |
| `pallas.health` | `health.py` | LLM provider preflight validation, downstream MCP server probing, `get_health` tool registration |
| `pallas.loop_guard` | `loop_guard.py` | Per-request `ToolRunnerHooks` that halt the agentic loop on repeated-identical tool calls |
| `pallas.log` | `log.py` | JSON log configuration, third-party traceback capture, Rich-TUI-safe handler attachment |
| `pallas._fastagent_patch` | `_fastagent_patch.py` | Monkey-patches fast-agent at import time: per-request bearer forwarding via `httpx.Auth`, diagnostic trace-capture wrappers around `send_request` / `session.call_tool` / `_execute_on_server` |