URGENT: Detect and break repeated-identical-tool-call loops in the agentic loop #1

Open
opened 2026-06-15 15:30:25 +00:00 by r · 0 comments

Priority: URGENT

A small-model agent (Shawn, Iolaus, proteus.helu.ca) entered an infinite tool-call loop that ran until the Daedalus MCP send_message call timed out and returned empty_response to the web UI. There is currently no guard in the Pallas/fast-agent agentic loop to detect or break a model that repeats the same tool call indefinitely. This is the single most expensive failure mode we have: the loop consumes LLM turns and context (cost + latency) for the entire client timeout window and produces nothing.

Evidence (Loki, {hostname="proteus.helu.ca"}, 2026-06-15)

Reading one episode chronologically, the same cycle repeats ~1/sec:

shawn tool call - kairos__update_task
shawn tool result - text only 92 chars
Streaming complete - Model: Qwen3.6-27B-Q5_K_M, Input tokens: 41216, Output tokens: 33
shawn tool call - kairos__update_task        <-- identical call again
... Input tokens: 41295 ... 41374 ... 41453 ... 41532 ... 41611 ... 41690 ... 41769 ... 41848 ... 41927
  • Output is a fixed 33 tokens every turn -> the model emits the identical kairos__update_task call each iteration.
  • Input grows ~79 tokens/turn -> each iteration appends the same tool call and its ~92-char result, inflating context until the send_message call hits its wall-clock timeout.
  • The Daedalus UI recorded the call at step 31 as kairos__update_task (task_id=494) ... empty_response.
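The constant growth is easy to confirm from the logged counts. A minimal check, using only the input-token values quoted in the excerpt above (the variable names are illustrative):

```python
# Input-token counts copied from the Loki excerpt above.
input_tokens = [41216, 41295, 41374, 41453, 41532,
                41611, 41690, 41769, 41848, 41927]

# Per-turn growth: difference between consecutive readings.
deltas = [b - a for a, b in zip(input_tokens, input_tokens[1:])]

# A single distinct delta means every turn added exactly the same
# amount of context, which is what an identical call/result pair does.
is_constant_growth = len(set(deltas)) == 1

print(deltas)              # [79, 79, 79, 79, 79, 79, 79, 79, 79]
print(is_constant_growth)  # True
```

Ten readings give nine deltas, all exactly 79, which is consistent with the same call and result being appended on every iteration.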

The immediate trigger was a data inconsistency in Kairos task 494 ("Mnemosyne Deploy": status=COMPLETED but percent_complete=0), which the model tried to reconcile forever. That data bug is being fixed separately — but the loop itself must be broken by the runtime regardless of what triggers it. Any bad/contradictory tool result can induce this.

Proposed fix

Add loop-breaking to the agentic loop (pallas.multimodal_server.send_message / wherever the fast-agent loop is driven):

  1. Repeated-identical-tool-call detection (primary). Per request, track a rolling signature of (server, tool, normalized_arguments) -> result_hash. If the same (server, tool, normalized_arguments) signature produces the same result hash N times consecutively (suggest N=3), stop the loop and return a partial result with an injected system message explaining that the tool call is not converging, so the model can change strategy or summarize instead of spinning.
  2. Hard max-turns cap per send_message (defense in depth). A configurable ceiling (e.g. PALLAS_MAX_AGENT_TURNS, default ~40) that terminates the loop with a partial result rather than letting it run to the client timeout. Even with item 1 in place, this bounds pathological loops whose calls are not identical.
  3. Emit a metric when either guard fires (e.g. pallas_agent_loop_aborted_total{agent, reason="repeat"|"max_turns"}) so we can alert on it. (Note: pallas_* metrics do not currently appear in the Taurus Prometheus — only daedalus_pallas_instances_total — so the Pallas scrape target may also need to be wired up; see separate follow-up.)
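Guards 1 and 2 could be sketched roughly as follows. This is a standalone illustration, not the fast-agent or Pallas API: `LoopGuard`, `call_signature`, `result_hash`, and `guard_from_env` are hypothetical names, the "normalization" is assumed to be JSON serialization with sorted keys, and the split of `kairos__update_task` into server `kairos` and tool `update_task` is an assumption.

```python
import hashlib
import json
import os
from dataclasses import dataclass, field
from typing import Any, Optional


def call_signature(server: str, tool: str, arguments: dict[str, Any]) -> str:
    """Hash (server, tool, arguments); sorted keys make argument order irrelevant."""
    payload = json.dumps([server, tool, arguments], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def result_hash(result_text: str) -> str:
    """Hash the tool result so identical results compare equal cheaply."""
    return hashlib.sha256(result_text.encode("utf-8")).hexdigest()


@dataclass
class LoopGuard:
    """Hypothetical per-request guard: one instance per send_message call."""
    max_repeats: int = 3   # N from the proposal
    max_turns: int = 40    # hard ceiling from the proposal
    turns: int = field(default=0, init=False)
    _last: Optional[tuple[str, str]] = field(default=None, init=False)
    _streak: int = field(default=0, init=False)

    def observe(self, server: str, tool: str,
                arguments: dict[str, Any], result_text: str) -> Optional[str]:
        """Record one tool call; return an abort reason or None to continue."""
        self.turns += 1
        key = (call_signature(server, tool, arguments), result_hash(result_text))
        if key == self._last:
            self._streak += 1          # same call, same result, again
        else:
            self._last, self._streak = key, 1
        if self._streak >= self.max_repeats:
            return "repeat"
        if self.turns >= self.max_turns:
            return "max_turns"
        return None


def guard_from_env() -> LoopGuard:
    """Read the ceiling from PALLAS_MAX_AGENT_TURNS, defaulting to 40."""
    return LoopGuard(max_turns=int(os.environ.get("PALLAS_MAX_AGENT_TURNS", "40")))


# Usage: three identical calls with identical (placeholder) results trip the guard.
guard = LoopGuard()
reasons = [
    guard.observe("kairos", "update_task", {"task_id": 494}, "placeholder result")
    for _ in range(3)
]
print(reasons)  # [None, None, 'repeat']
```

The driver would call `observe` after each tool result and, on a non-`None` reason, stop iterating and return the partial result with the explanatory system message. Keying on the result hash as well as the call means a tool that is legitimately polled and returns changing output does not trip the repeat guard.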

Acceptance

  • A model repeating the same tool call with the same result is stopped within N iterations and returns a partial answer instead of empty_response.
  • The hard turn cap prevents any single send_message from running to the client timeout.
  • Loop aborts are observable via logs and a counter metric.
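The observability criterion could be met with something like the sketch below. It uses only the standard library: the `Counter` is an in-process stand-in for the proposed `pallas_agent_loop_aborted_total{agent, reason}` metric, not a real Prometheus client, and `record_loop_abort` and the logger name are hypothetical.

```python
import logging
from collections import Counter

# Hypothetical logger name; the real module path may differ.
logger = logging.getLogger("pallas.agent_loop")

# Stand-in for pallas_agent_loop_aborted_total, keyed by (agent, reason).
aborted_total: Counter = Counter()

# The two reasons named in the proposal.
ALLOWED_REASONS = {"repeat", "max_turns"}


def record_loop_abort(agent: str, reason: str, turns: int) -> None:
    """Count and log one loop abort so it can be alerted on."""
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"unknown abort reason: {reason!r}")
    aborted_total[(agent, reason)] += 1
    logger.warning(
        "agent loop aborted: agent=%s reason=%s turns=%d", agent, reason, turns
    )


# Usage: two repeat aborts and one max-turns abort for the same agent.
record_loop_abort("shawn", "repeat", 3)
record_loop_abort("shawn", "repeat", 3)
record_loop_abort("shawn", "max_turns", 40)

print(aborted_total[("shawn", "repeat")])     # 2
print(aborted_total[("shawn", "max_turns")])  # 1
```

Restricting `reason` to a fixed set keeps label cardinality bounded, which matters once the counter is exported to a real metrics backend.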

Found while investigating Shawn timeouts on Taurus production (proteus.helu.ca). Filed by Claude Code on behalf of @r.

Reference: r/pallas#1