Move llama-cpp to generic fastagent slot

2026-05-12 15:07:00 -04:00
parent 8c95173705
commit b2fc398782
2 changed files with 11 additions and 6 deletions
--- a/docs/iolaus.md
+++ b/docs/iolaus.md
@@ -95,7 +95,7 @@ Committed to the repo. Contains LLM provider settings and explicit model capabil
 declarations.
 ```yaml
-default_model: openai.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
+default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
 model_capabilities:
  vision: false
@@ -249,6 +249,7 @@ sudo systemctl status iolaus
 - **Python 3.13** required (`fast-agent-mcp` pins `>=3.13`)
 - **Runtime:** [Pallas](https://git.helu.ca/r/pallas) — `pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git`
 - **Transport:** StreamableHTTP (`/mcp`) throughout — not SSE
-- **LLM:** OpenAI-compatible API at `http://nyx.helu.ca:22079/v1` (personal Qwen model)
+- **LLM:** Local Qwen via fast-agent's Generic (OpenAI-compatible) provider at
+  `http://nyx.helu.ca:22079/v1`
 - **Logging:** Console output — stdout → syslog → Alloy → Loki in production
 - **Port scheme:** registry at 24000, personal agents 24001–24049, sub-agents 24050–24099
--- a/docs/kottos.md
+++ b/docs/kottos.md
@@ -89,7 +89,7 @@ In Ansible-managed deployments this file is replaced by the
 for model, MCP URLs, etc.
 ```yaml
-default_model: openai.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
+default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
 model_capabilities:
  vision: false
@@ -199,8 +199,11 @@ kottos_scotty_port: 24102
 kottos_research_port: 24150
 kottos_tech_research_port: 24151
 pallas_log_level: INFO
-kottos_default_model: "openai.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf"
-kottos_openai_base_url: "http://nyx.helu.ca:22079/v1"
+# Local Qwen served via fast-agent's Generic (OpenAI-compatible) provider.
+# The openai_base_url slot is reserved for cloud OpenAI endpoints (e.g.
+# Bedrock Mantle Chat Completions).
+kottos_default_model: "generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf"
+kottos_generic_base_url: "http://nyx.helu.ca:22079/v1"
 # ...plus one entry per downstream MCP URL so each environment overrides freely
 ```
@@ -274,6 +277,7 @@ See [logging.md](logging.md) for the full label schema + level policy + add-a-ne
 - **Python 3.13** required (`fast-agent-mcp` pins `>=3.13`)
 - **Runtime:** [Pallas](https://git.helu.ca/r/pallas) — `pallas-mcp @ git+ssh://git@git.helu.ca:22022/r/pallas.git`
 - **Transport:** StreamableHTTP (`/mcp`) throughout — not SSE
-- **LLM:** OpenAI-compatible API at `http://nyx.helu.ca:22079/v1` (personal Qwen model)
+- **LLM:** Local Qwen via fast-agent's Generic (OpenAI-compatible) provider at
+  `http://nyx.helu.ca:22079/v1`
 - **Logging:** Console output — stdout → syslog → Alloy → Loki in production
 - **Port scheme:** registry at 24100, agents 24101–24149, sub-agents 24150–24199
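
Note on the resulting configuration: the fragment below is a minimal sketch of what the rendered fast-agent config might look like after this change. It is not part of the commit. It assumes fast-agent's `generic` provider section accepts `api_key` and `base_url` keys, as its provider documentation describes; the `api_key` value is a placeholder, since this commit does not say whether the llama-cpp server requires a key.

```yaml
# Sketch only: illustrative rendering, not the committed template.
# The "generic." prefix on the model name selects fast-agent's Generic
# (OpenAI-compatible) provider instead of the "openai" provider.
default_model: generic.Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf

generic:
  # Placeholder value (assumption); replace it if the server enforces a key.
  api_key: "unused"
  # Endpoint taken from the docs changed in this commit.
  base_url: "http://nyx.helu.ca:22079/v1"
```

Keeping the local endpoint under `generic` leaves the `openai` section free for a cloud OpenAI-compatible endpoint, which is the separation the new comments in `docs/kottos.md` describe.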