# Ansible Deployment for Rommie
Rommie is an MCP server that wraps [Agent S](https://github.com/simular-ai/Agent-S), enabling agent-to-agent collaboration for GUI automation. It exposes three MCP tools — `execute_gui_task`, `get_screenshot`, and `get_agent_status` — over Streamable HTTP, allowing remote AI agents to delegate GUI tasks to the MATE desktop running on `caliban.incus`.
Named after the Andromeda Ascendant's AI avatar.
## Host
| Host | Group | Type |
|------|-------|------|
| `caliban.incus` | `rommie` | Incus container |
## Prerequisites
### Control node
- Staged release tarball in `~/rel/` (produced by `agent_s/stage.yml`):
  - `~/rel/rommie_<rommie_rel>.tar`
### Target host
- Agent S fully deployed (`agent_s/deploy.yml`) — Rommie's `deploy.yml` imports it as a dependency
- MATE desktop and XRDP running (Agent S deployment provides this)
- Python 3.13 (Ubuntu 25.04)
- X11 display available at the configured `DISPLAY` value
> **Note**: `gui-agents` 0.3.x declares `Requires-Python <=3.12` in its PyPI metadata despite working on Python 3.13. The deploy playbook pre-installs it with `--ignore-requires-python` before installing Rommie.
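
Before running the playbook, you can confirm the Python prerequisite on `caliban.incus` with the sketch below. It is not part of the deployment. The helper names are hypothetical, and `version_ge` relies on GNU `sort -V` for version ordering.

```bash
# Hypothetical preflight helpers for caliban.incus; not part of the playbook.

# version_ge A B: succeed if dotted version A >= B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_python [INTERPRETER]: require Python >= 3.13 (the Ubuntu 25.04 system Python).
check_python() {
  local py="${1:-python3}" ver
  # "Python 3.13.3" -> "3.13.3"; empty if the interpreter is missing.
  ver="$("$py" --version 2>&1 | awk '/^Python / { print $2 }')"
  if [ -n "$ver" ] && version_ge "$ver" "3.13"; then
    echo "ok: $py is Python $ver"
  else
    echo "fail: $py is ${ver:-not runnable}, need Python >= 3.13" >&2
    return 1
  fi
}
```

Run `check_python` with no argument to check the system interpreter. Once the venv exists, pass `~/env/rommie/bin/python` to check it instead.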
## Staging
Rommie is staged from a local git checkout (`~/git/rommie`) by `agent_s/stage.yml`, which builds the Rommie tarball as part of the Agent S staging run. The release branch or tag is set by `rommie_rel` in `group_vars/all/vars.yml` (default: `main`).
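
`agent_s/stage.yml` is the supported staging path. For illustration only, a hand-rolled equivalent might look like the sketch below. It assumes the checkout lives at `~/git/rommie` and uses `git archive`, which may not be how the playbook actually builds the tarball. `stage_rommie` is a hypothetical name.

```bash
# Hypothetical stand-in for the Rommie part of agent_s/stage.yml.
# stage_rommie [REL] [REPO] [OUTDIR] -> prints the tarball path it wrote.
# REL must not contain '/', since it becomes part of the file name.
stage_rommie() {
  local rel="${1:-main}"
  local repo="${2:-$HOME/git/rommie}"
  local outdir="${3:-$HOME/rel}"
  local out

  mkdir -p "$outdir" || return 1
  # Absolute path, because `git -C` changes directory before writing --output.
  out="$(cd "$outdir" && pwd)/rommie_${rel}.tar" || return 1
  # Export the tree at REL (branch or tag) without any .git metadata.
  git -C "$repo" archive --format=tar --output="$out" "$rel" || return 1
  echo "$out"
}
```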
## Deployment
```bash
ansible-playbook ansible/rommie/deploy.yml
```
The playbook imports `agent_s/deploy.yml` first to ensure the MATE desktop and Agent S dependencies are in place, then:
1. Creates `~/rommie/` and extracts the staged tarball
2. Creates a Python venv at `~/env/rommie` with `--system-site-packages`
3. Pre-installs `gui-agents>=0.3.1` with `--ignore-requires-python`
4. Installs Rommie into the venv in editable mode (`pip install -e`)
5. Deploys `~/rommie/.env` from the template
6. Deploys and enables the `rommie.service` systemd unit
7. Health-checks `http://localhost:<rommie_port>/mcp` (retries 5×, 3 s apart)
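
Steps 1 to 4 are where manual fixes usually happen, so here is a shell approximation of them. It is not the playbook, and it leaves steps 5 to 7 (`.env`, systemd unit, health check) to Ansible. The sketch makes three assumptions: the paths follow the defaults above, the staged tarball has already been copied to the same `~/rel/` path on the target, and the tarball extracts its contents directly into `~/rommie`. `DRY_RUN=1` is a hypothetical switch that prints the commands instead of running them.

```bash
# Hypothetical hand-run approximation of deploy.yml steps 1-4 (not the playbook).
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_rommie() {
  local rel="${1:-main}"
  local src="$HOME/rommie"
  local venv="$HOME/env/rommie"
  local tarball="$HOME/rel/rommie_${rel}.tar"   # assumed already copied to the target

  # 1. Create ~/rommie and extract the staged tarball (flat layout assumed).
  run mkdir -p "$src" || return 1
  run tar -xf "$tarball" -C "$src" || return 1
  # 2. Venv that can also see system packages.
  run python3 -m venv --system-site-packages "$venv" || return 1
  # 3. gui-agents declares Requires-Python <=3.12, so skip that metadata check.
  run "$venv/bin/pip" install --ignore-requires-python "gui-agents>=0.3.1" || return 1
  # 4. Editable install of Rommie; gui-agents is already satisfied from step 3.
  run "$venv/bin/pip" install -e "$src"
}
```

`DRY_RUN=1 install_rommie main` lists the five commands so you can review them before running them for real.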
## MCP Tools
| Tool | Concurrency | Description |
|------|-------------|-------------|
| `execute_gui_task` | Serialized (one at a time) | Execute a GUI automation task via Agent S |
| `get_screenshot` | Always available | Capture the current screen state |
| `get_agent_status` | Always available | Query task progress and agent state |
Read-only tools (`get_screenshot`, `get_agent_status`) remain available while a GUI task is running. A second `execute_gui_task` call while one is in-flight returns a "busy" error.
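
Over Streamable HTTP, each tool invocation is a JSON-RPC `tools/call` request POSTed to `/mcp`. The sketch below builds such a request and sends it with `curl`. It assumes a session has already been initialized, with `SID` holding the `Mcp-Session-Id` returned by `initialize`. It also assumes `get_agent_status` takes no arguments. The argument schema of `execute_gui_task` is not documented here, so the `task` field shown is a hypothetical placeholder; replace it with whatever `tools/list` reports.

```bash
# mcp_tool_call_body ID NAME [ARGS_JSON] -> JSON-RPC "tools/call" request body.
# ARGS_JSON must already be valid JSON; it defaults to an empty object.
mcp_tool_call_body() {
  local args="${3:-}"
  [ -n "$args" ] || args='{}'
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
    "$1" "$2" "$args"
}

# mcp_post URL SESSION_ID BODY: send one JSON-RPC message over Streamable HTTP.
# The reply may be plain JSON or an SSE stream (text/event-stream).
mcp_post() {
  curl -sS -X POST "$1" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -H "Mcp-Session-Id: $2" \
    --data "$3"
}

# Usage (SID from a prior initialize; "task" is a hypothetical argument name):
#   mcp_post http://caliban.incus:22031/mcp "$SID" "$(mcp_tool_call_body 2 get_agent_status)"
#   mcp_post http://caliban.incus:22031/mcp "$SID" \
#     "$(mcp_tool_call_body 3 execute_gui_task '{"task":"open a terminal"}')"
```

If you send a second `execute_gui_task` while the first is still running, it should return the busy error described above. `get_agent_status` and `get_screenshot` calls should keep working during that time.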
## Architecture
```
External Agent (e.g., Claude Desktop / MCP Switchboard)
        │  MCP Protocol (Streamable HTTP, TLS)
        │  https://rommie.ouranos.helu.ca/mcp
        ▼
Titania HAProxy (TLS termination, wildcard cert)
        │  http://caliban.incus:22031/mcp
        ▼
Rommie MCP Server
(serialized task execution, multi-client reads)
        │
        ▼
Agent S (gui-agents package)
        │
        ▼
MATE Desktop ← X11 display :10 ← XRDP session
```
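
The diagram shows two ways to reach the same endpoint: through Titania at `https://rommie.ouranos.helu.ca/mcp`, or directly at `http://caliban.incus:22031/mcp` from inside the Incus network. Comparing the two separates HAProxy problems from Rommie problems. The sketch sends an MCP `initialize` request and prints the `Mcp-Session-Id` response header. The protocol version string and client name are assumptions, and the helper names are hypothetical.

```bash
# mcp_initialize_body [ID] -> JSON-RPC "initialize" request body.
# protocolVersion and clientInfo are assumptions; any supported version works.
mcp_initialize_body() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-probe","version":"0.1"}}}\n' \
    "${1:-1}"
}

# session_id_from_headers: read raw HTTP response headers on stdin and print
# the Mcp-Session-Id value (header names are case-insensitive).
session_id_from_headers() {
  tr -d '\r' | awk 'tolower($0) ~ /^mcp-session-id:/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# mcp_session URL: POST initialize and print the session id (empty if none).
mcp_session() {
  curl -sS -D - -o /dev/null -X POST "$1" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    --data "$(mcp_initialize_body 1)" | session_id_from_headers
}

# Usage: compare the direct path with the proxied one.
#   mcp_session http://caliban.incus:22031/mcp        # bypasses HAProxy
#   mcp_session https://rommie.ouranos.helu.ca/mcp    # through Titania
```

If only the proxied call fails, look at HAProxy first. Each path also presents a different `Host` header, so a rejection on just one path could come from the `rommie_allowed_hosts` allowlist.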
## Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `rommie_port` | `22031` | HTTP listen port |
| `rommie_host` | `0.0.0.0` | Bind address |
| `rommie_display` | `:10` | X11 display for Agent S (XRDP assigns `:10` by default) |
| `rommie_allowed_hosts` | `caliban.incus` | Allowed Host header values |
| `rommie_model` | `Qwen3-VL-30B-A3B-Instruct-UD-Q5_K_XL.gguf` | Primary vision-language model |
| `rommie_model_url` | `http://nyx.helu.ca:22078` | Inference endpoint for the primary model |
| `rommie_provider` | `openai` | API provider for the primary model |
| `rommie_ground_provider` | `huggingface` | API provider for the grounding model |
| `rommie_ground_url` | `http://pan.helu.ca:22078` | Inference endpoint for the grounding model |
| `rommie_ground_model` | `UI-TARS-7B-DPO-Q6_K_L.gguf` | Grounding model (UI element localisation) |
| `rommie_grounding_width` | `1024` | Screenshot width passed to the grounding model |
| `rommie_grounding_height` | `1024` | Screenshot height passed to the grounding model |
| `rommie_rel` | `main` | Git branch/tag to stage from `~/git/rommie` |
All host-specific variables are set in `ansible/inventory/host_vars/caliban.incus.yml`. The `rommie_rel` default is in `ansible/inventory/group_vars/all/vars.yml`.
## Integration
The MCP URL for Rommie is registered in `group_vars/all/vars.yml`:
```yaml
rommie_mcp_url: https://rommie.ouranos.helu.ca/mcp
```
Consumers (e.g., MCP Switchboard, Open WebUI, Claude Desktop) reference `{{ rommie_mcp_url }}`.
The route is served via Titania's HAProxy using the existing `*.ouranos.helu.ca` Let's Encrypt wildcard certificate. No additional certificate provisioning is required.
## Service Management
```bash
# Check status
systemctl status rommie
# Restart
systemctl restart rommie
# View logs
journalctl -u rommie -f
```
The unit runs as `principal_user` (`robert`) and loads environment from `~/rommie/.env`. It restarts automatically on failure with a 10 s back-off.
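
To confirm the installed unit matches this description, ask `systemctl show` for the relevant properties. The sketch makes two assumptions: the unit uses `Restart=on-failure` (the text above only says it restarts on failure), and `systemctl show` renders the back-off as `10s`. `check_rommie_unit` is a hypothetical helper that reads the properties on stdin.

```bash
# unit_prop KEY: read `systemctl show` output (KEY=VALUE lines) on stdin, print VALUE.
unit_prop() {
  awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# check_rommie_unit: compare User/Restart/RestartUSec with what this doc describes.
# Assumes Restart=on-failure and a 10 s back-off rendered as "10s".
check_rommie_unit() {
  local props user restart delay
  props="$(cat)"
  user="$(unit_prop User <<<"$props")"
  restart="$(unit_prop Restart <<<"$props")"
  delay="$(unit_prop RestartUSec <<<"$props")"
  echo "User=$user Restart=$restart RestartUSec=$delay"
  [ "$user" = "robert" ] && [ "$restart" = "on-failure" ] && [ "$delay" = "10s" ]
}

# Usage:
#   systemctl show rommie -p User -p Restart -p RestartUSec | check_rommie_unit
```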
## Troubleshooting
### `gui-agents` version conflict
`gui-agents` 0.3.x declares `Requires-Python <=3.12` in its PyPI metadata but works on 3.13, so the deploy playbook installs it with `--ignore-requires-python`. If the Rommie install step fails with a version conflict, first confirm that the pre-install task ran. Then check the venv's Python version and the installed `gui-agents` version:
```bash
/home/robert/env/rommie/bin/python --version
/home/robert/env/rommie/bin/pip show gui-agents
```
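
You can wrap the two commands above into a check that also enforces the `>=0.3.1` floor used by the pre-install step. The sketch below does this with hypothetical helper names. It only reads `pip show` output, so it tells you what is installed, not whether `gui-agents` actually imports on 3.13.

```bash
# pip_show_field FIELD: read `pip show` output on stdin, print that field's value.
pip_show_field() {
  awk -v f="$1" 'index($0, f ": ") == 1 { print substr($0, length(f) + 3); exit }'
}

# version_ge A B: succeed if dotted version A >= B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_gui_agents [VENV]: confirm gui-agents >= 0.3.1 is installed in the venv.
check_gui_agents() {
  local venv="${1:-$HOME/env/rommie}" ver
  ver="$("$venv/bin/pip" show gui-agents 2>/dev/null | pip_show_field Version)"
  if [ -z "$ver" ]; then
    echo "gui-agents not installed in $venv" >&2
    return 1
  fi
  if ! version_ge "$ver" "0.3.1"; then
    echo "gui-agents $ver is older than 0.3.1" >&2
    return 1
  fi
  echo "gui-agents $ver ok"
}
```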
### Health check fails
The playbook probes `http://localhost:<rommie_port>/mcp` (port 22031 by default) after starting the service. If the probe times out:
1. Check the service started: `systemctl status rommie`
2. Confirm that the display named by `DISPLAY` (default `:10`) exists: XRDP must have created it before Rommie starts
3. Check logs: `journalctl -u rommie --since "5 min ago"`
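
To re-run the playbook's probe by hand with the same spacing (5 attempts, 3 s apart), a small polling loop is enough. The status codes the playbook's check accepts are not documented here, and an MCP endpoint may legitimately reject a bare request. This sketch therefore treats any HTTP response as "listening". `wait_for_mcp` is a hypothetical name.

```bash
# wait_for_mcp URL [RETRIES] [DELAY]: poll until the endpoint answers with any HTTP status.
# Defaults mirror the playbook's 5 tries, 3 s apart; any status code counts as "up".
wait_for_mcp() {
  local url="$1" retries="${2:-5}" delay="${3:-3}" i code
  for ((i = 1; i <= retries; i++)); do
    # curl prints "000" for http_code when no HTTP response arrived.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
    if [ -n "$code" ] && [ "$code" != "000" ]; then
      echo "attempt $i: HTTP $code"
      return 0
    fi
    echo "attempt $i: no response" >&2
    [ "$i" -lt "$retries" ] && sleep "$delay"
  done
  return 1
}

# Usage on caliban.incus:
#   wait_for_mcp http://localhost:22031/mcp
```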
### No X display
Rommie takes `DISPLAY` from `~/rommie/.env`. If Agent S cannot connect to the display:
```bash
# Verify XRDP created the display
ls /tmp/.X11-unix/
```
An active RDP session must exist or XRDP's `Xorg` daemon must be running for display `:10` to be present.
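
A local X server for display `:N` listens on the Unix socket `/tmp/.X11-unix/XN`, so `:10` corresponds to `X10`. The hypothetical helper below turns a `DISPLAY` value into that path and checks whether the socket exists. It handles local displays only. A present socket does not prove Rommie is authorised to connect, but a missing one rules the display out.

```bash
# x11_socket DISPLAY: map a local X11 display name to its Unix socket path.
# ":10" and ":10.0" -> /tmp/.X11-unix/X10 (screen number ignored).
x11_socket() {
  local d="${1#*:}"   # drop anything up to and including the colon
  d="${d%%.*}"        # drop an optional ".screen" suffix
  printf '/tmp/.X11-unix/X%s\n' "$d"
}

# check_display [DISPLAY]: report whether the X server socket exists.
check_display() {
  local disp="${1:-${DISPLAY:-:10}}" sock
  sock="$(x11_socket "$disp")"
  if [ -S "$sock" ]; then
    echo "display $disp: socket $sock present"
  else
    echo "display $disp: $sock missing (no XRDP Xorg server for it?)" >&2
    return 1
  fi
}
```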