ouranos/docs/agent_s.md
Robert Helewka 042df52bca Refactor user management in Ansible playbooks to standardize on keeper_user
- Updated user addition tasks across multiple playbooks (mcp_switchboard, mcpo, neo4j, neo4j_mcp, openwebui, postgresql, rabbitmq, searxng, smtp4dev) to replace references to ansible_user and remote_user with keeper_user.
- Modified PostgreSQL deployment to create directories and manage files under keeper_user's home.
- Enhanced documentation to clarify account taxonomy and usage of keeper_user in playbooks.
- Introduced new deployment for Agent S, including environment setup, desktop environment installation, XRDP configuration, and accessibility support.
- Added staging playbook for preparing release tarballs from local repositories.
- Created templates for XRDP configuration and environment activation scripts.
- Removed obsolete sunwait documentation.
2026-03-05 10:37:41 +00:00

# Ansible Deployment for Agent S
Agent S is a computer-use automation agent that controls a desktop environment via RDP. The deployment configures a full graphical stack on `larissa.helu.ca`: MATE desktop, XRDP, audio redirection via PulseAudio, and an AT-SPI accessibility bridge so agents can introspect the UI tree.
## Host
| Host | Group | Type |
|------|-------|------|
| `larissa.helu.ca` | `agent_s` | Incus container |
## Overview
The deployment installs and configures:
- **Principal user** (`robert`, UID 1000) — the human account the agent operates on behalf of
- **MATE desktop** — chosen for strong AT-SPI accessibility support
- **Firefox** from the Mozilla APT repo (bypasses the snap dependency introduced by Ubuntu)
- **Google Chrome** for browser automation
- **XRDP** with a custom Xorg config pinned to 1024×768 for UI-TARS / Agent-S model compatibility
- **PulseAudio + pulseaudio-module-xrdp** — audio redirection over RDP
- **AT-SPI accessibility stack** — allows agents to query the widget tree
- **Python virtual environment** with the Agent-S package and dependencies
- **Agent-S repository** extracted from a staged release tarball
- **Environment activation script** at `~/.agent_s_env`
## Prerequisites
### Control node
- Ansible 2.12+
- SSH access to `larissa.helu.ca`
- Staged release tarballs in `~/rel/` (produced by `stage.yml`):
  - `~/rel/agent_s_<agent_s_rel>.tar`
  - `~/rel/pulseaudio_module_xrdp_<pulseaudio_module_xrdp_rel>.tar`
### Target host
- Ubuntu 25.04
- Network access to the Mozilla APT repo, the Google Chrome download servers, and the Ubuntu package mirrors
- deb-src repositories available (the playbook enables them if absent)
## Staging
Before deploying, stage release tarballs from local git checkouts:
```bash
ansible-playbook ansible/agent_s/stage.yml
```
`stage.yml` runs on localhost and creates two tarballs from the configured git branches:
| Tarball | Source repo | Branch variable |
|---------|-------------|-----------------|
| `agent_s_<rel>.tar` | `~/gh/Agent-S` | `agent_s_rel` |
| `pulseaudio_module_xrdp_<rel>.tar` | `~/gh/pulseaudio-module-xrdp` | `pulseaudio_module_xrdp_rel` |
Both variables take their defaults from the `all` group vars (`agent_s_rel: main`, `pulseaudio_module_xrdp_rel: devel`).
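As a rough sketch of what staging amounts to, assuming `stage.yml` wraps something like `git archive` (its actual tasks are not shown in this document), the hypothetical helpers below build the tarball path from a component name and release, then archive that branch:

```bash
# Hypothetical helpers; stage.yml may implement staging differently.

# Print the tarball path for a component name and a release (branch or tag).
tarball_path() {
    local name="$1" rel="$2"
    printf '%s/rel/%s_%s.tar\n' "$HOME" "$name" "$rel"
}

# Archive one branch of a local checkout into ~/rel/.
stage_tarball() {
    local repo="$1" name="$2" rel="$3"
    local out
    out="$(tarball_path "$name" "$rel")"
    mkdir -p "$(dirname "$out")"
    git -C "$repo" archive --format=tar --output="$out" "$rel"
}
```

For example, `stage_tarball ~/gh/Agent-S agent_s main` would write `~/rel/agent_s_main.tar`.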
## Deployment
```bash
ansible-playbook ansible/agent_s/deploy.yml
```
### Phase 1 — Principal user and snap suppression
Creates the `robert` account (UID 1000) and pins `snapd` at priority -10 via `/etc/apt/preferences.d/nosnap.pref`. This must run before the desktop install so `ubuntu-mate-desktop` cannot pull in snap-packaged Firefox.
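The pin file itself is small. The sketch below writes one by hand; the stanza is a commonly used form and an assumption here, since the playbook's actual file content is not reproduced in this document. `write_nosnap_pref` is a hypothetical helper name.

```bash
# Write an APT pin that gives snapd a negative priority so it is never
# selected for installation. The stanza is an assumed, commonly used form.
write_nosnap_pref() {
    local dest="${1:-/etc/apt/preferences.d/nosnap.pref}"
    cat > "$dest" <<'EOF'
Package: snapd
Pin: release a=*
Pin-Priority: -10
EOF
}
```

Once the file is in place, `apt-cache policy snapd` is a quick way to confirm that APT sees the pin.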
### Phase 2 — Firefox
Adds the Mozilla APT repo and pins all `packages.mozilla.org` packages at priority 1000, ensuring the system installs the .deb Firefox rather than the snap.
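To check a pin like this after the fact, the hypothetical helper below reads an APT preferences file and prints the priority of the stanza that pins a given origin. It assumes the usual layout: stanzas separated by blank lines, with `Pin:` appearing before `Pin-Priority:`. The file location is left as an argument because this document does not say where the playbook writes it.

```bash
# Print the Pin-Priority of the stanza that pins the given origin, if any.
# Stanzas in APT preferences files are separated by blank lines.
pin_priority_for_origin() {
    local file="$1" origin="$2"
    awk -v want="origin $origin" '
        /^[[:space:]]*$/ { pinned = 0; next }   # a blank line ends the stanza
        /^Pin:/          { sub(/^Pin:[[:space:]]*/, ""); pinned = ($0 == want) }
        /^Pin-Priority:/ && pinned { print $2 }
    ' "$file"
}
```

Calling `pin_priority_for_origin <preferences-file> packages.mozilla.org` prints the priority when the pin is present and prints nothing otherwise.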
### Phase 3 — MATE desktop
Installs `ubuntu-mate-desktop`. MATE is preferred over GNOME because its AT-SPI bridge is more reliable in a headless XRDP session.
### Phase 4 — XRDP
Installs `xrdp` and `xorgxrdp`, adds the `xrdp` user to the `ssl-cert` group, and enables the service. The Xorg configuration is deployed separately (Phase 8).
### Phase 5 — AT-SPI accessibility
Installs the AT-SPI core libraries and adds `/etc/profile.d/atspi.sh` to set:
```bash
GTK_MODULES=gail:atk-bridge
NO_AT_BRIDGE=0
ACCESSIBILITY_ENABLED=1
```
These environment variables are picked up by MATE applications when launched from `.xsession`, making the widget tree available to AT-SPI consumers such as `pyatspi`.
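To confirm that a running session process actually inherited these variables, its environment can be read from `/proc`. The helper below is a hypothetical sketch that extracts one variable from a NUL-separated environ file:

```bash
# Print the value of variable NAME from a NUL-separated environ file,
# such as /proc/<pid>/environ. Prints nothing if the variable is absent.
environ_value() {
    local file="$1" name="$2"
    tr '\0' '\n' < "$file" | sed -n "s/^${name}=//p"
}
```

A possible invocation inside the RDP session, assuming `pgrep` is available and the session process is named `mate-session`: `environ_value "/proc/$(pgrep -u "$USER" -o mate-session)/environ" GTK_MODULES`.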
### Phase 6 — PulseAudio and RDP audio
See [Sound device configuration](#sound-device-configuration) below.
### Phase 7 — Packages, Chrome, Python environment, Agent-S
- Installs OCR support (`tesseract-ocr`), Python 3, and assistive tech libraries
- Downloads and installs Google Chrome from the official .deb
- Creates a Python venv at `~/env/agents` with `--system-site-packages` (so `pyatspi` and `python3-gi`, installed system-wide, are available)
- Extracts Agent-S into `~/gh/Agent-S`
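A venv created with `python3 -m venv --system-site-packages <dir>` records that choice in its `pyvenv.cfg`. The hypothetical helper below checks for it, which is a quick way to tell whether `pyatspi` and `python3-gi` should be importable from inside the venv:

```bash
# Return 0 if the venv at DIR was created with --system-site-packages,
# judged by the include-system-site-packages key in its pyvenv.cfg.
venv_uses_system_site_packages() {
    local cfg="$1/pyvenv.cfg"
    [ -f "$cfg" ] &&
        grep -Eq '^include-system-site-packages[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$cfg"
}
```

For example: `venv_uses_system_site_packages ~/env/agents && echo "system packages visible"`.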
### Phase 8 — XRDP Xorg configuration and session
Deploys `xorg.conf.j2` to `/etc/X11/xrdp/xorg.conf` and writes `.xsession` to start MATE:
```
exec mate-session
```
Any change to the Xorg config triggers the `restart xrdp` handler.
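For reference, the `.xsession` part of this phase can be reproduced or checked by hand. Both helpers below are hypothetical and assume the file consists of the single `exec mate-session` line shown above:

```bash
# Write a minimal .xsession that starts MATE and mark it executable.
write_xsession() {
    local dest="${1:-$HOME/.xsession}"
    printf 'exec mate-session\n' > "$dest"
    chmod 0755 "$dest"
}

# Return 0 if the file is executable and contains the expected exec line.
check_xsession() {
    local file="${1:-$HOME/.xsession}"
    [ -x "$file" ] && grep -qx 'exec mate-session' "$file"
}
```

Running `check_xsession` as the principal user is a quick first step when an RDP login shows only a black screen.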
---
## X Server / RDP configuration
The Xorg config is templated from `ansible/agent_s/xorg.conf.j2` and deployed to `/etc/X11/xrdp/xorg.conf`.
### Resolution
The display is pinned to **1024×768**. UI-TARS and similar vision-language models are trained on screenshots at this resolution; higher resolutions degrade accuracy. Fallback modelines for 800×600 and 640×480 satisfy the `xrdpdev` driver's internal requirements but are never selected in normal operation.
### Module loading order
```
Load "glamoregl"  # must precede xorgxrdp
Load "xorgxrdp"
```
`xorgxrdp 0.10.2` (shipped in Ubuntu 25.04) depends on the symbol `glamor_xv_init`, which lives in `libglamoregl.so`. If `glamoregl` is not loaded first, `libxorgxrdp.so` fails with:
```
undefined symbol: glamor_xv_init
```
This cascades — `xrdpdev`, `xrdpmouse`, and `xrdpkeyb` all fail because they depend on symbols exported by `libxorgxrdp.so`.
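The ordering requirement is easy to verify mechanically. The sketch below (a hypothetical helper, not part of the playbook) compares the line numbers of the two `Load` directives and ignores commented-out lines:

```bash
# Return 0 if `Load "glamoregl"` appears before `Load "xorgxrdp"` in an Xorg
# config file; return 1 if either line is missing or the order is reversed.
glamor_before_xorgxrdp() {
    local conf="${1:-/etc/X11/xrdp/xorg.conf}"
    awk '
        /^[[:space:]]*Load[[:space:]]+"glamoregl"/ && !g { g = NR }
        /^[[:space:]]*Load[[:space:]]+"xorgxrdp"/  && !x { x = NR }
        END { exit !(g && x && g < x) }
    ' "$conf"
}
```

For example: `glamor_before_xorgxrdp || echo "fix the module order in /etc/X11/xrdp/xorg.conf"`.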
### Device section
The Device section uses only the `xrdpdev` virtual framebuffer driver with no DRM/GPU options:
```
Section "Device"
    Identifier "Video Card (xrdpdev)"
    Driver     "xrdpdev"
EndSection
```
DRM options (`DRMDevice`, `DRI3`, `DRMAllowList`) are not applicable to the virtual framebuffer and were the source of a previous misconfiguration on `larissa`.
### Display variable
The agent environment sets `DISPLAY=:10.0` (via `~/.agent_s_env`). XRDP assigns display numbers starting at `:10` by default.
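Because later sessions on the same host receive higher display numbers, it can help to confirm that the display named in `DISPLAY` actually exists. The sketch below maps a `DISPLAY` value to the conventional X11 Unix socket path; the helper name is hypothetical, and it assumes the X server listens on the standard `/tmp/.X11-unix/` socket directory:

```bash
# Map a DISPLAY value such as ":10.0" to the local X server socket path.
display_socket() {
    local display="${1:-${DISPLAY:-}}"
    local num="${display#*:}"   # drop the optional host part and the colon
    num="${num%%.*}"            # drop the optional ".screen" suffix
    printf '/tmp/.X11-unix/X%s\n' "$num"
}
```

For example: `[ -S "$(display_socket)" ] && echo "X server socket present"`.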
---
## Sound device configuration
RDP audio redirection requires two components:
1. **PulseAudio** — the userspace sound server
2. **pulseaudio-module-xrdp** — a PulseAudio sink/source module that forwards audio to the RDP client
### Build process
`pulseaudio-module-xrdp` must be compiled against the PulseAudio headers matching the exact version running on the target host. The playbook automates this:
1. Enables `deb-src` entries in `/etc/apt/sources.list.d/ubuntu.sources`
2. Runs `apt-get source pulseaudio` into `/usr/local/src/`
3. Generates `config.h` by running `meson setup build` in the PulseAudio source tree
4. Extracts `pulseaudio-module-xrdp` from the staged tarball into `/usr/local/src/pulseaudio-module-xrdp/`
5. Runs `./bootstrap && ./configure PULSE_DIR=<pulse-src> && make && make install`
Steps 4 and 5 are skipped if `module-xrdp-sink.so` is already present under `/usr/lib/pulse-*/modules/`. Re-running the playbook after a PulseAudio upgrade will trigger a rebuild because the old `.so` won't be found at the new versioned path.
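The skip condition can be expressed as a glob over the versioned module directories. This is a minimal sketch of that check, not the playbook's actual task; the `root` argument is a hypothetical addition so the function can be tried against a scratch directory:

```bash
# Return 0 if module-xrdp-sink.so exists under any versioned PulseAudio
# module directory below ROOT (ROOT defaults to the real filesystem root).
xrdp_sink_installed() {
    local root="${1:-}"
    local f
    for f in "$root"/usr/lib/pulse-*/modules/module-xrdp-sink.so; do
        # With no match the pattern stays literal and this test fails.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

For example: `xrdp_sink_installed || echo "module missing, build required"`.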
### How audio flows
```
MATE application
→ PulseAudio (userspace daemon, per-user session)
→ module-xrdp-sink (installed by pulseaudio-module-xrdp)
→ XRDP audio channel
→ RDP client
```
PulseAudio is started automatically as part of the MATE session (`mate-session` launches `pulseaudio --start`). No additional service unit is required.
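To confirm the chain is live inside an RDP session, the sink list can be inspected. The hypothetical helper below reads `pactl list short sinks` output on stdin and looks for a sink whose name contains `xrdp-sink`; that sink name is an assumption about what the module registers, so adjust the pattern if the host reports something different.

```bash
# Read tab-separated `pactl list short sinks` output on stdin and return 0
# if a sink whose name contains "xrdp-sink" is listed.
# The sink name is an assumption, not taken from this deployment.
has_xrdp_sink() {
    awk -F '\t' '$2 ~ /xrdp-sink/ { found = 1 } END { exit !found }'
}
```

For example: `pactl list short sinks | has_xrdp_sink && echo "RDP audio sink loaded"`.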
---
## Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `principal_user` | `robert` | Username for the human account the agent runs as |
| `agent_s_rel` | `main` | Git branch/tag to stage from `~/gh/Agent-S` |
| `pulseaudio_module_xrdp_rel` | `devel` | Git branch/tag to stage from `~/gh/pulseaudio-module-xrdp` |
| `agent_s_venv` | `/home/{{ principal_user }}/env/agents` | Path to the Python virtual environment |
| `agent_s_repo` | `/home/{{ principal_user }}/gh/Agent-S` | Extraction path for the Agent-S source |
`principal_user` is set in `ansible/inventory/host_vars/larissa.helu.ca.yml`, and the release branches are set in `ansible/inventory/group_vars/all/vars.yml`.
---
## Environment activation
After deployment, activate the Agent-S environment on the target host:
```bash
source ~/.agent_s_env
```
This activates the virtual environment, sets `AGENT_S_HOME` and `DISPLAY`, and exports placeholder values for `HF_TOKEN` and `OPENAI_API_KEY`, which must be replaced with real credentials before running model inference.
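After sourcing the script, a quick sanity check can list any expected variable that is still unset or empty. The helper name is hypothetical. The placeholder text used by the template is not shown in this document, so the sketch cannot detect an unreplaced placeholder; it only catches missing values.

```bash
# Print the names of expected variables that are unset or empty in the
# current shell; return 1 if any are missing, 0 otherwise.
missing_agent_env() {
    local name value missing=0
    for name in AGENT_S_HOME DISPLAY HF_TOKEN OPENAI_API_KEY; do
        eval "value=\${$name:-}"   # names come from the fixed list above
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `source ~/.agent_s_env && missing_agent_env`.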
---
## Troubleshooting
### X server fails to start — `undefined symbol: glamor_xv_init`
`glamoregl` is missing or ordered after `xorgxrdp` in the Module section. Check `/etc/X11/xrdp/xorg.conf` and ensure `Load "glamoregl"` appears before `Load "xorgxrdp"`.
### Black screen on RDP connect
Confirm `.xsession` contains `exec mate-session` and is executable (`chmod 0755`). Check `/var/log/xrdp-sesman.log` for session startup errors.
### No audio in RDP session
Verify `module-xrdp-sink.so` is installed:
```bash
find /usr/lib/pulse-*/modules/ -name 'module-xrdp-sink.so'
```
If absent, re-run the playbook. If PulseAudio was upgraded, the old `.so` path will not exist and the build steps will execute automatically.
### Accessibility / AT-SPI not working
Confirm the profile script is loaded in the session:
```bash
echo $GTK_MODULES # should include atk-bridge
```
If empty, verify `/etc/profile.d/atspi.sh` exists and the session was started via `.xsession` (not a display manager session that bypasses `/etc/profile.d/`).