docs(readme): update assistant roster, prompt layers, repo structure

- Update assistant lists (added Shawn, Watson, David, CASE, AWS SA; modified Scotty/Harper roles)
- Reflect new architecture layers: Tool Prompt Snippets and Shared Context
- Align repository structure diagram with current filesystem layout
33
docs/tools/rommie.md
Normal file
@@ -0,0 +1,33 @@
# Rommie

> Autonomous desktop automation: drives a MATE desktop via Agent S.

- **MCP server name:** `rommie` (runs on `caliban.incus`)
- **Prompt snippet:** [prompts/tools/rommie.md](../../prompts/tools/rommie.md)

## What It Is

Rommie is the agent that operates a desktop. Powered by Agent S (a vision-based desktop automation framework), Rommie sees and drives a MATE desktop environment: clicking, typing, and navigating GUI applications that have no API. It is named after Andromeda's ship-mind avatar, who could project into physical space when needed.

Other agents delegate to Rommie when GUI interaction is unavoidable. The conversation pattern is: send Rommie a natural-language task, wait, then verify with a screenshot.
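The send, wait, verify pattern described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not Rommie's actual API: `get_screenshot` is the only tool name this document mentions, while `RommieClient`, `run_task`, `FakeRommie`, and `delegate` are hypothetical names introduced here. The fake client stands in for the real MCP server so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class RommieClient(Protocol):
    """Hypothetical interface; only `get_screenshot` is named in this doc."""

    def run_task(self, task: str) -> str: ...

    def get_screenshot(self) -> bytes: ...


@dataclass
class FakeRommie:
    """In-memory stand-in so the sketch runs without a real MCP server."""

    screen: bytes = b"empty desktop"
    tasks: List[str] = field(default_factory=list)

    def run_task(self, task: str) -> str:
        self.tasks.append(task)
        self.screen = f"screen after: {task}".encode()
        return "done"  # Rommie's own claim of completion, not proof

    def get_screenshot(self) -> bytes:
        return self.screen


def delegate(
    client: RommieClient,
    task: str,
    looks_right: Callable[[bytes], bool],
) -> bool:
    """Send a natural-language task, wait for it, then verify from a screenshot."""
    client.run_task(task)                  # assumed to block until Rommie reports back
    screenshot = client.get_screenshot()   # do not trust the report alone
    return looks_right(screenshot)
```

With the fake client, `delegate(FakeRommie(), "Check the latest headlines on Google", lambda shot: b"headlines" in shot)` returns `True`, because the fake screen simply echoes the task text. In practice `looks_right` would be the calling agent inspecting the real screenshot.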

## What It's Good For

- Using a website or app that only works through a browser GUI
- Driving software that has no API or CLI
- High-level web interactions in the style of "Check the latest headlines on Google"
- Generating screenshots of GUI state for verification
- Anything where "just look at the screen" is the only way to know what happened

## What It's Not Good For

- Anything achievable through a shell or API: Kernos and Argos are faster, more deterministic, and don't tie up Rommie's single session
- Bulk operations: Rommie is one desktop, one task at a time
- High-precision pixel work: Agent S is vision-based and works at the semantic UI level, not the exact-pixel level

## Known Gotchas

- **One task at a time.** If Rommie is busy, wait; don't fire a second task. Subsequent requests will queue or fail.
- **Verify with `get_screenshot`.** Don't assume Rommie completed the task; ask for a screenshot and look. This matters because Rommie's confidence about completion can outrun reality.
- **Give natural-language tasks, not click coordinates.** Agent S decides where to click; the calling agent describes the goal.
- **The desktop is real, the actions are real.** Rommie can buy things, send messages, and modify files. Treat its tool calls like Kernos calls: require confirmation for anything irreversible.
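Two of these gotchas (one task at a time, and confirmation before irreversible actions) could be enforced on the calling side with a small wrapper. The sketch below is an assumption-laden illustration, not part of Rommie: `SingleSessionGuard`, `RommieBusy`, `IRREVERSIBLE_HINTS`, and the `run_task` and `confirm` callables are all hypothetical, and the keyword check is deliberately crude. It fails fast when Rommie is busy rather than relying on server-side queueing.

```python
import re
import threading
from typing import Callable

# Illustrative keyword list only; a real check would need something sturdier.
IRREVERSIBLE_HINTS = {"buy", "purchase", "pay", "send", "delete"}


class RommieBusy(RuntimeError):
    """Raised when a task is submitted while another is still running."""


class SingleSessionGuard:
    """Caller-side wrapper: one task at a time, confirmation for risky tasks."""

    def __init__(
        self,
        run_task: Callable[[str], str],   # hypothetical call that hands a task to Rommie
        confirm: Callable[[str], bool],   # hypothetical hook that asks for confirmation
    ) -> None:
        self._run_task = run_task
        self._confirm = confirm
        self._lock = threading.Lock()

    def submit(self, task: str) -> str:
        # Ask for confirmation when the task text hints at an irreversible action.
        words = set(re.findall(r"[a-z]+", task.lower()))
        if words & IRREVERSIBLE_HINTS and not self._confirm(task):
            return "declined"
        # Refuse immediately instead of firing a second task at a busy desktop.
        if not self._lock.acquire(blocking=False):
            raise RommieBusy("Rommie is busy; wait for the current task to finish")
        try:
            return self._run_task(task)
        finally:
            self._lock.release()
```

A caller would construct the guard once and route every Rommie request through `submit`, catching `RommieBusy` to retry later. Verification with `get_screenshot` would still follow each successful call.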