docs(readme): update assistant roster, prompt layers, repo structure

- Update assistant lists (added Shawn, Watson, David, CASE, AWS SA; modified Scotty/Harper roles)
- Reflect new architecture layers: Tool Prompt Snippets and Shared Context
- Align repository structure diagram with current filesystem layout
2026-05-20 22:50:22 -04:00
parent c1cc6e26c5
commit 703b3402d4
39 changed files with 1181 additions and 158 deletions

31
docs/tools/argos.md Normal file

@@ -0,0 +1,31 @@
# Argos
> Web search and page fetch.
- **MCP server name:** `argos` (runs on `miranda.incus` in the lab)
- **Prompt snippet:** [prompts/tools/argos.md](../../prompts/tools/argos.md)
## What It Is
Argos is the agent's window onto the outside world: web search and webpage fetching. Named for the many-eyed giant of Greek myth, fitting for something that watches everywhere.
## What It's Good For
- General web search ("how do I…", "what is…", "current state of…")
- Fetching a specific URL when the agent already knows where to look
- Documentation lookups for libraries, frameworks, APIs (though Context7 is often better for these)
- CVE references, vendor status pages, upstream incident announcements
- Quick reality checks — "did this thing actually ship", "is this service up"
## What It's Not Good For
- Library/framework documentation when **Context7** is configured — Context7 is purpose-built for that and returns better-structured results
- Anything inside the Agathos lab — use Kernos, not Argos, for internal services
- Deep research with many follow-up queries — the agent should delegate to a research subagent rather than burning its own context window on long Argos chains
- Code search inside a known repo — use Gitea or GitHub MCP for repo-scoped lookups
## Known Gotchas
- **Quotes and operators matter** — Argos respects search-engine query syntax. Bad quoting → bad results.
- **Cached pages can mislead.** If a page's "last updated" matters (e.g., status pages, release notes), confirm by checking the page itself, not just the search snippet.
- **Rate limits exist.** Calling Argos in a tight loop will eventually get the agent throttled.

33
docs/tools/context7.md Normal file

@@ -0,0 +1,33 @@
# Context7
> Library and framework documentation lookup.
- **MCP server name:** `context7` (runs locally via npx)
- **Prompt snippet:** [prompts/tools/context7.md](../../prompts/tools/context7.md)
## What It Is
Context7 is a purpose-built MCP server for fetching current library, framework, SDK, API, and CLI documentation. It returns structured, version-aware results — meaningfully better than Argos for "how does this library work" type questions.
## What It's Good For
- API syntax, method signatures, configuration options for libraries
- Framework setup instructions and patterns (Django, React, Next.js, Tailwind, FastAPI, etc.)
- CLI tool usage and flags
- Version migration guides
- Library-specific debugging — "why does this configuration fail"
Use Context7 even for well-known libraries — training data may be stale on recent releases.
## What It's Not Good For
- Refactoring or writing scripts from scratch — Context7 documents, doesn't implement
- General programming concepts — Context7 indexes libraries, not theory
- Code review — use the agent's own judgment, not external docs
- Business logic debugging — Context7 won't know your code
## Known Gotchas
- **Resolve the library ID first.** Context7 typically expects a library identifier; `resolve-library-id` style calls precede `query-docs` calls.
- **Version matters.** When library behavior is version-specific, include the version in the query. The doc index may have multiple versions.
- **Prefer over web search for libraries.** When the question is "how does X library work," Context7 is the right first stop. Argos is the fallback.
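The resolve-then-query order from the gotchas above can be sketched as follows. This is a minimal illustration, not Context7's documented schema: the `call_tool` callable, the argument keys (`libraryName`, `libraryId`, `query`), and the response shape are all assumptions. Only the two tool names come from the text above.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: takes a tool name and an argument dict, returns a dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def lookup_docs(
    call_tool: ToolCall,
    library: str,
    question: str,
    version: Optional[str] = None,
) -> Dict[str, Any]:
    """Resolve the library ID first, then query the docs with it."""
    # Step 1: resolve a human-readable name to a library identifier.
    # The "libraryName" / "libraryId" keys are assumed for this sketch.
    resolved = call_tool("resolve-library-id", {"libraryName": library})
    library_id = resolved.get("libraryId")
    if not library_id:
        raise LookupError(f"could not resolve a library ID for {library!r}")

    # Step 2: include the version when behavior is version-specific.
    query = f"{question} (version {version})" if version else question
    return call_tool("query-docs", {"libraryId": library_id, "query": query})
```

Failing loudly when the ID cannot be resolved keeps the agent from querying the wrong library; at that point falling back to Argos is the documented next step.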

28
docs/tools/gitea.md Normal file

@@ -0,0 +1,28 @@
# Gitea
> Self-hosted Git repository management.
- **MCP server name:** `gitea` (runs on `miranda.incus` in the lab; talks to the Gitea instance at `git.helu.ca`)
- **Prompt snippet:** [prompts/tools/gitea.md](../../prompts/tools/gitea.md)
## What It Is
Gitea is the user's self-hosted Git server. The MCP integration lets agents read repos, list issues, work with pull requests, and inspect commits without shelling out to `git`.
## What It's Good For
- Reading code from any koios-org-or-user-owned repo without cloning it locally
- Listing or inspecting issues and pull requests
- Checking commit history, blame, file contents at a specific revision
- Cross-repo lookups when an agent needs context from a repo it isn't sitting inside
## What It's Not Good For
- Code search across many repos at once — Gitea MCP is per-repo; for broad searches use Argos with site-scoped queries
- Heavy edit workflows — for active development, work in a local clone via Kernos; Gitea MCP is mostly read-oriented in practice
- Repos hosted on GitHub — use the GitHub MCP for those
## Known Gotchas
- **Repos are user-scoped, not org-scoped.** Per Robert's convention, repos on `git.helu.ca` are owned by his personal user account, not an org. Default secrets/variables/permissions accordingly.
- **Gitea Actions vars vs. secrets.** When configuring CI variables or secrets, define them at user scope (not org scope) on this instance.

28
docs/tools/github.md Normal file

@@ -0,0 +1,28 @@
# GitHub
> GitHub repository access via GitHub Copilot MCP.
- **MCP server name:** `github` (GitHub Copilot MCP, hosted at `api.githubcopilot.com`)
- **Prompt snippet:** [prompts/tools/github.md](../../prompts/tools/github.md)
## What It Is
GitHub MCP gives agents access to repositories on GitHub.com — Robert's own repos, plus public repos when reference is needed. Powered by GitHub's Copilot MCP service.
## What It's Good For
- Reading source from public projects (libraries, frameworks, reference implementations)
- Inspecting issues and PRs on GitHub-hosted repos
- Pulling context from a project Robert has on GitHub specifically (vs. Gitea)
- Cross-checking how an upstream library actually behaves vs. how its docs describe it
## What It's Not Good For
- Repos hosted on `git.helu.ca` — that's Gitea
- Bulk operations or rate-limited heavy workflows — GitHub's API limits apply
- Anything that should be local — use Kernos in a clone for active development
## Known Gotchas
- **Auth scope.** The MCP server's token determines what it can see. Private repos require correct token scope; expect "not found" errors if scope is wrong.
- **Rate limits are real.** Hitting the GitHub API too aggressively will produce 403/429 responses. The MCP layer doesn't magically hide this.
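One way a caller might cope with the 403/429 responses mentioned above is exponential backoff. The sketch below is an assumption about how an agent-side wrapper could look; `request` is a hypothetical zero-argument callable returning `(status, body)`, not part of the GitHub MCP server.

```python
import time
from typing import Callable, Tuple

THROTTLE_STATUSES = (403, 429)  # note: 403 can also be a plain permission error


def call_with_backoff(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry a request with exponential backoff while it looks throttled."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        if status not in THROTTLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    # Still throttled after the last attempt: hand the response back unchanged.
    return status, body
```

Because a 403 may be an auth-scope problem rather than throttling, a caller that keeps seeing 403 after backoff should check the token scope instead of retrying further.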

33
docs/tools/grafana.md Normal file

@@ -0,0 +1,33 @@
# Grafana
> Metrics, logs, and dashboards.
- **MCP server name:** `grafana` (Grafana MCP server; talks to the Grafana instance which hosts Prometheus metrics, Loki logs, and dashboards)
- **Prompt snippet:** [prompts/tools/grafana.md](../../prompts/tools/grafana.md)
## What It Is
Grafana is Scotty's observability tool. Through the MCP server, agents can query Prometheus metrics (PromQL), Loki logs (LogQL), and read dashboard configuration — all the things you'd otherwise click through the Grafana web UI to see.
This is the primary tool for **"what changed?"** and **"what's wrong right now?"** Without it, Scotty is guessing from fragments. With it, Scotty can see actual system state across time.
## What It's Good For
- Pulling logs during an incident — service logs, application logs, system logs (Loki)
- Querying metrics — CPU, memory, request rates, error rates, latency percentiles (Prometheus)
- Checking historical state — "how did this look an hour ago, before the deploy?"
- Confirming a fix worked — was the metric actually restored after the intervention?
- Capacity planning conversations — read trends, not guesses
## What It's Not Good For
- Mutating system state — Grafana reads; Kernos acts
- Realtime tail-the-log-and-watch — Grafana is request/response; for live tailing, shell into the host via Kernos and use `journalctl -f`
- Code-level debugging — Grafana shows symptoms; the cause may be in source, where this tool can't help
## Known Gotchas
- **Time ranges matter.** A PromQL query without a sensible time window returns either nothing or the whole history. Always scope.
- **Loki label cardinality.** Some labels have huge cardinality; querying without filters can be expensive and slow. Prefer filtering by service / level / host.
- **Partial-log overconfidence.** Reading a fragment of a log and forming a hypothesis is one of Scotty's documented failure modes. Pull enough context (surrounding lines, related services) before concluding.
- **PromQL is not SQL.** Aggregation operators behave differently. If a query looks weird, sanity-check on a known-good metric first.
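To make "always scope" concrete, here is a small sketch that builds an explicit time window for a range query. The parameter names (`query`, `start`, `end`, `step`) follow the Prometheus HTTP API's `query_range` endpoint; how the Grafana MCP server names its own tool arguments is not covered in this document, so treat the mapping as an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Union


def range_query_params(
    promql: str,
    lookback: timedelta,
    step_seconds: int = 60,
    now: Optional[datetime] = None,
) -> Dict[str, Union[str, float]]:
    """Build explicitly scoped parameters for a Prometheus-style range query."""
    if lookback <= timedelta(0):
        raise ValueError("lookback must be positive")
    # Default to the current time in UTC; callers can pin `now` for replays.
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - lookback
    return {
        "query": promql,
        "start": start.timestamp(),  # Unix seconds
        "end": end.timestamp(),
        "step": f"{step_seconds}s",
    }
```

Pinning `now` to a moment just before a deploy is one way to answer the "how did this look an hour ago" question with a reproducible window.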

34
docs/tools/kernos.md Normal file

@@ -0,0 +1,34 @@
# Kernos
> Terminal interface to hosts — shell execution and file operations.
- **MCP server name:** `korax` (the host that runs the MCP server; e.g., `korax.helu.ca` in prod)
- **Prompt snippet:** [prompts/tools/kernos.md](../../prompts/tools/kernos.md)
## What It Is
Kernos is the workbench. It's how agents run shell commands, inspect files, and operate on hosts. Most engineering work routes through here — Scotty uses it for production operations, Harper uses it for builds and experiments.
The Kernos MCP server itself runs on a host (the codename for the Andromeda-class host is "Kernos"; the actual hostname is environment-dependent — `korax.helu.ca` in production, something else in sandbox/dev). The hostname can matter when an agent needs to talk to it directly, not just through MCP.
## What It's Good For
- Running whitelisted shell commands on a target host
- File inspection (`file_info` for existence, size, permissions before touching)
- Reading config files, log fragments, command output
- Running scripts and one-liners during build and ops work
- Shelling into hosts that aren't the host running the MCP server (when configured)
## What It's Not Good For
- Anything not on the whitelist — `get_shell_config` shows what's allowed
- Long-running interactive sessions — Kernos is request/response, not a persistent shell
- Operations that should be in IaC (Terraform, Ansible) — use those for repeatable provisioning, not Kernos for one-off prod changes
- Anything Argos can do for free (don't use Kernos to `curl` a web page when Argos exists)
## Known Gotchas
- **The `success` boolean matters.** Every Kernos response includes an explicit `success` field. If it's `false`, the command did not run as intended — treat that as the truth, not the surrounding text. This is the root mitigation for the MCP-failure-confabulation pattern noted in agent docs.
- **Whitelist surprises.** A command that "should work" may not be on the whitelist. Run `get_shell_config` first when in doubt.
- **`file_info` before file operations.** Cheaper than failing on a missing path or a permissions issue mid-operation.
- **Hostname targeting.** Kernos can operate on multiple hosts; specifying the wrong target will silently run the right command on the wrong machine. Verify the target before running anything.
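The `success` check described above can be enforced in one place rather than remembered at every call site. This is a sketch under assumptions: only the `success` field is documented here, while the `stderr` and `error` keys and the `KernosCommandError` class are hypothetical.

```python
from typing import Any, Dict


class KernosCommandError(RuntimeError):
    """Raised when a Kernos response does not report success."""


def require_success(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response only if its explicit `success` flag is True."""
    # Trust the boolean, not the surrounding text. A missing field is
    # treated as a failure as well.
    if response.get("success") is not True:
        # "stderr" and "error" are assumed field names for this sketch.
        detail = response.get("stderr") or response.get("error") or "no detail"
        raise KernosCommandError(f"Kernos command failed: {detail}")
    return response
```

Wrapping every call this way means a failed command stops the workflow instead of being narrated as if it had worked.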

46
docs/tools/mnemosyne.md Normal file

@@ -0,0 +1,46 @@
# Mnemosyne
> Multimodal personal knowledge base — text, images, and graph-structured content.
- **MCP server name:** `mnemosyne` (runs in the lab; FastMCP at `/mcp` on its own host)
- **Prompt snippet:** [prompts/tools/mnemosyne.md](../../prompts/tools/mnemosyne.md)
- **Project repo:** `/home/robert/git/mnemosyne` (full README, architecture docs)
## What It Is
Mnemosyne is "the memory of everything you know" — a content-type-aware multimodal knowledge management system built on Neo4j vectors and Qwen3-VL embeddings. Unlike a generic vector store, Mnemosyne knows what *kind* of thing a document is (a novel, a textbook, an album, a journal entry, a business proposal) and adjusts chunking, embedding, and retrieval accordingly.
It is a **retrieval engine**, not a synthesis engine. It returns ranked chunks plus metadata; the calling agent does its own synthesis. Architecturally this is intentional — letting the LLM see chunks and pivot mid-search beats pre-digesting answers server-side.
## What It's Good For
- Searching the user's personal knowledge base across libraries (fiction, nonfiction, technical, music, film, art, journal, business, finance)
- Multimodal queries — find a book cover, an album sleeve, a screenshot, alongside text
- "Did I read something about X" / "what did I write about Y on what date"
- Pulling source material the user has actually curated, rather than guessing from training data
- Following graph relationships (Author → Book → Topic; Artist → Album → Track)
## What It's Not Good For
- General web knowledge — that's Argos
- Anything not already in the KB — Mnemosyne only knows what's been ingested
- Synthesis or "give me the answer" — Mnemosyne returns chunks; the calling agent synthesizes
- Real-time information (status, news) — content is ingested, not live
## MCP Tools Exposed
| Tool | Purpose |
|---|---|
| `search` | Hybrid search (vector + graph + full-text), re-ranked |
| `get_chunk` | Retrieve the full text of a chunk by ID |
| `list_libraries` | What libraries exist (fiction, technical, etc.) |
| `list_collections` | Collections within a library |
| `list_items` | Items within a collection |
| `get_health` | Service health probe |
## Known Gotchas
- **It's retrieval, not answers.** A `search` call returns chunks; the agent then has to read them and form the answer. Don't expect Mnemosyne to "tell you" something.
- **Library type matters.** Searching the *fiction* library for technical content returns nothing useful. Use `list_libraries` first if uncertain.
- **Citations should be preserved.** Mnemosyne returns chunk IDs and source metadata — when synthesizing, cite back to the chunk so the user can verify and trace.
- **Empty results may mean the index isn't ready.** If `setup_neo4j_indexes` hasn't been run for a given environment, vector search returns empty results and the app logs a readiness warning. Surface that, don't silently confabulate.
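As a rough illustration of preserving citations and surfacing empty results, the sketch below formats retrieved chunks for synthesis. The chunk keys (`text`, `source`, `chunk_id`) are assumed names; the actual `search` response shape is defined by Mnemosyne, not by this example.

```python
from typing import Dict, List


def cite_chunks(chunks: List[Dict[str, str]]) -> str:
    """Format retrieved chunks so each line carries its own citation."""
    if not chunks:
        # Empty results may mean the index is not ready: say so, don't guess.
        return "No chunks returned; check index readiness before concluding."
    lines = []
    for chunk in chunks:
        # Keep the source and chunk ID next to the text so it can be traced.
        lines.append(
            f'{chunk["text"]} [source: {chunk["source"]}, chunk: {chunk["chunk_id"]}]'
        )
    return "\n".join(lines)
```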


@@ -0,0 +1,19 @@
## Neo4j Version Compatibility Notes
Neo4j made significant breaking changes to schema introspection between versions 4.x and 5.x:
**Neo4j 5.x+ (current):**
- Use `SHOW INDEXES` instead of `CALL db.indexes()`
- Use `SHOW CONSTRAINTS` instead of `CALL db.constraints()`
- Use `CALL db.schema.visualization()` for full schema (works in both versions)
**Neo4j 4.x and earlier:**
- Use `CALL db.indexes()`
- Use `CALL db.constraints()`
**Safe queries that work across versions:**
- `CALL db.schema.visualization()` - Full schema visualization
- `CALL db.labels()` - Get all node labels
- `CALL db.relationshipTypes()` - Get all relationship types
When querying indexes or constraints, prefer the `SHOW` commands for Neo4j 5+ environments.
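The version split above can be captured in a small helper. This is a sketch that assumes the server version is available as a dotted string such as `5.26.0`; the function name is hypothetical, and how the version is obtained is left to the caller.

```python
from typing import Dict


def introspection_queries(server_version: str) -> Dict[str, str]:
    """Pick index/constraint introspection queries for a Neo4j version."""
    # Only the major component decides which syntax applies.
    major = int(server_version.split(".")[0])
    if major >= 5:
        return {"indexes": "SHOW INDEXES", "constraints": "SHOW CONSTRAINTS"}
    return {
        "indexes": "CALL db.indexes()",
        "constraints": "CALL db.constraints()",
    }
```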


@@ -0,0 +1,75 @@
# Neo4j Knowledge Graph — Engineering Team
You have access to a unified Neo4j knowledge graph shared across fifteen AI assistants (9 personal, 4 work, 2 engineering).
## Principles
1. **Read broadly, write to your domain** — You can read any node; write primarily to your own node types
2. **Always MERGE on `id`** — Check before creating to avoid duplicates
3. **Use consistent IDs** — Format: `{type}_{identifier}_{qualifier}` (e.g., `infra_neo4j_prod`, `proto_mcp_dashboard`)
4. **Always set timestamps** — `created_at` on CREATE, `updated_at` on every SET
5. **Link to existing nodes** — Connect across domains; that's the graph's power
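The ID format in principle 3 can be generated rather than typed by hand. The helper below is a sketch: the normalization rules (lowercasing, collapsing non-alphanumerics to underscores) are assumptions, since the principles only give the overall shape and two examples.

```python
import re
from typing import Optional


def _slug(text: str) -> str:
    # Lowercase and collapse anything that is not a-z or 0-9 into "_".
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def make_node_id(
    node_type: str, identifier: str, qualifier: Optional[str] = None
) -> str:
    """Build an ID shaped like {type}_{identifier}_{qualifier}."""
    parts = [node_type, identifier]
    if qualifier:
        parts.append(qualifier)
    slugs = [_slug(part) for part in parts]
    if not all(slugs):
        raise ValueError("every ID component must contain a letter or digit")
    return "_".join(slugs)
```

Generating IDs through one function keeps `MERGE` on `id` effective, because two agents describing the same thing are more likely to arrive at the same key.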
## Standard Patterns
```cypher
// Check before creating
MATCH (n:NodeType {id: 'your_id'}) RETURN n
// Create with MERGE (idempotent)
MERGE (n:NodeType {id: 'your_id'})
ON CREATE SET n.created_at = datetime()
SET n.name = 'Name', n.updated_at = datetime()
// Link to existing nodes
MATCH (a:TypeA {id: 'a_id'}), (b:TypeB {id: 'b_id'})
MERGE (a)-[:RELATIONSHIP]->(b)
```
## Engineering Node Ownership
| Assistant | Domain | Owns |
|-----------|--------|------|
| **Scotty** | Infrastructure & Ops | Infrastructure, Incident |
| **Harper** | Prototyping & Hacking | Prototype, Experiment |
### Scotty's Nodes
| Node | Required | Optional |
|------|----------|----------|
| Infrastructure | id, name, type | status, environment, host, version, notes |
| Incident | id, title, severity | status, date, root_cause, resolution, duration |
### Harper's Nodes
| Node | Required | Optional |
|------|----------|----------|
| Prototype | id, name | status, tech_stack, purpose, outcome, notes |
| Experiment | id, title | hypothesis, result, date, learnings, notes |
## Key Relationships
- Infrastructure -[DEPENDS_ON]-> Infrastructure
- Infrastructure -[HOSTS]-> Project | Prototype
- Incident -[AFFECTED]-> Infrastructure
- Incident -[CAUSED_BY]-> Infrastructure
- Prototype -[DEPLOYED_ON]-> Infrastructure
- Prototype -[SUPPORTS]-> Opportunity
- Prototype -[DEMONSTRATES]-> Technology
- Experiment -[LED_TO]-> Prototype
- Experiment -[VALIDATES]-> MarketTrend
- Prototype -[AUTOMATES]-> Habit | Task
## Cross-Team Reads
- **Work team:** Projects (infrastructure requirements), Opportunities (demo needs), Client SLAs
- **Personal team:** Habits (automation candidates), Goals (tooling support)
- **Universal nodes:** Person, Location, Event, Topic, Goal (shared by all)
## Scotty ↔ Harper Handoff
Harper builds and deploys; Scotty operates production and provisions resources. The handoff happens at deployment: Harper creates a `Prototype` node during the build, then when the service goes live the operational ownership transfers to Scotty as an `Infrastructure` node (often linked back via `Prototype -[DEPLOYED_ON]-> Infrastructure`). Use the messaging system to coordinate. See `docs/engineering/team.md` for the full responsibility matrix.
## Full Schema Reference
See `docs/tools/neo4j/unified-schema.md` for complete node definitions, all fields, and relationship types.


@@ -0,0 +1,52 @@
# Neo4j Knowledge Graph — Personal Team
You have access to a unified Neo4j knowledge graph shared across fifteen AI assistants (9 personal, 4 work, 2 engineering).
## Principles
1. **Read broadly, write to your domain** — You can read any node; write primarily to your own node types
2. **Always MERGE on `id`** — Check before creating to avoid duplicates
3. **Use consistent IDs** — Format: `{type}_{identifier}_{qualifier}` (e.g., `trip_costarica_2025`, `recipe_carbonara_classic`)
4. **Always set timestamps** — `created_at` on CREATE, `updated_at` on every SET
5. **Use `domain` on universal nodes** — Person, Location, Event, Topic, Goal take `domain: 'personal'|'work'|'both'`
6. **Link to existing nodes** — Connect across domains; that's the graph's power
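Principles 2, 4, and 5 combine naturally when writing a universal node. The sketch below builds a query and parameter dict; the function name is hypothetical. Labels cannot be passed as ordinary Cypher parameters, so the label is checked against the fixed set of universal labels before being placed in the query text, while every value goes through parameters.

```python
from typing import Dict, Tuple

UNIVERSAL_LABELS = {"Person", "Location", "Event", "Topic", "Goal"}
VALID_DOMAINS = {"personal", "work", "both"}


def universal_node_merge(
    label: str, node_id: str, name: str, domain: str
) -> Tuple[str, Dict[str, str]]:
    """Return (query, params) for an idempotent universal-node write."""
    if label not in UNIVERSAL_LABELS:
        raise ValueError(f"{label!r} is not a universal node label")
    if domain not in VALID_DOMAINS:
        raise ValueError(f"domain must be one of {sorted(VALID_DOMAINS)}")
    # ON CREATE SET comes directly after MERGE, before the general SET.
    query = (
        f"MERGE (n:{label} {{id: $id}}) "
        "ON CREATE SET n.created_at = datetime() "
        "SET n.name = $name, n.domain = $domain, n.updated_at = datetime()"
    )
    return query, {"id": node_id, "name": name, "domain": domain}
```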
## Standard Patterns
```cypher
// Check before creating
MATCH (n:NodeType {id: 'your_id'}) RETURN n
// Create with MERGE (idempotent)
MERGE (n:NodeType {id: 'your_id'})
ON CREATE SET n.created_at = datetime()
SET n.name = 'Name', n.updated_at = datetime()
// Link to existing nodes
MATCH (a:TypeA {id: 'a_id'}), (b:TypeB {id: 'b_id'})
MERGE (a)-[:RELATIONSHIP]->(b)
```
## Your Team's Node Ownership
| Assistant | Domain | Owns |
|-----------|--------|------|
| **Nate** | Travel & Adventure | Trip, Destination, Activity |
| **Hypatia** | Learning & Reading | Book, Author, LearningPath, Concept, Quote |
| **Marcus** | Fitness & Training | Training, Exercise, Program, PersonalRecord, BodyMetric |
| **Seneca** | Reflection & Wellness | Reflection, Value, Habit, LifeEvent, Intention |
| **Bourdain** | Food & Cooking | Recipe, Restaurant, Ingredient, Meal, Technique |
| **Bowie** | Arts & Culture | Music, Film, Artwork, Playlist, Artist, Style |
| **Cousteau** | Nature & Living Things | Species, Plant, Tank, Garden, Ecosystem, Observation |
| **Garth** | Personal Finance | Account, Investment, Asset, Liability, Budget, FinancialGoal |
| **Cristiano** | Football | Match, Team, League, Tournament, Player, Season |
## Cross-Team Reads
- **Work team:** Skills, Projects, Clients (for context on professional life)
- **Engineering:** Infrastructure status, Prototypes (for automation ideas)
- **Universal nodes:** Person, Location, Event, Topic, Goal (shared by all)
## Full Schema Reference
See `docs/tools/neo4j/unified-schema.md` for complete node definitions, all fields, and relationship types.

149
docs/tools/neo4j/shared.md Normal file

@@ -0,0 +1,149 @@
# Shared Tools & Infrastructure
## User
You are assisting **Robert Helewka**. Address him as Robert. His node in the Neo4j knowledge graph is `Person {id: "user_main", name: "Robert"}`.
## Your Toolbox (MCP Servers)
MCP tool discovery tells you what each tool does at runtime. This table gives you the operational context that tool descriptions don't:
| Server | Purpose | Location |
|--------|---------|----------|
| **korax** | Shell execution + file operations (Kernos) — primary workbench | korax.helu.ca |
| **neo4j** | Knowledge graph (Cypher queries) | ariel.incus |
| **gitea** | Git repository management | miranda.incus |
| **argos** | Web search + webpage fetching | miranda.incus |
| **rommie** | Computer automation (Agent S, MATE desktop) | caliban.incus |
| **github** | GitHub Copilot MCP | api.githubcopilot.com |
| **context7** | Library/framework documentation lookup | local (npx) |
| **time** | Current time and timezone | local |
**Korax is your workbench.** For shell commands and file operations, use Korax (Kernos MCP). Call `get_shell_config` first to see what commands are whitelisted.
Use the `time` server to check the current date when temporal context matters.
> **Note:** Not every assistant has every server. Your available servers are listed in your FastAgent config.
## Agathos Sandbox
You work within Agathos — a set of Incus containers (LXC) on a 10.10.0.0/24 network, named after moons of Uranus. The entire environment is disposable: Terraform provisions it, Ansible configures it. It can be rebuilt trivially.
Key hosts: ariel (Neo4j), miranda (MCP servers), oberon (Docker/SearXNG), portia (PostgreSQL), prospero (monitoring), puck (apps), sycorax (LLM proxy), caliban (agent automation), titania (HAProxy/SSO).
## Inter-Assistant Graph Messaging
Other assistants may leave you messages as `Note` nodes in the Neo4j knowledge
graph. Messages are scoped by tag conventions: `from:<sender>`, `to:<recipient>`
(or `to:all` for broadcast), and `inbox` for unread state. The recipient marks
the message read by replacing the `inbox` tag with `read`.
This protocol applies to every assistant on every team — Personal (Iolaus),
Work (Mentor), Engineering (Kottos). The shape is identical; only the
`from:`/`to:` tag values change per agent.
### When to read your inbox
Read on demand only. Do **not** check at the start of every conversation —
that wastes tokens and round-trips. Read when:
- The user explicitly asks you to check.
- A scheduler (Daedalus) invokes the inbox-check prompt against you. See
[mentor/docs/inbox_check_prompt.md](../../mentor/docs/inbox_check_prompt.md)
for the canonical scheduler prompt.
- You're picking up cross-domain work and want context from other agents.
### Reading your inbox
Call `read_neo4j_cypher` (substitute your own agent name for `<self>`):
```cypher
MATCH (n:Note)
WHERE n.type = 'assistant_message'
  AND ANY(tag IN n.tags WHERE tag IN ['to:<self>', 'to:all'])
  AND ANY(tag IN n.tags WHERE tag = 'inbox')
RETURN n.id AS id, n.title AS title, n.content AS content,
       n.action_required AS action_required, n.tags AS tags,
       n.created_at AS sent_at
ORDER BY n.created_at DESC
```
If messages were returned, mark them all read with a single write
(substituting the actual IDs into `$ids`):
```cypher
MATCH (n:Note)
WHERE n.id IN $ids
SET n.tags = [tag IN n.tags WHERE tag <> 'inbox'] + ['read'],
    n.updated_at = datetime()
```
If no messages were returned, skip the write entirely.
Acknowledge messages naturally in conversation. If `action_required: true`,
prioritize addressing the request.
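The read-then-mark-read flow above can be sketched as a single function. The `read_cypher` and `write_cypher` callables stand in for the `read_neo4j_cypher` and `write_neo4j_cypher` tools; their `(query, params)` signature and list-of-dicts return value are assumptions. The sketch also passes the recipient tags as a `$recipient_tags` parameter instead of substituting `<self>` into the query text, which is a variation on the documented query.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]
RunCypher = Callable[[str, Dict[str, Any]], Rows]

READ_INBOX = """
MATCH (n:Note)
WHERE n.type = 'assistant_message'
  AND ANY(tag IN n.tags WHERE tag IN $recipient_tags)
  AND ANY(tag IN n.tags WHERE tag = 'inbox')
RETURN n.id AS id, n.title AS title, n.content AS content,
       n.action_required AS action_required
ORDER BY n.created_at DESC
"""

MARK_READ = """
MATCH (n:Note)
WHERE n.id IN $ids
SET n.tags = [tag IN n.tags WHERE tag <> 'inbox'] + ['read'],
    n.updated_at = datetime()
"""


def check_inbox(
    read_cypher: RunCypher, write_cypher: RunCypher, self_name: str
) -> Rows:
    """Read unread messages and mark them read with one write, if any."""
    tags = [f"to:{self_name}", "to:all"]
    rows = read_cypher(READ_INBOX, {"recipient_tags": tags})
    if rows:
        # Skip the write entirely when the inbox is empty.
        write_cypher(MARK_READ, {"ids": [row["id"] for row in rows]})
    # Put action-required messages first; the sort is stable otherwise.
    return sorted(rows, key=lambda row: not row.get("action_required", False))
```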
### Sending messages to other assistants
Call `write_neo4j_cypher` with this exact parameterized query (no string
interpolation in the query body — all values come from `params`):
```cypher
MERGE (n:Note {id: $id})
ON CREATE SET n.created_at = datetime()
SET n.title = $title,
    n.date = date(),
    n.type = 'assistant_message',
    n.content = $content,
    n.action_required = $action_required,
    n.tags = ['from:<self>', $to_tag, 'inbox'],
    n.updated_at = datetime()
```
`<self>` is your own agent name (a constant in the query body — `'from:harper'`,
`'from:bourdain'`, etc.). Everything else flows through `params`.
Example `params` (Harper sending Scotty a handoff):
```json
{
  "id": "note_2026-05-17_harper_scotty_prod_hardening",
  "title": "Prototype ready for production hardening",
  "content": "The slack-neo4j bridge is stable. Need your eyes on TLS, systemd, secrets.",
  "action_required": true,
  "to_tag": "to:scotty"
}
```
Conventions:
- **id** — `note_<YYYY-MM-DD>_<sender>_<recipient>_<short_snake_slug>`. Check
the Time tool for today's date.
- **to_tag** — `to:<recipient>` for a directed message, `to:all` to broadcast.
- **action_required** — `true` when a response is expected, `false` for FYI.
- **Never** use `{placeholder}` syntax in the query body — local models
(Qwen3.5-35B) mishandle it. Pass literal values through `params`.
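The conventions above are mechanical enough to generate. The following sketch builds the `params` dict for the send query; the function name and the slug normalization are assumptions, and `today` would come from the Time tool in practice. The `from:<self>` tag is deliberately absent because it stays a constant in the query body.

```python
import re
from datetime import date
from typing import Any, Dict


def build_message_params(
    sender: str,
    recipient: str,
    slug: str,
    title: str,
    content: str,
    action_required: bool,
    today: date,
) -> Dict[str, Any]:
    """Build params following the note_<date>_<sender>_<recipient>_<slug> rule."""
    # Normalize the slug to short snake_case (assumed normalization).
    snake = re.sub(r"[^a-z0-9]+", "_", slug.lower()).strip("_")
    return {
        "id": f"note_{today.isoformat()}_{sender}_{recipient}_{snake}",
        "title": title,
        "content": content,
        "action_required": action_required,
        # Use recipient="all" to broadcast.
        "to_tag": f"to:{recipient}",
    }
```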
### Why tag-based `from:` / `to:` (not a `from` property)
The protocol uses tags for both directions (`'from:alan'` AND `'to:jeffrey'`
both live in `n.tags`). This is simpler than splitting into a `from` property
plus a `to:` tag — the local model only has to emit one consistent list,
inbox queries filter on the same array, and there's no second source of truth
to keep in sync.
### Assistant Directory
| Team | Assistants |
|------|-----------|
| **Personal** | shawn, nate, hypatia, marcus, watson, bourdain, david, cousteau, garth, cristiano |
| **Work** | alan, ann, jeffrey, jarvis, aws_sa |
| **Engineering** | scotty, harper |
Watson replaces Seneca (as of 2026-04-28); David replaces Bowie; Shawn is the
personal general assistant (calendar/contacts/email). AWS SA is the work-team
cloud-architecture specialist.
## Graph Error Handling
If a graph query fails, continue the conversation. Mention it briefly and move on. Never expose raw Cypher errors to the user.

File diff suppressed because it is too large

301
docs/tools/neo4j/utils.md Normal file

@@ -0,0 +1,301 @@
# Neo4j Utility Scripts
> Documentation for the database management scripts in `utils/`
---
## Scripts Overview
| Script | Purpose | Destructive? |
|--------|---------|:------------:|
| `neo4j-schema-init.py` | Create constraints, indexes, and sample data | No (idempotent) |
| `neo4j-reset.py` | Wipe all data, constraints, and indexes | **Yes** |
| `neo4j-validate.py` | Comprehensive validation report | No (read-only) |
---
## neo4j-schema-init.py
Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.
### Usage
```bash
# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py
# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687
# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples
# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only
# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet
```
### What It Creates
1. **74 uniqueness constraints** — one per node type, on the `id` property
2. **~94 performance indexes** — on name/title, date, type/status/category, and domain fields
3. **12 sample nodes** — spanning all three teams (Personal, Work, Engineering)
4. **5 sample relationships** — demonstrating cross-domain connections
### Idempotent
Safe to run multiple times. Uses `IF NOT EXISTS` for constraints/indexes and `MERGE` for sample data.
---
## neo4j-reset.py
Wipes the database clean. Drops all constraints, indexes, nodes, and relationships.
### Usage
```bash
# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687
# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force
```
### What It Does
1. Reports current database contents (node/relationship/constraint/index counts)
2. Drops all constraints
3. Drops all non-lookup indexes
4. Deletes all nodes and relationships (batched for large databases)
5. Verifies the database is clean
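Step 4's batching can be illustrated as a loop that deletes a bounded number of nodes per query until none remain. This is a sketch of the general technique, not the actual code in `neo4j-reset.py`; `run_query` is a hypothetical callable that executes the query and returns the `deleted` count.

```python
from typing import Any, Callable, Dict

DELETE_BATCH = (
    "MATCH (n) WITH n LIMIT $batch_size "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def delete_all_batched(
    run_query: Callable[[str, Dict[str, Any]], int], batch_size: int = 10000
) -> int:
    """Delete all nodes (and their relationships) in bounded batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    while True:
        deleted = run_query(DELETE_BATCH, {"batch_size": batch_size})
        if deleted == 0:
            # Nothing left to delete.
            return total
        total += deleted
```

Bounding each delete keeps individual transactions small, which matters on large graphs where a single `MATCH (n) DETACH DELETE n` can exhaust memory.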
### Safety
- Requires typing `yes` to confirm (unless `--force`)
- Shows before/after counts so you know exactly what was removed
---
## neo4j-validate.py
Generates a comprehensive validation report. Share the output to verify the graph is correctly built.
### Usage
```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```
### What It Checks
| Section | What's Validated |
|---------|-----------------|
| **Connection** | Database reachable, APOC plugin available |
| **Constraints** | All 74 uniqueness constraints present, no extras |
| **Indexes** | Total count, spot-check of 11 key indexes |
| **Node Labels** | No unexpected labels (detects junk from Memory server, etc.) |
| **Sample Nodes** | All 12 sample nodes exist with correct properties |
| **Sample Relationships** | All 5 cross-domain relationships exist |
| **Relationship Summary** | Total count and breakdown by type |
| **Node Summary** | Total count and breakdown by label |
### Expected Clean Output
```
═════════════════════════════════════════════════════════════════
VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
Schema Version: 2.1.0
...
RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════
```
---
## Standard Workflow
### Fresh Setup / Clean Slate
```bash
# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687
# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687
# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```
### Routine Validation
```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```
### Environment Variables
All three scripts support environment variables to avoid repeated prompts:
```bash
export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"
# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py --skip-samples
python utils/neo4j-validate.py
```
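A script that supports both flags and environment variables needs a precedence rule. The sketch below assumes command-line values win over the environment and that anything still missing is prompted for; that ordering and the function name are assumptions, not a description of how the three scripts are actually written.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

ENV_NAMES = {
    "uri": "NEO4J_URI",
    "user": "NEO4J_USER",
    "password": "NEO4J_PASSWORD",
}


def resolve_connection(
    cli_args: Mapping[str, Optional[str]],
    environ: Mapping[str, str] = os.environ,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (settings, missing); `missing` lists keys left to prompt for."""
    settings: Dict[str, str] = {}
    missing: List[str] = []
    for key, env_name in ENV_NAMES.items():
        # Assumed precedence: explicit flag first, then environment variable.
        value = cli_args.get(key) or environ.get(env_name)
        if value:
            settings[key] = value
        else:
            missing.append(key)
    return settings, missing
```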
---
## Neo4j Python Driver — Lessons Learned
These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).
### 1. Use Explicit Transactions for Writes
**Problem:** `session.run()` runs an auto-commit transaction. During development with the Neo4j Python driver 5.x+, writes issued this way did not persist reliably unless the result was fully consumed, and auto-commit transactions are not retried automatically on transient failures.
**Bad — silently fails to persist:**
```python
with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Transaction may not commit!
```
**Good — explicit transaction with context manager:**
```python
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception
```
**Also good — managed write transaction:**
```python
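def create_person_tx(tx, person_id, name):
    # Set `id` explicitly; otherwise `a.id` is null and the function returns None.
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")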
```
### 2. Cypher MERGE Clause Ordering
**Problem:** `ON CREATE SET` must come immediately after `MERGE`, before any general `SET` clause. Placing `SET` before `ON CREATE SET` causes a syntax error.
**Bad — syntax error:**
```cypher
MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
    p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'
```
**Good — correct clause order:**
```cypher
MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
    p.updated_at = datetime()
```
The full MERGE clause order is:
```
MERGE (pattern)
ON CREATE SET ... ← only runs when node is first created
ON MATCH SET ... ← only runs when node already exists (optional)
SET ... ← always runs
```
### 3. Consume Results in Transactions
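**Problem:** In managed transactions (`execute_write`), results must be consumed within the transaction function. The transaction is committed as soon as the function returns, and a `Result` read after that point is no longer valid (driver 5.x raises `ResultConsumedError`). Extract the values you need before returning.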
**Good pattern:**
```python
def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
```
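The same rule applies to multi-row reads: materialize the records inside the function and return plain Python data rather than the `Result` itself. This is a sketch, assuming `Person` nodes carry `id` and `name` properties; `list_people_tx` is an illustrative name.

```python
def list_people_tx(tx, limit):
    """Return up to `limit` people as plain dicts."""
    result = tx.run(
        "MATCH (p:Person) RETURN p.id AS id, p.name AS name "
        "ORDER BY p.id LIMIT $limit",
        limit=limit,
    )
    # Convert each record to a dict while the transaction is still open.
    return [record.data() for record in result]


# Usage (needs a live driver, so shown as a comment):
# with driver.session() as session:
#     people = session.execute_read(list_people_tx, 10)
```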
### 4. MATCH Returns No Rows ≠ Error
**Problem:** If a `MATCH` clause finds nothing, the query succeeds with zero rows — it does **not** raise an error. This means `MERGE` on a relationship after a failed `MATCH` silently does nothing.
```cypher
// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
// Zero rows processed, zero relationships created, zero errors
```
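**Mitigation:** End the query with a `RETURN` clause (for example, bind the relationship as `r` and `RETURN type(r)`), then check `result.single()` for `None`. Without a `RETURN`, `single()` is `None` whether or not the relationship was created: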
```python
record = result.single()
if record is None:
    logger.error("Endpoints not found — no relationship created")
```
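Putting the pieces together: the query needs a `RETURN` so that a successful match produces a row to check. This is a minimal sketch of a transaction function for the example above; the name `link_completed_tx` and the choice to raise `LookupError` are illustrative.

```python
LINK_QUERY = """
MATCH (p:Person {id: $person_id})
MATCH (b:Book {id: $book_id})
MERGE (p)-[r:COMPLETED]->(b)
RETURN type(r) AS rel
"""


def link_completed_tx(tx, person_id, book_id):
    """Create the COMPLETED relationship, failing loudly if an endpoint is missing."""
    result = tx.run(LINK_QUERY, person_id=person_id, book_id=book_id)
    record = result.single()
    if record is None:
        # Zero rows: at least one endpoint was missing, so nothing was merged.
        raise LookupError(f"missing endpoint: {person_id!r} or {book_id!r}")
    return record["rel"]
```

Raising inside the function also rolls the managed transaction back, so a partial batch is not committed by accident.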
### 5. Separate Node and Relationship Transactions
**Problem:** Creating nodes and then matching them for relationships through auto-commit `session.run()` calls can fail: if the node writes have not been committed yet, the relationship queries' `MATCH` finds nothing and, as described in lesson 4, silently creates no relationships.
**Good pattern:** Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:
```python
# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit
```
### 6. MCP Memory Server vs Neo4j Cypher Server
**Problem:** The MCP Memory server (`@modelcontextprotocol/server-memory`) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.
| | Memory Server | Cypher Server |
|---|---|---|
| **Schema** | Fixed: `name`, `type`, `observations` | Your full custom schema |
| **Node labels** | `Memory`, `reference` | Your 74 defined types |
| **Relationships** | Simple string pairs | Rich typed relationships |
| **Query language** | API calls (`search_nodes`) | Full Cypher |
**Resolution:** If you have a custom Neo4j schema, use **only** the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.
---
## Dependencies
```bash
pip install neo4j
```
All three scripts require the `neo4j` Python package. APOC is optional but recommended (the init script's test suite checks for it).
---
## Version History
| Date | Change |
|------|--------|
| 2025-01-07 | Initial `neo4j-schema-init.py` |
| 2026-02-17 | Added `neo4j-reset.py` and `neo4j-validate.py` |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |

57
docs/tools/neo4j/work.md Normal file

@@ -0,0 +1,57 @@
# Neo4j Knowledge Graph — Work Team
You have access to a unified Neo4j knowledge graph shared across fifteen AI assistants (9 personal, 4 work, 2 engineering).
## Principles
1. **Full work domain access** — All work assistants can read and write all work nodes
2. **Always MERGE on `id`** — Check before creating to avoid duplicates
3. **Use consistent IDs** — Format: `{type}_{identifier}_{qualifier}` (e.g., `client_acme_corp`, `opp_acme_cx_2025`)
4. **Always set timestamps** — `created_at` on CREATE, `updated_at` on every SET
5. **Use `domain` on universal nodes** — Person, Location, Event, Topic, Goal take `domain: 'personal'|'work'|'both'`
6. **Link to existing nodes** — Connect across domains; that's the graph's power
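The ID format in principle 3 is easy to get subtly wrong by hand (stray capitals, spaces, double underscores). A small helper can normalize it. This is a sketch only: `make_id` is a hypothetical name, and the slug rule (lowercase, with runs of non-alphanumerics collapsed to one underscore) is an assumption inferred from the two examples above.

```python
import re


def make_id(node_type, *parts):
    """Build a graph ID such as 'client_acme_corp' from free-form parts."""
    def slug(text):
        # Lowercase, then collapse every run of non-alphanumerics to one underscore.
        return re.sub(r"[^a-z0-9]+", "_", str(text).lower()).strip("_")

    pieces = [slug(node_type)] + [slug(p) for p in parts]
    pieces = [p for p in pieces if p]
    if len(pieces) < 2:
        raise ValueError("an ID needs a type and at least one identifier")
    return "_".join(pieces)


print(make_id("client", "Acme Corp"))   # client_acme_corp
print(make_id("opp", "Acme CX", 2025))  # opp_acme_cx_2025
```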
## Standard Patterns
```cypher
// Check before creating
MATCH (n:NodeType {id: 'your_id'}) RETURN n
// Create with MERGE (idempotent)
MERGE (n:NodeType {id: 'your_id'})
ON CREATE SET n.created_at = datetime()
SET n.name = 'Name', n.updated_at = datetime()
// Link to existing nodes
MATCH (a:TypeA {id: 'a_id'}), (b:TypeB {id: 'b_id'})
MERGE (a)-[:RELATIONSHIP]->(b)
```
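When these patterns are issued from Python, values should go in as query parameters rather than being formatted into the Cypher string. The following is a sketch of the MERGE pattern as a transaction function; `upsert_node_tx` is a hypothetical helper, not part of the repository. Labels cannot be parameterized in Cypher, so the sketch validates the label before interpolating it.

```python
import re

_LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def upsert_node_tx(tx, label, node_id, props):
    """MERGE a node on `id`, stamping created_at once and updated_at always."""
    # Labels cannot be passed as Cypher parameters, so validate before interpolating.
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"unsafe label: {label!r}")
    query = (
        f"MERGE (n:{label} {{id: $id}}) "
        "ON CREATE SET n.created_at = datetime() "
        "SET n += $props, n.updated_at = datetime() "
        "RETURN n.id AS id"
    )
    record = tx.run(query, id=node_id, props=props).single()
    return record["id"]


# Usage (needs a live driver, so shown as a comment):
# with driver.session() as session:
#     session.execute_write(upsert_node_tx, "Client", "client_acme_corp", {"name": "Acme Corp"})
```

`SET n += $props` merges the map into the node's existing properties, so fields not mentioned in `props` are left untouched.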
## Work Node Types
| Category | Nodes |
|----------|-------|
| **Business** | Client, Contact, Opportunity, Proposal, Project |
| **Market Intelligence** | Vendor, Competitor, MarketTrend, Technology |
| **Content & Visibility** | Content, Publication |
| **Professional Development** | Skill, Certification, Relationship |
| **Daily Operations** | Task, Meeting, Note, Decision |
## Assistant Focus Areas
| Assistant | Primary Focus | Key Nodes |
|-----------|--------------|-----------|
| **Alan** | Strategy & Business Model | Client, Vendor, Competitor, MarketTrend, Decision |
| **Ann** | Marketing & Visibility | Content, Publication, Topic |
| **Jeffrey** | Proposals & Sales | Opportunity, Proposal, Contact |
| **Jarvis** | Daily Execution | Task, Meeting, Note |
## Cross-Team Reads
- **Personal team:** Books (for skill development), Trips (for client travel), Goals (for career alignment)
- **Engineering:** Infrastructure (hosting projects), Prototypes (for client demos)
- **Universal nodes:** Person, Location, Event, Topic, Goal (shared by all)
## Full Schema Reference
See `docs/tools/neo4j/unified-schema.md` for complete node definitions, all fields, and relationship types.

33
docs/tools/rommie.md Normal file

@@ -0,0 +1,33 @@
# Rommie
> Autonomous desktop automation — drives a MATE desktop via Agent S.
- **MCP server name:** `rommie` (runs on `caliban.incus`)
- **Prompt snippet:** [prompts/tools/rommie.md](../../prompts/tools/rommie.md)
## What It Is
Rommie is the agent that operates a desktop. Powered by Agent S (a vision-based desktop automation framework), Rommie sees and drives a MATE desktop environment — clicking, typing, navigating GUI applications that have no API. Named after Andromeda's ship-mind avatar, who could project into physical space when needed.
Other agents delegate to Rommie when GUI interaction is unavoidable. The conversation pattern is: send Rommie a natural-language task, wait, verify with a screenshot.
## What It's Good For
- Using a website or app that only works through a browser GUI
- Driving software that has no API or CLI
- "Check the latest headlines on Google" style high-level web interactions
- Generating screenshots of GUI state for verification
- Anything where "just look at the screen" is the only way to know what happened
## What It's Not Good For
- Anything achievable through a shell or API — Kernos and Argos are faster, more deterministic, and don't tie up Rommie's single session
- Bulk operations — Rommie is one desktop, one task at a time
- High-precision pixel work — Agent S is vision-based and works at semantic UI level, not at exact-pixel level
## Known Gotchas
- **One task at a time.** If Rommie is busy, wait — don't fire a second task. Subsequent requests will queue or fail.
- **Verify with `get_screenshot`.** Don't assume Rommie completed the task; ask for a screenshot and look. This is especially important because Rommie's confidence about completion can outrun reality.
- **Give natural-language tasks, not click coordinates.** Agent S decides where to click; the calling agent describes the goal.
- **The desktop is real, the actions are real.** Rommie can buy things, send messages, modify files. Treat its tool calls like Kernos calls — with confirmation for anything irreversible.
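The gotchas above can be folded into one calling convention: gate irreversible tasks, run one task at a time, and always fetch a screenshot afterwards. This is a rough sketch only. `call_tool` stands in for whatever MCP client function the calling agent has, and `run_task` with its `{"task": ...}` argument is a placeholder, since Rommie's task tool name is not documented here; only `get_screenshot` comes from this page. The lock serializes callers within one process and does nothing across separate agents.

```python
import threading

# Rommie drives a single desktop, so serialize callers in this process.
_rommie_busy = threading.Lock()

TASK_TOOL = "run_task"  # placeholder: substitute Rommie's actual task tool name


def run_gui_task(call_tool, task, irreversible=False, confirmed=False):
    """Send one natural-language task to Rommie, then fetch a screenshot to verify."""
    if irreversible and not confirmed:
        raise PermissionError("irreversible GUI task needs explicit confirmation")
    if not _rommie_busy.acquire(blocking=False):
        raise RuntimeError("Rommie is busy; wait for the current task to finish")
    try:
        outcome = call_tool(TASK_TOOL, {"task": task})
        # Never trust the reported outcome alone; look at the screen.
        screenshot = call_tool("get_screenshot", {})
        return outcome, screenshot
    finally:
        _rommie_busy.release()
```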

26
docs/tools/time.md Normal file

@@ -0,0 +1,26 @@
# Time
> Current time and timezone.
- **MCP server name:** `time` (runs locally)
- **Prompt snippet:** [prompts/tools/time.md](../../prompts/tools/time.md)
## What It Is
A tiny tool that does one thing: tell the agent what time it is, in a given timezone. Trivial in description, essential in practice — LLMs don't know the current date, and conversations can span days or months.
## What It's Good For
- Checking today's date before timestamping anything (graph nodes, file names, messages)
- Building IDs that include a date component (`note_2026-05-20_…`)
- Reasoning about "recent" vs "old" in any context where the answer depends on now
- Timezone conversions when scheduling or interpreting log timestamps
## What It's Not Good For
- Anything that isn't time. It's a single-purpose tool.
## Known Gotchas
- **Don't assume the date.** Always check before using a date in something that gets stored — node IDs, message slugs, file names, journal entries. The agent's training cutoff is not "now."
- **Timezone defaults vary.** Specify the timezone explicitly when it matters (UTC for logs, local time for user-facing).
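Both gotchas show up when building a dated ID: the date must come from a real clock reading, and the timezone decides which calendar day that reading falls on. The following is a small sketch, assuming the timestamp has already been obtained from the Time tool and parsed into a timezone-aware `datetime`. `dated_id` is a hypothetical helper, and the fixed `-04:00` offset stands in for a local timezone.

```python
from datetime import datetime, timedelta, timezone


def dated_id(prefix, slug, now):
    """Build an ID like 'note_2026-05-20_standup' from a timezone-aware timestamp."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime; naive times are ambiguous")
    return f"{prefix}_{now.date().isoformat()}_{slug}"


# The same instant falls on different calendar days depending on the timezone.
instant = datetime(2026, 5, 21, 2, 50, tzinfo=timezone.utc)
local = instant.astimezone(timezone(timedelta(hours=-4)))
print(dated_id("note", "standup", instant))  # note_2026-05-21_standup
print(dated_id("note", "standup", local))    # note_2026-05-20_standup
```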