Ansible Project Structure - Best Practices
This document describes the clean, maintainable Ansible structure implemented in the Agathos project. Use this as a reference template for other Ansible projects.
Overview
This structure emphasizes:
- Simplicity: Minimal files at root level
- Organization: Services contain all related files (playbooks + templates)
- Separation: Variables live in dedicated files, not inline in inventory
- Discoverability: Clear naming and logical grouping
Directory Structure
ansible/
├── ansible.cfg # Ansible configuration
├── .vault_pass # Vault password file
│
├── site.yml # Master orchestration playbook
├── apt_update.yml # Utility: Update all hosts
├── sandbox_up.yml # Utility: Start infrastructure
├── sandbox_down.yml # Utility: Stop infrastructure
│
├── inventory/ # Inventory organization
│ ├── hosts # Simple host/group membership
│ │
│ ├── group_vars/ # Variables for groups
│ │ └── all/
│ │ ├── vars.yml # Common variables
│ │ └── vault.yml # Encrypted secrets
│ │
│ └── host_vars/ # Variables per host
│ ├── hostname1.yml # All vars for hostname1
│ ├── hostname2.yml # All vars for hostname2
│ └── ...
│
└── service_name/ # Per-service directories
├── deploy.yml # Main deployment playbook
├── stage.yml # Staging playbook (if needed)
├── template1.j2 # Jinja2 templates
├── template2.j2
└── files/ # Static files (if needed)
Key Components
1. Simplified Inventory (inventory/hosts)
Purpose: Define ONLY host/group membership, no variables
Example:
---
# Ansible Inventory - Simplified

# Main infrastructure group
ubuntu:
  hosts:
    server1.example.com:
    server2.example.com:
    server3.example.com:

# Service-specific groups
web_servers:
  hosts:
    server1.example.com:

database_servers:
  hosts:
    server2.example.com:
Before: 361 lines with variables inline
After: 34 lines of pure structure
2. Host Variables (inventory/host_vars/)
Purpose: All configuration specific to a single host
File naming: {hostname}.yml (matches inventory hostname exactly)
Example (inventory/host_vars/server1.example.com.yml):
---
# Server1 Configuration - Web Server
# Services: nginx, php-fpm, redis

services:
  - nginx
  - php
  - redis

# Nginx Configuration
nginx_user: www-data
nginx_worker_processes: auto
nginx_port: 80
nginx_ssl_port: 443

# PHP-FPM Configuration
php_version: 8.2
php_max_children: 50

# Redis Configuration
redis_port: 6379
redis_password: "{{ vault_redis_password }}"
3. Group Variables (inventory/group_vars/)
Purpose: Variables shared across multiple hosts
Structure:
group_vars/
├── all/ # Variables for ALL hosts
│ ├── vars.yml # Common non-sensitive config
│ └── vault.yml # Encrypted secrets (ansible-vault)
│
└── web_servers/ # Variables for web_servers group
└── vars.yml
Example (inventory/group_vars/all/vars.yml):
---
# Common Variables for All Hosts
remote_user: ansible
deployment_environment: production
ansible_python_interpreter: /usr/bin/python3
# Release versions
app_release: v1.2.3
api_release: v2.0.1
# Monitoring endpoints
prometheus_url: http://monitoring.example.com:9090
loki_url: http://monitoring.example.com:3100
4. Service Directories
Purpose: Group all files related to a service deployment
Pattern: {service_name}/
Contents:
- deploy.yml - Main deployment playbook
- stage.yml - Staging/update playbook (optional)
- *.j2 - Jinja2 templates
- files/ - Static files (if needed)
- tasks/ - Task files (if splitting large playbooks)
Example Structure:
nginx/
├── deploy.yml # Deployment playbook
├── nginx.conf.j2 # Main config template
├── site.conf.j2 # Virtual host template
├── nginx.service.j2 # Systemd service file
└── files/
└── ssl_params.conf # Static SSL configuration
5. Master Playbook (site.yml)
Purpose: Orchestrate full-stack deployment
Pattern: Import service playbooks in dependency order
Example:
---
- name: Update All Hosts
  import_playbook: apt_update.yml

- name: Deploy Docker
  import_playbook: docker/deploy.yml

- name: Deploy PostgreSQL
  import_playbook: postgresql/deploy.yml

- name: Deploy Application
  import_playbook: myapp/deploy.yml

- name: Deploy Monitoring
  import_playbook: prometheus/deploy.yml
6. Service Playbook Pattern
Location: {service}/deploy.yml
Standard Structure:
---
- name: Deploy Service Name
  hosts: target_group

  tasks:
    # Service detection (if using services list)
    - name: Check if host has service_name service
      ansible.builtin.set_fact:
        has_service: "{{ 'service_name' in services | default([]) }}"

    - name: Skip hosts without service
      ansible.builtin.meta: end_host
      when: not has_service

    # Actual deployment tasks
    - name: Create service user
      become: true
      ansible.builtin.user:
        name: "{{ service_user }}"
        group: "{{ service_group }}"
        system: true

    - name: Template configuration
      become: true
      ansible.builtin.template:
        src: config.j2
        dest: "{{ service_directory }}/config.yml"
      notify: restart service

  # Handlers
  handlers:
    - name: restart service
      become: true
      ansible.builtin.systemd:
        name: service_name
        state: restarted
        daemon_reload: true
IMPORTANT: Template Path Convention
- When playbooks are inside service directories, template src: paths are relative to that directory
- Use src: config.j2, NOT src: service_name/config.j2
- The service directory prefix was correct when playbooks lived at the ansible root, but is wrong now
Host-Specific Templates

Some services need different configuration per host. Store these in subdirectories named by hostname:
service_name/
├── deploy.yml
├── config.j2 # Default template
├── hostname1/ # Host-specific overrides
│ └── config.j2
├── hostname2/
│ └── config.j2
└── hostname3/
└── config.j2
Use conditional logic to select the correct template:
- name: Check for host-specific configuration
  ansible.builtin.stat:
    path: "{{ playbook_dir }}/{{ inventory_hostname_short }}/config.j2"
  delegate_to: localhost
  register: host_specific_config
  become: false

- name: Template host-specific configuration
  become: true
  ansible.builtin.template:
    src: "{{ playbook_dir }}/{{ inventory_hostname_short }}/config.j2"
    dest: "{{ service_directory }}/config"
  when: host_specific_config.stat.exists

- name: Template default configuration
  become: true
  ansible.builtin.template:
    src: config.j2
    dest: "{{ service_directory }}/config"
  when: not host_specific_config.stat.exists
Real Example: Alloy Service
alloy/
├── deploy.yml
├── config.alloy.j2 # Default configuration
├── ariel/ # Neo4j monitoring
│ └── config.alloy.j2
├── miranda/ # Docker monitoring
│ └── config.alloy.j2
├── oberon/ # Web services monitoring
│ └── config.alloy.j2
└── puck/ # Application monitoring
└── config.alloy.j2
Service Detection Pattern
Purpose: Allow hosts to selectively run service playbooks
How it works:
- Each host defines a services: list in host_vars/
- Each playbook checks if its service is in the list
- Playbook skips host if service not needed
Example:
inventory/host_vars/server1.yml:
services:
  - docker
  - nginx
  - redis
nginx/deploy.yml:
- name: Deploy Nginx
  hosts: ubuntu

  tasks:
    - name: Check if host has nginx service
      ansible.builtin.set_fact:
        has_nginx: "{{ 'nginx' in services | default([]) }}"

    - name: Skip hosts without nginx
      ansible.builtin.meta: end_host
      when: not has_nginx

    # Rest of tasks only run if nginx is in the services list
Ansible Vault Integration
Setup:
# Create vault password file (one-time)
echo "your_vault_password" > .vault_pass
chmod 600 .vault_pass
# Configure ansible.cfg
echo "vault_password_file = .vault_pass" >> ansible.cfg
Usage:
# Edit vault file
ansible-vault edit inventory/group_vars/all/vault.yml
# View vault file
ansible-vault view inventory/group_vars/all/vault.yml
# Encrypt new file
ansible-vault encrypt secrets.yml
Variable naming convention:
- Prefix vault variables with vault_
- Reference them in regular vars: db_password: "{{ vault_db_password }}"
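The pairing looks like this in practice (variable names illustrative):

```yaml
# inventory/group_vars/all/vault.yml (encrypted with ansible-vault)
vault_db_password: "s3cret-value"

# inventory/host_vars/server2.example.com.yml
db_password: "{{ vault_db_password }}"
```

Playbooks and templates only ever reference db_password; the vault_ indirection keeps the encrypted file as the single source of secrets.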
Running Playbooks
Full deployment:
ansible-playbook site.yml
Single service:
ansible-playbook nginx/deploy.yml
Specific hosts:
ansible-playbook nginx/deploy.yml --limit server1.example.com
Check mode (dry-run):
ansible-playbook site.yml --check
With extra verbosity:
ansible-playbook nginx/deploy.yml -vv
Benefits of This Structure
1. Cleaner Root Directory
- Before: 29+ playbook files cluttering root
- After: 3-4 utility playbooks + site.yml
2. Simplified Inventory
- Before: 361 lines with inline variables
- After: 34 lines of pure structure
- Variables organized logically by host/group
3. Service Cohesion
- Everything related to a service in one place
- Easy to find templates when editing playbooks
- Natural grouping for git operations
4. Scalability
- Easy to add new services (create directory, add playbook)
- Easy to add new hosts (create host_vars file)
- No risk of playbook name conflicts
5. Reusability
- Service directories can be copied to other projects
- host_vars pattern works for any inventory size
- Clear separation of concerns
6. Maintainability
- Changes isolated to service directories
- Inventory file rarely needs editing
- Clear audit trail in git (changes per service)
Migration Checklist
Moving an existing Ansible project to this structure:
- Create service directories for each playbook
- Move {service}_deploy.yml → {service}/deploy.yml
- Move templates into service directories
- Extract host variables from inventory to host_vars/
- Extract group variables to group_vars/all/vars.yml
- Move secrets to group_vars/all/vault.yml (encrypted)
- Update site.yml import_playbook paths
- Backup original inventory: cp hosts hosts.backup
- Create simplified inventory with only group/host structure
- Test with ansible-playbook site.yml --check
- Verify with a limited deployment: --limit test_host
Example: Adding a New Service
1. Create service directory:
mkdir ansible/myapp
2. Create deployment playbook (ansible/myapp/deploy.yml):
---
- name: Deploy MyApp
  hosts: ubuntu

  tasks:
    - name: Check if host has myapp service
      ansible.builtin.set_fact:
        has_myapp: "{{ 'myapp' in services | default([]) }}"

    - name: Skip hosts without myapp
      ansible.builtin.meta: end_host
      when: not has_myapp

    - name: Deploy myapp
      # ... deployment tasks
3. Create template (ansible/myapp/config.yml.j2):
app_name: MyApp
port: {{ myapp_port }}
database: {{ myapp_db_host }}
4. Add variables to host (inventory/host_vars/server1.yml):
services:
  - myapp          # Add to services list

# MyApp configuration
myapp_port: 8080
myapp_db_host: db.example.com
5. Add to site.yml:
- name: Deploy MyApp
  import_playbook: myapp/deploy.yml
6. Deploy:
ansible-playbook myapp/deploy.yml
Best Practices
Naming Conventions
- Service directories: lowercase, underscores (e.g., mcp_switchboard/)
- Playbooks: deploy.yml, stage.yml, remove.yml
- Templates: descriptive name + .j2 extension
- Variables: service prefix (e.g., nginx_port, redis_password)
- Vault variables: vault_ prefix
File Organization
- Keep playbooks under 100 lines (split into task files if larger)
- Group related templates in service directory
- Use comments to document non-obvious variables
- Add README.md to complex service directories
Variable Organization
- Host-specific: host_vars/{hostname}.yml
- Service-specific across hosts: group_vars/{service_group}/vars.yml
- Global configuration: group_vars/all/vars.yml
- Secrets: group_vars/all/vault.yml (encrypted)
Idempotency
- Use the creates: parameter for one-time operations
- Set state: explicitly (present/absent/restarted)
- Check conditions before destructive operations
- Test with --check mode before applying
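These guards can be sketched as tasks like the following (the modules are standard Ansible built-ins; paths and service names are illustrative):

```yaml
- name: Initialize application data only once
  become: true
  ansible.builtin.command:
    cmd: /usr/local/bin/myapp-init --data-dir /var/lib/myapp
    creates: /var/lib/myapp/.initialized   # task is skipped when this path already exists

- name: Ensure service state is explicit
  become: true
  ansible.builtin.systemd:
    name: myapp
    state: started
    enabled: true
```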
Documentation
- Comment complex task logic
- Document required variables in playbook header
- Add README.md for service directories with many files
- Keep docs/ separate from ansible/ directory
Related Documentation
Account Taxonomy
Standardized account roles used across Ansible and Terraform. This taxonomy eliminates confusion between Ansible reserved connection keywords (remote_user in ansible.cfg) and infrastructure-managed account variables in playbooks.
| Role | Variable | Example | Home | Sudo | Purpose |
|---|---|---|---|---|---|
| user | (login name) | robert:1000 | /home/robert | varies | Human user account |
| service_user | {service}_user | arke:500 | /srv/arke | no | Service daemon account |
| keeper_user | keeper_user | ponos:519 | /srv/ponos | yes | Ansible/Terraform management (sudo) |
| watcher_user | watcher_user | poros:520 | — | no | Non-sudo observation account |
| principal_user | principal_user | robert:1000 | /home/robert | varies | AI agent collaborative account |
Key Rules
- keeper_user replaces all uses of {{ ansible_user }} and {{ remote_user }} as Jinja2 variables in playbooks
- ansible.cfg retains remote_user = ponos as the SSH connection keyword (an Ansible built-in); this is not a Jinja2 variable
- service_user accounts live in /srv/{service}; any currently in /home migrate on the next re-provision
- watcher_user is provisioned by an Ansible playbook when needed (not via cloud-init)
- principal_user is for AI agent hosts where the agent operates on behalf of a human user; define it in host_vars/{hostname}.yml
- Do not use the vault_ prefix for any of these; that prefix is reserved for Ansible Vault variables
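In practice, the keeper_user rule looks like this (task and path names illustrative):

```yaml
# WRONG: {{ ansible_user }} shadows Ansible's connection variable
- name: Create staging directory
  become: true
  ansible.builtin.file:
    path: /srv/staging
    owner: "{{ ansible_user }}"
    state: directory

# CORRECT: use the taxonomy variable defined in group_vars
- name: Create staging directory
  become: true
  ansible.builtin.file:
    path: /srv/staging
    owner: "{{ keeper_user }}"
    state: directory
```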
Variable Definitions
All taxonomy variables are defined in inventory/group_vars/all/vars.yml:
# Account Taxonomy
keeper_user: ponos
keeper_uid: 519
keeper_group: ponos
keeper_home: /srv/ponos
watcher_user: poros
watcher_uid: 520
principal_user is host-specific and defined in the relevant host_vars file:
# inventory/host_vars/caliban.incus.yml
principal_user: robert
principal_uid: 1000
Bootstrap Chain
- Terraform provisions ponos (keeper_user) on all containers via cloud-init: UID 519, home /srv/ponos, sudoers entry, SSH authorized keys at /srv/ponos/.ssh/authorized_keys
- ansible.cfg sets remote_user = ponos so all Ansible connections use the keeper account
- Playbooks reference {{ keeper_user }} for any task that needs the management account name
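The cloud-init stanza behind the first step of the chain looks roughly like this; the exact keys depend on the Terraform template, so treat this as a sketch only:

```yaml
#cloud-config
users:
  - name: ponos                      # keeper_user
    uid: "519"                       # keeper_uid
    homedir: /srv/ponos              # keeper_home
    shell: /bin/bash
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... keeper   # placeholder public key
```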
Playbook Pattern
- name: Add keeper_user to service group
  become: true
  ansible.builtin.user:
    name: "{{ keeper_user }}"
    groups: "{{ service_group }}"
    append: true
Never use {{ ansible_user }} or {{ remote_user }} as Jinja2 template variables in tasks — these shadow Ansible built-in connection variables and cause unpredictable behaviour.
Secret Management Patterns
Ansible Vault (Sandbox Environment)
Purpose: Store sensitive values encrypted at rest in version control
File Location: inventory/group_vars/all/vault.yml
Variable Naming Convention: Prefix all vault variables with vault_
Example vault.yml (note: the entire vault file is encrypted):
---
# Database passwords
vault_postgres_admin_password: # Avoid special characters & non-ASCII
vault_casdoor_db_password:
# S3 credentials
vault_casdoor_s3_access_key:
vault_casdoor_s3_secret_key:
vault_casdoor_s3_bucket:
Host Variables Reference Vault:
# In host_vars/oberon.incus.yml
casdoor_db_password: "{{ vault_casdoor_db_password }}"
casdoor_s3_access_key: "{{ vault_casdoor_s3_access_key }}"
casdoor_s3_secret_key: "{{ vault_casdoor_s3_secret_key }}"
casdoor_s3_bucket: "{{ vault_casdoor_s3_bucket }}"
# Non-sensitive values stay as plain variables
casdoor_s3_endpoint: "https://ariel.incus:9000"
casdoor_s3_region: "us-east-1"
Prerequisites:
- Set the ANSIBLE_VAULT_PASSWORD_FILE environment variable
- Create a .vault_pass file with the vault password
- Add .vault_pass to .gitignore
Encrypting New Values:
# Encrypt a string and add to vault.yml
echo -n "secret_value" | ansible-vault encrypt_string --stdin-name 'vault_variable_name'
# Edit vault file directly
ansible-vault edit inventory/group_vars/all/vault.yml
OCI Vault (Production Environment)
Purpose: Use Oracle Cloud Infrastructure Vault for centralized secret management
Variable Pattern: Use Ansible lookups to fetch secrets at runtime
Example host_vars for OCI:
# In host_vars/production-server.yml
# Database passwords from OCI Vault
casdoor_db_password: "{{ lookup('community.oci.oci_secret', 'casdoor-db-password', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
# S3 credentials from OCI Vault
casdoor_s3_access_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-access-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_secret_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-secret-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_bucket: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-bucket', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
# Non-sensitive values remain as plain variables
casdoor_s3_endpoint: "https://objectstorage.us-phoenix-1.oraclecloud.com"
casdoor_s3_region: "us-phoenix-1"
OCI Vault Organization:
OCI Compartment: production
├── Vault: agathos-databases
│ ├── Secret: postgres-admin-password
│ └── Secret: casdoor-db-password
│
├── Vault: agathos-services
│ ├── Secret: casdoor-s3-access-key
│ ├── Secret: casdoor-s3-secret-key
│ ├── Secret: casdoor-s3-bucket
│ └── Secret: openwebui-db-password
│
└── Vault: agathos-integrations
├── Secret: apikey-openai
└── Secret: apikey-anthropic
Secret Naming Convention:
- Ansible Vault: vault_service_secret (underscores)
- OCI Vault: service-secret (hyphens)
Benefits of Two-Tier Pattern:
- Portability: Service playbooks remain unchanged across environments
- Flexibility: Switch secret backends by changing only host_vars
- Clarity: Variable names clearly indicate their purpose
- Security: Secrets never appear in playbooks or templates
S3 Bucket Provisioning with Ansible
Purpose: Provision Incus S3 buckets and manage credentials in Ansible Vault
Playbooks:
- provision_s3.yml - Create bucket and store credentials
- regenerate_s3_key.yml - Rotate credentials
- remove_s3.yml - Delete bucket and clean vault
Usage:
# Provision new S3 bucket for a service
ansible-playbook provision_s3.yml -e bucket_name=casdoor -e service_name=casdoor
# Regenerate access credentials (invalidates old keys)
ansible-playbook regenerate_s3_key.yml -e bucket_name=casdoor -e service_name=casdoor
# Remove bucket and credentials
ansible-playbook remove_s3.yml -e bucket_name=casdoor -e service_name=casdoor
Requirements:
- User must be a member of the incus group
- ANSIBLE_VAULT_PASSWORD_FILE must be set
- Incus CLI must be configured and accessible
What Gets Created:
- Incus storage bucket in project agathos, pool default
- Admin access key for the bucket
- Encrypted vault entries: vault_<service>_s3_access_key, vault_<service>_s3_secret_key, vault_<service>_s3_bucket
Behind the Scenes:
- Role: incus_storage_bucket
- Idempotent: checks whether the bucket/key exists before creating
- Atomic: credentials are captured and encrypted in a single operation
- Variables sourced from: inventory/group_vars/all/vars.yml
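A provisioning playbook built on the role might look like this; the role's variable names here are assumptions, not its actual interface:

```yaml
---
- name: Provision an Incus S3 bucket for a service
  hosts: localhost
  gather_facts: false
  roles:
    - role: incus_storage_bucket
      vars:
        bucket_name: "{{ bucket_name }}"      # passed with -e bucket_name=...
        service_name: "{{ service_name }}"    # passed with -e service_name=...
```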
Troubleshooting
Template Not Found Errors
Symptom: Could not find or access 'service_name/template.j2'
Cause: When playbooks were moved from ansible root into service directories, template paths weren't updated.
Solution: Remove the service directory prefix from template paths:
# WRONG (old path from when playbook was at root)
src: service_name/config.j2

# CORRECT (playbook is now in service_name/ directory)
src: config.j2
Host-Specific Template Path Issues
Symptom: Playbook fails to find host-specific templates
Cause: Host-specific directories are at the wrong level
Expected Structure:
service_name/
├── deploy.yml
├── config.j2 # Default
└── hostname/ # Host-specific (inside service dir)
└── config.j2
Use {{playbook_dir}} for relative paths:
# This finds templates relative to the playbook location
src: "{{ playbook_dir }}/{{ inventory_hostname_short }}/config.j2"
Last Updated: December 2025
Project: Agathos Infrastructure
Approval: Red Panda Approved™