ouranos/docs/ansible.md
Robert Helewka 042df52bca Refactor user management in Ansible playbooks to standardize on keeper_user
- Updated user addition tasks across multiple playbooks (mcp_switchboard, mcpo, neo4j, neo4j_mcp, openwebui, postgresql, rabbitmq, searxng, smtp4dev) to replace references to ansible_user and remote_user with keeper_user.
- Modified PostgreSQL deployment to create directories and manage files under keeper_user's home.
- Enhanced documentation to clarify account taxonomy and usage of keeper_user in playbooks.
- Introduced new deployment for Agent S, including environment setup, desktop environment installation, XRDP configuration, and accessibility support.
- Added staging playbook for preparing release tarballs from local repositories.
- Created templates for XRDP configuration and environment activation scripts.
- Removed obsolete sunwait documentation.
2026-03-05 10:37:41 +00:00


Ansible Project Structure - Best Practices

This document describes the clean, maintainable Ansible structure implemented in the Agathos project. Use this as a reference template for other Ansible projects.

Overview

This structure emphasizes:

  • Simplicity: Minimal files at root level
  • Organization: Services contain all related files (playbooks + templates)
  • Separation: Variables live in dedicated files, not inline in inventory
  • Discoverability: Clear naming and logical grouping

Directory Structure

ansible/
├── ansible.cfg                    # Ansible configuration
├── .vault_pass                    # Vault password file
│
├── site.yml                       # Master orchestration playbook
├── apt_update.yml                 # Utility: Update all hosts
├── sandbox_up.yml                 # Utility: Start infrastructure
├── sandbox_down.yml               # Utility: Stop infrastructure
│
├── inventory/                     # Inventory organization
│   ├── hosts                      # Simple host/group membership 
│   │
│   ├── group_vars/                # Variables for groups
│   │   └── all/
│   │       ├── vars.yml           # Common variables
│   │       └── vault.yml          # Encrypted secrets
│   │
│   └── host_vars/                 # Variables per host
│       ├── hostname1.yml          # All vars for hostname1
│       ├── hostname2.yml          # All vars for hostname2
│       └── ...
│
└── service_name/                  # Per-service directories
    ├── deploy.yml                 # Main deployment playbook
    ├── stage.yml                  # Staging playbook (if needed)
    ├── template1.j2               # Jinja2 templates
    ├── template2.j2
    └── files/                     # Static files (if needed)

Key Components

1. Simplified Inventory (inventory/hosts)

Purpose: Define ONLY host/group membership, no variables

Example:

---
# Ansible Inventory - Simplified

# Main infrastructure group
ubuntu:
  hosts:
    server1.example.com:
    server2.example.com:
    server3.example.com:

# Service-specific groups
web_servers:
  hosts:
    server1.example.com:

database_servers:
  hosts:
    server2.example.com:

Before: 361 lines with variables inline
After: 34 lines of pure structure
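To keep the inventory from drifting back toward inline variables, a quick lint can flag suspicious lines. The following is a minimal sketch, assuming the YAML inventory format shown above, where every structural line ends in a colon. The function name `inventory_inline_vars` is hypothetical, and the check is a heuristic rather than a YAML parser.

```shell
# Print inventory lines that look like inline variables (key: value).
# Heuristic: in a membership-only YAML inventory, every line is blank,
# a comment, the '---' marker, or a key that ends in ':'.
inventory_inline_vars() {
  grep -nEv '^[[:space:]]*(#.*)?$|^---[[:space:]]*$|:[[:space:]]*(#.*)?$' "$1"
}
```

Calling `inventory_inline_vars inventory/hosts` prints each offending line with its line number and returns a non-zero status when the file is clean.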

2. Host Variables (inventory/host_vars/)

Purpose: All configuration specific to a single host

File naming: {hostname}.yml (matches inventory hostname exactly)

Example (inventory/host_vars/server1.example.com.yml):

---
# Server1 Configuration - Web Server
# Services: nginx, php-fpm, redis

services:
  - nginx
  - php
  - redis

# Nginx Configuration
nginx_user: www-data
nginx_worker_processes: auto
nginx_port: 80
nginx_ssl_port: 443

# PHP-FPM Configuration
php_version: "8.2"  # Quoted so YAML keeps it a string (8.10 would otherwise load as 8.1)
php_max_children: 50

# Redis Configuration
redis_port: 6379
redis_password: "{{vault_redis_password}}"

3. Group Variables (inventory/group_vars/)

Purpose: Variables shared across multiple hosts

Structure:

group_vars/
├── all/                    # Variables for ALL hosts
│   ├── vars.yml           # Common non-sensitive config
│   └── vault.yml          # Encrypted secrets (ansible-vault)
│
└── web_servers/           # Variables for web_servers group
    └── vars.yml

Example (inventory/group_vars/all/vars.yml):

---
# Common Variables for All Hosts

keeper_user: ponos  # Management account name (see Account Taxonomy); avoid defining remote_user as a variable
deployment_environment: production
ansible_python_interpreter: /usr/bin/python3

# Release versions
app_release: v1.2.3
api_release: v2.0.1

# Monitoring endpoints
prometheus_url: http://monitoring.example.com:9090
loki_url: http://monitoring.example.com:3100

4. Service Directories

Purpose: Group all files related to a service deployment

Pattern: {service_name}/

Contents:

  • deploy.yml - Main deployment playbook
  • stage.yml - Staging/update playbook (optional)
  • *.j2 - Jinja2 templates
  • files/ - Static files (if needed)
  • tasks/ - Task files (if splitting large playbooks)

Example Structure:

nginx/
├── deploy.yml              # Deployment playbook
├── nginx.conf.j2           # Main config template
├── site.conf.j2            # Virtual host template
├── nginx.service.j2        # Systemd service file
└── files/
    └── ssl_params.conf     # Static SSL configuration

5. Master Playbook (site.yml)

Purpose: Orchestrate full-stack deployment

Pattern: Import service playbooks in dependency order

Example:

---
- name: Update All Hosts
  import_playbook: apt_update.yml

- name: Deploy Docker
  import_playbook: docker/deploy.yml

- name: Deploy PostgreSQL
  import_playbook: postgresql/deploy.yml

- name: Deploy Application
  import_playbook: myapp/deploy.yml

- name: Deploy Monitoring
  import_playbook: prometheus/deploy.yml

6. Service Playbook Pattern

Location: {service}/deploy.yml

Standard Structure:

---
- name: Deploy Service Name
  hosts: target_group
  tasks:
  
  # Service detection (if using services list)
  - name: Check if host has service_name service
    ansible.builtin.set_fact:
      has_service: "{{ 'service_name' in services | default([]) }}"

  - name: Skip hosts without service
    ansible.builtin.meta: end_host
    when: not has_service

  # Actual deployment tasks
  - name: Create service user
    become: true
    ansible.builtin.user:
      name: "{{service_user}}"
      group: "{{service_group}}"
      system: true

  - name: Template configuration
    become: true
    ansible.builtin.template:
      src: config.j2
      dest: "{{service_directory}}/config.yml"
    notify: restart service

  # Handlers
  handlers:
  - name: restart service
    become: true
    ansible.builtin.systemd:
      name: service_name
      state: restarted
      daemon_reload: true

IMPORTANT: Template Path Convention

  • When playbooks are inside service directories, template src: paths are relative to that directory
  • Use src: config.j2 NOT src: service_name/config.j2
  • The service directory prefix was correct when playbooks were at the ansible root, but is wrong now

Host-Specific Templates

Some services need different configuration per host. Store these in subdirectories named by hostname:

service_name/
├── deploy.yml
├── config.j2              # Default template
├── hostname1/             # Host-specific overrides
│   └── config.j2
├── hostname2/
│   └── config.j2
└── hostname3/
    └── config.j2

Use conditional logic to select the correct template:

- name: Check for host-specific configuration
  ansible.builtin.stat:
    path: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
  delegate_to: localhost
  register: host_specific_config
  become: false

- name: Template host-specific configuration
  become: true
  ansible.builtin.template:
    src: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
    dest: "{{service_directory}}/config"
  when: host_specific_config.stat.exists

- name: Template default configuration
  become: true
  ansible.builtin.template:
    src: config.j2
    dest: "{{service_directory}}/config"
  when: not host_specific_config.stat.exists

Real Example: Alloy Service

alloy/
├── deploy.yml
├── config.alloy.j2        # Default configuration
├── ariel/                 # Neo4j monitoring
│   └── config.alloy.j2
├── miranda/               # Docker monitoring
│   └── config.alloy.j2
├── oberon/                # Web services monitoring
│   └── config.alloy.j2
└── puck/                  # Application monitoring
    └── config.alloy.j2
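To preview which template a given host would receive without running the playbook, the stat/when selection logic can be mirrored locally. This is a rough sketch only: `select_template` is a hypothetical helper, and it assumes host directories are named after the short hostname (the part before the first dot), as `inventory_hostname_short` is.

```shell
# Echo the template path a host would receive: the host-specific
# override if it exists, otherwise the service default.
select_template() {
  service_dir=$1
  host=$2
  template=${3:-config.j2}
  short=${host%%.*}   # mirrors inventory_hostname_short
  if [ -f "$service_dir/$short/$template" ]; then
    printf '%s\n' "$service_dir/$short/$template"
  else
    printf '%s\n' "$service_dir/$template"
  fi
}
```

For the Alloy layout, `select_template alloy ariel.incus config.alloy.j2` would resolve to the override under `alloy/ariel/`, while a host without its own directory falls back to `alloy/config.alloy.j2`.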

Service Detection Pattern

Purpose: Allow hosts to selectively run service playbooks

How it works:

  1. Each host defines a services: list in host_vars/
  2. Each playbook checks if its service is in the list
  3. Playbook skips host if service not needed

Example:

inventory/host_vars/server1.example.com.yml:

services:
  - docker
  - nginx
  - redis

nginx/deploy.yml:

- name: Deploy Nginx
  hosts: ubuntu
  tasks:
  - name: Check if host has nginx service
    ansible.builtin.set_fact:
      has_nginx: "{{ 'nginx' in services | default([]) }}"

  - name: Skip hosts without nginx
    ansible.builtin.meta: end_host
    when: not has_nginx
  
  # Rest of tasks only run if nginx in services list
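Because the services: list decides where a playbook actually runs, it can help to see which hosts declare a service before deploying. A minimal sketch, assuming each host_vars file uses a top-level, block-style services: list as in the example above (flow-style lists such as `[nginx, redis]` are not handled). The helper name `hosts_with_service` is hypothetical.

```shell
# List hosts whose host_vars file declares the given service
# in its top-level 'services:' list.
hosts_with_service() {
  service=$1
  host_vars_dir=${2:-inventory/host_vars}
  for file in "$host_vars_dir"/*.yml; do
    [ -f "$file" ] || continue
    awk -v svc="$service" '
      /^services:/         { in_list = 1; next }  # start of the list
      in_list && /^[^ #-]/ { in_list = 0 }        # next top-level key ends it
      in_list && $1 == "-" && $2 == svc { found = 1 }
      END { exit !found }
    ' "$file" && basename "$file" .yml
  done
  return 0
}
```

Running `hosts_with_service nginx` from the ansible/ directory prints one hostname per line.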

Ansible Vault Integration

Setup:

# Create vault password file (one-time)
echo "your_vault_password" > .vault_pass
chmod 600 .vault_pass

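# Configure ansible.cfg (the setting must sit under the [defaults] section;
# appending only works if [defaults] is the last section in the file)
echo "vault_password_file = .vault_pass" >> ansible.cfg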

Usage:

# Edit vault file
ansible-vault edit inventory/group_vars/all/vault.yml

# View vault file
ansible-vault view inventory/group_vars/all/vault.yml

# Encrypt new file
ansible-vault encrypt secrets.yml

Variable naming convention:

  • Prefix vault variables with vault_
  • Reference in regular vars: db_password: "{{vault_db_password}}"

Running Playbooks

Full deployment:

ansible-playbook site.yml

Single service:

ansible-playbook nginx/deploy.yml

Specific hosts:

ansible-playbook nginx/deploy.yml --limit server1.example.com

Check mode (dry-run):

ansible-playbook site.yml --check

With extra verbosity:

ansible-playbook nginx/deploy.yml -vv

Benefits of This Structure

1. Cleaner Root Directory

  • Before: 29+ playbook files cluttering root
  • After: 3-4 utility playbooks + site.yml

2. Simplified Inventory

  • Before: 361 lines with inline variables
  • After: 34 lines of pure structure
  • Variables organized logically by host/group

3. Service Cohesion

  • Everything related to a service in one place
  • Easy to find templates when editing playbooks
  • Natural grouping for git operations

4. Scalability

  • Easy to add new services (create directory, add playbook)
  • Easy to add new hosts (create host_vars file)
  • No risk of playbook name conflicts

5. Reusability

  • Service directories can be copied to other projects
  • host_vars pattern works for any inventory size
  • Clear separation of concerns

6. Maintainability

  • Changes isolated to service directories
  • Inventory file rarely needs editing
  • Clear audit trail in git (changes per service)

Migration Checklist

Moving an existing Ansible project to this structure:

  • Create service directories for each playbook
  • Move {service}_deploy.yml → {service}/deploy.yml
  • Move templates into service directories
  • Extract host variables from inventory to host_vars/
  • Extract group variables to group_vars/all/vars.yml
  • Move secrets to group_vars/all/vault.yml (encrypted)
  • Update site.yml import_playbook paths
  • Backup original inventory: cp hosts hosts.backup
  • Create simplified inventory with only group/host structure
  • Test with ansible-playbook site.yml --check
  • Verify with limited deployment: --limit test_host
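The playbook move in the checklist is mechanical and can be scripted. The sketch below assumes root-level playbooks follow the {service}_deploy.yml naming pattern; `migrate_playbooks` is a hypothetical helper. It handles only the playbook files, so templates still need to be moved and their src: paths updated by hand.

```shell
# Move every root-level {service}_deploy.yml to {service}/deploy.yml.
migrate_playbooks() {
  root=${1:-.}
  for playbook in "$root"/*_deploy.yml; do
    [ -f "$playbook" ] || continue
    service=$(basename "$playbook" _deploy.yml)
    if [ -e "$root/$service/deploy.yml" ]; then
      echo "skip: $service/deploy.yml already exists" >&2
      continue
    fi
    mkdir -p "$root/$service"
    mv "$playbook" "$root/$service/deploy.yml"
  done
}
```

Inside a git working tree, swapping `mv` for `git mv` records the change as a rename, which keeps the per-service history easier to follow.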

Example: Adding a New Service

1. Create service directory:

mkdir ansible/myapp

2. Create deployment playbook (ansible/myapp/deploy.yml):

---
- name: Deploy MyApp
  hosts: ubuntu
  tasks:
  - name: Check if host has myapp service
    ansible.builtin.set_fact:
      has_myapp: "{{ 'myapp' in services | default([]) }}"

  - name: Skip hosts without myapp
    ansible.builtin.meta: end_host
    when: not has_myapp

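  # Placeholder task: replace with the real deployment tasks
  - name: Deploy myapp
    ansible.builtin.debug:
      msg: "Deploying myapp on port {{ myapp_port }}"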

3. Create template (ansible/myapp/config.yml.j2):

app_name: MyApp
port: {{myapp_port}}
database: {{myapp_db_host}}

4. Add variables to host (inventory/host_vars/server1.example.com.yml):

services:
  - myapp  # Add to services list

# MyApp configuration
myapp_port: 8080
myapp_db_host: db.example.com

5. Add to site.yml:

- name: Deploy MyApp
  import_playbook: myapp/deploy.yml

6. Deploy:

ansible-playbook myapp/deploy.yml
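Steps 1 and 2 can be combined into a small scaffolding helper. This is a sketch under assumptions, not part of the project: `scaffold_service` is a hypothetical function, it targets the ubuntu group used in the examples above, and it expects service names that follow the lowercase-with-underscores convention so that `has_<service>` is a valid variable name.

```shell
# Create <root>/<service>/deploy.yml with the service-detection preamble.
scaffold_service() {
  service=$1
  root=${2:-ansible}
  playbook="$root/$service/deploy.yml"
  if [ -e "$playbook" ]; then
    echo "refusing to overwrite $playbook" >&2
    return 1
  fi
  mkdir -p "$root/$service"
  cat > "$playbook" <<EOF
---
- name: Deploy $service
  hosts: ubuntu
  tasks:
  - name: Check if host has $service service
    ansible.builtin.set_fact:
      has_$service: "{{ '$service' in services | default([]) }}"

  - name: Skip hosts without $service
    ansible.builtin.meta: end_host
    when: not has_$service
EOF
}
```

After `scaffold_service myapp`, the generated playbook still needs its deployment tasks, templates, host variables, and a site.yml entry as described in steps 2 to 5.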

Best Practices

Naming Conventions

  • Service directories: lowercase, underscores (e.g., mcp_switchboard/)
  • Playbooks: deploy.yml, stage.yml, remove.yml
  • Templates: descriptive name + .j2 extension
  • Variables: service prefix (e.g., nginx_port, redis_password)
  • Vault variables: vault_ prefix

File Organization

  • Keep playbooks under 100 lines (split into task files if larger)
  • Group related templates in service directory
  • Use comments to document non-obvious variables
  • Add README.md to complex service directories

Variable Organization

  • Host-specific: host_vars/{hostname}.yml
  • Service-specific across hosts: group_vars/{service_group}/vars.yml
  • Global configuration: group_vars/all/vars.yml
  • Secrets: group_vars/all/vault.yml (encrypted)

Idempotency

  • Use creates: parameter for one-time operations
  • Use state: explicitly (present/absent/restarted)
  • Check conditions before destructive operations
  • Test with --check mode before applying

Documentation

  • Comment complex task logic
  • Document required variables in playbook header
  • Add README.md for service directories with many files
  • Keep docs/ separate from ansible/ directory

Account Taxonomy

Standardized account roles used across Ansible and Terraform. This taxonomy eliminates confusion between Ansible reserved connection keywords (remote_user in ansible.cfg) and infrastructure-managed account variables in playbooks.

| Role | Variable | Example | Home | Sudo | Purpose |
|------|----------|---------|------|------|---------|
| user | (login name) | robert:1000 | /home/robert | varies | Human user account |
| service_user | {service}_user | arke:500 | /srv/arke | no | Service daemon account |
| keeper_user | keeper_user | ponos:519 | /srv/ponos | yes | Ansible/Terraform management (sudo) |
| watcher_user | watcher_user | poros:520 | | no | Non-sudo observation account |
| principal_user | principal_user | robert:1000 | /home/robert | varies | AI agent collaborative account |

Key Rules

  • keeper_user replaces all uses of {{ ansible_user }} and {{ remote_user }} as Jinja2 variables in playbooks
  • ansible.cfg retains remote_user = ponos as the SSH connection keyword (Ansible built-in) — this is not a Jinja2 variable
  • service_user accounts live in /srv/{service} — if currently in /home, they migrate on next re-provision
  • watcher_user is provisioned by Ansible playbook when needed (not via cloud-init)
  • principal_user is for AI agent hosts where the agent operates on behalf of a human user; define in host_vars/{hostname}.yml
  • Do not use vault_ prefix for any of these — that prefix is reserved for Ansible Vault variables

Variable Definitions

All taxonomy variables are defined in inventory/group_vars/all/vars.yml:

# Account Taxonomy
keeper_user: ponos
keeper_uid: 519
keeper_group: ponos
keeper_home: /srv/ponos
watcher_user: poros
watcher_uid: 520

principal_user is host-specific and defined in the relevant host_vars file:

# inventory/host_vars/caliban.incus.yml
principal_user: robert
principal_uid: 1000

Bootstrap Chain

  1. Terraform provisions ponos (keeper_user) on all containers via cloud-init
    • UID 519, home /srv/ponos, sudoers, SSH authorized keys at /srv/ponos/.ssh/authorized_keys
  2. ansible.cfg sets remote_user = ponos so all Ansible connections use the keeper account
  3. Playbooks reference {{ keeper_user }} for any task that needs the management account name

Playbook Pattern

- name: Add keeper_user to service group
  become: true
  ansible.builtin.user:
    name: "{{ keeper_user }}"
    groups: "{{ service_group }}"
    append: true

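Never use {{ ansible_user }} or {{ remote_user }} as Jinja2 template variables in tasks. Both names collide with Ansible's built-in connection settings (ansible_user is a connection variable, remote_user a connection keyword), so defining or overriding them for account management causes unpredictable behaviour. Reference {{ keeper_user }} instead.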
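A search over playbooks and templates can catch leftover references when enforcing this rule. The sketch below is a heuristic, with `find_legacy_user_refs` as a hypothetical helper name. It only looks inside `{{ ... }}` expressions in .yml and .j2 files, so bare uses in when: conditions would need a separate check.

```shell
# Report lines that still interpolate ansible_user or remote_user.
# Matches the names only as whole identifiers inside '{{ ... }}'.
find_legacy_user_refs() {
  find "${1:-.}" -type f \( -name '*.yml' -o -name '*.j2' \) \
    -exec grep -nE '\{\{([^}]*[^[:alnum:]_])?(ansible_user|remote_user)([^[:alnum:]_]|$)' /dev/null {} +
}
```

Each hit is printed as `file:line:text`. The `remote_user = ponos` entry in ansible.cfg is not scanned, which is intended, since that one is the connection keyword rather than a Jinja2 variable.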

Secret Management Patterns

Ansible Vault (Sandbox Environment)

Purpose: Store sensitive values encrypted at rest in version control

File Location: inventory/group_vars/all/vault.yml

Variable Naming Convention: Prefix all vault variables with vault_

Example vault.yml (shown decrypted; the entire file is encrypted at rest):

---
# Database passwords
vault_postgres_admin_password: # Avoid special characters & non-ASCII
vault_casdoor_db_password: 
# S3 credentials
vault_casdoor_s3_access_key:
vault_casdoor_s3_secret_key: 
vault_casdoor_s3_bucket: 

Host Variables Reference Vault:

# In host_vars/oberon.incus.yml
casdoor_db_password: "{{ vault_casdoor_db_password }}"
casdoor_s3_access_key: "{{ vault_casdoor_s3_access_key }}"
casdoor_s3_secret_key: "{{ vault_casdoor_s3_secret_key }}"
casdoor_s3_bucket: "{{ vault_casdoor_s3_bucket }}"

# Non-sensitive values stay as plain variables
casdoor_s3_endpoint: "https://ariel.incus:9000"
casdoor_s3_region: "us-east-1"

Prerequisites:

  • Set ANSIBLE_VAULT_PASSWORD_FILE environment variable
  • Create .vault_pass file with vault password
  • Add .vault_pass to .gitignore
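These prerequisites are easy to verify before a run. The following is a minimal sketch, assuming it is executed from the directory that holds .gitignore and that the password file should have mode 600 as in the setup commands earlier. `check_vault_prereqs` is a hypothetical helper, and its .gitignore test only recognises an exact `.vault_pass` line.

```shell
# Verify the vault password file prerequisites before running playbooks.
check_vault_prereqs() {
  pass_file=${ANSIBLE_VAULT_PASSWORD_FILE:-.vault_pass}
  rc=0
  if [ ! -f "$pass_file" ]; then
    echo "missing: $pass_file" >&2
    rc=1
  elif [ -z "$(find "$pass_file" -prune -perm 600)" ]; then
    echo "permissions on $pass_file should be 600" >&2
    rc=1
  fi
  if ! grep -qxF '.vault_pass' .gitignore 2>/dev/null; then
    echo ".vault_pass is not listed in .gitignore" >&2
    rc=1
  fi
  return "$rc"
}
```

A typical use would be `check_vault_prereqs && ansible-playbook site.yml`, so a misconfigured password file stops the run early.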

Encrypting New Values:

# Encrypt a string and add to vault.yml
echo -n "secret_value" | ansible-vault encrypt_string --stdin-name 'vault_variable_name'

# Edit vault file directly
ansible-vault edit inventory/group_vars/all/vault.yml

OCI Vault (Production Environment)

Purpose: Use Oracle Cloud Infrastructure Vault for centralized secret management

Variable Pattern: Use Ansible lookups to fetch secrets at runtime

Example host_vars for OCI:

# In host_vars/production-server.yml

# Database passwords from OCI Vault
casdoor_db_password: "{{ lookup('community.oci.oci_secret', 'casdoor-db-password', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"

# S3 credentials from OCI Vault
casdoor_s3_access_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-access-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_secret_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-secret-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_bucket: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-bucket', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"

# Non-sensitive values remain as plain variables
casdoor_s3_endpoint: "https://objectstorage.us-phoenix-1.oraclecloud.com"
casdoor_s3_region: "us-phoenix-1"

OCI Vault Organization:

OCI Compartment: production
├── Vault: agathos-databases
│   ├── Secret: postgres-admin-password
│   └── Secret: casdoor-db-password
│
├── Vault: agathos-services  
│   ├── Secret: casdoor-s3-access-key
│   ├── Secret: casdoor-s3-secret-key
│   ├── Secret: casdoor-s3-bucket
│   └── Secret: openwebui-db-password
│
└── Vault: agathos-integrations
    ├── Secret: apikey-openai
    └── Secret: apikey-anthropic

Secret Naming Convention:

  • Ansible Vault: vault_service_secret (underscores)
  • OCI Vault: service-secret (hyphens)

Benefits of Two-Tier Pattern:

  1. Portability: Service playbooks remain unchanged across environments
  2. Flexibility: Switch secret backends by changing only host_vars
  3. Clarity: Variable names clearly indicate their purpose
  4. Security: Secrets never appear in playbooks or templates

S3 Bucket Provisioning with Ansible

Purpose: Provision Incus S3 buckets and manage credentials in Ansible Vault

Playbooks:

  • provision_s3.yml - Create bucket and store credentials
  • regenerate_s3_key.yml - Rotate credentials
  • remove_s3.yml - Delete bucket and clean vault

Usage:

# Provision new S3 bucket for a service
ansible-playbook provision_s3.yml -e bucket_name=casdoor -e service_name=casdoor

# Regenerate access credentials (invalidates old keys)
ansible-playbook regenerate_s3_key.yml -e bucket_name=casdoor -e service_name=casdoor

# Remove bucket and credentials
ansible-playbook remove_s3.yml -e bucket_name=casdoor -e service_name=casdoor

Requirements:

  • User must be member of incus group
  • ANSIBLE_VAULT_PASSWORD_FILE must be set
  • Incus CLI must be configured and accessible

What Gets Created:

  1. Incus storage bucket in project agathos, pool default
  2. Admin access key for the bucket
  3. Encrypted vault entries: vault_<service>_s3_access_key, vault_<service>_s3_secret_key, vault_<service>_s3_bucket

Behind the Scenes:

  • Role: incus_storage_bucket
  • Idempotent: Checks if bucket/key exists before creating
  • Atomic: Credentials captured and encrypted in single operation
  • Variables sourced from: inventory/group_vars/all/vars.yml

Troubleshooting

Template Not Found Errors

Symptom: Could not find or access 'service_name/template.j2'

Cause: When playbooks were moved from ansible root into service directories, template paths weren't updated.

Solution: Remove the service directory prefix from template paths:

# WRONG (old path from when playbook was at root)
src: service_name/config.j2

# CORRECT (playbook is now in service_name/ directory)
src: config.j2
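Stale prefixes can be located across all services in one pass. This is a heuristic sketch, assuming every service keeps its playbook at {service}/deploy.yml. The helper name `find_stale_template_paths` is hypothetical.

```shell
# Report 'src: <service>/...' entries left over from the root-level layout.
find_stale_template_paths() {
  root=${1:-.}
  for playbook in "$root"/*/deploy.yml; do
    [ -f "$playbook" ] || continue
    service=$(basename "$(dirname "$playbook")")
    # /dev/null forces grep to prefix each hit with the file name
    grep -nE "^[[:space:]]*src:[[:space:]]*[\"']?${service}/" /dev/null "$playbook"
  done
  return 0
}
```

Run from the ansible/ directory, it prints `file:line:text` for each src: entry that still carries its own service directory as a prefix.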

Host-Specific Template Path Issues

Symptom: Playbook fails to find host-specific templates

Cause: Host-specific directories are at the wrong level

Expected Structure:

service_name/
├── deploy.yml
├── config.j2              # Default
└── hostname/              # Host-specific (inside service dir)
    └── config.j2

Use {{playbook_dir}} for relative paths:

# This finds templates relative to the playbook location
src: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"

Last Updated: December 2025
Project: Agathos Infrastructure
Approval: Red Panda Approved™