# Ansible Project Structure - Best Practices

This document describes the clean, maintainable Ansible structure implemented in the Agathos project. Use this as a reference template for other Ansible projects.

## Overview

This structure emphasizes:

- **Simplicity**: Minimal files at root level
- **Organization**: Services contain all related files (playbooks + templates)
- **Separation**: Variables live in dedicated files, not inline in inventory
- **Discoverability**: Clear naming and logical grouping

## Directory Structure

```
ansible/
├── ansible.cfg                    # Ansible configuration
├── .vault_pass                    # Vault password file
│
├── site.yml                       # Master orchestration playbook
├── apt_update.yml                 # Utility: Update all hosts
├── sandbox_up.yml                 # Utility: Start infrastructure
├── sandbox_down.yml               # Utility: Stop infrastructure
│
├── inventory/                     # Inventory organization
│   ├── hosts                      # Simple host/group membership
│   │
│   ├── group_vars/                # Variables for groups
│   │   └── all/
│   │       ├── vars.yml           # Common variables
│   │       └── vault.yml          # Encrypted secrets
│   │
│   └── host_vars/                 # Variables per host
│       ├── hostname1.yml          # All vars for hostname1
│       ├── hostname2.yml          # All vars for hostname2
│       └── ...
│
└── service_name/                  # Per-service directories
    ├── deploy.yml                 # Main deployment playbook
    ├── stage.yml                  # Staging playbook (if needed)
    ├── template1.j2               # Jinja2 templates
    ├── template2.j2
    └── files/                     # Static files (if needed)
```

## Key Components

### 1. Simplified Inventory (`inventory/hosts`)

**Purpose**: Define ONLY host/group membership, no variables

**Example**:

```yaml
---
# Ansible Inventory - Simplified

# Main infrastructure group
ubuntu:
  hosts:
    server1.example.com:
    server2.example.com:
    server3.example.com:

# Service-specific groups
web_servers:
  hosts:
    server1.example.com:

database_servers:
  hosts:
    server2.example.com:
```

**Before**: 361 lines with variables inline

**After**: 34 lines of pure structure

### 2. Host Variables (`inventory/host_vars/`)

**Purpose**: All configuration specific to a single host

**File naming**: `{hostname}.yml` (matches inventory hostname exactly)

**Example** (`inventory/host_vars/server1.example.com.yml`):

```yaml
---
# Server1 Configuration - Web Server
# Services: nginx, php-fpm, redis

services:
  - nginx
  - php
  - redis

# Nginx Configuration
nginx_user: www-data
nginx_worker_processes: auto
nginx_port: 80
nginx_ssl_port: 443

# PHP-FPM Configuration
php_version: 8.2
php_max_children: 50

# Redis Configuration
redis_port: 6379
redis_password: "{{vault_redis_password}}"
```

### 3. Group Variables (`inventory/group_vars/`)

**Purpose**: Variables shared across multiple hosts

**Structure**:

```
group_vars/
├── all/                    # Variables for ALL hosts
│   ├── vars.yml            # Common non-sensitive config
│   └── vault.yml           # Encrypted secrets (ansible-vault)
│
└── web_servers/            # Variables for web_servers group
    └── vars.yml
```

**Example** (`inventory/group_vars/all/vars.yml`):

```yaml
---
# Common Variables for All Hosts

remote_user: ansible
deployment_environment: production
ansible_python_interpreter: /usr/bin/python3

# Release versions
app_release: v1.2.3
api_release: v2.0.1

# Monitoring endpoints
prometheus_url: http://monitoring.example.com:9090
loki_url: http://monitoring.example.com:3100
```

### 4. Service Directories

**Purpose**: Group all files related to a service deployment

**Pattern**: `{service_name}/`

**Contents**:

- `deploy.yml` - Main deployment playbook
- `stage.yml` - Staging/update playbook (optional)
- `*.j2` - Jinja2 templates
- `files/` - Static files (if needed)
- `tasks/` - Task files (if splitting large playbooks)

**Example Structure**:

```
nginx/
├── deploy.yml              # Deployment playbook
├── nginx.conf.j2           # Main config template
├── site.conf.j2            # Virtual host template
├── nginx.service.j2        # Systemd service file
└── files/
    └── ssl_params.conf     # Static SSL configuration
```

### 5. Master Playbook (`site.yml`)

**Purpose**: Orchestrate full-stack deployment

**Pattern**: Import service playbooks in dependency order

**Example**:

```yaml
---
- name: Update All Hosts
  import_playbook: apt_update.yml

- name: Deploy Docker
  import_playbook: docker/deploy.yml

- name: Deploy PostgreSQL
  import_playbook: postgresql/deploy.yml

- name: Deploy Application
  import_playbook: myapp/deploy.yml

- name: Deploy Monitoring
  import_playbook: prometheus/deploy.yml
```

### 6. Service Playbook Pattern

**Location**: `{service}/deploy.yml`

**Standard Structure**:

```yaml
---
- name: Deploy Service Name
  hosts: target_group
  tasks:

    # Service detection (if using services list)
    - name: Check if host has service_name service
      ansible.builtin.set_fact:
        has_service: "{{ 'service_name' in services | default([]) }}"

    - name: Skip hosts without service
      ansible.builtin.meta: end_host
      when: not has_service

    # Actual deployment tasks
    - name: Create service user
      become: true
      ansible.builtin.user:
        name: "{{service_user}}"
        group: "{{service_group}}"
        system: true

    - name: Template configuration
      become: true
      ansible.builtin.template:
        src: config.j2
        dest: "{{service_directory}}/config.yml"
      notify: restart service

  # Handlers
  handlers:
    - name: restart service
      become: true
      ansible.builtin.systemd:
        name: service_name
        state: restarted
        daemon_reload: true
```

**IMPORTANT: Template Path Convention**

- When playbooks are inside service directories, template `src:` paths are relative to that directory
- Use `src: config.j2` NOT `src: service_name/config.j2`
- The service directory prefix was correct when playbooks were at the ansible root, but is wrong now

**Host-Specific Templates**

Some services need different configuration per host.
Store these in subdirectories named by hostname:

```
service_name/
├── deploy.yml
├── config.j2               # Default template
├── hostname1/              # Host-specific overrides
│   └── config.j2
├── hostname2/
│   └── config.j2
└── hostname3/
    └── config.j2
```

Use conditional logic to select the correct template:

```yaml
- name: Check for host-specific configuration
  ansible.builtin.stat:
    path: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
  delegate_to: localhost
  register: host_specific_config
  become: false

- name: Template host-specific configuration
  become: true
  ansible.builtin.template:
    src: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
    dest: "{{service_directory}}/config"
  when: host_specific_config.stat.exists

- name: Template default configuration
  become: true
  ansible.builtin.template:
    src: config.j2
    dest: "{{service_directory}}/config"
  when: not host_specific_config.stat.exists
```

**Real Example: Alloy Service**

```
alloy/
├── deploy.yml
├── config.alloy.j2         # Default configuration
├── ariel/                  # Neo4j monitoring
│   └── config.alloy.j2
├── miranda/                # Docker monitoring
│   └── config.alloy.j2
├── oberon/                 # Web services monitoring
│   └── config.alloy.j2
└── puck/                   # Application monitoring
    └── config.alloy.j2
```

## Service Detection Pattern

**Purpose**: Allow hosts to selectively run service playbooks

**How it works**:

1. Each host defines a `services:` list in `host_vars/`
2. Each playbook checks if its service is in the list
3. Playbook skips host if service not needed

**Example**:

`inventory/host_vars/server1.yml`:

```yaml
services:
  - docker
  - nginx
  - redis
```

`nginx/deploy.yml`:

```yaml
- name: Deploy Nginx
  hosts: ubuntu
  tasks:
    - name: Check if host has nginx service
      ansible.builtin.set_fact:
        has_nginx: "{{ 'nginx' in services | default([]) }}"

    - name: Skip hosts without nginx
      ansible.builtin.meta: end_host
      when: not has_nginx

    # Rest of tasks only run if nginx in services list
```

## Ansible Vault Integration

**Setup**:

```bash
# Create vault password file (one-time)
echo "your_vault_password" > .vault_pass
chmod 600 .vault_pass

# Configure ansible.cfg (the setting must sit under the [defaults] section)
echo "vault_password_file = .vault_pass" >> ansible.cfg
```

**Usage**:

```bash
# Edit vault file
ansible-vault edit inventory/group_vars/all/vault.yml

# View vault file
ansible-vault view inventory/group_vars/all/vault.yml

# Encrypt new file
ansible-vault encrypt secrets.yml
```

**Variable naming convention**:

- Prefix vault variables with `vault_`
- Reference in regular vars: `db_password: "{{vault_db_password}}"`

## Running Playbooks

**Full deployment**:

```bash
ansible-playbook site.yml
```

**Single service**:

```bash
ansible-playbook nginx/deploy.yml
```

**Specific hosts**:

```bash
ansible-playbook nginx/deploy.yml --limit server1.example.com
```

**Check mode (dry-run)**:

```bash
ansible-playbook site.yml --check
```

**With extra verbosity**:

```bash
ansible-playbook nginx/deploy.yml -vv
```

## Benefits of This Structure

### 1. Cleaner Root Directory

- **Before**: 29+ playbook files cluttering root
- **After**: 3-4 utility playbooks + site.yml

### 2. Simplified Inventory

- **Before**: 361 lines with inline variables
- **After**: 34 lines of pure structure
- Variables organized logically by host/group

### 3. Service Cohesion

- Everything related to a service in one place
- Easy to find templates when editing playbooks
- Natural grouping for git operations

### 4. Scalability

- Easy to add new services (create directory, add playbook)
- Easy to add new hosts (create host_vars file)
- No risk of playbook name conflicts

### 5. Reusability

- Service directories can be copied to other projects
- host_vars pattern works for any inventory size
- Clear separation of concerns

### 6. Maintainability

- Changes isolated to service directories
- Inventory file rarely needs editing
- Clear audit trail in git (changes per service)

## Migration Checklist

Moving an existing Ansible project to this structure:

- [ ] Create service directories for each playbook
- [ ] Move `{service}_deploy.yml` → `{service}/deploy.yml`
- [ ] Move templates into service directories
- [ ] Extract host variables from inventory to `host_vars/`
- [ ] Extract group variables to `group_vars/all/vars.yml`
- [ ] Move secrets to `group_vars/all/vault.yml` (encrypted)
- [ ] Update `site.yml` import_playbook paths
- [ ] Backup original inventory: `cp hosts hosts.backup`
- [ ] Create simplified inventory with only group/host structure
- [ ] Test with `ansible-playbook site.yml --check`
- [ ] Verify with limited deployment: `--limit test_host`

## Example: Adding a New Service

**1. Create service directory**:

```bash
mkdir ansible/myapp
```

**2. Create deployment playbook** (`ansible/myapp/deploy.yml`):

```yaml
---
- name: Deploy MyApp
  hosts: ubuntu
  tasks:
    - name: Check if host has myapp service
      ansible.builtin.set_fact:
        has_myapp: "{{ 'myapp' in services | default([]) }}"

    - name: Skip hosts without myapp
      ansible.builtin.meta: end_host
      when: not has_myapp

    - name: Deploy myapp
      # ... deployment tasks
```

**3. Create template** (`ansible/myapp/config.yml.j2`):

```yaml
app_name: MyApp
port: {{myapp_port}}
database: {{myapp_db_host}}
```

**4. Add variables to host** (`inventory/host_vars/server1.yml`):

```yaml
services:
  - myapp  # Add to services list

# MyApp configuration
myapp_port: 8080
myapp_db_host: db.example.com
```

**5. Add to site.yml**:

```yaml
- name: Deploy MyApp
  import_playbook: myapp/deploy.yml
```

**6. Deploy**:

```bash
ansible-playbook myapp/deploy.yml
```

## Best Practices

### Naming Conventions

- Service directories: lowercase, underscores (e.g., `mcp_switchboard/`)
- Playbooks: `deploy.yml`, `stage.yml`, `remove.yml`
- Templates: descriptive name + `.j2` extension
- Variables: service prefix (e.g., `nginx_port`, `redis_password`)
- Vault variables: `vault_` prefix

### File Organization

- Keep playbooks under 100 lines (split into task files if larger)
- Group related templates in service directory
- Use comments to document non-obvious variables
- Add README.md to complex service directories

### Variable Organization

- Host-specific: `host_vars/{hostname}.yml`
- Service-specific across hosts: `group_vars/{service_group}/vars.yml`
- Global configuration: `group_vars/all/vars.yml`
- Secrets: `group_vars/all/vault.yml` (encrypted)

### Idempotency

- Use `creates:` parameter for one-time operations
- Use `state:` explicitly (present/absent/restarted)
- Check conditions before destructive operations
- Test with `--check` mode before applying

### Documentation

- Comment complex task logic
- Document required variables in playbook header
- Add README.md for service directories with many files
- Keep docs/ separate from ansible/ directory

## Related Documentation

- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/tips_tricks/ansible_tips_tricks.html)
- [Ansible Vault Guide](https://docs.ansible.com/ansible/latest/vault_guide/index.html)
- [Inventory Organization](https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html)

## Account Taxonomy

Standardized account roles used across Ansible and Terraform. This taxonomy eliminates confusion between Ansible reserved connection keywords (`remote_user` in `ansible.cfg`) and infrastructure-managed account variables in playbooks.


| Role | Variable | Example | Home | Sudo | Purpose |
|------|----------|---------|------|------|---------|
| user | *(login name)* | robert:1000 | /home/robert | varies | Human user account |
| service_user | `{service}_user` | arke:500 | /srv/arke | no | Service daemon account |
| keeper_user | `keeper_user` | ponos:519 | /srv/ponos | yes | Ansible/Terraform management (sudo) |
| watcher_user | `watcher_user` | poros:520 | - | no | Non-sudo observation account |
| principal_user | `principal_user` | robert:1000 | /home/robert | varies | AI agent collaborative account |

### Key Rules

- **`keeper_user`** replaces all uses of `{{ ansible_user }}` and `{{ remote_user }}` as Jinja2 variables in playbooks
- **`ansible.cfg`** retains `remote_user = ponos` as the SSH connection keyword (Ansible built-in); this is not a Jinja2 variable
- **`service_user`** accounts live in `/srv/{service}`; if currently in `/home`, they migrate on next re-provision
- **`watcher_user`** is provisioned by Ansible playbook when needed (not via cloud-init)
- **`principal_user`** is for AI agent hosts where the agent operates on behalf of a human user; define in `host_vars/{hostname}.yml`
- Do **not** use `vault_` prefix for any of these; that prefix is reserved for Ansible Vault variables

### Variable Definitions

The shared taxonomy variables are defined in `inventory/group_vars/all/vars.yml`:

```yaml
# Account Taxonomy
keeper_user: ponos
keeper_uid: 519
keeper_group: ponos
keeper_home: /srv/ponos
watcher_user: poros
watcher_uid: 520
```

`principal_user` is host-specific and defined in the relevant `host_vars` file:

```yaml
# inventory/host_vars/caliban.incus.yml
principal_user: robert
principal_uid: 1000
```

### Bootstrap Chain

1. **Terraform** provisions `ponos` (keeper_user) on all containers via `cloud-init`
   - UID 519, home `/srv/ponos`, sudoers, SSH authorized keys at `/srv/ponos/.ssh/authorized_keys`
2. **`ansible.cfg`** sets `remote_user = ponos` so all Ansible connections use the keeper account
3. **Playbooks** reference `{{ keeper_user }}` for any task that needs the management account name

### Playbook Pattern

```yaml
- name: Add keeper_user to service group
  become: true
  ansible.builtin.user:
    name: "{{ keeper_user }}"
    groups: "{{ service_group }}"
    append: true
```

**Never use** `{{ ansible_user }}` or `{{ remote_user }}` as Jinja2 template variables in tasks: these shadow Ansible built-in connection variables and cause unpredictable behaviour.

## Secret Management Patterns

### Ansible Vault (Sandbox Environment)

**Purpose**: Store sensitive values encrypted at rest in version control

**File Location**: `inventory/group_vars/all/vault.yml`

**Variable Naming Convention**: Prefix all vault variables with `vault_`

**Example vault.yml**: Note that the entire vault file is encrypted

```yaml
---
# Database passwords
vault_postgres_admin_password: # Avoid special characters & non-ASCII
vault_casdoor_db_password:

# S3 credentials
vault_casdoor_s3_access_key:
vault_casdoor_s3_secret_key:
vault_casdoor_s3_bucket:
```

**Host Variables Reference Vault**:

```yaml
# In host_vars/oberon.incus.yml
casdoor_db_password: "{{ vault_casdoor_db_password }}"
casdoor_s3_access_key: "{{ vault_casdoor_s3_access_key }}"
casdoor_s3_secret_key: "{{ vault_casdoor_s3_secret_key }}"
casdoor_s3_bucket: "{{ vault_casdoor_s3_bucket }}"

# Non-sensitive values stay as plain variables
casdoor_s3_endpoint: "https://ariel.incus:9000"
casdoor_s3_region: "us-east-1"
```

**Prerequisites**:

- Set `ANSIBLE_VAULT_PASSWORD_FILE` environment variable
- Create `.vault_pass` file with vault password
- Add `.vault_pass` to `.gitignore`

**Encrypting New Values**:

```bash
# Encrypt a string and add to vault.yml
echo -n "secret_value" | ansible-vault encrypt_string --stdin-name 'vault_variable_name'

# Edit vault file directly
ansible-vault edit inventory/group_vars/all/vault.yml
```

### OCI Vault (Production Environment)

**Purpose**: Use Oracle Cloud Infrastructure Vault for centralized secret management

**Variable Pattern**: Use Ansible lookups to fetch secrets at runtime

**Example host_vars for OCI**:

```yaml
# In host_vars/production-server.yml

# Database passwords from OCI Vault
casdoor_db_password: "{{ lookup('community.oci.oci_secret', 'casdoor-db-password', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"

# S3 credentials from OCI Vault
casdoor_s3_access_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-access-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_secret_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-secret-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_bucket: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-bucket', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"

# Non-sensitive values remain as plain variables
casdoor_s3_endpoint: "https://objectstorage.us-phoenix-1.oraclecloud.com"
casdoor_s3_region: "us-phoenix-1"
```

**OCI Vault Organization**:

```
OCI Compartment: production
├── Vault: agathos-databases
│   ├── Secret: postgres-admin-password
│   └── Secret: casdoor-db-password
│
├── Vault: agathos-services
│   ├── Secret: casdoor-s3-access-key
│   ├── Secret: casdoor-s3-secret-key
│   ├── Secret: casdoor-s3-bucket
│   └── Secret: openwebui-db-password
│
└── Vault: agathos-integrations
    ├── Secret: apikey-openai
    └── Secret: apikey-anthropic
```

**Secret Naming Convention**:

- Ansible Vault: `vault_service_secret` (underscores)
- OCI Vault: `service-secret` (hyphens)

**Benefits of Two-Tier Pattern**:

1. **Portability**: Service playbooks remain unchanged across environments
2. **Flexibility**: Switch secret backends by changing only host_vars
3. **Clarity**: Variable names clearly indicate their purpose
4. **Security**: Secrets never appear in playbooks or templates

### S3 Bucket Provisioning with Ansible

**Purpose**: Provision Incus S3 buckets and manage credentials in Ansible Vault

**Playbooks**:

- `provision_s3.yml` - Create bucket and store credentials
- `regenerate_s3_key.yml` - Rotate credentials
- `remove_s3.yml` - Delete bucket and clean vault

**Usage**:

```bash
# Provision new S3 bucket for a service
ansible-playbook provision_s3.yml -e bucket_name=casdoor -e service_name=casdoor

# Regenerate access credentials (invalidates old keys)
ansible-playbook regenerate_s3_key.yml -e bucket_name=casdoor -e service_name=casdoor

# Remove bucket and credentials
ansible-playbook remove_s3.yml -e bucket_name=casdoor -e service_name=casdoor
```

**Requirements**:

- User must be member of `incus` group
- `ANSIBLE_VAULT_PASSWORD_FILE` must be set
- Incus CLI must be configured and accessible

**What Gets Created**:

1. Incus storage bucket in project `agathos`, pool `default`
2. Admin access key for the bucket
3. Encrypted vault entries: `vault__s3_access_key`, `vault__s3_secret_key`, `vault__s3_bucket`

**Behind the Scenes**:

- Role: `incus_storage_bucket`
- Idempotent: Checks if bucket/key exists before creating
- Atomic: Credentials captured and encrypted in single operation
- Variables sourced from: `inventory/group_vars/all/vars.yml`

## Troubleshooting

### Template Not Found Errors

**Symptom**: `Could not find or access 'service_name/template.j2'`

**Cause**: When playbooks were moved from the ansible root into service directories, template paths weren't updated.

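
To see how many playbooks are affected, it can help to list every `src:` value that still starts with a plain directory name. The following is a minimal sketch and not part of the Agathos tooling: the function name `find_prefixed_src` is hypothetical, and the pattern is a heuristic that also flags legitimate relative paths such as `files/ssl_params.conf`, so the matches need a manual review.

```bash
# Hypothetical helper: list src: values that start with a plain directory name.
# Paths that begin with a Jinja2 expression such as {{playbook_dir}} are not matched.
find_prefixed_src() {
  local root="${1:-.}"
  # -r recurse, -n print line numbers, -E extended regex; only *.yml files are searched.
  # "|| true" keeps the exit status at zero when nothing matches.
  grep -rnE --include='*.yml' \
    '^[[:space:]]*(-[[:space:]]*)?src:[[:space:]]*"?[A-Za-z0-9_]+/' "$root" || true
}
```

Run it from the `ansible/` root as `find_prefixed_src .`; each output line has the form `file:line:matched text`.
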

**Solution**: Remove the service directory prefix from template paths:

```yaml
# WRONG (old path from when playbook was at root)
src: service_name/config.j2

# CORRECT (playbook is now in service_name/ directory)
src: config.j2
```

### Host-Specific Template Path Issues

**Symptom**: Playbook fails to find host-specific templates

**Cause**: Host-specific directories are at the wrong level

**Expected Structure**:

```
service_name/
├── deploy.yml
├── config.j2               # Default
└── hostname/               # Host-specific (inside service dir)
    └── config.j2
```

**Use `{{playbook_dir}}` for relative paths**:

```yaml
# This finds templates relative to the playbook location
src: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
```

---

**Last Updated**: December 2025

**Project**: Agathos Infrastructure

**Approval**: Red Panda Approved™