docs: rewrite README with structured overview and quick start guide

Replaces the minimal project description with a comprehensive README
including a component overview table, quick start instructions, common
Ansible operations, and links to detailed documentation. Aligns with
Red Panda Approval™ standards.
2026-03-03 12:49:06 +00:00
parent c7be03a743
commit b4d60f2f38
219 changed files with 34586 additions and 2 deletions

docs/ansible.md Normal file

@@ -0,0 +1,705 @@
# Ansible Project Structure - Best Practices
This document describes the clean, maintainable Ansible structure implemented in the Agathos project. Use this as a reference template for other Ansible projects.
## Overview
This structure emphasizes:
- **Simplicity**: Minimal files at root level
- **Organization**: Services contain all related files (playbooks + templates)
- **Separation**: Variables live in dedicated files, not inline in inventory
- **Discoverability**: Clear naming and logical grouping
## Directory Structure
```
ansible/
├── ansible.cfg              # Ansible configuration
├── .vault_pass              # Vault password file
├── site.yml                 # Master orchestration playbook
├── apt_update.yml           # Utility: Update all hosts
├── sandbox_up.yml           # Utility: Start infrastructure
├── sandbox_down.yml         # Utility: Stop infrastructure
├── inventory/               # Inventory organization
│   ├── hosts                # Simple host/group membership
│   │
│   ├── group_vars/          # Variables for groups
│   │   └── all/
│   │       ├── vars.yml     # Common variables
│   │       └── vault.yml    # Encrypted secrets
│   │
│   └── host_vars/           # Variables per host
│       ├── hostname1.yml    # All vars for hostname1
│       ├── hostname2.yml    # All vars for hostname2
│       └── ...
└── service_name/            # Per-service directories
    ├── deploy.yml           # Main deployment playbook
    ├── stage.yml            # Staging playbook (if needed)
    ├── template1.j2         # Jinja2 templates
    ├── template2.j2
    └── files/               # Static files (if needed)
```
## Key Components
### 1. Simplified Inventory (`inventory/hosts`)
**Purpose**: Define ONLY host/group membership, no variables
**Example**:
```yaml
---
# Ansible Inventory - Simplified
# Main infrastructure group
ubuntu:
  hosts:
    server1.example.com:
    server2.example.com:
    server3.example.com:
# Service-specific groups
web_servers:
  hosts:
    server1.example.com:
database_servers:
  hosts:
    server2.example.com:
```
**Before**: 361 lines with variables inline
**After**: 34 lines of pure structure
### 2. Host Variables (`inventory/host_vars/`)
**Purpose**: All configuration specific to a single host
**File naming**: `{hostname}.yml` (matches inventory hostname exactly)
**Example** (`inventory/host_vars/server1.example.com.yml`):
```yaml
---
# Server1 Configuration - Web Server
# Services: nginx, php-fpm, redis
services:
- nginx
- php
- redis
# Nginx Configuration
nginx_user: www-data
nginx_worker_processes: auto
nginx_port: 80
nginx_ssl_port: 443
# PHP-FPM Configuration
php_version: "8.2"  # Quoted so YAML keeps it a string (an unquoted 8.10 would load as 8.1)
php_max_children: 50
# Redis Configuration
redis_port: 6379
redis_password: "{{vault_redis_password}}"
```
### 3. Group Variables (`inventory/group_vars/`)
**Purpose**: Variables shared across multiple hosts
**Structure**:
```
group_vars/
├── all/                 # Variables for ALL hosts
│   ├── vars.yml         # Common non-sensitive config
│   └── vault.yml        # Encrypted secrets (ansible-vault)
└── web_servers/         # Variables for web_servers group
    └── vars.yml
```
**Example** (`inventory/group_vars/all/vars.yml`):
```yaml
---
# Common Variables for All Hosts
remote_user: ansible
deployment_environment: production
ansible_python_interpreter: /usr/bin/python3
# Release versions
app_release: v1.2.3
api_release: v2.0.1
# Monitoring endpoints
prometheus_url: http://monitoring.example.com:9090
loki_url: http://monitoring.example.com:3100
```
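The `web_servers/` directory in the structure above can hold values that apply only to that group. The following is a minimal sketch, not a file from the project: the values are hypothetical and simply reuse variable names from the host example earlier in this document.

```yaml
---
# inventory/group_vars/web_servers/vars.yml (hypothetical contents)
# Shared by every host in the web_servers group
nginx_user: www-data
nginx_worker_processes: auto
```

Ansible gives `host_vars` precedence over `group_vars`, so a single host can still override one of these values in its own file.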
### 4. Service Directories
**Purpose**: Group all files related to a service deployment
**Pattern**: `{service_name}/`
**Contents**:
- `deploy.yml` - Main deployment playbook
- `stage.yml` - Staging/update playbook (optional)
- `*.j2` - Jinja2 templates
- `files/` - Static files (if needed)
- `tasks/` - Task files (if splitting large playbooks)
**Example Structure**:
```
nginx/
├── deploy.yml             # Deployment playbook
├── nginx.conf.j2          # Main config template
├── site.conf.j2           # Virtual host template
├── nginx.service.j2       # Systemd service file
└── files/
    └── ssl_params.conf    # Static SSL configuration
```
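When a playbook grows large enough to need the optional `tasks/` directory, it can pull the task files in with `ansible.builtin.import_tasks`. This is a sketch under assumptions: the file names `tasks/install.yml` and `tasks/configure.yml` are hypothetical and do not appear in the project.

```yaml
---
# nginx/deploy.yml (sketch; both task file names are hypothetical)
- name: Deploy Nginx
  hosts: web_servers
  tasks:
    - name: Install packages
      ansible.builtin.import_tasks: tasks/install.yml

    - name: Render configuration
      ansible.builtin.import_tasks: tasks/configure.yml
```

The task file paths are relative to the playbook, so they follow the same convention as template `src:` paths inside a service directory.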
### 5. Master Playbook (`site.yml`)
**Purpose**: Orchestrate full-stack deployment
**Pattern**: Import service playbooks in dependency order
**Example**:
```yaml
---
- name: Update All Hosts
  import_playbook: apt_update.yml
- name: Deploy Docker
  import_playbook: docker/deploy.yml
- name: Deploy PostgreSQL
  import_playbook: postgresql/deploy.yml
- name: Deploy Application
  import_playbook: myapp/deploy.yml
- name: Deploy Monitoring
  import_playbook: prometheus/deploy.yml
```
### 6. Service Playbook Pattern
**Location**: `{service}/deploy.yml`
**Standard Structure**:
```yaml
---
- name: Deploy Service Name
  hosts: target_group
  tasks:
    # Service detection (if using services list)
    - name: Check if host has service_name service
      ansible.builtin.set_fact:
        has_service: "{{ 'service_name' in services | default([]) }}"
    - name: Skip hosts without service
      ansible.builtin.meta: end_host
      when: not has_service
    # Actual deployment tasks
    - name: Create service user
      become: true
      ansible.builtin.user:
        name: "{{service_user}}"
        group: "{{service_group}}"
        system: true
    - name: Template configuration
      become: true
      ansible.builtin.template:
        src: config.j2
        dest: "{{service_directory}}/config.yml"
      notify: restart service
  # Handlers
  handlers:
    - name: restart service
      become: true
      ansible.builtin.systemd:
        name: service_name
        state: restarted
        daemon_reload: true
```
**IMPORTANT: Template Path Convention**
- When playbooks are inside service directories, template `src:` paths are relative to that directory
- Use `src: config.j2` NOT `src: service_name/config.j2`
- The service directory prefix was correct when playbooks were at the ansible root, but is wrong now
**Host-Specific Templates**
Some services need different configuration per host. Store these in subdirectories named by hostname:
```
service_name/
├── deploy.yml
├── config.j2          # Default template
├── hostname1/         # Host-specific overrides
│   └── config.j2
├── hostname2/
│   └── config.j2
└── hostname3/
    └── config.j2
```
Use conditional logic to select the correct template:
```yaml
- name: Check for host-specific configuration
  ansible.builtin.stat:
    path: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
  delegate_to: localhost
  register: host_specific_config
  become: false
- name: Template host-specific configuration
  become: true
  ansible.builtin.template:
    src: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
    dest: "{{service_directory}}/config"
  when: host_specific_config.stat.exists
- name: Template default configuration
  become: true
  ansible.builtin.template:
    src: config.j2
    dest: "{{service_directory}}/config"
  when: not host_specific_config.stat.exists
```
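The same host-specific-or-default selection can also be written as a single task with the `ansible.builtin.first_found` lookup. This is an alternative sketch, not the pattern the project uses; the `candidates` variable name is introduced here for illustration only.

```yaml
- name: Template configuration (host-specific if present, otherwise default)
  become: true
  ansible.builtin.template:
    # first_found returns the first path in the list that exists on the controller
    src: "{{ lookup('ansible.builtin.first_found', candidates) }}"
    dest: "{{ service_directory }}/config"
  vars:
    candidates:  # hypothetical helper variable, checked in order
      - "{{ playbook_dir }}/{{ inventory_hostname_short }}/config.j2"
      - "{{ playbook_dir }}/config.j2"
```

This removes the separate `stat` task, at the cost of a lookup expression that is less explicit than two conditional tasks.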
**Real Example: Alloy Service**
```
alloy/
├── deploy.yml
├── config.alloy.j2        # Default configuration
├── ariel/                 # Neo4j monitoring
│   └── config.alloy.j2
├── miranda/               # Docker monitoring
│   └── config.alloy.j2
├── oberon/                # Web services monitoring
│   └── config.alloy.j2
└── puck/                  # Application monitoring
    └── config.alloy.j2
```
## Service Detection Pattern
**Purpose**: Allow hosts to selectively run service playbooks
**How it works**:
1. Each host defines a `services:` list in `host_vars/`
2. Each playbook checks if its service is in the list
3. Playbook skips host if service not needed
**Example**:
`inventory/host_vars/server1.example.com.yml` (the file name must match the inventory hostname exactly):
```yaml
services:
- docker
- nginx
- redis
```
`nginx/deploy.yml`:
```yaml
- name: Deploy Nginx
  hosts: ubuntu
  tasks:
    - name: Check if host has nginx service
      ansible.builtin.set_fact:
        has_nginx: "{{ 'nginx' in services | default([]) }}"
    - name: Skip hosts without nginx
      ansible.builtin.meta: end_host
      when: not has_nginx
    # Rest of tasks only run if nginx in services list
```
## Ansible Vault Integration
**Setup**:
```bash
# Create vault password file (one-time)
echo "your_vault_password" > .vault_pass
chmod 600 .vault_pass
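# Configure ansible.cfg. The setting must sit under the [defaults] section,
# so appending like this only works if [defaults] is the last section in the file.
echo "vault_password_file = .vault_pass" >> ansible.cfg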
```
**Usage**:
```bash
# Edit vault file
ansible-vault edit inventory/group_vars/all/vault.yml
# View vault file
ansible-vault view inventory/group_vars/all/vault.yml
# Encrypt new file
ansible-vault encrypt secrets.yml
```
**Variable naming convention**:
- Prefix vault variables with `vault_`
- Reference in regular vars: `db_password: "{{vault_db_password}}"`
## Running Playbooks
**Full deployment**:
```bash
ansible-playbook site.yml
```
**Single service**:
```bash
ansible-playbook nginx/deploy.yml
```
**Specific hosts**:
```bash
ansible-playbook nginx/deploy.yml --limit server1.example.com
```
**Check mode (dry-run)**:
```bash
ansible-playbook site.yml --check
```
**With extra verbosity**:
```bash
ansible-playbook nginx/deploy.yml -vv
```
## Benefits of This Structure
### 1. Cleaner Root Directory
- **Before**: 29+ playbook files cluttering root
- **After**: 3-4 utility playbooks + site.yml
### 2. Simplified Inventory
- **Before**: 361 lines with inline variables
- **After**: 34 lines of pure structure
- Variables organized logically by host/group
### 3. Service Cohesion
- Everything related to a service in one place
- Easy to find templates when editing playbooks
- Natural grouping for git operations
### 4. Scalability
- Easy to add new services (create directory, add playbook)
- Easy to add new hosts (create host_vars file)
- No risk of playbook name conflicts
### 5. Reusability
- Service directories can be copied to other projects
- host_vars pattern works for any inventory size
- Clear separation of concerns
### 6. Maintainability
- Changes isolated to service directories
- Inventory file rarely needs editing
- Clear audit trail in git (changes per service)
## Migration Checklist
Moving an existing Ansible project to this structure:
- [ ] Create service directories for each playbook
- [ ] Move `{service}_deploy.yml` → `{service}/deploy.yml`
- [ ] Move templates into service directories
- [ ] Extract host variables from inventory to `host_vars/`
- [ ] Extract group variables to `group_vars/all/vars.yml`
- [ ] Move secrets to `group_vars/all/vault.yml` (encrypted)
- [ ] Update `site.yml` import_playbook paths
- [ ] Backup original inventory: `cp hosts hosts.backup`
- [ ] Create simplified inventory with only group/host structure
- [ ] Test with `ansible-playbook site.yml --check`
- [ ] Verify with limited deployment: `--limit test_host`
## Example: Adding a New Service
**1. Create service directory**:
```bash
mkdir ansible/myapp
```
**2. Create deployment playbook** (`ansible/myapp/deploy.yml`):
```yaml
---
- name: Deploy MyApp
  hosts: ubuntu
  tasks:
    - name: Check if host has myapp service
      ansible.builtin.set_fact:
        has_myapp: "{{ 'myapp' in services | default([]) }}"
    - name: Skip hosts without myapp
      ansible.builtin.meta: end_host
      when: not has_myapp
    # ... deployment tasks go here
```
**3. Create template** (`ansible/myapp/config.yml.j2`):
```yaml
app_name: MyApp
port: {{myapp_port}}
database: {{myapp_db_host}}
```
**4. Add variables to host** (`inventory/host_vars/server1.example.com.yml`):
```yaml
services:
- myapp # Add to services list
# MyApp configuration
myapp_port: 8080
myapp_db_host: db.example.com
```
**5. Add to site.yml**:
```yaml
- name: Deploy MyApp
  import_playbook: myapp/deploy.yml
```
**6. Deploy**:
```bash
ansible-playbook myapp/deploy.yml
```
## Best Practices
### Naming Conventions
- Service directories: lowercase, underscores (e.g., `mcp_switchboard/`)
- Playbooks: `deploy.yml`, `stage.yml`, `remove.yml`
- Templates: descriptive name + `.j2` extension
- Variables: service prefix (e.g., `nginx_port`, `redis_password`)
- Vault variables: `vault_` prefix
### File Organization
- Keep playbooks under 100 lines (split into task files if larger)
- Group related templates in service directory
- Use comments to document non-obvious variables
- Add README.md to complex service directories
### Variable Organization
- Host-specific: `host_vars/{hostname}.yml`
- Service-specific across hosts: `group_vars/{service_group}/vars.yml`
- Global configuration: `group_vars/all/vars.yml`
- Secrets: `group_vars/all/vault.yml` (encrypted)
### Idempotency
- Use the `creates:` parameter on `command`/`shell` tasks for one-time operations
- Use `state:` explicitly (present/absent/restarted)
- Check conditions before destructive operations
- Test with `--check` mode before applying
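
The first two points can be sketched as follows. The paths and the `init-db` command are hypothetical placeholders; only the `creates:` and `state:` parameters are the point of the example.

```yaml
# One-time operation: skipped on later runs once the marker file exists.
# Assumes the (hypothetical) init-db command writes the marker file itself.
- name: Initialize application database
  become: true
  ansible.builtin.command:
    cmd: /opt/myapp/bin/init-db
    creates: /opt/myapp/.db_initialized

# Explicit state: the task describes the desired end state, not an action.
- name: Ensure nginx is installed
  become: true
  ansible.builtin.apt:
    name: nginx
    state: present
```

If the command does not create the file named in `creates:`, the task runs again on every play, so the marker path has to be something the command really produces.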
### Documentation
- Comment complex task logic
- Document required variables in playbook header
- Add README.md for service directories with many files
- Keep docs/ separate from ansible/ directory
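
One way to document required variables in the playbook header, and to fail early when they are missing, is sketched below. It reuses the `myapp_port` and `myapp_db_host` names from the MyApp example above; the `assert` task is a suggestion rather than part of the project's playbooks.

```yaml
---
# myapp/deploy.yml
# Required variables (define in host_vars):
#   myapp_port    - TCP port the application listens on
#   myapp_db_host - Hostname of the database server
- name: Deploy MyApp
  hosts: ubuntu
  tasks:
    - name: Verify required variables are defined
      ansible.builtin.assert:
        that:
          - myapp_port is defined
          - myapp_db_host is defined
        fail_msg: "Set myapp_port and myapp_db_host in host_vars before deploying"
```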
## Related Documentation
- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/tips_tricks/ansible_tips_tricks.html)
- [Ansible Vault Guide](https://docs.ansible.com/ansible/latest/vault_guide/index.html)
- [Inventory Organization](https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html)
## Secret Management Patterns
### Ansible Vault (Sandbox Environment)
**Purpose**: Store sensitive values encrypted at rest in version control
**File Location**: `inventory/group_vars/all/vault.yml`
**Variable Naming Convention**: Prefix all vault variables with `vault_`
**Example vault.yml**:
Note: the entire vault file is encrypted; the values are omitted here.
```yaml
---
# Database passwords
vault_postgres_admin_password: # Avoid special characters & non-ASCII
vault_casdoor_db_password:
# S3 credentials
vault_casdoor_s3_access_key:
vault_casdoor_s3_secret_key:
vault_casdoor_s3_bucket:
```
**Host Variables Reference Vault**:
```yaml
# In host_vars/oberon.incus.yml
casdoor_db_password: "{{ vault_casdoor_db_password }}"
casdoor_s3_access_key: "{{ vault_casdoor_s3_access_key }}"
casdoor_s3_secret_key: "{{ vault_casdoor_s3_secret_key }}"
casdoor_s3_bucket: "{{ vault_casdoor_s3_bucket }}"
# Non-sensitive values stay as plain variables
casdoor_s3_endpoint: "https://ariel.incus:9000"
casdoor_s3_region: "us-east-1"
```
**Prerequisites**:
- Set `ANSIBLE_VAULT_PASSWORD_FILE` environment variable
- Create `.vault_pass` file with vault password
- Add `.vault_pass` to `.gitignore`
**Encrypting New Values**:
```bash
# Encrypt a string and add to vault.yml
echo -n "secret_value" | ansible-vault encrypt_string --stdin-name 'vault_variable_name'
# Edit vault file directly
ansible-vault edit inventory/group_vars/all/vault.yml
```
### OCI Vault (Production Environment)
**Purpose**: Use Oracle Cloud Infrastructure Vault for centralized secret management
**Variable Pattern**: Use Ansible lookups to fetch secrets at runtime
**Example host_vars for OCI**:
```yaml
# In host_vars/production-server.yml
# Database passwords from OCI Vault
casdoor_db_password: "{{ lookup('community.oci.oci_secret', 'casdoor-db-password', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
# S3 credentials from OCI Vault
casdoor_s3_access_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-access-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_secret_key: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-secret-key', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
casdoor_s3_bucket: "{{ lookup('community.oci.oci_secret', 'casdoor-s3-bucket', compartment_id=oci_compartment_id, vault_id=oci_services_vault_id) }}"
# Non-sensitive values remain as plain variables
casdoor_s3_endpoint: "https://objectstorage.us-phoenix-1.oraclecloud.com"
casdoor_s3_region: "us-phoenix-1"
```
**OCI Vault Organization**:
```
OCI Compartment: production
├── Vault: agathos-databases
│   ├── Secret: postgres-admin-password
│   └── Secret: casdoor-db-password
├── Vault: agathos-services
│   ├── Secret: casdoor-s3-access-key
│   ├── Secret: casdoor-s3-secret-key
│   ├── Secret: casdoor-s3-bucket
│   └── Secret: openwebui-db-password
└── Vault: agathos-integrations
    ├── Secret: apikey-openai
    └── Secret: apikey-anthropic
```
**Secret Naming Convention**:
- Ansible Vault: `vault_service_secret` (underscores)
- OCI Vault: `service-secret` (hyphens)
**Benefits of Two-Tier Pattern**:
1. **Portability**: Service playbooks remain unchanged across environments
2. **Flexibility**: Switch secret backends by changing only host_vars
3. **Clarity**: Variable names clearly indicate their purpose
4. **Security**: Secrets never appear in playbooks or templates
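
To illustrate the portability point, a service task only ever refers to the plain variable name, so it reads the same whether `casdoor_db_password` resolves through Ansible Vault or an OCI lookup. This is a sketch under assumptions: the template name `app.conf.j2` and the `casdoor_directory` variable are hypothetical.

```yaml
- name: Template Casdoor configuration
  become: true
  ansible.builtin.template:
    src: app.conf.j2                          # hypothetical template; it references {{ casdoor_db_password }}
    dest: "{{ casdoor_directory }}/app.conf"  # casdoor_directory is a hypothetical variable
    mode: "0640"
  no_log: true  # keep the rendered secret out of task output
```

Switching environments then means changing only where `casdoor_db_password` is defined in `host_vars`.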
### S3 Bucket Provisioning with Ansible
**Purpose**: Provision Incus S3 buckets and manage credentials in Ansible Vault
**Playbooks**:
- `provision_s3.yml` - Create bucket and store credentials
- `regenerate_s3_key.yml` - Rotate credentials
- `remove_s3.yml` - Delete bucket and clean vault
**Usage**:
```bash
# Provision new S3 bucket for a service
ansible-playbook provision_s3.yml -e bucket_name=casdoor -e service_name=casdoor
# Regenerate access credentials (invalidates old keys)
ansible-playbook regenerate_s3_key.yml -e bucket_name=casdoor -e service_name=casdoor
# Remove bucket and credentials
ansible-playbook remove_s3.yml -e bucket_name=casdoor -e service_name=casdoor
```
**Requirements**:
- User must be member of `incus` group
- `ANSIBLE_VAULT_PASSWORD_FILE` must be set
- Incus CLI must be configured and accessible
**What Gets Created**:
1. Incus storage bucket in project `agathos`, pool `default`
2. Admin access key for the bucket
3. Encrypted vault entries: `vault_<service>_s3_access_key`, `vault_<service>_s3_secret_key`, `vault_<service>_s3_bucket`
**Behind the Scenes**:
- Role: `incus_storage_bucket`
- Idempotent: Checks if bucket/key exists before creating
- Atomic: Credentials captured and encrypted in single operation
- Variables sourced from: `inventory/group_vars/all/vars.yml`
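
The check-before-create behavior could look roughly like the sketch below. This is not the actual `incus_storage_bucket` role, whose contents are not shown in this document; it assumes the Incus CLI's `storage bucket show` and `storage bucket create` subcommands, and takes the pool `default`, project `agathos`, and `bucket_name` variable from the usage described above.

```yaml
- name: Check whether the bucket already exists
  ansible.builtin.command:
    cmd: incus storage bucket show default {{ bucket_name }} --project agathos
  register: bucket_check
  changed_when: false   # read-only query
  failed_when: false    # a non-zero exit code here just means "not found"

- name: Create the bucket only when it is missing
  ansible.builtin.command:
    cmd: incus storage bucket create default {{ bucket_name }} --project agathos
  when: bucket_check.rc != 0
```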
## Troubleshooting
### Template Not Found Errors
**Symptom**: `Could not find or access 'service_name/template.j2'`
**Cause**: When playbooks were moved from ansible root into service directories, template paths weren't updated.
**Solution**: Remove the service directory prefix from template paths:
```yaml
# WRONG (old path from when playbook was at root)
src: service_name/config.j2
# CORRECT (playbook is now in service_name/ directory)
src: config.j2
```
### Host-Specific Template Path Issues
**Symptom**: Playbook fails to find host-specific templates
**Cause**: Host-specific directories are at the wrong level
**Expected Structure**:
```
service_name/
├── deploy.yml
├── config.j2        # Default
└── hostname/        # Host-specific (inside service dir)
    └── config.j2
```
**Use `{{playbook_dir}}` for relative paths**:
```yaml
# This finds templates relative to the playbook location
src: "{{playbook_dir}}/{{inventory_hostname_short}}/config.j2"
```
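When a host-specific template is still not picked up, a temporary debug task can print the exact path being checked. This is a troubleshooting sketch, not part of the project's playbooks.

```yaml
- name: Show the host-specific template path being checked
  ansible.builtin.debug:
    msg: "{{ playbook_dir }}/{{ inventory_hostname_short }}/config.j2"
```

Comparing the printed path with the directory layout usually shows whether the directory name differs from `inventory_hostname_short` or sits at the wrong level.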
---
**Last Updated**: December 2025
**Project**: Agathos Infrastructure
**Approval**: Red Panda Approved™