# Terraform Practices & Patterns

This document describes the Terraform design philosophy, patterns, and practices used across our infrastructure. The audience includes LLMs assisting with development, new team members, and existing team members seeking a reference.

## Design Philosophy

### Incus-First Infrastructure

Incus containers form the foundational layer of all environments. Management and monitoring infrastructure (Prospero, Titania) must exist before application hosts. This is a **critical dependency** that must be explicitly codified.

**Why?** Terraform isn't magic. It orders resources only by the references between them and creates unrelated resources in parallel, so implicit ordering can lead to race conditions or failed deployments. Always use explicit `depends_on` for critical infrastructure chains.

```hcl
# Example: Application host depends on monitoring infrastructure
resource "incus_instance" "app_host" {
  # ...
  depends_on = [incus_instance.uranian_hosts["prospero"]]
}
```

### Explicit Dependencies

Never rely solely on implicit resource ordering for critical infrastructure. Codify dependencies explicitly to:

- ✔ Prevent race conditions during parallel applies
- ✔ Document architectural relationships in code
- ✔ Ensure consistent deployment ordering across environments

## Repository Strategy

### Ouranos (Sandbox)

Ouranos is the **Sandbox repository**: isolated, safe for external demos, and backed by local state.

| Aspect | Decision |
|--------|----------|
| Purpose | Evaluation, demos, pattern experimentation, new software testing |
| State | Local (no remote backend) |
| Secrets | No production credentials or references |
| Security | Safe to use on external infrastructure for demos |

### Production Repository (Separate)

A separate repository manages Dev, UAT, and Prod environments:

```
terraform/
├── modules/incus_host/    # Reusable container module
├── environments/
│   ├── dev/               # Local Incus only
│   └── prod/              # OCI + Incus (parameterized via tfvars)
```

| Aspect | Decision |
|--------|----------|
| State | PostgreSQL backend on `eris.helu.ca:6432` with SSL |
| Schemas | Separate per environment: `dev`, `uat`, `prod` |
| UAT/Prod | Parameterized twins via `-var-file` |

## Module Design

### When to Extract a Module

A pattern is a good module candidate when it meets these criteria:

| Criterion | Description |
|-----------|-------------|
| **Reuse** | Pattern used across multiple environments (Sandbox, Dev, UAT, Prod) |
| **Stable Interface** | Inputs/outputs won't change frequently |
| **Testable** | Can validate module independently before promotion |
| **Encapsulates Complexity** | Hides `dynamic` blocks, `for_each`, cloud-init generation |

### When NOT to Extract

- Single-use patterns
- Tightly coupled to a specific environment
- Adds indirection without measurable benefit

### The `incus_host` Module

The standard container provisioning pattern extracted from Ouranos:

**Inputs:**

- `hosts`: Map of host definitions (name, role, image, devices, config)
- `project`: Incus project name
- `profile`: Incus profile name
- `cloud_init_template`: Cloud-init configuration template
- `ssh_key_path`: Path to SSH authorized keys
- `depends_on_resources`: Explicit dependencies for infrastructure ordering

**Outputs:**

- `host_details`: Name, IPv4, role, description for each host
- `inventory`: Documentation reference for DHCP/DNS provisioning

## Environment Strategy

### Environment Purposes

| Environment | Purpose | Infrastructure |
|-------------|---------|----------------|
| **Sandbox** | Evaluation, demos, pattern experimentation | Local Incus only |
| **Dev** | Integration testing, container builds, security testing | Local Incus only |
| **UAT** | User acceptance testing, bug resolution | OCI + Incus (hybrid) |
| **Prod** | Production workloads | OCI + Incus (hybrid) |

### Parameterized Twins (UAT/Prod)

UAT and Prod are architecturally identical. Use a single environment directory with variable files:

```bash
# UAT deployment
terraform apply -var-file=uat.tfvars

# Prod deployment
terraform apply -var-file=prod.tfvars
```

Key differences in tfvars:

- Hostnames and DNS domains
- Resource sizing (CPU, memory limits)
- OCI compartment IDs
- Credential references

## State Management

### Sandbox (Ouranos)

Local state is acceptable because:

- Environment is ephemeral
- Single-user workflow
- No production secrets to protect
- Safe for external demos

### Production Environments

PostgreSQL backend on `eris.helu.ca`:

```hcl
terraform {
  backend "pg" {
    conn_str    = "postgres://eris.helu.ca:6432/terraform_state?sslmode=verify-full"
    schema_name = "dev" # or "uat", "prod"
  }
}
```

**Connection requirements:**

- Port 6432 (pgBouncer)
- SSL with `sslmode=verify-full`
- Credentials via environment variables (`PGUSER`, `PGPASSWORD`)
- Separate schema per environment for isolation

## Integration Points

### Terraform → DHCP/DNS

The `ouranos_inventory` output provides host information for DHCP/DNS provisioning:

1. Terraform creates containers with cloud-init
2. `ouranos_inventory` output includes hostnames and IPs
3. MAC addresses are registered in the DHCP server
4. DHCP server creates DNS entries (`hostname.incus` domain)
5. Ansible uses DNS names for host connectivity

### Terraform → Ansible

Ansible does **not** consume Terraform outputs directly. Instead:

1. Terraform provisions containers
2. Incus DNS resolution provides the `hostname.incus` domain
3. Ansible inventory uses static DNS names
4. `sandbox_up.yml` configures DNS resolution on the hypervisor

```yaml
# Ansible inventory uses DNS names, not Terraform outputs
ubuntu:
  hosts:
    oberon.incus:
    ariel.incus:
    prospero.incus:
```

### Terraform → Bash Scripts

The `ssh_key_update.sh` script demonstrates proper integration:

```bash
# Read "<hostname> <ipv4>" pairs from the inventory output and record host keys
terraform output -json ouranos_inventory | jq -r \
  '.uranian_hosts.hosts | to_entries[] | "\(.key) \(.value.ipv4)"' | \
  while read -r hostname ip; do
    ssh-keyscan -H "$ip" >> ~/.ssh/known_hosts
    ssh-keyscan -H "$hostname.incus" >> ~/.ssh/known_hosts
  done
```

## Promotion Workflow

All infrastructure changes flow through this pipeline:

```
Ouranos (Sandbox)
    ↓ Validate pattern works
    ↓ Extract to module if reusable
Dev
    ↓ Integration testing
    ↓ Container builds
    ↓ Security testing
UAT
    ↓ User acceptance testing
    ↓ Bug fixes return to Dev
    ↓ Delete environment, test restore
Prod
    ↓ Deploy from tested artifacts
```

**Critical:** Nothing starts in Prod. Every change originates in Ouranos, is validated through the pipeline, and only then deployed to production.
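
As an illustration of the "extract to module if reusable" step, the sketch below shows how a Dev environment might call the `incus_host` module once a pattern has been validated in Ouranos. The input names come from the module interface described earlier; the file name, every value, the input types, and the `local.dev_hosts` map are assumptions made for the example, not the repository's actual configuration.

```hcl
# environments/dev/incus_hosts.tf (hypothetical file name)

locals {
  # Hypothetical host map; the real definitions live in the repository.
  dev_hosts = {
    prospero = {
      role    = "monitoring"                # illustrative role
      image   = "images:ubuntu/24.04/cloud" # illustrative image alias
      devices = []
      config  = {}
    }
  }
}

module "incus_hosts" {
  # Relative path follows the repository layout shown above.
  source = "../../modules/incus_host"

  hosts   = local.dev_hosts
  project = "dev"     # assumed Incus project name
  profile = "default" # assumed Incus profile name

  # Assumed to be passed as paths; the module may expect different types.
  cloud_init_template = "${path.module}/cloud-init.yaml.tftpl"
  ssh_key_path        = "~/.ssh/authorized_keys"
}
```

Because the module is referenced by relative path, Dev, UAT, and Prod all consume the same code that was proven in the sandbox, and only the input values differ.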

### Promotion Includes

When promoting Terraform changes, always update the corresponding:

- Ansible playbooks and templates
- Service documentation in `/docs/services/`
- Host variables if new services are added

## Output Conventions

### `ouranos_inventory`

The primary output for documentation and DNS integration:

```hcl
output "ouranos_inventory" {
  description = "Host inventory for documentation and DHCP/DNS provisioning"
  value = {
    uranian_hosts = {
      hosts = {
        for name, instance in incus_instance.uranian_hosts : name => {
          name             = instance.name
          ipv4             = instance.ipv4_address
          role             = local.uranian_hosts[name].role
          description      = local.uranian_hosts[name].description
          security_nesting = lookup(local.uranian_hosts[name].config, "security.nesting", false)
        }
      }
    }
  }
}
```

**Purpose:**

- Update [sandbox.html](sandbox.html) documentation
- Reference for DHCP server MAC/IP registration
- DNS entry creation via DHCP

## Layered Configuration

### Single Config with Conditional Resources

Avoid multiple separate Terraform configurations. Use one config with conditional resources:

```
environments/prod/
├── main.tf              # Incus project, profile, images (always)
├── incus_hosts.tf       # Module call for Incus containers (always)
├── oci_resources.tf     # OCI compute (conditional)
├── variables.tf
├── dev.tfvars           # Dev: enable_oci = false
├── uat.tfvars           # UAT: enable_oci = true
└── prod.tfvars          # Prod: enable_oci = true
```

```hcl
variable "enable_oci" {
  description = "Enable OCI resources (false for Dev, true for UAT/Prod)"
  type        = bool
  default     = false
}

resource "oci_core_instance" "hosts" {
  # An empty map creates no instances when OCI is disabled
  for_each = var.enable_oci ? var.oci_hosts : {}
  # ...
}
```

## Best Practices Summary

| Practice | Rationale |
|----------|-----------|
| ✔ Explicit `depends_on` for critical chains | Terraform isn't magic |
| ✔ Local map for host definitions | Single source of truth, easy iteration |
| ✔ `for_each` over `count` | Stable resource addresses |
| ✔ `dynamic` blocks for optional devices | Clean, declarative device configuration |
| ✔ Merge base config with overrides | DRY principle for common settings |
| ✔ Separate tfvars for environment twins | Minimal duplication, clear parameterization |
| ✔ Document module interfaces | Enable promotion across environments |
| ✔ Never start in Prod | Always validate through pipeline |
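
Several practices in the table (local host map, `for_each`, `dynamic` device blocks, merged base config) are easier to see together. The following is a minimal sketch of how they might combine. It reuses the `uranian_hosts` names from the output example, but the host entries, roles, image alias, device paths, and `base_config` values are illustrative assumptions, not the actual Ouranos definitions.

```hcl
locals {
  # Settings shared by every host (assumed values); per-host config overrides them.
  base_config = {
    "boot.autostart" = "true"
  }

  # Single source of truth for hosts. Entries are hypothetical examples.
  uranian_hosts = {
    prospero = {
      role        = "monitoring"
      description = "Monitoring host"
      image       = "images:ubuntu/24.04/cloud"
      config      = {}
      devices     = []
    }
    oberon = {
      role        = "container_host"
      description = "Host that runs nested containers"
      image       = "images:ubuntu/24.04/cloud"
      config      = { "security.nesting" = "true" }
      devices = [
        {
          name       = "data"
          type       = "disk"
          properties = { source = "/srv/oberon", path = "/data" }
        }
      ]
    }
  }
}

resource "incus_instance" "uranian_hosts" {
  # for_each keys resources by host name, so adding or removing a host
  # does not shift the addresses of the others (unlike count indexes).
  for_each = local.uranian_hosts

  name  = each.key
  image = each.value.image

  # Later arguments to merge() win, so host settings override the base.
  config = merge(local.base_config, each.value.config)

  # One device block per entry; hosts with an empty list get none.
  dynamic "device" {
    for_each = each.value.devices
    content {
      name       = device.value.name
      type       = device.value.type
      properties = device.value.properties
    }
  }
}
```

With this shape, adding a host is a single new map entry, and the `ouranos_inventory` output picks it up without further changes.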