Terraform Practices & Patterns
This document describes the Terraform design philosophy, patterns, and practices used across our infrastructure. The audience includes LLMs assisting with development, new team members, and existing team members seeking a reference.
Design Philosophy
Incus-First Infrastructure
Incus containers form the foundational layer of all environments. Management and monitoring infrastructure (Prospero, Titania) must exist before application hosts. This is a critical dependency that must be explicitly codified.
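Why? Terraform isn't magic. It infers ordering only from references between resources; when two resources share no reference, it may create them in parallel, which can lead to race conditions or failed deployments. Always use explicit depends_on for critical infrastructure chains.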
# Example: Application host depends on monitoring infrastructure
resource "incus_instance" "app_host" {
  # ...
  depends_on = [incus_instance.uranian_hosts["prospero"]]
}
Explicit Dependencies
Never rely solely on implicit resource ordering for critical infrastructure. Codify dependencies explicitly to:
- ✔ Prevent race conditions during parallel applies
- ✔ Document architectural relationships in code
- ✔ Ensure consistent deployment ordering across environments
Repository Strategy
Agathos (Sandbox)
Agathos is the Sandbox repository — isolated, safe for external demos, and uses local state.
| Aspect | Decision |
|---|---|
| Purpose | Evaluation, demos, pattern experimentation, new software testing |
| State | Local (no remote backend) |
| Secrets | No production credentials or references |
| Security | Safe to use on external infrastructure for demos |
Production Repository (Separate)
A separate repository manages Dev, UAT, and Prod environments:
terraform/
├── modules/incus_host/ # Reusable container module
├── environments/
│ ├── dev/ # Local Incus only
│ └── prod/ # OCI + Incus (parameterized via tfvars)
| Aspect | Decision |
|---|---|
| State | PostgreSQL backend on eris.helu.ca:6432 with SSL |
| Schemas | Separate per environment: dev, uat, prod |
| UAT/Prod | Parameterized twins via -var-file |
Module Design
When to Extract a Module
A pattern is a good module candidate when it meets these criteria:
| Criterion | Description |
|---|---|
| Reuse | Pattern used across multiple environments (Sandbox, Dev, UAT, Prod) |
| Stable Interface | Inputs/outputs won't change frequently |
| Testable | Can validate module independently before promotion |
| Encapsulates Complexity | Hides dynamic blocks, for_each, cloud-init generation |
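As a rough illustration of the "Encapsulates Complexity" criterion, the sketch below shows how a module might hide for_each and cloud-init generation behind a typed interface. It is not the actual incus_host implementation: the object fields, the resource name `this`, and the template variables `hostname` and `ssh_key` are assumptions. It also assumes the `lxc/incus` provider and Terraform 1.3 or later (for `optional` with defaults).

```hcl
# Hypothetical module internals; names below are illustrative assumptions.
terraform {
  required_providers {
    incus = {
      source = "lxc/incus"
    }
  }
}

variable "hosts" {
  description = "Map of host definitions keyed by host name"
  type = map(object({
    role   = string
    image  = string
    config = optional(map(string), {})
  }))
}

variable "cloud_init_template" {
  description = "Path to the cloud-init template file"
  type        = string
}

variable "ssh_key_path" {
  description = "Path to the SSH authorized keys file"
  type        = string
}

resource "incus_instance" "this" {
  # One instance per map entry; the key becomes the resource address.
  for_each = var.hosts

  name  = each.key
  image = each.value.image

  # Render cloud-init per host and merge it into the host's own config.
  config = merge(each.value.config, {
    "cloud-init.user-data" = templatefile(var.cloud_init_template, {
      hostname = each.key
      ssh_key  = trimspace(file(var.ssh_key_path))
    })
  })
}
```

Callers only see the `hosts` map; the templating and iteration stay inside the module.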
When NOT to Extract
- Single-use patterns
- Tightly coupled to specific environment
- Adds indirection without measurable benefit
The incus_host Module
The standard container provisioning pattern extracted from Agathos:
Inputs:
- hosts: Map of host definitions (name, role, image, devices, config)
- project: Incus project name
- profile: Incus profile name
- cloud_init_template: Cloud-init configuration template
- ssh_key_path: Path to SSH authorized keys
- depends_on_resources: Explicit dependencies for infrastructure ordering
Outputs:
- host_details: Name, IPv4, role, description for each host
- inventory: Documentation reference for DHCP/DNS provisioning
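A call to the module could look roughly like the sketch below. The input and output names come from the lists above; everything else is an assumption for illustration, including the relative source path, the template file name, and the `incus_project.sandbox` and `incus_profile.sandbox` resources, which are presumed to be declared elsewhere in the same configuration.

```hcl
# Hypothetical module call; values and referenced resources are placeholders.
module "uranian_hosts" {
  source = "../../modules/incus_host"

  hosts               = local.uranian_hosts
  project             = incus_project.sandbox.name
  profile             = incus_profile.sandbox.name
  cloud_init_template = "${path.module}/templates/cloud-init.yaml.tftpl"
  ssh_key_path        = pathexpand("~/.ssh/authorized_keys")

  # Codify ordering explicitly rather than relying on implicit references.
  depends_on_resources = [incus_project.sandbox, incus_profile.sandbox]
}

# Re-export the module output so scripts and documentation can read it.
output "host_details" {
  description = "Name, IPv4, role, and description for each host"
  value       = module.uranian_hosts.host_details
}
```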
Environment Strategy
Environment Purposes
| Environment | Purpose | Infrastructure |
|---|---|---|
| Sandbox | Evaluation, demos, pattern experimentation | Local Incus only |
| Dev | Integration testing, container builds, security testing | Local Incus only |
| UAT | User acceptance testing, bug resolution | OCI + Incus (hybrid) |
| Prod | Production workloads | OCI + Incus (hybrid) |
Parameterized Twins (UAT/Prod)
UAT and Prod are architecturally identical. Use a single environment directory with variable files:
# UAT deployment
terraform apply -var-file=uat.tfvars
# Prod deployment
terraform apply -var-file=prod.tfvars
Key differences in tfvars:
- Hostnames and DNS domains
- Resource sizing (CPU, memory limits)
- OCI compartment IDs
- Credential references
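For illustration, a UAT variable file might look something like the sketch below. Apart from enable_oci, which appears later in this document, every variable name and value here is a hypothetical placeholder; the real variable names are defined by the environment's variables.tf.

```hcl
# uat.tfvars (hypothetical names and placeholder values)

# Hostnames and DNS domains
environment = "uat"
dns_domain  = "uat.example.internal"

# Resource sizing
cpu_limit    = "2"
memory_limit = "4GiB"

# OCI settings
enable_oci         = true
oci_compartment_id = "ocid1.compartment.oc1..exampleuat"

# Credential references (a name to look up, never the secret itself)
credentials_secret_name = "uat-terraform"
```

A matching prod.tfvars would carry the same keys with production values, which keeps the two environments structurally identical.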
State Management
Sandbox (Agathos)
Local state is acceptable because:
- Environment is ephemeral
- Single-user workflow
- No production secrets to protect
- Safe for external demos
Production Environments
PostgreSQL backend on eris.helu.ca:
terraform {
  backend "pg" {
    conn_str    = "postgres://eris.helu.ca:6432/terraform_state?sslmode=verify-full"
    schema_name = "dev" # or "uat", "prod"
  }
}
Connection requirements:
- Port 6432 (pgBouncer)
- SSL with sslmode=verify-full
- Credentials via environment variables (PGUSER, PGPASSWORD)
- Separate schema per environment for isolation
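One way to switch schemas without editing the backend block is Terraform's partial backend configuration. The sketch below assumes a per-environment file named uat.pg.tfbackend (the file name is an assumption) that is passed at init time with `terraform init -reconfigure -backend-config=uat.pg.tfbackend`; values supplied this way override those in the backend block.

```hcl
# uat.pg.tfbackend (hypothetical file name)
# Only the settings that differ per environment need to appear here.
# Credentials are deliberately absent; they come from PGUSER and PGPASSWORD.
schema_name = "uat"
```

Keeping credentials out of both the backend block and this file means nothing sensitive is committed to the repository.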
Integration Points
Terraform → DHCP/DNS
The agathos_inventory output provides host information for DHCP/DNS provisioning:
- Terraform creates containers with cloud-init
- agathos_inventory output includes hostnames and IPs
- MAC addresses registered in DHCP server
- DHCP server creates DNS entries (hostname.incus domain)
- Ansible uses DNS names for host connectivity
Terraform → Ansible
Ansible does not consume Terraform outputs directly. Instead:
- Terraform provisions containers
- Incus DNS resolution provides hostname.incus domain
- Ansible inventory uses static DNS names
- sandbox_up.yml configures DNS resolution on the hypervisor
# Ansible inventory uses DNS names, not Terraform outputs
ubuntu:
  hosts:
    oberon.incus:
    ariel.incus:
    prospero.incus:
Terraform → Bash Scripts
The ssh_key_update.sh script demonstrates proper integration:
# Read host names and IPs from the Terraform output, then record their SSH host keys
terraform output -json agathos_inventory | jq -r \
  '.uranian_hosts.hosts | to_entries[] | "\(.key) \(.value.ipv4)"' | \
while read -r hostname ip; do
  ssh-keyscan -H "$ip" >> ~/.ssh/known_hosts
  ssh-keyscan -H "$hostname.incus" >> ~/.ssh/known_hosts
done
Promotion Workflow
All infrastructure changes flow through this pipeline:
Agathos (Sandbox)
↓ Validate pattern works
↓ Extract to module if reusable
Dev
↓ Integration testing
↓ Container builds
↓ Security testing
UAT
↓ User acceptance testing
↓ Bug fixes return to Dev
↓ Delete environment, test restore
Prod
↓ Deploy from tested artifacts
Critical: Nothing starts in Prod. Every change originates in Agathos, is validated through the pipeline, and only then deployed to production.
Promotion Includes
When promoting Terraform changes, always update corresponding:
- Ansible playbooks and templates
- Service documentation in /docs/services/
- Host variables if new services added
Output Conventions
agathos_inventory
The primary output for documentation and DNS integration:
output "agathos_inventory" {
  description = "Host inventory for documentation and DHCP/DNS provisioning"
  value = {
    uranian_hosts = {
      hosts = {
        for name, instance in incus_instance.uranian_hosts : name => {
          name             = instance.name
          ipv4             = instance.ipv4_address
          role             = local.uranian_hosts[name].role
          description      = local.uranian_hosts[name].description
          security_nesting = lookup(local.uranian_hosts[name].config, "security.nesting", false)
        }
      }
    }
  }
}
Purpose:
- Update sandbox.html documentation
- Reference for DHCP server MAC/IP registration
- DNS entry creation via DHCP
Layered Configuration
Single Config with Conditional Resources
Avoid multiple separate Terraform configurations. Use one config with conditional resources:
environments/prod/
├── main.tf # Incus project, profile, images (always)
├── incus_hosts.tf # Module call for Incus containers (always)
├── oci_resources.tf # OCI compute (conditional)
├── variables.tf
├── dev.tfvars # Dev: enable_oci = false
├── uat.tfvars # UAT: enable_oci = true
└── prod.tfvars # Prod: enable_oci = true
variable "enable_oci" {
  description = "Enable OCI resources (false for Dev, true for UAT/Prod)"
  type        = bool
  default     = false
}

resource "oci_core_instance" "hosts" {
  for_each = var.enable_oci ? var.oci_hosts : {}
  # ...
}
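The conditional resource above relies on an oci_hosts variable that is not shown. A minimal sketch of how it and a matching output might be declared follows; the object shape (a single `shape` field) and the output name are assumptions, not the repository's actual definitions.

```hcl
# Hypothetical declaration; the real object shape will carry more fields.
variable "oci_hosts" {
  description = "OCI host definitions keyed by host name"
  type = map(object({
    shape = string
  }))
  default = {}
}

# Because the resource uses for_each, this output is simply an empty map
# when enable_oci is false; no extra conditional is needed.
output "oci_host_private_ips" {
  description = "Private IPs of OCI hosts, keyed by host name"
  value       = { for name, host in oci_core_instance.hosts : name => host.private_ip }
}
```

Defaulting oci_hosts to an empty map lets dev.tfvars omit it entirely.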
Best Practices Summary
| Practice | Rationale |
|---|---|
| ✔ Explicit depends_on for critical chains | Terraform isn't magic |
| ✔ Local map for host definitions | Single source of truth, easy iteration |
| ✔ for_each over count | Stable resource addresses |
| ✔ dynamic blocks for optional devices | Clean, declarative device configuration |
| ✔ Merge base config with overrides | DRY principle for common settings |
| ✔ Separate tfvars for environment twins | Minimal duplication, clear parameterization |
| ✔ Document module interfaces | Enable promotion across environments |
| ✔ Never start in Prod | Always validate through pipeline |
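Several of the practices in the table can be combined in one short sketch: a local map of hosts, for_each, merged base config, and dynamic device blocks. This is an illustration only, assuming the `lxc/incus` provider's `incus_instance` resource; the host entries, image names, config keys, and device paths are hypothetical rather than the real Agathos definitions.

```hcl
locals {
  # Settings shared by every host (hypothetical values).
  base_config = {
    "boot.autostart" = "true"
  }

  # Single source of truth for host definitions.
  uranian_hosts = {
    prospero = {
      role        = "monitoring"
      description = "Monitoring host"
      image       = "images:ubuntu/24.04"
      config      = {}
      devices     = []
    }
    oberon = {
      role        = "containers"
      description = "Container host"
      image       = "images:ubuntu/24.04"
      config      = { "security.nesting" = "true" }
      devices = [
        {
          name       = "data"
          type       = "disk"
          properties = { source = "/srv/oberon", path = "/data" }
        }
      ]
    }
  }
}

resource "incus_instance" "uranian_hosts" {
  # for_each keys give stable addresses such as uranian_hosts["oberon"].
  for_each = local.uranian_hosts

  name  = each.key
  image = each.value.image

  # Per-host overrides win over the shared base config.
  config = merge(local.base_config, each.value.config)

  # Emits one device block per entry; hosts without devices get none.
  dynamic "device" {
    for_each = each.value.devices
    content {
      name       = device.value.name
      type       = device.value.type
      properties = device.value.properties
    }
  }
}
```

Removing a host from the map destroys only that host, whereas with count an index shift could force unrelated instances to be recreated.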