# Terraform Practices & Patterns

This document describes the Terraform design philosophy, patterns, and practices used across our infrastructure. The audience includes LLMs assisting with development, new team members, and existing team members seeking a reference.

## Design Philosophy

### Incus-First Infrastructure

Incus containers form the foundational layer of all environments. Management and monitoring infrastructure (Prospero, Titania) must exist before application hosts. This is a **critical dependency** that must be explicitly codified.

**Why?** Terraform isn't magic. It only orders resources that reference each other, so relying on implicit ordering can lead to race conditions or failed deployments. Always use explicit `depends_on` for critical infrastructure chains.

```hcl
# Example: Application host depends on monitoring infrastructure
resource "incus_instance" "app_host" {
  # ...
  depends_on = [incus_instance.uranian_hosts["prospero"]]
}
```

### Explicit Dependencies

Never rely solely on implicit resource ordering for critical infrastructure. Codify dependencies explicitly to:

- ✔ Prevent race conditions during parallel applies
- ✔ Document architectural relationships in code
- ✔ Ensure consistent deployment ordering across environments
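
The same principle can be applied to whole modules rather than single resources. The sketch below is illustrative only: the module names `management_hosts` and `app_hosts` and the source path are assumptions, not names taken from the repository. It relies on Terraform (0.13 and later) accepting `depends_on` on `module` blocks.

```hcl
# Hypothetical module names and paths, shown only to illustrate ordering.
module "management_hosts" {
  source = "./modules/incus_host" # assumed path

  # ... inputs for Prospero, Titania, and other management hosts
}

module "app_hosts" {
  source = "./modules/incus_host" # assumed path

  # ... inputs for application hosts

  # Terraform creates nothing in this module until all resources in
  # module.management_hosts exist.
  depends_on = [module.management_hosts]
}
```

Declaring the dependency at module level states the ordering once, instead of repeating `depends_on` on every application host.
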

## Repository Strategy

### Agathos (Sandbox)

Agathos is the **Sandbox repository**: isolated, safe for external demos, and backed by local state.

| Aspect | Decision |
|--------|----------|
| Purpose | Evaluation, demos, pattern experimentation, new software testing |
| State | Local (no remote backend) |
| Secrets | No production credentials or references |
| Security | Safe to use on external infrastructure for demos |

### Production Repository (Separate)

A separate repository manages Dev, UAT, and Prod environments:

```
terraform/
├── modules/incus_host/      # Reusable container module
├── environments/
│   ├── dev/                 # Local Incus only
│   └── prod/                # OCI + Incus (parameterized via tfvars)
```

| Aspect | Decision |
|--------|----------|
| State | PostgreSQL backend on `eris.helu.ca:6432` with SSL |
| Schemas | Separate per environment: `dev`, `uat`, `prod` |
| UAT/Prod | Parameterized twins via `-var-file` |

## Module Design

### When to Extract a Module

A pattern is a good module candidate when it meets these criteria:

| Criterion | Description |
|-----------|-------------|
| **Reuse** | Pattern used across multiple environments (Sandbox, Dev, UAT, Prod) |
| **Stable Interface** | Inputs/outputs won't change frequently |
| **Testable** | Can validate module independently before promotion |
| **Encapsulates Complexity** | Hides `dynamic` blocks, `for_each`, cloud-init generation |

### When NOT to Extract

- Single-use patterns
- Patterns tightly coupled to a specific environment
- Extraction that adds indirection without measurable benefit

### The `incus_host` Module

The standard container provisioning pattern extracted from Agathos:

**Inputs:**
- `hosts`: Map of host definitions (name, role, image, devices, config)
- `project`: Incus project name
- `profile`: Incus profile name
- `cloud_init_template`: Cloud-init configuration template
- `ssh_key_path`: Path to SSH authorized keys
- `depends_on_resources`: Explicit dependencies for infrastructure ordering

**Outputs:**
- `host_details`: Name, IPv4, role, description for each host
- `inventory`: Documentation reference for DHCP/DNS provisioning
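
To make the interface concrete, here is a minimal sketch of a call to the module from an environment directory. The input names come from the list above; everything else is an assumption for illustration: the host entry and its attribute shapes, the project and profile names, the template file name, the `incus_project.agathos` reference (assumed to be declared elsewhere), and the idea that `cloud_init_template` takes a file path and `depends_on_resources` takes a list.

```hcl
# Sketch only: values and attribute shapes are hypothetical.
module "uranian_hosts" {
  source = "../../modules/incus_host"

  hosts = {
    prospero = {
      role        = "monitoring"          # hypothetical role label
      image       = "images:ubuntu/24.04" # hypothetical image alias
      description = "Monitoring host"
      devices     = []                    # assumed shape
      config      = {}                    # assumed shape
    }
  }

  project             = "agathos" # hypothetical project name
  profile             = "default" # hypothetical profile name
  cloud_init_template = "${path.module}/cloud-init.yaml.tftpl" # hypothetical file
  ssh_key_path        = pathexpand("~/.ssh/authorized_keys")

  # Assumed to accept a list of resources that must exist first.
  depends_on_resources = [incus_project.agathos]
}

# Re-export the module output so it is visible at the environment level.
output "host_details" {
  value = module.uranian_hosts.host_details
}
```
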

## Environment Strategy

### Environment Purposes

| Environment | Purpose | Infrastructure |
|-------------|---------|----------------|
| **Sandbox** | Evaluation, demos, pattern experimentation | Local Incus only |
| **Dev** | Integration testing, container builds, security testing | Local Incus only |
| **UAT** | User acceptance testing, bug resolution | OCI + Incus (hybrid) |
| **Prod** | Production workloads | OCI + Incus (hybrid) |

### Parameterized Twins (UAT/Prod)

UAT and Prod are architecturally identical. Use a single environment directory with variable files:

```bash
# UAT deployment
terraform apply -var-file=uat.tfvars

# Prod deployment
terraform apply -var-file=prod.tfvars
```

Key differences in tfvars:
- Hostnames and DNS domains
- Resource sizing (CPU, memory limits)
- OCI compartment IDs
- Credential references
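
As a rough illustration of how small the difference between the twins can be, the two variable files might look like the sketch below. Only `enable_oci` appears elsewhere in this document; every other variable name and value here is hypothetical, and the compartment IDs are placeholders rather than real OCIDs.

```hcl
# uat.tfvars (hypothetical variable names and values)
enable_oci         = true
dns_domain         = "uat.example.internal"
cpu_limit          = 2
memory_limit       = "4GiB"
oci_compartment_id = "ocid1.compartment.oc1..uat-placeholder"
credential_ref     = "uat/terraform"
```

```hcl
# prod.tfvars (same variables, different values)
enable_oci         = true
dns_domain         = "prod.example.internal"
cpu_limit          = 4
memory_limit       = "8GiB"
oci_compartment_id = "ocid1.compartment.oc1..prod-placeholder"
credential_ref     = "prod/terraform"
```

Keeping both files limited to the same set of variables makes a side-by-side diff a quick check that UAT and Prod differ only in parameters, not in architecture.
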

## State Management

### Sandbox (Agathos)

Local state is acceptable because:
- Environment is ephemeral
- Single-user workflow
- No production secrets to protect
- Safe for external demos

### Production Environments

PostgreSQL backend on `eris.helu.ca`:

```hcl
terraform {
  backend "pg" {
    conn_str    = "postgres://eris.helu.ca:6432/terraform_state?sslmode=verify-full"
    schema_name = "dev" # or "uat", "prod"
  }
}
```

**Connection requirements:**
- Port 6432 (pgBouncer)
- SSL with `sslmode=verify-full`
- Credentials via environment variables (`PGUSER`, `PGPASSWORD`)
- Separate schema per environment for isolation
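
Backend blocks cannot reference variables, so switching schemas means either editing the block or supplying the value at init time. One option is Terraform's partial backend configuration, sketched below. The file names `backend.tf` and `uat.pg.tfbackend` are assumptions, not files known to exist in the repository.

```hcl
# backend.tf: settings shared by every environment.
terraform {
  backend "pg" {
    conn_str = "postgres://eris.helu.ca:6432/terraform_state?sslmode=verify-full"
    # schema_name is intentionally omitted and supplied at init time.
  }
}
```

```hcl
# uat.pg.tfbackend (hypothetical file name): environment-specific settings,
# passed with: terraform init -backend-config=uat.pg.tfbackend
schema_name = "uat"
```

With this layout, credentials still come from `PGUSER` and `PGPASSWORD`, and no secret or environment name is hard-coded in the backend block. If no `schema_name` is supplied at all, the backend falls back to its default schema, so the init command should always name a file.
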

## Integration Points

### Terraform → DHCP/DNS

The `agathos_inventory` output provides host information for DHCP/DNS provisioning:

1. Terraform creates containers with cloud-init
2. The `agathos_inventory` output includes hostnames and IPs
3. MAC addresses are registered in the DHCP server
4. The DHCP server creates DNS entries (`hostname.incus` domain)
5. Ansible uses DNS names for host connectivity

### Terraform → Ansible

Ansible does **not** consume Terraform outputs directly. Instead:

1. Terraform provisions containers
2. Incus DNS resolution provides names in the `hostname.incus` domain
3. Ansible inventory uses static DNS names
4. `sandbox_up.yml` configures DNS resolution on the hypervisor

```yaml
# Ansible inventory uses DNS names, not Terraform outputs
ubuntu:
  hosts:
    oberon.incus:
    ariel.incus:
    prospero.incus:
```

### Terraform → Bash Scripts

The `ssh_key_update.sh` script demonstrates proper integration:

```bash
# Read "hostname ip" pairs from the inventory output and record host keys
terraform output -json agathos_inventory | jq -r \
  '.uranian_hosts.hosts | to_entries[] | "\(.key) \(.value.ipv4)"' | \
  while read -r hostname ip; do
    ssh-keyscan -H "$ip" >> ~/.ssh/known_hosts
    ssh-keyscan -H "$hostname.incus" >> ~/.ssh/known_hosts
  done
```

## Promotion Workflow

All infrastructure changes flow through this pipeline:

```
Agathos (Sandbox)
    ↓ Validate pattern works
    ↓ Extract to module if reusable
Dev
    ↓ Integration testing
    ↓ Container builds
    ↓ Security testing
UAT
    ↓ User acceptance testing
    ↓ Bug fixes return to Dev
    ↓ Delete environment, test restore
Prod
    ↓ Deploy from tested artifacts
```

**Critical:** Nothing starts in Prod. Every change originates in Agathos, is validated through the pipeline, and only then deployed to production.

### Promotion Includes

When promoting Terraform changes, always update the corresponding:
- Ansible playbooks and templates
- Service documentation in `/docs/services/`
- Host variables, if new services are added

## Output Conventions

### `agathos_inventory`

The primary output for documentation and DNS integration:

```hcl
output "agathos_inventory" {
  description = "Host inventory for documentation and DHCP/DNS provisioning"
  value = {
    uranian_hosts = {
      hosts = {
        for name, instance in incus_instance.uranian_hosts : name => {
          name             = instance.name
          ipv4             = instance.ipv4_address
          role             = local.uranian_hosts[name].role
          description      = local.uranian_hosts[name].description
          security_nesting = lookup(local.uranian_hosts[name].config, "security.nesting", false)
        }
      }
    }
  }
}
```

**Purpose:**
- Update [sandbox.html](sandbox.html) documentation
- Reference for DHCP server MAC/IP registration
- DNS entry creation via DHCP
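
The output above carries names and IPv4 addresses but not MAC addresses, which DHCP registration also needs. A possible companion output is sketched below. The output name `dhcp_registrations` is hypothetical, and the sketch assumes the Incus provider exposes a `mac_address` attribute on `incus_instance`; verify that against the provider documentation before relying on it.

```hcl
# Hypothetical companion output: one entry per host for DHCP registration.
# Assumes incus_instance exports mac_address alongside ipv4_address.
output "dhcp_registrations" {
  description = "MAC/IP pairs for DHCP server registration"
  value = {
    for name, instance in incus_instance.uranian_hosts : name => {
      fqdn = "${instance.name}.incus"
      mac  = instance.mac_address
      ipv4 = instance.ipv4_address
    }
  }
}
```
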

## Layered Configuration

### Single Config with Conditional Resources

Avoid multiple separate Terraform configurations. Use one config with conditional resources:

```
environments/prod/
├── main.tf              # Incus project, profile, images (always)
├── incus_hosts.tf       # Module call for Incus containers (always)
├── oci_resources.tf     # OCI compute (conditional)
├── variables.tf
├── dev.tfvars           # Dev: enable_oci = false
├── uat.tfvars           # UAT: enable_oci = true
└── prod.tfvars          # Prod: enable_oci = true
```

```hcl
variable "enable_oci" {
  description = "Enable OCI resources (false for Dev, true for UAT/Prod)"
  type        = bool
  default     = false
}

resource "oci_core_instance" "hosts" {
  for_each = var.enable_oci ? var.oci_hosts : {}
  # ...
}
```
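
The conditional above refers to `var.oci_hosts`, which is not declared in this document. One plausible declaration is sketched here; the attribute names inside the object type and the output name `oci_host_ids` are assumptions chosen for illustration.

```hcl
# Assumed shape for var.oci_hosts; attribute names are illustrative.
variable "oci_hosts" {
  description = "OCI instances keyed by hostname (ignored when enable_oci is false)"
  type = map(object({
    shape               = string
    availability_domain = string
  }))
  default = {}
}

# Evaluates to an empty map when OCI is disabled, so consumers of this
# output need no conditional logic of their own.
output "oci_host_ids" {
  value = { for name, host in oci_core_instance.hosts : name => host.id }
}
```

Because `for_each` receives an empty map when `enable_oci` is false, the resource simply has no instances in Dev, and expressions that iterate over it still evaluate cleanly.
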

## Best Practices Summary

| Practice | Rationale |
|----------|-----------|
| ✔ Explicit `depends_on` for critical chains | Terraform isn't magic |
| ✔ Local map for host definitions | Single source of truth, easy iteration |
| ✔ `for_each` over `count` | Stable resource addresses |
| ✔ `dynamic` blocks for optional devices | Clean, declarative device configuration |
| ✔ Merge base config with overrides | DRY principle for common settings |
| ✔ Separate tfvars for environment twins | Minimal duplication, clear parameterization |
| ✔ Document module interfaces | Enable promotion across environments |
| ✔ Never start in Prod | Always validate through pipeline |
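
Several of these practices combine naturally in a single resource definition. The sketch below shows a local host map, `for_each`, a merged base config, and a `dynamic` device block together. The host entries, config keys, image alias, and device values are hypothetical, and the `device` block shape should be checked against the Incus provider documentation.

```hcl
locals {
  # Settings shared by every host (hypothetical values).
  base_config = {
    "boot.autostart" = "true"
  }

  # Single source of truth for host definitions (hypothetical hosts).
  uranian_hosts = {
    prospero = {
      image   = "images:ubuntu/24.04"
      config  = {}
      devices = []
    }
    oberon = {
      image  = "images:ubuntu/24.04"
      config = { "security.nesting" = "true" }
      devices = [{
        name       = "data"
        type       = "disk"
        properties = { source = "/srv/oberon", path = "/data" }
      }]
    }
  }
}

resource "incus_instance" "uranian_hosts" {
  # for_each keys resources by host name, so addresses stay stable
  # when hosts are added or removed.
  for_each = local.uranian_hosts

  name  = each.key
  image = each.value.image

  # Per-host overrides win over the shared base settings.
  config = merge(local.base_config, each.value.config)

  # Emits one device block per entry, or none for hosts without devices.
  dynamic "device" {
    for_each = each.value.devices
    content {
      name       = device.value.name
      type       = device.value.type
      properties = device.value.properties
    }
  }
}
```
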