# Ouranos Lab

Infrastructure-as-Code project managing the **Ouranos Lab**, a development sandbox at [ouranos.helu.ca](https://ouranos.helu.ca). Uses **Terraform** for container provisioning and **Ansible** for configuration management, themed around the moons of Uranus.

---

## Project Overview

| Component | Purpose |
|-----------|---------|
| **Terraform** | Provisions 10 specialised Incus containers (LXC) with DNS-resolved networking, security policies, and resource dependencies |
| **Ansible** | Deploys Docker, databases (PostgreSQL, Neo4j), observability stack (Prometheus, Grafana, Loki), and application runtimes across all hosts |

> **DNS Domain**: Incus resolves containers via the `.incus` domain suffix (e.g., `oberon.incus`, `portia.incus`). IPv4 addresses are dynamically assigned, so always use DNS names and never hardcode IPs.

---

## Uranian Host Architecture

All containers are named after moons of Uranus and resolved via the `.incus` DNS suffix.

| Name | Role | Description | Nesting |
|------|------|-------------|---------|
| **ariel** | graph_database | Neo4j: Ethereal graph connections | ✔ |
| **caliban** | agent_automation | Agent S MCP Server with MATE Desktop | ✔ |
| **miranda** | mcp_docker_host | Dedicated Docker Host for MCP Servers | ✔ |
| **oberon** | container_orchestration | Docker Host: MCP Switchboard, RabbitMQ, Open WebUI | ✔ |
| **portia** | database | PostgreSQL: Relational database host | ❌ |
| **prospero** | observability | PPLG stack: Prometheus, Grafana, Loki, PgAdmin | ❌ |
| **puck** | application_runtime | Python App Host: JupyterLab, Django apps, Gitea Runner | ✔ |
| **rosalind** | collaboration | Gitea, LobeChat, Nextcloud, AnythingLLM | ✔ |
| **sycorax** | language_models | Arke LLM Proxy | ✔ |
| **titania** | proxy_sso | HAProxy TLS termination + Casdoor SSO | ✔ |

### oberon: Container Orchestration

King of the Fairies orchestrating containers and managing MCP infrastructure.

- Docker engine
- MCP Switchboard (port 22785): Django app routing MCP tool calls
- RabbitMQ message queue
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia)
- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- smtp4dev SMTP test server (port 22025)

### portia: Relational Database

Intelligent and resourceful: the reliability of relational databases.

- PostgreSQL 17 (port 5432)
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `nextcloud`, `openwebui`, `periplus`, `spelunker`

### ariel: Graph Database

Air spirit: ethereal, interconnected nature mirroring graph relationships.

- Neo4j 5.26.0 (Docker)
- HTTP API: port 25584
- Bolt: port 25554

### puck: Application Runtime

Shape-shifting trickster embodying Python's versatility.

- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Home Assistant (port 8123)
- Django applications: Angelia (22281), Athena (22481), Kairos (22581), Icarlos (22681), Spelunker (22881), Peitho (22981)

### prospero: Observability Stack

Master magician observing all events.

- PPLG stack via Docker Compose: Prometheus, Loki, Grafana, PgAdmin
- Internal HAProxy with OAuth2-Proxy for all dashboards
- AlertManager with Pushover notifications
- Prometheus metrics collection (`node-exporter`, HAProxy, Loki)
- Loki log aggregation via Alloy (all hosts)
- Grafana dashboard suite with Casdoor SSO integration

### miranda: MCP Docker Host

Curious bridge between worlds, hosting MCP server containers.

- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy
- Grafana MCP Server (port 25533)
- Gitea MCP Server (port 25535)
- Neo4j MCP Server
- Argos MCP Server: web search via SearXNG (port 25534)

### sycorax: Language Models

Original magical power wielding language magic.

- Arke LLM API Proxy (port 25540)
- Multi-provider support (OpenAI, Anthropic, etc.)
- Session management with Memcached
- Database backend on Portia

### caliban: Agent Automation

Autonomous computer agent learning through environmental interaction.

- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22021)
- GPU passthrough for vision tasks
- RDP access (port 25521)

### rosalind: Collaboration Services

Witty and resourceful moon for PHP, Go, and Node.js runtimes.

- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume

### titania: Proxy & SSO Services

Queen of the Fairies managing access control and authentication.

- HAProxy 3.x with TLS termination (port 443)
- Let's Encrypt wildcard certificate via certbot DNS-01 (Namecheap)
- HTTP to HTTPS redirect (port 80)
- Gitea SSH proxy (port 22022)
- Casdoor SSO (port 22081, local PostgreSQL)
- Prometheus metrics at `:8404/metrics`

---

## External Access via HAProxy

Titania provides TLS termination and reverse proxy for all services.

- **Base domain**: `ouranos.helu.ca`
- **HTTPS**: port 443 (standard)
- **HTTP**: port 80 (redirects to HTTPS)
- **Certificate**: Let's Encrypt wildcard via certbot DNS-01

### Route Table

| Subdomain | Backend | Service |
|-----------|---------|---------|
| `ouranos.helu.ca` (root) | puck.incus:22281 | Angelia (Django) |
| `alertmanager.ouranos.helu.ca` | prospero.incus:443 (SSL) | AlertManager |
| `angelia.ouranos.helu.ca` | puck.incus:22281 | Angelia (Django) |
| `anythingllm.ouranos.helu.ca` | rosalind.incus:22084 | AnythingLLM |
| `arke.ouranos.helu.ca` | sycorax.incus:25540 | Arke LLM Proxy |
| `athena.ouranos.helu.ca` | puck.incus:22481 | Athena (Django) |
| `gitea.ouranos.helu.ca` | rosalind.incus:22082 | Gitea |
| `grafana.ouranos.helu.ca` | prospero.incus:443 (SSL) | Grafana |
| `hass.ouranos.helu.ca` | oberon.incus:8123 | Home Assistant |
| `id.ouranos.helu.ca` | titania.incus:22081 | Casdoor SSO |
| `icarlos.ouranos.helu.ca` | puck.incus:22681 | Icarlos (Django) |
| `jupyterlab.ouranos.helu.ca` | puck.incus:22071 | JupyterLab (OAuth2-Proxy) |
| `kairos.ouranos.helu.ca` | puck.incus:22581 | Kairos (Django) |
| `lobechat.ouranos.helu.ca` | rosalind.incus:22081 | LobeChat |
| `loki.ouranos.helu.ca` | prospero.incus:443 (SSL) | Loki |
| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22785 | MCP Switchboard |
| `nextcloud.ouranos.helu.ca` | rosalind.incus:22083 | Nextcloud |
| `openwebui.ouranos.helu.ca` | oberon.incus:22088 | Open WebUI |
| `peitho.ouranos.helu.ca` | puck.incus:22981 | Peitho (Django) |
| `pgadmin.ouranos.helu.ca` | prospero.incus:443 (SSL) | PgAdmin 4 |
| `prometheus.ouranos.helu.ca` | prospero.incus:443 (SSL) | Prometheus |
| `searxng.ouranos.helu.ca` | oberon.incus:22073 | SearXNG (OAuth2-Proxy) |
| `smtp4dev.ouranos.helu.ca` | oberon.incus:22085 | smtp4dev |
| `spelunker.ouranos.helu.ca` | puck.incus:22881 | Spelunker (Django) |

---

## Infrastructure Management

### Quick Start

```bash
# Provision containers
cd terraform
terraform init
terraform plan
terraform apply

# Start all containers
cd ../ansible
source ~/env/ouranos/bin/activate
ansible-playbook sandbox_up.yml

# Deploy all services
ansible-playbook site.yml

# Stop all containers
ansible-playbook sandbox_down.yml
```

### Terraform Workflow

1. **Define**: Containers, networks, and resources in `*.tf` files
2. **Plan**: Review changes with `terraform plan`
3. **Apply**: Provision with `terraform apply`
4. **Verify**: Check outputs and container status

### Ansible Workflow

1. **Bootstrap**: Update packages, install essentials (`apt_update.yml`)
2. **Agents**: Deploy Alloy (log/metrics) and Node Exporter on all hosts
3. **Services**: Configure databases, Docker, applications, observability
4. **Verify**: Check service health and connectivity

### Vault Management

```bash
# Edit secrets
ansible-vault edit inventory/group_vars/all/vault.yml

# View secrets
ansible-vault view inventory/group_vars/all/vault.yml

# Encrypt a new file
ansible-vault encrypt new_secrets.yml
```

---

## S3 Storage Provisioning

Terraform provisions Incus S3 buckets for services requiring object storage:

| Service | Host | Purpose |
|---------|------|---------|
| **Casdoor** | Titania | User avatars and SSO resource storage |
| **LobeChat** | Rosalind | File uploads and attachments |

> S3 credentials (access key, secret key, endpoint) are stored as sensitive Terraform outputs and managed in Ansible Vault with the `vault_*_s3_*` prefix.

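
The `vault_*_s3_*` naming pattern can be illustrated with a small helper. This is only a sketch: the function names `s3_vault_var` and `s3_vault_skeleton` are hypothetical, and the field names (`access_key`, `secret_key`, `endpoint`) are inferred from the note above, so the variables actually used in `vault.yml` may be spelled differently.

```bash
# Sketch only: build variable names that follow the vault_*_s3_* pattern.
# Assumption: the first wildcard is the service name and the second is the
# credential field. Check the real names in vault.yml before relying on this.

# s3_vault_var SERVICE FIELD -> prints vault_<service>_s3_<field>
s3_vault_var() {
  printf 'vault_%s_s3_%s\n' "$1" "$2"
}

# s3_vault_skeleton SERVICE -> prints one empty YAML entry per credential field
s3_vault_skeleton() {
  for field in access_key secret_key endpoint; do
    printf '%s: ""\n' "$(s3_vault_var "$1" "$field")"
  done
}
```

Running `s3_vault_skeleton casdoor` prints three empty entries that could be pasted into a vault file opened with `ansible-vault edit` and then filled in from the Terraform outputs.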
---

## Ansible Automation

### Full Deployment (`site.yml`)

Playbooks run in dependency order:

| Playbook | Hosts | Purpose |
|----------|-------|---------|
| `apt_update.yml` | All | Update packages and install essentials |
| `alloy/deploy.yml` | All | Grafana Alloy log/metrics collection |
| `prometheus/node_deploy.yml` | All | Node Exporter metrics |
| `docker/deploy.yml` | Oberon, Ariel, Miranda, Puck, Rosalind, Sycorax, Caliban, Titania | Docker engine |
| `smtp4dev/deploy.yml` | Oberon | SMTP test server |
| `pplg/deploy.yml` | Prospero | Full observability stack + HAProxy + OAuth2-Proxy |
| `postgresql/deploy.yml` | Portia | PostgreSQL with all databases |
| `postgresql_ssl/deploy.yml` | Titania | Dedicated PostgreSQL for Casdoor |
| `neo4j/deploy.yml` | Ariel | Neo4j graph database |
| `searxng/deploy.yml` | Oberon | SearXNG privacy search |
| `haproxy/deploy.yml` | Titania | HAProxy TLS termination and routing |
| `casdoor/deploy.yml` | Titania | Casdoor SSO |
| `mcpo/deploy.yml` | Miranda | MCPO MCP proxy |
| `openwebui/deploy.yml` | Oberon | Open WebUI LLM interface |
| `hass/deploy.yml` | Oberon | Home Assistant |
| `gitea/deploy.yml` | Rosalind | Gitea self-hosted Git |
| `nextcloud/deploy.yml` | Rosalind | Nextcloud collaboration |

### Individual Service Deployments

Services with standalone deploy playbooks (not in `site.yml`):

| Playbook | Host | Service |
|----------|------|---------|
| `anythingllm/deploy.yml` | Rosalind | AnythingLLM document AI |
| `arke/deploy.yml` | Sycorax | Arke LLM proxy |
| `argos/deploy.yml` | Miranda | Argos MCP web search server |
| `caliban/deploy.yml` | Caliban | Agent S MCP Server |
| `certbot/deploy.yml` | Titania | Let's Encrypt certificate renewal |
| `gitea_mcp/deploy.yml` | Miranda | Gitea MCP Server |
| `gitea_runner/deploy.yml` | Puck | Gitea CI/CD runner |
| `grafana_mcp/deploy.yml` | Miranda | Grafana MCP Server |
| `jupyterlab/deploy.yml` | Puck | JupyterLab + OAuth2-Proxy |
| `kernos/deploy.yml` | Caliban | Kernos MCP shell server |
| `lobechat/deploy.yml` | Rosalind | LobeChat AI chat |
| `neo4j_mcp/deploy.yml` | Miranda | Neo4j MCP Server |
| `rabbitmq/deploy.yml` | Oberon | RabbitMQ message queue |

### Lifecycle Playbooks

| Playbook | Purpose |
|----------|---------|
| `sandbox_up.yml` | Start all Uranian host containers |
| `sandbox_down.yml` | Gracefully stop all containers |
| `apt_update.yml` | Update packages on all hosts |
| `site.yml` | Full deployment orchestration |

---

## Data Flow Architecture

### Observability Pipeline

```
All Hosts                     Prospero                           Alerts
Alloy + Node Exporter    →    Prometheus + Loki + Grafana    →   AlertManager + Pushover
collect metrics & logs        storage & visualisation            notifications
```

### Integration Points

| Consumer | Provider | Connection |
|----------|----------|-----------|
| All LLM apps | Arke (Sycorax) | `http://sycorax.incus:25540` |
| Open WebUI, Arke, Gitea, Nextcloud, LobeChat | PostgreSQL (Portia) | `portia.incus:5432` |
| Neo4j MCP | Neo4j (Ariel) | `ariel.incus:7687` (Bolt) |
| MCP Switchboard | Docker API (Miranda) | `tcp://miranda.incus:2375` |
| MCP Switchboard | RabbitMQ (Oberon) | `oberon.incus:5672` |
| Kairos, Spelunker | RabbitMQ (Oberon) | `oberon.incus:5672` |
| SMTP (all apps) | smtp4dev (Oberon) | `oberon.incus:22025` |
| All hosts | Loki (Prospero) | `http://prospero.incus:3100` |
| All hosts | Prometheus (Prospero) | `http://prospero.incus:9090` |

---

## Important Notes

⚠️ **Alloy Host Variables Required**: Every host with `alloy` in its `services` list must define `alloy_log_level` in `inventory/host_vars/.incus.yml`. The playbook will fail with an undefined variable error if this is missing.

⚠️ **Alloy Syslog Listeners Required for Docker Services**: Any Docker Compose service using the syslog logging driver must have a corresponding `loki.source.syslog` listener in the host's Alloy config template (`ansible/alloy//config.alloy.j2`). Missing listeners cause Docker containers to fail on start.

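
A pre-flight check for the `alloy_log_level` requirement might look like the sketch below. It assumes host variable files are plain `*.yml` files in a single directory, that `alloy_log_level` appears as a top-level key, and that every host runs Alloy (`alloy/deploy.yml` targets all hosts). The function name `missing_alloy_log_level` is hypothetical.

```bash
# Sketch only: list host_vars files that do not define alloy_log_level.
# Assumption: one YAML file per host, with alloy_log_level as a top-level key.

# missing_alloy_log_level DIR -> prints each *.yml file in DIR lacking the key
missing_alloy_log_level() {
  for file in "$1"/*.yml; do
    # Skip the unexpanded glob when the directory holds no .yml files.
    [ -e "$file" ] || continue
    grep -q '^alloy_log_level:' "$file" || printf '%s\n' "$file"
  done
}
```

Calling `missing_alloy_log_level inventory/host_vars` from the `ansible` directory prints nothing when every host file defines the variable.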

⚠️ **Local Terraform State**: This project uses local Terraform state (no remote backend). Do not run `terraform apply` from multiple machines simultaneously.

⚠️ **Nested Docker**: Docker runs inside Incus containers (nested), which requires `security.nesting = true` and an `lxc.apparmor.profile=unconfined` AppArmor override on all Docker-enabled hosts.

⚠️ **Deployment Order**: Prospero (observability) must be fully deployed before other hosts, because Alloy on every host pushes logs and metrics to `prospero.incus`. On a fresh environment, run `pplg/deploy.yml` before `site.yml`.
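
The fresh-environment ordering described above can be sketched as a small wrapper. It assumes the commands run from the `ansible` directory with the virtualenv already activated, as in the Quick Start, and that `sandbox_up.yml` has to run first so the containers exist. The function name `fresh_deploy` and the `DRY_RUN` variable are hypothetical.

```bash
# Sketch only: run the playbooks for a fresh environment in order, with the
# observability stack (pplg/deploy.yml) ahead of the full site.yml run.
# Set DRY_RUN=1 to print the commands instead of executing them.
fresh_deploy() {
  for playbook in sandbox_up.yml pplg/deploy.yml site.yml; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'ansible-playbook %s\n' "$playbook"
    else
      # Stop at the first failing playbook so later ones do not run
      # against a half-deployed environment.
      ansible-playbook "$playbook" || return 1
    fi
  done
}
```

`DRY_RUN=1 fresh_deploy` is a cheap way to confirm the order before a real run.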