# Ouranos Lab

Infrastructure-as-Code project managing the **Ouranos Lab** — a development sandbox at [ouranos.helu.ca](https://ouranos.helu.ca). Uses **Terraform** for container provisioning and **Ansible** for configuration management, themed around the moons of Uranus.

---

## Project Overview

| Component | Purpose |
|-----------|---------|
| **Terraform** | Provisions 10 specialised Incus containers (LXC) with DNS-resolved networking, security policies, and resource dependencies |
| **Ansible** | Deploys Docker, databases (PostgreSQL, Neo4j), observability stack (Prometheus, Grafana, Loki), and application runtimes across all hosts |

> **DNS Domain**: Incus resolves containers via the `.incus` domain suffix (e.g., `oberon.incus`, `portia.incus`). IPv4 addresses are dynamically assigned — always use DNS names, never hardcode IPs.

---

## Uranian Host Architecture

All containers are named after moons of Uranus and resolved via the `.incus` DNS suffix.

| Name | Role | Description | Nesting |
|------|------|-------------|---------|
| **ariel** | graph_database | Neo4j — Ethereal graph connections | ✔ |
| **caliban** | agent_automation | Agent S MCP Server with MATE Desktop | ✔ |
| **miranda** | mcp_docker_host | Dedicated Docker Host for MCP Servers | ✔ |
| **oberon** | container_orchestration | Docker Host — MCP Switchboard, RabbitMQ, Open WebUI | ✔ |
| **portia** | database | PostgreSQL — Relational database host | ❌ |
| **prospero** | observability | PPLG stack — Prometheus, Grafana, Loki, PgAdmin | ❌ |
| **puck** | application_runtime | Python App Host — JupyterLab, Django apps, Gitea Runner | ✔ |
| **rosalind** | collaboration | Gitea, LobeChat, Nextcloud, AnythingLLM | ✔ |
| **sycorax** | language_models | Arke LLM Proxy | ✔ |
| **titania** | proxy_sso | HAProxy TLS termination + Casdoor SSO | ✔ |

### puck — Project Application Runtime

Shape-shifting trickster embodying Python's versatility.
This is the host that runs Python projects in the Ouranos sandbox.
It has an RDP server and is generally where application development happens.
Each project has a number that is used to determine its port numbers.

- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Django Projects: Zelus (221), Angelia (222), Athena (224), Kairos (225), Icarlos (226), MCP Switchboard (227), Spelunker (228), Peitho (229), Mnemosyne (230)
- FastAgent Projects: Pallas (240)
- FastAPI Projects: Daedalus (200), Arke (201), Kernos (202), Stentor (203), Orpheus (204), Periplus (205), Nike (206)

### caliban — Agent Automation

Autonomous computer agent learning through environmental interaction.

- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22062)
- Rommie MCP Server (port 22061) — agent-to-agent GUI automation via Agent S
- FreeCAD Robust MCP Server (port 22063) — CAD automation via FreeCAD XML-RPC
- GPU passthrough
- RDP access (port 25521)

### oberon — Container Orchestration & Dockerized Shared Services

King of the Fairies orchestrating containers and managing MCP infrastructure.

- Docker engine
- MCP Switchboard (port 22781) — Django app routing MCP tool calls
- RabbitMQ message queue
- smtp4dev SMTP test server (port 22025)

### portia — Relational Database

Intelligent and resourceful — the reliability of relational databases.

- PostgreSQL 17 (port 5432)
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `nextcloud`, `openwebui`, `periplus`, `spelunker`

### ariel — Graph Database

Air spirit — ethereal, interconnected nature mirroring graph relationships.

- Neo4j 5.26.0 (Docker)
- HTTP API: port 25584
- Bolt: port 25554

### miranda — MCP Docker Host

Curious bridge between worlds — hosting MCP server containers.

- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy (port 22071)
- Argos MCP Server — web search via SearXNG (port 22062)
- Grafana MCP Server (port 22063)
- Neo4j MCP Server (port 22064)
- Gitea MCP Server (port 22065)

### prospero — Observability Stack

Master magician observing all events.

- PPLG stack via Docker Compose: Prometheus, Loki, Grafana, PgAdmin
- Internal HAProxy with OAuth2-Proxy for all dashboards
- AlertManager with Pushover notifications
- Prometheus metrics collection (`node-exporter`, HAProxy, Loki)
- Loki log aggregation via Alloy (all hosts)
- Grafana dashboard suite with Casdoor SSO integration

### rosalind — Third-Party Applications for Testing and Evaluation

Witty and resourceful moon for PHP, Go, and Node.js runtimes.

- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia)
- Home Assistant (port 8123)

### sycorax — Language Models

Original magical power wielding language magic.

- Arke LLM API Proxy (port 25540)
- Multi-provider support (OpenAI, Anthropic, etc.)
- Session management with Memcached
- Database backend on Portia

### titania — Proxy & SSO Services

Queen of the Fairies managing access control and authentication.

- HAProxy 3.x with TLS termination (port 443)
- Let's Encrypt wildcard certificate via certbot DNS-01 (Namecheap)
- HTTP to HTTPS redirect (port 80)
- Gitea SSH proxy (port 22022)
- Casdoor SSO (port 22081, local PostgreSQL)
- Prometheus metrics at `:8404/metrics`

---

## Port Numbering

Software running as a host service may use its well-known port: PostgreSQL 5432, Prometheus metrics 9100.

Inside a Docker project, however, ports must follow the `XXXYZ` numbering plan to avoid conflicts and confusion:

- `XXX` is the project number, or `220` for an external project.
- `Y` is the service type: 0 reserved, 1-4 flexible, 5 database, 6 MCP, 7 API, 8 web app, 9 Prometheus metrics.
- `Z` is the instance: the running instance of this app on the same host, starting at 1. It may also be used to handle exceptions.

Ports beginning with `255` are reserved for Incus port forwarding: ports in this range are forwarded from the Incus host to Incus containers (defined in Terraform).

`514ZZ` is the syslog port range. Docker containers send their syslog to an Alloy syslog collector port, where `ZZ` is the application instance. The values only need to be unique on a given host and increment from `01`.

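
The numbering plan can be expressed as a pair of small shell helpers. This is a minimal sketch: `ouranos_port` and `ouranos_syslog_port` are hypothetical names that do not exist in the repository, and the example values (Angelia as project 222, MCP Switchboard as project 227) are taken from the host and route tables in this README.

```bash
# Hypothetical helpers illustrating the XXXYZ plan; not part of the repository.

# ouranos_port PROJECT SERVICE [INSTANCE]
#   PROJECT   three-digit project number (220 for external projects)
#   SERVICE   1-4 flexible, 5 database, 6 MCP, 7 API, 8 web app, 9 metrics
#   INSTANCE  single digit, defaults to 1
ouranos_port() {
    project=$1
    service=$2
    instance=${3:-1}
    case $project in
        [0-9][0-9][0-9]) ;;
        *) echo "project must be three digits: $project" >&2; return 1 ;;
    esac
    case $service in
        [1-9]) ;;
        *) echo "service must be 1-9 (0 is reserved): $service" >&2; return 1 ;;
    esac
    case $instance in
        [0-9]) ;;
        *) echo "instance must be a single digit: $instance" >&2; return 1 ;;
    esac
    port="${project}${service}${instance}"
    # TCP ports stop at 65535, so not every three-digit project number fits.
    if [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi
    echo "$port"
}

# ouranos_syslog_port INSTANCE -> 514ZZ, with ZZ zero-padded to two digits
ouranos_syslog_port() {
    printf '514%02d\n' "$1"
}

ouranos_port 222 8 1      # Angelia web app, first instance: 22281
ouranos_port 227 8 1      # MCP Switchboard web app: 22781
ouranos_syslog_port 1     # first syslog listener on a host: 51401
```

Rejecting service digit 0 and out-of-range results in the helper keeps a typo from silently producing a port that collides with a reserved or invalid value.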
## External Access via HAProxy

Titania provides TLS termination and reverse proxy for all services.

- **Base domain**: `ouranos.helu.ca`
- **HTTPS**: port 443 (standard)
- **HTTP**: port 80 (redirects to HTTPS)
- **Certificate**: Let's Encrypt wildcard via certbot DNS-01

### Route Table

| Subdomain | Backend | Service |
|-----------|---------|---------|
| `ouranos.helu.ca` (root) | puck.incus:22281 | Angelia (Django) |
| `alertmanager.ouranos.helu.ca` | prospero.incus:443 (SSL) | AlertManager |
| `angelia.ouranos.helu.ca` | puck.incus:22281 | Angelia (Django) |
| `anythingllm.ouranos.helu.ca` | rosalind.incus:22084 | AnythingLLM |
| `arke.ouranos.helu.ca` | sycorax.incus:25540 | Arke LLM Proxy |
| `athena.ouranos.helu.ca` | puck.incus:22481 | Athena (Django) |
| `gitea.ouranos.helu.ca` | rosalind.incus:22082 | Gitea |
| `grafana.ouranos.helu.ca` | prospero.incus:443 (SSL) | Grafana |
| `hass.ouranos.helu.ca` | oberon.incus:8123 | Home Assistant |
| `id.ouranos.helu.ca` | titania.incus:22081 | Casdoor SSO |
| `icarlos.ouranos.helu.ca` | puck.incus:22681 | Icarlos (Django) |
| `jupyterlab.ouranos.helu.ca` | puck.incus:22071 | JupyterLab (OAuth2-Proxy) |
| `kairos.ouranos.helu.ca` | puck.incus:22581 | Kairos (Django) |
| `lobechat.ouranos.helu.ca` | rosalind.incus:22081 | LobeChat |
| `loki.ouranos.helu.ca` | prospero.incus:443 (SSL) | Loki |
| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22781 | MCP Switchboard |
| `nextcloud.ouranos.helu.ca` | rosalind.incus:22083 | Nextcloud |
| `openwebui.ouranos.helu.ca` | oberon.incus:22088 | Open WebUI |
| `peitho.ouranos.helu.ca` | puck.incus:22981 | Peitho (Django) |
| `pgadmin.ouranos.helu.ca` | prospero.incus:443 (SSL) | PgAdmin 4 |
| `prometheus.ouranos.helu.ca` | prospero.incus:443 (SSL) | Prometheus |
| `searxng.ouranos.helu.ca` | oberon.incus:22073 | SearXNG (OAuth2-Proxy) |
| `smtp4dev.ouranos.helu.ca` | oberon.incus:22085 | smtp4dev |
| `spelunker.ouranos.helu.ca` | puck.incus:22881 | Spelunker (Django) |

---

## Infrastructure Management

### Quick Start

```bash
# Provision containers
cd terraform
terraform init
terraform plan
terraform apply

# Start all containers
cd ../ansible
source ~/env/ouranos/bin/activate
ansible-playbook sandbox_up.yml

# Deploy all services
ansible-playbook site.yml

# Stop all containers
ansible-playbook sandbox_down.yml
```

### Terraform Workflow

1. **Define** — Containers, networks, and resources in `*.tf` files
2. **Plan** — Review changes with `terraform plan`
3. **Apply** — Provision with `terraform apply`
4. **Verify** — Check outputs and container status

### Ansible Workflow

1. **Bootstrap** — Update packages, install essentials (`apt_update.yml`)
2. **Agents** — Deploy Alloy (log/metrics) and Node Exporter on all hosts
3. **Services** — Configure databases, Docker, applications, observability
4. **Verify** — Check service health and connectivity

### Vault Management

```bash
# Edit secrets
ansible-vault edit inventory/group_vars/all/vault.yml

# View secrets
ansible-vault view inventory/group_vars/all/vault.yml

# Encrypt a new file
ansible-vault encrypt new_secrets.yml
```

---

## S3 Storage Provisioning

Terraform provisions Incus S3 buckets for services requiring object storage:

| Service | Host | Purpose |
|---------|------|---------|
| **Casdoor** | Titania | User avatars and SSO resource storage |
| **LobeChat** | Rosalind | File uploads and attachments |

> S3 credentials (access key, secret key, endpoint) are stored as sensitive Terraform outputs and managed in Ansible Vault with the `vault_*_s3_*` prefix.

---

## Ansible Automation

### Full Deployment (`site.yml`)

Playbooks run in dependency order:

| Playbook | Hosts | Purpose |
|----------|-------|---------|
| `apt_update.yml` | All | Update packages and install essentials |
| `alloy/deploy.yml` | All | Grafana Alloy log/metrics collection |
| `prometheus/node_deploy.yml` | All | Node Exporter metrics |
| `docker/deploy.yml` | Oberon, Ariel, Miranda, Puck, Rosalind, Sycorax, Caliban, Titania | Docker engine |
| `smtp4dev/deploy.yml` | Oberon | SMTP test server |
| `pplg/deploy.yml` | Prospero | Full observability stack + HAProxy + OAuth2-Proxy |
| `postgresql/deploy.yml` | Portia | PostgreSQL with all databases |
| `postgresql_ssl/deploy.yml` | Titania | Dedicated PostgreSQL for Casdoor |
| `neo4j/deploy.yml` | Ariel | Neo4j graph database |
| `searxng/deploy.yml` | Oberon | SearXNG privacy search |
| `haproxy/deploy.yml` | Titania | HAProxy TLS termination and routing |
| `casdoor/deploy.yml` | Titania | Casdoor SSO |
| `mcpo/deploy.yml` | Miranda | MCPO MCP proxy |
| `openwebui/deploy.yml` | Oberon | Open WebUI LLM interface |
| `hass/deploy.yml` | Oberon | Home Assistant |
| `gitea/deploy.yml` | Rosalind | Gitea self-hosted Git |
| `nextcloud/deploy.yml` | Rosalind | Nextcloud collaboration |

### Individual Service Deployments

Services with standalone deploy playbooks (not in `site.yml`):

| Playbook | Host | Service |
|----------|------|---------|
| `anythingllm/deploy.yml` | Rosalind | AnythingLLM document AI |
| `arke/deploy.yml` | Sycorax | Arke LLM proxy |
| `argos/deploy.yml` | Miranda | Argos MCP web search server |
| `caliban/deploy.yml` | Caliban | Agent S MCP Server |
| `certbot/deploy.yml` | Titania | Let's Encrypt certificate renewal |
| `gitea_mcp/deploy.yml` | Miranda | Gitea MCP Server |
| `gitea_runner/deploy.yml` | Puck | Gitea CI/CD runner |
| `grafana_mcp/deploy.yml` | Miranda | Grafana MCP Server |
| `jupyterlab/deploy.yml` | Puck | JupyterLab + OAuth2-Proxy |
| `kernos/deploy.yml` | Caliban | Kernos MCP shell server |
| `lobechat/deploy.yml` | Rosalind | LobeChat AI chat |
| `rommie/deploy.yml` | Caliban | Rommie MCP server (Agent S GUI automation) |
| `neo4j_mcp/deploy.yml` | Miranda | Neo4j MCP Server |
| `freecad_mcp/deploy.yml` | Caliban | FreeCAD Robust MCP Server |
| `rabbitmq/deploy.yml` | Oberon | RabbitMQ message queue |

### Lifecycle Playbooks

| Playbook | Purpose |
|----------|---------|
| `sandbox_up.yml` | Start all Uranian host containers |
| `sandbox_down.yml` | Gracefully stop all containers |
| `apt_update.yml` | Update packages on all hosts |
| `site.yml` | Full deployment orchestration |

---

## Data Flow Architecture

### Observability Pipeline

```
All Hosts                   Prospero                          Alerts
Alloy + Node Exporter  →    Prometheus + Loki + Grafana  →    AlertManager + Pushover
collect metrics & logs      storage & visualisation           notifications
```

### Integration Points

| Consumer | Provider | Connection |
|----------|----------|-----------|
| All LLM apps | Arke (Sycorax) | `http://sycorax.incus:25540` |
| Open WebUI, Arke, Gitea, Nextcloud, LobeChat | PostgreSQL (Portia) | `portia.incus:5432` |
| Neo4j MCP | Neo4j (Ariel) | `ariel.incus:7687` (Bolt) |
| MCP Switchboard | Docker API (Miranda) | `tcp://miranda.incus:2375` |
| MCP Switchboard | RabbitMQ (Oberon) | `oberon.incus:5672` |
| Kairos, Spelunker | RabbitMQ (Oberon) | `oberon.incus:5672` |
| SMTP (all apps) | smtp4dev (Oberon) | `oberon.incus:22025` |
| All hosts | Loki (Prospero) | `http://prospero.incus:3100` |
| All hosts | Prometheus (Prospero) | `http://prospero.incus:9090` |

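
When checking connectivity, the Connection values above first need to be split into a host and a port. The sketch below shows one way to do that with plain parameter expansion. Both function names are hypothetical, and `probe_endpoint` assumes a netcat build that supports `-z` and `-w`; it only works from a machine that can resolve `.incus` names.

```bash
# Hypothetical helpers; not part of the repository.

# endpoint_host_port ENDPOINT -> prints "HOST PORT"
endpoint_host_port() {
    endpoint=${1#*://}          # drop an optional scheme such as http:// or tcp://
    endpoint=${endpoint%%/*}    # drop any trailing path
    echo "${endpoint%%:*} ${endpoint##*:}"
}

# probe_endpoint ENDPOINT -> exit status 0 if the TCP port accepts connections
# (assumes netcat with -z and -w; needs access to the Incus network)
probe_endpoint() {
    set -- $(endpoint_host_port "$1")
    nc -z -w 2 "$1" "$2"
}

endpoint_host_port "http://sycorax.incus:25540"   # sycorax.incus 25540
endpoint_host_port "ariel.incus:7687"             # ariel.incus 7687

# Example probe, to be run inside the lab:
#   probe_endpoint "portia.incus:5432" && echo "PostgreSQL reachable"
```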
---

## Important Notes

⚠️ **Alloy Host Variables Required** — Every host with `alloy` in its `services` list must define `alloy_log_level` in `inventory/host_vars/<host>.incus.yml`. The playbook will fail with an undefined variable error if this is missing.

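
One way to catch the missing variable before a playbook run is a quick scan of the host variable files. The function below is a hypothetical sketch, not part of the repository. It assumes each `services` entry is written as a block-style `- alloy` line and that `alloy_log_level` is a top-level key, so adjust the patterns if the files are laid out differently.

```bash
# Hypothetical pre-flight check; not part of the repository.
# check_alloy_vars [HOST_VARS_DIR]
#   Prints every <host>.incus.yml that lists alloy as a service but does not
#   define alloy_log_level, and returns non-zero if any such file is found.
check_alloy_vars() {
    dir=${1:-inventory/host_vars}
    rc=0
    for f in "$dir"/*.incus.yml; do
        [ -e "$f" ] || continue    # glob matched nothing
        if grep -Eq '^[[:space:]]*-[[:space:]]*alloy[[:space:]]*$' "$f" &&
           ! grep -Eq '^alloy_log_level:' "$f"; then
            echo "missing alloy_log_level: $f"
            rc=1
        fi
    done
    return $rc
}

# Usage, from the ansible directory:
#   check_alloy_vars inventory/host_vars && ansible-playbook alloy/deploy.yml
```

The check is a plain text match rather than a YAML parse, so it trades accuracy for having no dependencies.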
⚠️ **Alloy Syslog Listeners Required for Docker Services** — Any Docker Compose service using the syslog logging driver must have a corresponding `loki.source.syslog` listener in the host's Alloy config template (`ansible/alloy/<hostname>/config.alloy.j2`). Missing listeners cause Docker containers to fail on start.

⚠️ **Local Terraform State** — This project uses local Terraform state (no remote backend). Do not run `terraform apply` from multiple machines simultaneously.

⚠️ **Nested Docker** — Docker runs inside Incus containers (nested), which requires `security.nesting = true` and an `lxc.apparmor.profile=unconfined` AppArmor override on all Docker-enabled hosts.

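
A quick way to review the nesting setting is to generate one `incus config get` call per Docker-enabled host. The host list below is copied from the `docker/deploy.yml` row of the `site.yml` table, and the function name is hypothetical. The commands are only printed, so they can be reviewed before being run on the Incus host.

```bash
# Hypothetical helper; not part of the repository.
# Hosts that receive the Docker engine according to the site.yml table.
docker_hosts="oberon ariel miranda puck rosalind sycorax caliban titania"

# Print one `incus config get` command per Docker-enabled host.
nesting_check_commands() {
    for h in $docker_hosts; do
        echo "incus config get $h security.nesting"
    done
}

nesting_check_commands
# To execute on the Incus host instead of printing:
#   nesting_check_commands | sh
```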
⚠️ **Deployment Order** — Prospero (observability) must be fully deployed before other hosts, as Alloy on every host pushes logs and metrics to `prospero.incus`. Run `pplg/deploy.yml` before `site.yml` on a fresh environment.

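
On a fresh environment, this ordering could be scripted as follows. This is a sketch under assumptions: the `run` and `fresh_deploy` helpers are hypothetical, the script is expected to run from the `ansible` directory with the virtualenv active, and it only prints the commands unless `DRY_RUN=0` is set.

```bash
# Hypothetical wrapper; not part of the repository.
# Dry-run by default: set DRY_RUN=0 to actually execute the playbooks.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Start the containers, deploy Prospero first, then everything else.
# The && chain stops at the first playbook that fails.
fresh_deploy() {
    run ansible-playbook sandbox_up.yml &&
    run ansible-playbook pplg/deploy.yml &&
    run ansible-playbook site.yml
}

fresh_deploy
```

Chaining the steps with `&&` means `site.yml` never starts if the observability stack failed to deploy.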