# Ouranos Lab

Infrastructure-as-Code project managing the **Ouranos Lab** — a development sandbox at [ouranos.helu.ca](https://ouranos.helu.ca). Uses **Terraform** for container provisioning and **Ansible** for configuration management, themed around the moons of Uranus.

---

## Project Overview

| Component | Purpose |
|-----------|---------|
| **Terraform** | Provisions 10 specialised Incus containers (LXC) with DNS-resolved networking, security policies, and resource dependencies |
| **Ansible** | Deploys Docker, databases (PostgreSQL, Neo4j), observability stack (Prometheus, Grafana, Loki), and application runtimes across all hosts |

> **DNS Domain**: Incus resolves containers via the `.incus` domain suffix (e.g., `oberon.incus`, `portia.incus`). IPv4 addresses are dynamically assigned — always use DNS names, never hardcode IPs.

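
The DNS rule above can be checked from any machine that resolves the `.incus` suffix. The sketch below is an illustration only and is not part of the repository: the function names `incus_fqdn` and `resolve_incus_host` are hypothetical, and it assumes `getent` (glibc) is available and that the local resolver is configured for Incus DNS. Calling `resolve_incus_host oberon` would print whatever address Incus currently assigns, which is why scripts should resolve at run time instead of storing the result.

```bash
# Build the Incus DNS name for a container, e.g. "oberon" -> "oberon.incus".
incus_fqdn() {
  printf '%s.incus\n' "$1"
}

# Print the current IPv4 address of a container, looked up through the
# system resolver. Prints nothing and returns non-zero if the name does
# not resolve.
resolve_incus_host() {
  local addr
  addr="$(getent ahostsv4 "$(incus_fqdn "$1")" | awk 'NR == 1 { print $1 }')"
  [ -n "$addr" ] && printf '%s\n' "$addr"
}
```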
---

## Uranian Host Architecture

All containers are named after moons of Uranus and resolved via the `.incus` DNS suffix.

| Name | Role | Description | Nesting |
|------|------|-------------|---------|
| **ariel** | graph_database | Neo4j — Ethereal graph connections | ✔ |
| **caliban** | agent_automation | Agent S MCP Server with MATE Desktop | ✔ |
| **miranda** | mcp_docker_host | Dedicated Docker Host for MCP Servers | ✔ |
| **oberon** | container_orchestration | Docker Host — MCP Switchboard, RabbitMQ, Open WebUI | ✔ |
| **portia** | database | PostgreSQL — Relational database host | ❌ |
| **prospero** | observability | PPLG stack — Prometheus, Grafana, Loki, PgAdmin | ❌ |
| **puck** | application_runtime | Python App Host — JupyterLab, Django apps, Gitea Runner | ✔ |
| **rosalind** | collaboration | Gitea, LobeChat, Nextcloud, AnythingLLM | ✔ |
| **sycorax** | language_models | Arke LLM Proxy | ✔ |
| **titania** | proxy_sso | HAProxy TLS termination + Casdoor SSO | ✔ |

### oberon — Container Orchestration

King of the Fairies orchestrating containers and managing MCP infrastructure.

- Docker engine
- MCP Switchboard (port 22785) — Django app routing MCP tool calls
- RabbitMQ message queue
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia)
- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- smtp4dev SMTP test server (port 22025)

### portia — Relational Database

Intelligent and resourceful — the reliability of relational databases.

- PostgreSQL 17 (port 5432)
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `nextcloud`, `openwebui`, `periplus`, `spelunker`

### ariel — Graph Database

Air spirit — ethereal, interconnected nature mirroring graph relationships.

- Neo4j 5.26.0 (Docker)
- HTTP API: port 25584
- Bolt: port 25554

### puck — Application Runtime

Shape-shifting trickster embodying Python's versatility.

- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Home Assistant (port 8123)
- Django applications: Angelia (22281), Athena (22481), Kairos (22581), Icarlos (22681), Spelunker (22881), Peitho (22981)

### prospero — Observability Stack

Master magician observing all events.

- PPLG stack via Docker Compose: Prometheus, Loki, Grafana, PgAdmin
- Internal HAProxy with OAuth2-Proxy for all dashboards
- AlertManager with Pushover notifications
- Prometheus metrics collection (`node-exporter`, HAProxy, Loki)
- Loki log aggregation via Alloy (all hosts)
- Grafana dashboard suite with Casdoor SSO integration

### miranda — MCP Docker Host

Curious bridge between worlds — hosting MCP server containers.

- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy
- Grafana MCP Server (port 25533)
- Gitea MCP Server (port 25535)
- Neo4j MCP Server
- Argos MCP Server — web search via SearXNG (port 25534)

### sycorax — Language Models

Original magical power wielding language magic.

- Arke LLM API Proxy (port 25540)
- Multi-provider support (OpenAI, Anthropic, etc.)
- Session management with Memcached
- Database backend on Portia

### caliban — Agent Automation

Autonomous computer agent learning through environmental interaction.

- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22021)
- GPU passthrough for vision tasks
- RDP access (port 25521)

### rosalind — Collaboration Services

Witty and resourceful moon for PHP, Go, and Node.js runtimes.

- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume

### titania — Proxy & SSO Services

Queen of the Fairies managing access control and authentication.

- HAProxy 3.x with TLS termination (port 443)
- Let's Encrypt wildcard certificate via certbot DNS-01 (Namecheap)
- HTTP to HTTPS redirect (port 80)
- Gitea SSH proxy (port 22022)
- Casdoor SSO (port 22081, local PostgreSQL)
- Prometheus metrics at `:8404/metrics`

---

## External Access via HAProxy

Titania provides TLS termination and reverse proxy for all services.

- **Base domain**: `ouranos.helu.ca`
- **HTTPS**: port 443 (standard)
- **HTTP**: port 80 (redirects to HTTPS)
- **Certificate**: Let's Encrypt wildcard via certbot DNS-01

### Route Table

| Subdomain | Backend | Service |
|-----------|---------|---------|
| `ouranos.helu.ca` (root) | puck.incus:22281 | Angelia (Django) |
| `alertmanager.ouranos.helu.ca` | prospero.incus:443 (SSL) | AlertManager |
| `angelia.ouranos.helu.ca` | puck.incus:22281 | Angelia (Django) |
| `anythingllm.ouranos.helu.ca` | rosalind.incus:22084 | AnythingLLM |
| `arke.ouranos.helu.ca` | sycorax.incus:25540 | Arke LLM Proxy |
| `athena.ouranos.helu.ca` | puck.incus:22481 | Athena (Django) |
| `gitea.ouranos.helu.ca` | rosalind.incus:22082 | Gitea |
| `grafana.ouranos.helu.ca` | prospero.incus:443 (SSL) | Grafana |
| `hass.ouranos.helu.ca` | oberon.incus:8123 | Home Assistant |
| `id.ouranos.helu.ca` | titania.incus:22081 | Casdoor SSO |
| `icarlos.ouranos.helu.ca` | puck.incus:22681 | Icarlos (Django) |
| `jupyterlab.ouranos.helu.ca` | puck.incus:22071 | JupyterLab (OAuth2-Proxy) |
| `kairos.ouranos.helu.ca` | puck.incus:22581 | Kairos (Django) |
| `lobechat.ouranos.helu.ca` | rosalind.incus:22081 | LobeChat |
| `loki.ouranos.helu.ca` | prospero.incus:443 (SSL) | Loki |
| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22785 | MCP Switchboard |
| `nextcloud.ouranos.helu.ca` | rosalind.incus:22083 | Nextcloud |
| `openwebui.ouranos.helu.ca` | oberon.incus:22088 | Open WebUI |
| `peitho.ouranos.helu.ca` | puck.incus:22981 | Peitho (Django) |
| `pgadmin.ouranos.helu.ca` | prospero.incus:443 (SSL) | PgAdmin 4 |
| `prometheus.ouranos.helu.ca` | prospero.incus:443 (SSL) | Prometheus |
| `searxng.ouranos.helu.ca` | oberon.incus:22073 | SearXNG (OAuth2-Proxy) |
| `smtp4dev.ouranos.helu.ca` | oberon.incus:22085 | smtp4dev |
| `spelunker.ouranos.helu.ca` | puck.incus:22881 | Spelunker (Django) |

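
A quick way to spot-check entries in the route table is to request a subdomain through HAProxy and look at the HTTP status code. The sketch below is a minimal illustration, assuming `curl` is installed; `route_url` and `route_status` are hypothetical helper names, not scripts shipped with the project. Services behind OAuth2-Proxy or Casdoor may well answer with a redirect or an auth error rather than `200`, so treat the code as a reachability signal only. A loop such as `for name in grafana gitea nextcloud; do echo "$name $(route_status "$name")"; done` covers several routes at once.

```bash
BASE_DOMAIN="ouranos.helu.ca"

# Build the public URL for a subdomain from the route table.
# With no argument, return the URL of the root domain.
route_url() {
  if [ -n "${1:-}" ]; then
    printf 'https://%s.%s/\n' "$1" "$BASE_DOMAIN"
  else
    printf 'https://%s/\n' "$BASE_DOMAIN"
  fi
}

# Print the HTTP status code returned through HAProxy for one subdomain.
route_status() {
  curl --silent --show-error --output /dev/null \
       --max-time 10 --write-out '%{http_code}\n' "$(route_url "${1:-}")"
}
```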
---

## Infrastructure Management

### Quick Start

```bash
# Provision containers
cd terraform
terraform init
terraform plan
terraform apply

# Start all containers
cd ../ansible
source ~/env/agathos/bin/activate
ansible-playbook sandbox_up.yml

# Deploy all services
ansible-playbook site.yml

# Stop all containers
ansible-playbook sandbox_down.yml
```

### Terraform Workflow

1. **Define** — Containers, networks, and resources in `*.tf` files
2. **Plan** — Review changes with `terraform plan`
3. **Apply** — Provision with `terraform apply`
4. **Verify** — Check outputs and container status

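
The plan, apply, and verify steps above can be chained into a single pass. The sketch below assumes it is run from the `terraform` directory; the `run` and `terraform_cycle` helpers and the `tfplan` file name are illustrative and not part of the project. Saving the plan to a file and applying that file means the changes applied are the ones that were reviewed.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Plan, apply the saved plan, then show outputs and container status.
# Stops at the first step that fails.
terraform_cycle() {
  run terraform init &&
  run terraform plan -out=tfplan &&
  run terraform apply tfplan &&
  run terraform output &&
  run incus list
}
```

Running `DRY_RUN=1 terraform_cycle` prints the five commands without touching any infrastructure, which is a cheap way to confirm the sequence before using it for real.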
### Ansible Workflow

1. **Bootstrap** — Update packages, install essentials (`apt_update.yml`)
2. **Agents** — Deploy Alloy (log/metrics) and Node Exporter on all hosts
3. **Services** — Configure databases, Docker, applications, observability
4. **Verify** — Check service health and connectivity

### Vault Management

```bash
# Edit secrets
ansible-vault edit inventory/group_vars/all/vault.yml

# View secrets
ansible-vault view inventory/group_vars/all/vault.yml

# Encrypt a new file
ansible-vault encrypt new_secrets.yml
```

---

## S3 Storage Provisioning

Terraform provisions Incus S3 buckets for services requiring object storage:

| Service | Host | Purpose |
|---------|------|---------|
| **Casdoor** | Titania | User avatars and SSO resource storage |
| **LobeChat** | Rosalind | File uploads and attachments |

> S3 credentials (access key, secret key, endpoint) are stored as sensitive Terraform outputs and managed in Ansible Vault with the `vault_*_s3_*` prefix.

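
Moving a credential from Terraform into the vault usually means reading one sensitive output and pasting it under the matching vault variable. The sketch below is a hedged illustration of that hand-off: `vault_s3_var` and `tf_secret` are hypothetical helpers, the field names (`access_key` and so on) are assumptions based on the `vault_*_s3_*` pattern, and the real Terraform output names have to be looked up with `terraform output` in the `terraform` directory. For example, `tf_secret <output_name>` prints the value to paste into `ansible-vault edit inventory/group_vars/all/vault.yml` under the name given by `vault_s3_var casdoor access_key`.

```bash
# Compose a vault variable name following the vault_*_s3_* pattern,
# e.g. "casdoor" + "access_key" -> vault_casdoor_s3_access_key.
# The field names are assumptions, not confirmed project variables.
vault_s3_var() {
  printf 'vault_%s_s3_%s\n' "$1" "$2"
}

# Print one sensitive Terraform output as a raw string.
# The output name is supplied by the caller; none is assumed here.
tf_secret() {
  terraform output -raw "$1"
}
```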
---

## Ansible Automation

### Full Deployment (`site.yml`)

Playbooks run in dependency order:

| Playbook | Hosts | Purpose |
|----------|-------|---------|
| `apt_update.yml` | All | Update packages and install essentials |
| `alloy/deploy.yml` | All | Grafana Alloy log/metrics collection |
| `prometheus/node_deploy.yml` | All | Node Exporter metrics |
| `docker/deploy.yml` | Oberon, Ariel, Miranda, Puck, Rosalind, Sycorax, Caliban, Titania | Docker engine |
| `smtp4dev/deploy.yml` | Oberon | SMTP test server |
| `pplg/deploy.yml` | Prospero | Full observability stack + HAProxy + OAuth2-Proxy |
| `postgresql/deploy.yml` | Portia | PostgreSQL with all databases |
| `postgresql_ssl/deploy.yml` | Titania | Dedicated PostgreSQL for Casdoor |
| `neo4j/deploy.yml` | Ariel | Neo4j graph database |
| `searxng/deploy.yml` | Oberon | SearXNG privacy search |
| `haproxy/deploy.yml` | Titania | HAProxy TLS termination and routing |
| `casdoor/deploy.yml` | Titania | Casdoor SSO |
| `mcpo/deploy.yml` | Miranda | MCPO MCP proxy |
| `openwebui/deploy.yml` | Oberon | Open WebUI LLM interface |
| `hass/deploy.yml` | Oberon | Home Assistant |
| `gitea/deploy.yml` | Rosalind | Gitea self-hosted Git |
| `nextcloud/deploy.yml` | Rosalind | Nextcloud collaboration |

### Individual Service Deployments

Services with standalone deploy playbooks (not in `site.yml`):

| Playbook | Host | Service |
|----------|------|---------|
| `anythingllm/deploy.yml` | Rosalind | AnythingLLM document AI |
| `arke/deploy.yml` | Sycorax | Arke LLM proxy |
| `argos/deploy.yml` | Miranda | Argos MCP web search server |
| `caliban/deploy.yml` | Caliban | Agent S MCP Server |
| `certbot/deploy.yml` | Titania | Let's Encrypt certificate renewal |
| `gitea_mcp/deploy.yml` | Miranda | Gitea MCP Server |
| `gitea_runner/deploy.yml` | Puck | Gitea CI/CD runner |
| `grafana_mcp/deploy.yml` | Miranda | Grafana MCP Server |
| `jupyterlab/deploy.yml` | Puck | JupyterLab + OAuth2-Proxy |
| `kernos/deploy.yml` | Caliban | Kernos MCP shell server |
| `lobechat/deploy.yml` | Rosalind | LobeChat AI chat |
| `neo4j_mcp/deploy.yml` | Miranda | Neo4j MCP Server |
| `rabbitmq/deploy.yml` | Oberon | RabbitMQ message queue |

### Lifecycle Playbooks

| Playbook | Purpose |
|----------|---------|
| `sandbox_up.yml` | Start all Uranian host containers |
| `sandbox_down.yml` | Gracefully stop all containers |
| `apt_update.yml` | Update packages on all hosts |
| `site.yml` | Full deployment orchestration |

---

## Data Flow Architecture

### Observability Pipeline

```
All Hosts                 Prospero                        Alerts
Alloy + Node Exporter  →  Prometheus + Loki + Grafana  →  AlertManager + Pushover
collect metrics & logs    storage & visualisation         notifications
```

### Integration Points

| Consumer | Provider | Connection |
|----------|----------|-----------|
| All LLM apps | Arke (Sycorax) | `http://sycorax.incus:25540` |
| Open WebUI, Arke, Gitea, Nextcloud, LobeChat | PostgreSQL (Portia) | `portia.incus:5432` |
| Neo4j MCP | Neo4j (Ariel) | `ariel.incus:7687` (Bolt) |
| MCP Switchboard | Docker API (Miranda) | `tcp://miranda.incus:2375` |
| MCP Switchboard | RabbitMQ (Oberon) | `oberon.incus:5672` |
| Kairos, Spelunker | RabbitMQ (Oberon) | `oberon.incus:5672` |
| SMTP (all apps) | smtp4dev (Oberon) | `oberon.incus:22025` |
| All hosts | Loki (Prospero) | `http://prospero.incus:3100` |
| All hosts | Prometheus (Prospero) | `http://prospero.incus:9090` |

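
When a consumer cannot reach its provider, a plain TCP probe against the connection strings in the table narrows the problem down quickly. The sketch below is a minimal illustration, assuming `bash` and the coreutils `timeout` command are available on the machine doing the probing; `endpoint_host`, `endpoint_port`, and `tcp_open` are hypothetical names. It only confirms that the port accepts connections, not that the service behind it is healthy. A typical call is `tcp_open portia.incus:5432 && echo ok || echo unreachable`.

```bash
# Split a "host:port" string into its two parts.
endpoint_host() { printf '%s\n' "${1%:*}"; }
endpoint_port() { printf '%s\n' "${1##*:}"; }

# Return 0 if a TCP connection to "host:port" succeeds within 3 seconds.
# Relies on bash's /dev/tcp redirection, so the probe itself runs in bash.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ \
    "$(endpoint_host "$1")" "$(endpoint_port "$1")" 2>/dev/null
}
```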
---

## Important Notes

⚠️ **Alloy Host Variables Required** — Every host with `alloy` in its `services` list must define `alloy_log_level` in `inventory/host_vars/<host>.incus.yml`. The playbook will fail with an undefined variable error if this is missing.

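
A pre-flight check can catch a missing `alloy_log_level` before the playbook fails. The sketch below is an illustration under two assumptions: it is run from the `ansible` directory, and the variable is defined as a top-level key in each host file. `missing_alloy_log_level` is a hypothetical helper, and it checks every `*.incus.yml` file rather than only hosts that list `alloy` in `services`, which matches `site.yml` deploying Alloy to all hosts.

```bash
# Print every host_vars file that does not define alloy_log_level.
# Prints nothing when all files define it.
missing_alloy_log_level() {
  local dir="${1:-inventory/host_vars}" file
  for file in "$dir"/*.incus.yml; do
    [ -e "$file" ] || continue          # no matching files at all
    grep -q '^alloy_log_level:' "$file" || printf '%s\n' "$file"
  done
}
```

Running `missing_alloy_log_level` with no argument scans `inventory/host_vars`; any path it prints needs the variable added before deploying.
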
⚠️ **Alloy Syslog Listeners Required for Docker Services** — Any Docker Compose service using the syslog logging driver must have a corresponding `loki.source.syslog` listener in the host's Alloy config template (`ansible/alloy/<hostname>/config.alloy.j2`). Missing listeners cause Docker containers to fail on start.

⚠️ **Local Terraform State** — This project uses local Terraform state (no remote backend). Do not run `terraform apply` from multiple machines simultaneously.

⚠️ **Nested Docker** — Docker runs inside Incus containers (nested), requiring `security.nesting = true` and `lxc.apparmor.profile=unconfined` AppArmor override on all Docker-enabled hosts.

⚠️ **Deployment Order** — Prospero (observability) must be fully deployed before other hosts, as Alloy on every host pushes logs and metrics to `prospero.incus`. Run `pplg/deploy.yml` before `site.yml` on a fresh environment.

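
One reading of the deployment-order note, for a fresh environment, is the sequence sketched below. This is an assumption about how the pieces fit together, not a documented procedure: `fresh_deploy_order` and `fresh_deploy` are hypothetical helpers, the sketch assumes it is run from the `ansible` directory with the virtualenv already activated, and it assumes that re-running `pplg/deploy.yml` as part of `site.yml` afterwards is harmless.

```bash
# Playbooks for a first deployment, in the order they should run:
# start the containers, bring up Prospero, then deploy everything.
fresh_deploy_order() {
  printf '%s\n' sandbox_up.yml pplg/deploy.yml site.yml
}

# Run the playbooks in order and stop at the first failure.
fresh_deploy() {
  local playbook
  for playbook in $(fresh_deploy_order); do
    ansible-playbook "$playbook" || return 1
  done
}
```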