docs(env): expand .env.example into full compose interpolation template
All checks were successful
CVE Scan & Docker Build / security-scan (push) Successful in 51s
CVE Scan & Docker Build / build-and-push (push) Successful in 3m3s

Replace the minimal placeholder .env.example with a comprehensive template
documenting every variable consumed by docker-compose.yaml, organized by
service (Django core, HTTP, Postgres, Neo4j, Memcached, S3/MinIO, Daedalus,
Celery/RabbitMQ, etc.). Clarifies that this file is rendered from an Ansible
Jinja2 template with vaulted secrets in production, and distinguishes it
from the in-tree mnemosyne/.env used for bare-Python development.
This commit is contained in:
2026-05-04 07:04:28 -04:00
parent d84f0e548b
commit 003f958f7b
4 changed files with 623 additions and 68 deletions

View File

@@ -31,16 +31,41 @@ All containers are named after moons of Uranus and resolved via the `.incus` DNS
| **rosalind** | collaboration | Gitea, LobeChat, Nextcloud, AnythingLLM | ✔ |
| **sycorax** | language_models | Arke LLM Proxy | ✔ |
| **titania** | proxy_sso | HAProxy TLS termination + Casdoor SSO | ✔ |
| **umbriel** | graph_database | Neo4j (Mnemosyne) — dedicated memory graph | ✔ |
### oberon — Container Orchestration
### puck — Project Application Runtime
Shape-shifting trickster embodying Python's versatility.
This is the host that runs Python projects in the Ouranos sandbox.
It has an RDP server and is generally where application development happens.
Each project has a number that is used to determine port numbers.
- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Django Projects: Zelus (221), Angelia (222), Athena (224), Kairos (225), Icarlos (226), MCP Switchboard (227), Spelunker (228), Peitho (229), Mnemosyne (230)
- FastAgent Projects: Pallas (240)
- FastAPI Projects: Daedalus (200), Arke (201) Kernos (202), Rommie (203), Orpheus (204), Periplus (205), Nike (206), Stentor (207)
### caliban — Agent Automation
Autonomous computer agent learning through environmental interaction.
- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22062)
- Rommie MCP Server (port 22061) — agent-to-agent GUI automation via Agent S
- FreeCAD Robust MCP Server (port 22063) — CAD automation via FreeCAD XML-RPC
- GPU passthrough
- RDP access (port 25521)
### oberon — Container Orchestration & Dockerized Shared Services
King of the Fairies orchestrating containers and managing MCP infrastructure.
- Docker engine
- MCP Switchboard (port 22785) — Django app routing MCP tool calls
- MCP Switchboard (port 22781) — Django app routing MCP tool calls
- RabbitMQ message queue
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia)
- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- smtp4dev SMTP test server (port 22025)
### portia — Relational Database
@@ -48,25 +73,38 @@ King of the Fairies orchestrating containers and managing MCP infrastructure.
Intelligent and resourceful — the reliability of relational databases.
- PostgreSQL 17 (port 5432)
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `nextcloud`, `openwebui`, `periplus`, `spelunker`
- Databases: `arke`, `anythingllm`, `gitea`, `hass`, `lobechat`, `mcp_switchboard`, `mnemosyne`, `nextcloud`, `openwebui`, `periplus`, `spelunker`
### ariel — Graph Database
Air spirit — ethereal, interconnected nature mirroring graph relationships.
- Neo4j 5.26.0 (Docker)
- HTTP API: port 25584
- Bolt: port 25554
- HTTP API: port 25554
- Bolt: port 7687 (reached as `ariel.incus:7687` on the internal network)
### puck — Application Runtime
### umbriel — Graph Database (Mnemosyne)
Shape-shifting trickster embodying Python's versatility.
Dusky melancholy sprite from Pope's *Rape of the Lock* — keeper of the Cave of
Spleen, naturally paired with Mnemosyne the Titan of memory. Dedicated Neo4j
instance so Mnemosyne's `Library`/`Collection`/`Item`/`Chunk`/`Concept` labels,
vector indexes, and schema migrations can't collide with another tenant's
graph on Ariel.
- Docker engine
- JupyterLab (port 22071 via OAuth2-Proxy)
- Gitea Runner (CI/CD agent)
- Home Assistant (port 8123)
- Django applications: Angelia (22281), Athena (22481), Kairos (22581), Icarlos (22681), Spelunker (22881), Peitho (22981)
- Neo4j 5.26.0 (Docker)
- HTTP Browser: port 25555
- Bolt: port 7687 (reached as `umbriel.incus:7687` on the internal network)
### miranda — MCP Docker Host
Curious bridge between worlds — hosting MCP server containers.
- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy 22071
- Argos MCP Server — web search via SearXNG (port 22062)
- Grafana MCP Server (port 22063)
- Neo4j MCP Server (port 22064)
- Gitea MCP Server (port 22065)
### prospero — Observability Stack
@@ -79,16 +117,18 @@ Master magician observing all events.
- Loki log aggregation via Alloy (all hosts)
- Grafana dashboard suite with Casdoor SSO integration
### mirandaMCP Docker Host
### rosalind — Third Party Applications for testing and evaluation
Curious bridge between worlds — hosting MCP server containers.
Witty and resourceful moon for PHP, Go, and Node.js runtimes.
- Docker engine (API exposed on port 2375 for MCP Switchboard)
- MCPO OpenAI-compatible MCP proxy
- Grafana MCP Server (port 25533)
- Gitea MCP Server (port 25535)
- Neo4j MCP Server
- Argos MCP Server — web search via SearXNG (port 25534)
- SearXNG privacy search (port 22083, behind OAuth2-Proxy)
- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume
- Open WebUI LLM interface (port 22088, PostgreSQL backend on Portia
- Home Assistant (port 8123)
### sycorax — Language Models
@@ -99,26 +139,6 @@ Original magical power wielding language magic.
- Session management with Memcached
- Database backend on Portia
### caliban — Agent Automation
Autonomous computer agent learning through environmental interaction.
- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22021)
- GPU passthrough for vision tasks
- RDP access (port 25521)
### rosalind — Collaboration Services
Witty and resourceful moon for PHP, Go, and Node.js runtimes.
- Gitea self-hosted Git (port 22082, SSH on 22022)
- LobeChat AI chat interface (port 22081)
- Nextcloud file sharing and collaboration (port 22083)
- AnythingLLM document AI workspace (port 22084)
- Nextcloud data on dedicated Incus storage volume
### titania — Proxy & SSO Services
Queen of the Fairies managing access control and authentication.
@@ -132,6 +152,100 @@ Queen of the Fairies managing access control and authentication.
---
## Port Numbering
Well-known ports running as a service may be used: Postgresql 5432, Prometheus Metrics 9100.
However inside a docker project, the number plan needs to be followed to avoid port conflicts and confusion:
XXXYZ
XXX Project Number or 220 for external project
Y Service: 0 reserved, 1-4 flexible, 5 database, 6 MCP, 7 API, 8 Web App, 9 Prometheus metrics
Z Instance: The running instance of this app on the same host, starting at 1. May also be used to handle exceptions.
255 Incus port forwarding: Ports in ths range are forwarded from the Incus host to Incus containers (defined in Terraform)
514ZZ is the syslog port. Docker containers send their syslog to an Alloy syslog collector port. ZZ is the application instance, they just need to be different on the same host and increment from 01.
---
## Application Conventions
Standards that all services deployed in Ouranos MUST follow. For full logging standards and anti-patterns, see [red_panda_standards.md](red_panda_standards.md).
### Health Check Endpoints
All services MUST expose Kubernetes-style health endpoints:
| Endpoint | Purpose | Auth |
|----------|---------|------|
| `GET /live` | **Liveness** — process is running and accepting connections | None |
| `GET /ready` | **Readiness** — process is running AND all dependencies (DB, cache, upstream APIs) are healthy | None |
| `GET /metrics` | Prometheus metrics (see below) | IP-restricted |
- HAProxy checks `health_path` (typically `/ready/`) for backend health — return HTTP 200 when healthy
- Health endpoints MUST NOT require authentication (no JWT, no session)
- Third-party services use their native health paths (e.g., `/api/health`, `/api/healthz`, `/-/healthy`)
### Health Checks in Docker Compose
Use `curl -f` for Docker Compose healthchecks. Install curl in images if needed.
```yaml
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/live"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
```
### Logging Conventions
Log output flows through: **App → syslog (RFC3164) → Alloy → Loki → Grafana**
| Level | Usage |
|-------|-------|
| **ERROR** | Broken state requiring human action — always include `exc_info=True`, error type, and context |
| **WARNING** | Degraded but recovering — client disconnects, performance outliers, client-side exceptions, leaked markup |
| **INFO** | Lifecycle events — service start/stop, connections, requests completed, jobs finished |
| **DEBUG** | Diagnostic detail — SSE events, keepalive pings, health check 200 responses, negotiation steps |
**Health check responses MUST be logged at DEBUG only.** HAProxy and Prometheus probe endpoints every 15-30 seconds. Logging these at INFO floods syslog with thousands of identical `200 OK` lines per hour, burying real events.
### Protected vs Unprotected Endpoints
| Protected (require valid JWT) | Unprotected |
|-------------------------------|-------------|
| All `/api/v1/*` routes | `GET /live` |
| | `GET /ready` |
| | `GET /metrics` (IP-restricted to internal networks) |
| | `GET /api/auth/login-url` |
| | `POST /api/auth/token` |
| | `POST /api/v1/telemetry` (sendBeacon cannot set headers) |
### Prometheus Metrics
All services SHOULD expose `GET /metrics` in Prometheus exposition format, scraped by Prospero's Prometheus (default 15s interval).
- **IP-restricted** to internal networks only (`10.10.0.0/24`, `172.16.0.0/12`, `127.0.0.0/8`)
- Consider exposing: request counts/durations, error rates, active connections, queue depths, dependency health
### Browser Telemetry
Frontend/browser code MUST send telemetry data and errors back to the application's telemetry API:
- `POST /api/v1/telemetry` — unprotected (browser `sendBeacon` cannot set Authorization headers)
- Capture and report: JavaScript exceptions, performance metrics, user-facing errors
- Client-side exceptions should log as **WARNING** on the server (they indicate a problem but not a server-side failure)
### Docker Networking
- Use the **default Docker bridge network** for simple deployments
- Add additional named networks only when required (e.g., isolating database traffic) or explicitly requested
- Do not create custom network definitions for single-service Docker Compose stacks
---
## External Access via HAProxy
Titania provides TLS termination and reverse proxy for all services.
@@ -160,14 +274,13 @@ Titania provides TLS termination and reverse proxy for all services.
| `kairos.ouranos.helu.ca` | puck.incus:22581 | Kairos (Django) |
| `lobechat.ouranos.helu.ca` | rosalind.incus:22081 | LobeChat |
| `loki.ouranos.helu.ca` | prospero.incus:443 (SSL) | Loki |
| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22785 | MCP Switchboard |
| `mcp-switchboard.ouranos.helu.ca` | oberon.incus:22781 | MCP Switchboard |
| `nextcloud.ouranos.helu.ca` | rosalind.incus:22083 | Nextcloud |
| `openwebui.ouranos.helu.ca` | oberon.incus:22088 | Open WebUI |
| `peitho.ouranos.helu.ca` | puck.incus:22981 | Peitho (Django) |
| `periplus.ouranos.helu.ca` | puck.incus:20681 | Periplus (FastAPI + MCP via nginx) |
| `pgadmin.ouranos.helu.ca` | prospero.incus:443 (SSL) | PgAdmin 4 |
| `prometheus.ouranos.helu.ca` | prospero.incus:443 (SSL) | Prometheus |
| `freecad-mcp.ouranos.helu.ca` | caliban.incus:22032 | FreeCAD Robust MCP Server |
| `rommie.ouranos.helu.ca` | caliban.incus:22031 | Rommie MCP Server (Agent S GUI automation) |
| `searxng.ouranos.helu.ca` | oberon.incus:22073 | SearXNG (OAuth2-Proxy) |
| `smtp4dev.ouranos.helu.ca` | oberon.incus:22085 | smtp4dev |
| `spelunker.ouranos.helu.ca` | puck.incus:22881 | Spelunker (Django) |
@@ -187,7 +300,7 @@ terraform apply
# Start all containers
cd ../ansible
source ~/env/agathos/bin/activate
source ~/env/ouranos/bin/activate
ansible-playbook sandbox_up.yml
# Deploy all services
@@ -197,6 +310,35 @@ ansible-playbook site.yml
ansible-playbook sandbox_down.yml
```
### Python Virtual Environment Setup
The Ansible automation requires a Python virtual environment with the `ansible` package installed. Create and activate the environment from the `~` directory:
```bash
# Create virtual environment
cd ~
python3 -m venv env/ouranos
# Activate environment
source ~/env/ouranos/bin/activate
# Install Ansible
pip install ansible
pip install ansible-core
pip install ansible-community.postgresql
```
### Ansible Playbook Syntax Check
Before running playbooks, use the `apsc.sh` utility (in PATH) to quickly validate YAML syntax:
```bash
# From the ansible directory
apsc.sh
# This will check all YAML files in the current directory for syntax errors
```
### Terraform Workflow
1. **Define** — Containers, networks, and resources in `*.tf` files
@@ -204,6 +346,83 @@ ansible-playbook sandbox_down.yml
3. **Apply** — Provision with `terraform apply`
4. **Verify** — Check outputs and container status
### Terraform Import
When containers or other resources are created manually (outside Terraform) or need to be re-imported after recreation, use `terraform import` to sync the Terraform state with existing infrastructure.
#### Import Syntax
The correct import format for Incus resources requires quoting resource addresses with `for_each` keys and using the full ID including image fingerprints:
```bash
# Import a container with correct syntax
terraform import 'incus_instance.uranian_hosts["<name>"]' ouranos/<name>,image=<fingerprint>
```
#### Getting Image Fingerprints
First, get the fingerprint of the image resource from Terraform state:
```bash
cd terraform
terraform state show incus_image.noble | grep fingerprint
# Output: fingerprint = "75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644"
terraform state show incus_image.questing | grep fingerprint
# Output: fingerprint = "e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf"
```
#### Importing All Uranian Hosts
Replace containers missing from state (or re-import after manual recreation):
```bash
# Containers using noble image
terraform import 'incus_instance.uranian_hosts["ariel"]' ouranos/ariel,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["miranda"]' ouranos/miranda,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["oberon"]' ouranos/oberon,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["portia"]' ouranos/portia,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["prospero"]' ouranos/prospero,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["rosalind"]' ouranos/rosalind,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["sycorax"]' ouranos/sycorax,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["titania"]' ouranos/titania,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
terraform import 'incus_instance.uranian_hosts["umbriel"]' ouranos/umbriel,image=75cde3e755b0e657c05f67e03a42683217b233b0339448be747845747df58644
# Containers using questing image
terraform import 'incus_instance.uranian_hosts["caliban"]' ouranos/caliban,image=e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf
terraform import 'incus_instance.uranian_hosts["puck"]' ouranos/puck,image=e78dd4a406b7fa3592ed0a6048862260b3d2e50c76e32a6169930245c0a13fdf
```
#### Storage Bucket Import
For storage buckets, use the `<project>/<pool>/<name>` format:
```bash
terraform import incus_storage_bucket.<name> ouranos/default/<bucket-name>
```
#### Common Issues
1. **Import ID format errors**: Use quotes around resource addresses with `for_each` keys: `'incus_instance.uranian_hosts["name"]'`
2. **Image replacement on import**: Importing without specifying the image fingerprint will cause Terraform to replace the container on next apply. Always include `image=<fingerprint>` in the import ID.
3. **Tainted state**: If a resource shows "will be created" but already exists, it may be tainted. Remove from state and re-import:
```bash
terraform state rm 'incus_instance.uranian_hosts["name"]'
terraform import 'incus_instance.uranian_hosts["name"]' ouranos/name,image=<fingerprint>
```
#### Verify Import
After importing, verify with `terraform plan`:
```bash
terraform plan
# Should show: Plan: 0 to add, 0 to change, 0 to destroy
# (Minor "update in-place" changes are normal for state sync of computed attributes)
```
### Ansible Workflow
1. **Bootstrap** — Update packages, install essentials (`apt_update.yml`)
@@ -255,7 +474,7 @@ Playbooks run in dependency order:
| `pplg/deploy.yml` | Prospero | Full observability stack + HAProxy + OAuth2-Proxy |
| `postgresql/deploy.yml` | Portia | PostgreSQL with all databases |
| `postgresql_ssl/deploy.yml` | Titania | Dedicated PostgreSQL for Casdoor |
| `neo4j/deploy.yml` | Ariel | Neo4j graph database |
| `neo4j/deploy.yml` | Ariel, Umbriel | Neo4j graph database (Umbriel is the dedicated Mnemosyne instance) |
| `searxng/deploy.yml` | Oberon | SearXNG privacy search |
| `haproxy/deploy.yml` | Titania | HAProxy TLS termination and routing |
| `casdoor/deploy.yml` | Titania | Casdoor SSO |
@@ -282,7 +501,9 @@ Services with standalone deploy playbooks (not in `site.yml`):
| `jupyterlab/deploy.yml` | Puck | JupyterLab + OAuth2-Proxy |
| `kernos/deploy.yml` | Caliban | Kernos MCP shell server |
| `lobechat/deploy.yml` | Rosalind | LobeChat AI chat |
| `rommie/deploy.yml` | Caliban | Rommie MCP server (Agent S GUI automation) |
| `neo4j_mcp/deploy.yml` | Miranda | Neo4j MCP Server |
| `freecad_mcp/deploy.yml` | Caliban | FreeCAD Robust MCP Server |
| `rabbitmq/deploy.yml` | Oberon | RabbitMQ message queue |
### Lifecycle Playbooks
@@ -313,6 +534,7 @@ collect metrics & logs storage & visualisation notifications
| All LLM apps | Arke (Sycorax) | `http://sycorax.incus:25540` |
| Open WebUI, Arke, Gitea, Nextcloud, LobeChat | PostgreSQL (Portia) | `portia.incus:5432` |
| Neo4j MCP | Neo4j (Ariel) | `ariel.incus:7687` (Bolt) |
| Mnemosyne | Neo4j (Umbriel) | `umbriel.incus:7687` (Bolt) — dedicated tenant |
| MCP Switchboard | Docker API (Miranda) | `tcp://miranda.incus:2375` |
| MCP Switchboard | RabbitMQ (Oberon) | `oberon.incus:5672` |
| Kairos, Spelunker | RabbitMQ (Oberon) | `oberon.incus:5672` |