94 Commits

Author SHA1 Message Date
343b0e13d6 fix(certbot): harden renewal hook and fix permission errors
The renewal deploy-hook ran as the certbot user but lacked permissions to
write the combined PEM to /etc/haproxy/certs and to reload HAProxy,
causing silent failures that left a stale certificate in production until
expiry.

- Add certbot user to the haproxy group so it can write the combined PEM
- Grant certbot NOPASSWD sudo for `systemctl reload haproxy` only
- Make the Prometheus textfile directory group-owned by certbot (0775)
  so cert-metrics.sh can atomically update ssl_cert.prom
- Refactor renewal-hook.sh to always refresh cert metrics on exit via a
  trap, so expiry alerts still fire even when the hook itself is broken
- Replace `set -e` with explicit error handling and structured logging
2026-06-17 09:58:46 -04:00
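The trap-on-exit and atomic-update ideas described in the commit above can be sketched roughly as follows. This is only an illustration, not the contents of renewal-hook.sh or cert-metrics.sh: the function names (`log`, `write_metrics`, `run_hook`), the metric name `ssl_cert_hook_last_run_success`, and the `TEXTFILE_DIR` default are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch: function names, metric name, and paths are assumptions,
# not taken from the repository.

# Assumed Prometheus textfile directory; defaults to a throwaway temp dir here.
TEXTFILE_DIR="${TEXTFILE_DIR:-$(mktemp -d)}"
PROM_FILE="$TEXTFILE_DIR/ssl_cert.prom"

log() {
    # Structured key=value log line on stderr.
    printf 'level=%s msg="%s"\n' "$1" "$2" >&2
}

write_metrics() {
    # Write to a temp file in the same directory, then rename it into place.
    # A rename within one filesystem is atomic, so a scraper never reads a
    # half-written file.
    tmp="$(mktemp "$TEXTFILE_DIR/.ssl_cert.prom.XXXXXX")" || return 1
    printf 'ssl_cert_hook_last_run_success %s\n' "$1" > "$tmp" || return 1
    chmod 0644 "$tmp"
    mv -f "$tmp" "$PROM_FILE"
}

run_hook() (
    # The function body is a subshell, so the EXIT trap fires when this
    # function finishes, whether the deploy step succeeded or not.
    ok=0
    trap 'write_metrics "$ok" || log error "metrics refresh failed"' EXIT
    # Explicit error handling instead of `set -e`.
    if "$@"; then
        ok=1
        log info "deploy step succeeded"
        exit 0
    fi
    log error "deploy step failed"
    exit 1
)
```

Calling `run_hook some_deploy_command` runs the command and refreshes `ssl_cert.prom` either way, which is what lets an alert on the metric notice a broken hook.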
2f5a15eef5 chore(haproxy,terraform): harden haproxy stats and pin incus provider
- Add maxconn limit and HTTP timeouts to mitigate slowloris attacks
- Restrict stats endpoint to internal LAN and localhost only
- Hide HAProxy version on stats page
- Pin Incus Terraform provider to ~> 1.0 for stability
2026-06-09 22:52:23 -04:00
35061e3b6d Caliban: Update Rommie port 2026-06-07 08:14:55 -04:00
95682eca61 Caliban: configure Kernos mcp api key 2026-06-07 08:14:39 -04:00
711bbc093b Caliban: Update llama cpp ports 2026-06-07 08:14:18 -04:00
9bfa9a3617 feat(terraform): expand caliban port forwards and document port ranges
- Add proxy devices on caliban for SSH (25512), Postgres (25515),
  and three web ports (25516-25518) alongside existing RDP forward
- Remove HTTP/HTTPS proxy devices from prospero (now handled via
  HAProxy on titania)
- Document Incus port forwarding ranges (25510-25599) per host in
  ouranos.md and fix a typo
2026-06-07 06:40:42 -04:00
f2fb01ddd2 Titania: Add Hecate 2026-06-05 12:03:25 -04:00
c8ad7a0129 feat(terraform): add S3 storage bucket and credentials for Peitho 2026-06-01 13:47:18 -04:00
12b1db36f8 feat(haproxy): block internal observability endpoints from public traffic 2026-06-01 07:30:07 -04:00
77a82b4784 docs: update FreeCAD MCP README to document dual-service architecture 2026-05-31 10:13:43 -04:00
3893b91a55 feat(ansible): add CASE Field Systems MCP endpoint configuration
Configure FastAgent MCP server to connect to the CASE Field Systems
service over HTTP. Enables integration with LAN, SD Card, and
Provisioning workflows without authentication.

Uses dynamic Ansible variables for host and port to support
environment-specific deployments.
2026-05-30 10:19:24 -04:00
76a0e043e9 chore(ansible): add CASE agent configuration to kottos inventory
Introduce the CASE engineering agent by defining kottos_case_port
(24152) and updating the agents list comment. This extends the
systemd-managed pallas process configuration to include the CASE
runtime alongside existing Harper, Scotty, Research, and Tech
Research agents.
2026-05-30 09:44:07 -04:00
acf3419450 refactor(ansible): rename freecad_mcp env vars and rework deployment
- Drop `FREECAD_MCP_` prefix from env vars (use `FREECAD_*`)
- Update freecad_mcp port from 22032 to 22061
- Document that FreeCAD bridge is required for tool calls
- Replace kottos deployment with pallas deployment
2026-05-30 09:37:56 -04:00
bc431a3a2a refactor(alloy): remove athena syslog listener in favor of docker logs 2026-05-30 09:37:15 -04:00
30b5cab808 feat(rommie): add JPEG quality and size cap for get_screenshot
- Add ROMMIE_SCREENSHOT_JPEG_QUALITY and ROMMIE_SCREENSHOT_MAX_KB env vars
  to control parent-agent screenshot output encoding and size limit
- Configure defaults (quality 80, 512KB cap) in caliban.incus host vars
- Trigger rommie service restart when .env file changes
2026-05-28 13:30:17 -04:00
3bdb11dc72 chore(ansible): update model endpoints and enable Rommie deployment
- Bump Qwen model from 3.5 to 3.6 and update inference endpoints
  (nyx:22079→22072, pan:22078→22076) for caliban and puck hosts
- Add Rommie MCP server deployment to site.yml
- Update Rommie docs to reflect new port (20361), model versions,
  and health check accepting 200/406 status codes
2026-05-28 12:17:23 -04:00
a01feee663 chore(ansible): update vault credentials 2026-05-26 21:45:17 -04:00
f4a25316de SearXNG: set docker pull policy always 2026-05-26 06:47:48 -04:00
3c2f8c57ca feat(observability): add SearXNG, Argos, and Pallas monitoring
- Add SearXNG syslog ingestion and blackbox health probes on miranda
  and rosalind for per-host attributable failure detection
- Scrape Argos MCP application metrics from miranda
- Add Pallas dashboard panels for downstream availability and turn
  error ratios
2026-05-24 23:52:53 -04:00
43fae203d1 feat(ansible): standardize Neo4j ports and add monitoring
- Unify Neo4j HTTP/Bolt/syslog ports across ariel and umbriel hosts
- Add neo4j_metrics_port (22094) for APOC exporter sidecar
- Add umbriel to Prometheus node_exporter targets
- Add Neo4j scrape config and alerts for tx rollback rate and
  stalled store growth
- Replace kernos_harper MCP with andromeda (caliban.helu.ca)
- Remove angelia MCP from kottos fastagent config
- Switch neo4j group membership from keeper_user to ponos
2026-05-22 22:19:13 -04:00
698ceacb74 chore: update ansible vault secrets and credentials
Updated encrypted vault.yml file with new credentials and
secrets for production infrastructure
2026-05-17 07:32:51 -04:00
52d444f731 feat(ansible): add hold_slayer database variables and deployment
- Add hold_slayer_db_* variables to portia host_vars
- Update postgresql deploy.yml to create user, database,
  and enable extensions for hold_slayer
2026-05-16 19:10:49 -04:00
b2fc398782 Move llama-cpp to generic fastagent slot 2026-05-12 15:07:00 -04:00
8c95173705 feat(alloy): add journal relabeling and kottos integration on puck
Introduce structured journal relabel rules on puck to tag Pallas-managed
units with {service, project, component} labels matching the Mnemosyne
and Daedalus schema. Add kottos release variable and vault secrets
example entries for the new Pallas FastAgent runtime.

Remove the defunct mnemosyne syslog listener now that Mnemosyne ships
JSON logs via the docker-socket pipeline.
2026-05-11 13:54:14 -04:00
e92ab80bbf feat(ansible): add Jellyfin service and improve deployment
- Add Jellyfin backend to HAProxy configuration on titania.incus
- Simplify deployment by using community.docker.docker_compose_v2 module
- Consolidate handlers and remove redundant Docker commands
- Update Jellyfin systemd service from oneshot to simple type
- Remove PUID/PGID environment variables from docker-compose template
2026-05-04 15:49:18 -04:00
f818b7917d feat(infra): add Jellyfin media server configuration and logging support
Add Jellyfin service to ansible inventory with hardware
transcoding and Casdoor SSO configuration. Configure
Alloy syslog listener to capture Jellyfin logs to Loki.
Update documentation with new service mapping and S3
bucket credential retrieval instructions.
2026-05-04 15:33:25 -04:00
b9ce14ff77 Docs: Update Ouranos to include new Umbriel instance 2026-05-03 19:35:55 -04:00
4ae6379613 chore(ansible): centralize third-party Docker image versions
Add centralized image version variables in group_vars/all/vars.yml for
vulnerability tracking and controlled upgrades of third-party Docker
images (casdoor, flower, grafana-mcp, gitea-mcp, neo4j, memcached,
nginx, oauth2-proxy, rabbitmq, searxng).

Update vault.yml accordingly.
2026-05-03 18:57:58 -04:00
2be323f27e Casdoor: Change to curl for healthcheck 2026-05-02 07:01:54 -04:00
14f026d0bb Docs: Pallas agents 2026-04-29 07:21:01 -04:00
0789edc31a Docs: Pallas Agents 2026-04-29 07:21:00 -04:00
2794822871 docs: add Django-specific Red Panda Standards addendum
Add `Red_Panda_Standards_Django_V1-01.md` which extends the main Red
Panda Standards with Django-specific conventions covering:

- Environment setup and pyproject.toml build backend (setuptools)
- Dependency pinning strategy (floor pin with ceiling)
- Project directory structure
- Settings, environment variable, and database configuration patterns
- Code organization, model, view, URL, and serializer conventions
- Authentication, permissions, and API design guidelines
- Testing standards and Docker/deployment practices
2026-04-20 09:37:01 -04:00
1509b81ce0 Docs: deleted outdated file 2026-04-18 13:38:57 -04:00
5251288975 Docs: Red Panda Standards Update regarding logging 2026-04-18 06:58:59 -04:00
072291929f Docs: Red Panda Standards Update 2026-04-18 06:36:43 -04:00
60074612f3 pgAdmin setup steps corrections 2026-04-13 16:37:18 +00:00
6301facc1a Vault additions 2026-04-13 15:47:47 +00:00
f3f599a33a Vault formatting 2026-04-13 15:31:49 +00:00
d60b9a972f feat(ansible): add mnemosyne db and update ouranos documentation
- Configure mnemosyne database credentials in ansible inventory
- Update postgresql playbook to provision user and database
- Add setup instructions and DB list to documentation
2026-04-13 14:31:21 +00:00
2f5a445945 Ouranos Vault Mnemosyne DB password 2026-04-13 12:49:28 +00:00
9a9f7986fc HAProxy config for Periplus 2026-04-11 23:30:15 +00:00
c31c86f3b2 Port updates for MCP servers 2026-04-11 18:48:21 +00:00
6f1e792522 Merge branch 'main' of ssh://git.helu.ca:22022/r/ouranos 2026-04-11 14:56:28 +00:00
bd03c53f6b chore(inventory): migrate jupyterlab service from puck to caliban
Enable JupyterLab on caliban host and disable it on puck host.
This migration updates the Ansible inventory host_vars to reflect
the new service distribution across the infrastructure.
2026-04-11 14:56:03 +00:00
b889b9d8f4 fix(ansible): update prometheus oauth2 vault secret variable names
Update variable references in the prospero.incus.yml inventory file to remove the redundant _oauth2 suffix from vault keys. This aligns the ansible configuration with the updated secret naming convention.
2026-04-11 10:32:39 -04:00
bbfb1cfe08 Vault updates 2026-04-11 09:23:37 -04:00
82f5e3e094 feat(ansible): add conditional git cloning and fix vault variable names
- Add repo URLs and conditional clone tasks for Agent-S, pulseaudio-module-xrdp, and rommie repositories
- Create required directories (github_dir and repo_dir) before cloning
- Update fetch/pull commands to only execute when repositories are not freshly cloned
- Fix vault variable naming inconsistencies in host_vars files (rosalind.incus.yml, titania.incus.yml)
2026-04-11 09:18:25 -04:00
915851acda chore(ansible): add pgadmin oauth client secrets to titania host vars
Add pgadmin_oauth_client_id and pgadmin_oauth_client_secret variables to the titania inventory. This enables OAuth2 authentication for pgAdmin on the titania host.
2026-04-11 09:05:56 -04:00
7430ecf2b8 Add Agent S, place docker before alloy 2026-04-11 08:54:54 -04:00
a34caba582 refactor(ansible): remove pgadmin database init and service tasks
Remove the Ansible tasks responsible for initializing the pgAdmin database
and starting the pgAdmin systemd service. These steps are no longer required
in the current deployment workflow.
2026-04-11 08:41:05 -04:00