fix(certbot): harden renewal hook and fix permission errors

The renewal deploy-hook ran as the certbot user but lacked permissions to
write the combined PEM to /etc/haproxy/certs and to reload HAProxy,
causing silent failures that left a stale certificate in production until
expiry.

- Add certbot user to the haproxy group so it can write the combined PEM
- Grant certbot NOPASSWD sudo for `systemctl reload haproxy` only
- Make the Prometheus textfile directory group-owned by certbot (0775)
  so cert-metrics.sh can atomically update ssl_cert.prom
- Refactor renewal-hook.sh to always refresh cert metrics on exit via a
  trap, ensuring expiry alerts fire when the hook itself is broken
- Replace `set -e` with explicit error handling and structured logging
This commit is contained in:
2026-06-17 09:58:46 -04:00
parent 2f5a15eef5
commit 343b0e13d6
10 changed files with 665 additions and 46 deletions

View File

@@ -23,7 +23,6 @@ alloy_log_level: "warn"
rommie_port: 20361
rommie_host: "0.0.0.0"
rommie_display: ":10"
rommie_allowed_hosts: "caliban.incus,rommie.ouranos.helu.ca"
rommie_model: Qwen3.6-27B-Q5_K_M
rommie_model_url: "http://nyx.helu.ca:29000"
rommie_provider: "openai"

View File

@@ -163,3 +163,11 @@ mnemosyne_app_metrics_host: caliban.incus
mnemosyne_app_metrics_port: 23181
mnemosyne_web_metrics_host: caliban.incus
mnemosyne_web_metrics_port: 23191
# Athena — two scrape targets (same shape as Mnemosyne):
# app: Django /metrics via nginx (django-prometheus)
# web: nginx-prometheus-exporter sidecar (nginx stub_status → Prometheus format)
athena_app_metrics_host: puck.incus
athena_app_metrics_port: 22481
athena_web_metrics_host: puck.incus
athena_web_metrics_port: 22491