fix(certbot): harden renewal hook and fix permission errors

The renewal deploy-hook ran as the certbot user but lacked permissions to
write the combined PEM to /etc/haproxy/certs and to reload HAProxy,
causing silent failures that left a stale certificate in production until
expiry.

- Add certbot user to the haproxy group so it can write the combined PEM
- Grant certbot NOPASSWD sudo for `systemctl reload haproxy` only
- Make the Prometheus textfile directory group-owned by certbot (0775)
  so cert-metrics.sh can atomically update ssl_cert.prom
- Refactor renewal-hook.sh to always refresh cert metrics on exit via a
  trap, ensuring expiry alerts fire when the hook itself is broken
- Replace `set -e` with explicit error handling and structured logging
This commit is contained in:
2026-06-17 09:58:46 -04:00
parent 2f5a15eef5
commit 343b0e13d6
10 changed files with 665 additions and 46 deletions

View File

@@ -374,10 +374,10 @@ MinIO specifically expects certs at `~/.minio/certs/public.crt` and `~/.minio/ce
| Certbot location | On the host itself | OCI free host |
| Namecheap credentials | On the host | Only on OCI host |
| Cert delivery | Direct to HAProxy | Via OCI Vault → Ansible |
| Renewal hook | Docker HAProxy reload | OCI Vault upload |
| Renewal hook | Combine PEM + reload HAProxy | OCI Vault upload |
| Distribution | N/A (local only) | Ansible cron on controller |
| Environments served | Ouranos sandbox only | All environments |
| Service reload | `docker compose kill -s HUP` | `systemctl reload` per host_vars |
| Service reload | `systemctl reload haproxy` (native, via scoped sudo) | `systemctl reload` per host_vars |
Titania can remain self-contained (it's working) or migrate to this centralized model later.