fix(certbot): harden renewal hook and fix permission errors
The renewal deploy-hook ran as the certbot user but lacked permission to write the combined PEM to /etc/haproxy/certs and to reload HAProxy. Both failures were silent, so a stale certificate stayed in production until it expired.

- Add the certbot user to the haproxy group so it can write the combined PEM
- Grant certbot NOPASSWD sudo for `systemctl reload haproxy` only
- Make the Prometheus textfile directory group-owned by certbot (0775) so cert-metrics.sh can atomically update ssl_cert.prom
- Refactor renewal-hook.sh to always refresh cert metrics on exit via a trap, so expiry alerts still fire even when the hook itself is broken
- Replace `set -e` with explicit error handling and structured logging
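The commit page does not include renewal-hook.sh itself. The following is only a minimal sketch of the pattern the bullets describe, not the actual script. It assumes that the combined PEM is `fullchain.pem` followed by `privkey.pem`, that `RENEWED_LINEAGE` is set (certbot exports it to deploy hooks), and that the NOPASSWD rule above covers `systemctl reload haproxy`. The function names, the `TEXTFILE_DIR` default, the `ssl_cert_not_after` metric and the log wording are hypothetical. `refresh_metrics` stands in for cert-metrics.sh.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL deploy-hook sketch: install the combined PEM, reload HAProxy,
# and ALWAYS refresh the expiry metric via an EXIT trap.
# Paths, function names, metric name and log format are assumptions.
set -u  # deliberately no `set -e`: every step checks its own status

HAPROXY_PEM="${HAPROXY_PEM:-/etc/haproxy/certs/ouranos.pem}"
TEXTFILE_DIR="${TEXTFILE_DIR:-/var/lib/node_exporter/textfile}"  # assumed path

# Structured key=value log line on stderr.
log() {
  printf 'ts=%s level=%s msg="%s"\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >&2
}

# atomic_write DEST MODE CMD...: run CMD with stdout to a temp file in DEST's
# directory, then rename it over DEST. On any failure the temp file is removed
# and DEST is left untouched. The temp name does not end in .prom, so the
# textfile collector never reads a half-written file.
atomic_write() {
  local dest="$1" mode="$2" tmp
  shift 2
  tmp=$(mktemp "$(dirname "$dest")/.$(basename "$dest").XXXXXX") || return 1
  if ! "$@" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  if ! { chmod "$mode" "$tmp" && mv -f "$tmp" "$dest"; }; then
    rm -f "$tmp"
    return 1
  fi
}

# Print the served certificate's expiry as a Prometheus gauge (GNU date assumed).
render_metric() {
  local pem="$1" end epoch
  end=$(openssl x509 -enddate -noout -in "$pem") || return 1
  epoch=$(date -d "${end#notAfter=}" +%s) || return 1
  printf '# HELP ssl_cert_not_after Expiry of the served certificate (unix time).\n'
  printf '# TYPE ssl_cert_not_after gauge\n'
  printf 'ssl_cert_not_after{path="%s"} %s\n' "$pem" "$epoch"
}

# Stand-in for cert-metrics.sh: never aborts the hook, only logs the outcome.
refresh_metrics() {
  if atomic_write "$TEXTFILE_DIR/ssl_cert.prom" 0644 render_metric "$HAPROXY_PEM"; then
    log info "cert metrics refreshed"
  else
    log error "cert metrics refresh failed"
  fi
}

main() {
  trap refresh_metrics EXIT              # runs on every exit path
  local lineage="${RENEWED_LINEAGE:?}"   # exported by certbot to deploy hooks
  if ! atomic_write "$HAPROXY_PEM" 0640 cat "$lineage/fullchain.pem" "$lineage/privkey.pem"; then
    log error "could not write $HAPROXY_PEM; HAProxy still serves the old cert (STALE)"
    exit 1
  fi
  if ! sudo -n systemctl reload haproxy; then  # -n: fail instead of prompting
    log error "haproxy reload failed; new cert written but not served (STALE)"
    exit 1
  fi
  log info "installed $HAPROXY_PEM from $lineage and reloaded haproxy"
}

# Only act when invoked by certbot as a deploy hook.
if [[ -n "${RENEWED_LINEAGE:-}" ]]; then main; fi
```

Because the EXIT trap is registered before any step runs, a failed write or reload still refreshes `ssl_cert.prom` from the PEM that HAProxy actually serves. The expiry alert therefore tracks the stale file instead of going quiet along with the broken hook.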
docs/pplg.md (30 changed lines)
@@ -484,17 +484,35 @@ vault_casdoor_prometheus_access_key: "your-casdoor-access-key"
 vault_casdoor_prometheus_access_secret: "your-casdoor-access-secret"
 ```
 
-#### Certificate fetch fails
+#### TLS cert expired / not renewing on `*.ouranos.helu.ca`
 
-**Cause**: Titania not running or certbot hasn't provisioned the cert yet.
+TLS for all PPLG subdomains is terminated by **Titania's native HAProxy** using
+the Let's Encrypt wildcard cert managed by certbot on Titania (see
+[certbot DNS-01 with Namecheap](cerbot.md)). PPLG itself holds no cert.
 
-**Fix**: Ensure Titania is up and certbot has run:
+**Most likely cause**: certbot renewed the lineage but the deploy hook failed to
+install the new cert into HAProxy's served PEM (`/etc/haproxy/certs/ouranos.pem`),
+so HAProxy keeps serving the old file until it expires. Certbot reports such hook
+failures only as a WARNING, so the renewal looks successful.
+
+**Diagnose** (on Titania):
 ```bash
-ansible-playbook sandbox_up.yml
-ansible-playbook certbot/deploy.yml
+# Does the served file match the certbot lineage?
+sudo openssl x509 -enddate -noout -in /etc/haproxy/certs/ouranos.pem
+sudo openssl x509 -enddate -noout \
+-in /srv/certbot/config/live/wildcard.ouranos.helu.ca/fullchain.pem
+
+# Look for a failing hook
+sudo grep -iE 'hook|Permission denied|reload failed|STALE' /srv/certbot/logs/letsencrypt.log*
 ```
 
-The playbook falls back to a self-signed certificate if Titania is unavailable.
+**Fix**: re-run the playbooks (in this order) and force a renewal to reinstall:
+```bash
+ansible-playbook haproxy/deploy.yml --limit titania.incus
+ansible-playbook certbot/deploy.yml --limit titania.incus
+```
+See the certbot doc's [permission model](cerbot.md#permission-model-why-renewals-can-silently-fail)
+for the `certbot`-user permissions the hook depends on.
 
 #### OAuth2 redirect loops
 
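The Diagnose step compares the two end dates by eye. The script below is a sketch of the same check that compares the SHA-256 fingerprint of the first certificate in each file. It assumes the served PEM's first certificate is the leaf, and it must run as root, as the `sudo openssl` commands above do. The script name `check-served-cert.sh`, the `--check` flag, the `SERVED`/`LINEAGE_CERT` variables and the exit codes are hypothetical. The default paths are the ones used in the Diagnose step.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL check-served-cert.sh: does HAProxy's served PEM carry the same
# leaf certificate as the certbot lineage?
# Exit codes (assumed convention): 0 = match, 1 = stale, 2 = unreadable.
set -u

SERVED="${SERVED:-/etc/haproxy/certs/ouranos.pem}"
LINEAGE_CERT="${LINEAGE_CERT:-/srv/certbot/config/live/wildcard.ouranos.helu.ca/fullchain.pem}"

# SHA-256 fingerprint of the first CERTIFICATE block in a PEM file.
leaf_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 -in "$1"
}

check_served_cert() {
  local served lineage
  served=$(leaf_fingerprint "$SERVED") || return 2
  lineage=$(leaf_fingerprint "$LINEAGE_CERT") || return 2
  if [[ "$served" == "$lineage" ]]; then
    echo "OK: $SERVED matches the certbot lineage"
  else
    echo "STALE: $SERVED differs from $LINEAGE_CERT" >&2
    return 1
  fi
}

# Run the check only when asked; sourcing the file just defines the functions.
if [[ "${1:-}" == "--check" ]]; then
  check_served_cert
  exit $?
fi
```

`sudo ./check-served-cert.sh --check` prints `OK` and exits 0 when the files match. It prints `STALE` and exits 1 when they differ, and exits 2 when either file cannot be read. A fingerprint identifies the exact certificate rather than only its expiry date.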