Certbot DNS-01 with Namecheap
This playbook deploys certbot with the Namecheap DNS plugin for DNS-01 validation, enabling wildcard SSL certificates.
Overview
| Component | Value |
|---|---|
| Installation | Python virtualenv in /srv/certbot/.venv |
| DNS Plugin | certbot-dns-namecheap |
| Validation | DNS-01 (supports wildcards) |
| Renewal | Systemd timer (twice daily), runs as the certbot user |
| Certificate Output | Combined PEM at haproxy_cert_path (Titania: /etc/haproxy/certs/ouranos.pem) |
| HAProxy Reload | systemctl reload haproxy (native systemd, not Docker) |
| Metrics | Prometheus textfile collector |
Deployments
Titania (ouranos.helu.ca)
Production deployment providing Let's Encrypt certificates for the Ouranos sandbox HAProxy reverse proxy.
| Setting | Value |
|---|---|
| Host | titania.incus |
| Domain | ouranos.helu.ca |
| Wildcard | *.ouranos.helu.ca |
| Email | webmaster@helu.ca |
| HAProxy | Port 443 (HTTPS), Port 80 (HTTP redirect) |
| Renewal | Twice daily, automatic HAProxy reload |
Other Deployments
The playbook can be deployed to any host with HAProxy. See the example configuration for hippocamp.helu.ca (d.helu.ca domain) below.
Prerequisites
- Namecheap API Access enabled on your account
- Namecheap API key generated
- IP whitelisted in Namecheap API settings
- Ansible Vault configured with Namecheap credentials
Setup
1. Add Secrets to Ansible Vault
Add Namecheap credentials to ansible/inventory/group_vars/all/vault.yml:
ansible-vault edit inventory/group_vars/all/vault.yml
Add the following variables:
vault_namecheap_username: "your_namecheap_username"
vault_namecheap_api_key: "your_namecheap_api_key"
Map these in inventory/group_vars/all/vars.yml:
namecheap_username: "{{ vault_namecheap_username }}"
namecheap_api_key: "{{ vault_namecheap_api_key }}"
2. Configure Host Variables
For Titania, the configuration is in inventory/host_vars/titania.incus.yml:
services:
  - certbot
  - haproxy
# ...
certbot_email: webmaster@helu.ca
certbot_certificates:
  - cert_name: wildcard.ouranos.helu.ca
    domains: ["*.ouranos.helu.ca", "ouranos.helu.ca"]
# Where the renewal hook writes the combined fullchain+privkey PEM for HAProxy
haproxy_cert_path: /etc/haproxy/certs/ouranos.pem
The certbot lineage name is wildcard.ouranos.helu.ca, so the certbot config lives under /srv/certbot/config/live/wildcard.ouranos.helu.ca/. The combined PEM that HAProxy actually serves is a separate file at haproxy_cert_path (ouranos.pem), written by the renewal hook. Do not confuse the two.
The playbook also supports the single-cert form (certbot_cert_name + certbot_domains) for hosts with one certificate.
3. Deploy
cd ansible
ansible-playbook certbot/deploy.yml --limit titania.incus
Files Created
| Path | Purpose |
|---|---|
| /srv/certbot/.venv/ | Python virtualenv with certbot |
| /srv/certbot/config/ | Certbot configuration and certificates |
| /srv/certbot/credentials/namecheap.ini | Namecheap API credentials (600 perms) |
| /srv/certbot/hooks/renewal-hook.sh | Post-renewal script |
| /srv/certbot/hooks/cert-metrics.sh | Prometheus metrics script |
| /etc/haproxy/certs/ouranos.pem | Combined cert for HAProxy (Titania), written by the renewal hook |
| /etc/sudoers.d/certbot-haproxy-reload | Scoped sudo rule letting certbot run systemctl reload haproxy |
| /etc/systemd/system/certbot-renew.service | Renewal service unit (runs as the certbot user) |
| /etc/systemd/system/certbot-renew.timer | Twice-daily renewal timer |
Renewal Process
- Systemd timer triggers at 00:00 and 12:00 (with random delay up to 1 hour)
- Certbot checks if certificate needs renewal (within 30 days of expiry)
- If renewal needed:
  - Creates DNS TXT record via Namecheap API
  - Waits 120 seconds for propagation
  - Validates and downloads new certificate
  - Runs renewal-hook.sh
- Renewal hook (renewal-hook.sh, run via certbot's --deploy-hook):
  - Combines fullchain + privkey into the HAProxy PEM at haproxy_cert_path
  - Reloads native HAProxy via sudo -n systemctl reload haproxy
  - Always refreshes Prometheus metrics (even on failure; see below)
HAProxy on Titania runs natively under systemd, not in Docker. The hook reloads it with systemctl reload haproxy. (Only Casdoor runs in Docker on Titania.)
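The combine step of the hook can be sketched as below. This is a minimal sketch, not the playbook's actual renewal-hook.sh: the function name build_combined_pem, the temp-file-then-rename approach, and the 640 mode are assumptions made for illustration. RENEWED_LINEAGE is the variable certbot exports to deploy hooks, pointing at the live directory of the renewed certificate.

```shell
# Hypothetical helper: concatenate fullchain + privkey into one PEM for HAProxy.
# The file is written under a temporary name in the destination directory and
# then renamed, so HAProxy never reads a half-written PEM.
build_combined_pem() {
    local lineage_dir="$1" dest="$2" tmp
    tmp="$(mktemp "${dest}.XXXXXX")" || return 1
    if cat "${lineage_dir}/fullchain.pem" "${lineage_dir}/privkey.pem" > "$tmp" \
        && chmod 640 "$tmp" \
        && mv -f "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"   # clean up on any failure and report it to the caller
    return 1
}

# Usage inside a deploy hook (paths as on Titania):
#   build_combined_pem "$RENEWED_LINEAGE" /etc/haproxy/certs/ouranos.pem
```

Because the temporary file is created next to the destination, the certbot user needs write access to /etc/haproxy/certs itself, not just to an existing ouranos.pem.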
Permission model (why renewals can silently fail)
The renewal timer runs the hook as the unprivileged certbot user, so three
permissions must line up or the renewed cert never reaches HAProxy:
| Resource | Required state | Provided by |
|---|---|---|
| /etc/haproxy/certs | 0770, group haproxy; certbot is a member of haproxy | haproxy/deploy.yml (mode) + certbot/deploy.yml (group membership) |
| systemctl reload haproxy | allowed for certbot via sudo | /etc/sudoers.d/certbot-haproxy-reload |
| Prometheus textfile dir | group-writable by certbot | certbot/deploy.yml |
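The three rows above can be checked by hand with a short script. The sketch below is an assumption-laden example rather than part of the playbook: has_group and check_hook_permissions are hypothetical names, stat -c requires GNU coreutils, and the script is meant to be run as root (or a user with full sudo) on the target host.

```shell
# Hypothetical helper: succeed if group $2 appears in the space-separated
# list $1 (the format printed by `id -nG`).
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical check of the permission table; prints one line per problem.
check_hook_permissions() {
    local rc=0
    has_group "$(id -nG certbot)" haproxy \
        || { echo "certbot is not in group haproxy" >&2; rc=1; }
    [ "$(stat -c '%a %G' /etc/haproxy/certs)" = "770 haproxy" ] \
        || { echo "/etc/haproxy/certs is not 0770 group haproxy" >&2; rc=1; }
    # `sudo -n -l <command>` exits non-zero if the command is not permitted.
    sudo -u certbot sudo -n -l systemctl reload haproxy >/dev/null 2>&1 \
        || { echo "certbot may not reload haproxy via sudo" >&2; rc=1; }
    sudo -u certbot test -w /var/lib/prometheus/node-exporter \
        || { echo "textfile dir is not writable by certbot" >&2; rc=1; }
    return "$rc"
}
```

Calling check_hook_permissions after a deploy gives a quick yes/no answer before the next renewal depends on it.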
If any of these is wrong, the hook fails. Certbot treats a deploy-hook failure as a non-fatal WARNING and still reports "renewals succeeded", so a broken hook will let the live cert renew while HAProxy keeps serving the old file until it expires. To make this visible, the hook now:
- checks each step and exits non-zero with an explicit "serving a STALE certificate" error (surfaced in the certbot/journal output), and
- refreshes the Prometheus cert metrics on every exit, so the SSLCertificateExpiringSoon / SSLCertificateExpired alerts keep reflecting reality even when installation fails.
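The control flow described here (explicit checks instead of set -e, plus an exit trap that always refreshes metrics) might look roughly like the sketch below. It is not the real renewal-hook.sh: install_cert, reload_haproxy and refresh_metrics are hypothetical stand-ins for the actual steps, and the exit codes are arbitrary.

```shell
# Sketch of the hook's control flow. The body runs in a subshell, so the
# EXIT trap fires whenever run_hook finishes, whether it succeeded or not.
run_hook() (
    trap 'refresh_metrics' EXIT

    if ! install_cert; then
        echo "ERROR: install failed; HAProxy is serving a STALE certificate" >&2
        exit 1
    fi
    if ! reload_haproxy; then
        echo "ERROR: reload failed; HAProxy is serving a STALE certificate" >&2
        exit 2
    fi
)
```

The point of the trap is that the metrics describe the file HAProxy is serving, so they must be rewritten even when the steps that update that file fail.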
Prometheus Metrics
Metrics written to /var/lib/prometheus/node-exporter/ssl_cert.prom:
| Metric | Description |
|---|---|
| ssl_certificate_expiry_timestamp | Unix timestamp when cert expires |
| ssl_certificate_expiry_seconds | Seconds until cert expires |
| ssl_certificate_valid | 1 if valid, 0 if expired/missing |
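For orientation, the file contents and the atomic update could be produced along these lines. This is only a sketch of the idea, not cert-metrics.sh itself: the function names are hypothetical, the domain label is inferred from the {{ $labels.domain }} reference in the alert rule, and the missing-certificate case is left out.

```shell
# Hypothetical renderer: print the three metrics for one certificate.
render_cert_metrics() {
    local domain="$1" expiry_ts="$2" now_ts="$3" valid=0
    if [ "$expiry_ts" -gt "$now_ts" ]; then
        valid=1
    fi
    printf 'ssl_certificate_expiry_timestamp{domain="%s"} %d\n' "$domain" "$expiry_ts"
    printf 'ssl_certificate_expiry_seconds{domain="%s"} %d\n' "$domain" "$((expiry_ts - now_ts))"
    printf 'ssl_certificate_valid{domain="%s"} %d\n' "$domain" "$valid"
}

# Hypothetical writer: render to a temp file, then rename over the target so
# node_exporter never scrapes a partial file. mktemp creates the file 0600,
# hence the chmod so the exporter can read it.
write_cert_metrics() {
    local out="$1" domain="$2" expiry_ts="$3" tmp
    tmp="$(mktemp "${out}.XXXXXX")" || return 1
    if render_cert_metrics "$domain" "$expiry_ts" "$(date +%s)" > "$tmp" \
        && chmod 644 "$tmp" \
        && mv -f "$tmp" "$out"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}
```

The temporary name does not end in .prom, so the textfile collector ignores it until the rename. Creating it requires write access to the directory, which is why the textfile directory has to be group-writable by certbot.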
Example alert rule:
- alert: SSLCertificateExpiringSoon
  expr: ssl_certificate_expiry_seconds < 604800  # 7 days
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "SSL certificate expiring soon"
    description: "Certificate for {{ $labels.domain }} expires in {{ $value | humanizeDuration }}"
Troubleshooting
View Certificate Status
# Check expiry of the cert HAProxy actually serves (Titania)
sudo openssl x509 -enddate -noout -in /etc/haproxy/certs/ouranos.pem
# Confirm HAProxy is serving it on the wire
echo | openssl s_client -connect titania.incus:8443 \
-servername grafana.ouranos.helu.ca 2>/dev/null \
| openssl x509 -noout -enddate -issuer
# Check the underlying certbot lineage (may be newer than the served file
# if the deploy hook failed to install it)
sudo openssl x509 -enddate -noout \
-in /srv/certbot/config/live/wildcard.ouranos.helu.ca/fullchain.pem
# Check certbot certificates
sudo -u certbot /srv/certbot/.venv/bin/certbot certificates \
--config-dir /srv/certbot/config
If the served file is older than the certbot lineage, the deploy hook is failing to install renewals. Check the hook output:
sudo grep -i hook /srv/certbot/logs/letsencrypt.log*
Look for "Permission denied", "reload failed", or "serving a STALE certificate".
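The served-versus-lineage comparison can also be scripted. The helpers below are a hypothetical sketch (the names cert_expiry_ts and served_is_stale are not from the playbook) and assume GNU date for parsing the openssl notAfter string; reading both files needs root.

```shell
# Hypothetical helper: print the notAfter time of a PEM file as a Unix
# timestamp. Returns non-zero if openssl produced no end date.
cert_expiry_ts() {
    local end
    end="$(openssl x509 -enddate -noout -in "$1" | cut -d= -f2)"
    [ -n "$end" ] || return 1
    date -d "$end" +%s
}

# Succeeds when the served certificate ($1) expires before the lineage ($2),
# i.e. a renewal exists that never reached HAProxy.
served_is_stale() {
    [ "$1" -lt "$2" ]
}

# Usage (as root, paths as on Titania):
#   served="$(cert_expiry_ts /etc/haproxy/certs/ouranos.pem)"
#   lineage="$(cert_expiry_ts /srv/certbot/config/live/wildcard.ouranos.helu.ca/fullchain.pem)"
#   served_is_stale "$served" "$lineage" && echo "deploy hook is not installing renewals"
```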
Manual Renewal Test
# Dry run renewal
sudo -u certbot /srv/certbot/.venv/bin/certbot renew \
--config-dir /srv/certbot/config \
--work-dir /srv/certbot/work \
--logs-dir /srv/certbot/logs \
--dry-run
# Force renewal (if needed)
sudo -u certbot /srv/certbot/.venv/bin/certbot renew \
--config-dir /srv/certbot/config \
--work-dir /srv/certbot/work \
--logs-dir /srv/certbot/logs \
--force-renewal
Check Systemd Timer
# Timer status
systemctl status certbot-renew.timer
# Last run
journalctl -u certbot-renew.service --since "1 day ago"
# List timers
systemctl list-timers certbot-renew.timer
DNS Propagation Issues
If certificate requests fail due to DNS propagation:
- Check Namecheap API is accessible
- Verify IP is whitelisted
- Increase propagation wait time (default 120s)
- Check certbot logs: /srv/certbot/logs/letsencrypt.log
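To see whether the challenge record has actually propagated, query the TXT record while certbot is in its propagation wait. The helper below is a hypothetical convenience for deriving the record name; it relies only on the DNS-01 convention that the challenge lives at _acme-challenge.<domain>, with a wildcard sharing the record of its base domain.

```shell
# Hypothetical helper: map a certificate domain to its DNS-01 challenge
# record name, stripping a leading "*." from wildcard names.
acme_challenge_name() {
    printf '_acme-challenge.%s\n' "${1#\*.}"
}

# Usage (requires dig; run during the propagation wait):
#   dig +short TXT "$(acme_challenge_name '*.ouranos.helu.ca')"
```

An empty answer during the wait suggests the Namecheap API call did not create the record, which points back at API access or the IP whitelist rather than at the wait time.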
Related Playbooks
- haproxy/deploy.yml - Depends on certificate from certbot
- prometheus/node_deploy.yml - Deploys node_exporter for metrics collection