fix(certbot): harden renewal hook and fix permission errors

The renewal deploy-hook ran as the certbot user but lacked permissions to
write the combined PEM to /etc/haproxy/certs and to reload HAProxy,
causing silent failures that left a stale certificate in production until
expiry.

- Add certbot user to the haproxy group so it can write the combined PEM
- Grant certbot NOPASSWD sudo for `systemctl reload haproxy` only
- Make the Prometheus textfile directory group-owned by certbot (0775)
  so cert-metrics.sh can atomically update ssl_cert.prom
- Refactor renewal-hook.sh to always refresh cert metrics on exit via a
  trap, ensuring expiry alerts fire when the hook itself is broken
- Replace `set -e` with explicit error handling and structured logging
This commit is contained in:
2026-06-17 09:58:46 -04:00
parent 2f5a15eef5
commit 343b0e13d6
10 changed files with 665 additions and 46 deletions

View File

@@ -86,6 +86,19 @@
groups: "{{ certbot_group }}"
append: true
# The renewal deploy-hook runs as the certbot user and writes the combined
# PEM into the group-writable /etc/haproxy/certs (mode 0770, owned by the
# haproxy group). certbot must be a member of that group, otherwise the
# hook fails with "Permission denied" and HAProxy serves a stale cert until
# it expires.
- name: Add certbot user to the haproxy group
become: true
ansible.builtin.user:
name: "{{ certbot_user }}"
groups: "{{ haproxy_group }}"
append: true
when: "'haproxy' in services | default([])"
# -------------------------------------------------------------------------
# Directory Structure
# -------------------------------------------------------------------------
@@ -178,14 +191,32 @@
group: "{{ certbot_group }}"
mode: '0750'
# Group-owned by certbot and group-writable so cert-metrics.sh (run as the
# certbot user from the renewal hook) can atomically write ssl_cert.prom.
# node-exporter only needs to read these files, which 0775 still allows.
# The renewal hook reloads HAProxy after installing a new cert, but runs as
# the unprivileged certbot user. Grant exactly `systemctl reload haproxy`
# via sudo — nothing more. visudo validation prevents a malformed drop-in
# from locking out sudo.
- name: Allow certbot to reload HAProxy via sudo
become: true
ansible.builtin.copy:
dest: /etc/sudoers.d/certbot-haproxy-reload
content: "{{ certbot_user }} ALL=(root) NOPASSWD: /usr/bin/systemctl reload haproxy\n"
owner: root
group: root
mode: '0440'
validate: visudo -cf %s
when: "'haproxy' in services | default([])"
- name: Create Prometheus textfile directory
become: true
ansible.builtin.file:
path: "{{ prometheus_node_exporter_text_directory }}"
state: directory
owner: root
group: root
mode: '0755'
group: "{{ certbot_group }}"
mode: '0775'
- name: Template certificate metrics script
become: true