refactor: remove HAProxy from Prospero, centralize TLS on Titania

Move TLS termination and reverse proxying entirely to Titania's HAProxy,
eliminating the redundant HAProxy instance on Prospero. Backends now
communicate over plain HTTP within the internal network.

- Remove HAProxy container, config, certs, and syslog from Prospero
- Remove ssl_backend flags from Titania backend definitions
- Replace pplg_haproxy_* vars with single pplg_domain variable
- Remove HAProxy syslog source from Alloy config
- Update OAuth2-Proxy to listen on all interfaces for Titania access

150
docs/pplg.md
@@ -2,12 +2,11 @@
 
 ## Overview
 
-PPLG is the consolidated observability and administration stack running on **Prospero**. It bundles PgAdmin, Prometheus, Loki, and Grafana behind an internal HAProxy for TLS termination, with Casdoor SSO for user-facing services and OAuth2-Proxy as a sidecar for Prometheus UI authentication.
+PPLG is the consolidated observability and administration stack running on **Prospero**. It bundles PgAdmin, Prometheus, Loki, and Grafana with Casdoor SSO for user-facing services and OAuth2-Proxy as a sidecar for Prometheus UI authentication. TLS termination is handled by Titania's HAProxy, which routes directly to each service on Prospero.
 
 **Host:** prospero.incus
 **Role:** Observability
-**Incus Ports:** 25510 → 443 (HTTPS), 25511 → 80 (HTTP redirect)
-**External Access:** Via Titania HAProxy → `prospero.incus:443`
+**External Access:** Via Titania HAProxy → `prospero.incus` (direct to service ports)
 
 | Subdomain | Service | Auth Method |
 |-----------|---------|-------------|
@@ -23,33 +22,23 @@ PPLG is the consolidated observability and administration stack running on **Pro
 ┌──────────┐      ┌────────────┐      ┌─────────────────────────────────────────────────┐
 │  Client  │─────▶│  HAProxy   │─────▶│  Prospero (PPLG)                                │
 │          │      │ (Titania)  │      │                                                 │
-└──────────┘ │ :443 → :443 │ ┌──────────────────────────────────────────┐ │
-└────────────┘ │ │ HAProxy (systemd, :443/:80) │ │
-│ │ TLS termination + subdomain routing │ │
-┌──────────┐ │ └───┬──────┬──────┬──────┬──────┬──────────┘ │
-│ Alloy │──push──────────────────────────▶│ │ │ │ │
-│ (agents) │ loki.ouranos.helu.ca │ │ │ │ │ │
-│ │ prometheus.ouranos.helu.ca │ │ │ │ │
-└──────────┘ │ ▼ ▼ ▼ ▼ ▼ │
-│ Grafana PgAdmin OAuth2 Loki Alertmanager │
-│ :3000 :5050 Proxy :3100 :9093 │
-│ :9091 │
-│ │ │
-│ ▼ │
-│ Prometheus │
-│ :9090 │
-└─────────────────────────────────────────────────┘
+└──────────┘      │ :443 TLS   │      │  Grafana (:3000) — Casdoor OAuth                │
+                  │ termination│      │  PgAdmin (:5050) — Casdoor OAuth                │
+┌──────────┐      └────────────┘      │  OAuth2-Proxy (:9091) → Prometheus (:9090)      │
+│  Alloy   │─────────────────────────▶│  Loki (:3100) — no auth                         │
+│ (agents) │                          │  Alertmanager (:9093) — no auth                 │
+└──────────┘                          └─────────────────────────────────────────────────┘
 ```
 
 ### Traffic Flow
 
 | Source | Destination | Path | Auth |
 |--------|-------------|------|------|
-| Browser → Grafana | Titania :443 → Prospero :443 → HAProxy → :3000 | Subdomain ACL | Casdoor OAuth |
-| Browser → PgAdmin | Titania :443 → Prospero :443 → HAProxy → :5050 | Subdomain ACL | Casdoor OAuth |
-| Browser → Prometheus | Titania :443 → Prospero :443 → HAProxy → OAuth2-Proxy :9091 → :9090 | Subdomain ACL | OAuth2-Proxy → Casdoor |
-| Alloy → Loki | `https://loki.ouranos.helu.ca` → HAProxy :443 → :3100 | Subdomain ACL | None |
-| Alloy → Prometheus | `https://prometheus.ouranos.helu.ca/api/v1/write` → HAProxy :443 → :9090 | `skip_auth_route` | None |
+| Browser → Grafana | Titania :443 → Prospero :3000 | Subdomain ACL | Casdoor OAuth |
+| Browser → PgAdmin | Titania :443 → Prospero :5050 | Subdomain ACL | Casdoor OAuth |
+| Browser → Prometheus | Titania :443 → Prospero :9091 (OAuth2-Proxy) → :9090 | Subdomain ACL | OAuth2-Proxy → Casdoor |
+| Alloy → Loki | Titania :443 → Prospero :3100 | Subdomain ACL | None |
+| Alloy → Prometheus | Titania :443 → Prospero :9091 → :9090 | `skip_auth_routes` | None |
 
 ## Deployment
 
@@ -72,7 +61,6 @@ ansible-playbook pplg/deploy.yml
 | File | Purpose |
 |------|---------|
 | `pplg/deploy.yml` | Main consolidated deployment playbook |
-| `pplg/pplg-haproxy.cfg.j2` | HAProxy TLS termination config (5 backends) |
 | `pplg/prometheus.yml.j2` | Prometheus scrape configuration |
 | `pplg/alert_rules.yml.j2` | Prometheus alerting rules |
 | `pplg/alertmanager.yml.j2` | Alertmanager routing and Pushover notifications |
@@ -88,15 +76,13 @@ ansible-playbook pplg/deploy.yml
 ### Deployment Steps
 
 1. **APT Repositories**: Add Grafana and PgAdmin repos
-2. **Install Packages**: haproxy, prometheus, loki, grafana, pgadmin4-web, gunicorn
+2. **Install Packages**: prometheus, loki, grafana, pgadmin4-web
 3. **Prometheus**: Config, alert rules, systemd override for remote write receiver
 4. **Alertmanager**: Install, config with Pushover integration
 5. **Loki**: Create user/dirs, template config
 6. **Grafana**: Provisioning (datasources, users, dashboards), OAuth config
 7. **PgAdmin**: Create user/dirs, gunicorn systemd service, Casdoor OAuth config
 8. **OAuth2-Proxy**: Download binary (v7.6.0), config for Prometheus sidecar
-9. **SSL Certificate**: Fetch Let's Encrypt wildcard cert from Titania (self-signed fallback)
-10. **HAProxy**: Template config, enable and start systemd service
 
 ### Deployment Order
 
@@ -298,35 +284,18 @@ Register in Casdoor Admin UI (`https://id.ouranos.helu.ca`) or add to `ansible/c
 | **Loki** | None | Machine-to-machine (Alloy agents push logs) |
 | **Alertmanager** | None | Internal only |
 
-## HAProxy Configuration
+## OAuth2-Proxy skip_auth_routes
 
-### Backends
+The Prometheus write API (`/api/v1/write`) and health check (`/ping`) are accessed by Alloy agents for machine-to-machine metric pushes. OAuth2-Proxy's `skip_auth_routes` config bypasses authentication for these paths:
 
-| Backend | Upstream | Health Check | Auth |
-|---------|----------|-------------|------|
-| `backend_grafana` | `127.0.0.1:3000` | `GET /api/health` | Grafana OAuth |
-| `backend_pgadmin` | `127.0.0.1:5050` | `GET /misc/ping` | PgAdmin OAuth |
-| `backend_prometheus` | `127.0.0.1:9091` (OAuth2-Proxy) | `GET /ping` | OAuth2-Proxy |
-| `backend_prometheus_direct` | `127.0.0.1:9090` | — | None (write API) |
-| `backend_loki` | `127.0.0.1:3100` | `GET /ready` | None |
-| `backend_alertmanager` | `127.0.0.1:9093` | `GET /-/healthy` | None |
-
-### skip_auth_route Pattern
-
-The Prometheus write API (`/api/v1/write`) is accessed by Alloy agents for machine-to-machine metric pushes. HAProxy uses an ACL to bypass OAuth2-Proxy:
-
-```
-acl is_prometheus_write path_beg /api/v1/write
-use_backend backend_prometheus_direct if host_prometheus is_prometheus_write
+```toml
+skip_auth_routes = [
+  "^/ping$",
+  "^/api/v1/write$"
+]
 ```
 
-This routes `https://prometheus.ouranos.helu.ca/api/v1/write` directly to Prometheus on `:9090`, while all other Prometheus traffic goes through OAuth2-Proxy on `:9091`.
-
-### SSL Certificate
-
-- **Primary**: Let's Encrypt wildcard cert (`*.ouranos.helu.ca`) fetched from Titania
-- **Fallback**: Self-signed cert generated on Prospero (if Titania unavailable)
-- **Path**: `/etc/haproxy/certs/ouranos.pem`
+This allows `https://prometheus.ouranos.helu.ca/api/v1/write` to reach Prometheus without OAuth, while all other Prometheus traffic requires Casdoor SSO authentication.
 
 ## Host Variables
 
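Reviewer note on the hunk above: the two `skip_auth_routes` patterns are anchored regular expressions, so only the exact paths `/ping` and `/api/v1/write` bypass authentication. The sketch below illustrates that matching in Python. It is not OAuth2-Proxy's own code: the function name `bypasses_auth` is hypothetical, and it assumes each pattern is tested against the request path only (no query string, no method prefix).

```python
import re

# Patterns copied from the skip_auth_routes block in this diff.
SKIP_AUTH_ROUTES = [r"^/ping$", r"^/api/v1/write$"]


def bypasses_auth(path: str) -> bool:
    """Return True if any skip pattern matches the request path.

    Hypothetical helper for illustration; assumes path-only matching.
    """
    return any(re.search(pattern, path) for pattern in SKIP_AUTH_ROUTES)


if __name__ == "__main__":
    for path in ["/ping", "/api/v1/write", "/api/v1/query", "/api/v1/write/extra", "/graph"]:
        verdict = "skip auth" if bypasses_auth(path) else "requires Casdoor SSO"
        print(f"{path:22} {verdict}")
```

Because both patterns end in `$`, a longer path such as `/api/v1/write/extra` does not match. That is narrower than the removed HAProxy ACL, which used `path_beg` (prefix matching).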
@@ -340,7 +309,7 @@ services:
 ```
 
 Key variable groups defined in `prospero.incus.yml`:
-- PPLG HAProxy (user, group, uid/gid 800, syslog port)
+- PPLG domain (`ouranos.helu.ca`)
 - Grafana (datasources, users, OAuth config)
 - Prometheus (scrape targets, OAuth2-Proxy sidecar config)
 - Alertmanager (Pushover integration)
@@ -348,56 +317,36 @@ Key variable groups defined in `prospero.incus.yml`:
 - PgAdmin (user, data/log directories, OAuth config)
 - Casdoor Metrics (access key/secret for Prometheus scraping)
 
-## Terraform
+## Titania Backend Routing
 
-### Prospero Port Mapping
-
-```hcl
-devices = [
-  {
-    name = "https_internal"
-    type = "proxy"
-    properties = {
-      listen = "tcp:0.0.0.0:25510"
-      connect = "tcp:127.0.0.1:443"
-    }
-  },
-  {
-    name = "http_redirect"
-    type = "proxy"
-    properties = {
-      listen = "tcp:0.0.0.0:25511"
-      connect = "tcp:127.0.0.1:80"
-    }
-  }
-]
-```
-
-Run `terraform apply` before deploying if port mappings changed.
-
-### Titania Backend Routing
-
-Titania's HAProxy routes external subdomains to Prospero's HTTPS port:
+Titania's HAProxy routes external subdomains directly to Prospero service ports:
 
 ```yaml
 # In titania.incus.yml haproxy_backends
 - subdomain: "grafana"
   backend_host: "prospero.incus"
-  backend_port: 443
+  backend_port: 3000
   health_path: "/api/health"
-  ssl_backend: true
 
 - subdomain: "pgadmin"
   backend_host: "prospero.incus"
-  backend_port: 443
+  backend_port: 5050
   health_path: "/misc/ping"
-  ssl_backend: true
 
 - subdomain: "prometheus"
   backend_host: "prospero.incus"
-  backend_port: 443
+  backend_port: 9091 # OAuth2-Proxy sidecar
   health_path: "/ping"
-  ssl_backend: true
+
+- subdomain: "loki"
+  backend_host: "prospero.incus"
+  backend_port: 3100
+  health_path: "/ready"
+
+- subdomain: "alertmanager"
+  backend_host: "prospero.incus"
+  backend_port: 9093
+  health_path: "/-/healthy"
 ```
 
 ## Monitoring
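Reviewer note on the hunk above: after this change each subdomain maps to one plain-HTTP service port on Prospero. The sketch below shows that mapping as a lookup, using the values from the post-commit `haproxy_backends` list. It only illustrates the routing table and is not how Titania's HAProxy template is implemented; `upstream_for` and the constant names are hypothetical.

```python
from typing import Optional

# Domain and backend values taken from the post-commit docs in this diff.
PPLG_DOMAIN = "ouranos.helu.ca"

HAPROXY_BACKENDS = [
    {"subdomain": "grafana", "backend_host": "prospero.incus", "backend_port": 3000},
    {"subdomain": "pgadmin", "backend_host": "prospero.incus", "backend_port": 5050},
    {"subdomain": "prometheus", "backend_host": "prospero.incus", "backend_port": 9091},
    {"subdomain": "loki", "backend_host": "prospero.incus", "backend_port": 3100},
    {"subdomain": "alertmanager", "backend_host": "prospero.incus", "backend_port": 9093},
]


def upstream_for(host: str) -> Optional[str]:
    """Map an external Host header to the plain-HTTP upstream URL.

    Hypothetical helper; returns None for hosts outside the PPLG domain
    or for subdomains with no backend entry.
    """
    suffix = "." + PPLG_DOMAIN
    if not host.endswith(suffix):
        return None
    subdomain = host[: -len(suffix)]
    for backend in HAPROXY_BACKENDS:
        if backend["subdomain"] == subdomain:
            return f"http://{backend['backend_host']}:{backend['backend_port']}"
    return None


if __name__ == "__main__":
    for host in ["grafana.ouranos.helu.ca", "prometheus.ouranos.helu.ca", "unknown.ouranos.helu.ca"]:
        print(host, "->", upstream_for(host))
```

Note that `prometheus` resolves to port 9091 (the OAuth2-Proxy sidecar), not 9090, so all external Prometheus traffic, including Alloy remote writes, passes through OAuth2-Proxy.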
@@ -406,7 +355,6 @@ Titania's HAProxy routes external subdomains to Prospero's HTTPS port:
 
 **File:** `ansible/alloy/prospero/config.alloy.j2`
 
-- **HAProxy Syslog**: `loki.source.syslog` on `127.0.0.1:51405` (TCP) receives Docker syslog from HAProxy container
 - **Journal Labels**: Dedicated job labels for `grafana-server`, `prometheus`, `loki`, `alertmanager`, `pgadmin`, `oauth2-proxy-prometheus`
 - **System Logs**: `/var/log/syslog`, `/var/log/auth.log` → Loki
 - **Metrics**: Node exporter + process exporter → Prometheus remote write
@@ -477,22 +425,11 @@ ssh prospero.incus
 sudo systemctl status prometheus grafana-server loki prometheus-alertmanager pgadmin oauth2-proxy-prometheus
 ```
 
-### HAProxy Service
-
-```bash
-ssh prospero.incus
-sudo systemctl status haproxy
-sudo journalctl -u haproxy -f
-```
-
 ### View Logs
 
 ```bash
 # All PPLG services via journal
 sudo journalctl -u prometheus -u grafana-server -u loki -u prometheus-alertmanager -u pgadmin -u oauth2-proxy-prometheus -f
-
-# HAProxy logs (shipped via syslog to Alloy → Loki)
-# Query in Grafana: {job="pplg-haproxy"}
 ```
 
 ### Test Endpoints (from Prospero)
@@ -512,18 +449,17 @@ curl -s http://127.0.0.1:3100/ready
 
 # Alertmanager
 curl -s http://127.0.0.1:9093/-/healthy
-
-# HAProxy stats
-curl -s http://127.0.0.1:8404/metrics | head
 ```
 
-### Test TLS (from any host)
+### Test External Access (from any host)
 
 ```bash
-# Direct to Prospero container
-curl -sk https://prospero.incus/api/health
 # Via Titania HAProxy
 curl -s https://grafana.ouranos.helu.ca/api/health
+curl -s https://pgadmin.ouranos.helu.ca/misc/ping
+curl -s https://prometheus.ouranos.helu.ca/ping
+curl -s https://loki.ouranos.helu.ca/ready
+curl -s https://alertmanager.ouranos.helu.ca/-/healthy
 ```
 
 ### Common Errors