docs: add application conventions for health checks, logging, and endpoints

Establish standardized conventions across all Ouranos services:
- Kubernetes-style health endpoints (/live, /ready, /metrics)
- Logging level guidelines (health checks at DEBUG only)
- Protected vs unprotected endpoint definitions
- Prometheus metrics, browser telemetry, and Docker networking standards
- Update daedalus HAProxy health_path from /api/health to /ready/
2026-04-10 11:29:56 +00:00
parent 257e743d9a
commit bd31dfd8d5
3 changed files with 153 additions and 9 deletions


@@ -125,6 +125,79 @@ When a background worker (Celery task consumer, RabbitMQ subscriber, Gitea Runne
---
## Health Check Endpoints
All services MUST expose Kubernetes-style health endpoints at these paths:
| Endpoint | Purpose | Auth |
|----------|---------|------|
| `GET /live` | **Liveness** — process is running and accepting connections | None |
| `GET /ready` | **Readiness** — process is running AND all dependencies (DB, cache, upstream APIs) are healthy | None |
| `GET /metrics` | Prometheus metrics | IP-restricted (no JWT) |
- HAProxy runs backend health checks against `health_path: /ready/`; the readiness endpoint MUST return HTTP 200 once the service is ready
- Health endpoints MUST NOT require authentication
- Third-party services keep their native health paths (`/api/health`, `/api/healthz`, `/-/healthy`, etc.) rather than `/live` and `/ready`
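A minimal sketch of the two health endpoints, using only the Python standard library (`http.server`) and assuming a service on port 8000 as in the Compose example. The factory `make_health_handler` and the dependency probes passed to it are hypothetical placeholders for real DB, cache, and upstream checks. Returning 503 when a dependency fails is an assumption, since the convention only fixes the 200 case. The handler strips a trailing slash so HAProxy's `/ready/` and `/ready` behave the same, and routes its access log lines to DEBUG.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict

logger = logging.getLogger("health")

# A probe returns True when its dependency (DB, cache, upstream API) is healthy.
# The concrete probes are hypothetical and supplied by each service.
DependencyCheck = Callable[[], bool]


def make_health_handler(checks: Dict[str, DependencyCheck]) -> type:
    """Build an unauthenticated handler serving GET /live and GET /ready."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Drop the query string and any trailing slash so that HAProxy's
            # health_path "/ready/" is handled the same as "/ready".
            path = self.path.split("?", 1)[0].rstrip("/") or "/"
            if path == "/live":
                # Liveness: answering at all shows the process is up.
                self._send_json(200, {"status": "alive"})
            elif path == "/ready":
                results = {name: self._run(check) for name, check in checks.items()}
                ready = all(results.values())
                # Assumption: 503 is used to signal "not ready" to probes.
                self._send_json(
                    200 if ready else 503,
                    {"status": "ready" if ready else "not_ready", "checks": results},
                )
            else:
                self._send_json(404, {"error": "not_found"})

        @staticmethod
        def _run(check: DependencyCheck) -> bool:
            # A probe that raises counts as unhealthy instead of crashing the endpoint.
            try:
                return bool(check())
            except Exception:
                return False

        def _send_json(self, status: int, body: dict) -> None:
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format: str, *args) -> None:
            # Send the handler's built-in request log lines to DEBUG so
            # frequent probes do not flood the logs.
            logger.debug(format, *args)

    return HealthHandler
```

To run it, a service could call `HTTPServer(("0.0.0.0", 8000), make_health_handler({"db": check_db})).serve_forever()`, where `check_db` is a hypothetical probe; a web framework would express the same two routes with its own router.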
### Docker Compose Healthchecks
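Use `curl -f` (install curl in the image if it is missing). With `-f`, curl exits non-zero when the server returns HTTP 400 or above, so an error response fails the healthcheck. Do not use `wget --spider`.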
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/live"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
---
## Endpoint Protection
| Protected (require valid JWT) | Unprotected |
|-------------------------------|-------------|
| All `/api/v1/*` routes | `GET /live` |
| | `GET /ready` |
| | `GET /metrics` (IP-restricted to internal networks) |
| | `GET /api/auth/login-url` |
| | `POST /api/auth/token` |
| | `POST /api/v1/telemetry` (sendBeacon cannot set headers) |
> **Why `/api/v1/telemetry` is unprotected**: The browser `sendBeacon` API cannot set `Authorization` headers. The telemetry endpoint must be open to receive client-side error reports and performance data, or browser errors will be silently lost.
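One way to apply this table in authentication middleware is a default-deny allowlist: every route requires a JWT unless it is listed as unprotected. The sketch assumes that routes the table does not mention also require a JWT, which is stricter than the table itself states; `UNPROTECTED` and `requires_jwt` are hypothetical names, and JWT validation is left out.

```python
from typing import FrozenSet, Tuple

# Unprotected (method, path) pairs from the table above.
UNPROTECTED: FrozenSet[Tuple[str, str]] = frozenset({
    ("GET", "/live"),
    ("GET", "/ready"),
    ("GET", "/metrics"),            # no JWT, but still IP-restricted
    ("GET", "/api/auth/login-url"),
    ("POST", "/api/auth/token"),
    ("POST", "/api/v1/telemetry"),  # sendBeacon cannot set headers
})


def requires_jwt(method: str, path: str) -> bool:
    """Return True if the request must carry a valid JWT (default-deny)."""
    # Ignore the query string and a trailing slash when matching.
    normalized = path.split("?", 1)[0].rstrip("/") or "/"
    return (method.upper(), normalized) not in UNPROTECTED
```

Matching on the method as well as the path keeps, for example, a hypothetical `GET /api/v1/telemetry` protected while the beacon `POST` stays open.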
---
## Prometheus Metrics
All services SHOULD expose `GET /metrics` in Prometheus exposition format, scraped by Prospero's Prometheus at 15s intervals.
- **IP-restricted** to internal networks: `10.10.0.0/24`, `172.16.0.0/12`, `127.0.0.0/8`
- No JWT required — HAProxy and Prometheus scrapers cannot authenticate
- Useful metrics to expose: request totals and durations, error rates, active connections, queue depths, dependency health
---
## Browser Telemetry
Frontend/browser code MUST report errors and performance data back to the server.
- Send to `POST /api/v1/telemetry` — unprotected endpoint
- Capture: JavaScript exceptions, unhandled promise rejections, resource load failures, and performance metrics
- The server MUST log client-side exceptions at **WARNING** level (they indicate user-facing problems but are not server failures)
- Include enough context to reproduce: URL, user agent, error message, stack trace (if available)
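A sketch of the server side of `POST /api/v1/telemetry`, assuming the frontend sends one JSON object per beacon with hypothetical `type`, `url`, `userAgent`, `message`, and `stack` fields. Error-like events are logged at WARNING as required above; logging performance events at INFO is an assumption. The body is parsed regardless of `Content-Type`, because `sendBeacon` sends a string payload as `text/plain`.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("telemetry")

# Hypothetical event types and field names; align them with the frontend payload.
CLIENT_ERROR_TYPES = {"error", "unhandledrejection", "resource"}
CONTEXT_FIELDS = ("url", "userAgent", "message", "stack")


def handle_telemetry(body: bytes) -> int:
    """Log one telemetry event from a sendBeacon POST; return the HTTP status."""
    # Do not insist on application/json: string beacons arrive as text/plain.
    try:
        event: Any = json.loads(body.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return 400
    if not isinstance(event, dict):
        return 400

    event_type = event.get("type")
    if isinstance(event_type, str) and event_type in CLIENT_ERROR_TYPES:
        # Keep the context needed to reproduce the problem.
        context: Dict[str, Any] = {f: event.get(f) for f in CONTEXT_FIELDS}
        # User-facing problem, but not a server failure: WARNING, not ERROR.
        logger.warning("client-side error: %s", json.dumps(context))
    else:
        # Assumption: performance and other telemetry is logged at INFO.
        logger.info("client telemetry: %s", json.dumps(event))
    return 204  # beacons ignore the response body
```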
---
## Docker Networking
- Use the **default network** that Docker Compose creates for each project (a bridge network) for simple deployments
- Add additional named networks only when required (e.g., isolating database traffic) or explicitly requested
- Do not define custom networks for single-service Docker Compose stacks
---
## Documentation Standards
Place documentation in the `/docs/` directory of the repository.
@@ -138,11 +211,3 @@ HTML documents must follow [docs/documentation_style_guide.html](documentation_s
- Use Bootstrap Icons for icons
- Use Bootstrap CSS for styles — avoid custom CSS
- Use **Mermaid** for diagrams
### Markdown Documents
Only these status symbols are approved:
- ✔ Success/Complete
- ❌ Error/Failed
- ⚠️ Warning/Caution
- Information/Note