diff --git a/docs/red_panda_standards.md b/docs/red_panda_standards.md
index 405b695..3bddd4a 100644
--- a/docs/red_panda_standards.md
+++ b/docs/red_panda_standards.md
@@ -66,6 +66,7 @@ These are explicit violations of Ouranos logging standards:
 
 **Implementation guidance:**
 - **Django / Gunicorn**: Filter health paths in the access log handler or use middleware that skips logging for probe user-agents.
+- **FastAPI / Uvicorn**: Add a `logging.Filter` on the `uvicorn.access` logger that matches health paths in the access log message. Uvicorn's access log format includes the full request line in quotes (e.g., `"GET /live HTTP/1.1"`), so filter regexes must account for that. See also the structured logging notes below.
 - **Docker services**: Configure the application's internal logging to exclude health routes — the syslog driver forwards everything it receives.
 - **HAProxy**: HAProxy's own health check logs (`option httpchk`) should remain at the HAProxy level for connection debugging, but backend application responses to those probes must not surface at INFO.
 
@@ -92,6 +93,7 @@ When a background worker (Celery task consumer, RabbitMQ subscriber, Gitea Runne
 | Service Category | Default Level | Rationale |
 |-----------------|---------------|-----------|
 | Django apps (Angelia, Athena, Kairos, Icarlos, Spelunker, Peitho, MCP Switchboard) | `WARNING` | Business logic — only degraded or broken conditions surface. Lifecycle events (start/stop/deploy) still log at INFO via Gunicorn and systemd. |
+| FastAPI apps (Periplus) | `WARNING` | Same rationale as Django. Uvicorn lifecycle events (start/stop) are pinned to INFO via the `uvicorn.error` logger regardless of app log level. |
 | Gunicorn access logs | Suppress 2xx/3xx health probes | Routine request logging deferred to HAProxy access logs in Loki. |
 | Infrastructure agents (Alloy, Prometheus, Node Exporter) | `warn` | Stable — do not change without cause. |
 | HAProxy (Titania) | `warning` | Connection-level logging handled by HAProxy's own log format → Alloy → Loki. |
@@ -100,6 +102,20 @@ When a background worker (Celery task consumer, RabbitMQ subscriber, Gitea Runne
 | LLM Proxy (Arke) | `info` | Token usage tracking and provider routing decisions justify INFO. Review periodically for noise. |
 | Observability stack (Grafana, Loki, AlertManager) | `warn` | Should be quiet unless something is wrong with observability itself. |
 
+### Structured Logging — FastAPI / Uvicorn
+
+FastAPI apps using uvicorn require special handling to achieve JSON-structured log output for the Alloy → Loki pipeline. Uvicorn manages its own loggers aggressively, and naive approaches will fail silently.
+
+**Required practices:**
+
+1. **Override uvicorn's handlers, don't just add to root** — Uvicorn's `config.load()` creates its own `StreamHandler` instances on `uvicorn`, `uvicorn.error`, and `uvicorn.access`. You must remove these handlers and set `propagate = True` so log records flow to the root logger where your JSON formatter lives.
+
+2. **Re-apply logging config in the lifespan** — Configuring logging at module import time is not sufficient. Uvicorn's `config.load()` runs *after* your module is imported but *before* the ASGI lifespan starts. Call your logging configuration function again inside the FastAPI `lifespan` context manager to recapture control.
+
+3. **Remap uvicorn logger names** — Uvicorn uses `uvicorn.error` for all lifecycle messages (startup, shutdown, errors) despite the misleading name. Remap it to `uvicorn` in your JSON formatter's output for clarity in Loki queries.
+
+4. **Use `pydantic-settings` with `extra = "ignore"`** — When loading config from `.env` files that contain variables for other services (e.g., oauth2-proxy), pydantic-settings will reject unknown fields by default. Always set `extra = "ignore"` in the model config.
+
 ### Loki & Grafana Alignment
 
 **Label normalization**: Alloy pipelines (syslog listeners and journal relabeling) MUST extract and forward a `level` label on every log line. Without a `level` label, the log entry is invisible to level-based dashboard filters and alert rules.
@@ -190,6 +206,19 @@ Frontend/browser code MUST report errors and performance data back to the server
 
 ---
 
+## Environment Variable Naming
+
+All environment variables for an application MUST use a consistent prefix matching the service name (e.g., `PERIPLUS_`, `ARKE_`, `ANGELIA_`). This applies to every variable in the `.env` file, including those consumed by sidecar services like oauth2-proxy.
+
+**Rules:**
+- All vars in `.env` use the `SERVICENAME_` prefix — no exceptions
+- `compose.yaml` maps prefixed vars to the sidecar's expected names (e.g., `OAUTH2_PROXY_CLIENT_ID: ${PERIPLUS_CASDOOR_CLIENT_ID}`)
+- The application's Settings model SHOULD declare all prefixed vars, even those only consumed by sidecars, so the full configuration is documented in one place
+- Every repo MUST include a `.env.example` with placeholder values for all required variables. Add `!.env.example` to `.gitignore` if a broad `.env.*` pattern would otherwise exclude it
+- `.env` files with real secrets are ALWAYS gitignored — no exceptions
+
+---
+
 ## Docker Networking
 
 - Use the **default Docker bridge network** for simple deployments
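
Reviewer sketch (not part of the patch): the FastAPI / Uvicorn guidance added above, covering the health-path filter on `uvicorn.access` and the "remove handlers, set `propagate = True`" practice, could look roughly like the following. It uses only the standard `logging` module. The names `HealthProbeFilter`, `configure_logging`, and `HEALTH_REQUEST_LINE`, and the `/ready` and `/health` paths, are hypothetical; only `/live` appears in the document. Attaching the JSON formatter to the root logger is left out.

```python
import logging
import re

# Hypothetical health paths; only /live is named in the standard.
# The pattern matches the quoted request line, e.g. "GET /live HTTP/1.1".
HEALTH_REQUEST_LINE = re.compile(
    r'"(?:GET|HEAD) /(?:live|ready|health)(?:\?\S*)? HTTP/\d(?:\.\d)?"'
)


class HealthProbeFilter(logging.Filter):
    """Drop access-log records whose quoted request line targets a health path."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies the %-style args, giving the rendered log line.
        return HEALTH_REQUEST_LINE.search(record.getMessage()) is None


def configure_logging() -> None:
    """Route uvicorn's loggers through the root logger. Safe to call repeatedly."""
    for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
        logger = logging.getLogger(name)
        # Remove whatever handlers are attached so records are not emitted twice.
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
        logger.propagate = True

    access = logging.getLogger("uvicorn.access")
    # Avoid stacking duplicate filters when this runs at import and in the lifespan.
    if not any(isinstance(f, HealthProbeFilter) for f in access.filters):
        access.addFilter(HealthProbeFilter())
```

Because the function is idempotent, it can be called once at import time and again inside the FastAPI `lifespan` context manager, as practice 2 requires, without accumulating filters or handlers.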