Update Red Panda Standards Doc
@@ -66,6 +66,7 @@ These are explicit violations of Ouranos logging standards:
**Implementation guidance:**
- **Django / Gunicorn**: Filter health paths in the access log handler or use middleware that skips logging for probe user-agents.
- **FastAPI / Uvicorn**: Add a `logging.Filter` on the `uvicorn.access` logger that matches health paths in the access log message. Uvicorn's access log format includes the full request line in quotes (e.g., `"GET /live HTTP/1.1"`), so filter regexes must account for that. See also the structured logging notes below.
- **Docker services**: Configure the application's internal logging to exclude health routes — the syslog driver forwards everything it receives.
- **HAProxy**: HAProxy's own health check logs (`option httpchk`) should remain at the HAProxy level for connection debugging, but backend application responses to those probes must not surface at INFO.
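
The FastAPI / Uvicorn guidance above could be sketched roughly as follows. This is a minimal illustration, not the project's actual filter: the `HealthProbeFilter` class, the `install_health_filter` helper, and the `/ready` and `/health` paths are assumptions (only `/live` appears in the text), and limiting suppression to 2xx/3xx responses is borrowed from the Gunicorn row of the default-level table so that failing probes stay visible.

```python
import logging
import re

# Hypothetical probe paths; only /live is named in the standards text.
HEALTH_PATHS = ("/live", "/ready", "/health")

# Uvicorn's access message looks like: 10.0.0.5:4242 - "GET /live HTTP/1.1" 200
# The pattern anchors on the quoted request line and a 2xx/3xx status code.
_HEALTH_RE = re.compile(
    r'"(?:GET|HEAD) (?:%s)/?(?:\?\S*)? HTTP/[\d.]+" [23]\d\d\b'
    % "|".join(re.escape(p) for p in HEALTH_PATHS)
)


class HealthProbeFilter(logging.Filter):
    """Drop access-log records for successful health probes."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; all other records pass through.
        return _HEALTH_RE.search(record.getMessage()) is None


def install_health_filter() -> None:
    """Attach the filter to the `uvicorn.access` logger once."""
    access_logger = logging.getLogger("uvicorn.access")
    if not any(isinstance(f, HealthProbeFilter) for f in access_logger.filters):
        access_logger.addFilter(HealthProbeFilter())
```

Matching on the rendered message rather than on `record.args` keeps the filter independent of how the record was built, at the cost of depending on the message layout shown in the comment.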
@@ -92,6 +93,7 @@ When a background worker (Celery task consumer, RabbitMQ subscriber, Gitea Runne
| Service Category | Default Level | Rationale |
|-----------------|---------------|-----------|
| Django apps (Angelia, Athena, Kairos, Icarlos, Spelunker, Peitho, MCP Switchboard) | `WARNING` | Business logic — only degraded or broken conditions surface. Lifecycle events (start/stop/deploy) still log at INFO via Gunicorn and systemd. |
| FastAPI apps (Periplus) | `WARNING` | Same rationale as Django. Uvicorn lifecycle events (start/stop) are pinned to INFO via the `uvicorn.error` logger regardless of app log level. |
| Gunicorn access logs | Suppress 2xx/3xx health probes | Routine request logging deferred to HAProxy access logs in Loki. |
| Infrastructure agents (Alloy, Prometheus, Node Exporter) | `warn` | Stable — do not change without cause. |
| HAProxy (Titania) | `warning` | Connection-level logging handled by HAProxy's own log format → Alloy → Loki. |
@@ -100,6 +102,20 @@ When a background worker (Celery task consumer, RabbitMQ subscriber, Gitea Runne
| LLM Proxy (Arke) | `info` | Token usage tracking and provider routing decisions justify INFO. Review periodically for noise. |
| Observability stack (Grafana, Loki, AlertManager) | `warn` | Should be quiet unless something is wrong with observability itself. |
### Structured Logging — FastAPI / Uvicorn
FastAPI apps using uvicorn require special handling to achieve JSON-structured log output for the Alloy → Loki pipeline. Uvicorn manages its own loggers aggressively, and naive approaches will fail silently.

**Required practices:**
1. **Override uvicorn's handlers, don't just add to root** — Uvicorn's `config.load()` creates its own `StreamHandler` instances on `uvicorn`, `uvicorn.error`, and `uvicorn.access`. You must remove these handlers and set `propagate = True` so log records flow to the root logger where your JSON formatter lives.
2. **Re-apply logging config in the lifespan** — Configuring logging at module import time is not sufficient. Uvicorn's `config.load()` runs *after* your module is imported but *before* the ASGI lifespan starts. Call your logging configuration function again inside the FastAPI `lifespan` context manager to recapture control.
3. **Remap uvicorn logger names** — Uvicorn uses `uvicorn.error` for all lifecycle messages (startup, shutdown, errors) despite the misleading name. Remap it to `uvicorn` in your JSON formatter's output for clarity in Loki queries.
4. **Use `pydantic-settings` with `extra = "ignore"`** — When loading config from `.env` files that contain variables for other services (e.g., oauth2-proxy), pydantic-settings will reject unknown fields by default. Always set `extra = "ignore"` in the model config.
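
Practices 1 to 3 could be combined along these lines. This is a sketch under assumptions rather than the Periplus implementation: `JsonFormatter`, `configure_logging`, the `lifespan` function, and the JSON field names are all hypothetical, and only the standard library is used so the example stays self-contained.

```python
import json
import logging
import sys
from contextlib import asynccontextmanager

UVICORN_LOGGERS = ("uvicorn", "uvicorn.error", "uvicorn.access")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        # Practice 3: report `uvicorn.error` as `uvicorn` for clearer Loki queries.
        name = "uvicorn" if record.name == "uvicorn.error" else record.name
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.WARNING, stream=None) -> None:
    """Install one JSON handler on root and route uvicorn's loggers to it."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())

    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace, not append, so repeat calls are safe
    root.setLevel(level)

    # Practice 1: strip uvicorn's own handlers and let records reach root.
    for name in UVICORN_LOGGERS:
        logger = logging.getLogger(name)
        logger.handlers.clear()
        logger.propagate = True

    # Keep lifecycle messages at INFO regardless of the application level.
    logging.getLogger("uvicorn.error").setLevel(logging.INFO)


@asynccontextmanager
async def lifespan(app):
    # Practice 2: uvicorn has installed its handlers by now, so configure again.
    configure_logging()
    yield
```

In this sketch the application would also call `configure_logging()` once at import time and pass `lifespan=lifespan` when constructing the `FastAPI` app. Replacing the root handler list on every call is a deliberate choice so that the second call does not duplicate output.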
### Loki & Grafana Alignment
**Label normalization**: Alloy pipelines (syslog listeners and journal relabeling) MUST extract and forward a `level` label on every log line. Without a `level` label, the log entry is invisible to level-based dashboard filters and alert rules.
@@ -190,6 +206,19 @@ Frontend/browser code MUST report errors and performance data back to the server
---
## Environment Variable Naming
All environment variables for an application MUST use a consistent prefix matching the service name (e.g., `PERIPLUS_`, `ARKE_`, `ANGELIA_`). This applies to every variable in the `.env` file, including those consumed by sidecar services like oauth2-proxy.

**Rules:**
- All vars in `.env` use the `SERVICENAME_` prefix — no exceptions
- `compose.yaml` maps prefixed vars to the sidecar's expected names (e.g., `OAUTH2_PROXY_CLIENT_ID: ${PERIPLUS_CASDOOR_CLIENT_ID}`)
- The application's Settings model SHOULD declare all prefixed vars, even those only consumed by sidecars, so the full configuration is documented in one place
- Every repo MUST include a `.env.example` with placeholder values for all required variables. Add `!.env.example` to `.gitignore` if a broad `.env.*` pattern would otherwise exclude it
- `.env` files with real secrets are ALWAYS gitignored — no exceptions
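
The prefix rule lends itself to a mechanical check, for example in CI or a pre-commit hook. The helper below is a hypothetical sketch (`unprefixed_vars` is not part of any existing tooling) and assumes simple `KEY=VALUE` lines with optional `export`, comments, and blank lines.

```python
def unprefixed_vars(env_text: str, prefix: str) -> list[str]:
    """Return variable names in dotenv-style text that lack the given prefix."""
    offenders = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate an optional leading "export ".
        key = line.removeprefix("export ").split("=", 1)[0].strip()
        if not key.startswith(prefix):
            offenders.append(key)
    return offenders


if __name__ == "__main__":
    sample = (
        "# oauth2-proxy credentials, mapped in compose.yaml\n"
        "PERIPLUS_CASDOOR_CLIENT_ID=changeme\n"
        "OAUTH2_PROXY_CLIENT_ID=changeme\n"
    )
    # The second variable bypasses the service prefix and is reported.
    print(unprefixed_vars(sample, "PERIPLUS_"))
```

Running the same check against `.env.example` would keep the committed template in line with the rule as well.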
---
## Docker Networking
- Use the **default Docker bridge network** for simple deployments