# Alloy Log & Metric Collection

Grafana Alloy runs as a **native systemd service** (never in Docker) on every Ouranos host with `alloy` in its `services` list. It collects logs and forwards them to **Loki on Prospero** (`http://prospero.incus:3100/loki/api/v1/push`), and scrapes host/container metrics that it **remote-writes** to **Prometheus on Prospero** (`http://prospero.incus:9090/api/v1/write`).

## Overview

- **Default config:** [`ansible/alloy/config.alloy.j2`](../ansible/alloy/config.alloy.j2) — journal-only fallback for hosts without a dedicated config.
- **Per-host config:** [`ansible/alloy//config.alloy.j2`](../ansible/alloy/) — overrides the default when present.
- **Selection:** [`alloy/deploy.yml`](../ansible/alloy/deploy.yml) stat-checks `/config.alloy.j2` on the controller; if it exists, that template is rendered, otherwise the default is used.
- **Log destination:** Loki on `prospero.incus:3100` via `loki.write "default"`.
- **Metric destination:** Prometheus on `prospero.incus:9090` via `prometheus.remote_write "default"`.
- **Environment:** every stream is labelled `environment="{{ deployment_environment }}"` (`ouranos`) and `hostname="{{ inventory_hostname }}"`.
- **Deploy:** `ansible-playbook alloy/deploy.yml` (optionally `--limit `).

`deploy.yml` also adds the `alloy` user to the host's `docker` group when the host has `docker` in its services — this is what lets Alloy read `/var/run/docker.sock` for the Docker discovery and cAdvisor blocks below.

## Log Sources

Ouranos collects logs through three mechanisms. New Dockerised services should use the **Docker socket discovery** path (preferred); the per-service syslog listener is the older pattern, still in use on several hosts.

### 1. Systemd journal (native services)

Every host includes a `loki.source.journal` component capturing all systemd unit output.
By default journal entries are labelled `job="systemd"`; a `loki.relabel` component can promote specific units to a richer label set (see [Journal relabeling](#journal-relabeling-native-services)).

This is the correct path for **native systemd services** (binaries managed by a `.service` unit) — they write to stdout/stderr, systemd captures it in the journal, and Alloy forwards it. No syslog port or log file needed.

### 2. Docker socket discovery (preferred for containers)

> **Reference implementation:** [`ansible/alloy/puck/config.alloy.j2`](../ansible/alloy/puck/config.alloy.j2).
> Puck is currently the lead host for this pattern; other Docker hosts still use
> per-service syslog listeners and should migrate to this model over time.

A **single** pair of `discovery.docker` + `loki.source.docker` blocks collects stdout from **every Compose project on the host**, current and future — no per-service configuration.

Container log streams are labelled from Docker's own Compose metadata:

- `service` ← Compose **project** name (e.g. `athena`, `mnemosyne`, `daedalus`)
- `component` ← Compose **service** name (e.g.
  `app`, `mcp`, `nginx`, `worker`)
- `container` ← raw container name (for non-Compose `docker run` containers)

```alloy
discovery.docker "containers" {
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "30s"
}

discovery.relabel "containers" {
  targets = discovery.docker.containers.targets

  rule {
    // Compose project → service
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label  = "service"
  }

  rule {
    // Compose service → component
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "component"
  }

  rule {
    // container name (non-Compose)
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }

  rule {
    // fall back to container name as service
    source_labels = ["service", "container"]
    separator     = "@"
    regex         = "@(.+)"
    target_label  = "service"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.containers.output
  forward_to = [loki.write.default.receiver]
  labels     = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}
```

**Why this is preferred over syslog listeners:**

- **Zero per-service wiring.** Adding a new Compose project requires no Alloy change — it is discovered automatically and labelled by its project name.
- **No startup ordering hazard.** It scrapes Docker's default `json-file` log driver, so containers never block on an Alloy listener being up (contrast the syslog driver, below).
- **Consistent `{service, component}` schema** across apps, matching the Prometheus `component` label used by multi-target scrape jobs (app vs web).

**Requirements:**

- The Compose project must use the default **`json-file`** log driver (i.e. it must *not* set `logging: { driver: syslog }`). The app must log to **stdout**.
- The `alloy` user needs read access to `/var/run/docker.sock` (handled by `deploy.yml` adding it to the `docker` group on Docker hosts).
- The `service` label is the **Compose project name**, which defaults to the deploy directory's basename. Confirm it (`docker compose config` → `name:`) when an alert or dashboard depends on a specific `service=` selector.

### 3. Docker syslog driver (legacy, per-service)

The older pattern: each container ships logs via Docker's `syslog` driver to a dedicated Alloy `loki.source.syslog` listener on a localhost port, labelled with a static `job`.

```alloy
loki.source.syslog "kairos_logs" {
  listener {
    address       = "127.0.0.1:{{ kairos_syslog_port }}"
    protocol      = "tcp"
    syslog_format = "{{ syslog_format }}" // rfc3164
    labels        = {
      job         = "kairos",
      hostname    = "{{ inventory_hostname }}",
      environment = "{{ deployment_environment }}",
    }
  }
  forward_to = [loki.write.default.receiver]
}
```

Container side, in the service's `docker-compose.yml.j2`:

```yaml
logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:{{ kairos_syslog_port }}"
    syslog-format: "{{ syslog_format | default('rfc3164') }}"
```

Ports follow the `514XX` convention and live in the host's `host_vars`.

> ⚠️ **Ordering hazard.** The listener must exist before the container starts.
> If `docker compose up` runs while the Alloy listener is not bound, the
> container fails immediately with
> `failed to initialize logging driver: dial tcp 127.0.0.1:: connect: connection refused`.
> Deploy/verify Alloy on the host *before* deploying a syslog-driver service.
> This hazard is the main reason new services should prefer the Docker-socket
> path instead.

> **Note — labels differ between the two Docker paths.** The syslog listener
> sets `job=""` (no `service`/`component`). The Docker-socket block
> sets `service=""` + `component=""` (no `job`). When
> migrating a service off syslog, update any dashboards or alert annotations
> that filter on `{job="…"}` to use `{service="…"}`.

## Journal relabeling (native services)

By default all journal entries share `job="systemd"`, making per-service filtering impossible.
A `loki.relabel` component overrides labels based on journal metadata such as the systemd unit or syslog identifier. The journal source pulls in that component's rules through `relabel_rules` and still forwards to `loki.write`; the `loki.relabel` block exists only to export its rules, which is why its `forward_to` is empty.

```alloy
loki.source.journal "systemd_logs" {
  forward_to    = [loki.write.default.receiver]
  relabel_rules = loki.relabel.journal_puck.rules
  labels        = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}

loki.relabel "journal_puck" {
  forward_to = []

  rule {
    // Pallas runtime → service/project schema
    source_labels = ["__journal_syslog_identifier"]
    regex         = "kottos"
    target_label  = "service"
    replacement   = "pallas"
  }

  rule {
    // default fallback
    source_labels = ["__journal__systemd_unit"]
    regex         = ".+"
    target_label  = "job"
    replacement   = "systemd"
  }
}
```

Rules run top-to-bottom; the first match per `target_label` wins, so the generic `systemd` fallback stays **last**. Escape dots in unit regexes (`alloy\\.service`). The `__journal_*` fields are hidden metadata — used for relabeling, not shipped to Loki.

## Metrics

On Docker hosts the per-host config also scrapes host and container metrics and **remote-writes** them to Prometheus (Alloy is the push agent; Prometheus does not scrape these hosts directly):

- `prometheus.exporter.unix` — node metrics (Incus-safe collectors only).
- `prometheus.exporter.process` — `namedprocess_namegroup_*` per command.
- `prometheus.exporter.cadvisor` — `container_*` metrics via the Docker socket.

These feed `prometheus.scrape` (`job_name` = the host, e.g. `puck`) → `prometheus.relabel` (adds `instance=`) → `prometheus.remote_write` → `prospero.incus:9090`.

> Application `/metrics` endpoints (e.g. django-prometheus, the
> nginx-prometheus-exporter sidecar) are **not** scraped by Alloy. Prometheus on
> Prospero scrapes those directly — see
> [`pplg/prometheus.yml.j2`](../ansible/pplg/prometheus.yml.j2).
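
As a rough illustration of the exporter, scrape, relabel, and remote-write chain described in this section, a single-exporter pipeline might be wired as in the sketch below. The component labels `node`, `host`, and `instance` and the `replacement` value are hypothetical, chosen only for illustration, and the collector selection is left out; the real per-host templates may differ. Only the `default` remote-write label, the `puck` job name, and the endpoint URL are taken from this document.

```alloy
// Sketch only: component labels and the instance value are illustrative assumptions.
prometheus.exporter.unix "node" { } // collector selection omitted in this sketch

prometheus.scrape "host" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.relabel.instance.receiver]
  job_name   = "puck"
}

prometheus.relabel "instance" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    // Static replace: stamps the same instance label on every sample.
    // "example-instance" is a placeholder, not the value used in production.
    target_label = "instance"
    replacement  = "example-instance"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prospero.incus:9090/api/v1/write"
  }
}
```

The process and cAdvisor exporters would presumably attach the same way, by passing their own exported `targets` to a `prometheus.scrape` component that forwards into the same relabel stage.
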
## Current inventory

### Hosts using Docker socket discovery

| Host | Block | Notes |
|------|-------|-------|
| `puck` | `discovery.docker` + `loki.source.docker "containers"` | Reference implementation. Covers all Compose projects (athena, mnemosyne, daedalus, kairos, …) as `service`/`component`. |

### Hosts using per-service syslog listeners

| Host | Services (job labels) |
|------|-----------------------|
| `puck` | angelia, kairos, spelunker, jupyterlab *(transitional — see below)* |
| `miranda` | argos, neo4j-cypher, grafana_mcp, gitea-mcp, searxng |
| `oberon` | rabbitmq, smtp4dev |
| `rosalind` | gitea, hass, lobechat, jellyfin, searxng (+ apache log files) |
| `titania` | casdoor, haproxy |
| `ariel`, `umbriel` | neo4j |

### Transitional state on puck

`athena`, `mnemosyne`, and `daedalus` have **migrated off** their syslog listeners to the Docker-socket block; their old `*_syslog_port` host_vars are retained as reserved-but-unused and can be removed once each rollout is verified. The remaining `puck` syslog listeners (angelia, kairos, spelunker, jupyterlab) are candidates to migrate the same way.

## Querying in Grafana

```logql
# All Athena container logs (any component)
{service="athena"}

# Just the Athena MCP container
{service="athena", component="mcp"}

# Superuser-login forensic line behind the DjangoSuperuserLogin alert
{service="athena"} |= "event=superuser_login"

# A syslog-driver service (legacy label scheme)
{job="kairos"}

# Errors across everything on one host
{hostname="puck.incus"} |~ "(?i)error"
```

## Adding a new Dockerised service

**Preferred (Docker socket — no Alloy change needed):**

1. Ensure the service's Compose project uses the default `json-file` log driver (do **not** set `logging: { driver: syslog }`) and the app logs to stdout.
2. Confirm the host's per-host Alloy config has the `discovery.docker` + `loki.source.docker` blocks (currently `puck`).
   If not, add them once (copy from [`puck/config.alloy.j2`](../ansible/alloy/puck/config.alloy.j2)).
3. Deploy the service. Verify in Grafana: `{service=""}` returns entries, with `component=`.

**Legacy (syslog driver — only if the host has no Docker-socket block):**

1. Allocate a `514XX` syslog port in the host's `host_vars`.
2. Add a `loki.source.syslog` block to `ansible/alloy//config.alloy.j2`.
3. Add the `syslog` logging driver to the service's `docker-compose.yml.j2`.
4. **Deploy Alloy first**, then the service.
5. Verify: `{job="