Alloy Log & Metric Collection

Grafana Alloy runs as a native systemd service (never in Docker) on every Ouranos host with alloy in its services list. It collects logs and forwards them to Loki on Prospero (http://prospero.incus:3100/loki/api/v1/push), and scrapes host/container metrics that it remote-writes to Prometheus on Prospero (http://prospero.incus:9090/api/v1/write).

Overview

  • Default config: ansible/alloy/config.alloy.j2 — journal-only fallback for hosts without a dedicated config.
  • Per-host config: ansible/alloy/<hostname_short>/config.alloy.j2 — overrides the default when present.
  • Selection: alloy/deploy.yml stat-checks <hostname_short>/config.alloy.j2 on the controller; if it exists, that template is rendered, otherwise the default is used.
  • Log destination: Loki on prospero.incus:3100 via loki.write "default".
  • Metric destination: Prometheus on prospero.incus:9090 via prometheus.remote_write "default".
  • Environment: every stream is labelled environment="{{ deployment_environment }}" (ouranos) and hostname="{{ inventory_hostname }}".
  • Deploy: ansible-playbook alloy/deploy.yml (optionally --limit <host>).

deploy.yml also adds the alloy user to the host's docker group when the host has docker in its services — this is what lets Alloy read /var/run/docker.sock for the Docker discovery and cAdvisor blocks below.
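
The two destinations named in the overview could be declared roughly as follows. This is a minimal sketch built only from the URLs given at the top of this document; the real templates may set further options (batching, external labels, authentication) that are not shown here.

```alloy
// Log sink: every loki.source.* component forwards to this receiver.
loki.write "default" {
  endpoint {
    url = "http://prospero.incus:3100/loki/api/v1/push"
  }
}

// Metric sink: scraped samples are pushed to Prometheus via remote write.
prometheus.remote_write "default" {
  endpoint {
    url = "http://prospero.incus:9090/api/v1/write"
  }
}
```

Other components refer to these sinks as loki.write.default.receiver and prometheus.remote_write.default.receiver.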

Log Sources

Ouranos collects logs through three mechanisms. New Dockerised services should use the Docker socket discovery path (preferred); the per-service syslog listener is the older pattern, still in use on several hosts.

1. Systemd journal (native services)

Every host includes a loki.source.journal component capturing all systemd unit output. By default journal entries are labelled job="systemd"; a loki.relabel component can promote specific units to a richer label set (see Journal relabeling).

This is the correct path for native systemd services (binaries managed by a .service unit) — they write to stdout/stderr, systemd captures it in the journal, and Alloy forwards it. No syslog port or log file needed.
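
A journal-only configuration of the kind the default template provides might look like the sketch below. The component label systemd_logs and the static job label are assumptions made for illustration; check ansible/alloy/config.alloy.j2 for the actual contents.

```alloy
// Assumed shape of a journal-only fallback config (illustrative, not copied
// from the real template). It expects a loki.write "default" component to
// be defined elsewhere in the same file.
loki.source.journal "systemd_logs" {
  forward_to = [loki.write.default.receiver]
  labels = {
    job         = "systemd",                        // assumed static label
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}
```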

2. Docker socket discovery (preferred for containers)

Reference implementation: ansible/alloy/puck/config.alloy.j2. Puck is currently the lead host for this pattern; other Docker hosts still use per-service syslog listeners and should migrate to this model over time.

A single pair of discovery.docker + loki.source.docker blocks collects stdout from every Compose project on the host, current and future — no per-service configuration. Container log streams are labelled from Docker's own Compose metadata:

  • service ← Compose project name (e.g. athena, mnemosyne, daedalus)
  • component ← Compose service name (e.g. app, mcp, nginx, worker)
  • container ← raw container name (for non-Compose docker run containers)

discovery.docker "containers" {
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "30s"
}

discovery.relabel "containers" {
  targets = discovery.docker.containers.targets

  rule {                                   // Compose project → service
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label  = "service"
  }
  rule {                                   // Compose service → component
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "component"
  }
  rule {                                   // container name (non-Compose)
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }
  rule {                                   // fall back to container name as service
    source_labels = ["service", "container"]
    separator     = "@"
    regex         = "@(.+)"
    target_label  = "service"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.containers.output
  forward_to = [loki.write.default.receiver]
  labels = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}

Why this is preferred over syslog listeners:

  • Zero per-service wiring. Adding a new Compose project requires no Alloy change — it is discovered automatically and labelled by its project name.
  • No startup ordering hazard. It scrapes Docker's default json-file log driver, so containers never block on an Alloy listener being up (contrast the syslog driver, below).
  • Consistent {service, component} schema across apps, matching the Prometheus component label used by multi-target scrape jobs (app vs web).

Requirements:

  • The Compose project must use the default json-file log driver (i.e. it must not set logging: { driver: syslog }). The app must log to stdout.
  • The alloy user needs read access to /var/run/docker.sock (handled by deploy.yml adding it to the docker group on Docker hosts).
  • The service label is the Compose project name, which defaults to the deploy directory's basename. Confirm it (check the name: field in the output of docker compose config) when an alert or dashboard depends on a specific service= selector.

3. Docker syslog driver (legacy, per-service)

The older pattern: each container ships logs via Docker's syslog driver to a dedicated Alloy loki.source.syslog listener on a localhost port, labelled with a static job.

loki.source.syslog "kairos_logs" {
  listener {
    address       = "127.0.0.1:{{ kairos_syslog_port }}"
    protocol      = "tcp"
    syslog_format = "{{ syslog_format }}"     // rfc3164
    labels = {
      job         = "kairos",
      hostname    = "{{ inventory_hostname }}",
      environment = "{{ deployment_environment }}",
    }
  }
  forward_to = [loki.write.default.receiver]
}

Container side, in the service's docker-compose.yml.j2:

logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:{{ kairos_syslog_port }}"
    syslog-format: "{{ syslog_format | default('rfc3164') }}"

Ports follow the 514XX convention and live in the host's host_vars.

⚠️ Ordering hazard. The listener must exist before the container starts. If docker compose up runs while the Alloy listener is not bound, the container fails immediately with failed to initialize logging driver: dial tcp 127.0.0.1:<port>: connect: connection refused. Deploy/verify Alloy on the host before deploying a syslog-driver service. This hazard is the main reason new services should prefer the Docker-socket path instead.

Note — labels differ between the two Docker paths. The syslog listener sets job="<service>" (no service/component). The Docker-socket block sets service="<project>" + component="<compose service>" (no job). When migrating a service off syslog, update any dashboards or alert annotations that filter on {job="…"} to use {service="…"}.

Journal relabeling (native services)

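By default all journal entries share job="systemd", so they cannot be filtered per service. A loki.relabel component defines rules that override labels based on journal metadata such as the systemd unit or syslog identifier. The journal source applies those rules through its relabel_rules argument and still forwards directly to loki.write; the loki.relabel component is used only for its exported rules, which is why its forward_to is empty.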

loki.source.journal "systemd_logs" {
  forward_to    = [loki.write.default.receiver]
  relabel_rules = loki.relabel.journal_puck.rules
  labels = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}

loki.relabel "journal_puck" {
  forward_to = []

  rule {                                   // Pallas runtime → service/project schema
    source_labels = ["__journal_syslog_identifier"]
    regex         = "kottos"
    target_label  = "service"
    replacement   = "pallas"
  }

  rule {                                   // default fallback
    source_labels = ["__journal__systemd_unit"]
    regex         = ".+"
    target_label  = "job"
    replacement   = "systemd"
  }
}

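Rules are applied top-to-bottom, each to the label set left by the previous rule, so a later rule that matches overwrites any earlier value for the same target_label. The generic systemd fallback can stay last because it writes job, while the specific rule above it writes service. Escape dots in unit regexes (alloy\\.service). The __journal_* fields are hidden metadata, used for relabeling and not shipped to Loki.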
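
To illustrate the dot-escaping note, a unit-specific rule might look like the sketch below. The component label journal_example, the choice of alloy.service as the unit, and the service value are all hypothetical; in a real per-host config the rule would go inside the existing loki.relabel block, ahead of the fallback.

```alloy
loki.relabel "journal_example" {
  forward_to = []                          // only the exported rules are used

  rule {                                   // hypothetical: tag Alloy's own unit
    source_labels = ["__journal__systemd_unit"]
    regex         = "alloy\\.service"      // escaped dot matches a literal "."
    target_label  = "service"
    replacement   = "alloy"
  }
}
```

Relabel regexes are anchored at both ends, so the pattern has to match the whole unit name, including the .service suffix.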

Metrics

On Docker hosts the per-host config also scrapes host and container metrics and remote-writes them to Prometheus (Alloy is the push agent; Prometheus does not scrape these hosts directly):

  • prometheus.exporter.unix: node metrics (Incus-safe collectors only).
  • prometheus.exporter.process: namedprocess_namegroup_* per command.
  • prometheus.exporter.cadvisor: container_* metrics via the Docker socket.

These feed prometheus.scrape (job_name = the host, e.g. puck) → prometheus.relabel (adds instance=<hostname>) → prometheus.remote_write → prospero.incus:9090.
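
The chain described above can be sketched for the node exporter alone as follows. The component labels (host, instance) are assumptions, the collector selection that makes the exporter Incus-safe is not shown, and the process and cAdvisor exporters would presumably be wired to the scrape stage in the same way.

```alloy
// Node metrics exporter; the real config restricts the collectors it enables.
prometheus.exporter.unix "host" { }

// Scrape the exporter in-process and label the job after the host.
prometheus.scrape "host" {
  targets    = prometheus.exporter.unix.host.targets
  job_name   = "puck"
  forward_to = [prometheus.relabel.instance.receiver]
}

// Stamp every sample with the host name as its instance label.
prometheus.relabel "instance" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    target_label = "instance"
    replacement  = "{{ inventory_hostname }}"
  }
}
```

The final hop is the prometheus.remote_write "default" component, which pushes to prospero.incus:9090.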

Application /metrics endpoints (e.g. django-prometheus, the nginx-prometheus-exporter sidecar) are not scraped by Alloy. Prometheus on Prospero scrapes those directly — see pplg/prometheus.yml.j2.

Current inventory

Hosts using Docker socket discovery

| Host | Block | Notes |
| --- | --- | --- |
| puck | discovery.docker + loki.source.docker "containers" | Reference implementation. Covers all Compose projects (athena, mnemosyne, daedalus, kairos, …) as service/component. |

Hosts using per-service syslog listeners

| Host | Services (job labels) |
| --- | --- |
| puck | angelia, kairos, spelunker, jupyterlab (transitional; see below) |
| miranda | argos, neo4j-cypher, grafana_mcp, gitea-mcp, searxng |
| oberon | rabbitmq, smtp4dev |
| rosalind | gitea, hass, lobechat, jellyfin, searxng (+ apache log files) |
| titania | casdoor, haproxy |
| ariel, umbriel | neo4j |

Transitional state on puck

athena, mnemosyne, and daedalus have migrated off their syslog listeners to the Docker-socket block; their old *_syslog_port host_vars are retained as reserved-but-unused and can be removed once each rollout is verified. The remaining puck syslog listeners (angelia, kairos, spelunker, jupyterlab) are candidates to migrate the same way.

Querying in Grafana

# All Athena container logs (any component)
{service="athena"}

# Just the Athena MCP container
{service="athena", component="mcp"}

# Superuser-login forensic line behind the DjangoSuperuserLogin alert
{service="athena"} |= "event=superuser_login"

# A syslog-driver service (legacy label scheme)
{job="kairos"}

# Errors across everything on one host
{hostname="puck.incus"} |~ "(?i)error"

Adding a new Dockerised service

Preferred (Docker socket — no Alloy change needed):

  1. Ensure the service's Compose project uses the default json-file log driver (do not set logging: { driver: syslog }) and the app logs to stdout.
  2. Confirm the host's per-host Alloy config has the discovery.docker + loki.source.docker blocks (currently puck). If not, add them once (copy from puck/config.alloy.j2).
  3. Deploy the service. Verify in Grafana: {service="<compose-project>"} returns entries, with component=<compose-service>.

Legacy (syslog driver — only if the host has no Docker-socket block):

  1. Allocate a 514XX syslog port in the host's host_vars.
  2. Add a loki.source.syslog block to ansible/alloy/<host>/config.alloy.j2.
  3. Add the syslog logging driver to the service's docker-compose.yml.j2.
  4. Deploy Alloy first, then the service.
  5. Verify: {job="<label>", hostname="<host>"} returns entries.

Red Panda Seal of Approval 🐼