Alloy Log & Metric Collection
Grafana Alloy runs as a native systemd service (never in Docker) on every
Ouranos host with alloy in its services list. It collects logs and forwards
them to Loki on Prospero (http://prospero.incus:3100/loki/api/v1/push),
and scrapes host/container metrics that it remote-writes to Prometheus on
Prospero (http://prospero.incus:9090/api/v1/write).
Overview
- Default config: ansible/alloy/config.alloy.j2, the journal-only fallback for hosts without a dedicated config.
- Per-host config: ansible/alloy/<hostname_short>/config.alloy.j2, which overrides the default when present.
- Selection: alloy/deploy.yml stat-checks <hostname_short>/config.alloy.j2 on the controller; if it exists, that template is rendered, otherwise the default is used.
- Log destination: Loki on prospero.incus:3100 via loki.write "default".
- Metric destination: Prometheus on prospero.incus:9090 via prometheus.remote_write "default".
- Environment: every stream is labelled environment="{{ deployment_environment }}" (ouranos) and hostname="{{ inventory_hostname }}".
- Deploy: ansible-playbook alloy/deploy.yml (optionally --limit <host>).
deploy.yml also adds the alloy user to the host's docker group when the
host has docker in its services — this is what lets Alloy read
/var/run/docker.sock for the Docker discovery and cAdvisor blocks below.
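As a rough sketch of the two sinks named in the overview, the blocks below wire loki.write "default" and prometheus.remote_write "default" to the Prospero URLs given above. The rendered templates may set further options (batching, external labels, authentication) that are not documented here, so treat this as a minimal assumed form rather than the deployed config.

```alloy
// Minimal sketch: only the endpoint URLs come from this document;
// any additional arguments in the real templates are unknown.
loki.write "default" {
  endpoint {
    url = "http://prospero.incus:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prospero.incus:9090/api/v1/write"
  }
}
```

Every log source and scrape pipeline on the host then forwards to loki.write.default.receiver or prometheus.remote_write.default.receiver.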
Log Sources
Ouranos collects logs through three mechanisms. New Dockerised services should use the Docker socket discovery path (preferred); the per-service syslog listener is the older pattern, still in use on several hosts.
1. Systemd journal (native services)
Every host includes a loki.source.journal component capturing all systemd
unit output. By default journal entries are labelled job="systemd"; a
loki.relabel component can promote specific units to a richer label set (see
Journal relabeling).
This is the correct path for native systemd services (binaries managed by a
.service unit) — they write to stdout/stderr, systemd captures it in the
journal, and Alloy forwards it. No syslog port or log file needed.
2. Docker socket discovery (preferred for containers)
Reference implementation: ansible/alloy/puck/config.alloy.j2. Puck is currently the lead host for this pattern; other Docker hosts still use per-service syslog listeners and should migrate to this model over time.
A single pair of discovery.docker + loki.source.docker blocks collects
stdout from every Compose project on the host, current and future — no
per-service configuration. Container log streams are labelled from Docker's own
Compose metadata:
- service ← Compose project name (e.g. athena, mnemosyne, daedalus)
- component ← Compose service name (e.g. app, mcp, nginx, worker)
- container ← raw container name (for non-Compose docker run containers)
discovery.docker "containers" {
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "30s"
}

discovery.relabel "containers" {
  targets = discovery.docker.containers.targets
  rule { // Compose project → service
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label  = "service"
  }
  rule { // Compose service → component
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "component"
  }
  rule { // container name (non-Compose); strips the leading slash
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }
  rule { // fall back to container name as service when service is empty
    source_labels = ["service", "container"]
    separator     = "@"
    regex         = "@(.+)"
    target_label  = "service"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.containers.output
  forward_to = [loki.write.default.receiver]
  labels = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}
Why this is preferred over syslog listeners:
- Zero per-service wiring. Adding a new Compose project requires no Alloy change — it is discovered automatically and labelled by its project name.
- No startup ordering hazard. It scrapes Docker's default json-file log driver, so containers never block on an Alloy listener being up (contrast the syslog driver, below).
- Consistent {service, component} schema across apps, matching the Prometheus component label used by multi-target scrape jobs (app vs web).
Requirements:
- The Compose project must use the default json-file log driver (i.e. it must not set logging: { driver: syslog }). The app must log to stdout.
- The alloy user needs read access to /var/run/docker.sock (handled by deploy.yml adding it to the docker group on Docker hosts).
- The service label is the Compose project name, which defaults to the deploy directory's basename. Confirm it (docker compose config → name:) when an alert or dashboard depends on a specific service= selector.
3. Docker syslog driver (legacy, per-service)
The older pattern: each container ships logs via Docker's syslog driver to a
dedicated Alloy loki.source.syslog listener on a localhost port, labelled with
a static job.
loki.source.syslog "kairos_logs" {
  listener {
    address       = "127.0.0.1:{{ kairos_syslog_port }}"
    protocol      = "tcp"
    syslog_format = "{{ syslog_format }}" // rfc3164
    labels = {
      job         = "kairos",
      hostname    = "{{ inventory_hostname }}",
      environment = "{{ deployment_environment }}",
    }
  }
  forward_to = [loki.write.default.receiver]
}
Container side, in the service's docker-compose.yml.j2:
logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:{{ kairos_syslog_port }}"
    syslog-format: "{{ syslog_format | default('rfc3164') }}"
Ports follow the 514XX convention and live in the host's host_vars.
⚠️ Ordering hazard. The listener must exist before the container starts. If docker compose up runs while the Alloy listener is not bound, the container fails immediately with: failed to initialize logging driver: dial tcp 127.0.0.1:<port>: connect: connection refused. Deploy/verify Alloy on the host before deploying a syslog-driver service. This hazard is the main reason new services should prefer the Docker-socket path instead.
Note: labels differ between the two Docker paths. The syslog listener sets job="<service>" (no service/component). The Docker-socket block sets service="<project>" + component="<compose service>" (no job). When migrating a service off syslog, update any dashboards or alert annotations that filter on {job="…"} to use {service="…"}.
Journal relabeling (native services)
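By default all journal entries share job="systemd", making per-service
filtering impossible. A loki.relabel component overrides labels based on the
systemd unit. The journal source applies that component's exported rules
through its relabel_rules argument and still forwards directly to loki.write;
the loki.relabel block's own forward_to stays empty because only its rules are
used.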
loki.source.journal "systemd_logs" {
  forward_to    = [loki.write.default.receiver]
  relabel_rules = loki.relabel.journal_puck.rules
  labels = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}

loki.relabel "journal_puck" {
  forward_to = []
  rule { // Pallas runtime → service/project schema
    source_labels = ["__journal_syslog_identifier"]
    regex         = "kottos"
    target_label  = "service"
    replacement   = "pallas"
  }
  rule { // default fallback
    source_labels = ["__journal__systemd_unit"]
    regex         = ".+"
    target_label  = "job"
    replacement   = "systemd"
  }
}
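Rules run top-to-bottom, and a later matching rule overwrites whatever an
earlier rule wrote to the same target_label. The example is safe because its
two rules write different labels (service and job); a rule that promotes a unit
to its own job value must come after the generic systemd rule, otherwise the
generic rule overwrites it. Escape dots in unit regexes (alloy\\.service). The
__journal_* fields are hidden metadata: used for relabeling, not shipped to
Loki.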
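To illustrate the dot-escaping note, here is a hypothetical rule that tags entries from the Alloy unit itself. The component label "journal_example" and the service value "alloy" are invented for this sketch and are not taken from any host config.

```alloy
// Hypothetical example: match one unit exactly and give it a service label.
loki.relabel "journal_example" {
  forward_to = []
  rule {
    source_labels = ["__journal__systemd_unit"]
    regex         = "alloy\\.service"   // escaped dot: matches the literal unit name only
    target_label  = "service"
    replacement   = "alloy"
  }
}
```

Without the escape, the dot would match any character, so a unit named alloyXservice would also be relabelled.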
Metrics
On Docker hosts the per-host config also scrapes host and container metrics and remote-writes them to Prometheus (Alloy is the push agent; Prometheus does not scrape these hosts directly):
- prometheus.exporter.unix: node metrics (Incus-safe collectors only).
- prometheus.exporter.process: namedprocess_namegroup_* per command.
- prometheus.exporter.cadvisor: container_* metrics via the Docker socket.
These feed prometheus.scrape (job_name = the host, e.g. puck) →
prometheus.relabel (adds instance=<hostname>) →
prometheus.remote_write → prospero.incus:9090.
Application /metrics endpoints (e.g. django-prometheus, the nginx-prometheus-exporter sidecar) are not scraped by Alloy. Prometheus on Prospero scrapes those directly; see pplg/prometheus.yml.j2.
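The exporter → scrape → relabel → remote_write chain described in this section might be wired roughly as follows. This is a sketch under several assumptions: the component labels ("host", "containers", "instance") are invented, the process exporter and the Incus-safe collector selection are left out, the instance value is assumed to come from inventory_hostname, and a prometheus.remote_write "default" component is assumed to exist elsewhere in the config. Newer Alloy releases may prefer array.concat over concat.

```alloy
prometheus.exporter.unix "host" {
  // collector selection omitted; the real config restricts this to Incus-safe collectors
}

prometheus.exporter.cadvisor "containers" {
  docker_host = "unix:///var/run/docker.sock"
}

prometheus.scrape "host" {
  targets    = concat(prometheus.exporter.unix.host.targets, prometheus.exporter.cadvisor.containers.targets)
  job_name   = "puck"   // job_name is the host, per the text above
  forward_to = [prometheus.relabel.instance.receiver]
}

prometheus.relabel "instance" {
  forward_to = [prometheus.remote_write.default.receiver]
  rule { // set instance=<hostname> on every series
    target_label = "instance"
    replacement  = "{{ inventory_hostname }}"
  }
}
```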
Current inventory
Hosts using Docker socket discovery
| Host | Block | Notes |
|---|---|---|
| puck | discovery.docker + loki.source.docker "containers" | Reference implementation. Covers all Compose projects (athena, mnemosyne, daedalus, kairos, …) as service/component. |
Hosts using per-service syslog listeners
| Host | Services (job labels) |
|---|---|
| puck | angelia, kairos, spelunker, jupyterlab (transitional, see below) |
| miranda | argos, neo4j-cypher, grafana_mcp, gitea-mcp, searxng |
| oberon | rabbitmq, smtp4dev |
| rosalind | gitea, hass, lobechat, jellyfin, searxng (+ apache log files) |
| titania | casdoor, haproxy |
| ariel, umbriel | neo4j |
Transitional state on puck
athena, mnemosyne, and daedalus have migrated off their syslog
listeners to the Docker-socket block; their old *_syslog_port host_vars are
retained as reserved-but-unused and can be removed once each rollout is
verified. The remaining puck syslog listeners (angelia, kairos, spelunker,
jupyterlab) are candidates to migrate the same way.
Querying in Grafana
# All Athena container logs (any component)
{service="athena"}
# Just the Athena MCP container
{service="athena", component="mcp"}
# Superuser-login forensic line behind the DjangoSuperuserLogin alert
{service="athena"} |= "event=superuser_login"
# A syslog-driver service (legacy label scheme)
{job="kairos"}
# Errors across everything on one host
{hostname="puck.incus"} |~ "(?i)error"
Adding a new Dockerised service
Preferred (Docker socket — no Alloy change needed):
- Ensure the service's Compose project uses the default json-file log driver (do not set logging: { driver: syslog }) and the app logs to stdout.
- Confirm the host's per-host Alloy config has the discovery.docker + loki.source.docker blocks (currently puck). If not, add them once (copy from puck/config.alloy.j2).
- Deploy the service. Verify in Grafana: {service="<compose-project>"} returns entries, with component=<compose-service>.
Legacy (syslog driver — only if the host has no Docker-socket block):
- Allocate a 514XX syslog port in the host's host_vars.
- Add a loki.source.syslog block to ansible/alloy/<host>/config.alloy.j2.
- Add the syslog logging driver to the service's docker-compose.yml.j2.
- Deploy Alloy first, then the service.
- Verify: {job="<label>", hostname="<host>"} returns entries.