refactor(ansible): rename freecad_mcp env vars and rework deployment
- Drop `FREECAD_MCP_` prefix from env vars (use `FREECAD_*`)
- Update freecad_mcp port from 22032 to 22061
- Document that FreeCAD bridge is required for tool calls
- Replace kottos deployment with pallas deployment
289
docs/alloy.md
Normal file
@@ -0,0 +1,289 @@
# Alloy Log & Metric Collection

Grafana Alloy runs as a **native systemd service** (never in Docker) on every
Ouranos host with `alloy` in its `services` list. It collects logs and forwards
them to **Loki on Prospero** (`http://prospero.incus:3100/loki/api/v1/push`).
On Docker hosts it also scrapes host/container metrics and **remote-writes**
them to **Prometheus on Prospero** (`http://prospero.incus:9090/api/v1/write`).

## Overview

- **Default config:** [`ansible/alloy/config.alloy.j2`](../ansible/alloy/config.alloy.j2): journal-only fallback for hosts without a dedicated config.
- **Per-host config:** [`ansible/alloy/<hostname_short>/config.alloy.j2`](../ansible/alloy/): overrides the default when present.
- **Selection:** [`alloy/deploy.yml`](../ansible/alloy/deploy.yml) stat-checks `<hostname_short>/config.alloy.j2` on the controller; if it exists, that template is rendered, otherwise the default is used.
- **Log destination:** Loki on `prospero.incus:3100` via `loki.write "default"`.
- **Metric destination:** Prometheus on `prospero.incus:9090` via `prometheus.remote_write "default"`.
- **Environment:** every stream is labelled `environment="{{ deployment_environment }}"` (`ouranos`) and `hostname="{{ inventory_hostname }}"`.
- **Deploy:** `ansible-playbook alloy/deploy.yml` (optionally `--limit <host>`).
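
The selection rule in `deploy.yml` is an Ansible `stat` check followed by a conditional template choice. The Python sketch below makes the precedence explicit. `select_alloy_template` is a hypothetical helper, not part of the repository, and the sketch assumes both templates live under a single `alloy_dir` laid out as described above.

```python
from pathlib import Path


def select_alloy_template(alloy_dir: Path, hostname_short: str) -> Path:
    """Return the template deploy.yml would render for this host (sketch)."""
    # Per-host override: <alloy_dir>/<hostname_short>/config.alloy.j2
    per_host = alloy_dir / hostname_short / "config.alloy.j2"
    if per_host.is_file():
        return per_host
    # Otherwise the journal-only default: <alloy_dir>/config.alloy.j2
    return alloy_dir / "config.alloy.j2"
```

For example, `select_alloy_template(Path("ansible/alloy"), "puck")` returns the puck template. A host without its own directory gets the default.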

`deploy.yml` also adds the `alloy` user to the host's `docker` group when the
host has `docker` in its services. That membership is what lets Alloy use
`/var/run/docker.sock` for the Docker discovery and cAdvisor blocks below.

## Log Sources

Ouranos collects logs through three mechanisms. New Dockerised services should
use the **Docker socket discovery** path (preferred); the per-service syslog
listener is the older pattern, still in use on several hosts.

### 1. Systemd journal (native services)

Every host includes a `loki.source.journal` component capturing all systemd
unit output. By default journal entries are labelled `job="systemd"`; a
`loki.relabel` component can promote specific units to a richer label set (see
[Journal relabeling](#journal-relabeling-native-services)).

This is the correct path for **native systemd services** (binaries managed by a
`.service` unit). They write to stdout/stderr, systemd captures that output in
the journal, and Alloy forwards it. No syslog port or log file is needed.

### 2. Docker socket discovery (preferred for containers)

> **Reference implementation:** [`ansible/alloy/puck/config.alloy.j2`](../ansible/alloy/puck/config.alloy.j2).
> Puck is currently the lead host for this pattern; other Docker hosts still use
> per-service syslog listeners and should migrate to this model over time.

A **single** pair of `discovery.docker` + `loki.source.docker` blocks collects
stdout/stderr from **every Compose project on the host**, current and future,
with no per-service configuration. Container log streams are labelled from
Docker's own Compose metadata:

- `service` ← Compose **project** name (e.g. `athena`, `mnemosyne`, `daedalus`)
- `component` ← Compose **service** name (e.g. `app`, `mcp`, `nginx`, `worker`)
- `container` ← raw container name without Docker's leading `/` (set on every container; for non-Compose `docker run` containers it is also copied into `service`)
```alloy
discovery.docker "containers" {
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "30s"
}

discovery.relabel "containers" {
  targets = discovery.docker.containers.targets

  rule { // Compose project → service
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label  = "service"
  }
  rule { // Compose service → component
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "component"
  }
  rule { // container name minus the leading "/" (every container)
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }
  rule { // no Compose project ("@<name>"): fall back to container name as service
    source_labels = ["service", "container"]
    separator     = "@"
    regex         = "@(.+)"
    target_label  = "service"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.containers.output
  forward_to = [loki.write.default.receiver]
  labels     = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}
```
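
The four rules are easiest to follow by replaying them on sample targets. The Python sketch below models Prometheus-style `replace` relabeling, which is the default action these rules use:

- source label values are joined with `separator` (default `;`);
- the regex must match the whole joined string;
- the default replacement is `$1`;
- an empty result removes the target label.

It is a simplified illustration, not Alloy's implementation: it only understands `$1`, and the container names are made up.

```python
import re

# The four rules from discovery.relabel "containers", as plain dicts.
RULES = [
    {"source_labels": ["__meta_docker_container_label_com_docker_compose_project"],
     "target_label": "service"},
    {"source_labels": ["__meta_docker_container_label_com_docker_compose_service"],
     "target_label": "component"},
    {"source_labels": ["__meta_docker_container_name"],
     "regex": "/(.*)", "target_label": "container"},
    {"source_labels": ["service", "container"], "separator": "@",
     "regex": "@(.+)", "target_label": "service"},
]


def replace_rule(labels, source_labels, target_label,
                 regex="(.*)", separator=";", replacement="$1"):
    """Apply one `replace` rule and return a new label dict."""
    # Missing labels count as empty strings when the values are joined.
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # relabel regexes are fully anchored
    if match is None:
        return labels  # no match: the rule leaves the labels untouched
    result = match.expand(replacement.replace("$1", r"\1"))
    out = dict(labels)
    if result:
        out[target_label] = result
    else:
        out.pop(target_label, None)  # an empty result removes the label
    return out


def stream_labels(meta):
    """Run every rule in order, then drop internal `__` labels."""
    labels = dict(meta)
    for rule in RULES:
        labels = replace_rule(labels, **rule)
    return {k: v for k, v in labels.items() if not k.startswith("__")}


compose_target = {
    "__meta_docker_container_label_com_docker_compose_project": "athena",
    "__meta_docker_container_label_com_docker_compose_service": "mcp",
    "__meta_docker_container_name": "/athena-mcp-1",
}
plain_target = {"__meta_docker_container_name": "/scratchpad"}

print(stream_labels(compose_target))
# {'service': 'athena', 'component': 'mcp', 'container': 'athena-mcp-1'}
print(stream_labels(plain_target))
# {'container': 'scratchpad', 'service': 'scratchpad'}
```

The fallback only fires when the joined value starts with `@`, that is, when `service` is still empty. A Compose container's `athena@athena-mcp-1` does not match the anchored `@(.+)`, so its project name is kept.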

**Why this is preferred over syslog listeners:**

- **Zero per-service wiring.** Adding a new Compose project requires no Alloy
  change; it is discovered automatically and labelled by its project name.
- **No startup ordering hazard.** Alloy reads, through the Docker API, the logs
  that Docker's default `json-file` driver already stores. Containers never
  block on an Alloy listener being up (contrast the syslog driver, below).
- **Consistent `{service, component}` schema** across apps, matching the
  Prometheus `component` label used by multi-target scrape jobs (app vs web).
**Requirements:**

- The Compose project must use the default **`json-file`** log driver (i.e. it
  must *not* set `logging: { driver: syslog }`). The app must log to **stdout**
  (or stderr).
- The `alloy` user needs access to `/var/run/docker.sock` (handled by
  `deploy.yml` adding it to the `docker` group on Docker hosts).
- The `service` label is the **Compose project name**, which defaults to the
  deploy directory's basename. Confirm it (`docker compose config` → `name:`)
  when an alert or dashboard depends on a specific `service=` selector.
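
The first and last requirements can be checked before deploying by reading the resolved Compose model. The sketch below assumes:

- Docker Compose v2, whose `docker compose config` accepts `--format json`;
- the command runs in the project's deploy directory.

`load_compose_config` and `check_compose_project` are hypothetical helpers.

```python
import json
import subprocess


def load_compose_config(project_dir: str) -> dict:
    """Return the resolved Compose model for the project in project_dir."""
    result = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        cwd=project_dir, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def check_compose_project(config: dict) -> tuple[str, list[str]]:
    """Return (project name, services that would bypass the Docker-socket path)."""
    project = config["name"]  # becomes the `service` label in Loki
    syslog_services = sorted(
        name
        for name, svc in config.get("services", {}).items()
        # No `logging` key means the daemon default (json-file unless daemon.json overrides it).
        if svc.get("logging", {}).get("driver") == "syslog"
    )
    return project, syslog_services
```

Run `check_compose_project(load_compose_config(<deploy dir>))`. It should return the expected project name and an empty list.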

### 3. Docker syslog driver (legacy, per-service)

The older pattern: each container ships logs via Docker's `syslog` driver to a
dedicated Alloy `loki.source.syslog` listener on a localhost port, labelled with
a static `job`.
```alloy
loki.source.syslog "kairos_logs" {
  listener {
    address       = "127.0.0.1:{{ kairos_syslog_port }}"
    protocol      = "tcp"
    syslog_format = "{{ syslog_format }}" // rfc3164
    labels        = {
      job         = "kairos",
      hostname    = "{{ inventory_hostname }}",
      environment = "{{ deployment_environment }}",
    }
  }
  forward_to = [loki.write.default.receiver]
}
```

Container side, in the service's `docker-compose.yml.j2`:

```yaml
# Goes under the service's entry in `services:`
logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:{{ kairos_syslog_port }}"
    syslog-format: "{{ syslog_format | default('rfc3164') }}"
```

Ports follow the `514XX` convention and live in the host's `host_vars`.

> ⚠️ **Ordering hazard.** The listener must exist before the container starts.
> If `docker compose up` runs while the Alloy listener is not bound, the
> container fails immediately with
> `failed to initialize logging driver: dial tcp 127.0.0.1:<port>: connect: connection refused`.
> Deploy and verify Alloy on the host *before* deploying a syslog-driver
> service. This hazard is the main reason new services should use the
> Docker-socket path.
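
One way to do the "verify" half of that step is to probe the listener port from the Docker host before running `docker compose up`. This Python sketch only checks that something accepts TCP connections on the port; it does not prove the listener is Alloy. `listener_is_up` is a hypothetical helper, and the port would be the service's `*_syslog_port` value from `host_vars`.

```python
import socket


def listener_is_up(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # A refused or timed-out connection raises an OSError subclass.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Gate the Compose deploy on `listener_is_up(<kairos_syslog_port>)` returning `True`. That is the same condition whose absence produces the `connection refused` error above.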

> **Note: labels differ between the two Docker paths.** The syslog listener
> sets `job="<service>"` (no `service`/`component`). The Docker-socket block
> sets `service="<project>"` + `component="<compose service>"` (no `job`). When
> migrating a service off syslog, update any dashboards or alert annotations
> that filter on `{job="…"}` to use `{service="…"}`.
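
## Journal relabeling (native services)

By default all journal entries share `job="systemd"`, so entries from different
units cannot be told apart by label. A `loki.relabel` component defines rules
that override labels based on the systemd unit.

The journal source does not forward entries through that component. Instead it
imports the rules via its `relabel_rules` argument, which is the only place the
`__journal_*` fields are visible, and still forwards directly to `loki.write`.
That is why the relabel component's own `forward_to` is empty.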
```alloy
loki.source.journal "systemd_logs" {
  forward_to    = [loki.write.default.receiver]
  relabel_rules = loki.relabel.journal_puck.rules
  labels        = {
    hostname    = "{{ inventory_hostname }}",
    environment = "{{ deployment_environment }}",
  }
}

loki.relabel "journal_puck" {
  forward_to = [] // rules are consumed via relabel_rules; no entries flow through here

  rule { // Pallas runtime → service/project schema
    source_labels = ["__journal_syslog_identifier"]
    regex         = "kottos"
    target_label  = "service"
    replacement   = "pallas"
  }

  rule { // default fallback
    source_labels = ["__journal__systemd_unit"]
    regex         = ".+"
    target_label  = "job"
    replacement   = "systemd"
  }
}
```
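
Rules run top-to-bottom, and every matching rule overwrites its `target_label`,
so when two rules write the same label the **later** match wins. Put a generic
fallback **before** any specific rule that writes the same label. In the example
above the two rules write different labels (`service` and `job`), so their order
does not matter.

Relabel regexes are fully anchored; escape dots in unit regexes
(`alloy\\.service`). The `__journal_*` fields are hidden metadata, used for
relabeling and not shipped to Loki.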
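
Replaying the two example rules in Python shows what actually reaches Loki for a few journal entries. This is a hand translation of those two rules only. `apply_example_rules` is a hypothetical name, and the sample identifiers and units are made up. As described above, it drops every `__`-prefixed label at the end.

```python
import re


def apply_example_rules(entry: dict) -> dict:
    """Apply the two rules from loki.relabel "journal_puck" to one entry."""
    labels = dict(entry)
    # Rule 1: syslog identifier exactly "kottos" (anchored) -> service="pallas".
    if re.fullmatch("kottos", labels.get("__journal_syslog_identifier", "")):
        labels["service"] = "pallas"
    # Rule 2: any non-empty systemd unit -> job="systemd".
    if re.fullmatch(".+", labels.get("__journal__systemd_unit", "")):
        labels["job"] = "systemd"
    # Hidden journal fields are used for relabeling only, never shipped.
    return {k: v for k, v in labels.items() if not k.startswith("__")}


pallas_entry = {
    "__journal_syslog_identifier": "kottos",
    "__journal__systemd_unit": "kottos.service",
    "hostname": "puck",
}
print(apply_example_rules(pallas_entry))
# {'hostname': 'puck', 'service': 'pallas', 'job': 'systemd'}
```

Because the regex is anchored, an identifier such as `kottos-worker` would not match `kottos`. Such an entry would keep only `job="systemd"`.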

## Metrics

On Docker hosts the per-host config also scrapes host and container metrics and
**remote-writes** them to Prometheus (Alloy is the push agent; Prometheus does
not scrape these hosts directly):

- `prometheus.exporter.unix`: node metrics (Incus-safe collectors only).
- `prometheus.exporter.process`: `namedprocess_namegroup_*` per command.
- `prometheus.exporter.cadvisor`: `container_*` metrics via the Docker socket.

These feed `prometheus.scrape` (`job_name` = the host, e.g. `puck`) →
`prometheus.relabel` (adds `instance=<hostname>`) →
`prometheus.remote_write` → `prospero.incus:9090`.

> Application `/metrics` endpoints (e.g. django-prometheus, the
> nginx-prometheus-exporter sidecar) are **not** scraped by Alloy. Prometheus on
> Prospero scrapes those directly; see
> [`pplg/prometheus.yml.j2`](../ansible/pplg/prometheus.yml.j2).

## Current inventory

### Hosts using Docker socket discovery

| Host | Block | Notes |
|------|-------|-------|
| `puck` | `discovery.docker` + `loki.source.docker "containers"` | Reference implementation. Covers all Compose projects (athena, mnemosyne, daedalus, kairos, …) as `service`/`component`. |

### Hosts using per-service syslog listeners

| Host | Services (job labels) |
|------|-----------------------|
| `puck` | angelia, kairos, spelunker, jupyterlab *(transitional; see below)* |
| `miranda` | argos, neo4j-cypher, grafana_mcp, gitea-mcp, searxng |
| `oberon` | rabbitmq, smtp4dev |
| `rosalind` | gitea, hass, lobechat, jellyfin, searxng (+ apache log files) |
| `titania` | casdoor, haproxy |
| `ariel`, `umbriel` | neo4j |

### Transitional state on puck

`athena`, `mnemosyne`, and `daedalus` have **migrated off** their syslog
listeners to the Docker-socket block. Their old `*_syslog_port` host_vars are
kept as reserved-but-unused and can be removed once each rollout is verified.
The remaining `puck` syslog listeners (angelia, kairos, spelunker, jupyterlab)
are candidates to migrate the same way.

## Querying in Grafana

```logql
# All Athena container logs (any component)
{service="athena"}

# Just the Athena MCP container
{service="athena", component="mcp"}

# Superuser-login forensic line behind the DjangoSuperuserLogin alert
{service="athena"} |= "event=superuser_login"

# A syslog-driver service (legacy label scheme)
{job="kairos"}

# Errors across everything on one host
{hostname="puck.incus"} |~ "(?i)error"
```
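
The same selectors can be run outside Grafana, for example during the verification steps below. This minimal Python sketch uses Loki's HTTP API: `GET /loki/api/v1/query_range`, with nanosecond Unix timestamps for `start` and `end`. It assumes Loki on Prospero is reachable without authentication from wherever the script runs. `query_range_url` and `fetch_lines` are hypothetical helpers.

```python
import json
import time
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

LOKI = "http://prospero.incus:3100"
NS_PER_MINUTE = 60 * 1_000_000_000


def query_range_url(logql: str, minutes: int = 15, limit: int = 100,
                    base: str = LOKI, now_ns: Optional[int] = None) -> str:
    """Build a query_range URL covering the last `minutes` minutes."""
    end = time.time_ns() if now_ns is None else now_ns
    start = end - minutes * NS_PER_MINUTE
    params = {"query": logql, "start": start, "end": end, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


def fetch_lines(logql: str, **kwargs) -> list:
    """Run the query and return the raw log lines from every matching stream."""
    with urlopen(query_range_url(logql, **kwargs), timeout=10) as resp:
        body = json.load(resp)
    # Stream results look like {"stream": {...labels}, "values": [[ts, line], ...]}.
    return [line for stream in body["data"]["result"] for _ts, line in stream["values"]]
```

A non-empty result from `fetch_lines('{service="<compose-project>"}')` is the scripted equivalent of the Grafana check.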

## Adding a new Dockerised service

**Preferred (Docker socket, no Alloy change needed):**

1. Ensure the service's Compose project uses the default `json-file` log driver
   (do **not** set `logging: { driver: syslog }`) and the app logs to stdout.
2. Confirm the host's per-host Alloy config has the `discovery.docker` +
   `loki.source.docker` blocks (currently `puck`). If not, add them once
   (copy from [`puck/config.alloy.j2`](../ansible/alloy/puck/config.alloy.j2)).
3. Deploy the service. Verify in Grafana that `{service="<compose-project>"}`
   returns entries, with `component=<compose-service>`.

**Legacy (syslog driver, only if the host has no Docker-socket block):**

1. Allocate a `514XX` syslog port in the host's `host_vars`.
2. Add a `loki.source.syslog` block to `ansible/alloy/<host>/config.alloy.j2`.
3. Add the `syslog` logging driver to the service's `docker-compose.yml.j2`.
4. **Deploy Alloy first**, then the service.
5. Verify: `{job="<label>", hostname="<host>"}` returns entries.
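
Step 1 is easy to get wrong on hosts that already carry several listeners. The sketch below picks the next free port under three assumptions:

- the host's `host_vars` have been loaded into a dict (e.g. from YAML);
- every listener port is a `*_syslog_port` variable, as with `kairos_syslog_port`;
- `514XX` means 51400 to 51499.

`next_syslog_port` is a hypothetical helper. Reserved-but-unused ports stay in `host_vars`, so they are still treated as taken.

```python
def next_syslog_port(host_vars: dict, low: int = 51400, high: int = 51499) -> int:
    """Return the lowest port in the 514XX block not claimed by a *_syslog_port var."""
    taken = {
        int(value)
        for key, value in host_vars.items()
        if key.endswith("_syslog_port")
    }
    for port in range(low, high + 1):
        if port not in taken:
            return port
    raise RuntimeError(f"no free port left in {low}-{high}")
```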

# Red Panda Seal of Approval 🐼

@@ -54,8 +54,8 @@ Autonomous computer agent learning through environmental interaction.
- Docker engine
- Agent S MCP Server (MATE desktop, AT-SPI automation)
- Kernos MCP Shell Server (port 22062)
- Rommie MCP Server (port 22061) — agent-to-agent GUI automation via Agent S
- FreeCAD Robust MCP Server (port 22063) — CAD automation via FreeCAD XML-RPC
- Rommie MCP Server (port 20361) — agent-to-agent GUI automation via Agent S
- FreeCAD Robust MCP Server (port 22061) — CAD automation via FreeCAD XML-RPC
- GPU passthrough
- RDP access (port 25521)