chore(ansible): update vault credentials

2026-05-26 21:45:17 -04:00
parent f4a25316de
commit a01feee663
7 changed files with 3213 additions and 480 deletions

2419
docs/brave_search_api.md Normal file

File diff suppressed because it is too large.

284
docs/searxng.md Normal file

@@ -0,0 +1,284 @@
# SearXNG
## Overview
SearXNG is a privacy-respecting metasearch engine that aggregates results from
multiple upstream search providers and re-ranks them. The Ouranos deployment runs
as a single Docker container behind an authenticating OAuth2-Proxy sidecar (see
[`searxng-auth.md`](./searxng-auth.md) for the auth design).

- **Host:** `rosalind.incus`
- **Container port:** 22089 (host) → 8080 (container)
- **Public URL:** `https://searxng.ouranos.helu.ca/` (via HAProxy → OAuth2-Proxy → SearXNG)
- **Internal URL:** `http://rosalind.incus:22089/` (used by LobeChat, Argos, etc.)
## Ansible Deployment
### Layout
```
ansible/searxng/
├── deploy.yml # Main deployment playbook
├── deploy_oauth2.yml # OAuth2-Proxy sidecar playbook
├── docker-compose.yml.j2 # Docker Compose template
├── searxng-settings.yml.j2 # SearXNG settings.yml template
├── oauth2-proxy-searxng.cfg.j2 # OAuth2-Proxy config (see searxng-auth.md)
└── oauth2-proxy-searxng.service.j2 # Systemd unit for the sidecar
```
### Run
```bash
cd ansible
ansible-playbook searxng/deploy.yml --limit rosalind.incus
ansible-playbook searxng/deploy_oauth2.yml --limit rosalind.incus
```
`deploy.yml`:
1. Skips hosts that don't list `searxng` in their `services` list.
2. Creates the `searxng` system user and `/srv/searxng` directory.
3. Templates `docker-compose.yml` and `searxng-settings.yml` into `/srv/searxng/`.
4. Brings up the container with `community.docker.docker_compose_v2` (`pull: always`).

The container mounts `searxng-settings.yml` read-only at
`/etc/searxng/settings.yml`. There is no persistent volume: the cache lives in
the container's `/tmp` and is rebuilt on restart.
### Variables
#### Host Variables (`inventory/host_vars/rosalind.incus.yml`)
| Variable | Value | Purpose |
|--------------------------|----------------------------------|----------------------------------|
| `searxng_port` | `22089` | Host-side container port |
| `searxng_base_url` | `http://rosalind.incus:22089/` | Used by SearXNG to build URLs |
| `searxng_instance_name` | `Ouranos Search` | Shown in the UI header |
| `searxng_directory` | `/srv/searxng` | Compose project dir on the host |
| `searxng_user`/`group` | `searxng` | Owns templated config files |
| `searxng_syslog_port` | `51403` | Alloy syslog receiver port |
#### Vault Variables (`group_vars/all/vault.yml`)
| Variable | Purpose |
|--------------------------------|------------------------------------------------------------|
| `vault_searxng_secret_key` | `server.secret_key` — also used as cache DB password |
| `vault_searxng_brave_api_key` | Brave Search API subscription token (see below) |
| `vault_searxng_oauth_*` | OAuth2-Proxy sidecar — see `searxng-auth.md` |
> ⚠️ **Changing `vault_searxng_secret_key` truncates the cache.** SearXNG hashes
> cache keys with the secret key; on mismatch it drops every cache table on next
> startup. Harmless, but be aware that engines like `wikidata` and
> `radio_browser` will need to re-fetch their on-disk indexes.
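
To make the failure mode concrete, here is a simplified Python model of that startup check. It is not SearXNG's code. The `properties`/`engine_cache` table names, the `DB_PASSWORD` key, and the use of SHA-256 are assumptions chosen for illustration. The sketch only shows the shape of the logic: store a hash of the secret key, compare it the next time the cache is opened, and truncate on mismatch.

```python
import hashlib
import sqlite3

# Simplified model only: table names, the "DB_PASSWORD" key and SHA-256
# are illustrative assumptions, not SearXNG's actual cache schema.


def password_hash(secret_key: str) -> str:
    """Derive the value stored alongside the cache from the secret key."""
    return hashlib.sha256(secret_key.encode()).hexdigest()


def open_cache(conn: sqlite3.Connection, secret_key: str) -> bool:
    """Initialise the cache; return True if existing entries were truncated."""
    conn.execute("CREATE TABLE IF NOT EXISTS properties (name TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS engine_cache (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute("SELECT value FROM properties WHERE name = 'DB_PASSWORD'").fetchone()
    current = password_hash(secret_key)
    truncated = False
    if row is not None and row[0] != current:
        # Secret key changed: entries written under the old key are dropped.
        conn.execute("DELETE FROM engine_cache")
        truncated = True
    # Record the hash for the key now in use.
    conn.execute(
        "INSERT OR REPLACE INTO properties (name, value) VALUES ('DB_PASSWORD', ?)",
        (current,),
    )
    conn.commit()
    return truncated
```

Opening the cache twice with the same key keeps its entries. Opening it with a different key empties `engine_cache`, which matches the "harmless but cold cache" behaviour described in the warning.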
## Search Engine Configuration
The engine list is templated in `searxng-settings.yml.j2` and merges with the
upstream defaults via `use_default_settings: true`. The merge is keyed by engine
`name` and is shallow: **only fields you explicitly set override the
defaults**; everything else (including hidden ones like `inactive`) is inherited.
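
The following is a minimal Python model of that merge. It assumes engine entries are plain dicts keyed by `name`. The `DEFAULTS` list is an illustrative stand-in, not the real upstream `settings.yml`. The model shows why an override that sets only `disabled: false` still leaves `inactive: true` in place.

```python
from copy import deepcopy

# Illustrative stand-in for upstream defaults (not the real settings.yml).
DEFAULTS = [
    {"name": "braveapi", "engine": "braveapi", "api_key": "", "inactive": True},
    {"name": "duckduckgo", "engine": "duckduckgo"},
]


def merge_engines(defaults: list, overrides: list) -> list:
    """Merge engine entries by `name`: set fields win, the rest is inherited.

    Hypothetical model of the behaviour described above, not SearXNG's loader.
    """
    merged = {e["name"]: deepcopy(e) for e in defaults}  # leave defaults untouched
    for override in overrides:
        base = merged.setdefault(override["name"], {})
        base.update(override)  # only keys present in the override change
    return list(merged.values())
```

For example, merging `[{"name": "braveapi", "disabled": False}]` yields a `braveapi` entry that still carries `inactive: True`. The engine is then silently filtered out.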
### Enabled engines
| Engine | Notes |
|--------------|----------------------------------------------------|
| `duckduckgo` | General web |
| `startpage` | General web |
| `mojeek` | General web |
| `braveapi` | Brave Search via official REST API (see below) |
### Disabled engines
| Engine | Reason |
|--------------------------------|------------------------------------------------------------|
| `google` | Aggressive bot detection / unstable scraping results |
| `bing news` | Frequent parsing errors |
| `brave` (HTML scraper) | Replaced by `braveapi` — keeping both duplicates results |
| `brave.images` / `.videos` / `.news` | Scraping endpoints return 451 / access-denied |
| `duckduckgo images` | Suspended / access-denied responses |
| `pexels`, `vimeo` | Same — suspended / access-denied |
> **Why disable Google and Bing News?** Google's HTML scraper is
> blocked aggressively and produces low-quality, inconsistent results. Bing's
> news scraper hits parser failures often enough to be more noise than signal.
> The remaining four engines (Brave API, DuckDuckGo, Startpage, Mojeek) cover
> general web search with stable results and no API rate-limit surprises.
### Brave Search API (`braveapi`)
`braveapi` is the official REST API engine — distinct from the `brave` engine,
which scrapes the public Brave Search HTML. The API engine is more reliable, has
proper rate limiting, and supports paging and time-range filters.
#### Configuration
```yaml
- name: braveapi
engine: braveapi
api_key: "{{ searxng_brave_api_key }}"
results_per_page: 20
inactive: false
disabled: false
```
#### `inactive: false` is required
The upstream SearXNG `settings.yml` ships `braveapi` with `inactive: true` and
an empty API key. Because `use_default_settings` does a shallow merge, an
override that only sets `disabled: false` leaves the inherited `inactive: true`
in place — and `inactive` engines are filtered out before `load_engine()` runs.
The result is a silent disable: no error appears in the logs, and the engine
never shows up in `/config`.
`disabled` and `inactive` are different gates:
- **`disabled`**: engine still loads; users can toggle it on/off via Preferences.
- **`inactive`**: engine is filtered out before loading; the UI never sees it.

You need both `inactive: false` and `disabled: false` (or omit `disabled` and
let the default `false` apply).
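
The two gates can be sketched as follows. This is a hypothetical loader, not SearXNG's `load_engine()`. It only models the ordering described above: `inactive` entries are dropped before loading, while `disabled` entries load but start switched off.

```python
def load_engines(engine_settings: list) -> dict:
    """Hypothetical model of the inactive/disabled gates."""
    loaded = {}
    for cfg in engine_settings:
        if cfg.get("inactive", False):
            # Gate 1: never loaded, so it never reaches /config or the UI.
            continue
        loaded[cfg["name"]] = {
            "name": cfg["name"],
            # Gate 2: disabled engines are loaded but off until a user enables them.
            "enabled_by_default": not cfg.get("disabled", False),
        }
    return loaded
```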
#### Endpoint and result handling
The engine implementation (`searx/engines/braveapi.py`) hits a single endpoint:
```
https://api.search.brave.com/res/v1/web/search
```
with the `X-Subscription-Token` header. Although the Brave API can return
multiple result sections (`web`, `news`, `videos`, `discussions`, `infobox`,
`locations`, etc.), the SearXNG engine **only consumes `data["web"]["results"]`**.
Other sections in the response are silently discarded.
This means `braveapi` cannot be split into `braveapi.images` / `braveapi.news`
/ `braveapi.videos` engines the way the HTML-scraper `brave` engine is. To
surface those result types from Brave you'd need to patch the upstream engine
module. For now, Brave contributes general web results only. The `brave.*`
scrapers stay disabled, so other category-specific engines fill that role.
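
The following standard-library Python sketch illustrates the same request and the same narrowing to `web.results`. The endpoint and `X-Subscription-Token` header come from the description above. The `Accept` header, the `description` → `content` field mapping, and the helper names are assumptions, not the engine's actual code.

```python
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Brave Search API web-search request."""
    url = BRAVE_ENDPOINT + "?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "X-Subscription-Token": api_key},
    )


def web_results(payload: dict) -> list:
    """Keep only payload["web"]["results"]; every other section is discarded."""
    web = payload.get("web") or {}
    return [
        {
            "url": r.get("url"),
            "title": r.get("title"),
            "content": r.get("description", ""),  # assumed field mapping
        }
        for r in web.get("results", [])
    ]
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=10)` and hand `json.load(resp)` to `web_results()`. Any `news`, `videos`, or `infobox` sections in the payload are ignored, which is why `braveapi` cannot back category-specific engines.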
#### Categories
`braveapi` declares `categories = ["general", "web"]` at module level. You don't
need to override this in the YAML.
### Verifying the engine is live
After `ansible-playbook searxng/deploy.yml` and a container restart:
```bash
# 1. Engine is loaded and registered
curl -s 'http://rosalind.incus:22089/config' \
| jq '.engines[] | select(.name=="braveapi")'
# 2. Direct query — bypasses any UI/category filtering
#    (format=json requires `json` in search.formats; otherwise SearXNG returns 403)
curl -s 'http://rosalind.incus:22089/search?q=python&format=json&engines=braveapi' \
  | jq '(.results | length), .unresponsive_engines'
# 3. Container logs — look for braveapi-specific errors
docker logs searxng 2>&1 | grep -i braveapi
```
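
Steps 1 and 2 can also be scripted. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes `unresponsive_engines` entries are `[name, reason]` pairs, as in the JSON output checked above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SEARXNG = "http://rosalind.incus:22089"  # internal URL from the Overview


def fetch_json(path: str, **params) -> dict:
    """GET a SearXNG endpoint and decode its JSON body."""
    url = SEARXNG + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.load(resp)


def engine_registered(config: dict, name: str = "braveapi") -> bool:
    """Step 1: True if `name` appears in the /config engine list."""
    return any(e.get("name") == name for e in config.get("engines", []))


def engine_failure(search: dict, name: str = "braveapi") -> Optional[str]:
    """Step 2: the failure reason if `name` is unresponsive, else None."""
    for entry in search.get("unresponsive_engines", []):
        # Assumed shape: [engine_name, reason]
        if entry and entry[0] == name:
            return entry[1] if len(entry) > 1 else "unknown"
    return None
```

For example, `engine_registered(fetch_json("/config"))` covers step 1. `engine_failure(fetch_json("/search", q="python", format="json", engines="braveapi"))` covers step 2 and returns `None` when the engine answered.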
## Authentication
SearXNG itself does not authenticate users. All public access goes through an
OAuth2-Proxy sidecar that talks to Casdoor for OIDC. Internal callers
(LobeChat, Argos, etc.) hit `http://rosalind.incus:22089/` directly and bypass
auth.
See [`searxng-auth.md`](./searxng-auth.md) for the full design and Casdoor
application setup.
## Monitoring
### Logs
The container is configured to ship its stdout/stderr to Alloy's syslog
receiver:
```yaml
logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:51403"
    syslog-format: "{{syslog_format}}"
    tag: "searxng"
```
Alloy on `rosalind.incus` forwards these to Loki. Query in Grafana with:
```
{job="searxng", host="rosalind.incus"}
```
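
Outside Grafana, the same selector can be sent to Loki's HTTP API. The sketch below builds a `query_range` URL. `LOKI_URL` is a placeholder, since this document does not say where Loki runs.

```python
import urllib.parse

LOKI_URL = "http://loki.example:3100"  # hypothetical; substitute your Loki endpoint
SELECTOR = '{job="searxng", host="rosalind.incus"}'


def loki_query_url(logql: str, limit: int = 100) -> str:
    """Build a Loki query_range URL for a LogQL expression."""
    params = urllib.parse.urlencode({"query": logql, "limit": limit})
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"
```

Appending a line filter, as in `loki_query_url(SELECTOR + ' |= "braveapi"')`, narrows the results to Brave-related lines. Without `start`/`end` parameters, Loki defaults to roughly the last hour.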
### Health check
```bash
curl -fsS http://rosalind.incus:22089/healthz
```
## Operations
### Restart
```bash
ssh rosalind.incus
cd /srv/searxng
docker compose restart
```
### Force pull a newer image
```bash
ssh rosalind.incus
cd /srv/searxng
docker compose pull
docker compose up -d
```
Or just re-run the playbook — `pull: always` is set on the deploy task.
### Inspect rendered settings inside the container
```bash
ssh rosalind.incus
docker exec searxng cat /etc/searxng/settings.yml | grep -A6 -B1 braveapi
```
## Troubleshooting
### "Brave doesn't work"
1. Confirm the engine is registered: `/config` JSON should include a `braveapi`
entry. If absent, `inactive: false` is missing or the template didn't deploy.
2. Confirm the API key is non-empty inside the container — see "Inspect rendered
settings" above.
3. Hit the engine directly with `&engines=braveapi`. If `unresponsive_engines`
contains it with a reason, that's your real error (auth, rate limit, network).
### `radio_browser` / `wikidata` init errors at startup
These are unrelated to your engine config:
- **`radio_browser`** — known cache init-order bug in recent
`searxng/searxng:latest` images. The SQLite `properties` table isn't created
before `radio_browser.init()` calls `CACHE.get(...)`. The engine simply stays
unregistered; other engines work normally. Pinning to an older image tag
works around it.
- **`wikidata`** — transient: `query.wikidata.org` returned a truncated SPARQL
response during the startup language-fetch. Restart the container; if it
persists, Wikidata is rate-limiting the source IP.
### Cache appears stale after rotating `vault_searxng_secret_key`
Expected. The secret key is hashed and used as the cache password; on mismatch
SearXNG truncates every cache table at startup. No data loss — search still
works, the engines just rebuild their indexes lazily.
## References
- Upstream docs: <https://docs.searxng.org/>
- Brave Search API engine: <https://docs.searxng.org/dev/engines/online/brave.html>
- Brave Search API reference: [`brave_search_api.md`](./brave_search_api.md)
- SearXNG authentication design: [`searxng-auth.md`](./searxng-auth.md)
- [Ansible Practices](./ansible.md)