# SearXNG

## Overview

SearXNG is a privacy-respecting metasearch engine that aggregates results from multiple upstream search providers and re-ranks them. The Ouranos deployment runs as a single Docker container behind an authenticating OAuth2-Proxy sidecar (see [`searxng-auth.md`](./searxng-auth.md) for the auth design).

- **Host:** `rosalind.incus`
- **Container port:** 22089 (host) → 8080 (container)
- **Public URL:** `https://searxng.ouranos.helu.ca/` (via HAProxy → OAuth2-Proxy → SearXNG)
- **Internal URL:** `http://rosalind.incus:22089/` (used by LobeChat, Argos, etc.)

## Ansible Deployment

### Layout

```
ansible/searxng/
├── deploy.yml                        # Main deployment playbook
├── deploy_oauth2.yml                 # OAuth2-Proxy sidecar playbook
├── docker-compose.yml.j2             # Docker Compose template
├── searxng-settings.yml.j2           # SearXNG settings.yml template
├── oauth2-proxy-searxng.cfg.j2       # OAuth2-Proxy config (see searxng-auth.md)
└── oauth2-proxy-searxng.service.j2   # Systemd unit for the sidecar
```

### Run

```bash
cd ansible
ansible-playbook searxng/deploy.yml --limit rosalind.incus
ansible-playbook searxng/deploy_oauth2.yml --limit rosalind.incus
```

`deploy.yml` does the following:

1. Skips hosts that don't list `searxng` in their `services` list.
2. Creates the `searxng` system user and the `/srv/searxng` directory.
3. Templates `docker-compose.yml` and `searxng-settings.yml` into `/srv/searxng/`.
4. Brings up the container with `community.docker.docker_compose_v2` (`pull: always`).

The container mounts `searxng-settings.yml` read-only at `/etc/searxng/settings.yml`. There is no persistent volume — the cache lives in the container's `/tmp` and is rebuilt on restart.
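The following is a minimal post-deploy smoke check, sketched in Python. It assumes you want to block until SearXNG answers on its `/healthz` endpoint, for example in CI right after the playbook runs. `wait_for_healthz` is a hypothetical helper and is not part of the Ansible role. It uses only the standard library.

```python
"""Post-deploy smoke check: wait until SearXNG answers on /healthz.

Hypothetical helper (not part of the Ansible role); standard library only.
"""
import time
import urllib.request


def wait_for_healthz(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll <base_url>/healthz; True once it returns HTTP 200, False on timeout."""
    url = base_url.rstrip("/") + "/healthz"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeouts and HTTP error statuses
            # (urllib's HTTPError is an OSError subclass) all mean
            # "not healthy yet", so keep polling until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: `wait_for_healthz("http://rosalind.incus:22089/")` returns `True` once the container responds, or `False` after the timeout. Polling against a deadline tolerates the few seconds the container may need after `docker compose up` before it accepts requests.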
### Variables

#### Host Variables (`inventory/host_vars/rosalind.incus.yml`)

| Variable                | Value                          | Purpose                         |
|-------------------------|--------------------------------|---------------------------------|
| `searxng_port`          | `22089`                        | Host-side container port        |
| `searxng_base_url`      | `http://rosalind.incus:22089/` | Used by SearXNG to build URLs   |
| `searxng_instance_name` | `Ouranos Search`               | Shown in the UI header          |
| `searxng_directory`     | `/srv/searxng`                 | Compose project dir on the host |
| `searxng_user`/`group`  | `searxng`                      | Owns templated config files     |
| `searxng_syslog_port`   | `51403`                        | Alloy syslog receiver port      |

#### Vault Variables (`group_vars/all/vault.yml`)

| Variable                      | Purpose                                                  |
|-------------------------------|----------------------------------------------------------|
| `vault_searxng_secret_key`    | `server.secret_key` — also used as the cache DB password |
| `vault_searxng_brave_api_key` | Brave Search API subscription token (see below)          |
| `vault_searxng_oauth_*`       | OAuth2-Proxy sidecar — see `searxng-auth.md`             |

> ⚠️ **Changing `vault_searxng_secret_key` truncates the cache.** SearXNG uses the
> secret key as the cache password and hashes cache keys with it. If the key no
> longer matches on the next startup, SearXNG drops every cache table. This is
> harmless, but engines like `wikidata` and `radio_browser` will need to
> re-fetch their on-disk indexes.

## Search Engine Configuration

The engine list is templated in `searxng-settings.yml.j2` and merged with the upstream defaults via `use_default_settings: true`. The merge is keyed by engine `name` and is shallow: **only the fields you explicitly set override the defaults**; everything else (including hidden fields like `inactive`) is inherited.
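The following simplified model of the name-keyed merge assumes the semantics described above. It is not SearXNG's actual settings loader, and `merge_engines` is a hypothetical name. It uses the `braveapi` entry to show how an inherited `inactive: true` survives an override that never mentions it.

```python
"""Simplified model of the use_default_settings engine merge.

Assumption: mirrors the behaviour described in this document (entries keyed
by `name`, field-level override); not SearXNG's real settings loader.
"""
from __future__ import annotations

from copy import deepcopy


def merge_engines(defaults: list[dict], overrides: list[dict]) -> list[dict]:
    """Overlay `overrides` onto `defaults`, matching entries by engine name."""
    merged = deepcopy(defaults)  # never mutate the caller's defaults
    by_name = {engine["name"]: engine for engine in merged}
    for override in overrides:
        base = by_name.get(override["name"])
        if base is None:
            # Unknown name: the override becomes a brand-new engine entry.
            new_engine = deepcopy(override)
            merged.append(new_engine)
            by_name[new_engine["name"]] = new_engine
        else:
            # Known name: only keys present in the override are replaced;
            # every other key (e.g. `inactive`) is inherited from the default.
            base.update(deepcopy(override))
    return merged


# Upstream-style default: braveapi ships inactive, with an empty API key.
defaults = [{"name": "braveapi", "engine": "braveapi", "api_key": "", "inactive": True}]
overrides = [{"name": "braveapi", "api_key": "<redacted>", "disabled": False}]
merged = merge_engines(defaults, overrides)
print(merged[0])
# {'name': 'braveapi', 'engine': 'braveapi', 'api_key': '<redacted>', 'inactive': True, 'disabled': False}
```

The override sets `api_key` and `disabled` but not `inactive`, so the merged entry still carries `inactive: True`.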
### Enabled engines

| Engine       | Notes                                          |
|--------------|------------------------------------------------|
| `duckduckgo` | General web                                    |
| `startpage`  | General web                                    |
| `mojeek`     | General web                                    |
| `braveapi`   | Brave Search via official REST API (see below) |

### Disabled engines

| Engine                               | Reason                                                   |
|--------------------------------------|----------------------------------------------------------|
| `google`                             | Aggressive bot detection / unstable scraping results     |
| `bing news`                          | Frequent parsing errors                                  |
| `brave` (HTML scraper)               | Replaced by `braveapi` — keeping both duplicates results |
| `brave.images` / `.videos` / `.news` | Scraping endpoints return 451 / access-denied            |
| `duckduckgo images`                  | Suspended / access-denied responses                      |
| `pexels`, `vimeo`                    | Same — suspended / access-denied                         |

> ℹ️ **Why disable Google and Bing News?** Google's HTML scraper is blocked
> aggressively and produces low-quality, inconsistent results. The Bing News
> scraper hits parser failures often enough to be more noise than signal.
> The remaining four engines (Brave API, DuckDuckGo, Startpage, Mojeek) cover
> general web search with stable results and no API rate-limit surprises.

### Brave Search API (`braveapi`)

`braveapi` is the engine for Brave's official REST API — distinct from the `brave` engine, which scrapes the public Brave Search HTML. The API engine is more reliable, has proper rate limiting, and supports paging and time-range filters.

#### Configuration

```yaml
- name: braveapi
  engine: braveapi
  api_key: "{{ searxng_brave_api_key }}"
  results_per_page: 20
  inactive: false
  disabled: false
```

#### `inactive: false` is required

The upstream SearXNG `settings.yml` ships `braveapi` with `inactive: true` and an empty API key. Because `use_default_settings` does a shallow merge, an override that only sets `disabled: false` leaves the inherited `inactive: true` in place — and `inactive` engines are filtered out before `load_engine()` runs.
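`inactive` and `disabled` are two separate checks. The following sketch models them as a hypothetical `engine_visibility` helper. It reflects only the behaviour described in this section and is not SearXNG's loader code. It shows why an override that sets only `disabled: false` still leaves the engine unloaded.

```python
"""Sketch of the two gates a merged engine entry passes through.

Hypothetical helper: models the behaviour described in this document,
not SearXNG's real engine-loading code.
"""


def engine_visibility(entry: dict) -> str:
    """Classify a merged engine entry by which gate, if any, hides it."""
    if entry.get("inactive", False):
        # Filtered out before load_engine(): never loaded, absent from
        # /config, and nothing is logged.
        return "not loaded"
    if entry.get("disabled", False):
        # Loaded and registered, but off until a user enables it in Preferences.
        return "loaded, off by default"
    return "loaded, on by default"


# Only `disabled: false` was overridden, so the inherited `inactive: true` wins.
print(engine_visibility({"name": "braveapi", "inactive": True, "disabled": False}))
# → not loaded

# Both gates opened explicitly, as in the template above.
print(engine_visibility({"name": "braveapi", "inactive": False, "disabled": False}))
# → loaded, on by default
```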
The result is a silent disable: no error appears in the logs, and the engine never shows up in `/config`.

`disabled` and `inactive` are different gates:

- **`disabled`** — engine still loads; user can toggle it on/off via Preferences.
- **`inactive`** — engine is filtered out before loading; the UI never sees it.

You need `inactive: false`, plus `disabled: false` unless the inherited value is already `false` (the default when unset).

#### Endpoint and result handling

The engine implementation (`searx/engines/braveapi.py`) hits a single endpoint:

```
https://api.search.brave.com/res/v1/web/search
```

with the `X-Subscription-Token` header.

Although the Brave API can return multiple result sections (`web`, `news`, `videos`, `discussions`, `infobox`, `locations`, etc.), the SearXNG engine **only consumes `data["web"]["results"]`**. Other sections in the response are silently discarded.

This means `braveapi` cannot be split into `braveapi.images` / `braveapi.news` / `braveapi.videos` engines the way the HTML-scraper `brave` engine is. Surfacing those result types from Brave would require patching the upstream engine module. For now, image, news, and video results come from other category-specific engines, and the `brave.*` scrapers stay disabled.

#### Categories

`braveapi` declares `categories = ["general", "web"]` at module level, so you don't need to override it in the YAML.

### Verifying the engine is live

After running `ansible-playbook searxng/deploy.yml` and restarting the container:

```bash
# 1. Engine is loaded and registered
curl -s 'http://rosalind.incus:22089/config' \
  | jq '.engines[] | select(.name=="braveapi")'

# 2. Direct query — bypasses any UI/category filtering.
#    Parentheses are required: `|` binds looser than `,` in jq.
curl -s 'http://rosalind.incus:22089/search?q=python&format=json&engines=braveapi' \
  | jq '(.results | length), .unresponsive_engines'

# 3. Container logs — look for braveapi-specific errors
docker logs searxng 2>&1 | grep -i braveapi
```

## Authentication

SearXNG itself does not authenticate users.
All public access goes through an OAuth2-Proxy sidecar that talks to Casdoor for OIDC. Internal callers (LobeChat, Argos, etc.) hit `http://rosalind.incus:22089/` directly and bypass auth. See [`searxng-auth.md`](./searxng-auth.md) for the full design and Casdoor application setup.

## Monitoring

### Logs

The container is configured to ship its stdout/stderr to Alloy's syslog receiver:

```yaml
logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:51403"
    syslog-format: "{{syslog_format}}"
    tag: "searxng"
```

Alloy on `rosalind.incus` forwards these to Loki. Query in Grafana with:

```
{job="searxng", host="rosalind.incus"}
```

### Health check

```bash
curl -fsS http://rosalind.incus:22089/healthz
```

## Operations

### Restart

```bash
ssh rosalind.incus
cd /srv/searxng
docker compose restart
```

### Force pull a newer image

```bash
ssh rosalind.incus
cd /srv/searxng
docker compose pull
docker compose up -d
```

Or just re-run the playbook — `pull: always` is set on the deploy task.

### Inspect rendered settings inside the container

```bash
ssh rosalind.incus
docker exec searxng cat /etc/searxng/settings.yml | grep -A6 -B1 braveapi
```

## Troubleshooting

### "Brave doesn't work"

1. Confirm the engine is registered: the `/config` JSON should include a `braveapi` entry. If it is absent, `inactive: false` is missing or the template didn't deploy.
2. Confirm the API key is non-empty inside the container (see "Inspect rendered settings" above).
3. Hit the engine directly with `&engines=braveapi`. If `unresponsive_engines` contains it with a reason, that's your real error (auth, rate limit, network).

### `radio_browser` / `wikidata` init errors at startup

These are unrelated to your engine config:

- **`radio_browser`** — known cache init-order bug in recent `searxng/searxng:latest` images. The SQLite `properties` table isn't created before `radio_browser.init()` calls `CACHE.get(...)`. The engine simply stays unregistered; other engines work normally.
  Pinning to an older image tag works around it.
- **`wikidata`** — transient: `query.wikidata.org` returned a truncated SPARQL response during the startup language fetch. Restart the container; if the error persists, Wikidata is likely rate-limiting the source IP.

### Cache appears stale after rotating `vault_searxng_secret_key`

Expected. The secret key is hashed and used as the cache password; on mismatch, SearXNG truncates every cache table at startup. No data is lost: search still works, and the engines rebuild their indexes lazily.

## References

- Upstream docs:
- Brave Search API engine:
- Brave Search API reference: [`brave_search_api.md`](./brave_search_api.md)
- SearXNG authentication design: [`searxng-auth.md`](./searxng-auth.md)
- [Ansible Practices](./ansible.md)