SearXNG

Overview

SearXNG is a privacy-respecting metasearch engine that aggregates results from multiple upstream search providers and re-ranks them. The Ouranos deployment runs as a single Docker container behind an authenticating OAuth2-Proxy sidecar (see searxng-auth.md for the auth design).

Host: rosalind.incus
Container port: 22089 (host) → 8080 (container)
Public URL: https://searxng.ouranos.helu.ca/ (via HAProxy → OAuth2-Proxy → SearXNG)
Internal URL: http://rosalind.incus:22089/ (used by LobeChat, Argos, etc.)

Ansible Deployment

Layout

ansible/searxng/
├── deploy.yml                      # Main deployment playbook
├── deploy_oauth2.yml               # OAuth2-Proxy sidecar playbook
├── docker-compose.yml.j2           # Docker Compose template
├── searxng-settings.yml.j2         # SearXNG settings.yml template
├── oauth2-proxy-searxng.cfg.j2     # OAuth2-Proxy config (see searxng-auth.md)
└── oauth2-proxy-searxng.service.j2 # Systemd unit for the sidecar

Run

cd ansible
ansible-playbook searxng/deploy.yml --limit rosalind.incus
ansible-playbook searxng/deploy_oauth2.yml --limit rosalind.incus

deploy.yml:

  1. Skips hosts that don't list searxng in their services list.
  2. Creates the searxng system user and /srv/searxng directory.
  3. Templates docker-compose.yml and searxng-settings.yml into /srv/searxng/.
  4. Brings up the container with community.docker.docker_compose_v2 (pull: always).

The container mounts searxng-settings.yml read-only at /etc/searxng/settings.yml. There is no persistent volume — the cache lives in the container's /tmp and is rebuilt on restart.

Variables

Host Variables (inventory/host_vars/rosalind.incus.yml)

Variable               Value                          Purpose
searxng_port           22089                          Host-side container port
searxng_base_url       http://rosalind.incus:22089/   Used by SearXNG to build URLs
searxng_instance_name  Ouranos Search                 Shown in the UI header
searxng_directory      /srv/searxng                   Compose project dir on the host
searxng_user/group     searxng                        Owns templated config files
searxng_syslog_port    51403                          Alloy syslog receiver port

Vault Variables (group_vars/all/vault.yml)

Variable                     Purpose
vault_searxng_secret_key     server.secret_key; also used as the cache DB password
vault_searxng_brave_api_key  Brave Search API subscription token (see below)
vault_searxng_oauth_*        OAuth2-Proxy sidecar; see searxng-auth.md

⚠️ Changing vault_searxng_secret_key truncates the cache. SearXNG hashes cache keys with the secret key; on mismatch it drops every cache table on next startup. Harmless, but be aware that engines like wikidata and radio_browser will need to re-fetch their on-disk indexes.

Search Engine Configuration

The engine list is templated in searxng-settings.yml.j2 and merges with the upstream defaults via use_default_settings: true. The merge is keyed by engine name and is shallow — only fields you explicitly set override the defaults, everything else (including hidden ones like inactive) is inherited.
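The merge behaviour can be illustrated with a minimal Python sketch. This is not SearXNG's actual implementation: the function name merge_engines and the sample default entry are assumptions made for illustration, and only the keyed-by-name, shallow semantics come from the description above.

```python
from copy import deepcopy


def merge_engines(defaults, overrides):
    """Merge override entries into default entries, keyed by engine name.

    Shallow: only the top-level fields present in an override replace the
    default value; every other field is inherited unchanged.
    """
    merged = {entry["name"]: deepcopy(entry) for entry in defaults}
    for override in overrides:
        merged.setdefault(override["name"], {}).update(override)
    return list(merged.values())


# Hypothetical upstream default, modelled on what this page describes.
defaults = [
    {"name": "braveapi", "engine": "braveapi", "api_key": "", "inactive": True},
]

# An override that sets only `disabled` still inherits `inactive: True`.
overrides = [{"name": "braveapi", "disabled": False}]

result = merge_engines(defaults, overrides)[0]
print(result["inactive"])  # True: inherited from the default entry
```

The takeaway is that a field you never mention in the override keeps its default value, which matters for flags that are not visible in your own template.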

Enabled engines

Engine      Notes
duckduckgo  General web
startpage   General web
mojeek      General web
braveapi    Brave Search via official REST API (see below)

Disabled engines

Engine                          Reason
google                          Aggressive bot detection / unstable scraping results
bing news                       Frequent parsing errors
brave (HTML scraper)            Replaced by braveapi; keeping both duplicates results
brave.images / .videos / .news  Scraping endpoints return 451 / access-denied
duckduckgo images               Suspended / access-denied responses
pexels, vimeo                   Same: suspended / access-denied

Why disable Google and Bing News? Google's HTML scraper is blocked aggressively and returns inconsistent, low-quality results. Bing's news scraper hits parser failures often enough to be more noise than signal. The four enabled engines (Brave API, DuckDuckGo, Startpage, Mojeek) cover general web search with stable results and no API rate-limit surprises.

Brave Search API (braveapi)

braveapi is the official REST API engine — distinct from the brave engine, which scrapes the public Brave Search HTML. The API engine is more reliable, has proper rate limiting, and supports paging and time-range filters.

Configuration

- name: braveapi
  engine: braveapi
  api_key: "{{ searxng_brave_api_key }}"
  results_per_page: 20
  inactive: false
  disabled: false

inactive: false is required

The upstream SearXNG settings.yml ships braveapi with inactive: true and an empty API key. Because use_default_settings does a shallow merge, an override that only sets disabled: false leaves the inherited inactive: true in place — and inactive engines are filtered out before load_engine() runs. The result is a silent disable: no error appears in the logs, and the engine never shows up in /config.

disabled and inactive are different gates:

  • disabled — engine still loads; user can toggle it on/off via Preferences.
  • inactive — engine is filtered out before loading; the UI never sees it.

You need both inactive: false and disabled: false (or omit disabled and let the default false apply).
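The two gates can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not code from SearXNG; the function name visible_engines and the sample settings are hypothetical.

```python
def visible_engines(engine_settings):
    """Return {name: enabled_by_default} for the engines that would load."""
    loaded = {}
    for cfg in engine_settings:
        if cfg.get("inactive", False):
            # Gate 1: filtered out before loading, so the UI never sees it.
            continue
        # Gate 2: the engine loads either way; `disabled` only decides
        # whether it starts switched off in Preferences.
        loaded[cfg["name"]] = not cfg.get("disabled", False)
    return loaded


settings = [
    {"name": "braveapi", "inactive": True, "disabled": False},
    {"name": "google", "disabled": True},
    {"name": "mojeek"},
]
print(visible_engines(settings))
# {'google': False, 'mojeek': True}
```

In this sketch braveapi vanishes entirely even though disabled is false, while google stays listed but switched off.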

Endpoint and result handling

The engine implementation (searx/engines/braveapi.py) hits a single endpoint:

https://api.search.brave.com/res/v1/web/search

with the X-Subscription-Token header. Although the Brave API can return multiple result sections (web, news, videos, discussions, infobox, locations, etc.), the SearXNG engine only consumes data["web"]["results"]. Other sections in the response are silently discarded.

This means braveapi cannot be split into braveapi.images / braveapi.news / braveapi.videos engines the way the HTML-scraper brave engine is. To surface those result types from Brave you'd need to patch the upstream engine module. For now, since the brave.* scrapers are disabled, other category-specific engines fill that role.
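The web-only handling can be pictured with the following Python sketch. It is a simplified stand-in, not the contents of searx/engines/braveapi.py: the function name extract_web_results, the per-result field names (url, title, description) and the sample payload are assumptions for illustration.

```python
def extract_web_results(data):
    """Keep only data["web"]["results"]; every other section is ignored."""
    results = []
    for item in data.get("web", {}).get("results", []):
        results.append(
            {
                "url": item.get("url"),
                "title": item.get("title"),
                "content": item.get("description", ""),
            }
        )
    return results


# Made-up response with three sections; only "web" survives.
sample = {
    "web": {
        "results": [
            {"url": "https://www.python.org/", "title": "Python", "description": "Home page"}
        ]
    },
    "news": {"results": [{"url": "https://example.com/news", "title": "Dropped"}]},
    "videos": {"results": [{"url": "https://example.com/video", "title": "Dropped"}]},
}
print(extract_web_results(sample))
```

Surfacing news or video results would mean adding a similar loop per section inside the engine module, which is the patch referred to above.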

Categories

braveapi declares categories = ["general", "web"] at module level. You don't need to override this in the YAML.

Verifying the engine is live

After running ansible-playbook searxng/deploy.yml and restarting the container:

# 1. Engine is loaded and registered
curl -s 'http://rosalind.incus:22089/config' \
  | jq '.engines[] | select(.name=="braveapi")'

# 2. Direct query — bypasses any UI/category filtering
curl -s 'http://rosalind.incus:22089/search?q=python&format=json&engines=braveapi' \
  | jq '(.results | length), .unresponsive_engines'

# 3. Container logs — look for braveapi-specific errors
docker logs searxng 2>&1 | grep -i braveapi
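For scripted checks, the first two steps can be combined in Python. This is a sketch under assumptions: the helper names fetch_json and diagnose are hypothetical, and it assumes unresponsive_engines is a list of [engine, reason] pairs in the JSON output.

```python
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def diagnose(config, search, engine="braveapi"):
    """Classify the engine state from parsed /config and /search JSON."""
    registered = any(e.get("name") == engine for e in config.get("engines", []))
    if not registered:
        return "not registered"
    for entry in search.get("unresponsive_engines", []):
        # Assumed shape: [engine_name, reason]
        if entry and entry[0] == engine:
            reason = entry[1] if len(entry) > 1 else "unknown"
            return f"unresponsive: {reason}"
    return f"ok: {len(search.get('results', []))} results"


# Made-up sample data standing in for the two HTTP responses.
config = {"engines": [{"name": "duckduckgo"}, {"name": "braveapi"}]}
search = {"results": [], "unresponsive_engines": [["braveapi", "HTTP error"]]}
print(diagnose(config, search))  # unresponsive: HTTP error
```

Against the live instance, config would come from fetch_json("http://rosalind.incus:22089/config") and search from fetch_json on the /search URL used in step 2.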

Authentication

SearXNG itself does not authenticate users. All public access goes through an OAuth2-Proxy sidecar that talks to Casdoor for OIDC. Internal callers (LobeChat, Argos, etc.) hit http://rosalind.incus:22089/ directly and bypass auth.

See searxng-auth.md for the full design and Casdoor application setup.

Monitoring

Logs

The container is configured to ship its stdout/stderr to Alloy's syslog receiver:

logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:51403"
    syslog-format: "{{syslog_format}}"
    tag: "searxng"

Alloy on rosalind.incus forwards these to Loki. Query in Grafana with:

{job="searxng", host="rosalind.incus"}

Health check

curl -fsS http://rosalind.incus:22089/healthz

Operations

Restart

ssh rosalind.incus
cd /srv/searxng
docker compose restart

Force pull a newer image

ssh rosalind.incus
cd /srv/searxng
docker compose pull
docker compose up -d

Or just re-run the playbook — pull: always is set on the deploy task.

Inspect rendered settings inside the container

ssh rosalind.incus
docker exec searxng cat /etc/searxng/settings.yml | grep -A6 -B1 braveapi

Troubleshooting

"Brave doesn't work"

  1. Confirm the engine is registered: /config JSON should include a braveapi entry. If absent, inactive: false is missing or the template didn't deploy.
  2. Confirm the API key is non-empty inside the container — see "Inspect rendered settings" above.
  3. Hit the engine directly with &engines=braveapi. If unresponsive_engines contains it with a reason, that's your real error (auth, rate limit, network).

radio_browser / wikidata init errors at startup

These are unrelated to your engine config:

  • radio_browser — known cache init-order bug in recent searxng/searxng:latest images. The SQLite properties table isn't created before radio_browser.init() calls CACHE.get(...). The engine simply stays unregistered; other engines work normally. Pinning to an older image tag works around it.
  • wikidata — transient: query.wikidata.org returned a truncated SPARQL response during the startup language-fetch. Restart the container; if it persists, Wikidata is rate-limiting the source IP.

Cache appears empty after rotating vault_searxng_secret_key

Expected. The secret key is hashed and used as the cache password; on mismatch SearXNG truncates every cache table at startup. No data loss — search still works, the engines just rebuild their indexes lazily.
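The mechanism can be pictured with a small Python sketch using an in-memory SQLite database. It is an illustration only, not SearXNG's cache code: the function name open_cache, the cache table layout and the choice of SHA-256 are assumptions. Only the properties table name and the truncate-on-mismatch behaviour come from this page.

```python
import hashlib
import sqlite3


def open_cache(conn, secret_key):
    """Empty the cache if it was written under a different secret key.

    Returns True when a truncation happened.
    """
    password = hashlib.sha256(secret_key.encode()).hexdigest()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS properties (name TEXT PRIMARY KEY, value TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute(
        "SELECT value FROM properties WHERE name = 'password'"
    ).fetchone()
    truncated = row is not None and row[0] != password
    if truncated:
        conn.execute("DELETE FROM cache")
    conn.execute(
        "INSERT OR REPLACE INTO properties (name, value) VALUES ('password', ?)",
        (password,),
    )
    conn.commit()
    return truncated


conn = sqlite3.connect(":memory:")
open_cache(conn, "old-secret")
conn.execute("INSERT INTO cache VALUES ('wikidata:langs', 'cached')")
print(open_cache(conn, "old-secret"))  # False: same key, cache kept
print(open_cache(conn, "new-secret"))  # True: key rotated, cache emptied
print(conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0])  # 0
```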

References