SearXNG
Overview
SearXNG is a privacy-respecting metasearch engine that aggregates results from
multiple upstream search providers and re-ranks them. The Ouranos deployment runs
as a single Docker container behind an authenticating OAuth2-Proxy sidecar (see
searxng-auth.md for the auth design).
Host: rosalind.incus
Container port: 22089 (host) → 8080 (container)
Public URL: https://searxng.ouranos.helu.ca/ (via HAProxy → OAuth2-Proxy → SearXNG)
Internal URL: http://rosalind.incus:22089/ (used by LobeChat, Argos, etc.)
Ansible Deployment
Layout
```
ansible/searxng/
├── deploy.yml # Main deployment playbook
├── deploy_oauth2.yml # OAuth2-Proxy sidecar playbook
├── docker-compose.yml.j2 # Docker Compose template
├── searxng-settings.yml.j2 # SearXNG settings.yml template
├── oauth2-proxy-searxng.cfg.j2 # OAuth2-Proxy config (see searxng-auth.md)
└── oauth2-proxy-searxng.service.j2 # Systemd unit for the sidecar
```
Run
```bash
cd ansible
ansible-playbook searxng/deploy.yml --limit rosalind.incus
ansible-playbook searxng/deploy_oauth2.yml --limit rosalind.incus
```
deploy.yml:
- Skips hosts that don't list searxng in their services list.
- Creates the searxng system user and the /srv/searxng directory.
- Templates docker-compose.yml and searxng-settings.yml into /srv/searxng/.
- Brings up the container with community.docker.docker_compose_v2 (pull: always).
The container mounts searxng-settings.yml read-only at
/etc/searxng/settings.yml. There is no persistent volume — the cache lives in
the container's /tmp and is rebuilt on restart.
Variables
Host Variables (inventory/host_vars/rosalind.incus.yml)
| Variable | Value | Purpose |
|---|---|---|
| searxng_port | 22089 | Host-side container port |
| searxng_base_url | http://rosalind.incus:22089/ | Used by SearXNG to build URLs |
| searxng_instance_name | Ouranos Search | Shown in the UI header |
| searxng_directory | /srv/searxng | Compose project dir on the host |
| searxng_user/group | searxng | Owns templated config files |
| searxng_syslog_port | 51403 | Alloy syslog receiver port |
Vault Variables (group_vars/all/vault.yml)
| Variable | Purpose |
|---|---|
| vault_searxng_secret_key | server.secret_key; also used as the cache DB password |
| vault_searxng_brave_api_key | Brave Search API subscription token (see below) |
| vault_searxng_oauth_* | OAuth2-Proxy sidecar; see searxng-auth.md |
⚠️ Changing vault_searxng_secret_key truncates the cache. SearXNG hashes cache keys with the secret key; on mismatch it drops every cache table on the next startup. This is harmless, but engines like wikidata and radio_browser will need to re-fetch their on-disk indexes.
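Rotating the key means generating a new random value and storing it in the vault. A minimal sketch, assuming openssl is available on the machine you run Ansible from; the 32-byte length is an arbitrary choice here, not a documented SearXNG requirement:

```bash
#!/usr/bin/env bash
# Generate a candidate replacement for vault_searxng_secret_key:
# 32 random bytes, hex-encoded, i.e. 64 lowercase hex characters.
new_secret_key="$(openssl rand -hex 32)"
echo "$new_secret_key"
```

Paste the value into group_vars/all/vault.yml with ansible-vault edit and re-run deploy.yml; expect the cache truncation described above on the next start.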
Search Engine Configuration
The engine list is templated in searxng-settings.yml.j2 and merges with the
upstream defaults via use_default_settings: true. The merge is keyed by engine
name and is shallow: only the fields you set explicitly override the defaults,
and everything else (including fields your template never mentions, like inactive) is inherited.
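To make the per-key override concrete, here is a small bash model of it, using the braveapi case described under "inactive: false is required" (upstream ships inactive: true and an empty key; the override sets only disabled and api_key). It illustrates the merge semantics only and is not SearXNG's own code; it needs bash 4+ for associative arrays, and "example-token" is a placeholder:

```bash
#!/usr/bin/env bash
# Illustrative model of the per-engine shallow merge (not SearXNG source).
declare -A upstream=([inactive]=true [api_key]="")
declare -A override=([disabled]=false [api_key]="example-token")

declare -A merged=()
# Start from the upstream defaults...
for key in "${!upstream[@]}"; do merged[$key]="${upstream[$key]}"; done
# ...then let each key present in the override replace its default.
for key in "${!override[@]}"; do merged[$key]="${override[$key]}"; done

# prints: inactive=true disabled=false
echo "inactive=${merged[inactive]} disabled=${merged[disabled]}"
```

Because the override never mentions inactive, the upstream true survives the merge even though disabled is false.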
Enabled engines
| Engine | Notes |
|---|---|
| duckduckgo | General web |
| startpage | General web |
| mojeek | General web |
| braveapi | Brave Search via official REST API (see below) |
Disabled engines
| Engine | Reason |
|---|---|
| google | Aggressive bot detection / unstable scraping results |
| bing news | Frequent parsing errors |
| brave (HTML scraper) | Replaced by braveapi; keeping both duplicates results |
| brave.images / .videos / .news | Scraping endpoints return 451 / access-denied |
| duckduckgo images | Suspended / access-denied responses |
| pexels, vimeo | Same: suspended / access-denied |
ℹ️ Why disable Google and Bing News? Google's HTML scraper is blocked aggressively and produces low-quality, inconsistent results. Bing's news scraper hits parser failures often enough to be more noise than signal. The remaining four engines (Brave API, DuckDuckGo, Startpage, Mojeek) cover general web search with stable results and no API rate-limit surprises.
Brave Search API (braveapi)
braveapi is the official REST API engine — distinct from the brave engine,
which scrapes the public Brave Search HTML. The API engine is more reliable, has
proper rate limiting, and supports paging and time-range filters.
Configuration
```yaml
- name: braveapi
  engine: braveapi
  api_key: "{{ searxng_brave_api_key }}"
  results_per_page: 20
  inactive: false
  disabled: false
```
inactive: false is required
The upstream SearXNG settings.yml ships braveapi with inactive: true and
an empty API key. Because use_default_settings does a shallow merge, an
override that only sets disabled: false leaves the inherited inactive: true
in place — and inactive engines are filtered out before load_engine() runs.
The result is a silent disable: no error appears in the logs, and the engine
never shows up in /config.
disabled and inactive are different gates:
- disabled: the engine still loads; users can toggle it on/off via Preferences.
- inactive: the engine is filtered out before loading; the UI never sees it.
You need both inactive: false and disabled: false (or omit disabled and
let the default false apply).
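The two gates can be summarised as a small decision function. This is a model of the behaviour described above, written for illustration; the function name engine_state and its output labels are hypothetical:

```bash
#!/usr/bin/env bash
# Model of the inactive/disabled gates (illustrative, not SearXNG source).
engine_state() {
  local inactive="$1" disabled="$2"
  if [ "$inactive" = "true" ]; then
    # Dropped before load_engine(): absent from /config, nothing logged.
    echo "filtered"
  elif [ "$disabled" = "true" ]; then
    # Loaded and listed in /config, but off until a user enables it.
    echo "loaded-off"
  else
    echo "loaded-on"
  fi
}

engine_state true false   # → filtered (upstream braveapi + disabled-only override)
engine_state false true   # → loaded-off
engine_state false false  # → loaded-on (the configuration used here)
```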
Endpoint and result handling
The engine implementation (searx/engines/braveapi.py) hits a single endpoint:
https://api.search.brave.com/res/v1/web/search
with the X-Subscription-Token header. Although the Brave API can return
multiple result sections (web, news, videos, discussions, infobox,
locations, etc.), the SearXNG engine only consumes data["web"]["results"].
Other sections in the response are silently discarded.
This means braveapi cannot be split into braveapi.images / braveapi.news
/ braveapi.videos engines the way the HTML-scraper brave engine is. To
surface those result types from Brave you'd need to patch the upstream engine
module. Because the brave.* scrapers are disabled, those result types currently
come only from whichever other category-specific engines remain enabled.
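To check the subscription token independently of SearXNG, and to see which sections the API actually returns, you can call the same endpoint yourself. A sketch, assuming BRAVE_API_KEY holds the same value as vault_searxng_brave_api_key; brave_web_search is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: query the Brave web-search endpoint directly,
# bypassing SearXNG. Expects BRAVE_API_KEY in the environment.
brave_web_search() {
  local query="$1"
  # Fail fast (in the current shell) if the token is missing.
  : "${BRAVE_API_KEY:?BRAVE_API_KEY is not set}"
  # --get turns --data-urlencode into a properly encoded query string.
  curl -fsS --get 'https://api.search.brave.com/res/v1/web/search' \
    -H 'Accept: application/json' \
    -H "X-Subscription-Token: ${BRAVE_API_KEY}" \
    --data-urlencode "q=${query}"
}
```

Piping the output through jq '.web.results | length' shows the part SearXNG reads, while jq 'keys' lists every top-level section in the response, including the ones the engine discards.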
Categories
braveapi declares categories = ["general", "web"] at module level. You don't
need to override this in the YAML.
Verifying the engine is live
After ansible-playbook searxng/deploy.yml and a container restart:
```bash
# 1. Engine is loaded and registered
curl -s 'http://rosalind.incus:22089/config' \
  | jq '.engines[] | select(.name=="braveapi")'

# 2. Direct query: bypasses any UI/category filtering
curl -s 'http://rosalind.incus:22089/search?q=python&format=json&engines=braveapi' \
  | jq '.results | length, .unresponsive_engines'

# 3. Container logs: look for braveapi-specific errors
docker logs searxng 2>&1 | grep -i braveapi
```
Authentication
SearXNG itself does not authenticate users. All public access goes through an
OAuth2-Proxy sidecar that talks to Casdoor for OIDC. Internal callers
(LobeChat, Argos, etc.) hit http://rosalind.incus:22089/ directly and bypass
auth.
See searxng-auth.md for the full design and Casdoor
application setup.
Monitoring
Logs
The container is configured to ship its stdout/stderr to Alloy's syslog receiver:
```yaml
logging:
  driver: syslog
  options:
    syslog-address: "tcp://127.0.0.1:51403"
    syslog-format: "{{syslog_format}}"
    tag: "searxng"
```
Alloy on rosalind.incus forwards these to Loki. Query in Grafana with:
```
{job="searxng", host="rosalind.incus"}
```
Health check
```bash
curl -fsS http://rosalind.incus:22089/healthz
```
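After a restart or a playbook run the container can take a few seconds to come up, so a one-shot curl may fail spuriously. A polling sketch; wait_for_searxng is a hypothetical helper, and the default of 30 attempts 2 seconds apart is an assumption, not a tuned value:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll /healthz until it succeeds or we give up.
# Args: URL (default: internal healthz), attempts (30), delay in seconds (2).
wait_for_searxng() {
  local url="${1:-http://rosalind.incus:22089/healthz}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes HTTP errors count as failures; the body is discarded.
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

Typical use is docker compose restart followed by wait_for_searxng, so a script only continues once the instance answers.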
Operations
Restart
```bash
ssh rosalind.incus
cd /srv/searxng
docker compose restart
```
Force pull a newer image
```bash
ssh rosalind.incus
cd /srv/searxng
docker compose pull
docker compose up -d
```
Or just re-run the playbook — pull: always is set on the deploy task.
Inspect rendered settings inside the container
```bash
ssh rosalind.incus
docker exec searxng cat /etc/searxng/settings.yml | grep -A6 -B1 braveapi
```
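The grep above is fine for eyeballing. To check the two settings that most often break braveapi (an empty api_key and a missing inactive: false) in one step, a small awk-based helper can scan the rendered file. A sketch only: check_braveapi_settings is hypothetical, it does a line-based scan rather than real YAML parsing, and it assumes the unquoted "- name: braveapi" layout shown earlier:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a rendered settings.yml on stdin and report
# on the braveapi entry. Exit 0 = ok, 1 = a check failed, 2 = not found.
check_braveapi_settings() {
  awk '
    # Each "- name:" line starts a new engine entry.
    /^[ \t]*- name:/ {
      in_brave = ($0 ~ /- name:[ \t]*braveapi[ \t]*$/)
      if (in_brave) found = 1
      next
    }
    # A non-indented line means we have left the engines list.
    /^[^ \t]/ { in_brave = 0 }
    in_brave && /^[ \t]*api_key:/ {
      key = $0
      sub(/^[ \t]*api_key:[ \t]*/, "", key)
      sub(/[ \t]+$/, "", key)
      gsub(/["'\'']/, "", key)   # strip single or double quotes
      has_key = (length(key) > 0)
    }
    in_brave && /^[ \t]*inactive:/ { inactive = $2 }
    END {
      if (!found) { print "braveapi: not found"; exit 2 }
      print "api_key: " (has_key ? "set" : "EMPTY")
      print "inactive: " (inactive == "" ? "unset (inherited)" : inactive)
      rc = (has_key && inactive == "false") ? 0 : 1
      exit rc
    }
  '
}
```

On rosalind.incus this would be run as docker exec searxng cat /etc/searxng/settings.yml | check_braveapi_settings.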
Troubleshooting
"Brave doesn't work"
- Confirm the engine is registered: the /config JSON should include a braveapi entry. If it is absent, inactive: false is missing or the template didn't deploy.
- Confirm the API key is non-empty inside the container; see "Inspect rendered settings inside the container" above.
- Hit the engine directly with &engines=braveapi. If unresponsive_engines contains it with a reason, that's your real error (auth, rate limit, network).
radio_browser / wikidata init errors at startup
These are unrelated to your engine config:
- radio_browser: a known cache init-order bug in recent searxng/searxng:latest images. The SQLite properties table isn't created before radio_browser.init() calls CACHE.get(...), so the engine stays unregistered; other engines work normally. Pinning to an older image tag works around it.
- wikidata: transient. query.wikidata.org returned a truncated SPARQL response during the startup language fetch. Restart the container; if the error persists, Wikidata is probably rate-limiting the source IP.
Cache appears stale after rotating vault_searxng_secret_key
Expected. The secret key is hashed and used as the cache password; on mismatch SearXNG truncates every cache table at startup. No data loss — search still works, the engines just rebuild their indexes lazily.
References
- Upstream docs: https://docs.searxng.org/
- Brave Search API engine: https://docs.searxng.org/dev/engines/online/brave.html
- Brave Search API reference: brave_search_api.md
- SearXNG authentication design: searxng-auth.md
- Ansible Practices