mnemosyne/mnemosyne/library/services/model_health.py at 70b1fc510b8da160d83c552d476b2d605019618a

r/mnemosyne

Robert Helewka a90c6e7479

CVE Scan & Docker Build / security-scan (push) Successful in 3m49s

Build & Deploy Docs / build-and-deploy (push) Successful in 1m9s

CVE Scan & Docker Build / build-and-push (push) Successful in 3m32s

feat(metrics): add scrape-time system model health collector

Add a Prometheus custom collector that probes the four system-default
models (chat, vision, embedding, reranker) at /metrics scrape time and
emits up/down, configured, and probe-latency gauges. This complements
the ingest-pipeline counters in the Celery worker, which only move
during active ingests and cannot signal model outages on an idle queue.
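
A minimal sketch of how one scrape could produce the three gauge values per role, using only the standard library. The names `probe_role`, `collect_samples`, the sample keys, and the probe callables are hypothetical and are not taken from the repository. In a real collector these values would presumably be exposed as Prometheus gauges rather than returned as plain dicts.

```python
import time
from typing import Callable, Dict, List, Optional

# The four system-default roles named in the commit message.
ROLES = ("chat", "vision", "embedding", "reranker")


def probe_role(
    role: str,
    model: Optional[str],
    probe: Callable[[str], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, object]:
    """Return configured/up/latency values for one role (hypothetical shape)."""
    if model is None:
        # No system default set: report "not configured" and skip the probe.
        return {"role": role, "configured": 0.0, "up": 0.0, "latency_seconds": 0.0}
    start = clock()
    try:
        probe(model)  # Assumed to raise on timeout or on an error response.
        up = 1.0
    except Exception:
        up = 0.0
    return {
        "role": role,
        "configured": 1.0,
        "up": up,
        "latency_seconds": clock() - start,
    }


def collect_samples(
    configured: Dict[str, str],
    probes: Dict[str, Callable[[str], None]],
) -> List[Dict[str, object]]:
    """Probe every role once and return one sample dict per role."""
    return [probe_role(role, configured.get(role), probes[role]) for role in ROLES]
```

Because the probes run at scrape time, an idle queue still yields fresh up/down values on every scrape, which is the gap the ingest-pipeline counters cannot cover.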

- New `library/health_collector.py` registers a custom collector with
  a 55s in-process cache to avoid hammering GPU endpoints on rapid
  scrapes or across multiple gunicorn workers.
- New `library/services/model_health.py` centralises the probe logic,
  resolving system-default models via SystemSettings and dispatching
  to chat/embedding/rerank endpoints with a short timeout.
- Register the collector only in the web process (gunicorn/runserver)
  via `LibraryConfig.ready`, excluding Celery, pytest, and management
  commands to prevent duplicate registration and stray probes.
- Add unit tests covering the collector cache, metric shape, and
  per-role probe dispatch.
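
The caching and web-process-only registration described in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `library/health_collector.py`: `TTLCache` and `is_web_process` are hypothetical names, and the `argv` heuristic is only one possible way to tell gunicorn/runserver apart from Celery, pytest, and other management commands.

```python
import os
import time
from typing import Any, Callable, List, Optional


class TTLCache:
    """Hold a single computed value for ttl_seconds (55s, per the commit message)."""

    def __init__(
        self,
        ttl_seconds: float = 55.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self, compute: Callable[[], Any]) -> Any:
        """Return the cached value, recomputing it only after the TTL has elapsed."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = compute()
            self._expires_at = now + self._ttl
        return self._value


def is_web_process(argv: List[str]) -> bool:
    """Hypothetical heuristic: True for gunicorn or `manage.py runserver` only."""
    if not argv:
        return False
    program = os.path.basename(argv[0])
    if "gunicorn" in program:
        return True
    return "runserver" in argv[1:]
```

An app config's `ready()` hook could then guard registration with something like `if is_web_process(sys.argv): ...`, and the collector could wrap its probe call in `cache.get(...)` so that scrapes arriving less than 55 seconds apart reuse the previous result. Note that a cache of this kind lives in one process, so each worker process holds its own copy.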

2026-06-17 09:06:11 -04:00

4.0 KiB
