Files

Robert Helewka ecf37658ce feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered
telephony system that calls companies, navigates IVR menus, waits on
hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README

2026-03-21 19:23:26 +00:00

5.2 KiB

Raw Blame History

Configuration

All configuration is via environment variables, loaded through Pydantic Settings. Copy .env.example to .env and edit.

Environment Variables

SIP Trunk

Variable	Description	Default	Required
`SIP_TRUNK_HOST`	Your SIP provider hostname	—	Yes
`SIP_TRUNK_PORT`	SIP signaling port	`5060`	No
`SIP_TRUNK_USERNAME`	SIP auth username	—	Yes
`SIP_TRUNK_PASSWORD`	SIP auth password	—	Yes
`SIP_TRUNK_DID`	Your phone number (E.164)	—	Yes
`SIP_TRUNK_TRANSPORT`	Transport protocol (`udp`, `tcp`, `tls`)	`udp`	No

Gateway

Variable	Description	Default	Required
`GATEWAY_SIP_PORT`	Port for device SIP registration	`5080`	No
`GATEWAY_RTP_PORT_MIN`	Minimum RTP port	`10000`	No
`GATEWAY_RTP_PORT_MAX`	Maximum RTP port	`20000`	No
`GATEWAY_HOST`	Bind address	`0.0.0.0`	No

LLM

Variable	Description	Default	Required
`LLM_BASE_URL`	OpenAI-compatible API endpoint	`http://localhost:11434/v1`	No
`LLM_MODEL`	Model name for IVR analysis	`llama3`	No
`LLM_API_KEY`	API key (if required)	`not-needed`	No
`LLM_TIMEOUT`	Request timeout in seconds	`30.0`	No
`LLM_MAX_TOKENS`	Max tokens per response	`1024`	No
`LLM_TEMPERATURE`	Sampling temperature	`0.3`	No

Speech-to-Text

Variable	Description	Default	Required
`SPEACHES_URL`	Speaches/Whisper STT endpoint	`http://localhost:22070`	No
`SPEACHES_MODEL`	Whisper model name	`whisper-large-v3`	No

Database

Variable	Description	Default	Required
`DATABASE_URL`	PostgreSQL or SQLite connection string	`sqlite+aiosqlite:///./hold_slayer.db`	No

Notifications

Variable	Description	Default	Required
`NOTIFY_SMS_NUMBER`	Phone number for SMS alerts (E.164)	—	No

Audio Classifier

Variable	Description	Default	Required
`CLASSIFIER_WINDOW_SECONDS`	Audio window size for classification	`3.0`	No
`CLASSIFIER_SILENCE_THRESHOLD`	RMS below this = silence	`0.85`	No
`CLASSIFIER_MUSIC_THRESHOLD`	Spectral flatness below this = music	`0.7`	No
`CLASSIFIER_SPEECH_THRESHOLD`	Spectral flatness above this = speech	`0.6`	No

Hold Slayer

Variable	Description	Default	Required
`MAX_HOLD_TIME`	Maximum seconds to wait on hold	`7200`	No
`HOLD_CHECK_INTERVAL`	Seconds between audio checks	`2.0`	No
`DEFAULT_TRANSFER_DEVICE`	Device to transfer to	`sip_phone`	No

Recording

Variable	Description	Default	Required
`RECORDING_DIR`	Directory for WAV recordings	`recordings`	No
`RECORDING_MAX_SECONDS`	Maximum recording duration	`7200`	No
`RECORDING_SAMPLE_RATE`	Audio sample rate	`16000`	No

Settings Architecture

Configuration is managed by Pydantic Settings in config.py:

from config import get_settings

settings = get_settings()
settings.sip_trunk_host      # "sip.provider.com"
settings.llm.base_url        # "http://localhost:11434/v1"
settings.llm.model           # "llama3"
settings.speaches_url        # "http://localhost:22070"
settings.database_url        # "sqlite+aiosqlite:///./hold_slayer.db"

LLM settings are nested under settings.llm as a LLMSettings sub-model.

Deployment

Development

# 1. Clone and install
git clone <repo-url>
cd hold-slayer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure
cp .env.example .env
# Edit .env

# 3. Start Ollama (for LLM)
ollama serve
ollama pull llama3

# 4. Start Speaches (for STT)
docker run -p 22070:8000 ghcr.io/speaches-ai/speaches

# 5. Run
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Production

# Use PostgreSQL instead of SQLite
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/hold_slayer

# Use vLLM for faster inference
LLM_BASE_URL=http://localhost:8000/v1
LLM_MODEL=meta-llama/Llama-3-8B-Instruct

# Run with multiple workers (note: each worker is independent)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1

Note: Hold Slayer is designed as a single-process application. Multiple workers would each have their own SIP engine and call state. For high availability, run behind a load balancer with sticky sessions.

Docker

FROM python:3.13-slim

# Install system dependencies for PJSUA2 and Sippy
RUN apt-get update && apt-get install -y \
    build-essential \
    libpjproject-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN pip install -e .

EXPOSE 8000 5080/udp 10000-20000/udp

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Port mapping:

8000 — HTTP API + WebSocket + MCP
5080/udp — SIP device registration
10000-20000/udp — RTP media ports

5.2 KiB Raw Blame History