Files
hold-slayer/docs/configuration.md
Robert Helewka ecf37658ce feat: add initial Hold Slayer AI telephony gateway implementation
Complete project scaffolding and core implementation of an AI-powered
telephony system that calls companies, navigates IVR menus, waits on
hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
2026-03-21 19:23:26 +00:00

5.2 KiB

Configuration

All configuration is via environment variables, loaded through Pydantic Settings. Copy .env.example to .env and edit.

Environment Variables

SIP Trunk

Variable Description Default Required
SIP_TRUNK_HOST Your SIP provider hostname Yes
SIP_TRUNK_PORT SIP signaling port 5060 No
SIP_TRUNK_USERNAME SIP auth username Yes
SIP_TRUNK_PASSWORD SIP auth password Yes
SIP_TRUNK_DID Your phone number (E.164) Yes
SIP_TRUNK_TRANSPORT Transport protocol (udp, tcp, tls) udp No

Gateway

Variable Description Default Required
GATEWAY_SIP_PORT Port for device SIP registration 5080 No
GATEWAY_RTP_PORT_MIN Minimum RTP port 10000 No
GATEWAY_RTP_PORT_MAX Maximum RTP port 20000 No
GATEWAY_HOST Bind address 0.0.0.0 No

LLM

Variable Description Default Required
LLM_BASE_URL OpenAI-compatible API endpoint http://localhost:11434/v1 No
LLM_MODEL Model name for IVR analysis llama3 No
LLM_API_KEY API key (if required) not-needed No
LLM_TIMEOUT Request timeout in seconds 30.0 No
LLM_MAX_TOKENS Max tokens per response 1024 No
LLM_TEMPERATURE Sampling temperature 0.3 No

Speech-to-Text

Variable Description Default Required
SPEACHES_URL Speaches/Whisper STT endpoint http://localhost:22070 No
SPEACHES_MODEL Whisper model name whisper-large-v3 No

Database

Variable Description Default Required
DATABASE_URL PostgreSQL or SQLite connection string sqlite+aiosqlite:///./hold_slayer.db No

Notifications

Variable Description Default Required
NOTIFY_SMS_NUMBER Phone number for SMS alerts (E.164) No

Audio Classifier

Variable Description Default Required
CLASSIFIER_WINDOW_SECONDS Audio window size for classification 3.0 No
CLASSIFIER_SILENCE_THRESHOLD RMS below this = silence 0.85 No
CLASSIFIER_MUSIC_THRESHOLD Spectral flatness below this = music 0.7 No
CLASSIFIER_SPEECH_THRESHOLD Spectral flatness above this = speech 0.6 No

Hold Slayer

Variable Description Default Required
MAX_HOLD_TIME Maximum seconds to wait on hold 7200 No
HOLD_CHECK_INTERVAL Seconds between audio checks 2.0 No
DEFAULT_TRANSFER_DEVICE Device to transfer to sip_phone No

Recording

Variable Description Default Required
RECORDING_DIR Directory for WAV recordings recordings No
RECORDING_MAX_SECONDS Maximum recording duration 7200 No
RECORDING_SAMPLE_RATE Audio sample rate 16000 No

Settings Architecture

Configuration is managed by Pydantic Settings in config.py:

from config import get_settings

settings = get_settings()
settings.sip_trunk_host      # "sip.provider.com"
settings.llm.base_url        # "http://localhost:11434/v1"
settings.llm.model           # "llama3"
settings.speaches_url        # "http://localhost:22070"
settings.database_url        # "sqlite+aiosqlite:///./hold_slayer.db"

LLM settings are nested under settings.llm as a LLMSettings sub-model.

Deployment

Development

# 1. Clone and install
git clone <repo-url>
cd hold-slayer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure
cp .env.example .env
# Edit .env

# 3. Start Ollama (for LLM)
ollama serve
ollama pull llama3

# 4. Start Speaches (for STT)
docker run -p 22070:8000 ghcr.io/speaches-ai/speaches

# 5. Run
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Production

# Use PostgreSQL instead of SQLite
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/hold_slayer

# Use vLLM for faster inference
LLM_BASE_URL=http://localhost:8000/v1
LLM_MODEL=meta-llama/Llama-3-8B-Instruct

# Run with multiple workers (note: each worker is independent)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1

Note: Hold Slayer is designed as a single-process application. Multiple workers would each have their own SIP engine and call state. For high availability, run behind a load balancer with sticky sessions.

Docker

FROM python:3.13-slim

# Install system dependencies for PJSUA2 and Sippy
RUN apt-get update && apt-get install -y \
    build-essential \
    libpjproject-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN pip install -e .

EXPOSE 8000 5080/udp 10000-20000/udp

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Port mapping:

  • 8000 — HTTP API + WebSocket + MCP
  • 5080/udp — SIP device registration
  • 10000-20000/udp — RTP media ports