# Services

The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.
## LLM Client (services/llm_client.py)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency, just `httpx`.
### Supported Backends

| Backend | URL | Notes |
|---|---|---|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |
### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with a system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```
### IVR Menu Analysis

The primary use case: analyzing IVR transcripts to pick the right menu option.

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```
### JSON Extraction

The client handles messy LLM output gracefully:

- Try `json.loads()` on the raw response
- If that fails, look for ```` ```json ... ``` ```` markdown blocks
- If that fails, look for `{...}` patterns in the text
- If all fail, return an empty dict (the caller handles it gracefully)
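The fallback chain above can be sketched as a single helper. This is an illustrative version, not the actual code in `services/llm_client.py`:

```python
import json
import re


def extract_json(raw: str) -> dict:
    """Best-effort JSON extraction mirroring the fallback chain (sketch)."""
    # 1. The raw response might already be valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Look for a ```json ... ``` markdown block.
    block = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if block:
        try:
            return json.loads(block.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Fall back to the first {...} span anywhere in the text.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass
    # 4. Give up: empty dict, caller handles it.
    return {}
```

Returning `{}` instead of raising keeps a flaky model from ever crashing a live call.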
### Stats Tracking

```python
stats = client.stats
# {
#     "total_requests": 47,
#     "total_errors": 2,
#     "avg_latency_ms": 234.5,
#     "model": "llama3",
#     "base_url": "http://localhost:11434/v1"
# }
```
### Error Handling

- HTTP errors return an empty string/dict (never crashes the call)
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring
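The never-crash behavior amounts to a guarded wrapper around each request. A minimal sketch, with hypothetical names (`safe_chat`, `do_request` are illustrative, not the module's actual API):

```python
import asyncio
import logging
import time

logger = logging.getLogger("llm_client")


async def safe_chat(do_request, stats: dict, timeout: float = 30.0) -> str:
    """Run one LLM request; on any failure, log it, count it, return ''."""
    start = time.monotonic()
    stats["total_requests"] = stats.get("total_requests", 0) + 1
    try:
        return await asyncio.wait_for(do_request(), timeout=timeout)
    except Exception:
        # Log with full context and degrade to an empty string so an
        # LLM hiccup never tears down a live call.
        logger.exception("LLM request failed")
        stats["total_errors"] = stats.get("total_errors", 0) + 1
        return ""
    finally:
        stats.setdefault("latencies_ms", []).append(
            (time.monotonic() - start) * 1000
        )


# Example: a request that blows up is swallowed, logged, and counted.
async def _boom() -> str:
    raise RuntimeError("backend down")


stats: dict = {}
result = asyncio.run(safe_chat(_boom, stats))
# result == "", stats["total_errors"] == 1
```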
## Transcription Service (services/transcription.py)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).
### Architecture

```
Audio frames (from AudioTap)
  │
  └── POST /v1/audio/transcriptions
        ├── model: whisper-large-v3
        ├── audio: WAV bytes
        └── language: en
              │
              └── Response: { "text": "Press 1 for billing..." }
```
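The "WAV bytes" step can be sketched with the stdlib `wave` module: raw 16-bit mono PCM frames from the tap get wrapped in a WAV container before posting. `pcm_to_wav` is a hypothetical helper name; the real service handles this internally.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM frames in a WAV container (sketch)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono call audio
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# The POST itself (httpx multipart) would then look roughly like:
#   files = {"file": ("audio.wav", pcm_to_wav(frames), "audio/wav")}
#   data = {"model": "whisper-large-v3", "language": "en"}
#   resp = await client.post("/v1/audio/transcriptions", files=files, data=data)
#   text = resp.json()["text"]
```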
### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with a language hint
text = await service.transcribe(audio_bytes, language="fr")
```
### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:

- Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
- Fed to the LLM for IVR menu analysis
- Stored in the call's transcript history
- Used by the Call Flow Learner to build reusable flows
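The publish step rides the asyncio pub/sub event bus. A minimal sketch of how a transcript chunk fans out to subscribers; the class shape here is illustrative, not the project's actual bus:

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Minimal asyncio pub/sub: one queue per subscriber, keyed by event type."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, event_type: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[event_type].append(queue)
        return queue

    async def publish(self, event_type: str, payload: dict) -> None:
        # Fan out to every subscriber of this event type.
        for queue in self._subscribers[event_type]:
            await queue.put({"type": event_type, **payload})


async def demo() -> dict:
    bus = EventBus()
    inbox = bus.subscribe("TRANSCRIPT_CHUNK")  # e.g. a WebSocket client
    await bus.publish(
        "TRANSCRIPT_CHUNK",
        {"call_id": "call_abc123", "text": "Press 1 for billing"},
    )
    return await inbox.get()


event = asyncio.run(demo())
```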
## Recording Service (services/recording.py)

Manages call recordings via the PJSUA2 media pipeline.
### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```
### Recording Types
| Type | Description |
|---|---|
| Outbound | Audio from the company (IVR, hold music, agent) |
| Inbound | Audio from the user's device (after transfer) |
| Mixed | Both parties in one file (for review) |
### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```
## Call Analytics (services/call_analytics.py)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked
| Metric | Description |
|---|---|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |
### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get summary
summary = analytics.get_summary()
# {
#     "total_calls": 142,
#     "success_rate": 0.89,
#     "avg_hold_time": 623.4,
#     "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#     "total_calls": 23,
#     "avg_hold_time": 845.2,
#     "best_time": "Tuesday 10:00 AM",
#     "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
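The hourly trend is essentially a group-by over call history. A sketch of the shape of that computation, assuming per-call records with an `hour` and `hold_time` field (the real implementation in `services/call_analytics.py` may differ):

```python
from collections import defaultdict


def hold_time_trend(calls: list[dict]) -> list[dict]:
    """Group calls by hour-of-day and average hold times (integer seconds)."""
    by_hour: dict[int, list[int]] = defaultdict(list)
    for call in calls:
        by_hour[call["hour"]].append(call["hold_time"])
    return [{"hour": hour, "avg_hold": sum(times) // len(times)}
            for hour, times in sorted(by_hour.items())]


trend = hold_time_trend([
    {"hour": 9, "hold_time": 300}, {"hour": 9, "hold_time": 340},
    {"hour": 10, "hold_time": 480},
])
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}]
```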
## Notification Service (services/notification.py)

Sends alerts when important things happen on calls.

### Notification Channels
| Channel | Status | Use Case |
|---|---|---|
| WebSocket | ✅ Active | Real-time UI updates (always on) |
| SMS | ✅ Active | Critical alerts (human detected, call failed) |
| Push | 🔮 Future | Mobile app notifications |
### Notification Priority

| Priority | Events | Delivery |
|---|---|---|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
### Event → Notification Mapping

| Event | Notification |
|---|---|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |
### Deduplication

The notification service tracks what's been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
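The dedup-and-cleanup behavior can be sketched as a small class around that `dict[str, set[str]]` structure; the class name and method names here are illustrative, not the service's actual API:

```python
class NotificationDeduper:
    """Per-call dedup tracking: call_id → set of already-sent dedup keys."""

    def __init__(self) -> None:
        self._notified: dict[str, set[str]] = {}

    def should_send(self, call_id: str, dedup_key: str) -> bool:
        sent = self._notified.setdefault(call_id, set())
        if dedup_key in sent:
            return False  # already notified for this event on this call
        sent.add(dedup_key)
        return True

    def call_ended(self, call_id: str) -> None:
        self._notified.pop(call_id, None)  # clean up when the call ends


deduper = NotificationDeduper()
first = deduper.should_send("call_abc123", "HOLD_DETECTED")   # True
repeat = deduper.should_send("call_abc123", "HOLD_DETECTED")  # False
```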
### SMS Configuration

SMS is sent for CRITICAL and HIGH priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder; wire up your preferred provider (Twilio, AWS SNS, etc.).
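One way to wire in a provider is to code against a small interface and swap implementations. This is a hypothetical seam, not the project's actual API; `SMSSender` and `LoggingSMSSender` are illustrative names:

```python
import asyncio
from typing import Protocol


class SMSSender(Protocol):
    """Implement this to plug in Twilio, AWS SNS, or any other provider."""
    async def send(self, to_number: str, body: str) -> None: ...


class LoggingSMSSender:
    """Dev stand-in that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    async def send(self, to_number: str, body: str) -> None:
        self.sent.append((to_number, body))


sender = LoggingSMSSender()
asyncio.run(sender.send("+15559876543", "A live person picked up"))
```

Keeping the notification service dependent only on the `Protocol` means tests run against the logging stand-in while production injects the real provider.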