# Services

The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.
## LLM Client (services/llm_client.py)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency, just `httpx`.
### Supported Backends

| Backend | URL | Notes |
|---|---|---|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |
### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with a system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```
### IVR Menu Analysis

The primary use case: analyzing IVR transcripts to pick the right menu option.

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```
### JSON Extraction

The client handles messy LLM output gracefully:

- Try `json.loads()` on the raw response
- If that fails, look for ```` ```json ... ``` ```` markdown blocks
- If that fails, look for `{...}` patterns in the text
- If all fail, return an empty dict (the caller handles it gracefully)
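The fallback chain above can be sketched as a single helper. This is an illustrative version, not the actual code in `services/llm_client.py`:

```python
import json
import re


def extract_json(raw: str) -> dict:
    """Best-effort JSON extraction mirroring the fallback chain (sketch)."""
    # 1. The raw response might already be valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Look for a ```json ... ``` markdown block.
    block = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if block:
        try:
            return json.loads(block.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Fall back to the first {...} span anywhere in the text.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass
    # 4. Give up: empty dict, caller handles it.
    return {}
```

Returning `{}` instead of raising keeps a flaky model from ever crashing a live call.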
### Stats Tracking

```python
stats = client.stats
# {
#     "total_requests": 47,
#     "total_errors": 2,
#     "avg_latency_ms": 234.5,
#     "model": "llama3",
#     "base_url": "http://localhost:11434/v1"
# }
```
### Error Handling

- HTTP errors return an empty string/dict (never crashes the call)
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring
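The never-crash behavior amounts to a guarded wrapper around each request. A minimal sketch, with hypothetical names (`safe_chat`, `do_request` are illustrative, not the module's actual API):

```python
import asyncio
import logging
import time

logger = logging.getLogger("llm_client")


async def safe_chat(do_request, stats: dict, timeout: float = 30.0) -> str:
    """Run one LLM request; on any failure, log it, count it, return ''."""
    start = time.monotonic()
    stats["total_requests"] = stats.get("total_requests", 0) + 1
    try:
        return await asyncio.wait_for(do_request(), timeout=timeout)
    except Exception:
        # Log with full context and degrade to an empty string so an
        # LLM hiccup never tears down a live call.
        logger.exception("LLM request failed")
        stats["total_errors"] = stats.get("total_errors", 0) + 1
        return ""
    finally:
        stats.setdefault("latencies_ms", []).append(
            (time.monotonic() - start) * 1000
        )


# Example: a request that blows up is swallowed, logged, and counted.
async def _boom() -> str:
    raise RuntimeError("backend down")


stats: dict = {}
result = asyncio.run(safe_chat(_boom, stats))
# result == "", stats["total_errors"] == 1
```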
## Transcription Service (services/transcription.py)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).
### Architecture

```
Audio frames (from AudioTap)
  │
  └── POST /v1/audio/transcriptions
        ├── model: whisper-large-v3
        ├── audio: WAV bytes
        └── language: en
              │
              └── Response: { "text": "Press 1 for billing..." }
```
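The "WAV bytes" step can be sketched with the stdlib `wave` module: raw 16-bit mono PCM frames from the tap get wrapped in a WAV container before posting. `pcm_to_wav` is a hypothetical helper name; the real service handles this internally.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM frames in a WAV container (sketch)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono call audio
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# The POST itself (httpx multipart) would then look roughly like:
#   files = {"file": ("audio.wav", pcm_to_wav(frames), "audio/wav")}
#   data = {"model": "whisper-large-v3", "language": "en"}
#   resp = await client.post("/v1/audio/transcriptions", files=files, data=data)
#   text = resp.json()["text"]
```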
### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with a language hint
text = await service.transcribe(audio_bytes, language="fr")
```
### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:

- Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
- Fed to the LLM for IVR menu analysis
- Stored in the call's transcript history
- Used by the Call Flow Learner to build reusable flows
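The publish step rides the asyncio pub/sub event bus. A minimal sketch of how a transcript chunk fans out to subscribers; the class shape here is illustrative, not the project's actual bus:

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Minimal asyncio pub/sub: one queue per subscriber, keyed by event type."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, event_type: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[event_type].append(queue)
        return queue

    async def publish(self, event_type: str, payload: dict) -> None:
        # Fan out to every subscriber of this event type.
        for queue in self._subscribers[event_type]:
            await queue.put({"type": event_type, **payload})


async def demo() -> dict:
    bus = EventBus()
    inbox = bus.subscribe("TRANSCRIPT_CHUNK")  # e.g. a WebSocket client
    await bus.publish(
        "TRANSCRIPT_CHUNK",
        {"call_id": "call_abc123", "text": "Press 1 for billing"},
    )
    return await inbox.get()


event = asyncio.run(demo())
```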
## Recording Service (services/recording.py)

Manages call recordings via the PJSUA2 media pipeline.
### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```
### Recording Types
| Type | Description |
|---|---|
| Outbound | Audio from the company (IVR, hold music, agent) |
| Inbound | Audio from the user's device (after transfer) |
| Mixed | Both parties in one file (for review) |
### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```
## Call Analytics (services/call_analytics.py)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked
| Metric | Description |
|---|---|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |
### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get summary
summary = analytics.get_summary()
# {
#     "total_calls": 142,
#     "success_rate": 0.89,
#     "avg_hold_time": 623.4,
#     "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#     "total_calls": 23,
#     "avg_hold_time": 845.2,
#     "best_time": "Tuesday 10:00 AM",
#     "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
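The hourly trend is essentially a group-by over call history. A sketch of the shape of that computation, assuming per-call records with an `hour` and `hold_time` field (the real implementation in `services/call_analytics.py` may differ):

```python
from collections import defaultdict


def hold_time_trend(calls: list[dict]) -> list[dict]:
    """Group calls by hour-of-day and average hold times (integer seconds)."""
    by_hour: dict[int, list[int]] = defaultdict(list)
    for call in calls:
        by_hour[call["hour"]].append(call["hold_time"])
    return [{"hour": hour, "avg_hold": sum(times) // len(times)}
            for hour, times in sorted(by_hour.items())]


trend = hold_time_trend([
    {"hour": 9, "hold_time": 300}, {"hour": 9, "hold_time": 340},
    {"hour": 10, "hold_time": 480},
])
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}]
```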
## Notification Service (services/notification.py)

Sends alerts when important things happen on calls.

### Notification Channels
| Channel | Status | Use Case |
|---|---|---|
| WebSocket | ✅ Active | Real-time UI updates (always on) |
| SMS | ✅ Active | Critical alerts (human detected, call failed) |
| Push | 🔮 Future | Mobile app notifications |
### Notification Priority

| Priority | Events | Delivery |
|---|---|---|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
### Event → Notification Mapping

| Event | Notification |
|---|---|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |
### Deduplication

The notification service tracks what's been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
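The dedup-and-cleanup behavior can be sketched as a small class around that `dict[str, set[str]]` structure; the class name and method names here are illustrative, not the service's actual API:

```python
class NotificationDeduper:
    """Per-call dedup tracking: call_id → set of already-sent dedup keys."""

    def __init__(self) -> None:
        self._notified: dict[str, set[str]] = {}

    def should_send(self, call_id: str, dedup_key: str) -> bool:
        sent = self._notified.setdefault(call_id, set())
        if dedup_key in sent:
            return False  # already notified for this event on this call
        sent.add(dedup_key)
        return True

    def call_ended(self, call_id: str) -> None:
        self._notified.pop(call_id, None)  # clean up when the call ends


deduper = NotificationDeduper()
first = deduper.should_send("call_abc123", "HOLD_DETECTED")   # True
repeat = deduper.should_send("call_abc123", "HOLD_DETECTED")  # False
```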
### SMS Configuration

SMS is sent for CRITICAL and HIGH priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder; wire up your preferred provider (Twilio, AWS SNS, etc.).
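One way to wire in a provider is to code against a small interface and swap implementations. This is a hypothetical seam, not the project's actual API; `SMSSender` and `LoggingSMSSender` are illustrative names:

```python
import asyncio
from typing import Protocol


class SMSSender(Protocol):
    """Implement this to plug in Twilio, AWS SNS, or any other provider."""
    async def send(self, to_number: str, body: str) -> None: ...


class LoggingSMSSender:
    """Dev stand-in that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    async def send(self, to_number: str, body: str) -> None:
        self.sent.append((to_number, body))


sender = LoggingSMSSender()
asyncio.run(sender.send("+15559876543", "A live person picked up"))
```

Keeping the notification service dependent only on the `Protocol` means tests run against the logging stand-in while production injects the real provider.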