Robert Helewka ecf37658ce feat: add initial Hold Slayer AI telephony gateway implementation
Complete project scaffolding and core implementation of an AI-powered
telephony system that calls companies, navigates IVR menus, waits on
hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
2026-03-21 19:23:26 +00:00


# Services

The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.

## LLM Client (`services/llm_client.py`)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency — just `httpx`.

### Supported Backends

| Backend | URL | Notes |
| --- | --- | --- |
| Ollama | http://localhost:11434/v1 | Local, free, good for dev |
| LM Studio | http://localhost:1234/v1 | Local, free, GUI |
| vLLM | http://localhost:8000/v1 | Local, fast, production |
| OpenAI | https://api.openai.com/v1 | Cloud, paid, best quality |

### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with a system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```

### IVR Menu Analysis

The primary use case — analyzing IVR transcripts to pick the right menu option:

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```

### JSON Extraction

The client handles messy LLM output gracefully:

1. Try `json.loads()` on the raw response
2. If that fails, look for a fenced `json` markdown code block
3. If that fails, look for a `{...}` pattern in the text
4. If all fail, return an empty dict (the caller handles it gracefully)
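
That fallback chain might look like the following (a hypothetical reconstruction for illustration, not the verbatim library code):

```python
import json
import re

def extract_json(text: str) -> dict:
    # 1. The raw response may already be valid JSON.
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        pass
    # 2. Look for a fenced ```json ... ``` markdown block.
    m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if not m:
        # 3. Fall back to the first {...} span anywhere in the text.
        m = re.search(r"\{.*\}", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1) if m.lastindex else m.group(0))
        except json.JSONDecodeError:
            pass
    # 4. Give up; the caller treats an empty dict as "no result".
    return {}
```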

### Stats Tracking

```python
stats = client.stats
# {
#     "total_requests": 47,
#     "total_errors": 2,
#     "avg_latency_ms": 234.5,
#     "model": "llama3",
#     "base_url": "http://localhost:11434/v1"
# }
```

### Error Handling

- HTTP errors return an empty string/dict (they never crash the call)
- Timeouts are configurable (default 30 s)
- All errors are logged with full context
- Stats track error rates for monitoring

## Transcription Service (`services/transcription.py`)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).

### Architecture

```
Audio frames (from AudioTap)
  │
  └── POST /v1/audio/transcriptions
      ├── model: whisper-large-v3
      ├── audio: WAV bytes
      └── language: en
          │
          └── Response: { "text": "Press 1 for billing..." }
```
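
The request in the diagram can be sketched with `httpx`. This assumes the OpenAI-compatible multipart field names (`file`, `model`, `language`); the real `TranscriptionService` may differ in detail:

```python
async def transcribe_wav(base_url: str, wav_bytes: bytes,
                         model: str = "whisper-large-v3",
                         language: str = "en") -> str:
    # Illustrative sketch of the POST shown above, not the service's
    # actual implementation.
    import httpx  # imported lazily so the sketch reads standalone
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(
            f"{base_url}/v1/audio/transcriptions",
            files={"file": ("audio.wav", wav_bytes, "audio/wav")},
            data={"model": model, "language": language},
        )
        resp.raise_for_status()
        return resp.json()["text"]
```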

### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with a language hint
text = await service.transcribe(audio_bytes, language="fr")
```

### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (`IVR_PROMPT` or `LIVE_HUMAN`). The transcript is then:

1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
2. Fed to the LLM for IVR menu analysis
3. Stored in the call's transcript history
4. Used by the Call Flow Learner to build reusable flows
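
Glued together, the first three steps look roughly like this (the call, service, and event-bus objects here are illustrative stand-ins, not the real Hold Slayer classes):

```python
async def on_speech_transcribed(call, audio_bytes, stt, llm, bus):
    # Hypothetical pipeline glue; names are assumptions for illustration.
    text = await stt.transcribe(audio_bytes)
    # 1. Notify WebSocket clients via the event bus.
    await bus.publish("TRANSCRIPT_CHUNK", {"call_id": call["id"], "text": text})
    # 2. Let the LLM pick the next menu action.
    decision = await llm.analyze_ivr_menu(transcript=text, intent=call["intent"])
    # 3. Keep the transcript history (step 4, flow learning, omitted here).
    call["transcript"].append(text)
    return decision
```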

## Recording Service (`services/recording.py`)

Manages call recordings via the PJSUA2 media pipeline.

### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```

### Recording Types

| Type | Description |
| --- | --- |
| Outbound | Audio from the company (IVR, hold music, agent) |
| Inbound | Audio from the user's device (after transfer) |
| Mixed | Both parties in one file (for review) |

### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```

## Call Analytics (`services/call_analytics.py`)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked

| Metric | Description |
| --- | --- |
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold-time averages |
| Time-of-day trends | When hold times are shortest |

### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get a summary
summary = analytics.get_summary()
# {
#     "total_calls": 142,
#     "success_rate": 0.89,
#     "avg_hold_time": 623.4,
#     "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#     "total_calls": 23,
#     "avg_hold_time": 845.2,
#     "best_time": "Tuesday 10:00 AM",
#     "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold-time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```

## Notification Service (`services/notification.py`)

Sends alerts when important things happen on calls.

### Notification Channels

| Channel | Status | Use Case |
| --- | --- | --- |
| WebSocket | Active | Real-time UI updates (always on) |
| SMS | Active | Critical alerts (human detected, call failed) |
| Push | 🔮 Future | Mobile app notifications |

### Notification Priority

| Priority | Events | Delivery |
| --- | --- | --- |
| CRITICAL | Human detected, transfer started | WebSocket + SMS |
| HIGH | Call failed, call timeout | WebSocket + SMS |
| NORMAL | Hold detected, call ended | WebSocket only |
| LOW | IVR step, DTMF sent | WebSocket only |
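
A routing table mirroring the priority matrix above might look like this (the channel identifiers are illustrative, not the service's real names):

```python
# Which delivery channels each priority fans out to.
CHANNELS_BY_PRIORITY = {
    "CRITICAL": ("websocket", "sms"),
    "HIGH": ("websocket", "sms"),
    "NORMAL": ("websocket",),
    "LOW": ("websocket",),
}

def channels_for(priority: str) -> tuple[str, ...]:
    # Unknown priorities degrade safely to WebSocket-only delivery.
    return CHANNELS_BY_PRIORITY.get(priority, ("websocket",))
```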

### Event → Notification Mapping

| Event | Notification |
| --- | --- |
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | "The call couldn't be completed." |
| `HOLD_DETECTED` | "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |

### Deduplication

The notification service tracks what has been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
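
A minimal sketch of that per-call bookkeeping, including the cleanup on call end (the class and method names here are assumptions, not the real `NotificationService` API):

```python
class DedupTracker:
    def __init__(self) -> None:
        # call_id → set of dedup keys already sent for that call
        self._notified: dict[str, set[str]] = {}

    def should_send(self, call_id: str, key: str) -> bool:
        sent = self._notified.setdefault(call_id, set())
        if key in sent:
            return False  # already notified for this event on this call
        sent.add(key)
        return True

    def call_ended(self, call_id: str) -> None:
        # Drop tracking state so memory doesn't grow with call history.
        self._notified.pop(call_id, None)
```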

### SMS Configuration

SMS is sent for CRITICAL-priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder — wire up your preferred provider (Twilio, AWS SNS, etc.).