# Services

The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.
## LLM Client (`services/llm_client.py`)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency — just httpx.

### Supported Backends

| Backend | URL | Notes |
|---------|-----|-------|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |
### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with a system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```
### IVR Menu Analysis

The primary use case — analyzing IVR transcripts to pick the right menu option:

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```
### JSON Extraction

The client handles messy LLM output gracefully:

1. Try `json.loads()` on the raw response
2. If that fails, look for fenced `` ```json ... ``` `` markdown blocks
3. If that fails, look for `{...}` patterns in the text
4. If all fail, return an empty dict (the caller handles it gracefully)
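The fallback chain above can be sketched roughly like this (the function name and regexes are illustrative, not the actual `llm_client.py` code):

```python
import json
import re


def extract_json(raw: str) -> dict:
    """Best-effort JSON extraction from a messy LLM response (sketch)."""
    # 1. Try the raw response as-is.
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        pass

    # 2. Look for a fenced ```json ... ``` markdown block.
    fence = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Fall back to the first-to-last {...} span in the text.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass

    # 4. Give up: empty dict, caller handles it.
    return {}
```

Anything unparseable still ends as `{}`, so IVR navigation can retry or fall back rather than crash mid-call.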
### Stats Tracking

```python
stats = client.stats
# {
#     "total_requests": 47,
#     "total_errors": 2,
#     "avg_latency_ms": 234.5,
#     "model": "llama3",
#     "base_url": "http://localhost:11434/v1"
# }
```
### Error Handling

- HTTP errors return empty string/dict (never crashes the call)
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring
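The never-crash contract above amounts to a wrapper along these lines (a sketch: `client.chat` matches the usage shown earlier, but the wrapper itself is illustrative, not the actual implementation):

```python
import asyncio
import logging
import time

logger = logging.getLogger("llm_client")


async def safe_chat(client, prompt: str, default: str = "") -> str:
    """Call the LLM, but never let an error propagate into call handling."""
    start = time.monotonic()
    try:
        return await client.chat(prompt)
    except asyncio.TimeoutError:
        # Timeouts are expected under load; log the elapsed time for monitoring.
        logger.warning("LLM request timed out after %.1fs", time.monotonic() - start)
    except Exception:
        # Log with full context; the call itself must keep going.
        logger.exception("LLM request failed (prompt=%r)", prompt[:80])
    return default  # empty string: the caller degrades gracefully
```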
## Transcription Service (`services/transcription.py`)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).

### Architecture

```
Audio frames (from AudioTap)
  │
  └── POST /v1/audio/transcriptions
        ├── model: whisper-large-v3
        ├── audio: WAV bytes
        └── language: en
              │
              └── Response: { "text": "Press 1 for billing..." }
```
### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with a language hint
text = await service.transcribe(audio_bytes, language="fr")
```
### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:

1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
2. Fed to the LLM for IVR menu analysis
3. Stored in the call's transcript history
4. Used by the Call Flow Learner to build reusable flows
## Recording Service (`services/recording.py`)

Manages call recordings via the PJSUA2 media pipeline.

### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```
### Recording Types

| Type | Description |
|------|-------------|
| **Outbound** | Audio from the company (IVR, hold music, agent) |
| **Inbound** | Audio from the user's device (after transfer) |
| **Mixed** | Both parties in one file (for review) |
### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```
## Call Analytics (`services/call_analytics.py`)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked

| Metric | Description |
|--------|-------------|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |
### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get summary
summary = analytics.get_summary()
# {
#     "total_calls": 142,
#     "success_rate": 0.89,
#     "avg_hold_time": 623.4,
#     "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#     "total_calls": 23,
#     "avg_hold_time": 845.2,
#     "best_time": "Tuesday 10:00 AM",
#     "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
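The hourly trend is a plain bucket average. A sketch of the idea (the real `CallAnalytics` internals presumably derive the hour from each call's timestamp; here the records are pre-simplified to `hour`/`hold_time` dicts):

```python
from collections import defaultdict


def hold_time_trend(calls: list[dict]) -> list[dict]:
    """Average hold time per start hour (illustrative sketch)."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for call in calls:
        buckets[call["hour"]].append(call["hold_time"])
    # One row per hour, sorted, with the mean hold time for that bucket.
    return [
        {"hour": hour, "avg_hold": sum(times) / len(times)}
        for hour, times in sorted(buckets.items())
    ]
```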
## Notification Service (`services/notification.py`)

Sends alerts when important things happen on calls.

### Notification Channels

| Channel | Status | Use Case |
|---------|--------|----------|
| **WebSocket** | ✅ Active | Real-time UI updates (always on) |
| **SMS** | ✅ Active | Critical alerts (human detected, call failed) |
| **Push** | 🔮 Future | Mobile app notifications |
### Notification Priority

| Priority | Events | Delivery |
|----------|--------|----------|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
### Event → Notification Mapping

| Event | Notification |
|-------|-------------|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |
### Deduplication

The notification service tracks what's been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
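A minimal sketch of that dedup logic, assuming the event name doubles as the dedup key (the `_notified` mapping mirrors the snippet above; the method names are illustrative):

```python
class NotificationDedup:
    """Per-call send tracking so repeated events don't spam the user (sketch)."""

    def __init__(self) -> None:
        self._notified: dict[str, set[str]] = {}  # call_id → set of dedup keys

    def should_send(self, call_id: str, dedup_key: str) -> bool:
        """Return True the first time a key is seen for a call, False after."""
        sent = self._notified.setdefault(call_id, set())
        if dedup_key in sent:
            return False
        sent.add(dedup_key)
        return True

    def on_call_ended(self, call_id: str) -> None:
        # Clean up tracking when the call ends.
        self._notified.pop(call_id, None)
```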
### SMS Configuration

SMS is sent for `CRITICAL` and `HIGH` priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```env
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder — wire up your preferred provider (Twilio, AWS SNS, etc.).