feat: add initial Hold Slayer AI telephony gateway implementation
Complete project scaffolding and core implementation of an AI-powered telephony system that calls companies, navigates IVR menus, waits on hold, and transfers to the user when a human answers.

Key components:

- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
290
docs/services.md
Normal file
@@ -0,0 +1,290 @@

# Services

These intelligence-layer services power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.

## LLM Client (`services/llm_client.py`)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency, just httpx.

### Supported Backends

| Backend | URL | Notes |
|---------|-----|-------|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |

### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```

### IVR Menu Analysis

The primary use case is analyzing IVR transcripts to pick the right menu option:

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```

### JSON Extraction

The client handles messy LLM output gracefully:

1. Try `json.loads()` on the raw response
2. If that fails, look for `` ```json ... ``` `` markdown blocks
3. If that fails, look for `{...}` patterns in the text
4. If all fail, return an empty dict (the caller handles this gracefully)
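
The fallback chain above could be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the function name `extract_json` and the regular expression are assumptions, and step 3 is simplified to "the text between the first `{` and the last `}`".

```python
import json
import re
from typing import Any

# Build the fence marker programmatically so this example nests cleanly in Markdown.
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def extract_json(raw: str) -> dict[str, Any]:
    """Best-effort extraction of a JSON object from LLM output (hypothetical helper)."""
    candidates = [raw.strip()]  # step 1: the raw response

    fenced = _FENCE_RE.search(raw)  # step 2: a fenced markdown block
    if fenced:
        candidates.append(fenced.group(1).strip())

    start, end = raw.find("{"), raw.rfind("}")  # step 3: outermost {...} span
    if 0 <= start < end:
        candidates.append(raw[start : end + 1])

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    return {}  # step 4: nothing usable, let the caller decide what to do
```

Returning `{}` rather than raising keeps a malformed reply from interrupting a live call; the caller can treat an empty dict as "no decision".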

### Stats Tracking

```python
stats = client.stats
# {
#   "total_requests": 47,
#   "total_errors": 2,
#   "avg_latency_ms": 234.5,
#   "model": "llama3",
#   "base_url": "http://localhost:11434/v1"
# }
```
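
One way such counters might be maintained is shown below. The class name `ClientStats` and its `record`/`snapshot` methods are hypothetical; only the keys of the resulting dict are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class ClientStats:
    """Hypothetical counter container; output keys mirror the stats dict above."""

    model: str
    base_url: str
    total_requests: int = 0
    total_errors: int = 0
    total_latency_ms: float = 0.0

    def record(self, latency_ms: float, ok: bool = True) -> None:
        """Account for one finished request, successful or not."""
        self.total_requests += 1
        self.total_latency_ms += latency_ms
        if not ok:
            self.total_errors += 1

    def snapshot(self) -> dict:
        """Return a plain dict suitable for logging or a monitoring endpoint."""
        avg = self.total_latency_ms / self.total_requests if self.total_requests else 0.0
        return {
            "total_requests": self.total_requests,
            "total_errors": self.total_errors,
            "avg_latency_ms": round(avg, 1),
            "model": self.model,
            "base_url": self.base_url,
        }
```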

### Error Handling

- HTTP errors return an empty string or dict, so a failed request never crashes the call
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring
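
The "never crash the call" behavior can be illustrated with a small wrapper. This sketch uses only the standard library and a hypothetical `safe_chat` helper that takes any coroutine factory; the real client wraps its httpx requests, which are not reproduced here.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("llm_client_sketch")  # hypothetical logger name


async def safe_chat(send: Callable[[], Awaitable[str]], timeout: float = 30.0) -> str:
    """Run one request; return "" instead of raising on failure or timeout."""
    try:
        return await asyncio.wait_for(send(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.error("LLM request timed out after %.1fs", timeout)
    except Exception:
        # logger.exception records the traceback along with the message
        logger.exception("LLM request failed")
    return ""
```

A caller would then check for the empty result and fall back to a default action rather than handling exceptions mid-call.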

## Transcription Service (`services/transcription.py`)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).

### Architecture

```
Audio frames (from AudioTap)
    │
    └── POST /v1/audio/transcriptions
        ├── model: whisper-large-v3
        ├── audio: WAV bytes
        └── language: en
            │
            └── Response: { "text": "Press 1 for billing..." }
```
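
The diagram sends WAV bytes, so raw audio frames need a WAV header before upload. A minimal sketch using the standard `wave` module follows; the helper name `pcm_to_wav` is hypothetical, and the 16-bit mono 16 kHz format is an assumption (16 kHz matches the `sample_rate` used by the recording service later in this document).

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # The header is finalized when the wave writer closes; the buffer stays readable.
    return buffer.getvalue()
```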

### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with language hint
text = await service.transcribe(audio_bytes, language="fr")
```

### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:

1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
2. Fed to the LLM for IVR menu analysis
3. Stored in the call's transcript history
4. Used by the Call Flow Learner to build reusable flows

## Recording Service (`services/recording.py`)

Manages call recordings via the PJSUA2 media pipeline.

### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```
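
A path following this year/month/day layout could be built as below. The function name `recording_path` and its parameters are illustrative assumptions; only the directory layout and the `<call_id>_<type>.wav` file naming are taken from the tree above.

```python
from datetime import datetime
from pathlib import Path


def recording_path(storage_dir: str, call_id: str, kind: str, started_at: datetime) -> Path:
    """Return <storage_dir>/YYYY/MM/DD/<call_id>_<kind>.wav (hypothetical helper)."""
    day_dir = (
        Path(storage_dir)
        / f"{started_at:%Y}"
        / f"{started_at:%m}"
        / f"{started_at:%d}"
    )
    return day_dir / f"{call_id}_{kind}.wav"


path = recording_path("recordings", "call_abc123", "outbound", datetime(2026, 1, 15))
print(path.as_posix())  # recordings/2026/01/15/call_abc123_outbound.wav
```

Zero-padded month and day components keep directory listings in chronological order.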

### Recording Types

| Type | Description |
|------|-------------|
| **Outbound** | Audio from the company (IVR, hold music, agent) |
| **Inbound** | Audio from the user's device (after transfer) |
| **Mixed** | Both parties in one file (for review) |

### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```

## Call Analytics (`services/call_analytics.py`)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked

| Metric | Description |
|--------|-------------|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |

### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get summary
summary = analytics.get_summary()
# {
#   "total_calls": 142,
#   "success_rate": 0.89,
#   "avg_hold_time": 623.4,
#   "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#   "total_calls": 23,
#   "avg_hold_time": 845.2,
#   "best_time": "Tuesday 10:00 AM",
#   "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
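
As an illustration of how an hourly trend like the one above might be derived, the sketch below groups hold times by the hour each call started and averages them. The function `hold_time_trend` and its input shape (a list of `(start time, hold seconds)` pairs) are assumptions, not the service's actual internals.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hold_time_trend(calls: list[tuple[datetime, float]]) -> list[dict]:
    """Average hold time per hour of day, sorted by hour (hypothetical helper)."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for started_at, hold_seconds in calls:
        by_hour[started_at.hour].append(hold_seconds)
    return [
        {"hour": hour, "avg_hold": mean(values)}
        for hour, values in sorted(by_hour.items())
    ]


calls = [
    (datetime(2026, 1, 15, 9, 5), 300.0),
    (datetime(2026, 1, 15, 9, 40), 340.0),
    (datetime(2026, 1, 15, 10, 15), 480.0),
]
print(hold_time_trend(calls))
# [{'hour': 9, 'avg_hold': 320.0}, {'hour': 10, 'avg_hold': 480.0}]
```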

## Notification Service (`services/notification.py`)

Sends alerts when important things happen on calls.

### Notification Channels

| Channel | Status | Use Case |
|---------|--------|----------|
| **WebSocket** | ✅ Active | Real-time UI updates (always on) |
| **SMS** | ✅ Active | Critical alerts (human detected, call failed) |
| **Push** | 🔮 Future | Mobile app notifications |

### Notification Priority

| Priority | Events | Delivery |
|----------|--------|----------|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
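
The routing in this table can be expressed as a small lookup. The sketch below is illustrative only: the `Priority` enum, the `EVENT_PRIORITY` dict, and `channels_for` are hypothetical names, the event names are the ones used elsewhere in this document, and treating unknown events as `LOW` is an assumption.

```python
from enum import Enum


class Priority(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    NORMAL = "normal"
    LOW = "low"


# Illustrative mapping; the call-timeout event is omitted because its name is not documented.
EVENT_PRIORITY = {
    "HUMAN_DETECTED": Priority.CRITICAL,
    "TRANSFER_STARTED": Priority.CRITICAL,
    "CALL_FAILED": Priority.HIGH,
    "HOLD_DETECTED": Priority.NORMAL,
    "CALL_ENDED": Priority.NORMAL,
    "IVR_STEP": Priority.LOW,
    "IVR_DTMF_SENT": Priority.LOW,
}

SMS_PRIORITIES = {Priority.CRITICAL, Priority.HIGH}


def channels_for(event: str, sms_configured: bool) -> list[str]:
    """Return the delivery channels for an event, per the priority table."""
    priority = EVENT_PRIORITY.get(event, Priority.LOW)  # assumption: unknown events are LOW
    channels = ["websocket"]  # WebSocket delivery is always on
    if sms_configured and priority in SMS_PRIORITIES:
        channels.append("sms")
    return channels
```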

### Event → Notification Mapping

| Event | Notification |
|-------|-------------|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |

### Deduplication

The notification service tracks what's been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
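
A minimal sketch of how such a per-call dedup map might be used is shown below. The class `NotificationDeduper` and its method names are hypothetical; only the `_notified` structure comes from the snippet above.

```python
class NotificationDeduper:
    """Tracks which dedup keys have already been sent for each call."""

    def __init__(self) -> None:
        self._notified: dict[str, set[str]] = {}  # call_id -> set of event dedup keys

    def should_send(self, call_id: str, dedup_key: str) -> bool:
        """Return True the first time a key is seen for a call, False afterwards."""
        seen = self._notified.setdefault(call_id, set())
        if dedup_key in seen:
            return False
        seen.add(dedup_key)
        return True

    def clear(self, call_id: str) -> None:
        """Drop tracking for a finished call so the map does not grow without bound."""
        self._notified.pop(call_id, None)


deduper = NotificationDeduper()
print(deduper.should_send("call_abc123", "HOLD_DETECTED"))  # True
print(deduper.should_send("call_abc123", "HOLD_DETECTED"))  # False
```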

### SMS Configuration

SMS is sent for `CRITICAL` and `HIGH` priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```env
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder; wire up your preferred provider (Twilio, AWS SNS, etc.).
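
One possible way to keep the provider pluggable is to pass the sending function in. In this sketch, `build_sms_notifier` and the `send(number, text)` callback signature are assumptions rather than the service's real interface; a provider SDK call would go inside the callback.

```python
import os
from typing import Callable, Optional


def build_sms_notifier(
    send: Callable[[str, str], None],
) -> Optional[Callable[[str], None]]:
    """Return a notify(message) function, or None when SMS is not configured."""
    number = os.environ.get("NOTIFY_SMS_NUMBER", "").strip()
    if not number:
        return None  # SMS disabled: callers fall back to WebSocket only

    def notify(message: str) -> None:
        # `send` is the provider-specific hook (hypothetical signature)
        send(number, message)

    return notify
```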