feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered
telephony system that calls companies, navigates IVR menus, waits on
hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
This commit is contained in:
2026-03-21 19:23:26 +00:00
parent c9ff60702b
commit ecf37658ce
56 changed files with 11601 additions and 164 deletions

290
docs/services.md Normal file
View File

@@ -0,0 +1,290 @@
# Services
The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.
## LLM Client (`services/llm_client.py`)
Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency — just httpx.
### Supported Backends
| Backend | URL | Notes |
|---------|-----|-------|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |
### Usage
```python
client = LLMClient(
base_url="http://localhost:11434/v1",
model="llama3",
api_key="not-needed", # Ollama doesn't need a key
timeout=30.0,
max_tokens=1024,
temperature=0.3,
)
# Simple chat
response = await client.chat("What is 2+2?")
# "4"
# Chat with system prompt
response = await client.chat(
"Parse this menu transcript...",
system="You are a phone menu parser. Return JSON.",
)
# Structured JSON response (auto-parses)
result = await client.chat_json(
"Extract menu options from: Press 1 for billing, press 2 for support",
system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```
### IVR Menu Analysis
The primary use case — analyzing IVR transcripts to pick the right menu option:
```python
decision = await client.analyze_ivr_menu(
transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
intent="dispute a charge from Amazon on December 15th",
previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```
### JSON Extraction
The client handles messy LLM output gracefully:
1. Try `json.loads()` on the raw response
2. If that fails, look for ```json ... ``` markdown blocks
3. If that fails, look for `{...}` patterns in the text
4. If all fail, return empty dict (caller handles gracefully)
### Stats Tracking
```python
stats = client.stats
# {
# "total_requests": 47,
# "total_errors": 2,
# "avg_latency_ms": 234.5,
# "model": "llama3",
# "base_url": "http://localhost:11434/v1"
# }
```
### Error Handling
- HTTP errors return empty string/dict (never crashes the call)
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring
## Transcription Service (`services/transcription.py`)
Real-time speech-to-text using Speaches (a self-hosted Whisper API).
### Architecture
```
Audio frames (from AudioTap)
└── POST /v1/audio/transcriptions
├── model: whisper-large-v3
├── audio: WAV bytes
└── language: en
└── Response: { "text": "Press 1 for billing..." }
```
### Usage
```python
service = TranscriptionService(
speaches_url="http://perseus.helu.ca:22070",
model="whisper-large-v3",
)
# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."
# Transcribe with language hint
text = await service.transcribe(audio_bytes, language="fr")
```
### Integration with Hold Slayer
The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:
1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
2. Fed to the LLM for IVR menu analysis
3. Stored in the call's transcript history
4. Used by the Call Flow Learner to build reusable flows
## Recording Service (`services/recording.py`)
Manages call recordings via the PJSUA2 media pipeline.
### Storage Structure
```
recordings/
├── 2026/
│ ├── 01/
│ │ ├── 15/
│ │ │ ├── call_abc123_outbound.wav
│ │ │ ├── call_abc123_mixed.wav
│ │ │ └── call_def456_outbound.wav
│ │ └── 16/
│ │ └── ...
│ └── 02/
│ └── ...
```
### Recording Types
| Type | Description |
|------|-------------|
| **Outbound** | Audio from the company (IVR, hold music, agent) |
| **Inbound** | Audio from the user's device (after transfer) |
| **Mixed** | Both parties in one file (for review) |
### Usage
```python
service = RecordingService(
storage_dir="recordings",
max_recording_seconds=7200, # 2 hours
sample_rate=16000,
)
# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"
# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }
# List recordings for a call
recordings = service.get_recordings(call_id)
```
## Call Analytics (`services/call_analytics.py`)
Tracks call metrics and provides insights for monitoring and optimization.
### Metrics Tracked
| Metric | Description |
|--------|-------------|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |
### Usage
```python
analytics = CallAnalytics(max_history=10000)
# Record a completed call
analytics.record_call(
call_id="call_abc123",
number="+18005551234",
company="Chase Bank",
hold_time=780,
total_duration=847,
success=True,
ivr_steps=6,
)
# Get summary
summary = analytics.get_summary()
# {
# "total_calls": 142,
# "success_rate": 0.89,
# "avg_hold_time": 623.4,
# "avg_total_duration": 712.1,
# }
# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
# "total_calls": 23,
# "avg_hold_time": 845.2,
# "best_time": "Tuesday 10:00 AM",
# "success_rate": 0.91,
# }
# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)
# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
## Notification Service (`services/notification.py`)
Sends alerts when important things happen on calls.
### Notification Channels
| Channel | Status | Use Case |
|---------|--------|----------|
| **WebSocket** | ✅ Active | Real-time UI updates (always on) |
| **SMS** | ✅ Active | Critical alerts (human detected, call failed) |
| **Push** | 🔮 Future | Mobile app notifications |
### Notification Priority
| Priority | Events | Delivery |
|----------|--------|----------|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
### Event → Notification Mapping
| Event | Notification |
|-------|-------------|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |
### Deduplication
The notification service tracks what's been sent per call to avoid spamming:
```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]] # call_id → set of event dedup keys
```
Tracking is cleaned up when a call ends.
### SMS Configuration
SMS is sent for `CRITICAL` priority notifications when `NOTIFY_SMS_NUMBER` is configured:
```env
NOTIFY_SMS_NUMBER=+15559876543
```
The SMS sender is a placeholder — wire up your preferred provider (Twilio, AWS SNS, etc.).