feat: add initial Hold Slayer AI telephony gateway implementation
Complete project scaffolding and core implementation of an AI-powered telephony system that calls companies, navigates IVR menus, waits on hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
docs/README.md (new file, 18 lines)

# Hold Slayer Documentation

Comprehensive documentation for the Hold Slayer AI telephony gateway.

## Contents

| Document | Description |
|----------|-------------|
| [Architecture](architecture.md) | System architecture, component diagram, data flow |
| [Core Engine](core-engine.md) | SIP engine, media pipeline, call manager, event bus |
| [Hold Slayer Service](hold-slayer-service.md) | IVR navigation, hold detection, human detection, transfer |
| [Audio Classifier](audio-classifier.md) | Waveform analysis, feature extraction, classification logic |
| [Services](services.md) | LLM client, transcription, recording, analytics, notifications |
| [Call Flows](call-flows.md) | Call flow model, step types, learner, CRUD API |
| [API Reference](api-reference.md) | REST endpoints, WebSocket protocol, request/response schemas |
| [MCP Server](mcp-server.md) | MCP tools and resources for AI assistant integration |
| [Configuration](configuration.md) | Environment variables, settings, deployment options |
| [Development](development.md) | Setup, testing, contributing, project conventions |
docs/api-reference.md (new file, 378 lines)

# API Reference

Hold Slayer exposes a REST API, a WebSocket endpoint, and an MCP server.

## REST API

Base URL: `http://localhost:8000/api`

### Calls

#### Place an Outbound Call

```
POST /api/calls/outbound
```

**Request:**

```json
{
  "number": "+18005551234",
  "mode": "hold_slayer",
  "intent": "dispute Amazon charge from December 15th",
  "device": "sip_phone",
  "call_flow_id": "chase_bank_disputes",
  "services": {
    "recording": true,
    "transcription": true
  }
}
```

**Call Modes:**

| Mode | Description |
|------|-------------|
| `direct` | Dial and connect to your device immediately |
| `hold_slayer` | Navigate IVR, wait on hold, transfer when human detected |
| `ai_assisted` | Connect with noise cancellation, transcription, recording |

**Response:**

```json
{
  "call_id": "call_abc123",
  "status": "trying",
  "number": "+18005551234",
  "mode": "hold_slayer",
  "started_at": "2026-01-15T10:30:00Z"
}
```
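A minimal client-side sketch of this request: the helper below builds the request body shown above and validates the mode. The helper name and server URL are illustrative assumptions, not part of the actual codebase.

```python
# Hypothetical helper: build the outbound-call request body shown above.
def build_outbound_request(number: str, intent: str, mode: str = "hold_slayer") -> dict:
    if mode not in ("direct", "hold_slayer", "ai_assisted"):
        raise ValueError(f"unknown mode: {mode}")
    return {
        "number": number,
        "mode": mode,
        "intent": intent,
        "services": {"recording": True, "transcription": True},
    }

payload = build_outbound_request("+18005551234", "dispute Amazon charge from December 15th")
# With httpx installed, this would be sent as (assuming the default port):
# resp = httpx.post("http://localhost:8000/api/calls/outbound", json=payload)
```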
#### Launch Hold Slayer

```
POST /api/calls/hold-slayer
```

Convenience endpoint — equivalent to `POST /api/calls/outbound` with `mode=hold_slayer`.

**Request:**

```json
{
  "number": "+18005551234",
  "intent": "dispute Amazon charge from December 15th",
  "call_flow_id": "chase_bank_disputes",
  "transfer_to": "sip_phone"
}
```

#### Get Call Status

```
GET /api/calls/{call_id}
```

**Response:**

```json
{
  "call_id": "call_abc123",
  "status": "on_hold",
  "number": "+18005551234",
  "mode": "hold_slayer",
  "duration": 847,
  "hold_time": 780,
  "audio_type": "music",
  "transcript_excerpt": "...your call is important to us...",
  "classification_history": [
    {"timestamp": 1706000000, "type": "ringing", "confidence": 0.95},
    {"timestamp": 1706000003, "type": "ivr_prompt", "confidence": 0.88},
    {"timestamp": 1706000010, "type": "music", "confidence": 0.92}
  ],
  "services": {"recording": true, "transcription": true}
}
```

#### List Active Calls

```
GET /api/calls
```

**Response:**

```json
{
  "calls": [
    {"call_id": "call_abc123", "status": "on_hold", "number": "+18005551234", "duration": 847},
    {"call_id": "call_def456", "status": "connected", "number": "+18009876543", "duration": 120}
  ],
  "total": 2
}
```

#### End a Call

```
POST /api/calls/{call_id}/hangup
```

#### Transfer a Call

```
POST /api/calls/{call_id}/transfer
```

**Request:**

```json
{
  "device": "sip_phone"
}
```
### Call Flows

#### List Call Flows

```
GET /api/call-flows
GET /api/call-flows?company=Chase+Bank
GET /api/call-flows?tag=banking
```

**Response:**

```json
{
  "flows": [
    {
      "id": "chase_bank_disputes",
      "name": "Chase Bank — Disputes",
      "company": "Chase Bank",
      "phone_number": "+18005551234",
      "step_count": 7,
      "success_count": 12,
      "fail_count": 1,
      "tags": ["banking", "disputes"]
    }
  ]
}
```

#### Get Call Flow

```
GET /api/call-flows/{flow_id}
```

Returns the full call flow with all steps.

#### Create Call Flow

```
POST /api/call-flows
```

**Request:**

```json
{
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [
    {"id": "wait", "type": "WAIT", "description": "Wait for greeting", "timeout": 5.0, "next_step": "menu"},
    {"id": "menu", "type": "LISTEN", "description": "Main menu", "next_step": "press3"},
    {"id": "press3", "type": "DTMF", "description": "Account services", "dtmf": "3", "next_step": "hold"},
    {"id": "hold", "type": "HOLD", "description": "Wait for agent", "next_step": "transfer"},
    {"id": "transfer", "type": "TRANSFER", "description": "Connect to user"}
  ]
}
```

#### Update Call Flow

```
PUT /api/call-flows/{flow_id}
```

#### Delete Call Flow

```
DELETE /api/call-flows/{flow_id}
```

### Devices

#### List Registered Devices

```
GET /api/devices
```

**Response:**

```json
{
  "devices": [
    {
      "id": "dev_001",
      "name": "Office SIP Phone",
      "type": "sip_phone",
      "sip_uri": "sip:robert@gateway.helu.ca",
      "is_online": true,
      "priority": 10
    }
  ]
}
```

#### Register a Device

```
POST /api/devices
```

**Request:**

```json
{
  "name": "Office SIP Phone",
  "type": "sip_phone",
  "sip_uri": "sip:robert@gateway.helu.ca",
  "priority": 10,
  "capabilities": ["voice"]
}
```

#### Update Device

```
PUT /api/devices/{device_id}
```

#### Remove Device

```
DELETE /api/devices/{device_id}
```

### Error Responses

All errors follow a consistent format:

```json
{
  "detail": "Call not found: call_xyz789"
}
```

| Status Code | Meaning |
|-------------|---------|
| `400` | Bad request (invalid parameters) |
| `404` | Resource not found (call, flow, device) |
| `409` | Conflict (call already ended, device already registered) |
| `500` | Internal server error |
## WebSocket

### Event Stream

```
ws://localhost:8000/ws/events
ws://localhost:8000/ws/events?call_id=call_abc123
ws://localhost:8000/ws/events?types=human_detected,hold_detected
```

**Query Parameters:**

| Param | Description |
|-------|-------------|
| `call_id` | Filter events for a specific call |
| `types` | Comma-separated event types to receive |

**Event Format:**

```json
{
  "type": "hold_detected",
  "call_id": "call_abc123",
  "timestamp": "2026-01-15T10:35:00Z",
  "data": {
    "audio_type": "music",
    "confidence": 0.92,
    "hold_duration": 0
  }
}
```

### Event Types

| Type | Data Fields |
|------|------------|
| `call_started` | `number`, `mode`, `intent` |
| `call_ringing` | `number` |
| `call_connected` | `number`, `duration` |
| `call_ended` | `number`, `duration`, `reason` |
| `call_failed` | `number`, `error` |
| `hold_detected` | `audio_type`, `confidence` |
| `human_detected` | `confidence`, `transcript_excerpt` |
| `transfer_started` | `device`, `from_call_id` |
| `transfer_complete` | `device`, `bridge_id` |
| `ivr_step` | `step_id`, `step_type`, `description` |
| `ivr_dtmf_sent` | `digits`, `step_id` |
| `ivr_menu_detected` | `transcript`, `options` |
| `audio_classified` | `audio_type`, `confidence`, `features` |
| `transcript_chunk` | `text`, `speaker`, `is_final` |
| `recording_started` | `recording_id`, `path` |
| `recording_stopped` | `recording_id`, `duration`, `file_size` |

### JavaScript Client Example

```javascript
const ws = new WebSocket("ws://localhost:8000/ws/events");

ws.onopen = () => {
  console.log("Connected to Hold Slayer events");
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case "human_detected":
      alert("🚨 A live person picked up! Pick up your phone!");
      break;
    case "hold_detected":
      console.log("⏳ On hold...");
      break;
    case "transcript_chunk":
      console.log(`📝 ${data.data.speaker}: ${data.data.text}`);
      break;
  }
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};
```

### Python Client Example

```python
import asyncio
import json

import websockets


async def listen():
    async with websockets.connect("ws://localhost:8000/ws/events") as ws:
        async for message in ws:
            event = json.loads(message)
            print(f"[{event['type']}] {event.get('data', {})}")

asyncio.run(listen())
```
docs/architecture.md (new file, 178 lines)

# Architecture

Hold Slayer is a single-process async Python application built on FastAPI. It acts as an intelligent B2BUA (Back-to-Back User Agent) sitting between your SIP trunk (PSTN access) and your desk phone/softphone.

## System Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Server                          │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐    │
│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │    │
│  │  /api/*  │  │  /ws/*   │  │   (SSE)   │  │  /dashboard  │    │
│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘    │
│       │             │              │                            │
│  ┌────┴─────────────┴──────────────┴────┐                       │
│  │              Event Bus               │                       │
│  │  (asyncio Queue pub/sub per client)  │                       │
│  └────┬──────────────┬─────────────┬────┘                       │
│       │              │             │                            │
│  ┌────┴─────┐  ┌─────┴─────┐  ┌────┴──────────┐                 │
│  │   Call   │  │   Hold    │  │   Services    │                 │
│  │  Manager │  │  Slayer   │  │  (LLM, STT,   │                 │
│  │          │  │           │  │   Recording,  │                 │
│  │          │  │           │  │   Analytics,  │                 │
│  │          │  │           │  │   Notify)     │                 │
│  └────┬─────┘  └─────┬─────┘  └───────────────┘                 │
│       │              │                                          │
│  ┌────┴──────────────┴───────────────────┐                      │
│  │          Sippy B2BUA Engine           │                      │
│  │ (SIP calls, DTMF, conference bridge)  │                      │
│  └────┬──────────────────────────────────┘                      │
│       │                                                         │
└───────┼─────────────────────────────────────────────────────────┘
        │
   ┌────┴────┐
   │SIP Trunk│ ──→ PSTN
   └─────────┘
```
## Component Overview

### Presentation Layer

| Component | File | Protocol | Purpose |
|-----------|------|----------|---------|
| REST API | `api/calls.py`, `api/call_flows.py`, `api/devices.py` | HTTP | Call management, CRUD, configuration |
| WebSocket | `api/websocket.py` | WS | Real-time event streaming to clients |
| MCP Server | `mcp_server/server.py` | SSE | AI assistant tool integration |

### Orchestration Layer

| Component | File | Purpose |
|-----------|------|---------|
| Gateway | `core/gateway.py` | Top-level orchestrator — owns all services, routes calls |
| Call Manager | `core/call_manager.py` | Active call state, lifecycle, transcript tracking |
| Event Bus | `core/event_bus.py` | Async pub/sub connecting everything together |

### Intelligence Layer

| Component | File | Purpose |
|-----------|------|---------|
| Hold Slayer | `services/hold_slayer.py` | IVR navigation, hold monitoring, human detection |
| Audio Classifier | `services/audio_classifier.py` | Real-time waveform analysis (music/speech/DTMF/silence) |
| LLM Client | `services/llm_client.py` | OpenAI-compatible LLM for IVR menu decisions |
| Transcription | `services/transcription.py` | Speaches/Whisper STT for live audio |
| Call Flow Learner | `services/call_flow_learner.py` | Builds reusable IVR trees from exploration data |

### Infrastructure Layer

| Component | File | Purpose |
|-----------|------|---------|
| Sippy Engine | `core/sippy_engine.py` | SIP signaling (INVITE, BYE, REGISTER, DTMF) |
| Media Pipeline | `core/media_pipeline.py` | PJSUA2 RTP media handling, conference bridge, recording |
| Recording | `services/recording.py` | WAV file management and storage |
| Analytics | `services/call_analytics.py` | Call metrics, hold time stats, trends |
| Notifications | `services/notification.py` | WebSocket + SMS alerts |
| Database | `db/database.py` | SQLAlchemy async (PostgreSQL or SQLite) |
## Data Flow — Hold Slayer Call

```
1. User Request
   POST /api/calls/hold-slayer  { number, intent, call_flow_id }
        │
2. Gateway.make_call()
   ├── CallManager.create_call()         → track state
   ├── SippyEngine.make_call()           → SIP INVITE to trunk
   └── MediaPipeline.add_stream()        → RTP media setup
        │
3. HoldSlayer.run_with_flow() or run_exploration()
   ├── AudioClassifier.classify()        → analyze 3s audio windows
   │     ├── silence?  → wait
   │     ├── ringing?  → wait
   │     ├── DTMF?     → detect tones
   │     ├── music?    → HOLD_DETECTED event
   │     └── speech?   → transcribe + decide
   │
   ├── TranscriptionService.transcribe() → STT on speech audio
   │
   ├── LLMClient.analyze_ivr_menu()      → pick menu option (fallback)
   │     └── SippyEngine.send_dtmf()     → press the button
   │
   └── detect_hold_to_human_transition()
         └── HUMAN_DETECTED! → transfer
        │
4. Transfer
   ├── SippyEngine.bridge()              → connect call legs
   ├── MediaPipeline.bridge_streams()    → bridge RTP
   ├── EventBus.publish(TRANSFER_STARTED)
   └── NotificationService               → "Pick up your phone!"
        │
5. Real-Time Updates (throughout)
   EventBus.publish() → WebSocket clients
                      → MCP server resources
                      → Notification service
                      → Analytics tracking
```
## Threading Model

Hold Slayer is primarily single-threaded async (asyncio), with one exception:

- **Main thread**: FastAPI + all async services (event bus, hold slayer, classifier, etc.)
- **Sippy thread**: Sippy B2BUA runs its own event loop in a dedicated daemon thread. The `SippyEngine` bridges async↔sync via `asyncio.run_in_executor()`.
- **PJSUA2**: Runs in the main thread using a null audio device (no sound card needed — headless server mode).

```
Main Thread (asyncio)
├── FastAPI (uvicorn)
├── EventBus
├── CallManager
├── HoldSlayer
├── AudioClassifier
├── TranscriptionService
├── LLMClient
├── MediaPipeline (PJSUA2)
├── NotificationService
└── RecordingService

Sippy Thread (daemon)
└── Sippy B2BUA event loop
    ├── SIP signaling
    ├── DTMF relay
    └── Call leg management
```
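The async↔sync bridge can be sketched with a stand-in blocking function (`place_call_blocking` is hypothetical, not the engine's actual API): the blocking work runs in a worker thread via `run_in_executor` so the asyncio loop stays responsive.

```python
import asyncio

# Hypothetical stand-in for a blocking Sippy operation.
def place_call_blocking(number: str) -> str:
    # In the real engine this would block on the Sippy thread's SIP machinery.
    return f"INVITE sent to {number}"

async def make_call(number: str) -> str:
    loop = asyncio.get_running_loop()
    # None selects the default ThreadPoolExecutor; the event loop keeps running
    # while the blocking call executes in a worker thread.
    return await loop.run_in_executor(None, place_call_blocking, number)

result = asyncio.run(make_call("+18005551234"))
```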
## Design Decisions

### Why Sippy B2BUA + PJSUA2?

We split SIP signaling and media handling into two separate libraries:

- **Sippy B2BUA** handles SIP signaling (INVITE, BYE, REGISTER, re-INVITE, DTMF relay). It's battle-tested for telephony and handles the complex SIP state machine.
- **PJSUA2** handles RTP media (audio streams, conference bridge, recording, tone generation). It provides a clean C++/Python API for media manipulation without needing to deal with raw RTP.

This split lets us tap into the audio stream (for classification and STT) without interfering with SIP signaling, and bridge calls through a conference bridge for clean transfer.
### Why asyncio Queue-based EventBus?

- **Single process** — no need for Redis/RabbitMQ cross-process messaging
- **Zero dependencies** — pure asyncio, no external services to deploy
- **Per-subscriber queues** — slow consumers don't block fast publishers
- **Dead subscriber cleanup** — full queues are automatically removed
- **Event history** — late joiners can catch up on recent events

If scaling to multiple gateway processes becomes necessary, the EventBus interface can be backed by Redis pub/sub without changing consumers.
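The per-subscriber-queue pattern can be sketched as follows (class and method names are illustrative, not the actual `core/event_bus.py` API): each subscriber gets its own bounded queue, publishing never awaits, and a full queue marks its subscriber as dead.

```python
import asyncio

# Illustrative sketch of a per-subscriber-queue event bus.
class EventBus:
    def __init__(self, max_queue: int = 100):
        self._queues: list[asyncio.Queue] = []
        self._max_queue = max_queue

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self._max_queue)
        self._queues.append(q)
        return q

    def publish(self, event: dict) -> None:
        for q in list(self._queues):
            try:
                q.put_nowait(event)       # never await: slow consumers can't block us
            except asyncio.QueueFull:
                self._queues.remove(q)    # dead-subscriber cleanup

async def demo() -> dict:
    bus = EventBus()
    q = bus.subscribe()
    bus.publish({"type": "hold_detected", "call_id": "call_abc123"})
    return await q.get()

event = asyncio.run(demo())
```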
### Why OpenAI-compatible LLM API?

The LLM client uses raw HTTP (httpx) against any OpenAI-compatible endpoint. This means:

- **Ollama** (local, free) — `http://localhost:11434/v1`
- **LM Studio** (local, free) — `http://localhost:1234/v1`
- **vLLM** (local, fast) — `http://localhost:8000/v1`
- **OpenAI** (cloud) — `https://api.openai.com/v1`

No SDK dependency. No vendor lock-in. Switch models by changing one env var.
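A sketch of what "OpenAI-compatible" means in practice: the request is a plain `POST {base_url}/chat/completions` with a standard JSON body. The helper name and model are assumptions; only the payload shape is the standard one.

```python
# Illustrative request construction for any OpenAI-compatible endpoint.
def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    url = f"{base_url.rstrip('/')}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic menu decisions
    }
    return url, body

url, body = build_chat_request("http://localhost:11434/v1", "llama3", "Which option reaches disputes?")
# With httpx this would be sent as:
# reply = httpx.post(url, json=body, timeout=30.0).json()["choices"][0]["message"]["content"]
```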
docs/audio-classifier.md (new file, 174 lines)

# Audio Classifier

The Audio Classifier (`services/audio_classifier.py`) performs real-time waveform analysis on phone audio to determine what's happening on the call: silence, ringing, hold music, IVR prompts, DTMF tones, or live human speech.

## Classification Types

```python
class AudioClassification(str, Enum):
    SILENCE = "silence"          # No meaningful audio
    MUSIC = "music"              # Hold music
    IVR_PROMPT = "ivr_prompt"    # Recorded voice menu
    LIVE_HUMAN = "live_human"    # Live person speaking
    RINGING = "ringing"          # Ringback tone
    DTMF = "dtmf"                # Touch-tone digits
    UNKNOWN = "unknown"          # Can't classify
```

## Feature Extraction

Every audio frame (typically 3 seconds of 16kHz PCM) goes through feature extraction:

| Feature | What It Measures | How It's Used |
|---------|-----------------|---------------|
| **RMS Energy** | Loudness (root mean square of samples) | Silence detection — below threshold = silence |
| **Spectral Flatness** | How noise-like vs tonal the audio is (0=pure tone, 1=white noise) | Music has low flatness (tonal), speech has higher flatness |
| **Zero-Crossing Rate** | How often the waveform crosses zero | Speech has moderate ZCR, tones have very regular ZCR |
| **Dominant Frequency** | Strongest frequency component (via FFT) | Ringback detection (440Hz), DTMF detection |
| **Spectral Centroid** | "Center of mass" of the frequency spectrum | Speech has higher centroid than music |
| **Tonality** | Whether the audio is dominated by a single frequency | Tones/DTMF are highly tonal, speech is not |

### Feature Extraction Code

```python
def _extract_features(self, audio: np.ndarray) -> dict:
    rms = np.sqrt(np.mean(audio ** 2))

    # FFT for frequency analysis
    fft = np.fft.rfft(audio)
    magnitude = np.abs(fft)
    freqs = np.fft.rfftfreq(len(audio), 1.0 / self._sample_rate)

    # Spectral flatness: geometric mean / arithmetic mean of magnitude
    spectral_flatness = np.exp(np.mean(np.log(magnitude + 1e-10))) / (np.mean(magnitude) + 1e-10)

    # Zero-crossing rate
    zcr = np.mean(np.abs(np.diff(np.sign(audio)))) / 2

    # Dominant frequency
    dominant_freq = freqs[np.argmax(magnitude)]

    # Spectral centroid
    spectral_centroid = np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-10)

    return { ... }
```
## Classification Logic

Classification follows a priority chain:

```
1. SILENCE — RMS below threshold?
   └── Yes → SILENCE (confidence based on how quiet)

2. DTMF — Goertzel algorithm detects dual-tone pairs?
   └── Yes → DTMF (with detected digit in details)

3. RINGING — Dominant frequency near 440Hz + tonal?
   └── Yes → RINGING

4. SPEECH vs MUSIC discrimination:
   ├── High spectral flatness + moderate ZCR → LIVE_HUMAN or IVR_PROMPT
   │     └── _looks_like_live_human() checks history for hold→speech transition
   │           ├── Yes → LIVE_HUMAN
   │           └── No  → IVR_PROMPT
   │
   └── Low spectral flatness + tonal → MUSIC
```

### DTMF Detection

Uses the Goertzel algorithm to detect the dual-tone pairs that make up DTMF digits:

```
         1209 Hz  1336 Hz  1477 Hz  1633 Hz
697 Hz      1        2        3        A
770 Hz      4        5        6        B
852 Hz      7        8        9        C
941 Hz      *        0        #        D
```

Each DTMF digit is two simultaneous frequencies. The Goertzel algorithm efficiently checks for the presence of each specific frequency without computing a full FFT.
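The idea can be sketched in pure Python (a minimal Goertzel power estimate; the classifier's actual implementation may differ): compute the power at each row and column frequency and take the strongest of each.

```python
import math

# Goertzel power estimate at a single target frequency.
def goertzel_power(samples: list[float], sample_rate: int, freq: float) -> float:
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# Synthesize DTMF "1" (697 Hz + 1209 Hz) and find the strongest row/column pair.
rate = 8000
tone = [math.sin(2 * math.pi * 697 * t / rate) + math.sin(2 * math.pi * 1209 * t / rate)
        for t in range(rate // 10)]  # 100 ms of samples
rows = {f: goertzel_power(tone, rate, f) for f in (697, 770, 852, 941)}
cols = {f: goertzel_power(tone, rate, f) for f in (1209, 1336, 1477, 1633)}
detected = (max(rows, key=rows.get), max(cols, key=cols.get))  # → (697, 1209)
```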
### Hold-to-Human Transition

The most critical detection — when a live person picks up after hold music:

```python
def detect_hold_to_human_transition(self) -> bool:
    """
    Check classification history for the pattern:
    MUSIC, MUSIC, MUSIC, ... → LIVE_HUMAN/IVR_PROMPT

    Requires:
    - At least 3 recent MUSIC classifications
    - Followed by 2+ speech classifications
    - Speech has sufficient energy (not just noise)
    """
    recent = self._history[-10:]

    # Find the transition point
    music_count = 0
    speech_count = 0
    for result in recent:
        if result.audio_type == AudioClassification.MUSIC:
            music_count += 1
            speech_count = 0  # reset
        elif result.audio_type in (AudioClassification.LIVE_HUMAN, AudioClassification.IVR_PROMPT):
            speech_count += 1

    return music_count >= 3 and speech_count >= 2
```

## Classification Result

Each classification returns:

```python
@dataclass
class ClassificationResult:
    timestamp: float
    audio_type: AudioClassification
    confidence: float   # 0.0 to 1.0
    details: dict       # Feature values, detected frequencies, etc.
```

The `details` dict includes all extracted features, making it available for debugging and analytics:

```python
{
    "rms": 0.0423,
    "spectral_flatness": 0.15,
    "zcr": 0.087,
    "dominant_freq": 440.0,
    "spectral_centroid": 523.7,
    "is_tonal": True
}
```
## Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| `CLASSIFIER_MUSIC_THRESHOLD` | Spectral flatness below this = music | `0.7` |
| `CLASSIFIER_SPEECH_THRESHOLD` | Spectral flatness above this = speech | `0.6` |
| `CLASSIFIER_SILENCE_THRESHOLD` | RMS below this = silence | `0.85` |
| `CLASSIFIER_WINDOW_SECONDS` | Audio window size for each classification | `3.0` |
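As an illustration of how such settings might be read, here is a hedged sketch using only the environment variable names and defaults from the table above (the loader function itself is hypothetical, not the project's settings code):

```python
import os

# Hypothetical loader: read the classifier thresholds from the environment,
# falling back to the documented defaults.
def load_classifier_settings(env=None) -> dict:
    env = os.environ if env is None else env
    def get_float(name: str, default: float) -> float:
        return float(env.get(name, default))
    return {
        "music_threshold": get_float("CLASSIFIER_MUSIC_THRESHOLD", 0.7),
        "speech_threshold": get_float("CLASSIFIER_SPEECH_THRESHOLD", 0.6),
        "silence_threshold": get_float("CLASSIFIER_SILENCE_THRESHOLD", 0.85),
        "window_seconds": get_float("CLASSIFIER_WINDOW_SECONDS", 3.0),
    }

settings = load_classifier_settings({"CLASSIFIER_WINDOW_SECONDS": "5.0"})
```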
## Testing

The audio classifier has 18 unit tests covering:

- Silence detection (pure silence, very quiet, empty audio)
- Tone detection (440Hz ringback, 1000Hz test tone)
- DTMF detection (digit 5, digit 0)
- Speech detection (speech-like waveforms)
- Classification history (hold→human transition, IVR non-transition)
- Feature extraction (RMS, ZCR, spectral flatness, dominant frequency)

```bash
pytest tests/test_audio_classifier.py -v
```

> **Known issue:** `test_complex_tone_as_music` is a known edge case where a multi-harmonic synthetic tone is classified as `LIVE_HUMAN` instead of `MUSIC`. This is acceptable — real hold music has different characteristics than synthetic test signals.
docs/call-flows.md (new file, 233 lines)

# Call Flows

Call flows are reusable IVR navigation trees that tell Hold Slayer exactly how to navigate a company's phone menu. Once a flow is learned (manually or via exploration), subsequent calls to the same number skip the LLM analysis and follow the stored steps directly.

## Data Model

### CallFlowStep

A single step in the IVR navigation:

```python
class CallFlowStep(BaseModel):
    id: str                          # Unique step identifier
    type: CallFlowStepType           # DTMF, WAIT, LISTEN, HOLD, SPEAK, TRANSFER
    description: str                 # Human-readable description
    dtmf: Optional[str] = None       # Digits to press (for DTMF steps)
    timeout: float = 10.0            # Max seconds to wait
    next_step: Optional[str] = None  # ID of the next step
    conditions: dict = {}            # Conditional branching rules
    metadata: dict = {}              # Extra data (transcript patterns, etc.)
```

### Step Types

| Type | Purpose | Key Fields |
|------|---------|------------|
| `DTMF` | Press touch-tone digits | `dtmf="3"` |
| `WAIT` | Pause for a duration | `timeout=5.0` |
| `LISTEN` | Record + transcribe + decide | `timeout=15.0`, optional `dtmf` for hardcoded response |
| `HOLD` | Wait on hold, monitor for human | `timeout=7200` (max hold time) |
| `SPEAK` | Play audio to the call | `metadata={"audio_file": "greeting.wav"}` |
| `TRANSFER` | Bridge call to user's device | `metadata={"device": "sip_phone"}` |

### CallFlow

A complete IVR navigation tree:

```python
class CallFlow(BaseModel):
    id: str                        # "chase_bank_main"
    name: str                      # "Chase Bank — Main Menu"
    company: Optional[str]         # "Chase Bank"
    phone_number: Optional[str]    # "+18005551234"
    description: Optional[str]     # "Navigate to disputes department"
    steps: list[CallFlowStep]      # Ordered list of steps
    created_at: datetime
    updated_at: datetime
    version: int = 1
    tags: list[str] = []           # ["banking", "disputes"]
    success_count: int = 0         # Times this flow succeeded
    fail_count: int = 0            # Times this flow failed
```

## Example Call Flow

```json
{
  "id": "chase_bank_disputes",
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [
    {
      "id": "wait_greeting",
      "type": "WAIT",
      "description": "Wait for greeting to finish",
      "timeout": 5.0,
      "next_step": "main_menu"
    },
    {
      "id": "main_menu",
      "type": "LISTEN",
      "description": "Listen to main menu options",
      "timeout": 15.0,
      "next_step": "press_3"
    },
    {
      "id": "press_3",
      "type": "DTMF",
      "description": "Press 3 for account services",
      "dtmf": "3",
      "next_step": "sub_menu"
    },
    {
      "id": "sub_menu",
      "type": "LISTEN",
      "description": "Listen to account services sub-menu",
      "timeout": 15.0,
      "next_step": "press_1"
    },
    {
      "id": "press_1",
      "type": "DTMF",
      "description": "Press 1 for disputes",
      "dtmf": "1",
      "next_step": "hold"
    },
    {
      "id": "hold",
      "type": "HOLD",
      "description": "Wait on hold for disputes agent",
      "timeout": 7200,
      "next_step": "transfer"
    },
    {
      "id": "transfer",
      "type": "TRANSFER",
      "description": "Transfer to user's phone"
    }
  ]
}
```
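Execution order is determined by following `next_step` pointers until a step with none (such as `TRANSFER`) is reached. A minimal sketch, using plain dicts in place of the Pydantic models:

```python
# Illustrative walk of a flow's steps via their next_step pointers.
def walk_flow(steps: list[dict], start_id: str) -> list[str]:
    by_id = {s["id"]: s for s in steps}
    order, current = [], by_id.get(start_id)
    while current is not None:
        order.append(current["id"])
        current = by_id.get(current.get("next_step") or "")
    return order

steps = [
    {"id": "wait_greeting", "type": "WAIT", "next_step": "main_menu"},
    {"id": "main_menu", "type": "LISTEN", "next_step": "press_3"},
    {"id": "press_3", "type": "DTMF", "dtmf": "3", "next_step": "hold"},
    {"id": "hold", "type": "HOLD", "next_step": "transfer"},
    {"id": "transfer", "type": "TRANSFER"},  # no next_step: terminal
]
order = walk_flow(steps, "wait_greeting")
```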
## Call Flow Learner (`services/call_flow_learner.py`)

Automatically builds call flows from exploration data.

### How It Works

1. **Exploration mode** records "discoveries" — what the Hold Slayer encountered and did at each step
2. The learner converts discoveries into `CallFlowStep` objects
3. Steps are ordered and linked (`next_step` pointers)
4. The resulting `CallFlow` is saved for future calls

### Discovery Types

| Discovery | Becomes Step |
|-----------|-------------|
| Heard IVR prompt, pressed DTMF | `LISTEN` → `DTMF` |
| Detected hold music | `HOLD` |
| Detected silence (waiting) | `WAIT` |
| Heard speech (human) | `TRANSFER` |
| Sent DTMF digits | `DTMF` |

### Building a Flow

```python
learner = CallFlowLearner()

# After an exploration call completes:
discoveries = [
    {"type": "wait", "duration": 3.0, "description": "Initial silence"},
    {"type": "ivr_menu", "transcript": "Press 1 for billing...", "dtmf_sent": "1"},
    {"type": "ivr_menu", "transcript": "Press 3 for disputes...", "dtmf_sent": "3"},
    {"type": "hold", "duration": 480.0},
    {"type": "human_detected", "transcript": "Thank you for calling..."},
]

flow = learner.build_flow(
    discoveries=discoveries,
    phone_number="+18005551234",
    company="Chase Bank",
    intent="dispute a charge",
)
# Returns a CallFlow with 5 steps: WAIT → LISTEN/DTMF → LISTEN/DTMF → HOLD → TRANSFER
```

### Merging Discoveries

When the same number is called again with exploration, new discoveries can be merged into the existing flow:

```python
updated_flow = learner.merge_discoveries(
    existing_flow=flow,
    new_discoveries=new_discoveries,
)
```

This handles:

- New menu options discovered
- Changed IVR structure
- Updated timing information
- Success/failure tracking

## REST API

### List Call Flows

```
GET /api/call-flows
GET /api/call-flows?company=Chase+Bank
GET /api/call-flows?tag=banking
```

### Get Call Flow

```
GET /api/call-flows/{flow_id}
```

### Create Call Flow

```
POST /api/call-flows
Content-Type: application/json

{
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [ ... ]
}
```

### Update Call Flow

```
PUT /api/call-flows/{flow_id}
Content-Type: application/json

{ ... updated flow ... }
```

### Delete Call Flow

```
DELETE /api/call-flows/{flow_id}
```

### Learn Flow from Exploration

```
POST /api/call-flows/learn
Content-Type: application/json

{
  "call_id": "call_abc123",
  "phone_number": "+18005551234",
  "company": "Chase Bank"
}
```

This triggers the Call Flow Learner to build a flow from the call's exploration data.
165
docs/configuration.md
Normal file
# Configuration

All configuration is via environment variables, loaded through Pydantic Settings. Copy `.env.example` to `.env` and edit.

## Environment Variables

### SIP Trunk

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `SIP_TRUNK_HOST` | Your SIP provider hostname | — | Yes |
| `SIP_TRUNK_PORT` | SIP signaling port | `5060` | No |
| `SIP_TRUNK_USERNAME` | SIP auth username | — | Yes |
| `SIP_TRUNK_PASSWORD` | SIP auth password | — | Yes |
| `SIP_TRUNK_DID` | Your phone number (E.164) | — | Yes |
| `SIP_TRUNK_TRANSPORT` | Transport protocol (`udp`, `tcp`, `tls`) | `udp` | No |

### Gateway

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `GATEWAY_SIP_PORT` | Port for device SIP registration | `5080` | No |
| `GATEWAY_RTP_PORT_MIN` | Minimum RTP port | `10000` | No |
| `GATEWAY_RTP_PORT_MAX` | Maximum RTP port | `20000` | No |
| `GATEWAY_HOST` | Bind address | `0.0.0.0` | No |

### LLM

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `LLM_BASE_URL` | OpenAI-compatible API endpoint | `http://localhost:11434/v1` | No |
| `LLM_MODEL` | Model name for IVR analysis | `llama3` | No |
| `LLM_API_KEY` | API key (if required) | `not-needed` | No |
| `LLM_TIMEOUT` | Request timeout in seconds | `30.0` | No |
| `LLM_MAX_TOKENS` | Max tokens per response | `1024` | No |
| `LLM_TEMPERATURE` | Sampling temperature | `0.3` | No |

### Speech-to-Text

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` | No |
| `SPEACHES_MODEL` | Whisper model name | `whisper-large-v3` | No |

### Database

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `DATABASE_URL` | PostgreSQL or SQLite connection string | `sqlite+aiosqlite:///./hold_slayer.db` | No |

### Notifications

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `NOTIFY_SMS_NUMBER` | Phone number for SMS alerts (E.164) | — | No |

### Audio Classifier

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `CLASSIFIER_WINDOW_SECONDS` | Audio window size for classification | `3.0` | No |
| `CLASSIFIER_SILENCE_THRESHOLD` | RMS below this = silence | `0.85` | No |
| `CLASSIFIER_MUSIC_THRESHOLD` | Spectral flatness below this = music | `0.7` | No |
| `CLASSIFIER_SPEECH_THRESHOLD` | Spectral flatness above this = speech | `0.6` | No |

### Hold Slayer

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `MAX_HOLD_TIME` | Maximum seconds to wait on hold | `7200` | No |
| `HOLD_CHECK_INTERVAL` | Seconds between audio checks | `2.0` | No |
| `DEFAULT_TRANSFER_DEVICE` | Device to transfer to | `sip_phone` | No |

### Recording

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `RECORDING_DIR` | Directory for WAV recordings | `recordings` | No |
| `RECORDING_MAX_SECONDS` | Maximum recording duration | `7200` | No |
| `RECORDING_SAMPLE_RATE` | Audio sample rate | `16000` | No |

## Settings Architecture

Configuration is managed by Pydantic Settings in `config.py`:

```python
from config import get_settings

settings = get_settings()
settings.sip_trunk_host   # "sip.provider.com"
settings.llm.base_url     # "http://localhost:11434/v1"
settings.llm.model        # "llama3"
settings.speaches_url     # "http://localhost:22070"
settings.database_url     # "sqlite+aiosqlite:///./hold_slayer.db"
```

LLM settings are nested under `settings.llm` as an `LLMSettings` sub-model.
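As an illustration of that layout, here is a stdlib-only sketch of nested settings loaded from the environment. The names mirror the tables above, but the real `config.py` uses Pydantic Settings and may differ in details:

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LLMSettings:
    # Nested LLM settings, populated from LLM_* environment variables
    base_url: str = "http://localhost:11434/v1"
    model: str = "llama3"
    timeout: float = 30.0


@dataclass(frozen=True)
class Settings:
    sip_trunk_host: str = ""
    speaches_url: str = "http://localhost:22070"
    database_url: str = "sqlite+aiosqlite:///./hold_slayer.db"
    llm: LLMSettings = field(default_factory=LLMSettings)


def get_settings() -> Settings:
    """Build Settings from the environment, falling back to the defaults."""
    env = os.environ
    return Settings(
        sip_trunk_host=env.get("SIP_TRUNK_HOST", ""),
        speaches_url=env.get("SPEACHES_URL", "http://localhost:22070"),
        database_url=env.get("DATABASE_URL", "sqlite+aiosqlite:///./hold_slayer.db"),
        llm=LLMSettings(
            base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
            model=env.get("LLM_MODEL", "llama3"),
            timeout=float(env.get("LLM_TIMEOUT", "30.0")),
        ),
    )
```

Unset variables simply fall through to the defaults shown in the tables, so a bare `.env` is enough for development.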
## Deployment

### Development

```bash
# 1. Clone and install
git clone <repo-url>
cd hold-slayer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure
cp .env.example .env
# Edit .env

# 3. Start Ollama (for LLM)
ollama serve
ollama pull llama3

# 4. Start Speaches (for STT)
docker run -p 22070:8000 ghcr.io/speaches-ai/speaches

# 5. Run
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

### Production

```bash
# Use PostgreSQL instead of SQLite
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/hold_slayer

# Use vLLM for faster inference
LLM_BASE_URL=http://localhost:8000/v1
LLM_MODEL=meta-llama/Llama-3-8B-Instruct

# Run with a single worker (Hold Slayer is single-process)
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1
```

Note: Hold Slayer is designed as a single-process application. Multiple workers would each have their own SIP engine and call state. For high availability, run behind a load balancer with sticky sessions.

### Docker

```dockerfile
FROM python:3.13-slim

# Install system dependencies for PJSUA2 and Sippy
RUN apt-get update && apt-get install -y \
    build-essential \
    libpjproject-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN pip install -e .

EXPOSE 8000 5080/udp 10000-20000/udp

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Port mapping:
- `8000` — HTTP API + WebSocket + MCP
- `5080/udp` — SIP device registration
- `10000-20000/udp` — RTP media ports
273
docs/core-engine.md
Normal file
# Core Engine

The core engine provides the foundational infrastructure: SIP call control, media handling, call state management, and event distribution.

## SIP Engine (`core/sip_engine.py` + `core/sippy_engine.py`)

### Abstract Interface

All SIP operations go through the `SIPEngine` abstract base class, which defines the contract:

```python
class SIPEngine(ABC):
    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def make_call(self, to_uri: str, from_uri: str | None = None) -> str: ...
    async def hangup(self, call_id: str) -> None: ...
    async def send_dtmf(self, call_id: str, digits: str) -> None: ...
    async def bridge(self, call_id_a: str, call_id_b: str) -> None: ...
    async def transfer(self, call_id: str, to_uri: str) -> None: ...
    async def register(self, ...) -> bool: ...
    async def get_trunk_status(self) -> TrunkStatus: ...
```

This abstraction allows:
- **`SippyEngine`** — Production implementation using Sippy B2BUA
- **`MockSIPEngine`** — Test implementation that simulates calls in memory

### Sippy B2BUA Engine

The `SippyEngine` wraps Sippy B2BUA for SIP signaling:

```python
class SippyEngine(SIPEngine):
    """
    Production SIP engine using Sippy B2BUA.

    Sippy runs its own event loop in a daemon thread.
    All async methods bridge to Sippy via run_in_executor().
    """
```

**Key internals:**

| Class | Purpose |
|-------|---------|
| `SipCallLeg` | Tracks one leg of a call (call-id, state, RTP endpoint, SDP) |
| `SipBridge` | Two bridged call legs (outbound + device) |
| `SippyCallController` | Handles Sippy callbacks (INVITE received, BYE received, DTMF, etc.) |

**Call lifecycle:**

```
make_call("sip:+18005551234@trunk")
│
├── Create SipCallLeg (state=TRYING)
├── Sippy: send INVITE
├── Sippy callback: 180 Ringing → state=RINGING
├── Sippy callback: 200 OK → state=CONNECTED
│     └── Extract RTP endpoint from SDP
│     └── MediaPipeline.add_stream(rtp_host, rtp_port)
└── Return call_id

send_dtmf(call_id, "1")
└── Sippy: send RFC 2833 DTMF or SIP INFO

bridge(call_id_a, call_id_b)
├── Create SipBridge(leg_a, leg_b)
└── MediaPipeline.bridge_streams(stream_a, stream_b)

hangup(call_id)
├── Sippy: send BYE
├── MediaPipeline.remove_stream()
└── Cleanup SipCallLeg
```

**Graceful fallback:** If Sippy B2BUA is not installed, the engine falls back to mock mode with a warning — useful for development and testing without a SIP stack.

### Trunk Registration

The engine registers with your SIP trunk provider on startup:

```python
await engine.register(
    registrar="sip.yourprovider.com",
    username="your_username",
    password="your_password",
    realm="sip.yourprovider.com",
)
```

Registration is refreshed automatically. `get_trunk_status()` returns the current registration state and health.

## Media Pipeline (`core/media_pipeline.py`)

The media pipeline uses PJSUA2 for all RTP audio handling.

### Key Classes

| Class | Purpose |
|-------|---------|
| `AudioTap` | Extracts audio frames from a stream into an async queue (for classifier/STT) |
| `MediaStream` | Wraps a single RTP stream (transport port, conference slot, optional tap + recording) |
| `MediaPipeline` | Main orchestrator — manages all streams, bridging, recording |

### Operations

```python
# Add a new RTP stream (called when SIP call connects)
stream_id = await pipeline.add_stream(rtp_host, rtp_port, codec="PCMU")

# Tap audio for real-time analysis
tap = await pipeline.tap_stream(stream_id)
async for frame in tap:
    classification = classifier.classify(frame)

# Bridge two streams (transfer)
await pipeline.bridge_streams(stream_a, stream_b)

# Record a stream to WAV
await pipeline.start_recording(stream_id, "/path/to/recording.wav")
await pipeline.stop_recording(stream_id)

# Play a tone (e.g., ringback to caller)
await pipeline.play_tone(stream_id, frequency=440, duration_ms=2000)

# Clean up
await pipeline.remove_stream(stream_id)
```

### Conference Bridge

PJSUA2's conference bridge is central to the architecture. Every stream gets a conference slot, and bridging is done by connecting slots:

```
Conference Bridge
├── Slot 0: Outbound call (to company)
├── Slot 1: AudioTap (classifier + STT reads from here)
├── Slot 2: Recording port
├── Slot 3: Device call (your phone, after transfer)
└── Slot 4: Tone generator

Bridge: Slot 0 ↔ Slot 3 (company ↔ your phone)
Tap:    Slot 0 → Slot 1 (company audio → classifier)
Record: Slot 0 → Slot 2 (company audio → WAV file)
```
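The slot graph can be modeled in a few lines of plain Python (a toy illustration, not PJSUA2 code): bridging, tapping, and recording are all just directed connections between slots.

```python
class ConferenceBridge:
    """Minimal model of a slot mixer: audio flows along (src, dst) edges."""

    def __init__(self) -> None:
        self.connections: set[tuple[int, int]] = set()

    def connect(self, src: int, dst: int) -> None:
        self.connections.add((src, dst))

    def bridge(self, a: int, b: int) -> None:
        # Two-way audio: company ↔ your phone
        self.connect(a, b)
        self.connect(b, a)

    def tap(self, src: int, tap_slot: int) -> None:
        # One-way copy: company audio → classifier/STT or recorder
        self.connect(src, tap_slot)

    def listeners(self, src: int) -> set[int]:
        return {dst for s, dst in self.connections if s == src}


bridge = ConferenceBridge()
bridge.bridge(0, 3)   # company ↔ device
bridge.tap(0, 1)      # company → classifier
bridge.tap(0, 2)      # company → recorder

# Slot 0's audio now fans out to slots 1, 2 and 3:
assert bridge.listeners(0) == {1, 2, 3}
```

The one-way tap edges are why the classifier and recorder hear the company but never inject audio back into the call.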
### Null Audio Device

The pipeline uses PJSUA2's null audio device — no sound card required. This is essential for headless server deployment.

## Call Manager (`core/call_manager.py`)

Tracks all active calls and their state:

```python
class CallManager:
    async def create_call(self, number, mode, intent, ...) -> ActiveCall
    async def get_call(self, call_id) -> Optional[ActiveCall]
    async def update_status(self, call_id, status) -> None
    async def end_call(self, call_id, reason) -> None
    async def add_transcript(self, call_id, text, speaker) -> None
    def active_call_count(self) -> int
    def get_all_active(self) -> list[ActiveCall]
```

**ActiveCall state:**

```python
@dataclass
class ActiveCall:
    call_id: str
    number: str
    mode: CallMode        # direct, hold_slayer, ai_assisted
    status: CallStatus    # trying, ringing, connected, on_hold, transferring, ended
    intent: Optional[str]
    device: Optional[str]
    call_flow_id: Optional[str]

    # Timing
    started_at: datetime
    connected_at: Optional[datetime]
    hold_started_at: Optional[datetime]
    ended_at: Optional[datetime]

    # Audio classification
    current_audio_type: Optional[AudioClassification]
    classification_history: list[ClassificationResult]

    # Transcript
    transcript_chunks: list[TranscriptChunk]

    # Services
    services: dict[str, bool]  # recording, transcription, etc.
```

The CallManager publishes events to the EventBus on every state change.

## Event Bus (`core/event_bus.py`)

Pure asyncio pub/sub connecting all components:

```python
class EventBus:
    async def publish(self, event: GatewayEvent) -> None
    def subscribe(self, event_types: set[EventType] = None) -> EventSubscription

    @property
    def recent_events(self) -> list[GatewayEvent]

    @property
    def subscriber_count(self) -> int
```

### EventSubscription

Subscriptions are async iterators:

```python
subscription = event_bus.subscribe(event_types={EventType.HUMAN_DETECTED})

async for event in subscription:
    print(f"Human detected on call {event.call_id}!")

# When done:
subscription.close()
```

### How it works

1. Each `subscribe()` creates an `asyncio.Queue` for that subscriber
2. `publish()` does `put_nowait()` on every subscriber's queue
3. Full queues (dead subscribers) are automatically cleaned up
4. Optional type filtering — only receive events you care about
5. Event history (last 1000) for late joiners
### Event Types

See [models/events.py](../models/events.py) for the full list. Key categories:

| Category | Events |
|----------|--------|
| Call Lifecycle | `CALL_STARTED`, `CALL_RINGING`, `CALL_CONNECTED`, `CALL_ENDED`, `CALL_FAILED` |
| Hold Slayer | `HOLD_DETECTED`, `HUMAN_DETECTED`, `TRANSFER_STARTED`, `TRANSFER_COMPLETE` |
| IVR Navigation | `IVR_STEP`, `IVR_DTMF_SENT`, `IVR_MENU_DETECTED`, `IVR_EXPLORATION` |
| Audio | `AUDIO_CLASSIFIED`, `TRANSCRIPT_CHUNK`, `RECORDING_STARTED`, `RECORDING_STOPPED` |
| Device | `DEVICE_REGISTERED`, `DEVICE_UNREGISTERED`, `DEVICE_RINGING` |
| System | `GATEWAY_STARTED`, `GATEWAY_STOPPED`, `TRUNK_REGISTERED`, `TRUNK_FAILED` |

## Gateway (`core/gateway.py`)

The top-level orchestrator that owns and wires all components:

```python
class AIPSTNGateway:
    def __init__(self, settings: Settings):
        self.event_bus = EventBus()
        self.call_manager = CallManager(self.event_bus)
        self.sip_engine = SippyEngine(settings, self.event_bus)
        self.media_pipeline = MediaPipeline(settings)
        self.llm_client = LLMClient(...)
        self.transcription = TranscriptionService(...)
        self.classifier = AudioClassifier()
        self.hold_slayer = HoldSlayer(...)
        self.recording = RecordingService(...)
        self.analytics = CallAnalytics(...)
        self.notification = NotificationService(...)
        self.call_flow_learner = CallFlowLearner(...)

    async def start(self) -> None: ...   # Start all services
    async def stop(self) -> None: ...    # Graceful shutdown
    async def make_call(self, ...) -> ActiveCall: ...
    async def end_call(self, call_id) -> None: ...
```

The gateway is created once at application startup (in `main.py` lifespan) and injected into FastAPI routes via dependency injection (`api/deps.py`).
180
docs/development.md
Normal file
# Development

## Setup

### Prerequisites

- Python 3.13+
- Ollama (or any OpenAI-compatible LLM) — for IVR menu analysis
- Speaches or Whisper API — for speech-to-text (optional for dev)
- A SIP trunk account — for making real calls (optional for dev)

### Install

```bash
git clone <repo-url>
cd hold-slayer
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

### Dev Dependencies

The `[dev]` extras include:

- `pytest` — test runner
- `pytest-asyncio` — async test support
- `pytest-cov` — coverage reporting

## Testing

### Run All Tests

```bash
pytest tests/ -v
```

### Run Specific Test Files

```bash
pytest tests/test_audio_classifier.py -v   # 18 tests — waveform analysis
pytest tests/test_call_flows.py -v         # 10 tests — call flow models
pytest tests/test_hold_slayer.py -v        # 20 tests — IVR nav, EventBus, CallManager
pytest tests/test_services.py -v           # 27 tests — LLM, notifications, recording,
                                           #            analytics, learner, EventBus
```

### Run with Coverage

```bash
pytest tests/ --cov=. --cov-report=term-missing
```

### Test Architecture

Tests are organized by component:

| File | Tests | What's Covered |
|------|-------|----------------|
| `test_audio_classifier.py` | 18 | Silence, tone, DTMF, music, speech detection; feature extraction; classification history |
| `test_call_flows.py` | 10 | CallFlowStep types, CallFlow navigation, serialization roundtrip, create/summary models |
| `test_hold_slayer.py` | 20 | IVR menu navigation (6 intent scenarios), EventBus pub/sub, CallManager lifecycle, MockSIPEngine |
| `test_services.py` | 27 | LLMClient init/stats/chat/JSON/errors/IVR analysis, NotificationService event mapping, RecordingService paths, CallAnalytics summaries, CallFlowLearner build/merge, EventBus integration |

### Known Test Issues

`test_complex_tone_as_music` — A synthetic multi-harmonic tone is classified as `LIVE_HUMAN` instead of `MUSIC`. This is a known edge case: real hold music has different spectral characteristics than synthetic test signals. This test documents the limitation rather than a bug.

### Writing Tests

All tests use `pytest-asyncio` for async support. The test configuration in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```

This means all `async def test_*` functions automatically run in an asyncio event loop.

**Pattern for testing services:**

```python
import pytest
from services.llm_client import LLMClient


class TestLLMClient:
    def test_init(self):
        client = LLMClient(base_url="http://localhost:11434/v1", model="llama3")
        assert client._model == "llama3"

    @pytest.mark.asyncio
    async def test_chat(self):
        # Mock httpx for unit tests
        ...
```

**Pattern for testing EventBus:**

```python
import asyncio
from core.event_bus import EventBus
from models.events import EventType, GatewayEvent


async def test_publish_receive():
    bus = EventBus()
    sub = bus.subscribe()

    event = GatewayEvent(type=EventType.CALL_STARTED, call_id="test", data={})
    await bus.publish(event)

    received = await asyncio.wait_for(sub.get(), timeout=1.0)
    assert received.type == EventType.CALL_STARTED
```

## Project Conventions

### Code Style

- **Type hints everywhere** — All function signatures have type annotations
- **Pydantic models** — All data structures are Pydantic BaseModel or dataclass
- **Async by default** — All I/O operations are async
- **Logging** — Every module uses `logging.getLogger(__name__)`
- **Docstrings** — Module-level docstrings explain purpose and usage

### File Organization

```
module.py
├── Module docstring (purpose, usage examples)
├── Imports (stdlib → third-party → local)
├── Constants
├── Classes
│   ├── Class docstring
│   ├── __init__
│   ├── Public methods (async)
│   └── Private methods (_prefixed)
└── Module-level functions (if any)
```

### Error Handling

- **Services never crash the call** — All service errors are caught, logged, and return sensible defaults
- **LLM failures** return empty string/dict — the Hold Slayer falls back to waiting
- **SIP errors** publish `CALL_FAILED` events — the user is notified
- **HTTP errors** in the API return structured error responses

### Event-Driven Architecture

All components communicate through the EventBus:

1. **Publishers** — SIP engine, Hold Slayer, classifier, services
2. **Subscribers** — WebSocket handler, MCP server, notification service, analytics

This decouples components and makes the system extensible. Adding a new feature (e.g., Slack notifications) means subscribing to events — no changes to existing code.

### Dependency Injection

The `AIPSTNGateway` owns all services and is injected into FastAPI routes via `api/deps.py`:

```python
# api/deps.py
async def get_gateway() -> AIPSTNGateway:
    return app.state.gateway


# api/calls.py
@router.post("/outbound")
async def make_call(request: CallRequest, gateway: AIPSTNGateway = Depends(get_gateway)):
    ...
```

This makes testing easy — swap the gateway for a mock in tests.

## Contributing

1. Create a feature branch
2. Write tests for new functionality
3. Ensure all tests pass: `pytest tests/ -v`
4. Follow existing code conventions
5. Update documentation in `/docs` if adding new features
6. Submit a pull request
104
docs/dial-plan.md
Normal file
# Hold Slayer Gateway — Dial Plan

## Overview

The gateway accepts calls from registered SIP endpoints and routes them
based on the dialled digits. No trunk-access prefix (no "9") is needed.
All routing is pattern-matched in order; the first match wins.

---

## ⚠️ Emergency Services — 911

> **911 and 9911 are always routed directly to the PSTN trunk.**
> No gateway logic intercepts, records, or delays these calls.
> `9911` is accepted in addition to `911` to catch the common
> mis-dial habit of dialling `9` for an outside line.
>
> **Your SIP trunk provider must support emergency calling on your DID.**
> Verify this with your provider before putting this system in service.
> VoIP emergency calling has location limitations — ensure your
> registered location is correct with your provider.

---

## Extension Ranges

| Range | Purpose |
|-------|--------------------------------|
| 2XX | SIP endpoints (phones/softphones) |
| 5XX | System services |

---

## 2XX — Endpoint Extensions

Extensions are auto-assigned from **221** upward when a SIP device
registers (`SIP REGISTER`) with the gateway or via `POST /api/devices`.

| Extension | Format | Example |
|-----------|---------------------------------|--------------------------------|
| 221–299 | Auto-assigned to registered devices | `sip:221@gateway.helu.ca` |

### Assignment policy

- First device to register gets **221**, next **222**, and so on.
- Extensions are persisted in the database and survive restarts.
- If a device is removed its extension is freed and may be reassigned.
- `GATEWAY_SIP_DOMAIN` in `.env` sets the domain part of the URI.
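The policy above amounts to a lowest-free-number allocator. A sketch with hypothetical names (the actual implementation lives in the device registry and is database-backed):

```python
class ExtensionAllocator:
    """Assigns the lowest free extension in the 221-299 range."""

    FIRST, LAST = 221, 299

    def __init__(self) -> None:
        self.assigned: dict[int, str] = {}  # extension -> device id

    def assign(self, device_id: str) -> int:
        for ext in range(self.FIRST, self.LAST + 1):
            if ext not in self.assigned:
                self.assigned[ext] = device_id
                return ext
        raise RuntimeError("no free extensions in 221-299")

    def release(self, ext: int) -> None:
        # Freed extensions may be handed to later registrations
        self.assigned.pop(ext, None)


alloc = ExtensionAllocator()
assert alloc.assign("desk-phone") == 221
assert alloc.assign("softphone") == 222
alloc.release(221)
assert alloc.assign("new-device") == 221  # lowest free slot is reused
```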
---

## 5XX — System Services

| Extension | Service | Notes |
|-----------|----------------------|-----------------------------------------|
| 500 | Auto-attendant | Reserved — not yet implemented |
| 510 | Gateway status | Plays a status announcement |
| 511 | Echo test | Returns audio back to caller |
| 520 | Hold Slayer launch | Prompts for a number to hold-slay |
| 599 | Operator fallback | Transfers to preferred device |

---

## Outbound PSTN

All outbound patterns are routed via the configured SIP trunk
(`SIP_TRUNK_HOST`). No access code prefix is needed.

### Pattern table

| Pattern | Example input | Normalised to | Notes |
|----------------------|--------------------|---------------------|------------------------------------|
| `+1NPANXXXXXX` | `+16135550100` | `+16135550100` | E.164 — pass through as-is |
| `1NPANXXXXXX` | `16135550100` | `+16135550100` | NANP with country code |
| `NPANXXXXXX` | `6135550100` | `+16135550100` | 10-digit NANP — prepend `+1` |
| `011CC…` | `01144201234567` | `+44201234567` | International — strip `011` |
| `00CC…` | `004420…` | `+4420…` | International alt prefix |
| `+CC…` | `+44201234567` | `+44201234567` | E.164 international — pass through |

### Rules

1. E.164 (`+` prefix) is always passed to the trunk unchanged.
2. NANP 11-digit (`1` + 10 digits) is normalised to E.164 by prepending `+`.
3. NANP 10-digit is normalised to E.164 by prepending `+1`.
4. International via `011` or `00` strips the IDD prefix and prepends `+`.
5. 7-digit local dialling is **not supported** — always dial the area code.
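These rules translate directly into a small normalisation function (an illustrative sketch; the gateway's routing code may differ in naming and error handling):

```python
def normalize_number(dialled: str) -> str:
    """Normalise dialled digits to E.164 per the dial-plan rules."""
    digits = dialled.strip()
    if digits.startswith("+"):                        # Rule 1: E.164 passes through
        return digits
    if digits.startswith("011"):                      # Rule 4: strip IDD prefix
        return "+" + digits[3:]
    if digits.startswith("00"):                       # Rule 4: alternate IDD prefix
        return "+" + digits[2:]
    if len(digits) == 11 and digits.startswith("1"):  # Rule 2: NANP 11-digit
        return "+" + digits
    if len(digits) == 10:                             # Rule 3: NANP 10-digit
        return "+1" + digits
    raise ValueError(f"unroutable number: {dialled!r}")  # Rule 5: no 7-digit dialling


assert normalize_number("+16135550100") == "+16135550100"
assert normalize_number("16135550100") == "+16135550100"
assert normalize_number("6135550100") == "+16135550100"
assert normalize_number("01144201234567") == "+44201234567"
```

The `011` check must come before the `00` check, since `011…` would otherwise match the `00` alternate-prefix branch with the wrong offset.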
---

## Inbound PSTN

Calls arriving from the trunk on the DID (`SIP_TRUNK_DID`) are routed
to the highest-priority online device. If no device is online the call
is queued or dropped (configurable via `MAX_HOLD_TIME`).

---

## Future

- Named regions / area-code routing
- Least-cost routing across multiple trunks
- Time-of-day routing (business hours vs. after-hours)
- Ring groups across multiple 2XX extensions
- Voicemail (extension 500)
168
docs/hold-slayer-service.md
Normal file
# Hold Slayer Service

The Hold Slayer (`services/hold_slayer.py`) is the brain of the system. It orchestrates the entire process of navigating IVR menus, detecting hold music, recognizing when a human picks up, and triggering the transfer to your phone.

## Two Operating Modes

### 1. Flow-Guided Mode (`run_with_flow`)

When a stored `CallFlow` exists for the number being called, the Hold Slayer follows it step-by-step:

```python
await hold_slayer.run_with_flow(call_id, call_flow)
```

The call flow is a tree of steps (see [Call Flows](call-flows.md)). The Hold Slayer walks through them:

```
CallFlow: "Chase Bank Main"
├── Step 1: WAIT 3s (wait for greeting)
├── Step 2: LISTEN (transcribe → LLM picks option)
├── Step 3: DTMF "2" (press 2 for account services)
├── Step 4: LISTEN (transcribe → LLM picks option)
├── Step 5: DTMF "1" (press 1 for disputes)
├── Step 6: HOLD (wait for human)
└── Step 7: TRANSFER (bridge to your phone)
```

**Step execution logic:**

| Step Type | What Happens |
|-----------|-------------|
| `DTMF` | Send the specified digits via SIP engine |
| `WAIT` | Sleep for the specified duration |
| `LISTEN` | Record audio, transcribe, then: use hardcoded DTMF if available, otherwise ask LLM to pick the right option |
| `HOLD` | Monitor audio classification, wait for human detection |
| `SPEAK` | Play a WAV file or TTS audio (for interactive prompts) |
| `TRANSFER` | Bridge the call to the user's device |
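A stripped-down dispatcher for this table might look like the following (hypothetical names; the real executor also handles timeouts, events, and failures, and the `LISTEN`/`HOLD`/`SPEAK` branches need the STT, classifier, and TTS services):

```python
import asyncio


async def execute_step(step: dict, engine, call_id: str) -> None:
    """Dispatch one call-flow step to the right action (simplified)."""
    kind = step["type"]
    if kind == "DTMF":
        await engine.send_dtmf(call_id, step["dtmf"])
    elif kind == "WAIT":
        await asyncio.sleep(step.get("timeout", 1.0))
    elif kind == "TRANSFER":
        await engine.transfer(call_id, step.get("to_uri", "sip:device"))
    # LISTEN / HOLD / SPEAK omitted: they need STT, classifier, and TTS


class FakeEngine:
    """Records actions so the dispatcher can be exercised without SIP."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, str]] = []

    async def send_dtmf(self, call_id: str, digits: str) -> None:
        self.actions.append(("dtmf", digits))

    async def transfer(self, call_id: str, to_uri: str) -> None:
        self.actions.append(("transfer", to_uri))


async def demo() -> list[tuple[str, str]]:
    engine = FakeEngine()
    for step in [{"type": "DTMF", "dtmf": "1"},
                 {"type": "WAIT", "timeout": 0.01},
                 {"type": "TRANSFER"}]:
        await execute_step(step, engine, "call_1")
    return engine.actions


actions = asyncio.run(demo())
assert actions == [("dtmf", "1"), ("transfer", "sip:device")]
```

This is also the shape the `MockSIPEngine` tests rely on: a recorded action list instead of real signaling.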
||||
### 2. Exploration Mode (`run_exploration`)
|
||||
|
||||
When no stored call flow exists, the Hold Slayer explores the IVR autonomously:
|
||||
|
||||
```python
|
||||
await hold_slayer.run_exploration(call_id, intent="dispute Amazon charge")
|
||||
```
|
||||
|
||||
**Exploration loop:**
|
||||
|
||||
```
|
||||
┌─→ Classify audio (3-second window)
|
||||
│ ├── SILENCE → wait, increment silence counter
|
||||
│ ├── RINGING → wait for answer
|
||||
│ ├── MUSIC → hold detected, monitor for transition
|
||||
│ ├── DTMF → ignore (echo detection)
|
||||
│ ├── IVR_PROMPT/SPEECH →
|
||||
│ │ ├── Transcribe the audio
|
||||
│ │ ├── Send transcript + intent to LLM
|
||||
│ │ ├── LLM returns: { "action": "dtmf", "digits": "2" }
|
||||
│ │ └── Send DTMF
|
||||
│ └── LIVE_HUMAN → human detected!
|
||||
│ └── TRANSFER
|
||||
│
|
||||
└── Loop until: human detected, max hold time, or call ended
|
||||
```
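
One iteration of the loop above can be sketched as a pure decision function. This is an illustration only; the `Audio` labels match the diagram, but the `transcribe` and `ask_llm` callables stand in for the real transcription and LLM services.

```python
from enum import Enum

class Audio(Enum):
    SILENCE = "silence"
    RINGING = "ringing"
    MUSIC = "music"
    DTMF = "dtmf"
    IVR_PROMPT = "ivr_prompt"
    LIVE_HUMAN = "live_human"

def explore_step(label, transcribe, ask_llm):
    """Decide the next action for one classification window.

    Returns an (action, payload) tuple mirroring the diagram above.
    """
    if label in (Audio.SILENCE, Audio.RINGING):
        return ("wait", None)
    if label is Audio.MUSIC:
        return ("hold", None)      # monitor for hold-to-human transition
    if label is Audio.DTMF:
        return ("ignore", None)    # our own tones echoed back
    if label is Audio.IVR_PROMPT:
        decision = ask_llm(transcribe())
        if decision.get("action") == "dtmf":
            return ("dtmf", decision["digits"])
        return ("wait", None)
    if label is Audio.LIVE_HUMAN:
        return ("transfer", None)  # human detected!
```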

**Exploration discoveries** are recorded and can be fed into the `CallFlowLearner` to build a reusable flow for next time.

## Human Detection

The critical moment — detecting when a live person picks up after hold:

### Detection Chain

```
AudioClassifier.classify(audio_frame)
    │
    ├── Feature extraction:
    │     ├── RMS energy (loudness)
    │     ├── Spectral flatness (noise vs. tone)
    │     ├── Zero-crossing rate (speech indicator)
    │     ├── Dominant frequency
    │     └── Spectral centroid
    │
    ├── Classification: MUSIC, SILENCE, SPEECH, etc.
    │
    └── Transition detection:
          └── detect_hold_to_human_transition()
                ├── Check last N classifications
                ├── Pattern: MUSIC, MUSIC, MUSIC → SPEECH, SPEECH
                ├── Confidence: speech energy > threshold
                └── Result: HUMAN_DETECTED event
```

### What triggers a transfer?

The Hold Slayer considers a human detected when:

1. **Classification history** shows a transition from hold-like audio (MUSIC, SILENCE) to speech-like audio (LIVE_HUMAN, IVR_PROMPT)
2. **Energy threshold** — the speech audio has sufficient RMS energy (not just background noise)
3. **Consecutive speech frames** — at least 2-3 consecutive speech classifications (avoids false positives from hold music announcements like "your call is important to us")
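
The three criteria above reduce to a check over a rolling classification history. The sketch below is illustrative, not the real `detect_hold_to_human_transition()`: the label sets and thresholds are assumptions.

```python
from collections import deque

HOLD_LABELS = {"music", "silence"}
SPEECH_LABELS = {"live_human", "ivr_prompt"}

def hold_to_human(history, energies, *,
                  min_speech_frames=3, energy_threshold=0.02):
    """Return True when a hold-to-human transition is detected.

    history  - recent classification labels, oldest first
    energies - RMS energy per frame, aligned with history
    """
    if len(history) < min_speech_frames + 1:
        return False
    # 1. Hold-like audio immediately before the speech run
    before = history[-(min_speech_frames + 1)]
    if before not in HOLD_LABELS:
        return False
    # 2 + 3. Enough consecutive, sufficiently loud speech frames
    tail = list(history)[-min_speech_frames:]
    loud = list(energies)[-min_speech_frames:]
    return (all(label in SPEECH_LABELS for label in tail)
            and all(e >= energy_threshold for e in loud))
```

A `deque(maxlen=N)` for both buffers keeps the check O(1) in memory as frames stream in.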

### False Positive Handling

Hold music often includes periodic announcements ("Your estimated wait time is 15 minutes"). These are speech, but not a live human. The Hold Slayer handles this by:

1. **Duration check** — Hold announcements are typically short (5-15 seconds). A live agent conversation continues longer.
2. **Pattern matching** — After speech, if audio returns to MUSIC within a few seconds, it was just an announcement.
3. **Transcript analysis** — If transcription is active, the LLM can analyze whether the speech sounds like a recorded announcement vs. a live greeting.
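
Points 1 and 2 combine into a debounce: a short speech run followed quickly by more music is an announcement. A minimal sketch, with the window sizes as illustrative assumptions:

```python
SPEECH = ("live_human", "ivr_prompt")

def is_announcement(labels, *, max_speech_frames=7, resume_window=3):
    """True if the first speech run in `labels` looks like a hold announcement.

    A speech run counts as an announcement when it is short
    (<= max_speech_frames) AND music resumes within resume_window
    frames after it ends.
    """
    # Find the first speech run
    try:
        start = next(i for i, l in enumerate(labels) if l in SPEECH)
    except StopIteration:
        return False  # no speech at all
    end = start
    while end < len(labels) and labels[end] in SPEECH:
        end += 1
    run_len = end - start
    after = labels[end:end + resume_window]
    return run_len <= max_speech_frames and "music" in after
```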

## LISTEN Step + LLM Fallback

The most interesting step type. When the Hold Slayer encounters a LISTEN step in a call flow:

```python
# Step has hardcoded DTMF? Use it directly.
if step.dtmf:
    await sip_engine.send_dtmf(call_id, step.dtmf)

# No hardcoded DTMF? Ask the LLM.
else:
    transcript = await transcription.transcribe(audio)
    decision = await llm_client.analyze_ivr_menu(
        transcript=transcript,
        intent=intent,
        previous_selections=previous_steps,
    )
    if decision.get("action") == "dtmf":
        await sip_engine.send_dtmf(call_id, decision["digits"])
```

The LLM receives:

- The IVR transcript ("Press 1 for billing, press 2 for technical support...")
- The user's intent ("dispute a charge on my December statement")
- Previous menu selections (to avoid loops)

And returns structured JSON:

```json
{
  "action": "dtmf",
  "digits": "1",
  "reasoning": "Billing is the correct department for charge disputes"
}
```

## Event Publishing

The Hold Slayer publishes events throughout the process:

| Event | When |
|-------|------|
| `IVR_STEP` | Each step in the call flow is executed |
| `IVR_DTMF_SENT` | DTMF digits are sent |
| `IVR_MENU_DETECTED` | An IVR menu prompt is transcribed |
| `HOLD_DETECTED` | Hold music is detected |
| `HUMAN_DETECTED` | Live human speech detected after hold |
| `TRANSFER_STARTED` | Call bridge initiated to user's device |
| `TRANSFER_COMPLETE` | User's device answered, bridge active |

All events flow through the EventBus to WebSocket clients, MCP server, notification service, and analytics.

## Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| `MAX_HOLD_TIME` | Maximum seconds to wait on hold before giving up | `7200` (2 hours) |
| `HOLD_CHECK_INTERVAL` | Seconds between audio classification checks | `2.0` |
| `DEFAULT_TRANSFER_DEVICE` | Device to transfer to when human detected | `sip_phone` |
| `CLASSIFIER_WINDOW_SECONDS` | Audio window size for classification | `3.0` |
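
These settings can be read from the environment with the defaults above. A sketch of one way to do it; the actual project may use a pydantic settings class instead.

```python
import os
from dataclasses import dataclass, field

@dataclass
class HoldSlayerSettings:
    """Hold Slayer tuning knobs, read from the environment with the documented defaults."""
    max_hold_time: int = field(
        default_factory=lambda: int(os.environ.get("MAX_HOLD_TIME", "7200")))
    hold_check_interval: float = field(
        default_factory=lambda: float(os.environ.get("HOLD_CHECK_INTERVAL", "2.0")))
    default_transfer_device: str = field(
        default_factory=lambda: os.environ.get("DEFAULT_TRANSFER_DEVICE", "sip_phone"))
    classifier_window_seconds: float = field(
        default_factory=lambda: float(os.environ.get("CLASSIFIER_WINDOW_SECONDS", "3.0")))
```

Using `default_factory` (rather than plain defaults) means the environment is consulted at construction time, so tests can monkeypatch `os.environ` per instance.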
155
docs/mcp-server.md
Normal file
@@ -0,0 +1,155 @@

# MCP Server

The MCP (Model Context Protocol) server lets any MCP-compatible AI assistant control the Hold Slayer gateway. Built with [FastMCP](https://github.com/jlowin/fastmcp), it exposes tools and resources over SSE.

## Overview

An AI assistant connects via SSE to the MCP server and gains access to tools for placing calls, checking status, sending DTMF, getting transcripts, and managing call flows. The assistant can orchestrate an entire call through natural language.

## Tools

### make_call

Place an outbound call through the SIP trunk.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `number` | string | Yes | Phone number to call (E.164 format) |
| `mode` | string | No | Call mode: `direct`, `hold_slayer`, `ai_assisted` (default: `hold_slayer`) |
| `intent` | string | No | What you want to accomplish on the call |
| `call_flow_id` | string | No | ID of a stored call flow to follow |

Returns: Call ID and initial status.

### end_call

Hang up an active call.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `call_id` | string | Yes | The call to hang up |

### send_dtmf

Send touch-tone digits to an active call (for manual IVR navigation).

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `call_id` | string | Yes | The call to send digits to |
| `digits` | string | Yes | DTMF digits to send (e.g., "1", "3#", "1234") |

### get_call_status

Check the current state of a call.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `call_id` | string | Yes | The call to check |

Returns: Status, duration, hold time, audio classification, transcript excerpt.

### get_call_transcript

Get the live transcript of a call.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `call_id` | string | Yes | The call to get the transcript for |

Returns: Array of transcript chunks with timestamps and speaker labels.

### get_call_recording

Get recording metadata and file path for a call.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `call_id` | string | Yes | The call to get the recording for |

Returns: Recording path, duration, file size.

### list_active_calls

List all calls currently in progress. No parameters.

Returns: Array of active calls with status, number, duration.

### get_call_summary

Get analytics summary — hold times, success rates, call volume. No parameters.

Returns: Aggregate statistics across all calls.

### search_call_history

Search past calls by number, company, or date range.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `query` | string | Yes | Search term (phone number, company name) |
| `limit` | int | No | Max results (default: 20) |

### learn_call_flow

Build a reusable call flow from a completed exploration call.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `call_id` | string | Yes | The exploration call to learn from |
| `company` | string | No | Company name for the flow |

Returns: The generated CallFlow object.

## Resources

MCP resources provide read-only data that assistants can reference:

| Resource URI | Description |
|-------------|-------------|
| `gateway://status` | Current gateway status — trunk registration, active calls, service health |
| `gateway://calls` | List of all active calls with current status |
| `gateway://calls/{call_id}` | Detailed status for a specific call |
| `gateway://flows` | List of all stored call flows |
| `gateway://analytics` | Call analytics summary |

## Configuration

The MCP server is mounted on the FastAPI app at `/mcp`.

### Connecting an AI Assistant

Add to your MCP client configuration (e.g., Claude Desktop, Cline):

```json
{
  "mcpServers": {
    "hold-slayer": {
      "url": "http://localhost:8000/mcp/sse"
    }
  }
}
```

## Example Conversation

Here is how an AI assistant would use the MCP tools to handle a complete call:

**User:** "Call Chase Bank and dispute the Amazon charge from December 15th"

**Assistant actions:**

1. Calls `make_call(number="+18005551234", mode="hold_slayer", intent="dispute Amazon charge Dec 15th", call_flow_id="chase-bank-main")`
2. Receives `call_id: "call_abc123"`
3. Polls `get_call_status("call_abc123")` periodically
4. Status progression: `trying` → `ringing` → `connected` → `on_hold`
5. Tells user: "I'm on hold with Chase Bank. Currently 4 minutes in. I'll let you know when someone picks up."
6. Status changes to `transferring` — human detected!
7. Tells user: "A live agent just picked up. I'm transferring the call to your desk phone now. Pick up!"
8. After the call, calls `learn_call_flow("call_abc123", company="Chase Bank")` to save the IVR path for next time.

**User:** "How long was I on hold?"

**Assistant actions:**

1. Calls `get_call_summary()`
2. Reports: "Your Chase Bank call lasted 12 minutes total, with 8 minutes on hold. The disputes department averages 6 minutes hold time on Tuesdays."
290
docs/services.md
Normal file
@@ -0,0 +1,290 @@

# Services

The intelligence-layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.

## LLM Client (`services/llm_client.py`)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency — just httpx.

### Supported Backends

| Backend | URL | Notes |
|---------|-----|-------|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |

### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```

### IVR Menu Analysis

The primary use case — analyzing IVR transcripts to pick the right menu option:

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```

### JSON Extraction

The client handles messy LLM output gracefully:

1. Try `json.loads()` on the raw response
2. If that fails, look for ```json ... ``` markdown blocks
3. If that fails, look for `{...}` patterns in the text
4. If all fail, return an empty dict (the caller handles it gracefully)
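
The fallback chain above can be sketched as follows. This is an illustration of the technique, not the actual client code.

```python
import json
import re

def extract_json(text: str) -> dict:
    """Best-effort JSON extraction from a raw LLM response."""
    # 1. The whole response is valid JSON
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        pass
    # 2. A fenced ``json`` markdown block
    fence = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", text, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass
    # 3. The first {...} span anywhere in the text
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass
    # 4. Give up; the caller treats an empty dict as "no decision"
    return {}
```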

### Stats Tracking

```python
stats = client.stats
# {
#   "total_requests": 47,
#   "total_errors": 2,
#   "avg_latency_ms": 234.5,
#   "model": "llama3",
#   "base_url": "http://localhost:11434/v1"
# }
```

### Error Handling

- HTTP errors return an empty string/dict (never crashes the call)
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring

## Transcription Service (`services/transcription.py`)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).

### Architecture

```
Audio frames (from AudioTap)
    │
    └── POST /v1/audio/transcriptions
         ├── model: whisper-large-v3
         ├── audio: WAV bytes
         └── language: en
              │
              └── Response: { "text": "Press 1 for billing..." }
```

### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with language hint
text = await service.transcribe(audio_bytes, language="fr")
```

### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:

1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
2. Fed to the LLM for IVR menu analysis
3. Stored in the call's transcript history
4. Used by the Call Flow Learner to build reusable flows

## Recording Service (`services/recording.py`)

Manages call recordings via the PJSUA2 media pipeline.

### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```
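
A path builder for this date-bucketed layout might look like the sketch below. It is illustrative; the real service presumably derives the date from the call's start time rather than `date.today()`.

```python
from datetime import date
from pathlib import Path

def recording_path(storage_dir, call_id, kind, day=None):
    """Build recordings/YYYY/MM/DD/<call_id>_<kind>.wav for a call.

    kind is one of "outbound", "inbound", or "mixed".
    """
    day = day or date.today()
    return (Path(storage_dir)
            / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"
            / f"{call_id}_{kind}.wav")
```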

### Recording Types

| Type | Description |
|------|-------------|
| **Outbound** | Audio from the company (IVR, hold music, agent) |
| **Inbound** | Audio from the user's device (after transfer) |
| **Mixed** | Both parties in one file (for review) |

### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```

## Call Analytics (`services/call_analytics.py`)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked

| Metric | Description |
|--------|-------------|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |

### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get summary
summary = analytics.get_summary()
# {
#   "total_calls": 142,
#   "success_rate": 0.89,
#   "avg_hold_time": 623.4,
#   "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#   "total_calls": 23,
#   "avg_hold_time": 845.2,
#   "best_time": "Tuesday 10:00 AM",
#   "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
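
The hourly trend shown above amounts to bucketing hold times by the hour the call started. A sketch of that aggregation (the real `get_hold_time_trend` may differ in shape):

```python
from collections import defaultdict

def hold_time_trend(calls):
    """Average hold time per hour of day.

    calls - iterable of (hour, hold_seconds) pairs.
    Returns [{"hour": h, "avg_hold": avg}, ...] sorted by hour.
    """
    buckets = defaultdict(list)
    for hour, hold in calls:
        buckets[hour].append(hold)
    return [{"hour": h, "avg_hold": sum(v) / len(v)}
            for h, v in sorted(buckets.items())]
```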

## Notification Service (`services/notification.py`)

Sends alerts when important things happen on calls.

### Notification Channels

| Channel | Status | Use Case |
|---------|--------|----------|
| **WebSocket** | ✅ Active | Real-time UI updates (always on) |
| **SMS** | ✅ Active | Critical alerts (human detected, call failed) |
| **Push** | 🔮 Future | Mobile app notifications |

### Notification Priority

| Priority | Events | Delivery |
|----------|--------|----------|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
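
Routing by priority reduces to a small ordered check. A sketch of the table's rules; the enum and channel names are illustrative, not the service's actual identifiers.

```python
from enum import IntEnum

class Priority(IntEnum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3

def channels_for(priority: Priority) -> list[str]:
    """Map a notification priority to delivery channels per the table above."""
    channels = ["websocket"]       # WebSocket delivery is always on
    if priority >= Priority.HIGH:  # HIGH and CRITICAL also go out via SMS
        channels.append("sms")
    return channels
```

Using an `IntEnum` makes the ordering explicit, so a single `>=` comparison replaces per-priority branches.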

### Event → Notification Mapping

| Event | Notification |
|-------|-------------|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |

### Deduplication

The notification service tracks what's been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
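
A minimal standalone version of that tracking looks like this (illustrative; in the real service the `_notified` dict hangs off the service instance rather than a dedicated class):

```python
class Deduper:
    """Remember which (call, event) notifications were already sent."""

    def __init__(self):
        self._notified: dict[str, set[str]] = {}

    def should_send(self, call_id: str, dedup_key: str) -> bool:
        """True the first time a key is seen for a call, False afterwards."""
        seen = self._notified.setdefault(call_id, set())
        if dedup_key in seen:
            return False
        seen.add(dedup_key)
        return True

    def clear(self, call_id: str) -> None:
        """Drop tracking when the call ends, so state doesn't grow forever."""
        self._notified.pop(call_id, None)
```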

### SMS Configuration

SMS is sent for `CRITICAL` and `HIGH` priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```env
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder — wire up your preferred provider (Twilio, AWS SNS, etc.).