feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered
telephony system that calls companies, navigates IVR menus, waits on
hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
2026-03-21 19:23:26 +00:00
parent c9ff60702b
commit ecf37658ce
56 changed files with 11601 additions and 164 deletions

docs/call-flows.md Normal file

@@ -0,0 +1,233 @@
# Call Flows
Call flows are reusable IVR navigation trees that tell Hold Slayer exactly how to navigate a company's phone menu. Once a flow is learned (manually or via exploration), subsequent calls to the same number skip the LLM analysis and follow the stored steps directly.
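The lookup described above might be sketched as follows. This is a minimal illustration, assuming stored flows are plain dictionaries matched on their `phone_number` field; the function name `find_stored_flow` and the in-memory list are hypothetical and not taken from Hold Slayer's code.

```python
from typing import Optional


def find_stored_flow(phone_number: str, flows: list[dict]) -> Optional[dict]:
    """Return the first stored flow for this number, or None if none exists."""
    for flow in flows:
        if flow.get("phone_number") == phone_number:
            return flow
    return None


# Hypothetical in-memory store; the real project persists flows elsewhere.
stored_flows = [
    {"id": "chase_bank_disputes", "phone_number": "+18005551234", "steps": []},
]

flow = find_stored_flow("+18005551234", stored_flows)
# A known number follows its stored steps; an unknown one falls back to LLM analysis.
mode = "follow stored steps" if flow else "explore with LLM"
```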
## Data Model
### CallFlowStep
A single step in the IVR navigation:
```python
from typing import Optional

from pydantic import BaseModel


# CallFlowStepType is the project's step-type enum (see Step Types).
class CallFlowStep(BaseModel):
    id: str                          # Unique step identifier
    type: CallFlowStepType           # DTMF, WAIT, LISTEN, HOLD, SPEAK, TRANSFER
    description: str                 # Human-readable description
    dtmf: Optional[str] = None       # Digits to press (for DTMF steps)
    timeout: float = 10.0            # Max seconds to wait
    next_step: Optional[str] = None  # ID of the next step
    conditions: dict = {}            # Conditional branching rules
    metadata: dict = {}              # Extra data (transcript patterns, etc.)
```
### Step Types
| Type | Purpose | Key Fields |
|------|---------|------------|
| `DTMF` | Press touch-tone digits | `dtmf="3"` |
| `WAIT` | Pause for a duration | `timeout=5.0` |
| `LISTEN` | Record + transcribe + decide | `timeout=15.0`, optional `dtmf` for hardcoded response |
| `HOLD` | Wait on hold, monitor for human | `timeout=7200` (max hold time) |
| `SPEAK` | Play audio to the call | `metadata={"audio_file": "greeting.wav"}` |
| `TRANSFER` | Bridge call to user's device | `metadata={"device": "sip_phone"}` |
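The `CallFlowStepType` definition itself is not shown in this document. A plausible sketch, assuming it is a string-valued enum whose values match the uppercase strings used in the JSON example, could look like this:

```python
from enum import Enum


class CallFlowStepType(str, Enum):
    """Assumed shape of the step-type enum; members mirror the table above."""

    DTMF = "DTMF"
    WAIT = "WAIT"
    LISTEN = "LISTEN"
    HOLD = "HOLD"
    SPEAK = "SPEAK"
    TRANSFER = "TRANSFER"


# A string read from stored JSON can be converted back into a member.
step_type = CallFlowStepType("HOLD")
```

Subclassing `str` keeps the members JSON-friendly, which is a common choice for enums used in Pydantic models, but the real project may define the enum differently.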
### CallFlow
A complete IVR navigation tree:
```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel


class CallFlow(BaseModel):
    id: str                      # "chase_bank_main"
    name: str                    # "Chase Bank — Main Menu"
    company: Optional[str]       # "Chase Bank"
    phone_number: Optional[str]  # "+18005551234"
    description: Optional[str]   # "Navigate to disputes department"
    steps: list[CallFlowStep]    # Ordered list of steps
    created_at: datetime
    updated_at: datetime
    version: int = 1
    tags: list[str] = []         # ["banking", "disputes"]
    success_count: int = 0       # Times this flow succeeded
    fail_count: int = 0          # Times this flow failed
```
## Example Call Flow
```json
{
  "id": "chase_bank_disputes",
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [
    {
      "id": "wait_greeting",
      "type": "WAIT",
      "description": "Wait for greeting to finish",
      "timeout": 5.0,
      "next_step": "main_menu"
    },
    {
      "id": "main_menu",
      "type": "LISTEN",
      "description": "Listen to main menu options",
      "timeout": 15.0,
      "next_step": "press_3"
    },
    {
      "id": "press_3",
      "type": "DTMF",
      "description": "Press 3 for account services",
      "dtmf": "3",
      "next_step": "sub_menu"
    },
    {
      "id": "sub_menu",
      "type": "LISTEN",
      "description": "Listen to account services sub-menu",
      "timeout": 15.0,
      "next_step": "press_1"
    },
    {
      "id": "press_1",
      "type": "DTMF",
      "description": "Press 1 for disputes",
      "dtmf": "1",
      "next_step": "hold"
    },
    {
      "id": "hold",
      "type": "HOLD",
      "description": "Wait on hold for disputes agent",
      "timeout": 7200,
      "next_step": "transfer"
    },
    {
      "id": "transfer",
      "type": "TRANSFER",
      "description": "Transfer to user's phone"
    }
  ]
}
```
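The steps in a flow like the one above are chained through `next_step`. The following sketch shows one way to walk that chain and catch dangling pointers or loops. It assumes the first entry in `steps` is the entry point, which this document does not state explicitly; `walk_flow` is a hypothetical helper, not part of the project.

```python
def walk_flow(flow: dict) -> list[str]:
    """Follow next_step pointers from the first step; return visited step IDs."""
    steps_by_id = {step["id"]: step for step in flow["steps"]}
    visited: list[str] = []
    # Assumption: the first listed step is where execution starts.
    current = flow["steps"][0]["id"] if flow["steps"] else None
    while current is not None:
        if current not in steps_by_id:
            raise ValueError(f"next_step points at unknown step: {current!r}")
        if current in visited:
            raise ValueError(f"loop detected at step: {current!r}")
        visited.append(current)
        current = steps_by_id[current].get("next_step")
    return visited


# Trimmed version of the example flow, keeping only the fields the walk needs.
example = {
    "steps": [
        {"id": "wait_greeting", "type": "WAIT", "next_step": "hold"},
        {"id": "hold", "type": "HOLD", "next_step": "transfer"},
        {"id": "transfer", "type": "TRANSFER"},
    ]
}
order = walk_flow(example)
```

A check like this is cheap to run before saving a manually written flow, since a mistyped `next_step` would otherwise only surface in the middle of a live call.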
## Call Flow Learner (`services/call_flow_learner.py`)
Automatically builds call flows from exploration data.
### How It Works
1. **Exploration mode** records "discoveries": what Hold Slayer encountered and what it did at each step
2. The learner converts discoveries into `CallFlowStep` objects
3. Steps are ordered and linked (`next_step` pointers)
4. The resulting `CallFlow` is saved for future calls
### Discovery Types
| Discovery | Becomes Step |
|-----------|-------------|
| Heard IVR prompt, pressed DTMF | `LISTEN`/`DTMF` |
| Detected hold music | `HOLD` |
| Detected silence (waiting) | `WAIT` |
| Heard speech (human) | `TRANSFER` |
| Sent DTMF digits | `DTMF` |
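The mapping in the table, together with the ordering and linking described under How It Works, can be sketched roughly as below. This is not the learner's actual code. The discovery `type` strings `wait`, `ivr_menu`, `hold`, and `human_detected` are taken from the Building a Flow example; the `dtmf` type and its `digits` key are hypothetical, as are the generated `step_N` IDs. An IVR menu discovery is modeled here as a single `LISTEN` step with a hardcoded `dtmf` response, which is one reading of `LISTEN`/`DTMF`.

```python
def discoveries_to_steps(discoveries: list[dict]) -> list[dict]:
    """Convert raw discoveries into linked step dictionaries (illustrative only)."""
    steps: list[dict] = []
    for found in discoveries:
        kind = found["type"]
        if kind == "ivr_menu":
            # One LISTEN step that carries the pressed digit as its response.
            step = {"type": "LISTEN", "dtmf": found.get("dtmf_sent")}
        elif kind == "hold":
            step = {"type": "HOLD"}
        elif kind == "wait":
            step = {"type": "WAIT", "timeout": found.get("duration", 10.0)}
        elif kind == "human_detected":
            step = {"type": "TRANSFER"}
        elif kind == "dtmf":  # Hypothetical key for "Sent DTMF digits"
            step = {"type": "DTMF", "dtmf": found.get("digits")}
        else:
            continue  # Unknown discovery types are skipped in this sketch
        step["id"] = f"step_{len(steps) + 1}"
        steps.append(step)
    # Link each step to its successor; the last step gets no next_step.
    for current, following in zip(steps, steps[1:]):
        current["next_step"] = following["id"]
    return steps


steps = discoveries_to_steps([
    {"type": "wait", "duration": 3.0},
    {"type": "ivr_menu", "transcript": "Press 1 for billing...", "dtmf_sent": "1"},
    {"type": "hold", "duration": 480.0},
    {"type": "human_detected"},
])
```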
### Building a Flow
```python
# Import path assumed from the module location named in the heading above.
from services.call_flow_learner import CallFlowLearner

learner = CallFlowLearner()

# After an exploration call completes:
discoveries = [
    {"type": "wait", "duration": 3.0, "description": "Initial silence"},
    {"type": "ivr_menu", "transcript": "Press 1 for billing...", "dtmf_sent": "1"},
    {"type": "ivr_menu", "transcript": "Press 3 for disputes...", "dtmf_sent": "3"},
    {"type": "hold", "duration": 480.0},
    {"type": "human_detected", "transcript": "Thank you for calling..."},
]

flow = learner.build_flow(
    discoveries=discoveries,
    phone_number="+18005551234",
    company="Chase Bank",
    intent="dispute a charge",
)
# Returns a CallFlow with 5 steps: WAIT → LISTEN/DTMF → LISTEN/DTMF → HOLD → TRANSFER
```
### Merging Discoveries
When the same number is called again with exploration, new discoveries can be merged into the existing flow:
```python
# new_discoveries: the discoveries recorded during the later exploration call
updated_flow = learner.merge_discoveries(
    existing_flow=flow,
    new_discoveries=new_discoveries,
)
```
This handles:
- New menu options discovered
- Changed IVR structure
- Updated timing information
- Success/failure tracking
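The merge logic itself is not shown here. As a rough illustration of the success/failure tracking item only, the sketch below updates the `success_count` and `fail_count` fields from the data model on a plain dictionary. The helper name `record_outcome` and the idea of deriving a success rate are assumptions, not documented behavior.

```python
def record_outcome(flow: dict, succeeded: bool) -> float:
    """Increment the matching counter in place and return the success rate so far."""
    key = "success_count" if succeeded else "fail_count"
    flow[key] = flow.get(key, 0) + 1
    total = flow.get("success_count", 0) + flow.get("fail_count", 0)
    return flow.get("success_count", 0) / total


flow_stats = {"id": "chase_bank_disputes", "success_count": 3, "fail_count": 0}
rate = record_outcome(flow_stats, succeeded=False)
```

A falling success rate could be one signal that the company changed its IVR structure and the flow needs another exploration call.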
## REST API
### List Call Flows
```
GET /api/call-flows
GET /api/call-flows?company=Chase+Bank
GET /api/call-flows?tag=banking
```
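A client might build these list URLs as in the sketch below. The base URL is an assumption for a local deployment, and `list_flows_url` is a hypothetical helper. Whether the `company` and `tag` filters can be combined in one request is not stated here, so treat that as untested.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # Assumption: adjust to your deployment


def list_flows_url(company: Optional[str] = None, tag: Optional[str] = None) -> str:
    """Build the list endpoint URL, adding only the filters that were given."""
    filters = {"company": company, "tag": tag}
    params = {name: value for name, value in filters.items() if value is not None}
    # urlencode escapes spaces as "+", matching the Chase+Bank example above.
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/api/call-flows{query}"


url = list_flows_url(company="Chase Bank")
```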
### Get Call Flow
```
GET /api/call-flows/{flow_id}
```
### Create Call Flow
```
POST /api/call-flows
Content-Type: application/json

{
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [ ... ]
}
```
### Update Call Flow
```
PUT /api/call-flows/{flow_id}
Content-Type: application/json

{ ... updated flow ... }
```
### Delete Call Flow
```
DELETE /api/call-flows/{flow_id}
```
### Learn Flow from Exploration
```
POST /api/call-flows/learn
Content-Type: application/json

{
  "call_id": "call_abc123",
  "phone_number": "+18005551234",
  "company": "Chase Bank"
}
```
This triggers the Call Flow Learner to build a flow from the call's exploration data.
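For illustration, the learn request could be assembled with the Python standard library as shown below. The base URL and the helper name `build_learn_request` are assumptions. The request is only constructed, not sent, and the response format is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # Assumption: adjust to your deployment


def build_learn_request(call_id: str, phone_number: str, company: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the learn endpoint."""
    body = json.dumps(
        {"call_id": call_id, "phone_number": phone_number, "company": company}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/call-flows/learn",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_learn_request("call_abc123", "+18005551234", "Chase Bank")
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```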