Call Flows
Call flows are reusable IVR navigation trees that tell Hold Slayer exactly how to navigate a company's phone menu. Once a flow is learned (manually or via exploration), subsequent calls to the same number skip the LLM analysis and follow the stored steps directly.
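The lookup described above can be sketched as follows. This is a minimal illustration, not Hold Slayer's actual code: the helper names (`normalize_number`, `find_stored_flow`, `plan_call`) and the idea of matching on a normalized phone number are assumptions made for the example.

```python
from typing import Optional


def normalize_number(number: str) -> str:
    """Reduce a phone number to '+<digits>' so formatting differences do not matter."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return "+" + digits


def find_stored_flow(phone_number: str, flows: list[dict]) -> Optional[dict]:
    """Return the first stored flow whose phone_number matches, or None."""
    target = normalize_number(phone_number)
    for flow in flows:
        stored = flow.get("phone_number")
        if stored and normalize_number(stored) == target:
            return flow
    return None


def plan_call(phone_number: str, flows: list[dict]) -> str:
    """Decide whether to replay stored steps or fall back to LLM-driven exploration."""
    if find_stored_flow(phone_number, flows) is not None:
        return "follow_stored_flow"
    return "explore_with_llm"
```

A known number resolves to the stored flow; any other number falls through to exploration.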
Data Model
CallFlowStep
A single step in the IVR navigation:
class CallFlowStep(BaseModel):
    id: str                          # Unique step identifier
    type: CallFlowStepType           # DTMF, WAIT, LISTEN, HOLD, SPEAK, TRANSFER
    description: str                 # Human-readable description
    dtmf: Optional[str] = None       # Digits to press (for DTMF steps)
    timeout: float = 10.0            # Max seconds to wait
    next_step: Optional[str] = None  # ID of the next step
    conditions: dict = {}            # Conditional branching rules
    metadata: dict = {}              # Extra data (transcript patterns, etc.)
Step Types
| Type | Purpose | Key Fields |
|---|---|---|
| DTMF | Press touch-tone digits | dtmf="3" |
| WAIT | Pause for a duration | timeout=5.0 |
| LISTEN | Record + transcribe + decide | timeout=15.0, optional dtmf for hardcoded response |
| HOLD | Wait on hold, monitor for human | timeout=7200 (max hold time) |
| SPEAK | Play audio to the call | metadata={"audio_file": "greeting.wav"} |
| TRANSFER | Bridge call to user's device | metadata={"device": "sip_phone"} |
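The source does not show how `CallFlowStepType` is defined. One plausible shape, assuming it is a string-valued enum whose values match the type names in the table, is sketched below; `parse_step_type` is a hypothetical helper added for illustration.

```python
from enum import Enum


class CallFlowStepType(str, Enum):
    """Assumed definition: one member per step type listed in the table."""
    DTMF = "DTMF"
    WAIT = "WAIT"
    LISTEN = "LISTEN"
    HOLD = "HOLD"
    SPEAK = "SPEAK"
    TRANSFER = "TRANSFER"


def parse_step_type(raw: str) -> CallFlowStepType:
    """Hypothetical helper: accept 'dtmf', ' Hold ', etc. and return the enum member."""
    return CallFlowStepType(raw.strip().upper())
```

Subclassing `str` keeps the members JSON-friendly, so a stored flow can carry `"type": "DTMF"` directly.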
CallFlow
A complete IVR navigation tree:
class CallFlow(BaseModel):
    id: str                      # "chase_bank_main"
    name: str                    # "Chase Bank — Main Menu"
    company: Optional[str]       # "Chase Bank"
    phone_number: Optional[str]  # "+18005551234"
    description: Optional[str]   # "Navigate to disputes department"
    steps: list[CallFlowStep]    # Ordered list of steps
    created_at: datetime
    updated_at: datetime
    version: int = 1
    tags: list[str] = []         # ["banking", "disputes"]
    success_count: int = 0       # Times this flow succeeded
    fail_count: int = 0          # Times this flow failed
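The `success_count` and `fail_count` counters make it possible to estimate how reliable a stored flow is. The sketch below is an illustration only: `success_rate`, `looks_stale`, and the `min_attempts` and `threshold` values are hypothetical and not part of the documented model.

```python
from typing import Optional


def success_rate(success_count: int, fail_count: int) -> Optional[float]:
    """Fraction of attempts that succeeded, or None if the flow was never used."""
    total = success_count + fail_count
    if total == 0:
        return None
    return success_count / total


def looks_stale(success_count: int, fail_count: int,
                min_attempts: int = 5, threshold: float = 0.5) -> bool:
    """Hypothetical heuristic: flag a flow once enough attempts mostly fail."""
    rate = success_rate(success_count, fail_count)
    if rate is None:
        return False
    return (success_count + fail_count) >= min_attempts and rate < threshold
```

A flow flagged this way could be a candidate for a fresh exploration call, since the company's IVR may have changed.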
Example Call Flow
{
  "id": "chase_bank_disputes",
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [
    {
      "id": "wait_greeting",
      "type": "WAIT",
      "description": "Wait for greeting to finish",
      "timeout": 5.0,
      "next_step": "main_menu"
    },
    {
      "id": "main_menu",
      "type": "LISTEN",
      "description": "Listen to main menu options",
      "timeout": 15.0,
      "next_step": "press_3"
    },
    {
      "id": "press_3",
      "type": "DTMF",
      "description": "Press 3 for account services",
      "dtmf": "3",
      "next_step": "sub_menu"
    },
    {
      "id": "sub_menu",
      "type": "LISTEN",
      "description": "Listen to account services sub-menu",
      "timeout": 15.0,
      "next_step": "press_1"
    },
    {
      "id": "press_1",
      "type": "DTMF",
      "description": "Press 1 for disputes",
      "dtmf": "1",
      "next_step": "hold"
    },
    {
      "id": "hold",
      "type": "HOLD",
      "description": "Wait on hold for disputes agent",
      "timeout": 7200,
      "next_step": "transfer"
    },
    {
      "id": "transfer",
      "type": "TRANSFER",
      "description": "Transfer to user's phone"
    }
  ]
}
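The steps in a flow like the one above are linked through `next_step`, so execution order can be recovered by following those pointers. The sketch below assumes the first entry in `steps` is the entry point and ignores `conditions`; `walk_flow` and `DEMO_FLOW` are hypothetical names used only for this example.

```python
def walk_flow(flow: dict) -> list[str]:
    """Return step IDs in execution order by following next_step pointers."""
    steps_by_id = {step["id"]: step for step in flow["steps"]}
    order: list[str] = []
    seen: set[str] = set()
    # Assumption: the first listed step is where the call starts.
    current = flow["steps"][0]["id"] if flow["steps"] else None
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at step {current!r}")
        if current not in steps_by_id:
            raise KeyError(f"next_step points at unknown step {current!r}")
        seen.add(current)
        order.append(current)
        current = steps_by_id[current].get("next_step")
    return order


# A shortened flow in the same shape as the example above.
DEMO_FLOW = {
    "id": "demo",
    "steps": [
        {"id": "wait_greeting", "type": "WAIT", "next_step": "press_3"},
        {"id": "press_3", "type": "DTMF", "dtmf": "3", "next_step": "transfer"},
        {"id": "transfer", "type": "TRANSFER"},
    ],
}

print(walk_flow(DEMO_FLOW))
```

Checking for cycles and dangling pointers up front is useful before placing a call, because a malformed flow would otherwise fail partway through the IVR.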
Call Flow Learner (services/call_flow_learner.py)
Automatically builds call flows from exploration data.
How It Works
- Exploration mode records "discoveries": what Hold Slayer encountered and did at each step
- The learner converts discoveries into CallFlowStep objects
- Steps are ordered and linked (next_step pointers)
- The resulting CallFlow is saved for future calls
Discovery Types
| Discovery | Becomes Step |
|---|---|
| Heard IVR prompt, pressed DTMF | LISTEN → DTMF |
| Detected hold music | HOLD |
| Detected silence (waiting) | WAIT |
| Heard speech (human) | TRANSFER |
| Sent DTMF digits | DTMF |
Building a Flow
learner = CallFlowLearner()

# After an exploration call completes:
discoveries = [
    {"type": "wait", "duration": 3.0, "description": "Initial silence"},
    {"type": "ivr_menu", "transcript": "Press 1 for billing...", "dtmf_sent": "1"},
    {"type": "ivr_menu", "transcript": "Press 3 for disputes...", "dtmf_sent": "3"},
    {"type": "hold", "duration": 480.0},
    {"type": "human_detected", "transcript": "Thank you for calling..."},
]

flow = learner.build_flow(
    discoveries=discoveries,
    phone_number="+18005551234",
    company="Chase Bank",
    intent="dispute a charge",
)
# Returns a CallFlow with 5 steps: WAIT → LISTEN/DTMF → LISTEN/DTMF → HOLD → TRANSFER
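The internals of `build_flow` are not shown in the source. A minimal sketch of the conversion, under stated assumptions, might look like the following: each `ivr_menu` discovery becomes a single LISTEN step carrying the pressed digits (one reading of "LISTEN/DTMF"), step IDs are generated as `step_<index>`, and HOLD steps get the 7200-second maximum from the step-type table. `discoveries_to_steps` is a hypothetical name, and plain dicts stand in for `CallFlowStep`.

```python
def discoveries_to_steps(discoveries: list[dict]) -> list[dict]:
    """Convert exploration discoveries into linked step dicts (illustrative only)."""
    steps: list[dict] = []
    for index, discovery in enumerate(discoveries):
        kind = discovery["type"]
        step = {"id": f"step_{index}", "description": discovery.get("description", kind)}
        if kind == "wait":
            step.update(type="WAIT", timeout=discovery["duration"])
        elif kind == "ivr_menu":
            # Assumption: one LISTEN step with a hardcoded DTMF response.
            step.update(type="LISTEN", dtmf=discovery["dtmf_sent"],
                        metadata={"transcript": discovery["transcript"]})
        elif kind == "hold":
            step.update(type="HOLD", timeout=7200)
        elif kind == "human_detected":
            step.update(type="TRANSFER")
        else:
            raise ValueError(f"unknown discovery type {kind!r}")
        steps.append(step)
    # Link each step to the one that follows it; the last step has no next_step.
    for current, following in zip(steps, steps[1:]):
        current["next_step"] = following["id"]
    return steps
```

Applied to the five discoveries above, this yields five steps in the order WAIT, LISTEN, LISTEN, HOLD, TRANSFER.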
Merging Discoveries
When the same number is called again in exploration mode, the new discoveries can be merged into the existing flow:
updated_flow = learner.merge_discoveries(
    existing_flow=flow,
    new_discoveries=new_discoveries,
)
This handles:
- New menu options discovered
- Changed IVR structure
- Updated timing information
- Success/failure tracking
REST API
List Call Flows
GET /api/call-flows
GET /api/call-flows?company=Chase+Bank
GET /api/call-flows?tag=banking
Get Call Flow
GET /api/call-flows/{flow_id}
Create Call Flow
POST /api/call-flows
Content-Type: application/json

{
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [ ... ]
}
Update Call Flow
PUT /api/call-flows/{flow_id}
Content-Type: application/json

{ ... updated flow ... }
Delete Call Flow
DELETE /api/call-flows/{flow_id}
Learn Flow from Exploration
POST /api/call-flows/learn
Content-Type: application/json

{
  "call_id": "call_abc123",
  "phone_number": "+18005551234",
  "company": "Chase Bank"
}
This triggers the Call Flow Learner to build a flow from the call's exploration data.
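As a rough client-side illustration, the list and learn endpoints above could be called from Python with only the standard library. The base URL `http://localhost:8000` and the helper names are assumptions; the sketch only builds the requests, so it can be inspected without a running server.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # Assumption: adjust to where the server listens.


def build_list_request(company: Optional[str] = None,
                       tag: Optional[str] = None) -> urllib.request.Request:
    """Build GET /api/call-flows with optional company and tag filters."""
    params = {key: value for key, value in (("company", company), ("tag", tag)) if value}
    query = f"?{urlencode(params)}" if params else ""
    return urllib.request.Request(f"{BASE_URL}/api/call-flows{query}", method="GET")


def build_learn_request(call_id: str, phone_number: str,
                        company: str) -> urllib.request.Request:
    """Build POST /api/call-flows/learn with a JSON body."""
    body = json.dumps({
        "call_id": call_id,
        "phone_number": phone_number,
        "company": company,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/call-flows/learn",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` sends it. The response format is not documented in this section, so parsing it is left out.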