
Robert Helewka ecf37658ce feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered
telephony system that calls companies, navigates IVR menus, waits on
hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README

2026-03-21 19:23:26 +00:00

6.0 KiB


Call Flows

Call flows are reusable IVR navigation trees that tell Hold Slayer exactly how to navigate a company's phone menu. Once a flow exists, whether written by hand or learned through exploration, subsequent calls to the same number skip the LLM analysis and follow the stored steps directly.
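The choice between a stored flow and LLM analysis can be pictured as a lookup by phone number. The sketch below is illustrative only: `STORED_FLOWS` and `pick_navigation_mode` are hypothetical names, and the in-memory dictionary stands in for whatever persistence the project actually uses.

```python
from typing import Optional

# Hypothetical in-memory store keyed by phone number (an assumption for
# illustration; the real project persists flows elsewhere).
STORED_FLOWS: dict[str, dict] = {
    "+18005551234": {"id": "chase_bank_disputes", "steps": []},
}


def pick_navigation_mode(
    phone_number: str, flows: dict[str, dict]
) -> tuple[str, Optional[dict]]:
    """Return ("stored_flow", flow) if a flow is known, else ("llm_exploration", None)."""
    flow = flows.get(phone_number)
    if flow is not None:
        # A stored flow exists: follow its steps and skip LLM analysis.
        return "stored_flow", flow
    # No flow yet: fall back to LLM-driven navigation.
    return "llm_exploration", None
```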

Data Model

CallFlowStep

A single step in the IVR navigation:

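from typing import Optional

from pydantic import BaseModel

# CallFlowStepType is the step-type enum described under "Step Types".
class CallFlowStep(BaseModel):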
    id: str                          # Unique step identifier
    type: CallFlowStepType           # DTMF, WAIT, LISTEN, HOLD, SPEAK, TRANSFER
    description: str                 # Human-readable description
    dtmf: Optional[str] = None       # Digits to press (for DTMF steps)
    timeout: float = 10.0            # Max seconds to wait
    next_step: Optional[str] = None  # ID of the next step
    conditions: dict = {}            # Conditional branching rules
    metadata: dict = {}              # Extra data (transcript patterns, etc.)

Step Types

| Type | Purpose | Key Fields |
| --- | --- | --- |
| `DTMF` | Press touch-tone digits | `dtmf="3"` |
| `WAIT` | Pause for a duration | `timeout=5.0` |
| `LISTEN` | Record + transcribe + decide | `timeout=15.0`, optional `dtmf` for hardcoded response |
| `HOLD` | Wait on hold, monitor for human | `timeout=7200` (max hold time) |
| `SPEAK` | Play audio to the call | `metadata={"audio_file": "greeting.wav"}` |
| `TRANSFER` | Bridge call to user's device | `metadata={"device": "sip_phone"}` |
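As a rough illustration of the table, the step types could be modelled as a string enum with a small check that a step carries its key field. This is a sketch, not the project's definition: the enum values, `VALID_DTMF`, and `missing_key_field` are assumptions, and treating `audio_file` as required for `SPEAK` is a guess based on the table.

```python
from enum import Enum
from typing import Optional


class CallFlowStepType(str, Enum):
    """Step types from the table; the string values are assumed."""
    DTMF = "DTMF"
    WAIT = "WAIT"
    LISTEN = "LISTEN"
    HOLD = "HOLD"
    SPEAK = "SPEAK"
    TRANSFER = "TRANSFER"


# Standard touch-tone keys: digits, star, pound, and the A-D column.
VALID_DTMF = set("0123456789*#ABCD")


def missing_key_field(step: dict) -> Optional[str]:
    """Return the name of a missing or invalid key field, or None if none is found."""
    step_type = CallFlowStepType(step["type"])  # raises ValueError on unknown types
    metadata = step.get("metadata") or {}
    if step_type is CallFlowStepType.DTMF:
        digits = step.get("dtmf") or ""
        if not digits or not set(digits) <= VALID_DTMF:
            return "dtmf"
    if step_type is CallFlowStepType.SPEAK and "audio_file" not in metadata:
        return "metadata.audio_file"
    return None
```

`TRANSFER` is deliberately not checked for a `device` entry, since the example flow in this document uses a `TRANSFER` step without metadata.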

CallFlow

A complete IVR navigation tree:

from datetime import datetime

class CallFlow(BaseModel):
    id: str                          # "chase_bank_main"
    name: str                        # "Chase Bank — Main Menu"
    company: Optional[str]           # "Chase Bank"
    phone_number: Optional[str]      # "+18005551234"
    description: Optional[str]       # "Navigate to disputes department"
    steps: list[CallFlowStep]        # Ordered list of steps
    created_at: datetime
    updated_at: datetime
    version: int = 1
    tags: list[str] = []             # ["banking", "disputes"]
    success_count: int = 0           # Times this flow succeeded
    fail_count: int = 0              # Times this flow failed
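The `success_count` and `fail_count` fields make it possible to estimate how reliable a stored flow is. The following sketch shows one way those counters might be used; `success_rate`, `should_re_explore`, and the thresholds are hypothetical and not taken from the project.

```python
from typing import Optional


def success_rate(success_count: int, fail_count: int) -> Optional[float]:
    """Fraction of recorded attempts that succeeded, or None with no attempts."""
    total = success_count + fail_count
    if total == 0:
        return None
    return success_count / total


def should_re_explore(
    success_count: int,
    fail_count: int,
    min_attempts: int = 5,    # assumed: ignore flows with too little history
    min_rate: float = 0.6,    # assumed: below this, the IVR has probably changed
) -> bool:
    """Suggest re-exploring when a flow has enough history and fails too often."""
    total = success_count + fail_count
    rate = success_rate(success_count, fail_count)
    return total >= min_attempts and rate is not None and rate < min_rate
```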

Example Call Flow

{
  "id": "chase_bank_disputes",
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [
    {
      "id": "wait_greeting",
      "type": "WAIT",
      "description": "Wait for greeting to finish",
      "timeout": 5.0,
      "next_step": "main_menu"
    },
    {
      "id": "main_menu",
      "type": "LISTEN",
      "description": "Listen to main menu options",
      "timeout": 15.0,
      "next_step": "press_3"
    },
    {
      "id": "press_3",
      "type": "DTMF",
      "description": "Press 3 for account services",
      "dtmf": "3",
      "next_step": "sub_menu"
    },
    {
      "id": "sub_menu",
      "type": "LISTEN",
      "description": "Listen to account services sub-menu",
      "timeout": 15.0,
      "next_step": "press_1"
    },
    {
      "id": "press_1",
      "type": "DTMF",
      "description": "Press 1 for disputes",
      "dtmf": "1",
      "next_step": "hold"
    },
    {
      "id": "hold",
      "type": "HOLD",
      "description": "Wait on hold for disputes agent",
      "timeout": 7200,
      "next_step": "transfer"
    },
    {
      "id": "transfer",
      "type": "TRANSFER",
      "description": "Transfer to user's phone"
    }
  ]
}
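The steps in a flow like this are chained through their `next_step` pointers. A minimal sketch of following that chain is shown here, assuming execution starts at the first step in the list and ends at a step without `next_step`; `walk_flow` is a hypothetical helper, not part of the project.

```python
def walk_flow(flow: dict) -> list[str]:
    """Follow next_step pointers from the first step and return the visited IDs."""
    steps = {step["id"]: step for step in flow["steps"]}
    order: list[str] = []
    # Assumption: the first step in the list is the entry point.
    current = flow["steps"][0]["id"] if flow["steps"] else None
    while current is not None:
        if current not in steps:
            raise ValueError(f"next_step points at unknown step {current!r}")
        if current in order:
            raise ValueError(f"cycle detected at step {current!r}")
        order.append(current)
        current = steps[current].get("next_step")
    return order


flow = {
    "steps": [
        {"id": "press_1", "type": "DTMF", "dtmf": "1", "next_step": "hold"},
        {"id": "hold", "type": "HOLD", "timeout": 7200, "next_step": "transfer"},
        {"id": "transfer", "type": "TRANSFER"},
    ]
}
print(walk_flow(flow))  # ['press_1', 'hold', 'transfer']
```

Rejecting unknown targets and cycles up front keeps a malformed flow from stalling a live call.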

Call Flow Learner (`services/call_flow_learner.py`)

Automatically builds call flows from exploration data.

How It Works

1. Exploration mode records "discoveries": what Hold Slayer encountered and what it did at each step.
2. The learner converts the discoveries into `CallFlowStep` objects.
3. Steps are ordered and linked through their `next_step` pointers.
4. The resulting `CallFlow` is saved for future calls.

Discovery Types

| Discovery | Becomes Step |
| --- | --- |
| Heard IVR prompt, pressed DTMF | `LISTEN` → `DTMF` |
| Detected hold music | `HOLD` |
| Detected silence (waiting) | `WAIT` |
| Heard speech (human) | `TRANSFER` |
| Sent DTMF digits | `DTMF` |
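The mapping above, together with the linking of steps, can be sketched roughly as follows. This is not the learner's actual code: the discovery type names (`ivr_menu`, `hold`, `wait`, `human_detected`, `dtmf`), the `dtmf_sent` key, and the generated step IDs are assumptions, and the sketch expands each IVR menu into separate `LISTEN` and `DTMF` steps, which the real learner may combine.

```python
# Discovery type -> step types it expands to (type names are assumed).
DISCOVERY_TO_STEPS = {
    "ivr_menu": ["LISTEN", "DTMF"],
    "hold": ["HOLD"],
    "wait": ["WAIT"],
    "human_detected": ["TRANSFER"],
    "dtmf": ["DTMF"],
}


def discoveries_to_steps(discoveries: list[dict]) -> list[dict]:
    """Convert discoveries into step dicts and link them with next_step."""
    steps: list[dict] = []
    for discovery in discoveries:
        for step_type in DISCOVERY_TO_STEPS[discovery["type"]]:
            step = {"id": f"step_{len(steps) + 1}", "type": step_type, "next_step": None}
            if step_type == "DTMF":
                # Carry over the digits that were pressed during exploration.
                step["dtmf"] = discovery.get("dtmf_sent")
            steps.append(step)
    # Link each step to its successor; the last step keeps next_step=None.
    for current, following in zip(steps, steps[1:]):
        current["next_step"] = following["id"]
    return steps
```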

Building a Flow

learner = CallFlowLearner()

# After an exploration call completes:
discoveries = [
    {"type": "wait", "duration": 3.0, "description": "Initial silence"},
    {"type": "ivr_menu", "transcript": "Press 1 for billing...", "dtmf_sent": "1"},
    {"type": "ivr_menu", "transcript": "Press 3 for disputes...", "dtmf_sent": "3"},
    {"type": "hold", "duration": 480.0},
    {"type": "human_detected", "transcript": "Thank you for calling..."},
]

flow = learner.build_flow(
    discoveries=discoveries,
    phone_number="+18005551234",
    company="Chase Bank",
    intent="dispute a charge",
)
# Returns a CallFlow with 5 steps: WAIT → LISTEN/DTMF → LISTEN/DTMF → HOLD → TRANSFER

Merging Discoveries

When the same number is called again in exploration mode, the new discoveries can be merged into the existing flow:

updated_flow = learner.merge_discoveries(
    existing_flow=flow,
    new_discoveries=new_discoveries,
)

This handles:

- New menu options discovered
- Changed IVR structure
- Updated timing information
- Success/failure tracking
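To make one of these cases concrete, the sketch below covers only the timing update. It is an assumption-laden illustration, not how `merge_discoveries` is implemented: `merge_timing`, the `observed_durations` mapping, and the safety margin are all hypothetical, and the default timeout of 10.0 mirrors the `CallFlowStep` default.

```python
import copy


def merge_timing(
    existing_flow: dict,
    observed_durations: dict[str, float],  # assumed shape: step ID -> seconds observed
    margin: float = 1.5,                   # assumed safety factor over the observation
) -> dict:
    """Return a copy of the flow with timeouts raised where calls ran longer."""
    merged = copy.deepcopy(existing_flow)  # leave the stored flow untouched
    changed = False
    for step in merged["steps"]:
        observed = observed_durations.get(step["id"])
        if observed is not None and observed * margin > step.get("timeout", 10.0):
            step["timeout"] = observed * margin
            changed = True
    if changed:
        # Bump the version so callers can tell the flow was revised.
        merged["version"] = merged.get("version", 1) + 1
    return merged
```

Returning a copy rather than mutating in place lets the caller compare versions before saving.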

REST API

List Call Flows

GET /api/call-flows
GET /api/call-flows?company=Chase+Bank
GET /api/call-flows?tag=banking
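A client can assemble these list URLs with the standard library. In this sketch the base URL is an assumed local address, `list_flows_url` is a hypothetical helper, and whether the API accepts both filters in one request is not stated in this document.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed host and port


def list_flows_url(company: Optional[str] = None, tag: Optional[str] = None) -> str:
    """Build the list endpoint URL, adding only the filters that were given."""
    params = {
        key: value
        for key, value in (("company", company), ("tag", tag))
        if value is not None
    }
    query = urlencode(params)  # encodes spaces as "+", e.g. Chase+Bank
    return f"{BASE_URL}/api/call-flows" + (f"?{query}" if query else "")


print(list_flows_url(company="Chase Bank"))
# http://localhost:8000/api/call-flows?company=Chase+Bank
```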

Get Call Flow

GET /api/call-flows/{flow_id}

Create Call Flow

POST /api/call-flows
Content-Type: application/json

{
  "name": "Chase Bank — Disputes",
  "company": "Chase Bank",
  "phone_number": "+18005551234",
  "steps": [ ... ]
}

Update Call Flow

PUT /api/call-flows/{flow_id}
Content-Type: application/json

{ ... updated flow ... }

Delete Call Flow

DELETE /api/call-flows/{flow_id}

Learn Flow from Exploration

POST /api/call-flows/learn
Content-Type: application/json

{
  "call_id": "call_abc123",
  "phone_number": "+18005551234",
  "company": "Chase Bank"
}

This triggers the Call Flow Learner to build a flow from the call's exploration data.
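For reference, the learn request could be prepared from Python with the standard library alone. The base URL is an assumption and `build_learn_request` is a hypothetical helper; the sketch only constructs the request and does not send it, since the response format is not documented here.

```python
import json
from urllib.request import Request


def build_learn_request(
    call_id: str,
    phone_number: str,
    company: str,
    base_url: str = "http://localhost:8000",  # assumed host and port
) -> Request:
    """Build (but do not send) the POST request for /api/call-flows/learn."""
    body = json.dumps(
        {"call_id": call_id, "phone_number": phone_number, "company": company}
    ).encode("utf-8")
    return Request(
        url=f"{base_url}/api/call-flows/learn",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_learn_request("call_abc123", "+18005551234", "Chase Bank")
```

Passing `request` to `urllib.request.urlopen` would send it to a running server.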
