2026-05-25 14:45:29 -04:00

Hold Slayer 🔥

An AI-powered telephony gateway that calls companies, navigates IVR menus, waits on hold, and transfers you when a human picks up.

You give it a phone number and an intent ("dispute a charge on my December statement"). It dials the number through your SIP trunk, navigates the phone tree, sits through the hold music, and rings your desk phone the instant a live person answers. You never hear Vivaldi again.

Caution

Emergency calling (911): Hold Slayer passes 911 and 9911 directly to the PSTN trunk. Before this system is put into service, your SIP trunk provider must support E911 on your DID and have your correct registered location on file. VoIP emergency calls are location-dependent, so verify coverage with your provider. Do not rely on this system as your only means of reaching emergency services.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        FastAPI Server                           │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │  │
│  │ /api/*   │  │ /ws/*    │  │ (SSE)     │  │  /dashboard  │  │
│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘  │
│       │              │              │                            │
│  ┌────┴──────────────┴──────────────┴────┐                     │
│  │             Event Bus                  │                     │
│  │   (asyncio Queue pub/sub per client)   │                     │
│  └────┬──────────────┬──────────────┬────┘                     │
│       │              │              │                            │
│  ┌────┴─────┐  ┌─────┴─────┐  ┌────┴──────────┐               │
│  │   Call   │  │   Hold    │  │   Services    │               │
│  │ Manager  │  │  Slayer   │  │ (LLM, STT,   │               │
│  │          │  │           │  │  Recording,   │               │
│  │          │  │           │  │  Analytics,   │               │
│  │          │  │           │  │  Notify)      │               │
│  └────┬─────┘  └─────┬─────┘  └──────────────┘               │
│       │              │                                          │
│  ┌────┴──────────────┴───────────────────┐                     │
│  │         Sippy B2BUA Engine            │                     │
│  │  (SIP calls, DTMF, conference bridge) │                     │
│  └────┬──────────────────────────────────┘                     │
│       │                                                         │
└───────┼─────────────────────────────────────────────────────────┘
        │
   ┌────┴────┐
   │SIP Trunk│ ──→ PSTN
   └─────────┘

What's Implemented

Core Engine

Sippy B2BUA Engine (core/sippy_engine.py) — SIP call control, DTMF, bridging, conference, trunk registration
PJSUA2 Media Pipeline (core/media_pipeline.py) — Audio routing, recording ports, conference bridge, WAV playback
Call Manager (core/call_manager.py) — Active call state tracking, lifecycle management
Event Bus (core/event_bus.py) — Async pub/sub with per-subscriber queues, type filtering, history
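
As a rough illustration of the event bus idea (one queue per subscriber, optional type filtering, bounded history), here is a minimal sketch. The class and method names are assumptions for illustration and are not taken from core/event_bus.py.

```python
import asyncio
from collections import deque
from typing import Optional


class EventBus:
    """Hypothetical minimal pub/sub: each subscriber owns an asyncio.Queue."""

    def __init__(self, history_size: int = 100) -> None:
        self._subscribers = []  # list of (queue, allowed_types or None)
        self._history = deque(maxlen=history_size)

    def subscribe(self, types: Optional[set] = None) -> asyncio.Queue:
        # types=None means "deliver every event to this subscriber".
        queue = asyncio.Queue()
        self._subscribers.append((queue, types))
        return queue

    def publish(self, event: dict) -> None:
        # Record the event, then fan it out to every matching subscriber.
        self._history.append(event)
        for queue, types in self._subscribers:
            if types is None or event["type"] in types:
                queue.put_nowait(event)

    def history(self) -> list:
        return list(self._history)


async def demo() -> tuple:
    bus = EventBus()
    humans_only = bus.subscribe({"human_detected"})
    bus.publish({"type": "hold_detected", "call_id": "call_abc123"})
    bus.publish({"type": "human_detected", "call_id": "call_abc123"})
    event = await humans_only.get()
    # The filtered subscriber saw one event; history kept both.
    return event["type"], len(bus.history()), humans_only.empty()
```

Giving each client its own queue means a slow consumer only backs up its own queue rather than blocking the publisher or other subscribers.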

Hold Slayer

IVR Navigation (services/hold_slayer.py) — Follows stored call flows step-by-step through phone menus, including SPEAK steps that synthesize speech via TTS
Audio Classifier (services/audio_classifier.py) — Real-time waveform analysis: silence, tones, DTMF, music, speech detection
Call Flow Learner (services/call_flow_learner.py) — Builds reusable call flows from exploration data, merges new discoveries
LLM Fallback — When a LISTEN step has no hardcoded DTMF, the LLM analyzes the transcript and picks the right menu option
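
The source does not say how services/audio_classifier.py detects DTMF. One common approach is the Goertzel algorithm, which measures signal power at each of the eight standard DTMF frequencies. The sketch below assumes 8 kHz float samples in the range -1 to 1; the function names and the `min_power` threshold are hypothetical.

```python
import math
from typing import Optional, Sequence

# Standard DTMF row/column frequencies in Hz and the keypad layout.
DTMF_ROWS = (697.0, 770.0, 852.0, 941.0)
DTMF_COLS = (1209.0, 1336.0, 1477.0, 1633.0)
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: Sequence[float], freq: float, sample_rate: int) -> float:
    """Squared magnitude of the signal at `freq` via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples: Sequence[float], sample_rate: int = 8000,
                min_power: float = 100.0) -> Optional[str]:
    """Return the key whose row and column tones dominate, or None."""
    row_power = [goertzel_power(samples, f, sample_rate) for f in DTMF_ROWS]
    col_power = [goertzel_power(samples, f, sample_rate) for f in DTMF_COLS]
    if max(row_power) < min_power or max(col_power) < min_power:
        return None  # too weak to be a keypress (assumed threshold)
    row = row_power.index(max(row_power))
    col = col_power.index(max(col_power))
    return DTMF_KEYS[row][col]


def make_tone(f1: float, f2: float, n: int = 400, sample_rate: int = 8000) -> list:
    """Synthesize a two-tone test signal (50 ms at 8 kHz by default)."""
    return [0.5 * math.sin(2 * math.pi * f1 * i / sample_rate)
            + 0.5 * math.sin(2 * math.pi * f2 * i / sample_rate)
            for i in range(n)]
```

A production detector would also check tone duration and the ratio between the strongest and second-strongest frequency in each group to reject speech and music.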

AI Receptionist & Smart Routing

AI Receptionist (services/receptionist.py) — Answers inbound calls, greets via TTS, captures the caller's intent with STT + LLM, then routes to a device or takes a voicemail
Smart Routing (services/routing.py) — Caller-pattern (glob), DNIS, time-of-day (with tz + midnight wrap), per-device DND, and ring-chain priority. Rules win over the LLM on conflict.
TTS (services/tts.py): Client for Rhema, an OpenAI-compatible TTS server (/v1/audio/speech). It synthesizes Kokoro voices for the SPEAK step and for receptionist prompts.
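
To make the glob caller pattern and the midnight-wrapping time range concrete, here is a small sketch. The function names are hypothetical and this is not the evaluator in services/routing.py; it only shows one way the two checks could be written.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from fnmatch import fnmatchcase


def caller_matches(pattern: str, caller: str) -> bool:
    """Glob match on the caller number, e.g. '+1800*'."""
    return fnmatchcase(caller, pattern)


def in_time_range(now: datetime, start: time, end: time, tz: tzinfo) -> bool:
    """True if `now`, converted to `tz`, falls inside [start, end)."""
    local = now.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    # The range wraps past midnight, e.g. 22:00 to 06:00.
    return local >= start or local < end


# Fixed UTC-5 offset keeps the example dependency-free; with a named zone
# you would pass zoneinfo.ZoneInfo("America/Toronto") instead.
utc_minus_5 = timezone(timedelta(hours=-5))
late_night = datetime(2026, 1, 10, 4, 30, tzinfo=timezone.utc)  # 23:30 local
midday = datetime(2026, 1, 10, 17, 0, tzinfo=timezone.utc)      # 12:00 local
```

Note that fnmatch treats `[` and `?` as wildcards too, so a real implementation may want to validate patterns before storing them.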

Intelligence Layer

LLM Client (services/llm_client.py) — OpenAI-compatible API client (Ollama, vLLM, LM Studio, OpenAI) with JSON parsing, retry, stats
Transcription (services/transcription.py) — Speaches/Whisper STT integration for live call transcription
Recording (services/recording.py) — WAV recording with date-organized storage, dual-channel support, persisted to the recordings table
Call Persistence (services/call_persistence.py) — Writes completed calls + transcript chunks to the database on hangup
Call Analytics (services/call_analytics.py) — Hold time stats, success rates, per-company patterns, time-of-day trends
Notifications (services/notification.py) — WebSocket + SMS alerts for human detection, call failures, hold status
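
LLM replies often wrap JSON in prose or code fences, which is presumably why the LLM client includes JSON parsing. The sketch below shows one tolerant way to pull the first JSON object out of a reply. The function name and the example reply shape (`digit`, `reason`) are assumptions, not the actual services/llm_client.py behaviour.

```python
import json
from typing import Any, Optional


def extract_json(reply: str) -> Optional[dict]:
    """Return the first decodable JSON object found in `reply`, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            pass
        else:
            if isinstance(obj, dict):
                return obj
        start = reply.find("{", start + 1)
    return None


# Hypothetical model reply for an IVR menu question.
reply = 'Sure! Here is my answer:\n```json\n{"digit": "2", "reason": "billing"}\n```'
choice: Any = extract_json(reply)
```

When this returns None, a caller could retry the request or fall back to a default menu option.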

API Surface

REST API — Call management, call history, transcripts, recordings, routing rules, device DND, call flow CRUD
WebSocket — Real-time call events, transcripts, classification updates, receptionist state transitions
MCP Server — 10 tools for AI assistant integration (make calls, send DTMF, get transcripts, manage flows)
Dashboard — SvelteKit UI served at /dashboard with live monitor, call history with transcript playback, and a routing-rules editor

Data Models

Call — Active call state with classification history, transcript chunks, hold time tracking
Call Flow — Stored IVR trees with steps (DTMF, LISTEN, HOLD, TRANSFER, SPEAK)
Routing Rule — Match (caller pattern, DNIS, time range) + action (ring_device, ring_chain, take_message, reject, dnd)
Transcript Chunk — Per-call STT segments with speaker tag and timestamp offset (for click-to-seek playback)
Recording — WAV file metadata (path, duration, size) per call
Events — 30+ typed events (call lifecycle, hold slayer, audio, device, system, receptionist, routing)
Device — SIP phone/softphone registration, priority, DND
Contact — Phone number management with routing preferences
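
For orientation, a call flow with the five step types could be modelled roughly as below. The field names (`dtmf`, `text`) and the example steps are illustrative assumptions; the real models live in models/call_flow.py and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class StepType(str, Enum):
    DTMF = "DTMF"
    LISTEN = "LISTEN"
    HOLD = "HOLD"
    TRANSFER = "TRANSFER"
    SPEAK = "SPEAK"


@dataclass
class FlowStep:
    type: StepType
    dtmf: Optional[str] = None  # digits to send; optional for LISTEN
    text: Optional[str] = None  # text to synthesize for SPEAK


@dataclass
class CallFlow:
    id: str
    steps: list = field(default_factory=list)


def needs_llm(step: FlowStep) -> bool:
    """A LISTEN step with no hardcoded DTMF is handed to the LLM."""
    return step.type is StepType.LISTEN and step.dtmf is None


# Made-up flow for illustration only.
flow = CallFlow(
    id="chase_bank_main",
    steps=[
        FlowStep(StepType.DTMF, dtmf="1"),
        FlowStep(StepType.LISTEN),
        FlowStep(StepType.SPEAK, text="Representative"),
        FlowStep(StepType.HOLD),
        FlowStep(StepType.TRANSFER),
    ],
)
llm_steps = [i for i, step in enumerate(flow.steps) if needs_llm(step)]
```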

Project Structure

hold-slayer/
├── main.py                      # FastAPI app + lifespan (service wiring)
├── config.py                    # Pydantic settings from .env
├── core/
│   ├── gateway.py               # Top-level gateway orchestrator
│   ├── sippy_engine.py          # Sippy B2BUA SIP engine
│   ├── media_pipeline.py        # PJSUA2 audio routing
│   ├── call_manager.py          # Active call state management
│   └── event_bus.py             # Async pub/sub event bus
├── services/
│   ├── hold_slayer.py           # IVR navigation + hold detection + SPEAK
│   ├── receptionist.py          # AI Receptionist state machine
│   ├── routing.py               # Smart routing (rules, DND, ring chain)
│   ├── tts.py                   # Rhema TTS client (OpenAI-compatible)
│   ├── audio_classifier.py      # Waveform analysis (music/speech/DTMF)
│   ├── call_flow_learner.py     # Auto-learns IVR trees from calls
│   ├── call_persistence.py      # Writes calls + transcript chunks on hangup
│   ├── llm_client.py            # OpenAI-compatible LLM client
│   ├── transcription.py         # Speaches/Whisper STT
│   ├── recording.py             # Call recording management
│   ├── call_analytics.py        # Call metrics and insights
│   └── notification.py          # WebSocket + SMS notifications
├── api/
│   ├── calls.py                 # Call management endpoints
│   ├── call_history.py          # History, transcript, recording playback
│   ├── call_flows.py            # Call flow CRUD
│   ├── devices.py               # Device registration
│   ├── routing.py               # Routing rules CRUD + per-device DND
│   ├── websocket.py             # Real-time event stream
│   └── deps.py                  # FastAPI dependency injection
├── dashboard/                   # SvelteKit UI (built to dashboard/build)
│   └── src/routes/
│       ├── +page.svelte         # Live monitor
│       ├── history/             # Call history list
│       ├── calls/[call_id]/     # Detail page + transcript playback
│       └── routing/             # Rules editor + DND toggles
├── mcp_server/
│   └── server.py                # MCP tools + resources (10 tools)
├── models/
│   ├── call.py                  # Call state models
│   ├── call_flow.py             # IVR tree models
│   ├── routing.py               # Routing rule / match / action models
│   ├── events.py                # Event type definitions
│   ├── device.py                # Device models
│   └── contact.py               # Contact models
├── db/
│   └── database.py              # SQLAlchemy async (PostgreSQL/SQLite)
└── tests/
    ├── test_audio_classifier.py # 18 tests — waveform analysis
    ├── test_call_flows.py       # 10 tests — call flow models
    ├── test_hold_slayer.py      # 20 tests — IVR nav, EventBus, CallManager
    ├── test_services.py         # 27 tests — LLM, notifications, recording,
    │                            #             analytics, learner, EventBus
    ├── test_tts.py              #  4 tests — Rhema TTS client
    ├── test_routing.py          #  8 tests — rules evaluator
    └── test_receptionist.py     #  7 tests — receptionist decision logic

Quick Start

1. Install

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

2. Configure

cp .env.example .env
# Edit .env with your SIP trunk credentials, LLM endpoint, etc.

3. Build the dashboard (optional but recommended)

cd dashboard
npm install
npm run build
cd ..

The gateway serves the built UI at /dashboard automatically when dashboard/build/ exists. Skip this step if you only need the REST/WS API.

4. Run

uvicorn main:app --host 0.0.0.0 --port 8100

5. Test

pytest tests/ -v

Usage

REST API

Launch Hold Slayer on a number:

curl -X POST http://localhost:8100/api/calls/hold-slayer \
  -H "Content-Type: application/json" \
  -d '{
    "number": "+18005551234",
    "intent": "dispute Amazon charge from December 15th",
    "call_flow_id": "chase_bank_main",
    "transfer_to": "sip_phone"
  }'
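
The same request can be built from Python with only the standard library. This is a sketch: the helper name is hypothetical, and treating `call_flow_id` and `transfer_to` as optional is an assumption. Pass the host and port that uvicorn was started on as `base_url`.

```python
import json
import urllib.request
from typing import Optional


def build_hold_slayer_request(base_url: str, number: str, intent: str,
                              call_flow_id: Optional[str] = None,
                              transfer_to: Optional[str] = None
                              ) -> urllib.request.Request:
    """Build (but do not send) the POST for /api/calls/hold-slayer."""
    payload = {"number": number, "intent": intent}
    if call_flow_id is not None:
        payload["call_flow_id"] = call_flow_id
    if transfer_to is not None:
        payload["transfer_to"] = transfer_to
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/calls/hold-slayer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_hold_slayer_request(
    "http://localhost:8100",
    number="+18005551234",
    intent="dispute Amazon charge from December 15th",
    call_flow_id="chase_bank_main",
    transfer_to="sip_phone",
)
# To send it: urllib.request.urlopen(req) and read the JSON response.
```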

Check call status:

curl http://localhost:8100/api/calls/call_abc123

Browse call history (persisted in the database):

curl "http://localhost:8100/api/calls/history?limit=50"
curl http://localhost:8100/api/calls/call_abc123/transcript
curl -O http://localhost:8100/api/calls/call_abc123/recording   # WAV

Create a smart-routing rule:

curl -X POST http://localhost:8100/api/routing/rules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Block tollfree at night",
    "priority": 10,
    "enabled": true,
    "match": {
      "caller_pattern": "+1800*",
      "time_range": {"start": "22:00", "end": "06:00", "tz": "America/Toronto", "days": [0,1,2,3,4,5,6]}
    },
    "action": {"type": "reject", "message": "Office is closed."}
  }'

Toggle Do Not Disturb on a device:

curl -X PATCH http://localhost:8100/api/routing/devices/dev_abc123/dnd \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

WebSocket — Real-Time Events

const ws = new WebSocket("ws://localhost:8100/ws/events");
ws.onmessage = (msg) => {
  const event = JSON.parse(msg.data);
  // event.type: "human_detected", "hold_detected", "ivr_step", etc.
  // event.call_id: which call this is about
  // event.data: type-specific payload
};

MCP — AI Assistant Integration

The MCP server exposes 10 tools that any MCP-compatible assistant can use:

| Tool | Description |
| --- | --- |
| `make_call` | Dial a number through the SIP trunk |
| `end_call` | Hang up an active call |
| `send_dtmf` | Send touch-tone digits to navigate menus |
| `get_call_status` | Check current state of a call |
| `get_call_transcript` | Get live transcript of a call |
| `get_call_recording` | Get recording metadata and file path |
| `list_active_calls` | List all calls in progress |
| `get_call_summary` | Analytics summary (hold times, success rates) |
| `search_call_history` | Search past calls by number or company |
| `learn_call_flow` | Build a reusable call flow from exploration data |

How It Works

Outbound (Hold Slayer)

1. You request a call: via REST API, MCP tool, or dashboard.
2. Gateway dials out: Sippy B2BUA places the call through your SIP trunk.
3. Audio classifier listens: real-time waveform analysis detects IVR prompts, hold music, ringing, silence, and live speech.
4. Transcription runs: Speaches/Whisper converts audio to text in real time.
5. IVR navigator decides: if a stored call flow exists, it follows the steps (including SPEAK steps that synthesize speech via Rhema TTS). If not, the LLM analyzes the transcript and picks the right menu option.
6. Hold detection: when hold music is detected, the system waits and monitors for transitions.
7. Human detection: the classifier detects the transition from music/silence to live speech.
8. Transfer: your desk phone rings. Pick up and you're talking to the agent. Zero hold time.
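
The hold-to-human transition can be sketched as a small debounce over per-frame classifier labels: once hold music has been seen, a run of consecutive speech frames is treated as a live person. The label strings and the `min_speech_frames` threshold are assumptions for illustration, not values from the project.

```python
from typing import Iterable, Optional


def find_human_pickup(labels: Iterable[str],
                      min_speech_frames: int = 3) -> Optional[int]:
    """Return the frame index at which a human is declared, or None."""
    on_hold = False
    speech_run = 0
    for i, label in enumerate(labels):
        if label == "music":
            # Hold music (re)starts: we are on hold, reset any speech run.
            on_hold = True
            speech_run = 0
        elif label == "speech" and on_hold:
            speech_run += 1
            if speech_run >= min_speech_frames:
                return i
        else:
            # Silence, tones, or speech before any hold music (IVR prompts).
            speech_run = 0
    return None


frames = ["speech", "music", "music", "speech", "music",
          "speech", "speech", "speech"]
pickup_index = find_human_pickup(frames)
```

Requiring several speech frames in a row avoids ringing the desk phone on a single misclassified frame. Recorded announcements that interrupt hold music would still need extra handling, for example by checking the transcript.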

Inbound (AI Receptionist + Smart Routing)

1. SIP INVITE arrives: Sippy surfaces it to the gateway instead of auto-answering.
2. Routing rules evaluate: caller pattern, DNIS, and time-of-day rules run in priority order. A reject or dnd action declines the call immediately.
3. Receptionist answers: TTS plays the greeting; the call's audio tap captures the caller's response.
4. Intent capture: the utterance is transcribed and the LLM extracts intent, urgency, and a recommended action (ring / message / reject).
5. Final decision: routing rules win on conflict; otherwise the LLM's recommendation is followed.
6. Route or take a message: ring_chain tries devices in priority order (skipping any in DND). If nobody picks up, or the action is take_message, the receptionist records up to 90s, transcribes it, and emits a RECEPTIONIST_MESSAGE_SAVED event.
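
The final-decision and ring-chain steps reduce to two small pieces of logic, sketched below. The `Device` fields, the function names, and the assumption that a lower priority number rings first are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    priority: int      # assumption: lower number rings first
    dnd: bool = False


def final_action(rule_action: Optional[str], llm_action: str) -> str:
    """Rules win on conflict; the LLM is used only when no rule matched."""
    return rule_action if rule_action is not None else llm_action


def ring_order(devices: list) -> list:
    """Names of devices to ring, in priority order, skipping any in DND."""
    available = [d for d in devices if not d.dnd]
    return [d.name for d in sorted(available, key=lambda d: d.priority)]


devices = [
    Device("softphone", priority=2),
    Device("desk_phone", priority=1),
    Device("mobile", priority=3, dnd=True),
]
chain = ring_order(devices)
```

If `ring_order` returns an empty list, the natural fallback in this model is take_message.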

Configuration

All configuration is via environment variables (see .env.example):

| Variable | Description | Default |
| --- | --- | --- |
| `SIP_TRUNK_HOST` | Your SIP provider hostname | (none) |
| `SIP_TRUNK_USERNAME` | SIP auth username | (none) |
| `SIP_TRUNK_PASSWORD` | SIP auth password | (none) |
| `SIP_TRUNK_DID` | Your phone number (E.164) | (none) |
| `GATEWAY_SIP_PORT` | Port for device registration | `5080` |
| `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` |
| `LLM_BASE_URL` | OpenAI-compatible LLM endpoint | `http://localhost:11434/v1` |
| `LLM_MODEL` | Model name for IVR analysis | `llama3` |
| `TTS_BASE_URL` | Rhema TTS endpoint (OpenAI-compatible) | `http://localhost:8000` |
| `TTS_MODEL` | TTS model ID | `speaches-ai/Kokoro-82M-v1.0-ONNX` |
| `TTS_VOICE` | Default Kokoro voice | `af_heart` |
| `TTS_API_KEY` | Optional bearer token for Rhema | (none) |
| `RECEPTIONIST_ENABLED` | Answer inbound calls with the AI receptionist | `true` |
| `RECEPTIONIST_GREETING_TEMPLATE` | Spoken greeting | `"Hi, you've reached Robert's line. Who's calling, and what's this about?"` |
| `RECEPTIONIST_MESSAGE_MAX_SECONDS` | Voicemail cap | `90` |
| `DATABASE_URL` | PostgreSQL or SQLite connection | SQLite fallback |
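
config.py is described as Pydantic settings. The sketch below uses only the standard library to show how a few of these variables and their documented defaults map to typed values; the `Settings` class and `load_settings` function are hypothetical and cover only a subset of the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    sip_trunk_host: str
    gateway_sip_port: int
    speaches_url: str
    llm_base_url: str
    llm_model: str
    receptionist_enabled: bool
    receptionist_message_max_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, applying the documented defaults."""
    return Settings(
        # No default in the table: raises KeyError if unset.
        sip_trunk_host=env["SIP_TRUNK_HOST"],
        gateway_sip_port=int(env.get("GATEWAY_SIP_PORT", "5080")),
        speaches_url=env.get("SPEACHES_URL", "http://localhost:22070"),
        llm_base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        llm_model=env.get("LLM_MODEL", "llama3"),
        receptionist_enabled=env.get("RECEPTIONIST_ENABLED", "true").lower()
        in {"1", "true", "yes"},
        receptionist_message_max_seconds=int(
            env.get("RECEPTIONIST_MESSAGE_MAX_SECONDS", "90")
        ),
    )


settings = load_settings({"SIP_TRUNK_HOST": "sip.example.com",
                          "RECEPTIONIST_ENABLED": "false"})
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.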

Tech Stack

Python 3.13 + asyncio — Single-process async architecture
FastAPI — REST API + WebSocket server
SvelteKit — Dashboard UI (built static, served by FastAPI at /dashboard)
Sippy B2BUA — SIP call control and DTMF
PJSUA2 — Media pipeline, conference bridge, recording, WAV playback
Speaches (Whisper) — Speech-to-text
Rhema (Kokoro) — Text-to-speech (OpenAI-compatible /v1/audio/speech)
Ollama / vLLM / OpenAI — LLM for IVR menu analysis and receptionist intent capture
SQLAlchemy — Async database (PostgreSQL or SQLite)
MCP (Model Context Protocol) — AI assistant integration

Documentation

Full documentation is in /docs:

Architecture — System design, data flow, threading model
Core Engine — SIP engine, media pipeline, call manager, event bus
Hold Slayer Service — IVR navigation, hold detection, human detection
Audio Classifier — Waveform analysis, feature extraction, classification
Services — LLM client, transcription, recording, analytics, notifications
Call Flows — Call flow model, step types, auto-learner
API Reference — REST endpoints, WebSocket, request/response schemas
MCP Server — MCP tools and resources for AI assistants
Configuration — All environment variables, deployment options
Development — Setup, testing, contributing

Build Phases

Phase 1: Core Engine ✅

Extract EventBus to dedicated module with typed filtering
Implement Sippy B2BUA SIP engine (signaling, DTMF, bridging)
Implement PJSUA2 media pipeline (conference bridge, audio tapping, recording)
Call manager with active call state tracking
Gateway orchestrator wiring all components

Phase 2: Intelligence Layer ✅

LLM client (OpenAI-compatible — Ollama, vLLM, LM Studio, OpenAI)
Hold Slayer IVR navigation with LLM fallback for LISTEN steps
Call Flow Learner — auto-builds reusable IVR trees from exploration
Recording service with date-organized WAV storage
Call analytics with hold time stats, per-company patterns
Audio classifier with spectral analysis, DTMF detection, hold-to-human transition

Phase 3: API & Integration ✅

REST API — calls, call flows, devices, DTMF
WebSocket real-time event streaming
MCP server with 10 tools + 3 resources
Notification service (WebSocket + SMS)
Service wiring in main.py lifespan
75 passing tests across 4 test files

Phase 4: Production Hardening 🔜

Alembic database migrations
API authentication (API keys / JWT)
Rate limiting on API endpoints
Structured JSON logging
Health check endpoints for all dependencies
Graceful degradation (classifier works without STT, etc.)
Docker Compose (Hold Slayer + PostgreSQL)

Phase 5: Additional Services 🚧

AI Receptionist — answer inbound calls, screen callers, take messages
Smart Routing — time-of-day rules, device priority, DND
TTS/Speech — play prompts into calls (SPEAK step support, Rhema/Kokoro)
Spam Filter — detect robocalls using caller ID + audio patterns
Noise Cancellation — RNNoise integration in media pipeline

Phase 6: Dashboard & UX 🚧

Web dashboard with real-time call monitor
Call history with transcript playback (click-to-seek)
Routing rules editor + per-device DND toggles
Call flow visual editor (drag-and-drop IVR tree builder)
Analytics dashboard with hold time graphs
Mobile app (or PWA) for on-the-go control

License

MIT
