# Hold Slayer 🔥

**An AI-powered telephony gateway that calls companies, navigates IVR menus, waits on hold, and transfers you when a human picks up.**

You give it a phone number and an intent ("dispute a charge on my December statement"). It dials the number through your SIP trunk, navigates the phone tree, sits through the hold music, and rings your desk phone the instant a live person answers. You never hear Vivaldi again.

> [!CAUTION]
> **Emergency calling — 911**
> Hold Slayer passes `911` and `9911` directly to the PSTN trunk.
> **Your SIP trunk provider must support E911 on your DID and have your
> correct registered location on file before this system is put into
> service.** VoIP emergency calls are location-dependent — verify
> with your provider. Do not rely on this system as your only means
> of reaching emergency services.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Server                          │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐    │
│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │    │
│  │ /api/*   │  │ /ws/*    │  │ (SSE)     │  │  /dashboard  │    │
│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘    │
│       │             │              │                            │
│  ┌────┴─────────────┴──────────────┴────┐                       │
│  │              Event Bus               │                       │
│  │  (asyncio Queue pub/sub per client)  │                       │
│  └────┬──────────────┬─────────────┬────┘                       │
│       │              │             │                            │
│  ┌────┴─────┐  ┌─────┴─────┐  ┌────┴──────────┐                 │
│  │   Call   │  │   Hold    │  │   Services    │                 │
│  │ Manager  │  │  Slayer   │  │ (LLM, STT,    │                 │
│  │          │  │           │  │  Recording,   │                 │
│  │          │  │           │  │  Analytics,   │                 │
│  │          │  │           │  │  Notify)      │                 │
│  └────┬─────┘  └─────┬─────┘  └───────────────┘                 │
│       │              │                                          │
│  ┌────┴──────────────┴──────────────────┐                       │
│  │          Sippy B2BUA Engine          │                       │
│  │ (SIP calls, DTMF, conference bridge) │                       │
│  └────┬─────────────────────────────────┘                       │
│       │                                                         │
└───────┼─────────────────────────────────────────────────────────┘
        │
   ┌────┴────┐
   │SIP Trunk│ ──→ PSTN
   └─────────┘
```

## What's Implemented

### Core Engine

- **Sippy B2BUA Engine** (`core/sippy_engine.py`) —
SIP call control, DTMF, bridging, conference, trunk registration
- **PJSUA2 Media Pipeline** (`core/media_pipeline.py`) — Audio routing, recording ports, conference bridge, WAV playback
- **Call Manager** (`core/call_manager.py`) — Active call state tracking, lifecycle management
- **Event Bus** (`core/event_bus.py`) — Async pub/sub with per-subscriber queues, type filtering, history

### Hold Slayer

- **IVR Navigation** (`services/hold_slayer.py`) — Follows stored call flows step-by-step through phone menus
- **Audio Classifier** (`services/audio_classifier.py`) — Real-time waveform analysis: silence, tones, DTMF, music, speech detection
- **Call Flow Learner** (`services/call_flow_learner.py`) — Builds reusable call flows from exploration data, merges new discoveries
- **LLM Fallback** — When a LISTEN step has no hardcoded DTMF, the LLM analyzes the transcript and picks the right menu option

### Intelligence Layer

- **LLM Client** (`services/llm_client.py`) — OpenAI-compatible API client (Ollama, vLLM, LM Studio, OpenAI) with JSON parsing, retry, stats
- **Transcription** (`services/transcription.py`) — Speaches/Whisper STT integration for live call transcription
- **Recording** (`services/recording.py`) — WAV recording with date-organized storage, dual-channel support
- **Call Analytics** (`services/call_analytics.py`) — Hold time stats, success rates, per-company patterns, time-of-day trends
- **Notifications** (`services/notification.py`) — WebSocket + SMS alerts for human detection, call failures, hold status

### API Surface

- **REST API** — Call management, device registration, call flow CRUD, service configuration
- **WebSocket** — Real-time call events, transcripts, classification updates
- **MCP Server** — 10 tools for AI assistant integration (make calls, send DTMF, get transcripts, manage flows)

### Data Models

- **Call** — Active call state with classification history, transcript chunks, hold time tracking
- **Call Flow** — Stored IVR trees with steps
(DTMF, LISTEN, HOLD, TRANSFER, SPEAK)
- **Events** — 20+ typed events (call lifecycle, hold slayer, audio, device, system)
- **Device** — SIP phone/softphone registration and routing
- **Contact** — Phone number management with routing preferences

## Project Structure

```
hold-slayer/
├── main.py                      # FastAPI app + lifespan (service wiring)
├── config.py                    # Pydantic settings from .env
├── core/
│   ├── gateway.py               # Top-level gateway orchestrator
│   ├── sippy_engine.py          # Sippy B2BUA SIP engine
│   ├── media_pipeline.py        # PJSUA2 audio routing
│   ├── call_manager.py          # Active call state management
│   └── event_bus.py             # Async pub/sub event bus
├── services/
│   ├── hold_slayer.py           # IVR navigation + hold detection
│   ├── audio_classifier.py      # Waveform analysis (music/speech/DTMF)
│   ├── call_flow_learner.py     # Auto-learns IVR trees from calls
│   ├── llm_client.py            # OpenAI-compatible LLM client
│   ├── transcription.py         # Speaches/Whisper STT
│   ├── recording.py             # Call recording management
│   ├── call_analytics.py        # Call metrics and insights
│   └── notification.py          # WebSocket + SMS notifications
├── api/
│   ├── calls.py                 # Call management endpoints
│   ├── call_flows.py            # Call flow CRUD
│   ├── devices.py               # Device registration
│   ├── websocket.py             # Real-time event stream
│   └── deps.py                  # FastAPI dependency injection
├── mcp_server/
│   └── server.py                # MCP tools + resources (10 tools)
├── models/
│   ├── call.py                  # Call state models
│   ├── call_flow.py             # IVR tree models
│   ├── events.py                # Event type definitions
│   ├── device.py                # Device models
│   └── contact.py               # Contact models
├── db/
│   └── database.py              # SQLAlchemy async (PostgreSQL/SQLite)
└── tests/
    ├── test_audio_classifier.py # 18 tests — waveform analysis
    ├── test_call_flows.py       # 10 tests — call flow models
    ├── test_hold_slayer.py      # 20 tests — IVR nav, EventBus, CallManager
    └── test_services.py         # 27 tests — LLM, notifications, recording,
                                 #   analytics, learner, EventBus
```

## Quick Start

### 1. Install

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

### 2. Configure

```bash
cp .env.example .env
# Edit .env with your SIP trunk credentials, LLM endpoint, etc.
```

### 3. Run

```bash
uvicorn main:app --host 0.0.0.0 --port 8100
```

### 4. Test

```bash
pytest tests/ -v
```

## Usage

The examples below assume the server is listening on port 8100, as started in the Quick Start.

### REST API

**Launch Hold Slayer on a number:**

```bash
curl -X POST http://localhost:8100/api/calls/hold-slayer \
  -H "Content-Type: application/json" \
  -d '{
    "number": "+18005551234",
    "intent": "dispute Amazon charge from December 15th",
    "call_flow_id": "chase_bank_main",
    "transfer_to": "sip_phone"
  }'
```

**Check call status:**

```bash
curl http://localhost:8100/api/calls/call_abc123
```

### WebSocket — Real-Time Events

```javascript
const ws = new WebSocket("ws://localhost:8100/ws/events");
ws.onmessage = (msg) => {
  const event = JSON.parse(msg.data);
  // event.type: "human_detected", "hold_detected", "ivr_step", etc.
  // event.call_id: which call this is about
  // event.data: type-specific payload
};
```

### MCP — AI Assistant Integration

The MCP server exposes 10 tools that any MCP-compatible assistant can use:

| Tool | Description |
|------|-------------|
| `make_call` | Dial a number through the SIP trunk |
| `end_call` | Hang up an active call |
| `send_dtmf` | Send touch-tone digits to navigate menus |
| `get_call_status` | Check current state of a call |
| `get_call_transcript` | Get live transcript of a call |
| `get_call_recording` | Get recording metadata and file path |
| `list_active_calls` | List all calls in progress |
| `get_call_summary` | Analytics summary (hold times, success rates) |
| `search_call_history` | Search past calls by number or company |
| `learn_call_flow` | Build a reusable call flow from exploration data |

## How It Works

1. **You request a call** — via REST API, MCP tool, or dashboard
2. **Gateway dials out** — Sippy B2BUA places the call through your SIP trunk
3. **Audio classifier listens** — Real-time waveform analysis detects IVR prompts, hold music, ringing, silence, and live speech
4. **Transcription runs** — Speaches/Whisper converts audio to text in real time
5. **IVR navigator decides** — If a stored call flow exists, it follows the steps. If not, the LLM analyzes the transcript and picks the right menu option
6. **Hold detection** — When hold music is detected, the system waits and monitors for transitions
7. **Human detection** — The classifier detects the transition from music/silence to live speech
8. **Transfer** — Your desk phone rings. Pick up and you're talking to the agent. Zero hold time.

## Configuration

All configuration is via environment variables (see `.env.example`):

| Variable | Description | Default |
|----------|-------------|---------|
| `SIP_TRUNK_HOST` | Your SIP provider hostname | — |
| `SIP_TRUNK_USERNAME` | SIP auth username | — |
| `SIP_TRUNK_PASSWORD` | SIP auth password | — |
| `SIP_TRUNK_DID` | Your phone number (E.164) | — |
| `GATEWAY_SIP_PORT` | Port for device registration | `5080` |
| `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` |
| `LLM_BASE_URL` | OpenAI-compatible LLM endpoint | `http://localhost:11434/v1` |
| `LLM_MODEL` | Model name for IVR analysis | `llama3` |
| `DATABASE_URL` | PostgreSQL or SQLite connection | SQLite fallback |

## Tech Stack

- **Python 3.13** + **asyncio** — Single-process async architecture
- **FastAPI** — REST API + WebSocket server
- **Sippy B2BUA** — SIP call control and DTMF
- **PJSUA2** — Media pipeline, conference bridge, recording
- **Speaches** (Whisper) — Speech-to-text
- **Ollama / vLLM / OpenAI** — LLM for IVR menu analysis
- **SQLAlchemy** — Async database (PostgreSQL or SQLite)
- **MCP (Model Context Protocol)** — AI assistant integration

## Documentation

Full documentation is in [`/docs`](docs/README.md):

- [Architecture](docs/architecture.md) — System design, data flow, threading model
- [Core
Engine](docs/core-engine.md) — SIP engine, media pipeline, call manager, event bus
- [Hold Slayer Service](docs/hold-slayer-service.md) — IVR navigation, hold detection, human detection
- [Audio Classifier](docs/audio-classifier.md) — Waveform analysis, feature extraction, classification
- [Services](docs/services.md) — LLM client, transcription, recording, analytics, notifications
- [Call Flows](docs/call-flows.md) — Call flow model, step types, auto-learner
- [API Reference](docs/api-reference.md) — REST endpoints, WebSocket, request/response schemas
- [MCP Server](docs/mcp-server.md) — MCP tools and resources for AI assistants
- [Configuration](docs/configuration.md) — All environment variables, deployment options
- [Development](docs/development.md) — Setup, testing, contributing

## Build Phases

### Phase 1: Core Engine ✅

- [x] Extract EventBus to dedicated module with typed filtering
- [x] Implement Sippy B2BUA SIP engine (signaling, DTMF, bridging)
- [x] Implement PJSUA2 media pipeline (conference bridge, audio tapping, recording)
- [x] Call manager with active call state tracking
- [x] Gateway orchestrator wiring all components

### Phase 2: Intelligence Layer ✅

- [x] LLM client (OpenAI-compatible — Ollama, vLLM, LM Studio, OpenAI)
- [x] Hold Slayer IVR navigation with LLM fallback for LISTEN steps
- [x] Call Flow Learner — auto-builds reusable IVR trees from exploration
- [x] Recording service with date-organized WAV storage
- [x] Call analytics with hold time stats, per-company patterns
- [x] Audio classifier with spectral analysis, DTMF detection, hold-to-human transition

### Phase 3: API & Integration ✅

- [x] REST API — calls, call flows, devices, DTMF
- [x] WebSocket real-time event streaming
- [x] MCP server with 10 tools + 3 resources
- [x] Notification service (WebSocket + SMS)
- [x] Service wiring in main.py lifespan
- [x] 75 passing tests across 4 test files

### Phase 4: Production Hardening 🔜

- [ ] Alembic database migrations
- [ ] API
authentication (API keys / JWT)
- [ ] Rate limiting on API endpoints
- [ ] Structured JSON logging
- [ ] Health check endpoints for all dependencies
- [ ] Graceful degradation (classifier works without STT, etc.)
- [ ] Docker Compose (Hold Slayer + PostgreSQL + Speaches + Ollama)

### Phase 5: Additional Services 🔮

- [ ] AI Receptionist — answer inbound calls, screen callers, take messages
- [ ] Spam Filter — detect robocalls using caller ID + audio patterns
- [ ] Smart Routing — time-of-day rules, device priority, DND
- [ ] Noise Cancellation — RNNoise integration in media pipeline
- [ ] TTS/Speech — play prompts into calls (SPEAK step support)

### Phase 6: Dashboard & UX 🔮

- [ ] Web dashboard with real-time call monitor
- [ ] Call flow visual editor (drag-and-drop IVR tree builder)
- [ ] Call history with transcript playback
- [ ] Analytics dashboard with hold time graphs
- [ ] Mobile app (or PWA) for on-the-go control

## License

MIT