feat: add initial Hold Slayer AI telephony gateway implementation
Complete project scaffolding and core implementation of an AI-powered telephony system that calls companies, navigates IVR menus, waits on hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
320
README.md
@@ -1,2 +1,320 @@
# hold-slayer
# Hold Slayer 🔥

**An AI-powered telephony gateway that calls companies, navigates IVR menus, waits on hold, and transfers you when a human picks up.**

You give it a phone number and an intent ("dispute a charge on my December statement"). It dials the number through your SIP trunk, navigates the phone tree, sits through the hold music, and rings your desk phone the instant a live person answers. You never hear Vivaldi again.

> [!CAUTION]
> **Emergency calling — 911**
> Hold Slayer passes `911` and `9911` directly to the PSTN trunk.
> **Your SIP trunk provider must support E911 on your DID and have your
> correct registered location on file before this system is put into
> service.** VoIP emergency calls are location-dependent — verify
> with your provider. Do not rely on this system as your only means
> of reaching emergency services.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Server                          │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐    │
│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │    │
│  │ /api/*   │  │ /ws/*    │  │  (SSE)    │  │  /dashboard  │    │
│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘    │
│       │             │              │                            │
│  ┌────┴─────────────┴──────────────┴────┐                       │
│  │              Event Bus               │                       │
│  │  (asyncio Queue pub/sub per client)  │                       │
│  └────┬──────────────┬─────────────┬────┘                       │
│       │              │             │                            │
│  ┌────┴─────┐  ┌─────┴─────┐  ┌────┴──────────┐                 │
│  │   Call   │  │   Hold    │  │  Services     │                 │
│  │  Manager │  │  Slayer   │  │  (LLM, STT,   │                 │
│  │          │  │           │  │  Recording,   │                 │
│  │          │  │           │  │  Analytics,   │                 │
│  │          │  │           │  │  Notify)      │                 │
│  └────┬─────┘  └─────┬─────┘  └───────────────┘                 │
│       │              │                                          │
│  ┌────┴──────────────┴──────────────────┐                       │
│  │          Sippy B2BUA Engine          │                       │
│  │ (SIP calls, DTMF, conference bridge) │                       │
│  └────┬─────────────────────────────────┘                       │
│       │                                                         │
└───────┼─────────────────────────────────────────────────────────┘
        │
   ┌────┴────┐
   │SIP Trunk│ ──→ PSTN
   └─────────┘
```

## What's Implemented

### Core Engine

- **Sippy B2BUA Engine** (`core/sippy_engine.py`) — SIP call control, DTMF, bridging, conference, trunk registration
- **PJSUA2 Media Pipeline** (`core/media_pipeline.py`) — Audio routing, recording ports, conference bridge, WAV playback
- **Call Manager** (`core/call_manager.py`) — Active call state tracking, lifecycle management
- **Event Bus** (`core/event_bus.py`) — Async pub/sub with per-subscriber queues, type filtering, history
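
To make the event bus description concrete, here is a minimal sketch of an asyncio pub/sub with one queue per subscriber, an optional type filter, and a bounded history. The class and method names are illustrative assumptions, not the actual contents of `core/event_bus.py`.

```python
import asyncio
from collections import deque
from typing import Optional


class EventBus:
    """Illustrative async pub/sub: one queue per subscriber, optional type filter."""

    def __init__(self, history_size: int = 100) -> None:
        self._subscribers = []  # list of (queue, allowed_types or None)
        self._history = deque(maxlen=history_size)

    def subscribe(self, types: Optional[set] = None) -> asyncio.Queue:
        """Register a subscriber; types=None means "receive every event"."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append((queue, frozenset(types) if types else None))
        return queue

    def publish(self, event: dict) -> None:
        """Record the event, then fan it out to every matching subscriber."""
        self._history.append(event)
        for queue, types in self._subscribers:
            if types is None or event["type"] in types:
                queue.put_nowait(event)

    def history(self) -> list:
        """Return the most recent events, oldest first."""
        return list(self._history)


async def demo() -> list:
    bus = EventBus()
    humans_only = bus.subscribe({"human_detected"})
    bus.publish({"type": "hold_detected", "call_id": "call_abc123"})
    bus.publish({"type": "human_detected", "call_id": "call_abc123"})
    event = await humans_only.get()  # the hold event was filtered out
    return [event["type"], len(bus.history())]
```

Giving each subscriber its own queue means a slow WebSocket client only delays itself; the publisher never blocks on it.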

### Hold Slayer

- **IVR Navigation** (`services/hold_slayer.py`) — Follows stored call flows step-by-step through phone menus
- **Audio Classifier** (`services/audio_classifier.py`) — Real-time waveform analysis: silence, tones, DTMF, music, speech detection
- **Call Flow Learner** (`services/call_flow_learner.py`) — Builds reusable call flows from exploration data, merges new discoveries
- **LLM Fallback** — When a LISTEN step has no hardcoded DTMF, the LLM analyzes the transcript and picks the right menu option
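
As one illustration of the kind of waveform analysis listed above, DTMF digits can be detected with the Goertzel algorithm, which measures signal power at a handful of known frequencies. This is a sketch of that standard technique, not necessarily what `services/audio_classifier.py` does; the function names, the 205-sample frame, and the `min_power` threshold are assumptions.

```python
import math

# Standard DTMF row/column frequencies (Hz) and the keypad they index.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Signal power at a single frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate=8000, min_power=1.0):
    """Return the keypad digit in this frame, or None if no tone pair is present."""
    row_powers = [goertzel_power(samples, f, sample_rate) for f in ROW_FREQS]
    col_powers = [goertzel_power(samples, f, sample_rate) for f in COL_FREQS]
    row = max(range(4), key=row_powers.__getitem__)
    col = max(range(4), key=col_powers.__getitem__)
    # min_power is an assumed threshold; a real detector would tune it and
    # also check the ratio between row and column power.
    if row_powers[row] < min_power or col_powers[col] < min_power:
        return None
    return KEYPAD[row][col]


def synth_tone(freqs, n=205, sample_rate=8000):
    """Synthesize a test frame containing the given sine frequencies."""
    return [
        0.5 * sum(math.sin(2.0 * math.pi * f * i / sample_rate) for f in freqs)
        for i in range(n)
    ]
```

For example, `detect_dtmf(synth_tone((770, 1336)))` identifies the digit `5`, while a frame of silence yields `None`.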

### Intelligence Layer

- **LLM Client** (`services/llm_client.py`) — OpenAI-compatible API client (Ollama, vLLM, LM Studio, OpenAI) with JSON parsing, retry, stats
- **Transcription** (`services/transcription.py`) — Speaches/Whisper STT integration for live call transcription
- **Recording** (`services/recording.py`) — WAV recording with date-organized storage, dual-channel support
- **Call Analytics** (`services/call_analytics.py`) — Hold time stats, success rates, per-company patterns, time-of-day trends
- **Notifications** (`services/notification.py`) — WebSocket + SMS alerts for human detection, call failures, hold status
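
The LLM client's JSON parsing matters because model replies are often wrapped in prose or code fences. The sketch below shows one way a menu-selection reply could be requested and validated. The prompt wording, the `dtmf` field name, and both function names are hypothetical, not taken from `services/llm_client.py`.

```python
import json
import re

VALID_DTMF = set("0123456789*#")


def build_ivr_prompt(transcript: str, intent: str) -> str:
    """Build a prompt asking the model to pick a menu option (wording is illustrative)."""
    return (
        "You are navigating a phone menu.\n"
        f"Caller intent: {intent}\n"
        f"Menu transcript: {transcript}\n"
        'Reply with JSON only, for example {"dtmf": "2", "reason": "billing"}.'
    )


def parse_ivr_reply(reply: str):
    """Extract the DTMF digits from a model reply, or return None if unusable."""
    # Grab the outermost {...} so surrounding prose or code fences are ignored.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    digits = str(payload.get("dtmf", ""))
    # Only accept keys that exist on a phone keypad.
    if digits and set(digits) <= VALID_DTMF:
        return digits
    return None
```

Returning `None` for anything malformed lets the caller retry or fall back, rather than pressing an arbitrary key.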

### API Surface

- **REST API** — Call management, device registration, call flow CRUD, service configuration
- **WebSocket** — Real-time call events, transcripts, classification updates
- **MCP Server** — 10 tools for AI assistant integration (make calls, send DTMF, get transcripts, manage flows)

### Data Models

- **Call** — Active call state with classification history, transcript chunks, hold time tracking
- **Call Flow** — Stored IVR trees with steps (DTMF, LISTEN, HOLD, TRANSFER, SPEAK)
- **Events** — 20+ typed events (call lifecycle, hold slayer, audio, device, system)
- **Device** — SIP phone/softphone registration and routing
- **Contact** — Phone number management with routing preferences
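
For orientation, a call flow with the five step types named above could be modeled roughly as follows. This sketch uses plain dataclasses and hypothetical field names (`digits`, `description`); the real models in `models/call_flow.py` may be shaped differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class StepType(str, Enum):
    DTMF = "dtmf"
    LISTEN = "listen"
    HOLD = "hold"
    TRANSFER = "transfer"
    SPEAK = "speak"


@dataclass
class FlowStep:
    type: StepType
    digits: Optional[str] = None  # keys to press, if known (hypothetical field)
    description: str = ""


@dataclass
class CallFlow:
    id: str
    number: str
    steps: list = field(default_factory=list)

    def needs_llm(self) -> list:
        """Indexes of LISTEN steps without stored digits, where an LLM fallback would apply."""
        return [
            i for i, step in enumerate(self.steps)
            if step.type is StepType.LISTEN and not step.digits
        ]


flow = CallFlow(
    id="chase_bank_main",
    number="+18005551234",
    steps=[
        FlowStep(StepType.LISTEN, digits="2", description="main menu: billing"),
        FlowStep(StepType.LISTEN, description="sub-menu: option not yet learned"),
        FlowStep(StepType.HOLD),
        FlowStep(StepType.TRANSFER),
    ],
)
```

Here `flow.needs_llm()` returns `[1]`: only the second LISTEN step lacks stored digits.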

## Project Structure

```
hold-slayer/
├── main.py                      # FastAPI app + lifespan (service wiring)
├── config.py                    # Pydantic settings from .env
├── core/
│   ├── gateway.py               # Top-level gateway orchestrator
│   ├── sippy_engine.py          # Sippy B2BUA SIP engine
│   ├── media_pipeline.py        # PJSUA2 audio routing
│   ├── call_manager.py          # Active call state management
│   └── event_bus.py             # Async pub/sub event bus
├── services/
│   ├── hold_slayer.py           # IVR navigation + hold detection
│   ├── audio_classifier.py      # Waveform analysis (music/speech/DTMF)
│   ├── call_flow_learner.py     # Auto-learns IVR trees from calls
│   ├── llm_client.py            # OpenAI-compatible LLM client
│   ├── transcription.py         # Speaches/Whisper STT
│   ├── recording.py             # Call recording management
│   ├── call_analytics.py        # Call metrics and insights
│   └── notification.py          # WebSocket + SMS notifications
├── api/
│   ├── calls.py                 # Call management endpoints
│   ├── call_flows.py            # Call flow CRUD
│   ├── devices.py               # Device registration
│   ├── websocket.py             # Real-time event stream
│   └── deps.py                  # FastAPI dependency injection
├── mcp_server/
│   └── server.py                # MCP tools + resources (10 tools)
├── models/
│   ├── call.py                  # Call state models
│   ├── call_flow.py             # IVR tree models
│   ├── events.py                # Event type definitions
│   ├── device.py                # Device models
│   └── contact.py               # Contact models
├── db/
│   └── database.py              # SQLAlchemy async (PostgreSQL/SQLite)
└── tests/
    ├── test_audio_classifier.py # 18 tests — waveform analysis
    ├── test_call_flows.py       # 10 tests — call flow models
    ├── test_hold_slayer.py      # 20 tests — IVR nav, EventBus, CallManager
    └── test_services.py         # 27 tests — LLM, notifications, recording,
                                 #   analytics, learner, EventBus
```

## Quick Start

### 1. Install

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

### 2. Configure

```bash
cp .env.example .env
# Edit .env with your SIP trunk credentials, LLM endpoint, etc.
```

### 3. Run

```bash
uvicorn main:app --host 0.0.0.0 --port 8100
```

### 4. Test

```bash
pytest tests/ -v
```

## Usage

### REST API

**Launch Hold Slayer on a number:**

```bash
curl -X POST http://localhost:8100/api/calls/hold-slayer \
  -H "Content-Type: application/json" \
  -d '{
    "number": "+18005551234",
    "intent": "dispute Amazon charge from December 15th",
    "call_flow_id": "chase_bank_main",
    "transfer_to": "sip_phone"
  }'
```

**Check call status:**

```bash
curl http://localhost:8100/api/calls/call_abc123
```

### WebSocket — Real-Time Events

```javascript
const ws = new WebSocket("ws://localhost:8100/ws/events");
ws.onmessage = (msg) => {
  const event = JSON.parse(msg.data);
  // event.type: "human_detected", "hold_detected", "ivr_step", etc.
  // event.call_id: which call this is about
  // event.data: type-specific payload
};
```
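
A Python consumer can route these messages by `type` in the same way. The sketch below handles only the parsing and dispatch of one text frame; the WebSocket client library is deliberately left out, and the `EventDispatcher` class is a hypothetical helper rather than part of the project.

```python
import json


class EventDispatcher:
    """Route decoded event messages to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers = {}  # event type -> list of callables

    def on(self, event_type: str, handler) -> None:
        """Register handler(call_id, data) for one event type."""
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, raw: str) -> int:
        """Decode one text frame and call matching handlers; returns how many ran."""
        event = json.loads(raw)
        handlers = self._handlers.get(event.get("type"), [])
        for handler in handlers:
            handler(event.get("call_id"), event.get("data", {}))
        return len(handlers)


# Usage: collect the IDs of calls where a human picked up.
answered = []
dispatcher = EventDispatcher()
dispatcher.on("human_detected", lambda call_id, data: answered.append(call_id))
dispatcher.dispatch('{"type": "hold_detected", "call_id": "call_abc123", "data": {}}')
dispatcher.dispatch('{"type": "human_detected", "call_id": "call_abc123", "data": {}}')
```

After the two `dispatch` calls, `answered` contains only `"call_abc123"` from the `human_detected` event; the hold event had no registered handler.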

### MCP — AI Assistant Integration

The MCP server exposes 10 tools that any MCP-compatible assistant can use:

| Tool | Description |
|------|-------------|
| `make_call` | Dial a number through the SIP trunk |
| `end_call` | Hang up an active call |
| `send_dtmf` | Send touch-tone digits to navigate menus |
| `get_call_status` | Check current state of a call |
| `get_call_transcript` | Get live transcript of a call |
| `get_call_recording` | Get recording metadata and file path |
| `list_active_calls` | List all calls in progress |
| `get_call_summary` | Analytics summary (hold times, success rates) |
| `search_call_history` | Search past calls by number or company |
| `learn_call_flow` | Build a reusable call flow from exploration data |

## How It Works

1. **You request a call** — via REST API, MCP tool, or dashboard
2. **Gateway dials out** — Sippy B2BUA places the call through your SIP trunk
3. **Audio classifier listens** — Real-time waveform analysis detects IVR prompts, hold music, ringing, silence, and live speech
4. **Transcription runs** — Speaches/Whisper converts audio to text in real-time
5. **IVR navigator decides** — If a stored call flow exists, it follows the steps. If not, the LLM analyzes the transcript and picks the right menu option
6. **Hold detection** — When hold music is detected, the system waits patiently and monitors for transitions
7. **Human detection** — The classifier detects the transition from music/silence to live speech
8. **Transfer** — Your desk phone rings. Pick up and you're talking to the agent. Zero hold time.
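
The steps above amount to a small state machine driven by classifier output. The sketch below assumes hypothetical label names (`ringing`, `ivr_prompt`, `music`, `silence`, `live_speech`) and a simplified set of states; the real service tracks more detail.

```python
from enum import Enum


class CallState(Enum):
    DIALING = "dialing"
    IVR = "ivr"
    ON_HOLD = "on_hold"
    HUMAN = "human"


# (current state, classifier label) -> next state. Label names are assumptions.
TRANSITIONS = {
    (CallState.DIALING, "ivr_prompt"): CallState.IVR,
    (CallState.DIALING, "music"): CallState.ON_HOLD,
    (CallState.DIALING, "live_speech"): CallState.HUMAN,
    (CallState.IVR, "music"): CallState.ON_HOLD,
    (CallState.IVR, "live_speech"): CallState.HUMAN,
    (CallState.ON_HOLD, "ivr_prompt"): CallState.IVR,
    (CallState.ON_HOLD, "live_speech"): CallState.HUMAN,
}


def next_state(state: CallState, label: str) -> CallState:
    """Unknown (state, label) pairs, such as silence during hold, keep the state."""
    return TRANSITIONS.get((state, label), state)


def run_call(labels) -> list:
    """Feed classifier labels in order and return the sequence of states visited."""
    state = CallState.DIALING
    visited = [state]
    for label in labels:
        new_state = next_state(state, label)
        if new_state is not state:
            visited.append(new_state)
            state = new_state
        if state is CallState.HUMAN:
            break  # this is where the transfer to the desk phone would be triggered
    return visited
```

Note that silence never changes the state in this sketch, so a gap in the hold music is not mistaken for an agent picking up.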

## Configuration

All configuration is via environment variables (see `.env.example`):

| Variable | Description | Default |
|----------|-------------|---------|
| `SIP_TRUNK_HOST` | Your SIP provider hostname | — |
| `SIP_TRUNK_USERNAME` | SIP auth username | — |
| `SIP_TRUNK_PASSWORD` | SIP auth password | — |
| `SIP_TRUNK_DID` | Your phone number (E.164) | — |
| `GATEWAY_SIP_PORT` | Port for device registration | `5080` |
| `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` |
| `LLM_BASE_URL` | OpenAI-compatible LLM endpoint | `http://localhost:11434/v1` |
| `LLM_MODEL` | Model name for IVR analysis | `llama3` |
| `DATABASE_URL` | PostgreSQL or SQLite connection | SQLite fallback |
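
The project loads these through Pydantic settings in `config.py`. As a dependency-free illustration of the same table, the sketch below reads the variables with the standard library only. Treating the four SIP variables as required is an assumption drawn from their lack of a default, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: variables with no default in the table must be set.
REQUIRED = ("SIP_TRUNK_HOST", "SIP_TRUNK_USERNAME", "SIP_TRUNK_PASSWORD", "SIP_TRUNK_DID")


@dataclass(frozen=True)
class Settings:
    sip_trunk_host: str
    sip_trunk_username: str
    sip_trunk_password: str
    sip_trunk_did: str
    gateway_sip_port: int = 5080
    speaches_url: str = "http://localhost:22070"
    llm_base_url: str = "http://localhost:11434/v1"
    llm_model: str = "llama3"
    database_url: Optional[str] = None  # None: let the app use its SQLite fallback


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Settings(
        sip_trunk_host=env["SIP_TRUNK_HOST"],
        sip_trunk_username=env["SIP_TRUNK_USERNAME"],
        sip_trunk_password=env["SIP_TRUNK_PASSWORD"],
        sip_trunk_did=env["SIP_TRUNK_DID"],
        gateway_sip_port=int(env.get("GATEWAY_SIP_PORT", "5080")),
        speaches_url=env.get("SPEACHES_URL", "http://localhost:22070"),
        llm_base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        llm_model=env.get("LLM_MODEL", "llama3"),
        database_url=env.get("DATABASE_URL"),
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary.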

## Tech Stack

- **Python 3.13** + **asyncio** — Single-process async architecture
- **FastAPI** — REST API + WebSocket server
- **Sippy B2BUA** — SIP call control and DTMF
- **PJSUA2** — Media pipeline, conference bridge, recording
- **Speaches** (Whisper) — Speech-to-text
- **Ollama / vLLM / OpenAI** — LLM for IVR menu analysis
- **SQLAlchemy** — Async database (PostgreSQL or SQLite)
- **MCP (Model Context Protocol)** — AI assistant integration

## Documentation

Full documentation is in [`/docs`](docs/README.md):

- [Architecture](docs/architecture.md) — System design, data flow, threading model
- [Core Engine](docs/core-engine.md) — SIP engine, media pipeline, call manager, event bus
- [Hold Slayer Service](docs/hold-slayer-service.md) — IVR navigation, hold detection, human detection
- [Audio Classifier](docs/audio-classifier.md) — Waveform analysis, feature extraction, classification
- [Services](docs/services.md) — LLM client, transcription, recording, analytics, notifications
- [Call Flows](docs/call-flows.md) — Call flow model, step types, auto-learner
- [API Reference](docs/api-reference.md) — REST endpoints, WebSocket, request/response schemas
- [MCP Server](docs/mcp-server.md) — MCP tools and resources for AI assistants
- [Configuration](docs/configuration.md) — All environment variables, deployment options
- [Development](docs/development.md) — Setup, testing, contributing

## Build Phases

### Phase 1: Core Engine ✅

- [x] Extract EventBus to dedicated module with typed filtering
- [x] Implement Sippy B2BUA SIP engine (signaling, DTMF, bridging)
- [x] Implement PJSUA2 media pipeline (conference bridge, audio tapping, recording)
- [x] Call manager with active call state tracking
- [x] Gateway orchestrator wiring all components

### Phase 2: Intelligence Layer ✅

- [x] LLM client (OpenAI-compatible — Ollama, vLLM, LM Studio, OpenAI)
- [x] Hold Slayer IVR navigation with LLM fallback for LISTEN steps
- [x] Call Flow Learner — auto-builds reusable IVR trees from exploration
- [x] Recording service with date-organized WAV storage
- [x] Call analytics with hold time stats, per-company patterns
- [x] Audio classifier with spectral analysis, DTMF detection, hold-to-human transition

### Phase 3: API & Integration ✅

- [x] REST API — calls, call flows, devices, DTMF
- [x] WebSocket real-time event streaming
- [x] MCP server with 10 tools + 3 resources
- [x] Notification service (WebSocket + SMS)
- [x] Service wiring in main.py lifespan
- [x] 75 passing tests across 4 test files

### Phase 4: Production Hardening 🔜

- [ ] Alembic database migrations
- [ ] API authentication (API keys / JWT)
- [ ] Rate limiting on API endpoints
- [ ] Structured JSON logging
- [ ] Health check endpoints for all dependencies
- [ ] Graceful degradation (classifier works without STT, etc.)
- [ ] Docker Compose (Hold Slayer + PostgreSQL + Speaches + Ollama)

### Phase 5: Additional Services 🔮

- [ ] AI Receptionist — answer inbound calls, screen callers, take messages
- [ ] Spam Filter — detect robocalls using caller ID + audio patterns
- [ ] Smart Routing — time-of-day rules, device priority, DND
- [ ] Noise Cancellation — RNNoise integration in media pipeline
- [ ] TTS/Speech — play prompts into calls (SPEAK step support)

### Phase 6: Dashboard & UX 🔮

- [ ] Web dashboard with real-time call monitor
- [ ] Call flow visual editor (drag-and-drop IVR tree builder)
- [ ] Call history with transcript playback
- [ ] Analytics dashboard with hold time graphs
- [ ] Mobile app (or PWA) for on-the-go control

## License

MIT