feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered telephony system that calls companies, navigates IVR menus, waits on hold, and transfers to the user when a human answers.

Key components:
- FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces
- SIP/VoIP call management via PJSUA2 with RTP audio streaming
- LLM-powered IVR navigation using OpenAI/Anthropic with tool calling
- Hold detection service combining audio analysis and silence detection
- Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines
- Call recording with per-channel and mixed audio capture
- Event bus (asyncio pub/sub) for real-time client updates
- Web dashboard with live call monitoring
- SQLite persistence via SQLAlchemy with call history and analytics
- Notification support (email, SMS, webhook, desktop)
- Docker Compose deployment with Opal VoIP and Opal Media containers
- Comprehensive test suite with unit, integration, and E2E tests
- Simplified .gitignore and full project documentation in README
2026-03-21 19:23:26 +00:00
parent c9ff60702b
commit ecf37658ce
56 changed files with 11601 additions and 164 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,320 @@
-# hold-slayer
+# Hold Slayer 🔥

+**An AI-powered telephony gateway that calls companies, navigates IVR menus, waits on hold, and transfers you when a human picks up.**
+
+You give it a phone number and an intent ("dispute a charge on my December statement"). It dials the number through your SIP trunk, navigates the phone tree, sits through the hold music, and rings your desk phone the instant a live person answers. You never hear Vivaldi again.
+
+> [!CAUTION]
+> **Emergency calling — 911**
+> Hold Slayer passes `911` and `9911` directly to the PSTN trunk.
+> **Your SIP trunk provider must support E911 on your DID and have your
+> correct registered location on file before this system is put into
+> service.** VoIP emergency calls are location-dependent — verify
+> with your provider. Do not rely on this system as your only means
+> of reaching emergency services.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        FastAPI Server                           │
+│                                                                 │
+│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐    │
+│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │    │
+│  │ /api/*   │  │ /ws/*    │  │ (SSE)     │  │  /dashboard  │    │
+│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘    │
+│       │             │              │                            │
+│  ┌────┴─────────────┴──────────────┴────┐                       │
+│  │              Event Bus               │                       │
+│  │  (asyncio Queue pub/sub per client)  │                       │
+│  └────┬─────────────┬──────────────┬────┘                       │
+│       │             │              │                            │
+│  ┌────┴─────┐  ┌────┴─────┐  ┌─────┴────────┐                   │
+│  │   Call   │  │   Hold   │  │   Services   │                   │
+│  │ Manager  │  │  Slayer  │  │ (LLM, STT,   │                   │
+│  │          │  │          │  │  Recording,  │                   │
+│  │          │  │          │  │  Analytics,  │                   │
+│  │          │  │          │  │  Notify)     │                   │
+│  └────┬─────┘  └────┬─────┘  └──────────────┘                   │
+│       │             │                                           │
+│  ┌────┴─────────────┴───────────────────┐                       │
+│  │          Sippy B2BUA Engine          │                       │
+│  │ (SIP calls, DTMF, conference bridge) │                       │
+│  └────┬─────────────────────────────────┘                       │
+│       │                                                         │
+└───────┼─────────────────────────────────────────────────────────┘
+        │
+   ┌────┴────┐
+   │SIP Trunk│ ──→ PSTN
+   └─────────┘
+```
+
+## What's Implemented
+
+### Core Engine
+- **Sippy B2BUA Engine** (`core/sippy_engine.py`) — SIP call control, DTMF, bridging, conference, trunk registration
+- **PJSUA2 Media Pipeline** (`core/media_pipeline.py`) — Audio routing, recording ports, conference bridge, WAV playback
+- **Call Manager** (`core/call_manager.py`) — Active call state tracking, lifecycle management
+- **Event Bus** (`core/event_bus.py`) — Async pub/sub with per-subscriber queues, type filtering, history
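
The per-subscriber queue design can be illustrated with plain `asyncio`. This is a minimal sketch, not the code in `core/event_bus.py`: the `EventBus`, `subscribe`, and `publish` names, the dict-shaped events, and the history size are all assumptions.

```python
from __future__ import annotations

import asyncio
from collections import deque
from typing import Any


class EventBus:
    """Hypothetical pub/sub bus: one asyncio.Queue per subscriber."""

    def __init__(self, history_size: int = 100) -> None:
        # Each entry pairs a subscriber's queue with its type filter (None = all types).
        self._subscribers: list[tuple[asyncio.Queue, set[str] | None]] = []
        # Bounded history of recent events; the oldest is dropped first.
        self.history: deque[dict[str, Any]] = deque(maxlen=history_size)

    def subscribe(self, types: set[str] | None = None) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append((queue, types))
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers = [(q, t) for q, t in self._subscribers if q is not queue]

    def publish(self, event: dict[str, Any]) -> None:
        self.history.append(event)
        for queue, types in self._subscribers:
            if types is None or event["type"] in types:
                # Non-blocking: each client drains its own queue at its own pace.
                queue.put_nowait(event)


async def demo() -> tuple[int, int, str]:
    bus = EventBus()
    everything = bus.subscribe()
    humans_only = bus.subscribe({"human_detected"})
    bus.publish({"type": "hold_detected", "call_id": "call_abc123"})
    bus.publish({"type": "human_detected", "call_id": "call_abc123"})
    first = await humans_only.get()
    return everything.qsize(), humans_only.qsize(), first["type"]
```

Unbounded queues mean a stalled WebSocket client keeps accumulating events; a production version might set `maxsize` and drop or disconnect slow subscribers.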
+
+### Hold Slayer
+- **IVR Navigation** (`services/hold_slayer.py`) — Follows stored call flows step by step through phone menus
+- **Audio Classifier** (`services/audio_classifier.py`) — Real-time waveform analysis: silence, tones, DTMF, music, speech detection
+- **Call Flow Learner** (`services/call_flow_learner.py`) — Builds reusable call flows from exploration data, merges new discoveries
+- **LLM Fallback** — When a LISTEN step has no hardcoded DTMF, the LLM analyzes the transcript and picks the right menu option
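
DTMF detection in waveform analysis is commonly done with the Goertzel algorithm, which measures signal power at each of the eight DTMF frequencies. The sketch below illustrates that technique under the assumption of 8 kHz mono float samples; it is not the actual `services/audio_classifier.py` code, and `goertzel_power`, `detect_dtmf`, and the two thresholds are hypothetical.

```python
import math

LOW_FREQS = (697, 770, 852, 941)       # DTMF row tones (Hz)
HIGH_FREQS = (1209, 1336, 1477, 1633)  # DTMF column tones (Hz)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Squared magnitude of the signal's spectrum at a single frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, rate=8000, min_ratio=4.0, min_share=0.1):
    """Return the DTMF key present in one frame of samples, or None."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return None
    low = [goertzel_power(samples, f, rate) for f in LOW_FREQS]
    high = [goertzel_power(samples, f, rate) for f in HIGH_FREQS]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    for powers, best in ((low, row), (high, col)):
        runner_up = max(p for i, p in enumerate(powers) if i != best)
        # A tone of amplitude A over N samples gives power ~ (A*N/2)**2,
        # i.e. ~N*energy/4 per tone for an equal-level tone pair.
        if powers[best] < min_share * len(samples) * energy:
            return None
        # The winning tone must clearly dominate the rest of its group.
        if powers[best] < min_ratio * runner_up:
            return None
    return KEYPAD[row][col]
```

In a live classifier this would run on short frames (a few tens of milliseconds), typically requiring the same key across consecutive frames before reporting a digit.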
+
+### Intelligence Layer
+- **LLM Client** (`services/llm_client.py`) — OpenAI-compatible API client (Ollama, vLLM, LM Studio, OpenAI) with JSON parsing, retry, stats
+- **Transcription** (`services/transcription.py`) — Speaches/Whisper STT integration for live call transcription
+- **Recording** (`services/recording.py`) — WAV recording with date-organized storage, dual-channel support
+- **Call Analytics** (`services/call_analytics.py`) — Hold time stats, success rates, per-company patterns, time-of-day trends
+- **Notifications** (`services/notification.py`) — WebSocket + SMS alerts for human detection, call failures, hold status
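
"JSON parsing" in an LLM client usually means tolerating replies where the model wraps its JSON in a Markdown fence or surrounds it with prose. A minimal sketch, assuming the model was asked to answer with an object such as `{"digit": "2"}`; `extract_json` is a hypothetical helper, not the function in `services/llm_client.py`.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Matches a Markdown code fence (optionally tagged "json") and captures its body.
FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(reply: str) -> dict[str, Any] | None:
    """Return the first JSON object found in an LLM reply, or None."""
    match = FENCED.search(reply)
    text = match.group(1) if match else reply
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

Trying every `{` is quadratic in the worst case, which is acceptable for the short menu-choice replies this is meant for.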
+
+### API Surface
+- **REST API** — Call management, device registration, call flow CRUD, service configuration
+- **WebSocket** — Real-time call events, transcripts, classification updates
+- **MCP Server** — 10 tools for AI assistant integration (make calls, send DTMF, get transcripts, manage flows)
+
+### Data Models
+- **Call** — Active call state with classification history, transcript chunks, hold time tracking
+- **Call Flow** — Stored IVR trees with steps (DTMF, LISTEN, HOLD, TRANSFER, SPEAK)
+- **Events** — 20+ typed events (call lifecycle, hold slayer, audio, device, system)
+- **Device** — SIP phone/softphone registration and routing
+- **Contact** — Phone number management with routing preferences
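
A stored call flow is essentially an ordered list of typed steps. The sketch below models the five step types listed above with standard-library dataclasses; the field names (`digits`, `expect`, `target`, `text`), the `needs_llm` helper, and the example prompts are illustrative assumptions, not the schema in `models/call_flow.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class StepType(str, Enum):
    DTMF = "dtmf"          # send touch-tone digits
    LISTEN = "listen"      # wait for a prompt; the LLM picks a digit if none is stored
    HOLD = "hold"          # wait on hold until a human is detected
    TRANSFER = "transfer"  # bridge the call to the user's device
    SPEAK = "speak"        # play a spoken prompt into the call


@dataclass
class CallFlowStep:
    type: StepType
    digits: str | None = None  # DTMF digits, or a known answer for a LISTEN step
    expect: str | None = None  # phrase expected in the prompt (LISTEN)
    target: str | None = None  # device to ring (TRANSFER)
    text: str | None = None    # what to say (SPEAK)


@dataclass
class CallFlow:
    id: str
    number: str
    steps: list[CallFlowStep] = field(default_factory=list)

    def needs_llm(self) -> list[int]:
        """Indexes of LISTEN steps with no stored digits (the LLM must decide)."""
        return [i for i, s in enumerate(self.steps)
                if s.type is StepType.LISTEN and not s.digits]


flow = CallFlow(
    id="chase_bank_main",
    number="+18005551234",
    steps=[
        CallFlowStep(StepType.LISTEN, expect="for English"),
        CallFlowStep(StepType.DTMF, digits="1"),
        CallFlowStep(StepType.LISTEN, expect="account services"),
        CallFlowStep(StepType.HOLD),
        CallFlowStep(StepType.TRANSFER, target="sip_phone"),
    ],
)
print(flow.needs_llm())  # [0, 2]
```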
+
+## Project Structure
+
+```
+hold-slayer/
+├── main.py                      # FastAPI app + lifespan (service wiring)
+├── config.py                    # Pydantic settings from .env
+├── core/
+│   ├── gateway.py               # Top-level gateway orchestrator
+│   ├── sippy_engine.py          # Sippy B2BUA SIP engine
+│   ├── media_pipeline.py        # PJSUA2 audio routing
+│   ├── call_manager.py          # Active call state management
+│   └── event_bus.py             # Async pub/sub event bus
+├── services/
+│   ├── hold_slayer.py           # IVR navigation + hold detection
+│   ├── audio_classifier.py      # Waveform analysis (music/speech/DTMF)
+│   ├── call_flow_learner.py     # Auto-learns IVR trees from calls
+│   ├── llm_client.py            # OpenAI-compatible LLM client
+│   ├── transcription.py         # Speaches/Whisper STT
+│   ├── recording.py             # Call recording management
+│   ├── call_analytics.py        # Call metrics and insights
+│   └── notification.py          # WebSocket + SMS notifications
+├── api/
+│   ├── calls.py                 # Call management endpoints
+│   ├── call_flows.py            # Call flow CRUD
+│   ├── devices.py               # Device registration
+│   ├── websocket.py             # Real-time event stream
+│   └── deps.py                  # FastAPI dependency injection
+├── mcp_server/
+│   └── server.py                # MCP tools + resources (10 tools)
+├── models/
+│   ├── call.py                  # Call state models
+│   ├── call_flow.py             # IVR tree models
+│   ├── events.py                # Event type definitions
+│   ├── device.py                # Device models
+│   └── contact.py               # Contact models
+├── db/
+│   └── database.py              # SQLAlchemy async (PostgreSQL/SQLite)
+└── tests/
+    ├── test_audio_classifier.py # 18 tests — waveform analysis
+    ├── test_call_flows.py       # 10 tests — call flow models
+    ├── test_hold_slayer.py      # 20 tests — IVR nav, EventBus, CallManager
+    └── test_services.py         # 27 tests — LLM, notifications, recording,
+                                 #             analytics, learner, EventBus
+```
+
+## Quick Start
+
+### 1. Install
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]"
+```
+
+### 2. Configure
+
+```bash
+cp .env.example .env
+# Edit .env with your SIP trunk credentials, LLM endpoint, etc.
+```
+
+### 3. Run
+
+```bash
+uvicorn main:app --host 0.0.0.0 --port 8000
+```
+
+### 4. Test
+
+```bash
+pytest tests/ -v
+```
+
+## Usage
+
+### REST API
+
+**Launch Hold Slayer on a number:**
+
+```bash
+curl -X POST http://localhost:8000/api/calls/hold-slayer \
+  -H "Content-Type: application/json" \
+  -d '{
+    "number": "+18005551234",
+    "intent": "dispute Amazon charge from December 15th",
+    "call_flow_id": "chase_bank_main",
+    "transfer_to": "sip_phone"
+  }'
+```
+
+**Check call status:**
+
+```bash
+curl http://localhost:8000/api/calls/call_abc123
+```
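
The same launch request can be built from Python with only the standard library. This is a sketch: the URL and JSON fields are copied from the `curl` example above, while `build_hold_slayer_request` and `send` are hypothetical helpers. The response schema isn't documented here, so `send` simply decodes whatever JSON comes back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_hold_slayer_request(number: str, intent: str, call_flow_id: str,
                              transfer_to: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/calls/hold-slayer request."""
    payload = {
        "number": number,            # E.164 number to dial
        "intent": intent,            # free-text goal used for IVR navigation
        "call_flow_id": call_flow_id,
        "transfer_to": transfer_to,  # device to ring when a human answers
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/calls/hold-slayer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (the server must be running)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


req = build_hold_slayer_request(
    "+18005551234", "dispute Amazon charge from December 15th",
    "chase_bank_main", "sip_phone",
)
```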
+
+### WebSocket — Real-Time Events
+
+```javascript
+const ws = new WebSocket("ws://localhost:8000/ws/events");
+ws.onmessage = (msg) => {
+  const event = JSON.parse(msg.data);
+  // event.type: "human_detected", "hold_detected", "ivr_step", etc.
+  // event.call_id: which call this is about
+  // event.data: type-specific payload
+};
+```
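
A Python client would route events the same way, by `type`. A minimal sketch of that dispatch logic, independent of any WebSocket library; the `call_id` field follows the comments in the JavaScript example, and the handler names and alert strings are assumptions.

```python
from __future__ import annotations

import json
from typing import Callable

# Each handler turns one event dict into a human-readable alert string.
Handler = Callable[[dict], str]


def on_human_detected(event: dict) -> str:
    return f"Pick up: a human answered on {event['call_id']}"


def on_hold_detected(event: dict) -> str:
    return f"{event['call_id']} is on hold"


HANDLERS: dict[str, Handler] = {
    "human_detected": on_human_detected,
    "hold_detected": on_hold_detected,
}


def dispatch(raw_message: str) -> str | None:
    """Parse one WebSocket text frame and route it by its "type" field."""
    event = json.loads(raw_message)
    handler = HANDLERS.get(event.get("type"))
    # Types without a handler (e.g. "ivr_step" here) are ignored.
    return handler(event) if handler is not None else None
```

With a client library such as `websockets`, you would call `dispatch` on each message received from `ws://localhost:8000/ws/events`.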
+
+### MCP — AI Assistant Integration
+
+The MCP server exposes 10 tools that any MCP-compatible assistant can use:
+
+| Tool | Description |
+|------|-------------|
+| `make_call` | Dial a number through the SIP trunk |
+| `end_call` | Hang up an active call |
+| `send_dtmf` | Send touch-tone digits to navigate menus |
+| `get_call_status` | Check current state of a call |
+| `get_call_transcript` | Get live transcript of a call |
+| `get_call_recording` | Get recording metadata and file path |
+| `list_active_calls` | List all calls in progress |
+| `get_call_summary` | Analytics summary (hold times, success rates) |
+| `search_call_history` | Search past calls by number or company |
+| `learn_call_flow` | Build a reusable call flow from exploration data |
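
On the wire, an MCP client invokes one of these tools with a JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool `name` and its `arguments`. The sketch builds such a request for `send_dtmf`; the argument names `call_id` and `digits` are assumptions, since the tool schemas are not listed here (a client can discover them with `tools/list`).

```python
import itertools
import json

# JSON-RPC request ids only need to be unique per session.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical arguments: press 2 on an active call.
request = tool_call("send_dtmf", {"call_id": "call_abc123", "digits": "2"})
```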
+
+## How It Works
+
+1. **You request a call** — via REST API, MCP tool, or dashboard
+2. **Gateway dials out** — Sippy B2BUA places the call through your SIP trunk
+3. **Audio classifier listens** — Real-time waveform analysis detects IVR prompts, hold music, ringing, silence, and live speech
+4. **Transcription runs** — Speaches/Whisper converts audio to text in real time
+5. **IVR navigator decides** — If a stored call flow exists, it follows the steps. If not, the LLM analyzes the transcript and picks the right menu option
+6. **Hold detection** — When hold music is detected, the system waits patiently and monitors for transitions
+7. **Human detection** — The classifier detects the transition from music/silence to live speech
+8. **Transfer** — Your desk phone rings. Pick up and you're talking to the agent. Zero hold time.
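
The walkthrough above amounts to a small state machine. A sketch of the allowed transitions, assuming the phase names below (they are illustrative, not the project's actual enum) and that any live phase can end in failure if the call drops:

```python
from __future__ import annotations

from enum import Enum, auto


class Phase(Enum):
    DIALING = auto()
    IVR = auto()
    HOLD = auto()
    HUMAN_DETECTED = auto()
    TRANSFERRED = auto()
    FAILED = auto()


# Allowed next phases; TRANSFERRED and FAILED are terminal.
TRANSITIONS: dict[Phase, set[Phase]] = {
    Phase.DIALING: {Phase.IVR, Phase.HOLD, Phase.HUMAN_DETECTED, Phase.FAILED},
    Phase.IVR: {Phase.IVR, Phase.HOLD, Phase.HUMAN_DETECTED, Phase.FAILED},  # nested menus
    Phase.HOLD: {Phase.HUMAN_DETECTED, Phase.FAILED},
    Phase.HUMAN_DETECTED: {Phase.TRANSFERRED, Phase.FAILED},
    Phase.TRANSFERRED: set(),
    Phase.FAILED: set(),
}


class CallSession:
    def __init__(self) -> None:
        self.phase = Phase.DIALING
        self.history = [Phase.DIALING]

    def advance(self, nxt: Phase) -> None:
        """Move to the next phase, rejecting transitions the table forbids."""
        if nxt not in TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.name} -> {nxt.name}")
        self.phase = nxt
        self.history.append(nxt)
```

Allowing `TRANSFERRED` only from `HUMAN_DETECTED` encodes the rule that your phone rings only once a person is on the line.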
+
+## Configuration
+
+All configuration is via environment variables (see `.env.example`):
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `SIP_TRUNK_HOST` | Your SIP provider hostname | — |
+| `SIP_TRUNK_USERNAME` | SIP auth username | — |
+| `SIP_TRUNK_PASSWORD` | SIP auth password | — |
+| `SIP_TRUNK_DID` | Your phone number (E.164) | — |
+| `GATEWAY_SIP_PORT` | Port for device registration | `5080` |
+| `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` |
+| `LLM_BASE_URL` | OpenAI-compatible LLM endpoint | `http://localhost:11434/v1` |
+| `LLM_MODEL` | Model name for IVR analysis | `llama3` |
+| `DATABASE_URL` | PostgreSQL or SQLite connection URL | SQLite fallback |
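
The table maps naturally onto a typed settings object. The project itself uses Pydantic settings in `config.py`; this stdlib-only sketch just illustrates the variables and defaults above. The `Settings` class, the fail-fast check on the trunk credentials, and the concrete SQLite URL (the table only says "SQLite fallback") are assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED = ("SIP_TRUNK_HOST", "SIP_TRUNK_USERNAME", "SIP_TRUNK_PASSWORD", "SIP_TRUNK_DID")
# Hypothetical fallback URL; the real default lives in config.py.
DEFAULT_DATABASE_URL = "sqlite+aiosqlite:///./hold_slayer.db"


@dataclass(frozen=True)
class Settings:
    sip_trunk_host: str
    sip_trunk_username: str
    sip_trunk_password: str
    sip_trunk_did: str
    gateway_sip_port: int
    speaches_url: str
    llm_base_url: str
    llm_model: str
    database_url: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> Settings:
        env = os.environ if env is None else env
        missing = [name for name in REQUIRED if not env.get(name)]
        if missing:
            # The trunk settings have no default, so fail fast at startup.
            raise ValueError(f"missing required settings: {', '.join(missing)}")
        return cls(
            sip_trunk_host=env["SIP_TRUNK_HOST"],
            sip_trunk_username=env["SIP_TRUNK_USERNAME"],
            sip_trunk_password=env["SIP_TRUNK_PASSWORD"],
            sip_trunk_did=env["SIP_TRUNK_DID"],
            gateway_sip_port=int(env.get("GATEWAY_SIP_PORT", "5080")),
            speaches_url=env.get("SPEACHES_URL", "http://localhost:22070"),
            llm_base_url=env.get("LLM_BASE_URL", "http://localhost:11434/v1"),
            llm_model=env.get("LLM_MODEL", "llama3"),
            database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        )
```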
+
+## Tech Stack
+
+- **Python 3.13** + **asyncio** — Single-process async architecture
+- **FastAPI** — REST API + WebSocket server
+- **Sippy B2BUA** — SIP call control and DTMF
+- **PJSUA2** — Media pipeline, conference bridge, recording
+- **Speaches** (Whisper) — Speech-to-text
+- **Ollama / vLLM / OpenAI** — LLM for IVR menu analysis
+- **SQLAlchemy** — Async database (PostgreSQL or SQLite)
+- **MCP (Model Context Protocol)** — AI assistant integration
+
+## Documentation
+
+Full documentation is in [`/docs`](docs/README.md):
+
+- [Architecture](docs/architecture.md) — System design, data flow, threading model
+- [Core Engine](docs/core-engine.md) — SIP engine, media pipeline, call manager, event bus
+- [Hold Slayer Service](docs/hold-slayer-service.md) — IVR navigation, hold detection, human detection
+- [Audio Classifier](docs/audio-classifier.md) — Waveform analysis, feature extraction, classification
+- [Services](docs/services.md) — LLM client, transcription, recording, analytics, notifications
+- [Call Flows](docs/call-flows.md) — Call flow model, step types, auto-learner
+- [API Reference](docs/api-reference.md) — REST endpoints, WebSocket, request/response schemas
+- [MCP Server](docs/mcp-server.md) — MCP tools and resources for AI assistants
+- [Configuration](docs/configuration.md) — All environment variables, deployment options
+- [Development](docs/development.md) — Setup, testing, contributing
+
+## Build Phases
+
+### Phase 1: Core Engine ✅
+
+- [x] Extract EventBus to dedicated module with typed filtering
+- [x] Implement Sippy B2BUA SIP engine (signaling, DTMF, bridging)
+- [x] Implement PJSUA2 media pipeline (conference bridge, audio tapping, recording)
+- [x] Call manager with active call state tracking
+- [x] Gateway orchestrator wiring all components
+
+### Phase 2: Intelligence Layer ✅
+
+- [x] LLM client (OpenAI-compatible — Ollama, vLLM, LM Studio, OpenAI)
+- [x] Hold Slayer IVR navigation with LLM fallback for LISTEN steps
+- [x] Call Flow Learner — auto-builds reusable IVR trees from exploration
+- [x] Recording service with date-organized WAV storage
+- [x] Call analytics with hold time stats, per-company patterns
+- [x] Audio classifier with spectral analysis, DTMF detection, hold-to-human transition
+
+### Phase 3: API & Integration ✅
+
+- [x] REST API — calls, call flows, devices, DTMF
+- [x] WebSocket real-time event streaming
+- [x] MCP server with 10 tools + 3 resources
+- [x] Notification service (WebSocket + SMS)
+- [x] Service wiring in main.py lifespan
+- [x] 75 passing tests across 4 test files
+
+### Phase 4: Production Hardening 🔜
+
+- [ ] Alembic database migrations
+- [ ] API authentication (API keys / JWT)
+- [ ] Rate limiting on API endpoints
+- [ ] Structured JSON logging
+- [ ] Health check endpoints for all dependencies
+- [ ] Graceful degradation (classifier works without STT, etc.)
+- [ ] Docker Compose (Hold Slayer + PostgreSQL + Speaches + Ollama)
+
+### Phase 5: Additional Services 🔮
+
+- [ ] AI Receptionist — answer inbound calls, screen callers, take messages
+- [ ] Spam Filter — detect robocalls using caller ID + audio patterns
+- [ ] Smart Routing — time-of-day rules, device priority, DND
+- [ ] Noise Cancellation — RNNoise integration in media pipeline
+- [ ] TTS/Speech — play prompts into calls (SPEAK step support)
+
+### Phase 6: Dashboard & UX 🔮
+
+- [ ] Web dashboard with real-time call monitor
+- [ ] Call flow visual editor (drag-and-drop IVR tree builder)
+- [ ] Call history with transcript playback
+- [ ] Analytics dashboard with hold time graphs
+- [ ] Mobile app (or PWA) for on-the-go control
+
+## License
+
+MIT