feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered telephony system that calls companies, navigates IVR menus, waits on hold, and transfers to the user when a human answers. Key components: - FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces - SIP/VoIP call management via PJSUA2 with RTP audio streaming - LLM-powered IVR navigation using OpenAI/Anthropic with tool calling - Hold detection service combining audio analysis and silence detection - Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines - Call recording with per-channel and mixed audio capture - Event bus (asyncio pub/sub) for real-time client updates - Web dashboard with live call monitoring - SQLite persistence via SQLAlchemy with call history and analytics - Notification support (email, SMS, webhook, desktop) - Docker Compose deployment with Opal VoIP and Opal Media containers - Comprehensive test suite with unit, integration, and E2E tests - Simplified .gitignore and full project documentation in README
2026-03-21 19:23:26 +00:00
parent c9ff60702b
commit ecf37658ce
56 changed files with 11601 additions and 164 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,178 @@
+# Architecture
+
+Hold Slayer is a single-process async Python application built on FastAPI. It acts as an intelligent B2BUA (Back-to-Back User Agent) sitting between your SIP trunk (PSTN access) and your desk phone/softphone.
+
+## System Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        FastAPI Server                           │
+│                                                                 │
+│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
+│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │  │
+│  │ /api/*   │  │ /ws/*    │  │ (SSE)     │  │  /dashboard  │  │
+│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘  │
+│       │              │              │                            │
+│  ┌────┴──────────────┴──────────────┴────┐                     │
+│  │             Event Bus                  │                     │
+│  │   (asyncio Queue pub/sub per client)   │                     │
+│  └────┬──────────────┬──────────────┬────┘                     │
+│       │              │              │                            │
+│  ┌────┴─────┐  ┌─────┴─────┐  ┌────┴──────────┐               │
+│  │   Call   │  │   Hold    │  │   Services    │               │
+│  │ Manager  │  │  Slayer   │  │ (LLM, STT,   │               │
+│  │          │  │           │  │  Recording,   │               │
+│  │          │  │           │  │  Analytics,   │               │
+│  │          │  │           │  │  Notify)      │               │
+│  └────┬─────┘  └─────┬─────┘  └──────────────┘               │
+│       │              │                                          │
+│  ┌────┴──────────────┴───────────────────┐                     │
+│  │         Sippy B2BUA Engine            │                     │
+│  │  (SIP calls, DTMF, conference bridge) │                     │
+│  └────┬──────────────────────────────────┘                     │
+│       │                                                         │
+└───────┼─────────────────────────────────────────────────────────┘
+        │
+   ┌────┴────┐
+   │SIP Trunk│ ──→ PSTN
+   └─────────┘
+```
+
+## Component Overview
+
+### Presentation Layer
+
+| Component | File | Protocol | Purpose |
+|-----------|------|----------|---------|
+| REST API | `api/calls.py`, `api/call_flows.py`, `api/devices.py` | HTTP | Call management, CRUD, configuration |
+| WebSocket | `api/websocket.py` | WS | Real-time event streaming to clients |
+| MCP Server | `mcp_server/server.py` | SSE | AI assistant tool integration |
+
+### Orchestration Layer
+
+| Component | File | Purpose |
+|-----------|------|---------|
+| Gateway | `core/gateway.py` | Top-level orchestrator — owns all services, routes calls |
+| Call Manager | `core/call_manager.py` | Active call state, lifecycle, transcript tracking |
+| Event Bus | `core/event_bus.py` | Async pub/sub connecting everything together |
+
+### Intelligence Layer
+
+| Component | File | Purpose |
+|-----------|------|---------|
+| Hold Slayer | `services/hold_slayer.py` | IVR navigation, hold monitoring, human detection |
+| Audio Classifier | `services/audio_classifier.py` | Real-time waveform analysis (music/speech/DTMF/silence) |
+| LLM Client | `services/llm_client.py` | OpenAI-compatible LLM for IVR menu decisions |
+| Transcription | `services/transcription.py` | Speaches/Whisper STT for live audio |
+| Call Flow Learner | `services/call_flow_learner.py` | Builds reusable IVR trees from exploration data |
+
+### Infrastructure Layer
+
+| Component | File | Purpose |
+|-----------|------|---------|
+| Sippy Engine | `core/sippy_engine.py` | SIP signaling (INVITE, BYE, REGISTER, DTMF) |
+| Media Pipeline | `core/media_pipeline.py` | PJSUA2 RTP media handling, conference bridge, recording |
+| Recording | `services/recording.py` | WAV file management and storage |
+| Analytics | `services/call_analytics.py` | Call metrics, hold time stats, trends |
+| Notifications | `services/notification.py` | WebSocket + SMS alerts |
+| Database | `db/database.py` | SQLAlchemy async (PostgreSQL or SQLite) |
+
+## Data Flow — Hold Slayer Call
+
+```
+1. User Request
+   POST /api/calls/hold-slayer { number, intent, call_flow_id }
+         │
+2. Gateway.make_call()
+   ├── CallManager.create_call()     → track state
+   ├── SippyEngine.make_call()       → SIP INVITE to trunk
+   └── MediaPipeline.add_stream()    → RTP media setup
+         │
+3. HoldSlayer.run_with_flow() or run_exploration()
+   ├── AudioClassifier.classify()    → analyze 3s audio windows
+   │   ├── silence? → wait
+   │   ├── ringing? → wait
+   │   ├── DTMF? → detect tones
+   │   ├── music? → HOLD_DETECTED event
+   │   └── speech? → transcribe + decide
+   │
+   ├── TranscriptionService.transcribe() → STT on speech audio
+   │
+   ├── LLMClient.analyze_ivr_menu() → pick menu option (fallback)
+   │   └── SippyEngine.send_dtmf()  → press the button
+   │
+   └── detect_hold_to_human_transition()
+       └── HUMAN_DETECTED! → transfer
+           │
+4. Transfer
+   ├── SippyEngine.bridge()          → connect call legs
+   ├── MediaPipeline.bridge_streams() → bridge RTP
+   ├── EventBus.publish(TRANSFER_STARTED)
+   └── NotificationService → "Pick up your phone!"
+         │
+5. Real-Time Updates (throughout)
+   EventBus.publish() → WebSocket clients
+                      → MCP server resources
+                      → Notification service
+                      → Analytics tracking
+```
+
+## Threading Model
+
+Hold Slayer is primarily single-threaded async (asyncio), with one exception:
+
+- **Main thread**: FastAPI + all async services (event bus, hold slayer, classifier, etc.)
+- **Sippy thread**: Sippy B2BUA runs its own event loop in a dedicated daemon thread. The `SippyEngine` bridges async↔sync via `asyncio.run_in_executor()`.
+- **PJSUA2**: Runs in the main thread using null audio device (no sound card needed — headless server mode).
+
+```
+Main Thread (asyncio)
+├── FastAPI (uvicorn)
+├── EventBus
+├── CallManager
+├── HoldSlayer
+├── AudioClassifier
+├── TranscriptionService
+├── LLMClient
+├── MediaPipeline (PJSUA2)
+├── NotificationService
+└── RecordingService
+
+Sippy Thread (daemon)
+└── Sippy B2BUA event loop
+    ├── SIP signaling
+    ├── DTMF relay
+    └── Call leg management
+```
+
+## Design Decisions
+
+### Why Sippy B2BUA + PJSUA2?
+
+We split SIP signaling and media handling into two separate libraries:
+
+- **Sippy B2BUA** handles SIP signaling (INVITE, BYE, REGISTER, re-INVITE, DTMF relay). It's battle-tested for telephony and handles the complex SIP state machine.
+- **PJSUA2** handles RTP media (audio streams, conference bridge, recording, tone generation). It provides a clean C++/Python API for media manipulation without needing to deal with raw RTP.
+
+This split lets us tap into the audio stream (for classification and STT) without interfering with SIP signaling, and bridge calls through a conference bridge for clean transfer.
+
+### Why asyncio Queue-based EventBus?
+
+- **Single process** — no need for Redis/RabbitMQ cross-process messaging
+- **Zero dependencies** — pure asyncio, no external services to deploy
+- **Per-subscriber queues** — slow consumers don't block fast publishers
+- **Dead subscriber cleanup** — full queues are automatically removed
+- **Event history** — late joiners can catch up on recent events
+
+If scaling to multiple gateway processes becomes necessary, the EventBus interface can be backed by Redis pub/sub without changing consumers.
+
+### Why OpenAI-compatible LLM API?
+
+The LLM client uses raw HTTP (httpx) against any OpenAI-compatible endpoint. This means:
+
+- **Ollama** (local, free) — `http://localhost:11434/v1`
+- **LM Studio** (local, free) — `http://localhost:1234/v1`
+- **vLLM** (local, fast) — `http://localhost:8000/v1`
+- **OpenAI** (cloud) — `https://api.openai.com/v1`
+
+No SDK dependency. No vendor lock-in. Switch models by changing one env var.