feat: add initial Hold Slayer AI telephony gateway implementation

Complete project scaffolding and core implementation of an AI-powered telephony system that calls companies, navigates IVR menus, waits on hold, and transfers to the user when a human answers. Key components: - FastAPI server with REST API, WebSocket, and MCP (SSE) interfaces - SIP/VoIP call management via PJSUA2 with RTP audio streaming - LLM-powered IVR navigation using OpenAI/Anthropic with tool calling - Hold detection service combining audio analysis and silence detection - Real-time STT (Whisper/Deepgram) and TTS (OpenAI/Piper) pipelines - Call recording with per-channel and mixed audio capture - Event bus (asyncio pub/sub) for real-time client updates - Web dashboard with live call monitoring - SQLite persistence via SQLAlchemy with call history and analytics - Notification support (email, SMS, webhook, desktop) - Docker Compose deployment with Opal VoIP and Opal Media containers - Comprehensive test suite with unit, integration, and E2E tests - Simplified .gitignore and full project documentation in README
2026-03-21 19:23:26 +00:00
parent c9ff60702b
commit ecf37658ce
56 changed files with 11601 additions and 164 deletions
--- a/docs/services.md
+++ b/docs/services.md
@@ -0,0 +1,290 @@
+# Services
+
+The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.
+
+## LLM Client (`services/llm_client.py`)
+
+Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency — just httpx.
+
+### Supported Backends
+
+| Backend | URL | Notes |
+|---------|-----|-------|
+| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
+| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
+| vLLM | `http://localhost:8000/v1` | Local, fast, production |
+| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |
+
+### Usage
+
+```python
+client = LLMClient(
+    base_url="http://localhost:11434/v1",
+    model="llama3",
+    api_key="not-needed",  # Ollama doesn't need a key
+    timeout=30.0,
+    max_tokens=1024,
+    temperature=0.3,
+)
+
+# Simple chat
+response = await client.chat("What is 2+2?")
+# "4"
+
+# Chat with system prompt
+response = await client.chat(
+    "Parse this menu transcript...",
+    system="You are a phone menu parser. Return JSON.",
+)
+
+# Structured JSON response (auto-parses)
+result = await client.chat_json(
+    "Extract menu options from: Press 1 for billing, press 2 for support",
+    system="Return JSON with 'options' array.",
+)
+# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
+```
+
+### IVR Menu Analysis
+
+The primary use case — analyzing IVR transcripts to pick the right menu option:
+
+```python
+decision = await client.analyze_ivr_menu(
+    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
+    intent="dispute a charge from Amazon on December 15th",
+    previous_selections=["main_menu"],
+)
+# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
+```
+
+### JSON Extraction
+
+The client handles messy LLM output gracefully:
+
+1. Try `json.loads()` on the raw response
+2. If that fails, look for ```json ... ``` markdown blocks
+3. If that fails, look for `{...}` patterns in the text
+4. If all fail, return empty dict (caller handles gracefully)
+
+### Stats Tracking
+
+```python
+stats = client.stats
+# {
+#     "total_requests": 47,
+#     "total_errors": 2,
+#     "avg_latency_ms": 234.5,
+#     "model": "llama3",
+#     "base_url": "http://localhost:11434/v1"
+# }
+```
+
+### Error Handling
+
+- HTTP errors return empty string/dict (never crashes the call)
+- Timeouts are configurable (default 30s)
+- All errors are logged with full context
+- Stats track error rates for monitoring
+
+## Transcription Service (`services/transcription.py`)
+
+Real-time speech-to-text using Speaches (a self-hosted Whisper API).
+
+### Architecture
+
+```
+Audio frames (from AudioTap)
+  │
+  └── POST /v1/audio/transcriptions
+      ├── model: whisper-large-v3
+      ├── audio: WAV bytes
+      └── language: en
+          │
+          └── Response: { "text": "Press 1 for billing..." }
+```
+
+### Usage
+
+```python
+service = TranscriptionService(
+    speaches_url="http://perseus.helu.ca:22070",
+    model="whisper-large-v3",
+)
+
+# Transcribe audio bytes
+text = await service.transcribe(audio_bytes)
+# "Welcome to Chase Bank. For English, press 1."
+
+# Transcribe with language hint
+text = await service.transcribe(audio_bytes, language="fr")
+```
+
+### Integration with Hold Slayer
+
+The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:
+
+1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
+2. Fed to the LLM for IVR menu analysis
+3. Stored in the call's transcript history
+4. Used by the Call Flow Learner to build reusable flows
+
+## Recording Service (`services/recording.py`)
+
+Manages call recordings via the PJSUA2 media pipeline.
+
+### Storage Structure
+
+```
+recordings/
+├── 2026/
+│   ├── 01/
+│   │   ├── 15/
+│   │   │   ├── call_abc123_outbound.wav
+│   │   │   ├── call_abc123_mixed.wav
+│   │   │   └── call_def456_outbound.wav
+│   │   └── 16/
+│   │       └── ...
+│   └── 02/
+│       └── ...
+```
+
+### Recording Types
+
+| Type | Description |
+|------|-------------|
+| **Outbound** | Audio from the company (IVR, hold music, agent) |
+| **Inbound** | Audio from the user's device (after transfer) |
+| **Mixed** | Both parties in one file (for review) |
+
+### Usage
+
+```python
+service = RecordingService(
+    storage_dir="recordings",
+    max_recording_seconds=7200,  # 2 hours
+    sample_rate=16000,
+)
+
+# Start recording
+session = await service.start_recording(call_id, stream_id)
+# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"
+
+# Stop recording
+metadata = await service.stop_recording(call_id)
+# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }
+
+# List recordings for a call
+recordings = service.get_recordings(call_id)
+```
+
+## Call Analytics (`services/call_analytics.py`)
+
+Tracks call metrics and provides insights for monitoring and optimization.
+
+### Metrics Tracked
+
+| Metric | Description |
+|--------|-------------|
+| Hold time | Duration spent on hold per call |
+| Total call duration | End-to-end call time |
+| Success rate | Percentage of calls that reached a human |
+| IVR navigation time | Time spent navigating menus |
+| Company patterns | Per-company hold time averages |
+| Time-of-day trends | When hold times are shortest |
+
+### Usage
+
+```python
+analytics = CallAnalytics(max_history=10000)
+
+# Record a completed call
+analytics.record_call(
+    call_id="call_abc123",
+    number="+18005551234",
+    company="Chase Bank",
+    hold_time=780,
+    total_duration=847,
+    success=True,
+    ivr_steps=6,
+)
+
+# Get summary
+summary = analytics.get_summary()
+# {
+#     "total_calls": 142,
+#     "success_rate": 0.89,
+#     "avg_hold_time": 623.4,
+#     "avg_total_duration": 712.1,
+# }
+
+# Per-company stats
+stats = analytics.get_company_stats("Chase Bank")
+# {
+#     "total_calls": 23,
+#     "avg_hold_time": 845.2,
+#     "best_time": "Tuesday 10:00 AM",
+#     "success_rate": 0.91,
+# }
+
+# Top numbers by call volume
+top = analytics.get_top_numbers(limit=10)
+
+# Hold time trends by hour
+trends = analytics.get_hold_time_trend()
+# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
+```
+
+## Notification Service (`services/notification.py`)
+
+Sends alerts when important things happen on calls.
+
+### Notification Channels
+
+| Channel | Status | Use Case |
+|---------|--------|----------|
+| **WebSocket** | ✅ Active | Real-time UI updates (always on) |
+| **SMS** | ✅ Active | Critical alerts (human detected, call failed) |
+| **Push** | 🔮 Future | Mobile app notifications |
+
+### Notification Priority
+
+| Priority | Events | Delivery |
+|----------|--------|----------|
+| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
+| `HIGH` | Call failed, call timeout | WebSocket + SMS |
+| `NORMAL` | Hold detected, call ended | WebSocket only |
+| `LOW` | IVR step, DTMF sent | WebSocket only |
+
+### Event → Notification Mapping
+
+| Event | Notification |
+|-------|-------------|
+| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
+| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
+| `CALL_FAILED` | ❌ "The call couldn't be completed." |
+| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
+| `IVR_STEP` | 📍 "Navigating phone menu..." |
+| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
+| `CALL_ENDED` | 📴 "The call has ended." |
+
+### Deduplication
+
+The notification service tracks what's been sent per call to avoid spamming:
+
+```python
+# Won't send duplicate "on hold" notifications for the same call
+self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
+```
+
+Tracking is cleaned up when a call ends.
+
+### SMS Configuration
+
+SMS is sent for `CRITICAL` priority notifications when `NOTIFY_SMS_NUMBER` is configured:
+
+```env
+NOTIFY_SMS_NUMBER=+15559876543
+```
+
+The SMS sender is a placeholder — wire up your preferred provider (Twilio, AWS SNS, etc.).