Docs
136
README.md
@@ -58,28 +58,38 @@ You give it a phone number and an intent ("dispute a charge on my December state
- **Event Bus** (`core/event_bus.py`) — Async pub/sub with per-subscriber queues, type filtering, history

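
The event bus described above can be approximated with plain asyncio. This is a minimal sketch, not the code in `core/event_bus.py`. The `Event` and `EventBus` classes, the `subscribe`/`publish` method names, the event type strings, and the bounded history size are all hypothetical.

```python
from __future__ import annotations

import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    type: str  # hypothetical event type string, e.g. "human_detected"
    data: dict[str, Any] = field(default_factory=dict)


class EventBus:
    """Hypothetical async pub/sub bus with one queue per subscriber."""

    def __init__(self, history_size: int = 100) -> None:
        # (queue, type filter) pairs; a filter of None means "all event types".
        self._subscribers: list[tuple[asyncio.Queue[Event], frozenset[str] | None]] = []
        # Bounded history of recent events; the oldest are dropped first.
        self.history: deque[Event] = deque(maxlen=history_size)

    def subscribe(self, types: set[str] | None = None) -> asyncio.Queue[Event]:
        queue: asyncio.Queue[Event] = asyncio.Queue()
        self._subscribers.append((queue, frozenset(types) if types is not None else None))
        return queue

    def unsubscribe(self, queue: asyncio.Queue[Event]) -> None:
        self._subscribers = [(q, t) for q, t in self._subscribers if q is not queue]

    def publish(self, event: Event) -> None:
        self.history.append(event)
        for queue, types in self._subscribers:
            # Deliver only the event types this subscriber asked for.
            if types is None or event.type in types:
                queue.put_nowait(event)


async def demo() -> tuple[str, int, int, int]:
    bus = EventBus()
    humans = bus.subscribe({"human_detected"})
    everything = bus.subscribe()
    bus.publish(Event("hold_detected", {"call_id": "call_abc123"}))
    bus.publish(Event("human_detected", {"call_id": "call_abc123"}))
    first = await humans.get()
    return first.type, humans.qsize(), everything.qsize(), len(bus.history)
```

`publish` can stay synchronous here because `put_nowait` on an unbounded `asyncio.Queue` never blocks. With bounded queues, a slow subscriber would need a drop or back-pressure policy.
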
### Hold Slayer
- **IVR Navigation** (`services/hold_slayer.py`) — Follows stored call flows step-by-step through phone menus, including SPEAK steps that synthesize speech via TTS
- **Audio Classifier** (`services/audio_classifier.py`) — Real-time waveform analysis: silence, tones, DTMF, music, speech detection
- **Call Flow Learner** (`services/call_flow_learner.py`) — Builds reusable call flows from exploration data, merges new discoveries
- **LLM Fallback** — When a LISTEN step has no hardcoded DTMF, the LLM analyzes the transcript and picks the right menu option

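
DTMF detection is one of the classifier's jobs. A common technique for it is the Goertzel algorithm, which measures signal power at the eight DTMF frequencies without computing a full FFT. The sketch below illustrates that technique on 8 kHz mono float samples. It is not taken from `services/audio_classifier.py`. The helper names (`goertzel_power`, `detect_dtmf`, `dtmf_tone`) and the 0.3 energy-share threshold are assumptions.

```python
from __future__ import annotations

import math

ROW_FREQS = (697, 770, 852, 941)  # Hz, low DTMF group
COL_FREQS = (1209, 1336, 1477, 1633)  # Hz, high DTMF group
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: list[float], freq: float, sample_rate: int) -> float:
    """Squared spectral magnitude of `samples` at `freq` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 * s_prev2 + s_prev * s_prev - coeff * s_prev * s_prev2


def detect_dtmf(samples: list[float], sample_rate: int = 8000, min_share: float = 0.3) -> str | None:
    """Return the DTMF key in `samples`, or None if there is no clear tone pair."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0.0:
        return None  # silence
    rows = [goertzel_power(samples, f, sample_rate) for f in ROW_FREQS]
    cols = [goertzel_power(samples, f, sample_rate) for f in COL_FREQS]
    r = max(range(4), key=rows.__getitem__)
    c = max(range(4), key=cols.__getitem__)
    # For a pure tone, power * 2 / (n * energy) is that tone's share of total
    # energy: about 0.5 per tone for a clean, equal-level DTMF pair.
    # Speech, music, or a single tone leaves at least one share low.
    row_share = 2.0 * rows[r] / (n * energy)
    col_share = 2.0 * cols[c] / (n * energy)
    if row_share < min_share or col_share < min_share:
        return None
    return KEYS[r][c]


def dtmf_tone(key: str, ms: int = 50, sample_rate: int = 8000) -> list[float]:
    """Synthesize a test tone for `key`, each component at amplitude 0.5."""
    r = next(i for i, row in enumerate(KEYS) if key in row)
    c = KEYS[r].index(key)
    n = sample_rate * ms // 1000
    return [
        0.5 * math.sin(2 * math.pi * ROW_FREQS[r] * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * COL_FREQS[c] * i / sample_rate)
        for i in range(n)
    ]
```

Requiring both the low-group and high-group tone to carry a large share of the energy is what separates a DTMF pair from a single ringback-style tone or from broadband speech.
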
### AI Receptionist & Smart Routing
- **AI Receptionist** (`services/receptionist.py`) — Answers inbound calls, greets via TTS, captures the caller's intent with STT + LLM, then routes to a device or takes a voicemail
- **Smart Routing** (`services/routing.py`) — Caller-pattern (glob), DNIS, time-of-day (with tz + midnight wrap), per-device DND, and ring-chain priority. Rules win over the LLM on conflict.
- **TTS** (`services/tts.py`) — [Rhema](https://github.com/heluca/rhema) (OpenAI-compatible `/v1/audio/speech`) — synthesizes Kokoro voices for the SPEAK step and receptionist prompts

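
The Smart Routing match conditions (a glob caller pattern, plus a time-of-day window in a named time zone that may wrap past midnight) can be sketched with the standard library alone. The sketch reuses the field names from this README's routing-rule `curl` example (`caller_pattern`, `time_range.start/end/tz/days`). It assumes `days` uses Python's `weekday()` numbering (Monday = 0) and is checked against the local date. The `dnis` key and both function names are hypothetical. This is not the evaluator in `services/routing.py`.

```python
from __future__ import annotations

from datetime import datetime, time
from fnmatch import fnmatchcase
from zoneinfo import ZoneInfo


def in_time_range(now: datetime, start: str, end: str, tz: str, days: list[int] | None = None) -> bool:
    """True if `now` falls in [start, end) local time in `tz`.

    When end <= start, the window is taken to wrap past midnight (e.g. 22:00-06:00).
    Assumption: `days` uses datetime.weekday() numbering (Monday == 0).
    """
    local = now.astimezone(ZoneInfo(tz))
    if days is not None and local.weekday() not in days:
        return False
    t = local.time()
    start_t, end_t = time.fromisoformat(start), time.fromisoformat(end)
    if start_t < end_t:
        return start_t <= t < end_t
    # Overnight window: late evening OR early morning matches.
    return t >= start_t or t < end_t


def rule_matches(match: dict, caller: str, dnis: str, now: datetime) -> bool:
    """Every condition present in `match` must hold; absent ones are ignored."""
    pattern = match.get("caller_pattern")
    if pattern and not fnmatchcase(caller, pattern):  # glob, e.g. "+1800*"
        return False
    if match.get("dnis") and match["dnis"] != dnis:  # hypothetical field name
        return False
    tr = match.get("time_range")
    if tr and not in_time_range(now, tr["start"], tr["end"], tr["tz"], tr.get("days")):
        return False
    return True
```

Converting `now` into the rule's own time zone before comparing wall-clock times keeps a rule such as "22:00-06:00 America/Toronto" correct regardless of the server's local zone and across DST changes.
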
### Intelligence Layer
- **LLM Client** (`services/llm_client.py`) — OpenAI-compatible API client (Ollama, vLLM, LM Studio, OpenAI) with JSON parsing, retry, stats
- **Transcription** (`services/transcription.py`) — Speaches/Whisper STT integration for live call transcription
- **Recording** (`services/recording.py`) — WAV recording with date-organized storage, dual-channel support, persisted to the `recordings` table
- **Call Persistence** (`services/call_persistence.py`) — Writes completed calls + transcript chunks to the database on hangup
- **Call Analytics** (`services/call_analytics.py`) — Hold time stats, success rates, per-company patterns, time-of-day trends
- **Notifications** (`services/notification.py`) — WebSocket + SMS alerts for human detection, call failures, hold status

### API Surface
- **REST API** — Call management, call history, transcripts, recordings, routing rules, device DND, call flow CRUD
- **WebSocket** — Real-time call events, transcripts, classification updates, receptionist state transitions
- **MCP Server** — 10 tools for AI assistant integration (make calls, send DTMF, get transcripts, manage flows)
- **Dashboard** — SvelteKit UI served at `/dashboard` with live monitor, call history with transcript playback, and a routing-rules editor

### Data Models
- **Call** — Active call state with classification history, transcript chunks, hold time tracking
- **Call Flow** — Stored IVR trees with steps (DTMF, LISTEN, HOLD, TRANSFER, SPEAK)
- **Routing Rule** — Match (caller pattern, DNIS, time range) + action (ring_device, ring_chain, take_message, reject, dnd)
- **Transcript Chunk** — Per-call STT segments with speaker tag and timestamp offset (for click-to-seek playback)
- **Recording** — WAV file metadata (path, duration, size) per call
- **Events** — 30+ typed events (call lifecycle, hold slayer, audio, device, system, receptionist, routing)
- **Device** — SIP phone/softphone registration, priority, DND
- **Contact** — Phone number management with routing preferences

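
Because each transcript chunk carries a timestamp offset, click-to-seek works in both directions. Clicking a chunk seeks the audio to that chunk's offset. Highlighting the chunk under the playback cursor is a binary search over the sorted offsets. The sketch below assumes a chunk lasts until the next one starts. The `TranscriptChunk` fields (`offset_seconds`, `speaker`, `text`) and the `chunk_at` helper are hypothetical, not the project's actual model schema.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptChunk:  # hypothetical shape of the Transcript Chunk model
    offset_seconds: float  # start of the segment, relative to the recording start
    speaker: str
    text: str


def chunk_at(chunks: list[TranscriptChunk], position: float) -> TranscriptChunk | None:
    """Chunk active at `position` seconds; `chunks` must be sorted by offset."""
    offsets = [c.offset_seconds for c in chunks]
    # Index of the last chunk whose start is <= position.
    i = bisect_right(offsets, position) - 1
    return chunks[i] if i >= 0 else None
```
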
## Project Structure
@@ -95,9 +105,13 @@ hold-slayer/
│   ├── call_manager.py # Active call state management
│   └── event_bus.py # Async pub/sub event bus
├── services/
│   ├── hold_slayer.py # IVR navigation + hold detection + SPEAK
│   ├── receptionist.py # AI Receptionist state machine
│   ├── routing.py # Smart routing (rules, DND, ring chain)
│   ├── tts.py # Rhema TTS client (OpenAI-compatible)
│   ├── audio_classifier.py # Waveform analysis (music/speech/DTMF)
│   ├── call_flow_learner.py # Auto-learns IVR trees from calls
│   ├── call_persistence.py # Writes calls + transcript chunks on hangup
│   ├── llm_client.py # OpenAI-compatible LLM client
│   ├── transcription.py # Speaches/Whisper STT
│   ├── recording.py # Call recording management
@@ -105,15 +119,24 @@ hold-slayer/
│   └── notification.py # WebSocket + SMS notifications
├── api/
│   ├── calls.py # Call management endpoints
│   ├── call_history.py # History, transcript, recording playback
│   ├── call_flows.py # Call flow CRUD
│   ├── devices.py # Device registration
│   ├── routing.py # Routing rules CRUD + per-device DND
│   ├── websocket.py # Real-time event stream
│   └── deps.py # FastAPI dependency injection
├── dashboard/ # SvelteKit UI (built to dashboard/build)
│   └── src/routes/
│       ├── +page.svelte # Live monitor
│       ├── history/ # Call history list
│       ├── calls/[call_id]/ # Detail page + transcript playback
│       └── routing/ # Rules editor + DND toggles
├── mcp_server/
│   └── server.py # MCP tools + resources (10 tools)
├── models/
│   ├── call.py # Call state models
│   ├── call_flow.py # IVR tree models
│   ├── routing.py # Routing rule / match / action models
│   ├── events.py # Event type definitions
│   ├── device.py # Device models
│   └── contact.py # Contact models
@@ -123,8 +146,11 @@ hold-slayer/
    ├── test_audio_classifier.py # 18 tests — waveform analysis
    ├── test_call_flows.py # 10 tests — call flow models
    ├── test_hold_slayer.py # 20 tests — IVR nav, EventBus, CallManager
    ├── test_services.py # 27 tests — LLM, notifications, recording,
    │                    # analytics, learner, EventBus
    ├── test_tts.py # 4 tests — Rhema TTS client
    ├── test_routing.py # 8 tests — rules evaluator
    └── test_receptionist.py # 7 tests — receptionist decision logic
```

## Quick Start
@@ -144,13 +170,25 @@ cp .env.example .env
# Edit .env with your SIP trunk credentials, LLM endpoint, etc.
```

### 3. Build the dashboard (optional but recommended)

```bash
cd dashboard
npm install
npm run build
cd ..
```

The gateway serves the built UI at `/dashboard` automatically when
`dashboard/build/` exists. Skip this step if you only need the REST/WS API.

### 4. Run

```bash
uvicorn main:app --host 0.0.0.0 --port 8100
```

### 5. Test

```bash
pytest tests/ -v
@@ -179,6 +217,39 @@ curl -X POST http://localhost:8000/api/calls/hold-slayer \
curl http://localhost:8000/api/calls/call_abc123
```

**Browse call history (persisted in the database):**

```bash
curl "http://localhost:8000/api/calls/history?limit=50"
curl http://localhost:8000/api/calls/call_abc123/transcript
curl -o call_abc123.wav http://localhost:8000/api/calls/call_abc123/recording
```

**Create a smart-routing rule:**

```bash
curl -X POST http://localhost:8000/api/routing/rules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Block tollfree at night",
    "priority": 10,
    "enabled": true,
    "match": {
      "caller_pattern": "+1800*",
      "time_range": {"start": "22:00", "end": "06:00", "tz": "America/Toronto", "days": [0,1,2,3,4,5,6]}
    },
    "action": {"type": "reject", "message": "Office is closed."}
  }'
```

**Toggle Do Not Disturb on a device:**

```bash
curl -X PATCH http://localhost:8000/api/routing/devices/dev_abc123/dnd \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'
```

### WebSocket — Real-Time Events

```javascript
@@ -210,15 +281,26 @@ The MCP server exposes 10 tools that any MCP-compatible assistant can use:

## How It Works

### Outbound (Hold Slayer)

1. **You request a call** — via REST API, MCP tool, or dashboard
2. **Gateway dials out** — Sippy B2BUA places the call through your SIP trunk
3. **Audio classifier listens** — Real-time waveform analysis detects IVR prompts, hold music, ringing, silence, and live speech
4. **Transcription runs** — Speaches/Whisper converts audio to text in real time
5. **IVR navigator decides** — If a stored call flow exists, it follows the steps (including SPEAK steps that synthesize speech via Rhema TTS). If not, the LLM analyzes the transcript and picks the right menu option
6. **Hold detection** — When hold music is detected, the system waits patiently and monitors for transitions
7. **Human detection** — The classifier detects the transition from music/silence to live speech
8. **Transfer** — Your desk phone rings. Pick up and you're talking to the agent. Zero hold time.

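
Step 5's rule (follow the stored flow, and ask the LLM only when a LISTEN step has no hardcoded digits) can be sketched as a small async dispatcher over the five step types. Everything here is hypothetical: the `Step` shape, the `Call` protocol methods (`send_dtmf`, `speak`, `transcript`, `wait_for_human`, `transfer`), and the `choose_digit` callback are stand-ins for whatever `services/hold_slayer.py` actually uses.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class StepType(Enum):
    DTMF = "dtmf"
    LISTEN = "listen"
    HOLD = "hold"
    TRANSFER = "transfer"
    SPEAK = "speak"


@dataclass
class Step:  # hypothetical step shape
    type: StepType
    digits: str | None = None  # DTMF / LISTEN: digits to press, if known
    text: str | None = None  # SPEAK: text to synthesize via TTS


class Call(Protocol):  # hypothetical call-control interface
    async def send_dtmf(self, digits: str) -> None: ...
    async def speak(self, text: str) -> None: ...
    async def transcript(self) -> str: ...
    async def wait_for_human(self) -> None: ...
    async def transfer(self) -> None: ...


async def run_flow(call: Call, steps: list[Step],
                   choose_digit: Callable[[str], Awaitable[str]]) -> None:
    """Execute a stored flow; consult the LLM only for LISTEN steps without digits."""
    for step in steps:
        if step.type is StepType.DTMF:
            await call.send_dtmf(step.digits or "")
        elif step.type is StepType.LISTEN:
            digits = step.digits or await choose_digit(await call.transcript())
            await call.send_dtmf(digits)
        elif step.type is StepType.SPEAK:
            await call.speak(step.text or "")
        elif step.type is StepType.HOLD:
            await call.wait_for_human()  # assumed to return once a live person is detected
        elif step.type is StepType.TRANSFER:
            await call.transfer()


class RecordingCall:
    """Test double that logs what the navigator asked it to do."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, str]] = []

    async def send_dtmf(self, digits: str) -> None:
        self.actions.append(("dtmf", digits))

    async def speak(self, text: str) -> None:
        self.actions.append(("speak", text))

    async def transcript(self) -> str:
        return "For billing, press 2. For technical support, press 3."

    async def wait_for_human(self) -> None:
        self.actions.append(("hold", ""))

    async def transfer(self) -> None:
        self.actions.append(("transfer", ""))


async def fake_llm(transcript: str) -> str:
    # Stand-in for an LLM call that maps the caller's intent to a menu digit.
    return "2" if "billing" in transcript else "0"


if __name__ == "__main__":
    call = RecordingCall()
    flow = [Step(StepType.DTMF, digits="1"), Step(StepType.LISTEN),
            Step(StepType.SPEAK, text="December statement"),
            Step(StepType.HOLD), Step(StepType.TRANSFER)]
    asyncio.run(run_flow(call, flow, fake_llm))
    print(call.actions)
```

Injecting the LLM as a callback keeps the navigator testable without a model server, as the `RecordingCall` double shows.
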
### Inbound (AI Receptionist + Smart Routing)

1. **SIP INVITE arrives** — Sippy surfaces it to the gateway instead of auto-answering
2. **Routing rules evaluate** — Caller pattern, DNIS, and time-of-day rules run in priority order. A `reject` or `dnd` action declines the call immediately.
3. **Receptionist answers** — TTS plays the greeting; the call's audio tap captures the caller's response
4. **Intent capture** — The utterance is transcribed and the LLM extracts intent, urgency, and a recommended action (ring / message / reject)
5. **Final decision** — Routing rules win on conflict; otherwise the LLM's recommendation is followed
6. **Route or take a message** — `ring_chain` tries devices in priority order (skipping any in DND); if nobody picks up (or the action is `take_message`), the receptionist records a message of up to 90 seconds (`RECEPTIONIST_MESSAGE_MAX_SECONDS`), transcribes it, and emits a `RECEPTIONIST_MESSAGE_SAVED` event

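
Steps 5 and 6 reduce to two small rules. A matching routing rule overrides the LLM's recommendation. `ring_chain` then walks devices in priority order, skips any in DND, and falls back to taking a message. A minimal sketch follows. It assumes a `Device` with `priority` (lower rings first) and `dnd` fields and an injected `try_ring` callback. Those names, the ordering convention, and the plain-string actions are assumptions, not the API of `services/receptionist.py` or `services/routing.py`.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Device:  # hypothetical device shape
    id: str
    priority: int  # assumption: lower number rings first
    dnd: bool = False


def final_action(rule_action: str | None, llm_action: str) -> str:
    """A matching routing rule wins on conflict; otherwise follow the LLM."""
    return rule_action if rule_action is not None else llm_action


def ring_chain(devices: list[Device], try_ring: Callable[[Device], bool]) -> str:
    """Ring devices in priority order, skipping DND; return who answered or 'take_message'."""
    for device in sorted(devices, key=lambda d: d.priority):
        if device.dnd:
            continue  # never ring a device in Do Not Disturb
        if try_ring(device):
            return device.id
    return "take_message"
```
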
## Configuration

All configuration is via environment variables (see `.env.example`):
@@ -233,16 +315,25 @@ All configuration is via environment variables (see `.env.example`):
| `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` |
| `LLM_BASE_URL` | OpenAI-compatible LLM endpoint | `http://localhost:11434/v1` |
| `LLM_MODEL` | Model name for IVR analysis | `llama3` |
| `TTS_BASE_URL` | Rhema TTS endpoint (OpenAI-compatible) | `http://localhost:8000` |
| `TTS_MODEL` | TTS model ID | `speaches-ai/Kokoro-82M-v1.0-ONNX` |
| `TTS_VOICE` | Default Kokoro voice | `af_heart` |
| `TTS_API_KEY` | Optional bearer token for Rhema | — |
| `RECEPTIONIST_ENABLED` | Answer inbound calls with the AI receptionist | `true` |
| `RECEPTIONIST_GREETING_TEMPLATE` | Spoken greeting | `"Hi, you've reached Robert's line. Who's calling, and what's this about?"` |
| `RECEPTIONIST_MESSAGE_MAX_SECONDS` | Maximum voicemail length, in seconds | `90` |
| `DATABASE_URL` | PostgreSQL or SQLite connection URL | SQLite fallback |

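
The `TTS_*` variables describe a request to Rhema's OpenAI-compatible `/v1/audio/speech` endpoint. The standard-library sketch below builds that request without sending it. The JSON fields (`model`, `voice`, `input`, `response_format`) follow the OpenAI speech API. Whether Rhema honors `response_format: "wav"` is an assumption, and `build_speech_request` is a hypothetical helper, not `services/tts.py`.

```python
from __future__ import annotations

import json
import os
import urllib.request


def build_speech_request(text: str, env: dict[str, str] | None = None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible TTS request from TTS_* settings."""
    env = dict(os.environ) if env is None else env
    base = env.get("TTS_BASE_URL", "http://localhost:8000").rstrip("/")
    body = {
        "model": env.get("TTS_MODEL", "speaches-ai/Kokoro-82M-v1.0-ONNX"),
        "voice": env.get("TTS_VOICE", "af_heart"),
        "input": text,
        "response_format": "wav",  # assumption: WAV suits PJSUA2 playback
    }
    headers = {"Content-Type": "application/json"}
    if api_key := env.get("TTS_API_KEY"):
        headers["Authorization"] = f"Bearer {api_key}"  # optional bearer token
    return urllib.request.Request(
        f"{base}/v1/audio/speech",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )

# To send it: urllib.request.urlopen(req, timeout=30).read() returns the response body (audio bytes).
```

Keeping request construction separate from sending makes the settings mapping easy to check without a running TTS server.
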
## Tech Stack

- **Python 3.13** + **asyncio** — Single-process async architecture
- **FastAPI** — REST API + WebSocket server
- **SvelteKit** — Dashboard UI (built as static files, served by FastAPI at `/dashboard`)
- **Sippy B2BUA** — SIP call control and DTMF
- **PJSUA2** — Media pipeline, conference bridge, recording, WAV playback
- **Speaches** (Whisper) — Speech-to-text
- **Rhema** (Kokoro) — Text-to-speech (OpenAI-compatible `/v1/audio/speech`)
- **Ollama / vLLM / OpenAI** — LLM for IVR menu analysis and receptionist intent capture
- **SQLAlchemy** — Async database (PostgreSQL or SQLite)
- **MCP (Model Context Protocol)** — AI assistant integration

@@ -297,21 +388,22 @@ Full documentation is in [`/docs`](docs/README.md):
- [ ] Structured JSON logging
- [ ] Health check endpoints for all dependencies
- [ ] Graceful degradation (classifier works without STT, etc.)
- [ ] Docker Compose (Hold Slayer + PostgreSQL)

### Phase 5: Additional Services 🚧

- [x] AI Receptionist — answer inbound calls, screen callers, take messages
- [x] Smart Routing — time-of-day rules, device priority, DND
- [x] TTS/Speech — play prompts into calls (SPEAK step support, Rhema/Kokoro)
- [ ] Spam Filter — detect robocalls using caller ID + audio patterns
- [ ] Noise Cancellation — RNNoise integration in media pipeline

### Phase 6: Dashboard & UX 🚧

- [x] Web dashboard with real-time call monitor
- [x] Call history with transcript playback (click-to-seek)
- [x] Routing rules editor + per-device DND toggles
- [ ] Call flow visual editor (drag-and-drop IVR tree builder)
- [ ] Analytics dashboard with hold time graphs
- [ ] Mobile app (or PWA) for on-the-go control