diff --git a/README.md b/README.md
index ae3aa3e..4cdae81 100644
--- a/README.md
+++ b/README.md
@@ -58,28 +58,38 @@ You give it a phone number and an intent ("dispute a charge on my December state
 - **Event Bus** (`core/event_bus.py`) — Async pub/sub with per-subscriber queues, type filtering, history
 
 ### Hold Slayer
-- **IVR Navigation** (`services/hold_slayer.py`) — Follows stored call flows step-by-step through phone menus
+- **IVR Navigation** (`services/hold_slayer.py`) — Follows stored call flows step-by-step through phone menus, including SPEAK steps that synthesize speech via TTS
 - **Audio Classifier** (`services/audio_classifier.py`) — Real-time waveform analysis: silence, tones, DTMF, music, speech detection
 - **Call Flow Learner** (`services/call_flow_learner.py`) — Builds reusable call flows from exploration data, merges new discoveries
 - **LLM Fallback** — When a LISTEN step has no hardcoded DTMF, the LLM analyzes the transcript and picks the right menu option
 
+### AI Receptionist & Smart Routing
+- **AI Receptionist** (`services/receptionist.py`) — Answers inbound calls, greets via TTS, captures the caller's intent with STT + LLM, then routes to a device or takes a voicemail
+- **Smart Routing** (`services/routing.py`) — Caller-pattern (glob), DNIS, time-of-day (with tz + midnight wrap), per-device DND, and ring-chain priority. Rules win over the LLM on conflict.
+- **TTS** (`services/tts.py`) — [Rhema](https://github.com/heluca/rhema) (OpenAI-compatible `/v1/audio/speech`) — synthesizes Kokoro voices for the SPEAK step and receptionist prompts
+
 ### Intelligence Layer
 - **LLM Client** (`services/llm_client.py`) — OpenAI-compatible API client (Ollama, vLLM, LM Studio, OpenAI) with JSON parsing, retry, stats
 - **Transcription** (`services/transcription.py`) — Speaches/Whisper STT integration for live call transcription
-- **Recording** (`services/recording.py`) — WAV recording with date-organized storage, dual-channel support
+- **Recording** (`services/recording.py`) — WAV recording with date-organized storage, dual-channel support, persisted to the `recordings` table
+- **Call Persistence** (`services/call_persistence.py`) — Writes completed calls + transcript chunks to the database on hangup
 - **Call Analytics** (`services/call_analytics.py`) — Hold time stats, success rates, per-company patterns, time-of-day trends
 - **Notifications** (`services/notification.py`) — WebSocket + SMS alerts for human detection, call failures, hold status
 
 ### API Surface
-- **REST API** — Call management, device registration, call flow CRUD, service configuration
-- **WebSocket** — Real-time call events, transcripts, classification updates
+- **REST API** — Call management, call history, transcripts, recordings, routing rules, device DND, call flow CRUD
+- **WebSocket** — Real-time call events, transcripts, classification updates, receptionist state transitions
 - **MCP Server** — 10 tools for AI assistant integration (make calls, send DTMF, get transcripts, manage flows)
+- **Dashboard** — SvelteKit UI served at `/dashboard` with live monitor, call history with transcript playback, and a routing-rules editor
 
 ### Data Models
 - **Call** — Active call state with classification history, transcript chunks, hold time tracking
 - **Call Flow** — Stored IVR trees with steps (DTMF, LISTEN, HOLD, TRANSFER, SPEAK)
-- **Events** — 20+ typed events (call lifecycle, hold slayer, audio, device, system)
-- **Device** — SIP phone/softphone registration and routing
+- **Routing Rule** — Match (caller pattern, DNIS, time range) + action (ring_device, ring_chain, take_message, reject, dnd)
+- **Transcript Chunk** — Per-call STT segments with speaker tag and timestamp offset (for click-to-seek playback)
+- **Recording** — WAV file metadata (path, duration, size) per call
+- **Events** — 30+ typed events (call lifecycle, hold slayer, audio, device, system, receptionist, routing)
+- **Device** — SIP phone/softphone registration, priority, DND
 - **Contact** — Phone number management with routing preferences
 
 ## Project Structure
@@ -95,9 +105,13 @@ hold-slayer/
 │   ├── call_manager.py      # Active call state management
 │   └── event_bus.py         # Async pub/sub event bus
 ├── services/
-│   ├── hold_slayer.py       # IVR navigation + hold detection
+│   ├── hold_slayer.py       # IVR navigation + hold detection + SPEAK
+│   ├── receptionist.py      # AI Receptionist state machine
+│   ├── routing.py           # Smart routing (rules, DND, ring chain)
+│   ├── tts.py               # Rhema TTS client (OpenAI-compatible)
 │   ├── audio_classifier.py  # Waveform analysis (music/speech/DTMF)
 │   ├── call_flow_learner.py # Auto-learns IVR trees from calls
+│   ├── call_persistence.py  # Writes calls + transcript chunks on hangup
 │   ├── llm_client.py        # OpenAI-compatible LLM client
 │   ├── transcription.py     # Speaches/Whisper STT
 │   ├── recording.py         # Call recording management
@@ -105,15 +119,24 @@ hold-slayer/
 │   └── notification.py      # WebSocket + SMS notifications
 ├── api/
 │   ├── calls.py             # Call management endpoints
+│   ├── call_history.py      # History, transcript, recording playback
 │   ├── call_flows.py        # Call flow CRUD
 │   ├── devices.py           # Device registration
+│   ├── routing.py           # Routing rules CRUD + per-device DND
 │   ├── websocket.py         # Real-time event stream
 │   └── deps.py              # FastAPI dependency injection
+├── dashboard/               # SvelteKit UI (built to dashboard/build)
+│   └── src/routes/
+│       ├── +page.svelte     # Live monitor
+│       ├── history/         # Call history list
+│       ├── calls/[call_id]/ # Detail page + transcript playback
+│       └── routing/         # Rules editor + DND toggles
 ├── mcp_server/
 │   └── server.py            # MCP tools + resources (10 tools)
 ├── models/
 │   ├── call.py              # Call state models
 │   ├── call_flow.py         # IVR tree models
+│   ├── routing.py           # Routing rule / match / action models
 │   ├── events.py            # Event type definitions
 │   ├── device.py            # Device models
 │   └── contact.py           # Contact models
@@ -123,8 +146,11 @@ hold-slayer/
     ├── test_audio_classifier.py # 18 tests — waveform analysis
     ├── test_call_flows.py       # 10 tests — call flow models
     ├── test_hold_slayer.py      # 20 tests — IVR nav, EventBus, CallManager
-    └── test_services.py         # 27 tests — LLM, notifications, recording,
-                                 # analytics, learner, EventBus
+    ├── test_services.py         # 27 tests — LLM, notifications, recording,
+    │                            # analytics, learner, EventBus
+    ├── test_tts.py              # 4 tests — Rhema TTS client
+    ├── test_routing.py          # 8 tests — rules evaluator
+    └── test_receptionist.py     # 7 tests — receptionist decision logic
 ```
 
 ## Quick Start
@@ -144,13 +170,25 @@ cp .env.example .env
 # Edit .env with your SIP trunk credentials, LLM endpoint, etc.
 ```
 
-### 3. Run
+### 3. Build the dashboard (optional but recommended)
+
+```bash
+cd dashboard
+npm install
+npm run build
+cd ..
+```
+
+The gateway serves the built UI at `/dashboard` automatically when
+`dashboard/build/` exists. Skip this step if you only need the REST/WS API.
+
+### 4. Run
 
 ```bash
 uvicorn main:app --host 0.0.0.0 --port 8100
 ```
 
-### 4. Test
+### 5. Test
 
 ```bash
 pytest tests/ -v
@@ -179,6 +217,39 @@ curl -X POST http://localhost:8000/api/calls/hold-slayer \
 curl http://localhost:8000/api/calls/call_abc123
 ```
 
+**Browse call history (persisted in the database):**
+
+```bash
+curl "http://localhost:8000/api/calls/history?limit=50"
+curl http://localhost:8000/api/calls/call_abc123/transcript
+curl -o call_abc123.wav http://localhost:8000/api/calls/call_abc123/recording  # WAV
+```
+
+**Create a smart-routing rule:**
+
+```bash
+curl -X POST http://localhost:8000/api/routing/rules \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "Block tollfree at night",
+    "priority": 10,
+    "enabled": true,
+    "match": {
+      "caller_pattern": "+1800*",
+      "time_range": {"start": "22:00", "end": "06:00", "tz": "America/Toronto", "days": [0,1,2,3,4,5,6]}
+    },
+    "action": {"type": "reject", "message": "Office is closed."}
+  }'
+```
+
+**Toggle Do Not Disturb on a device:**
+
+```bash
+curl -X PATCH http://localhost:8000/api/routing/devices/dev_abc123/dnd \
+  -H "Content-Type: application/json" \
+  -d '{"enabled": true}'
+```
+
 ### WebSocket — Real-Time Events
 
 ```javascript
@@ -210,15 +281,26 @@ The MCP server exposes 10 tools that any MCP-compatible assistant can use:
 
 ## How It Works
 
+### Outbound (Hold Slayer)
+
 1. **You request a call** — via REST API, MCP tool, or dashboard
 2. **Gateway dials out** — Sippy B2BUA places the call through your SIP trunk
 3. **Audio classifier listens** — Real-time waveform analysis detects IVR prompts, hold music, ringing, silence, and live speech
 4. **Transcription runs** — Speaches/Whisper converts audio to text in real-time
-5. **IVR navigator decides** — If a stored call flow exists, it follows the steps. If not, the LLM analyzes the transcript and picks the right menu option
+5. **IVR navigator decides** — If a stored call flow exists, it follows the steps (including SPEAK steps that synthesize speech via Rhema TTS). If not, the LLM analyzes the transcript and picks the right menu option
 6. **Hold detection** — When hold music is detected, the system waits patiently and monitors for transitions
 7. **Human detection** — The classifier detects the transition from music/silence to live speech
 8. **Transfer** — Your desk phone rings. Pick up and you're talking to the agent. Zero hold time.
 
+### Inbound (AI Receptionist + Smart Routing)
+
+1. **SIP INVITE arrives** — Sippy surfaces it to the gateway instead of auto-answering
+2. **Routing rules evaluate** — Caller pattern, DNIS, and time-of-day rules run in priority order. A `reject` or `dnd` action declines the call immediately.
+3. **Receptionist answers** — TTS plays the greeting; the call's audio tap captures the caller's response
+4. **Intent capture** — The utterance is transcribed and the LLM extracts intent, urgency, and a recommended action (ring / message / reject)
+5. **Final decision** — Routing rules win on conflict; otherwise the LLM's recommendation is followed
+6. **Route or take a message** — `ring_chain` tries devices in priority order (skipping any in DND); if nobody picks up (or the action is `take_message`), the receptionist records up to 90s, transcribes it, and emits a `RECEPTIONIST_MESSAGE_SAVED` event
+
 ## Configuration
 
 All configuration is via environment variables (see `.env.example`):
@@ -233,16 +315,25 @@ All configuration is via environment variables (see `.env.example`):
 | `SPEACHES_URL` | Speaches/Whisper STT endpoint | `http://localhost:22070` |
 | `LLM_BASE_URL` | OpenAI-compatible LLM endpoint | `http://localhost:11434/v1` |
 | `LLM_MODEL` | Model name for IVR analysis | `llama3` |
+| `TTS_BASE_URL` | Rhema TTS endpoint (OpenAI-compatible) | `http://localhost:8000` |
+| `TTS_MODEL` | TTS model ID | `speaches-ai/Kokoro-82M-v1.0-ONNX` |
+| `TTS_VOICE` | Default Kokoro voice | `af_heart` |
+| `TTS_API_KEY` | Optional bearer token for Rhema | — |
+| `RECEPTIONIST_ENABLED` | Answer inbound calls with the AI receptionist | `true` |
+| `RECEPTIONIST_GREETING_TEMPLATE` | Spoken greeting | `"Hi, you've reached Robert's line. Who's calling, and what's this about?"` |
+| `RECEPTIONIST_MESSAGE_MAX_SECONDS` | Voicemail cap | `90` |
 | `DATABASE_URL` | PostgreSQL or SQLite connection | SQLite fallback |
 
 ## Tech Stack
 
 - **Python 3.13** + **asyncio** — Single-process async architecture
 - **FastAPI** — REST API + WebSocket server
+- **SvelteKit** — Dashboard UI (built static, served by FastAPI at `/dashboard`)
 - **Sippy B2BUA** — SIP call control and DTMF
-- **PJSUA2** — Media pipeline, conference bridge, recording
+- **PJSUA2** — Media pipeline, conference bridge, recording, WAV playback
 - **Speaches** (Whisper) — Speech-to-text
-- **Ollama / vLLM / OpenAI** — LLM for IVR menu analysis
+- **Rhema** (Kokoro) — Text-to-speech (OpenAI-compatible `/v1/audio/speech`)
+- **Ollama / vLLM / OpenAI** — LLM for IVR menu analysis and receptionist intent capture
 - **SQLAlchemy** — Async database (PostgreSQL or SQLite)
 - **MCP (Model Context Protocol)** — AI assistant integration
 
@@ -297,21 +388,22 @@ Full documentation is in [`/docs`](docs/README.md):
 - [ ] Structured JSON logging
 - [ ] Health check endpoints for all dependencies
 - [ ] Graceful degradation (classifier works without STT, etc.)
-- [ ] Docker Compose (Hold Slayer + PostgreSQL + Speaches + Ollama)
+- [ ] Docker Compose (Hold Slayer + PostgreSQL)
 
-### Phase 5: Additional Services 🔮
+### Phase 5: Additional Services 🚧
 
-- [ ] AI Receptionist — answer inbound calls, screen callers, take messages
+- [x] AI Receptionist — answer inbound calls, screen callers, take messages
+- [x] Smart Routing — time-of-day rules, device priority, DND
+- [x] TTS/Speech — play prompts into calls (SPEAK step support, Rhema/Kokoro)
 - [ ] Spam Filter — detect robocalls using caller ID + audio patterns
-- [ ] Smart Routing — time-of-day rules, device priority, DND
 - [ ] Noise Cancellation — RNNoise integration in media pipeline
-- [ ] TTS/Speech — play prompts into calls (SPEAK step support)
 
-### Phase 6: Dashboard & UX 🔮
+### Phase 6: Dashboard & UX 🚧
 
-- [ ] Web dashboard with real-time call monitor
+- [x] Web dashboard with real-time call monitor
+- [x] Call history with transcript playback (click-to-seek)
+- [x] Routing rules editor + per-device DND toggles
 - [ ] Call flow visual editor (drag-and-drop IVR tree builder)
-- [ ] Call history with transcript playback
 - [ ] Analytics dashboard with hold time graphs
 - [ ] Mobile app (or PWA) for on-the-go control
 
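The Smart Routing bullet in this README describes caller-pattern (glob) matching and time-of-day windows with a time zone and midnight wrap. The sketch below shows one way those two checks could work; it is not the code in `services/routing.py`. `TimeRange`, `in_time_range`, `matches_time`, and `matches_caller` are hypothetical names, and two behaviors are assumptions: `days` uses Python's `datetime.weekday()` numbering (0 = Monday), and the after-midnight part of a wrapping window counts toward the day the window started.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from fnmatch import fnmatchcase
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical mirror of the `time_range` object in a routing rule."""
    start: time
    end: time
    tz: str = "UTC"
    # Assumption: 0 = Monday ... 6 = Sunday, matching datetime.weekday().
    days: tuple[int, ...] = (0, 1, 2, 3, 4, 5, 6)


def in_time_range(local: datetime, rng: TimeRange) -> bool:
    """Check a wall-clock datetime (already in rng.tz) against the window."""
    t = local.time()
    if rng.start <= rng.end:
        # Same-day window such as 09:00-17:00 (end is exclusive).
        return local.weekday() in rng.days and rng.start <= t < rng.end
    # Window wraps past midnight, such as 22:00-06:00.
    if t >= rng.start:
        return local.weekday() in rng.days
    if t < rng.end:
        # Assumption: early-morning minutes belong to the previous day's window.
        return (local - timedelta(days=1)).weekday() in rng.days
    return False


def matches_time(now: datetime, rng: TimeRange) -> bool:
    """Convert an aware timestamp to the rule's time zone, then check it."""
    return in_time_range(now.astimezone(ZoneInfo(rng.tz)), rng)


def matches_caller(caller: str, pattern: str) -> bool:
    """Case-sensitive glob match on the caller number, e.g. '+1800*'."""
    return fnmatchcase(caller, pattern)
```

For the `Block tollfree at night` rule above, `matches_caller("+18005551234", "+1800*")` is true, and `matches_time` converts the current time to `America/Toronto` wall-clock time before the 22:00 to 06:00 check, so daylight-saving shifts are handled by `zoneinfo` (which needs IANA time zone data installed).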
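Steps 2, 5, and 6 of the inbound flow in this README imply an ordering: rules are checked before answering, a `reject` or `dnd` match declines immediately, a matching rule overrides the LLM, and `ring_chain` skips devices in DND before falling back to a message. A simplified sketch of that decision path follows. It is written under stated assumptions rather than taken from `services/routing.py` or `services/receptionist.py`: the `Rule` and `Device` classes, the `LLM_TO_ACTION` mapping, lower `priority` values winning, and defaulting to `take_message` when neither a rule nor the LLM gives an answer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical, simplified routing rule: a predicate plus an action."""
    name: str
    priority: int                   # assumption: lower value is evaluated first
    action: str                     # ring_device | ring_chain | take_message | reject | dnd
    matches: Callable[[str], bool]  # predicate over the caller number
    enabled: bool = True


@dataclass
class Device:
    """Hypothetical device record with ring priority and a DND flag."""
    id: str
    priority: int                   # assumption: lower value rings first
    dnd: bool = False


# Hypothetical mapping from the LLM's recommendation to a routing action.
LLM_TO_ACTION = {"ring": "ring_chain", "message": "take_message", "reject": "reject"}


def first_match(rules: list[Rule], caller: str) -> Optional[Rule]:
    """Walk enabled rules in priority order; the first match wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.enabled and rule.matches(caller):
            return rule
    return None


def declines_before_answer(rule: Optional[Rule]) -> bool:
    """Step 2: reject/dnd rules decline the call before the receptionist answers."""
    return rule is not None and rule.action in ("reject", "dnd")


def final_action(rule: Optional[Rule], llm_recommendation: Optional[str]) -> str:
    """Step 5: a matching rule beats the LLM; otherwise follow the LLM."""
    if rule is not None:
        return rule.action
    # Assumption: fall back to taking a message if the LLM gave nothing usable.
    return LLM_TO_ACTION.get(llm_recommendation or "", "take_message")


def ring_chain(devices: list[Device], ring: Callable[[Device], bool]) -> Optional[str]:
    """Step 6: ring non-DND devices in priority order.

    Returns the id of the device that answered, or None so the caller can
    fall through to take_message.
    """
    for dev in sorted(devices, key=lambda d: d.priority):
        if dev.dnd:
            continue  # never ring a device in Do Not Disturb
        if ring(dev):
            return dev.id
    return None
```

A real implementation would likely ring devices asynchronously with a per-device timeout; the synchronous `ring` callback here only keeps the ordering logic easy to read and test.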
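The TTS bullet and the `TTS_*` rows in the configuration table describe an OpenAI-compatible `/v1/audio/speech` endpoint. Below is a minimal standard-library client sketch, assuming Rhema accepts the OpenAI request body (`model`, `input`, `voice`, `response_format`) and a bearer token when `TTS_API_KEY` is set. `build_speech_request` and `synthesize` are hypothetical names, not the API of `services/tts.py`; requesting `wav` is a choice made here to suit WAV playback through PJSUA2.

```python
import json
import os
import urllib.request
from typing import Optional


def build_speech_request(
    text: str,
    *,
    base_url: Optional[str] = None,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    api_key: Optional[str] = None,
    response_format: str = "wav",
) -> urllib.request.Request:
    """Build a POST to <base>/v1/audio/speech, defaulting to the README's env vars."""
    base = (base_url or os.environ.get("TTS_BASE_URL", "http://localhost:8000")).rstrip("/")
    payload = {
        "model": model or os.environ.get("TTS_MODEL", "speaches-ai/Kokoro-82M-v1.0-ONNX"),
        "input": text,
        "voice": voice or os.environ.get("TTS_VOICE", "af_heart"),
        "response_format": response_format,
    }
    headers = {"Content-Type": "application/json"}
    key = api_key if api_key is not None else os.environ.get("TTS_API_KEY")
    if key:  # TTS_API_KEY is optional; only send a bearer token when it is set
        headers["Authorization"] = f"Bearer {key}"
    return urllib.request.Request(
        f"{base}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize(text: str, timeout: float = 30.0, **kwargs) -> bytes:
    """Send the request and return the raw audio bytes (blocking call)."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs), timeout=timeout) as resp:
        return resp.read()
```

`synthesize("Please hold.")` would return the audio bytes for a prompt. Because the gateway is a single-process asyncio app, a blocking call like this belongs in `asyncio.to_thread(...)`, or should be swapped for an async HTTP client.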