# Architecture

Hold Slayer is a single-process async Python application built on FastAPI. It acts as an intelligent B2BUA (Back-to-Back User Agent) sitting between your SIP trunk (PSTN access) and your desk phone/softphone.

## System Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Server                          │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐    │
│  │ REST API │  │WebSocket │  │MCP Server │  │  Dashboard   │    │
│  │  /api/*  │  │  /ws/*   │  │   (SSE)   │  │  /dashboard  │    │
│  └────┬─────┘  └────┬─────┘  └─────┬─────┘  └──────────────┘    │
│       │             │              │                            │
│  ┌────┴─────────────┴──────────────┴─────┐                      │
│  │               Event Bus               │                      │
│  │   (asyncio Queue pub/sub per client)  │                      │
│  └────┬─────────────┬──────────────┬─────┘                      │
│       │             │              │                            │
│  ┌────┴─────┐  ┌────┴──────┐  ┌────┴──────────┐                 │
│  │   Call   │  │   Hold    │  │   Services    │                 │
│  │  Manager │  │  Slayer   │  │  (LLM, STT,   │                 │
│  │          │  │           │  │  Recording,   │                 │
│  │          │  │           │  │  Analytics,   │                 │
│  │          │  │           │  │  Notify)      │                 │
│  └────┬─────┘  └────┬──────┘  └───────────────┘                 │
│       │             │                                           │
│  ┌────┴─────────────┴────────────────────┐                      │
│  │          Sippy B2BUA Engine           │                      │
│  │  (SIP calls, DTMF, conference bridge) │                      │
│  └────┬──────────────────────────────────┘                      │
│       │                                                         │
└───────┼─────────────────────────────────────────────────────────┘
        │
   ┌────┴────┐
   │SIP Trunk│ ──→ PSTN
   └─────────┘
```

## Component Overview

### Presentation Layer

| Component | File | Protocol | Purpose |
|-----------|------|----------|---------|
| REST API | `api/calls.py`, `api/call_flows.py`, `api/devices.py` | HTTP | Call management, CRUD, configuration |
| WebSocket | `api/websocket.py` | WS | Real-time event streaming to clients |
| MCP Server | `mcp_server/server.py` | SSE | AI assistant tool integration |

### Orchestration Layer

| Component | File | Purpose |
|-----------|------|---------|
| Gateway | `core/gateway.py` | Top-level orchestrator — owns all services, routes calls |
| Call Manager | `core/call_manager.py` | Active call state, lifecycle, transcript tracking |
| Event Bus | `core/event_bus.py` | Async pub/sub connecting everything together |
### Intelligence Layer

| Component | File | Purpose |
|-----------|------|---------|
| Hold Slayer | `services/hold_slayer.py` | IVR navigation, hold monitoring, human detection |
| Audio Classifier | `services/audio_classifier.py` | Real-time waveform analysis (music/speech/DTMF/silence) |
| LLM Client | `services/llm_client.py` | OpenAI-compatible LLM for IVR menu decisions |
| Transcription | `services/transcription.py` | Speaches/Whisper STT for live audio |
| Call Flow Learner | `services/call_flow_learner.py` | Builds reusable IVR trees from exploration data |

### Infrastructure Layer

| Component | File | Purpose |
|-----------|------|---------|
| Sippy Engine | `core/sippy_engine.py` | SIP signaling (INVITE, BYE, REGISTER, DTMF) |
| Media Pipeline | `core/media_pipeline.py` | PJSUA2 RTP media handling, conference bridge, recording |
| Recording | `services/recording.py` | WAV file management and storage |
| Analytics | `services/call_analytics.py` | Call metrics, hold time stats, trends |
| Notifications | `services/notification.py` | WebSocket + SMS alerts |
| Database | `db/database.py` | SQLAlchemy async (PostgreSQL or SQLite) |

## Data Flow — Hold Slayer Call

```
1. User Request
   POST /api/calls/hold-slayer { number, intent, call_flow_id }
   │
2. Gateway.make_call()
   ├── CallManager.create_call()   → track state
   ├── SippyEngine.make_call()     → SIP INVITE to trunk
   └── MediaPipeline.add_stream()  → RTP media setup
   │
3. HoldSlayer.run_with_flow() or run_exploration()
   ├── AudioClassifier.classify()  → analyze 3s audio windows
   │   ├── silence?  → wait
   │   ├── ringing?  → wait
   │   ├── DTMF?     → detect tones
   │   ├── music?    → HOLD_DETECTED event
   │   └── speech?   → transcribe + decide
   │       ├── TranscriptionService.transcribe() → STT on speech audio
   │       ├── LLMClient.analyze_ivr_menu()      → pick menu option (fallback)
   │       └── SippyEngine.send_dtmf()           → press the button
   └── detect_hold_to_human_transition()
       └── HUMAN_DETECTED! → transfer
   │
4. Transfer
   ├── SippyEngine.bridge()           → connect call legs
   ├── MediaPipeline.bridge_streams() → bridge RTP
   ├── EventBus.publish(TRANSFER_STARTED)
   └── NotificationService            → "Pick up your phone!"
   │
5. Real-Time Updates (throughout)
   EventBus.publish() → WebSocket clients
                      → MCP server resources
                      → Notification service
                      → Analytics tracking
```

## Threading Model

Hold Slayer is primarily single-threaded async (asyncio), with one exception:

- **Main thread**: FastAPI + all async services (event bus, hold slayer, classifier, etc.)
- **Sippy thread**: Sippy B2BUA runs its own event loop in a dedicated daemon thread. The `SippyEngine` bridges async↔sync via `asyncio.run_in_executor()`.
- **PJSUA2**: Runs in the main thread using the null audio device (no sound card needed — headless server mode).

```
Main Thread (asyncio)
├── FastAPI (uvicorn)
├── EventBus
├── CallManager
├── HoldSlayer
├── AudioClassifier
├── TranscriptionService
├── LLMClient
├── MediaPipeline (PJSUA2)
├── NotificationService
└── RecordingService

Sippy Thread (daemon)
└── Sippy B2BUA event loop
    ├── SIP signaling
    ├── DTMF relay
    └── Call leg management
```

## Design Decisions

### Why Sippy B2BUA + PJSUA2?

We split SIP signaling and media handling into two separate libraries:

- **Sippy B2BUA** handles SIP signaling (INVITE, BYE, REGISTER, re-INVITE, DTMF relay). It's battle-tested for telephony and handles the complex SIP state machine.
- **PJSUA2** handles RTP media (audio streams, conference bridge, recording, tone generation). It provides a clean C++/Python API for media manipulation without needing to deal with raw RTP.

This split lets us tap into the audio stream (for classification and STT) without interfering with SIP signaling, and bridge calls through a conference bridge for a clean transfer.

### Why asyncio Queue-based EventBus?
- **Single process** — no need for Redis/RabbitMQ cross-process messaging
- **Zero dependencies** — pure asyncio, no external services to deploy
- **Per-subscriber queues** — slow consumers don't block fast publishers
- **Dead subscriber cleanup** — subscribers whose queues stay full are automatically removed
- **Event history** — late joiners can catch up on recent events

If scaling to multiple gateway processes becomes necessary, the EventBus interface can be backed by Redis pub/sub without changing consumers.

### Why OpenAI-compatible LLM API?

The LLM client uses raw HTTP (httpx) against any OpenAI-compatible endpoint. This means:

- **Ollama** (local, free) — `http://localhost:11434/v1`
- **LM Studio** (local, free) — `http://localhost:1234/v1`
- **vLLM** (local, fast) — `http://localhost:8000/v1`
- **OpenAI** (cloud) — `https://api.openai.com/v1`

No SDK dependency. No vendor lock-in. Switch models by changing one env var.
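The classifier dispatch in step 3 of the data flow boils down to a small decision table. The sketch below is an illustrative reconstruction — the `Audio` labels and `next_action` helper are hypothetical names, not the actual `services/audio_classifier.py` or `services/hold_slayer.py` code:

```python
from enum import Enum


class Audio(str, Enum):
    """Classification labels for one 3-second audio window (assumed names)."""
    SILENCE = "silence"
    RINGING = "ringing"
    DTMF = "dtmf"
    MUSIC = "music"
    SPEECH = "speech"


def next_action(label: Audio, on_hold: bool) -> str:
    """Map one window's label to the loop's next step (sketch only)."""
    if label in (Audio.SILENCE, Audio.RINGING):
        return "wait"                       # nothing actionable yet
    if label is Audio.DTMF:
        return "detect_tones"               # tones from the far end
    if label is Audio.MUSIC:
        return "publish HOLD_DETECTED"      # hold music → we're queued
    # speech after a hold stretch is the hold→human transition;
    # speech otherwise is an IVR menu to transcribe and navigate
    return "transfer" if on_hold else "transcribe_and_decide"
```

A state flag like `on_hold` is the minimum needed to distinguish an IVR prompt from a human picking up after hold music, which is what `detect_hold_to_human_transition()` captures in the real service.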
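The async↔sync hand-off in the threading model uses stock asyncio primitives. The sketch below shows both directions — `run_in_executor` for calling a blocking SIP API from the event loop, and `run_coroutine_threadsafe` for reporting back from a worker thread. `blocking_sip_invite` and the thread body are stand-ins, not the real Sippy calls:

```python
import asyncio
import threading


def blocking_sip_invite(number: str) -> str:
    """Stand-in for a blocking Sippy API call (synchronous)."""
    return f"INVITE sip:{number} sent"


async def make_call(number: str) -> str:
    # async → sync: push the blocking call onto the default thread-pool
    # executor so the asyncio event loop never stalls on it
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_sip_invite, number)


async def main() -> tuple[str, str]:
    results: list[str] = []
    loop = asyncio.get_running_loop()
    done = asyncio.Event()

    async def report(status: str) -> None:
        results.append(status)
        done.set()

    def sippy_thread() -> None:
        # sync → async: schedule a coroutine on the main loop from the
        # daemon thread; .result() blocks this thread, not the loop
        fut = asyncio.run_coroutine_threadsafe(report("180 Ringing"), loop)
        fut.result()

    threading.Thread(target=sippy_thread, daemon=True).start()
    sent = await make_call("+15551234567")
    await done.wait()
    return sent, results[0]
```

This is the general pattern; where exactly `SippyEngine` places each hop is an implementation detail of `core/sippy_engine.py`.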
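The queue-per-subscriber trade-offs listed under "Why asyncio Queue-based EventBus?" translate into very little code. This is a minimal sketch assuming dict-shaped events and illustrative names — not the actual `core/event_bus.py`:

```python
import asyncio
from collections import deque


class EventBus:
    """Per-subscriber asyncio.Queue pub/sub (illustrative sketch)."""

    def __init__(self, history: int = 100, maxsize: int = 256):
        self._maxsize = maxsize
        self._subscribers: set[asyncio.Queue] = set()
        self._history: deque = deque(maxlen=history)  # for late joiners

    def subscribe(self, replay: bool = False) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self._maxsize)
        if replay:
            for event in self._history:  # catch up on recent events
                if q.full():
                    break
                q.put_nowait(event)
        self._subscribers.add(q)
        return q

    def publish(self, event: dict) -> None:
        self._history.append(event)
        dead = []
        for q in self._subscribers:
            try:
                q.put_nowait(event)  # never await: publisher stays fast
            except asyncio.QueueFull:
                dead.append(q)       # consumer stopped draining — drop it
        for q in dead:
            self._subscribers.discard(q)


async def demo() -> dict:
    bus = EventBus()
    q = bus.subscribe()
    bus.publish({"type": "HOLD_DETECTED", "call_id": "abc"})
    return await q.get()
```

Because `publish` uses `put_nowait`, a slow WebSocket client can only ever fill its own queue; everyone else keeps receiving events at full speed.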
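Talking to an OpenAI-compatible endpoint over raw httpx is similarly small. The env-var names, prompt wording, and helper functions below are assumptions for illustration; only the `/chat/completions` wire format is the standard part:

```python
import os

# Endpoint and model are swappable via env vars (Ollama, LM Studio,
# vLLM, OpenAI…). These variable names and defaults are illustrative.
BASE_URL = os.environ.get("LLM_BASE_URL", "http://localhost:11434/v1")
MODEL = os.environ.get("LLM_MODEL", "llama3.1")


def build_menu_prompt(transcript: str, intent: str) -> list[dict]:
    """Turn a transcribed IVR menu + caller intent into chat messages."""
    return [
        {"role": "system",
         "content": "You pick IVR menu options. Reply with one digit only."},
        {"role": "user",
         "content": f"Menu: {transcript}\nGoal: {intent}\nWhich digit?"},
    ]


def extract_digit(response: dict) -> str | None:
    """Pull the chosen digit out of a chat-completions response body."""
    text = response["choices"][0]["message"]["content"]
    digits = [c for c in text if c.isdigit()]
    return digits[0] if digits else None


async def analyze_ivr_menu(transcript: str, intent: str) -> str | None:
    """POST to any OpenAI-compatible /chat/completions endpoint."""
    import httpx  # deferred so the pure helpers above need no extra deps

    async with httpx.AsyncClient(base_url=BASE_URL, timeout=30) as client:
        resp = await client.post("/chat/completions", json={
            "model": MODEL,
            "messages": build_menu_prompt(transcript, intent),
            "temperature": 0,
        })
        resp.raise_for_status()
        return extract_digit(resp.json())
```

Switching from a local Ollama to OpenAI is just `LLM_BASE_URL=https://api.openai.com/v1` plus an auth header — no SDK code changes.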