# Services

The intelligence layer services that power Hold Slayer's decision-making, transcription, recording, analytics, and notifications.
## LLM Client (`services/llm_client.py`)

Async HTTP client for any OpenAI-compatible chat completion API. No SDK dependency — just httpx.

### Supported Backends

| Backend | URL | Notes |
|---------|-----|-------|
| Ollama | `http://localhost:11434/v1` | Local, free, good for dev |
| LM Studio | `http://localhost:1234/v1` | Local, free, GUI |
| vLLM | `http://localhost:8000/v1` | Local, fast, production |
| OpenAI | `https://api.openai.com/v1` | Cloud, paid, best quality |
### Usage

```python
client = LLMClient(
    base_url="http://localhost:11434/v1",
    model="llama3",
    api_key="not-needed",  # Ollama doesn't need a key
    timeout=30.0,
    max_tokens=1024,
    temperature=0.3,
)

# Simple chat
response = await client.chat("What is 2+2?")
# "4"

# Chat with a system prompt
response = await client.chat(
    "Parse this menu transcript...",
    system="You are a phone menu parser. Return JSON.",
)

# Structured JSON response (auto-parses)
result = await client.chat_json(
    "Extract menu options from: Press 1 for billing, press 2 for support",
    system="Return JSON with 'options' array.",
)
# {"options": [{"digit": "1", "label": "billing"}, {"digit": "2", "label": "support"}]}
```
### IVR Menu Analysis

The primary use case — analyzing IVR transcripts to pick the right menu option:

```python
decision = await client.analyze_ivr_menu(
    transcript="Welcome to Chase Bank. Press 1 for account balance, press 2 for recent transactions, press 3 for disputes, press 0 for an agent.",
    intent="dispute a charge from Amazon on December 15th",
    previous_selections=["main_menu"],
)
# {"action": "dtmf", "digits": "3", "reasoning": "Disputes is the correct department"}
```
### JSON Extraction

The client handles messy LLM output gracefully:

1. Try `json.loads()` on the raw response
2. If that fails, look for fenced `` ```json ... ``` `` markdown blocks
3. If that fails, look for `{...}` patterns in the text
4. If all fail, return an empty dict (the caller handles it gracefully)
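The fallback chain above can be sketched roughly like this (the function name and regexes are illustrative, not the actual `llm_client.py` code):

```python
import json
import re


def extract_json(raw: str) -> dict:
    """Best-effort JSON extraction from a messy LLM response (sketch)."""
    # 1. Try the raw response as-is.
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        pass

    # 2. Look for a fenced ```json ... ``` markdown block.
    fence = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Fall back to the first-to-last {...} span in the text.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass

    # 4. Give up: empty dict, caller handles it.
    return {}
```

Anything unparseable still ends as `{}`, so IVR navigation can retry or fall back rather than crash mid-call.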
### Stats Tracking

```python
stats = client.stats
# {
#     "total_requests": 47,
#     "total_errors": 2,
#     "avg_latency_ms": 234.5,
#     "model": "llama3",
#     "base_url": "http://localhost:11434/v1"
# }
```
### Error Handling

- HTTP errors return empty string/dict (never crashes the call)
- Timeouts are configurable (default 30s)
- All errors are logged with full context
- Stats track error rates for monitoring
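The never-crash contract above amounts to a wrapper along these lines (a sketch: `client.chat` matches the usage shown earlier, but the wrapper itself is illustrative, not the actual implementation):

```python
import asyncio
import logging
import time

logger = logging.getLogger("llm_client")


async def safe_chat(client, prompt: str, default: str = "") -> str:
    """Call the LLM, but never let an error propagate into call handling."""
    start = time.monotonic()
    try:
        return await client.chat(prompt)
    except asyncio.TimeoutError:
        # Timeouts are expected under load; log the elapsed time for monitoring.
        logger.warning("LLM request timed out after %.1fs", time.monotonic() - start)
    except Exception:
        # Log with full context; the call itself must keep going.
        logger.exception("LLM request failed (prompt=%r)", prompt[:80])
    return default  # empty string: the caller degrades gracefully
```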
## Transcription Service (`services/transcription.py`)

Real-time speech-to-text using Speaches (a self-hosted Whisper API).

### Architecture

```
Audio frames (from AudioTap)
  │
  └── POST /v1/audio/transcriptions
        ├── model: whisper-large-v3
        ├── audio: WAV bytes
        └── language: en
              │
              └── Response: { "text": "Press 1 for billing..." }
```
### Usage

```python
service = TranscriptionService(
    speaches_url="http://perseus.helu.ca:22070",
    model="whisper-large-v3",
)

# Transcribe audio bytes
text = await service.transcribe(audio_bytes)
# "Welcome to Chase Bank. For English, press 1."

# Transcribe with a language hint
text = await service.transcribe(audio_bytes, language="fr")
```
### Integration with Hold Slayer

The transcription service is called when the audio classifier detects speech (IVR_PROMPT or LIVE_HUMAN). The transcript is then:

1. Published as a `TRANSCRIPT_CHUNK` event (→ WebSocket clients)
2. Fed to the LLM for IVR menu analysis
3. Stored in the call's transcript history
4. Used by the Call Flow Learner to build reusable flows
## Recording Service (`services/recording.py`)

Manages call recordings via the PJSUA2 media pipeline.

### Storage Structure

```
recordings/
├── 2026/
│   ├── 01/
│   │   ├── 15/
│   │   │   ├── call_abc123_outbound.wav
│   │   │   ├── call_abc123_mixed.wav
│   │   │   └── call_def456_outbound.wav
│   │   └── 16/
│   │       └── ...
│   └── 02/
│       └── ...
```
### Recording Types

| Type | Description |
|------|-------------|
| **Outbound** | Audio from the company (IVR, hold music, agent) |
| **Inbound** | Audio from the user's device (after transfer) |
| **Mixed** | Both parties in one file (for review) |
### Usage

```python
service = RecordingService(
    storage_dir="recordings",
    max_recording_seconds=7200,  # 2 hours
    sample_rate=16000,
)

# Start recording
session = await service.start_recording(call_id, stream_id)
# session.path = "recordings/2026/01/15/call_abc123_outbound.wav"

# Stop recording
metadata = await service.stop_recording(call_id)
# metadata = { "duration": 847.3, "file_size": 27113600, "path": "..." }

# List recordings for a call
recordings = service.get_recordings(call_id)
```
## Call Analytics (`services/call_analytics.py`)

Tracks call metrics and provides insights for monitoring and optimization.

### Metrics Tracked

| Metric | Description |
|--------|-------------|
| Hold time | Duration spent on hold per call |
| Total call duration | End-to-end call time |
| Success rate | Percentage of calls that reached a human |
| IVR navigation time | Time spent navigating menus |
| Company patterns | Per-company hold time averages |
| Time-of-day trends | When hold times are shortest |
### Usage

```python
analytics = CallAnalytics(max_history=10000)

# Record a completed call
analytics.record_call(
    call_id="call_abc123",
    number="+18005551234",
    company="Chase Bank",
    hold_time=780,
    total_duration=847,
    success=True,
    ivr_steps=6,
)

# Get summary
summary = analytics.get_summary()
# {
#     "total_calls": 142,
#     "success_rate": 0.89,
#     "avg_hold_time": 623.4,
#     "avg_total_duration": 712.1,
# }

# Per-company stats
stats = analytics.get_company_stats("Chase Bank")
# {
#     "total_calls": 23,
#     "avg_hold_time": 845.2,
#     "best_time": "Tuesday 10:00 AM",
#     "success_rate": 0.91,
# }

# Top numbers by call volume
top = analytics.get_top_numbers(limit=10)

# Hold time trends by hour
trends = analytics.get_hold_time_trend()
# [{"hour": 9, "avg_hold": 320}, {"hour": 10, "avg_hold": 480}, ...]
```
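The hourly trend is a plain bucket average. A sketch of the idea (the real `CallAnalytics` internals presumably derive the hour from each call's timestamp; here the records are pre-simplified to `hour`/`hold_time` dicts):

```python
from collections import defaultdict


def hold_time_trend(calls: list[dict]) -> list[dict]:
    """Average hold time per start hour (illustrative sketch)."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for call in calls:
        buckets[call["hour"]].append(call["hold_time"])
    # One row per hour, sorted, with the mean hold time for that bucket.
    return [
        {"hour": hour, "avg_hold": sum(times) / len(times)}
        for hour, times in sorted(buckets.items())
    ]
```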
## Notification Service (`services/notification.py`)

Sends alerts when important things happen on calls.

### Notification Channels

| Channel | Status | Use Case |
|---------|--------|----------|
| **WebSocket** | ✅ Active | Real-time UI updates (always on) |
| **SMS** | ✅ Active | Critical alerts (human detected, call failed) |
| **Push** | 🔮 Future | Mobile app notifications |
### Notification Priority

| Priority | Events | Delivery |
|----------|--------|----------|
| `CRITICAL` | Human detected, transfer started | WebSocket + SMS |
| `HIGH` | Call failed, call timeout | WebSocket + SMS |
| `NORMAL` | Hold detected, call ended | WebSocket only |
| `LOW` | IVR step, DTMF sent | WebSocket only |
### Event → Notification Mapping

| Event | Notification |
|-------|-------------|
| `HUMAN_DETECTED` | 🚨 "A live person picked up — transferring you now!" |
| `TRANSFER_STARTED` | 📞 "Your call has been connected. Pick up your phone!" |
| `CALL_FAILED` | ❌ "The call couldn't be completed." |
| `HOLD_DETECTED` | ⏳ "You're on hold. We'll notify you when someone picks up." |
| `IVR_STEP` | 📍 "Navigating phone menu..." |
| `IVR_DTMF_SENT` | 📱 "Pressed 3" |
| `CALL_ENDED` | 📴 "The call has ended." |
### Deduplication

The notification service tracks what's been sent per call to avoid spamming:

```python
# Won't send duplicate "on hold" notifications for the same call
self._notified: dict[str, set[str]]  # call_id → set of event dedup keys
```

Tracking is cleaned up when a call ends.
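A minimal sketch of that dedup logic, assuming the event name doubles as the dedup key (the `_notified` mapping mirrors the snippet above; the method names are illustrative):

```python
class NotificationDedup:
    """Per-call send tracking so repeated events don't spam the user (sketch)."""

    def __init__(self) -> None:
        self._notified: dict[str, set[str]] = {}  # call_id → set of dedup keys

    def should_send(self, call_id: str, dedup_key: str) -> bool:
        """Return True the first time a key is seen for a call, False after."""
        sent = self._notified.setdefault(call_id, set())
        if dedup_key in sent:
            return False
        sent.add(dedup_key)
        return True

    def on_call_ended(self, call_id: str) -> None:
        # Clean up tracking when the call ends.
        self._notified.pop(call_id, None)
```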
### SMS Configuration

SMS is sent for `CRITICAL` and `HIGH` priority notifications when `NOTIFY_SMS_NUMBER` is configured:

```env
NOTIFY_SMS_NUMBER=+15559876543
```

The SMS sender is a placeholder — wire up your preferred provider (Twilio, AWS SNS, etc.).