
# Stentor Gateway API Reference

> Version 0.1.0

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Dashboard (Bootstrap UI) |
| `/api/v1/realtime` | WebSocket | Real-time audio conversation |
| `/api/v1/info` | GET | Gateway information and configuration |
| `/api/live/` | GET | Liveness probe (Kubernetes) |
| `/api/ready/` | GET | Readiness probe (Kubernetes) |
| `/api/metrics` | GET | Prometheus-compatible metrics |
| `/api/docs` | GET | Interactive API documentation (Swagger UI) |
| `/api/openapi.json` | GET | OpenAPI schema |

---

## WebSocket: `/api/v1/realtime`

Real-time voice conversation endpoint. Protocol inspired by the OpenAI Realtime API.

### Connection

```
ws://{host}:{port}/api/v1/realtime
```

### Client Events

#### `session.start`

Initiates a new conversation session. Must be sent first.

```json
{
  "type": "session.start",
  "client_id": "esp32-kitchen",
  "audio_config": {
    "sample_rate": 16000,
    "channels": 1,
    "sample_width": 16,
    "encoding": "pcm_s16le"
  }
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | ✔ | Must be `"session.start"` |
| `client_id` | string | | Client identifier for tracking |
| `audio_config` | object | | Audio format: `sample_rate` (Hz), `channels`, `sample_width` (bits per sample), `encoding` |
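
For illustration, here is a minimal Python sketch that serializes a `session.start` event using only the standard library. The helper name `build_session_start` is hypothetical. Its `audio_config` values simply mirror the example above and the documented `STENTOR_AUDIO_*` defaults, and it assumes 16-bit PCM S16LE, the only encoding this reference shows.

```python
import json
from typing import Optional


def build_session_start(
    client_id: Optional[str] = None,
    sample_rate: int = 16000,
    channels: int = 1,
) -> str:
    """Serialize a session.start event (hypothetical helper, not part of the gateway)."""
    message: dict = {"type": "session.start"}
    if client_id is not None:
        # client_id is optional and only used for tracking.
        message["client_id"] = client_id
    message["audio_config"] = {
        "sample_rate": sample_rate,  # Hz
        "channels": channels,
        "sample_width": 16,  # bits per sample (assumed fixed for pcm_s16le)
        "encoding": "pcm_s16le",
    }
    return json.dumps(message)
```

A client would typically send this string as the first text frame after connecting, then wait for `session.created` before streaming audio.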

#### `input_audio_buffer.append`

Sends a chunk of audio data. Stream chunks continuously while the user is speaking.

```json
{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded PCM audio>"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | ✔ | Must be `"input_audio_buffer.append"` |
| `audio` | string | ✔ | Base64-encoded PCM S16LE audio |
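
The following sketch splits raw PCM S16LE audio into `input_audio_buffer.append` events. The 20 ms chunk size is an assumption chosen for illustration (640 bytes at 16 kHz mono, 16-bit). This reference does not specify a required or maximum chunk size, and `iter_append_events` is a hypothetical name.

```python
import base64
import json
from typing import Iterator


def iter_append_events(
    pcm: bytes,
    sample_rate: int = 16000,
    channels: int = 1,
    chunk_ms: int = 20,  # assumed chunk duration, not mandated by the gateway
) -> Iterator[str]:
    """Yield input_audio_buffer.append events for raw PCM S16LE audio."""
    bytes_per_frame = 2 * channels  # 16-bit samples, interleaved channels
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = frames_per_chunk * bytes_per_frame
    # Chunks stay frame-aligned as long as len(pcm) is a whole number of frames.
    for offset in range(0, len(pcm), chunk_size):
        chunk = pcm[offset:offset + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
```

Once the speaker stops, the client sends `input_audio_buffer.commit` to start processing the buffered audio.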

#### `input_audio_buffer.commit`

Signals the end of speech and triggers the STT → Agent → TTS pipeline on the buffered audio.

```json
{
  "type": "input_audio_buffer.commit"
}
```

#### `session.close`

Requests session termination. The WebSocket connection will close.

```json
{
  "type": "session.close"
}
```

### Server Events

#### `session.created`

Acknowledges session creation.

```json
{
  "type": "session.created",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

#### `status`

Processing status update. Clients such as the ESP32 can use it to drive LED feedback.

```json
{
  "type": "status",
  "state": "listening"
}
```

| State | Description | Suggested LED |
|-------|-------------|--------------|
| `listening` | Ready for audio input | Green |
| `transcribing` | Running STT | Yellow |
| `thinking` | Waiting for agent response | Yellow |
| `speaking` | Playing TTS audio | Cyan |
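
On the device side, the state-to-LED mapping works well as a lookup table. The Python sketch below shows the idea. The RGB triples are an assumed rendering of the suggested colors, `led_for_status` is hypothetical, and real ESP32 firmware would implement the same lookup in its own language.

```python
from typing import Dict, Tuple

Rgb = Tuple[int, int, int]

# RGB values for each status state, following the "Suggested LED" column.
LED_COLORS: Dict[str, Rgb] = {
    "listening": (0, 255, 0),       # green
    "transcribing": (255, 255, 0),  # yellow
    "thinking": (255, 255, 0),      # yellow
    "speaking": (0, 255, 255),      # cyan
}
OFF: Rgb = (0, 0, 0)


def led_for_status(event: dict) -> Rgb:
    """Return the LED color for a status event; unknown states turn the LED off."""
    if event.get("type") != "status":
        raise ValueError("not a status event")
    return LED_COLORS.get(event.get("state"), OFF)
```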

#### `transcript.done`

Transcript of what the user said.

```json
{
  "type": "transcript.done",
  "text": "What is the weather like today?"
}
```

#### `response.text.done`

AI agent's response text.

```json
{
  "type": "response.text.done",
  "text": "I don't have weather tools yet, but I can help with other things."
}
```

#### `response.audio.delta`

Streamed audio response chunk.

```json
{
  "type": "response.audio.delta",
  "delta": "<base64-encoded PCM audio>"
}
```

#### `response.audio.done`

Audio response streaming complete.

```json
{
  "type": "response.audio.done"
}
```
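
The sketch below joins the decoded `response.audio.delta` payloads into a WAV file using Python's `wave` module, for example to inspect TTS output while debugging. It assumes the response audio uses the same PCM S16LE format, sample rate, and channel count as the session's `audio_config`. This reference does not state the output format explicitly, so verify it against your deployment. `deltas_to_wav` is a hypothetical name.

```python
import base64
import io
import wave
from typing import Iterable


def deltas_to_wav(
    deltas: Iterable[str],
    sample_rate: int = 16000,
    channels: int = 1,
) -> bytes:
    """Decode base64 response.audio.delta payloads into an in-memory WAV file."""
    pcm = b"".join(base64.b64decode(d) for d in deltas)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit PCM S16LE (assumed output format)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```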

#### `response.done`

Full response cycle complete. Gateway returns to listening state.

```json
{
  "type": "response.done"
}
```
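
Combining the server events above, a client can accumulate one request/response cycle until `response.done` arrives. This is a hypothetical sketch: `Turn` is not a gateway type. It assumes nothing about event ordering except that `response.done` closes the cycle, and it surfaces `error` events as exceptions.

```python
import base64
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    """Accumulates the server events for one request/response cycle."""

    transcript: Optional[str] = None
    response_text: Optional[str] = None
    audio: bytearray = field(default_factory=bytearray)
    audio_complete: bool = False
    done: bool = False
    states: List[str] = field(default_factory=list)

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "status":
            self.states.append(event["state"])
        elif kind == "transcript.done":
            self.transcript = event["text"]
        elif kind == "response.text.done":
            self.response_text = event["text"]
        elif kind == "response.audio.delta":
            # Each delta is a base64-encoded slice of the PCM response.
            self.audio.extend(base64.b64decode(event["delta"]))
        elif kind == "response.audio.done":
            self.audio_complete = True
        elif kind == "response.done":
            self.done = True
        elif kind == "error":
            raise RuntimeError(f"{event.get('code')}: {event.get('message')}")
```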

#### `error`

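Sent when a request or a pipeline step fails. `message` is human-readable; `code` is one of the values listed below.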

```json
{
  "type": "error",
  "message": "STT service unavailable",
  "code": "pipeline_error"
}
```

| Code | Description |
|------|-------------|
| `invalid_json` | Client sent malformed JSON |
| `validation_error` | Message failed schema validation |
| `no_session` | Action requires an active session |
| `empty_buffer` | Audio buffer was empty on commit |
| `empty_transcript` | STT returned no speech |
| `empty_response` | Agent returned empty response |
| `pipeline_error` | Internal pipeline failure |
| `unknown_event` | Unrecognized event type |
| `internal_error` | Unexpected server error |
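
Client code usually needs to treat these codes differently. The sketch below raises a hypothetical `GatewayError` for `error` events and groups codes by their likely cause. The grouping is an assumption inferred from the descriptions in the table above; the gateway itself does not report categories.

```python
# Assumed grouping, inferred from the error-code descriptions.
CLIENT_CODES = frozenset({"invalid_json", "validation_error", "no_session", "unknown_event"})
NO_SPEECH_CODES = frozenset({"empty_buffer", "empty_transcript"})


class GatewayError(Exception):
    """Raised when the gateway sends an `error` event (hypothetical client type)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

    @property
    def is_client_bug(self) -> bool:
        """True if the gateway rejected something the client sent."""
        return self.code in CLIENT_CODES

    @property
    def is_no_speech(self) -> bool:
        """True if the turn simply contained no usable speech."""
        return self.code in NO_SPEECH_CODES


def raise_for_error(event: dict) -> None:
    """Raise GatewayError for error events; ignore every other event type."""
    if event.get("type") == "error":
        raise GatewayError(event.get("code", "unknown"), event.get("message", ""))
```

With this split, a client might log and fix `is_client_bug` cases, quietly return to listening on `is_no_speech`, and retry or alert on the rest.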

---

## REST: `/api/v1/info`

Returns gateway information and current configuration.

**Response:**

```json
{
  "name": "stentor-gateway",
  "version": "0.1.0",
  "endpoints": {
    "realtime": "/api/v1/realtime",
    "live": "/api/live/",
    "ready": "/api/ready/",
    "metrics": "/api/metrics"
  },
  "config": {
    "stt_url": "http://perseus.incus:8000",
    "tts_url": "http://pan.incus:8000",
    "agent_url": "http://localhost:8001",
    "stt_model": "Systran/faster-whisper-small",
    "tts_model": "kokoro",
    "tts_voice": "af_heart",
    "audio_sample_rate": 16000,
    "audio_channels": 1,
    "audio_sample_width": 16
  }
}
```
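
Because the info payload lists endpoint paths, a client can derive the WebSocket URL from the gateway's HTTP base URL instead of hard-coding it. The following is a standard-library sketch. The helper names are hypothetical, and mapping `https` to `wss` assumes the WebSocket is served over TLS whenever the HTTP endpoint is.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def fetch_info(base_url: str, timeout: float = 5.0) -> dict:
    """GET /api/v1/info and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/v1/info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def realtime_url(base_url: str, info: dict) -> str:
    """Build the realtime WebSocket URL from an HTTP base URL and the info payload."""
    parts = urlsplit(base_url)
    # Assumption: the WebSocket uses TLS (wss) exactly when the HTTP URL does.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, info["endpoints"]["realtime"], "", ""))
```

Against a gateway on the default port, `realtime_url("http://localhost:8600", fetch_info("http://localhost:8600"))` yields `ws://localhost:8600/api/v1/realtime`.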

---

## REST: `/api/live/`

Kubernetes liveness probe.

**Response (200):**

```json
{
  "status": "ok"
}
```

---

## REST: `/api/ready/`

Kubernetes readiness probe. Checks connectivity to STT, TTS, and Agent services.

**Response (200 — all services reachable):**

```json
{
  "status": "ready",
  "checks": {
    "stt": true,
    "tts": true,
    "agent": true
  }
}
```

**Response (503 — one or more services unavailable):**

```json
{
  "status": "not_ready",
  "checks": {
    "stt": true,
    "tts": false,
    "agent": true
  }
}
```
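
The sketch below queries the readiness probe from a script or smoke test and reports which dependencies are failing. `urllib` raises `HTTPError` for the 503 response, so the sketch reads the JSON body from the exception. `check_ready` is a hypothetical helper name.

```python
import json
import urllib.error
import urllib.request
from typing import List, Tuple


def check_ready(base_url: str, timeout: float = 5.0) -> Tuple[bool, List[str]]:
    """Return (ready, failing_services) from /api/ready/."""
    url = base_url.rstrip("/") + "/api/ready/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code != 503:
            raise
        # A 503 still carries the JSON body describing which checks failed.
        body = json.loads(err.read())
    failing = [name for name, ok in body.get("checks", {}).items() if not ok]
    return body.get("status") == "ready", failing
```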

---

## REST: `/api/metrics`

Prometheus-compatible metrics in text exposition format.

**Metrics exported:**

| Metric | Type | Description |
|--------|------|-------------|
| `stentor_sessions_active` | Gauge | Current active WebSocket sessions |
| `stentor_transcriptions_total` | Counter | Total STT transcription calls |
| `stentor_tts_requests_total` | Counter | Total TTS synthesis calls |
| `stentor_agent_requests_total` | Counter | Total agent message calls |
| `stentor_pipeline_duration_seconds` | Histogram | Full pipeline latency |
| `stentor_stt_duration_seconds` | Histogram | STT transcription latency |
| `stentor_tts_duration_seconds` | Histogram | TTS synthesis latency |
| `stentor_agent_duration_seconds` | Histogram | Agent response latency |
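
Prometheus histograms are exported as `_bucket`, `_sum`, and `_count` series, so mean latency is `_sum / _count`. In Prometheus itself you would normally compute a windowed ratio such as `rate(stentor_pipeline_duration_seconds_sum[5m]) / rate(stentor_pipeline_duration_seconds_count[5m])`. For a quick check against a single scrape, a minimal parser might look like the sketch below. It assumes the histograms carry no labels, which this reference does not confirm, and the function names are hypothetical.

```python
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Parse unlabeled samples from Prometheus text exposition format."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, # HELP / # TYPE comments, and labeled series.
        if not line or line.startswith("#") or "{" in line:
            continue
        # Format is "name value [timestamp]"; keep name and value only.
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


def mean_latency(text: str, metric: str) -> float:
    """Average duration of an unlabeled histogram: <metric>_sum / <metric>_count."""
    samples = parse_samples(text)
    count = samples.get(f"{metric}_count", 0.0)
    return samples.get(f"{metric}_sum", 0.0) / count if count else 0.0
```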

---

## Configuration

All configuration is supplied through environment variables, following the 12-factor app methodology:

| Variable | Description | Default |
|----------|-------------|---------|
| `STENTOR_HOST` | Gateway bind address | `0.0.0.0` |
| `STENTOR_PORT` | Gateway bind port | `8600` |
| `STENTOR_STT_URL` | Speaches STT endpoint | `http://perseus.incus:8000` |
| `STENTOR_TTS_URL` | Speaches TTS endpoint | `http://pan.incus:8000` |
| `STENTOR_AGENT_URL` | FastAgent HTTP endpoint | `http://localhost:8001` |
| `STENTOR_STT_MODEL` | Whisper model for STT | `Systran/faster-whisper-small` |
| `STENTOR_TTS_MODEL` | TTS model name | `kokoro` |
| `STENTOR_TTS_VOICE` | TTS voice ID | `af_heart` |
| `STENTOR_AUDIO_SAMPLE_RATE` | Audio sample rate in Hz | `16000` |
| `STENTOR_AUDIO_CHANNELS` | Audio channel count | `1` |
| `STENTOR_AUDIO_SAMPLE_WIDTH` | Bits per sample | `16` |
| `STENTOR_LOG_LEVEL` | Logging level | `INFO` |
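
The following hypothetical sketch reads these variables with their documented defaults. The gateway's actual settings code is not shown in this reference. `Settings` and `from_env` are illustrative names, and each field maps to `STENTOR_` followed by the uppercased field name.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical mirror of the STENTOR_* variables and their documented defaults."""

    host: str = "0.0.0.0"
    port: int = 8600
    stt_url: str = "http://perseus.incus:8000"
    tts_url: str = "http://pan.incus:8000"
    agent_url: str = "http://localhost:8001"
    stt_model: str = "Systran/faster-whisper-small"
    tts_model: str = "kokoro"
    tts_voice: str = "af_heart"
    audio_sample_rate: int = 16000
    audio_channels: int = 1
    audio_sample_width: int = 16
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, environ: Mapping[str, str] = os.environ) -> "Settings":
        overrides = {}
        for f in fields(cls):
            raw = environ.get(f"STENTOR_{f.name.upper()}")
            if raw is not None:
                # Coerce to the type of the documented default (str or int).
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)
```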