# Stentor Gateway API Reference

> Version 0.1.0

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Dashboard (Bootstrap UI) |
| `/api/v1/realtime` | WebSocket | Real-time audio conversation |
| `/api/v1/info` | GET | Gateway information and configuration |
| `/api/live/` | GET | Liveness probe (Kubernetes) |
| `/api/ready/` | GET | Readiness probe (Kubernetes) |
| `/api/metrics` | GET | Prometheus-compatible metrics |
| `/api/docs` | GET | Interactive API documentation (Swagger UI) |
| `/api/openapi.json` | GET | OpenAPI schema |

---

## WebSocket: `/api/v1/realtime`

Real-time voice conversation endpoint. Protocol inspired by the OpenAI Realtime API.

### Connection

```
ws://{host}:{port}/api/v1/realtime
```

### Client Events

#### `session.start`

Initiates a new conversation session. Must be sent first.

```json
{
  "type": "session.start",
  "client_id": "esp32-kitchen",
  "audio_config": {
    "sample_rate": 16000,
    "channels": 1,
    "sample_width": 16,
    "encoding": "pcm_s16le"
  }
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | ✔ | Must be `"session.start"` |
| `client_id` | string | | Client identifier for tracking |
| `audio_config` | object | | Audio format configuration |

#### `input_audio_buffer.append`

Sends a chunk of audio data. Stream continuously while user is speaking.

```json
{
  "type": "input_audio_buffer.append",
  "audio": ""
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | ✔ | Must be `"input_audio_buffer.append"` |
| `audio` | string | ✔ | Base64-encoded PCM S16LE audio |

#### `input_audio_buffer.commit`

Signals end of speech. Triggers the STT → Agent → TTS pipeline.

```json
{
  "type": "input_audio_buffer.commit"
}
```

#### `session.close`

Requests session termination. The WebSocket connection will close.

```json
{
  "type": "session.close"
}
```

### Server Events

#### `session.created`

Acknowledges session creation.

```json
{
  "type": "session.created",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

#### `status`

Processing status update. Use for LED feedback on ESP32.

```json
{
  "type": "status",
  "state": "listening"
}
```

| State | Description | Suggested LED |
|-------|-------------|---------------|
| `listening` | Ready for audio input | Green |
| `transcribing` | Running STT | Yellow |
| `thinking` | Waiting for agent response | Yellow |
| `speaking` | Playing TTS audio | Cyan |

#### `transcript.done`

Transcript of what the user said.

```json
{
  "type": "transcript.done",
  "text": "What is the weather like today?"
}
```

#### `response.text.done`

AI agent's response text.

```json
{
  "type": "response.text.done",
  "text": "I don't have weather tools yet, but I can help with other things."
}
```

#### `response.audio.delta`

Streamed audio response chunk.

```json
{
  "type": "response.audio.delta",
  "delta": ""
}
```

#### `response.audio.done`

Audio response streaming complete.

```json
{
  "type": "response.audio.done"
}
```

#### `response.done`

Full response cycle complete. Gateway returns to listening state.

```json
{
  "type": "response.done"
}
```

#### `error`

Error event.

```json
{
  "type": "error",
  "message": "STT service unavailable",
  "code": "stt_error"
}
```

| Code | Description |
|------|-------------|
| `invalid_json` | Client sent malformed JSON |
| `validation_error` | Message failed schema validation |
| `no_session` | Action requires an active session |
| `empty_buffer` | Audio buffer was empty on commit |
| `empty_transcript` | STT returned no speech |
| `empty_response` | Agent returned empty response |
| `pipeline_error` | Internal pipeline failure |
| `unknown_event` | Unrecognized event type |
| `internal_error` | Unexpected server error |

---

## REST: `/api/v1/info`

Returns gateway information and current configuration.

**Response:**

```json
{
  "name": "stentor-gateway",
  "version": "0.1.0",
  "endpoints": {
    "realtime": "/api/v1/realtime",
    "live": "/api/live/",
    "ready": "/api/ready/",
    "metrics": "/api/metrics"
  },
  "config": {
    "stt_url": "http://perseus.incus:8000",
    "tts_url": "http://pan.incus:8000",
    "agent_url": "http://localhost:8001",
    "stt_model": "Systran/faster-whisper-small",
    "tts_model": "kokoro",
    "tts_voice": "af_heart",
    "audio_sample_rate": 16000,
    "audio_channels": 1,
    "audio_sample_width": 16
  }
}
```

---

## REST: `/api/live/`

Kubernetes liveness probe.

**Response (200):**

```json
{
  "status": "ok"
}
```

---

## REST: `/api/ready/`

Kubernetes readiness probe. Checks connectivity to STT, TTS, and Agent services.

**Response (200, all services reachable):**

```json
{
  "status": "ready",
  "checks": {
    "stt": true,
    "tts": true,
    "agent": true
  }
}
```

**Response (503, one or more services unavailable):**

```json
{
  "status": "not_ready",
  "checks": {
    "stt": true,
    "tts": false,
    "agent": true
  }
}
```

---

## REST: `/api/metrics`

Prometheus-compatible metrics in text exposition format.

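
For ad-hoc checks without a Prometheus server, the text exposition format can be read with a few lines of Python. This is a minimal sketch under stated assumptions: `parse_metrics` is a hypothetical helper (not part of Stentor), the sample payload is invented for illustration, and the parser ignores optional timestamps and does not interpret labels or histogram buckets.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse samples from Prometheus text exposition format.

    Simplified on purpose: comment lines (# HELP / # TYPE) are skipped,
    and each remaining line is assumed to be "<series> <value>" with no
    trailing timestamp.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label sets containing spaces stay intact.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


# Invented sample payload; real gateway output will contain more series.
SAMPLE = """\
# TYPE stentor_sessions_active gauge
stentor_sessions_active 2
# TYPE stentor_transcriptions_total counter
stentor_transcriptions_total 17
"""

metrics = parse_metrics(SAMPLE)
print(metrics["stentor_sessions_active"])
```

In production, a Prometheus scrape job pointed at `/api/metrics` would normally replace hand-rolled parsing like this.
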
**Metrics exported:**

| Metric | Type | Description |
|--------|------|-------------|
| `stentor_sessions_active` | Gauge | Current active WebSocket sessions |
| `stentor_transcriptions_total` | Counter | Total STT transcription calls |
| `stentor_tts_requests_total` | Counter | Total TTS synthesis calls |
| `stentor_agent_requests_total` | Counter | Total agent message calls |
| `stentor_pipeline_duration_seconds` | Histogram | Full pipeline latency |
| `stentor_stt_duration_seconds` | Histogram | STT transcription latency |
| `stentor_tts_duration_seconds` | Histogram | TTS synthesis latency |
| `stentor_agent_duration_seconds` | Histogram | Agent response latency |

---

## Configuration

All configuration via environment variables (12-factor):

| Variable | Description | Default |
|----------|-------------|---------|
| `STENTOR_HOST` | Gateway bind address | `0.0.0.0` |
| `STENTOR_PORT` | Gateway bind port | `8600` |
| `STENTOR_STT_URL` | Speaches STT endpoint | `http://perseus.incus:8000` |
| `STENTOR_TTS_URL` | Speaches TTS endpoint | `http://pan.incus:8000` |
| `STENTOR_AGENT_URL` | FastAgent HTTP endpoint | `http://localhost:8001` |
| `STENTOR_STT_MODEL` | Whisper model for STT | `Systran/faster-whisper-small` |
| `STENTOR_TTS_MODEL` | TTS model name | `kokoro` |
| `STENTOR_TTS_VOICE` | TTS voice ID | `af_heart` |
| `STENTOR_AUDIO_SAMPLE_RATE` | Audio sample rate in Hz | `16000` |
| `STENTOR_AUDIO_CHANNELS` | Audio channel count | `1` |
| `STENTOR_AUDIO_SAMPLE_WIDTH` | Bits per sample | `16` |
| `STENTOR_LOG_LEVEL` | Logging level | `INFO` |
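
The environment-variable approach can be illustrated with a small loader. This is only a sketch of the pattern, not the gateway's actual settings code: `DEFAULTS`, `INT_KEYS`, and `load_config` are hypothetical names, and the default values are copied from the configuration table.

```python
import os

# Defaults copied from the configuration table; keys are the documented variables.
DEFAULTS = {
    "STENTOR_HOST": "0.0.0.0",
    "STENTOR_PORT": "8600",
    "STENTOR_STT_URL": "http://perseus.incus:8000",
    "STENTOR_TTS_URL": "http://pan.incus:8000",
    "STENTOR_AGENT_URL": "http://localhost:8001",
    "STENTOR_STT_MODEL": "Systran/faster-whisper-small",
    "STENTOR_TTS_MODEL": "kokoro",
    "STENTOR_TTS_VOICE": "af_heart",
    "STENTOR_AUDIO_SAMPLE_RATE": "16000",
    "STENTOR_AUDIO_CHANNELS": "1",
    "STENTOR_AUDIO_SAMPLE_WIDTH": "16",
    "STENTOR_LOG_LEVEL": "INFO",
}

# Variables whose values are numeric and should be converted to int.
INT_KEYS = {
    "STENTOR_PORT",
    "STENTOR_AUDIO_SAMPLE_RATE",
    "STENTOR_AUDIO_CHANNELS",
    "STENTOR_AUDIO_SAMPLE_WIDTH",
}


def load_config(env=None) -> dict:
    """Resolve every documented variable, falling back to its default."""
    env = os.environ if env is None else env
    config = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key, default)
        config[key] = int(raw) if key in INT_KEYS else raw
    return config


# Passing a dict instead of os.environ keeps the example deterministic.
config = load_config({"STENTOR_PORT": "9000"})
print(config["STENTOR_PORT"], config["STENTOR_TTS_VOICE"])
```

Calling `load_config()` with no argument reads from the real process environment, so exporting `STENTOR_PORT=9000` before starting would have the same effect as the override shown here.
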
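
As a closing illustration of the client events documented for `/api/v1/realtime`, the following sketch builds the message sequence for one utterance: `session.start`, a series of `input_audio_buffer.append` frames, then `input_audio_buffer.commit`. The helper names and the 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono audio) are assumptions made for the example; the reference does not specify a chunk size, and the actual WebSocket transport is left out.

```python
import base64
import json


def session_start(client_id: str) -> str:
    """Build the session.start event with the documented audio format."""
    return json.dumps({
        "type": "session.start",
        "client_id": client_id,
        "audio_config": {
            "sample_rate": 16000,
            "channels": 1,
            "sample_width": 16,
            "encoding": "pcm_s16le",
        },
    })


def audio_append(chunk: bytes) -> str:
    """Wrap raw PCM S16LE bytes in an input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(chunk).decode("ascii"),
    })


def utterance_messages(pcm: bytes, client_id: str, chunk_bytes: int = 3200) -> list[str]:
    """Return the ordered text frames a client would send for one utterance."""
    messages = [session_start(client_id)]
    for offset in range(0, len(pcm), chunk_bytes):
        messages.append(audio_append(pcm[offset:offset + chunk_bytes]))
    messages.append(json.dumps({"type": "input_audio_buffer.commit"}))
    return messages


# 0.25 s of silence: 16000 samples/s * 2 bytes * 0.25 s = 8000 bytes.
silence = bytes(8000)
messages = utterance_messages(silence, "esp32-kitchen")
print([json.loads(m)["type"] for m in messages])
```

A real client would send each string as a WebSocket text frame and would likely wait for `session.created` before streaming audio, then react to `status`, `response.audio.delta`, and `error` events as they arrive.
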