feat: scaffold stentor-gateway with FastAPI voice pipeline

Initialize the stentor-gateway project with WebSocket-based voice pipeline orchestrating STT → Agent → TTS via OpenAI-compatible APIs.

- Add FastAPI app with WebSocket endpoint for audio streaming
- Add pipeline orchestration (stt_client, tts_client, agent_client)
- Add Pydantic Settings configuration and message models
- Add audio utilities for PCM/WAV conversion and resampling
- Add health check endpoints
- Add Dockerfile and pyproject.toml with dependencies
- Add initial test suite (pipeline, STT, TTS, WebSocket)
- Add comprehensive README covering gateway and ESP32 ear design
- Clean up .gitignore for Python/uv project
315
docs/api-reference.md
Normal file
@@ -0,0 +1,315 @@
# Stentor Gateway API Reference

> Version 0.1.0

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Dashboard (Bootstrap UI) |
| `/api/v1/realtime` | WebSocket | Real-time audio conversation |
| `/api/v1/info` | GET | Gateway information and configuration |
| `/api/live/` | GET | Liveness probe (Kubernetes) |
| `/api/ready/` | GET | Readiness probe (Kubernetes) |
| `/api/metrics` | GET | Prometheus-compatible metrics |
| `/api/docs` | GET | Interactive API documentation (Swagger UI) |
| `/api/openapi.json` | GET | OpenAPI schema |

---

## WebSocket: `/api/v1/realtime`

Real-time voice conversation endpoint. Protocol inspired by the OpenAI Realtime API.

### Connection

```
ws://{host}:{port}/api/v1/realtime
```
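
As a minimal sketch, the connection URL can be assembled from a host and port. The helper name `realtime_url` is hypothetical, the default port is taken from `STENTOR_PORT` in the Configuration section, and the `wss` option is an assumption for deployments behind a TLS proxy (this reference only documents `ws`).

```python
def realtime_url(host: str, port: int = 8600, secure: bool = False) -> str:
    """Build the WebSocket URL for the realtime endpoint.

    `secure=True` switches to wss://, which is an assumption about the
    deployment and is not described in this reference.
    """
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}/api/v1/realtime"


if __name__ == "__main__":
    print(realtime_url("localhost"))
```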

### Client Events

#### `session.start`

Initiates a new conversation session. Must be sent first.

```json
{
  "type": "session.start",
  "client_id": "esp32-kitchen",
  "audio_config": {
    "sample_rate": 16000,
    "channels": 1,
    "sample_width": 16,
    "encoding": "pcm_s16le"
  }
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | ✔ | Must be `"session.start"` |
| `client_id` | string | | Client identifier for tracking |
| `audio_config` | object | | Audio format configuration |
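
A client might build this message as follows. This is a sketch only: `build_session_start` is a hypothetical helper, and it simply omits the optional fields when they are not supplied.

```python
import json
from typing import Optional


def build_session_start(client_id: Optional[str] = None,
                        audio_config: Optional[dict] = None) -> str:
    """Serialize a session.start event; only `type` is required."""
    event = {"type": "session.start"}
    if client_id is not None:
        event["client_id"] = client_id
    if audio_config is not None:
        event["audio_config"] = audio_config
    return json.dumps(event)


if __name__ == "__main__":
    print(build_session_start(
        "esp32-kitchen",
        {"sample_rate": 16000, "channels": 1,
         "sample_width": 16, "encoding": "pcm_s16le"},
    ))
```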

#### `input_audio_buffer.append`

Sends a chunk of audio data. Stream continuously while the user is speaking.

```json
{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded PCM audio>"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | ✔ | Must be `"input_audio_buffer.append"` |
| `audio` | string | ✔ | Base64-encoded PCM S16LE audio |
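
The following sketch splits raw PCM into chunks and wraps each one in an `input_audio_buffer.append` event. The helper name `append_events` and the 3200-byte chunk size (100 ms of 16 kHz mono 16-bit audio) are assumptions; this reference does not prescribe a chunk size.

```python
import base64
import json
from typing import Iterator


def append_events(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[str]:
    """Yield one JSON append event per chunk of raw PCM S16LE audio."""
    for offset in range(0, len(pcm), chunk_bytes):
        chunk = pcm[offset:offset + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


if __name__ == "__main__":
    silence = bytes(8000)  # 250 ms of silence at 16 kHz mono, 16-bit
    for message in append_events(silence):
        print(len(message))
```

Each yielded string would be sent as one WebSocket text frame.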

#### `input_audio_buffer.commit`

Signals end of speech. Triggers the STT → Agent → TTS pipeline.

```json
{
  "type": "input_audio_buffer.commit"
}
```

#### `session.close`

Requests session termination. The WebSocket connection will close.

```json
{
  "type": "session.close"
}
```

### Server Events

#### `session.created`

Acknowledges session creation.

```json
{
  "type": "session.created",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

#### `status`

Processing status update. Use it to drive LED feedback on the ESP32.

```json
{
  "type": "status",
  "state": "listening"
}
```

| State | Description | Suggested LED |
|-------|-------------|---------------|
| `listening` | Ready for audio input | Green |
| `transcribing` | Running STT | Yellow |
| `thinking` | Waiting for agent response | Yellow |
| `speaking` | Playing TTS audio | Cyan |
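
The state table above maps directly onto a lookup. In this sketch the names `LED_COLORS` and `led_for_status` are hypothetical, and the `"off"` fallback for unknown states is an assumption.

```python
# Suggested LED colour for each documented status state.
LED_COLORS = {
    "listening": "green",
    "transcribing": "yellow",
    "thinking": "yellow",
    "speaking": "cyan",
}


def led_for_status(event: dict, default: str = "off") -> str:
    """Return the suggested LED colour for a status event.

    Non-status events and unknown states fall back to `default`
    (an assumption; the reference does not define a fallback).
    """
    if event.get("type") != "status":
        return default
    return LED_COLORS.get(event.get("state"), default)


if __name__ == "__main__":
    print(led_for_status({"type": "status", "state": "listening"}))
```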

#### `transcript.done`

Transcript of what the user said.

```json
{
  "type": "transcript.done",
  "text": "What is the weather like today?"
}
```

#### `response.text.done`

AI agent's response text.

```json
{
  "type": "response.text.done",
  "text": "I don't have weather tools yet, but I can help with other things."
}
```

#### `response.audio.delta`

Streamed audio response chunk.

```json
{
  "type": "response.audio.delta",
  "delta": "<base64-encoded PCM audio>"
}
```

#### `response.audio.done`

Audio response streaming complete.

```json
{
  "type": "response.audio.done"
}
```

#### `response.done`

Full response cycle complete. Gateway returns to listening state.

```json
{
  "type": "response.done"
}
```
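
One way a client could fold these server events into a single result per turn is sketched below. The `Turn` dataclass and `apply_event` function are hypothetical names; the sketch assumes each event arrives as one JSON text frame and ignores event types it does not need.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Accumulated result of one STT -> Agent -> TTS cycle."""
    transcript: str = ""
    reply: str = ""
    audio: bytearray = field(default_factory=bytearray)
    done: bool = False


def apply_event(turn: Turn, raw: str) -> Turn:
    """Update `turn` in place from one server event and return it."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "transcript.done":
        turn.transcript = event["text"]
    elif kind == "response.text.done":
        turn.reply = event["text"]
    elif kind == "response.audio.delta":
        # Each delta carries a base64-encoded slice of PCM audio.
        turn.audio.extend(base64.b64decode(event["delta"]))
    elif kind == "response.done":
        turn.done = True
    return turn
```

A receive loop would call `apply_event` for each incoming frame and stop playback buffering once `turn.done` is set.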

#### `error`

Error event.

```json
{
  "type": "error",
  "message": "STT service unavailable",
  "code": "stt_error"
}
```

| Code | Description |
|------|-------------|
| `invalid_json` | Client sent malformed JSON |
| `validation_error` | Message failed schema validation |
| `no_session` | Action requires an active session |
| `empty_buffer` | Audio buffer was empty on commit |
| `empty_transcript` | STT returned no speech |
| `empty_response` | Agent returned empty response |
| `pipeline_error` | Internal pipeline failure |
| `unknown_event` | Unrecognized event type |
| `internal_error` | Unexpected server error |
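
A client could surface error events as exceptions, as in this sketch. The names `LISTED_CODES`, `GatewayError`, and `raise_for_error` are hypothetical. Because the example above uses `stt_error`, which does not appear in the table, the sketch accepts unlisted codes and only records whether a code is listed.

```python
# Codes listed in the table above.
LISTED_CODES = frozenset({
    "invalid_json", "validation_error", "no_session", "empty_buffer",
    "empty_transcript", "empty_response", "pipeline_error",
    "unknown_event", "internal_error",
})


class GatewayError(Exception):
    """Raised when the gateway sends an error event."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        # True only for codes documented in the table.
        self.listed = code in LISTED_CODES


def raise_for_error(event: dict) -> None:
    """Raise GatewayError for error events; do nothing otherwise."""
    if event.get("type") == "error":
        raise GatewayError(event.get("code", ""), event.get("message", ""))
```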

---

## REST: `/api/v1/info`

Returns gateway information and current configuration.

**Response:**

```json
{
  "name": "stentor-gateway",
  "version": "0.1.0",
  "endpoints": {
    "realtime": "/api/v1/realtime",
    "live": "/api/live/",
    "ready": "/api/ready/",
    "metrics": "/api/metrics"
  },
  "config": {
    "stt_url": "http://perseus.incus:8000",
    "tts_url": "http://pan.incus:8000",
    "agent_url": "http://localhost:8001",
    "stt_model": "Systran/faster-whisper-small",
    "tts_model": "kokoro",
    "tts_voice": "af_heart",
    "audio_sample_rate": 16000,
    "audio_channels": 1,
    "audio_sample_width": 16
  }
}
```
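
The audio fields in `config` are enough to derive the raw PCM data rate, which helps when sizing buffers on a client. This sketch assumes `audio_sample_width` is expressed in bits, as the Configuration section states for `STENTOR_AUDIO_SAMPLE_WIDTH`; the helper names are hypothetical.

```python
def bytes_per_second(config: dict) -> int:
    """Raw PCM data rate implied by the gateway's audio settings."""
    bits = (config["audio_sample_rate"]
            * config["audio_channels"]
            * config["audio_sample_width"])
    return bits // 8


def duration_ms(num_bytes: int, config: dict) -> int:
    """Playback time in milliseconds for `num_bytes` of raw PCM."""
    return num_bytes * 1000 // bytes_per_second(config)


if __name__ == "__main__":
    cfg = {"audio_sample_rate": 16000, "audio_channels": 1,
           "audio_sample_width": 16}
    print(bytes_per_second(cfg))   # 16000 * 1 * 16 / 8 = 32000
    print(duration_ms(3200, cfg))  # 3200 bytes is 100 ms at that rate
```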

---

## REST: `/api/live/`

Kubernetes liveness probe.

**Response (200):**

```json
{
  "status": "ok"
}
```

---

## REST: `/api/ready/`

Kubernetes readiness probe. Checks connectivity to STT, TTS, and Agent services.

**Response (200, all services reachable):**

```json
{
  "status": "ready",
  "checks": {
    "stt": true,
    "tts": true,
    "agent": true
  }
}
```

**Response (503, one or more services unavailable):**

```json
{
  "status": "not_ready",
  "checks": {
    "stt": true,
    "tts": false,
    "agent": true
  }
}
```
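
A monitoring script might reduce the readiness body to the list of failing services, as sketched here. `failing_services` is a hypothetical helper that operates on the already-parsed JSON body.

```python
def failing_services(body: dict) -> list:
    """Return the names of services whose readiness check is false."""
    checks = body.get("checks", {})
    return sorted(name for name, ok in checks.items() if not ok)


if __name__ == "__main__":
    not_ready = {
        "status": "not_ready",
        "checks": {"stt": True, "tts": False, "agent": True},
    }
    print(failing_services(not_ready))
```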

---

## REST: `/api/metrics`

Prometheus-compatible metrics in text exposition format.

**Metrics exported:**

| Metric | Type | Description |
|--------|------|-------------|
| `stentor_sessions_active` | Gauge | Current active WebSocket sessions |
| `stentor_transcriptions_total` | Counter | Total STT transcription calls |
| `stentor_tts_requests_total` | Counter | Total TTS synthesis calls |
| `stentor_agent_requests_total` | Counter | Total agent message calls |
| `stentor_pipeline_duration_seconds` | Histogram | Full pipeline latency |
| `stentor_stt_duration_seconds` | Histogram | STT transcription latency |
| `stentor_tts_duration_seconds` | Histogram | TTS synthesis latency |
| `stentor_agent_duration_seconds` | Histogram | Agent response latency |
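
For quick checks without a Prometheus server, the text exposition format can be read with a few lines of code. The parser below is a deliberately simplified sketch: `parse_samples` is a hypothetical helper, it skips comment lines, keeps any label set as part of the key, and ignores optional timestamps. The sample values in the usage example are made up for illustration.

```python
def parse_samples(text: str) -> dict:
    """Parse Prometheus text exposition lines into {sample_name: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled sample: keep the label set as part of the key.
            head, _, tail = line.rpartition("}")
            name = head + "}"
        else:
            name, _, tail = line.partition(" ")
        # First token after the name is the value; a timestamp may follow.
        samples[name] = float(tail.split()[0])
    return samples


if __name__ == "__main__":
    example = (
        "# TYPE stentor_sessions_active gauge\n"
        "stentor_sessions_active 2\n"   # illustrative value
    )
    print(parse_samples(example))
```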

---

## Configuration

All configuration is supplied via environment variables (12-factor):

| Variable | Description | Default |
|----------|-------------|---------|
| `STENTOR_HOST` | Gateway bind address | `0.0.0.0` |
| `STENTOR_PORT` | Gateway bind port | `8600` |
| `STENTOR_STT_URL` | Speaches STT endpoint | `http://perseus.incus:8000` |
| `STENTOR_TTS_URL` | Speaches TTS endpoint | `http://pan.incus:8000` |
| `STENTOR_AGENT_URL` | FastAgent HTTP endpoint | `http://localhost:8001` |
| `STENTOR_STT_MODEL` | Whisper model for STT | `Systran/faster-whisper-small` |
| `STENTOR_TTS_MODEL` | TTS model name | `kokoro` |
| `STENTOR_TTS_VOICE` | TTS voice ID | `af_heart` |
| `STENTOR_AUDIO_SAMPLE_RATE` | Audio sample rate in Hz | `16000` |
| `STENTOR_AUDIO_CHANNELS` | Audio channel count | `1` |
| `STENTOR_AUDIO_SAMPLE_WIDTH` | Bits per sample | `16` |
| `STENTOR_LOG_LEVEL` | Logging level | `INFO` |
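
To illustrate how these variables and defaults fit together, here is a standard-library sketch. The commit message says the gateway itself uses Pydantic Settings, so this is a stand-in rather than the project's code; `GatewaySettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GatewaySettings:
    """Defaults copied from the table above."""
    host: str = "0.0.0.0"
    port: int = 8600
    stt_url: str = "http://perseus.incus:8000"
    tts_url: str = "http://pan.incus:8000"
    agent_url: str = "http://localhost:8001"
    stt_model: str = "Systran/faster-whisper-small"
    tts_model: str = "kokoro"
    tts_voice: str = "af_heart"
    audio_sample_rate: int = 16000
    audio_channels: int = 1
    audio_sample_width: int = 16
    log_level: str = "INFO"


def load_settings(env=None) -> GatewaySettings:
    """Read STENTOR_* variables, falling back to the defaults."""
    env = os.environ if env is None else env
    overrides = {}
    for f in fields(GatewaySettings):
        raw = env.get(f"STENTOR_{f.name.upper()}")
        if raw is not None:
            # Integer-valued defaults imply the variable holds an integer.
            overrides[f.name] = int(raw) if isinstance(f.default, int) else raw
    return GatewaySettings(**overrides)


if __name__ == "__main__":
    print(load_settings({"STENTOR_PORT": "9000"}).port)
```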