stentor/docs/api-reference.md
Robert Helewka 912593b796 feat: scaffold stentor-gateway with FastAPI voice pipeline
Initialize the stentor-gateway project with WebSocket-based voice
pipeline orchestrating STT → Agent → TTS via OpenAI-compatible APIs.

- Add FastAPI app with WebSocket endpoint for audio streaming
- Add pipeline orchestration (stt_client, tts_client, agent_client)
- Add Pydantic Settings configuration and message models
- Add audio utilities for PCM/WAV conversion and resampling
- Add health check endpoints
- Add Dockerfile and pyproject.toml with dependencies
- Add initial test suite (pipeline, STT, TTS, WebSocket)
- Add comprehensive README covering gateway and ESP32 ear design
- Clean up .gitignore for Python/uv project
2026-03-21 19:11:48 +00:00

Stentor Gateway API Reference

Version 0.1.0

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| / | GET | Dashboard (Bootstrap UI) |
| /api/v1/realtime | WebSocket | Real-time audio conversation |
| /api/v1/info | GET | Gateway information and configuration |
| /api/live/ | GET | Liveness probe (Kubernetes) |
| /api/ready/ | GET | Readiness probe (Kubernetes) |
| /api/metrics | GET | Prometheus-compatible metrics |
| /api/docs | GET | Interactive API documentation (Swagger UI) |
| /api/openapi.json | GET | OpenAPI schema |

WebSocket: /api/v1/realtime

Real-time voice conversation endpoint. Protocol inspired by the OpenAI Realtime API.

Connection

ws://{host}:{port}/api/v1/realtime

Client Events

session.start

Initiates a new conversation session. Must be sent first.

{
  "type": "session.start",
  "client_id": "esp32-kitchen",
  "audio_config": {
    "sample_rate": 16000,
    "channels": 1,
    "sample_width": 16,
    "encoding": "pcm_s16le"
  }
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | | Must be "session.start" |
| client_id | string | | Client identifier for tracking |
| audio_config | object | | Audio format configuration |
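
As an illustration of the message shape above, the following is a minimal sketch of a helper that serializes a `session.start` event. The function name `build_session_start` and its default arguments are assumptions made for this example; only the JSON field names come from this reference.

```python
import json


def build_session_start(
    client_id: str,
    sample_rate: int = 16000,
    channels: int = 1,
    sample_width: int = 16,
) -> str:
    """Serialize a session.start event (hypothetical helper, not part of the gateway)."""
    event = {
        "type": "session.start",
        "client_id": client_id,
        "audio_config": {
            "sample_rate": sample_rate,
            "channels": channels,
            "sample_width": sample_width,  # bits per sample
            "encoding": "pcm_s16le",
        },
    }
    return json.dumps(event)
```

A client would send the returned string as the first text frame after the WebSocket connection opens.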

input_audio_buffer.append

Sends a chunk of audio data. Stream chunks continuously while the user is speaking.

{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded PCM audio>"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | | Must be "input_audio_buffer.append" |
| audio | string | | Base64-encoded PCM S16LE audio |

input_audio_buffer.commit

Signals end of speech. Triggers the STT → Agent → TTS pipeline.

{
  "type": "input_audio_buffer.commit"
}
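
The append-then-commit sequence can be sketched as a generator that turns one utterance of raw PCM into the client events described above. The name `utterance_events` and the 3200-byte chunk size (100 ms of 16 kHz, mono, 16-bit audio) are assumptions for illustration; the gateway may accept other chunk sizes.

```python
import base64
import json
from typing import Iterator


def utterance_events(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[str]:
    """Yield one append event per PCM chunk, then a single commit event."""
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # Commit signals end of speech and triggers the pipeline.
    yield json.dumps({"type": "input_audio_buffer.commit"})
```

Each yielded string is one WebSocket text frame. With empty input the generator yields only the commit event, so a caller may want to skip sending in that case.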

session.close

Requests session termination. The WebSocket connection will close.

{
  "type": "session.close"
}

Server Events

session.created

Acknowledges session creation.

{
  "type": "session.created",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}

status

Processing status update. Clients such as the ESP32 can use it to drive LED feedback.

{
  "type": "status",
  "state": "listening"
}

| State | Description | Suggested LED |
| --- | --- | --- |
| listening | Ready for audio input | Green |
| transcribing | Running STT | Yellow |
| thinking | Waiting for agent response | Yellow |
| speaking | Playing TTS audio | Cyan |
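
The state-to-LED mapping can be expressed as a small lookup. This is a sketch only: `led_for_status` and the `"off"` fallback for unrecognized input are assumptions, not behavior defined by the gateway.

```python
import json

# Suggested colors from the table above.
LED_COLORS = {
    "listening": "green",
    "transcribing": "yellow",
    "thinking": "yellow",
    "speaking": "cyan",
}


def led_for_status(raw_event: str, fallback: str = "off") -> str:
    """Return the suggested LED color for a status event, or the fallback."""
    event = json.loads(raw_event)
    if event.get("type") != "status":
        return fallback
    return LED_COLORS.get(event.get("state"), fallback)
```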

transcript.done

Transcript of what the user said.

{
  "type": "transcript.done",
  "text": "What is the weather like today?"
}

response.text.done

AI agent's response text.

{
  "type": "response.text.done",
  "text": "I don't have weather tools yet, but I can help with other things."
}

response.audio.delta

Streamed audio response chunk.

{
  "type": "response.audio.delta",
  "delta": "<base64-encoded PCM audio>"
}

response.audio.done

Audio response streaming complete.

{
  "type": "response.audio.done"
}

response.done

Full response cycle complete. Gateway returns to listening state.

{
  "type": "response.done"
}
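
One way a client might consume the response events described so far is to fold them into a per-turn record. The `Turn` dataclass and `apply_event` function below are hypothetical names for this sketch; only the event types and their `text` and `delta` fields come from this reference.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Accumulated state for one request/response cycle (illustrative only)."""
    transcript: str = ""
    response_text: str = ""
    audio: bytearray = field(default_factory=bytearray)
    audio_done: bool = False
    done: bool = False


def apply_event(turn: Turn, raw_event: str) -> Turn:
    """Update the turn in place from one server event and return it."""
    event = json.loads(raw_event)
    kind = event.get("type")
    if kind == "transcript.done":
        turn.transcript = event["text"]
    elif kind == "response.text.done":
        turn.response_text = event["text"]
    elif kind == "response.audio.delta":
        # Each delta is a base64-encoded PCM chunk; append the raw bytes.
        turn.audio.extend(base64.b64decode(event["delta"]))
    elif kind == "response.audio.done":
        turn.audio_done = True
    elif kind == "response.done":
        turn.done = True
    return turn
```

Once `turn.done` is set, the gateway is back in the listening state and a fresh `Turn` can be started. A device with limited memory would more likely play each decoded chunk immediately than buffer the whole response.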

error

Reports a failure. The code field identifies the error category and message gives a human-readable description.

{
  "type": "error",
  "message": "STT service unavailable",
  "code": "stt_error"
}

| Code | Description |
| --- | --- |
| invalid_json | Client sent malformed JSON |
| validation_error | Message failed schema validation |
| no_session | Action requires an active session |
| empty_buffer | Audio buffer was empty on commit |
| empty_transcript | STT returned no speech |
| empty_response | Agent returned empty response |
| pipeline_error | Internal pipeline failure |
| unknown_event | Unrecognized event type |
| internal_error | Unexpected server error |
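
A client could surface error events as exceptions. The sketch below assumes a hypothetical `GatewayError` class and `raise_for_error` helper. Codes that are not in the table (the example above uses `stt_error`) are flagged through the `known` attribute rather than rejected, since the list may not be exhaustive.

```python
import json

# Codes listed in the table above.
KNOWN_ERROR_CODES = frozenset({
    "invalid_json", "validation_error", "no_session", "empty_buffer",
    "empty_transcript", "empty_response", "pipeline_error",
    "unknown_event", "internal_error",
})


class GatewayError(Exception):
    """Raised when the gateway sends an error event (illustrative only)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.known = code in KNOWN_ERROR_CODES


def raise_for_error(raw_event: str) -> None:
    """Raise GatewayError if the event is an error event; otherwise do nothing."""
    event = json.loads(raw_event)
    if event.get("type") == "error":
        raise GatewayError(event.get("code", ""), event.get("message", ""))
```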

REST: /api/v1/info

Returns gateway information and current configuration.

Response:

{
  "name": "stentor-gateway",
  "version": "0.1.0",
  "endpoints": {
    "realtime": "/api/v1/realtime",
    "live": "/api/live/",
    "ready": "/api/ready/",
    "metrics": "/api/metrics"
  },
  "config": {
    "stt_url": "http://perseus.incus:8000",
    "tts_url": "http://pan.incus:8000",
    "agent_url": "http://localhost:8001",
    "stt_model": "Systran/faster-whisper-small",
    "tts_model": "kokoro",
    "tts_voice": "af_heart",
    "audio_sample_rate": 16000,
    "audio_channels": 1,
    "audio_sample_width": 16
  }
}

REST: /api/live/

Kubernetes liveness probe.

Response (200):

{
  "status": "ok"
}

REST: /api/ready/

Kubernetes readiness probe. Checks connectivity to STT, TTS, and Agent services.

Response (200 — all services reachable):

{
  "status": "ready",
  "checks": {
    "stt": true,
    "tts": true,
    "agent": true
  }
}

Response (503 — one or more services unavailable):

{
  "status": "not_ready",
  "checks": {
    "stt": true,
    "tts": false,
    "agent": true
  }
}
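
To show how the readiness body might be interpreted, the sketch below extracts the names of failing services from the `checks` object. `failing_services` is a hypothetical helper; it assumes the body has the shape shown above.

```python
import json


def failing_services(body: str) -> list[str]:
    """Return the sorted names of services whose readiness check is false."""
    checks = json.loads(body).get("checks", {})
    return sorted(name for name, ok in checks.items() if not ok)
```

For the 503 example above this returns `["tts"]`; for the 200 example it returns an empty list.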

REST: /api/metrics

Prometheus-compatible metrics in text exposition format.

Metrics exported:

| Metric | Type | Description |
| --- | --- | --- |
| stentor_sessions_active | Gauge | Current active WebSocket sessions |
| stentor_transcriptions_total | Counter | Total STT transcription calls |
| stentor_tts_requests_total | Counter | Total TTS synthesis calls |
| stentor_agent_requests_total | Counter | Total agent message calls |
| stentor_pipeline_duration_seconds | Histogram | Full pipeline latency |
| stentor_stt_duration_seconds | Histogram | STT transcription latency |
| stentor_tts_duration_seconds | Histogram | TTS synthesis latency |
| stentor_agent_duration_seconds | Histogram | Agent response latency |
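
For quick checks without a Prometheus server, the text exposition output can be read with a few lines of code. The sketch below is a deliberately simplified parser written for this example: `parse_samples` is a hypothetical name, and it skips comment lines and any labelled series (such as histogram buckets), keeping only unlabelled samples.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition output."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        name, _, rest = line.partition(" ")
        if "{" in name or not rest:
            continue  # labelled series or malformed line: out of scope here
        # The value is the first token; an optional timestamp may follow it.
        samples[name] = float(rest.split()[0])
    return samples
```

A production scraper should rely on a full Prometheus client library instead, since this sketch ignores labels, escaping, and histogram structure.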

Configuration

All configuration is supplied through environment variables (12-factor):

| Variable | Description | Default |
| --- | --- | --- |
| STENTOR_HOST | Gateway bind address | 0.0.0.0 |
| STENTOR_PORT | Gateway bind port | 8600 |
| STENTOR_STT_URL | Speaches STT endpoint | http://perseus.incus:8000 |
| STENTOR_TTS_URL | Speaches TTS endpoint | http://pan.incus:8000 |
| STENTOR_AGENT_URL | FastAgent HTTP endpoint | http://localhost:8001 |
| STENTOR_STT_MODEL | Whisper model for STT | Systran/faster-whisper-small |
| STENTOR_TTS_MODEL | TTS model name | kokoro |
| STENTOR_TTS_VOICE | TTS voice ID | af_heart |
| STENTOR_AUDIO_SAMPLE_RATE | Audio sample rate in Hz | 16000 |
| STENTOR_AUDIO_CHANNELS | Audio channel count | 1 |
| STENTOR_AUDIO_SAMPLE_WIDTH | Bits per sample | 16 |
| STENTOR_LOG_LEVEL | Logging level | INFO |
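
The variable-to-default mapping can be illustrated with a small standard-library loader. This is a stand-in written for this reference, not the gateway's actual settings code: `GatewayConfig` and `load_config` are hypothetical names, and the attribute names are assumed to be the variable names with the `STENTOR_` prefix removed and lowercased.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class GatewayConfig:
    """Defaults copied from the table above (illustrative only)."""
    host: str = "0.0.0.0"
    port: int = 8600
    stt_url: str = "http://perseus.incus:8000"
    tts_url: str = "http://pan.incus:8000"
    agent_url: str = "http://localhost:8001"
    stt_model: str = "Systran/faster-whisper-small"
    tts_model: str = "kokoro"
    tts_voice: str = "af_heart"
    audio_sample_rate: int = 16000
    audio_channels: int = 1
    audio_sample_width: int = 16
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Build a config from STENTOR_* variables, falling back to the defaults."""
    defaults = GatewayConfig()
    overrides = {}
    for f in fields(GatewayConfig):
        raw = env.get(f"STENTOR_{f.name.upper()}")
        if raw is not None:
            # Coerce the string to the type of the default (int or str).
            overrides[f.name] = type(getattr(defaults, f.name))(raw)
    return GatewayConfig(**overrides)
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_config({"STENTOR_PORT": "9000"})`.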