Robert Helewka 912593b796 feat: scaffold stentor-gateway with FastAPI voice pipeline

Initialize the stentor-gateway project with WebSocket-based voice
pipeline orchestrating STT → Agent → TTS via OpenAI-compatible APIs.

- Add FastAPI app with WebSocket endpoint for audio streaming
- Add pipeline orchestration (stt_client, tts_client, agent_client)
- Add Pydantic Settings configuration and message models
- Add audio utilities for PCM/WAV conversion and resampling
- Add health check endpoints
- Add Dockerfile and pyproject.toml with dependencies
- Add initial test suite (pipeline, STT, TTS, WebSocket)
- Add comprehensive README covering gateway and ESP32 ear design
- Clean up .gitignore for Python/uv project

2026-03-21 19:11:48 +00:00

Stentor Gateway API Reference

Version 0.1.0

Endpoints

Endpoint	Method	Description
`/`	GET	Dashboard (Bootstrap UI)
`/api/v1/realtime`	WebSocket	Real-time audio conversation
`/api/v1/info`	GET	Gateway information and configuration
`/api/live/`	GET	Liveness probe (Kubernetes)
`/api/ready/`	GET	Readiness probe (Kubernetes)
`/api/metrics`	GET	Prometheus-compatible metrics
`/api/docs`	GET	Interactive API documentation (Swagger UI)
`/api/openapi.json`	GET	OpenAPI schema

WebSocket: `/api/v1/realtime`

Real-time voice conversation endpoint. Protocol inspired by the OpenAI Realtime API.

Connection

ws://{host}:{port}/api/v1/realtime

Client Events

`session.start`

Initiates a new conversation session. Must be sent first.

{
  "type": "session.start",
  "client_id": "esp32-kitchen",
  "audio_config": {
    "sample_rate": 16000,
    "channels": 1,
    "sample_width": 16,
    "encoding": "pcm_s16le"
  }
}

Field	Type	Required	Description
`type`	string	✔	Must be `"session.start"`
`client_id`	string		Client identifier for tracking
`audio_config`	object		Audio format: sample rate in Hz, channel count, bits per sample, and encoding
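
As a rough illustration of the message shape above, a client could assemble `session.start` like this. The helper name `build_session_start` is hypothetical, and the defaults simply mirror the example values shown in this section.

```python
import json


def build_session_start(client_id=None, sample_rate=16000, channels=1,
                        sample_width=16, encoding="pcm_s16le"):
    """Return a session.start event as a JSON string (hypothetical helper)."""
    event = {"type": "session.start"}
    # client_id is optional according to the field table, so omit it when unset.
    if client_id is not None:
        event["client_id"] = client_id
    event["audio_config"] = {
        "sample_rate": sample_rate,    # Hz
        "channels": channels,
        "sample_width": sample_width,  # bits per sample
        "encoding": encoding,
    }
    return json.dumps(event)


message = build_session_start("esp32-kitchen")
```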

`input_audio_buffer.append`

Sends a chunk of audio data. Stream chunks continuously while the user is speaking.

{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded PCM audio>"
}

Field	Type	Required	Description
`type`	string	✔	Must be `"input_audio_buffer.append"`
`audio`	string	✔	Base64-encoded PCM S16LE audio
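
A minimal sketch of how captured PCM could be split into `input_audio_buffer.append` events. The 20 ms chunk duration and the name `append_events` are assumptions for illustration; this reference does not prescribe a chunk size.

```python
import base64
import json


def append_events(pcm, chunk_ms=20, sample_rate=16000, channels=1,
                  sample_width_bits=16):
    """Yield input_audio_buffer.append events (JSON strings) for raw PCM bytes."""
    bytes_per_frame = channels * sample_width_bits // 8
    # chunk_ms is an assumed value, not something the gateway mandates.
    chunk_bytes = sample_rate * chunk_ms // 1000 * bytes_per_frame
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


# One second of 16 kHz mono 16-bit silence is 32000 bytes.
events = list(append_events(bytes(32000)))
```

With the default format, each 20 ms chunk carries 640 bytes of PCM before Base64 encoding.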

`input_audio_buffer.commit`

Signals end of speech. Triggers the STT → Agent → TTS pipeline.

{
  "type": "input_audio_buffer.commit"
}

`session.close`

Requests session termination. The WebSocket connection will close.

{
  "type": "session.close"
}

Server Events

`session.created`

Acknowledges session creation.

{
  "type": "session.created",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}

`status`

Reports the current pipeline state. Clients such as the ESP32 can use it to drive LED feedback.

{
  "type": "status",
  "state": "listening"
}

State	Description	Suggested LED
`listening`	Ready for audio input	Green
`transcribing`	Running STT	Yellow
`thinking`	Waiting for agent response	Yellow
`speaking`	Streaming TTS audio to the client	Cyan

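The state table above maps directly onto a lookup. The following sketch is written in Python for readability (an ESP32 client would more likely be firmware code); `LED_BY_STATE` and `led_for_event` are hypothetical names, and keeping the previous colour for unrecognised states is an assumed policy.

```python
import json

# Suggested LED colour per state, copied from the table above.
LED_BY_STATE = {
    "listening": "green",
    "transcribing": "yellow",
    "thinking": "yellow",
    "speaking": "cyan",
}


def led_for_event(raw, current="off"):
    """Return the LED colour after one server event (a JSON string)."""
    event = json.loads(raw)
    if event.get("type") != "status":
        return current  # only status events change the LED
    # Assumption: an unrecognised state leaves the LED unchanged.
    return LED_BY_STATE.get(event.get("state"), current)


colour = led_for_event('{"type": "status", "state": "listening"}')
```
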
`transcript.done`

Transcript of what the user said.

{
  "type": "transcript.done",
  "text": "What is the weather like today?"
}

`response.text.done`

AI agent's response text.

{
  "type": "response.text.done",
  "text": "I don't have weather tools yet, but I can help with other things."
}

`response.audio.delta`

Streamed audio response chunk.

{
  "type": "response.audio.delta",
  "delta": "<base64-encoded PCM audio>"
}

`response.audio.done`

Audio response streaming complete.

{
  "type": "response.audio.done"
}
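
One way a client might reassemble the streamed response is to decode each `delta` and wrap the concatenated PCM in a WAV container. This sketch assumes the response audio uses the same 16 kHz, mono, 16-bit format as the input; this reference does not state the output format, so treat those defaults as placeholders. `deltas_to_wav` is a hypothetical helper.

```python
import base64
import io
import wave


def deltas_to_wav(deltas, sample_rate=16000, channels=1, sample_width_bits=16):
    """Join Base64 PCM chunks from response.audio.delta into WAV bytes."""
    pcm = b"".join(base64.b64decode(d) for d in deltas)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width_bits // 8)  # wave expects bytes
        wav_file.setframerate(sample_rate)             # assumed output rate
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

A client would collect the `delta` strings until `response.audio.done` arrives, then call `deltas_to_wav` once.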

`response.done`

Full response cycle complete. Gateway returns to listening state.

{
  "type": "response.done"
}
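
Putting the client and server events together, a single conversation turn could look roughly like the sketch below. It is transport-agnostic: `send` and `recv` are caller-supplied callables that write and read one text frame each. The names `run_turn` and `GatewayError` are hypothetical, and the exact order in which the gateway interleaves `status` events is not assumed.

```python
import base64
import json


class GatewayError(Exception):
    """Raised when the gateway sends an error event (hypothetical type)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def run_turn(send, recv, chunks, client_id="example-client"):
    """Run one turn: start a session, stream audio, collect the response."""

    def next_event():
        event = json.loads(recv())
        if event["type"] == "error":
            raise GatewayError(event.get("code"), event.get("message"))
        return event

    # session.start must be the first message on the connection.
    send(json.dumps({"type": "session.start", "client_id": client_id}))
    while next_event()["type"] != "session.created":
        pass  # skip anything that arrives before the acknowledgement

    for chunk in chunks:
        send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    send(json.dumps({"type": "input_audio_buffer.commit"}))

    result = {"transcript": None, "text": None, "audio": bytearray(), "states": []}
    while True:
        event = next_event()
        kind = event["type"]
        if kind == "status":
            result["states"].append(event["state"])
        elif kind == "transcript.done":
            result["transcript"] = event["text"]
        elif kind == "response.text.done":
            result["text"] = event["text"]
        elif kind == "response.audio.delta":
            result["audio"] += base64.b64decode(event["delta"])
        elif kind == "response.done":
            return result  # gateway is back in the listening state
        # response.audio.done and unrecognised events need no action here
```

With a synchronous WebSocket client library, `send` and `recv` would typically be the connection's own send and receive methods; in tests they can be backed by an in-memory queue.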

`error`

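Reports a failure. `code` is a machine-readable identifier from the table below; `message` is a human-readable description.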

{
  "type": "error",
  "message": "STT service unavailable",
  "code": "pipeline_error"
}

Code	Description
`invalid_json`	Client sent malformed JSON
`validation_error`	Message failed schema validation
`no_session`	Action requires an active session
`empty_buffer`	Audio buffer was empty on commit
`empty_transcript`	STT returned no speech
`empty_response`	Agent returned empty response
`pipeline_error`	Internal pipeline failure
`unknown_event`	Unrecognized event type
`internal_error`	Unexpected server error
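
A client may want to react differently depending on the error code. The grouping below is an assumption made for illustration (this reference does not say which errors are recoverable), and `classify_error` is a hypothetical helper.

```python
import json

# Assumed grouping of the documented codes; adjust to the gateway's real behaviour.
EMPTY_TURN = {"empty_buffer", "empty_transcript", "empty_response"}
CLIENT_BUG = {"invalid_json", "validation_error", "unknown_event", "no_session"}
SERVER_FAULT = {"pipeline_error", "internal_error"}


def classify_error(raw):
    """Map an error event (JSON string) to a coarse client-side action."""
    event = json.loads(raw)
    if event.get("type") != "error":
        raise ValueError("not an error event")
    code = event.get("code")
    if code in EMPTY_TURN:
        return "retry_turn"    # nothing usable was captured; listen again
    if code in CLIENT_BUG:
        return "fix_client"    # the client sent something invalid
    if code in SERVER_FAULT:
        return "server_fault"  # back off or surface the failure
    return "unknown"           # code not listed in this reference


action = classify_error('{"type": "error", "code": "empty_transcript", "message": ""}')
```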

REST: `/api/v1/info`

Returns gateway information and current configuration.

Response:

{
  "name": "stentor-gateway",
  "version": "0.1.0",
  "endpoints": {
    "realtime": "/api/v1/realtime",
    "live": "/api/live/",
    "ready": "/api/ready/",
    "metrics": "/api/metrics"
  },
  "config": {
    "stt_url": "http://perseus.incus:8000",
    "tts_url": "http://pan.incus:8000",
    "agent_url": "http://localhost:8001",
    "stt_model": "Systran/faster-whisper-small",
    "tts_model": "kokoro",
    "tts_voice": "af_heart",
    "audio_sample_rate": 16000,
    "audio_channels": 1,
    "audio_sample_width": 16
  }
}
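
As an illustration of consuming this response, the sketch below fetches `/api/v1/info` with the standard library and derives the raw audio byte rate from the reported configuration. `fetch_info` and `bytes_per_second` are hypothetical helper names, and the base URL is supplied by the caller.

```python
import json
import urllib.request


def fetch_info(base_url, timeout=5.0):
    """GET {base_url}/api/v1/info and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/info", timeout=timeout) as resp:
        return json.load(resp)


def bytes_per_second(info):
    """Raw PCM byte rate implied by the gateway's audio configuration."""
    cfg = info["config"]
    # audio_sample_width is expressed in bits, so divide by 8 for bytes.
    return (cfg["audio_sample_rate"] * cfg["audio_channels"]
            * cfg["audio_sample_width"] // 8)
```

For the configuration shown above (16000 Hz, 1 channel, 16 bits), `bytes_per_second` yields 32000.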

REST: `/api/live/`

Kubernetes liveness probe.

Response (200):

{
  "status": "ok"
}

REST: `/api/ready/`

Kubernetes readiness probe. Checks connectivity to STT, TTS, and Agent services.

Response (200 — all services reachable):

{
  "status": "ready",
  "checks": {
    "stt": true,
    "tts": true,
    "agent": true
  }
}

Response (503 — one or more services unavailable):

{
  "status": "not_ready",
  "checks": {
    "stt": true,
    "tts": false,
    "agent": true
  }
}
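
A monitoring script could interpret both responses as sketched below. Because `urllib` raises `HTTPError` for a 503, the error body is read from the exception. The helper names `failing_services` and `check_ready` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def failing_services(body):
    """Return the names of checks that reported false, sorted alphabetically."""
    return sorted(name for name, ok in body.get("checks", {}).items() if not ok)


def check_ready(base_url, timeout=5.0):
    """Return (is_ready, failing_service_names) for a gateway instance."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/ready/", timeout=timeout) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code != 503:
            raise  # anything other than "not ready" is unexpected
        body = json.load(err)  # the 503 response still carries the JSON body
    return body.get("status") == "ready", failing_services(body)
```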

REST: `/api/metrics`

Prometheus-compatible metrics in text exposition format.

Metrics exported:

Metric	Type	Description
`stentor_sessions_active`	Gauge	Current active WebSocket sessions
`stentor_transcriptions_total`	Counter	Total STT transcription calls
`stentor_tts_requests_total`	Counter	Total TTS synthesis calls
`stentor_agent_requests_total`	Counter	Total agent message calls
`stentor_pipeline_duration_seconds`	Histogram	Full pipeline latency
`stentor_stt_duration_seconds`	Histogram	STT transcription latency
`stentor_tts_duration_seconds`	Histogram	TTS synthesis latency
`stentor_agent_duration_seconds`	Histogram	Agent response latency
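
For quick checks without a Prometheus server, the text exposition output can be parsed by hand. The sketch below is a deliberately minimal parser that assumes sample lines carry no trailing timestamp; the sample text is made up for illustration and is not real gateway output. Histograms in this format expose `_sum` and `_count` series, from which a mean latency can be derived.

```python
def parse_samples(text):
    """Parse 'name value' sample lines, skipping # HELP / # TYPE comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp column, so the last token is the value.
        name, value = line.rsplit(None, 1)
        samples[name] = float(value)
    return samples


def mean_latency(samples, metric):
    """Mean of a histogram from its _sum and _count series, or None if empty."""
    count = samples.get(f"{metric}_count", 0.0)
    return samples[f"{metric}_sum"] / count if count else None


# Illustrative input only; the numbers are invented.
SAMPLE_TEXT = """\
# TYPE stentor_sessions_active gauge
stentor_sessions_active 2
stentor_pipeline_duration_seconds_sum 12.5
stentor_pipeline_duration_seconds_count 5
"""

samples = parse_samples(SAMPLE_TEXT)
average = mean_latency(samples, "stentor_pipeline_duration_seconds")
```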

Configuration

All configuration is supplied through environment variables, following the 12-factor convention:

Variable	Description	Default
`STENTOR_HOST`	Gateway bind address	`0.0.0.0`
`STENTOR_PORT`	Gateway bind port	`8600`
`STENTOR_STT_URL`	Speaches STT endpoint	`http://perseus.incus:8000`
`STENTOR_TTS_URL`	Speaches TTS endpoint	`http://pan.incus:8000`
`STENTOR_AGENT_URL`	FastAgent HTTP endpoint	`http://localhost:8001`
`STENTOR_STT_MODEL`	Whisper model for STT	`Systran/faster-whisper-small`
`STENTOR_TTS_MODEL`	TTS model name	`kokoro`
`STENTOR_TTS_VOICE`	TTS voice ID	`af_heart`
`STENTOR_AUDIO_SAMPLE_RATE`	Audio sample rate in Hz	`16000`
`STENTOR_AUDIO_CHANNELS`	Audio channel count	`1`
`STENTOR_AUDIO_SAMPLE_WIDTH`	Bits per sample	`16`
`STENTOR_LOG_LEVEL`	Logging level	`INFO`
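
The table above can be mirrored in code. The project itself uses Pydantic Settings; the sketch below swaps in a standard-library dataclass to stay self-contained, so it illustrates the variable-to-field mapping rather than the gateway's actual implementation. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Defaults copied from the configuration table above."""
    host: str = "0.0.0.0"
    port: int = 8600
    stt_url: str = "http://perseus.incus:8000"
    tts_url: str = "http://pan.incus:8000"
    agent_url: str = "http://localhost:8001"
    stt_model: str = "Systran/faster-whisper-small"
    tts_model: str = "kokoro"
    tts_voice: str = "af_heart"
    audio_sample_rate: int = 16000
    audio_channels: int = 1
    audio_sample_width: int = 16
    log_level: str = "INFO"


def load_settings(environ=os.environ):
    """Build Settings from STENTOR_* variables, falling back to defaults."""
    overrides = {}
    for field in fields(Settings):
        raw = environ.get(f"STENTOR_{field.name.upper()}")
        if raw is not None:
            # Integer fields are converted; everything else stays a string.
            overrides[field.name] = int(raw) if field.type is int else raw
    return Settings(**overrides)


settings = load_settings({"STENTOR_PORT": "9000"})
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.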
