Skip to content

Declarative TTS / STT Providers

The chat-provider and embedding-provider declarative pattern (#979) extends to text-to-speech and speech-to-text. Voice-mode applications no longer need to hard-code provider construction in Go.

spec:
tts_providers:
- id: voice
type: elevenlabs
model: eleven_turbo_v2
credential:
credential_env: ELEVEN_API_KEY
- id: cart
type: cartesia
credential:
credential_env: CARTESIA_API_KEY
additional_config:
ws_url: wss://api.cartesia.ai/tts/websocket
stt_providers:
- id: whisper
type: openai
model: whisper-1
credential:
credential_env: OPENAI_API_KEY
conv, _ := sdk.Open("./pack.json", "chat",
sdk.WithRuntimeConfig("./runtime.yaml"),
)

The first declared TTS entry becomes the default ttsService unless WithTTS (or WithVADMode) wired one programmatically. Same for STT and sttService.

Blocktype valueUnderlying package
tts_providersopenairuntime/tts (OpenAI TTS)
tts_providerselevenlabsruntime/tts (ElevenLabs)
tts_providerscartesiaruntime/tts (Cartesia)
stt_providersopenairuntime/stt (OpenAI Whisper)

additional_config honored extras:

  • Cartesiaws_url (string) for the websocket streaming endpoint.

WithTTS(service) and WithVADMode(stt, tts, ...) are unchanged. When set, they win over the YAML default — same precedence rule as chat and embedding providers.

LoadRuntimeConfig rejects:

  • Missing type.
  • A type outside the supported set.
  • Two entries with the same effective ID (explicit ID, or type when ID is omitted).

The factory pattern is open: a per-provider package can self-register via init():

package mytts
import "github.com/AltairaLabs/PromptKit/runtime/tts"
func init() {
tts.RegisterFactory("my_tts", func(spec tts.ProviderSpec) (tts.Service, error) {
return NewMyTTS(spec.Model, tts.APIKeyFromCredential(spec.Credential)), nil
})
}

Side-effect-import the package from your application and the new type: value works in YAML immediately.