TTS API Reference
Complete reference for text-to-speech services.
Service Interface
    type Service interface {
        Name() string
        Synthesize(ctx context.Context, text string, config SynthesisConfig) (io.ReadCloser, error)
        SupportedVoices() []Voice
        SupportedFormats() []AudioFormat
    }

Methods
Name

    func (s Service) Name() string

Returns the provider identifier (e.g., “openai”, “elevenlabs”).
Synthesize
    func (s Service) Synthesize(ctx context.Context, text string, config SynthesisConfig) (io.ReadCloser, error)

Converts text to audio and returns a reader that streams the audio data. The caller is responsible for closing the reader.
SupportedVoices
    func (s Service) SupportedVoices() []Voice

Returns the voices available for this provider.
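A returned voice list can be narrowed on the client side. The sketch below assumes the `Language` field holds codes such as "en-US"; `voicesForLanguage` and the sample voices are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// Voice mirrors the Voice type documented in this reference.
type Voice struct {
	ID, Name, Language, Gender, Description, Preview string
}

// voicesForLanguage returns voices whose Language starts with prefix,
// so "en" matches both "en-US" and "en-GB". Matching is case-insensitive.
func voicesForLanguage(voices []Voice, prefix string) []Voice {
	var out []Voice
	p := strings.ToLower(prefix)
	for _, v := range voices {
		if strings.HasPrefix(strings.ToLower(v.Language), p) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Sample data for illustration only; real values would come from SupportedVoices().
	voices := []Voice{
		{ID: "v1", Name: "Ava", Language: "en-US"},
		{ID: "v2", Name: "Luc", Language: "fr-FR"},
		{ID: "v3", Name: "Ben", Language: "en-GB"},
	}
	for _, v := range voicesForLanguage(voices, "en") {
		fmt.Printf("%s: %s (%s)\n", v.ID, v.Name, v.Language)
	}
}
```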
SupportedFormats
    func (s Service) SupportedFormats() []AudioFormat

Returns the audio output formats this provider supports.
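One way to use this list is to pick the first format a provider supports from an ordered preference. The helper `pickFormat` below is a hypothetical sketch, and the sample formats are illustrative rather than any provider's real list.

```go
package main

import "fmt"

// AudioFormat mirrors the AudioFormat type documented in this reference.
type AudioFormat struct {
	Name       string
	MIMEType   string
	SampleRate int
	BitDepth   int
	Channels   int
}

// pickFormat returns the first preferred format name found in supported.
func pickFormat(supported []AudioFormat, preferred ...string) (AudioFormat, bool) {
	for _, name := range preferred {
		for _, f := range supported {
			if f.Name == name {
				return f, true
			}
		}
	}
	return AudioFormat{}, false
}

func main() {
	// Illustrative list; a real one would come from SupportedFormats().
	supported := []AudioFormat{
		{Name: "mp3", MIMEType: "audio/mpeg"},
		{Name: "wav", MIMEType: "audio/wav"},
	}
	if f, ok := pickFormat(supported, "opus", "wav", "mp3"); ok {
		fmt.Println("using", f.Name, f.MIMEType)
	}
}
```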
StreamingService Interface
    type StreamingService interface {
        Service
        SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan AudioChunk, error)
    }

Extends Service with streaming synthesis for lower latency.
SynthesisConfig
    type SynthesisConfig struct {
        Voice    string      // Voice ID
        Format   AudioFormat // Output format
        Speed    float64     // Speech rate (0.25-4.0)
        Pitch    float64     // Pitch adjustment (-20 to 20)
        Language string      // Language code
        Model    string      // TTS model (provider-specific)
    }

Fields
| Field | Type | Default | Description |
|---|---|---|---|
| Voice | string | "alloy" | Voice ID (provider-specific) |
| Format | AudioFormat | MP3 | Output audio format |
| Speed | float64 | 1.0 | Speech rate multiplier (0.25-4.0) |
| Pitch | float64 | 0 | Pitch adjustment in semitones (-20 to 20) |
| Language | string | "" | Language code (e.g., “en-US”) |
| Model | string | "" | TTS model (e.g., “tts-1-hd”) |
Constructor
    func DefaultSynthesisConfig() SynthesisConfig

Returns a SynthesisConfig populated with the defaults listed in the Fields table.
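A common pattern is to start from the defaults and override only what differs. In the sketch below, `defaultConfig` is a guess at what `DefaultSynthesisConfig` returns, reconstructed from the Fields table; the real function may differ.

```go
package main

import "fmt"

// AudioFormat is trimmed to the fields this sketch needs.
type AudioFormat struct{ Name, MIMEType string }

// Values copied from the Predefined Formats table.
var (
	FormatMP3  = AudioFormat{Name: "mp3", MIMEType: "audio/mpeg"}
	FormatOpus = AudioFormat{Name: "opus", MIMEType: "audio/opus"}
)

// SynthesisConfig mirrors the struct documented in this reference.
type SynthesisConfig struct {
	Voice    string
	Format   AudioFormat
	Speed    float64
	Pitch    float64
	Language string
	Model    string
}

// defaultConfig is an assumed equivalent of DefaultSynthesisConfig,
// built from the defaults in the Fields table.
func defaultConfig() SynthesisConfig {
	return SynthesisConfig{Voice: "alloy", Format: FormatMP3, Speed: 1.0}
}

func main() {
	cfg := defaultConfig()
	cfg.Voice = "nova" // override only what differs
	cfg.Format = FormatOpus
	fmt.Printf("%s %s %.1f\n", cfg.Voice, cfg.Format.Name, cfg.Speed)
}
```

Starting from the defaults keeps Speed at 1.0. A bare struct literal that omits Speed leaves it at Go's zero value, 0, which falls outside the documented 0.25-4.0 range; how the library treats a zero Speed is not stated here.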
Voice Type
    type Voice struct {
        ID          string // Provider-specific identifier
        Name        string // Human-readable name
        Language    string // Primary language code
        Gender      string // "male", "female", "neutral"
        Description string // Voice characteristics
        Preview     string // URL to voice sample
    }

AudioFormat Type
    type AudioFormat struct {
        Name       string // Format identifier
        MIMEType   string // Content type
        SampleRate int    // Sample rate in Hz
        BitDepth   int    // Bits per sample
        Channels   int    // Number of channels
    }

Predefined Formats
| Constant | Name | MIME Type | Use Case |
|---|---|---|---|
| FormatMP3 | mp3 | audio/mpeg | Most compatible |
| FormatOpus | opus | audio/opus | Best for streaming |
| FormatAAC | aac | audio/aac | Apple devices |
| FormatFLAC | flac | audio/flac | Lossless quality |
| FormatPCM16 | pcm | audio/pcm | Raw processing |
| FormatWAV | wav | audio/wav | PCM with header |
AudioChunk Type
    type AudioChunk struct {
        Data  []byte // Raw audio bytes
        Index int    // Chunk sequence number
        Final bool   // Last chunk indicator
        Error error  // Error during synthesis
    }

Providers
OpenAI TTS
    func NewOpenAI(apiKey string) Service

Creates an OpenAI TTS service.
Voices:
| ID | Character |
|---|---|
| alloy | Neutral, versatile |
| echo | Warm, smooth |
| fable | Expressive, British |
| onyx | Deep, authoritative |
| nova | Friendly, youthful |
| shimmer | Clear, professional |
Models:
- tts-1: Fast, optimized for real-time use
- tts-1-hd: Higher quality, longer latency
Example:
    service := tts.NewOpenAI(os.Getenv("OPENAI_API_KEY"))

    config := tts.SynthesisConfig{
        Voice:  "nova",
        Format: tts.FormatMP3,
        Model:  "tts-1-hd",
    }

    reader, err := service.Synthesize(ctx, "Hello world", config)
    if err != nil {
        log.Fatal(err)
    }
    defer reader.Close() // the caller owns the reader

ElevenLabs TTS
    func NewElevenLabs(apiKey string) Service

Creates an ElevenLabs TTS service.
Features:
- Wide variety of voices
- Voice cloning support
- Multilingual support
Example:
    service := tts.NewElevenLabs(os.Getenv("ELEVENLABS_API_KEY"))

    // List available voices
    voices := service.SupportedVoices()
    for _, v := range voices {
        fmt.Printf("%s: %s\n", v.ID, v.Name)
    }

Cartesia TTS
    func NewCartesia(apiKey string) Service

Creates a Cartesia TTS service.
Features:
- Ultra-low latency
- Interactive streaming mode
- Emotion control
Example:
    service := tts.NewCartesia(os.Getenv("CARTESIA_API_KEY"))

Error Types
    var (
        ErrInvalidVoice  = errors.New("invalid voice")
        ErrInvalidFormat = errors.New("unsupported format")
        ErrTextTooLong   = errors.New("text exceeds maximum length")
        ErrRateLimited   = errors.New("rate limited")
        ErrServiceDown   = errors.New("service unavailable")
    )

Usage Examples
Basic Synthesis
    service := tts.NewOpenAI(apiKey)

    reader, err := service.Synthesize(ctx, "Hello!", tts.DefaultSynthesisConfig())
    if err != nil {
        log.Fatal(err)
    }
    defer reader.Close()

    data, err := io.ReadAll(reader)
    if err != nil {
        log.Fatal(err)
    }
    // Use audio data...

Streaming Synthesis
    service := tts.NewCartesia(apiKey)
    config := tts.DefaultSynthesisConfig()

    // Not every provider implements StreamingService, so check first.
    streamingService, ok := service.(tts.StreamingService)
    if !ok {
        log.Fatal("Provider doesn't support streaming")
    }

    chunks, err := streamingService.SynthesizeStream(ctx, "Hello world!", config)
    if err != nil {
        log.Fatal(err)
    }

    for chunk := range chunks {
        if chunk.Error != nil {
            log.Printf("Error: %v", chunk.Error)
            break
        }
        playAudio(chunk.Data) // playAudio is application-defined
    }

Custom Configuration
    config := tts.SynthesisConfig{
        Voice:    "onyx",
        Format:   tts.FormatOpus,
        Speed:    0.9, // Slightly slower
        Pitch:    -2,  // Slightly lower
        Language: "en-US",
        Model:    "tts-1-hd",
    }

    reader, err := service.Synthesize(ctx, text, config)
    if err != nil {
        log.Fatal(err)
    }
    defer reader.Close()

See Also
- TTS Tutorial - Getting started
- Audio Reference - Audio session API
- VAD Mode - Voice activity detection