# OpenAI Realtime API Example
This example demonstrates bidirectional audio streaming with the OpenAI Realtime API using PromptKit.
## Features

- Bidirectional Audio Streaming: Send and receive audio simultaneously at 24kHz
- Server-Side VAD: OpenAI’s voice activity detection handles turn-taking
- Function Calling: Execute tools/functions during streaming sessions
- Input Transcription: Get transcripts of what the user said
- Multiple Voices: Choose from alloy, echo, shimmer, ash, ballad, coral, sage, verse
## Prerequisites

- OpenAI API Key with Realtime API access
- PortAudio (for audio modes):
```bash
# macOS
brew install portaudio

# Ubuntu/Debian
sudo apt-get install portaudio19-dev

# Windows
# Download from http://www.portaudio.com/
```
## Text Mode (No PortAudio Required)

```bash
export OPENAI_API_KEY=your-key
go run .
```

## Interactive Voice Mode

```bash
export OPENAI_API_KEY=your-key
go run -tags portaudio .
```

## Available Modes (with PortAudio)

```bash
# Interactive voice chat (default)
go run -tags portaudio . interactive

# Function calling demo (ask about weather)
go run -tags portaudio . tools

# Real-time translator (English to Spanish)
go run -tags portaudio . translator
```

## How It Works
### Architecture
```
                  PromptKit SDK
                        |
                        v
            +-----------------------+
            |      OpenDuplex()     |
            |  - Creates session    |
            |  - Manages WebSocket  |
            +-----------------------+
                        |
         +--------------+--------------+
         |                             |
         v                             v
+------------------+          +------------------+
|  Audio Capture   |          |  Audio Playback  |
|  (Microphone)    |          |  (Speakers)      |
|  24kHz PCM16     |          |  24kHz PCM16     |
+------------------+          +------------------+
         |                             ^
         v                             |
+------------------+          +------------------+
|   SendChunk()    |          |    Response()    |
|  - Audio chunks  |          |  - Audio deltas  |
|  - Text messages |          |  - Text deltas   |
+------------------+          +------------------+
         |                             ^
         v                             |
+------------------------------------------------+
|              OpenAI Realtime API               |
|            (WebSocket Connection)              |
|                                                |
|  - gpt-4o-realtime-preview model               |
|  - Server-side VAD (Voice Activity Detection)  |
|  - Function/Tool calling                       |
|  - Audio transcription                         |
+------------------------------------------------+
```
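In code, the two sides of the diagram typically run concurrently: one goroutine feeds microphone chunks into `SendChunk`, while the main loop drains `Response()` and plays the audio deltas. A minimal sketch of that shape, reusing the `conv` and `ctx` set up in the Code Example below (`captureChunk` and `playPCM16` are hypothetical stand-ins for the PortAudio glue):

```go
// Upstream: microphone -> SendChunk.
// captureChunk (hypothetical) returns one chunk of 24kHz mono PCM16 audio
// encoded as a string, matching the MediaContent.Data field.
go func() {
    for {
        data, ok := captureChunk()
        if !ok {
            return // microphone closed
        }
        conv.SendChunk(ctx, &providers.StreamChunk{
            MediaDelta: &types.MediaContent{
                MIMEType: "audio/pcm",
                Data:     &data,
            },
        })
    }
}()

// Downstream: Response() -> speakers.
// playPCM16 (hypothetical) writes PCM16 bytes to the PortAudio output stream.
respCh, _ := conv.Response()
for chunk := range respCh {
    if chunk.MediaDelta != nil && chunk.MediaDelta.Data != nil {
        playPCM16(*chunk.MediaDelta.Data) // audio delta
    }
    if chunk.Delta != "" {
        fmt.Print(chunk.Delta) // text delta / transcript
    }
}
```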
### Audio Format

The OpenAI Realtime API uses:
- Sample Rate: 24kHz (24000 Hz)
- Bit Depth: 16-bit signed integer
- Channels: Mono (1 channel)
- Encoding: PCM16 (little-endian)
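If the capture path produces floating-point samples (as many audio libraries do), they need to be packed into this 16-bit little-endian layout before streaming. A small self-contained sketch of that conversion (illustrative only, not part of the example):

```go
package main

import (
    "encoding/binary"
    "fmt"
    "math"
)

// float32ToPCM16LE packs mono float32 samples in [-1, 1] into the
// little-endian 16-bit PCM layout described above.
func float32ToPCM16LE(samples []float32) []byte {
    out := make([]byte, 2*len(samples))
    for i, s := range samples {
        // Clamp to avoid overflow on out-of-range samples.
        if s > 1 {
            s = 1
        } else if s < -1 {
            s = -1
        }
        binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(s*math.MaxInt16)))
    }
    return out
}

func main() {
    // One millisecond of a 1 kHz tone at 24 kHz is 24 samples (48 bytes).
    samples := make([]float32, 24)
    for i := range samples {
        samples[i] = float32(math.Sin(2 * math.Pi * 1000 * float64(i) / 24000))
    }
    fmt.Printf("%d samples -> %d bytes of PCM16\n", len(samples), len(float32ToPCM16LE(samples)))
}
```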
## Code Example

```go
package main

import (
    "context"
    "os"

    "github.com/AltairaLabs/PromptKit/runtime/providers"
    "github.com/AltairaLabs/PromptKit/runtime/types"
    "github.com/AltairaLabs/PromptKit/sdk"
)

func main() {
    ctx := context.Background()

    // Open duplex conversation
    conv, _ := sdk.OpenDuplex(
        "./openai-realtime.pack.json",
        "assistant",
        sdk.WithModel("gpt-4o-realtime-preview"),
        sdk.WithAPIKey(os.Getenv("OPENAI_API_KEY")),
        sdk.WithStreamingConfig(&providers.StreamingInputConfig{
            Config: types.StreamingMediaConfig{
                Type:       types.ContentTypeAudio,
                SampleRate: 24000,
                Channels:   1,
                Encoding:   "pcm16",
                BitDepth:   16,
                ChunkSize:  4800,
            },
            Metadata: map[string]interface{}{
                "voice":               "alloy",
                "modalities":          []string{"text", "audio"},
                "input_transcription": true,
            },
        }),
    )
    defer conv.Close()

    // Send audio chunk
    chunk := &providers.StreamChunk{
        MediaDelta: &types.MediaContent{
            MIMEType: "audio/pcm",
            Data:     &audioData, // PCM16 bytes as string
        },
    }
    conv.SendChunk(ctx, chunk)

    // Or send text
    conv.SendText(ctx, "Hello!")

    // Receive streaming response
    respCh, _ := conv.Response()
    for chunk := range respCh {
        if chunk.MediaDelta != nil {
            // Play audio
        }
        if chunk.Delta != "" {
            // Print text
        }
    }
}
```

## Voice Options
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| shimmer | Clear, expressive |
| ash | Deep, authoritative |
| ballad | Melodic, storytelling |
| coral | Bright, energetic |
| sage | Calm, thoughtful |
| verse | Dynamic, engaging |
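The voice is chosen via the `voice` key in the streaming config metadata, as in the Code Example above. For instance, to use `coral` instead of `alloy` (same fields as the example, only the value changed):

```go
Metadata: map[string]interface{}{
    "voice":               "coral", // any voice from the table above
    "modalities":          []string{"text", "audio"},
    "input_transcription": true,
},
```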
## Function Calling

The tools demo shows how to handle function calls during streaming:
```go
// Define tools in StreamingInputConfig
Tools: []providers.StreamingToolDefinition{
    {
        Name:        "get_weather",
        Description: "Get the current weather for a location",
        Parameters:  map[string]interface{}{...},
    },
},
```

```go
// Handle tool calls in response
if chunk.ToolCalls != nil {
    for _, tc := range chunk.ToolCalls {
        result := executeToolCall(tc.Name, tc.Arguments)
        conv.SendToolResult(ctx, tc.ID, result)
    }
}
```
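The `executeToolCall` helper used above is not shown in the snippet; below is a minimal sketch of what such a dispatcher might look like, assuming the tool arguments arrive as a JSON string (requires `encoding/json` and `fmt`; the weather payload is a hard-coded placeholder):

```go
// Hypothetical dispatcher for the tools demo: decodes the JSON arguments of
// a tool call and returns a JSON result string to send back to the model.
func executeToolCall(name, arguments string) string {
    switch name {
    case "get_weather":
        var args struct {
            Location string `json:"location"`
        }
        if err := json.Unmarshal([]byte(arguments), &args); err != nil {
            return `{"error":"invalid arguments"}`
        }
        // A real implementation would query a weather service here.
        return fmt.Sprintf(`{"location":%q,"forecast":"sunny","temperature_c":22}`, args.Location)
    default:
        return `{"error":"unknown tool"}`
    }
}
```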
## Troubleshooting

### "OPENAI_API_KEY environment variable is required"

Set your API key:
```bash
export OPENAI_API_KEY=sk-...
```

### "failed to initialize PortAudio"

Install PortAudio for your platform (see Prerequisites above).
### No audio output

- Check your speaker/headphone volume
- Ensure the correct audio device is selected as default
- Try a different voice option
### Echo/feedback issues

Use headphones to prevent the microphone from picking up speaker output.
## Related Examples

- `duplex-streaming` - Gemini Live API streaming
- `voice-chat` - Traditional STT/TTS voice chat
- `voice-interview` - Full voice interview application