Skip to content

OpenAI Realtime API Example

This example demonstrates bidirectional audio streaming with the OpenAI Realtime API using PromptKit.

  • Bidirectional Audio Streaming: Send and receive audio simultaneously at 24kHz
  • Server-Side VAD: OpenAI’s voice activity detection handles turn-taking
  • Function Calling: Execute tools/functions during streaming sessions
  • Input Transcription: Get transcripts of what the user said
  • Multiple Voices: Choose from alloy, echo, shimmer, ash, ballad, coral, sage, verse
  1. OpenAI API Key with Realtime API access
  2. PortAudio (for audio modes):
    Terminal window
    # macOS
    brew install portaudio
    # Ubuntu/Debian
    sudo apt-get install portaudio19-dev
    # Windows
    # Download from http://www.portaudio.com/
Terminal window
export OPENAI_API_KEY=your-key
go run .
Terminal window
export OPENAI_API_KEY=your-key
go run -tags portaudio .
Terminal window
# Interactive voice chat (default)
go run -tags portaudio . interactive
# Function calling demo (ask about weather)
go run -tags portaudio . tools
# Real-time translator (English to Spanish)
go run -tags portaudio . translator
PromptKit SDK
|
v
+-----------------------+
| OpenDuplex() |
| - Creates session |
| - Manages WebSocket |
+-----------------------+
|
+---------------+---------------+
| |
v v
+------------------+ +------------------+
| Audio Capture | | Audio Playback |
| (Microphone) | | (Speakers) |
| 24kHz PCM16 | | 24kHz PCM16 |
+------------------+ +------------------+
| ^
v |
+------------------+ +------------------+
| SendChunk() | | Response() |
| - Audio chunks | | - Audio deltas |
| - Text messages | | - Text deltas |
+------------------+ +------------------+
| ^
v |
+-------------------------------------------------+
| OpenAI Realtime API |
| (WebSocket Connection) |
| |
| - gpt-4o-realtime-preview model |
| - Server-side VAD (Voice Activity Detection) |
| - Function/Tool calling |
| - Audio transcription |
+-------------------------------------------------+

OpenAI Realtime API uses:

  • Sample Rate: 24kHz (24000 Hz)
  • Bit Depth: 16-bit signed integer
  • Channels: Mono (1 channel)
  • Encoding: PCM16 (little-endian)
package main
import (
"context"
"github.com/AltairaLabs/PromptKit/runtime/providers"
"github.com/AltairaLabs/PromptKit/runtime/types"
"github.com/AltairaLabs/PromptKit/sdk"
)
func main() {
// Open duplex conversation
conv, _ := sdk.OpenDuplex(
"./openai-realtime.pack.json",
"assistant",
sdk.WithModel("gpt-4o-realtime-preview"),
sdk.WithAPIKey(os.Getenv("OPENAI_API_KEY")),
sdk.WithStreamingConfig(&providers.StreamingInputConfig{
Config: types.StreamingMediaConfig{
Type: types.ContentTypeAudio,
SampleRate: 24000,
Channels: 1,
Encoding: "pcm16",
BitDepth: 16,
ChunkSize: 4800,
},
Metadata: map[string]interface{}{
"voice": "alloy",
"modalities": []string{"text", "audio"},
"input_transcription": true,
},
}),
)
defer conv.Close()
// Send audio chunk
chunk := &providers.StreamChunk{
MediaDelta: &types.MediaContent{
MIMEType: "audio/pcm",
Data: &audioData, // PCM16 bytes as string
},
}
conv.SendChunk(ctx, chunk)
// Or send text
conv.SendText(ctx, "Hello!")
// Receive streaming response
respCh, _ := conv.Response()
for chunk := range respCh {
if chunk.MediaDelta != nil {
// Play audio
}
if chunk.Delta != "" {
// Print text
}
}
}
VoiceDescription
alloyNeutral, balanced
echoWarm, conversational
shimmerClear, expressive
ashDeep, authoritative
balladMelodic, storytelling
coralBright, energetic
sageCalm, thoughtful
verseDynamic, engaging

The tools demo shows how to handle function calls during streaming:

// Define tools in StreamingInputConfig
Tools: []providers.StreamingToolDefinition{
{
Name: "get_weather",
Description: "Get the current weather for a location",
Parameters: map[string]interface{}{...},
},
},
// Handle tool calls in response
if chunk.ToolCalls != nil {
for _, tc := range chunk.ToolCalls {
result := executeToolCall(tc.Name, tc.Arguments)
conv.SendToolResult(ctx, tc.ID, result)
}
}

”OPENAI_API_KEY environment variable is required”

Section titled “”OPENAI_API_KEY environment variable is required””

Set your API key:

Terminal window
export OPENAI_API_KEY=sk-...

Install PortAudio for your platform (see Prerequisites above).

  • Check your speaker/headphone volume
  • Ensure the correct audio device is selected as default
  • Try a different voice option

Use headphones to prevent the microphone from picking up speaker output.