This example demonstrates bidirectional streaming using OpenDuplex() with the Gemini Live API.
## Features
- Interactive Voice Mode: Real-time audio capture from microphone with voice activity detection
- Real-time bidirectional streaming
- Text and audio chunk streaming
- Response handling with streaming chunks
- Duplex session lifecycle management
## Requirements

- Gemini API key with Live API access enabled
- Model: `gemini-2.0-flash-exp` (supports streaming input)
- Microphone (for interactive voice mode)
- PortAudio library (for audio capture)
Note: The Gemini Live API is currently in preview and requires special access. If you encounter authentication errors, visit https://ai.google.dev/ to request Live API access.
## Setup

- Set your Gemini API key:

  ```sh
  export GEMINI_API_KEY=your-key-here
  ```

- Install PortAudio (for audio capture):

  ```sh
  # macOS
  brew install portaudio

  # Ubuntu/Debian
  sudo apt-get install portaudio19-dev

  # Fedora
  sudo dnf install portaudio-devel
  ```

- Run the example:

  ```sh
  # Interactive voice mode (default)
  go run .

  # Text streaming only
  go run . text

  # Multiple chunks example
  go run . chunks
  ```
## Modes

The example supports three modes:

- `interactive` (default): Real-time voice input via microphone
  - Captures audio from your microphone continuously
  - Streams audio chunks to Gemini in real-time (bidirectional)
  - Receives and plays audio responses through speakers
  - Also displays text transcription for debugging
- `text`: Text streaming example
  - Sends a text message
  - Receives the streaming response
- `chunks`: Multiple chunk sending
  - Sends a message in multiple chunks
  - Demonstrates incremental content building
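The mode argument above can be handled with a small dispatcher; a minimal sketch, assuming a `pickMode` helper (this name is illustrative, not from the example's source):

```go
package main

import (
	"fmt"
	"os"
)

// pickMode returns the requested mode from the command-line arguments,
// defaulting to "interactive" when no argument is given.
func pickMode(args []string) string {
	if len(args) < 2 {
		return "interactive"
	}
	switch args[1] {
	case "interactive", "text", "chunks":
		return args[1]
	default:
		fmt.Fprintf(os.Stderr, "unknown mode %q, using interactive\n", args[1])
		return "interactive"
	}
}

func main() {
	fmt.Println(pickMode(os.Args))
}
```

Defaulting (rather than erroring) on an unknown mode keeps `go run .` working with no arguments, matching the behavior described above.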
## API Usage

### Interactive Audio Mode
```go
// Open a duplex conversation
conv, err := sdk.OpenDuplex("./duplex.pack.json", "assistant")
if err != nil {
	log.Fatal(err)
}
defer conv.Close()

// Send an audio chunk
audioData := string(pcmBytes) // raw PCM16 audio data
chunk := &providers.StreamChunk{
	MediaDelta: &types.MediaContent{
		MIMEType: types.MIMETypeAudioWAV,
		Data:     &audioData,
	},
}
conv.SendChunk(ctx, chunk)

// Receive streaming responses
respCh, _ := conv.Response()
for chunk := range respCh {
	fmt.Print(chunk.Delta)
	if chunk.FinishReason != nil {
		break
	}
}
```
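The `pcmBytes` above are raw little-endian PCM16 samples packed as bytes. A minimal, self-contained sketch of producing that layout from `int16` samples (the helper name is illustrative):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcm16ToBytes packs 16-bit samples as little-endian bytes, the
// layout expected for a raw PCM16 audio payload.
func pcm16ToBytes(samples []int16) []byte {
	buf := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(s))
	}
	return buf
}

func main() {
	fmt.Println(pcm16ToBytes([]int16{1, -1})) // [1 0 255 255]
}
```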
### Text Streaming
```go
// Open a duplex conversation
conv, err := sdk.OpenDuplex(
	"./duplex.pack.json",
	"assistant",
	sdk.WithModel("gemini-2.0-flash-exp"),
	sdk.WithAPIKey(apiKey),
)
if err != nil {
	log.Fatal(err)
}
defer conv.Close()

// Send text
conv.SendText(ctx, "Hello!")

// Get the response channel
respCh, _ := conv.Response()

// Receive streaming responses
for chunk := range respCh {
	fmt.Print(chunk.Delta)
	if chunk.FinishReason != nil {
		break
	}
}
```
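The receive loop above can also accumulate the full response text. A minimal, self-contained sketch where a buffered channel of stand-in chunks replaces `conv.Response()` (the `chunk` type here is illustrative, not the SDK's actual `StreamChunk`):

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a stand-in for the SDK's stream chunk: a text delta plus
// an optional finish reason marking the end of the turn.
type chunk struct {
	Delta        string
	FinishReason *string
}

// collect drains the channel, concatenating deltas until a chunk
// carries a finish reason.
func collect(ch <-chan chunk) string {
	var sb strings.Builder
	for c := range ch {
		sb.WriteString(c.Delta)
		if c.FinishReason != nil {
			break
		}
	}
	return sb.String()
}

func main() {
	stop := "stop"
	ch := make(chan chunk, 2)
	ch <- chunk{Delta: "Hello, "}
	ch <- chunk{Delta: "world!", FinishReason: &stop}
	close(ch)
	fmt.Println(collect(ch)) // Hello, world!
}
```

Breaking on the finish reason (rather than only on channel close) mirrors the loop in the snippets above, where the channel may stay open across turns.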
## How It Works

### Interactive Voice Mode
- Audio Capture: Uses PortAudio to capture microphone input at 16kHz mono PCM16
- Continuous Streaming: Audio is streamed continuously to Gemini Live API (no turn detection)
- Bidirectional Audio: Gemini streams audio responses back in real-time
- Audio Playback: Responses are played through speakers at 24kHz
- Text Display: Text transcription also shown for debugging
Visual feedback during capture:

- `█` = audio detected (high energy)
- `░` = low/no audio
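The energy behind that indicator can be computed as the RMS amplitude of each PCM16 frame; a minimal sketch, assuming an illustrative threshold of 0.05 (the example's actual threshold may differ):

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of a PCM16 frame,
// normalised to [0, 1].
func rms(samples []int16) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		f := float64(s) / 32768.0
		sum += f * f
	}
	return math.Sqrt(sum / float64(len(samples)))
}

// meter picks the capture indicator: '█' above the (illustrative)
// energy threshold, '░' otherwise.
func meter(samples []int16) rune {
	if rms(samples) > 0.05 {
		return '█'
	}
	return '░'
}

func main() {
	fmt.Printf("%c %c\n", meter([]int16{8000, -8000, 8000, -8000}), meter([]int16{10, -10}))
}
```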
## OpenDuplex vs Stream
- OpenDuplex: Full bidirectional streaming with the model. You can send multiple chunks and receive responses in real-time.
- Stream: Unary mode with streaming responses. You send one complete message and receive a streaming response.
Use OpenDuplex when you need:
- Real-time audio/video streaming
- Interactive back-and-forth during model response
- Voice conversation applications
- Live media processing