Interactive voice chat using the SDK with streaming provider support (Gemini Live API).
Features
- Bidirectional Streaming: Real-time audio input and output through the provider
- Pipeline Integration: Uses the SDK pipeline with VAD middleware
- Provider-native TTS: Audio responses come directly from the provider
- Full-duplex Conversation: Both sides can speak and listen at the same time
Requirements
- System:
  - Microphone (default system input)
  - Speakers/audio output
  - PortAudio library installed
- macOS: brew install portaudio
- Linux: sudo apt-get install portaudio19-dev
- Windows: Download and install PortAudio from http://www.portaudio.com/
- API Keys:
  - Gemini API key for streaming audio
Installation
cd sdk/examples/voice-chat
go mod tidy
Usage
- Set your Gemini API key:
  export GEMINI_API_KEY=your-key-here
- Run the example:
  go run .
- Speak into your microphone
  - The session streams your audio to the provider
  - VAD middleware detects turn boundaries
  - Provider responds with text and/or audio
  - Audio responses play through speakers
- Press Ctrl+C to exit
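The first step above implies that the program reads GEMINI_API_KEY at startup. A minimal sketch of a fail-fast check, using only the standard library, might look like the following. The helper name apiKey is hypothetical and is not taken from the example's source; the lookup function is passed in only to keep the helper easy to test.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// apiKey is a hypothetical helper (not part of the SDK). It reads
// GEMINI_API_KEY through getenv and rejects an unset or blank value.
func apiKey(getenv func(string) string) (string, error) {
	key := strings.TrimSpace(getenv("GEMINI_API_KEY"))
	if key == "" {
		return "", errors.New("GEMINI_API_KEY is not set")
	}
	return key, nil
}

func main() {
	key, err := apiKey(os.Getenv)
	if err != nil {
		fmt.Println("cannot start:", err)
		return
	}
	// Report only the length so the secret never reaches the terminal.
	fmt.Printf("API key loaded (%d characters)\n", len(key))
}
```

Checking the key before opening the microphone means a missing variable surfaces immediately, before any audio device is opened.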
How It Works
This example routes audio through the SDK’s pipeline architecture in six stages:
- Provider Session: Creates streaming session with Gemini Live API
- Bidirectional Session: Wraps provider session with SDK session management
- Audio Capture: Microphone input sent as StreamChunks to session
- Pipeline Processing: VAD middleware detects turns, provider generates responses
- Response Handling: Text and audio responses received via response channel
- Audio Playback: Provider-generated audio played through speakers
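The stages above can be sketched with plain Go channels. This is an illustration of the data flow only, not the SDK's implementation: the chunk and response types, the toy silence check, and the echo "provider" are all hypothetical stand-ins, and a real session would talk to the Gemini Live API instead.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for the SDK's StreamChunk:
// one frame of 16-bit PCM samples.
type chunk struct {
	samples []int16
}

// response is a hypothetical stand-in for a provider reply.
type response struct {
	text  string
	audio []int16
}

// capture plays the role of the microphone: it sends each frame on a
// channel and closes the channel when the input is exhausted.
func capture(frames [][]int16) <-chan chunk {
	out := make(chan chunk)
	go func() {
		defer close(out)
		for _, f := range frames {
			out <- chunk{samples: f}
		}
	}()
	return out
}

// isSilent is a toy turn detector (not real VAD): a frame is silent
// when every sample is zero.
func isSilent(c chunk) bool {
	for _, s := range c.samples {
		if s != 0 {
			return false
		}
	}
	return true
}

// session stands in for the bidirectional session: it buffers speech,
// treats a silent frame as the end of a turn, and emits one response
// per turn on the returned channel.
func session(in <-chan chunk) <-chan response {
	out := make(chan response)
	go func() {
		defer close(out)
		var turn []int16
		flush := func() {
			if len(turn) == 0 {
				return
			}
			out <- response{
				text:  fmt.Sprintf("heard %d samples", len(turn)),
				audio: turn, // echo the input back as the "spoken" reply
			}
			turn = nil
		}
		for c := range in {
			if isSilent(c) {
				flush()
				continue
			}
			turn = append(turn, c.samples...)
		}
		flush() // the input ended in the middle of a turn
	}()
	return out
}

// runPipeline wires capture -> session -> playback and returns the
// response texts in arrival order.
func runPipeline(frames [][]int16) []string {
	var texts []string
	for r := range session(capture(frames)) {
		// A real program would write r.audio to the speakers here.
		texts = append(texts, r.text)
	}
	return texts
}

func main() {
	frames := [][]int16{
		{100, -200}, {300, 400}, // first turn: 4 samples
		{0, 0},     // silence ends the turn
		{500, 600}, // second turn: 2 samples
	}
	for _, t := range runPipeline(frames) {
		fmt.Println(t) // prints "heard 4 samples", then "heard 2 samples"
	}
}
```

Closing each channel when its producer finishes lets every downstream stage exit its range loop, so the whole pipeline shuts down cleanly once capture stops.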
Architecture
    ┌─────────────┐
    │ Microphone  │
    └──────┬──────┘
           │ PCM chunks
           ▼
┌─────────────────────┐
│ BidirectionalSession│
│   (SDK Pipeline)    │
│                     │
│   ┌─────────────┐   │
│   │     VAD     │   │◄── Turn detection
│   │ Middleware  │   │
│   └─────────────┘   │
│          │          │
│          ▼          │
│   ┌─────────────┐   │
│   │  Provider   │   │◄── Gemini Live API
│   │   Session   │   │
│   └─────────────┘   │
│          │          │
└──────────┴──────────┘
           │
           ▼ Text + Audio
    ┌─────────────┐
    │  Speakers   │
    └─────────────┘
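The "PCM chunks" edge in the diagram can be made concrete with a short sketch that splits captured samples into fixed-duration frames and encodes them as bytes. The 16 kHz mono, 16-bit little-endian format and the 20 ms frame length are assumptions for illustration and should be checked against the provider's documentation. The function names are hypothetical and not part of the SDK.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	sampleRate = 16000 // Hz; assumed input rate, verify with the provider
	frameMs    = 20    // chunk duration in milliseconds (illustrative)

	// samplesPerFrame is the number of mono samples in one chunk (320).
	samplesPerFrame = sampleRate * frameMs / 1000
)

// encodePCM converts 16-bit samples to little-endian bytes.
func encodePCM(samples []int16) []byte {
	buf := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(s))
	}
	return buf
}

// splitFrames cuts samples into encoded chunks of at most
// samplesPerFrame samples; the final chunk may be shorter.
func splitFrames(samples []int16) [][]byte {
	var frames [][]byte
	for start := 0; start < len(samples); start += samplesPerFrame {
		end := start + samplesPerFrame
		if end > len(samples) {
			end = len(samples)
		}
		frames = append(frames, encodePCM(samples[start:end]))
	}
	return frames
}

func main() {
	// 50 ms of silence at 16 kHz is 800 samples: frames of 320, 320, 160.
	frames := splitFrames(make([]int16, 800))
	for i, f := range frames {
		fmt.Printf("frame %d: %d bytes\n", i, len(f))
	}
}
```

Short frames keep latency low at the cost of more messages per second; 20 ms is a common compromise for speech, though the example's actual chunk size may differ.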
Customization
The pipeline handles VAD, transcription, and TTS internally through middleware. To change their behavior, configure the provider session request.
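To give a sense of what VAD settings typically control, here is a minimal energy-based turn detector. It is a sketch under stated assumptions, not the SDK's VAD middleware: the vadConfig type and its threshold and silenceFrames fields are hypothetical names, and production detectors are usually more sophisticated than a fixed RMS threshold.

```go
package main

import (
	"fmt"
	"math"
)

// vadConfig holds hypothetical tuning knobs; these are not SDK options.
type vadConfig struct {
	threshold     float64 // RMS level at or above which a frame counts as speech
	silenceFrames int     // consecutive quiet frames that end a turn
}

// rms returns the root-mean-square amplitude of a frame.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		sum += float64(s) * float64(s)
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// turnEnds returns the index of each frame at which a turn is judged
// complete: speech was heard and cfg.silenceFrames quiet frames followed.
func turnEnds(frames [][]int16, cfg vadConfig) []int {
	var ends []int
	speaking, quiet := false, 0
	for i, f := range frames {
		if rms(f) >= cfg.threshold {
			speaking, quiet = true, 0
			continue
		}
		if !speaking {
			continue // leading silence before any speech
		}
		quiet++
		if quiet >= cfg.silenceFrames {
			ends = append(ends, i)
			speaking, quiet = false, 0
		}
	}
	return ends
}

func main() {
	loud := []int16{1000, -1000, 1000, -1000}
	soft := []int16{10, -10, 10, -10}
	frames := [][]int16{soft, loud, loud, soft, loud, soft, soft, soft}
	cfg := vadConfig{threshold: 500, silenceFrames: 2}
	fmt.Println(turnEnds(frames, cfg)) // prints [6]
}
```

Requiring several quiet frames in a row keeps a brief pause (frame 3 in the usage above) from ending the turn early. Raising silenceFrames makes the detector more patient but delays the response.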
Next Steps
- Explore the VAD demo for VAD configuration
- See the streaming example for text streaming
- See the SDK documentation for pipeline middleware