Voice Chat Example
Interactive voice chat using the SDK with streaming provider support (Gemini Live API).
Features
- Bidirectional Streaming: Real-time audio input/output through provider
- Pipeline Integration: Uses SDK pipeline with VAD middleware
- Provider-native TTS: Audio responses directly from provider
- Full-duplex Conversation: Seamless voice interaction
Requirements
- System:
  - Microphone (default system input)
  - Speakers/audio output
  - PortAudio library installed
    - macOS: brew install portaudio
    - Linux: sudo apt-get install portaudio19-dev
    - Windows: Download and install PortAudio from http://www.portaudio.com/
- API Keys:
  - Gemini API key for streaming audio
Installation
    cd sdk/examples/voice-chat
    go mod tidy

- Set your Gemini API key: export GEMINI_API_KEY=your-key-here
- Run the example: go run .
- Speak into your microphone
  - The session streams your audio to the provider
  - VAD middleware detects turn boundaries
  - Provider responds with text and/or audio
  - Audio responses play through speakers
- Press Ctrl+C to exit
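
Since the example depends on GEMINI_API_KEY being exported, a startup check can fail early with a clear message instead of a confusing provider error. The following is a minimal sketch of such a check, not code taken from the example itself; the helper name validateKey is hypothetical, and only the variable name GEMINI_API_KEY comes from the steps above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// validateKey trims the raw value and rejects an empty or blank key.
// The helper name is hypothetical; it is not part of the SDK.
func validateKey(raw string) (string, error) {
	key := strings.TrimSpace(raw)
	if key == "" {
		return "", errors.New("GEMINI_API_KEY is not set; run: export GEMINI_API_KEY=your-key-here")
	}
	return key, nil
}

func main() {
	key, err := validateKey(os.Getenv("GEMINI_API_KEY"))
	if err != nil {
		// A real program would likely also exit with a non-zero status here.
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Never print the key itself; report only that it was found.
	fmt.Printf("Gemini API key loaded (%d characters)\n", len(key))
}
```

Keeping the validation in a pure function makes it easy to test without touching the process environment.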
How It Works
This example uses the SDK’s pipeline architecture:
- Provider Session: Creates streaming session with Gemini Live API
- Bidirectional Session: Wraps provider session with SDK session management
- Audio Capture: Microphone input sent as StreamChunks to session
- Pipeline Processing: VAD middleware detects turns, provider generates responses
- Response Handling: Text and audio responses received via response channel
- Audio Playback: Provider-generated audio played through speakers
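
The flow above can be sketched with plain Go channels. This is a simplified illustration under stated assumptions, not the SDK's implementation: audioChunk and response are stand-ins for the SDK's StreamChunk and response types, runPipeline and echoProvider are hypothetical names, the speech detector is a toy peak-amplitude check, and the provider is faked locally rather than calling the Gemini Live API.

```go
package main

import "fmt"

// audioChunk is a stand-in for the SDK's StreamChunk; its fields are assumptions.
type audioChunk struct {
	Samples []int16 // 16-bit PCM samples
}

// response is a stand-in for what arrives on the response channel.
type response struct {
	Text  string
	Audio []int16
}

// isSpeech is a toy detector: a chunk counts as speech when any sample's
// magnitude exceeds threshold.
func isSpeech(c audioChunk, threshold int16) bool {
	for _, s := range c.Samples {
		if s > threshold || s < -threshold {
			return true
		}
	}
	return false
}

// runPipeline groups consecutive speech chunks into a turn, treats the first
// silent chunk after speech as the turn boundary, and sends one response per
// turn. The output channel is closed once the input channel is drained.
func runPipeline(in <-chan audioChunk, threshold int16, respond func(turn []audioChunk) response) <-chan response {
	out := make(chan response)
	go func() {
		defer close(out)
		var turn []audioChunk
		for c := range in {
			if isSpeech(c, threshold) {
				turn = append(turn, c)
				continue
			}
			if len(turn) > 0 { // silence after speech ends the turn
				out <- respond(turn)
				turn = nil
			}
		}
		if len(turn) > 0 { // flush a turn cut off by end of input
			out <- respond(turn)
		}
	}()
	return out
}

// echoProvider fakes the provider by describing the turn it received.
func echoProvider(turn []audioChunk) response {
	return response{Text: fmt.Sprintf("heard a turn of %d chunk(s)", len(turn))}
}

func main() {
	mic := make(chan audioChunk)
	go func() {
		defer close(mic)
		loud := audioChunk{Samples: []int16{4000, -3500}}
		quiet := audioChunk{Samples: []int16{10, -12}}
		for _, c := range []audioChunk{quiet, loud, loud, quiet, loud, quiet} {
			mic <- c
		}
	}()
	for r := range runPipeline(mic, 500, echoProvider) {
		fmt.Println(r.Text)
	}
}
```

Running this prints one line for the two-chunk turn and one for the single-chunk turn. In the real example the capture side would be fed by PortAudio and the provider stage by the streaming session.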
Architecture
┌─────────────┐
│ Microphone  │
└──────┬──────┘
       │ PCM chunks
       ▼
┌─────────────────────┐
│ BidirectionalSession│
│   (SDK Pipeline)    │
│                     │
│  ┌─────────────┐    │
│  │     VAD     │    │◄── Turn detection
│  │  Middleware │    │
│  └─────────────┘    │
│         │           │
│         ▼           │
│  ┌─────────────┐    │
│  │  Provider   │    │◄── Gemini Live API
│  │   Session   │    │
│  └─────────────┘    │
│         │           │
└─────────┴───────────┘
          │
          ▼ Text + Audio
┌─────────────┐
│  Speakers   │
└─────────────┘
Customization
The pipeline handles VAD, transcription, and TTS internally through middleware. Configuration is done through the provider session request.
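
To give a sense of what VAD tuning typically controls, here is a minimal energy-based detector with two knobs. It is an illustrative sketch only: the vadConfig type, its field names, and the turnEnds function are hypothetical and are not the SDK's options, and the SDK's VAD middleware may use a different technique entirely.

```go
package main

import (
	"fmt"
	"math"
)

// vadConfig holds hypothetical tuning knobs; the SDK's real option names may differ.
type vadConfig struct {
	EnergyThreshold float64 // RMS level at or above which a frame counts as speech
	HangoverFrames  int     // consecutive silent frames required to end a turn
}

// rms returns the root-mean-square amplitude of a frame of 16-bit PCM samples.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// turnEnds returns the indices of the frames at which a turn is declared over:
// the frame that completes HangoverFrames consecutive silent frames after speech.
func turnEnds(frames [][]int16, cfg vadConfig) []int {
	var ends []int
	speaking, silent := false, 0
	for i, f := range frames {
		if rms(f) >= cfg.EnergyThreshold {
			speaking, silent = true, 0
			continue
		}
		if !speaking {
			continue // silence before any speech is ignored
		}
		silent++
		if silent >= cfg.HangoverFrames {
			ends = append(ends, i)
			speaking, silent = false, 0
		}
	}
	return ends
}

func main() {
	loud, quiet := []int16{3000, -3000}, []int16{20, -20}
	frames := [][]int16{loud, quiet, loud, quiet, quiet, quiet, loud, quiet, quiet}
	fmt.Println(turnEnds(frames, vadConfig{EnergyThreshold: 500, HangoverFrames: 2})) // [4 8]
	fmt.Println(turnEnds(frames, vadConfig{EnergyThreshold: 500, HangoverFrames: 1})) // [1 3 7]
}
```

A longer hangover tolerates short pauses inside a sentence at the cost of slower turn-taking, which is the usual trade-off when adjusting this kind of setting.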
Next Steps
- Explore VAD demo for VAD configuration
- Check streaming example for text streaming
- See SDK documentation for pipeline middleware