An interactive voice chat example built on the SDK's streaming provider support, using the Gemini Live API.

Features

Requirements

Installation

cd sdk/examples/voice-chat
go mod tidy

Usage

  1. Set your Gemini API key:

    export GEMINI_API_KEY=your-key-here

  2. Run the example:

    go run .

  3. Speak into your microphone:

    • The session streams your audio to the provider
    • The VAD (voice activity detection) middleware detects turn boundaries
    • The provider responds with text, audio, or both
    • Audio responses play through your speakers

  4. Press Ctrl+C to exit
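
The startup and shutdown steps above can be sketched with the standard library alone. This is a minimal sketch, not the example's actual source: the helper name `apiKey` is hypothetical, and the real program would open the session where this one only waits for Ctrl+C.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"strings"
)

// apiKey reads GEMINI_API_KEY through the supplied lookup function and
// rejects an unset or blank value. (Hypothetical helper, for illustration.)
func apiKey(getenv func(string) string) (string, error) {
	key := strings.TrimSpace(getenv("GEMINI_API_KEY"))
	if key == "" {
		return "", errors.New("GEMINI_API_KEY is not set; export it before running")
	}
	return key, nil
}

func main() {
	if _, err := apiKey(os.Getenv); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}

	// ctx is cancelled when the process receives SIGINT (Ctrl+C).
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	fmt.Println("Listening; press Ctrl+C to exit")
	<-ctx.Done() // a real program would run the voice session here
	fmt.Println("Shutting down")
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the check easy to exercise without touching the process environment.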

How It Works

This example routes audio through the SDK’s pipeline architecture in six stages:

  1. Provider Session: Creates a streaming session with the Gemini Live API
  2. Bidirectional Session: Wraps the provider session with SDK session management
  3. Audio Capture: Sends microphone input to the session as StreamChunks
  4. Pipeline Processing: The VAD middleware detects turns and the provider generates responses
  5. Response Handling: Receives text and audio responses on the response channel
  6. Audio Playback: Plays provider-generated audio through the speakers
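
The data flow in the steps above can be sketched with plain Go channels. Everything here is an assumption made for illustration: `StreamChunk` and `Response` are hypothetical stand-ins rather than the SDK's types, and `fakeSession` answers each chunk locally instead of calling a provider.

```go
package main

import (
	"context"
	"fmt"
)

// StreamChunk is a hypothetical stand-in for the SDK's chunk type.
type StreamChunk struct{ PCM []byte }

// Response is a hypothetical stand-in for a provider response.
type Response struct {
	Text  string
	Audio []byte
}

// fakeSession mimics a bidirectional session: chunks in, responses out.
type fakeSession struct {
	in  chan StreamChunk
	out chan Response
}

func newFakeSession(ctx context.Context) *fakeSession {
	s := &fakeSession{in: make(chan StreamChunk), out: make(chan Response)}
	go func() {
		defer close(s.out) // closing out tells the reader the session ended
		for {
			select {
			case <-ctx.Done():
				return
			case c, ok := <-s.in:
				if !ok {
					return // input closed: no more audio
				}
				r := Response{Text: fmt.Sprintf("heard %d bytes", len(c.PCM)), Audio: c.PCM}
				select {
				case s.out <- r:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return s
}

// runPipeline wires capture -> session -> playback and returns the texts received.
func runPipeline(ctx context.Context, mic [][]byte, play func([]byte)) []string {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	s := newFakeSession(ctx)

	// Audio capture: send each microphone buffer as a chunk, then close the input.
	go func() {
		defer close(s.in)
		for _, buf := range mic {
			select {
			case s.in <- StreamChunk{PCM: buf}:
			case <-ctx.Done():
				return
			}
		}
	}()

	// Response handling and playback.
	var texts []string
	for r := range s.out {
		texts = append(texts, r.Text)
		play(r.Audio)
	}
	return texts
}

// demo feeds two silent buffers through the pipeline with a no-op player.
func demo() []string {
	mic := [][]byte{make([]byte, 320), make([]byte, 640)}
	return runPipeline(context.Background(), mic, func([]byte) {})
}

func main() {
	fmt.Println(demo()) // prints "[heard 320 bytes heard 640 bytes]"
}
```

Capture and response handling run concurrently, so a slow playback step never blocks the microphone for longer than one unbuffered send.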

Architecture

┌─────────────┐
│ Microphone  │
└──────┬──────┘
       │ PCM chunks
       ▼
┌─────────────────────┐
│ BidirectionalSession│
│   (SDK Pipeline)    │
│                     │
│  ┌─────────────┐    │
│  │ VAD         │    │◄── Turn detection
│  │ Middleware  │    │
│  └─────────────┘    │
│         │           │
│         ▼           │
│  ┌─────────────┐    │
│  │ Provider    │    │◄── Gemini Live API
│  │ Session     │    │
│  └─────────────┘    │
│         │           │
└─────────┬───────────┘
          │
          ▼ Text + Audio
┌─────────────┐
│  Speakers   │
└─────────────┘
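
To make the "Turn detection" stage in the diagram concrete, here is a minimal energy-threshold detector. It is a swapped-in illustration, not the SDK's VAD middleware: the 16-bit little-endian PCM format, the threshold, the hangover count, and every name in the block are assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of 16-bit little-endian PCM,
// normalised to 0..1. The sample format is an assumption.
func rms(pcm []byte) float64 {
	n := len(pcm) / 2
	if n == 0 {
		return 0
	}
	var sum float64
	for i := 0; i < n; i++ {
		s := float64(int16(binary.LittleEndian.Uint16(pcm[2*i:]))) / 32768
		sum += s * s
	}
	return math.Sqrt(sum / float64(n))
}

// turnDetector reports the end of a turn once speech has been heard and is
// then followed by `hangover` consecutive quiet chunks.
type turnDetector struct {
	threshold float64 // RMS level at or above which a chunk counts as speech
	hangover  int     // quiet chunks required to close a turn
	speaking  bool
	quiet     int
}

// feed consumes one chunk and returns true when a turn has just ended.
func (d *turnDetector) feed(pcm []byte) bool {
	if rms(pcm) >= d.threshold {
		d.speaking, d.quiet = true, 0
		return false
	}
	if !d.speaking {
		return false // silence before any speech is not a turn
	}
	d.quiet++
	if d.quiet >= d.hangover {
		d.speaking, d.quiet = false, 0
		return true
	}
	return false
}

// tone builds a chunk of n samples at constant amplitude, for the demo.
func tone(n int, amp int16) []byte {
	buf := make([]byte, 2*n)
	for i := 0; i < n; i++ {
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(amp))
	}
	return buf
}

func main() {
	d := &turnDetector{threshold: 0.02, hangover: 3}
	loud, silent := tone(160, 8000), tone(160, 0)
	chunks := [][]byte{silent, loud, loud, silent, silent, silent, silent}
	for i, c := range chunks {
		if d.feed(c) {
			fmt.Println("turn ended at chunk", i) // prints "turn ended at chunk 5"
		}
	}
}
```

The hangover count keeps short pauses inside a sentence from being treated as the end of a turn; production detectors typically use trained models rather than a fixed energy threshold.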

Customization

The pipeline handles VAD, transcription, and text-to-speech (TTS) internally through middleware. To customize the example, change the settings passed in the provider session request.

Next Steps