Duplex Streaming Example

This example demonstrates Arena’s duplex streaming capabilities for testing real-time, bidirectional audio conversations with LLMs.

Duplex streaming enables simultaneous input and output audio streams, allowing for natural voice conversations where:

  • User speaks → Audio is streamed to the LLM in real-time
  • LLM responds → Audio is streamed back while the user might still be speaking
  • Natural interruptions → The system handles turn-taking using voice activity detection (VAD)

This is ideal for testing voice assistants, customer support bots, and any real-time conversational AI.

| Feature | Description |
| --- | --- |
| Duplex Mode | Bidirectional audio streaming with configurable timeouts |
| VAD Turn Detection | Voice activity detection for natural conversation flow |
| Self-Play with TTS | LLM-generated user messages converted to audio via TTS |
| Multiple Providers | Test across Gemini 2.0 Flash and OpenAI GPT-4o Realtime |
| Mock Mode | CI-friendly testing without API keys |

# Set your API keys
export GEMINI_API_KEY="your-gemini-api-key"
export OPENAI_API_KEY="your-openai-api-key"

Mock mode requires no API keys; it uses deterministic mock responses.

# Navigate to the example directory
cd examples/duplex-streaming
# Run all scenarios with mock provider
promptarena run --provider mock-duplex
# Run a specific scenario
promptarena run --scenario duplex-basic --provider mock-duplex

# Run with Gemini 2.0 Flash (requires GEMINI_API_KEY)
promptarena run --provider gemini-2-flash
# Run with OpenAI GPT-4o Realtime (requires OPENAI_API_KEY)
promptarena run --provider openai-gpt4o-realtime
# Run specific scenario
promptarena run --scenario duplex-selfplay --provider gemini-2-flash

The duplex-basic scenario is a simple scripted conversation that verifies duplex functionality:

  • 3 scripted user turns
  • Tests greeting, Q&A, and follow-up
  • Validates response patterns

duplex:
  timeout: "5m"
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 500
      min_speech_ms: 1000
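To make the two VAD settings above concrete, here is a minimal Python sketch of how a turn-end decision could be derived from them. It relies only on how this example describes the parameters (silence must last silence_threshold_ms, and silence only counts once min_speech_ms of speech has been heard). It is not Arena's implementation: the function name, the boolean frame representation, and the 20 ms frame size are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def detect_turn_end(
    frames: Iterable[bool],
    frame_ms: int = 20,               # assumed frame duration, not taken from Arena
    silence_threshold_ms: int = 500,  # mirrors vad.silence_threshold_ms
    min_speech_ms: int = 1000,        # mirrors vad.min_speech_ms
) -> Optional[int]:
    """Return the elapsed time (ms) at which the turn would end, or None.

    Each frame is True if it contains speech and False if it is silent.
    """
    speech_ms = 0    # total speech heard so far in this turn
    silence_ms = 0   # length of the current run of counted silence
    elapsed_ms = 0

    for is_speech in frames:
        elapsed_ms += frame_ms
        if is_speech:
            speech_ms += frame_ms
            silence_ms = 0  # any speech resets the silence run
        elif speech_ms >= min_speech_ms:
            # Silence only counts once enough speech has accumulated.
            silence_ms += frame_ms
            if silence_ms >= silence_threshold_ms:
                return elapsed_ms
    return None


if __name__ == "__main__":
    # 1.2 s of speech followed by 0.6 s of silence, in 20 ms frames.
    frames = [True] * 60 + [False] * 30
    print(detect_turn_end(frames))  # 1700
```

Under this model, a larger silence_threshold_ms delays the turn end, and an utterance shorter than min_speech_ms never ends a turn on its own.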

The duplex-selfplay scenario demonstrates automated conversation testing using self-play:

  • LLM generates user messages
  • TTS converts generated text to audio
  • Audio is fed back into the duplex stream

turns:
  - role: selfplay-user
    persona: curious-customer
    turns: 2
    tts:
      provider: openai
      voice: alloy
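The three self-play steps listed above can be pictured as a simple loop. The Python sketch below is illustrative only: the three callables are hypothetical stand-ins for the persona LLM, the TTS provider, and the duplex stream, and none of the names come from Arena. For brevity it treats the assistant's reply as text, whereas a real duplex session returns audio.

```python
from typing import Callable, List


def run_selfplay_turns(
    generate_user_text: Callable[[List[str]], str],  # stand-in for the persona LLM
    synthesize_speech: Callable[[str], bytes],       # stand-in for the TTS provider
    send_audio: Callable[[bytes], str],              # stand-in for the duplex stream
    turns: int = 2,
) -> List[str]:
    """Run `turns` self-play exchanges and return the transcript."""
    transcript: List[str] = []
    for _ in range(turns):
        user_text = generate_user_text(transcript)  # 1. LLM generates the user message
        audio = synthesize_speech(user_text)        # 2. TTS converts the text to audio
        reply = send_audio(audio)                   # 3. audio is fed into the duplex stream
        transcript.append(f"user: {user_text}")
        transcript.append(f"assistant: {reply}")
    return transcript


if __name__ == "__main__":
    # Fake components so the loop can run without any API keys.
    log = run_selfplay_turns(
        generate_user_text=lambda history: f"question {len(history) // 2 + 1}",
        synthesize_speech=lambda text: text.encode("utf-8"),
        send_audio=lambda audio: f"heard {len(audio)} bytes",
    )
    for line in log:
        print(line)
```

Passing the components in as callables keeps the loop testable with fakes, which is the same idea the mock provider applies to CI runs.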

3. duplex-interactive - Interactive Technical Support

Extended conversation simulating a support call:

  • Multiple self-play turns with different personas
  • Comprehensive assertion testing
  • Tests natural conversation flow

spec:
  duplex:
    # Maximum session duration
    timeout: "10m"
    # Turn detection settings
    turn_detection:
      mode: vad # "vad" or "asm" (provider-native)
      vad:
        # Silence duration to trigger turn end (ms)
        silence_threshold_ms: 500
        # Minimum speech before silence counts (ms)
        min_speech_ms: 1000

turns:
  - role: selfplay-user
    persona: curious-customer
    tts:
      provider: openai # "openai", "elevenlabs", "cartesia"
      voice: alloy # Provider-specific voice ID

| Provider | Voices |
| --- | --- |
| OpenAI | alloy, echo, fable, onyx, nova, shimmer |
| ElevenLabs | Use voice IDs from your ElevenLabs account |
| Cartesia | Use voice IDs from your Cartesia account |

For testing with pre-recorded audio files, use the parts field with media content:

turns:
  # Turn 1: Greeting - "Hello, can you hear me?"
  - role: user
    parts:
      - type: audio
        media:
          file_path: audio/greeting.pcm
          mime_type: audio/L16

In duplex mode, the audio from parts is streamed directly to the model. Use comments to document what each audio file contains.

Supported audio formats:

  • PCM (audio/L16) - Raw 16-bit PCM at 16kHz mono
  • Opus (audio/opus) - Compressed audio
  • WAV (audio/wav) - Uncompressed WAV files
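If your recordings are WAV files and you want raw PCM fixtures in the format listed above, one possible approach is to strip the WAV header with Python's standard wave module. This is a sketch, not part of Arena: the function name is hypothetical, and it simply rejects input that is not already 16-bit, 16 kHz, mono rather than resampling it.

```python
import wave


def wav_to_pcm16(wav_path: str, pcm_path: str) -> int:
    """Write the raw sample data of a WAV file to pcm_path; return bytes written.

    Raises ValueError unless the input is 16-bit, 16 kHz, mono.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sample rate")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        # readframes returns the sample bytes without the WAV header.
        pcm = wav.readframes(wav.getnframes())

    with open(pcm_path, "wb") as out:
        out.write(pcm)
    return len(pcm)
```

WAV stores samples in little-endian order and this sketch does not swap bytes. This example does not state which byte order Arena or the providers expect for audio/L16, so verify that before relying on converted files.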
duplex-streaming/
├── config.arena.yaml              # Main arena configuration
├── README.md                      # This file
├── mock-responses.yaml            # Mock responses for CI testing
├── audio/                         # Pre-recorded audio fixtures
│   ├── greeting.pcm               # "Hello, can you hear me?"
│   ├── question.pcm               # "What's your name?"
│   └── funfact.pcm                # "Tell me a fun fact"
├── providers/
│   ├── gemini-2-flash.provider.yaml
│   ├── openai-gpt4o-realtime.provider.yaml
│   └── mock-duplex.provider.yaml
├── scenarios/
│   ├── duplex-basic.scenario.yaml
│   ├── duplex-selfplay.scenario.yaml
│   └── duplex-interactive.scenario.yaml
├── prompts/
│   └── voice-assistant.prompt.yaml
├── personas/
│   ├── curious-customer.persona.yaml
│   └── technical-user.persona.yaml
└── out/                           # Test results output

Duplex streaming requires providers that support bidirectional audio streaming.

Duplex mode requires providers to implement the StreamInputSupport interface, which enables:

  • Streaming audio input to the model
  • Streaming audio output from the model
  • Bidirectional, real-time conversation

Supported providers:

  • Gemini 2.0 Flash (with audio enabled)
  • OpenAI GPT-4o Realtime
  • Mock provider (for CI/testing)

Not supported:

  • Standard text-only providers

When running with unsupported providers, you’ll see:

Error: provider does not support streaming input
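The error above is the result of a capability check: before a duplex run starts, the provider is tested for the streaming-input interface. The Python sketch below shows the general pattern with a runtime-checkable protocol. It is only an analogue. The real StreamInputSupport interface belongs to Arena and its methods are not shown in this example, so the stream_audio method, both provider classes, and require_duplex are hypothetical.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class StreamInputSupport(Protocol):
    """Hypothetical Python analogue of the interface named above."""

    def stream_audio(self, chunks: Iterator[bytes]) -> Iterator[bytes]:
        ...


class TextOnlyProvider:
    """A provider with no streaming-input method."""

    def complete(self, prompt: str) -> str:
        return "text reply"


class EchoDuplexProvider:
    """A toy duplex provider that echoes input audio back as output."""

    def stream_audio(self, chunks: Iterator[bytes]) -> Iterator[bytes]:
        for chunk in chunks:
            yield chunk


def require_duplex(provider: object) -> StreamInputSupport:
    """Return the provider if it can stream input, otherwise raise."""
    # isinstance on a runtime-checkable Protocol checks that the method exists.
    if not isinstance(provider, StreamInputSupport):
        raise TypeError("provider does not support streaming input")
    return provider
```

With these definitions, require_duplex(EchoDuplexProvider()) returns the provider, while require_duplex(TextOnlyProvider()) raises a TypeError carrying the same message as the error shown above.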

The mock provider fully supports duplex streaming, enabling CI testing without API keys:

# GitHub Actions example - run duplex tests
- name: Run Duplex Streaming Tests
  run: |
    cd examples/duplex-streaming
    promptarena run --provider mock-duplex

For schema validation only:

# GitHub Actions example - validate configuration
- name: Validate Duplex Streaming Config
  run: |
    cd examples/duplex-streaming
    promptarena validate config.arena.yaml

Pre-recorded PCM audio files are included in the audio/ directory for testing:

  • greeting.pcm - Simple greeting (~2.5s)
  • question.pcm - Basic question (~1.5s)
  • funfact.pcm - Follow-up request (~2.3s)

These can be used to test audio streaming without TTS dependencies.
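Because raw PCM has no header, a clip's duration follows directly from its size. As a quick sanity check for fixtures, the Python sketch below computes the duration assuming the format this example lists for audio/L16 (16-bit samples, 16 kHz, mono). The function names are hypothetical and not part of Arena.

```python
import os


def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 16000,  # 16 kHz, as listed for audio/L16 in this example
    channels: int = 1,         # mono
    sample_width: int = 2,     # 16-bit samples are 2 bytes each
) -> float:
    """Return the playback duration of raw PCM data of the given size."""
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes / bytes_per_second


def fixture_duration(path: str) -> float:
    """Return the duration of a raw PCM file, using its size on disk."""
    return pcm_duration_seconds(os.path.getsize(path))


if __name__ == "__main__":
    # At 32,000 bytes per second, 80,000 bytes is 2.5 seconds of audio.
    print(pcm_duration_seconds(80_000))  # 2.5
```

A file whose computed duration is far from what you expect is usually a sign that its sample rate, sample width, or channel count differs from the assumed format.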

If you see "provider does not support streaming input", ensure you’re using a provider that supports duplex mode:

  • Gemini 2.0 Flash with audio enabled
  • OpenAI GPT-4o Realtime
  • Mock provider (mock-duplex)

For self-play scenarios with TTS, ensure:

  1. The TTS provider API key is set (e.g., OPENAI_API_KEY)
  2. The voice ID is valid for the chosen provider
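Both checks can be automated before a run. The Python sketch below is a hypothetical preflight helper, not an Arena feature. It knows only what this example states: the OPENAI_API_KEY variable and the six OpenAI voice names. ElevenLabs and Cartesia voice IDs are account-specific, so for those providers it can only confirm that an ID was supplied.

```python
import os
from typing import List, Mapping, Optional

# Voices listed for OpenAI in this example.
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


def tts_preflight(
    provider: str,
    voice: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    env = os.environ if env is None else env
    problems: List[str] = []
    if provider == "openai":
        if not env.get("OPENAI_API_KEY"):
            problems.append("OPENAI_API_KEY is not set")
        if voice not in OPENAI_VOICES:
            problems.append(f"unknown OpenAI voice: {voice!r}")
    elif not voice:
        # Voice IDs for other providers cannot be validated offline.
        problems.append(f"no voice ID given for provider {provider!r}")
    return problems


if __name__ == "__main__":
    for problem in tts_preflight("openai", "alloy"):
        print(problem)
```

An empty result does not guarantee that the key itself is valid; it only rules out the two configuration mistakes named above.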

If turn detection isn’t working:

  • Increase silence_threshold_ms if turns end during natural pauses in speech
  • Decrease min_speech_ms if short utterances are not triggering a turn end