Duplex Configuration Reference

Complete reference for configuring duplex (bidirectional) streaming scenarios in PromptArena.

Duplex mode enables real-time bidirectional audio streaming for testing voice assistants and conversational AI. When enabled, audio is streamed in chunks and turn boundaries are detected dynamically using either VAD or ASM mode.

Requires: Gemini Live API (provider type: gemini, model: gemini-2.0-flash-exp or similar)


Enable duplex mode by adding the duplex field to your scenario spec:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-assistant-test
spec:
  id: voice-assistant-test
  task_type: voice-assistant
  streaming: true # Required for duplex
  duplex:
    timeout: "5m"
    turn_detection:
      mode: asm
    resilience:
      max_retries: 2
      partial_success_min_turns: 2

The main duplex configuration object.

| Field | Type | Default | Description |
|---|---|---|---|
| timeout | string | "10m" | Maximum session duration (Go duration format) |
| turn_detection | TurnDetectionConfig | mode: asm | Turn boundary detection settings |
| resilience | DuplexResilienceConfig | See below | Error handling and retry behavior |

duplex:
  timeout: "5m30s"
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 600
      min_speech_ms: 200
  resilience:
    max_retries: 2
    inter_turn_delay_ms: 500
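
To make the "Go duration format" concrete, here is a minimal Python sketch of a parser for strings such as "5m30s". It is an illustration only: the function name parse_go_duration is hypothetical, and it handles just the h, m, s and ms units, whereas Go's time.ParseDuration also accepts signs and sub-millisecond units.

```python
import re

# Units accepted by this sketch, in seconds. Go's time.ParseDuration also
# accepts "ns" and "us"; they are omitted here for brevity.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}

# "ms" is listed before "m" so that "500ms" is not read as 500 minutes.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_go_duration(text):
    """Return the duration in seconds, or raise ValueError if malformed."""
    pos = 0
    total = 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total
```

With this sketch, "5m30s" parses to 330 seconds, while a value such as "5 minutes" is rejected because the number is not followed directly by a unit.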

Configures how turn boundaries are detected during duplex streaming.

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | "asm" | Detection mode: "vad" or "asm" |
| vad | VADConfig | - | Voice activity detection settings (when mode is vad) |

| Mode | Name | Description |
|---|---|---|
| asm | Provider-Native | The provider (Gemini) handles turn detection internally using its automatic speech detection |
| vad | Voice Activity Detection | Client-side VAD with configurable silence thresholds |

duplex:
  turn_detection:
    mode: asm

Best for: Simple tests, trusting provider behavior, less configuration.

How it works: The Gemini Live API automatically detects when the speaker stops talking and triggers a response.

duplex:
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 600
      min_speech_ms: 200
      max_turn_duration_s: 60

Best for: Precise control over turn boundaries, testing interruption handling, consistent behavior across providers.


Voice Activity Detection configuration (used when turn_detection.mode is "vad").

| Field | Type | Default | Description |
|---|---|---|---|
| silence_threshold_ms | int | 500 | Silence duration (ms) to trigger turn end |
| min_speech_ms | int | 1000 | Minimum speech duration before silence counts |
| max_turn_duration_s | int | 60 | Force turn end after this duration (seconds) |

duplex:
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 800 # Longer silence for natural speech pauses
      min_speech_ms: 300 # Short utterances still count
      max_turn_duration_s: 30 # Limit long turns

| Scenario | silence_threshold_ms | min_speech_ms |
|---|---|---|
| Quick responses | 400-500 | 150-200 |
| Natural conversation | 600-800 | 200-300 |
| TTS with pauses | 1000-1500 | 500-800 |
| Slow/deliberate speech | 1200-2000 | 800-1000 |
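
The following Python sketch shows one plausible reading of how the three VAD fields interact. It is not PromptArena's implementation: the VADConfig dataclass and find_turn_end function are hypothetical, speech is assumed to arrive as fixed-length frames already classified as speech or silence, and min_speech_ms is assumed to count cumulative speech in the turn.

```python
from dataclasses import dataclass


@dataclass
class VADConfig:
    # Defaults mirror the field table above.
    silence_threshold_ms: int = 500
    min_speech_ms: int = 1000
    max_turn_duration_s: int = 60


def find_turn_end(frames, frame_ms, cfg):
    """Return the elapsed time (ms) at which the turn ends, or None.

    frames is a sequence of booleans, one per audio frame (True = speech).
    """
    speech_ms = 0
    silence_ms = 0
    elapsed_ms = 0
    for is_speech in frames:
        elapsed_ms += frame_ms
        if is_speech:
            speech_ms += frame_ms
            silence_ms = 0  # Any speech resets the silence counter.
        else:
            silence_ms += frame_ms
        # Silence only ends the turn once enough speech has been heard.
        if speech_ms >= cfg.min_speech_ms and silence_ms >= cfg.silence_threshold_ms:
            return elapsed_ms
        # Hard cap on turn length, regardless of speech activity.
        if elapsed_ms >= cfg.max_turn_duration_s * 1000:
            return elapsed_ms
    return None
```

Under these assumptions, 300 ms of speech followed by silence ends the turn at 900 ms when silence_threshold_ms is 600 and min_speech_ms is 200. A 100 ms utterance never triggers a silence-based turn end with the same settings, which is why min_speech_ms should stay below the shortest utterance you expect.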

Error handling and retry behavior for duplex sessions.

| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 0 | Retry attempts for failed turns |
| retry_delay_ms | int | 1000 | Delay between retries (ms) |
| inter_turn_delay_ms | int | 500 | Delay between turns (ms) |
| selfplay_inter_turn_delay_ms | int | 1000 | Delay after self-play turns (ms) |
| partial_success_min_turns | int | 1 | Minimum completed turns for partial success |
| ignore_last_turn_session_end | bool | true | Treat session end on final turn as success |

duplex:
  resilience:
    max_retries: 2
    retry_delay_ms: 2000
    inter_turn_delay_ms: 500
    selfplay_inter_turn_delay_ms: 1500
    partial_success_min_turns: 3
    ignore_last_turn_session_end: true

When partial_success_min_turns is set, sessions that end unexpectedly after completing at least that many turns are treated as successful:

resilience:
  partial_success_min_turns: 2 # Accept if 2+ turns complete

This is useful for exploratory testing where completing all turns isn’t critical.

By default, if the session ends on the final expected turn, it’s treated as success:

resilience:
  ignore_last_turn_session_end: true # Default

Set to false if you need the final turn to complete normally without session termination.
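
The two rules above can be combined into a single decision, sketched here in Python. This is an interpretation of the documented behavior rather than the actual implementation: the Resilience dataclass, the classify_session function, the outcome labels, and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Resilience:
    # Defaults mirror the field table above.
    partial_success_min_turns: int = 1
    ignore_last_turn_session_end: bool = True


def classify_session(completed_turns, total_turns, ended_early, cfg):
    """Classify a session as 'success', 'partial_success' or 'failure'.

    completed_turns counts turns that finished before the session ended.
    """
    if not ended_early:
        return "success"
    # The session ended while the final expected turn was in progress.
    if cfg.ignore_last_turn_session_end and completed_turns >= total_turns - 1:
        return "success"
    # Enough turns completed to accept the session anyway.
    if completed_turns >= cfg.partial_success_min_turns:
        return "partial_success"
    return "failure"
```

For a five-turn scenario with partial_success_min_turns: 3, a session that drops after three completed turns is accepted, while one that drops after two is reported as a failure.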


Text-to-speech (TTS) configuration for self-play audio generation.

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | TTS provider: "openai", "elevenlabs", "cartesia", "mock" |
| voice | string | Yes* | Voice ID for synthesis (*optional for mock with audio_files) |
| audio_files | []string | No | PCM audio files for mock provider (rotated through) |
| sample_rate | int | No | Output sample rate in Hz (default: 24000) |

turns:
  - role: selfplay-user
    persona: curious-customer
    turns: 3
    tts:
      provider: openai
      voice: alloy

turns:
  - role: selfplay-user
    persona: test-persona
    turns: 3
    tts:
      provider: mock
      audio_files:
        - audio/question1.pcm
        - audio/question2.pcm
        - audio/question3.pcm
      sample_rate: 16000 # Match your file sample rate

| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, engaging |
| fable | Expressive, dynamic |
| onyx | Deep, authoritative |
| nova | Friendly, conversational |
| shimmer | Clear, professional |
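
The mock provider's audio_files are described as "rotated through". Assuming this means the files are used in listed order and wrap around when there are more turns than files, the mapping of turns to files could be sketched in Python as follows. The function name assign_audio_files is hypothetical.

```python
from itertools import cycle, islice


def assign_audio_files(audio_files, turns):
    """Return the file used for each self-play turn, wrapping around the list."""
    if not audio_files:
        raise ValueError("audio_files must not be empty")
    return list(islice(cycle(audio_files), turns))
```

Under this assumption, two files and five turns produce the sequence file 1, file 2, file 1, file 2, file 1, so supplying at least as many files as turns avoids repeated prompts.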

In duplex scenarios, user turns contain audio parts instead of text:

turns:
  - role: user
    parts:
      - type: audio
        media:
          file_path: audio/greeting.pcm
          mime_type: audio/L16

| Parameter | Value | Description |
|---|---|---|
| Format | Raw PCM | No headers (not WAV) |
| Sample Rate | 16000 Hz | Required by Gemini Live API |
| Bit Depth | 16-bit | Signed integer |
| Channels | Mono | Single channel |
| MIME Type | audio/L16 | Linear PCM |

# WAV to PCM
ffmpeg -i input.wav -f s16le -ar 16000 -ac 1 output.pcm
# MP3 to PCM
ffmpeg -i input.mp3 -f s16le -ar 16000 -ac 1 output.pcm
# Verify format
ffprobe -show_format -show_streams output.pcm
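
Because raw PCM carries no header, the format can only be checked before the header is stripped. As an alternative to ffmpeg for WAV input that is already 16 kHz, 16-bit mono, this Python sketch uses the standard-library wave module to verify the parameters and extract the samples. The function names are hypothetical, and no resampling is performed.

```python
import io
import wave

SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # 16-bit
CHANNELS = 1  # mono


def wav_to_raw_pcm(wav_bytes):
    """Strip the WAV header, returning raw PCM if the format matches."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        params = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if params != (SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS):
            raise ValueError(f"expected 16000 Hz / 16-bit / mono, got {params}")
        return wav.readframes(wav.getnframes())


def pcm_duration_seconds(pcm):
    """Duration of 16 kHz, 16-bit mono PCM, derived from the byte count."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
```

The duration helper is a quick sanity check on an existing .pcm file: at this format, one second of audio is exactly 32000 bytes.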

Duplex requires a Gemini provider with streaming enabled:

# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  model: gemini-2.0-flash-exp
  defaults:
    temperature: 0.7
    max_tokens: 1000
  # Gemini-specific configuration
  additional_config:
    audio_enabled: true
    response_modalities:
      - AUDIO # Returns audio + text transcription

| Modality | Description |
|---|---|
| AUDIO | Returns audio response with text transcription |
| TEXT | Returns text-only response (no audio) |

Note: Gemini Live API supports only ONE modality at a time. AUDIO mode includes text transcription via outputAudioTranscription.


apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-assistant-comprehensive
spec:
  id: voice-assistant-comprehensive
  task_type: voice-assistant
  description: "Full duplex voice assistant test with self-play"
  streaming: true
  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 800
        min_speech_ms: 250
        max_turn_duration_s: 45
    resilience:
      max_retries: 2
      retry_delay_ms: 2000
      inter_turn_delay_ms: 500
      selfplay_inter_turn_delay_ms: 1200
      partial_success_min_turns: 3
      ignore_last_turn_session_end: true
  turns:
    # Initial audio greeting
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(hello|hi|welcome)"
    # Self-play generates follow-up questions
    - role: selfplay-user
      persona: curious-customer
      turns: 3
      tts:
        provider: openai
        voice: nova
      assertions:
        - type: content_matches
          params:
            pattern: ".{20,}" # At least 20 chars
  conversation_assertions:
    - type: content_includes_any
      params:
        patterns:
          - "help"
          - "assist"
          - "support"
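
The content_matches patterns in the example above can be tried out before running a scenario. This Python sketch assumes the assertion performs an unanchored regular-expression search over the response text. Python's re module stands in for whatever engine PromptArena uses, which may differ in details, and the matches helper is hypothetical.

```python
import re

# Patterns copied from the scenario above.
GREETING = re.compile(r"(?i)(hello|hi|welcome)")
MIN_LENGTH = re.compile(r".{20,}")

# A stricter variant that only matches whole words.
GREETING_WHOLE_WORD = re.compile(r"(?i)\b(hello|hi|welcome)\b")


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None
```

Under unanchored search, the greeting pattern also matches "hi" inside words such as "this", so a response like "Is this thing on?" would pass. Adding word boundaries, as in the stricter variant, avoids that false positive.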

Common configuration errors and solutions:

| Error | Cause | Solution |
|---|---|---|
| invalid duplex timeout format | Timeout not in Go duration format | Use format like "5m", "30s", "1h30m" |
| invalid turn detection mode | Mode not vad or asm | Use mode: vad or mode: asm |
| silence_threshold_ms must be non-negative | Negative VAD threshold | Use positive values |
| tts provider is required | Missing TTS provider | Add provider: openai or similar |
| tts voice is required | Missing voice ID | Add voice: alloy or similar |
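
Several of these errors can be caught before a run with a small pre-flight check. The Python sketch below reuses the error messages from the table, but the validate_duplex function is hypothetical and the real validator may apply its checks differently. It takes the duplex block and an optional tts block as plain dictionaries, for example as loaded from the scenario YAML.

```python
VALID_MODES = {"vad", "asm"}


def validate_duplex(duplex, tts=None):
    """Return a list of error messages; an empty list means no problems found."""
    errors = []
    turn_detection = duplex.get("turn_detection", {})
    # mode defaults to "asm" when omitted.
    if turn_detection.get("mode", "asm") not in VALID_MODES:
        errors.append("invalid turn detection mode")
    vad = turn_detection.get("vad", {})
    if vad.get("silence_threshold_ms", 0) < 0:
        errors.append("silence_threshold_ms must be non-negative")
    if tts is not None:
        provider = tts.get("provider")
        if not provider:
            errors.append("tts provider is required")
        # voice is optional only for the mock provider with audio_files.
        uses_mock_files = provider == "mock" and bool(tts.get("audio_files"))
        if not tts.get("voice") and not uses_mock_files:
            errors.append("tts voice is required")
    return errors
```

For example, a tts block with provider: openai and no voice yields "tts voice is required", while provider: mock with a non-empty audio_files list passes without a voice.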