Duplex Configuration Reference

Complete reference for configuring duplex (bidirectional) streaming scenarios in PromptArena.

Duplex mode enables real-time bidirectional audio streaming for testing voice assistants and conversational AI. When enabled, audio is streamed in chunks and turn boundaries are detected dynamically using either VAD or ASM mode.

Requires: Gemini Live API (provider type: gemini, model: gemini-2.0-flash-exp or similar)


Enable duplex mode by adding the duplex field to your scenario spec:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-assistant-test
spec:
  id: voice-assistant-test
  task_type: voice-assistant
  streaming: true # Required for duplex
  duplex:
    timeout: "5m"
    turn_detection:
      mode: asm
    resilience:
      max_retries: 2
      partial_success_min_turns: 2

The main duplex configuration object.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| timeout | string | "10m" | Maximum session duration (Go duration format) |
| turn_detection | TurnDetectionConfig | mode: asm | Turn boundary detection settings |
| resilience | DuplexResilienceConfig | See below | Error handling and retry behavior |

duplex:
  timeout: "5m30s"
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 600
      min_speech_ms: 200
  resilience:
    max_retries: 2
    inter_turn_delay_ms: 500
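
The timeout field uses Go duration syntax, which Go's standard library parses with time.ParseDuration. The following Python sketch approximates that syntax so a timeout string can be checked before a run. The function name parse_go_duration is hypothetical, and whether PromptArena validates timeouts in exactly this way is an assumption.

```python
import re

# Unit suffixes accepted by Go duration strings, expressed in seconds.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "μs": 1e-6,
    "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One "<number><unit>" token, e.g. "5m", "1.5h", "30s".
_TOKEN = re.compile(r"(\d+\.?\d*|\.\d+)(ns|us|µs|μs|ms|s|m|h)")


def parse_go_duration(text: str) -> float:
    """Return the duration in seconds, or raise ValueError if malformed.

    Approximation of Go's duration syntax; not PromptArena's own code.
    """
    if not text:
        raise ValueError("invalid duration: empty string")
    sign = -1.0 if text[0] == "-" else 1.0
    body = text[1:] if text[0] in "+-" else text
    if body == "0":  # Go accepts a bare zero without a unit
        return 0.0
    pos, total = 0, 0.0
    while pos < len(body):
        match = _TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:  # nothing but a sign
        raise ValueError(f"invalid duration: {text!r}")
    return sign * total
```

For example, parse_go_duration("5m30s") yields 330.0 seconds, while a string such as "5 minutes" raises ValueError.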

Configures how turn boundaries are detected during duplex streaming.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "asm" | Detection mode: "vad" or "asm" |
| vad | VADConfig | - | Voice activity detection settings (when mode is vad) |

| Mode | Name | Description |
| --- | --- | --- |
| asm | Provider-Native | The provider (Gemini) handles turn detection internally using its automatic speech detection |
| vad | Voice Activity Detection | Client-side VAD with configurable silence thresholds |

duplex:
  turn_detection:
    mode: asm

Best for: Simple tests, trusting provider behavior, less configuration.

How it works: The Gemini Live API automatically detects when the speaker stops talking and triggers a response.

duplex:
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 600
      min_speech_ms: 200
      max_turn_duration_s: 60

Best for: Precise control over turn boundaries, testing interruption handling, consistent behavior across providers.


Voice Activity Detection configuration (used when turn_detection.mode is "vad").

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| silence_threshold_ms | int | 500 | Silence duration (ms) to trigger turn end |
| min_speech_ms | int | 1000 | Minimum speech duration before silence counts |
| max_turn_duration_s | int | 60 | Force turn end after this duration (seconds) |

duplex:
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 800 # Longer silence for natural speech pauses
      min_speech_ms: 300 # Short utterances still count
      max_turn_duration_s: 30 # Limit long turns

| Scenario | silence_threshold_ms | min_speech_ms |
| --- | --- | --- |
| Quick responses | 400-500 | 150-200 |
| Natural conversation | 600-800 | 200-300 |
| TTS with pauses | 1000-1500 | 500-800 |
| Slow/deliberate speech | 1200-2000 | 800-1000 |
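
To make the interaction of the three VAD fields concrete, here is a minimal sketch of how they could combine into a turn-end decision. It is not PromptArena's implementation: the function find_turn_end, the per-frame speech flags, and the choice to count speech cumulatively are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VADConfig:
    # Defaults mirror the field table above.
    silence_threshold_ms: int = 500
    min_speech_ms: int = 1000
    max_turn_duration_s: int = 60


def find_turn_end(frames, frame_ms, cfg):
    """Return elapsed milliseconds at which the turn ends, or None.

    frames: iterable of bools, True when the frame contains speech
            (hypothetical output of some speech classifier).
    frame_ms: duration of each frame in milliseconds.
    """
    speech_ms = 0   # total speech heard so far in this turn
    silence_ms = 0  # length of the current trailing silence
    elapsed_ms = 0
    for is_speech in frames:
        elapsed_ms += frame_ms
        if is_speech:
            speech_ms += frame_ms
            silence_ms = 0
        else:
            silence_ms += frame_ms
        # Silence only ends the turn once enough speech has been heard.
        if speech_ms >= cfg.min_speech_ms and silence_ms >= cfg.silence_threshold_ms:
            return elapsed_ms
        # Hard cap on turn length.
        if elapsed_ms >= cfg.max_turn_duration_s * 1000:
            return elapsed_ms
    return None
```

With 20 ms frames, silence_threshold_ms: 600, and min_speech_ms: 200, 300 ms of speech followed by silence ends the turn at 900 ms. A 100 ms utterance never satisfies min_speech_ms, so only max_turn_duration_s can end that turn.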

Error handling and retry behavior for duplex sessions.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | int | 0 | Retry attempts for failed turns |
| retry_delay_ms | int | 1000 | Delay between retries (ms) |
| inter_turn_delay_ms | int | 500 | Delay between turns (ms) |
| selfplay_inter_turn_delay_ms | int | 1000 | Delay after self-play turns (ms) |
| partial_success_min_turns | int | 1 | Minimum completed turns for partial success |
| ignore_last_turn_session_end | bool | true | Treat session end on final turn as success |

duplex:
  resilience:
    max_retries: 2
    retry_delay_ms: 2000
    inter_turn_delay_ms: 500
    selfplay_inter_turn_delay_ms: 1500
    partial_success_min_turns: 3
    ignore_last_turn_session_end: true
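
The retry fields can be read as a simple loop: one initial attempt plus up to max_retries further attempts, with retry_delay_ms between them. The sketch below illustrates that reading only. The helper run_turn_with_retries and its run_turn callback are hypothetical, and the actual retry logic in PromptArena may differ.

```python
import time


def run_turn_with_retries(run_turn, max_retries=0, retry_delay_ms=1000, sleep=time.sleep):
    """Call run_turn() until it succeeds or retries are exhausted.

    run_turn: zero-argument callable that raises on failure (hypothetical).
    sleep: injectable so tests can avoid real waiting.
    """
    last_error = None
    for attempt in range(max_retries + 1):  # initial attempt + retries
        try:
            return run_turn()
        except Exception as err:
            last_error = err
            if attempt < max_retries:
                sleep(retry_delay_ms / 1000)  # wait before the next attempt
    raise last_error
```

With max_retries: 2 and retry_delay_ms: 2000, a turn that fails twice and then succeeds incurs two 2-second waits. With the default max_retries: 0, the first failure is final.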

When partial_success_min_turns is set, sessions that end unexpectedly after completing at least that many turns are treated as successful:

resilience:
  partial_success_min_turns: 2 # Accept if 2+ turns complete

This is useful for exploratory testing where completing all turns isn’t critical.

By default, if the session ends on the final expected turn, it’s treated as success:

resilience:
  ignore_last_turn_session_end: true # Default

Set to false if you need the final turn to complete normally without session termination.
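
One way to read the two rules above together is as a small decision function. This is a sketch under assumptions, not the tool's actual logic: session_succeeded is a hypothetical name, and it assumes that "ends on the final expected turn" means every turn except the last one completed.

```python
def session_succeeded(
    completed_turns,
    expected_turns,
    partial_success_min_turns=1,
    ignore_last_turn_session_end=True,
):
    """Decide whether a duplex session counts as successful (illustrative only)."""
    if completed_turns >= expected_turns:
        return True  # every turn finished normally
    # From here on, the session ended before all turns completed.
    ended_on_final_turn = completed_turns == expected_turns - 1
    if ignore_last_turn_session_end and ended_on_final_turn:
        return True  # assumed meaning of "session end on final turn"
    # Partial success: enough turns completed before the unexpected end.
    return completed_turns >= partial_success_min_turns
```

Under this reading, a five-turn scenario that stops after two turns passes with partial_success_min_turns: 2 but fails with partial_success_min_turns: 3.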


Text-to-speech (TTS) for self-play audio generation is configured through the arena voice catalog rather than inline on individual turns. TTS providers are declared in tts_providers: and bound to voice IDs in voices:. Personas and scripted-text scenarios reference those IDs.

# config.arena.yaml
spec:
  tts_providers:
    - file: providers/openai-alloy.provider.yaml
    - file: providers/mock-tts.provider.yaml
  voices:
    # Real TTS (requires OPENAI_API_KEY). For CI: change provider to mock-tts.
    - id: alloy
      provider: openai-alloy

Assign a voice ID to a persona. All selfplay turns using that persona will use the corresponding TTS provider.

# personas/curious-customer.persona.yaml
spec:
  id: curious-customer
  voice: alloy # references the voice catalog id above
  system_template: |
    You are a curious customer ...

For scripted-text duplex scenarios (turns with content: instead of audio parts:), declare voice: at the scenario level:

spec:
  id: my-scripted-scenario
  voice: alloy # references the voice catalog id above
  turns:
    - role: user
      content: "Hello, can you hear me?"

A single edit to the voices: block in the arena config switches between real vendor TTS and mock TTS, with no changes required to personas or scenarios:

voices:
  # Recording mode (requires API key):
  - id: alloy
    provider: openai-alloy
  # CI / keyless mode: swap to
  # - id: alloy
  #   provider: mock-tts
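
The indirection from persona to voice ID to TTS provider can be sketched as two lookups. The code below is illustrative only: resolve_tts_provider and its data shapes are hypothetical, and the error strings are modeled on the messages listed in the error table at the end of this page.

```python
def resolve_tts_provider(persona_id, persona_voice, voices, tts_provider_ids):
    """Return the TTS provider ID that a persona's voice resolves to.

    voices: list of {"id": ..., "provider": ...} dicts from the voice catalog.
    tts_provider_ids: set of provider IDs declared under tts_providers.
    """
    # First check that every catalog entry points at a declared provider.
    for index, voice in enumerate(voices):
        provider = voice["provider"]
        if provider not in tts_provider_ids:
            raise ValueError(f'voices[{index}]: provider "{provider}" not found')
    # Then resolve the persona's voice ID through the catalog.
    for voice in voices:
        if voice["id"] == persona_voice:
            return voice["provider"]
    raise ValueError(
        f'persona "{persona_id}": voice "{persona_voice}" not found in catalog'
    )
```

Because personas hold only the voice ID, changing the provider value on the alloy entry changes the result for every persona that references alloy.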

In duplex scenarios, user turns contain audio parts instead of text:

turns:
  - role: user
    parts:
      - type: audio
        media:
          file_path: audio/greeting.pcm
          mime_type: audio/L16

| Parameter | Value | Description |
| --- | --- | --- |
| Format | Raw PCM | No headers (not WAV) |
| Sample Rate | 16000 Hz | Required by Gemini Live API |
| Bit Depth | 16-bit | Signed integer |
| Channels | Mono | Single channel |
| MIME Type | audio/L16 | Linear PCM |

# WAV to PCM
ffmpeg -i input.wav -f s16le -ar 16000 -ac 1 output.pcm
# MP3 to PCM
ffmpeg -i input.mp3 -f s16le -ar 16000 -ac 1 output.pcm
# Verify format
ffprobe -show_format -show_streams output.pcm
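
Because raw PCM carries no header, a quick sanity check can rely only on arithmetic: 16000 samples per second at 2 bytes per sample and 1 channel is 32000 bytes per second. The sketch below applies that arithmetic and also detects an accidental WAV header. The function names are hypothetical helpers, not part of PromptArena.

```python
SAMPLE_RATE_HZ = 16000   # from the format table above
BYTES_PER_SAMPLE = 2     # 16-bit signed integers
CHANNELS = 1             # mono


def pcm_duration_seconds(num_bytes):
    """Duration of a raw PCM payload of num_bytes in the format above."""
    frame_size = BYTES_PER_SAMPLE * CHANNELS
    if num_bytes % frame_size != 0:
        raise ValueError("size is not a whole number of 16-bit samples")
    return num_bytes / (SAMPLE_RATE_HZ * frame_size)


def looks_like_wav(header):
    """True if the first 12 bytes are a RIFF/WAVE header, i.e. not raw PCM."""
    return header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

A file size from os.path.getsize("audio/greeting.pcm") can be passed to pcm_duration_seconds. If the result is far from the expected clip length, the file was probably exported with a different sample rate or channel count.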

Duplex requires a Gemini provider with streaming enabled:

# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  model: gemini-2.0-flash-exp
  defaults:
    temperature: 0.7
    max_tokens: 1000
  # Gemini-specific configuration
  additional_config:
    audio_enabled: true
    response_modalities:
      - AUDIO # Returns audio + text transcription

| Modality | Description |
| --- | --- |
| AUDIO | Returns audio response with text transcription |
| TEXT | Returns text-only response (no audio) |

Note: Gemini Live API supports only ONE modality at a time. AUDIO mode includes text transcription via outputAudioTranscription.


apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-assistant-comprehensive
spec:
  id: voice-assistant-comprehensive
  task_type: voice-assistant
  description: "Full duplex voice assistant test with self-play"
  streaming: true
  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 800
        min_speech_ms: 250
        max_turn_duration_s: 45
    resilience:
      max_retries: 2
      retry_delay_ms: 2000
      inter_turn_delay_ms: 500
      selfplay_inter_turn_delay_ms: 1200
      partial_success_min_turns: 3
      ignore_last_turn_session_end: true
  turns:
    # Initial audio greeting
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(hello|hi|welcome)"
    # Self-play generates follow-up questions; voice is resolved from the persona
    - role: selfplay-user
      persona: curious-customer
      turns: 3
      assertions:
        - type: content_matches
          params:
            pattern: ".{20,}" # At least 20 chars
  conversation_assertions:
    - type: content_includes_any
      params:
        patterns:
          - "help"
          - "assist"
          - "support"

Common configuration errors and solutions:

| Error | Cause | Solution |
| --- | --- | --- |
| invalid duplex timeout format | Timeout not in Go duration format | Use a format like "5m", "30s", "1h30m" |
| invalid turn detection mode | Mode not vad or asm | Use mode: vad or mode: asm |
| silence_threshold_ms must be non-negative | Negative VAD threshold | Use a value of 0 or greater |
| voices[N]: provider "X" not found | Voice references an unknown TTS provider ID | Check that tts_providers: lists the provider file and the id: matches |
| persona "X": voice "Y" not found in catalog | Persona references a voice ID not declared in voices: | Add the voice binding to the arena voices: list |