# Tutorial 6: Duplex Voice Testing
Learn to test bidirectional voice conversations with real-time audio streaming.
## What You’ll Learn

- Understand duplex streaming vs traditional audio testing
- Create duplex test scenarios with audio files
- Configure turn detection modes (VAD vs ASM)
- Use self-play with TTS for automated voice testing
- Handle session resilience and error recovery
## Prerequisites

- Completed Tutorial 1: Your First Test
- A Gemini API key (duplex streaming requires Gemini Live API)
- Optional: OpenAI API key (for TTS in self-play mode)
## Understanding Duplex Streaming

Traditional audio testing sends entire audio files as blobs. Duplex streaming is different:
| Aspect | Traditional | Duplex |
|---|---|---|
| Audio delivery | Entire file at once | Streamed in chunks |
| Turn detection | Manual (per turn) | Dynamic (VAD or provider) |
| Response timing | After full upload | Real-time as audio streams |
| Use case | Transcription testing | Voice assistants, interviews |
Duplex mode enables testing of real-time voice conversations where timing and turn-taking matter.
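For a sense of scale: at the audio format used in Step 4 (16 kHz, 16-bit, mono), one second of audio is 32,000 bytes, so a 20 ms streaming frame is only 640 bytes. The snippet below is purely illustrative (it is not how promptarena transports audio); it just shows what slicing a clip into frame-sized chunks looks like, assuming the audio/greeting.pcm file prepared in Step 4 exists:

```bash
# Illustration only: cut a raw PCM clip into 20 ms frames.
# 16000 samples/s * 2 bytes/sample * 0.02 s = 640 bytes per frame.
mkdir -p /tmp/frames
split -b 640 audio/greeting.pcm /tmp/frames/frame_
ls /tmp/frames | wc -l   # roughly the number of 20 ms frames in the clip
```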
## Step 1: Set Up Your Project

Create a new duplex testing project:
```bash
mkdir duplex-test
cd duplex-test
mkdir -p audio prompts providers scenarios
```

Create the arena configuration:
```yaml
# arena.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: ArenaConfig
metadata:
  name: duplex-test
spec:
  prompts_dir: ./prompts
  providers_dir: ./providers
  scenarios_dir: ./scenarios
  output_dir: ./out
```

## Step 2: Configure the Gemini Provider

Duplex streaming requires Gemini’s Live API:
```yaml
# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  api_key_env: GEMINI_API_KEY
  model: gemini-2.0-flash-exp

  # Enable streaming with audio output
  streaming:
    enabled: true
    response_modalities:
      - AUDIO
      - TEXT
```

Set your API key:

```bash
export GEMINI_API_KEY="your-api-key"
```

## Step 3: Create a Voice Assistant Prompt
```yaml
# prompts/voice-assistant.prompt.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Prompt
metadata:
  name: voice-assistant
spec:
  id: voice-assistant
  task_type: voice-assistant

  system_prompt: |
    You are Nova, a friendly voice assistant.
    Keep responses brief and conversational since this is a voice interaction.

    Guidelines:
    - Speak naturally as if in conversation
    - Keep responses under 2-3 sentences
    - Be helpful and warm
```

## Step 4: Prepare Audio Files
You’ll need PCM audio files for testing. Audio requirements:
| Parameter | Value |
|---|---|
| Format | Raw PCM (no headers) |
| Sample Rate | 16000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono |
Convert existing audio files using ffmpeg:
```bash
# Convert WAV to PCM
ffmpeg -i input.wav -f s16le -ar 16000 -ac 1 audio/greeting.pcm

# Convert MP3 to PCM
ffmpeg -i input.mp3 -f s16le -ar 16000 -ac 1 audio/question.pcm
```

Or record directly in the correct format:
```bash
# Record 5 seconds of audio (macOS, requires sox)
rec -r 16000 -b 16 -c 1 -e signed-integer audio/greeting.pcm trim 0 5
```
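Raw PCM has no header, so you can sanity-check a converted or recorded file by deriving its duration from its size: at 16 kHz, 16-bit, mono, one second of audio is exactly 32,000 bytes. A minimal check, assuming the file paths above:

```bash
# Duration of a raw 16 kHz, 16-bit, mono PCM file = bytes / (16000 * 2)
bytes=$(wc -c < audio/greeting.pcm)
awk -v b="$bytes" 'BEGIN { printf "audio/greeting.pcm: %.2f s\n", b / 32000 }'
```

If the reported duration looks wrong, the file was probably produced at a different sample rate or channel count.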
## Step 5: Create a Basic Duplex Scenario

```yaml
# scenarios/basic-duplex.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: basic-duplex
spec:
  id: basic-duplex
  task_type: voice-assistant
  description: "Basic duplex streaming test"

  # Enable duplex mode
  duplex:
    timeout: "30s"
    turn_detection:
      mode: asm  # Provider-native turn detection

  streaming: true

  turns:
    # Turn 1: Greeting
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(hello|hi|hey)"

    # Turn 2: Question
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/question.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: ".{10,}"  # At least 10 chars response
```

## Step 6: Run Your First Duplex Test
```bash
promptarena run --scenario basic-duplex --provider gemini-live
```

You should see real-time streaming output as audio is processed.
## Turn Detection Modes

Duplex supports two turn detection modes:
### ASM Mode (Provider-Native)

The provider (Gemini) handles turn detection internally:

```yaml
duplex:
  turn_detection:
    mode: asm
```

Best for: Simple tests, provider-specific behavior testing.
### VAD Mode (Voice Activity Detection)

Client-side VAD with configurable thresholds:

```yaml
duplex:
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 600  # Silence to end turn
      min_speech_ms: 200         # Minimum speech duration
```

Best for: Precise control over turn boundaries, testing interruption handling.
## Step 7: Add Self-Play with TTS

For fully automated testing, use self-play mode, where an LLM generates user responses that are converted to audio via TTS:
```yaml
# scenarios/selfplay-duplex.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: selfplay-duplex
spec:
  id: selfplay-duplex
  task_type: voice-assistant
  description: "Automated voice testing with TTS"

  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 1200  # Longer for TTS pauses
        min_speech_ms: 800
    resilience:
      max_retries: 2
      partial_success_min_turns: 2
      ignore_last_turn_session_end: true

  streaming: true

  turns:
    # Initial audio greeting
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(help|assist)"

    # Self-play: LLM generates questions, TTS converts to audio
    - role: selfplay-user
      persona: curious-customer
      turns: 3  # Generate 3 follow-up turns
      tts:
        provider: openai
        voice: alloy
      assertions:
        - type: content_matches
          params:
            pattern: ".{10,}"

  context:
    goal: "Test multi-turn voice conversation"
    user_type: "potential customer"
```

Create the persona:
```yaml
# prompts/personas/curious-customer.persona.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: curious-customer
spec:
  id: curious-customer
  description: "A curious customer asking follow-up questions"

  system_prompt: |
    You are a curious customer exploring a product or service.
    Ask natural follow-up questions based on the assistant's responses.
    Keep questions brief and conversational.
```

Configure the TTS provider:
```yaml
# providers/openai-tts.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-tts
spec:
  id: openai
  type: openai
  api_key_env: OPENAI_API_KEY
  model: gpt-4o-mini
```

Run the self-play test:

```bash
export OPENAI_API_KEY="your-openai-key"
promptarena run --scenario selfplay-duplex --provider gemini-live
```

## Session Resilience Configuration
Voice sessions can be interrupted by network issues or provider limits. Configure resilience:
```yaml
duplex:
  resilience:
    # Retry failed conversations
    max_retries: 2
    retry_delay_ms: 2000

    # Delay between turns
    inter_turn_delay_ms: 500
    selfplay_inter_turn_delay_ms: 1000

    # Accept partial success
    partial_success_min_turns: 2

    # Don't fail on final turn session end
    ignore_last_turn_session_end: true
```
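These settings trade extra wall-clock time for robustness, so leave room for them in the duplex timeout. A rough, illustrative budget for a run like the self-play scenario above (the exact accounting inside promptarena may differ):

```bash
# Worst-case configured delay overhead for a 4-turn self-play run
# with the values above:
#   retries:               2 * 2000 ms = 4000 ms
#   self-play turn delays: 4 * 1000 ms = 4000 ms
echo "$(( (2 * 2000 + 4 * 1000) / 1000 )) s of configured delay overhead"
```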
## Assertions for Voice Testing

Common assertions for duplex tests:
```yaml
assertions:
  # Content pattern matching
  - type: content_matches
    params:
      pattern: "(?i)(hello|greeting)"

  # Must include certain phrases
  - type: content_includes
    params:
      patterns:
        - "welcome"
        - "help"

  # Response length check
  - type: content_matches
    params:
      pattern: ".{20,}"  # At least 20 characters

  # Sentiment analysis
  - type: sentiment
    params:
      expected: positive
```
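Assertion patterns are regular expressions, so it can save a test cycle to try them against a sample transcript locally before wiring them into a scenario. One way to do that, assuming GNU grep with PCRE support (`-P`); the arena's own regex engine may differ slightly:

```bash
# Quick local check of an assertion pattern against sample text
echo "Hello! Welcome, how can I help you today?" | grep -Pq "(?i)(hello|greeting)" && echo "pattern matches"
```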
## Debugging Duplex Tests

### Enable Verbose Logging

```bash
promptarena run --scenario basic-duplex --provider gemini-live --verbose
```

### Check Audio Format
Ensure your audio files are correct:
```bash
# Check file info (raw PCM has no header, so tell ffprobe the format)
ffprobe -f s16le -ar 16000 -ac 1 audio/greeting.pcm

# Play back (requires sox)
play -r 16000 -b 16 -c 1 -e signed-integer audio/greeting.pcm
```
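If you would rather inspect a clip in a regular audio player or editor, wrap the raw PCM in a WAV header first (the reverse of the Step 4 conversion; the output path here is just an example):

```bash
# Wrap raw PCM in a WAV container so standard audio tools can open it
ffmpeg -f s16le -ar 16000 -ac 1 -i audio/greeting.pcm /tmp/greeting-check.wav
```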
### Common Issues

| Issue | Solution |
|---|---|
| "Session ended early" | Increase partial_success_min_turns |
| "Empty response" | Check audio quality, increase silence_threshold_ms |
| "Turn interrupted" | Increase inter_turn_delay_ms |
| TTS pauses causing issues | Increase silence_threshold_ms to 1200ms+ |
## Complete Example Project

```
duplex-test/
├── arena.yaml
├── audio/
│   ├── greeting.pcm
│   └── question.pcm
├── prompts/
│   ├── voice-assistant.prompt.yaml
│   └── personas/
│       └── curious-customer.persona.yaml
├── providers/
│   ├── gemini-live.provider.yaml
│   └── openai-tts.provider.yaml
└── scenarios/
    ├── basic-duplex.scenario.yaml
    └── selfplay-duplex.scenario.yaml
```

## Next Steps
- Multi-Provider Testing - Test across providers
- CI/CD Integration - Automate voice tests
- Duplex Reference - Full configuration options
## See Also

- Arena CLI Reference - Command options
- Assertions Reference - All assertion types
- Validators Reference - Output validation