Set Up Voice Testing with Self-Play
Configure automated voice testing using self-play mode with TTS for multi-turn conversations.
Prerequisites
- Gemini API key (for duplex streaming)
- OpenAI API key (for TTS; not needed if you use mock TTS)
- Input audio files as raw PCM (16 kHz, 16-bit, mono)
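If your recordings are WAV files, the WAV header has to be stripped to get raw PCM. Below is a minimal sketch using Python's standard `wave` module. It assumes the WAV is already 16 kHz, 16-bit, mono and only validates that format; it does not resample. `wav_to_pcm` is a hypothetical helper, not part of PromptKit. The output keeps the WAV's little-endian sample order, so confirm that this matches what your provider expects.

```python
import wave

# Target format from the prerequisites: 16 kHz, 16-bit samples, mono.
EXPECTED_RATE = 16_000      # samples per second
EXPECTED_SAMPWIDTH = 2      # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1       # mono


def wav_to_pcm(wav_path: str, pcm_path: str) -> int:
    """Copy the audio frames of a WAV file into a headerless .pcm file.

    Validates the format instead of converting it: raises ValueError if the
    WAV is not already 16 kHz, 16-bit, mono. Returns the bytes written.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPWIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        # readframes returns the raw sample bytes without the RIFF/WAV header
        frames = wav.readframes(wav.getnframes())

    with open(pcm_path, "wb") as out:
        out.write(frames)
    return len(frames)
```

For example, `wav_to_pcm("greeting.wav", "audio/greeting.pcm")` produces a file that the scenario can reference. If your source audio is in another format or sample rate, a converter such as ffmpeg can resample and strip the header in one step (for example `ffmpeg -i input.wav -ar 16000 -ac 1 -f s16le output.pcm`).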
Quick Setup
1. Create the Provider Configuration
```yaml
# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  model: gemini-2.0-flash-exp
  additional_config:
    audio_enabled: true
    response_modalities:
      - AUDIO
```

2. Create a Persona for Self-Play
```yaml
# prompts/personas/test-user.persona.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: test-user
spec:
  id: test-user
  description: "Curious user asking follow-up questions"
  system_prompt: |
    You are testing a voice assistant.
    Ask natural follow-up questions based on the assistant's responses.
    Keep questions brief and conversational.
```

3. Create the Self-Play Scenario
```yaml
# scenarios/voice-selfplay.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-selfplay
spec:
  id: voice-selfplay
  task_type: voice-assistant
  streaming: true

  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 1200  # Longer for TTS pauses
        min_speech_ms: 500
    resilience:
      partial_success_min_turns: 2
      ignore_last_turn_session_end: true

  turns:
    # Initial audio turn
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16

    # Self-play generates follow-up turns
    - role: selfplay-user
      persona: test-user
      turns: 3
      tts:
        provider: openai
        voice: alloy
```

4. Run the Test
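Before launching `promptarena run`, you can optionally sanity-check the setup. The sketch below is a hypothetical helper, not part of PromptArena. `preflight` and `pcm_duration_ms` are illustrative names. It reads the same environment variables that the commands that follow export. The 500 ms default mirrors the scenario's `min_speech_ms`. Flagging a clip shorter than that is a heuristic: under a VAD that enforces a speech minimum, a clip with less total audio than the minimum cannot register as a turn.

```python
import os
from pathlib import Path

# Raw PCM layout from the prerequisites: 16 kHz x 2 bytes per sample x 1 channel.
BYTES_PER_SECOND = 16_000 * 2 * 1


def pcm_duration_ms(path: str) -> float:
    """Length of a raw 16 kHz, 16-bit, mono PCM file in milliseconds."""
    return Path(path).stat().st_size / BYTES_PER_SECOND * 1000


def preflight(audio_files, min_speech_ms=500, mock_tts=False):
    """Return a list of problems to fix before running (empty list = ready).

    min_speech_ms should mirror the scenario's VAD setting; pass
    mock_tts=True when the self-play turn uses the mock TTS provider,
    which does not need an OpenAI key.
    """
    problems = []
    required = ["GEMINI_API_KEY"] if mock_tts else ["GEMINI_API_KEY", "OPENAI_API_KEY"]
    for var in required:
        if not os.environ.get(var):
            problems.append(f"{var} is not set")
    for name in audio_files:
        path = Path(name)
        if not path.is_file():
            problems.append(f"{name}: file not found")
        elif path.stat().st_size % 2:
            # 16-bit samples are 2 bytes each, so the size must be even
            problems.append(f"{name}: odd byte count, not 16-bit PCM")
        elif pcm_duration_ms(name) < min_speech_ms:
            problems.append(
                f"{name}: only {pcm_duration_ms(name):.0f} ms of audio, "
                f"less than min_speech_ms={min_speech_ms}"
            )
    return problems
```

For example, `preflight(["audio/greeting.pcm"])` returns an empty list when both keys are exported and the greeting clip exists and is long enough. For the mock TTS setup, pass the question files and `mock_tts=True`.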
```bash
export GEMINI_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
promptarena run --scenario voice-selfplay --provider gemini-live
```

Using Mock TTS
For faster runs without OpenAI API costs, switch the self-play turn to the mock TTS provider and supply pre-recorded audio:
```yaml
turns:
  - role: selfplay-user
    persona: test-user
    turns: 3
    tts:
      provider: mock
      audio_files:
        - audio/question1.pcm
        - audio/question2.pcm
        - audio/question3.pcm
      sample_rate: 16000
```

Tuning Turn Detection
If turns are cut off too early or end too late, adjust the VAD settings in the scenario's `turn_detection` block:
| Issue | Solution |
|---|---|
| Cuts off mid-sentence | Increase `silence_threshold_ms` to 1500-2000 |
| Long pauses before the assistant responds | Decrease `silence_threshold_ms` to 800-1000 |
| Short utterances are ignored | Decrease `min_speech_ms` to 200-300 |
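The advice in the table follows from how silence-based turn detection typically decides that a turn is over. The sketch below is a simplified model of that logic, not PromptKit's implementation. `VadConfig`, `detect_turn_end`, and the 20 ms frame size are hypothetical, and the model assumes the speech/silence decision for each frame has already been made.

```python
from dataclasses import dataclass


@dataclass
class VadConfig:
    silence_threshold_ms: int = 1200  # trailing silence that ends a turn
    min_speech_ms: int = 500          # speech required before a turn can end
    frame_ms: int = 20                # hypothetical analysis frame length


def detect_turn_end(frames, cfg: VadConfig):
    """Return the time in ms at which the turn ends, or None if it never does.

    frames holds one boolean per frame_ms of audio (True = speech). This is a
    simplified model of silence-based turn detection, not PromptKit's code.
    """
    speech_ms = 0   # speech heard so far in the current turn
    silence_ms = 0  # length of the current run of silence
    for i, is_speech in enumerate(frames):
        if is_speech:
            speech_ms += cfg.frame_ms
            silence_ms = 0  # any speech resets the silence timer
            continue
        silence_ms += cfg.frame_ms
        if silence_ms >= cfg.silence_threshold_ms:
            if speech_ms >= cfg.min_speech_ms:
                return (i + 1) * cfg.frame_ms  # enough speech, then enough silence
            speech_ms = 0  # too little speech: discard it as noise, keep listening
    return None
```

Consider an utterance with 800 ms of speech, a 1 s pause mid-sentence, another 800 ms of speech, and then silence (`[True]*40 + [False]*50 + [True]*40 + [False]*100`). With `silence_threshold_ms=800`, the turn ends at 1600 ms, inside the pause. With 1200, it ends at 3800 ms. A 200 ms utterance (`[True]*10 + [False]*100`) returns `None` at the default `min_speech_ms=500` and is detected once that value is lowered to 200. A longer threshold tolerates pauses but delays the assistant's reply, which is the trade-off the table describes.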
Adding Assertions
Validate responses with turn-level assertions:
```yaml
turns:
  - role: selfplay-user
    persona: test-user
    turns: 3
    tts:
      provider: openai
      voice: alloy
    assertions:
      - type: content_matches
        params:
          pattern: ".{20,}"  # At least 20 characters
      - type: content_includes
        params:
          patterns:
            - "help"
            - "assist"
```

See Also
- Tutorial 6: Duplex Voice Testing - Complete learning path
- Duplex Configuration Reference - All configuration options
- Duplex Architecture - How duplex streaming works