Set Up Voice Testing with Self-Play
Configure automated voice testing using self-play mode with TTS for multi-turn conversations.
Prerequisites
- Gemini API key (for duplex streaming)
- OpenAI API key (for TTS, or use mock TTS)
- Audio files in PCM format (16kHz, 16-bit, mono)
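If your source recordings are WAV files, the minimal Python sketch below (standard library `wave` only) checks that a file already matches the required format and strips the header to produce a raw `.pcm` file for the scenario's audio turns. It assumes the `.pcm` files are headerless sample data, as the `.pcm` extension and `audio/L16` MIME type in the scenario suggest. `wav_to_pcm` and `pcm_duration_seconds` are hypothetical helper names, not part of PromptKit. The sketch does not resample or downmix, so convert other formats with an audio tool first. WAV stores 16-bit samples little-endian; confirm that matches the byte order your provider expects.

```python
import wave
from pathlib import Path

# Target format from the prerequisites: 16 kHz, 16-bit (2-byte) samples, mono.
EXPECTED_RATE = 16_000
EXPECTED_SAMPWIDTH = 2
EXPECTED_CHANNELS = 1
BYTES_PER_SECOND = EXPECTED_RATE * EXPECTED_SAMPWIDTH * EXPECTED_CHANNELS  # 32,000


def wav_to_pcm(wav_path: str, pcm_path: str) -> float:
    """Write the WAV's sample data (no header) to pcm_path; return duration in seconds.

    Raises ValueError if the WAV is not already 16 kHz / 16-bit / mono.
    This sketch does not resample or downmix.
    """
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        if (rate, width, channels) != (EXPECTED_RATE, EXPECTED_SAMPWIDTH, EXPECTED_CHANNELS):
            raise ValueError(
                f"{wav_path}: got {rate} Hz, {width * 8}-bit, {channels} ch; "
                "expected 16000 Hz, 16-bit, mono"
            )
        frames = wav.readframes(wav.getnframes())
    Path(pcm_path).write_bytes(frames)
    return len(frames) / BYTES_PER_SECOND


def pcm_duration_seconds(pcm_path: str) -> float:
    """Raw PCM carries no metadata, so duration follows from file size alone."""
    size = Path(pcm_path).stat().st_size
    if size % (EXPECTED_SAMPWIDTH * EXPECTED_CHANNELS):
        raise ValueError(f"{pcm_path}: size {size} is not a whole number of samples")
    return size / BYTES_PER_SECOND
```

For example, `wav_to_pcm("greeting.wav", "audio/greeting.pcm")` writes the raw file and returns the clip length. Because this format runs at a fixed 32,000 bytes per second, `pcm_duration_seconds` gives a quick sanity check on any `.pcm` file without a header to read.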
Quick Setup
1. Declare TTS Providers and Voices in the Arena Config
TTS is configured at the arena level. Declare one or more TTS provider files under
`tts_providers:`, then bind voice IDs under `voices:`. Personas and scenarios reference
those IDs, so a single edit to `voices:` swaps between a real vendor and mock TTS for CI.
```yaml
# config.arena.yaml
spec:
  providers:
    - file: providers/gemini-live.provider.yaml

  tts_providers:
    - file: providers/openai-alloy.provider.yaml   # real TTS
    - file: providers/mock-tts.provider.yaml       # for CI

  voices:
    # Real-vendor mode: point to openai-alloy.
    # CI / keyless mode: change provider to mock-tts.
    - id: test-voice
      provider: openai-alloy
```

The provider files themselves declare the vendor details:
```yaml
# providers/openai-alloy.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-alloy
spec:
  id: openai-alloy
  type: openai
  role: tts
  voice: alloy
  sample_rate: 24000
```

2. Create a Provider Configuration for the Duplex Model
```yaml
# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  model: gemini-2.0-flash-exp
  additional_config:
    audio_enabled: true
    response_modalities:
      - AUDIO
```

3. Create a Persona for Self-Play
Assign the persona a voice ID from the `voices:` catalog. At run time, the runtime resolves that ID to whichever TTS provider the catalog binds it to.
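To make that indirection concrete, here is a toy Python model of the lookup implied by the configs in this guide: persona `voice:` to `voices:` entry to TTS provider. It is not PromptKit's implementation; `TTSProvider`, `resolve_tts`, and the dictionary shapes are hypothetical, and the real runtime may validate or resolve differently. The `mock-tts` entry carries only its ID because this guide does not show that provider file's contents.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TTSProvider:
    """Hypothetical subset of a TTS provider file (fields from openai-alloy.provider.yaml)."""
    id: str
    type: Optional[str] = None
    voice: Optional[str] = None
    sample_rate: Optional[int] = None


def resolve_tts(
    persona_voice: str,
    voices: List[Dict[str, str]],
    tts_providers: Dict[str, TTSProvider],
) -> TTSProvider:
    """Follow persona voice -> voices: entry -> TTS provider, failing loudly on gaps."""
    binding = next((v for v in voices if v["id"] == persona_voice), None)
    if binding is None:
        raise KeyError(f"voice {persona_voice!r} is not declared under voices:")
    provider_id = binding["provider"]
    if provider_id not in tts_providers:
        raise KeyError(f"voice {persona_voice!r} points to unknown TTS provider {provider_id!r}")
    return tts_providers[provider_id]


# Both TTS provider files are declared; only the voices: binding differs between modes.
PROVIDERS = {
    "openai-alloy": TTSProvider(id="openai-alloy", type="openai", voice="alloy", sample_rate=24000),
    "mock-tts": TTSProvider(id="mock-tts"),  # contents of the mock file are not shown in this guide
}
RECORDING_VOICES = [{"id": "test-voice", "provider": "openai-alloy"}]
CI_VOICES = [{"id": "test-voice", "provider": "mock-tts"}]
```

`resolve_tts("test-voice", RECORDING_VOICES, PROVIDERS).id` is `"openai-alloy"`, while the same call with `CI_VOICES` yields `"mock-tts"`. The persona's `voice: test-voice` never changes, which is why the mode switch touches only the catalog.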
```yaml
# prompts/personas/test-user.persona.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: test-user
spec:
  id: test-user
  voice: test-voice
  description: "Curious user asking follow-up questions"
  system_prompt: |
    You are testing a voice assistant. Ask natural follow-up questions
    based on the assistant's responses. Keep questions brief and conversational.
```

4. Create the Self-Play Scenario
The scenario references the persona by ID. No inline `tts:` block is needed, because the
voice is resolved through the catalog.
```yaml
# scenarios/voice-selfplay.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-selfplay
spec:
  id: voice-selfplay
  task_type: voice-assistant
  streaming: true

  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 1200   # Longer for TTS pauses
        min_speech_ms: 500
    resilience:
      partial_success_min_turns: 2
      ignore_last_turn_session_end: true

  turns:
    # Initial audio turn
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16

    # Self-play generates follow-up turns; voice is resolved from the persona
    - role: selfplay-user
      persona: test-user
      turns: 3
```

5. Run the Test
```bash
export GEMINI_API_KEY="your-key"
export OPENAI_API_KEY="your-key"   # needed only while test-voice uses openai-alloy
promptarena run --scenario voice-selfplay --provider gemini-live
```

CI vs Recording Mode
Because voice IDs are declared in one place (`voices:` in the arena config), switching
between real TTS and a mock means changing only the `provider:` line of the voice entry:
```yaml
voices:
  # Recording mode (requires OPENAI_API_KEY):
  - id: test-voice
    provider: openai-alloy

  # CI / keyless mode: swap to
  # - id: test-voice
  #   provider: mock-tts
```

Tuning Turn Detection
If turns end too early or too late, adjust the VAD settings under `duplex.turn_detection.vad` in the scenario:
| Issue | Solution |
|---|---|
| Turns cut off mid-sentence | Increase `silence_threshold_ms` to 1500-2000 |
| Long pauses before the response | Decrease `silence_threshold_ms` to 800-1000 |
| Short utterances ignored | Decrease `min_speech_ms` to 200-300 |
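The two settings interact the way they do in a generic frame-based VAD turn detector. The Python sketch below is a simplified model, assuming each fixed-length audio frame has already been classified as speech or silence; it is not PromptKit's implementation, and `detect_turns` and `frame_ms` are hypothetical names. It shows why raising `silence_threshold_ms` keeps utterances separated by short pauses (such as TTS sentence breaks) in one turn, and why lowering `min_speech_ms` lets brief replies count as turns.

```python
from typing import List, Optional, Tuple


def detect_turns(
    is_speech: List[bool],
    frame_ms: int,
    silence_threshold_ms: int,
    min_speech_ms: int,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each detected turn.

    A candidate turn opens on the first speech frame and closes once silence has
    lasted silence_threshold_ms. It is kept only if it holds >= min_speech_ms of speech.
    """
    turns: List[Tuple[int, int]] = []
    start: Optional[int] = None  # ms where the current candidate turn began
    speech_ms = 0                # speech accumulated in the current candidate
    silence_ms = 0               # silence since the last speech frame
    last_speech_end = 0          # ms where the most recent speech frame ended

    for i, speech in enumerate(is_speech):
        t = i * frame_ms
        if speech:
            if start is None:
                start, speech_ms = t, 0
            speech_ms += frame_ms
            silence_ms = 0
            last_speech_end = t + frame_ms
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= silence_threshold_ms:
                if speech_ms >= min_speech_ms:
                    turns.append((start, last_speech_end))
                # Too-short candidates are dropped silently.
                start, silence_ms = None, 0

    # Design choice in this sketch: flush a turn still open when the stream ends.
    if start is not None and speech_ms >= min_speech_ms:
        turns.append((start, last_speech_end))
    return turns
```

For example, with 100 ms frames, two 1-second utterances separated by an 800 ms pause form a single turn at `silence_threshold_ms: 1200` but split into two turns at 500. A 300 ms reply is discarded at `min_speech_ms: 500` and kept at 200.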
Adding Assertions
Validate responses with turn-level assertions:
```yaml
turns:
  - role: selfplay-user
    persona: test-user
    turns: 3
    assertions:
      - type: content_matches
        params:
          pattern: ".{20,}"   # At least 20 characters
      - type: content_includes
        params:
          patterns:
            - "help"
            - "assist"
```

See Also
- Tutorial 6: Duplex Voice Testing - Complete learning path
- Duplex Configuration Reference - All configuration options
- Duplex Architecture - How duplex streaming works