Set Up Voice Testing with Self-Play

Configure automated voice testing using self-play mode with TTS for multi-turn conversations.

  • Gemini API key (for duplex streaming)
  • OpenAI API key (for TTS, or use mock TTS)
  • Audio files in PCM format (16kHz, 16-bit, mono)
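If your recordings are WAV files, the header has to be stripped to get raw PCM in the format listed above. The following is a minimal sketch using only the Python standard library; the function name `wav_to_pcm` and the example file names are illustrative assumptions, not part of PromptKit.

```python
import wave
from pathlib import Path

# Format required by the prerequisites: mono, 16-bit (2 bytes), 16 kHz.
EXPECTED = {"channels": 1, "sample_width": 2, "frame_rate": 16_000}


def wav_to_pcm(wav_path: str, pcm_path: str) -> int:
    """Write the raw sample data of a WAV file to pcm_path.

    Raises ValueError if the WAV is not 16 kHz, 16-bit, mono.
    Returns the number of bytes written.
    """
    with wave.open(wav_path, "rb") as wav:
        actual = {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "frame_rate": wav.getframerate(),
        }
        if actual != EXPECTED:
            raise ValueError(f"expected {EXPECTED}, got {actual}")
        frames = wav.readframes(wav.getnframes())
    Path(pcm_path).write_bytes(frames)
    return len(frames)
```

For example, `wav_to_pcm("audio/greeting.wav", "audio/greeting.pcm")` would produce the file referenced by the scenario later in this guide, assuming a `greeting.wav` recording exists. One second of audio in this format is 32,000 bytes (16,000 samples at 2 bytes each). The sketch does not resample; a file in another format is rejected rather than converted.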

1. Declare TTS Providers and Voices in the Arena Config

TTS is configured at the arena level. Declare one or more TTS provider files under tts_providers:, then bind voice IDs to those providers in voices:. Personas and scenarios reference only the voice IDs, so a single edit to voices: swaps between a real vendor and mock TTS for CI.

# config.arena.yaml
spec:
  providers:
    - file: providers/gemini-live.provider.yaml
  tts_providers:
    - file: providers/openai-alloy.provider.yaml  # real TTS
    - file: providers/mock-tts.provider.yaml      # for CI
  voices:
    # Real-vendor mode: point to openai-alloy.
    # CI / keyless mode: change provider to mock-tts.
    - id: test-voice
      provider: openai-alloy

The provider files themselves declare the vendor details:

# providers/openai-alloy.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-alloy
spec:
  id: openai-alloy
  type: openai
  role: tts
  voice: alloy
  sample_rate: 24000

2. Create a Provider Configuration for the Duplex Model

# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  model: gemini-2.0-flash-exp
  additional_config:
    audio_enabled: true
    response_modalities:
      - AUDIO

Assign a voice ID from the catalog to the persona. When the scenario runs, the voice ID is resolved to the TTS provider bound to it in voices:.

# prompts/personas/test-user.persona.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: test-user
spec:
  id: test-user
  voice: test-voice
  description: "Curious user asking follow-up questions"
  system_prompt: |
    You are testing a voice assistant. Ask natural follow-up
    questions based on the assistant's responses. Keep questions
    brief and conversational.
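
The indirection from persona to voice ID to TTS provider can be pictured as two dictionary lookups. The sketch below is only an illustration of that idea: the dictionaries are a hypothetical in-memory mirror of the YAML files, the `mock` provider type is assumed because the mock provider file is not shown in this guide, and `resolve_tts_provider` is not a PromptKit function.

```python
# Hypothetical mirror of config.arena.yaml and the provider files.
ARENA = {
    "tts_providers": {
        "openai-alloy": {"type": "openai", "voice": "alloy", "sample_rate": 24000},
        "mock-tts": {"type": "mock"},  # assumed shape; the real file is not shown
    },
    # voice ID -> TTS provider ID, as declared under voices:
    "voices": {"test-voice": "openai-alloy"},
}

PERSONA = {"id": "test-user", "voice": "test-voice"}


def resolve_tts_provider(arena: dict, persona: dict) -> dict:
    """Follow persona.voice -> voices -> tts_providers and return the provider."""
    voice_id = persona["voice"]
    if voice_id not in arena["voices"]:
        raise KeyError(f"voice {voice_id!r} is not declared under voices:")
    provider_id = arena["voices"][voice_id]
    if provider_id not in arena["tts_providers"]:
        raise KeyError(f"provider {provider_id!r} is not declared under tts_providers:")
    return arena["tts_providers"][provider_id]


print(resolve_tts_provider(ARENA, PERSONA)["type"])  # prints "openai"

# Rebinding the voice ID changes the result without touching the persona.
ci_arena = {**ARENA, "voices": {"test-voice": "mock-tts"}}
print(resolve_tts_provider(ci_arena, PERSONA)["type"])  # prints "mock"
```

The persona never names a vendor, which is why the catalog entry is the only thing that changes between a real-TTS run and a keyless CI run.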

The scenario references the persona by ID. No inline tts: block is needed — the voice is resolved through the catalog.

# scenarios/voice-selfplay.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-selfplay
spec:
  id: voice-selfplay
  task_type: voice-assistant
  streaming: true
  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 1200  # Longer for TTS pauses
        min_speech_ms: 500
    resilience:
      partial_success_min_turns: 2
      ignore_last_turn_session_end: true
  turns:
    # Initial audio turn
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
    # Self-play generates follow-up turns; voice is resolved from the persona
    - role: selfplay-user
      persona: test-user
      turns: 3

Run the scenario with both API keys set:

export GEMINI_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
promptarena run --scenario voice-selfplay --provider gemini-live

Because voice IDs are declared in one place (voices: in the arena config), switching between real TTS and a mock is a single-line change:

voices:
  # Recording mode (requires OPENAI_API_KEY):
  - id: test-voice
    provider: openai-alloy
  # CI / keyless mode: swap to
  # - id: test-voice
  #   provider: mock-tts

If turns are cutting off early or late, adjust VAD settings:

Issue                         Solution
Cuts off mid-sentence         Increase silence_threshold_ms to 1500-2000
Long pauses before response   Decrease silence_threshold_ms to 800-1000
Short utterances ignored      Decrease min_speech_ms to 200-300
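
To see why these two settings trade off against each other, it can help to model end-of-turn detection on a stream of speech/silence frames. The sketch below is a simplified model written for this guide, not the detector PromptKit uses: it assumes a turn ends once a speech run of at least `min_speech_ms` has been heard and is followed by `silence_threshold_ms` of continuous silence. The names `VadConfig` and `find_turn_end` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VadConfig:
    silence_threshold_ms: int = 1200
    min_speech_ms: int = 500


def find_turn_end(frames: Sequence[bool], frame_ms: int, cfg: VadConfig) -> Optional[int]:
    """Return the time in ms at which the turn is declared over, or None.

    frames[i] is True when frame i contains speech.
    """
    speech_ms = 0         # length of the current uninterrupted speech run
    silence_ms = 0        # silence accumulated since the last speech frame
    heard_speech = False  # set once a speech run reaches min_speech_ms
    for i, is_speech in enumerate(frames):
        if is_speech:
            speech_ms += frame_ms
            silence_ms = 0
            if speech_ms >= cfg.min_speech_ms:
                heard_speech = True
        else:
            speech_ms = 0
            if heard_speech:
                silence_ms += frame_ms
                if silence_ms >= cfg.silence_threshold_ms:
                    return (i + 1) * frame_ms
    return None


# 100 ms frames: 0.8 s speech, a 1.3 s pause, 0.8 s more speech, then 2 s silence.
utterance = [True] * 8 + [False] * 13 + [True] * 8 + [False] * 20

print(find_turn_end(utterance, 100, VadConfig(silence_threshold_ms=1200)))  # prints 2000
print(find_turn_end(utterance, 100, VadConfig(silence_threshold_ms=1500)))  # prints 4400

# A 300 ms utterance is ignored at min_speech_ms=500 but detected at 200.
short = [True] * 3 + [False] * 30
print(find_turn_end(short, 100, VadConfig(min_speech_ms=500)))  # prints None
print(find_turn_end(short, 100, VadConfig(min_speech_ms=200)))  # prints 1500
```

With a 1200 ms threshold the 1.3 s pause ends the turn mid-sentence at 2000 ms; raising the threshold to 1500 ms lets the speaker finish, at the cost of a longer wait after the final word.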

Validate responses with turn-level assertions:

turns:
  - role: selfplay-user
    persona: test-user
    turns: 3
    assertions:
      - type: content_matches
        params:
          pattern: ".{20,}"  # At least 20 characters
      - type: content_includes
        params:
          patterns:
            - "help"
            - "assist"
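
Before relying on a pattern such as ".{20,}", it is worth trying it against sample replies. The sketch below uses Python's `re` module and assumes the assertion performs an unanchored regex search; how PromptKit evaluates `content_matches` internally is not specified in this guide, and the helper shown here is hypothetical.

```python
import re


def content_matches(text: str, pattern: str) -> bool:
    """True if the regex matches anywhere in the text (unanchored search)."""
    return re.search(pattern, text) is not None


print(content_matches("I can help you with that today.", ".{20,}"))  # prints True
print(content_matches("Sure.", ".{20,}"))                            # prints False

# "." does not match a newline by default, so 20 characters must sit on one line.
multiline = "short line\nshort line\n"
print(content_matches(multiline, ".{20,}"))      # prints False
print(content_matches(multiline, "(?s).{20,}"))  # prints True
```

If responses may contain line breaks, a pattern with the `(?s)` flag counts characters across lines, provided the regex engine in use supports that flag.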