Skip to content

Set Up Voice Testing with Self-Play

Configure automated voice testing using self-play mode with TTS for multi-turn conversations.

Prerequisites

Gemini API key (for duplex streaming)
OpenAI API key (for TTS, or use mock TTS)
Audio files in PCM format (16kHz, 16-bit, mono)

Quick Setup

1. Create the Provider Configuration

# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live
spec:
  id: gemini-live
  type: gemini
  model: gemini-2.0-flash-exp
  additional_config:
    audio_enabled: true
    response_modalities:
      - AUDIO

2. Create a Persona for Self-Play

# prompts/personas/test-user.persona.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: test-user
spec:
  id: test-user
  description: "Curious user asking follow-up questions"
  system_prompt: |
    You are testing a voice assistant. Ask natural follow-up
    questions based on the assistant's responses. Keep questions
    brief and conversational.

3. Create the Self-Play Scenario

# scenarios/voice-selfplay.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: voice-selfplay
spec:
  id: voice-selfplay
  task_type: voice-assistant
  streaming: true

  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 1200  # Longer for TTS pauses
        min_speech_ms: 500
    resilience:
      partial_success_min_turns: 2
      ignore_last_turn_session_end: true

  turns:
    # Initial audio turn
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16

    # Self-play generates follow-up turns
    - role: selfplay-user
      persona: test-user
      turns: 3
      tts:
        provider: openai
        voice: alloy

4. Run the Test

export GEMINI_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
promptarena run --scenario voice-selfplay --provider gemini-live

Using Mock TTS

For faster testing without OpenAI costs, use pre-recorded audio:

turns:
  - role: selfplay-user
    persona: test-user
    turns: 3
    tts:
      provider: mock
      audio_files:
        - audio/question1.pcm
        - audio/question2.pcm
        - audio/question3.pcm
      sample_rate: 16000

Tuning Turn Detection

If turns are cutting off early or late, adjust VAD settings:

Issue	Solution
Cuts off mid-sentence	Increase `silence_threshold_ms` to 1500-2000
Long pauses before response	Decrease `silence_threshold_ms` to 800-1000
Short utterances ignored	Decrease `min_speech_ms` to 200-300

Adding Assertions

Validate responses with turn-level assertions:

turns:
  - role: selfplay-user
    persona: test-user
    turns: 3
    tts:
      provider: openai
      voice: alloy
    assertions:
      - type: content_matches
        params:
          pattern: ".{20,}"  # At least 20 characters
      - type: content_includes
        params:
          patterns:
            - "help"
            - "assist"

See Also

Tutorial 6: Duplex Voice Testing - Complete learning path
Duplex Configuration Reference - All configuration options
Duplex Architecture - How duplex streaming works