Tutorial 6: Duplex Voice Testing

Learn to test bidirectional voice conversations with real-time audio streaming.

What You’ll Learn

- How duplex streaming differs from sending whole audio files
- Configuring a Gemini Live provider for real-time audio
- Preparing raw PCM audio and choosing a turn detection mode (ASM or VAD)
- Automating multi-turn voice tests with self-play and TTS
- Configuring session resilience and debugging common issues

Prerequisites

- promptarena installed and working
- A Gemini API key (GEMINI_API_KEY), plus an OpenAI API key (OPENAI_API_KEY) for the self-play steps
- ffmpeg for audio conversion, and optionally SoX for recording and playback

Understanding Duplex Streaming

Traditional audio testing sends entire audio files as blobs. Duplex streaming is different:

| Aspect | Traditional | Duplex |
|---|---|---|
| Audio delivery | Entire file at once | Streamed in chunks |
| Turn detection | Manual (per turn) | Dynamic (VAD or provider) |
| Response timing | After full upload | Real-time as audio streams |
| Use case | Transcription testing | Voice assistants, interviews |

Duplex mode enables testing of real-time voice conversations where timing and turn-taking matter.
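
Conceptually, the difference looks like the sketch below (plain Python, not part of promptarena; send_blob and send_chunk are hypothetical transport callbacks). Traditional mode reads the whole file and sends it once; duplex mode sends small chunks as they are read, so the provider can start responding while audio is still arriving.

# streaming_sketch.py (illustrative only)
from pathlib import Path
from typing import Callable

CHUNK_BYTES = 3200  # 100 ms of 16 kHz, 16-bit, mono audio (16000 * 2 * 0.1)

def traditional_upload(pcm_path: str, send_blob: Callable[[bytes], None]) -> None:
    """Traditional mode: read the entire file, then send it as one blob."""
    send_blob(Path(pcm_path).read_bytes())

def duplex_stream(pcm_path: str, send_chunk: Callable[[bytes], None]) -> None:
    """Duplex mode: send small chunks as they are read, so the provider
    can begin responding before the audio has finished."""
    with open(pcm_path, "rb") as f:
        while chunk := f.read(CHUNK_BYTES):
            send_chunk(chunk)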

Step 1: Set Up Your Project

Create a new duplex testing project:

mkdir duplex-test
cd duplex-test
mkdir -p audio prompts providers scenarios

Create the arena configuration:

# arena.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: ArenaConfig
metadata:
  name: duplex-test

spec:
  prompts_dir: ./prompts
  providers_dir: ./providers
  scenarios_dir: ./scenarios
  output_dir: ./out

Step 2: Configure the Gemini Provider

Duplex streaming requires Gemini’s Live API:

# providers/gemini-live.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-live

spec:
  id: gemini-live
  type: gemini
  api_key_env: GEMINI_API_KEY
  model: gemini-2.0-flash-exp

  # Enable streaming with audio output
  streaming:
    enabled: true
    response_modalities:
      - AUDIO
      - TEXT

Set your API key:

export GEMINI_API_KEY="your-api-key"

Step 3: Create a Voice Assistant Prompt

# prompts/voice-assistant.prompt.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Prompt
metadata:
  name: voice-assistant

spec:
  id: voice-assistant
  task_type: voice-assistant

  system_prompt: |
    You are Nova, a friendly voice assistant. Keep responses brief
    and conversational since this is a voice interaction.

    Guidelines:
    - Speak naturally as if in conversation
    - Keep responses under 2-3 sentences
    - Be helpful and warm

Step 4: Prepare Audio Files

You’ll need PCM audio files for testing. Audio requirements:

| Parameter | Value |
|---|---|
| Format | Raw PCM (no headers) |
| Sample Rate | 16000 Hz |
| Bit Depth | 16-bit |
| Channels | Mono |

Convert existing audio files using ffmpeg:

# Convert WAV to PCM
ffmpeg -i input.wav -f s16le -ar 16000 -ac 1 audio/greeting.pcm

# Convert MP3 to PCM
ffmpeg -i input.mp3 -f s16le -ar 16000 -ac 1 audio/question.pcm

Or record directly in the correct format:

# Record 5 seconds of audio (requires SoX; -t raw writes headerless PCM)
rec -t raw -r 16000 -b 16 -c 1 -e signed-integer audio/greeting.pcm trim 0 5
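
Because raw PCM has no header, a quick arithmetic check is the easiest sanity test: at 16000 Hz, 16-bit (2 bytes per sample), mono, one second of audio is exactly 32000 bytes. A stdlib-only Python helper (illustrative, not part of promptarena):

# check_pcm.py (illustrative only)
import os
import sys

BYTES_PER_SECOND = 16_000 * 2 * 1  # sample rate * bytes per sample * channels

def pcm_duration_seconds(path: str) -> float:
    """Estimate the duration of a headerless 16 kHz / 16-bit / mono PCM file."""
    size = os.path.getsize(path)
    if size % 2:
        print(f"warning: {path} has an odd byte count; is it really 16-bit?")
    return size / BYTES_PER_SECOND

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: ~{pcm_duration_seconds(path):.2f}s")

Run it as python check_pcm.py audio/*.pcm; a file that reports a wildly wrong duration was probably converted with the wrong sample rate or channel count.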

Step 5: Create a Basic Duplex Scenario

# scenarios/basic-duplex.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: basic-duplex

spec:
  id: basic-duplex
  task_type: voice-assistant
  description: "Basic duplex streaming test"

  # Enable duplex mode
  duplex:
    timeout: "30s"
    turn_detection:
      mode: asm  # Provider-native turn detection

  streaming: true

  turns:
    # Turn 1: Greeting
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(hello|hi|hey)"

    # Turn 2: Question
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/question.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: ".{10,}"  # At least 10 chars response

Step 6: Run Your First Duplex Test

promptarena run --scenario basic-duplex --provider gemini-live

You should see real-time streaming output as audio is processed.

Turn Detection Modes

Duplex supports two turn detection modes:

ASM Mode (Provider-Native)

The provider (Gemini) handles turn detection internally:

duplex:
  turn_detection:
    mode: asm

Best for: Simple tests, provider-specific behavior testing.

VAD Mode (Voice Activity Detection)

Client-side VAD with configurable thresholds:

duplex:
  turn_detection:
    mode: vad
    vad:
      silence_threshold_ms: 600   # Silence to end turn
      min_speech_ms: 200          # Minimum speech duration

Best for: Precise control over turn boundaries, testing interruption handling.
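
To see how the two thresholds interact, here is a simplified energy-based VAD sketch (illustrative only; promptarena's actual VAD may work differently). A turn ends once at least min_speech_ms of speech has been heard, followed by silence_threshold_ms of uninterrupted silence:

# vad_sketch.py (illustrative only)
from array import array

SAMPLE_RATE = 16_000
FRAME_MS = 20
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per 20 ms frame

def is_speech(frame: bytes, rms_threshold: float = 500.0) -> bool:
    """Crude energy check on one frame of 16-bit mono PCM."""
    samples = array("h", frame)
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms > rms_threshold

def find_turn_end(frames, silence_threshold_ms=600, min_speech_ms=200):
    """Return the index of the frame that ends the turn, or None."""
    speech_ms = silence_ms = 0
    for i, frame in enumerate(frames):
        if is_speech(frame):
            speech_ms += FRAME_MS
            silence_ms = 0                 # any speech resets the silence timer
        else:
            silence_ms += FRAME_MS
        if speech_ms >= min_speech_ms and silence_ms >= silence_threshold_ms:
            return i                       # enough speech, then enough silence
    return None

This also explains the TTS tuning later in the tutorial: synthesized speech has longer natural pauses, so a short silence threshold can split a single TTS utterance into several turns.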

Step 7: Add Self-Play with TTS

For fully automated testing, use self-play mode, where an LLM generates user responses that are converted to audio via TTS:

# scenarios/selfplay-duplex.scenario.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: selfplay-duplex

spec:
  id: selfplay-duplex
  task_type: voice-assistant
  description: "Automated voice testing with TTS"

  duplex:
    timeout: "5m"
    turn_detection:
      mode: vad
      vad:
        silence_threshold_ms: 1200  # Longer for TTS pauses
        min_speech_ms: 800
    resilience:
      max_retries: 2
      partial_success_min_turns: 2
      ignore_last_turn_session_end: true

  streaming: true

  turns:
    # Initial audio greeting
    - role: user
      parts:
        - type: audio
          media:
            file_path: audio/greeting.pcm
            mime_type: audio/L16
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(help|assist)"

    # Self-play: LLM generates questions, TTS converts to audio
    - role: selfplay-user
      persona: curious-customer
      turns: 3  # Generate 3 follow-up turns
      tts:
        provider: openai
        voice: alloy
      assertions:
        - type: content_matches
          params:
            pattern: ".{10,}"

  context:
    goal: "Test multi-turn voice conversation"
    user_type: "potential customer"

Create the persona:

# prompts/personas/curious-customer.persona.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: curious-customer

spec:
  id: curious-customer
  description: "A curious customer asking follow-up questions"

  system_prompt: |
    You are a curious customer exploring a product or service.
    Ask natural follow-up questions based on the assistant's responses.
    Keep questions brief and conversational.

Configure the TTS provider:

# providers/openai-tts.provider.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-tts

spec:
  id: openai
  type: openai
  api_key_env: OPENAI_API_KEY
  model: gpt-4o-mini

Run the self-play test:

export OPENAI_API_KEY="your-openai-key"
promptarena run --scenario selfplay-duplex --provider gemini-live
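
Conceptually, each self-play turn is a small pipeline: the persona LLM drafts the next user utterance from the conversation so far, TTS renders it to 16 kHz PCM, the audio is streamed to the assistant, and the assistant's reply is appended to the history. The sketch below only illustrates that loop; persona_llm, tts, and assistant_session are hypothetical interfaces, not promptarena APIs.

# selfplay_sketch.py (illustrative only)
def run_selfplay(persona_llm, tts, assistant_session, first_audio: bytes, max_turns: int = 3):
    """Illustrative self-play loop; the three collaborators are hypothetical interfaces."""
    reply = assistant_session.stream_audio(first_audio)              # seed turn from a real recording
    history = [("assistant", reply)]

    for _ in range(max_turns):
        user_text = persona_llm.next_utterance(history)              # 1. persona drafts the next question
        user_audio = tts.synthesize(user_text, sample_rate=16_000)   # 2. TTS -> 16-bit mono PCM
        reply = assistant_session.stream_audio(user_audio)           # 3. stream audio, collect the reply
        history += [("user", user_text), ("assistant", reply)]

    return history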

Session Resilience Configuration

Voice sessions can be interrupted by network issues or provider limits. Configure resilience:

duplex:
  resilience:
    # Retry failed conversations
    max_retries: 2
    retry_delay_ms: 2000

    # Delay between turns
    inter_turn_delay_ms: 500
    selfplay_inter_turn_delay_ms: 1000

    # Accept partial success
    partial_success_min_turns: 2

    # Don't fail on final turn session end
    ignore_last_turn_session_end: true
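
One way to read these settings is as a retry policy around each conversation. The sketch below is an interpretation of the fields above, not promptarena's actual implementation; run_conversation is a hypothetical callable, and the inter-turn delays are omitted for brevity.

# resilience_sketch.py (illustrative only)
import time

def run_with_resilience(run_conversation, total_turns, max_retries=2, retry_delay_ms=2000,
                        partial_success_min_turns=2, ignore_last_turn_session_end=True):
    """run_conversation() is assumed to return (completed_turns, session_ended_on_last_turn)."""
    for attempt in range(max_retries + 1):
        completed, ended_on_last_turn = run_conversation()

        if completed == total_turns:
            return "pass"                                # every turn completed
        if (ignore_last_turn_session_end and ended_on_last_turn
                and completed == total_turns - 1):
            return "pass"                                # only the final turn was cut off
        if completed >= partial_success_min_turns:
            return "partial_pass"                        # accept partial success
        if attempt < max_retries:
            time.sleep(retry_delay_ms / 1000)            # wait before retrying

    return "fail"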

Assertions for Voice Testing

Common assertions for duplex tests:

assertions:
  # Content pattern matching
  - type: content_matches
    params:
      pattern: "(?i)(hello|greeting)"

  # Must include certain phrases
  - type: content_includes
    params:
      patterns:
        - "welcome"
        - "help"

  # Response length check
  - type: content_matches
    params:
      pattern: ".{20,}"  # At least 20 characters

  # Sentiment analysis
  - type: sentiment
    params:
      expected: positive
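
These assertions run against the text of the assistant's response. A quick way to prototype a pattern before adding it to a scenario is plain Python regex; the snippet below is just that, not a promptarena API, and the exact matching semantics of content_includes may differ.

# assertion_patterns.py (illustrative only)
import re

transcript = "Hello! I'm Nova. How can I help you today?"

# content_matches: the response must match the regex somewhere
assert re.search(r"(?i)(hello|greeting)", transcript)

# content_includes: every listed phrase must appear (case-insensitive here for illustration)
for phrase in ["nova", "help"]:
    assert phrase.lower() in transcript.lower()

# minimum-length check expressed as a regex
assert re.search(r".{20,}", transcript)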

Debugging Duplex Tests

Enable Verbose Logging

promptarena run --scenario basic-duplex --provider gemini-live --verbose

Check Audio Format

Verify that your audio files really are headerless 16 kHz, 16-bit, mono PCM:

# Check file info (raw PCM has no headers, so force the format)
ffprobe -f s16le audio/greeting.pcm

# Play back (requires SoX; -t raw marks the file as headerless)
play -t raw -r 16000 -b 16 -c 1 -e signed-integer audio/greeting.pcm
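
If SoX is not available, you can wrap the raw PCM in a WAV header with Python's standard library and open the result in any audio player (a small stdlib-only helper, not part of promptarena):

# pcm_to_wav.py (illustrative only)
import sys
import wave

def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 16_000) -> None:
    """Wrap headerless 16-bit mono PCM in a WAV container for easy playback."""
    with open(pcm_path, "rb") as src, wave.open(wav_path, "wb") as dst:
        dst.setnchannels(1)        # mono
        dst.setsampwidth(2)        # 16-bit
        dst.setframerate(sample_rate)
        dst.writeframes(src.read())

if __name__ == "__main__":
    pcm_to_wav(sys.argv[1], sys.argv[2])   # usage: python pcm_to_wav.py in.pcm out.wav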

Common Issues

| Issue | Solution |
|---|---|
| "Session ended early" | Increase partial_success_min_turns |
| "Empty response" | Check audio quality, increase silence_threshold_ms |
| "Turn interrupted" | Increase inter_turn_delay_ms |
| TTS pauses causing issues | Increase silence_threshold_ms to 1200 ms or more |

Complete Example Project

duplex-test/
├── arena.yaml
├── audio/
│   ├── greeting.pcm
│   └── question.pcm
├── prompts/
│   ├── voice-assistant.prompt.yaml
│   └── personas/
│       └── curious-customer.persona.yaml
├── providers/
│   ├── gemini-live.provider.yaml
│   └── openai-tts.provider.yaml
└── scenarios/
    ├── basic-duplex.scenario.yaml
    └── selfplay-duplex.scenario.yaml

Next Steps

See Also