PromptArena CLI Reference

Complete command-line interface reference for PromptArena, the LLM testing framework.

Overview

PromptArena (promptarena) is a CLI tool for running multi-turn conversation simulations across multiple LLM providers, validating conversation flows, and generating comprehensive test reports.

promptarena [command] [flags]

Commands

| Command | Description |
|---------|-------------|
| init | Initialize a new Arena test project from template (built-in or remote) |
| run | Run conversation simulations (main command) |
| mocks | Generate mock provider responses from Arena JSON results |
| config-inspect | Inspect and validate configuration |
| debug | Debug configuration and prompt loading |
| prompt-debug | Debug and test prompt generation |
| templates | List, fetch, and render remote templates and manage template repos |
| render | Generate HTML report from existing results |
| completion | Generate shell autocompletion script |
| help | Help about any command |

Global Flags

-h, --help         help for promptarena

promptarena init

Initialize a new PromptArena test project from a built-in or remote template.

Usage

promptarena init [directory] [flags]

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --quick | bool | false | Skip interactive prompts, use defaults |
| --provider | string | - | Provider to configure (mock, openai, claude, gemini) |
| --template | string | quick-start | Template to use for initialization |
| --list-templates | bool | false | List all available built-in templates |
| --var | []string | - | Set template variables (key=value) |
| --template-index | string | community | Template repo name or index URL/path for remote templates |
| --repo-config | string | user config | Template repo config file |
| --template-cache | string | temp dir | Cache directory for remote templates |

Built-In Templates

PromptArena includes 6 built-in templates:

| Template | Files Generated | Description |
|----------|-----------------|-------------|
| basic-chatbot | 6 files | Simple conversational testing setup |
| customer-support | 10 files | Support agent with KB search and order status tools |
| code-assistant | 9 files | Code generation and review with separate prompts |
| content-generation | 9 files | Creative content for blogs, products, social media |
| multimodal | 7 files | Image analysis and vision testing |
| mcp-integration | 7 files | MCP filesystem server integration |

Examples

List Available Templates

# See all built-in templates
promptarena init --list-templates

# List remote templates (from the default community repo)
promptarena templates list

# List remote templates from a named repo
promptarena templates repo add --name internal --url https://example.com/index.yaml
promptarena templates list --index internal

# List using repo/template shorthand
promptarena templates list --index community

Quick Start

# Create project with defaults (basic-chatbot template)
promptarena init my-test --quick

# With specific provider
promptarena init my-test --quick --provider openai

# With specific template
promptarena init my-test --quick --template customer-support --provider openai

# Render a remote template explicitly
promptarena templates fetch --template community/basic-chatbot --version 1.0.0
promptarena templates render --template community/basic-chatbot --version 1.0.0 --out ./out

Interactive Mode

# Interactive prompts guide you through setup
promptarena init my-project

Template Variables

# Override template variables
promptarena init my-test --quick --provider openai \
  --var project_name="My Custom Project" \
  --var description="Custom description" \
  --var temperature=0.8

What Gets Created

Depending on the template, init creates between 6 and 10 files; see the Built-In Templates table above for per-template file counts.

After Initialization

# Navigate to project
cd my-test

# Add your API key to .env
echo "OPENAI_API_KEY=sk-..." >> .env

# Run tests
promptarena run

# View results
open out/report.html

promptarena mocks generate

Generate mock provider YAML from recorded Arena JSON results so you can replay conversations without calling real LLMs.

Usage

promptarena mocks generate [flags]

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --input, -i | string | out | Arena JSON result file or directory containing *.json runs |
| --output, -o | string | providers/mock-generated.yaml | Output file path, or directory when --per-scenario is set |
| --per-scenario | bool | false | Write one YAML file per scenario (in the --output directory) |
| --merge | bool | false | Merge with existing mock file(s) instead of overwriting |
| --scenario | []string | - | Only include specified scenario IDs |
| --provider | []string | - | Only include specified provider IDs |
| --dry-run | bool | false | Print generated YAML instead of writing files |
| --default-response | string | - | Set defaultResponse when not present |

Examples

Generate a consolidated mock file from the latest runs:

promptarena mocks generate \
  --input out \
  --scenario hardware-faults \
  --provider openai-gpt4o \
  --output providers/mock-generated.yaml \
  --merge

Write one file per scenario:

promptarena mocks generate \
  --input out \
  --per-scenario \
  --output providers/responses \
  --merge

Preview without writing:

promptarena mocks generate --input out --dry-run
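
A typical record-and-replay loop combines run and mocks generate. A sketch using the default paths shown above (the provider ID openai-gpt4o is just an example):

# 1. Record real conversations once (writes JSON results to out/)
promptarena run --provider openai-gpt4o --format json

# 2. Convert the recorded runs into a reusable mock file
promptarena mocks generate --input out --output providers/mock-generated.yaml

# 3. Replay the same conversations offline, with no API cost
promptarena run --mock-config providers/mock-generated.yaml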

promptarena run

Run multi-turn conversation simulations across multiple LLM providers.

Usage

promptarena run [flags]

Flags

Configuration

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |

Execution Control

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -j, --concurrency | int | 6 | Number of concurrent workers |
| -s, --seed | int | 42 | Random seed for reproducibility |
| --ci | bool | false | CI mode (headless, minimal output) |

Filtering

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --provider | []string | all | Providers to use (comma-separated) |
| --scenario | []string | all | Scenarios to run (comma-separated) |
| --region | []string | all | Regions to run (comma-separated) |
| --roles | []string | all | Self-play role configurations to use |

Parameter Overrides

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --temperature | float32 | 0.6 | Override temperature for all scenarios |
| --max-tokens | int | - | Override max tokens for all scenarios |

Self-Play Mode

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --selfplay | bool | false | Enable self-play mode |

Mock Testing

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --mock-provider | bool | false | Replace all providers with MockProvider |
| --mock-config | string | - | Path to mock provider configuration (YAML) |

Output Configuration

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -o, --out | string | out | Output directory |
| --format | []string | from config | Output formats: json, junit, html, markdown |
| --formats | []string | from config | Alias for --format |

Legacy Output Flags (Deprecated)

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --html | bool | false | Generate HTML report (use --format html instead) |
| --html-file | string | out/report-[timestamp].html | HTML report output file |
| --junit-file | string | out/junit.xml | JUnit XML output file |
| --markdown-file | string | out/results.md | Markdown report output file |

Debugging

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -v, --verbose | bool | false | Enable verbose debug logging for API calls |

Examples

Basic Run

# Run all tests with default configuration
promptarena run

# Specify configuration file
promptarena run --config my-arena.yaml

Filter Execution

# Run specific providers only
promptarena run --provider openai,claude

# Run specific scenarios
promptarena run --scenario basic-qa,edge-cases

# Combine filters
promptarena run --provider openai --scenario customer-support

Control Parallelism

# Run with 3 concurrent workers
promptarena run --concurrency 3

# Sequential execution (no parallelism)
promptarena run --concurrency 1

Override Parameters

# Override temperature for all tests
promptarena run --temperature 0.8

# Override max tokens
promptarena run --max-tokens 500

# Combined overrides
promptarena run --temperature 0.9 --max-tokens 1000

Output Formats

# Generate JSON and HTML reports
promptarena run --format json,html

# Generate all available formats
promptarena run --format json,junit,html,markdown

# Custom output directory
promptarena run --out test-results-2024-01-15

# Specify custom HTML filename (legacy)
promptarena run --html --html-file custom-report.html

Mock Testing

# Use mock provider instead of real APIs (fast, no cost)
promptarena run --mock-provider

# Use custom mock configuration
promptarena run --mock-config mock-responses.yaml

Self-Play Mode

# Enable self-play testing
promptarena run --selfplay

# Self-play with specific roles
promptarena run --selfplay --roles frustrated-customer,tech-support

CI/CD Mode

# Headless mode for CI pipelines
promptarena run --ci --format junit,json

# With specific quality gates
promptarena run --ci --concurrency 3 --format junit

Debugging

# Verbose output for troubleshooting
promptarena run --verbose

# Verbose with specific scenario
promptarena run --verbose --scenario failing-test

Reproducible Tests

# Use specific seed for reproducibility
promptarena run --seed 12345

# Same seed across runs produces same results
promptarena run --seed 12345 --provider openai

promptarena config-inspect

Inspect and validate arena configuration, showing all loaded resources and validating cross-references. This command provides a rich, styled display of your configuration with validation results.

Usage

promptarena config-inspect [flags]

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |
| --format | string | text | Output format: text, json |
| -s, --short | bool | false | Show only validation results (shortcut for --section validation) |
| --section | string | - | Focus on a specific section: prompts, providers, scenarios, tools, selfplay, judges, defaults, validation |
| --verbose | bool | false | Show detailed information including file contents |
| --stats | bool | false | Show cache statistics |

Examples

# Inspect default configuration
promptarena config-inspect

# Inspect specific config file
promptarena config-inspect --config staging-arena.yaml

# Verbose output with full details
promptarena config-inspect --verbose

# Quick validation check only
promptarena config-inspect --short
# or
promptarena config-inspect -s

# Focus on specific section
promptarena config-inspect --section providers
promptarena config-inspect --section selfplay
promptarena config-inspect --section validation

# JSON output for programmatic use
promptarena config-inspect --format json

# Show cache statistics
promptarena config-inspect --stats
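
In CI, config-inspect doubles as a cheap pre-flight gate before any tokens are spent. A minimal sketch, assuming the command exits non-zero when validation fails:

# Fail fast on configuration errors, then run the suite
promptarena config-inspect --short && promptarena run --ci --format junit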

Sections

The --section flag allows focusing on specific parts of the configuration:

| Section | Description |
|---------|-------------|
| prompts | Prompt configurations with task types, variables, validators |
| providers | Provider details organized by group (default, judge, selfplay) |
| scenarios | Scenario details with turn counts and assertion summaries |
| tools | Tool definitions with modes, parameters, timeouts |
| selfplay | Self-play configuration including personas and roles |
| judges | Judge configurations for LLM-as-judge validators |
| defaults | Default settings (temperature, max tokens, concurrency) |
| validation | Validation results and connectivity checks |

Output

The command displays styled boxes for each section of the configuration: prompt configs, providers grouped by role (default, judge, selfplay), self-play personas and roles, and validation results with connectivity checks.

Example Output:

✨ PromptArena Configuration Inspector ✨

╭──────────────────────────────────────────────────────────────────────────────╮
│ Configuration: arena.yaml                                                    │
╰──────────────────────────────────────────────────────────────────────────────╯

  📋 Prompt Configs (2)

╭──────────────────────────────────────────────────────────────────────────────╮
│ troubleshooter-v2                                                            │
│   Task Type: troubleshooting                                                 │
│   File: prompts/troubleshooter-v2.prompt.yaml                                │
╰──────────────────────────────────────────────────────────────────────────────╯

  🔌 Providers (3)

╭──────────────────────────────────────────────────────────────────────────────╮
│ [default]                                                                    │
│   openai-gpt4o: gpt-4o (temp: 0.70, max: 1000)                               │
│                                                                              │
│ [judge]                                                                      │
│   judge-provider: gpt-4o-mini (temp: 0.00, max: 500)                         │
│                                                                              │
│ [selfplay]                                                                   │
│   mock-selfplay: mock-model (temp: 0.80, max: 1000)                          │
╰──────────────────────────────────────────────────────────────────────────────╯

  🎭 Self-Play (2 personas, 2 roles)

Personas:
╭──────────────────────────────────────────────────────────────────────────────╮
│ red-team-attacker                                                            │
│ plant-operator                                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Roles:
╭──────────────────────────────────────────────────────────────────────────────╮
│ attacker (red-team-attacker) → openai-gpt4o                                  │
│ operator (plant-operator) → openai-gpt4o                                     │
╰──────────────────────────────────────────────────────────────────────────────╯

  ✅ Validation

╭──────────────────────────────────────────────────────────────────────────────╮
│ ✓ Configuration is valid                                                     │
│                                                                              │
│ Connectivity Checks:                                                         │
│   ☑ Tools are used by prompts                                                │
│   ☑ Unique task types per prompt                                             │
│   ☑ Scenario task types exist                                                │
│   ☑ Allowed tools are defined                                                │
│   ☑ Self-play roles have valid providers                                     │
╰──────────────────────────────────────────────────────────────────────────────╯

promptarena debug

The debug command shows the loaded configuration, prompt packs, scenarios, and providers to help troubleshoot configuration issues.

Usage

promptarena debug [flags]

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |

Examples

# Debug default configuration
promptarena debug

# Debug specific config
promptarena debug --config test-arena.yaml

Use Cases

Run debug as a quick sanity check when promptarena run fails at startup: it confirms which configuration file, prompt packs, scenarios, and providers were actually loaded.

promptarena prompt-debug

Test prompt generation with specific regions, task types, and contexts. Useful for validating prompt assembly before running full tests.

Usage

promptarena prompt-debug [flags]

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |
| -t, --task-type | string | - | Task type for prompt generation |
| -r, --region | string | - | Region for prompt generation |
| --persona | string | - | Persona ID to test |
| --scenario | string | - | Scenario file path to load task_type and context |
| --context | string | - | Context slot content |
| --user | string | - | User context (e.g., "iOS developer") |
| --domain | string | - | Domain hint (e.g., "mobile development") |
| -l, --list | bool | false | List available regions and task types |
| -j, --json | bool | false | Output as JSON |
| -p, --show-prompt | bool | true | Show the full assembled prompt |
| -m, --show-meta | bool | true | Show metadata and configuration info |
| -s, --show-stats | bool | true | Show statistics (length, tokens, etc.) |
| -v, --verbose | bool | false | Verbose output with debug info |

Examples

# List available configurations
promptarena prompt-debug --list

# Test prompt generation for task type
promptarena prompt-debug --task-type support

# Test with region
promptarena prompt-debug --task-type support --region us

# Test with persona
promptarena prompt-debug --persona us-hustler-v1

# Test with scenario file
promptarena prompt-debug --scenario scenarios/customer-support.yaml

# Test with custom context
promptarena prompt-debug --task-type support --context "urgent billing issue"

# JSON output for parsing
promptarena prompt-debug --task-type support --json

# Minimal output (just the prompt)
promptarena prompt-debug --task-type support --show-meta=false --show-stats=false
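
One way to use the JSON output is prompt snapshotting: capture the assembled prompt per task type and compare it across changes. A minimal sketch (the JSON schema is not documented here, so the files are treated as opaque):

# Capture the current prompt assembly
mkdir -p snapshots
promptarena prompt-debug --task-type support --json > snapshots/support.json

# After editing prompts, diff against the stored snapshot
promptarena prompt-debug --task-type support --json | diff snapshots/support.json -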

Output

The command shows the resolved task type, region, and persona, the assembled system prompt, statistics (characters, estimated tokens, lines), and metadata (prompt config, version, validator count).

Example Output:

=== Prompt Debug ===

Task Type: support
Region: us
Persona: default

--- System Prompt ---
You are a helpful customer support agent for TechCo.

Your role:
- Answer product questions
- Help track orders
- Process returns and refunds
...

--- Statistics ---
Characters: 1,234
Estimated Tokens: 308
Lines: 42

--- Metadata ---
Prompt Config: support
Version: v1.0.0
Validators: 3

promptarena render

Generate an HTML report from existing test results.

Usage

promptarena render [index.json path] [flags]

Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -o, --output | string | report-[timestamp].html | Output HTML file path |

Examples

# Render from default location
promptarena render out/index.json

# Custom output path
promptarena render out/index.json --output custom-report.html

# Render from archived results
promptarena render archive/2024-01-15/index.json --output reports/jan-15-report.html
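
To rebuild reports for every archived run in one pass, a small loop works. A sketch assuming the archive/<date>/index.json layout from the example above:

# Re-render each archived result set into reports/<date>.html
for idx in archive/*/index.json; do
  promptarena render "$idx" --output "reports/$(basename "$(dirname "$idx")").html"
done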

Use Cases

Regenerate an HTML report without re-running tests, or render reports from archived result sets (as in the examples above).

promptarena completion

Generate shell autocompletion script for bash, zsh, fish, or PowerShell.

Usage

promptarena completion [bash|zsh|fish|powershell]

Examples

# Bash
promptarena completion bash > /etc/bash_completion.d/promptarena

# Zsh
promptarena completion zsh > "${fpath[1]}/_promptarena"

# Fish
promptarena completion fish > ~/.config/fish/completions/promptarena.fish

# PowerShell
promptarena completion powershell > promptarena.ps1
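
To try completions in the current session without installing a file, most Go/Cobra-style CLIs (assumed here) support sourcing the script directly:

# Bash: load completions for this session only
source <(promptarena completion bash)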

Environment Variables

PromptArena respects the following environment variables:

| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API authentication |
| ANTHROPIC_API_KEY | Anthropic API authentication |
| GOOGLE_API_KEY | Google AI API authentication |
| PROMPTARENA_CONFIG | Default configuration file (overrides arena.yaml) |
| PROMPTARENA_OUTPUT | Default output directory (overrides out) |

Example

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export PROMPTARENA_CONFIG="staging-arena.yaml"
export PROMPTARENA_OUTPUT="test-results"

promptarena run

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success - all tests passed |
| 1 | Failure - one or more tests failed or an error occurred |

Check exit code in scripts:

if promptarena run --ci; then
  echo "✅ Tests passed"
else
  echo "❌ Tests failed"
  exit 1
fi

Common Workflows

Local Development

# Quick test with mock providers
promptarena run --mock-provider

# Test specific feature
promptarena run --scenario new-feature --verbose

# Inspect configuration
promptarena config-inspect --verbose

CI/CD Pipeline

# Run in headless CI mode
promptarena run --ci --format junit,json

# Check specific providers
promptarena run --ci --provider openai,claude --format junit
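
When junit.xml is published as a CI artifact, keep the report even when tests fail, then propagate the exit status afterwards. A minimal sketch:

# Run headless but defer the failure until artifacts are saved
promptarena run --ci --format junit,json --out ci-results || status=$?
# (upload ci-results/ to your CI's artifact store here)
exit "${status:-0}"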

Debugging

# Validate configuration
promptarena config-inspect

# Debug prompt assembly
promptarena prompt-debug --task-type support --verbose

# Run with verbose logging
promptarena run --verbose --scenario failing-test

# Check configuration loading
promptarena debug

Report Generation

# Run tests
promptarena run --format json

# Later, generate HTML from results
promptarena render out/index.json --output reports/latest.html

Multi-Provider Comparison

# Test all providers
promptarena run --format html,json

# Test specific providers
promptarena run --provider openai,claude,gemini --format html
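
To compare providers side by side with isolated outputs, one approach is a loop over per-provider output directories. A sketch using the provider IDs from the examples above:

# One output directory per provider
for p in openai claude gemini; do
  promptarena run --provider "$p" --out "out/$p" --format json,html
done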

Configuration File

PromptArena uses a YAML configuration file (default: arena.yaml). See the Configuration Reference for complete documentation.

Basic Structure

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: my-arena
spec:
  prompt_configs:
    - id: assistant
      file: prompts/assistant.yaml

  providers:
    - file: providers/openai.yaml

  scenarios:
    - file: scenarios/test.yaml

  defaults:
    output:
      dir: out
      formats: ["json", "html"]

Multimodal Content & Media Rendering

PromptArena supports multimodal content (images, audio, video) in test scenarios with comprehensive media rendering in all output formats.

Media Content in Scenarios

Test scenarios can include multimodal content using the parts array:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: image-analysis
spec:
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["What's in this image?"]
        - type: image
          media:
            file_path: test-data/sample.jpg
            detail: high

Supported Media Types

In addition to text parts, scenario turns can include image, audio, and video parts.

Media Sources

Media can be loaded from three sources:

1. Local Files

- type: image
  media:
    file_path: images/diagram.png
    detail: high

2. URLs (fetched during test execution)

- type: image
  media:
    url: https://example.com/photo.jpg
    detail: auto

3. Inline Base64 Data

- type: image
  media:
    data: "iVBORw0KGgoAAAANSUhEUgAAAAUA..."
    mime_type: image/png
    detail: low

Media Rendering in Reports

All output formats include media statistics and rendering:

HTML Reports

HTML reports include a media summary dashboard, per-conversation media badges, and an inline display of media items. The badges summarize counts at a glance:

🖼️ x3  🎵 x2  ✅ 5  ❌ 0  💾 1.2 MB

Example HTML Output:

<div class="media-summary">
  <div class="stat-card">
    <div class="stat-value">5</div>
    <div class="stat-label">🖼️ Images</div>
  </div>
  <div class="stat-card">
    <div class="stat-value">3</div>
    <div class="stat-label">🎵 Audio</div>
  </div>
  <!-- ... -->
</div>

JUnit XML Reports

JUnit XML includes media metadata as test suite properties:

<testsuite name="image-analysis" tests="1">
  <properties>
    <property name="media.images.total" value="5"/>
    <property name="media.audio.total" value="3"/>
    <property name="media.video.total" value="0"/>
    <property name="media.loaded.success" value="8"/>
    <property name="media.loaded.errors" value="0"/>
    <property name="media.size.total_bytes" value="1245678"/>
  </properties>
  <testcase name="test-001" classname="image-analysis" time="2.34"/>
</testsuite>

Property Naming Convention:

All media properties share the media. prefix: per-type counts (media.images.total, media.audio.total, media.video.total), load results (media.loaded.success, media.loaded.errors), and payload size (media.size.total_bytes).

These properties are useful for tracking media coverage across runs, failing builds on media load errors, and monitoring total payload size in CI (see Media Statistics in CI/CD below).

Markdown Reports

Markdown reports include a media statistics table in the overview section:

## 📊 Overview

| Metric | Value |
|--------|-------|
| Tests Run | 6 |
| Passed | 5 ✅ |
| Failed | 1 ❌ |
| Success Rate | 83.3% |
| Total Cost | $0.0245 |
| Total Duration | 12.5s |

### 🎨 Media Content

| Type | Count |
|------|-------|
| 🖼️  Images | 5 |
| 🎵 Audio Files | 3 |
| 🎬 Videos | 0 |
| ✅ Loaded | 8 |
| ❌ Errors | 0 |
| 💾 Total Size | 1.2 MB |

Media Loading Options

Control how media is loaded and processed:

HTTP Media Loader

For URL-based media, configure the HTTP loader:

spec:
  defaults:
    media:
      http:
        timeout: 30s
        max_file_size: 50MB

Local File Paths

Relative paths are resolved from the configuration file directory:

# If arena.yaml is in /project/tests/
# This resolves to /project/tests/images/sample.jpg
- type: image
  media:
    file_path: images/sample.jpg

Media Validation

PromptArena validates media content before use, covering path security, file validation, and error handling.

Examples

Testing Image Analysis

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: product-image-analysis
spec:
  task_type: vision
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["Analyze this product image for defects"]
        - type: image
          media:
            file_path: test-data/product-123.jpg
            detail: high
  assertions:
    - type: content_includes
      patterns: ["quality", "inspection"]

Testing Audio Transcription

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: audio-transcription
spec:
  task_type: transcription
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["Transcribe this audio"]
        - type: audio
          media:
            file_path: test-data/meeting-recording.mp3
  assertions:
    - type: content_includes
      patterns: ["meeting", "agenda"]

Mixed Multimodal Content

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: multimodal-analysis
spec:
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["Compare these media files"]
        - type: image
          media:
            file_path: charts/q1-results.png
        - type: image
          media:
            file_path: charts/q2-results.png
        - type: audio
          media:
            file_path: presentations/summary.mp3

Generate Media-Rich Reports

# Run multimodal tests with all formats
promptarena run --format html,junit,markdown

# HTML report includes interactive media dashboard
open out/report.html

# JUnit XML includes media metrics for CI
cat out/junit.xml | grep "media\."

# Markdown shows media statistics
cat out/results.md

Media Statistics in CI/CD

Extract media metrics from JUnit XML:

# Count total images tested
xmllint --xpath "//property[@name='media.images.total']/@value" out/junit.xml

# Check for media load errors
xmllint --xpath "//property[@name='media.loaded.errors']/@value" out/junit.xml
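
To gate a pipeline on these metrics, the property values can be read back with xmllint. A sketch, assuming one matching property per report:

# Extract the error count and fail the build if any media failed to load
errors=$(xmllint --xpath "string(//property[@name='media.loaded.errors']/@value)" out/junit.xml)
if [ "${errors:-0}" -gt 0 ]; then
  echo "media load errors: $errors" >&2
  exit 1
fi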

Best Practices

File Organization

project/
├── arena.yaml
├── test-data/
│   ├── images/
│   │   ├── valid/
│   │   └── invalid/
│   ├── audio/
│   └── video/
└── scenarios/
    └── multimodal-tests.yaml

Beyond file organization, plan for media size limits, URL loading behavior (timeout and max file size, configurable as shown above), and assertions on media outputs (covered in the next section).


Media Assertions (Phase 1)

Arena provides six specialized media validators to test media content in LLM responses. These assertions validate format, dimensions, duration, and resolution of images, audio, and video outputs.

Image Assertions

image_format

Validates that images in assistant responses match allowed formats.

Parameters:

formats: list of allowed image formats (e.g. png)

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a PNG image of a sunset"]
    
    assertions:
      - type: image_format
        params:
          formats:
            - png

image_dimensions

Validates image dimensions (width and height) in assistant responses.

Parameters:

width, height: exact dimensions to match
min_width, max_width, min_height, max_height: acceptable dimension ranges

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 1920x1080 wallpaper"]
    
    assertions:
      # Exact dimensions
      - type: image_dimensions
        params:
          width: 1920
          height: 1080

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a thumbnail"]
    
    assertions:
      # Size range
      - type: image_dimensions
        params:
          min_width: 100
          max_width: 400
          min_height: 100
          max_height: 400

Audio Assertions

audio_format

Validates audio format in assistant responses.

Parameters:

formats: list of allowed audio formats (e.g. mp3, wav)

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate an audio clip"]
    
    assertions:
      - type: audio_format
        params:
          formats:
            - mp3
            - wav

audio_duration

Validates audio duration in assistant responses.

Parameters:

min_seconds, max_seconds: acceptable duration range (either may be omitted)

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 30-second audio clip"]
    
    assertions:
      - type: audio_duration
        params:
          min_seconds: 29
          max_seconds: 31

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a brief notification sound"]
    
    assertions:
      - type: audio_duration
        params:
          max_seconds: 5

Video Assertions

video_resolution

Validates video resolution in assistant responses.

Parameters:

presets: list of named resolution presets
min_width, min_height: minimum dimensions

Supported Presets:

The examples in this document use 1080p, fhd, 4k, and uhd.

Example with Presets:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a 1080p video"]
    
    assertions:
      - type: video_resolution
        params:
          presets:
            - 1080p
            - fhd

Example with Dimensions:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a high-resolution video"]
    
    assertions:
      - type: video_resolution
        params:
          min_width: 1920
          min_height: 1080

video_duration

Validates video duration in assistant responses.

Parameters:

min_seconds, max_seconds: acceptable duration range

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 1-minute video clip"]
    
    assertions:
      - type: video_duration
        params:
          min_seconds: 59
          max_seconds: 61

Combining Media Assertions

You can combine multiple media assertions on a single turn:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 30-second 4K video in MP4 format"]
    
    assertions:
      # Validate format (if you add video_format validator)
      - type: content_includes
        params:
          patterns: ["video"]
      
      # Validate resolution
      - type: video_resolution
        params:
          presets:
            - 4k
            - uhd
      
      # Validate duration
      - type: video_duration
        params:
          min_seconds: 29
          max_seconds: 31

Complete Example Scenario

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: media-validation-complete
  description: Comprehensive media validation testing
spec:
  provider: gpt-4-vision
  
  turns:
    # Image validation
    - role: user
      parts:
        - type: text
          patterns: ["Generate a 1920x1080 PNG wallpaper"]
      
      assertions:
        - type: image_format
          params:
            formats: [png]
        
        - type: image_dimensions
          params:
            width: 1920
            height: 1080
    
    # Audio validation
    - role: user
      parts:
        - type: text
          patterns: ["Create a 10-second MP3 audio clip"]
      
      assertions:
        - type: audio_format
          params:
            formats: [mp3]
        
        - type: audio_duration
          params:
            min_seconds: 9
            max_seconds: 11
    
    # Video validation
    - role: user
      parts:
        - type: text
          patterns: ["Generate a 30-second 4K video"]
      
      assertions:
        - type: video_resolution
          params:
            presets: [4k, uhd]
        
        - type: video_duration
          params:
            min_seconds: 29
            max_seconds: 31

Media Assertion Best Practices

Cover format, dimension/resolution, and duration checks explicitly, and prefer small ranges over exact values where outputs can legitimately vary (the duration examples above allow a ±1 second window for this reason).

Example Test Scenarios

See complete examples in examples/arena-media-test/.


Tips & Best Practices

Execution Performance

# Increase concurrency for faster execution
promptarena run --concurrency 10

# Reduce concurrency for stability
promptarena run --concurrency 1
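
Since mock runs are free, a quick way to pick a concurrency level is to time the same mock run at different worker counts. A sketch:

# Compare wall-clock time at two worker counts
time promptarena run --mock-provider --concurrency 1
time promptarena run --mock-provider --concurrency 10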

Cost Control

# Use mock provider during development
promptarena run --mock-provider

# Test with cheaper models first
promptarena run --provider gpt-3.5-turbo

Reproducibility

# Always use same seed for consistent results
promptarena run --seed 42

# Document seed in test reports
promptarena run --seed 42 --format json,html

Debugging Tips

# Always start with config validation
promptarena config-inspect --verbose

# Use verbose mode to see API calls
promptarena run --verbose --scenario problematic-test

# Test prompt generation separately
promptarena prompt-debug --scenario scenarios/test.yaml


Need Help?

# General help
promptarena --help

# Command-specific help
promptarena run --help
promptarena config-inspect --help