
CLI Commands

Complete command-line interface reference for PromptArena, the LLM testing framework.

PromptArena (promptarena) is a CLI tool for running multi-turn conversation simulations across multiple LLM providers, validating conversation flows, and generating comprehensive test reports.

Terminal window
promptarena [command] [flags]
| Command | Description |
|---------|-------------|
| init | Initialize a new Arena test project from template (built-in or remote) |
| run | Run conversation simulations (main command) |
| mocks | Generate mock provider responses from Arena JSON results |
| config-inspect | Inspect and validate configuration |
| debug | Debug configuration and prompt loading |
| prompt-debug | Debug and test prompt generation |
| render | Generate HTML report from existing results |
| completion | Generate shell autocompletion script |
| help | Help about any command |
Terminal window
-h, --help help for promptarena

The init command initializes a new PromptArena test project from a built-in or remote template.

Terminal window
promptarena init [directory] [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --quick | bool | false | Skip interactive prompts, use defaults |
| --provider | string | - | Provider to configure (mock, openai, claude, gemini) |
| --template | string | quick-start | Template to use for initialization |
| --list-templates | bool | false | List all available built-in templates |
| --var | []string | - | Set template variables (key=value) |
| --template-index | string | community | Template repo name or index URL/path for remote templates |
| --repo-config | string | user config | Template repo config file |
| --template-cache | string | temp dir | Cache directory for remote templates |

PromptArena includes 6 built-in templates:

| Template | Files Generated | Description |
|----------|-----------------|-------------|
| basic-chatbot | 6 files | Simple conversational testing setup |
| customer-support | 10 files | Support agent with KB search and order status tools |
| code-assistant | 9 files | Code generation and review with separate prompts |
| content-generation | 9 files | Creative content for blogs, products, social media |
| multimodal | 7 files | Image analysis and vision testing |
| mcp-integration | 7 files | MCP filesystem server integration |
Terminal window
# See all built-in templates
promptarena init --list-templates
# List remote templates (from the default community repo)
promptarena templates list
# List remote templates from a named repo
promptarena templates repo add --name internal --url https://example.com/index.yaml
promptarena templates list --index internal
# List using repo/template shorthand
promptarena templates list --index community
Terminal window
# Create project with defaults (basic-chatbot template)
promptarena init my-test --quick
# With specific provider
promptarena init my-test --quick --provider openai
# With specific template
promptarena init my-test --quick --template customer-support --provider openai
# Render a remote template explicitly
promptarena templates fetch --template community/basic-chatbot --version 1.0.0
promptarena templates render --template community/basic-chatbot --version 1.0.0 --out ./out
Terminal window
# Interactive prompts guide you through setup
promptarena init my-project
Terminal window
# Override template variables
promptarena init my-test --quick --provider openai \
  --var project_name="My Custom Project" \
  --var description="Custom description" \
  --var temperature=0.8

Depending on the template, init creates:

  • arena.yaml - Main Arena configuration
  • prompts/ - Prompt configurations
  • providers/ - Provider configurations
  • scenarios/ - Test scenarios
  • tools/ - Tool definitions (customer-support template)
  • .env - Environment variables with API key placeholders
  • .gitignore - Ignores .env and output files
  • README.md - Project documentation and usage instructions

basic-chatbot (6 files):

  • Best for: Beginners, simple testing
  • Includes: 1 prompt, 1 provider, 1 basic scenario

customer-support (10 files):

  • Best for: Support agent testing, tool calling
  • Includes: 1 prompt, 3 scenarios, 2 tools (KB search, order status)

code-assistant (9 files):

  • Best for: Code generation workflows
  • Includes: 2 prompts (generator, reviewer), 3 scenarios
  • Temperature: 0.3 (deterministic)

content-generation (9 files):

  • Best for: Marketing, creative writing
  • Includes: 2 prompts (blog, marketing), 3 scenarios
  • Temperature: 0.8 (creative)

multimodal (7 files):

  • Best for: Vision AI, image analysis
  • Includes: 1 vision prompt, 2 scenarios with sample images

mcp-integration (7 files):

  • Best for: MCP server testing, tool integration
  • Includes: 1 prompt, 2 scenarios, MCP filesystem server config
Terminal window
# Navigate to project
cd my-test
# Add your API key to .env
echo "OPENAI_API_KEY=sk-..." >> .env
# Run tests
promptarena run
# View results
open out/report.html

The mocks generate command produces mock provider YAML from recorded Arena JSON results so you can replay conversations without calling real LLMs.

Terminal window
promptarena mocks generate [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --input, -i | string | out | Arena JSON result file or directory containing *.json runs |
| --output, -o | string | providers/mock-generated.yaml | Output file path or directory (when --per-scenario is set) |
| --per-scenario | bool | false | Write one YAML file per scenario (in --output directory) |
| --merge | bool | false | Merge with existing mock file(s) instead of overwriting |
| --scenario | []string | - | Only include specified scenario IDs |
| --provider | []string | - | Only include specified provider IDs |
| --dry-run | bool | false | Print generated YAML instead of writing files |
| --default-response | string | - | Set defaultResponse when not present |

Generate a consolidated mock file from the latest runs:

Terminal window
promptarena mocks generate \
  --input out \
  --scenario hardware-faults \
  --provider openai-gpt4o \
  --output providers/mock-generated.yaml \
  --merge

Write one file per scenario:

Terminal window
promptarena mocks generate \
  --input out \
  --per-scenario \
  --output providers/responses \
  --merge

Preview without writing:

Terminal window
promptarena mocks generate --input out --dry-run
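
The generated file is plain YAML, so you can review or hand-edit it before pointing a run at it (for example via --mock-config). The exact schema depends on your PromptArena version; the sketch below is illustrative only. defaultResponse is the one field documented above (set with --default-response), and the scenario-keyed responses map is an assumption.

# Illustrative sketch of a generated mock file.
# Only defaultResponse is a documented field; the rest is a hypothetical layout.
defaultResponse: "Sorry, no recorded reply matches this turn."
responses:
  hardware-faults:   # assumed: recorded replies grouped by scenario ID
    - "Let's start by checking the device's power supply."
    - "Thanks, that log points to a failing fan controller."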

The run command executes multi-turn conversation simulations across multiple LLM providers.

Terminal window
promptarena run [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |
| -j, --concurrency | int | 6 | Number of concurrent workers |
| -s, --seed | int | 42 | Random seed for reproducibility |
| --ci | bool | false | CI mode (headless, minimal output) |
| --provider | []string | all | Providers to use (comma-separated) |
| --scenario | []string | all | Scenarios to run (comma-separated) |
| --region | []string | all | Regions to run (comma-separated) |
| --roles | []string | all | Self-play role configurations to use |
| --temperature | float32 | 0.6 | Override temperature for all scenarios |
| --max-tokens | int | - | Override max tokens for all scenarios |
| --selfplay | bool | false | Enable self-play mode |
| --mock-provider | bool | false | Replace all providers with MockProvider |
| --mock-config | string | - | Path to mock provider configuration (YAML) |
| -o, --out | string | out | Output directory |
| --format | []string | from config | Output formats: json, junit, html, markdown |
| --formats | []string | from config | Alias for --format |
| --html | bool | false | Generate HTML report (use --format html instead) |
| --html-file | string | out/report-[timestamp].html | HTML report output file |
| --junit-file | string | out/junit.xml | JUnit XML output file |
| --markdown-file | string | out/results.md | Markdown report output file |
| -v, --verbose | bool | false | Enable verbose debug logging for API calls |
Terminal window
# Run all tests with default configuration
promptarena run
# Specify configuration file
promptarena run --config my-arena.yaml
Terminal window
# Run specific providers only
promptarena run --provider openai,claude
# Run specific scenarios
promptarena run --scenario basic-qa,edge-cases
# Combine filters
promptarena run --provider openai --scenario customer-support
Terminal window
# Run with 3 concurrent workers
promptarena run --concurrency 3
# Sequential execution (no parallelism)
promptarena run --concurrency 1
Terminal window
# Override temperature for all tests
promptarena run --temperature 0.8
# Override max tokens
promptarena run --max-tokens 500
# Combined overrides
promptarena run --temperature 0.9 --max-tokens 1000
Terminal window
# Generate JSON and HTML reports
promptarena run --format json,html
# Generate all available formats
promptarena run --format json,junit,html,markdown
# Custom output directory
promptarena run --out test-results-2024-01-15
# Specify custom HTML filename (legacy)
promptarena run --html --html-file custom-report.html
Terminal window
# Use mock provider instead of real APIs (fast, no cost)
promptarena run --mock-provider
# Use custom mock configuration
promptarena run --mock-config mock-responses.yaml
Terminal window
# Enable self-play testing
promptarena run --selfplay
# Self-play with specific roles
promptarena run --selfplay --roles frustrated-customer,tech-support
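
Self-play also needs role and persona definitions in your Arena configuration (inspect them with promptarena config-inspect --section selfplay). The sketch below only illustrates the relationship that config-inspect reports, a role tying a persona to a provider; every field name in it is an assumption, so consult the Configuration Reference for the real schema.

# Hypothetical self-play sketch; field names are assumptions.
selfplay:
  personas:
    - id: frustrated-customer
      description: "Impatient customer with an unresolved billing issue"
    - id: tech-support
      description: "Patient support engineer persona"
  roles:
    - id: frustrated-customer
      persona: frustrated-customer
      provider: openai-gpt4o
    - id: tech-support
      persona: tech-support
      provider: openai-gpt4o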
Terminal window
# Headless mode for CI pipelines
promptarena run --ci --format junit,json
# With specific quality gates
promptarena run --ci --concurrency 3 --format junit
Terminal window
# Verbose output for troubleshooting
promptarena run --verbose
# Verbose with specific scenario
promptarena run --verbose --scenario failing-test
Terminal window
# Use specific seed for reproducibility
promptarena run --seed 12345
# Same seed across runs produces same results
promptarena run --seed 12345 --provider openai

The config-inspect command inspects and validates your Arena configuration, showing all loaded resources and checking cross-references. It provides a rich, styled display of the configuration along with validation results.

Terminal window
promptarena config-inspect [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |
| --format | string | text | Output format: text, json |
| -s, --short | bool | false | Show only validation results (shortcut for --section validation) |
| --section | string | - | Focus on specific section: prompts, providers, scenarios, tools, selfplay, judges, defaults, validation |
| --verbose | bool | false | Show detailed information including file contents |
| --stats | bool | false | Show cache statistics |
Terminal window
# Inspect default configuration
promptarena config-inspect
# Inspect specific config file
promptarena config-inspect --config staging-arena.yaml
# Verbose output with full details
promptarena config-inspect --verbose
# Quick validation check only
promptarena config-inspect --short
# or
promptarena config-inspect -s
# Focus on specific section
promptarena config-inspect --section providers
promptarena config-inspect --section selfplay
promptarena config-inspect --section validation
# JSON output for programmatic use
promptarena config-inspect --format json
# Show cache statistics
promptarena config-inspect --stats

The --section flag allows focusing on specific parts of the configuration:

| Section | Description |
|---------|-------------|
| prompts | Prompt configurations with task types, variables, validators |
| providers | Provider details organized by group (default, judge, selfplay) |
| scenarios | Scenario details with turn counts and assertion summaries |
| tools | Tool definitions with modes, parameters, timeouts |
| selfplay | Self-play configuration including personas and roles |
| judges | Judge configurations for LLM-as-judge validators |
| defaults | Default settings (temperature, max tokens, concurrency) |
| validation | Validation results and connectivity checks |

The command displays styled boxes with:

  • Loaded prompt configurations with task types, variables, and validators
  • Configured providers organized by group (default, judge, selfplay)
  • Available scenarios with turn counts and assertion summaries
  • Tool definitions with modes and parameters
  • Self-play roles with persona associations
  • Judge configurations
  • Default settings
  • Cross-reference validation results with connectivity checks

Example Output:

✨ PromptArena Configuration Inspector ✨
╭──────────────────────────────────────────────────────────────────────────────╮
│ Configuration: arena.yaml │
╰──────────────────────────────────────────────────────────────────────────────╯
📋 Prompt Configs (2)
╭──────────────────────────────────────────────────────────────────────────────╮
│ troubleshooter-v2 │
│ Task Type: troubleshooting │
│ File: prompts/troubleshooter-v2.prompt.yaml │
╰──────────────────────────────────────────────────────────────────────────────╯
🔌 Providers (3)
╭──────────────────────────────────────────────────────────────────────────────╮
│ [default] │
│ openai-gpt4o: gpt-4o (temp: 0.70, max: 1000) │
│ │
│ [judge] │
│ judge-provider: gpt-4o-mini (temp: 0.00, max: 500) │
│ │
│ [selfplay] │
│ mock-selfplay: mock-model (temp: 0.80, max: 1000) │
╰──────────────────────────────────────────────────────────────────────────────╯
🎭 Self-Play (2 personas, 2 roles)
Personas:
╭──────────────────────────────────────────────────────────────────────────────╮
│ red-team-attacker │
│ plant-operator │
╰──────────────────────────────────────────────────────────────────────────────╯
Roles:
╭──────────────────────────────────────────────────────────────────────────────╮
│ attacker (red-team-attacker) → openai-gpt4o │
│ operator (plant-operator) → openai-gpt4o │
╰──────────────────────────────────────────────────────────────────────────────╯
✅ Validation
╭──────────────────────────────────────────────────────────────────────────────╮
│ ✓ Configuration is valid │
│ │
│ Connectivity Checks: │
│ ☑ Tools are used by prompts │
│ ☑ Unique task types per prompt │
│ ☑ Scenario task types exist │
│ ☑ Allowed tools are defined │
│ ☑ Self-play roles have valid providers │
╰──────────────────────────────────────────────────────────────────────────────╯

The debug command shows the loaded configuration, prompt packs, scenarios, and providers to help troubleshoot configuration issues.

Terminal window
promptarena debug [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |
Terminal window
# Debug default configuration
promptarena debug
# Debug specific config
promptarena debug --config test-arena.yaml

Use debug to:

  • Troubleshoot configuration loading issues
  • Verify all files are found and parsed correctly
  • Check prompt pack assembly
  • Validate provider initialization

The prompt-debug command tests prompt generation with specific regions, task types, and contexts. It is useful for validating prompt assembly before running full tests.

Terminal window
promptarena prompt-debug [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -c, --config | string | arena.yaml | Configuration file path |
| -t, --task-type | string | - | Task type for prompt generation |
| -r, --region | string | - | Region for prompt generation |
| --persona | string | - | Persona ID to test |
| --scenario | string | - | Scenario file path to load task_type and context |
| --context | string | - | Context slot content |
| --user | string | - | User context (e.g., "iOS developer") |
| --domain | string | - | Domain hint (e.g., "mobile development") |
| -l, --list | bool | false | List available regions and task types |
| -j, --json | bool | false | Output as JSON |
| -p, --show-prompt | bool | true | Show the full assembled prompt |
| -m, --show-meta | bool | true | Show metadata and configuration info |
| -s, --show-stats | bool | true | Show statistics (length, tokens, etc.) |
| -v, --verbose | bool | false | Verbose output with debug info |
Terminal window
# List available configurations
promptarena prompt-debug --list
# Test prompt generation for task type
promptarena prompt-debug --task-type support
# Test with region
promptarena prompt-debug --task-type support --region us
# Test with persona
promptarena prompt-debug --persona us-hustler-v1
# Test with scenario file
promptarena prompt-debug --scenario scenarios/customer-support.yaml
# Test with custom context
promptarena prompt-debug --task-type support --context "urgent billing issue"
# JSON output for parsing
promptarena prompt-debug --task-type support --json
# Minimal output (just the prompt)
promptarena prompt-debug --task-type support --show-meta=false --show-stats=false

The command shows:

  • Assembled system prompt
  • Metadata (task type, region, persona)
  • Statistics (character count, estimated tokens)
  • Configuration used

Example Output:

=== Prompt Debug ===
Task Type: support
Region: us
Persona: default
--- System Prompt ---
You are a helpful customer support agent for TechCo.
Your role:
- Answer product questions
- Help track orders
- Process returns and refunds
...
--- Statistics ---
Characters: 1,234
Estimated Tokens: 308
Lines: 42
--- Metadata ---
Prompt Config: support
Version: v1.0.0
Validators: 3

The render command generates an HTML report from existing test results.

Terminal window
promptarena render [index.json path] [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -o, --output | string | report-[timestamp].html | Output HTML file path |
Terminal window
# Render from default location
promptarena render out/index.json
# Custom output path
promptarena render out/index.json --output custom-report.html
# Render from archived results
promptarena render archive/2024-01-15/index.json --output reports/jan-15-report.html

Use render to:

  • Regenerate reports after test runs
  • Create reports with different formatting
  • Archive and view historical results
  • Share results without re-running tests

The completion command generates a shell autocompletion script for bash, zsh, fish, or PowerShell.

Terminal window
promptarena completion [bash|zsh|fish|powershell]
Terminal window
# Bash
promptarena completion bash > /etc/bash_completion.d/promptarena
# Zsh
promptarena completion zsh > "${fpath[1]}/_promptarena"
# Fish
promptarena completion fish > ~/.config/fish/completions/promptarena.fish
# PowerShell
promptarena completion powershell > promptarena.ps1

PromptArena respects the following environment variables:

| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API authentication |
| ANTHROPIC_API_KEY | Anthropic API authentication |
| GOOGLE_API_KEY | Google AI API authentication |
| PROMPTARENA_CONFIG | Default configuration file (overrides config.arena.yaml) |
| PROMPTARENA_OUTPUT | Default output directory (overrides out) |
Terminal window
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export PROMPTARENA_CONFIG="staging-arena.yaml"
export PROMPTARENA_OUTPUT="test-results"
promptarena run

| Code | Meaning |
|------|---------|
| 0 | Success - all tests passed |
| 1 | Failure - one or more tests failed or an error occurred |

Check exit code in scripts:

Terminal window
if promptarena run --ci; then
  echo "✅ Tests passed"
else
  echo "❌ Tests failed"
  exit 1
fi

Terminal window
# Quick test with mock providers
promptarena run --mock-provider
# Test specific feature
promptarena run --scenario new-feature --verbose
# Inspect configuration
promptarena config-inspect --verbose
Terminal window
# Run in headless CI mode
promptarena run --ci --format junit,json
# Check specific providers
promptarena run --ci --provider openai,claude --format junit
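
In GitHub Actions this typically reduces to one step that runs the CLI and one that uploads the output directory. The workflow below is a sketch under assumptions: the install step is a placeholder for however you distribute the promptarena binary, and the rest uses only flags documented on this page.

# .github/workflows/promptarena.yml (illustrative sketch)
name: promptarena
on: [push, pull_request]
jobs:
  arena:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install PromptArena
        run: echo "install promptarena here (release binary, go install, etc.)"  # placeholder
      - name: Run simulations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: promptarena run --ci --format junit,json
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: arena-results
          path: out/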
Terminal window
# Validate configuration
promptarena config-inspect
# Debug prompt assembly
promptarena prompt-debug --task-type support --verbose
# Run with verbose logging
promptarena run --verbose --scenario failing-test
# Check configuration loading
promptarena debug
Terminal window
# Run tests
promptarena run --format json
# Later, generate HTML from results
promptarena render out/index.json --output reports/latest.html
Terminal window
# Test all providers
promptarena run --format html,json
# Test specific providers
promptarena run --provider openai,claude,gemini --format html

PromptArena uses a YAML configuration file (default: config.arena.yaml). See the Configuration Reference for complete documentation.

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: my-arena
spec:
  prompt_configs:
    - id: assistant
      file: prompts/assistant.yaml
  providers:
    - file: providers/openai.yaml
  scenarios:
    - file: scenarios/test.yaml
  defaults:
    output:
      dir: out
      formats: ["json", "html"]

PromptArena supports multimodal content (images, audio, video) in test scenarios with comprehensive media rendering in all output formats.

Test scenarios can include multimodal content using the parts array:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: image-analysis
spec:
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["What's in this image?"]
        - type: image
          media:
            file_path: test-data/sample.jpg
            detail: high
  • Images: JPEG, PNG, GIF, WebP
  • Audio: MP3, WAV, OGG, M4A
  • Video: MP4, WebM, MOV

Media can be loaded from three sources:

1. Local Files

- type: image
  media:
    file_path: images/diagram.png
    detail: high

2. URLs (fetched during test execution)

- type: image
  media:
    url: https://example.com/photo.jpg
    detail: auto

3. Inline Base64 Data

- type: image
  media:
    data: "iVBORw0KGgoAAAANSUhEUgAAAAUA..."
    mime_type: image/png
    detail: low

All output formats include media statistics and rendering:

HTML reports include:

Media Summary Dashboard

  • Visual statistics cards showing:
    • Total images, audio, and video files
    • Successfully loaded vs. failed media
    • Total media size in human-readable format
    • Media type icons (🖼️ 🎵 🎬)

Media Badges

🖼️ x3 🎵 x2 ✅ 5 ❌ 0 💾 1.2 MB

Media Items Display

  • Individual media items with:
    • Type icon and format badge
    • Source (file path, URL, or “inline”)
    • MIME type
    • File size
    • Load status (✅ loaded / ❌ error)

Example HTML Output:

<div class="media-summary">
<div class="stat-card">
<div class="stat-value">5</div>
<div class="stat-label">🖼️ Images</div>
</div>
<div class="stat-card">
<div class="stat-value">3</div>
<div class="stat-label">🎵 Audio</div>
</div>
<!-- ... -->
</div>

JUnit XML includes media metadata as test suite properties:

<testsuite name="image-analysis" tests="1">
  <properties>
    <property name="media.images.total" value="5"/>
    <property name="media.audio.total" value="3"/>
    <property name="media.video.total" value="0"/>
    <property name="media.loaded.success" value="8"/>
    <property name="media.loaded.errors" value="0"/>
    <property name="media.size.total_bytes" value="1245678"/>
  </properties>
  <testcase name="test-001" classname="image-analysis" time="2.34"/>
</testsuite>

Property Naming Convention:

  • media.{type}.total - Count by media type (images, audio, video)
  • media.loaded.success - Successfully loaded media items
  • media.loaded.errors - Failed media loads
  • media.size.total_bytes - Total size in bytes

These properties are useful for:

  • CI/CD metrics and tracking
  • Test result analysis
  • Media resource monitoring
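
For the CI/CD case in particular, a pipeline step can read these properties and fail the build when media failed to load. The snippet below is a sketch assuming GitHub Actions and xmllint (the same tool used in the extraction examples further down); the step name and zero-error threshold are arbitrary choices.

# Illustrative CI step: fail when media.loaded.errors is non-zero.
- name: Gate on media load errors
  run: |
    errors=$(xmllint --xpath "string(//property[@name='media.loaded.errors']/@value)" out/junit.xml)
    echo "media load errors: ${errors:-0}"
    test "${errors:-0}" -eq 0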

Markdown reports include a media statistics table in the overview section:

## 📊 Overview
| Metric | Value |
|--------|-------|
| Tests Run | 6 |
| Passed | 5 ✅ |
| Failed | 1 ❌ |
| Success Rate | 83.3% |
| Total Cost | $0.0245 |
| Total Duration | 12.5s |
### 🎨 Media Content
| Type | Count |
|------|-------|
| 🖼️ Images | 5 |
| 🎵 Audio Files | 3 |
| 🎬 Videos | 0 |
| ✅ Loaded | 8 |
| ❌ Errors | 0 |
| 💾 Total Size | 1.2 MB |

Control how media is loaded and processed:

For URL-based media, configure the HTTP loader:

spec:
  defaults:
    media:
      http:
        timeout: 30s
        max_file_size: 50MB

Relative paths are resolved from the configuration file directory:

# If arena.yaml is in /project/tests/
# this resolves to /project/tests/images/sample.jpg
- type: image
  media:
    file_path: images/sample.jpg

PromptArena validates media content:

Path Security

  • Prevents path traversal attacks (.. sequences)
  • Validates file paths are within allowed directories
  • Checks symlink targets

File Validation

  • Verifies MIME types match content types
  • Checks file existence
  • Validates file sizes against limits
  • Ensures files are regular files (not directories)

Error Handling

  • Media load failures are captured in test results
  • Errors reported in all output formats
  • Tests can continue with partial media failures
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: product-image-analysis
spec:
  task_type: vision
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["Analyze this product image for defects"]
        - type: image
          media:
            file_path: test-data/product-123.jpg
            detail: high
      assertions:
        - type: content_includes
          patterns: ["quality", "inspection"]
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: audio-transcription
spec:
  task_type: transcription
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["Transcribe this audio"]
        - type: audio
          media:
            file_path: test-data/meeting-recording.mp3
      assertions:
        - type: content_includes
          patterns: ["meeting", "agenda"]
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: multimodal-analysis
spec:
  turns:
    - role: user
      parts:
        - type: text
          patterns: ["Compare these media files"]
        - type: image
          media:
            file_path: charts/q1-results.png
        - type: image
          media:
            file_path: charts/q2-results.png
        - type: audio
          media:
            file_path: presentations/summary.mp3
Terminal window
# Run multimodal tests with all formats
promptarena run --format html,junit,markdown
# HTML report includes interactive media dashboard
open out/report.html
# JUnit XML includes media metrics for CI
cat out/junit.xml | grep "media\."
# Markdown shows media statistics
cat out/results.md

Extract media metrics from JUnit XML:

Terminal window
# Count total images tested
xmllint --xpath "//property[@name='media.images.total']/@value" out/junit.xml
# Check for media load errors
xmllint --xpath "//property[@name='media.loaded.errors']/@value" out/junit.xml

File Organization

project/
├── arena.yaml
├── test-data/
│   ├── images/
│   │   ├── valid/
│   │   └── invalid/
│   ├── audio/
│   └── video/
└── scenarios/
    └── multimodal-tests.yaml

Size Limits

  • Keep test media files small (<10MB recommended)
  • Use compressed formats (WebP for images, MP3 for audio)
  • Consider using thumbnails for image tests

URL Loading

  • Use reliable, stable URLs for CI/CD
  • Consider local copies for critical tests
  • Set appropriate timeouts for remote resources

Assertions

  • Validate media is processed in responses
  • Check for expected content types
  • Verify quality/accuracy of analysis

Arena provides six specialized media validators to test media content in LLM responses. These assertions validate format, dimensions, duration, and resolution of images, audio, and video outputs.

The image_format assertion validates that images in assistant responses match allowed formats.

Parameters:

  • formats ([]string, required): List of allowed formats (e.g., png, jpeg, jpg, webp, gif)

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a PNG image of a sunset"]
    assertions:
      - type: image_format
        params:
          formats:
            - png

Use Cases:

  • Validate model outputs correct image format
  • Test format conversion capabilities
  • Ensure compatibility with downstream systems

The image_dimensions assertion validates image dimensions (width and height) in assistant responses.

Parameters:

  • width (int, optional): Exact required width in pixels
  • height (int, optional): Exact required height in pixels
  • min_width (int, optional): Minimum width in pixels
  • max_width (int, optional): Maximum width in pixels
  • min_height (int, optional): Minimum height in pixels
  • max_height (int, optional): Maximum height in pixels

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 1920x1080 wallpaper"]
    assertions:
      # Exact dimensions
      - type: image_dimensions
        params:
          width: 1920
          height: 1080

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a thumbnail"]
    assertions:
      # Size range
      - type: image_dimensions
        params:
          min_width: 100
          max_width: 400
          min_height: 100
          max_height: 400

Use Cases:

  • Validate exact resolution requirements
  • Test minimum/maximum size constraints
  • Verify thumbnail generation
  • Ensure HD/4K resolution compliance

The audio_format assertion validates audio format in assistant responses.

Parameters:

  • formats ([]string, required): List of allowed formats (e.g., mp3, wav, ogg, m4a, flac)

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate an audio clip"]
    assertions:
      - type: audio_format
        params:
          formats:
            - mp3
            - wav

Use Cases:

  • Validate audio output format
  • Test format compatibility
  • Ensure codec requirements

The audio_duration assertion validates audio duration in assistant responses.

Parameters:

  • min_seconds (float, optional): Minimum duration in seconds
  • max_seconds (float, optional): Maximum duration in seconds

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 30-second audio clip"]
    assertions:
      - type: audio_duration
        params:
          min_seconds: 29
          max_seconds: 31

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a brief notification sound"]
    assertions:
      - type: audio_duration
        params:
          max_seconds: 5

Use Cases:

  • Validate exact duration requirements
  • Test length constraints
  • Verify podcast/music length
  • Ensure compliance with platform limits

The video_resolution assertion validates video resolution in assistant responses.

Parameters:

  • presets ([]string, optional): List of resolution presets
  • min_width (int, optional): Minimum width in pixels
  • max_width (int, optional): Maximum width in pixels
  • min_height (int, optional): Minimum height in pixels
  • max_height (int, optional): Maximum height in pixels

Supported Presets:

  • 480p, sd - Standard Definition (480 height)
  • 720p, hd - HD (720 height)
  • 1080p, fhd, full_hd - Full HD (1080 height)
  • 1440p, 2k, qhd - QHD (1440 height)
  • 2160p, 4k, uhd - 4K Ultra HD (2160 height)
  • 4320p, 8k - 8K (4320 height)

Example with Presets:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Generate a 1080p video"]
    assertions:
      - type: video_resolution
        params:
          presets:
            - 1080p
            - fhd

Example with Dimensions:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a high-resolution video"]
    assertions:
      - type: video_resolution
        params:
          min_width: 1920
          min_height: 1080

Use Cases:

  • Validate exact resolution requirements
  • Test HD/4K compliance
  • Verify minimum quality standards
  • Validate aspect ratios

The video_duration assertion validates video duration in assistant responses.

Parameters:

  • min_seconds (float, optional): Minimum duration in seconds
  • max_seconds (float, optional): Maximum duration in seconds

Example:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 1-minute video clip"]
    assertions:
      - type: video_duration
        params:
          min_seconds: 59
          max_seconds: 61

Use Cases:

  • Validate exact duration requirements
  • Test length constraints
  • Verify platform compliance (e.g., TikTok 60s limit)
  • Ensure streaming segment sizes

You can combine multiple media assertions on a single turn:

turns:
  - role: user
    parts:
      - type: text
        patterns: ["Create a 30-second 4K video in MP4 format"]
    assertions:
      # Validate format (if you add a video_format validator)
      - type: content_includes
        params:
          patterns: ["video"]
      # Validate resolution
      - type: video_resolution
        params:
          presets:
            - 4k
            - uhd
      # Validate duration
      - type: video_duration
        params:
          min_seconds: 29
          max_seconds: 31
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: media-validation-complete
  description: Comprehensive media validation testing
spec:
  provider: gpt-4-vision
  turns:
    # Image validation
    - role: user
      parts:
        - type: text
          patterns: ["Generate a 1920x1080 PNG wallpaper"]
      assertions:
        - type: image_format
          params:
            formats: [png]
        - type: image_dimensions
          params:
            width: 1920
            height: 1080
    # Audio validation
    - role: user
      parts:
        - type: text
          patterns: ["Create a 10-second MP3 audio clip"]
      assertions:
        - type: audio_format
          params:
            formats: [mp3]
        - type: audio_duration
          params:
            min_seconds: 9
            max_seconds: 11
    # Video validation
    - role: user
      parts:
        - type: text
          patterns: ["Generate a 30-second 4K video"]
      assertions:
        - type: video_resolution
          params:
            presets: [4k, uhd]
        - type: video_duration
          params:
            min_seconds: 29
            max_seconds: 31
  • Always specify multiple acceptable formats when possible
  • Use lowercase format names for consistency
  • Test format conversion capabilities
  • Use min/max ranges to allow for encoding variations
  • Test common aspect ratios (16:9, 4:3, 9:16)
  • Validate minimum quality standards
  • Allow small tolerance ranges (±1-2 seconds)
  • Test edge cases (very short/long durations)
  • Verify platform-specific limits
  • Media assertions execute on assistant responses only
  • No API calls are made for validation
  • Assertions run in parallel with other validators

See complete examples in examples/arena-media-test/:

  • image-validation.yaml - Image format and dimension testing
  • audio-validation.yaml - Audio format and duration testing
  • video-validation.yaml - Video resolution and duration testing

Terminal window
# Increase concurrency for faster execution
promptarena run --concurrency 10
# Reduce concurrency for stability
promptarena run --concurrency 1
Terminal window
# Use mock provider during development
promptarena run --mock-provider
# Test with cheaper models first
promptarena run --provider gpt-3.5-turbo
Terminal window
# Always use same seed for consistent results
promptarena run --seed 42
# Document seed in test reports
promptarena run --seed 42 --format json,html
Terminal window
# Always start with config validation
promptarena config-inspect --verbose
# Use verbose mode to see API calls
promptarena run --verbose --scenario problematic-test
# Test prompt generation separately
promptarena prompt-debug --scenario scenarios/test.yaml


Need Help?

Terminal window
# General help
promptarena --help
# Command-specific help
promptarena run --help
promptarena config-inspect --help