Skip to content

Test Scenario Format

PromptPack is an open-source specification for defining LLM prompts, test scenarios, and configurations in a portable, version-controllable format.

For complete specification documentation, please visit:

The official PromptPack specification site includes:


PromptArena is a reference implementation and testing tool for PromptPack files.

  • PromptPack v1.1 with multimodal support (images, audio, video)
  • ✅ Kubernetes-style YAML resources: Arena, PromptConfig, Scenario, Provider, Tool, Persona
  • ✅ Multi-provider testing: OpenAI, Anthropic, Google Gemini, Azure, Bedrock, and Mock
  • ✅ MCP (Model Context Protocol) server integration
  • ✅ Comprehensive assertion framework for validation
  • ✅ HTML, JSON, and Markdown output formats
Terminal window
# Run a test scenario
promptarena run examples/arena-media-test/arena.yaml
# Test across multiple providers
promptarena run arena.yaml --provider openai,anthropic --format html

While implementing the PromptPack specification, PromptArena adds these testing-focused features:

The Arena resource orchestrates testing across multiple prompts, providers, and scenarios:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
name: my-test-suite
spec:
prompt_configs:
- id: support
file: prompts/support-bot.yaml
providers:
- file: providers/openai-gpt4o.yaml
- file: providers/claude-sonnet.yaml
scenarios:
- file: scenarios/test-1.yaml
# MCP server integration
mcp_servers:
filesystem:
command: npx
args: ["@modelcontextprotocol/server-filesystem", "/data"]
defaults:
output:
dir: out
formats: ["html", "json"]

PromptArena extends standard assertions with testing-specific validators:

# Turn-level assertions
assertions:
# Content validation
- type: content_includes
- type: content_matches
# Tool usage validation
- type: tools_called
- type: tools_not_called
# JSON validation
- type: is_valid_json
- type: json_schema
- type: json_path
# Multimodal validation
- type: image_format
- type: image_dimensions
- type: audio_format
- type: audio_duration
- type: video_resolution
- type: video_duration
# LLM Judge
- type: llm_judge
# Conversation-level assertions (in conversation_assertions field)
conversation_assertions:
- type: tools_called
- type: tools_not_called
- type: tool_calls_with_args
- type: tools_not_called_with_args
- type: content_includes_any
- type: content_not_includes
- type: llm_judge_conversation

See the Assertions Guide for complete documentation.

PromptArena implements PromptPack v1.1 multimodal support with comprehensive testing capabilities:

# In PromptConfig
spec:
media:
enabled: true
supported_types: [image, audio, video, document]
image:
max_size_mb: 20
allowed_formats: [jpeg, png, webp]
document:
max_size_mb: 32
allowed_formats: [pdf]
# In Scenario
turns:
- role: user
content:
- type: text
patterns: ["What's in this image?"]
- type: image
image_url:
url: "path/to/image.jpg"
detail: "high"
- type: document
document_url:
url: "path/to/document.pdf"

See examples/arena-media-test/ and examples/document-analysis/ for complete examples.

Test without API costs using the Mock provider with configurable responses:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
name: mock-provider
spec:
type: mock
model: mock-model

Configure responses in providers/mock-responses.yaml. See Mock Provider Usage.

Define AI personas to automatically test conversational flows:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
name: frustrated-customer
spec:
system_prompt: |
You are a frustrated customer...
max_turns: 8
goal: "Get reassurance about order delivery and feel heard"

See the Self-Play Guide for details.


Recommended project layout for PromptArena tests:

my-project/
├── arena.yaml # Main Arena configuration
├── prompts/
│ ├── support.yaml
│ └── sales.yaml
├── scenarios/
│ ├── smoke-tests/
│ └── regression/
├── providers/
│ ├── mock.yaml
│ └── openai.yaml
├── tools/
│ └── weather.yaml
└── out/ # Generated reports (add to .gitignore)

PromptPack VersionPromptArena SupportKey Features
v1.0✅ FullCore specification
v1.1✅ FullMultimodal support (images, audio, video)
v1alpha1✅ FullKubernetes-style resource format


Questions? Visit PromptPack.org or GitHub Discussions