Test Scenario Format
PromptPack is an open-source specification for defining LLM prompts, test scenarios, and configurations in a portable, version-controllable format.
Official Documentation
Section titled “Official Documentation”For complete specification documentation, please visit:
The official PromptPack specification site includes:
- Specification Overview - Understanding the PromptPack format
- File Format & Structure - Pack JSON structure
- Schema Reference - JSON schema validation
- Real-World Examples - Complete example packs
- Getting Started Guide - Quick start instructions
- Version History - v1.0, v1.1, etc.
PromptArena Implementation
Section titled “PromptArena Implementation”PromptArena is a reference implementation and testing tool for PromptPack files.
Supported Features
Section titled “Supported Features”- ✅ PromptPack v1.1 with multimodal support (images, audio, video)
- ✅ Kubernetes-style YAML resources:
Arena,PromptConfig,Scenario,Provider,Tool,Persona - ✅ Multi-provider testing: OpenAI, Anthropic, Google Gemini, Azure, Bedrock, and Mock
- ✅ MCP (Model Context Protocol) server integration
- ✅ Comprehensive assertion framework for validation
- ✅ HTML, JSON, and Markdown output formats
Quick Start
Section titled “Quick Start”# Run a test scenariopromptarena run examples/arena-media-test/arena.yaml
# Test across multiple providerspromptarena run arena.yaml --provider openai,anthropic --format htmlQuick Links
Section titled “Quick Links”- Schema: v1.1 JSON Schema
- Local Examples:
examples/directory in this repository - Arena Guides: Writing Scenarios | Assertions | Self-Play
- Community: GitHub Discussions
PromptArena-Specific Extensions
Section titled “PromptArena-Specific Extensions”While implementing the PromptPack specification, PromptArena adds these testing-focused features:
1. Arena Configuration Resource
Section titled “1. Arena Configuration Resource”The Arena resource orchestrates testing across multiple prompts, providers, and scenarios:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Arenametadata: name: my-test-suitespec: prompt_configs: - id: support file: prompts/support-bot.yaml
providers: - file: providers/openai-gpt4o.yaml - file: providers/claude-sonnet.yaml
scenarios: - file: scenarios/test-1.yaml
# MCP server integration mcp_servers: filesystem: command: npx args: ["@modelcontextprotocol/server-filesystem", "/data"]
defaults: output: dir: out formats: ["html", "json"]2. Enhanced Assertions
Section titled “2. Enhanced Assertions”PromptArena extends standard assertions with testing-specific validators:
# Turn-level assertionsassertions: # Content validation - type: content_includes - type: content_matches
# Tool usage validation - type: tools_called - type: tools_not_called
# JSON validation - type: is_valid_json - type: json_schema - type: json_path
# Multimodal validation - type: image_format - type: image_dimensions - type: audio_format - type: audio_duration - type: video_resolution - type: video_duration
# Workflow assertions - type: state_is - type: transitioned_to - type: workflow_complete
# LLM Judge - type: llm_judge
# External Evals - type: rest_eval - type: a2a_eval
# Conversation-level assertions (in conversation_assertions field)conversation_assertions: - type: tools_called - type: tools_not_called - type: tool_calls_with_argsSee the Assertions Guide for complete documentation.
3. Workflow Testing
Section titled “3. Workflow Testing”PromptArena supports testing workflow-based packs with step-by-step scenario execution. Workflow scenarios use input steps to send user messages; state transitions are LLM-initiated via the workflow__transition tool call rather than scripted in the scenario:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: WorkflowScenariometadata: name: support-escalationspec: id: support-escalation pack: ./support.pack.json description: "Test customer support escalation flow" variables: company_name: "Acme Corp" context_carry_forward: true steps: # Step 1: Send a message in the initial state - type: input content: "I need help with my billing" assertions: - type: state_is params: state: "intake" message: "Should be in intake state"
# Step 2: The LLM should decide to escalate - type: input content: "My invoice shows a duplicate charge of $49.99" assertions: - type: transitioned_to params: state: "specialist" message: "Should have transitioned to specialist" - type: content_includes params: patterns: ["invoice"]
# Step 3: Resolve and verify completion - type: input content: "Thank you, the refund looks correct!" assertions: - type: workflow_complete message: "Workflow should be complete"How transitions work: The LLM calls the workflow__transition tool with an event (matching the state machine’s defined transitions) and a context string that carries forward relevant information to the next state. The driver processes the transition internally and makes the context available via {{workflow_context}} in the new state’s system prompt.
Workflow Assertions (available in input step assertions):
state_is— Checks current workflow statetransitioned_to— Checks if a state was visited in the transition historyworkflow_complete— Checks if the workflow reached a terminal state
See the Assertions Reference for full details.
4. Multimodal Testing
Section titled “4. Multimodal Testing”PromptArena implements PromptPack v1.1 multimodal support with comprehensive testing capabilities:
# In PromptConfigspec: media: enabled: true supported_types: [image, audio, video, document] image: max_size_mb: 20 allowed_formats: [jpeg, png, webp] document: max_size_mb: 32 allowed_formats: [pdf]# In Scenarioturns: - role: user content: - type: text patterns: ["What's in this image?"] - type: image image_url: url: "path/to/image.jpg" detail: "high" - type: document document_url: url: "path/to/document.pdf"See examples/arena-media-test/ and examples/document-analysis/ for complete examples.
5. Mock Provider Support
Section titled “5. Mock Provider Support”Test without API costs using the Mock provider with configurable responses:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Providermetadata: name: mock-providerspec: type: mock model: mock-modelConfigure responses in providers/mock-responses.yaml. See Mock Provider Usage.
6. Self-Play Testing
Section titled “6. Self-Play Testing”Define AI personas to automatically generate user messages in conversations:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Personametadata: name: frustrated-customerspec: id: frustrated-customer description: A frustrated customer with a delayed order system_prompt: | You are a frustrated customer whose order hasn't arrived. Ask about delivery status and express your concerns. goals: - Get an update on order status - Express frustration appropriately constraints: - Keep messages to 1-2 sentences defaults: temperature: 0.8Then reference the persona in a scenario turn:
turns: - role: user content: "My order hasn't arrived yet." - role: gemini-user persona: frustrated-customer turns: 3 max_turns: 8Directory Structure
Section titled “Directory Structure”Recommended project layout for PromptArena tests:
my-project/├── arena.yaml # Main Arena configuration├── prompts/│ ├── support.yaml│ └── sales.yaml├── scenarios/│ ├── smoke-tests/│ └── regression/├── providers/│ ├── mock.yaml│ └── openai.yaml├── tools/│ └── weather.yaml└── out/ # Generated reports (add to .gitignore)Version Support
Section titled “Version Support”| PromptPack Version | PromptArena Support | Key Features |
|---|---|---|
| v1.0 | ✅ Full | Core specification |
| v1.1 | ✅ Full | Multimodal support (images, audio, video) |
| v1alpha1 | ✅ Full | Kubernetes-style resource format |
Learn More
Section titled “Learn More”PromptPack Specification
Section titled “PromptPack Specification”- PromptPack.org - Official specification
- GitHub Repository - Spec source and discussions
PromptArena Guides
Section titled “PromptArena Guides”- Writing Scenarios - Create effective test cases
- Assertions Reference - Complete assertion documentation
- Self-Play Testing - AI-driven testing with personas
- MCP Integration - Model Context Protocol servers
Examples
Section titled “Examples”examples/customer-support/- Basic support botexamples/arena-media-test/- Multimodal testingexamples/mcp-chatbot/- MCP server integration
Questions? Visit PromptPack.org or GitHub Discussions