
Test Scenario Format

PromptPack is an open-source specification for defining LLM prompts, test scenarios, and configurations in a portable, version-controllable format.

For the complete PromptPack specification, visit PromptPack.org.
PromptArena is a reference implementation and testing tool for PromptPack files.

  • ✅ PromptPack v1.1 with multimodal support (images, audio, video)
  • ✅ Kubernetes-style YAML resources: Arena, PromptConfig, Scenario, Provider, Tool, Persona
  • ✅ Multi-provider testing: OpenAI, Anthropic, Google Gemini, Azure, Bedrock, and Mock
  • ✅ MCP (Model Context Protocol) server integration
  • ✅ Comprehensive assertion framework for validation
  • ✅ HTML, JSON, and Markdown output formats
# Run a test scenario
promptarena run examples/arena-media-test/arena.yaml
# Test across multiple providers
promptarena run arena.yaml --provider openai,anthropic --format html

In addition to implementing the PromptPack specification, PromptArena adds these testing-focused features:

The Arena resource orchestrates testing across multiple prompts, providers, and scenarios:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: my-test-suite
spec:
  prompt_configs:
    - id: support
      file: prompts/support-bot.yaml
  providers:
    - file: providers/openai-gpt4o.yaml
    - file: providers/claude-sonnet.yaml
  scenarios:
    - file: scenarios/test-1.yaml
  # MCP server integration
  mcp_servers:
    filesystem:
      command: npx
      args: ["@modelcontextprotocol/server-filesystem", "/data"]
  defaults:
    output:
      dir: out
      formats: ["html", "json"]

PromptArena extends standard assertions with testing-specific validators:

# Turn-level assertions
assertions:
  # Content validation
  - type: content_includes
  - type: content_matches
  # Tool usage validation
  - type: tools_called
  - type: tools_not_called
  # JSON validation
  - type: is_valid_json
  - type: json_schema
  - type: json_path
  # Multimodal validation
  - type: image_format
  - type: image_dimensions
  - type: audio_format
  - type: audio_duration
  - type: video_resolution
  - type: video_duration
  # Workflow assertions
  - type: state_is
  - type: transitioned_to
  - type: workflow_complete
  # LLM judge
  - type: llm_judge
  # External evals
  - type: rest_eval
  - type: a2a_eval

# Conversation-level assertions (in the conversation_assertions field)
conversation_assertions:
  - type: tools_called
  - type: tools_not_called
  - type: tool_calls_with_args
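
Most assertion types accept a params block on a scenario turn. As a minimal sketch (the turn content and patterns here are illustrative, not from the specification), a parameterized content_includes assertion might look like:

```yaml
# Illustrative scenario turn with a parameterized turn-level assertion
turns:
  - role: user
    content: "What's your refund policy?"
    assertions:
      - type: content_includes
        params:
          patterns: ["refund"]
        message: "Reply should mention the refund policy"
```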

See the Assertions Guide for complete documentation.

PromptArena supports testing workflow-based packs with step-by-step scenario execution. Workflow scenarios use input steps to send user messages; state transitions are LLM-initiated via the workflow__transition tool call rather than scripted in the scenario:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: WorkflowScenario
metadata:
  name: support-escalation
spec:
  id: support-escalation
  pack: ./support.pack.json
  description: "Test customer support escalation flow"
  variables:
    company_name: "Acme Corp"
  context_carry_forward: true
  steps:
    # Step 1: Send a message in the initial state
    - type: input
      content: "I need help with my billing"
      assertions:
        - type: state_is
          params:
            state: "intake"
          message: "Should be in intake state"
    # Step 2: The LLM should decide to escalate
    - type: input
      content: "My invoice shows a duplicate charge of $49.99"
      assertions:
        - type: transitioned_to
          params:
            state: "specialist"
          message: "Should have transitioned to specialist"
        - type: content_includes
          params:
            patterns: ["invoice"]
    # Step 3: Resolve and verify completion
    - type: input
      content: "Thank you, the refund looks correct!"
      assertions:
        - type: workflow_complete
          message: "Workflow should be complete"

How transitions work: The LLM calls the workflow__transition tool with an event (matching the state machine’s defined transitions) and a context string that carries forward relevant information to the next state. The driver processes the transition internally and makes the context available via {{workflow_context}} in the new state’s system prompt.
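
As a sketch of what that looks like in practice (the event name and context string below are hypothetical; the event must match a transition defined in the pack's state machine), the LLM's tool call for the escalation step above might carry:

```yaml
# Hypothetical workflow__transition tool call emitted by the LLM
tool_call:
  name: workflow__transition
  arguments:
    event: escalate_to_specialist   # assumed event name, defined by the pack
    context: "Customer reports a duplicate $49.99 charge on their invoice"
```

The context string is what the driver exposes as {{workflow_context}} in the next state's system prompt.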

Workflow Assertions (available in input step assertions):

  • state_is — Checks current workflow state
  • transitioned_to — Checks if a state was visited in the transition history
  • workflow_complete — Checks if the workflow reached a terminal state

See the Assertions Reference for full details.

PromptArena implements PromptPack v1.1 multimodal support with comprehensive testing capabilities:

# In PromptConfig
spec:
  media:
    enabled: true
    supported_types: [image, audio, video, document]
    image:
      max_size_mb: 20
      allowed_formats: [jpeg, png, webp]
    document:
      max_size_mb: 32
      allowed_formats: [pdf]

# In Scenario
turns:
  - role: user
    content:
      - type: text
        patterns: ["What's in this image?"]
      - type: image
        image_url:
          url: "path/to/image.jpg"
          detail: "high"
      - type: document
        document_url:
          url: "path/to/document.pdf"

See examples/arena-media-test/ and examples/document-analysis/ for complete examples.

Test without API costs using the Mock provider with configurable responses:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: mock-provider
spec:
  type: mock
  model: mock-model

Configure responses in providers/mock-responses.yaml. See Mock Provider Usage.

Define AI personas to automatically generate user messages in conversations:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: frustrated-customer
spec:
  id: frustrated-customer
  description: A frustrated customer with a delayed order
  system_prompt: |
    You are a frustrated customer whose order hasn't arrived.
    Ask about delivery status and express your concerns.
  goals:
    - Get an update on order status
    - Express frustration appropriately
  constraints:
    - Keep messages to 1-2 sentences
  defaults:
    temperature: 0.8

Then reference the persona in a scenario turn:

turns:
  - role: user
    content: "My order hasn't arrived yet."
  - role: gemini-user
    persona: frustrated-customer
    turns: 3
    max_turns: 8

Recommended project layout for PromptArena tests:

my-project/
├── arena.yaml # Main Arena configuration
├── prompts/
│ ├── support.yaml
│ └── sales.yaml
├── scenarios/
│ ├── smoke-tests/
│ └── regression/
├── providers/
│ ├── mock.yaml
│ └── openai.yaml
├── tools/
│ └── weather.yaml
└── out/ # Generated reports (add to .gitignore)

PromptPack Version | PromptArena Support | Key Features
v1.0               | ✅ Full             | Core specification
v1.1               | ✅ Full             | Multimodal support (images, audio, video)
v1alpha1           | ✅ Full             | Kubernetes-style resource format


Questions? Visit PromptPack.org or GitHub Discussions