
Test Scenario Format

PromptPack is an open-source specification for defining LLM prompts, test scenarios, and configurations in a portable, version-controllable format.

For the complete PromptPack specification, visit PromptPack.org.
PromptArena is a reference implementation and testing tool for PromptPack files.

  • ✅ PromptPack v1.1 with multimodal support (images, audio, video)
  • ✅ Kubernetes-style YAML resources: Arena, PromptConfig, Scenario, Provider, Tool, Persona
  • ✅ Multi-provider testing: OpenAI, Anthropic, Google Gemini, Azure, Bedrock, and Mock
  • ✅ MCP (Model Context Protocol) server integration
  • ✅ Comprehensive assertion framework for validation
  • ✅ HTML, JSON, and Markdown output formats
# Run a test scenario
promptarena run examples/arena-media-test/arena.yaml
# Test across multiple providers
promptarena run arena.yaml --provider openai,anthropic --format html

In addition to implementing the PromptPack specification, PromptArena adds these testing-focused features:

The Arena resource orchestrates testing across multiple prompts, providers, and scenarios:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: my-test-suite
spec:
  prompt_configs:
    - id: support
      file: prompts/support-bot.yaml
  providers:
    - file: providers/openai-gpt4o.yaml
    - file: providers/claude-sonnet.yaml
  scenarios:
    - file: scenarios/test-1.yaml
  # MCP server integration
  mcp_servers:
    filesystem:
      command: npx
      args: ["@modelcontextprotocol/server-filesystem", "/data"]
  defaults:
    output:
      dir: out
      formats: ["html", "json"]

PromptArena extends standard assertions with testing-specific validators:

# Turn-level assertions
assertions:
  # Content validation
  - type: content_includes
  - type: content_matches
  # Tool usage validation
  - type: tools_called
  - type: tools_not_called
  # JSON validation
  - type: is_valid_json
  - type: json_schema
  - type: json_path
  # Multimodal validation
  - type: image_format
  - type: image_dimensions
  - type: audio_format
  - type: audio_duration
  - type: video_resolution
  - type: video_duration
  # Workflow assertions
  - type: state_is
  - type: transitioned_to
  - type: workflow_complete
  # LLM judge
  - type: llm_judge
  # External evals
  - type: rest_eval
  - type: a2a_eval

# Conversation-level assertions (in the conversation_assertions field)
conversation_assertions:
  - type: tools_called
  - type: tools_not_called
  - type: tool_calls_with_args
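
Most assertion types accept a params block on a scenario turn. As a minimal sketch (the turn content and patterns here are illustrative, not from the specification), a parameterized content_includes assertion might look like:

```yaml
# Illustrative scenario turn with a parameterized turn-level assertion
turns:
  - role: user
    content: "What's your refund policy?"
    assertions:
      - type: content_includes
        params:
          patterns: ["refund"]
        message: "Reply should mention the refund policy"
```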

See the Assertions Guide for complete documentation.

PromptArena supports testing workflow-based packs with step-by-step scenario execution. Workflow scenarios use input steps to send user messages; state transitions are LLM-initiated via the workflow__transition tool call rather than scripted in the scenario:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: WorkflowScenario
metadata:
  name: support-escalation
spec:
  id: support-escalation
  pack: ./support.pack.json
  description: "Test customer support escalation flow"
  variables:
    company_name: "Acme Corp"
  context_carry_forward: true
  steps:
    # Step 1: Send a message in the initial state
    - type: input
      content: "I need help with my billing"
      assertions:
        - type: state_is
          params:
            state: "intake"
          message: "Should be in intake state"
    # Step 2: The LLM should decide to escalate
    - type: input
      content: "My invoice shows a duplicate charge of $49.99"
      assertions:
        - type: transitioned_to
          params:
            state: "specialist"
          message: "Should have transitioned to specialist"
        - type: content_includes
          params:
            patterns: ["invoice"]
    # Step 3: Resolve and verify completion
    - type: input
      content: "Thank you, the refund looks correct!"
      assertions:
        - type: workflow_complete
          message: "Workflow should be complete"

How transitions work: The LLM calls the workflow__transition tool with an event (matching the state machine’s defined transitions) and a context string that carries forward relevant information to the next state. The driver processes the transition internally and makes the context available via {{workflow_context}} in the new state’s system prompt.
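
As a sketch of what that looks like in practice (the event name and context string below are hypothetical; the event must match a transition defined in the pack's state machine), the LLM's tool call for the escalation step above might carry:

```yaml
# Hypothetical workflow__transition tool call emitted by the LLM
tool_call:
  name: workflow__transition
  arguments:
    event: escalate_to_specialist   # assumed event name, defined by the pack
    context: "Customer reports a duplicate $49.99 charge on their invoice"
```

The context string is what the driver exposes as {{workflow_context}} in the next state's system prompt.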

Workflow Assertions (available in input step assertions):

  • state_is — Checks current workflow state
  • transitioned_to — Checks if a state was visited in the transition history
  • workflow_complete — Checks if the workflow reached a terminal state

See the Assertions Reference for full details.

PromptArena implements PromptPack v1.1 multimodal support with comprehensive testing capabilities:

# In PromptConfig
spec:
  media:
    enabled: true
    supported_types: [image, audio, video, document]
    image:
      max_size_mb: 20
      allowed_formats: [jpeg, png, webp]
    document:
      max_size_mb: 32
      allowed_formats: [pdf]

# In Scenario
turns:
  - role: user
    content:
      - type: text
        patterns: ["What's in this image?"]
      - type: image
        image_url:
          url: "path/to/image.jpg"
          detail: "high"
      - type: document
        document_url:
          url: "path/to/document.pdf"

See examples/arena-media-test/ and examples/document-analysis/ for complete examples.

Test without API costs using the Mock provider with configurable responses:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: mock-provider
spec:
  type: mock
  model: mock-model

Configure responses in providers/mock-responses.yaml. See Mock Provider Usage.

Define AI personas to automatically generate user messages in conversations:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Persona
metadata:
  name: frustrated-customer
spec:
  id: frustrated-customer
  description: A frustrated customer with a delayed order
  system_prompt: |
    You are a frustrated customer whose order hasn't arrived.
    Ask about delivery status and express your concerns.
  goals:
    - Get an update on order status
    - Express frustration appropriately
  constraints:
    - Keep messages to 1-2 sentences
  defaults:
    temperature: 0.8

Then reference the persona in a scenario turn:

turns:
  - role: user
    content: "My order hasn't arrived yet."
  - role: gemini-user
    persona: frustrated-customer
    turns: 3
    max_turns: 8

Recommended project layout for PromptArena tests:

my-project/
├── arena.yaml # Main Arena configuration
├── prompts/
│ ├── support.yaml
│ └── sales.yaml
├── scenarios/
│ ├── smoke-tests/
│ └── regression/
├── providers/
│ ├── mock.yaml
│ └── openai.yaml
├── tools/
│ └── weather.yaml
└── out/ # Generated reports (add to .gitignore)

PromptPack Version | PromptArena Support | Key Features
v1.0               | ✅ Full             | Core specification
v1.1               | ✅ Full             | Multimodal support (images, audio, video)
v1alpha1           | ✅ Full             | Kubernetes-style resource format


Questions? Visit PromptPack.org or GitHub Discussions