Write Test Scenarios
Learn how to create and structure test scenarios for LLM testing.
Overview
Test scenarios define the conversation flows, expected behaviors, and validation criteria for your LLM applications. Each scenario is a YAML file in the PromptPack format.
Quick Start with Templates
The fastest way to create scenarios is to use the project generator:
# Create a new test project with sample scenario
promptarena init my-test --quick --provider mock
# This generates scenarios/basic-test.yaml for you
cd my-test
cat scenarios/basic-test.yaml
The generated scenario includes working examples of assertions, multi-turn conversations, and best practices. Use it as a starting point and customize it for your needs.
Manual Scenario Creation
If you prefer to create scenarios from scratch or need custom structures, create a file scenarios/basic-test.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: basic-customer-inquiry
  labels:
    category: customer-service
    priority: basic
spec:
  task_type: support  # Links to prompt configuration
  description: "Tests customer support responses"
  # The conversation turns
  turns:
    - role: user
      content: "What are your business hours?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Monday"]
            message: "Should mention business days"
        - params:
            max_seconds: 3
            message: "Should respond quickly"
    - role: user
      content: "Do you offer weekend support?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Saturday"]
            message: "Should mention weekend availability"
Key Components
Scenario Metadata
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: support-test
  labels:
    category: support
    region: us
  annotations:
    description: "Tests customer support responses"
Spec Section
The spec section contains your test configuration:
spec:
  task_type: support
  description: "Conversation flow test"
  context_metadata:
    urgency: high
    topic: billing
  turns:
    # Conversation turns go here
Conversation Turns
Define user messages and expected responses:
spec:
  turns:
    # Basic turn
    - role: user
      content: "Hello, I need help"
    # Turn with assertions
    - role: user
      content: "What's the refund policy?"
      assertions:
        - type: content_includes
          params:
            patterns: ["30 days"]
            message: "Should mention refund period"
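To make the assertion in the turn above concrete, here is a minimal Python sketch of how a `content_includes` check could be evaluated against a model response. It is an illustration only, not PromptArena's implementation: the function name and the case-insensitive, all-patterns-must-appear behavior are assumptions, so consult the Assertions Reference for the actual rules.

```python
def content_includes(response: str, patterns: list[str]) -> bool:
    """Return True when every pattern occurs in the response.

    Assumption: matching is a case-insensitive substring test and all
    patterns must be present. The real assertion may behave differently.
    """
    lowered = response.lower()
    return all(p.lower() in lowered for p in patterns)


reply = "You can request a refund within 30 days of purchase."
print(content_includes(reply, ["30 days"]))  # True
print(content_includes(reply, ["60 days"]))  # False
```

Thinking of an assertion as a small predicate over the response text helps when choosing patterns: the shorter and more essential the pattern, the less likely a harmless rewording will fail the test.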
Common Patterns
Multi-turn Conversations
Test conversation flow and context retention:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: multi-turn-support-dialog
spec:
  task_type: support
  turns:
    - role: user
      content: "I'm having issues with my account"
    - role: user
      content: "It won't let me log in"
      assertions:
        - type: content_includes
          params:
            patterns: ["password"]
            message: "Should offer password reset"
    - role: user
      content: "I already tried that"
      assertions:
        - type: content_includes
          params:
            patterns: ["help"]
            message: "Should provide alternative help"
Tool/Function Calling
Test tool integration:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: weather-query-with-tool
spec:
  task_type: assistant
  turns:
    - role: user
      content: "What's the weather in San Francisco?"
      assertions:
        - type: tools_called
          params:
            tools: ["get_weather"]
            message: "Should call weather tool"
        - type: content_includes
          params:
            patterns: ["temperature"]
            message: "Should mention temperature"
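The `tools_called` assertion above can be read as a set-membership check. The following Python sketch shows one plausible interpretation, assuming the assertion passes when every listed tool appears among the calls the model made, regardless of order or extra calls. The function name and that pass rule are assumptions, not the documented behavior.

```python
def tools_called(expected: list[str], actual_calls: list[str]) -> bool:
    """Return True when every expected tool name appears in actual_calls.

    Assumption: extra calls and call order are ignored.
    """
    return set(expected).issubset(actual_calls)


print(tools_called(["get_weather"], ["get_weather"]))            # True
print(tools_called(["get_weather"], ["geocode", "get_weather"]))  # True
print(tools_called(["get_weather"], []))                         # False
```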
Advanced Features
Fixtures
Reuse common data across scenarios:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: account-specific-test
spec:
  task_type: support
  fixtures:
    sample_user:
      name: "Jane Smith"
      account_id: "12345"
      plan: "Premium"
  system_template: |
    You are helping user {{fixtures.sample_user.name}}
    (Account: {{fixtures.sample_user.account_id}}).
  turns:
    - role: user
      content: "What's my current plan?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Premium"]
            message: "Should reference user's Premium plan"
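The `{{fixtures.sample_user.name}}` placeholders follow a dotted-path lookup into the fixture data. As a rough mental model, the Python sketch below resolves such placeholders against a nested dictionary. The `render` helper is hypothetical and only approximates what a template engine does; the engine PromptArena actually uses may support more syntax and handle missing keys differently.

```python
import re


def render(template: str, context: dict) -> str:
    """Replace {{dotted.path}} placeholders with values from context."""

    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for an unknown name
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


context = {
    "fixtures": {
        "sample_user": {"name": "Jane Smith", "account_id": "12345", "plan": "Premium"}
    }
}
template = (
    "You are helping user {{fixtures.sample_user.name}} "
    "(Account: {{fixtures.sample_user.account_id}})."
)
print(render(template, context))
# You are helping user Jane Smith (Account: 12345).
```

In this sketch a placeholder that names a fixture key that does not exist raises `KeyError`, which is a useful reminder to keep placeholder names and fixture keys in sync.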
Conditional Assertions
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: order-status-check
spec:
  task_type: support
  turns:
    - role: user
      content: "Check my order status"
      assertions:
        - type: content_includes
          params:
            patterns: ["shipped"]
            message: "Should mention shipped status"
Multiple Providers
Test the same scenario across different models:
# In arena.yaml
scenarios:
  - path: ./scenarios/cross-provider.yaml
    providers: [openai-gpt4, claude-sonnet, gemini-pro]
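Conceptually, listing several providers for one scenario produces a run matrix. The Python sketch below enumerates that matrix, assuming each scenario is executed once per listed provider; that one-run-per-pair rule is an assumption for illustration, and the provider names are simply the ones from the configuration above.

```python
from itertools import product

scenarios = ["./scenarios/cross-provider.yaml"]
providers = ["openai-gpt4", "claude-sonnet", "gemini-pro"]

# One run per (scenario, provider) pair.
runs = list(product(scenarios, providers))
for scenario, provider in runs:
    print(f"{provider}: {scenario}")
# openai-gpt4: ./scenarios/cross-provider.yaml
# claude-sonnet: ./scenarios/cross-provider.yaml
# gemini-pro: ./scenarios/cross-provider.yaml
```

The number of runs grows as scenarios times providers, which is worth keeping in mind when estimating cost and run time.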
Best Practices
1. Use Descriptive Names
# Good
- name: "Escalation: Frustrated customer with billing issue"
# Avoid
- name: "Test 1"
2. Tag Appropriately
tags: [billing, high-priority, multi-turn, tool-calling]
Useful for selective test runs:
# Run only high-priority tests
promptarena run --scenario high-priority
3. Balance Specificity
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: balance-check-flexible
spec:
  task_type: support
  turns:
    # Too specific (brittle)
    - role: user
      content: "What's my account balance?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Your account balance is exactly $42.00"]
            message: "Exact match - too brittle"
    # Better (flexible)
    - role: user
      content: "What's my account balance?"
      assertions:
        - type: content_includes
          params:
            patterns: ["account balance"]
            message: "Should mention balance"
        - type: content_matches
          params:
            pattern: '\$\d+\.\d{2}'
            message: "Should include dollar amount"
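The difference between the brittle and the flexible assertions is easy to see by running both checks over a few plausible responses. The Python sketch below does that with plain substring and regular-expression tests; the helper names and sample responses are made up for illustration, and PromptArena's own matching may differ in details such as case sensitivity.

```python
import re

# Same pattern as the content_matches assertion above.
DOLLAR_AMOUNT = re.compile(r"\$\d+\.\d{2}")
BRITTLE = "Your account balance is exactly $42.00"


def brittle_check(text: str) -> bool:
    """Pass only when the exact sentence appears."""
    return BRITTLE in text


def flexible_check(text: str) -> bool:
    """Pass when the topic is mentioned and some dollar amount is present."""
    return "account balance" in text and DOLLAR_AMOUNT.search(text) is not None


responses = [
    "Your account balance is exactly $42.00",
    "Your current account balance is $1305.17.",
    "I can't look that up right now.",
]
for text in responses:
    print(f"brittle={brittle_check(text)} flexible={flexible_check(text)}")
# brittle=True flexible=True
# brittle=False flexible=True
# brittle=False flexible=False
```

One caveat: `\$\d+\.\d{2}` does not match amounts written with thousands separators, such as `$1,305.17`. If your application formats amounts that way, a pattern like `\$[\d,]+\.\d{2}` is a closer fit.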
4. Test Edge Cases
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: edge-case-testing
spec:
  task_type: support
  fixtures:
    long_text: "Very long text here..."  # 10,000 chars
  turns:
    - role: user
      content: ""
      assertions:
        - type: content_includes
          params:
            patterns: ["help"]
            message: "Should handle empty input gracefully"
    - role: user
      content: "{{fixtures.long_text}}"
      assertions:
        - type: content_not_empty
          params:
            message: "Should respond to very long input"
File Organization
Structure your scenarios for maintainability:
scenarios/
├── customer-support/
│   ├── basic-inquiries.yaml
│   ├── billing-issues.yaml
│   └── escalations.yaml
├── content-generation/
│   ├── blog-posts.yaml
│   └── social-media.yaml
└── tool-integration/
    ├── weather.yaml
    └── database.yaml
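With a layout like this, a short script can list which scenario files exist in each group, for example as a sanity check before a test run. The Python sketch below uses only the standard library; `find_scenarios` is a hypothetical helper and is not part of PromptArena.

```python
from pathlib import Path


def find_scenarios(root: Path) -> dict[str, list[str]]:
    """Group *.yaml files under root by the name of their parent directory."""
    groups: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.yaml")):
        groups.setdefault(path.parent.name, []).append(path.name)
    return groups


if __name__ == "__main__":
    # Prints one line per directory, e.g. "customer-support: basic-inquiries.yaml, ..."
    for group, files in find_scenarios(Path("scenarios")).items():
        print(f"{group}: {', '.join(files)}")
```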
Examples
See complete working examples in the examples directory.
Next Steps
- Configure Providers - Set up LLM providers
- Validate Outputs - Use assertions and validators
- Scenario Format Reference - Complete format specification
Related Documentation
- Assertions Reference - All available assertion types
- Validators Reference - Custom validation logic
- Tutorial: First Test - Step-by-step guide