Write Test Scenarios
Learn how to create and structure test scenarios for LLM testing.
Overview
Test scenarios define the conversation flows, expected behaviors, and validation criteria for your LLM applications. Each scenario is a YAML file in the PromptPack format.
Quick Start with Templates
The fastest way to create scenarios is with the project generator:
```sh
# Create a new test project with a sample scenario
promptarena init my-test --quick --provider mock

# This generates scenarios/basic-test.yaml for you
cd my-test
cat scenarios/basic-test.yaml
```

The generated scenario includes working examples of assertions, multi-turn conversations, and best practices. Use it as a starting point and customize it for your needs.
Manual Scenario Creation
If you prefer to create scenarios from scratch or need a custom structure, create a file `scenarios/basic-test.yaml`:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: basic-customer-inquiry
  labels:
    category: customer-service
    priority: basic

spec:
  task_type: support  # Links to prompt configuration
  description: "Tests customer support responses"

  # The conversation turns
  turns:
    - role: user
      content: "What are your business hours?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Monday"]
          message: "Should mention business days"
        - type: # fill in a response-time assertion type (see the Assertions Reference)
          params:
            max_seconds: 3
          message: "Should respond quickly"

    - role: user
      content: "Do you offer weekend support?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Saturday"]
          message: "Should mention weekend availability"
```

Key Components
Scenario Metadata
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: support-test
  labels:
    category: support
    region: us
  annotations:
    description: "Tests customer support responses"
```

Spec Section
The spec section contains your test configuration:
```yaml
spec:
  task_type: support
  description: "Conversation flow test"

  context_metadata:
    urgency: high
    topic: billing

  turns:
    # Conversation turns go here
```

Required Capabilities
Use `required_capabilities` to ensure a scenario only runs against providers that support specific features:
```yaml
spec:
  task_type: capability-test
  description: "Test vision with streaming"
  required_capabilities:
    - vision
    - streaming
  streaming: true

  turns:
    - role: user
      parts:
        - type: text
          text: "Describe this image"
        - type: image
          media:
            url: "https://example.com/image.png"
            mime_type: "image/png"
```

Available capabilities: `text`, `streaming`, `vision`, `tools`, `json`, `audio`, `video`, `documents`, `duplex`
This scenario will only run against providers that have both vision AND streaming in their capabilities list.
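Capability matching works against each provider's declared capabilities. A minimal sketch of what such a provider entry might look like in `arena.yaml` (the field names here are illustrative, not the confirmed schema; see Configure Providers for the exact format):

```yaml
# arena.yaml (illustrative sketch only)
providers:
  - name: vision-capable-model
    capabilities: [text, streaming, vision]
```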
Tool Policy
Use `tool_policy` to control tool-calling behavior:
```yaml
spec:
  task_type: test
  description: "Text-only test without tools"
  tool_policy:
    tool_choice: none  # Disable tool calling
    max_tool_calls_per_turn: 0
    max_total_tool_calls: 0

  turns:
    - role: user
      content: "Just respond with text, no tools"
```

Tool Choice Options:
- `none` - Disable tool calling entirely
- `auto` - Let the model decide (default)
- `required` - Force tool use
- `<tool_name>` - Force a specific tool
Example - Require tool use:
```yaml
spec:
  tool_policy:
    tool_choice: required
    max_tool_calls_per_turn: 3
    max_total_tool_calls: 10
```

This is useful when:
- Testing scenarios that should NOT use tools (text, streaming, vision tests)
- Forcing tool usage for tool-specific tests
- Limiting tool calls to prevent runaway loops
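For the `<tool_name>` form, set `tool_choice` to the tool's name. A sketch using the `get_weather` tool from the tool-calling example later in this page (the tool itself must be available in your prompt configuration):

```yaml
spec:
  tool_policy:
    tool_choice: get_weather   # Force this specific tool
    max_total_tool_calls: 1    # Stop after a single call
```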
Conversation Turns
Define user messages and expected responses:
```yaml
spec:
  turns:
    # Basic turn
    - role: user
      content: "Hello, I need help"

    # Turn with assertions
    - role: user
      content: "What's the refund policy?"
      assertions:
        - type: content_includes
          params:
            patterns: ["30 days"]
          message: "Should mention refund period"
```

Common Patterns
Multi-turn Conversations
Test conversation flow and context retention:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: multi-turn-support-dialog

spec:
  task_type: support

  turns:
    - role: user
      content: "I'm having issues with my account"

    - role: user
      content: "It won't let me log in"
      assertions:
        - type: content_includes
          params:
            patterns: ["password"]
          message: "Should offer password reset"

    - role: user
      content: "I already tried that"
      assertions:
        - type: content_includes
          params:
            patterns: ["help"]
          message: "Should provide alternative help"
```

Tool/Function Calling
Test tool integration:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: weather-query-with-tool

spec:
  task_type: assistant

  turns:
    - role: user
      content: "What's the weather in San Francisco?"
      assertions:
        - type: tools_called
          params:
            tools: ["get_weather"]
          message: "Should call weather tool"
        - type: content_includes
          params:
            patterns: ["temperature"]
          message: "Should mention temperature"
```

Advanced Features
Fixtures
Reuse common data across scenarios:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: account-specific-test

spec:
  task_type: support

  fixtures:
    sample_user:
      name: "Jane Smith"
      account_id: "12345"
      plan: "Premium"

  system_template: |
    You are helping user {{fixtures.sample_user.name}}
    (Account: {{fixtures.sample_user.account_id}}).

  turns:
    - role: user
      content: "What's my current plan?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Premium"]
          message: "Should reference user's Premium plan"
```

Conditional Assertions
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: order-status-check

spec:
  task_type: support

  turns:
    - role: user
      content: "Check my order status"
      assertions:
        - type: content_includes
          params:
            patterns: ["shipped"]
          message: "Should mention shipped status"
```

Multiple Providers
Test the same scenario across different models:
```yaml
# In arena.yaml
scenarios:
  - path: ./scenarios/cross-provider.yaml
    providers: [openai-gpt4, claude-sonnet, gemini-pro]
```

Best Practices
1. Use Descriptive Names
```yaml
# Good
- name: "Escalation: Frustrated customer with billing issue"

# Avoid
- name: "Test 1"
```

2. Tag Appropriately
```yaml
tags: [billing, high-priority, multi-turn, tool-calling]
```

Useful for selective test runs:
```sh
# Run only high-priority tests
promptarena run --scenario high-priority
```

3. Balance Specificity
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: balance-check-flexible

spec:
  task_type: support

  turns:
    # Too specific (brittle)
    - role: user
      content: "What's my account balance?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Your account balance is exactly $42.00"]
          message: "Exact match - too brittle"

    # Better (flexible)
    - role: user
      content: "What's my account balance?"
      assertions:
        - type: content_includes
          params:
            patterns: ["account balance"]
          message: "Should mention balance"
        - type: content_matches
          params:
            pattern: '\$\d+\.\d{2}'
          message: "Should include dollar amount"
```

4. Test Edge Cases
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: edge-case-testing

spec:
  task_type: support

  fixtures:
    long_text: "Very long text here..."  # 10,000 chars

  turns:
    - role: user
      content: ""
      assertions:
        - type: content_includes
          params:
            patterns: ["help"]
          message: "Should handle empty input gracefully"

    - role: user
      content: "{{fixtures.long_text}}"
      assertions:
        - type: content_not_empty
          message: "Should respond to very long input"
```

File Organization
Structure your scenarios for maintainability:
```
scenarios/
├── customer-support/
│   ├── basic-inquiries.yaml
│   ├── billing-issues.yaml
│   └── escalations.yaml
├── content-generation/
│   ├── blog-posts.yaml
│   └── social-media.yaml
└── tool-integration/
    ├── weather.yaml
    └── database.yaml
```

Examples
See complete working examples in the examples directory.
Next Steps
- Configure Providers - Set up LLM providers
- Validate Outputs - Use assertions and validators
- Scenario Format Reference - Complete format specification
Related Documentation
- Assertions Reference - All available assertion types
- Validators Reference - Custom validation logic
- Tutorial: First Test - Step-by-step guide