Validate Outputs
Learn how to use assertions and validators to verify LLM responses.
Overview
PromptArena provides built-in assertions and custom validators to verify that LLM responses meet your quality requirements.
Built-in Assertions
Content Assertions
Contains
Check if response includes specific text:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: business-hours-check

spec:
  turns:
    - role: user
      content: "What are your business hours?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Monday"]
            message: "Should mention Monday"

        - type: content_includes
          params:
            patterns: ["9 AM"]
            message: "Should include opening time"

Regex Match
Pattern matching:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: email-validation

spec:
  turns:
    - role: user
      content: "What's the support email?"
      assertions:
        - type: content_matches
          params:
            pattern: '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}'
            message: "Should contain valid email"

Negative Pattern Matching
Ensure specific content is absent using negative lookahead:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: product-description

spec:
  turns:
    - role: user
      content: "Describe our product"
      assertions:
        - type: content_matches
          params:
            pattern: "^(?!.*competitor).*$"
            message: "Should not mention competitors"

Structural Assertions
JSON Structure

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: json-validation

spec:
  turns:
    - role: user
      content: "Return user data as JSON"
      assertions:
        - type: is_valid_json
          params:
            message: "Should return valid JSON"

        - type: json_schema
          params:
            schema:
              type: object
              required: [name, email]
              properties:
                name:
                  type: string
                email:
                  type: string
            message: "Should match user schema"

Behavioral Assertions
Tool Calling

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: weather-tool-check

spec:
  turns:
    - role: user
      content: "What's the weather in Paris?"
      assertions:
        - type: tools_called
          params:
            tools: ["get_weather"]
            message: "Should call weather tool"

  # Conversation-level assertion to check tool arguments
  conversation_assertions:
    - type: tool_calls_with_args
      params:
        tool: "get_weather"
        expected_args:
          location: "Paris"
        message: "Should pass Paris as location"

Context Retention
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: context-memory

spec:
  turns:
    - role: user
      content: "My name is Alice"

    - role: user
      content: "What's my name?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Alice"]
            message: "Should remember user's name"

Assertion Combinations
AND Logic (All must pass)

turns:
  - role: user
    content: "Provide customer support response"
    assertions:
      - type: content_includes
        params:
          patterns: ["thank you", "help"]
          message: "Must be helpful"
      - type: llm_judge
        params:
          criteria: "Response has positive sentiment"
          judge_provider: "openai/gpt-4o-mini"
          message: "Must be positive"
      - type: content_matches
        params:
          pattern: "^.{1,500}$"
          message: "Must be under 500 characters"
      # All assertions must pass

Conditional Assertions
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: order-status-conditional

spec:
  turns:
    - role: user
      content: "Check order status"
      assertions:
        # Always validate
        - type: content_includes
          params:
            patterns: ["order"]
            message: "Should mention order"

        # Additional checks based on order status
        - type: content_includes
          params:
            patterns: ["shipped"]
            message: "Should mention shipping if shipped"

Testing Strategies
Progressive Validation
Start with basic assertions, then add complexity:

# Level 1: Basic structure
- type: content_matches
  params:
    pattern: ".+"
    message: "Must not be empty"

# Level 2: Content presence
- type: content_includes
  params:
    patterns: ["customer service"]
    message: "Must mention customer service"

# Level 3: Quality checks
- type: llm_judge
  params:
    criteria: "Response has positive sentiment"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must be positive"
- type: llm_judge
  params:
    criteria: "Response maintains professional tone"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must be professional"

# Level 4: Custom business logic
- type: llm_judge
  params:
    criteria: "Response complies with brand guidelines"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must meet brand compliance"

# Level 5: External evaluation service
- type: rest_eval
  params:
    url: "https://eval-service.example.com/evaluate"
    headers:
      Authorization: "Bearer ${EVAL_API_KEY}"
    criteria: "Response meets compliance requirements"
    min_score: 0.9
    message: "External compliance check"

Quality Gates
Define must-pass criteria:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: critical-path-test

spec:
  task_type: critical

  turns:
    - role: user
      content: "Important customer query"
      assertions:
        - type: content_includes
          params:
            patterns: ["critical terms"]
            message: "Must include critical terms"

Regression Testing
Track quality over time:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: baseline-quality-check

spec:
  turns:
    - role: user
      content: "Standard query"
      assertions:
        - type: llm_judge
          params:
            criteria: "Response quality is above baseline expectations"
            judge_provider: "openai/gpt-4o-mini"
            message: "Quality should be above baseline"

Output Reports
View validation results:

# JSON report with detailed assertion results
promptarena run --format json

# HTML report with visual pass/fail
promptarena run --format html

# JUnit XML for CI integration
promptarena run --format junit

Example JSON output:

{
  "test_case": "Customer Support Response",
  "turn": 1,
  "assertions": [
    {
      "type": "content_includes",
      "expected": "thank you",
      "passed": true
    },
    {
      "type": "llm_judge",
      "expected": "pass",
      "actual": "pass",
      "passed": true
    }
  ],
  "overall_pass": true
}

Best Practices
1. Layer Assertions

# Structure first
- type: is_valid_json
  params:
    message: "Must be valid JSON"
- type: content_matches
  params:
    pattern: ".+"
    message: "Must not be empty"

# Then content
- type: content_includes
  params:
    patterns: ["expected data"]
    message: "Must contain expected data"

# Finally quality
- type: llm_judge
  params:
    criteria: "Response follows business rules and policies"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must follow business rules"

2. Balance Strictness
# Too strict (brittle)
- type: content_matches
  params:
    pattern: "^Thank you for contacting AcmeCorp support\\.$"
    message: "Exact match required"

# Better (flexible)
- type: content_includes
  params:
    patterns: ["thank", "AcmeCorp", "support"]
    message: "Must acknowledge support contact"
- type: llm_judge
  params:
    criteria: "Response has positive sentiment"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must be positive"

3. Meaningful Error Messages
assertions:
  - type: content_includes
    params:
      patterns: ["30 days"]
      message: "Refund responses must mention 30-day policy"

4. Test Validators
# Run with verbose output to debug validators
promptarena run --verbose --scenario validator-test

Next Steps
- Checks Reference — All check types and parameters
- Integrate CI/CD — Automate validation in pipelines
- Assertions Reference — Assertion syntax and configuration
- Guardrails Reference — Runtime policy enforcement
- Unified Check Model — How assertions, guardrails, and evals relate
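
Ahead of a full CI/CD integration, the JSON report shown under Output Reports can be gated with a short script. The following is a hedged Python sketch, not part of PromptArena: it assumes each result object carries the `assertions` and `overall_pass` fields seen in the example output, and the names `failed_assertions` and `gate` are hypothetical. How the real report file is laid out (a single object, a list, or something nested) is not specified in this guide, so the loading step will likely need adapting.

```python
import json

# Sample result copied from the "Example JSON output" in this guide.
SAMPLE_RESULT = """
{
  "test_case": "Customer Support Response",
  "turn": 1,
  "assertions": [
    {"type": "content_includes", "expected": "thank you", "passed": true},
    {"type": "llm_judge", "expected": "pass", "actual": "pass", "passed": true}
  ],
  "overall_pass": true
}
"""


def failed_assertions(result: dict) -> list:
    """Return the types of all assertions whose `passed` flag is not true."""
    return [
        a.get("type", "<unknown>")
        for a in result.get("assertions", [])
        if not a.get("passed", False)
    ]


def gate(result: dict) -> int:
    """Return a process exit code: 0 when everything passed, 1 otherwise."""
    if failed_assertions(result) or not result.get("overall_pass", False):
        return 1
    return 0


if __name__ == "__main__":
    result = json.loads(SAMPLE_RESULT)
    for name in failed_assertions(result):
        print(f"FAILED: {name}")
    print("exit code:", gate(result))
```

In a pipeline, the value returned by `gate` could be passed to `sys.exit` so that a failed assertion fails the build. The JUnit format may make this unnecessary if the CI system already parses JUnit XML.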
Examples
See validation examples:
- examples/assertions-test/ — All assertion types
- examples/guardrails-test/ — Guardrail patterns
- examples/customer-support/ — Real-world validation patterns
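
Before committing a `content_matches` pattern to a scenario, it can help to try it against a few sample responses locally. The sketch below does this in Python for three patterns used in this guide. It is only an approximation: it uses Python's `re` module, and PromptArena's regex engine may differ (some engines, for example, do not support lookahead), so treat the results as indicative. The helper name `pattern_matches` is hypothetical.

```python
import re

# Patterns copied verbatim from the scenarios in this guide.
EMAIL = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
NO_COMPETITOR = r"^(?!.*competitor).*$"
MAX_500_CHARS = r"^.{1,500}$"


def pattern_matches(pattern: str, response: str) -> bool:
    """Return True if `pattern` is found in `response` (Python `re` semantics)."""
    return re.search(pattern, response) is not None


if __name__ == "__main__":
    samples = [
        (EMAIL, "Reach us at support@example.com."),
        (NO_COMPETITOR, "Our product is fast and reliable."),
        (NO_COMPETITOR, "Unlike competitor X, we are fast."),
        (MAX_500_CHARS, "Line one.\nLine two."),
    ]
    for pattern, text in samples:
        print(f"{pattern!r} on {text!r}: {pattern_matches(pattern, text)}")
```

One thing this surfaces: in Python, `.` does not match a newline by default, so the length pattern rejects a short two-line response. Whether PromptArena's engine behaves the same way is worth confirming against a real run before relying on such patterns.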