# Validate Outputs
Learn how to use assertions and validators to verify LLM responses.
## Overview

PromptArena provides built-in assertions and custom validators to verify that LLM responses meet your quality requirements.
## Built-in Assertions

### Content Assertions

#### Contains

Check if the response includes specific text:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: business-hours-check
spec:
  turns:
    - role: user
      content: "What are your business hours?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Monday"]
            message: "Should mention Monday"
        - type: content_includes
          params:
            patterns: ["9 AM"]
            message: "Should include opening time"
```
#### Regex Match

Pattern matching:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: email-validation
spec:
  turns:
    - role: user
      content: "What's the support email?"
      assertions:
        - type: content_matches
          params:
            pattern: '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}'
            message: "Should contain valid email"
```
#### Not Contains

Ensure specific content is absent using a negative lookahead:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: product-description
spec:
  turns:
    - role: user
      content: "Describe our product"
      assertions:
        - type: content_matches
          params:
            pattern: "^(?!.*competitor).*$"
            message: "Should not mention competitors"
```
### Structural Assertions

#### JSON Structure

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: json-validation
spec:
  turns:
    - role: user
      content: "Return user data as JSON"
      assertions:
        - type: is_valid_json
          params:
            message: "Should return valid JSON"
        - type: json_schema
          params:
            schema:
              type: object
              required: [name, email]
              properties:
                name:
                  type: string
                email:
                  type: string
            message: "Should match user schema"
```
### Behavioral Assertions

#### Tool Calling

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: weather-tool-check
spec:
  turns:
    - role: user
      content: "What's the weather in Paris?"
      assertions:
        - type: tools_called
          params:
            tools: ["get_weather"]
            message: "Should call weather tool"
  # Conversation-level assertion to check tool arguments
  conversation_assertions:
    - type: tool_calls_with_args
      params:
        tool: "get_weather"
        expected_args:
          location: "Paris"
        message: "Should pass Paris as location"
```
#### Context Retention

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: context-memory
spec:
  turns:
    - role: user
      content: "My name is Alice"
    - role: user
      content: "What's my name?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Alice"]
            message: "Should remember user's name"
```
## Custom Validators

Create custom validation logic for complex requirements.

### Validator File Structure

```yaml
# validators/custom-validators.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
  name: custom-validators
spec:
  type: validator
  validators:
    - name: check_pii_removal
      description: "Ensures no PII in responses"
      language: python
      script: |
        import re

        def validate(response, context):
            # Check for email addresses
            if re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', response):
                return False, "Email address found in response"

            # Check for phone numbers
            if re.search(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', response):
                return False, "Phone number found in response"

            # Check for SSN patterns
            if re.search(r'\b\d{3}-\d{2}-\d{4}\b', response):
                return False, "SSN pattern found in response"

            return True, "No PII detected"
```
### Use Custom Validators

```yaml
# arena.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: pii-testing-arena
spec:
  validators:
    - path: ./validators/custom-validators.yaml
  scenarios:
    - path: ./scenarios/pii-test.yaml
```

```yaml
# In scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: pii-test
spec:
  turns:
    - role: user
      content: "Tell me about user John Doe"
      assertions:
        - type: custom_validator
          params:
            validator: check_pii_removal
            message: "Should not contain PII"
```
## Advanced Validator Examples

### Brand Consistency

```yaml
validators:
  - name: brand_check
    type: script
    language: python
    script: |
      def validate(response, context):
          brand_terms = {
              "our company": "AcmeCorp",
              "our product": "SuperWidget",
          }

          for wrong, correct in brand_terms.items():
              if wrong.lower() in response.lower():
                  return False, f"Use '{correct}' instead of '{wrong}'"

          return True, "Brand terms correct"
```
### Factual Accuracy (with external data)

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
  name: fact-checker
spec:
  type: validator
  validators:
    - name: fact_check
      language: python
      script: |
        import json

        def validate(response, context):
            facts = context.get("known_facts", {})

            for key, value in facts.items():
                if key in response and str(value) not in response:
                    return False, f"Incorrect {key}: expected {value}"

            return True, "Facts verified"
```

```yaml
# Use in scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: fact-checking-test
spec:
  fixtures:
    known_facts:
      price: "$99"
      warranty: "2 years"
  turns:
    - role: user
      content: "What's the warranty period?"
      assertions:
        - type: custom_validator
          params:
            validator: fact_check
            message: "Facts should be accurate"
```
### Citation Validation

```yaml
validators:
  - name: check_citations
    type: script
    language: python
    script: |
      import re

      def validate(response, context):
          # Require citation format [Source: XYZ]
          citations = re.findall(r'\[Source: (.+?)\]', response)

          if not citations:
              return False, "No citations found"

          # Verify citations are in allowed sources
          allowed = context.get("allowed_sources", [])
          for cite in citations:
              if cite not in allowed:
                  return False, f"Invalid source: {cite}"

          return True, f"Found {len(citations)} valid citations"
```
## Assertion Combinations

### AND Logic (All must pass)

```yaml
turns:
  - role: user
    content: "Provide customer support response"
    assertions:
      - type: content_includes
        params:
          patterns: ["thank you", "help"]
          message: "Must be helpful"
      - type: llm_judge
        params:
          criteria: "Response has positive sentiment"
          judge_provider: "openai/gpt-4o-mini"
          message: "Must be positive"
      - type: content_matches
        params:
          pattern: "^.{1,500}$"
          message: "Must be under 500 characters"
# All assertions must pass
```
### Conditional Assertions

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: order-status-conditional
spec:
  turns:
    - role: user
      content: "Check order status"
      assertions:
        # Always validate
        - type: content_includes
          params:
            patterns: ["order"]
            message: "Should mention order"
        # Additional checks based on order status
        - type: content_includes
          params:
            patterns: ["shipped"]
            message: "Should mention shipping if shipped"
```
## Testing Strategies

### Progressive Validation

Start with basic assertions, then add complexity:

```yaml
# Level 1: Basic structure
- type: content_matches
  params:
    pattern: ".+"
    message: "Must not be empty"

# Level 2: Content presence
- type: content_includes
  params:
    patterns: ["customer service"]
    message: "Must mention customer service"

# Level 3: Quality checks
- type: llm_judge
  params:
    criteria: "Response has positive sentiment"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must be positive"
- type: llm_judge
  params:
    criteria: "Response maintains professional tone"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must be professional"

# Level 4: Custom business logic
- type: llm_judge
  params:
    criteria: "Response complies with brand guidelines"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must meet brand compliance"
```
### Quality Gates

Define must-pass criteria:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: critical-path-test
spec:
  task_type: critical
  turns:
    - role: user
      content: "Important customer query"
      assertions:
        - type: content_includes
          params:
            patterns: ["critical terms"]
            message: "Must include critical terms"
        - type: response_time
          params:
            seconds: 1
            message: "Must respond within 1 second"
        - type: custom_validator
          params:
            validator: safety_check
            message: "Must pass safety check"
```
### Regression Testing

Track quality over time:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: baseline-quality-check
spec:
  turns:
    - role: user
      content: "Standard query"
      assertions:
        - score: 0.85
          message: "Quality should be above 85%"
```
## Output Reports

View validation results:

```bash
# JSON report with detailed assertion results
promptarena run --format json

# HTML report with visual pass/fail
promptarena run --format html

# JUnit XML for CI integration
promptarena run --format junit
```

Example JSON output:

```json
{
  "test_case": "Customer Support Response",
  "turn": 1,
  "assertions": [
    { "type": "contains", "expected": "thank you", "passed": true },
    { "type": "sentiment", "expected": "positive", "actual": "positive", "passed": true },
    { "type": "response_time", "max_seconds": 2, "actual_seconds": 1.3, "passed": true }
  ],
  "overall_pass": true
}
```
## Best Practices

### 1. Layer Assertions

```yaml
# Structure first
- type: is_valid_json
  params:
    message: "Must be valid JSON"
- type: content_matches
  params:
    pattern: ".+"
    message: "Must not be empty"

# Then content
- type: content_includes
  params:
    patterns: ["expected data"]
    message: "Must contain expected data"

# Finally quality
- type: llm_judge
  params:
    criteria: "Response follows business rules and policies"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must follow business rules"
```
### 2. Balance Strictness

```yaml
# Too strict (brittle)
- type: content_matches
  params:
    pattern: "^Thank you for contacting AcmeCorp support\\.$"
    message: "Exact match required"

# Better (flexible)
- type: content_includes
  params:
    patterns: ["thank", "AcmeCorp", "support"]
    message: "Must acknowledge support contact"
- type: llm_judge
  params:
    criteria: "Response has positive sentiment"
    judge_provider: "openai/gpt-4o-mini"
    message: "Must be positive"
```
### 3. Meaningful Error Messages

```yaml
validators:
  - name: check_policy
    script: |
      def validate(response, context):
          if "refund" in response and "30 days" not in response:
              return False, "Refund responses must mention 30-day policy"
          return True, "Policy mentioned correctly"
```
### 4. Test Validators

```bash
# Run with verbose output to debug validators
promptarena run --verbose --scenario validator-test
```
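Validator logic can also be unit-tested outside PromptArena by keeping the script body in a plain Python module. The sketch below assumes the `check_policy` script above was copied into a local `policy_validator.py` file and uses pytest; both file names are illustrative.

```python
# test_policy_validator.py - pytest checks for the validate() function.
# Assumes the check_policy script body was copied into policy_validator.py
# (a hypothetical module name used only for this sketch).
from policy_validator import validate


def test_refund_without_policy_fails():
    passed, message = validate("We can process a refund for you.", context={})
    assert passed is False
    assert "30-day" in message


def test_refund_with_policy_passes():
    passed, _ = validate("We offer a refund within 30 days of purchase.", context={})
    assert passed is True
```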
## Next Steps

- Integrate CI/CD - Automate validation in pipelines
- Assertions Reference - Complete assertion catalog
- Validators Reference - Validator API details
## Examples

See validation examples:

- `examples/assertions-test/` - All assertion types
- `examples/guardrails-test/` - Custom validators
- `examples/customer-support/` - Real-world validation patterns