Validate Outputs
Learn how to use assertions and validators to verify LLM responses.
Overview
PromptArena provides built-in assertions and custom validators to verify that LLM responses meet your quality requirements.
Built-in Assertions
Content Assertions
Contains
Check if response includes specific text:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: business-hours-check
spec:
turns:
- role: user
content: "What are your business hours?"
assertions:
- type: content_includes
params:
patterns: ["Monday"]
message: "Should mention Monday"
- type: content_includes
params:
patterns: ["9 AM"]
message: "Should include opening time"
Regex Match
Pattern matching:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: email-validation
spec:
turns:
- role: user
content: "What's the support email?"
assertions:
- type: content_matches
params:
pattern: '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}'
message: "Should contain valid email"
Not Contains
Ensure specific content is absent using negative lookahead:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: product-description
spec:
turns:
- role: user
content: "Describe our product"
assertions:
- type: content_matches
params:
pattern: "^(?!.*competitor).*$"
message: "Should not mention competitors"
Structural Assertions
JSON Structure
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: json-validation
spec:
turns:
- role: user
content: "Return user data as JSON"
assertions:
- type: is_valid_json
params:
message: "Should return valid JSON"
- type: json_schema
params:
schema:
type: object
required: [name, email]
properties:
name:
type: string
email:
type: string
message: "Should match user schema"
Behavioral Assertions
Tool Calling
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: weather-tool-check
spec:
turns:
- role: user
content: "What's the weather in Paris?"
assertions:
- type: tools_called
params:
tools: ["get_weather"]
message: "Should call weather tool"
# Conversation-level assertion to check tool arguments
conversation_assertions:
- type: tool_calls_with_args
params:
tool: "get_weather"
expected_args:
location: "Paris"
message: "Should pass Paris as location"
Context Retention
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: context-memory
spec:
turns:
- role: user
content: "My name is Alice"
- role: user
content: "What's my name?"
assertions:
- type: content_includes
params:
patterns: ["Alice"]
message: "Should remember user's name"
Custom Validators
Create custom validation logic for complex requirements.
Validator File Structure
# validators/custom-validators.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
name: custom-validators
spec:
type: validator
validators:
- name: check_pii_removal
description: "Ensures no PII in responses"
language: python
script: |
import re
def validate(response, context):
# Check for email addresses
if re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', response):
return False, "Email address found in response"
# Check for phone numbers
if re.search(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', response):
return False, "Phone number found in response"
# Check for SSN patterns
if re.search(r'\b\d{3}-\d{2}-\d{4}\b', response):
return False, "SSN pattern found in response"
return True, "No PII detected"
Use Custom Validators
# arena.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
name: pii-testing-arena
spec:
validators:
- path: ./validators/custom-validators.yaml
scenarios:
- path: ./scenarios/pii-test.yaml
# In scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: pii-test
spec:
turns:
- role: user
content: "Tell me about user John Doe"
assertions:
- type: custom_validator
params:
validator: check_pii_removal
message: "Should not contain PII"
Advanced Validator Examples
Brand Consistency
validators:
- name: brand_check
type: script
language: python
script: |
def validate(response, context):
brand_terms = {
"our company": "AcmeCorp",
"our product": "SuperWidget",
}
for wrong, correct in brand_terms.items():
if wrong.lower() in response.lower():
return False, f"Use '{correct}' instead of '{wrong}'"
return True, "Brand terms correct"
Factual Accuracy (with external data)
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
name: fact-checker
spec:
type: validator
validators:
- name: fact_check
language: python
script: |
import json
def validate(response, context):
facts = context.get("known_facts", {})
for key, value in facts.items():
if key in response and str(value) not in response:
return False, f"Incorrect {key}: expected {value}"
return True, "Facts verified"
# Use in scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: fact-checking-test
spec:
fixtures:
known_facts:
price: "$99"
warranty: "2 years"
turns:
- role: user
content: "What's the warranty period?"
assertions:
- type: custom_validator
params:
validator: fact_check
message: "Facts should be accurate"
Citation Validation
validators:
- name: check_citations
type: script
language: python
script: |
import re
def validate(response, context):
# Require citation format [Source: XYZ]
citations = re.findall(r'\[Source: (.+?)\]', response)
if not citations:
return False, "No citations found"
# Verify citations are in allowed sources
allowed = context.get("allowed_sources", [])
for cite in citations:
if cite not in allowed:
return False, f"Invalid source: {cite}"
return True, f"Found {len(citations)} valid citations"
Assertion Combinations
AND Logic (All must pass)
turns:
- user: "Provide customer support response"
assertions:
- type: content_includes
params:
patterns: ["thank you", "help"]
message: "Must be helpful"
- type: llm_judge
params:
criteria: "Response has positive sentiment"
judge_provider: "openai/gpt-4o-mini"
message: "Must be positive"
- type: content_matches
params:
pattern: "^.{1,500}$"
message: "Must be under 500 characters"
# All assertions must pass
Conditional Assertions
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: order-status-conditional
spec:
turns:
- role: user
content: "Check order status"
assertions:
# Always validate
- type: content_includes
params:
patterns: ["order"]
message: "Should mention order"
# Additional checks based on order status
- type: content_includes
params:
patterns: ["shipped"]
message: "Should mention shipping if shipped"
Testing Strategies
Progressive Validation
Start with basic assertions, then add complexity:
# Level 1: Basic structure
- type: content_matches
params:
pattern: ".+"
message: "Must not be empty"
# Level 2: Content presence
- type: content_includes
params:
patterns: ["customer service"]
message: "Must mention customer service"
# Level 3: Quality checks
- type: llm_judge
params:
criteria: "Response has positive sentiment"
judge_provider: "openai/gpt-4o-mini"
message: "Must be positive"
- type: llm_judge
params:
criteria: "Response maintains professional tone"
judge_provider: "openai/gpt-4o-mini"
message: "Must be professional"
# Level 4: Custom business logic
- type: llm_judge
params:
criteria: "Response complies with brand guidelines"
judge_provider: "openai/gpt-4o-mini"
message: "Must meet brand compliance"
Quality Gates
Define must-pass criteria:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: critical-path-test
spec:
task_type: critical
turns:
- role: user
content: "Important customer query"
assertions:
- type: content_includes
params:
patterns: ["critical terms"]
message: "Must include critical terms"
- type: response_time
params:
seconds: 1
message: "Must respond within 1 second"
- type: custom_validator
params:
validator: safety_check
message: "Must pass safety check"
Regression Testing
Track quality over time:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: baseline-quality-check
spec:
turns:
- role: user
content: "Standard query"
assertions:
- type: llm_judge
params:
criteria: "Response meets the established quality baseline"
judge_provider: "openai/gpt-4o-mini"
score: 0.85
message: "Quality should be above 85%"
Output Reports
View validation results:
# JSON report with detailed assertion results
promptarena run --format json
# HTML report with visual pass/fail
promptarena run --format html
# JUnit XML for CI integration
promptarena run --format junit
Example JSON output:
{
"test_case": "Customer Support Response",
"turn": 1,
"assertions": [
{
"type": "contains",
"expected": "thank you",
"passed": true
},
{
"type": "sentiment",
"expected": "positive",
"actual": "positive",
"passed": true
},
{
"type": "response_time",
"max_seconds": 2,
"actual_seconds": 1.3,
"passed": true
}
],
"overall_pass": true
}
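In CI you can consume the JSON report programmatically and fail the build on any unmet assertion. A minimal sketch, assuming the report is written to a file and is a list of objects shaped like the example above (inspect your own --format json output first; the report path and top-level structure here are assumptions):
import json
import sys

# Hypothetical report location; point this at wherever your pipeline writes the JSON output
with open("promptarena-report.json") as f:
    report = json.load(f)

# Assumed shape: a list of per-test objects carrying an "overall_pass" flag
failures = [t for t in report if not t.get("overall_pass", False)]

for t in failures:
    print(f"FAILED: {t.get('test_case', '<unnamed>')}")

sys.exit(1 if failures else 0)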
Best Practices
1. Layer Assertions
# Structure first
- type: is_valid_json
params:
message: "Must be valid JSON"
- type: content_matches
params:
pattern: ".+"
message: "Must not be empty"
# Then content
- type: content_includes
params:
patterns: ["expected data"]
message: "Must contain expected data"
# Finally quality
- type: llm_judge
params:
criteria: "Response follows business rules and policies"
judge_provider: "openai/gpt-4o-mini"
message: "Must follow business rules"
2. Balance Strictness
# Too strict (brittle)
- type: content_matches
params:
pattern: "^Thank you for contacting AcmeCorp support\\.$"
message: "Exact match required"
# Better (flexible)
- type: content_includes
params:
patterns: ["thank", "AcmeCorp", "support"]
message: "Must acknowledge support contact"
- type: llm_judge
params:
criteria: "Response has positive sentiment"
judge_provider: "openai/gpt-4o-mini"
message: "Must be positive"
3. Meaningful Error Messages
validators:
- name: check_policy
script: |
def validate(response, context):
if "refund" in response and "30 days" not in response:
return False, "Refund responses must mention 30-day policy"
return True, "Policy mentioned correctly"
4. Test Validators
# Run with verbose output to debug validators
promptarena run --verbose --scenario validator-test
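Because validator scripts are ordinary Python, they can also be unit-tested directly, for example with pytest. A sketch assuming the check_policy script body above is saved as validators/check_policy.py (the file layout is illustrative, not prescribed by PromptArena):
# test_check_policy.py -- run with `pytest`
from validators.check_policy import validate

def test_refund_without_policy_fails():
    passed, message = validate("We can process a refund for you.", {})
    assert not passed
    assert "30-day" in message

def test_refund_with_policy_passes():
    passed, _ = validate("Your refund qualifies; purchases can be returned within 30 days.", {})
    assert passed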
Next Steps
- Integrate CI/CD - Automate validation in pipelines
- Assertions Reference - Complete assertion catalog
- Validators Reference - Validator API details
Examples
See validation examples:
- examples/assertions-test/ - All assertion types
- examples/guardrails-test/ - Custom validators
- examples/customer-support/ - Real-world validation patterns