Validators Reference
Validators (also called guardrails) are runtime checks that enforce policies on LLM responses. Unlike assertions that verify test expectations, validators actively prevent policy violations and can abort streaming responses early.
Validators vs Assertions
```mermaid
graph TB
    subgraph "Validators (Guardrails)"
        V1["Runtime Enforcement"]
        V2["Can Abort Streaming"]
        V3["Policy Compliance"]
        V4["Defined in PromptConfig"]
    end

    subgraph "Assertions"
        A1["Test Verification"]
        A2["Check After Complete"]
        A3["Verify Expectations"]
        A4["Defined in Scenario"]
    end

    style V1 fill:#f99
    style A1 fill:#9f9
```
Key Differences:
| Aspect | Validators | Assertions |
|---|---|---|
| Purpose | Enforce policies | Verify behavior |
| When | During generation | After generation |
| Streaming | Can abort early | Runs after complete |
| Defined In | PromptConfig | Scenario |
| Failure | Policy violation | Test failure |
How Validators Work
Non-Streaming Mode
```mermaid
sequenceDiagram
    participant LLM
    participant Response
    participant Validators
    participant Result

    LLM->>Response: Generate Complete Response
    Response->>Validators: Validate Content

    alt Validation Passes
        Validators-->>Result: ✅ Return Response
    else Validation Fails
        Validators-->>Result: ❌ Validation Error
    end
```
Streaming Mode
```mermaid
sequenceDiagram
    participant LLM
    participant Stream
    participant Validators
    participant Result

    LLM->>Stream: Start Streaming

    loop Each Chunk
        Stream->>Validators: Validate Chunk

        alt Validation OK
            Validators-->>Result: ✅ Forward Chunk
        else Validation Fails
            Validators->>Stream: ❌ Abort Stream
            Stream-->>Result: Return Partial + Error
        end
    end
```
Benefits of Streaming Validation:
- Catch violations immediately
- Save API costs by aborting early
- Faster failure detection
- Prevent bad content from reaching users
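The streaming flow above can be sketched in Go. This is a minimal illustration, not the SDK's implementation: `checkFunc`, `streamWithValidation`, and `noGuarantee` are hypothetical names, and the sketch assumes the chunk that triggers the failure is not forwarded.

```go
package main

import (
	"fmt"
	"strings"
)

// checkFunc is a hypothetical validator signature: it inspects the content
// accumulated so far and returns an error on a policy violation.
type checkFunc func(accumulated string) error

// streamWithValidation forwards chunks until one fails validation, then
// stops and returns the chunks forwarded so far plus the error.
func streamWithValidation(chunks []string, check checkFunc) ([]string, error) {
	var accumulated strings.Builder
	var forwarded []string
	for _, chunk := range chunks {
		accumulated.WriteString(chunk)
		if err := check(accumulated.String()); err != nil {
			return forwarded, err // abort: remaining chunks are never consumed
		}
		forwarded = append(forwarded, chunk)
	}
	return forwarded, nil
}

// noGuarantee is a toy check standing in for a real validator.
func noGuarantee(accumulated string) error {
	if strings.Contains(strings.ToLower(accumulated), "guarantee") {
		return fmt.Errorf("banned word: guarantee")
	}
	return nil
}

func main() {
	chunks := []string{"I can help. ", "Let me guarantee ", "that it works."}
	forwarded, err := streamWithValidation(chunks, noGuarantee)
	fmt.Println(len(forwarded), err) // prints "1 banned word: guarantee"
}
```

Validating the accumulated text rather than each chunk in isolation lets the check catch a banned word that is split across two chunks.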
Validator Structure
Validators are defined in PromptConfig:
```yaml
# prompts/my-prompt.yaml
spec:
  system_template: |
    Your system prompt here...

  validators:
    - type: validator_name          # Required: Validator type
      params:                       # Required: Type-specific params
        param1: value1
        param2: value2
      message: "Policy description" # Optional: Violation message
```
Available Validators
Content Safety Validators
banned_words
Prevents responses containing banned words or phrases.
Use Cases:
- Avoid making absolute promises
- Prevent inappropriate language
- Enforce brand guidelines
- Maintain professional tone
Parameters:
- words (array): List of banned words/phrases (case-insensitive)
Streaming: ✅ Yes - Aborts immediately when banned word detected
Example:
```yaml
validators:
  - type: banned_words
    params:
      words:
        - guarantee
        - promise
        - definitely
        - "100%"
        - absolutely
        - always
        - never
    message: "Avoid absolute promises"
```
How It Works:
- Case-insensitive matching
- Word boundary detection (won’t match partial words)
- “guarantee” matches “I guarantee” but not “guaranteed”
- Checks accumulated content in streaming mode
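The matching rules above can be illustrated with a short Go sketch. It is an assumption about how such matching could work, not the validator's actual code: `containsWord`, `findBanned`, and `isWordChar` are hypothetical helpers, and a "word boundary" is taken to mean that the neighbouring character is not a letter or digit (which also lets entries such as "100%" match).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWordChar reports whether r counts as part of a word.
func isWordChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

// containsWord reports whether word occurs in content as a whole word,
// ignoring case.
func containsWord(content, word string) bool {
	text := []rune(strings.ToLower(content))
	target := []rune(strings.ToLower(word))
	if len(target) == 0 {
		return false
	}
	for i := 0; i+len(target) <= len(text); i++ {
		if string(text[i:i+len(target)]) != string(target) {
			continue
		}
		end := i + len(target)
		startOK := i == 0 || !isWordChar(text[i-1])
		endOK := end == len(text) || !isWordChar(text[end])
		if startOK && endOK {
			return true
		}
	}
	return false
}

// findBanned returns the banned words present in content, in list order.
func findBanned(content string, banned []string) []string {
	var hits []string
	for _, w := range banned {
		if containsWord(content, w) {
			hits = append(hits, w)
		}
	}
	return hits
}

func main() {
	banned := []string{"guarantee", "100%"}
	fmt.Println(findBanned("I GUARANTEE it works", banned))   // [guarantee]
	fmt.Println(findBanned("Results are guaranteed", banned)) // []
	fmt.Println(findBanned("It is 100% safe", banned))        // [100%]
}
```

Under these rules, inflected forms such as "guaranteed" have to be listed separately if they should also be blocked.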
Real-World Example (Customer Support):
```yaml
validators:
  - type: banned_words
    params:
      words:
        # Absolute promises
        - guarantee
        - promise
        - definitely
        - "100%"
        - absolutely
        - certainly

        # Inappropriate for support
        - stupid
        - idiot
        - dumb

        # Avoid legal liability
        - sue
        - lawsuit
        - lawyer
    message: "Maintain professional tone and avoid absolute promises"
```
Validation Details:
```json
{
  "passed": false,
  "details": ["guarantee", "definitely"]
}
```
max_length
Enforces maximum response length.
Use Cases:
- Keep responses concise
- Control API costs
- Enforce UX constraints
- Prevent rambling
Parameters:
- max_characters (int): Maximum character count
- max_tokens (int): Maximum token count (approximate)
Streaming: ✅ Yes - Aborts when limit exceeded
Example:
```yaml
validators:
  - type: max_length
    params:
      max_characters: 1000
      max_tokens: 250
    message: "Keep responses under 250 tokens"
```
Token Estimation:
- If provider returns token count: uses exact count
- Otherwise: estimates as characters / 4
- Streaming: uses chunk.TokenCount if available
Both Limits:
```yaml
# Enforce both character and token limits
validators:
  - type: max_length
    params:
      max_characters: 2000  # Hard limit
      max_tokens: 500       # API limit
```
Validation Details:
```json
{
  "passed": false,
  "details": {
    "character_count": 1250,
    "max_characters": 1000,
    "token_count": 312,
    "max_tokens": 250
  }
}
```
Structure Validators
max_sentences
Enforces a maximum number of sentences in the response.
Use Cases:
- Enforce conciseness
- Maintain consistent response length
- UI constraints (e.g., chat bubbles)
Parameters:
- max_sentences (int): Maximum sentence count
Streaming: ❌ No - Requires complete response
Example:
```yaml
validators:
  - type: max_sentences
    params:
      max_sentences: 5
    message: "Keep responses to 5 sentences or less"
```
Sentence Counting:
- Splits on ., !, ?
- Handles common abbreviations (Dr., Mr., etc.)
- Counts non-empty sentences
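A simplified Go sketch of this kind of sentence counting is shown here. It is not the validator's actual algorithm: `countSentences` and the `abbreviations` list are hypothetical, and the sketch works word by word instead of splitting the raw string, so it ignores cases such as a terminator followed by a closing quote.

```go
package main

import (
	"fmt"
	"strings"
)

// abbreviations is an illustrative list; the real list is not documented here.
var abbreviations = map[string]bool{"dr.": true, "mr.": true, "mrs.": true, "ms.": true}

// countSentences counts whitespace-separated words that end in '.', '!' or
// '?', skipping known abbreviations. Trailing text without a terminator
// counts as one more sentence.
func countSentences(text string) int {
	count := 0
	pending := false // true while there is text not yet closed by a terminator
	for _, word := range strings.Fields(text) {
		pending = true
		if abbreviations[strings.ToLower(word)] {
			continue
		}
		last := word[len(word)-1]
		if last == '.' || last == '!' || last == '?' {
			count++
			pending = false
		}
	}
	if pending {
		count++
	}
	return count
}

func main() {
	fmt.Println(countSentences("Hello! How can I help? Let me know.")) // 3
	fmt.Println(countSentences("Dr. Smith will call you. Please wait.")) // 2
}
```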
Example Scenarios:
```text
# ✅ 3 sentences - PASS
"Hello! How can I help? Let me know."

# ❌ 6 sentences - FAIL
"Hello! How are you? I can help with that. Let me check. I'll get back to you. Please wait."
```
Validation Details:
```json
{
  "passed": false,
  "details": {
    "count": 6,
    "max": 5
  }
}
```
required_fields
Ensures response contains required fields/information.
Use Cases:
- Verify structured responses
- Ensure key information is provided
- Enforce response templates
Parameters:
- required_fields (array): List of required text fields
Streaming: ❌ No - Requires complete response
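One plausible reading of this check is a text search for each required field in the complete response. The Go sketch below illustrates that reading only: `missingFields` is a hypothetical helper, and case-insensitive substring matching is an assumption that the reference does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// missingFields returns the required fields that do not appear in content.
// Case-insensitive substring matching is an assumption of this sketch.
func missingFields(content string, required []string) []string {
	lower := strings.ToLower(content)
	var missing []string
	for _, field := range required {
		if !strings.Contains(lower, strings.ToLower(field)) {
			missing = append(missing, field)
		}
	}
	return missing
}

func main() {
	required := []string{"order number", "tracking number", "estimated delivery"}
	reply := "Your Order Number is 1042. We will email the rest shortly."
	fmt.Println(missingFields(reply, required)) // [tracking number estimated delivery]
}
```

Because every field must be searched for in the full text, a check like this can only run once the response is complete, which matches the non-streaming behavior noted above.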
Example:
```yaml
validators:
  - type: required_fields
    params:
      required_fields:
        - "order number"
        - "tracking number"
        - "estimated delivery"
    message: "Must provide order, tracking, and delivery info"
```
Use Case (Support Ticket):
```yaml
validators:
  - type: required_fields
    params:
      required_fields:
        - "ticket number"
        - "priority"
        - "assigned to"
    message: "Support tickets must include number, priority, and assignment"
```
Validation Details:
```json
{
  "passed": false,
  "details": {
    "missing": ["tracking number", "estimated delivery"]
  }
}
```
commit
Validates structured decision/commit blocks in responses.
Use Cases:
- Enforce decision documentation
- Ensure reasoning is captured
- Structured thinking responses
Parameters:
- must_end_with_commit (bool): Response must end with commit block
- commit_fields (array): Required fields in commit block
Streaming: ❌ No - Requires complete response
Example:
```yaml
validators:
  - type: commit
    params:
      must_end_with_commit: true
      commit_fields:
        - decision
        - reasoning
        - next step
    message: "Must end with structured decision block"
```
Expected Format:
```text
Your analysis here...

Decision: Approve the request
Reasoning: Meets all criteria and within policy
Next Step: Process payment
```
Use Case (Agent Decision Making):
```yaml
validators:
  - type: commit
    params:
      must_end_with_commit: true
      commit_fields:
        - decision
        - confidence
        - action
```
Validation Details:
```json
{
  "passed": false,
  "details": {
    "error": "missing commit structure",
    "missing_fields": ["decision", "reasoning"]
  }
}
```
Combining Validators
Multiple validators can enforce different policies:
```yaml
validators:
  # Content safety
  - type: banned_words
    params:
      words: ["guarantee", "promise"]
    message: "No absolute promises"

  # Length control
  - type: max_length
    params:
      max_characters: 1000
      max_tokens: 250
    message: "Stay concise"

  # Structure
  - type: max_sentences
    params:
      max_sentences: 5
    message: "Maximum 5 sentences"
```
Execution Order:
- Streaming validators run during generation
- Non-streaming validators run after completion
- First failure stops validation chain
Streaming Validator Behavior
Immediate Abort
Streaming validators abort generation immediately:
```mermaid
graph LR
    Start["Start"] --> C1["Chunk 1<br/>✅ OK"]
    C1 --> C2["Chunk 2<br/>✅ OK"]
    C2 --> C3["Chunk 3<br/>❌ Banned Word!"]
    C3 --> Abort["Abort Stream"]

    style C3 fill:#f99
    style Abort fill:#f99
```
Benefits:
- Saves API costs (no wasted tokens)
- Faster failure detection
- Prevents bad content in UI
Example:
```text
# Response starts streaming...
"I can help with that. Let me guarantee that..."
                              ↑
                              Validator detects "guarantee"
                              Aborts immediately
                              Returns partial response + error
```
Cost Savings
```text
Without Streaming Validation:
  Generate 500 tokens @ $0.03/1K = $0.015
  Detect violation after complete = $0.015 wasted

With Streaming Validation:
  Generate 100 tokens @ $0.03/1K = $0.003
  Detect violation early = $0.012 saved
```
Validator Error Handling
Section titled “Validator Error Handling”In Test Mode (Arena)
Section titled “In Test Mode (Arena)”Validator failures are captured in test results:
{ "scenario": "test-case", "turn": 3, "status": "failed", "error": { "type": "validation_error", "validator": "banned_words", "message": "Avoid absolute promises", "details": ["guarantee"] }}In Production (SDK)
Section titled “In Production (SDK)”Configure behavior via fail_on:
# arena.yamldefaults: fail_on: - validation_error # Treat as test failure # OR omit to allow violationsBest Practices
1. Layer Validators
```yaml
validators:
  # Fast checks first (streaming)
  - type: banned_words
    params:
      words: ["inappropriate"]

  - type: max_length
    params:
      max_tokens: 500

  # Slow checks last (post-completion)
  - type: max_sentences
    params:
      max_sentences: 10
```
2. Use Meaningful Messages
```yaml
# ❌ Generic message
- type: banned_words
  params:
    words: ["guarantee"]
  message: "Invalid content"

# ✅ Specific guidance
- type: banned_words
  params:
    words: ["guarantee"]
  message: "Avoid absolute promises. Use phrases like 'we'll do our best' instead."
```
3. Test Validators
Create scenarios specifically to test guardrails:
```yaml
# scenarios/guardrail-tests.yaml
spec:
  turns:
    - role: user
      content: "Will this definitely work?"
      assertions:
        - type: guardrail_triggered
          params:
            guardrail: banned_words
            assertions: true
          message: "Should catch 'definitely'"
```
4. Balance Safety and Utility
```yaml
# ❌ Too restrictive - blocks useful content
validators:
  - type: banned_words
    params:
      words: ["can", "will", "should"]  # Too common!
```
```yaml
# ✅ Target specific policy violations
validators:
  - type: banned_words
    params:
      words: ["guarantee", "promise", "definitely", "100%"]
```
5. Document Policies
```yaml
spec:
  system_template: |
    You are a support agent.

    IMPORTANT: Our validators enforce these policies:
    - No absolute promises (guarantee, definitely, etc.)
    - Max 250 tokens per response
    - Max 5 sentences
    - Professional tone required

  validators:
    - type: banned_words
      params:
        words: ["guarantee", "definitely"]
    - type: max_length
      params:
        max_tokens: 250
    - type: max_sentences
      params:
        max_sentences: 5
```
Troubleshooting
Validator Always Triggers
Check:
- Are banned words too common?
- Is length limit reasonable?
- Is sentence count realistic?
```yaml
# ❌ Too strict
max_sentences: 1
max_tokens: 50

# ✅ Reasonable
max_sentences: 8
max_tokens: 500
```
Validator Never Triggers
Check:
- Is validator registered?
- Are params correct?
- Is validator type spelled correctly?
```yaml
# ❌ Typo
type: max_lenght

# ✅ Correct
type: max_length
```
Streaming Abort Issues
Check:
- Does validator support streaming?
- Is streaming enabled for provider?
- Are chunks being validated?
```text
# Check validator streaming support
banned_words: ✅ Streaming
max_length: ✅ Streaming
max_sentences: ❌ Post-completion only
required_fields: ❌ Post-completion only
commit: ❌ Post-completion only
```
Performance Considerations
Validator Speed
```text
# Fast (O(n))
- banned_words
- max_length

# Medium (O(n))
- max_sentences
- required_fields

# Slow (O(n²))
- complex regex in custom validators
```
Optimization Tips
- Order matters: Put fast validators first
- Streaming: Use streaming validators when possible
- Specific patterns: Avoid overly broad checks
- Batch validation: Combine related checks
Custom Validators
While PromptArena provides built-in validators, the SDK allows custom validators:
```go
// Custom validator example (SDK usage)
type CustomValidator struct{}

// Validate inspects the response content and reports whether it passes.
func (v *CustomValidator) Validate(content string, params map[string]interface{}) ValidationResult {
	// Your validation logic
	return ValidationResult{
		Passed:  true,
		Details: nil,
	}
}
```
Register in SDK:
```go
registry.Register("custom_validator", NewCustomValidator)
```
Use in prompt:
```yaml
validators:
  - type: custom_validator
    params:
      your_param: value
```
Next Steps
- Assertions Reference - Test verification
- Configuration Reference - Full config docs
- Best Practices - Production tips
Examples: See examples/customer-support/prompts/support-bot.yaml for real-world validator usage.