Guardrails Test Example
Guardrails Test Example
Section titled “Guardrails Test Example”This example demonstrates the guardrail assertion feature, which allows you to test whether validators (guardrails) trigger as expected in your prompt configurations.
Overview
Section titled “Overview”The guardrail_triggered assertion type enables you to:
- Verify guardrails trigger when they should - Test that your validators catch problematic inputs
- Verify guardrails don’t trigger when they shouldn’t - Ensure clean inputs pass through without false positives
- Test in non-production mode - Use
suppress_validation_exceptions: trueto allow execution to continue after validation failures, so you can assert on the guardrail behavior
Key Concept: SuppressValidationExceptions
Section titled “Key Concept: SuppressValidationExceptions”By default, when a validator fails, it throws a ValidationError and halts execution. This is the correct behavior for production.
For testing purposes, Arena automatically enables SuppressValidationExceptions mode in the validator middleware. This allows:
- The validator to run and record its result (pass/fail)
- Execution to continue even if validation fails
- Assertions to inspect whether the guardrail triggered
Important: This suppression behavior is built into Arena’s pipeline construction, not configured in the PromptConfig. Your production prompt configurations remain unchanged - they use the same validator definitions for both production and testing.
Example Structure
Section titled “Example Structure”guardrails-test/├── arena.yaml # Test scenarios with guardrail_triggered assertions├── prompts/│ └── content-filter.yaml # Prompt with banned_words validator└── providers/ └── openai.yaml # OpenAI provider configurationConfiguration Details
Section titled “Configuration Details”Prompt Configuration (prompts/content-filter.yaml)
Section titled “Prompt Configuration (prompts/content-filter.yaml)”The prompt includes a banned_words validator - the same configuration used in production:
validators: - type: banned_words params: words: - damn - crap - hell case_sensitive: falseNote: No special test-only flags are needed in the PromptConfig. Arena’s test framework automatically enables suppression mode when running validators.
Test Scenarios (arena.yaml)
Section titled “Test Scenarios (arena.yaml)”Four test scenarios demonstrate different assertion patterns:
guardrail-should-trigger: Input contains banned words → expect validator to triggerguardrail-should-not-trigger: Clean input → expect validator not to triggermultiple-violations: Multiple banned words → expect validator to triggerstreaming-guardrail-trigger: Tests guardrail in streaming mode → expect validator to trigger and interrupt stream
Each scenario uses the guardrail_triggered assertion:
assertions: - type: guardrail_triggered validator: banned_words # Name of the validator to check should_trigger: true # Expected behavior (true = should fail, false = should pass) message: "Descriptive message for test output"Running the Tests
Section titled “Running the Tests”-
Set up your OpenAI API key:
Terminal window export OPENAI_API_KEY="your-api-key-here" -
Run the Arena tests:
Terminal window promptarena run examples/guardrails-test/arena.yaml
Expected Results
Section titled “Expected Results”- ✅ guardrail-should-trigger: PASS (validator triggered as expected)
- ✅ guardrail-should-not-trigger: PASS (validator did not trigger as expected)
- ✅ multiple-violations: PASS (validator triggered on multiple violations as expected)
- ✅ streaming-guardrail-trigger: PASS (validator triggered in streaming mode and interrupted stream as expected)
Streaming Mode Support
Section titled “Streaming Mode Support”The streaming-guardrail-trigger scenario demonstrates how guardrails work with streaming responses:
- Real-time validation: Validators process each chunk as it arrives
- Stream interruption: When a validation fails, the stream is immediately interrupted
- Suppression behavior: With suppression enabled (Arena test mode), the stream interrupts but no error is thrown
- Recorded results: Validation failures are still recorded in metadata for assertions to inspect
This ensures that guardrails work correctly in both regular and streaming execution modes.
How It Works
Section titled “How It Works”-
Execution Phase:
- User input is processed through the prompt
- Validators run (Arena automatically enables suppression mode)
- Validation results are recorded in execution context, but errors are suppressed
- LLM generates a response
-
Assertion Phase:
- The
guardrail_triggeredassertion inspects the execution context - It finds the last assistant message and its validation results
- It checks if the specified validator passed or failed
- It compares the actual result against the
should_triggerexpectation
- The
-
Test Outcome:
- If actual behavior matches expectation → Test PASS
- If actual behavior differs from expectation → Test FAIL with descriptive error
Use Cases
Section titled “Use Cases”This pattern is valuable for:
- Regression testing: Ensure guardrails continue to work as expected over time
- Configuration validation: Verify validator configs (banned word lists, patterns, etc.) are correct
- Coverage testing: Confirm edge cases are properly handled by your guardrails
- CI/CD integration: Automated testing of prompt safety measures
Production vs Test Mode
Section titled “Production vs Test Mode”Production Mode (SDK/Conversation API - default):
// Production code uses DynamicValidatorMiddleware with default behaviormiddleware.DynamicValidatorMiddleware(registry)- Validation failures throw errors immediately
- Execution halts on first validation failure
- Appropriate for live user-facing systems
Test Mode (Arena test framework):
// Arena automatically uses suppression modemiddleware.DynamicValidatorMiddlewareWithSuppression(registry, true)- Validation failures are logged but don’t throw errors
- Execution continues so assertions can inspect results
- Appropriate for automated testing and development
The same PromptConfig works in both modes - no test-specific configuration needed!
Related Documentation
Section titled “Related Documentation”- GUARDRAIL_ASSERTION_PROPOSAL.md - Original proposal
- GitHub Issue #25 - Implementation tracking
- Arena User Guide - General Arena testing documentation