Guardrails Test Example

This example demonstrates the guardrail assertion feature, which allows you to test whether validators (guardrails) trigger as expected in your prompt configurations.

Overview

The guardrail_triggered assertion type enables you to:

  1. Verify that a validator triggers on input that should violate it
  2. Verify that a validator stays silent on clean input
  3. Exercise guardrail behavior in both regular and streaming execution modes

Key Concept: SuppressValidationExceptions

By default, when a validator fails, it throws a ValidationError and halts execution. This is the correct behavior for production.

For testing purposes, Arena automatically enables SuppressValidationExceptions mode in the validator middleware. This allows:

  1. The validator to run and record its result (pass/fail)
  2. Execution to continue even if validation fails
  3. Assertions to inspect whether the guardrail triggered

Important: This suppression behavior is built into Arena’s pipeline construction, not configured in the PromptConfig. Your production prompt configurations remain unchanged - they use the same validator definitions for both production and testing.

Example Structure

guardrails-test/
├── arena.yaml              # Test scenarios with guardrail_triggered assertions
├── prompts/
│   └── content-filter.yaml # Prompt with banned_words validator
└── providers/
    └── openai.yaml         # OpenAI provider configuration

Configuration Details

Prompt Configuration (prompts/content-filter.yaml)

The prompt includes a banned_words validator - the same configuration used in production:

validators:
  - type: banned_words
    params:
      words:
        - damn
        - crap
        - hell
      case_sensitive: false

Note: No special test-only flags are needed in the PromptConfig. Arena’s test framework automatically enables suppression mode when running validators.
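
Provider Configuration (providers/openai.yaml)

The provider file is not reproduced here. As a minimal sketch, assuming field names (type, model, api_key) that are illustrative rather than Arena's confirmed schema, it might look like:

type: openai                 # assumed provider type identifier
model: gpt-4o-mini           # illustrative model choice; any OpenAI chat model
api_key: ${OPENAI_API_KEY}   # assumed env-var interpolation syntax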

Test Scenarios (arena.yaml)

Four test scenarios demonstrate different assertion patterns:

  1. guardrail-should-trigger: Input contains banned words → expect validator to trigger
  2. guardrail-should-not-trigger: Clean input → expect validator not to trigger
  3. multiple-violations: Multiple banned words → expect validator to trigger
  4. streaming-guardrail-trigger: Tests guardrail in streaming mode → expect validator to trigger and interrupt stream

Each scenario uses the guardrail_triggered assertion:

assertions:
  - type: guardrail_triggered
    validator: banned_words        # Name of the validator to check
    should_trigger: true           # Expected behavior (true = should fail, false = should pass)
    message: "Descriptive message for test output"

Running the Tests

  1. Set up your OpenAI API key:

    export OPENAI_API_KEY="your-api-key-here"
  2. Run the Arena tests:

    promptarena run examples/guardrails-test/arena.yaml

Expected Results

Assuming the model's responses reflect the scenario inputs, all four tests should pass: the banned_words validator triggers in guardrail-should-trigger, multiple-violations, and streaming-guardrail-trigger, and does not trigger in guardrail-should-not-trigger.

Streaming Mode Support

The streaming-guardrail-trigger scenario demonstrates how guardrails work with streaming responses: the validator runs on the streamed output, and when it triggers, the result is recorded in the execution context just as in non-streaming mode. In production, a triggered guardrail would interrupt the stream; under Arena's suppression mode, the trigger is simply recorded for assertions to inspect.

This ensures that guardrails work correctly in both regular and streaming execution modes.
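
As a sketch, such a scenario could look like the entry below; the stream flag is an assumption about how Arena selects streaming mode, not a documented field:

scenarios:
  - name: streaming-guardrail-trigger
    prompt: content-filter               # assumed fields, as in the sketch above
    stream: true                         # assumed flag for streaming execution
    input: "Give me a damn answer"
    assertions:
      - type: guardrail_triggered
        validator: banned_words
        should_trigger: true
        message: "guardrail should trigger and interrupt the stream"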

How It Works

  1. Execution Phase:

    • User input is processed through the prompt
    • Validators run (Arena automatically enables suppression mode)
    • Validation results are recorded in execution context, but errors are suppressed
    • LLM generates a response
  2. Assertion Phase:

    • The guardrail_triggered assertion inspects the execution context
    • It finds the last assistant message and its validation results
    • It checks if the specified validator passed or failed
    • It compares the actual result against the should_trigger expectation
  3. Test Outcome:

    • If actual behavior matches expectation → Test PASS
    • If actual behavior differs from expectation → Test FAIL with descriptive error

Use Cases

This pattern is valuable for:

  • Regression-testing content filters before deploying prompt changes
  • Verifying that a banned-word list catches the phrases it should
  • Confirming that clean, legitimate inputs do not trip a validator (guarding against false positives)
  • Checking that guardrails behave consistently in streaming mode

Production vs Test Mode

Production Mode (SDK/Conversation API - default):

// Production code uses DynamicValidatorMiddleware with default behavior
middleware.DynamicValidatorMiddleware(registry)

Test Mode (Arena test framework):

// Arena automatically uses suppression mode
middleware.DynamicValidatorMiddlewareWithSuppression(registry, true)

The same PromptConfig works in both modes - no test-specific configuration needed!