Assertions Test Example

This example demonstrates turn-level assertions across four functional areas:

Scenarios

  1. scripted-turns.yaml - Scripted conversation turns with content_includes and content_matches assertions
  2. self-play.yaml - Self-play mode with assertions on every turn
  3. tool-usage.yaml - Tool usage validation with tools_called and tools_not_called assertions
  4. json-validation.yaml - JSON validation with is_valid_json, json_schema, and json_path assertions

Setup

Copy .env.example to .env and add your API keys:

cp .env.example .env
# Edit .env and add your OPENAI_API_KEY and GEMINI_API_KEY
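
The resulting .env uses the standard KEY=value dotenv format; the placeholder values below are illustrative, not real keys:

OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key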

Running the Tests

# Run all scenarios with both providers
./bin/promptarena run -c examples/assertions-test

# Run a specific scenario with a real API
./bin/promptarena run -c examples/assertions-test --scenario scripted-turns --provider gemini-flash
./bin/promptarena run -c examples/assertions-test --scenario tool-usage --provider openai-mini
./bin/promptarena run -c examples/assertions-test --scenario self-play --provider gemini-flash
./bin/promptarena run -c examples/assertions-test --scenario json-validation --provider openai-mini

# Run with mock provider (no API keys needed)
./bin/promptarena run -c examples/assertions-test --mock-provider --mock-config examples/assertions-test/mock-responses.yaml

# Run a specific scenario with mock data
./bin/promptarena run -c examples/assertions-test --scenario json-validation --mock-provider --mock-config examples/assertions-test/mock-responses.yaml

Assertion Types Demonstrated

Content Assertions

content_includes and content_matches, demonstrated in scripted-turns.yaml. content_includes checks that the assistant response contains the given patterns; content_matches checks the response against match patterns.
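
A minimal sketch of a scripted turn combining both content assertions, modeled on the self-play example later in this README; the content_matches parameters shown here are an assumption, so check scripted-turns.yaml for the exact fields:

turns:
  - role: user
    content: "Describe how solar panels generate electricity."
    assertions:
      - type: content_includes
        params:
          patterns: ["solar", "electricity"]
      - type: content_matches
        params:
          patterns: ["(?i)photovoltaic|sunlight"]   # assumed: regex patterns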

Tool Assertions

tools_called and tools_not_called, demonstrated in tool-usage.yaml. They verify that the assistant did (or did not) call the named tools during the turn.
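
A hypothetical turn using both tool assertions; the params field name (tools) and the tool names are assumptions for illustration, so refer to tool-usage.yaml for the exact schema:

turns:
  - role: user
    content: "What's the weather in Paris right now?"
    assertions:
      - type: tools_called
        params:
          tools: ["get_weather"]       # assumed parameter name and tool
      - type: tools_not_called
        params:
          tools: ["send_email"]        # assumed parameter name and tool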

JSON Validation Assertions

is_valid_json, json_schema, and json_path, demonstrated in json-validation.yaml and shown in full in the JSON Validation with Mock and Real APIs section below.

Expected Behavior

Self-Play with Assertions

The self-play scenario demonstrates assertions working with LLM-generated user messages:

turns:
  # Turn 1: Initial scripted prompt with assertions on assistant response
  - role: user
    content: "Let's discuss renewable energy. Start by mentioning solar power."
    assertions:
      - type: content_includes
        params:
          patterns: ["renewable", "solar"]
  
  # Turns 2-3: Self-play continues (gemini-user generates the user messages)
  - role: gemini-user
    turns: 2
    assertions:
      - type: content_includes
        params:
          patterns: ["energy"]

Each assistant response (whether responding to a scripted or self-play user message) is validated against the assertions defined for that turn.

JSON Validation with Mock and Real APIs

The json-validation scenario demonstrates comprehensive JSON validation capabilities that work with both mock responses and real API calls:

turns:
  - role: user
    content: "Generate a user profile in JSON format..."
    assertions:
      # Validate JSON is parseable (extracts from markdown code blocks)
      - type: is_valid_json
        params:
          allow_wrapped: true
          extract_json: true
      
      # Validate against JSON Schema
      - type: json_schema
        params:
          schema:
            type: object
            required: ["name", "age", "email"]
            properties:
              name: {type: string}
              age: {type: integer, minimum: 18}
              email: {type: string, pattern: "^[a-zA-Z0-9._%+-]+@..."}
          allow_wrapped: true
          extract_json: true
      
      # Validate specific field values with JMESPath
      - type: json_path
        params:
          jmespath_expression: "name"
          expected: "Alice Smith"
          allow_wrapped: true
          extract_json: true

Key Features:

  - is_valid_json confirms the response parses as JSON, extracting it from markdown code blocks when needed (allow_wrapped, extract_json)
  - json_schema validates structure and constraints such as required fields, types, minimums, and patterns
  - json_path checks individual field values with JMESPath expressions
  - the same assertions run unchanged against mock responses and real API output

Test with mock data (fast, no API keys):

./bin/promptarena run -c examples/assertions-test --scenario json-validation --mock-provider --mock-config examples/assertions-test/mock-responses.yaml

Test with real API (validates actual LLM output):

./bin/promptarena run -c examples/assertions-test --scenario json-validation --provider openai-mini