🏟️ PromptArena

Comprehensive testing framework for validating LLM prompts across multiple providers


What is PromptArena?

PromptArena is a powerful testing tool that helps you validate prompts across multiple LLM providers, assert on model outputs, track performance metrics, and run your whole test suite in CI/CD.


Quick Start

Get up and running in 60 seconds with the interactive project generator:

# Install PromptKit (includes PromptArena)
brew install promptkit

# Or with Go
go install github.com/AltairaLabs/PromptKit/tools/arena@latest

# Create a new test project instantly
promptarena init my-test --quick

# Choose your provider when prompted:
#   • mock     - No API calls, instant testing
#   • openai   - OpenAI GPT models
#   • anthropic - Claude models
#   • google   - Gemini models

# Or use a built-in template for common use cases:
#   • basic-chatbot       - Simple conversational testing
#   • customer-support    - Support agent with tools
#   • code-assistant      - Code generation & review
#   • content-generation  - Creative content testing
#   • multimodal          - Image/audio/video AI
#   • mcp-integration     - MCP server testing

# Run your first test
cd my-test
promptarena run

That’s it! The init command scaffolds a complete, ready-to-run test project.
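The exact files depend on the template you pick, but based on the provider and scenario paths used later in this README, a generated project plausibly looks something like this (the specific file names here are assumptions):

```
my-test/
├── arena.yaml          # Arena config (kind: Arena) wiring providers to scenarios
├── providers/
│   └── mock.yaml       # config for the provider you chose during init
└── scenarios/
    └── example.yaml    # a starter Scenario with assertions
```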

Need More Control?

Use interactive mode for custom configuration:

promptarena init my-project
# Answer prompts to customize:
#   - Project name and description
#   - Provider selection
#   - System prompt customization
#   - Test scenario setup

Or skip the wizard and create files manually (see below).

Next: Your First Arena Test Tutorial


Documentation by Type

📚 Tutorials (Learn by Doing)

Step-by-step guides that teach you Arena through hands-on exercises:

  1. Your First Test - Get started in 5 minutes
  2. Multi-Provider Testing - Compare providers
  3. Multi-Turn Conversations - Test conversation flows
  4. MCP Tool Integration - Test with tool calling
  5. CI/CD Integration - Automate testing

🔧 How-To Guides (Accomplish Specific Tasks)

Focused guides for specific Arena tasks.

💡 Explanation (Understand the Concepts)

Deep dives into Arena’s design and philosophy.

📖 Reference (Look Up Details)

Complete technical specifications.


Key Features

Multi-Provider Testing

Test the same prompt across different LLM providers simultaneously:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: cross-provider-test

spec:
  providers:
    - path: ./providers/openai.yaml
    - path: ./providers/claude.yaml
    - path: ./providers/gemini.yaml
  
  scenarios:
    - path: ./scenarios/quantum-test.yaml
      providers: [openai-gpt4, claude-sonnet, gemini-pro]
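The provider files referenced above are separate YAML documents. Their exact schema isn’t shown in this README, but a minimal sketch might look like the following (the `kind: Provider` value and the `spec` field names are assumptions, not confirmed API):

```yaml
# providers/openai.yaml — hypothetical shape; field names are assumptions
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-gpt4        # the id referenced in scenario provider lists
spec:
  type: openai
  model: gpt-4o
  apiKeyEnv: OPENAI_API_KEY  # read the key from the environment, not the file
```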

Rich Assertions

Validate outputs with powerful assertions:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: quantum-test

spec:
  turns:
    - role: user
      content: "Explain quantum computing"
      assertions:
        - type: content_includes
          params:
            patterns: ["quantum"]
            message: "Should mention quantum"
        
        - type: content_matches
          params:
            pattern: "(qubit|superposition|entanglement)"
            message: "Should mention key quantum concepts"

Performance Metrics

Arena automatically tracks performance metrics for every provider run, so you can compare providers on more than output quality.

CI/CD Integration

Run tests in your pipeline:

# .github/workflows/test-prompts.yml
- name: Test Prompts
  run: promptarena run --ci --fail-on-error
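The step above needs a surrounding job to run. A fuller sketch of the workflow, reusing the `go install` command from the Quick Start, might look like this (the checkout/setup-go steps and trigger events are assumptions about your repo, not part of Arena itself):

```yaml
# .github/workflows/test-prompts.yml — expanded sketch; install steps are assumptions
name: Test Prompts
on: [push, pull_request]

jobs:
  arena:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - name: Install PromptArena
        run: go install github.com/AltairaLabs/PromptKit/tools/arena@latest
      - name: Test Prompts
        run: promptarena run --ci --fail-on-error
```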

Use Cases

For Prompt Engineers

For QA Teams

For ML Ops


Examples

Real-world Arena testing scenarios.


Common Workflows

Development Workflow

  1. Write prompt → 2. Create test → 3. Run Arena → 4. Refine → 5. Repeat

CI/CD Workflow

  1. Push changes → 2. Arena runs automatically → 3. Tests must pass → 4. Deploy

Provider Evaluation

  1. Define test suite → 2. Run across providers → 3. Compare results → 4. Choose best

Getting Help