PromptArena

Comprehensive testing framework for validating LLM prompts across multiple providers


PromptArena is a powerful testing tool that helps you:

  • Test prompts systematically across OpenAI, Anthropic, Google, and more
  • Compare provider performance side-by-side with detailed metrics
  • Validate conversation flows with multi-turn testing scenarios
  • Integrate with CI/CD to catch prompt regressions before production
  • Generate comprehensive reports with HTML, JSON, and markdown output

Get up and running in 60 seconds with the interactive project generator:

# Install PromptKit (includes PromptArena)
brew install promptkit
# Or with Go
go install github.com/AltairaLabs/PromptKit/tools/arena@latest
# Create a new test project instantly
promptarena init my-test --quick
# Choose your provider when prompted:
# • mock - No API calls, instant testing
# • openai - OpenAI GPT models
# • anthropic - Claude models
# • google - Gemini models
# Or use a built-in template for common use cases:
# • basic-chatbot - Simple conversational testing
# • customer-support - Support agent with tools
# • code-assistant - Code generation & review
# • content-generation - Creative content testing
# • multimodal - Image/audio/video AI
# • mcp-integration - MCP server testing
# Run your first test
cd my-test
promptarena run

That’s it! The init command creates the following (see the layout sketch after this list):

  • ✅ Complete Arena configuration
  • ✅ Provider setup (ready to use)
  • ✅ Sample test scenario
  • ✅ Working prompt configuration
  • ✅ README with next steps
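These files typically land in a layout like the one sketched below; the exact file and directory names are illustrative and depend on the template you choose:

my-test/
├── arena.yaml        # Arena configuration (providers + scenarios)
├── providers/        # one YAML definition per provider
├── scenarios/        # test scenarios with turns and assertions
├── prompts/          # system prompt configuration
└── README.md         # next steps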

Use interactive mode for custom configuration:

promptarena init my-project
# Answer prompts to customize:
# - Project name and description
# - Provider selection
# - System prompt customization
# - Test scenario setup

Or skip the wizard and create files manually (see below).

Next: Your First Arena Test Tutorial


Step-by-step guides that teach you Arena through hands-on exercises:

  1. Your First Test - Get started in 5 minutes
  2. Multi-Provider Testing - Compare providers
  3. Multi-Turn Conversations - Test conversation flows
  4. MCP Tool Integration - Test with tool calling
  5. CI/CD Integration - Automate testing

🔧 How-To Guides (Accomplish Specific Tasks)


Focused guides for specific Arena tasks:

💡 Explanation (Understand the Concepts)


Deep dives into Arena’s design and philosophy:

Complete technical specifications:


Test the same prompt across different LLM providers simultaneously:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: cross-provider-test
spec:
  providers:
    - path: ./providers/openai.yaml
    - path: ./providers/claude.yaml
    - path: ./providers/gemini.yaml
  scenarios:
    - path: ./scenarios/quantum-test.yaml
      providers: [openai-gpt4, claude-sonnet, gemini-pro]
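Each path above points to a provider definition file. The snippet below is only a sketch of what such a file might contain; the Provider kind, the type and model fields, and the credential convention are assumptions here, so consult the provider reference for the actual schema:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider              # assumed kind name
metadata:
  name: openai-gpt4         # the name referenced in the scenario's providers list
spec:
  type: openai              # assumed field: which backend to call
  model: gpt-4              # assumed field: model identifier
  # API keys are normally supplied via environment variables, not committed here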

Validate outputs with powerful assertions:

apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: quantum-test
spec:
  turns:
    - role: user
      content: "Explain quantum computing"
      assertions:
        - type: content_includes
          params:
            patterns: ["quantum"]
          message: "Should mention quantum"
        - type: content_matches
          params:
            pattern: "(qubit|superposition|entanglement)"
          message: "Should mention key quantum concepts"

Automatically track:

  • Response time (latency)
  • Token usage (input/output)
  • Cost estimation
  • Success/failure rates

Run tests in your pipeline:

# .github/workflows/test-prompts.yml
- name: Test Prompts
  run: promptarena run --ci --fail-on-error
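A complete workflow file might look like the sketch below. The install path and run flags come from this page; the checkout and Go setup steps, the job layout, and the secret names are assumptions to adapt to the providers you actually use:

# .github/workflows/test-prompts.yml (illustrative sketch)
name: Test Prompts
on: [push, pull_request]
jobs:
  arena:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: 'stable'
      - name: Install PromptArena
        run: go install github.com/AltairaLabs/PromptKit/tools/arena@latest
      - name: Test Prompts
        run: promptarena run --ci --fail-on-error
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}         # assumed secret names;
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}   # set only the ones you use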

  • Develop and refine prompts with confidence
  • A/B test different prompt variations
  • Ensure consistency across providers
  • Track performance over time
  • Validate prompt quality before deployment
  • Catch regressions in prompt behavior
  • Test edge cases and failure modes
  • Generate test reports for stakeholders
  • Integrate prompt testing into CI/CD
  • Monitor prompt performance
  • Compare provider costs and quality
  • Automate regression testing

Real-world Arena testing scenarios:


  1. Write prompt → 2. Create test → 3. Run Arena → 4. Refine → 5. Repeat
  1. Push changes → 2. Arena runs automatically → 3. Tests must pass → 4. Deploy
  1. Define test suite → 2. Run across providers → 3. Compare results → 4. Choose best
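In practice the development loop is just an edit-and-rerun cycle using the commands already shown; the file name below is illustrative:

# edit the prompt or scenario
$EDITOR scenarios/quantum-test.yaml
# re-run the suite locally and review the results
promptarena run
# refine and repeat until the assertions pass consistently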