PromptArena
Comprehensive testing framework for validating LLM prompts across multiple providers
What is PromptArena?
PromptArena is a powerful testing tool that helps you:
- Test prompts systematically across OpenAI, Anthropic, Google, and more
- Compare provider performance side-by-side with detailed metrics
- Validate conversation flows with multi-turn testing scenarios
- Integrate with CI/CD to catch prompt regressions before production
- Generate comprehensive reports with HTML, JSON, and markdown output
Quick Start
Get up and running in 60 seconds with the interactive project generator:
# Install PromptKit (includes PromptArena)
brew install promptkit
# Or with Go
go install github.com/AltairaLabs/PromptKit/tools/arena@latest
# Create a new test project instantly
promptarena init my-test --quick
# Choose your provider when prompted:
# • mock - No API calls, instant testing
# • openai - OpenAI GPT models
# • anthropic - Claude models
# • google - Gemini models
# Or use a built-in template for common use cases:
# • basic-chatbot - Simple conversational testing
# • customer-support - Support agent with tools
# • code-assistant - Code generation & review
# • content-generation - Creative content testing
# • multimodal - Image/audio/video AI
# • mcp-integration - MCP server testing
# Run your first test
cd my-test
promptarena run
That's it! The init command creates the following (an example layout is sketched below):
- ✅ Complete Arena configuration
- ✅ Provider setup (ready to use)
- ✅ Sample test scenario
- ✅ Working prompt configuration
- ✅ README with next steps
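As a rough illustration only (the actual file and directory names depend on the options you pick and may differ), the generated layout looks something like:

my-test/
├── arena.yaml      # Arena configuration (apiVersion/kind as shown under Key Features)
├── providers/      # configuration for the provider you selected (mock, openai, anthropic, google)
├── scenarios/      # sample test scenario
├── prompts/        # working prompt configuration
└── README.md       # next steps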
Need More Control?
Use interactive mode for custom configuration:
promptarena init my-project
# Answer prompts to customize:
# - Project name and description
# - Provider selection
# - System prompt customization
# - Test scenario setup
Or skip the wizard and create files manually (see below).
Next: Your First Arena Test Tutorial
Documentation by Type
Tutorials (Learn by Doing)
Step-by-step guides that teach you Arena through hands-on exercises:
- Your First Test - Get started in 5 minutes
- Multi-Provider Testing - Compare providers
- Multi-Turn Conversations - Test conversation flows
- MCP Tool Integration - Test with tool calling
- CI/CD Integration - Automate testing
How-To Guides (Accomplish Specific Tasks)
Focused guides for specific Arena tasks:
- Installation - Get Arena running
- Write Test Scenarios - Effective scenario design
- Configure Providers - Provider setup
- Use Mock Providers - Test without API calls
- Validate Outputs - Assertion strategies
- Customize Reports - Report formatting
- Integrate CI/CD - GitHub Actions, GitLab CI
- Session Recording - Capture and replay sessions
Explanation (Understand the Concepts)
Deep dives into Arena's design and philosophy:
- Testing Philosophy - Why test prompts?
- Scenario Design - Effective test patterns
- Provider Comparison - Evaluate providers
- Validation Strategies - Assertion best practices
- Session Recording - Recording architecture and replay
Reference (Look Up Details)
Complete technical specifications:
- CLI Commands - All Arena commands
- Configuration Schema - Config file format
- Scenario Format - Test scenario structure
- Assertions - All assertion types
- Validators - Built-in validators
- Output Formats - Report formats
Key Features
Multi-Provider Testing
Test the same prompt across different LLM providers simultaneously:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: cross-provider-test
spec:
  providers:
    - path: ./providers/openai.yaml
    - path: ./providers/claude.yaml
    - path: ./providers/gemini.yaml
  scenarios:
    - path: ./scenarios/quantum-test.yaml
      providers: [openai-gpt4, claude-sonnet, gemini-pro]
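Each entry under providers: points at a separate provider file. The authoritative schema lives in the Configure Providers guide and the Configuration Schema reference; as a minimal sketch only (the kind: Provider, type, model, and temperature fields are assumptions for illustration, not the documented schema), such a file might look like:

# providers/openai.yaml — illustrative sketch, not the authoritative schema
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider            # assumed kind, following the Arena/Scenario pattern above
metadata:
  name: openai-gpt4       # the name scenarios reference in their providers list
spec:
  type: openai            # assumed: backend selection (mock, openai, anthropic, google)
  model: gpt-4o           # assumed: model identifier
  temperature: 0.2        # assumed: sampling parameters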
Rich Assertions
Validate outputs with powerful assertions:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: quantum-test
spec:
  turns:
    - role: user
      content: "Explain quantum computing"
      assertions:
        - type: content_includes
          params:
            patterns: ["quantum"]
          message: "Should mention quantum"
        - type: content_matches
          params:
            pattern: "(qubit|superposition|entanglement)"
          message: "Should mention key quantum concepts"
Performance Metrics
Automatically track:
- Response time (latency)
- Token usage (input/output)
- Cost estimation
- Success/failure rates
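These numbers surface in the HTML, JSON, and markdown reports; the exact layout is specified in the Output Formats reference. Purely as an illustration of the kind of data tracked (all field names and values below are assumptions, not the documented format), a per-run JSON entry might resemble:

{
  "scenario": "quantum-test",
  "provider": "openai-gpt4",
  "latency_ms": 1840,
  "tokens": { "input": 112, "output": 498 },
  "estimated_cost_usd": 0.0061,
  "passed": true
}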
CI/CD Integration
Run tests in your pipeline:
# .github/workflows/test-prompts.yml
- name: Test Prompts
  run: promptarena run --ci --fail-on-error
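The step above assumes Arena is already installed on the runner and any provider credentials are configured. A fuller, hedged sketch of what a complete GitHub Actions workflow might look like (job names, action versions, and the secret name are illustrative; the installed binary name may differ depending on the install method, and the mock provider needs no API key at all):

# .github/workflows/test-prompts.yml — illustrative sketch
name: Test Prompts
on: [push, pull_request]

jobs:
  arena:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Install PromptArena
        run: go install github.com/AltairaLabs/PromptKit/tools/arena@latest
      - name: Test Prompts
        run: promptarena run --ci --fail-on-error
        env:
          # only needed for real providers; omit when testing with the mock provider
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}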
Use Cases
For Prompt Engineers
- Develop and refine prompts with confidence
- A/B test different prompt variations
- Ensure consistency across providers
- Track performance over time
For QA Teams
- Validate prompt quality before deployment
- Catch regressions in prompt behavior
- Test edge cases and failure modes
- Generate test reports for stakeholders
For ML Ops
- Integrate prompt testing into CI/CD
- Monitor prompt performance
- Compare provider costs and quality
- Automate regression testing
Examples
Real-world Arena testing scenarios:
- Customer Support Testing - Multi-turn support conversations
- MCP Chatbot Testing - Tool calling validation
- Guardrails Testing - Safety and compliance checks
- Multi-Provider Comparison - Provider evaluation
Common Workflows
Development Workflow
1. Write prompt → 2. Create test → 3. Run Arena → 4. Refine → 5. Repeat
CI/CD Workflow
1. Push changes → 2. Arena runs automatically → 3. Tests must pass → 4. Deploy
Provider Evaluation
1. Define test suite → 2. Run across providers → 3. Compare results → 4. Choose best
Getting Help
- Quick Start: Getting Started Guide
- Questions: GitHub Discussions
- Issues: Report a Bug
- Examples: Arena Examples
Related Tools
- PackC: Compile tested prompts for production
- SDK: Use tested prompts in applications
- Complete Workflow: See all tools together