PromptArena
Comprehensive testing framework for validating LLM prompts across multiple providers
What is PromptArena?
PromptArena is a powerful testing tool that helps you:
- Test prompts systematically across OpenAI, Anthropic, Google, and more
- Compare provider performance side-by-side with detailed metrics
- Validate conversation flows with multi-turn testing scenarios
- Integrate with CI/CD to catch prompt regressions before production
- Generate comprehensive reports with HTML, JSON, and markdown output
Quick Start
Get up and running in 60 seconds with the interactive project generator:
```sh
# Install PromptKit (includes PromptArena)
brew install promptkit

# Or with Go
go install github.com/AltairaLabs/PromptKit/tools/arena@latest

# Create a new test project instantly
promptarena init my-test --quick

# Choose your provider when prompted:
# • mock - No API calls, instant testing
# • openai - OpenAI GPT models
# • anthropic - Claude models
# • google - Gemini models

# Or use a built-in template for common use cases:
# • basic-chatbot - Simple conversational testing
# • customer-support - Support agent with tools
# • code-assistant - Code generation & review
# • content-generation - Creative content testing
# • multimodal - Image/audio/video AI
# • mcp-integration - MCP server testing

# Run your first test
cd my-test
promptarena run
```

That’s it! The init command creates:
- ✅ Complete Arena configuration
- ✅ Provider setup (ready to use)
- ✅ Sample test scenario
- ✅ Working prompt configuration
- ✅ README with next steps
Need More Control?
Use interactive mode for custom configuration:

```sh
promptarena init my-project
# Answer prompts to customize:
# - Project name and description
# - Provider selection
# - System prompt customization
# - Test scenario setup
```

Or skip the wizard and create files manually (see below).
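For manual setup, a minimal top-level config might look like the sketch below. This is illustrative only, assembled from the Arena resource format shown later on this page under Key Features; the file names and the provider path are assumptions, not requirements:

```yaml
# arena.yaml - top-level test configuration (illustrative sketch)
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: my-project
spec:
  providers:
    - path: ./providers/openai.yaml   # provider config you supply
  scenarios:
    - path: ./scenarios/hello.yaml    # a scenario file in the Scenario format
```

With the files in place, running `promptarena run` from the project directory executes the suite, as in the quick start above.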
Next: Your First Arena Test Tutorial
Documentation by Type
📚 Tutorials (Learn by Doing)
Step-by-step guides that teach you Arena through hands-on exercises:
- Your First Test - Get started in 5 minutes
- Multi-Provider Testing - Compare providers
- Multi-Turn Conversations - Test conversation flows
- MCP Tool Integration - Test with tool calling
- CI/CD Integration - Automate testing
🔧 How-To Guides (Accomplish Specific Tasks)
Focused guides for specific Arena tasks:
- Installation - Get Arena running
- Write Test Scenarios - Effective scenario design
- Configure Providers - Provider setup
- Use Mock Providers - Test without API calls
- Validate Outputs - Assertion strategies
- Customize Reports - Report formatting
- Integrate CI/CD - GitHub Actions, GitLab CI
- Session Recording - Capture and replay sessions
💡 Explanation (Understand the Concepts)
Deep dives into Arena’s design and philosophy:
- Testing Philosophy - Why test prompts?
- Scenario Design - Effective test patterns
- Provider Comparison - Evaluate providers
- Validation Strategies - Assertion best practices
- Session Recording - Recording architecture and replay
📖 Reference (Look Up Details)
Complete technical specifications:
- CLI Commands - All Arena commands
- Configuration Schema - Config file format
- Scenario Format - Test scenario structure
- Assertions - All assertion types
- Validators - Built-in validators
- Output Formats - Report formats
Key Features
Multi-Provider Testing
Test the same prompt across different LLM providers simultaneously:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: cross-provider-test
spec:
  providers:
    - path: ./providers/openai.yaml
    - path: ./providers/claude.yaml
    - path: ./providers/gemini.yaml
  scenarios:
    - path: ./scenarios/quantum-test.yaml
      providers: [openai-gpt4, claude-sonnet, gemini-pro]
```

Rich Assertions
Validate outputs with powerful assertions:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: quantum-test
spec:
  turns:
    - role: user
      content: "Explain quantum computing"
      assertions:
        - type: content_includes
          params:
            patterns: ["quantum"]
          message: "Should mention quantum"
        - type: content_matches
          params:
            pattern: "(qubit|superposition|entanglement)"
          message: "Should mention key quantum concepts"
```

Performance Metrics
Automatically track:
- Response time (latency)
- Token usage (input/output)
- Cost estimation
- Success/failure rates
CI/CD Integration
Run tests in your pipeline:
```yaml
# .github/workflows/test-prompts.yml
- name: Test Prompts
  run: promptarena run --ci --fail-on-error
```

Use Cases
For Prompt Engineers
- Develop and refine prompts with confidence
- A/B test different prompt variations
- Ensure consistency across providers
- Track performance over time
For QA Teams
- Validate prompt quality before deployment
- Catch regressions in prompt behavior
- Test edge cases and failure modes
- Generate test reports for stakeholders
For ML Ops
- Integrate prompt testing into CI/CD
- Monitor prompt performance
- Compare provider costs and quality
- Automate regression testing
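The CI/CD bullets above can be sketched as a complete GitHub Actions workflow. Only the `promptarena run --ci --fail-on-error` invocation and the Go install path come from this page; the trigger, job structure, and action versions are ordinary Actions boilerplate and should be adapted to your repository:

```yaml
# .github/workflows/test-prompts.yml (illustrative sketch)
name: Test Prompts
on:
  pull_request:        # catch prompt regressions before merge
jobs:
  arena:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      # Install the Arena CLI (install path from the quick start above)
      - run: go install github.com/AltairaLabs/PromptKit/tools/arena@latest
      # Fail the job if any scenario assertion fails
      - run: promptarena run --ci --fail-on-error
```

Running on `pull_request` keeps failing prompts out of the main branch; add a `push` trigger if you also want post-merge monitoring.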
Examples
Real-world Arena testing scenarios:
- Customer Support Testing - Multi-turn support conversations
- MCP Chatbot Testing - Tool calling validation
- Guardrails Testing - Safety and compliance checks
- Multi-Provider Comparison - Provider evaluation
Common Workflows
Development Workflow
1. Write prompt → 2. Create test → 3. Run Arena → 4. Refine → 5. Repeat
CI/CD Workflow
1. Push changes → 2. Arena runs automatically → 3. Tests must pass → 4. Deploy
Provider Evaluation
1. Define test suite → 2. Run across providers → 3. Compare results → 4. Choose best
Getting Help
- Quick Start: Getting Started Guide
- Questions: GitHub Discussions
- Issues: Report a Bug
- Examples: Arena Examples
Related Tools
- PackC: Compile tested prompts for production
- SDK: Use tested prompts in applications
- Complete Workflow: See all tools together